W. Li, Z. Zhou, AC: A data generator for evaluation of clustering, (2022). https://doi.org/10.36227/techrxiv.19091330.v1


Visualization supports only 2D and 3D data.
Maximum supported file size: 1 MB.
Example of generating a benchmark dataset
It is recommended to design your own parameter sets for specific purposes.
Density (5UCs+5UCs+5UCs): constant overlap, increasing sample size, random SD, random RC, random angle
Adjacency (Overlap, 5UCs+5UCs+5UCs): constant sample size, constant SD, increasing overlap, random RC, random angle
Shape (10BCs): constant sample size, constant SD, random control points
Complexity (2BCs+5UCs+3BCs+5UCs): all random
Hierarchy: assess the hierarchy of clusters
Custom examples
simple_density1: constant SD, decreasing sample size
simple_density2: decreasing SD, constant sample size
simple_overlap: increasing overlap
simple_all: comprehensive display
simple_random (2BCs+5UCs): all random
Bezier: simulate handwriting
Lantern: simulate a stick figure of a lantern

Usage: Three Generation Models

node: Add UCs by relative positioning

All parameters:

Specify the number of clusters to add on this layer (example panels: -nodeNum=1, -nodeNum=5, -nodeNum=10, -nodeNum=10, -nodeNum=10).

Control sample size (example panels: -ss=250, -ss=500, -ss=1000, -ss=2000, -ss=4000).

Control standard deviation: -sd

Control overlap: -overlap

Control angle: -angle

nodeFix: Add UCs by absolute positioning

All parameters:

Examples: -angle

bezier: Add Bezier curve-based clusters (BCs)

All parameters:

Examples: bezier

Splice clusters with -label

Examples: -label

Add UC at the specified location of BC by -ref

Examples: reference, -ss=500, -ss=1000, -ss=1000

Randomization

Examples: randomUC, randomBC, random (five panels of each)


Arguments

Public parameters

| Parameter | Description | Value | Example | Default |
|---|---|---|---|---|
| -d | Dimension | Positive integer | -d=2 | 2 |
| -o | Output | File path | -o=/home/work/, -o=E:/work/, -o=/home/coordinate.txt | Current directory |
| -rg | Regenerate according to the original parameters in a parameter configuration file | File path | -rg=/home/myConfig.txt | None |
| -rp | Reproduce according to a parameter configuration file | File path | -rp=/home/myConfig.txt | None |
Private parameters

| Parameter | Description | Value | Example | Default |
|---|---|---|---|---|
| -t | Generation model: node = add UCs using relative positioning; nodeFix = add UCs using absolute positioning; bezier = add BCs | node/nodeFix/bezier | -t=node | Required if -rg/-rp is not specified |
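The parameters all follow the `-name=value` form, with vector values separated by commas. The following is a minimal sketch of assembling such strings from a mapping. The helper name `build_args` is hypothetical, and because the executable name and the way several layers are combined are not stated here, it only produces the flag strings.

```python
def build_args(params):
    """Turn a mapping such as {"t": "node", "ss": 500} into ["-t=node", "-ss=500"]."""
    args = []
    for name, value in params.items():
        # Vector-valued parameters (e.g. -angle, -coordinate) are comma-separated.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        args.append(f"-{name}={value}")
    return args


# One layer of five UCs in 2D, using flags from the table.
layer = build_args({"d": 2, "t": "node", "nodeNum": 5, "ss": 500, "angle": [0, 30]})
print(" ".join(layer))  # -d=2 -t=node -nodeNum=5 -ss=500 -angle=0,30
```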
Parameters for -t=node

| Parameter | Description | Value | Example | Default |
|---|---|---|---|---|
| -nodeNum | The number of UCs to add to the current layer | Positive integer | -nodeNum=5 | 1 |
| -ss | The sample size of each cluster added to the current layer | Positive integer | -ss=500 | 300×(1+SD×random(0,1)) |
| -sd | The standard deviation of the normal distribution | Positive number | -sd=1 | 1+10×random(0,1) |
| -ref | The reference UC for the new cluster. All UCs are numbered sequentially from 0 to n. Each BC consists of multiple (default 200) UCs. | A positive integer less than the number of added UCs | -ref=1 | round(random(0,1)×UC_Num) |
| -overlap | The overlap between the new cluster and the reference cluster. It is also the largest overlap between the new cluster and other clusters. | Number | -overlap=0 | [0.7×random(0,1), -1×random(0,1)] |
| -angle | The vector of the new cluster’s counterclockwise rotation angle, relative to the reference UC, in each dimension | Number vector. The first dimension does not need to be rotated and is denoted as 0 | -angle=0,30 | 360×random(0,1) |
| -label | Specify a label for the new cluster | Integer | -label=1 | Increment |
| -cross | Whether clusters with conflicting parameters are shown (not shown by default) | 0/1 | -cross=0 | 0 |
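The Default column gives formulas for values that are not set explicitly. As a rough illustration, the sketch below draws -sd, -ss and -angle from those formulas. It assumes that SD in the -ss formula is the standard deviation drawn for the same cluster and that the sample size is rounded to an integer; neither detail is confirmed here, and `default_node_params` is a hypothetical name.

```python
import random


def default_node_params(rng=None):
    """Draw -sd, -ss and -angle using the Default-column formulas for -t=node."""
    if rng is None:
        rng = random.Random()
    sd = 1 + 10 * rng.random()                 # 1+10×random(0,1)
    # 300×(1+SD×random(0,1)); rounding to an integer is an assumption.
    ss = round(300 * (1 + sd * rng.random()))
    angle = 360 * rng.random()                 # 360×random(0,1)
    return {"sd": sd, "ss": ss, "angle": angle}


params = default_node_params(random.Random(0))
print(params)
```

Under these assumptions the drawn values fall in the ranges 1 ≤ sd < 11, 300 ≤ ss ≤ 3600 and 0 ≤ angle < 360.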
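Parameters for -t=nodeFix

| Parameter | Description | Value | Example | Default |
|---|---|---|---|---|
| -ss | Same as -ss in -t=node | | | |
| -sd | Same as -sd in -t=node | | | |
| -label | Same as -label in -t=node | | | |
| -coordinate | The centre coordinates of the new cluster | Number vector | -coordinate=2,3 | Required |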
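To make the absolute-positioning model concrete, here is a minimal sketch of what a single nodeFix cluster could look like: `ss` points drawn from a normal distribution around the given centre. It assumes the same standard deviation in every dimension and no rotation, which the table does not state; `gaussian_cluster` is a hypothetical helper, not part of the tool.

```python
import random


def gaussian_cluster(coordinate, ss, sd, label, rng=None):
    """Return ss (point, label) pairs drawn from a normal centred on `coordinate`."""
    if rng is None:
        rng = random.Random()
    points = []
    for _ in range(ss):
        # Assumption: each dimension uses the same standard deviation.
        point = [rng.gauss(c, sd) for c in coordinate]
        points.append((point, label))
    return points


# Mirrors -t=nodeFix -coordinate=2,3 -ss=500 -sd=1 -label=1
cluster = gaussian_cluster([2, 3], ss=500, sd=1, label=1, rng=random.Random(42))
mean_x = sum(p[0][0] for p in cluster) / len(cluster)
mean_y = sum(p[0][1] for p in cluster) / len(cluster)
print(len(cluster), round(mean_x, 1), round(mean_y, 1))
```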
Parameters for -t=bezier

| Parameter | Description | Value | Example | Default |
|---|---|---|---|---|
| -bezierNum | The number of BCs to add to the current layer | Positive integer | -bezierNum=5 | 1 |
| -ss | Same as -ss in -t=node | | | 300×(1+SD×random(0,1)) |
| -rss | The ratio of the ending sample size to the starting sample size | Positive numbers represent increases and negative numbers represent decreases. See publication for details. | -rss=10 | [10×random(0,1), -10×random(0,1)] |
| -sd | Same as -sd in -t=node | | | 2.2-2.1×random(0,1) |
| -rsd | The ratio of the ending SD to the starting SD | Positive numbers represent increases and negative numbers represent decreases. See publication for details. | -rsd=-2 | [5×random(0,1), -5×random(0,1)] |
| -control | The control points of the Bezier curve | Number vector. The coordinate values of each control point are separated by commas | -control=2,3,12,13,16,-3 | 30×max(1,SD)×random(0,1) |
| -label | Same as -label in -t=node | | | |
| -offset | The translation of the new BC in each dimension | Number vector | -offset=2,3 | None |
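Since each BC consists of multiple (default 200) UCs, a -control vector can be read as the control points of a curve along which those UC centres are placed. The sketch below evaluates such a curve with De Casteljau's algorithm. It assumes the flat vector lists the control points one after another (d values per point) and that the centres are evenly spaced in the curve parameter t; whether the generator adds a start point or spaces centres differently is not stated here. All function names are hypothetical.

```python
def parse_control(control, d):
    """Split a flat -control vector into control points of dimension d."""
    values = [float(v) for v in control.split(",")]
    if len(values) % d != 0:
        raise ValueError("control vector length must be a multiple of d")
    return [values[i:i + d] for i in range(0, len(values), d)]


def bezier_point(points, t):
    """Evaluate the Bezier curve at t in [0, 1] with De Casteljau's algorithm."""
    pts = [list(p) for p in points]
    while len(pts) > 1:
        # Replace each adjacent pair of points by their linear interpolation at t.
        pts = [
            [(1 - t) * a + t * b for a, b in zip(p, q)]
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def bezier_centres(control, d=2, n=200):
    """Return n centres (n >= 2) evenly spaced in t along the curve."""
    points = parse_control(control, d)
    return [bezier_point(points, i / (n - 1)) for i in range(n)]


# Mirrors -d=2 -control=2,3,12,13,16,-3: three 2D control points.
centres = bezier_centres("2,3,12,13,16,-3", d=2, n=200)
print(centres[0], centres[-1])  # [2.0, 3.0] [16.0, -3.0]
```

The curve starts at the first control point and ends at the last; the middle point only pulls the curve towards it.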
Ⓒ 2020- FWgenetics.org