Data set | Dimensionality |
Euclidean, Manhattan and Arccosine distance | |
All-Relevant | 10d 20d 40d 80d 160d 320d 640d |
10-Relevant | 10d 20d 40d 80d 160d 320d 640d |
Cyc-Relevant | 10d 20d 40d 80d 160d 320d 640d |
Half-Relevant | 10d 20d 40d 80d 160d 320d 640d |
All-Dependent | 10d 20d 40d 80d 160d 320d 640d |
10-Dependent | 10d 20d 40d 80d 160d 320d 640d |
SNN with s=200 based on the given distance | |
All-Relevant | 10d 20d 40d 80d 160d 320d 640d |
10-Relevant | 10d 20d 40d 80d 160d 320d 640d |
Cyc-Relevant | 10d 20d 40d 80d 160d 320d 640d |
Half-Relevant | 10d 20d 40d 80d 160d 320d 640d |
All-Dependent | 10d 20d 40d 80d 160d 320d 640d |
10-Dependent | 10d 20d 40d 80d 160d 320d 640d |
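As a reading aid for the SNN rows above, here is a minimal sketch of a shared-nearest-neighbor similarity built on top of a base distance. It assumes one common formulation: the similarity of two points is the number of points that appear in both of their s-nearest-neighbor lists, divided by s. The function names, the choice to count a point as its own neighbor, and the `1 - similarity` dissimilarity are illustrative assumptions, not necessarily the exact definitions used to produce the plots.

```python
import math


def euclidean(a, b):
    """Base (primary) distance; any other distance function can be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_sets(points, s, dist=euclidean):
    """Return, for each point index, the set of indices of its s nearest neighbors.

    Assumption: a point counts as its own neighbor (distance 0).
    Some formulations exclude it.
    """
    neighbors = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: dist(p, points[j]))
        neighbors.append(set(order[:s]))
    return neighbors


def snn_similarity(neighbors, i, j, s):
    """Fraction of the s nearest neighbors shared by points i and j, in [0, 1]."""
    return len(neighbors[i] & neighbors[j]) / s


def snn_distance(neighbors, i, j, s):
    """A simple dissimilarity derived from the overlap (one of several options)."""
    return 1.0 - snn_similarity(neighbors, i, j, s)


# Two tiny, well-separated groups of three points each (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
nn = knn_sets(points, s=3)
same_group = snn_similarity(nn, 0, 1, 3)    # both points see the same 3 neighbors
other_group = snn_similarity(nn, 0, 3, 3)   # the neighbor lists do not overlap
```

This brute-force version computes all pairwise distances and is only meant for small examples; with s=200 on larger data sets, an index structure or precomputed neighbor lists would be the more practical choice.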
Each of the following plots shows three dimensionalities at the same time, for one data set and one distance function.
All-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
10-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
Cyc-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
Half-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
All-Dependent | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
10-Dependent | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
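For reference, the five column labels above can be read as the following distance functions. This is a minimal sketch that assumes "L0.6" and "L0.8" denote fractional Minkowski (L_p) distances with p = 0.6 and p = 0.8, and that "Arccosine" is the angle between two vectors; the function names are illustrative.

```python
import math


def minkowski(a, b, p):
    """L_p distance between two equal-length vectors.

    p = 1 gives Manhattan and p = 2 gives Euclidean. For 0 < p < 1
    (for example 0.6 or 0.8) the result is a fractional distance,
    which no longer satisfies the triangle inequality.
    """
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def arccosine(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("arccosine distance is undefined for zero vectors")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


u, v = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(u, v, 1)     # |3| + |4|
euclid = minkowski(u, v, 2)        # sqrt(9 + 16)
frac_06 = minkowski(u, v, 0.6)     # larger than the Manhattan value
right_angle = arccosine((1.0, 0.0), (0.0, 1.0))
```

Note that for a fixed pair of points the L_p value grows as p decreases, so values from different columns are not directly comparable in absolute terms.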
Each of the following plots shows several SNN s values at the same time, for one data set and one base distance function at 160 dimensions.
All-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
10-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
Cyc-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
Half-Relevant | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
All-Dependent | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
10-Dependent | Manhattan | Euclidean | L0.6 | L0.8 | Arccosine |
Centrality is based on the distribution density (at data generation) and is part of the ground truth. Central points are typical of their cluster (high density). Points are ordered by their density; no absolute density values are used.
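The rank-based notion of centrality described above could be sketched as follows. The helper name `centrality_ranks` and the input (one generation-time density value per point) are hypothetical; the sketch only illustrates that the ordering, not the absolute density, is what matters.

```python
def centrality_ranks(densities):
    """Return the centrality rank of each point; rank 0 is the most central.

    Only the ordering of the density values is used, so any strictly
    increasing rescaling of the densities yields the same ranks.
    """
    order = sorted(range(len(densities)), key=lambda i: -densities[i])
    ranks = [0] * len(densities)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


# Made-up density values for three points of one cluster.
ranks = centrality_ranks([0.2, 0.9, 0.5])
rescaled = centrality_ranks([20.0, 90.0, 50.0])
```

Here the second point has the highest density and therefore gets rank 0, and rescaling all densities by a constant factor leaves the ranks unchanged.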
Many plots, especially at high dimensionality, are not very interesting.
The Cyc-Relevant plots are much less stable because that data set is smaller than the others (1:10).