Distance Histograms
Data set  |  Euclidean  |  SNN distances
Artificial data series, at d=10, d=160, d=640
All-Relevant  |  All-Relevant  |  50 80 100 125 200 500
10-Relevant  |  10-Relevant  |  50 80 100 125 200 500
Cyc-Relevant  |  Cyc-Relevant  |  50 80 100 125 200 500
Half-Relevant  |  Half-Relevant  |  50 80 100 125 200 500
All-Dependent  |  All-Dependent  |  50 80 100 125 200 500
10-Dependent  |  10-Dependent  |  50 80 100 125 200 500
Real data, at native dimension of feature vector
ALOI  |  ALOI  |  5 8 10 12 15 20 50 100 150 200 500 1000
Multiple Features (All)  |  Multifeat-all  |  20 50 80 100 125 200 250 300 500 1000
Multiple Features (Pixel only)  |  Multifeat-pixel  |  20 50 80 100 125 200 250 300 500 1000
Optical Digits  |  optdigits.pdf  |  20 50 80 100 125 200 250 300 500 1000
Notes:

All results in this series were computed using either Euclidean distance or an SNN (shared nearest neighbor) distance derived from Euclidean distance.
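
As a rough illustration of how an SNN distance can be built on top of Euclidean distance, the sketch below ranks neighbors by Euclidean distance and then measures how many of their k nearest neighbors two points share. The form 1 - shared/k, the neighborhood size k = 20, the uniform random data and all function names are assumptions made for this example; the exact variant used for the plots is not stated here.

```python
import math
import random


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_sets(points, k):
    """For each point, the index set of its k nearest neighbors.

    Assumption: a point counts as its own neighbor (distance 0).
    """
    neighbors = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: euclidean(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def snn_distance(neighbors, i, j, k):
    """1 - (shared neighbors / k): 0 for identical neighborhoods, 1 for disjoint ones."""
    return 1.0 - len(neighbors[i] & neighbors[j]) / k


rng = random.Random(0)
# Hypothetical data: 200 points drawn uniformly from the 10-dimensional unit cube.
points = [[rng.random() for _ in range(10)] for _ in range(200)]
k = 20  # hypothetical neighborhood size
neighbors = knn_sets(points, k)
print(snn_distance(neighbors, 0, 1, k))
```

Because the result depends only on neighbor ranks, it is not a sum over axis components, in contrast to the squared Euclidean distance.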

For the artificial data sets, distances were scaled by 1/sqrt(d), since the diagonal of the unit cube under Euclidean distance grows as sqrt(d). This allows multiple dimensionalities to be compared in the same plot.
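
The effect of the 1/sqrt(d) scaling can be checked with a small sketch. It assumes points drawn uniformly and independently from the unit cube, which is only a stand-in for the artificial data sets; the function names and sample sizes are likewise chosen for illustration. After scaling, the cube diagonal has length 1 in every dimensionality, and the mean distance between random pairs stays in the same range (near sqrt(1/6), roughly 0.41, for uniform data).

```python
import math
import random


def scaled_distance(p, q):
    """Euclidean distance divided by sqrt(d), where d = len(p)."""
    d = len(p)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) / math.sqrt(d)


def mean_scaled_distance(d, pairs=500, seed=0):
    """Mean scaled distance between random pairs drawn uniformly from [0, 1]^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        p = [rng.random() for _ in range(d)]
        q = [rng.random() for _ in range(d)]
        total += scaled_distance(p, q)
    return total / pairs


for d in (10, 160, 640):
    # The diagonal of the unit cube maps to 1 after scaling, whatever d is.
    diagonal = scaled_distance([0.0] * d, [1.0] * d)
    print(d, diagonal, mean_scaled_distance(d))
```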

Under Euclidean distance, the distances are clearly approximately Gaussian distributed, even in the 10-dimensional data sets and in all the real-world data sets. One explanation is the Central Limit Theorem: the squared Euclidean distance is a sum of per-axis components. This argument does not apply to the SNN distances, since they are not based on a sum of axis components.
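
The Central Limit Theorem argument can be probed numerically. The sketch below, again assuming independent uniform attributes (an assumption, not the actual data sets), samples Euclidean distances between random pairs and reports the fraction of distances lying within one standard deviation of the mean. For an exactly Gaussian distribution this fraction is about 0.683, so values close to that are consistent with the claim without proving it.

```python
import math
import random
import statistics


def sample_distances(d, pairs=2000, seed=0):
    """Euclidean distances between random pairs drawn uniformly from [0, 1]^d."""
    rng = random.Random(seed)
    distances = []
    for _ in range(pairs):
        # The squared distance is a sum of d independent per-axis terms.
        squared = sum((rng.random() - rng.random()) ** 2 for _ in range(d))
        distances.append(math.sqrt(squared))
    return distances


def fraction_within_one_sigma(values):
    """Share of values within one standard deviation of their mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mu) <= sigma)
    return inside / len(values)


for d in (1, 10, 160):
    print(d, round(fraction_within_one_sigma(sample_distances(d)), 3))
```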

Even the correlated data sets yield approximately normally distributed distances. However, the normalization applied when plotting the graphs fails for these data sets, so the curves do not overlap.
