Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WBC (version#10)

This data set consists of examples of different cancer types, benign or malignant. Benign examples are considered inliers and malignant examples are considered outliers. After downsampling the outliers, following Schubert et al. [1], 10 outliers remain. 234 instances are duplicates (231 inliers and 3 outliers); therefore, 229 outliers were removed from the data set with duplicates and 226 outliers from the data set without duplicates. Furthermore, we removed 16 instances with missing values (2 outliers and 14 inliers). The processed data set with duplicates has 9 numeric attributes and 454 instances, namely 10 outliers (2.2%) and 444 inliers (97.8%); the variant without duplicates has 223 instances, 10 of them outliers (4.48%). The same pre-processing has also been applied in [2] and [3].
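
The counts in the description can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated on this page; the variable and function names are illustrative and not part of the original material.

```python
# Counts copied from the description; names are illustrative only.
INLIERS_WITH_DUPLICATES = 444
OUTLIERS_KEPT = 10                      # outliers remaining after downsampling
DUPLICATE_INLIERS = 231
DUPLICATE_OUTLIERS = 3
OUTLIERS_REMOVED_WITH_DUPLICATES = 229
OUTLIERS_REMOVED_WITHOUT_DUPLICATES = 226


def outlier_percentage(n_outliers, n_objects):
    """Share of outliers in percent."""
    return 100.0 * n_outliers / n_objects


# Size of the variant that keeps duplicates.
n_with = INLIERS_WITH_DUPLICATES + OUTLIERS_KEPT
# Size of the variant without duplicates: duplicate inliers are dropped,
# while the number of retained outliers stays at 10.
n_without = (INLIERS_WITH_DUPLICATES - DUPLICATE_INLIERS) + OUTLIERS_KEPT
# The two removal counts should differ by exactly the duplicate outliers.
removal_gap = OUTLIERS_REMOVED_WITH_DUPLICATES - OUTLIERS_REMOVED_WITHOUT_DUPLICATES

pct_with = f"{outlier_percentage(OUTLIERS_KEPT, n_with):.2f}"
pct_without = f"{outlier_percentage(OUTLIERS_KEPT, n_without):.2f}"
print(f"with duplicates:    {n_with} objects, {pct_with}% outliers")
print(f"without duplicates: {n_without} objects, {pct_without}% outliers")
```

The results agree with the object counts and outlier rates listed for the four variants on this page.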

References:

[1] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.
[2] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428-436, 2013.
[3] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.

Download all data set variants used (57.1 kB). You can also access the original data (breast-cancer-wisconsin.data).

Normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)

Download raw algorithm results (1.6 MB) Download raw algorithm evaluation table (34.0 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved more than once, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 12 0.90000 0.89531 0.95263 0.95041 0.94737 0.94490 0.99577
KNN 19 0.90000 0.89531 0.96250 0.96074 0.94737 0.94490 0.99718
KNNW 4 0.90000 0.89531 0.91735 0.91347 0.90000 0.89531 0.99155
KNNW 14 0.90000 0.89531 0.94545 0.94289 0.94737 0.94490 0.99437
KNNW 47 0.90000 0.89531 0.96250 0.96074 0.94737 0.94490 0.99718
LOF 60 0.90000 0.89531 0.88306 0.87757 0.90000 0.89531 0.99249
LOF 77 0.90000 0.89531 0.95000 0.94765 0.94737 0.94490 0.99531
LOF 98 0.90000 0.89531 0.95882 0.95689 0.94737 0.94490 0.99671
SimplifiedLOF 87 0.90000 0.89531 0.89972 0.89501 0.90000 0.89531 0.99296
SimplifiedLOF 99 0.90000 0.89531 0.91639 0.91246 0.90000 0.89531 0.99390
LoOP 89 0.40000 0.37183 0.60740 0.58896 0.72000 0.70685 0.97840
LoOP 94 0.50000 0.47653 0.62478 0.60716 0.72000 0.70685 0.97887
LoOP 99 0.50000 0.47653 0.64583 0.62920 0.72000 0.70685 0.98028
LDOF 86 0.40000 0.37183 0.38706 0.35828 0.56250 0.54196 0.95728
LDOF 96 0.40000 0.37183 0.42293 0.39584 0.64286 0.62609 0.96338
LDOF 99 0.40000 0.37183 0.43315 0.40654 0.64286 0.62609 0.96526
ODIN 75 0.50000 0.47653 0.38916 0.36049 0.58065 0.56096 0.95962
ODIN 79 0.42500 0.39800 0.38334 0.35439 0.62069 0.60288 0.96103
ODIN 100 0.50000 0.47653 0.43345 0.40685 0.62069 0.60288 0.96737
FastABOD 13 0.90000 0.89531 0.93348 0.93036 0.90000 0.89531 0.99343
FastABOD 38 0.90000 0.89531 0.94545 0.94289 0.94737 0.94490 0.99437
FastABOD 49 0.90000 0.89531 0.94762 0.94516 0.94737 0.94490 0.99484
KDEOS 5 0.10000 0.05775 0.11524 0.07370 0.25641 0.22150 0.64789
KDEOS 7 0.20000 0.16244 0.11096 0.06922 0.22222 0.18571 0.61690
KDEOS 8 0.20000 0.16244 0.14384 0.10365 0.25000 0.21479 0.62770
KDEOS 13 0.20000 0.16244 0.11709 0.07564 0.27273 0.23858 0.55587
LDF 35 0.90000 0.89531 0.80256 0.79329 0.90000 0.89531 0.99061
LDF 49 0.90000 0.89531 0.95556 0.95347 0.94737 0.94490 0.99624
LDF 71 0.90000 0.89531 0.96250 0.96074 0.94737 0.94490 0.99718
INFLO 99 0.90000 0.89531 0.91639 0.91246 0.90000 0.89531 0.99390
COF 74 0.90000 0.89531 0.82853 0.82048 0.90000 0.89531 0.99437
COF 75 0.70000 0.68592 0.86667 0.86041 0.77778 0.76734 0.99202
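
The adjusted columns in the table above are consistent with a standard adjustment for chance, (m - e) / (1 - e), where m is the raw measure and e = |O| / N is its expected value for a random ranking of N objects with |O| outliers. The following is a minimal sketch under that assumption; the function name is hypothetical and the inputs are values from the table.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that a random ranking scores about 0 and a
    perfect ranking scores 1. Assumes the expected value of the raw
    measure under a random ranking is n_outliers / n_objects."""
    expected = n_outliers / n_objects
    return (value - expected) / (1.0 - expected)


# This variant has 223 objects and 10 outliers.
adj_p_at_n = adjust_for_chance(0.90000, 10, 223)   # P@n    -> Adj. P@n
adj_ap = adjust_for_chance(0.95263, 10, 223)       # AP     -> Adj. AP
adj_max_f1 = adjust_for_chance(0.94737, 10, 223)   # Max-F1 -> Adj. MF1

print(f"{adj_p_at_n:.5f} {adj_ap:.5f} {adj_max_f1:.5f}")
```

With these inputs the sketch reproduces the adjusted values in the first KNN row (0.89531, 0.95041, 0.94490). ROC AUC has no adjusted column in the table.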

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Normalized, with duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB) Download raw algorithm evaluation table (39.4 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved more than once, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 3 0.80000 0.79550 0.93750 0.93609 0.88889 0.88639 0.99809
KNN 15 0.90000 0.89775 0.94000 0.93865 0.90000 0.89775 0.99752
KNNW 4 0.80000 0.79550 0.90444 0.90229 0.84211 0.83855 0.99662
KNNW 5 0.80000 0.79550 0.91556 0.91365 0.88889 0.88639 0.99685
KNNW 38 0.80000 0.79550 0.92186 0.92010 0.88889 0.88639 0.99707
LOF 92 0.70000 0.69324 0.65213 0.64430 0.70000 0.69324 0.99077
LOF 94 0.70000 0.69324 0.74238 0.73658 0.77778 0.77277 0.99347
SimplifiedLOF 84 0.40000 0.38649 0.40181 0.38833 0.69231 0.68538 0.98221
SimplifiedLOF 94 0.50000 0.48874 0.44486 0.43236 0.69231 0.68538 0.98626
SimplifiedLOF 98 0.50000 0.48874 0.44610 0.43362 0.66667 0.65916 0.98604
LoOP 1 0.20000 0.18198 0.11891 0.09907 0.28571 0.26963 0.79381
LoOP 96 0.20000 0.18198 0.34641 0.33169 0.64286 0.63481 0.97703
LoOP 99 0.20000 0.18198 0.34933 0.33467 0.64286 0.63481 0.97725
LDOF 85 0.20000 0.18198 0.21972 0.20214 0.42857 0.41570 0.95541
LDOF 98 0.20000 0.18198 0.26967 0.25322 0.46667 0.45465 0.96554
LDOF 99 0.20000 0.18198 0.27187 0.25547 0.46667 0.45465 0.96599
ODIN 94 0.43333 0.42057 0.39718 0.38360 0.55172 0.54163 0.98119
ODIN 97 0.45000 0.43761 0.40829 0.39496 0.55172 0.54163 0.98164
ODIN 98 0.45000 0.43761 0.40968 0.39638 0.55172 0.54163 0.98176
ODIN 99 0.40000 0.38649 0.41566 0.40250 0.55172 0.54163 0.98153
FastABOD 28 0.90000 0.89775 0.95000 0.94887 0.94737 0.94618 0.99775
FastABOD 36 0.90000 0.89775 0.95556 0.95455 0.94737 0.94618 0.99820
KDEOS 5 0.20000 0.18198 0.12685 0.10718 0.24000 0.22288 0.88694
KDEOS 6 0.10000 0.07973 0.13702 0.11758 0.31579 0.30038 0.89977
KDEOS 9 0.20000 0.18198 0.18133 0.16289 0.38889 0.37513 0.87050
LDF 87 0.80000 0.79550 0.71352 0.70707 0.84211 0.83855 0.99347
LDF 89 0.80000 0.79550 0.76683 0.76158 0.84211 0.83855 0.99392
LDF 100 0.80000 0.79550 0.82169 0.81767 0.84211 0.83855 0.99392
INFLO 85 0.50000 0.48874 0.43681 0.42413 0.66667 0.65916 0.98468
INFLO 94 0.50000 0.48874 0.50772 0.49663 0.69231 0.68538 0.98829
INFLO 99 0.50000 0.48874 0.50930 0.49825 0.66667 0.65916 0.98806
COF 92 0.40000 0.38649 0.36670 0.35244 0.51429 0.50335 0.97545
COF 100 0.40000 0.38649 0.35702 0.34254 0.56000 0.55009 0.97117
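
To illustrate how the P@n and Max-F1 columns relate to a ranking, the sketch below evaluates a toy ranking of 454 objects with 10 outliers. The ranking is invented for illustration (it is not taken from the raw results), the function names are hypothetical, and ties in the outlier scores are ignored, which the actual evaluation may handle differently.

```python
def precision_at_n(labels_ranked, n=None):
    """Fraction of outliers among the top n objects. labels_ranked holds
    1 (outlier) or 0 (inlier), ordered by decreasing outlier score.
    By default n is the number of outliers in the data."""
    if n is None:
        n = sum(labels_ranked)
    return sum(labels_ranked[:n]) / n


def max_f1(labels_ranked):
    """Largest F1 score over all cut-off positions in the ranking."""
    n_outliers = sum(labels_ranked)
    best, true_positives = 0.0, 0
    for cutoff, label in enumerate(labels_ranked, start=1):
        true_positives += label
        if true_positives == 0:
            continue
        precision = true_positives / cutoff
        recall = true_positives / n_outliers
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy ranking: 9 outliers first, then 5 inliers, the last outlier,
# and the remaining 439 inliers (454 objects in total).
toy_ranking = [1] * 9 + [0] * 5 + [1] + [0] * 439

print(f"P@n    = {precision_at_n(toy_ranking):.5f}")
print(f"Max-F1 = {max_f1(toy_ranking):.5f}")
```

A ranking of this shape yields P@n = 0.90000 and Max-F1 = 0.94737 (precision 1 and recall 0.9 at cut-off 9), the same pair of values that appears in the FastABOD rows above.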

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)

Download raw algorithm results (1.6 MB) Download raw algorithm evaluation table (33.6 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved more than once, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 2 0.90000 0.89531 0.90987 0.90564 0.90000 0.89531 0.99272
KNN 13 0.90000 0.89531 0.96250 0.96074 0.94737 0.94490 0.99718
KNN 66 0.90000 0.89531 0.96667 0.96510 0.94737 0.94490 0.99765
KNNW 4 0.90000 0.89531 0.90806 0.90374 0.90000 0.89531 0.99202
KNNW 16 0.90000 0.89531 0.95263 0.95041 0.94737 0.94490 0.99577
KNNW 37 0.90000 0.89531 0.96250 0.96074 0.94737 0.94490 0.99718
LOF 59 0.90000 0.89531 0.91401 0.90997 0.90000 0.89531 0.99343
LOF 61 0.90000 0.89531 0.94762 0.94516 0.94737 0.94490 0.99484
LOF 84 0.90000 0.89531 0.95882 0.95689 0.94737 0.94490 0.99671
SimplifiedLOF 84 0.90000 0.89531 0.93545 0.93242 0.90000 0.89531 0.99390
SimplifiedLOF 87 0.90000 0.89531 0.94545 0.94289 0.94737 0.94490 0.99437
SimplifiedLOF 98 0.90000 0.89531 0.95263 0.95041 0.94737 0.94490 0.99577
LoOP 68 0.50000 0.47653 0.49667 0.47304 0.66667 0.65102 0.96667
LoOP 91 0.50000 0.47653 0.71416 0.70074 0.75000 0.73826 0.98216
LoOP 96 0.50000 0.47653 0.73416 0.72168 0.75000 0.73826 0.98263
LDOF 75 0.40000 0.37183 0.32410 0.29237 0.50000 0.47653 0.94225
LDOF 96 0.40000 0.37183 0.49928 0.47577 0.66667 0.65102 0.96526
LDOF 98 0.40000 0.37183 0.51020 0.48720 0.66667 0.65102 0.96901
ODIN 80 0.35000 0.31948 0.38578 0.35694 0.62069 0.60288 0.95869
ODIN 100 0.50000 0.47653 0.41811 0.39079 0.62069 0.60288 0.96620
FastABOD 7 0.90000 0.89531 0.90987 0.90564 0.90000 0.89531 0.99249
FastABOD 40 0.90000 0.89531 0.94762 0.94516 0.94737 0.94490 0.99484
KDEOS 5 0.10000 0.05775 0.12288 0.08170 0.30769 0.27519 0.64413
KDEOS 6 0.20000 0.16244 0.11145 0.06974 0.22642 0.19010 0.64319
KDEOS 8 0.20000 0.16244 0.20335 0.16595 0.30769 0.27519 0.64131
LDF 30 0.90000 0.89531 0.89558 0.89068 0.90000 0.89531 0.99202
LDF 35 0.90000 0.89531 0.94762 0.94516 0.94737 0.94490 0.99484
LDF 77 0.90000 0.89531 0.96667 0.96510 0.94737 0.94490 0.99765
INFLO 97 0.90000 0.89531 0.92651 0.92306 0.90000 0.89531 0.99390
INFLO 98 0.90000 0.89531 0.94556 0.94300 0.90000 0.89531 0.99577
INFLO 100 0.90000 0.89531 0.95000 0.94765 0.94737 0.94490 0.99531
COF 74 0.80000 0.79061 0.76014 0.74888 0.86957 0.86344 0.99296
COF 75 0.70000 0.68592 0.82347 0.81518 0.72727 0.71447 0.98873
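
The AP and ROC AUC columns can be illustrated in the same way. The sketch below computes both from a toy ranking of 223 objects with 10 outliers; the ranking is invented for illustration (the actual rankings are only available in the raw results download), the function names are hypothetical, and ties are ignored.

```python
def average_precision(labels_ranked):
    """Mean of the precision values measured at the rank of each outlier.
    labels_ranked holds 1 (outlier) or 0 (inlier), best score first."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels_ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits


def roc_auc(labels_ranked):
    """Fraction of (outlier, inlier) pairs in which the outlier is ranked
    ahead of the inlier (no ties assumed)."""
    n_outliers = sum(labels_ranked)
    n_inliers = len(labels_ranked) - n_outliers
    inliers_seen, misordered = 0, 0
    for label in labels_ranked:
        if label:
            misordered += inliers_seen   # inliers ranked above this outlier
        else:
            inliers_seen += 1
    return 1.0 - misordered / (n_outliers * n_inliers)


# Toy ranking: 9 outliers, then 6 inliers, the last outlier at rank 16,
# and the remaining 207 inliers (223 objects in total).
toy_ranking = [1] * 9 + [0] * 6 + [1] + [0] * 207

print(f"AP      = {average_precision(toy_ranking):.5f}")
print(f"ROC AUC = {roc_auc(toy_ranking):.5f}")
```

A ranking of this shape gives AP = 0.96250 and ROC AUC = 0.99718, the combination reported in several rows above (for example KNN with k = 13); whether the actual rankings look like this cannot be told from the table alone.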

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, with duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB) Download raw algorithm evaluation table (39.0 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved more than once, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 3 0.80000 0.79550 0.95325 0.95219 0.88889 0.88639 0.99887
KNN 15 0.90000 0.89775 0.94000 0.93865 0.90000 0.89775 0.99752
KNNW 4 0.80000 0.79550 0.90069 0.89846 0.84211 0.83855 0.99640
KNNW 5 0.80000 0.79550 0.91181 0.90982 0.88889 0.88639 0.99662
KNNW 38 0.80000 0.79550 0.92186 0.92010 0.88889 0.88639 0.99707
LOF 91 0.70000 0.69324 0.78611 0.78129 0.73684 0.73092 0.99302
LOF 94 0.70000 0.69324 0.87010 0.86717 0.82353 0.81955 0.99482
LOF 95 0.70000 0.69324 0.87273 0.86986 0.82353 0.81955 0.99505
SimplifiedLOF 83 0.50000 0.48874 0.40512 0.39173 0.66667 0.65916 0.98108
SimplifiedLOF 89 0.50000 0.48874 0.43831 0.42566 0.72000 0.71369 0.98468
SimplifiedLOF 96 0.50000 0.48874 0.46669 0.45468 0.69231 0.68538 0.98716
LoOP 90 0.20000 0.18198 0.32304 0.30779 0.62069 0.61215 0.97342
LoOP 98 0.30000 0.28423 0.35079 0.33617 0.62069 0.61215 0.97680
LoOP 100 0.30000 0.28423 0.35382 0.33927 0.62069 0.61215 0.97703
LDOF 86 0.20000 0.18198 0.20560 0.18771 0.40909 0.39578 0.95045
LDOF 99 0.20000 0.18198 0.26328 0.24669 0.45714 0.44492 0.96419
LDOF 100 0.20000 0.18198 0.25651 0.23976 0.45714 0.44492 0.96509
ODIN 87 0.40000 0.38649 0.35557 0.34106 0.53333 0.52282 0.97770
ODIN 94 0.36667 0.35240 0.39159 0.37789 0.53846 0.52807 0.97939
ODIN 95 0.35000 0.33536 0.36538 0.35109 0.55172 0.54163 0.97815
FastABOD 24 0.80000 0.79550 0.64226 0.63420 0.80000 0.79550 0.99392
FastABOD 28 0.80000 0.79550 0.94432 0.94306 0.88889 0.88639 0.99820
FastABOD 32 0.80000 0.79550 0.94848 0.94732 0.88889 0.88639 0.99842
KDEOS 2 0.00000 -0.02252 0.07311 0.05223 0.17582 0.15726 0.80327
LDF 80 0.80000 0.79550 0.70463 0.69798 0.80000 0.79550 0.99324
LDF 88 0.80000 0.79550 0.90167 0.89945 0.88889 0.88639 0.99550
LDF 100 0.80000 0.79550 0.90275 0.90056 0.88889 0.88639 0.99527
INFLO 84 0.50000 0.48874 0.44344 0.43091 0.69565 0.68880 0.98491
INFLO 98 0.60000 0.59099 0.60412 0.59520 0.66667 0.65916 0.98941
INFLO 100 0.60000 0.59099 0.64309 0.63505 0.66667 0.65916 0.99009
COF 92 0.40000 0.38649 0.47570 0.46389 0.53333 0.52282 0.97860
COF 100 0.50000 0.48874 0.42166 0.40863 0.60870 0.59988 0.96892
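
The selection rule stated before the table (best score per measure, smallest k on ties) can be sketched as follows. The example parses the three KNNW rows from the table above; the helper names are hypothetical, and the full per-k evaluation is only available in the raw evaluation table download.

```python
COLUMNS = ["P@n", "Adj. P@n", "AP", "Adj. AP", "Max-F1", "Adj. MF1", "ROC AUC"]


def parse_row(line):
    """Split one whitespace-separated table row into
    (algorithm, k, {measure: value})."""
    parts = line.split()
    algorithm, k = parts[0], int(parts[1])
    values = [float(x) for x in parts[2:]]
    return algorithm, k, dict(zip(COLUMNS, values))


def best_k(rows, measure):
    """k with the highest value of the measure; ties go to the smallest k."""
    return min(rows, key=lambda row: (-row[2][measure], row[1]))[1]


rows = [parse_row(line) for line in [
    "KNNW 4 0.80000 0.79550 0.90069 0.89846 0.84211 0.83855 0.99640",
    "KNNW 5 0.80000 0.79550 0.91181 0.90982 0.88889 0.88639 0.99662",
    "KNNW 38 0.80000 0.79550 0.92186 0.92010 0.88889 0.88639 0.99707",
]]

for measure in ["P@n", "AP", "Max-F1", "ROC AUC"]:
    print(measure, best_k(rows, measure))
```

Among these three rows, P@n is tied at 0.80000, so k = 4 is selected; Max-F1 is tied between k = 5 and k = 38, so k = 5 is selected; AP and ROC AUC are both best at k = 38.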

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO