Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WBC (version#06)

This dataset consists of examples of breast tumors, each either benign or malignant. Benign examples are considered inliers, malignant examples are considered outliers. Following Schubert et al. [1], the outliers were downsampled so that 10 outliers remain. 234 instances are duplicates (231 inliers and 3 outliers); therefore, 229 outliers were removed from the data set with duplicates and 226 outliers from the data set without duplicates. Furthermore, we removed 16 instances with missing values (14 inliers and 2 outliers). The processed data set with duplicates has 9 numeric attributes and 454 instances, namely 10 outliers (2.2%) and 444 inliers (97.8%). The same pre-processing has also been applied in [2] and [3].
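The counts in this description can be cross-checked with a little arithmetic. The sketch below works backwards from the figures stated in the text; the variable names are illustrative, and it assumes that duplicates were counted among the instances remaining after those with missing values had been dropped, an assumption chosen because it reproduces the object counts listed for the variants further down.

```python
# Figures taken from the dataset description.
inliers_final = 444
outliers_final = 10
removed_outliers_with_dups = 229     # downsampling, variant with duplicates
removed_outliers_without_dups = 226  # downsampling, variant without duplicates
dup_inliers, dup_outliers = 231, 3
missing_inliers, missing_outliers = 14, 2

# Variant with duplicates.
outliers_before_downsampling = outliers_final + removed_outliers_with_dups
n_with_dups = inliers_final + outliers_final

# Variant without duplicates (assumption: duplicates are counted among the
# instances that remain after dropping those with missing values).
inliers_no_dups = inliers_final - dup_inliers
outliers_no_dups = (outliers_before_downsampling - dup_outliers
                    - removed_outliers_without_dups)
n_without_dups = inliers_no_dups + outliers_no_dups

# Size implied for the data before any pre-processing.
n_raw = (inliers_final + missing_inliers) + (
    outliers_before_downsampling + missing_outliers)

print("with duplicates:   ", n_with_dups, "objects,",
      round(100 * outliers_final / n_with_dups, 2), "% outliers")
print("without duplicates:", n_without_dups, "objects,",
      round(100 * outliers_no_dups / n_without_dups, 2), "% outliers")
print("implied raw size:  ", n_raw)
```

Under these assumptions the two variants come out at 454 and 223 objects with 10 outliers each, matching the per-variant summaries.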

References:

[1] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.
[2] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428-436, 2013.
[3] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.

Download all data set variants used (57.1 kB). The original data (breast-cancer-wisconsin.data) is also available.

Normalized, without duplicates

This version contains 9 attributes and 223 objects, 10 of which are outliers (4.48%).

Downloads: raw algorithm results (1.6 MB); raw algorithm evaluation table (33.0 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The Maximum F1-Measure is provided as a complement to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 1 0.80000 0.79061 0.88634 0.88100 0.88889 0.88367 0.98873
KNN 24 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99343
KNN 95 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99366
KNNW 1 0.80000 0.79061 0.88500 0.87960 0.88889 0.88367 0.98873
KNNW 68 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99343
LOF 52 0.80000 0.79061 0.81414 0.80541 0.80000 0.79061 0.98357
LOF 56 0.80000 0.79061 0.87913 0.87346 0.88889 0.88367 0.98638
LOF 99 0.80000 0.79061 0.90000 0.89531 0.88889 0.88367 0.99108
SimplifiedLOF 71 0.80000 0.79061 0.80167 0.79236 0.80000 0.79061 0.98216
SimplifiedLOF 76 0.80000 0.79061 0.87617 0.87035 0.88889 0.88367 0.98545
SimplifiedLOF 99 0.80000 0.79061 0.88634 0.88100 0.88889 0.88367 0.98826
LoOP 87 0.50000 0.47653 0.67033 0.65486 0.69565 0.68136 0.96948
LoOP 96 0.60000 0.58122 0.72535 0.71246 0.69565 0.68136 0.97559
LoOP 100 0.60000 0.58122 0.74244 0.73034 0.69565 0.68136 0.97746
LDOF 57 0.40000 0.37183 0.31637 0.28427 0.44444 0.41836 0.92441
LDOF 90 0.40000 0.37183 0.53208 0.51011 0.61538 0.59733 0.95211
LDOF 99 0.40000 0.37183 0.60934 0.59100 0.61538 0.59733 0.95728
ODIN 98 0.45000 0.42418 0.40454 0.37659 0.57143 0.55131 0.96009
ODIN 100 0.48000 0.45559 0.40454 0.37659 0.57143 0.55131 0.96080
FastABOD 5 0.80000 0.79061 0.60992 0.59161 0.80000 0.79061 0.98122
FastABOD 7 0.70000 0.68592 0.87246 0.86647 0.82353 0.81524 0.98920
FastABOD 16 0.80000 0.79061 0.88831 0.88307 0.88889 0.88367 0.98873
FastABOD 77 0.80000 0.79061 0.89085 0.88572 0.88889 0.88367 0.98920
KDEOS 6 0.20000 0.16244 0.10862 0.06677 0.22222 0.18571 0.62770
KDEOS 16 0.00000 -0.04695 0.06856 0.02483 0.16092 0.12153 0.64554
LDF 31 0.80000 0.79061 0.88667 0.88135 0.88889 0.88367 0.98826
LDF 99 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99343
INFLO 71 0.80000 0.79061 0.80241 0.79314 0.80000 0.79061 0.97934
INFLO 85 0.80000 0.79061 0.87246 0.86648 0.88889 0.88367 0.98404
INFLO 99 0.80000 0.79061 0.88091 0.87532 0.88889 0.88367 0.98685
COF 74 0.90000 0.89531 0.92521 0.92170 0.90000 0.89531 0.99531
COF 83 0.80000 0.79061 0.95874 0.95680 0.88889 0.88367 0.99765
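The adjusted columns in the table are consistent with the usual adjustment for chance, in which the expected value of a random ranking (the outlier rate) is subtracted and the result is rescaled so that a perfect score stays at 1. The following is a minimal sketch under that assumption; the function name is illustrative and the code is not taken from the original evaluation.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that a random ranking scores 0 and a perfect one 1.

    Assumes the expected value of the measure under a random ranking equals
    the outlier rate n_outliers / n_objects.
    """
    expected = n_outliers / n_objects
    return (value - expected) / (1.0 - expected)


# Rows from the table for this variant (223 objects, 10 outliers).
print(round(adjust_for_chance(0.80000, 10, 223), 5))  # KNN, k=1, P@n
print(round(adjust_for_chance(0.88634, 10, 223), 5))  # KNN, k=1, AP
print(round(adjust_for_chance(0.00000, 10, 223), 5))  # KDEOS, k=16, P@n
```

A measure of 0 maps to a negative adjusted value, which is why the KDEOS row with P@n = 0 shows a negative Adj. P@n. ROC AUC is listed without an adjusted counterpart.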

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Normalized, with duplicates

This version contains 9 attributes and 454 objects, 10 of which are outliers (2.20%).

Downloads: raw algorithm results (2.1 MB); raw algorithm evaluation table (40.6 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The Maximum F1-Measure is provided as a complement to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 1 0.80000 0.79550 0.88000 0.87730 0.82353 0.81955 0.99505
KNN 3 0.80000 0.79550 0.90530 0.90316 0.85714 0.85393 0.99617
KNNW 1 0.80000 0.79550 0.88056 0.87787 0.82353 0.81955 0.99538
KNNW 2 0.80000 0.79550 0.90000 0.89775 0.88889 0.88639 0.99572
KNNW 4 0.80000 0.79550 0.90545 0.90333 0.88889 0.88639 0.99595
KNNW 21 0.70000 0.69324 0.87895 0.87623 0.78261 0.77771 0.99617
LOF 66 0.50000 0.48874 0.38750 0.37371 0.57143 0.56178 0.98153
LOF 73 0.50000 0.48874 0.43159 0.41879 0.66667 0.65916 0.98491
LOF 92 0.50000 0.48874 0.61784 0.60924 0.66667 0.65916 0.99032
LOF 96 0.50000 0.48874 0.66136 0.65373 0.64286 0.63481 0.99009
SimplifiedLOF 92 0.50000 0.48874 0.40005 0.38653 0.61538 0.60672 0.98198
SimplifiedLOF 96 0.50000 0.48874 0.41123 0.39797 0.64000 0.63189 0.98311
LoOP 100 0.20000 0.18198 0.26721 0.25071 0.48485 0.47325 0.96734
LDOF 92 0.10000 0.07973 0.17550 0.15693 0.36842 0.35420 0.94347
LDOF 100 0.10000 0.07973 0.20175 0.18377 0.35294 0.33837 0.95225
ODIN 85 0.20000 0.18198 0.21763 0.20001 0.38889 0.37513 0.95968
ODIN 91 0.20000 0.18198 0.23636 0.21916 0.41026 0.39697 0.96363
ODIN 92 0.20000 0.18198 0.23914 0.22200 0.41026 0.39697 0.96385
ODIN 100 0.20000 0.18198 0.23866 0.22151 0.40816 0.39483 0.96543
FastABOD 28 0.70000 0.69324 0.88161 0.87894 0.82353 0.81955 0.99572
FastABOD 71 0.80000 0.79550 0.89263 0.89021 0.82353 0.81955 0.99617
KDEOS 2 0.10000 0.07973 0.11182 0.09181 0.21053 0.19275 0.90338
KDEOS 7 0.10000 0.07973 0.12836 0.10873 0.29630 0.28045 0.81532
KDEOS 8 0.20000 0.18198 0.09622 0.07586 0.20000 0.18198 0.80315
LDF 68 0.70000 0.69324 0.63999 0.63188 0.70000 0.69324 0.99122
LDF 92 0.70000 0.69324 0.66986 0.66242 0.77778 0.77277 0.99144
LDF 96 0.70000 0.69324 0.81153 0.80728 0.77778 0.77277 0.99302
INFLO 92 0.40000 0.38649 0.39392 0.38027 0.61538 0.60672 0.98131
INFLO 99 0.50000 0.48874 0.45602 0.44376 0.60870 0.59988 0.98356
INFLO 100 0.50000 0.48874 0.45796 0.44576 0.60870 0.59988 0.98378
COF 82 0.40000 0.38649 0.29653 0.28068 0.42857 0.41570 0.96847
COF 91 0.40000 0.38649 0.35095 0.33633 0.56250 0.55265 0.97815
COF 98 0.40000 0.38649 0.44555 0.43306 0.64286 0.63481 0.97703
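The measures reported in these tables can all be computed from a ranking of the objects by outlier score together with the ground-truth labels. The following is a minimal sketch using the standard definitions, with n taken to be the number of outliers for P@n (an assumption here); it is not the code used to produce the tables. It ignores ties in the scores. Some ODIN entries, such as P@n = 0.45000 in the first table, are not multiples of 0.1 even though there are 10 outliers, which suggests that the original evaluation treats ties differently.

```python
def evaluate_ranking(labels_by_rank):
    """Compute P@n, AP, Max-F1 and ROC AUC for one ranking.

    labels_by_rank: 1 for an outlier, 0 for an inlier, ordered from the
    most outlying object to the least. Assumes no tied scores and at least
    one outlier and one inlier.
    """
    n_out = sum(labels_by_rank)
    n_in = len(labels_by_rank) - n_out
    hits = 0            # outliers seen so far
    ap_sum = 0.0        # sum of precision values at the rank of each outlier
    max_f1 = 0.0
    correct_pairs = 0   # (outlier, inlier) pairs ranked in the right order
    p_at_n = 0.0
    for rank, label in enumerate(labels_by_rank, start=1):
        hits += label
        precision = hits / rank
        recall = hits / n_out
        if label:
            ap_sum += precision
            # Inliers ranked below this outlier.
            correct_pairs += n_in - (rank - hits)
        if hits:
            f1 = 2 * precision * recall / (precision + recall)
            max_f1 = max(max_f1, f1)
        if rank == n_out:
            p_at_n = precision
    return {
        "P@n": p_at_n,
        "AP": ap_sum / n_out,
        "Max-F1": max_f1,
        "ROC AUC": correct_pairs / (n_out * n_in),
    }


# Toy ranking with two outliers among five objects (not data from the tables).
print(evaluate_ranking([1, 0, 1, 0, 0]))
```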

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 9 attributes and 223 objects, 10 of which are outliers (4.48%).

Downloads: raw algorithm results (1.6 MB); raw algorithm evaluation table (32.8 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The Maximum F1-Measure is provided as a complement to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 1 0.80000 0.79061 0.88634 0.88100 0.88889 0.88367 0.98873
KNN 24 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99343
KNN 95 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99366
KNNW 1 0.80000 0.79061 0.88500 0.87960 0.88889 0.88367 0.98873
KNNW 68 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99343
LOF 52 0.80000 0.79061 0.80442 0.79524 0.80000 0.79061 0.98310
LOF 56 0.80000 0.79061 0.87913 0.87346 0.88889 0.88367 0.98638
LOF 99 0.80000 0.79061 0.90000 0.89531 0.88889 0.88367 0.99108
SimplifiedLOF 68 0.80000 0.79061 0.78783 0.77786 0.80000 0.79061 0.98075
SimplifiedLOF 76 0.80000 0.79061 0.87617 0.87035 0.88889 0.88367 0.98545
SimplifiedLOF 99 0.80000 0.79061 0.88634 0.88100 0.88889 0.88367 0.98826
LoOP 85 0.50000 0.47653 0.66700 0.65137 0.69565 0.68136 0.96808
LoOP 94 0.60000 0.58122 0.70867 0.69500 0.69565 0.68136 0.97371
LoOP 100 0.60000 0.58122 0.73828 0.72600 0.69565 0.68136 0.97746
LDOF 57 0.40000 0.37183 0.31597 0.28385 0.44444 0.41836 0.92394
LDOF 89 0.40000 0.37183 0.54044 0.51886 0.61538 0.59733 0.94883
LDOF 99 0.40000 0.37183 0.60934 0.59100 0.61538 0.59733 0.95728
ODIN 91 0.45000 0.42418 0.39963 0.37144 0.55172 0.53068 0.95845
ODIN 98 0.45000 0.42418 0.40228 0.37422 0.57143 0.55131 0.95986
FastABOD 4 0.80000 0.79061 0.62492 0.60731 0.81818 0.80965 0.98263
FastABOD 17 0.80000 0.79061 0.88634 0.88100 0.88889 0.88367 0.98826
FastABOD 62 0.80000 0.79061 0.89499 0.89006 0.88889 0.88367 0.99014
KDEOS 6 0.20000 0.16244 0.10893 0.06709 0.22222 0.18571 0.63099
KDEOS 16 0.00000 -0.04695 0.06912 0.02541 0.15909 0.11961 0.64648
LDF 30 0.80000 0.79061 0.87556 0.86971 0.84211 0.83469 0.98779
LDF 31 0.80000 0.79061 0.88667 0.88135 0.88889 0.88367 0.98826
LDF 99 0.80000 0.79061 0.91556 0.91159 0.88889 0.88367 0.99343
INFLO 71 0.80000 0.79061 0.80241 0.79314 0.80000 0.79061 0.97934
INFLO 85 0.80000 0.79061 0.87246 0.86648 0.88889 0.88367 0.98404
INFLO 99 0.80000 0.79061 0.88091 0.87532 0.88889 0.88367 0.98685
COF 89 0.90000 0.89531 0.98333 0.98255 0.94737 0.94490 0.99906
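Each method appears in several rows because the best k can differ from one measure to the next. The selection rule stated before the table (highest score, smallest k on a tie) can be sketched as follows. The function name and data layout are illustrative, and the example uses only the three LDF rows listed for this variant rather than the full parameter sweep, which is not reproduced on this page.

```python
def best_k_per_measure(results):
    """Pick, for each measure, the smallest k that reaches the maximum score.

    results: {k: {measure_name: value}} for a single method.
    Returns {measure_name: (k, value)}.
    """
    best = {}
    for k in sorted(results):  # ascending k, so ties keep the smaller k
        for measure, value in results[k].items():
            if measure not in best or value > best[measure][1]:
                best[measure] = (k, value)
    return best


# LDF rows from the table for this variant (subset of the measure columns).
ldf = {
    30: {"P@n": 0.80000, "AP": 0.87556, "Max-F1": 0.84211, "ROC AUC": 0.98779},
    31: {"P@n": 0.80000, "AP": 0.88667, "Max-F1": 0.88889, "ROC AUC": 0.98826},
    99: {"P@n": 0.80000, "AP": 0.91556, "Max-F1": 0.88889, "ROC AUC": 0.99343},
}
for measure, (k, value) in best_k_per_measure(ldf).items():
    print(measure, "best at k =", k, "with", value)
```

On these three rows, P@n is tied and therefore attributed to k = 30, Max-F1 to k = 31, and AP and ROC AUC to k = 99.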

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, with duplicates

This version contains 9 attributes and 454 objects, 10 of which are outliers (2.20%).

Downloads: raw algorithm results (1.9 MB); raw algorithm evaluation table (41.5 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The Maximum F1-Measure is provided as a complement to the measures in the original publication.

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 1 0.80000 0.79550 0.81728 0.81316 0.80000 0.79550 0.99403
KNN 2 0.80000 0.79550 0.86681 0.86381 0.84211 0.83855 0.99426
KNN 13 0.80000 0.79550 0.87612 0.87333 0.84211 0.83855 0.99527
KNNW 3 0.80000 0.79550 0.88728 0.88475 0.84211 0.83855 0.99527
LOF 88 0.60000 0.59099 0.45032 0.43794 0.64516 0.63717 0.98581
LOF 92 0.60000 0.59099 0.49701 0.48568 0.66667 0.65916 0.98626
LOF 95 0.50000 0.48874 0.63417 0.62593 0.64000 0.63189 0.98896
SimplifiedLOF 89 0.40000 0.38649 0.33149 0.31644 0.51852 0.50767 0.97365
SimplifiedLOF 93 0.40000 0.38649 0.40240 0.38894 0.66667 0.65916 0.98243
LoOP 93 0.30000 0.28423 0.26592 0.24939 0.50000 0.48874 0.96351
LoOP 99 0.20000 0.18198 0.28797 0.27193 0.53846 0.52807 0.96712
LoOP 100 0.20000 0.18198 0.29094 0.27497 0.53846 0.52807 0.96757
LDOF 79 0.10000 0.07973 0.13808 0.11867 0.30435 0.28868 0.91847
LDOF 95 0.10000 0.07973 0.18912 0.17086 0.38889 0.37513 0.94617
LDOF 99 0.10000 0.07973 0.20379 0.18585 0.37838 0.36438 0.94977
ODIN 3 0.23077 0.21344 0.14033 0.12096 0.26087 0.24422 0.86453
ODIN 98 0.15000 0.13086 0.23673 0.21954 0.42553 0.41259 0.96385
ODIN 100 0.16667 0.14790 0.24158 0.22450 0.42553 0.41259 0.96464
FastABOD 28 0.80000 0.79550 0.90444 0.90229 0.84211 0.83855 0.99707
FastABOD 37 0.80000 0.79550 0.90812 0.90605 0.84211 0.83855 0.99730
FastABOD 60 0.80000 0.79550 0.91021 0.90819 0.84211 0.83855 0.99730
FastABOD 83 0.80000 0.79550 0.90487 0.90273 0.85714 0.85393 0.99707
KDEOS 2 0.00000 -0.02252 0.06180 0.04067 0.15054 0.13141 0.71712
KDEOS 3 0.00000 -0.02252 0.06071 0.03956 0.15190 0.13280 0.76982
LDF 88 0.70000 0.69324 0.77204 0.76690 0.70000 0.69324 0.99077
LDF 92 0.60000 0.59099 0.80782 0.80349 0.75000 0.74437 0.99189
LDF 93 0.70000 0.69324 0.82338 0.81940 0.75000 0.74437 0.99257
INFLO 86 0.40000 0.38649 0.36269 0.34833 0.64000 0.63189 0.97725
INFLO 93 0.50000 0.48874 0.39968 0.38616 0.64000 0.63189 0.98108
INFLO 99 0.40000 0.38649 0.40725 0.39390 0.64000 0.63189 0.98198
COF 89 0.20000 0.18198 0.30639 0.29077 0.50000 0.48874 0.97365
COF 91 0.20000 0.18198 0.29636 0.28051 0.52632 0.51565 0.97365
COF 99 0.30000 0.28423 0.31493 0.29950 0.40000 0.38649 0.94212
COF 100 0.20000 0.18198 0.32674 0.31157 0.42857 0.41570 0.95068
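The table rows on this page are whitespace-separated, with the method name and k followed by the seven measure columns. For readers who want to work with the numbers directly, a minimal parsing sketch is given below; the function name and the dictionary layout are illustrative choices, and the sketch assumes that method names contain no spaces, as is the case for all rows listed here.

```python
# Measure columns in the order given by the table header.
COLUMNS = ["P@n", "Adj. P@n", "AP", "Adj. AP", "Max-F1", "Adj. MF1", "ROC AUC"]


def parse_row(line):
    """Turn one table row into a dict keyed by column name."""
    fields = line.split()
    algorithm, k = fields[0], int(fields[1])
    values = [float(x) for x in fields[2:]]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} measures, got {len(values)}")
    row = {"Algorithm": algorithm, "k": k}
    row.update(zip(COLUMNS, values))
    return row


row = parse_row("COF 100 0.20000 0.18198 0.32674 0.31157 0.42857 0.41570 0.95068")
print(row["Algorithm"], row["k"], row["ROC AUC"])
```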

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO