Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WBC (version#03)

This data set consists of examples of different cancer types, benign or malignant. Benign examples are considered inliers, and malignant examples are considered outliers. The outliers were downsampled, following Schubert et al. [1], so that 10 outliers remain. Because 234 instances are duplicates (231 inliers and 3 outliers), 229 outliers were removed from the data set with duplicates and 226 from the data set without duplicates. Furthermore, we removed 16 instances with missing values (2 outliers and 14 inliers). The processed data set with duplicates has 9 numeric attributes and 454 instances, namely 10 outliers (2.2%) and 444 inliers (97.8%). The same preprocessing has also been applied in [2] and [3].
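
The counts in this description can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted above; the variable names are illustrative, and the reading that duplicate inliers are dropped in the variant without duplicates is an assumption.

```python
# Counts quoted in the description above.
inliers_clean = 444          # inliers after dropping instances with missing values
outliers_kept = 10           # outliers remaining after downsampling
dup_inliers, dup_outliers = 231, 3
removed_with_dups = 229      # outliers removed in the variant that keeps duplicates
removed_without_dups = 226   # outliers removed in the variant without duplicates

# Variant with duplicates: all clean inliers plus the downsampled outliers.
n_with = inliers_clean + outliers_kept
# Variant without duplicates (assumption: the duplicate inliers are dropped).
n_without = (inliers_clean - dup_inliers) + outliers_kept

# Outliers available before downsampling, in each variant.
outliers_before_with = outliers_kept + removed_with_dups
outliers_before_without = outliers_kept + removed_without_dups

print(n_with, f"{100 * outliers_kept / n_with:.2f}%")
print(n_without, f"{100 * outliers_kept / n_without:.2f}%")
# The two variants should differ by exactly the duplicate outliers.
print(outliers_before_with - outliers_before_without == dup_outliers)
```

Under these assumptions the sizes come out as 454 and 223 objects, which matches the variant descriptions further down.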

References:

[1] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.
[2] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428-436, 2013.
[3] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.

Download all data set variants used (57.1 kB). You can also access the original data (breast-cancer-wisconsin.data).

Normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)
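
The variant names refer to two preprocessing choices: whether duplicate objects are removed and whether attributes are normalized. A minimal Python sketch of both steps follows, on made-up toy rows. It assumes min-max scaling to [0, 1] and first-occurrence deduplication, neither of which is specified on this page, and the helper names are hypothetical.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each distinct attribute vector."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def min_max_normalize(rows):
    """Rescale every attribute to [0, 1]; constant attributes map to 0.

    Assumption: "normalized" means per-attribute min-max scaling.
    """
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up toy rows with three attributes (not taken from the WBC data).
rows = [[1, 5, 3], [1, 5, 3], [10, 1, 3], [4, 3, 3]]
unique = drop_duplicates(rows)          # the repeated first row is dropped
normalized = min_max_normalize(unique)  # each column now spans [0, 1]
print(unique)
print(normalized)
```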

Download raw algorithm results (1.6 MB) Download raw algorithm evaluation table (33.4 kB)

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for several values of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures in the original publication.
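
The adjusted columns in the tables on this page appear consistent with a standard adjustment for chance, in which the expected value of a random ranking is mapped to 0 and a perfect score to 1. The Python sketch below assumes that this expected value equals the outlier rate (number of outliers divided by number of objects); the function name is illustrative.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that a random ranking scores 0 and a perfect one 1.

    Assumption: the expected value of the measure under a random ranking
    is the outlier rate n_outliers / n_objects.
    """
    expected = n_outliers / n_objects
    return (value - expected) / (1.0 - expected)


# P@n = 0.9 in the variants without (223 objects) and with (454 objects) duplicates.
print(f"{adjust_for_chance(0.9, 10, 223):.5f}")
print(f"{adjust_for_chance(0.9, 10, 454):.5f}")
# P@n = 0 gives a negative adjusted value.
print(f"{adjust_for_chance(0.0, 10, 223):.5f}")
```

With these inputs the sketch yields 0.89531, 0.89775 and -0.04695, the same Adj. P@n values that appear in the tables for P@n of 0.9 and 0.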

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              1   0.90000   0.89531   0.93846   0.93557   0.94737   0.94490   0.99249
KNN             20   0.90000   0.89531   0.96250   0.96074   0.94737   0.94490   0.99718
KNNW             2   0.90000   0.89531   0.92846   0.92510   0.90000   0.89531   0.99202
KNNW             3   0.90000   0.89531   0.93846   0.93557   0.94737   0.94490   0.99249
KNNW            48   0.90000   0.89531   0.96250   0.96074   0.94737   0.94490   0.99718
LOF             57   0.90000   0.89531   0.83806   0.83045   0.90000   0.89531   0.99155
LOF             78   0.90000   0.89531   0.94762   0.94516   0.94737   0.94490   0.99484
LOF             98   0.90000   0.89531   0.95882   0.95689   0.94737   0.94490   0.99671
SimplifiedLOF   84   0.90000   0.89531   0.89558   0.89068   0.90000   0.89531   0.99202
SimplifiedLOF   98   0.90000   0.89531   0.92651   0.92306   0.90000   0.89531   0.99390
LoOP            69   0.40000   0.37183   0.42699   0.40009   0.69231   0.67786   0.96432
LoOP            96   0.50000   0.47653   0.61099   0.59273   0.66667   0.65102   0.97653
LoOP           100   0.50000   0.47653   0.63736   0.62034   0.66667   0.65102   0.97840
LDOF            87   0.40000   0.37183   0.36799   0.33832   0.56000   0.53934   0.95446
LDOF            97   0.40000   0.37183   0.40923   0.38150   0.61538   0.59733   0.95869
LDOF            98   0.40000   0.37183   0.41518   0.38772   0.61538   0.59733   0.96197
ODIN            99   0.48000   0.45559   0.42418   0.39714   0.60000   0.58122   0.96854
ODIN           100   0.48000   0.45559   0.42655   0.39962   0.62069   0.60288   0.96878
FastABOD         6   0.90000   0.89531   0.74877   0.73697   0.90000   0.89531   0.98920
FastABOD         7   0.90000   0.89531   0.94348   0.94082   0.94737   0.94490   0.99390
FastABOD        35   0.90000   0.89531   0.94762   0.94516   0.94737   0.94490   0.99484
KDEOS            2   0.00000  -0.04695   0.04344  -0.00146   0.09783   0.05547   0.43169
KDEOS           14   0.00000  -0.04695   0.07727   0.03395   0.15686   0.11728   0.66338
KDEOS           22   0.00000  -0.04695   0.07579   0.03240   0.21053   0.17346   0.63521
LDF             31   0.90000   0.89531   0.75256   0.74094   0.90000   0.89531   0.99014
LDF             49   0.90000   0.89531   0.95556   0.95347   0.94737   0.94490   0.99624
LDF             77   0.90000   0.89531   0.96250   0.96074   0.94737   0.94490   0.99718
INFLO           97   0.80000   0.79061   0.88598   0.88063   0.81818   0.80965   0.99108
INFLO           98   0.80000   0.79061   0.89012   0.88496   0.81818   0.80965   0.99202
INFLO           99   0.70000   0.68592   0.89535   0.89043   0.82353   0.81524   0.99202
COF             66   0.80000   0.79061   0.81364   0.80489   0.85714   0.85044   0.99061
COF             93   0.70000   0.68592   0.83153   0.82362   0.83333   0.82551   0.99155
COF             97   0.80000   0.79061   0.85083   0.84383   0.80000   0.79061   0.99061
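
To make the unadjusted measures in the table concrete, here is a small Python sketch that computes P@n, AP, maximum F1 and ROC AUC from a single outlier ranking. It is an illustration under stated assumptions, not the evaluation code behind these results: scores are assumed distinct (no tie handling), n is taken to be the number of outliers, and the example ranking is hypothetical, chosen so that 9 of the 10 outliers come first and the last one sits at rank 26 among 223 objects.

```python
def ranking_measures(scores, labels):
    """Compute P@n, AP, maximum F1 and ROC AUC for one outlier ranking.

    scores: higher means more outlying; labels: 1 = outlier, 0 = inlier.
    Assumes distinct scores and at least one object of each class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    n_out = sum(ranked)
    n_in = len(ranked) - n_out

    hits = 0           # outliers seen so far
    ap_sum = 0.0       # sum of precision values at the ranks of outliers
    max_f1 = 0.0
    correct_pairs = 0  # (outlier, inlier) pairs with the outlier ranked first
    for rank, is_outlier in enumerate(ranked, start=1):
        if is_outlier:
            hits += 1
            ap_sum += hits / rank
            # Inliers ranked below this outlier.
            correct_pairs += n_in - (rank - hits)
        if hits:
            precision, recall = hits / rank, hits / n_out
            max_f1 = max(max_f1, 2 * precision * recall / (precision + recall))

    return {
        "P@n": sum(ranked[:n_out]) / n_out,
        "AP": ap_sum / n_out,
        "Max-F1": max_f1,
        "ROC AUC": correct_pairs / (n_out * n_in),
    }


# Hypothetical ranking: 9 outliers, 16 inliers, 1 outlier, 197 inliers.
labels = [1] * 9 + [0] * 16 + [1] + [0] * 197
scores = list(range(len(labels), 0, -1))  # strictly decreasing scores
result = ranking_measures(scores, labels)
print({name: round(value, 5) for name, value in result.items()})
```

For this hypothetical ranking the four values round to 0.9, 0.93846, 0.94737 and 0.99249, which coincides with the KNN row for k = 1 above. The actual ranking behind that row is only available in the raw algorithm results.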

Plots

[Plots omitted: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.]

Normalized, duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB) Download raw algorithm evaluation table (38.8 kB)

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for several values of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              1   0.90000   0.89775   0.94545   0.94423   0.94737   0.94618   0.99741
KNN              4   0.90000   0.89775   0.95556   0.95455   0.94737   0.94618   0.99820
KNNW             2   0.90000   0.89775   0.94545   0.94423   0.94737   0.94618   0.99730
KNNW            72   0.90000   0.89775   0.95556   0.95455   0.94737   0.94618   0.99820
LOF            100   0.80000   0.79550   0.86016   0.85701   0.81818   0.81409   0.99505
SimplifiedLOF   79   0.50000   0.48874   0.37084   0.35667   0.56000   0.55009   0.97815
SimplifiedLOF   94   0.50000   0.48874   0.56281   0.55297   0.66667   0.65916   0.98716
SimplifiedLOF   98   0.50000   0.48874   0.56283   0.55299   0.69231   0.68538   0.98694
SimplifiedLOF   99   0.50000   0.48874   0.56738   0.55763   0.69231   0.68538   0.98716
LoOP            83   0.30000   0.28423   0.29124   0.27528   0.47368   0.46183   0.97027
LoOP           100   0.30000   0.28423   0.37487   0.36079   0.64286   0.63481   0.97995
LDOF            96   0.30000   0.28423   0.25758   0.24086   0.40909   0.39578   0.96126
LDOF           100   0.30000   0.28423   0.27836   0.26210   0.45161   0.43926   0.96577
ODIN            82   0.40000   0.38649   0.32861   0.31349   0.50000   0.48874   0.97331
ODIN            87   0.40000   0.38649   0.35442   0.33988   0.53333   0.52282   0.97568
ODIN            93   0.36667   0.35240   0.34232   0.32751   0.53333   0.52282   0.97635
FastABOD        28   0.90000   0.89775   0.93704   0.93562   0.94737   0.94618   0.99617
FastABOD        39   0.90000   0.89775   0.93846   0.93708   0.94737   0.94618   0.99640
KDEOS            6   0.20000   0.18198   0.12482   0.10511   0.30769   0.29210   0.86329
KDEOS            7   0.10000   0.07973   0.14225   0.12293   0.32000   0.30468   0.89009
LDF             88   0.90000   0.89775   0.93167   0.93013   0.90000   0.89775   0.99662
LDF             92   0.90000   0.89775   0.93846   0.93708   0.94737   0.94618   0.99640
LDF             94   0.90000   0.89775   0.94545   0.94423   0.94737   0.94618   0.99730
INFLO           78   0.50000   0.48874   0.43972   0.42710   0.69231   0.68538   0.98514
INFLO           95   0.50000   0.48874   0.64361   0.63558   0.72000   0.71369   0.98986
INFLO           99   0.50000   0.48874   0.67200   0.66461   0.72000   0.71369   0.99009
COF             90   0.30000   0.28423   0.38268   0.36877   0.62500   0.61655   0.98198
COF             98   0.50000   0.48874   0.46040   0.44824   0.61538   0.60672   0.98514

Plots

[Plots omitted: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.]

Not normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)

Download raw algorithm results (1.6 MB) Download raw algorithm evaluation table (33.3 kB)

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for several values of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              1   0.90000   0.89531   0.93846   0.93557   0.94737   0.94490   0.99249
KNN             79   0.90000   0.89531   0.96667   0.96510   0.94737   0.94490   0.99765
KNNW             2   0.90000   0.89531   0.92846   0.92510   0.90000   0.89531   0.99202
KNNW             3   0.90000   0.89531   0.94000   0.93718   0.94737   0.94490   0.99296
KNNW            39   0.90000   0.89531   0.96250   0.96074   0.94737   0.94490   0.99718
LOF             61   0.90000   0.89531   0.94762   0.94516   0.94737   0.94490   0.99484
LOF             98   0.90000   0.89531   0.96250   0.96074   0.94737   0.94490   0.99718
SimplifiedLOF   88   0.90000   0.89531   0.92434   0.92079   0.90000   0.89531   0.99343
SimplifiedLOF   95   0.90000   0.89531   0.94545   0.94289   0.94737   0.94490   0.99437
SimplifiedLOF  100   0.90000   0.89531   0.95000   0.94765   0.94737   0.94490   0.99531
LoOP            95   0.50000   0.47653   0.68037   0.66536   0.75000   0.73826   0.97981
LoOP            98   0.60000   0.58122   0.70380   0.68989   0.75000   0.73826   0.98122
LoOP           100   0.60000   0.58122   0.72166   0.70859   0.75000   0.73826   0.98263
LDOF            89   0.40000   0.37183   0.45022   0.42441   0.59259   0.57347   0.95775
LDOF            98   0.40000   0.37183   0.48601   0.46188   0.64286   0.62609   0.96667
LDOF           100   0.40000   0.37183   0.48447   0.46026   0.66667   0.65102   0.96620
ODIN           100   0.45000   0.42418   0.40953   0.38181   0.62069   0.60288   0.96737
FastABOD         5   0.90000   0.89531   0.74414   0.73213   0.90000   0.89531   0.98779
FastABOD         6   0.90000   0.89531   0.94000   0.93718   0.94737   0.94490   0.99296
FastABOD        61   0.90000   0.89531   0.94762   0.94516   0.94737   0.94490   0.99484
KDEOS            5   0.00000  -0.04695   0.07758   0.03427   0.15385   0.11412   0.65258
KDEOS           11   0.10000   0.05775   0.07706   0.03373   0.14815   0.10816   0.61690
KDEOS           23   0.00000  -0.04695   0.08001   0.03682   0.19512   0.15733   0.63803
LDF             29   0.90000   0.89531   0.89558   0.89068   0.90000   0.89531   0.99202
LDF             35   0.90000   0.89531   0.94762   0.94516   0.94737   0.94490   0.99484
LDF             77   0.90000   0.89531   0.96667   0.96510   0.94737   0.94490   0.99765
INFLO           94   0.80000   0.79061   0.89685   0.89201   0.82353   0.81524   0.99202
INFLO           97   0.80000   0.79061   0.91833   0.91449   0.85714   0.85044   0.99343
COF             66   0.80000   0.79061   0.78155   0.77130   0.80000   0.79061   0.98685
COF             74   0.80000   0.79061   0.90763   0.90330   0.82353   0.81524   0.99343
COF             86   0.80000   0.79061   0.89488   0.88995   0.81818   0.80965   0.99437
COF             93   0.70000   0.68592   0.75422   0.74268   0.83333   0.82551   0.98920

Plots

[Plots omitted: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.]

Not normalized, duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB) Download raw algorithm evaluation table (38.6 kB)

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for several values of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              1   0.90000   0.89775   0.94545   0.94423   0.94737   0.94618   0.99741
KNN              4   0.90000   0.89775   0.95556   0.95455   0.94737   0.94618   0.99820
KNNW             2   0.90000   0.89775   0.94545   0.94423   0.94737   0.94618   0.99730
KNNW            72   0.90000   0.89775   0.95556   0.95455   0.94737   0.94618   0.99820
LOF            100   0.80000   0.79550   0.85849   0.85530   0.81818   0.81409   0.99482
SimplifiedLOF   79   0.50000   0.48874   0.37084   0.35667   0.56000   0.55009   0.97815
SimplifiedLOF   97   0.50000   0.48874   0.55902   0.54909   0.69231   0.68538   0.98671
SimplifiedLOF   99   0.50000   0.48874   0.56738   0.55763   0.69231   0.68538   0.98716
LoOP            83   0.30000   0.28423   0.29060   0.27462   0.47368   0.46183   0.97005
LoOP           100   0.30000   0.28423   0.37571   0.36165   0.64286   0.63481   0.98018
LDOF            92   0.30000   0.28423   0.23894   0.22180   0.39130   0.37759   0.95676
LDOF           100   0.30000   0.28423   0.27793   0.26167   0.45161   0.43926   0.96554
ODIN            82   0.40000   0.38649   0.32803   0.31289   0.50000   0.48874   0.97309
ODIN            87   0.40000   0.38649   0.35378   0.33922   0.53333   0.52282   0.97556
ODIN           100   0.32500   0.30980   0.33833   0.32343   0.53333   0.52282   0.97590
FastABOD        28   0.90000   0.89775   0.93846   0.93708   0.94737   0.94618   0.99640
FastABOD        75   0.90000   0.89775   0.94000   0.93865   0.94737   0.94618   0.99662
KDEOS            2   0.00000  -0.02252   0.06865   0.04768   0.16000   0.14108   0.71937
KDEOS            7   0.00000  -0.02252   0.04472   0.02320   0.10101   0.08076   0.75293
LDF             88   0.90000   0.89775   0.93167   0.93013   0.90000   0.89775   0.99662
LDF             92   0.90000   0.89775   0.93846   0.93708   0.94737   0.94618   0.99640
LDF             94   0.90000   0.89775   0.94545   0.94423   0.94737   0.94618   0.99730
INFLO           78   0.50000   0.48874   0.43840   0.42575   0.69231   0.68538   0.98491
INFLO           95   0.50000   0.48874   0.64361   0.63558   0.72000   0.71369   0.98986
INFLO           99   0.50000   0.48874   0.67200   0.66461   0.72000   0.71369   0.99009
COF             87   0.40000   0.38649   0.39563   0.38202   0.58065   0.57120   0.98108
COF             90   0.40000   0.38649   0.41073   0.39746   0.64516   0.63717   0.98423
COF             92   0.40000   0.38649   0.43675   0.42406   0.66667   0.65916   0.98423

Plots

[Plots omitted: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Method key: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.]