Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WDBC (version#05)

This data set describes characteristics of cell nuclei for breast cancer diagnosis. We consider benign examples as inliers and malignant examples as outliers. In the preprocessing, we follow Zhang et al. [1], downsampling the outliers to 10 instances. The processed data set has 30 numeric attributes and 367 instances: 10 outliers (2.72%) and 357 inliers (97.28%).
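The downsampling step can be illustrated with a minimal Python sketch. It is not the authors' preprocessing code: the function name `downsample_outliers`, the seed, and the use of uniform random sampling are assumptions, as is the count of 212 malignant cases (the figure commonly reported for the original wdbc.data file, which encodes the diagnosis as `M` or `B`). The sketch only reproduces the instance counts and the outlier rate quoted above, not the specific 10 outliers that were selected.

```python
import random


def downsample_outliers(labels, n_keep, seed=0):
    """Return sorted indices of all inliers plus n_keep randomly chosen outliers."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability only
    inliers = [i for i, y in enumerate(labels) if y == "B"]
    outliers = [i for i, y in enumerate(labels) if y == "M"]
    return sorted(inliers + rng.sample(outliers, n_keep))


# Stand-in label vector: 357 benign (inliers) and an assumed 212 malignant (outliers).
labels = ["B"] * 357 + ["M"] * 212

kept = downsample_outliers(labels, n_keep=10)
n_outliers = sum(labels[i] == "M" for i in kept)
outlier_rate = n_outliers / len(kept)

print(len(kept), n_outliers, f"{outlier_rate:.2%}")  # 367 10 2.72%
```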

References:

[1] K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proc. PAKDD, pages 813-822, 2009.

Download all data set variants used (1.1 MB). You can also access the original data (wdbc.data).

Normalized, without duplicates

This version contains 30 attributes and 367 objects, of which 10 are outliers (2.72%).

Download raw algorithm results (3.3 MB). Download raw algorithm evaluation table (39.9 kB).

Best Parameters

The following table lists the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is reported in addition to the measures in the original publication.
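The tie-breaking rule (best score first, smallest k among equal scores) can be sketched in a few lines of Python. The helper name `best_k` and the score values are hypothetical and serve only to illustrate the rule; they are not taken from the raw evaluation table.

```python
def best_k(scores_by_k):
    """Return (k, score) with the highest score; ties are broken by the smallest k."""
    # Sorting key: higher score first (negated), then smaller k.
    return min(scores_by_k.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical P@n values of one method for a few values of k.
p_at_n = {2: 0.4, 3: 0.5, 10: 0.5, 79: 0.5}

print(best_k(p_at_n))  # (3, 0.5)
```

Applying such a selection once per evaluation measure would explain why a method can appear in several rows: each row would hold the k that is best for at least one measure.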

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              3   0.50000   0.48599   0.29421   0.27444   0.52632   0.51305   0.94314
KNN             79   0.50000   0.48599   0.42661   0.41055   0.52632   0.51305   0.95798
KNNW            26   0.50000   0.48599   0.33033   0.31157   0.50000   0.48599   0.94734
KNNW            88   0.50000   0.48599   0.40376   0.38706   0.52632   0.51305   0.95294
KNNW            96   0.50000   0.48599   0.40929   0.39275   0.52632   0.51305   0.95434
LOF             45   0.50000   0.48599   0.39707   0.38018   0.50000   0.48599   0.94874
LOF             81   0.50000   0.48599   0.43261   0.41671   0.54545   0.53272   0.95686
LOF             86   0.50000   0.48599   0.41965   0.40340   0.52632   0.51305   0.95714
SimplifiedLOF   70   0.50000   0.48599   0.39162   0.37457   0.50000   0.48599   0.95014
SimplifiedLOF   84   0.50000   0.48599   0.40401   0.38731   0.50000   0.48599   0.95406
SimplifiedLOF   95   0.50000   0.48599   0.40969   0.39315   0.52632   0.51305   0.95406
SimplifiedLOF  100   0.50000   0.48599   0.40976   0.39323   0.52632   0.51305   0.95350
LoOP            87   0.50000   0.48599   0.37476   0.35725   0.50000   0.48599   0.94706
LoOP            99   0.50000   0.48599   0.39251   0.37550   0.50000   0.48599   0.94874
LDOF            74   0.40000   0.38319   0.36964   0.35199   0.43478   0.41895   0.93838
LDOF            85   0.40000   0.38319   0.36268   0.34483   0.43478   0.41895   0.94398
LDOF           100   0.40000   0.38319   0.37471   0.35719   0.45455   0.43927   0.94342
ODIN            50   0.30000   0.28039   0.20049   0.17810   0.34483   0.32648   0.91779
ODIN            95   0.30000   0.28039   0.24527   0.22413   0.42857   0.41257   0.93543
ODIN           100   0.30000   0.28039   0.25263   0.23170   0.42857   0.41257   0.93810
FastABOD        25   0.50000   0.48599   0.30564   0.28619   0.50000   0.48599   0.94902
FastABOD        67   0.50000   0.48599   0.35186   0.33371   0.50000   0.48599   0.95854
FastABOD       100   0.50000   0.48599   0.35340   0.33529   0.50000   0.48599   0.95854
KDEOS           28   0.10000   0.07479   0.05435   0.02786   0.11465   0.08985   0.70700
KDEOS           84   0.00000  -0.02801   0.10915   0.08420   0.27907   0.25888   0.86303
LDF              5   0.60000   0.58880   0.36025   0.34233   0.60000   0.58880   0.88095
LDF             42   0.50000   0.48599   0.41747   0.40115   0.52632   0.51305   0.94622
LDF            100   0.50000   0.48599   0.40179   0.38503   0.50000   0.48599   0.95014
INFLO           81   0.50000   0.48599   0.38986   0.37277   0.50000   0.48599   0.95210
INFLO           83   0.50000   0.48599   0.39577   0.37885   0.50000   0.48599   0.95350
INFLO           86   0.50000   0.48599   0.39936   0.38253   0.52632   0.51305   0.95350
INFLO           99   0.50000   0.48599   0.40106   0.38428   0.52632   0.51305   0.95210
COF              2   0.20000   0.17759   0.08075   0.05500   0.21053   0.18841   0.68852
COF             24   0.20000   0.17759   0.23297   0.21148   0.30769   0.28830   0.88207
COF             64   0.10000   0.07479   0.19973   0.17732   0.40000   0.38319   0.91289
COF             69   0.20000   0.17759   0.21104   0.18894   0.38710   0.36993   0.92241
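
The adjusted columns in the table above can be related to the raw columns with a short sketch. The formula is not stated on this page; it is assumed here to be the adjustment for chance from the original publication, (value - expected) / (1 - expected), with the expected value of a random ranking taken as the outlier rate 10/367. The function name `adjust_for_chance` is hypothetical. Under that assumption the sketch reproduces the tabulated values to five decimals.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that a random ranking maps to 0 and a perfect one to 1."""
    expected = n_outliers / n_objects  # assumed expected value of a random ranking
    return (value - expected) / (1.0 - expected)


N_OUTLIERS, N_OBJECTS = 10, 367

# Raw values copied from the table: P@n = 0.5, P@n = 0.0 (KDEOS, k = 84), AP = 0.29421 (KNN, k = 3).
for raw in (0.5, 0.0, 0.29421):
    print(raw, round(adjust_for_chance(raw, N_OUTLIERS, N_OBJECTS), 5))
# 0.5 0.48599
# 0.0 -0.02801
# 0.29421 0.27444
```

A negative adjusted value, as for KDEOS with k = 84, therefore indicates a result below what a random ranking would be expected to achieve.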

Plots

Plot panels: Precision at n, Adjusted precision at n, Average precision, Adjusted average precision, Maximum F1 score, Adjusted maximum F1 score, ROC AUC, Diversity.
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 30 attributes and 367 objects, of which 10 are outliers (2.72%).

Download raw algorithm results (3.1 MB). Download raw algorithm evaluation table (32.2 kB).

Best Parameters

The following table lists the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is reported in addition to the measures in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              2   0.80000   0.79440   0.78166   0.77555   0.80000   0.79440   0.98263
KNN              5   0.80000   0.79440   0.86588   0.86212   0.84211   0.83768   0.98207
KNN             10   0.80000   0.79440   0.84873   0.84449   0.85714   0.85314   0.98067
KNN             45   0.70000   0.69160   0.85035   0.84616   0.81818   0.81309   0.99104
KNNW             4   0.80000   0.79440   0.80015   0.79455   0.80000   0.79440   0.98431
KNNW             6   0.80000   0.79440   0.83965   0.83516   0.84211   0.83768   0.98431
KNNW             9   0.80000   0.79440   0.86726   0.86354   0.84211   0.83768   0.98375
KNNW            62   0.70000   0.69160   0.84872   0.84448   0.81818   0.81309   0.99048
LOF             18   0.70000   0.69160   0.79816   0.79251   0.81818   0.81309   0.92241
LOF             20   0.80000   0.79440   0.82194   0.81695   0.81818   0.81309   0.91933
LOF             61   0.70000   0.69160   0.84149   0.83705   0.78261   0.77652   0.98964
SimplifiedLOF   28   0.80000   0.79440   0.81654   0.81140   0.80000   0.79440   0.92689
SimplifiedLOF   29   0.80000   0.79440   0.83511   0.83049   0.84211   0.83768   0.92661
SimplifiedLOF   35   0.80000   0.79440   0.83530   0.83069   0.84211   0.83768   0.93025
SimplifiedLOF   93   0.70000   0.69160   0.83184   0.82713   0.78261   0.77652   0.98768
LoOP            34   0.70000   0.69160   0.71106   0.70296   0.70000   0.69160   0.91541
LoOP            52   0.70000   0.69160   0.80820   0.80282   0.81818   0.81309   0.96359
LoOP            82   0.70000   0.69160   0.81900   0.81393   0.78261   0.77652   0.98543
LDOF            39   0.70000   0.69160   0.65344   0.64373   0.70000   0.69160   0.94286
LDOF            45   0.70000   0.69160   0.66631   0.65696   0.78261   0.77652   0.94790
LDOF            86   0.70000   0.69160   0.80797   0.80259   0.76190   0.75524   0.98403
ODIN            50   0.70000   0.69160   0.74657   0.73947   0.70000   0.69160   0.95896
ODIN            76   0.70000   0.69160   0.82872   0.82392   0.78261   0.77652   0.98585
ODIN            79   0.70000   0.69160   0.82952   0.82474   0.78261   0.77652   0.98641
ODIN            82   0.70000   0.69160   0.82500   0.82010   0.77778   0.77155   0.98669
FastABOD         4   0.60000   0.58880   0.77681   0.77056   0.66667   0.65733   0.98936
FastABOD         8   0.60000   0.58880   0.78391   0.77785   0.69565   0.68713   0.98992
KDEOS            4   0.10000   0.07479   0.08320   0.05752   0.16667   0.14332   0.55434
KDEOS          100   0.00000  -0.02801   0.10765   0.08266   0.30508   0.28562   0.87479
LDF             27   0.80000   0.79440   0.82687   0.82202   0.80000   0.79440   0.93725
LDF             38   0.80000   0.79440   0.85697   0.85297   0.84211   0.83768   0.98992
LDF             41   0.80000   0.79440   0.86488   0.86110   0.84211   0.83768   0.99104
LDF             55   0.70000   0.69160   0.85940   0.85546   0.81818   0.81309   0.99328
INFLO           45   0.80000   0.79440   0.81401   0.80880   0.80000   0.79440   0.98319
INFLO           81   0.70000   0.69160   0.82783   0.82301   0.77778   0.77155   0.98852
COF             18   0.80000   0.79440   0.67795   0.66893   0.80000   0.79440   0.95042
COF             32   0.80000   0.79440   0.75772   0.75093   0.85714   0.85314   0.92409
COF             64   0.80000   0.79440   0.86718   0.86346   0.85714   0.85314   0.98936
COF             84   0.80000   0.79440   0.83379   0.82913   0.85714   0.85314   0.99132
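
For readers who want to recompute the unadjusted measures from a ranking, the following Python sketch uses the standard textbook definitions of precision at n, average precision, maximum F1, and ROC AUC. It is an illustration, not the evaluation code behind these tables: the function names and the toy ranking are hypothetical, and tied outlier scores are assumed not to occur.

```python
def precision_at_n(ranked, n):
    """Fraction of outliers among the top n ranked objects (1 = outlier, 0 = inlier)."""
    return sum(ranked[:n]) / n


def average_precision(ranked):
    """Mean of the precision values at the rank of each outlier (needs >= 1 outlier)."""
    hits, total = 0, 0.0
    for rank, is_outlier in enumerate(ranked, start=1):
        if is_outlier:
            hits += 1
            total += hits / rank
    return total / hits


def max_f1(ranked):
    """Largest F1 score over all cut-off ranks."""
    n_out, hits, best = sum(ranked), 0, 0.0
    for rank, is_outlier in enumerate(ranked, start=1):
        hits += is_outlier
        # F1 = 2PR / (P + R) simplifies to 2 * hits / (rank + n_out).
        best = max(best, 2 * hits / (rank + n_out))
    return best


def roc_auc(ranked):
    """Fraction of (outlier, inlier) pairs in which the outlier is ranked first."""
    n_out = sum(ranked)
    n_in = len(ranked) - n_out
    inliers_seen, misordered = 0, 0
    for is_outlier in ranked:
        if is_outlier:
            misordered += inliers_seen
        else:
            inliers_seen += 1
    return 1 - misordered / (n_out * n_in)


# Toy ranking by decreasing outlier score: 2 outliers among 6 objects.
ranked = [1, 0, 1, 0, 0, 0]
print(precision_at_n(ranked, n=2))          # 0.5
print(round(average_precision(ranked), 4))  # 0.8333
print(round(max_f1(ranked), 4))             # 0.8
print(roc_auc(ranked))                      # 0.875
```

In the toy call, n is set to the number of outliers; whether the tables use the same choice of n is an assumption of this sketch.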

Plots

Plot panels: Precision at n, Adjusted precision at n, Average precision, Adjusted average precision, Maximum F1 score, Adjusted maximum F1 score, ROC AUC, Diversity.
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO