Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WBC (version#08)

This data set consists of examples of different cancer types, each labeled benign or malignant. Benign examples are considered inliers and malignant examples are considered outliers. Following Schubert et al. [1], the outliers were downsampled so that 10 outliers remain. 234 instances are duplicates (231 inliers and 3 outliers); consequently, 229 outliers were removed from the data set with duplicates and 226 outliers from the data set without duplicates. Furthermore, we removed 16 instances with missing values (14 inliers and 2 outliers). The processed data set with duplicates has 9 numeric attributes and 454 instances, namely 10 outliers (2.2%) and 444 inliers (97.8%). The same preprocessing has also been applied in [2] and [3].
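The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function name `preprocess`, the in-memory row format, the use of `None` for missing values, the order of the steps, and the random seed are all assumptions made for the example.

```python
import random


def preprocess(rows, n_outliers=10, drop_duplicates=False, seed=0):
    """Hypothetical sketch of the WBC preprocessing.

    rows: list of (attributes, label) pairs, where attributes is a tuple of
    numbers (None marks a missing value) and label is "inlier" or "outlier".
    """
    # Step 1: remove instances with missing values.
    complete = [(x, y) for x, y in rows if None not in x]

    # Step 2 (optional): keep only the first occurrence of each attribute vector.
    if drop_duplicates:
        seen = set()
        unique = []
        for x, y in complete:
            if x not in seen:
                seen.add(x)
                unique.append((x, y))
        complete = unique

    # Step 3: downsample the outliers to at most n_outliers instances.
    inliers = [r for r in complete if r[1] == "inlier"]
    outliers = [r for r in complete if r[1] == "outlier"]
    rng = random.Random(seed)  # the seed is an assumption, chosen for reproducibility
    kept = rng.sample(outliers, min(n_outliers, len(outliers)))
    return inliers + kept


# Toy data (not taken from the WBC file).
toy = [
    ((1, 2), "inlier"), ((1, 2), "inlier"), ((3, None), "inlier"),
    ((5, 5), "outlier"), ((6, 6), "outlier"), ((6, 6), "outlier"),
    ((7, 7), "outlier"),
]
with_duplicates = preprocess(toy, n_outliers=2)
without_duplicates = preprocess(toy, n_outliers=2, drop_duplicates=True)
```

On the toy data, the variant with duplicates keeps both copies of the repeated inlier, while the variant without duplicates keeps one, mirroring the two kinds of data set variants listed below.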

References:

[1] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.
[2] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428-436, 2013.
[3] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.

Download all data set variants used (57.1 kB). The original data (breast-cancer-wisconsin.data) is also available.

Normalized, without duplicates

This version contains 9 attributes and 223 objects, of which 10 are outliers (4.48%).

Download raw algorithm results (1.7 MB). Download raw algorithm evaluation table (32.6 kB).

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The maximum F1-measure is provided as a complement to the measures in the original publication.
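For readers who want to recompute the unadjusted measures from an outlier ranking, the following is a minimal sketch. It assumes that higher scores mean "more outlying", that n in P@n equals the number of outliers, and that the maximum F1 is taken over all rank cutoffs. Ties between scores are broken by input order here, which is a simplification; the function name `ranking_measures` is hypothetical, and this is not the evaluation code that produced the tables.

```python
def ranking_measures(scores, labels):
    """Compute P@n, AP, maximum F1 and ROC AUC for one outlier ranking.

    scores: outlier scores, higher = more outlying (assumption).
    labels: 1 for outlier, 0 for inlier, aligned with scores.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [labels[i] for i in order]
    n_out = sum(ranked)
    n_in = len(ranked) - n_out

    # Precision at n, with n = number of outliers.
    p_at_n = sum(ranked[:n_out]) / n_out

    hits = 0            # outliers seen so far
    inliers_seen = 0    # inliers seen so far
    ap_sum = 0.0        # sum of precisions at the ranks of the outliers
    correct_pairs = 0   # (outlier, inlier) pairs ranked in the right order
    max_f1 = 0.0
    for rank, y in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap_sum += hits / rank
            correct_pairs += n_in - inliers_seen
        else:
            inliers_seen += 1
        if hits:
            precision = hits / rank
            recall = hits / n_out
            max_f1 = max(max_f1, 2 * precision * recall / (precision + recall))

    return {
        "P@n": p_at_n,
        "AP": ap_sum / n_out,
        "Max-F1": max_f1,
        "ROC AUC": correct_pairs / (n_out * n_in),
    }


# Toy ranking with two outliers among five objects.
result = ranking_measures([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0])
```

In the toy ranking, one of the top two objects is an outlier, so P@n is 0.5, and the best F1 is reached at the cutoff after the third object.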

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. Max-F1  ROC AUC
KNN            1    0.80000  0.79061   0.89448  0.88953  0.88889  0.88367      0.98826
KNN            6    0.90000  0.89531   0.91703  0.91313  0.90000  0.89531      0.98732
KNN            11   0.90000  0.89531   0.92703  0.92360  0.94737  0.94490      0.98756
KNNW           8    0.90000  0.89531   0.91857  0.91475  0.90000  0.89531      0.98779
KNNW           11   0.90000  0.89531   0.92778  0.92439  0.94737  0.94490      0.98779
KNNW           13   0.90000  0.89531   0.92857  0.92522  0.94737  0.94490      0.98826
LOF            56   0.90000  0.89531   0.92500  0.92148  0.94737  0.94490      0.98592
LOF            96   0.90000  0.89531   0.92703  0.92360  0.94737  0.94490      0.98732
SimplifiedLOF  73   0.90000  0.89531   0.92381  0.92023  0.94737  0.94490      0.98498
SimplifiedLOF  79   0.90000  0.89531   0.92439  0.92084  0.94737  0.94490      0.98545
LoOP           86   0.50000  0.47653   0.69196  0.67750  0.75000  0.73826      0.97230
LoOP           94   0.70000  0.68592   0.74772  0.73587  0.75000  0.73826      0.97512
LoOP           100  0.70000  0.68592   0.78519  0.77511  0.75000  0.73826      0.97746
LDOF           72   0.40000  0.37183   0.36160  0.33163  0.50000  0.47653      0.94930
LDOF           100  0.40000  0.37183   0.57736  0.55752  0.60870  0.59032      0.96385
ODIN           73   0.40000  0.37183   0.38483  0.35595  0.64286  0.62609      0.95094
ODIN           83   0.40000  0.37183   0.43981  0.41351  0.64286  0.62609      0.95845
ODIN           84   0.45000  0.42418   0.42973  0.40296  0.60870  0.59032      0.95728
FastABOD       14   0.90000  0.89531   0.92571  0.92223  0.90000  0.89531      0.99108
FastABOD       15   0.90000  0.89531   0.93571  0.93270  0.94737  0.94490      0.99155
FastABOD       41   0.90000  0.89531   0.93704  0.93408  0.94737  0.94490      0.99202
KDEOS          9    0.10000  0.05775   0.06167  0.01761  0.12963  0.08877      0.58216
KDEOS          13   0.00000  -0.04695  0.06875  0.02503  0.13861  0.09817      0.63803
KDEOS          99   0.00000  -0.04695  0.05554  0.01120  0.16949  0.13050      0.58967
LDF            28   0.90000  0.89531   0.86246  0.85601  0.90000  0.89531      0.98498
LDF            32   0.90000  0.89531   0.92703  0.92360  0.94737  0.94490      0.98732
LDF            97   0.90000  0.89531   0.92857  0.92522  0.94737  0.94490      0.98826
INFLO          82   0.90000  0.89531   0.90214  0.89755  0.90000  0.89531      0.98357
INFLO          84   0.90000  0.89531   0.92326  0.91965  0.94737  0.94490      0.98451
INFLO          94   0.90000  0.89531   0.92439  0.92084  0.94737  0.94490      0.98545
COF            54   0.80000  0.79061   0.79380  0.78412  0.85714  0.85044      0.98263
COF            92   0.80000  0.79061   0.87556  0.86972  0.84211  0.83469      0.98967

Plots

Precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity (plots not reproduced here).
Key to methods: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Normalized, duplicates

This version contains 9 attributes and 454 objects, of which 10 are outliers (2.20%).

Download raw algorithm results (1.9 MB). Download raw algorithm evaluation table (41.0 kB).

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The maximum F1-measure is provided as a complement to the measures in the original publication.
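The selection rule stated above (best score per method and measure, smallest k on ties) can be sketched as follows. The function name `best_k_per_measure` and the dictionary-based row format are assumptions for illustration. The three example rows are the KNN rows of the table below, whereas a real run would start from the full raw evaluation table with one row per method and k.

```python
def best_k_per_measure(rows, measures):
    """For each (algorithm, measure), return (k, score) of the best score.

    rows: iterable of dicts with keys "algorithm", "k" and one key per measure.
    Ties are resolved in favor of the smallest k.
    """
    best = {}
    # Visiting rows in ascending k and updating only on a strictly better
    # score keeps the smallest k among tied scores.
    for row in sorted(rows, key=lambda r: r["k"]):
        for m in measures:
            key = (row["algorithm"], m)
            if key not in best or row[m] > best[key][1]:
                best[key] = (row["k"], row[m])
    return best


# KNN rows from the table below (normalized, duplicates).
knn_rows = [
    {"algorithm": "KNN", "k": 14, "P@n": 0.70000, "AP": 0.86541,
     "Max-F1": 0.82353, "ROC AUC": 0.99493},
    {"algorithm": "KNN", "k": 26, "P@n": 0.80000, "AP": 0.88013,
     "Max-F1": 0.80000, "ROC AUC": 0.99595},
    {"algorithm": "KNN", "k": 53, "P@n": 0.80000, "AP": 0.88306,
     "Max-F1": 0.80000, "ROC AUC": 0.99617},
]
best = best_k_per_measure(knn_rows, ["P@n", "AP", "Max-F1", "ROC AUC"])
```

With these rows, P@n reaches 0.8 at both k = 26 and k = 53, so k = 26 is reported, while the best maximum F1 is found at k = 14 and the best AP and ROC AUC at k = 53. This would explain why each of the three rows appears in the table.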

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. Max-F1  ROC AUC
KNN            14   0.70000  0.69324   0.86541  0.86238  0.82353  0.81955      0.99493
KNN            26   0.80000  0.79550   0.88013  0.87743  0.80000  0.79550      0.99595
KNN            53   0.80000  0.79550   0.88306  0.88042  0.80000  0.79550      0.99617
KNNW           1    0.70000  0.69324   0.83131  0.82751  0.77778  0.77277      0.99291
KNNW           50   0.70000  0.69324   0.87286  0.87000  0.77778  0.77277      0.99572
LOF            85   0.50000  0.48874   0.47312  0.46125  0.69565  0.68880      0.98716
LOF            96   0.60000  0.59099   0.65579  0.64804  0.66667  0.65916      0.99032
LOF            98   0.60000  0.59099   0.67139  0.66399  0.66667  0.65916      0.99077
SimplifiedLOF  91   0.40000  0.38649   0.40018  0.38667  0.64286  0.63481      0.98176
SimplifiedLOF  96   0.50000  0.48874   0.41455  0.40137  0.62069  0.61215      0.98266
SimplifiedLOF  100  0.50000  0.48874   0.43351  0.42076  0.64286  0.63481      0.98423
LoOP           90   0.40000  0.38649   0.29428  0.27838  0.46667  0.45465      0.97005
LoOP           98   0.40000  0.38649   0.33552  0.32056  0.53846  0.52807      0.97523
LoOP           100  0.40000  0.38649   0.33969  0.32482  0.53846  0.52807      0.97590
LDOF           85   0.20000  0.18198   0.19679  0.17870  0.34043  0.32557      0.94459
LDOF           100  0.20000  0.18198   0.25327  0.23646  0.43243  0.41965      0.96104
ODIN           96   0.35714  0.34266   0.30839  0.29282  0.47059  0.45866      0.97399
ODIN           100  0.35714  0.34266   0.31173  0.29622  0.47059  0.45866      0.97432
FastABOD       28   0.70000  0.69324   0.81976  0.81570  0.75000  0.74437      0.99302
FastABOD       85   0.70000  0.69324   0.83786  0.83421  0.75000  0.74437      0.99414
FastABOD       93   0.70000  0.69324   0.83879  0.83516  0.75000  0.74437      0.99414
KDEOS          2    0.00000  -0.02252  0.11154  0.09153  0.22785  0.21046      0.83795
KDEOS          8    0.30000  0.28423   0.13886  0.11946  0.33333  0.31832      0.73986
KDEOS          9    0.30000  0.28423   0.15938  0.14044  0.30000  0.28423      0.74910
LDF            96   0.70000  0.69324   0.73686  0.73094  0.73684  0.73092      0.99324
LDF            98   0.70000  0.69324   0.77020  0.76502  0.73684  0.73092      0.99347
INFLO          88   0.50000  0.48874   0.43963  0.42701  0.62069  0.61215      0.98491
INFLO          96   0.50000  0.48874   0.46907  0.45711  0.62069  0.61215      0.98604
INFLO          100  0.50000  0.48874   0.47937  0.46764  0.64286  0.63481      0.98604
COF            86   0.20000  0.18198   0.27303  0.25666  0.48649  0.47492      0.97005
COF            87   0.20000  0.18198   0.28632  0.27025  0.48276  0.47111      0.97140
COF            89   0.30000  0.28423   0.30157  0.28584  0.47368  0.46183      0.97140

Plots

Precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity (plots not reproduced here).
Key to methods: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 9 attributes and 223 objects, of which 10 are outliers (4.48%).

Download raw algorithm results (1.6 MB). Download raw algorithm evaluation table (33.1 kB).

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The maximum F1-measure is provided as a complement to the measures in the original publication.
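The adjusted columns in the table below appear to follow the usual adjustment for chance, (value - expected) / (1 - expected), with the expected value taken as the outlier rate (here 10 / 223). The following sketch reproduces several tabulated values under that assumption. The function name `adjust_for_chance` is hypothetical, and applying the same expected value to P@n, AP and maximum F1 is inferred from the numbers rather than stated on this page.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Adjust a measure for chance, assuming its expected value under a
    random ranking equals the outlier rate n_outliers / n_objects."""
    expected = n_outliers / n_objects
    return (value - expected) / (1.0 - expected)


# Values from the table below (223 objects, 10 outliers).
adj_p_at_n = round(adjust_for_chance(0.90000, 10, 223), 5)   # P@n of KNN, k = 8
adj_max_f1 = round(adjust_for_chance(0.94737, 10, 223), 5)   # Max-F1 of KNN, k = 9
adj_zero = round(adjust_for_chance(0.00000, 10, 223), 5)     # P@n of KDEOS, k = 13
print(adj_p_at_n, adj_max_f1, adj_zero)
```

A measure of 0 maps to a negative adjusted value, which is why the KDEOS rows with P@n = 0 show a negative adjusted P@n. ROC AUC has no adjusted column in these tables.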

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. Max-F1  ROC AUC
KNN            8    0.90000  0.89531   0.92030  0.91656  0.90000  0.89531      0.98944
KNN            9    0.90000  0.89531   0.93030  0.92703  0.94737  0.94490      0.98967
KNN            21   0.90000  0.89531   0.93125  0.92802  0.94737  0.94490      0.98967
KNNW           9    0.90000  0.89531   0.92125  0.91755  0.90000  0.89531      0.98920
KNNW           12   0.90000  0.89531   0.93226  0.92908  0.94737  0.94490      0.99014
LOF            54   0.90000  0.89531   0.90389  0.89938  0.90000  0.89531      0.98498
LOF            56   0.90000  0.89531   0.92500  0.92148  0.94737  0.94490      0.98592
LOF            99   0.90000  0.89531   0.93125  0.92802  0.94737  0.94490      0.98967
SimplifiedLOF  80   0.90000  0.89531   0.91439  0.91037  0.90000  0.89531      0.98498
SimplifiedLOF  82   0.90000  0.89531   0.92500  0.92148  0.94737  0.94490      0.98592
SimplifiedLOF  97   0.90000  0.89531   0.92703  0.92360  0.94737  0.94490      0.98732
LoOP           92   0.50000  0.47653   0.72501  0.71210  0.75000  0.73826      0.97418
LoOP           93   0.60000  0.58122   0.73105  0.71842  0.75000  0.73826      0.97512
LoOP           98   0.60000  0.58122   0.74059  0.72841  0.75000  0.73826      0.97606
LDOF           65   0.50000  0.47653   0.37618  0.34690  0.50000  0.47653      0.94930
LDOF           95   0.40000  0.37183   0.59217  0.57302  0.66667  0.65102      0.96854
LDOF           97   0.40000  0.37183   0.60061  0.58186  0.66667  0.65102      0.96948
ODIN           96   0.43333  0.40673   0.40987  0.38216  0.61538  0.59733      0.95657
ODIN           99   0.50000  0.47653   0.41321  0.38566  0.61538  0.59733      0.95798
ODIN           100  0.50000  0.47653   0.41321  0.38566  0.61538  0.59733      0.95822
FastABOD       6    0.90000  0.89531   0.93000  0.92671  0.90000  0.89531      0.99249
FastABOD       7    0.90000  0.89531   0.93167  0.92846  0.90000  0.89531      0.99296
FastABOD       42   0.90000  0.89531   0.93704  0.93408  0.94737  0.94490      0.99202
FastABOD       67   0.90000  0.89531   0.94000  0.93718  0.94737  0.94490      0.99296
KDEOS          9    0.10000  0.05775   0.06399  0.02005  0.12389  0.08276      0.57606
KDEOS          12   0.10000  0.05775   0.06879  0.02507  0.13462  0.09399      0.61502
KDEOS          13   0.00000  -0.04695  0.06522  0.02133  0.12857  0.08766      0.61784
KDEOS          100  0.00000  -0.04695  0.05386  0.00944  0.16529  0.12610      0.57418
LDF            31   0.90000  0.89531   0.90919  0.90493  0.90000  0.89531      0.98826
LDF            36   0.90000  0.89531   0.92941  0.92610  0.94737  0.94490      0.98873
LDF            91   0.90000  0.89531   0.93333  0.93020  0.94737  0.94490      0.99061
INFLO          83   0.90000  0.89531   0.89270  0.88767  0.90000  0.89531      0.98545
INFLO          84   0.90000  0.89531   0.92564  0.92215  0.94737  0.94490      0.98638
INFLO          97   0.90000  0.89531   0.92703  0.92360  0.94737  0.94490      0.98732
COF            51   0.70000  0.68592   0.67717  0.66202  0.81818  0.80965      0.97981
COF            53   0.60000  0.58122   0.83086  0.82292  0.75000  0.73826      0.98732
COF            70   0.80000  0.79061   0.82106  0.81266  0.80000  0.79061      0.98122
COF            93   0.70000  0.68592   0.84299  0.83562  0.78261  0.77240      0.98028

Plots

Precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity (plots not reproduced here).
Key to methods: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, duplicates

This version contains 9 attributes and 454 objects, of which 10 are outliers (2.20%).

Download raw algorithm results (1.9 MB). Download raw algorithm evaluation table (40.5 kB).

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for several values of k, only the smallest k is given).
The maximum F1-measure is provided as a complement to the measures in the original publication.

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. Max-F1  ROC AUC
KNN            11   0.70000  0.69324   0.89013  0.88766  0.81818  0.81409      0.99640
KNN            14   0.70000  0.69324   0.87930  0.87658  0.82353  0.81955      0.99595
KNN            26   0.80000  0.79550   0.88306  0.88042  0.80000  0.79550      0.99617
KNNW           1    0.70000  0.69324   0.81944  0.81538  0.75000  0.74437      0.99268
KNNW           2    0.70000  0.69324   0.81488  0.81071  0.77778  0.77277      0.99234
KNNW           50   0.70000  0.69324   0.87578  0.87299  0.77778  0.77277      0.99595
LOF            96   0.70000  0.69324   0.78850  0.78373  0.73684  0.73092      0.99257
LOF            98   0.70000  0.69324   0.79921  0.79469  0.73684  0.73092      0.99279
SimplifiedLOF  82   0.50000  0.48874   0.36074  0.34635  0.56000  0.55009      0.97635
SimplifiedLOF  100  0.50000  0.48874   0.49963  0.48836  0.66667  0.65916      0.98649
LoOP           92   0.40000  0.38649   0.31892  0.30359  0.50000  0.48874      0.97275
LoOP           98   0.40000  0.38649   0.34960  0.33495  0.53846  0.52807      0.97635
LoOP           100  0.40000  0.38649   0.35037  0.33574  0.53846  0.52807      0.97635
LDOF           86   0.20000  0.18198   0.19025  0.17201  0.32000  0.30468      0.94369
LDOF           100  0.20000  0.18198   0.25748  0.24075  0.43243  0.41965      0.96171
ODIN           96   0.35714  0.34266   0.30839  0.29282  0.47059  0.45866      0.97399
ODIN           100  0.35714  0.34266   0.31173  0.29622  0.47059  0.45866      0.97432
FastABOD       28   0.80000  0.79550   0.85277  0.84945  0.80000  0.79550      0.99437
FastABOD       30   0.80000  0.79550   0.85540  0.85214  0.80000  0.79550      0.99459
KDEOS          2    0.00000  -0.02252  0.09218  0.07173  0.19355  0.17539      0.80518
LDF            96   0.80000  0.79550   0.88762  0.88509  0.82353  0.81955      0.99572
INFLO          82   0.50000  0.48874   0.39310  0.37943  0.58333  0.57395      0.98063
INFLO          100  0.50000  0.48874   0.61068  0.60191  0.66667  0.65916      0.98851
COF            86   0.30000  0.28423   0.33786  0.32294  0.58065  0.57120      0.97703
COF            87   0.20000  0.18198   0.33375  0.31875  0.59259  0.58342      0.97680
COF            89   0.50000  0.48874   0.36477  0.35046  0.57143  0.56178      0.97568
COF            92   0.30000  0.28423   0.45531  0.44305  0.51282  0.50185      0.97477

Plots

Precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity (plots not reproduced here).
Key to methods: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO