Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WBC (version#02)

This data set consists of breast tumor examples, each labeled benign or malignant. Benign examples are considered inliers, and malignant examples are considered outliers. After downsampling the outliers, following Schubert et al. [1], 10 outliers remain. 234 instances are duplicates (231 inliers and 3 outliers). Because those 3 duplicate outliers are already gone in the data set without duplicates, downsampling removed 229 outliers from the data set with duplicates but only 226 from the data set without duplicates. Furthermore, we removed 16 instances with missing values: 14 inliers and 2 outliers. The processed data set with duplicates has 9 numeric attributes and 454 instances, namely 10 outliers (2.2%) and 444 inliers (97.8%). The same pre-processing has also been applied in [2] and [3].
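The pre-processing steps above can be sketched as follows. This is a hypothetical reconstruction, not the authors' code. It assumes the original file's layout (comma-separated: sample ID, nine attributes, class code 2 = benign or 4 = malignant, with '?' marking a missing value). It also assumes that duplicates are detected on the nine attribute values alone and that downsampling is uniform random sampling with an arbitrary seed. Under these assumptions it will not reproduce the exact 10 outliers selected for the published variants. Duplicates are removed before downsampling, which matches the 229 versus 226 removed outliers stated above.

```python
import csv
import random

# Class codes as assumed for breast-cancer-wisconsin.data (2 = benign, 4 = malignant).
MALIGNANT = "4"

def parse_rows(lines):
    """Yield (attributes, is_outlier) per record, skipping records with a missing value."""
    for row in csv.reader(lines):
        if not row:
            continue  # tolerate blank lines, e.g. a trailing newline
        attrs, label = row[1:10], row[10]  # column 0 is the sample ID
        if "?" in attrs:
            continue  # instance with a missing value is removed
        yield tuple(int(a) for a in attrs), label == MALIGNANT

def deduplicate(records):
    """Keep the first occurrence of each attribute vector (assumption: label ignored)."""
    seen, unique = set(), []
    for attrs, is_outlier in records:
        if attrs not in seen:
            seen.add(attrs)
            unique.append((attrs, is_outlier))
    return unique

def downsample_outliers(records, n_outliers=10, seed=0):
    """Keep every inlier plus a uniform random sample of n_outliers outliers."""
    inliers = [r for r in records if not r[1]]
    outliers = [r for r in records if r[1]]
    rng = random.Random(seed)  # hypothetical seed; the original selection is not known
    return inliers + rng.sample(outliers, n_outliers)

def preprocess(lines, keep_duplicates, n_outliers=10, seed=0):
    """Missing-value removal, optional deduplication, then outlier downsampling."""
    records = list(parse_rows(lines))
    if not keep_duplicates:
        records = deduplicate(records)
    return downsample_outliers(records, n_outliers, seed)
```

Applied to the original file (for example `preprocess(f, keep_duplicates=True)` on an open file `f`), this should give 444 inliers and 10 outliers. The variant without duplicates will match the published 223 objects only if the duplicate definition assumed here matches the one actually used.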

References:

[1] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.
[2] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428-436, 2013.
[3] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.

Download all data set variants used (57.1 kB). You can also access the original data (breast-cancer-wisconsin.data).

Normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)
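This page does not state which normalization produced the "Normalized" variants. For illustration only, the sketch below assumes per-attribute min-max scaling to [0, 1], a common choice. The procedure actually used for these variants may differ. `min_max_normalize` is a hypothetical helper.

```python
def min_max_normalize(rows):
    """Linearly rescale each attribute (column) to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]
```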

Download raw algorithm results (1.6 MB). Download raw algorithm evaluation table (32.9 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is provided as a complementary measure in addition to the measures in the original publication.
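The selection rule stated above can be sketched as a small helper. It is hypothetical and not part of the published tooling, and the row format and measure names are assumptions for illustration. For each measure, it picks the k with the highest score and breaks ties in favour of the smallest k. Each k that wins at least one measure then gets its own row. This reading is consistent with the two KNN rows in the table below: k = 2 is best for P@n, AP, and Max-F1, while k = 4 is best for ROC AUC.

```python
def best_k_values(rows, measures):
    """Return the sorted k values that achieve the best score for at least one measure.

    rows: one dict per parameter setting, e.g. {"k": 2, "AP": 0.89812, "ROC AUC": 0.99085}.
    Ties are resolved in favour of the smallest k.
    """
    chosen = set()
    for measure in measures:
        # Highest score first; among equal scores, smallest k first.
        best = min(rows, key=lambda row: (-row[measure], row["k"]))
        chosen.add(best["k"])
    return sorted(chosen)
```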

Algorithm k P@n Adj.P@n AP Adj.AP Max-F1 Adj.Max-F1 ROC-AUC
KNN 2 0.80000 0.79061 0.89812 0.89334 0.84211 0.83469 0.99085
KNN 4 0.80000 0.79061 0.89107 0.88596 0.84211 0.83469 0.99202
KNNW 2 0.80000 0.79061 0.85937 0.85277 0.82353 0.81524 0.98545
KNNW 3 0.80000 0.79061 0.87235 0.86636 0.84211 0.83469 0.98685
KNNW 6 0.80000 0.79061 0.88170 0.87615 0.82353 0.81524 0.99014
KNNW 7 0.80000 0.79061 0.88728 0.88199 0.84211 0.83469 0.99014
LOF 49 0.80000 0.79061 0.80751 0.79848 0.80000 0.79061 0.98545
LOF 58 0.80000 0.79061 0.87626 0.87045 0.84211 0.83469 0.98779
LOF 66 0.80000 0.79061 0.87735 0.87159 0.84211 0.83469 0.98779
SimplifiedLOF 65 0.80000 0.79061 0.80442 0.79524 0.80000 0.79061 0.98310
SimplifiedLOF 75 0.70000 0.68592 0.85013 0.84309 0.82353 0.81524 0.98545
SimplifiedLOF 84 0.70000 0.68592 0.85152 0.84455 0.82353 0.81524 0.98592
LoOP 99 0.70000 0.68592 0.76069 0.74945 0.72727 0.71447 0.98075
LoOP 100 0.70000 0.68592 0.76846 0.75759 0.73684 0.72449 0.98122
LDOF 64 0.50000 0.47653 0.38156 0.35252 0.56000 0.53934 0.94366
LDOF 88 0.50000 0.47653 0.58546 0.56599 0.66667 0.65102 0.96573
LDOF 92 0.50000 0.47653 0.60372 0.58511 0.66667 0.65102 0.96573
LDOF 98 0.50000 0.47653 0.60212 0.58344 0.66667 0.65102 0.96667
ODIN 96 0.45000 0.42418 0.46991 0.44502 0.57143 0.55131 0.96244
ODIN 97 0.50000 0.47653 0.48824 0.46421 0.57143 0.55131 0.96385
ODIN 99 0.50000 0.47653 0.49154 0.46766 0.57143 0.55131 0.96479
FastABOD 5 0.70000 0.68592 0.53807 0.51638 0.70000 0.68592 0.96995
FastABOD 7 0.70000 0.68592 0.82509 0.81688 0.82353 0.81524 0.97981
FastABOD 86 0.70000 0.68592 0.84113 0.83367 0.82353 0.81524 0.98310
KDEOS 10 0.00000 -0.04695 0.07778 0.03449 0.18750 0.14935 0.65117
KDEOS 12 0.10000 0.05775 0.07735 0.03403 0.14925 0.10931 0.67277
KDEOS 14 0.00000 -0.04695 0.08060 0.03743 0.15842 0.11890 0.68967
KDEOS 17 0.00000 -0.04695 0.07664 0.03329 0.17978 0.14127 0.69061
LDF 25 0.70000 0.68592 0.61036 0.59206 0.76190 0.75073 0.97746
LDF 30 0.70000 0.68592 0.84654 0.83933 0.82353 0.81524 0.98545
LDF 39 0.70000 0.68592 0.86014 0.85358 0.82353 0.81524 0.98779
INFLO 65 0.80000 0.79061 0.81079 0.80191 0.80000 0.79061 0.98216
INFLO 86 0.80000 0.79061 0.86737 0.86114 0.82353 0.81524 0.98732
INFLO 98 0.80000 0.79061 0.88132 0.87575 0.88889 0.88367 0.98685
COF 72 0.80000 0.79061 0.85879 0.85216 0.80000 0.79061 0.99014
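The adjusted columns can be checked against the unadjusted ones. This sketch assumes the adjustment for chance from the main paper, (Index - E) / (1 - E), with expected value E = |O| / N for P@n and AP. With N = 223 and |O| = 10, it reproduces the Adj. P@n values of this table, such as 0.8 mapping to 0.79061 and 0 mapping to -0.04695. It also reproduces Adj. AP up to rounding in the last digit, such as 0.89812 mapping to 0.89334. `adjust_for_chance` is an illustrative name.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that a random ranking scores 0 and a perfect one scores 1.

    Assumes the expected value of a random ranking is |O| / N (as for P@n and AP).
    """
    expected = n_outliers / n_objects
    return (value - expected) / (1.0 - expected)
```

With N = 454 (the variants with duplicates), the same function gives 0.79550 for P@n = 0.8 and -0.02252 for P@n = 0, matching the tables for those variants.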

Plots

Plot panels: precision at n; adjusted precision at n; average precision; adjusted average precision; maximum F1 score; adjusted maximum F1 score; ROC AUC; diversity (A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO).

Normalized, duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)
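The object counts and outlier percentages of the two variants follow from the pre-processing numbers given at the top of this page. The variant with duplicates has 444 inliers plus 10 outliers. The variant without duplicates has 444 - 231 = 213 inliers plus 10 outliers. A quick arithmetic check:

```python
inliers_with_dups = 444   # inliers after removing instances with missing values
duplicate_inliers = 231   # inlier duplicates removed in the "without duplicates" variants
n_outliers = 10           # outliers kept after downsampling

with_dups = inliers_with_dups + n_outliers                          # 454 objects
without_dups = inliers_with_dups - duplicate_inliers + n_outliers   # 223 objects

for n in (with_dups, without_dups):
    print(f"{n} objects, {n_outliers} outliers ({100 * n_outliers / n:.2f}%)")
```

This prints "454 objects, 10 outliers (2.20%)" and "223 objects, 10 outliers (4.48%)".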

Download raw algorithm results (2.1 MB). Download raw algorithm evaluation table (42.9 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is provided as a complementary measure in addition to the measures in the original publication.

Algorithm k P@n Adj.P@n AP Adj.AP Max-F1 Adj.Max-F1 ROC-AUC
KNN 35 0.80000 0.79550 0.86778 0.86480 0.80000 0.79550 0.99550
KNN 87 0.80000 0.79550 0.87762 0.87486 0.80000 0.79550 0.99617
KNNW 23 0.70000 0.69324 0.78368 0.77881 0.70588 0.69926 0.99077
KNNW 58 0.70000 0.69324 0.85221 0.84888 0.76190 0.75654 0.99459
KNNW 89 0.70000 0.69324 0.86051 0.85736 0.76190 0.75654 0.99527
LOF 98 0.60000 0.59099 0.68108 0.67389 0.64286 0.63481 0.98874
LOF 100 0.60000 0.59099 0.73363 0.72763 0.66667 0.65916 0.99032
SimplifiedLOF 85 0.40000 0.38649 0.27195 0.25555 0.40000 0.38649 0.96464
SimplifiedLOF 93 0.40000 0.38649 0.32865 0.31352 0.50000 0.48874 0.97230
SimplifiedLOF 100 0.40000 0.38649 0.36537 0.35108 0.50000 0.48874 0.97793
LoOP 85 0.30000 0.28423 0.20513 0.18723 0.34146 0.32663 0.94865
LoOP 97 0.30000 0.28423 0.24084 0.22375 0.40000 0.38649 0.95788
LoOP 100 0.30000 0.28423 0.24828 0.23135 0.36364 0.34930 0.95968
LDOF 98 0.20000 0.18198 0.16069 0.14178 0.26667 0.25015 0.92815
LDOF 100 0.10000 0.07973 0.17082 0.15215 0.29630 0.28045 0.93288
ODIN 77 0.30000 0.28423 0.17263 0.15400 0.30000 0.28423 0.93615
ODIN 98 0.30000 0.28423 0.24467 0.22766 0.40816 0.39483 0.96149
FastABOD 28 0.50000 0.48874 0.70705 0.70045 0.66667 0.65916 0.98581
FastABOD 31 0.60000 0.59099 0.71155 0.70506 0.66667 0.65916 0.98581
FastABOD 87 0.60000 0.59099 0.73926 0.73339 0.66667 0.65916 0.98761
KDEOS 2 0.00000 -0.02252 0.04914 0.02772 0.14458 0.12531 0.66948
KDEOS 3 0.00000 -0.02252 0.05769 0.03646 0.16667 0.14790 0.77297
KDEOS 20 0.00000 -0.02252 0.05862 0.03742 0.15789 0.13893 0.69054
LDF 93 0.70000 0.69324 0.73823 0.73233 0.70588 0.69926 0.99167
LDF 100 0.70000 0.69324 0.79967 0.79516 0.72000 0.71369 0.99324
INFLO 75 0.40000 0.38649 0.29588 0.28002 0.43750 0.42483 0.96779
INFLO 94 0.40000 0.38649 0.43612 0.42342 0.54545 0.53522 0.98131
INFLO 100 0.40000 0.38649 0.51602 0.50512 0.54054 0.53019 0.98243
COF 90 0.70000 0.69324 0.61166 0.60291 0.76190 0.75654 0.99144
COF 97 0.70000 0.69324 0.78055 0.77561 0.74074 0.73490 0.99392
COF 100 0.50000 0.48874 0.65981 0.65215 0.80000 0.79550 0.99167
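The unadjusted measures in these tables can be computed from a ranking of outlier scores. The sketch below uses n = |O| for P@n, computes AP as the mean precision at the rank of each outlier, and computes ROC AUC as the fraction of (outlier, inlier) pairs in which the outlier ranks higher. Max-F1 is the largest F1 score over all cut-off ranks. It is a minimal illustration, not the evaluation code used for these results. Ties are broken by input order here, which may differ from how tied scores were handled for the published numbers. `evaluate_ranking` is a hypothetical name.

```python
def evaluate_ranking(scores, labels):
    """Return (P@n, AP, ROC AUC, Max-F1); higher score = more outlying, label True = outlier."""
    # Rank by decreasing score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [labels[i] for i in order]
    n_out = sum(ranked)
    n_in = len(ranked) - n_out

    hits, ap_sum, max_f1 = 0, 0.0, 0.0
    inliers_above, correct_pairs = 0, 0
    for rank, is_out in enumerate(ranked, start=1):
        if is_out:
            hits += 1
            ap_sum += hits / rank                     # precision at this outlier's rank
            correct_pairs += n_in - inliers_above     # inliers ranked below this outlier
        else:
            inliers_above += 1
        if hits:
            precision, recall = hits / rank, hits / n_out
            max_f1 = max(max_f1, 2 * precision * recall / (precision + recall))

    p_at_n = sum(ranked[:n_out]) / n_out              # n = number of outliers
    return p_at_n, ap_sum / n_out, correct_pairs / (n_out * n_in), max_f1
```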

Plots

Plot panels: precision at n; adjusted precision at n; average precision; adjusted average precision; maximum F1 score; adjusted maximum F1 score; ROC AUC; diversity (A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO).

Not normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)

Download raw algorithm results (1.6 MB). Download raw algorithm evaluation table (32.9 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is provided as a complementary measure in addition to the measures in the original publication.

Algorithm k P@n Adj.P@n AP Adj.AP Max-F1 Adj.Max-F1 ROC-AUC
KNN 2 0.80000 0.79061 0.89812 0.89334 0.84211 0.83469 0.99108
KNN 4 0.80000 0.79061 0.89107 0.88596 0.84211 0.83469 0.99202
KNNW 2 0.80000 0.79061 0.85937 0.85277 0.82353 0.81524 0.98545
KNNW 3 0.80000 0.79061 0.87235 0.86636 0.84211 0.83469 0.98685
KNNW 6 0.80000 0.79061 0.88170 0.87615 0.82353 0.81524 0.99014
KNNW 7 0.80000 0.79061 0.88728 0.88199 0.84211 0.83469 0.99014
LOF 50 0.80000 0.79061 0.79815 0.78867 0.80000 0.79061 0.98357
LOF 65 0.80000 0.79061 0.87175 0.86572 0.84211 0.83469 0.98685
LOF 66 0.80000 0.79061 0.87735 0.87159 0.84211 0.83469 0.98779
LOF 86 0.70000 0.68592 0.86127 0.85475 0.82353 0.81524 0.98826
SimplifiedLOF 65 0.80000 0.79061 0.80442 0.79524 0.80000 0.79061 0.98310
SimplifiedLOF 73 0.70000 0.68592 0.85713 0.85043 0.82353 0.81524 0.98592
SimplifiedLOF 75 0.70000 0.68592 0.85404 0.84718 0.82353 0.81524 0.98638
LoOP 99 0.70000 0.68592 0.75235 0.74073 0.72727 0.71447 0.98028
LoOP 100 0.70000 0.68592 0.76846 0.75759 0.73684 0.72449 0.98122
LDOF 64 0.50000 0.47653 0.38066 0.35158 0.56000 0.53934 0.94272
LDOF 92 0.50000 0.47653 0.60672 0.58826 0.69565 0.68136 0.96526
LDOF 98 0.50000 0.47653 0.60212 0.58344 0.66667 0.65102 0.96667
ODIN 96 0.46667 0.44163 0.47369 0.44898 0.57143 0.55131 0.96291
ODIN 97 0.50000 0.47653 0.48824 0.46421 0.57143 0.55131 0.96385
ODIN 100 0.50000 0.47653 0.49154 0.46766 0.57143 0.55131 0.96502
FastABOD 6 0.80000 0.79061 0.85033 0.84330 0.82353 0.81524 0.98263
FastABOD 36 0.70000 0.68592 0.85186 0.84490 0.82353 0.81524 0.98498
KDEOS 10 0.00000 -0.04695 0.08039 0.03721 0.19355 0.15569 0.65446
KDEOS 12 0.10000 0.05775 0.07831 0.03504 0.15152 0.11168 0.67653
KDEOS 14 0.00000 -0.04695 0.08318 0.04013 0.15842 0.11890 0.69108
LDF 25 0.70000 0.68592 0.61036 0.59206 0.76190 0.75073 0.97746
LDF 30 0.70000 0.68592 0.84654 0.83933 0.82353 0.81524 0.98545
LDF 39 0.70000 0.68592 0.86014 0.85358 0.82353 0.81524 0.98779
INFLO 65 0.80000 0.79061 0.81079 0.80191 0.80000 0.79061 0.98216
INFLO 82 0.80000 0.79061 0.86583 0.85953 0.82353 0.81524 0.98685
INFLO 95 0.80000 0.79061 0.87989 0.87426 0.88889 0.88367 0.98638
INFLO 98 0.80000 0.79061 0.88132 0.87575 0.88889 0.88367 0.98685
COF 72 0.80000 0.79061 0.89181 0.88673 0.82353 0.81524 0.99202

Plots

Plot panels: precision at n; adjusted precision at n; average precision; adjusted average precision; maximum F1 score; adjusted maximum F1 score; ROC AUC; diversity (A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO).

Not normalized, duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB). Download raw algorithm evaluation table (43.1 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is provided as a complementary measure in addition to the measures in the original publication.

Algorithm k P@n Adj.P@n AP Adj.AP Max-F1 Adj.Max-F1 ROC-AUC
KNN 15 0.70000 0.69324 0.87112 0.86822 0.82353 0.81955 0.99493
KNN 36 0.80000 0.79550 0.86778 0.86480 0.80000 0.79550 0.99550
KNN 68 0.80000 0.79550 0.87762 0.87486 0.80000 0.79550 0.99617
KNNW 22 0.70000 0.69324 0.76624 0.76097 0.70000 0.69324 0.99144
KNNW 52 0.70000 0.69324 0.84660 0.84314 0.76190 0.75654 0.99459
KNNW 79 0.70000 0.69324 0.86051 0.85736 0.76190 0.75654 0.99527
LOF 85 0.40000 0.38649 0.39742 0.38384 0.60000 0.59099 0.98243
LOF 92 0.60000 0.59099 0.62126 0.61273 0.60000 0.59099 0.98649
LOF 98 0.50000 0.48874 0.68268 0.67553 0.58824 0.57896 0.98761
SimplifiedLOF 89 0.40000 0.38649 0.26869 0.25222 0.40000 0.38649 0.96419
SimplifiedLOF 98 0.40000 0.38649 0.34312 0.32833 0.53846 0.52807 0.97568
LoOP 95 0.30000 0.28423 0.21955 0.20197 0.36364 0.34930 0.95113
LoOP 100 0.30000 0.28423 0.23468 0.21744 0.36364 0.34930 0.95405
LDOF 85 0.10000 0.07973 0.12424 0.10451 0.22222 0.20470 0.90450
LDOF 98 0.10000 0.07973 0.15943 0.14050 0.28571 0.26963 0.92928
LDOF 100 0.10000 0.07973 0.16005 0.14113 0.30769 0.29210 0.92860
ODIN 80 0.25000 0.23311 0.21871 0.20112 0.39216 0.37847 0.95608
ODIN 93 0.16000 0.14108 0.22927 0.21191 0.43478 0.42205 0.96273
ODIN 100 0.18000 0.16153 0.23377 0.21652 0.42553 0.41259 0.96453
FastABOD 28 0.60000 0.59099 0.68007 0.67286 0.60000 0.59099 0.98626
FastABOD 100 0.60000 0.59099 0.74162 0.73580 0.70588 0.69926 0.98874
KDEOS 2 0.00000 -0.02252 0.05389 0.03259 0.13333 0.11381 0.64088
KDEOS 3 0.00000 -0.02252 0.04340 0.02185 0.11392 0.09397 0.70946
LDF 92 0.60000 0.59099 0.80167 0.79720 0.75000 0.74437 0.99189
LDF 98 0.70000 0.69324 0.81997 0.81591 0.75000 0.74437 0.99302
INFLO 85 0.40000 0.38649 0.29115 0.27518 0.42105 0.40801 0.96847
INFLO 99 0.40000 0.38649 0.46450 0.45244 0.52941 0.51881 0.98198
INFLO 100 0.40000 0.38649 0.47343 0.46157 0.52941 0.51881 0.98176
COF 94 0.50000 0.48874 0.48438 0.47277 0.72000 0.71369 0.98604
COF 99 0.50000 0.48874 0.61839 0.60979 0.58824 0.57896 0.98536

Plots

Plot panels: precision at n; adjusted precision at n; average precision; adjusted average precision; maximum F1 score; adjusted maximum F1 score; ROC AUC; diversity (A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO).