Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WBC (version#09)

This data set consists of examples of different cancer types, benign or malignant. Examples of benign cancer are considered inliers, and examples of malignant cancer are considered outliers. After downsampling the outliers, following Schubert et al. [1], 10 outliers remain. 234 instances are duplicates (231 inliers and 3 outliers); consequently, 229 outliers were removed from the data set with duplicates and 226 outliers from the data set without duplicates. In addition, we removed 16 instances with missing values (14 inliers and 2 outliers). The processed data set with duplicates has 9 numeric attributes and 454 instances, namely 10 outliers (2.2%) and 444 inliers (97.8%); the version without duplicates has 223 instances, 10 of them outliers (4.48%). The same pre-processing has also been applied in [2] and [3].
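
The counts in the description can be cross-checked with a few lines of arithmetic. The following Python sketch uses only numbers stated in this section; the variable names are illustrative, and it assumes that the removed groups (missing values, downsampled outliers) do not overlap.

```python
# Bookkeeping for the WBC pre-processing counts stated in the description.
# Variable names are illustrative; the removed groups are assumed to be disjoint.

final_inliers = 444    # inliers in the processed data set (with duplicates)
final_outliers = 10    # outliers kept after downsampling

removed_outliers_with_dups = 229      # outliers dropped, version with duplicates
removed_outliers_without_dups = 226   # outliers dropped, version without duplicates
duplicate_inliers, duplicate_outliers = 231, 3
missing_inliers, missing_outliers = 14, 2

# Outliers available before downsampling (missing-value records excluded).
outliers_before = final_outliers + removed_outliers_with_dups

# Version with duplicates.
n_with_dups = final_inliers + final_outliers

# Version without duplicates: duplicates are dropped, then outliers downsampled.
inliers_without_dups = final_inliers - duplicate_inliers
outliers_without_dups = (outliers_before - duplicate_outliers
                         - removed_outliers_without_dups)
n_without_dups = inliers_without_dups + outliers_without_dups

# Instances implied before removing the 16 records with missing values.
n_before_missing = (final_inliers + outliers_before
                    + missing_inliers + missing_outliers)

print(n_with_dups, round(100 * final_outliers / n_with_dups, 2))
print(n_without_dups, round(100 * final_outliers / n_without_dups, 2))
print(n_before_missing)
```

The two sizes printed first agree with the 454 and 223 objects reported for the variants with and without duplicates.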

References:

[1] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.
[2] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428-436, 2013.
[3] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.

Download all data set variants used (57.1 kB). You can also access the original data (breast-cancer-wisconsin.data).

Normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)

Download raw algorithm results (1.7 MB)
Download raw algorithm evaluation table (38.0 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for more than one value of k, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. MF1  ROC AUC
KNN            5    0.60000  0.58122   0.70750  0.69377  0.72727  0.71447   0.96197
KNN            10   0.60000  0.58122   0.71800  0.70476  0.66667  0.65102   0.96432
KNN            13   0.70000  0.68592   0.72321  0.71021  0.70000  0.68592   0.96385
KNN            17   0.70000  0.68592   0.72440  0.71146  0.70000  0.68592   0.96338
KNNW           21   0.70000  0.68592   0.74461  0.73262  0.70000  0.68592   0.96432
KNNW           29   0.70000  0.68592   0.72828  0.71552  0.70000  0.68592   0.96479
LOF            80   0.60000  0.58122   0.67330  0.65797  0.60000  0.58122   0.95869
LOF            83   0.60000  0.58122   0.67065  0.65519  0.64000  0.62310   0.95915
LOF            88   0.60000  0.58122   0.68757  0.67290  0.62500  0.60739   0.95915
LOF            98   0.60000  0.58122   0.68246  0.66755  0.63636  0.61929   0.96009
SimplifiedLOF  73   0.50000  0.47653   0.50977  0.48676  0.51613  0.49341   0.94742
SimplifiedLOF  87   0.50000  0.47653   0.62097  0.60318  0.58824  0.56890   0.95305
SimplifiedLOF  88   0.50000  0.47653   0.63031  0.61295  0.62500  0.60739   0.95305
SimplifiedLOF  100  0.50000  0.47653   0.63050  0.61315  0.57143  0.55131   0.95305
LoOP           77   0.40000  0.37183   0.31133  0.27899  0.42105  0.39387   0.92864
LoOP           98   0.40000  0.37183   0.45138  0.42563  0.50000  0.47653   0.93897
LoOP           99   0.40000  0.37183   0.45261  0.42691  0.50000  0.47653   0.93944
LDOF           79   0.30000  0.26714   0.23674  0.20091  0.36000  0.32995   0.90798
LDOF           98   0.30000  0.26714   0.29854  0.26561  0.38710  0.35832   0.92394
LDOF           100  0.30000  0.26714   0.29417  0.26103  0.39024  0.36162   0.92254
ODIN           96   0.40000  0.37183   0.29057  0.25727  0.43243  0.40579   0.92817
ODIN           100  0.40000  0.37183   0.29834  0.26540  0.45714  0.43166   0.92911
FastABOD       5    0.60000  0.58122   0.41999  0.39276  0.60000  0.58122   0.94789
FastABOD       9    0.60000  0.58122   0.73256  0.72000  0.75000  0.73826   0.95775
FastABOD       23   0.60000  0.58122   0.73687  0.72451  0.75000  0.73826   0.96150
KDEOS          11   0.20000  0.16244   0.12154  0.08029  0.28571  0.25218   0.69061
LDF            60   0.70000  0.68592   0.71516  0.70178  0.70000  0.68592   0.96197
LDF            80   0.70000  0.68592   0.71576  0.70241  0.70000  0.68592   0.96338
INFLO          47   0.40000  0.37183   0.28237  0.24868  0.40000  0.37183   0.91549
INFLO          98   0.40000  0.37183   0.58829  0.56896  0.55172  0.53068   0.94836
COF            47   0.50000  0.47653   0.53078  0.50875  0.63636  0.61929   0.95587
COF            50   0.60000  0.58122   0.53814  0.51645  0.63636  0.61929   0.95164
COF            62   0.50000  0.47653   0.59370  0.57462  0.60870  0.59032   0.96197
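
The adjusted columns of the table are consistent with the usual adjustment for chance, (value - expected) / (1 - expected), where the expected value is taken to be the outlier rate (10/223 for this variant). The following Python sketch is inferred from the numbers in the table and is not a copy of the code used to produce them; the function name `adjust_for_chance` is hypothetical.

```python
def adjust_for_chance(value: float, n_outliers: int, n_objects: int) -> float:
    """Rescale a measure so that the chance level maps to 0 and a perfect score to 1."""
    expected = n_outliers / n_objects  # outlier rate, used as the chance level
    return (value - expected) / (1.0 - expected)


# First KNN row (k = 5) of the table: 223 objects, 10 outliers.
row = {"P@n": 0.60000, "AP": 0.70750, "Max-F1": 0.72727}
adjusted = {name: round(adjust_for_chance(value, 10, 223), 5)
            for name, value in row.items()}
print(adjusted)
```

For this row the sketch reproduces the tabulated 0.58122, 0.69377 and 0.71447. A P@n of 0 maps to a slightly negative adjusted value, since it is worse than the chance level.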

Plots

[Plots, one per item: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, diversity]
Method labels: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Normalized, with duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (2.0 MB)
Download raw algorithm evaluation table (41.0 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for more than one value of k, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. MF1  ROC AUC
KNN            1    0.60000  0.59099   0.69876  0.69198  0.66667  0.65916   0.98176
KNN            14   0.60000  0.59099   0.74615  0.74044  0.75000  0.74437   0.98468
KNN            20   0.60000  0.59099   0.75230  0.74672  0.75000  0.74437   0.98581
KNN            53   0.60000  0.59099   0.74843  0.74277  0.66667  0.65916   0.98739
KNNW           2    0.60000  0.59099   0.72092  0.71464  0.75000  0.74437   0.98176
KNNW           91   0.60000  0.59099   0.74540  0.73967  0.66667  0.65916   0.98671
LOF            68   0.50000  0.48874   0.33727  0.32234  0.56000  0.55009   0.96847
LOF            74   0.50000  0.48874   0.36975  0.35555  0.63636  0.62817   0.97207
LOF            94   0.50000  0.48874   0.52090  0.51011  0.60870  0.59988   0.97680
LOF            100  0.50000  0.48874   0.51263  0.50165  0.57143  0.56178   0.97703
SimplifiedLOF  88   0.40000  0.38649   0.31954  0.30422  0.60870  0.59988   0.96261
SimplifiedLOF  97   0.50000  0.48874   0.33837  0.32347  0.56000  0.55009   0.96667
SimplifiedLOF  98   0.50000  0.48874   0.34237  0.32756  0.56000  0.55009   0.96757
LoOP           93   0.30000  0.28423   0.27338  0.25702  0.48276  0.47111   0.96104
LoOP           95   0.30000  0.28423   0.28160  0.26541  0.51852  0.50767   0.96239
LoOP           100  0.30000  0.28423   0.29293  0.27701  0.50000  0.48874   0.96441
LDOF           100  0.20000  0.18198   0.20796  0.19012  0.36364  0.34930   0.94910
ODIN           98   0.44000  0.42739   0.30382  0.28814  0.47619  0.46439   0.95327
ODIN           100  0.44000  0.42739   0.30559  0.28995  0.47619  0.46439   0.95360
FastABOD       28   0.60000  0.59099   0.71448  0.70805  0.70588  0.69926   0.98311
FastABOD       66   0.60000  0.59099   0.71881  0.71248  0.70588  0.69926   0.98356
KDEOS          2    0.00000  -0.02252  0.08993  0.06943  0.18824  0.16995   0.76036
KDEOS          4    0.10000  0.07973   0.07113  0.05021  0.14085  0.12149   0.73176
KDEOS          9    0.00000  -0.02252  0.07534  0.05451  0.17073  0.15205   0.80518
LDF            61   0.60000  0.59099   0.39216  0.37847  0.60000  0.59099   0.97342
LDF            95   0.60000  0.59099   0.56868  0.55897  0.66667  0.65916   0.98086
LDF            98   0.60000  0.59099   0.61035  0.60157  0.63158  0.62328   0.98086
INFLO          85   0.50000  0.48874   0.33800  0.32309  0.56000  0.55009   0.96824
INFLO          88   0.50000  0.48874   0.34920  0.33454  0.58333  0.57395   0.96892
INFLO          100  0.50000  0.48874   0.39046  0.37673  0.56000  0.55009   0.97162
COF            85   0.30000  0.28423   0.31887  0.30353  0.48485  0.47325   0.97005
COF            90   0.20000  0.18198   0.43458  0.42184  0.55172  0.54163   0.97635
COF            99   0.30000  0.28423   0.35810  0.34364  0.55172  0.54163   0.97680
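
For readers who want to recompute the P@n and AP columns from a ranking, a minimal Python sketch of the two measures follows. The scores and labels are hypothetical toy data, not taken from the raw results, and ties in the scores are not handled; the function names are illustrative.

```python
def precision_at_n(labels_ranked, n):
    """Fraction of true outliers among the n top-ranked objects."""
    return sum(labels_ranked[:n]) / n


def average_precision(labels_ranked):
    """Mean of the precision at each rank where a true outlier appears."""
    hits, total = 0, 0.0
    for rank, is_outlier in enumerate(labels_ranked, start=1):
        if is_outlier:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data: higher score = more outlying; 1 marks a true outlier.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 0]

# Sort labels by decreasing score to obtain the outlier ranking.
ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda t: -t[0])]

n_outliers = sum(labels)
print(precision_at_n(ranked, n_outliers))  # P@n with n = number of outliers
print(average_precision(ranked))
```

In the toy ranking the outliers sit at ranks 1, 3 and 6, so P@n is 2/3 and AP is (1/1 + 2/3 + 3/6) / 3.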

Plots

[Plots, one per item: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, diversity]
Method labels: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)

Download raw algorithm results (1.6 MB)
Download raw algorithm evaluation table (39.0 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for more than one value of k, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. MF1  ROC AUC
KNN            3    0.60000  0.58122   0.68209  0.66717  0.70588  0.69207   0.95986
KNN            17   0.60000  0.58122   0.72807  0.71530  0.69565  0.68136   0.96690
KNN            23   0.70000  0.68592   0.72745  0.71466  0.70000  0.68592   0.96690
KNNW           2    0.60000  0.58122   0.63123  0.61392  0.60000  0.58122   0.96197
KNNW           7    0.60000  0.58122   0.71293  0.69946  0.70588  0.69207   0.96244
KNNW           10   0.60000  0.58122   0.73943  0.72719  0.70588  0.69207   0.96479
KNNW           33   0.60000  0.58122   0.72367  0.71070  0.66667  0.65102   0.96573
LOF            64   0.50000  0.47653   0.67132  0.65588  0.66667  0.65102   0.95540
LOF            70   0.60000  0.58122   0.70076  0.68671  0.66667  0.65102   0.96009
LOF            87   0.60000  0.58122   0.71456  0.70116  0.66667  0.65102   0.96197
LOF            99   0.60000  0.58122   0.70709  0.69334  0.66667  0.65102   0.96338
SimplifiedLOF  55   0.50000  0.47653   0.41202  0.38441  0.50000  0.47653   0.93803
SimplifiedLOF  99   0.50000  0.47653   0.65550  0.63932  0.62500  0.60739   0.95587
LoOP           94   0.50000  0.47653   0.44100  0.41476  0.50000  0.47653   0.94178
LoOP           100  0.50000  0.47653   0.48785  0.46380  0.50000  0.47653   0.94554
LDOF           72   0.30000  0.26714   0.24900  0.21374  0.37209  0.34261   0.91268
LDOF           89   0.30000  0.26714   0.28508  0.25152  0.43478  0.40825   0.92254
LDOF           100  0.30000  0.26714   0.36391  0.33405  0.43478  0.40825   0.92817
ODIN           98   0.23333  0.19734   0.27546  0.24144  0.44444  0.41836   0.92746
ODIN           99   0.30000  0.26714   0.28328  0.24963  0.44444  0.41836   0.92793
ODIN           100  0.30000  0.26714   0.28419  0.25058  0.44444  0.41836   0.92793
FastABOD       5    0.60000  0.58122   0.53437  0.51251  0.60870  0.59032   0.95493
FastABOD       7    0.60000  0.58122   0.71513  0.70175  0.70588  0.69207   0.96056
FastABOD       10   0.60000  0.58122   0.71988  0.70673  0.70588  0.69207   0.96244
FastABOD       68   0.60000  0.58122   0.70210  0.68811  0.66667  0.65102   0.96385
KDEOS          11   0.20000  0.16244   0.12767  0.08672  0.24000  0.20432   0.65775
KDEOS          12   0.20000  0.16244   0.13862  0.09818  0.26087  0.22617   0.63380
KDEOS          13   0.20000  0.16244   0.17504  0.13631  0.20000  0.16244   0.62582
LDF            34   0.60000  0.58122   0.65002  0.63359  0.61538  0.59733   0.96197
LDF            52   0.60000  0.58122   0.71990  0.70675  0.70588  0.69207   0.96526
LDF            71   0.60000  0.58122   0.71516  0.70178  0.66667  0.65102   0.96667
INFLO          64   0.50000  0.47653   0.44313  0.41699  0.50000  0.47653   0.93803
INFLO          94   0.50000  0.47653   0.61189  0.59367  0.57143  0.55131   0.95070
INFLO          95   0.50000  0.47653   0.61475  0.59666  0.57143  0.55131   0.95117
INFLO          99   0.40000  0.37183   0.61169  0.59345  0.57143  0.55131   0.95164
COF            60   0.60000  0.58122   0.47827  0.45378  0.63158  0.61428   0.93099
COF            63   0.40000  0.37183   0.52546  0.50318  0.60000  0.58122   0.96620
COF            67   0.60000  0.58122   0.52857  0.50644  0.66667  0.65102   0.87700
COF            69   0.60000  0.58122   0.56941  0.54920  0.63158  0.61428   0.86291
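
The Max-F1 column reports the best F1 score over all cutoffs of a ranking. A minimal Python sketch of that idea follows, assuming a ranking without ties; the function name `max_f1` and the toy ranking are illustrative and not taken from the raw results.

```python
def max_f1(labels_ranked):
    """Best F1 score over all cutoffs of a ranking (1 marks a true outlier)."""
    n_outliers = sum(labels_ranked)
    best, hits = 0.0, 0
    for cutoff, is_outlier in enumerate(labels_ranked, start=1):
        hits += is_outlier
        if hits:
            precision = hits / cutoff    # outliers among the top `cutoff` objects
            recall = hits / n_outliers   # share of all outliers found so far
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy ranking, most outlying object first.
ranked = [1, 0, 1, 0, 0, 1, 0, 0]
print(round(max_f1(ranked), 5))
```

In the toy ranking the maximum is reached at cutoff 3, where precision and recall are both 2/3.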

Plots

[Plots, one per item: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, diversity]
Method labels: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, with duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB)
Download raw algorithm evaluation table (41.1 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for more than one value of k, only the smallest k is given).
The Maximum F1-Measure is provided in addition to the measures in the original publication.

Algorithm      k    P@n      Adj. P@n  AP       Adj. AP  Max-F1   Adj. MF1  ROC AUC
KNN            1    0.60000  0.59099   0.68280  0.67565  0.66667  0.65916   0.98198
KNN            14   0.60000  0.59099   0.75498  0.74946  0.75000  0.74437   0.98694
KNN            17   0.60000  0.59099   0.76228  0.75693  0.75000  0.74437   0.98716
KNN            57   0.60000  0.59099   0.74928  0.74363  0.66667  0.65916   0.98784
KNNW           2    0.60000  0.59099   0.67970  0.67249  0.66667  0.65916   0.98164
KNNW           3    0.60000  0.59099   0.69899  0.69221  0.70588  0.69926   0.98311
KNNW           83   0.60000  0.59099   0.75039  0.74477  0.66667  0.65916   0.98784
LOF            75   0.50000  0.48874   0.38045  0.36650  0.63636  0.62817   0.97320
LOF            89   0.60000  0.59099   0.44956  0.43716  0.63636  0.62817   0.97432
LOF            98   0.60000  0.59099   0.67548  0.66817  0.63636  0.62817   0.97883
LOF            100  0.60000  0.59099   0.68505  0.67796  0.63636  0.62817   0.97883
SimplifiedLOF  87   0.50000  0.48874   0.33440  0.31941  0.58333  0.57395   0.96126
SimplifiedLOF  90   0.50000  0.48874   0.34750  0.33280  0.60870  0.59988   0.96532
SimplifiedLOF  97   0.50000  0.48874   0.35325  0.33868  0.60870  0.59988   0.96892
SimplifiedLOF  99   0.50000  0.48874   0.35748  0.34301  0.60870  0.59988   0.96892
LoOP           100  0.40000  0.38649   0.30352  0.28784  0.53846  0.52807   0.96509
LDOF           78   0.10000  0.07973   0.15708  0.13809  0.31111  0.29560   0.93333
LDOF           99   0.10000  0.07973   0.22068  0.20312  0.41379  0.40059   0.95135
ODIN           100  0.44000  0.42739   0.31823  0.30288  0.47619  0.46439   0.95687
FastABOD       24   0.60000  0.59099   0.46470  0.45264  0.60000  0.59099   0.97928
FastABOD       35   0.60000  0.59099   0.73248  0.72645  0.75000  0.74437   0.98423
FastABOD       76   0.60000  0.59099   0.74092  0.73509  0.75000  0.74437   0.98559
KDEOS          2    0.00000  -0.02252  0.06909  0.04812  0.17021  0.15152   0.74020
LDF            62   0.60000  0.59099   0.38275  0.36885  0.60000  0.59099   0.97207
LDF            97   0.60000  0.59099   0.73314  0.72713  0.70588  0.69926   0.98356
LDF            98   0.60000  0.59099   0.73516  0.72920  0.70588  0.69926   0.98423
INFLO          78   0.50000  0.48874   0.35513  0.34060  0.57143  0.56178   0.97005
INFLO          89   0.50000  0.48874   0.37831  0.36431  0.61538  0.60672   0.97117
INFLO          97   0.50000  0.48874   0.41912  0.40604  0.60870  0.59988   0.97297
INFLO          98   0.50000  0.48874   0.42393  0.41095  0.60870  0.59988   0.97297
COF            87   0.30000  0.28423   0.38090  0.36695  0.46154  0.44941   0.97185
COF            92   0.20000  0.18198   0.34725  0.33255  0.56000  0.55009   0.97815
COF            100  0.30000  0.28423   0.41199  0.39874  0.48000  0.46829   0.96306
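
The ROC AUC column can be read as the fraction of (outlier, inlier) pairs in which the outlier receives the higher score, with ties counted as one half. For this variant that means 10 x 444 = 4440 pairs. The following Python sketch computes the measure directly from that pairwise definition; it is an illustrative, quadratic-time version on hypothetical toy data, not the implementation behind the table.

```python
def roc_auc(scores, labels):
    """Fraction of outlier/inlier pairs ranked correctly (ties count 1/2)."""
    outliers = [s for s, lab in zip(scores, labels) if lab == 1]
    inliers = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for o in outliers:
        for i in inliers:
            if o > i:
                wins += 1.0   # outlier correctly scored above the inlier
            elif o == i:
                wins += 0.5   # a tie counts as half a correct pair
    return wins / (len(outliers) * len(inliers))


# Hypothetical toy data: higher score = more outlying; 1 marks a true outlier.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 5))
```

In the toy data 11 of the 15 outlier/inlier pairs are ordered correctly, which gives an AUC of 11/15.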

Plots

[Plots, one per item: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, diversity]
Method labels: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO