Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WBC (version#01)

This data set consists of breast cancer examples, each classified as benign or malignant. Benign examples are considered inliers and malignant examples are considered outliers. We removed 16 instances with missing values, 14 of them inliers and 2 of them outliers. Moreover, 234 instances are duplicates (231 inliers and 3 outliers). Following Schubert et al. [1], the outliers were downsampled so that 10 outliers remain; because 3 of the duplicates are outliers, this required removing 229 outliers from the data set with duplicates but only 226 outliers from the data set without duplicates. The processed data set with duplicates has 9 numeric attributes and 454 instances, namely 10 outliers (2.2%) and 444 inliers (97.8%); the data set without duplicates has 223 instances (10 outliers and 213 inliers). The same pre-processing has also been applied in [2] and [3].
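The three pre-processing steps described above (dropping records with missing values, optionally removing duplicates, and downsampling the outliers) can be sketched as follows. This is a minimal illustration, not the code used to build these files. It assumes that missing values are marked with "?" in the raw file, that duplicates are exact matches on the attribute values, and that outliers are downsampled uniformly at random; the function name `preprocess` and its parameters are hypothetical.

```python
import random

MISSING = "?"  # assumption: marker used for missing values in the raw file


def preprocess(records, n_outliers=10, deduplicate=False, seed=0):
    """Sketch of the WBC pre-processing (hypothetical helper).

    records: list of (attributes, is_outlier) pairs, where attributes is a
    tuple of raw string values and is_outlier is True for malignant examples.
    """
    # Step 1: drop every record that has at least one missing value.
    complete = [(x, y) for x, y in records if MISSING not in x]

    # Step 2 (optional): drop exact duplicates of the attribute values,
    # keeping the first occurrence of each.
    if deduplicate:
        seen, unique = set(), []
        for x, y in complete:
            if x not in seen:
                seen.add(x)
                unique.append((x, y))
        complete = unique

    # Step 3: keep all inliers and a random sample of n_outliers outliers.
    inliers = [r for r in complete if not r[1]]
    outliers = [r for r in complete if r[1]]
    rng = random.Random(seed)
    kept = rng.sample(outliers, min(n_outliers, len(outliers)))
    return inliers + kept


if __name__ == "__main__":
    toy = [
        (("1", "2"), False),
        (("1", "2"), False),  # duplicate inlier
        (("3", "?"), False),  # record with a missing value
        (("5", "5"), True),
        (("6", "6"), True),
        (("5", "5"), True),   # duplicate outlier
    ]
    print(len(preprocess(toy, n_outliers=1)))                    # 3
    print(len(preprocess(toy, n_outliers=1, deduplicate=True)))  # 2
```

Because deduplication runs before downsampling in this sketch, duplicate outliers are already gone when the sample is drawn. This mirrors the bookkeeping in the text: 226 rather than 229 outliers have to be removed to reach 10 in the variant without duplicates.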

References:

[1] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.
[2] A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proc. KDD, pages 428-436, 2013.
[3] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.

Download all data set variants used (57.1 kB). The original data are also available (breast-cancer-wisconsin.data).

Normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)
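This page does not state how the normalized variants were rescaled. A common choice, assumed here and not confirmed by the source, is attribute-wise min-max scaling to [0, 1]. The helper name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(rows):
    """Rescale each attribute (column) of `rows` linearly to [0, 1].

    Assumption: constant attributes are mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


print(min_max_normalize([[1, 10], [3, 10], [2, 10]]))
# [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```

If every attribute already has the same observed range, this scaling multiplies all pairwise Euclidean distances by one common factor. That leaves distance-based rankings such as KNN's unchanged.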

Download raw algorithm results (1.6 MB). Download raw algorithm evaluation table (36.6 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for more than one value of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures used in the original publication.

Algorithm | k | P@n | Adj. P@n | AP | Adj. AP | Max-F1 | Adj. Max-F1 | ROC AUC
KNN | 12 | 0.80000 | 0.79061 | 0.90045 | 0.89578 | 0.82353 | 0.81524 | 0.99202
KNN | 15 | 0.80000 | 0.79061 | 0.90727 | 0.90292 | 0.85714 | 0.85044 | 0.99272
KNN | 16 | 0.80000 | 0.79061 | 0.90934 | 0.90509 | 0.84211 | 0.83469 | 0.99272
KNN | 24 | 0.80000 | 0.79061 | 0.90581 | 0.90138 | 0.84211 | 0.83469 | 0.99296
KNNW | 41 | 0.70000 | 0.68592 | 0.88209 | 0.87655 | 0.78261 | 0.77240 | 0.99202
KNNW | 50 | 0.80000 | 0.79061 | 0.88179 | 0.87624 | 0.80000 | 0.79061 | 0.99155
LOF | 73 | 0.80000 | 0.79061 | 0.87626 | 0.87045 | 0.81818 | 0.80965 | 0.99061
LOF | 99 | 0.80000 | 0.79061 | 0.88415 | 0.87871 | 0.84211 | 0.83469 | 0.99061
SimplifiedLOF | 78 | 0.60000 | 0.58122 | 0.71833 | 0.70511 | 0.63636 | 0.61929 | 0.97934
SimplifiedLOF | 100 | 0.60000 | 0.58122 | 0.80333 | 0.79410 | 0.75000 | 0.73826 | 0.98357
LoOP | 59 | 0.40000 | 0.37183 | 0.33317 | 0.30186 | 0.46154 | 0.43626 | 0.94085
LoOP | 85 | 0.40000 | 0.37183 | 0.55097 | 0.52989 | 0.62069 | 0.60288 | 0.97042
LoOP | 97 | 0.40000 | 0.37183 | 0.63458 | 0.61742 | 0.62069 | 0.60288 | 0.97371
LDOF | 80 | 0.40000 | 0.37183 | 0.33332 | 0.30202 | 0.48276 | 0.45847 | 0.94131
LDOF | 96 | 0.40000 | 0.37183 | 0.43654 | 0.41009 | 0.56000 | 0.53934 | 0.95822
LDOF | 100 | 0.40000 | 0.37183 | 0.44209 | 0.41590 | 0.58333 | 0.56377 | 0.95822
ODIN | 75 | 0.40000 | 0.37183 | 0.32131 | 0.28945 | 0.48649 | 0.46238 | 0.94765
ODIN | 90 | 0.40000 | 0.37183 | 0.35574 | 0.32550 | 0.52941 | 0.50732 | 0.95563
ODIN | 98 | 0.40000 | 0.37183 | 0.36388 | 0.33402 | 0.50000 | 0.47653 | 0.95634
ODIN | 99 | 0.40000 | 0.37183 | 0.36767 | 0.33798 | 0.50000 | 0.47653 | 0.95540
FastABOD | 7 | 0.70000 | 0.68592 | 0.81217 | 0.80335 | 0.75000 | 0.73826 | 0.98310
FastABOD | 92 | 0.70000 | 0.68592 | 0.85418 | 0.84734 | 0.75000 | 0.73826 | 0.98920
KDEOS | 7 | 0.10000 | 0.05775 | 0.05956 | 0.01541 | 0.12069 | 0.07941 | 0.52770
KDEOS | 10 | 0.10000 | 0.05775 | 0.09903 | 0.05673 | 0.16667 | 0.12754 | 0.53474
KDEOS | 17 | 0.10000 | 0.05775 | 0.09246 | 0.04985 | 0.15385 | 0.11412 | 0.60235
LDF | 44 | 0.80000 | 0.79061 | 0.85751 | 0.85082 | 0.80000 | 0.79061 | 0.98873
LDF | 62 | 0.80000 | 0.79061 | 0.89280 | 0.88776 | 0.85714 | 0.85044 | 0.99155
LDF | 69 | 0.80000 | 0.79061 | 0.89694 | 0.89210 | 0.85714 | 0.85044 | 0.99249
LDF | 84 | 0.80000 | 0.79061 | 0.89513 | 0.89021 | 0.81818 | 0.80965 | 0.99296
INFLO | 84 | 0.60000 | 0.58122 | 0.73885 | 0.72659 | 0.64286 | 0.62609 | 0.98028
INFLO | 87 | 0.60000 | 0.58122 | 0.76750 | 0.75659 | 0.70588 | 0.69207 | 0.98216
INFLO | 100 | 0.60000 | 0.58122 | 0.79127 | 0.78148 | 0.75000 | 0.73826 | 0.98216
COF | 60 | 0.70000 | 0.68592 | 0.77365 | 0.76302 | 0.77778 | 0.76734 | 0.98638
COF | 66 | 0.70000 | 0.68592 | 0.81286 | 0.80407 | 0.70588 | 0.69207 | 0.98638
COF | 93 | 0.80000 | 0.79061 | 0.73331 | 0.72079 | 0.80000 | 0.79061 | 0.98075
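The "Adj." columns rescale a raw measure so that the value expected from a random ranking maps to 0 while a perfect result stays at 1. Spot checks against the table (for example, P@n = 0.8 becoming 0.79061 with 10 outliers among 223 objects) are consistent with using the outlier fraction |O|/N as the expected value for P@n, AP and Max-F1. The function below is a sketch under that assumption; its name is hypothetical.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Return (value - E) / (1 - E), with E assumed to be n_outliers / n_objects."""
    expected = n_outliers / n_objects  # expected score of a random ranking (assumption)
    return (value - expected) / (1.0 - expected)


print(round(adjust_for_chance(0.8, 10, 223), 5))  # 0.79061 (without duplicates)
print(round(adjust_for_chance(0.6, 10, 454), 5))  # 0.59099 (with duplicates)
print(round(adjust_for_chance(0.0, 10, 454), 5))  # -0.02252 (worse than random)
```

ROC AUC has no adjusted column; its random baseline is already 0.5.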

Plots

Plots (not reproduced here): precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Diversity plot legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.

Normalized, with duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB). Download raw algorithm evaluation table (39.6 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for more than one value of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures used in the original publication.

Algorithm | k | P@n | Adj. P@n | AP | Adj. AP | Max-F1 | Adj. Max-F1 | ROC AUC
KNN | 1 | 0.60000 | 0.59099 | 0.75475 | 0.74922 | 0.66667 | 0.65916 | 0.99077
KNN | 12 | 0.65000 | 0.64212 | 0.75221 | 0.74663 | 0.66667 | 0.65916 | 0.98941
KNNW | 2 | 0.60000 | 0.59099 | 0.73611 | 0.73017 | 0.66667 | 0.65916 | 0.98919
KNNW | 5 | 0.50000 | 0.48874 | 0.74421 | 0.73845 | 0.66667 | 0.65916 | 0.99009
KNNW | 19 | 0.60000 | 0.59099 | 0.75464 | 0.74912 | 0.66667 | 0.65916 | 0.98964
LOF | 86 | 0.60000 | 0.59099 | 0.49749 | 0.48617 | 0.60000 | 0.59099 | 0.98041
LOF | 93 | 0.60000 | 0.59099 | 0.70574 | 0.69911 | 0.66667 | 0.65916 | 0.98559
LOF | 100 | 0.60000 | 0.59099 | 0.71078 | 0.70427 | 0.66667 | 0.65916 | 0.98581
SimplifiedLOF | 79 | 0.40000 | 0.38649 | 0.34575 | 0.33102 | 0.60870 | 0.59988 | 0.96757
SimplifiedLOF | 83 | 0.50000 | 0.48874 | 0.33776 | 0.32284 | 0.56000 | 0.55009 | 0.96554
SimplifiedLOF | 100 | 0.50000 | 0.48874 | 0.46505 | 0.45300 | 0.59259 | 0.58342 | 0.97680
LoOP | 92 | 0.40000 | 0.38649 | 0.32795 | 0.31281 | 0.58333 | 0.57395 | 0.96622
LoOP | 99 | 0.50000 | 0.48874 | 0.36041 | 0.34600 | 0.58333 | 0.57395 | 0.96869
LoOP | 100 | 0.50000 | 0.48874 | 0.36107 | 0.34668 | 0.58333 | 0.57395 | 0.96892
LDOF | 86 | 0.20000 | 0.18198 | 0.25845 | 0.24175 | 0.50000 | 0.48874 | 0.94505
LDOF | 94 | 0.30000 | 0.28423 | 0.27145 | 0.25504 | 0.50000 | 0.48874 | 0.95315
LDOF | 100 | 0.30000 | 0.28423 | 0.28714 | 0.27109 | 0.48276 | 0.47111 | 0.95766
ODIN | 87 | 0.30000 | 0.28423 | 0.32946 | 0.31436 | 0.50000 | 0.48874 | 0.96622
ODIN | 100 | 0.40000 | 0.38649 | 0.34537 | 0.33063 | 0.50000 | 0.48874 | 0.97038
FastABOD | 22 | 0.60000 | 0.59099 | 0.39656 | 0.38297 | 0.60000 | 0.59099 | 0.97905
FastABOD | 28 | 0.60000 | 0.59099 | 0.76869 | 0.76348 | 0.75000 | 0.74437 | 0.98874
FastABOD | 92 | 0.60000 | 0.59099 | 0.78761 | 0.78283 | 0.75000 | 0.74437 | 0.99032
KDEOS | 9 | 0.30000 | 0.28423 | 0.15868 | 0.13973 | 0.30000 | 0.28423 | 0.80721
KDEOS | 19 | 0.10000 | 0.07973 | 0.08842 | 0.06789 | 0.17857 | 0.16007 | 0.82477
LDF | 37 | 0.60000 | 0.59099 | 0.40523 | 0.39183 | 0.66667 | 0.65916 | 0.97230
LDF | 94 | 0.60000 | 0.59099 | 0.73040 | 0.72433 | 0.66667 | 0.65916 | 0.98716
INFLO | 65 | 0.50000 | 0.48874 | 0.33402 | 0.31902 | 0.53846 | 0.52807 | 0.96577
INFLO | 78 | 0.50000 | 0.48874 | 0.37862 | 0.36462 | 0.63636 | 0.62817 | 0.97252
INFLO | 98 | 0.50000 | 0.48874 | 0.56064 | 0.55075 | 0.58333 | 0.57395 | 0.98243
INFLO | 100 | 0.50000 | 0.48874 | 0.55651 | 0.54652 | 0.59259 | 0.58342 | 0.98288
COF | 81 | 0.30000 | 0.28423 | 0.30695 | 0.29134 | 0.40909 | 0.39578 | 0.95180
COF | 87 | 0.20000 | 0.18198 | 0.43677 | 0.42409 | 0.48276 | 0.47111 | 0.96937
COF | 91 | 0.30000 | 0.28423 | 0.40236 | 0.38890 | 0.50000 | 0.48874 | 0.97185
COF | 92 | 0.20000 | 0.18198 | 0.39519 | 0.38157 | 0.46667 | 0.45465 | 0.97297
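The unadjusted measures in these tables are all computed from a method's outlier ranking and the ground-truth labels. The sketch below computes them for a single ranking. It assumes that higher scores mean more outlying, that n in P@n equals the number of outliers, and that there are no tied scores. Values such as P@n = 0.65 for KNN with k = 12 suggest that the published numbers give fractional credit for ties, which this sketch does not attempt. The function name `evaluate` is hypothetical.

```python
def evaluate(scores, labels):
    """Return (P@n, AP, Max-F1, ROC AUC) for one outlier ranking.

    scores: higher means more outlying; labels: True for outliers.
    Assumes no tied scores and n = number of outliers for P@n.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    n_out = sum(ranked)
    n_in = len(ranked) - n_out

    # Precision among the top n_out objects of the ranking.
    p_at_n = sum(ranked[:n_out]) / n_out

    hits, ap, max_f1 = 0, 0.0, 0.0
    for rank, is_out in enumerate(ranked, start=1):
        if is_out:
            hits += 1
            ap += hits / rank  # precision at the rank of each outlier
        if hits:
            precision, recall = hits / rank, hits / n_out
            max_f1 = max(max_f1, 2 * precision * recall / (precision + recall))
    ap /= n_out

    # ROC AUC: fraction of (outlier, inlier) pairs with the outlier ranked first.
    inliers_above, correct = 0, 0
    for is_out in ranked:
        if is_out:
            correct += n_in - inliers_above
        else:
            inliers_above += 1
    roc_auc = correct / (n_out * n_in)
    return p_at_n, ap, max_f1, roc_auc


# P@n 0.5, AP 5/6, Max-F1 0.8, ROC AUC 5/6 (up to floating-point rounding)
print(evaluate([0.9, 0.8, 0.7, 0.6, 0.5], [True, False, True, False, False]))
```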

Plots

Plots (not reproduced here): precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Diversity plot legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.

Not normalized, without duplicates

This version contains 9 attributes, 223 objects, 10 outliers (4.48%)

Download raw algorithm results (1.6 MB). Download raw algorithm evaluation table (36.4 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for more than one value of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures used in the original publication.

Algorithm | k | P@n | Adj. P@n | AP | Adj. AP | Max-F1 | Adj. Max-F1 | ROC AUC
KNN | 11 | 0.80000 | 0.79061 | 0.87463 | 0.86874 | 0.80000 | 0.79061 | 0.99108
KNN | 13 | 0.80000 | 0.79061 | 0.90909 | 0.90482 | 0.85714 | 0.85044 | 0.99272
KNN | 15 | 0.80000 | 0.79061 | 0.90944 | 0.90519 | 0.85714 | 0.85044 | 0.99319
KNNW | 15 | 0.80000 | 0.79061 | 0.86403 | 0.85764 | 0.80000 | 0.79061 | 0.99014
KNNW | 25 | 0.70000 | 0.68592 | 0.88523 | 0.87984 | 0.81818 | 0.80965 | 0.99202
KNNW | 54 | 0.80000 | 0.79061 | 0.88673 | 0.88141 | 0.80000 | 0.79061 | 0.99202
LOF | 71 | 0.80000 | 0.79061 | 0.86323 | 0.85681 | 0.80000 | 0.79061 | 0.98967
LOF | 73 | 0.80000 | 0.79061 | 0.90366 | 0.89914 | 0.85714 | 0.85044 | 0.99249
SimplifiedLOF | 69 | 0.70000 | 0.68592 | 0.72813 | 0.71537 | 0.70000 | 0.68592 | 0.97887
SimplifiedLOF | 83 | 0.70000 | 0.68592 | 0.81461 | 0.80590 | 0.75000 | 0.73826 | 0.98451
SimplifiedLOF | 98 | 0.60000 | 0.58122 | 0.82051 | 0.81208 | 0.75000 | 0.73826 | 0.98592
LoOP | 83 | 0.50000 | 0.47653 | 0.61114 | 0.59288 | 0.62069 | 0.60288 | 0.97136
LoOP | 91 | 0.50000 | 0.47653 | 0.64417 | 0.62747 | 0.63636 | 0.61929 | 0.97371
LoOP | 97 | 0.50000 | 0.47653 | 0.69469 | 0.68036 | 0.63636 | 0.61929 | 0.97653
LDOF | 72 | 0.40000 | 0.37183 | 0.32666 | 0.29505 | 0.46667 | 0.44163 | 0.92864
LDOF | 92 | 0.40000 | 0.37183 | 0.46072 | 0.43541 | 0.58333 | 0.56377 | 0.95446
LDOF | 100 | 0.40000 | 0.37183 | 0.53996 | 0.51836 | 0.58333 | 0.56377 | 0.96103
ODIN | 75 | 0.40000 | 0.37183 | 0.32897 | 0.29747 | 0.48649 | 0.46238 | 0.94930
ODIN | 84 | 0.40000 | 0.37183 | 0.34193 | 0.31104 | 0.51429 | 0.49148 | 0.95399
ODIN | 98 | 0.40000 | 0.37183 | 0.36503 | 0.33522 | 0.51282 | 0.48995 | 0.95657
ODIN | 100 | 0.40000 | 0.37183 | 0.36882 | 0.33918 | 0.51282 | 0.48995 | 0.95587
FastABOD | 7 | 0.70000 | 0.68592 | 0.82514 | 0.81694 | 0.75000 | 0.73826 | 0.98545
FastABOD | 62 | 0.70000 | 0.68592 | 0.84112 | 0.83366 | 0.76190 | 0.75073 | 0.98779
FastABOD | 98 | 0.70000 | 0.68592 | 0.85221 | 0.84527 | 0.76190 | 0.75073 | 0.98873
KDEOS | 7 | 0.10000 | 0.05775 | 0.08148 | 0.03836 | 0.14286 | 0.10262 | 0.57042
KDEOS | 8 | 0.10000 | 0.05775 | 0.15027 | 0.11037 | 0.18182 | 0.14341 | 0.53568
KDEOS | 17 | 0.10000 | 0.05775 | 0.16554 | 0.12636 | 0.18182 | 0.14341 | 0.61596
LDF | 34 | 0.80000 | 0.79061 | 0.85113 | 0.84414 | 0.80000 | 0.79061 | 0.98826
LDF | 56 | 0.80000 | 0.79061 | 0.90583 | 0.90140 | 0.85714 | 0.85044 | 0.99296
INFLO | 71 | 0.60000 | 0.58122 | 0.70416 | 0.69027 | 0.63636 | 0.61929 | 0.97700
INFLO | 87 | 0.60000 | 0.58122 | 0.80510 | 0.79595 | 0.75000 | 0.73826 | 0.98404
INFLO | 96 | 0.60000 | 0.58122 | 0.81266 | 0.80386 | 0.75000 | 0.73826 | 0.98498
COF | 65 | 0.70000 | 0.68592 | 0.81974 | 0.81128 | 0.73684 | 0.72449 | 0.98826
COF | 66 | 0.70000 | 0.68592 | 0.84869 | 0.84158 | 0.77778 | 0.76734 | 0.98638
COF | 67 | 0.80000 | 0.79061 | 0.74758 | 0.73573 | 0.80000 | 0.79061 | 0.98592
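The tie-breaking rule stated above the tables (report only the smallest k among equal scores) can be expressed as a small selection function. This sketch assumes that per-k scores are available as a dictionary for each measure, and that each table row is a value of k that is best for at least one measure, which matches the rows spot-checked here. The names `best_k` and `table_rows`, and the example scores, are hypothetical and not taken from the tables.

```python
def best_k(scores_by_k):
    """Return (k, score) with the highest score; ties go to the smallest k."""
    return min(scores_by_k.items(), key=lambda item: (-item[1], item[0]))


def table_rows(results):
    """results: {measure: {k: score}}. Return the distinct best k values, sorted."""
    return sorted({best_k(scores)[0] for scores in results.values()})


# Hypothetical per-k scores for one method (illustrative values only).
example = {
    "P@n": {11: 0.5, 13: 0.5, 15: 0.5},      # three-way tie -> k = 11
    "AP": {11: 0.6, 13: 0.7, 15: 0.65},      # clear winner -> k = 13
    "ROC AUC": {11: 0.9, 13: 0.8, 15: 0.9},  # tie between 11 and 15 -> k = 11
}
print(table_rows(example))  # [11, 13]
```

For a fixed data set, each adjusted measure is an increasing function of its raw counterpart, so the same k is best for both; this is why one row can list raw and adjusted values side by side.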

Plots

Plots (not reproduced here): precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Diversity plot legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.

Not normalized, with duplicates

This version contains 9 attributes, 454 objects, 10 outliers (2.20%)

Download raw algorithm results (1.9 MB). Download raw algorithm evaluation table (39.8 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for more than one value of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures used in the original publication.

Algorithm | k | P@n | Adj. P@n | AP | Adj. AP | Max-F1 | Adj. Max-F1 | ROC AUC
KNN | 1 | 0.60000 | 0.59099 | 0.75475 | 0.74922 | 0.66667 | 0.65916 | 0.99065
KNN | 12 | 0.65000 | 0.64212 | 0.75221 | 0.74663 | 0.66667 | 0.65916 | 0.98941
KNNW | 2 | 0.60000 | 0.59099 | 0.73611 | 0.73017 | 0.66667 | 0.65916 | 0.98919
KNNW | 5 | 0.50000 | 0.48874 | 0.74421 | 0.73845 | 0.66667 | 0.65916 | 0.99009
KNNW | 19 | 0.60000 | 0.59099 | 0.75464 | 0.74912 | 0.66667 | 0.65916 | 0.98964
LOF | 86 | 0.60000 | 0.59099 | 0.49853 | 0.48723 | 0.60000 | 0.59099 | 0.98063
LOF | 95 | 0.60000 | 0.59099 | 0.70574 | 0.69911 | 0.66667 | 0.65916 | 0.98559
LOF | 100 | 0.60000 | 0.59099 | 0.71078 | 0.70427 | 0.66667 | 0.65916 | 0.98581
SimplifiedLOF | 83 | 0.50000 | 0.48874 | 0.33776 | 0.32284 | 0.56000 | 0.55009 | 0.96554
SimplifiedLOF | 89 | 0.50000 | 0.48874 | 0.35990 | 0.34548 | 0.60870 | 0.59988 | 0.96982
SimplifiedLOF | 100 | 0.50000 | 0.48874 | 0.45752 | 0.44530 | 0.57143 | 0.56178 | 0.97613
LoOP | 91 | 0.40000 | 0.38649 | 0.32662 | 0.31146 | 0.58333 | 0.57395 | 0.96554
LoOP | 96 | 0.50000 | 0.48874 | 0.34494 | 0.33018 | 0.58333 | 0.57395 | 0.96712
LoOP | 100 | 0.50000 | 0.48874 | 0.36107 | 0.34668 | 0.58333 | 0.57395 | 0.96892
LDOF | 94 | 0.30000 | 0.28423 | 0.27725 | 0.26097 | 0.50000 | 0.48874 | 0.95315
LDOF | 99 | 0.30000 | 0.28423 | 0.28848 | 0.27245 | 0.48276 | 0.47111 | 0.95563
LDOF | 100 | 0.30000 | 0.28423 | 0.28684 | 0.27077 | 0.48000 | 0.46829 | 0.95743
ODIN | 87 | 0.30000 | 0.28423 | 0.32996 | 0.31487 | 0.50000 | 0.48874 | 0.96644
ODIN | 100 | 0.40000 | 0.38649 | 0.34537 | 0.33063 | 0.50000 | 0.48874 | 0.97038
FastABOD | 22 | 0.60000 | 0.59099 | 0.40430 | 0.39088 | 0.60000 | 0.59099 | 0.98063
FastABOD | 28 | 0.60000 | 0.59099 | 0.77301 | 0.76790 | 0.75000 | 0.74437 | 0.98964
FastABOD | 37 | 0.60000 | 0.59099 | 0.78630 | 0.78149 | 0.75000 | 0.74437 | 0.99054
KDEOS | 2 | 0.00000 | -0.02252 | 0.04594 | 0.02445 | 0.13187 | 0.11232 | 0.68052
KDEOS | 13 | 0.00000 | -0.02252 | 0.04316 | 0.02161 | 0.09722 | 0.07689 | 0.71689
LDF | 37 | 0.60000 | 0.59099 | 0.40523 | 0.39183 | 0.66667 | 0.65916 | 0.97230
LDF | 94 | 0.60000 | 0.59099 | 0.73040 | 0.72433 | 0.66667 | 0.65916 | 0.98716
INFLO | 65 | 0.50000 | 0.48874 | 0.33328 | 0.31826 | 0.53846 | 0.52807 | 0.96532
INFLO | 78 | 0.50000 | 0.48874 | 0.38232 | 0.36841 | 0.63636 | 0.62817 | 0.97320
INFLO | 98 | 0.50000 | 0.48874 | 0.56064 | 0.55075 | 0.58333 | 0.57395 | 0.98243
INFLO | 100 | 0.50000 | 0.48874 | 0.55651 | 0.54652 | 0.59259 | 0.58342 | 0.98288
COF | 81 | 0.30000 | 0.28423 | 0.33525 | 0.32028 | 0.42857 | 0.41570 | 0.95788
COF | 86 | 0.30000 | 0.28423 | 0.44559 | 0.43310 | 0.46667 | 0.45465 | 0.96712
COF | 91 | 0.30000 | 0.28423 | 0.43851 | 0.42586 | 0.51613 | 0.50523 | 0.97613
COF | 99 | 0.30000 | 0.28423 | 0.31556 | 0.30015 | 0.53333 | 0.52282 | 0.97365
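The k column in every table is the neighborhood-size parameter of each method. For the two simplest methods, KNN (distance to the k-th nearest neighbor) and KNNW (sum of distances to the k nearest neighbors), the sketch below shows how k enters the outlier score. It assumes Euclidean distance and that a point is not counted as its own neighbor; it is an illustration, not the implementation used to produce these results, and the name `knn_scores` is hypothetical.

```python
import math


def knn_scores(points, k, weighted=False):
    """Outlier scores for KNN (distance to the k-th nearest neighbor) or,
    with weighted=True, KNNW (sum of distances to the k nearest neighbors).

    Assumptions: Euclidean distance, and a point is not its own neighbor.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) if weighted else nearest[-1])
    return scores


pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(knn_scores(pts, k=1))                     # the last point gets the largest score
print(knn_scores(pts, k=2, weighted=True)[0])   # 2.0
```

With exact duplicates in the data, a point's first few neighbor distances can be 0, so small values of k can behave quite differently in the variants with and without duplicates.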

Plots

Plots (not reproduced here): precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity. Diversity plot legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO.