Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

PenDigits (version#04)

The 10 classes in this data set correspond to the digits 0 to 9, with samples written by different people. Class 4, defined here as the outlier class, was downsampled to only 20 objects. After preprocessing, the data set has 16 numeric attributes and 9868 instances: 20 outliers (0.2%) and 9848 inliers (99.8%). The data set is already normalized, i.e., all 16 attributes (spatial coordinates) have the same range [0,100]. It has been used in this form in [1,2].
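The downsampling step described above can be sketched as follows. This is only an illustration: the function names, the (features, label) row layout, and the random seed are assumptions, and the sketch does not reproduce the particular 20 objects chosen for this data set.

```python
import random


def downsample_class(rows, outlier_label, n_keep, seed=0):
    """Split rows into inliers and a random sample of n_keep outliers.

    rows is a list of (features, label) pairs. The seed is hypothetical;
    the sample actually used for this data set is not specified here.
    """
    rng = random.Random(seed)
    outliers = [row for row in rows if row[1] == outlier_label]
    inliers = [row for row in rows if row[1] != outlier_label]
    return inliers, rng.sample(outliers, n_keep)


def outlier_fraction(n_outliers, n_total):
    """Fraction of objects labeled as outliers."""
    return n_outliers / n_total


# Proportion stated in the description: 20 outliers among 9868 objects.
print(f"{100 * outlier_fraction(20, 9868):.1f}% outliers")
```

With 20 outliers among 9868 objects the fraction is 20/9868, which rounds to the 0.2% given in the description.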

References:

[1] H.-P. Kriegel, P. Kroeger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13-24, 2011.
[2] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, pages 1047-1058, 2012.

Download all data set variants used (2.1 MB). You can also access the original data (merge the train and test files, pendigits.tra and pendigits.tes).
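Merging the two original files might look like the sketch below. It assumes that each line of pendigits.tra and pendigits.tes holds 16 comma-separated integer attributes followed by the class label; that layout and the helper names are assumptions, so check them against the files you actually download.

```python
import csv


def read_pendigits(path):
    """Read one file into a list of (features, label) pairs.

    Assumed layout: 16 integer attributes, then the class label,
    separated by commas (possibly padded with spaces).
    """
    rows = []
    with open(path, newline="") as fh:
        for record in csv.reader(fh):
            if not record:
                continue  # skip blank lines
            values = [int(v.strip()) for v in record]
            rows.append((tuple(values[:-1]), values[-1]))
    return rows


def merge_splits(train_path, test_path):
    """Concatenate the training and test splits into one data set."""
    return read_pendigits(train_path) + read_pendigits(test_path)
```

A call such as merge_splits("pendigits.tra", "pendigits.tes") would then return the combined rows, ready for the downsampling of class 4.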

Normalized, without duplicates

This version contains 16 attributes and 9868 objects, of which 20 are outliers (0.20%).

Download raw algorithm results (83.1 MB)
Download raw algorithm evaluation table (60.2 kB)

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for several values of k, only the smallest k is given.
The Maximum F1-Measure is provided as a complement to the measures in the original publication.
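As a rough illustration of the rank-based measures reported in the table, the sketch below computes them from outlier scores and binary labels (1 = outlier). The function names are illustrative, and so is the tie handling: ties in the ranking are broken by input order, and tied pairs count as 0.5 in the ROC AUC. The evaluation code behind the published numbers may differ in such details. P@n uses n equal to the number of outliers, which is consistent with the multiples of 0.05 in the table for 20 outliers.

```python
def ranked_labels(scores, labels):
    """Labels ordered by decreasing outlier score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def precision_at_n(scores, labels, n=None):
    """Fraction of true outliers among the top n (default: number of outliers)."""
    ranked = ranked_labels(scores, labels)
    n = sum(labels) if n is None else n
    return sum(ranked[:n]) / n


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true outlier."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels(scores, labels), start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits  # assumes at least one outlier


def max_f1(scores, labels):
    """Largest F1 score over all cutoffs of the ranking."""
    n_outliers = sum(labels)
    best, hits = 0.0, 0
    for rank, label in enumerate(ranked_labels(scores, labels), start=1):
        hits += label
        if hits:
            precision, recall = hits / rank, hits / n_outliers
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


def roc_auc(scores, labels):
    """Fraction of (outlier, inlier) pairs ranked correctly; ties count 0.5."""
    out = [s for s, l in zip(scores, labels) if l]
    inl = [s for s, l in zip(scores, labels) if not l]
    wins = sum((o > i) + 0.5 * (o == i) for o in out for i in inl)
    return wins / (len(out) * len(inl))
```

The pairwise ROC AUC is quadratic in the number of objects, which is acceptable for a sketch but slow for large data sets; a rank-based formulation would be the usual choice there.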

Algorithm k P@n Adj. P@n AP Adj. AP Max-F1 Adj. MF1 ROC AUC
KNN 6 0.10000 0.09817 0.11218 0.11037 0.22951 0.22794 0.99052
KNN 9 0.05000 0.04807 0.12342 0.12164 0.23729 0.23574 0.99019
KNN 11 0.00000 -0.00203 0.11680 0.11500 0.25000 0.24848 0.98976
KNNW 14 0.05000 0.04807 0.09737 0.09554 0.21538 0.21379 0.98902
KNNW 17 0.00000 -0.00203 0.09761 0.09578 0.22581 0.22423 0.98884
KNNW 22 0.00000 -0.00203 0.09946 0.09763 0.21875 0.21716 0.98856
LOF 14 0.05000 0.04807 0.04363 0.04169 0.12308 0.12130 0.96709
LOF 19 0.05000 0.04807 0.05681 0.05489 0.14516 0.14343 0.97931
LOF 23 0.05000 0.04807 0.06122 0.05931 0.13115 0.12938 0.97990
SimplifiedLOF 17 0.05000 0.04807 0.03169 0.02972 0.08187 0.08001 0.94914
SimplifiedLOF 49 0.05000 0.04807 0.05424 0.05232 0.11321 0.11141 0.97713
SimplifiedLOF 59 0.05000 0.04807 0.05654 0.05463 0.12500 0.12322 0.97666
LoOP 52 0.05000 0.04807 0.04353 0.04159 0.11321 0.11141 0.97120
LoOP 68 0.05000 0.04807 0.04825 0.04632 0.12371 0.12193 0.97303
LoOP 75 0.05000 0.04807 0.04896 0.04703 0.13187 0.13011 0.97195
LoOP 80 0.05000 0.04807 0.04966 0.04773 0.12766 0.12589 0.97224
LDOF 13 0.05000 0.04807 0.00466 0.00264 0.05128 0.04936 0.51376
LDOF 16 0.05000 0.04807 0.00560 0.00358 0.05556 0.05364 0.56922
LDOF 81 0.00000 -0.00203 0.01074 0.00873 0.03457 0.03261 0.83253
LDOF 90 0.00000 -0.00203 0.01054 0.00853 0.03493 0.03297 0.83660
ODIN 20 0.05000 0.04807 0.01064 0.00863 0.05128 0.04936 0.87734
ODIN 91 0.00000 -0.00203 0.03918 0.03723 0.09655 0.09472 0.96992
ODIN 98 0.00000 -0.00203 0.03896 0.03701 0.08800 0.08615 0.97105
ODIN 100 0.00000 -0.00203 0.03937 0.03742 0.09091 0.08906 0.97016
FastABOD 3 0.05000 0.04807 0.02588 0.02390 0.08000 0.07813 0.90836
FastABOD 41 0.05000 0.04807 0.04281 0.04086 0.11494 0.11315 0.96335
FastABOD 79 0.05000 0.04807 0.04576 0.04382 0.10112 0.09930 0.96595
KDEOS 2 0.00000 -0.00203 0.00285 0.00082 0.01254 0.01053 0.52708
KDEOS 12 0.00000 -0.00203 0.00269 0.00067 0.02174 0.01975 0.53031
KDEOS 98 0.00000 -0.00203 0.00704 0.00502 0.01744 0.01545 0.85824
KDEOS 100 0.00000 -0.00203 0.00700 0.00498 0.01805 0.01606 0.85851
LDF 8 0.10000 0.09817 0.07696 0.07508 0.15385 0.15213 0.97764
LDF 10 0.10000 0.09817 0.10211 0.10028 0.17778 0.17611 0.98667
LDF 13 0.10000 0.09817 0.10709 0.10528 0.18033 0.17866 0.98247
LDF 18 0.10000 0.09817 0.08319 0.08133 0.18557 0.18391 0.98071
INFLO 52 0.05000 0.04807 0.03883 0.03688 0.10526 0.10345 0.96321
INFLO 57 0.05000 0.04807 0.04149 0.03954 0.11009 0.10828 0.96556
INFLO 64 0.05000 0.04807 0.04236 0.04042 0.10294 0.10112 0.96626
INFLO 81 0.05000 0.04807 0.04275 0.04081 0.10000 0.09817 0.96520
COF 31 0.05000 0.04807 0.16578 0.16409 0.28916 0.28771 0.98661
COF 33 0.05000 0.04807 0.16373 0.16203 0.29268 0.29125 0.98106
COF 46 0.20000 0.19838 0.17616 0.17449 0.25926 0.25775 0.97409
COF 50 0.15000 0.14827 0.18325 0.18159 0.26000 0.25850 0.96856
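
The adjusted columns in the table are consistent with an adjustment for chance of the form (value - expected) / (1 - expected), where the expected value is the outlier fraction 20/9868. The short sketch below reproduces a few tabulated entries under that assumption; the function name is illustrative.

```python
def adjust_for_chance(value, n_outliers, n_total):
    """Rescale a measure so that the expected chance value maps to 0 and a perfect result to 1.

    Assumes the chance level equals the outlier fraction n_outliers / n_total.
    """
    expected = n_outliers / n_total
    return (value - expected) / (1.0 - expected)


# Values taken from the table (20 outliers among 9868 objects).
for raw in (0.10, 0.05, 0.00, 0.20):
    print(f"{raw:.5f} -> {adjust_for_chance(raw, 20, 9868):.5f}")
```

For example, P@n = 0.10000 maps to 0.09817 and P@n = 0.00000 maps to -0.00203, matching the Adj. P@n column. A negative adjusted value means the result is below what a random ranking would be expected to achieve.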

Plots

[Plots omitted: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF
G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO