Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WDBC (version#06)

This data set describes characteristics of cell nuclei for breast cancer diagnosis. Again, we consider benign cases as inliers and malignant cases as outliers. In the preprocessing, we follow Zhang et al. [1] and downsample the outliers to 10 instances. The processed data set has 30 numeric attributes and 367 instances, namely 10 outliers (2.72%) and 357 inliers (97.28%).
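The downsampling step can be sketched as follows. This is only an illustration: the function name, the seed, and the use of uniform random sampling are assumptions, and the 212 malignant cases are the class count of the original UCI file rather than a figure stated on this page. The 10 outliers actually used in the study are fixed in the downloadable data set variants.

```python
import random

def downsample_outliers(labels, n_outliers=10, outlier_label="M", seed=0):
    """Return sorted row indices keeping all inliers and n_outliers random outliers.

    Hypothetical helper: the seed and the sampling scheme are assumptions.
    """
    rng = random.Random(seed)
    outlier_idx = [i for i, y in enumerate(labels) if y == outlier_label]
    inlier_idx = [i for i, y in enumerate(labels) if y != outlier_label]
    kept_outliers = rng.sample(outlier_idx, n_outliers)
    return sorted(inlier_idx + kept_outliers)

# Class counts of the original data: 357 benign (B) and 212 malignant (M).
labels = ["B"] * 357 + ["M"] * 212
kept = downsample_outliers(labels)
n_out = sum(labels[i] == "M" for i in kept)
print(len(kept), n_out, f"{100 * n_out / len(kept):.2f}%")  # 367 10 2.72%
```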

References:

[1] K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proc. PAKDD, pages 813-822, 2009.

Download all data set variants used (1.1 MB). You can also access the original data (wdbc.data).

Normalized, without duplicates

This version contains 30 attributes, 367 objects, 10 outliers (2.72%)
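As a rough illustration of what "normalized, without duplicates" can mean in practice, the sketch below removes exact duplicate rows and then rescales every attribute to [0, 1]. Min-max scaling, the order of the two steps, and both function names are assumptions made for this example; they are not taken from this page.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each distinct row, preserving order."""
    return list(dict.fromkeys(tuple(r) for r in rows))

def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - lo for c, lo in zip(cols, lows)]
    return [
        tuple((v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans))
        for row in rows
    ]

# Toy data with two attributes and one duplicated row.
rows = [(1.0, 10.0), (2.0, 30.0), (2.0, 30.0), (3.0, 20.0)]
unique = drop_duplicates(rows)
normalized = min_max_normalize(unique)
print(normalized)  # [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5)]
```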

Download raw algorithm results (3.2 MB). Download raw algorithm evaluation table (40.1 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for several values of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures used in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              1   0.10000   0.07479   0.15830   0.13472   0.32000   0.30095   0.87535
KNN              2   0.40000   0.38319   0.24615   0.22503   0.40000   0.38319   0.87311
KNN             78   0.40000   0.38319   0.29927   0.27965   0.44444   0.42888   0.85322
KNN             84   0.40000   0.38319   0.30038   0.28078   0.44444   0.42888   0.85462
KNNW             4   0.40000   0.38319   0.21003   0.18790   0.40000   0.38319   0.86891
KNNW             7   0.40000   0.38319   0.23722   0.21586   0.40000   0.38319   0.86975
KNNW           100   0.40000   0.38319   0.28704   0.26706   0.40000   0.38319   0.85406
LOF              6   0.30000   0.28039   0.29286   0.27305   0.40000   0.38319   0.91064
LOF              7   0.40000   0.38319   0.42186   0.40567   0.44444   0.42888   0.91064
LOF             11   0.40000   0.38319   0.47099   0.45618   0.57143   0.55942   0.89132
SimplifiedLOF    8   0.30000   0.28039   0.32177   0.30277   0.37500   0.35749   0.91597
SimplifiedLOF   11   0.30000   0.28039   0.39338   0.37638   0.42857   0.41257   0.90784
SimplifiedLOF   12   0.40000   0.38319   0.38200   0.36469   0.40000   0.38319   0.90392
SimplifiedLOF  100   0.40000   0.38319   0.29657   0.27686   0.44444   0.42888   0.85826
LoOP             9   0.30000   0.28039   0.24985   0.22884   0.37500   0.35749   0.91569
LoOP            18   0.40000   0.38319   0.35100   0.33282   0.40000   0.38319   0.89720
LoOP            99   0.40000   0.38319   0.29691   0.27722   0.44444   0.42888   0.85770
LDOF            21   0.40000   0.38319   0.24020   0.21891   0.40000   0.38319   0.89468
LDOF            22   0.30000   0.28039   0.23495   0.21352   0.36364   0.34581   0.89692
LDOF            41   0.40000   0.38319   0.31952   0.30046   0.42857   0.41257   0.87507
LDOF            97   0.40000   0.38319   0.30389   0.28439   0.47059   0.45576   0.86078
ODIN            13   0.15556   0.13190   0.13468   0.11044   0.22857   0.20696   0.89412
ODIN            35   0.40000   0.38319   0.21265   0.19059   0.40000   0.38319   0.84930
ODIN            36   0.40000   0.38319   0.22280   0.20103   0.42105   0.40484   0.84552
ODIN            95   0.33333   0.31466   0.27229   0.25191   0.37500   0.35749   0.83880
FastABOD        22   0.40000   0.38319   0.29218   0.27235   0.47619   0.46152   0.88908
FastABOD        90   0.40000   0.38319   0.30916   0.28981   0.47619   0.46152   0.89216
FastABOD        92   0.40000   0.38319   0.30944   0.29009   0.47619   0.46152   0.89216
KDEOS            4   0.10000   0.07479   0.07122   0.04520   0.18182   0.15890   0.66863
KDEOS           40   0.00000  -0.02801   0.10293   0.07780   0.24242   0.22120   0.84706
KDEOS           41   0.00000  -0.02801   0.10032   0.07511   0.25000   0.22899   0.84118
KDEOS           59   0.00000  -0.02801   0.09780   0.07252   0.21176   0.18969   0.85462
LDF              3   0.30000   0.28039   0.37795   0.36053   0.42857   0.41257   0.78347
LDF             58   0.30000   0.28039   0.28343   0.26336   0.38095   0.36361   0.84258
LDF             81   0.40000   0.38319   0.27331   0.25296   0.40000   0.38319   0.84202
INFLO           10   0.30000   0.28039   0.41707   0.40074   0.46154   0.44646   0.90392
INFLO           17   0.40000   0.38319   0.31524   0.29606   0.40000   0.38319   0.87675
COF              8   0.40000   0.38319   0.29950   0.27988   0.47059   0.45576   0.90140
COF             11   0.20000   0.17759   0.32149   0.30249   0.33333   0.31466   0.90196
COF             12   0.30000   0.28039   0.27780   0.25757   0.31579   0.29662   0.90224
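The adjusted columns in the table above are consistent with the usual adjustment for chance, (score - r) / (1 - r), where r = 10/367 is the outlier rate of this data set variant. The following sketch recomputes a few table entries that way; the function name is illustrative and not taken from the publication's code.

```python
def adjust_for_chance(score, n_outliers, n_objects):
    """Rescale a score so that the expected value of a random ranking maps to 0
    and a perfect ranking stays at 1."""
    expected = n_outliers / n_objects  # outlier rate, here 10 / 367
    return (score - expected) / (1 - expected)

# (raw score, adjusted score as listed in the table above)
table_pairs = [
    (0.40000, 0.38319),   # P@n, e.g. KNN with k = 2
    (0.10000, 0.07479),   # P@n, KNN with k = 1
    (0.00000, -0.02801),  # P@n, KDEOS with k = 40
    (0.15830, 0.13472),   # AP, KNN with k = 1
    (0.32000, 0.30095),   # Max-F1, KNN with k = 1
]
for raw, listed in table_pairs:
    print(f"{raw:.5f} -> {adjust_for_chance(raw, 10, 367):.5f} (table: {listed:.5f})")
```

A negative adjusted value, as for KDEOS, simply means the raw score is below what a random ranking would be expected to achieve.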

Plots

Plots (images not reproduced here): Precision at n; Adjusted precision at n; Average precision; Adjusted average precision; Maximum F1 score; Adjusted maximum F1 score; ROC AUC; Diversity.
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 30 attributes, 367 objects, 10 outliers (2.72%)

Download raw algorithm results (3.1 MB). Download raw algorithm evaluation table (36.7 kB).

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure; when the same score was achieved for several values of k, only the smallest k is given.
The maximum F1-measure is reported in addition to the measures used in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. MF1   ROC AUC
KNN              1   0.70000   0.69160   0.75520   0.74835   0.73684   0.72947   0.98824
KNNW             1   0.70000   0.69160   0.70056   0.69218   0.77778   0.77155   0.98305
KNNW             2   0.70000   0.69160   0.74985   0.74284   0.73684   0.72947   0.98739
LOF              8   0.70000   0.69160   0.73913   0.73183   0.70000   0.69160   0.98039
LOF             10   0.70000   0.69160   0.76044   0.75373   0.77778   0.77155   0.97983
SimplifiedLOF   10   0.70000   0.69160   0.73311   0.72563   0.70588   0.69764   0.98207
SimplifiedLOF   13   0.70000   0.69160   0.75313   0.74622   0.77778   0.77155   0.98319
SimplifiedLOF   14   0.70000   0.69160   0.75754   0.75075   0.73684   0.72947   0.98235
LoOP            15   0.70000   0.69160   0.65442   0.64474   0.73684   0.72947   0.98151
LoOP            24   0.70000   0.69160   0.70056   0.69218   0.70588   0.69764   0.97535
LDOF            14   0.40000   0.38319   0.51231   0.49865   0.59259   0.58118   0.97591
LDOF            29   0.70000   0.69160   0.69084   0.68218   0.70000   0.69160   0.97535
LDOF            31   0.70000   0.69160   0.68830   0.67957   0.73684   0.72947   0.97003
ODIN            40   0.60000   0.58880   0.61703   0.60630   0.63158   0.62126   0.91975
ODIN            66   0.60000   0.58880   0.65939   0.64985   0.70588   0.69764   0.90126
ODIN            90   0.50000   0.48599   0.59519   0.58385   0.62500   0.61450   0.94356
FastABOD         3   0.70000   0.69160   0.66124   0.65176   0.77778   0.77155   0.97115
FastABOD        10   0.70000   0.69160   0.75460   0.74773   0.73684   0.72947   0.98627
FastABOD        37   0.70000   0.69160   0.74896   0.74193   0.70588   0.69764   0.98824
KDEOS            6   0.10000   0.07479   0.06256   0.03630   0.13953   0.11543   0.72381
KDEOS           61   0.00000  -0.02801   0.11917   0.09450   0.29091   0.27105   0.89104
KDEOS          100   0.10000   0.07479   0.12361   0.09906   0.26866   0.24817   0.89440
LDF              4   0.70000   0.69160   0.81440   0.80920   0.82353   0.81859   0.98655
INFLO           12   0.70000   0.69160   0.69207   0.68344   0.70000   0.69160   0.97759
INFLO           13   0.70000   0.69160   0.73018   0.72262   0.73684   0.72947   0.97871
INFLO           15   0.70000   0.69160   0.69415   0.68558   0.73684   0.72947   0.98011
INFLO           20   0.70000   0.69160   0.71440   0.70639   0.77778   0.77155   0.97423
COF              7   0.70000   0.69160   0.64925   0.63942   0.70000   0.69160   0.96975
COF              9   0.70000   0.69160   0.78334   0.77727   0.82353   0.81859   0.98039
COF             10   0.70000   0.69160   0.78747   0.78151   0.82353   0.81859   0.98151
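The selection rule behind the table (best score per measure, ties resolved in favour of the smallest k) can be sketched as below, using the three COF rows listed above as input. The function name and the dictionary layout are assumptions for illustration, and only the k values that appear in the table are considered, not the full parameter range of the study.

```python
def best_k(results, measure):
    """Return the smallest k that achieves the maximum value of `measure`."""
    best = max(row[measure] for row in results.values())
    return min(k for k, row in results.items() if row[measure] == best)

# COF rows from the table above (not normalized, without duplicates).
cof = {
    7:  {"P@n": 0.70000, "AP": 0.64925, "Max-F1": 0.70000, "ROC AUC": 0.96975},
    9:  {"P@n": 0.70000, "AP": 0.78334, "Max-F1": 0.82353, "ROC AUC": 0.98039},
    10: {"P@n": 0.70000, "AP": 0.78747, "Max-F1": 0.82353, "ROC AUC": 0.98151},
}
for measure in ("P@n", "AP", "Max-F1", "ROC AUC"):
    print(measure, best_k(cof, measure))
# P@n 7, AP 10, Max-F1 9, ROC AUC 10
```

This is why k = 9 rather than k = 10 is the row associated with the best COF Max-F1: both reach 0.82353, and the smaller k is reported.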

Plots

Plots (images not reproduced here): Precision at n; Adjusted precision at n; Average precision; Adjusted average precision; Maximum F1 score; Adjusted maximum F1 score; ROC AUC; Diversity.
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO