Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WDBC (version#03)

This data set describes nuclear characteristics for breast cancer diagnosis. Again, we consider examples of benign cancer as inliers and malignant cancer as outliers. In the preprocessing, we follow Zhang et al. [1], downsampling the outlier class to 10 instances. The processed database has 30 numeric attributes and 367 instances, namely 10 outliers (2.72%) and 357 inliers (97.28%).
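The class proportions quoted above follow directly from the instance counts. As a minimal sketch (not the preprocessing script actually used), the following Python snippet downsamples an outlier class to 10 instances and recomputes the percentages. The function name `downsample_outliers`, the label strings "B" and "M", and the random seed are illustrative assumptions, and the count of 212 malignant instances is taken from the description of the original WDBC data rather than from this page. Which 10 malignant instances were retained is not stated here, so the random choice below is only a stand-in.

```python
import random

def downsample_outliers(labels, outlier_label, n_keep, seed=0):
    """Return sorted indices of all inliers plus n_keep randomly chosen outliers."""
    rng = random.Random(seed)  # the seed is an arbitrary, illustrative choice
    outlier_idx = [i for i, y in enumerate(labels) if y == outlier_label]
    inlier_idx = [i for i, y in enumerate(labels) if y != outlier_label]
    kept_outliers = rng.sample(outlier_idx, n_keep)
    return sorted(inlier_idx + kept_outliers)

# Assumed class counts of the original data: 357 benign ("B"), 212 malignant ("M").
labels = ["B"] * 357 + ["M"] * 212

kept = downsample_outliers(labels, "M", 10)
n_total = len(kept)
n_outliers = sum(1 for i in kept if labels[i] == "M")
outlier_pct = round(100 * n_outliers / n_total, 2)
inlier_pct = round(100 * (n_total - n_outliers) / n_total, 2)

print(n_total, n_outliers, outlier_pct, inlier_pct)
```

Under these assumptions the snippet reproduces the 367 instances, 10 outliers, and the 2.72% / 97.28% split stated above.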

References:

[1] K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proc. PAKDD, pages 813-822, 2009.

Download all data set variants used (1.1 MB). You can also access the original data (wdbc.data).

Normalized, without duplicates

This version contains 30 attributes and 367 objects, of which 10 are outliers (2.72%).
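This page does not spell out how the "normalized, without duplicates" variant is produced. The following Python sketch shows one plausible reading, assuming that duplicates are exact repeats of an attribute vector and that normalization is a linear min-max rescaling of each attribute to [0, 1]. Both assumptions, the function names, and the toy data are illustrative and are not taken from this page.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each distinct attribute vector."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique

def min_max_normalize(rows):
    """Rescale every attribute (column) linearly to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized

# Toy data with two attributes; the third row repeats the first.
rows = [[1.0, 10.0], [3.0, 30.0], [1.0, 10.0], [5.0, 20.0]]

unique_rows = drop_duplicates(rows)
normalized_rows = min_max_normalize(unique_rows)
print(normalized_rows)
```

Distance-based methods such as KNN or LOF are sensitive to attribute scales, which is one reason the normalized and non-normalized variants are evaluated separately.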

Download raw algorithm results (3.3 MB)
Download raw algorithm evaluation table (38.6 kB)

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is reported in addition to the measures in the original publication.

Algorithm      k    P@n       Adj. P@n  AP        Adj. AP   Max-F1    Adj. MF1  ROC AUC
KNN            4    0.50000   0.48599   0.43154   0.41562   0.54545   0.53272   0.95714
KNN            9    0.60000   0.58880   0.46924   0.45437   0.60000   0.58880   0.95434
KNNW           10   0.50000   0.48599   0.41622   0.39986   0.52632   0.51305   0.95714
KNNW           72   0.60000   0.58880   0.46306   0.44802   0.60000   0.58880   0.95406
LOF            16   0.60000   0.58880   0.67154   0.66234   0.63158   0.62126   0.97059
LOF            17   0.60000   0.58880   0.66848   0.65919   0.63158   0.62126   0.97087
LOF            18   0.60000   0.58880   0.67243   0.66326   0.63158   0.62126   0.96891
LOF            32   0.60000   0.58880   0.54212   0.52929   0.64000   0.62992   0.96303
SimplifiedLOF  19   0.50000   0.48599   0.62256   0.61199   0.57143   0.55942   0.96835
SimplifiedLOF  29   0.50000   0.48599   0.55361   0.54110   0.56000   0.54768   0.97059
SimplifiedLOF  79   0.60000   0.58880   0.51068   0.49697   0.60000   0.58880   0.96415
SimplifiedLOF  99   0.60000   0.58880   0.52397   0.51064   0.63636   0.62618   0.96218
LoOP           33   0.40000   0.38319   0.47226   0.45747   0.56000   0.54768   0.96415
LoOP           41   0.40000   0.38319   0.51202   0.49835   0.53333   0.52026   0.96190
LoOP           93   0.60000   0.58880   0.49587   0.48175   0.60000   0.58880   0.96218
LDOF           51   0.40000   0.38319   0.47035   0.45551   0.53333   0.52026   0.96415
LDOF           73   0.50000   0.48599   0.45794   0.44275   0.52941   0.51623   0.96331
LDOF           84   0.40000   0.38319   0.48742   0.47306   0.58065   0.56890   0.96275
ODIN           80   0.60000   0.58880   0.48127   0.46674   0.60000   0.58880   0.95224
ODIN           93   0.50000   0.48599   0.48488   0.47046   0.58824   0.57670   0.95490
ODIN           94   0.50000   0.48599   0.48211   0.46761   0.58824   0.57670   0.95504
FastABOD       84   0.50000   0.48599   0.40984   0.39331   0.50000   0.48599   0.95826
FastABOD       91   0.50000   0.48599   0.42180   0.40560   0.52632   0.51305   0.95938
FastABOD       94   0.50000   0.48599   0.42409   0.40796   0.52632   0.51305   0.95966
KDEOS          81   0.20000   0.17759   0.14172   0.11768   0.23881   0.21748   0.89916
KDEOS          82   0.10000   0.07479   0.13551   0.11129   0.25806   0.23728   0.90112
LDF            6    0.60000   0.58880   0.56714   0.55502   0.70588   0.69764   0.95462
LDF            7    0.40000   0.38319   0.60843   0.59746   0.59259   0.58118   0.94986
INFLO          71   0.50000   0.48599   0.49039   0.47612   0.58333   0.57166   0.96303
INFLO          79   0.60000   0.58880   0.50938   0.49564   0.60000   0.58880   0.96134
INFLO          81   0.60000   0.58880   0.51313   0.49949   0.60870   0.59773   0.96134
INFLO          82   0.60000   0.58880   0.51323   0.49959   0.60870   0.59773   0.96162
COF            1    0.30000   0.28039   0.19830   0.17585   0.37500   0.35749   0.53852
COF            20   0.20000   0.17759   0.24417   0.22300   0.42105   0.40484   0.94006
COF            35   0.30000   0.28039   0.31175   0.29247   0.40000   0.38319   0.92213
COF            76   0.20000   0.17759   0.27666   0.25640   0.46667   0.45173   0.91401
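The adjusted columns in the table above appear consistent with the usual adjustment for chance, (value - expected) / (1 - expected), where the expected value of a random ranking is the outlier rate 10/367. This relationship is inferred from the tabulated numbers rather than stated on this page, and the function name `adjust_for_chance` is illustrative. As a minimal sketch, the following Python snippet recomputes the adjusted values for the KNN row with k = 4.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that a random ranking scores 0 and a perfect one scores 1.

    Assumes the expected value under a random ranking equals the outlier rate.
    """
    expected = n_outliers / n_objects
    return (value - expected) / (1.0 - expected)

N_OBJECTS = 367
N_OUTLIERS = 10

# Unadjusted values copied from the KNN row with k = 4.
knn_k4 = {"P@n": 0.50000, "AP": 0.43154, "Max-F1": 0.54545}

adjusted = {
    name: round(adjust_for_chance(value, N_OUTLIERS, N_OBJECTS), 5)
    for name, value in knn_k4.items()
}
print(adjusted)
```

With this adjustment a score below the outlier rate becomes negative, so an adjusted value near 0 indicates performance no better than a random ranking.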

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 30 attributes and 367 objects, of which 10 are outliers (2.72%).

Download raw algorithm results (3.1 MB)
Download raw algorithm evaluation table (31.9 kB)

Best Parameters

The following table contains the best results (overall and per method) for each method and evaluation measure. When the same score was achieved for more than one value of k, only the smallest k is given.
The Maximum F1-Measure is reported in addition to the measures in the original publication.

Algorithm      k    P@n       Adj. P@n  AP        Adj. AP   Max-F1    Adj. MF1  ROC AUC
KNN            2    0.90000   0.89720   0.87565   0.87216   0.90000   0.89720   0.97171
KNN            3    0.80000   0.79440   0.87291   0.86935   0.85714   0.85314   0.98263
KNNW           1    0.90000   0.89720   0.88610   0.88291   0.90000   0.89720   0.95630
KNNW           12   0.80000   0.79440   0.87119   0.86759   0.85714   0.85314   0.98011
LOF            15   0.90000   0.89720   0.89972   0.89691   0.90000   0.89720   0.98880
LOF            20   0.80000   0.79440   0.86773   0.86403   0.85714   0.85314   0.98964
SimplifiedLOF  20   0.90000   0.89720   0.89850   0.89565   0.90000   0.89720   0.98796
SimplifiedLOF  22   0.90000   0.89720   0.91961   0.91736   0.94737   0.94589   0.98852
SimplifiedLOF  49   0.80000   0.79440   0.86342   0.85959   0.84211   0.83768   0.99048
LoOP           25   0.90000   0.89720   0.85544   0.85139   0.90000   0.89720   0.98739
LoOP           37   0.90000   0.89720   0.87536   0.87187   0.90000   0.89720   0.98964
LoOP           47   0.80000   0.79440   0.85472   0.85065   0.85714   0.85314   0.99132
LDOF           38   0.90000   0.89720   0.80829   0.80292   0.90000   0.89720   0.98515
LDOF           47   0.80000   0.79440   0.85106   0.84689   0.85714   0.85314   0.98936
LDOF           80   0.80000   0.79440   0.85201   0.84786   0.81818   0.81309   0.98627
ODIN           49   0.80000   0.79440   0.82459   0.81968   0.80000   0.79440   0.99286
ODIN           63   0.80000   0.79440   0.86481   0.86103   0.84211   0.83768   0.99342
ODIN           64   0.80000   0.79440   0.86195   0.85809   0.84211   0.83768   0.99356
ODIN           80   0.80000   0.79440   0.87419   0.87067   0.84211   0.83768   0.99160
FastABOD       4    0.80000   0.79440   0.86494   0.86116   0.85714   0.85314   0.98347
FastABOD       6    0.90000   0.89720   0.87889   0.87550   0.90000   0.89720   0.97955
FastABOD       8    0.90000   0.89720   0.88976   0.88667   0.90000   0.89720   0.97647
KDEOS          2    0.10000   0.07479   0.12821   0.10379   0.18182   0.15890   0.52101
KDEOS          62   0.00000   -0.02801  0.12177   0.09717   0.32787   0.30904   0.89692
KDEOS          100  0.10000   0.07479   0.12502   0.10051   0.31746   0.29834   0.89720
LDF            11   0.80000   0.79440   0.88389   0.88064   0.84211   0.83768   0.98768
LDF            12   0.80000   0.79440   0.85983   0.85590   0.84211   0.83768   0.98880
LDF            36   0.90000   0.89720   0.86072   0.85682   0.90000   0.89720   0.96919
INFLO          21   0.90000   0.89720   0.90063   0.89784   0.90000   0.89720   0.98936
INFLO          22   0.90000   0.89720   0.92222   0.92004   0.94737   0.94589   0.99020
INFLO          27   0.90000   0.89720   0.92381   0.92168   0.94737   0.94589   0.99104
INFLO          50   0.80000   0.79440   0.87159   0.86799   0.84211   0.83768   0.99300
COF            15   0.90000   0.89720   0.87292   0.86937   0.90000   0.89720   0.95910
COF            17   0.90000   0.89720   0.90521   0.90255   0.94737   0.94589   0.94902
COF            19   0.90000   0.89720   0.90671   0.90410   0.94737   0.94589   0.96106
COF            27   0.90000   0.89720   0.90563   0.90298   0.90000   0.89720   0.98459
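The rule for choosing which rows appear in the table (best score per measure, with ties resolved in favor of the smallest k) can be sketched in a few lines of Python. The snippet below is only an illustration: it uses the four INFLO rows from the table above as an excerpt, whereas the actual selection would run over all values of k in the raw evaluation table. The helper name `best_k` and the dictionary layout are assumptions made for this example.

```python
MEASURES = ["P@n", "AP", "Max-F1", "ROC AUC"]

# Excerpt: the four INFLO rows of the table above, keyed by k.
inflo = {
    21: {"P@n": 0.90000, "AP": 0.90063, "Max-F1": 0.90000, "ROC AUC": 0.98936},
    22: {"P@n": 0.90000, "AP": 0.92222, "Max-F1": 0.94737, "ROC AUC": 0.99020},
    27: {"P@n": 0.90000, "AP": 0.92381, "Max-F1": 0.94737, "ROC AUC": 0.99104},
    50: {"P@n": 0.80000, "AP": 0.87159, "Max-F1": 0.84211, "ROC AUC": 0.99300},
}

def best_k(results, measure):
    """Return the k with the highest score; ties go to the smallest k."""
    # Sorting key: higher score first (negated), then smaller k.
    return min(results, key=lambda k: (-results[k][measure], k))

best = {measure: best_k(inflo, measure) for measure in MEASURES}
print(best)
```

On this excerpt each of the four rows is selected by exactly one measure, which matches why INFLO contributes four rows to the table.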

Plots

[Plots not reproduced here: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO