Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WDBC (version#08)

This data set describes characteristics of cell nuclei for breast cancer diagnosis. Again, we consider benign examples as inliers and malignant examples as outliers. In the preprocessing, we follow Zhang et al. [1] and downsample the outliers to 10 instances. The processed data set has 30 numeric attributes and 367 instances: 10 outliers (2.72%) and 357 inliers (97.28%).
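The downsampling step can be illustrated with a minimal sketch. It assumes uniform random selection of the retained outliers, which is an assumption made here for illustration and not necessarily the selection procedure used by the authors or by Zhang et al. [1]. The count of 212 malignant instances comes from the description of the original UCI data, not from this page; the function name `downsample_outliers` and the seed are hypothetical.

```python
import random


def downsample_outliers(labels, n_keep, seed=0):
    """Return sorted indices of all inliers plus n_keep randomly chosen outliers.

    labels: sequence of 0 (inlier) / 1 (outlier).
    Random selection is an assumption of this sketch.
    """
    rng = random.Random(seed)
    outliers = [i for i, y in enumerate(labels) if y == 1]
    inliers = [i for i, y in enumerate(labels) if y == 0]
    kept = rng.sample(outliers, n_keep)
    return sorted(inliers + kept)


# Original WDBC class sizes (assumed from the UCI description):
# 357 benign (inliers) and 212 malignant (outliers).
labels = [0] * 357 + [1] * 212

idx = downsample_outliers(labels, n_keep=10)
n_out = sum(labels[i] for i in idx)
share = 100.0 * n_out / len(idx)
print(len(idx), n_out, round(share, 2))
```

With these class sizes, the sketch yields 367 instances with 10 outliers, which matches the proportions stated above.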

References:

[1] K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proc. PAKDD, pages 813-822, 2009.

Download all data set variants used (1.1 MB). You can also access the original data (wdbc.data).

Normalized, without duplicates

This version contains 30 attributes, 367 objects, 10 outliers (2.72%)

Download raw algorithm results (3.3 MB).
Download raw algorithm evaluation table (34.2 kB).

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for more than one k, only the smallest k is given).
The maximum F1 measure is reported as a complement to the measures in the original publication.
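The tie-breaking rule stated above can be sketched as follows. This is only an illustration: the function name `best_k` and the example scores are hypothetical and are not taken from the raw evaluation tables.

```python
def best_k(scores_by_k):
    """Return (k, score) for the highest score; ties go to the smallest k."""
    best_score = max(scores_by_k.values())
    k = min(k for k, s in scores_by_k.items() if s == best_score)
    return k, best_score


# Hypothetical P@n values of one method for a few choices of k.
p_at_n = {5: 0.6, 7: 0.7, 8: 0.7, 94: 0.7}
print(best_k(p_at_n))  # (7, 0.7)
```

Because the selection is made separately for each evaluation measure, one method can appear in the table with several different values of k.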

Algorithm        k      P@n  Adj. P@n       AP  Adj. AP   Max-F1  Adj. Max-F1  ROC AUC
KNN              7  0.70000   0.69160  0.61675  0.60601  0.70000      0.69160  0.98347
KNN             94  0.70000   0.69160  0.67429  0.66516  0.70000      0.69160  0.98768
KNNW            15  0.70000   0.69160  0.62216  0.61157  0.70000      0.69160  0.98347
KNNW            59  0.60000   0.58880  0.64658  0.63668  0.66667      0.65733  0.98571
LOF             24  0.70000   0.69160  0.64409  0.63412  0.70000      0.69160  0.98319
LOF             36  0.70000   0.69160  0.69076  0.68210  0.73684      0.72947  0.98571
LOF             37  0.70000   0.69160  0.69540  0.68686  0.73684      0.72947  0.98599
LOF             96  0.70000   0.69160  0.68291  0.67403  0.73684      0.72947  0.98768
SimplifiedLOF   28  0.70000   0.69160  0.59420  0.58283  0.70000      0.69160  0.98319
SimplifiedLOF   44  0.70000   0.69160  0.66505  0.65566  0.73684      0.72947  0.98403
SimplifiedLOF   63  0.70000   0.69160  0.68643  0.67764  0.73684      0.72947  0.98459
SimplifiedLOF   98  0.70000   0.69160  0.66480  0.65541  0.73684      0.72947  0.98627
LoOP            66  0.70000   0.69160  0.62174  0.61115  0.70000      0.69160  0.97983
LoOP            98  0.70000   0.69160  0.64945  0.63963  0.73684      0.72947  0.98375
LoOP           100  0.70000   0.69160  0.65278  0.64306  0.73684      0.72947  0.98403
LDOF           100  0.70000   0.69160  0.63171  0.62139  0.70000      0.69160  0.98291
ODIN            81  0.40000   0.38319  0.29962  0.28001  0.48276      0.46827  0.96162
ODIN            99  0.36667   0.34893  0.34954  0.33132  0.53846      0.52553  0.96765
ODIN           100  0.36667   0.34893  0.35421  0.33612  0.53846      0.52553  0.96821
FastABOD         9  0.50000   0.48599  0.50790  0.49411  0.66667      0.65733  0.98375
FastABOD        38  0.50000   0.48599  0.55985  0.54752  0.66667      0.65733  0.98599
KDEOS            2  0.00000  -0.02801  0.02844  0.00122  0.06040      0.03408  0.51443
KDEOS           69  0.00000  -0.02801  0.08714  0.06157  0.22472      0.20300  0.85154
KDEOS           89  0.00000  -0.02801  0.09675  0.07145  0.21918      0.19731  0.86695
LDF              5  0.60000   0.58880  0.73195  0.72444  0.70588      0.69764  0.96975
LDF             10  0.70000   0.69160  0.68155  0.67262  0.70000      0.69160  0.95126
LDF             67  0.60000   0.58880  0.64300  0.63300  0.66667      0.65733  0.98655
INFLO           62  0.70000   0.69160  0.62701  0.61656  0.70000      0.69160  0.98151
INFLO           71  0.70000   0.69160  0.64047  0.63040  0.73684      0.72947  0.98067
INFLO           92  0.70000   0.69160  0.65730  0.64770  0.73684      0.72947  0.98487
COF             23  0.60000   0.58880  0.46159  0.44651  0.60000      0.58880  0.95378
COF             24  0.60000   0.58880  0.48603  0.47163  0.66667      0.65733  0.95658
COF             31  0.60000   0.58880  0.51743  0.50391  0.63158      0.62126  0.96218
COF             69  0.50000   0.48599  0.40920  0.39265  0.57143      0.55942  0.97367
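
The adjusted columns in the table above are numerically consistent with an adjustment for chance of the form (value - expected) / (1 - expected), where the expected value is the outlier rate 10/367. The following sketch reproduces a few table entries under that assumption. That the same correction applies to all three adjusted columns is inferred here from the numbers, not stated on this page; the function name `adjust_for_chance` is hypothetical.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that the chance level maps to 0 and a perfect score to 1.

    Assumes the expected value under a random ranking is n_outliers / n_objects.
    """
    expected = n_outliers / n_objects
    return (value - expected) / (1.0 - expected)


# Raw values taken from the table above (P@n, AP and P@n of three different rows).
for raw in (0.70000, 0.61675, 0.00000):
    print(f"{raw:.5f} -> {adjust_for_chance(raw, 10, 367):.5f}")
```

A negative adjusted value, as in the KDEOS rows, indicates a result below what a random ranking would be expected to achieve.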

Plots

Plots are provided for: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 30 attributes, 367 objects, 10 outliers (2.72%)

Download raw algorithm results (3.1 MB).
Download raw algorithm evaluation table (22.6 kB).

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved for more than one k, only the smallest k is given).
The maximum F1 measure is reported as a complement to the measures in the original publication.
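As an illustration of the maximum F1 measure, the sketch below takes an outlier ranking and returns the best F1 score over all cutoff positions. It assumes a strict ranking without tied scores, and the example ranking is hypothetical; the evaluation code behind the published numbers may treat ties differently.

```python
def max_f1(ranked_labels):
    """Best F1 over all cutoffs of a ranking (1 = outlier, 0 = inlier).

    ranked_labels is ordered from most to least outlying; ties are ignored.
    """
    n_outliers = sum(ranked_labels)
    best, tp = 0.0, 0
    for cutoff, label in enumerate(ranked_labels, start=1):
        tp += label
        # F1 = 2*TP / (number of flagged objects + number of true outliers)
        best = max(best, 2.0 * tp / (cutoff + n_outliers))
    return best


# Hypothetical ranking of 367 objects with 10 outliers, 7 of them in the top 9.
ranking = [1, 1, 1, 1, 0, 1, 1, 0, 1] + [0] * 355 + [1, 1, 1]
print(round(max_f1(ranking), 5))  # 0.73684
```

In this example the best cutoff is 9, giving F1 = 14/19, while precision at n = 10 is 0.7. This shows how Max-F1 can exceed P@n when the best cutoff differs from the number of outliers.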

Algorithm        k      P@n  Adj. P@n       AP  Adj. AP   Max-F1  Adj. Max-F1  ROC AUC
KNN              1  0.80000   0.79440  0.85989  0.85596  0.85714      0.85314  0.99552
KNN             12  0.80000   0.79440  0.91389  0.91148  0.84211      0.83768  0.99692
KNNW             1  0.90000   0.89720  0.84847  0.84423  0.90000      0.89720  0.99230
KNNW            11  0.80000   0.79440  0.91376  0.91135  0.85714      0.85314  0.99664
LOF             14  0.80000   0.79440  0.89271  0.88970  0.82353      0.81859  0.99468
LOF             16  0.80000   0.79440  0.88613  0.88294  0.84211      0.83768  0.99468
LOF             18  0.80000   0.79440  0.90317  0.90046  0.84211      0.83768  0.99552
LOF             66  0.80000   0.79440  0.89806  0.89520  0.81818      0.81309  0.99608
SimplifiedLOF   17  0.80000   0.79440  0.85500  0.85094  0.82353      0.81859  0.98852
SimplifiedLOF   26  0.80000   0.79440  0.89812  0.89527  0.84211      0.83768  0.99440
SimplifiedLOF   27  0.80000   0.79440  0.90160  0.89884  0.84211      0.83768  0.99496
SimplifiedLOF   66  0.80000   0.79440  0.88936  0.88626  0.80000      0.79440  0.99552
LoOP            26  0.70000   0.69160  0.83508  0.83046  0.82353      0.81859  0.98768
LoOP            71  0.70000   0.69160  0.88247  0.87918  0.82353      0.81859  0.99440
LoOP            84  0.80000   0.79440  0.87940  0.87603  0.80000      0.79440  0.99468
LoOP            97  0.80000   0.79440  0.88179  0.87847  0.80000      0.79440  0.99496
LDOF            52  0.80000   0.79440  0.85910  0.85515  0.80000      0.79440  0.99272
LDOF            86  0.70000   0.69160  0.88049  0.87714  0.82353      0.81859  0.99412
ODIN            67  0.80000   0.79440  0.85492  0.85086  0.80000      0.79440  0.99230
ODIN            77  0.80000   0.79440  0.87639  0.87293  0.84211      0.83768  0.99328
ODIN            79  0.80000   0.79440  0.88266  0.87937  0.84211      0.83768  0.99370
ODIN            81  0.80000   0.79440  0.88266  0.87937  0.84211      0.83768  0.99384
FastABOD         3  0.80000   0.79440  0.83467  0.83004  0.84211      0.83768  0.99328
FastABOD         4  0.80000   0.79440  0.82987  0.82511  0.85714      0.85314  0.99468
FastABOD        30  0.80000   0.79440  0.91084  0.90834  0.85714      0.85314  0.99636
KDEOS            3  0.20000   0.17759  0.06148  0.03519  0.20000      0.17759  0.58011
KDEOS           62  0.00000  -0.02801  0.11970  0.09504  0.30000      0.28039  0.89356
KDEOS          100  0.00000  -0.02801  0.11315  0.08830  0.31746      0.29834  0.88711
LDF              5  0.90000   0.89720  0.91401  0.91160  0.90000      0.89720  0.99608
LDF              6  0.80000   0.79440  0.95325  0.95194  0.88889      0.88578  0.99832
INFLO           18  0.80000   0.79440  0.84376  0.83939  0.82353      0.81859  0.98235
INFLO           54  0.80000   0.79440  0.88936  0.88626  0.80000      0.79440  0.99552
COF             24  0.90000   0.89720  0.93444  0.93261  0.90000      0.89720  0.99734
COF             85  0.90000   0.89720  0.94331  0.94172  0.90000      0.89720  0.99832

Plots

Plots are provided for: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.
Legend: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO