Supplementary Material for
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
by G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8

WDBC (version#10)

This data set describes characteristics of cell nuclei for breast cancer diagnosis. We consider benign examples as inliers and malignant examples as outliers. In the preprocessing, we follow Zhang et al. [1] and downsample the outliers to 10 instances. The processed data set has 30 numeric attributes and 367 instances: 10 outliers (2.72%) and 357 inliers (97.28%).
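The downsampling step can be illustrated with a short Python sketch. It is not the authors' preprocessing code: the function name `downsample_outliers`, the "B"/"M" label encoding, and the random seed are assumptions for illustration, and the 10 outliers kept in the published data set are fixed by the authors rather than drawn this way. The class sizes in the toy example (357 benign, 212 malignant) are those of the original WDBC data.

```python
import random

def downsample_outliers(labels, n_outliers=10, seed=0):
    """Return sorted indices of all inliers plus a random subset of outliers.

    labels: sequence of "B" (benign, inlier) or "M" (malignant, outlier).
    The seed is an arbitrary choice made for this illustration.
    """
    rng = random.Random(seed)
    inliers = [i for i, y in enumerate(labels) if y == "B"]
    outliers = [i for i, y in enumerate(labels) if y == "M"]
    kept = rng.sample(outliers, n_outliers)
    return sorted(inliers + kept)

# Toy labels with the class sizes of the original WDBC data.
labels = ["B"] * 357 + ["M"] * 212
idx = downsample_outliers(labels)
n_out = sum(labels[i] == "M" for i in idx)
print(len(idx), n_out, round(100 * n_out / len(idx), 2))  # 367 10 2.72
```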

References:

[1] K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proc. PAKDD, pages 813-822, 2009.

Download all data set variants used (1.1 MB). You can also access the original data (wdbc.data).

Normalized, without duplicates

This version contains 30 attributes, 367 objects, 10 outliers (2.72%)
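As a rough illustration of what "normalized, without duplicates" could involve, the following Python sketch removes exact duplicate attribute vectors and rescales each attribute. This page does not state the normalization scheme, so min-max scaling to [0, 1] is an assumption here, and the helper names `drop_duplicates` and `min_max_normalize` are hypothetical.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each attribute vector, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique

def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0 (assumed)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

# Toy data: the third row duplicates the first and is dropped.
data = [[1.0, 10.0], [3.0, 30.0], [1.0, 10.0], [2.0, 50.0]]
normalized = min_max_normalize(drop_duplicates(data))
print(normalized)  # [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
```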

Download raw algorithm results (3.3 MB) Download raw algorithm evaluation table (35.8 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved more than once, only the smallest k is given).
The Maximum F1-Measure is reported in addition to the measures in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. Max-F1   ROC AUC
KNN             14   0.50000   0.48599   0.57751   0.56568   0.59259      0.58118   0.98375
KNN             67   0.50000   0.48599   0.61222   0.60136   0.64000      0.62992   0.98599
KNN             90   0.50000   0.48599   0.61120   0.60031   0.64516      0.63522   0.98627
KNNW            44   0.50000   0.48599   0.56206   0.54980   0.58824      0.57670   0.98235
KNNW            73   0.50000   0.48599   0.57591   0.56403   0.61538      0.60461   0.98375
KNNW            90   0.50000   0.48599   0.58455   0.57291   0.61538      0.60461   0.98459
LOF             86   0.60000   0.58880   0.68659   0.67781   0.66667      0.65733   0.98880
LOF             89   0.60000   0.58880   0.69354   0.68495   0.66667      0.65733   0.98908
LOF            100   0.60000   0.58880   0.68609   0.67730   0.69231      0.68369   0.98880
SimplifiedLOF   63   0.50000   0.48599   0.62601   0.61554   0.60000      0.58880   0.98375
SimplifiedLOF   70   0.50000   0.48599   0.64514   0.63521   0.64286      0.63285   0.98571
SimplifiedLOF   90   0.50000   0.48599   0.66243   0.65297   0.64286      0.63285   0.98683
LoOP            39   0.40000   0.38319   0.45797   0.44279   0.46667      0.45173   0.96667
LoOP            78   0.40000   0.38319   0.59352   0.58213   0.60606      0.59503   0.98179
LoOP           100   0.40000   0.38319   0.61548   0.60471   0.60606      0.59503   0.98403
LDOF            61   0.40000   0.38319   0.50204   0.48810   0.56250      0.55025   0.97395
LDOF            80   0.40000   0.38319   0.56609   0.55393   0.60606      0.59503   0.98011
LDOF            84   0.40000   0.38319   0.54187   0.52904   0.62500      0.61450   0.98039
LDOF            99   0.40000   0.38319   0.56176   0.54948   0.60606      0.59503   0.98179
ODIN            84   0.30000   0.28039   0.35273   0.33460   0.56250      0.55025   0.97199
ODIN            85   0.27500   0.25469   0.35663   0.33861   0.56250      0.55025   0.97185
ODIN            93   0.32000   0.30095   0.34591   0.32759   0.56250      0.55025   0.97227
ODIN            95   0.35000   0.33179   0.34222   0.32379   0.54545      0.53272   0.97115
FastABOD        12   0.40000   0.38319   0.39992   0.38311   0.53333      0.52026   0.97283
FastABOD        84   0.40000   0.38319   0.48668   0.47230   0.64000      0.62992   0.98235
FastABOD        97   0.40000   0.38319   0.49113   0.47687   0.64000      0.62992   0.98263
KDEOS            9   0.20000   0.17759   0.06202   0.03575   0.20000      0.17759   0.49552
KDEOS           77   0.00000  -0.02801   0.08992   0.06443   0.21429      0.19228   0.85602
KDEOS           80   0.00000  -0.02801   0.09376   0.06838   0.20339      0.18108   0.86106
LDF              4   0.50000   0.48599   0.47457   0.45985   0.55556      0.54311   0.97535
LDF              7   0.50000   0.48599   0.69285   0.68425   0.62500      0.61450   0.98039
LDF             33   0.50000   0.48599   0.64882   0.63899   0.66667      0.65733   0.98543
INFLO           86   0.50000   0.48599   0.58433   0.57269   0.62069      0.61006   0.98375
INFLO           89   0.50000   0.48599   0.58988   0.57839   0.64286      0.63285   0.98431
INFLO           95   0.50000   0.48599   0.59533   0.58400   0.62069      0.61006   0.98487
COF             22   0.50000   0.48599   0.40187   0.38512   0.50000      0.48599   0.95854
COF             26   0.50000   0.48599   0.48146   0.46694   0.62500      0.61450   0.96162
COF             45   0.50000   0.48599   0.50103   0.48705   0.58824      0.57670   0.97507
COF             55   0.50000   0.48599   0.47574   0.46106   0.58065      0.56890   0.98067
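
The adjusted (Adj.) columns in the table above are numerically consistent with a linear adjustment for chance in which the expected value of a random ranking is taken to be the outlier rate, here 10/367. The following Python sketch reproduces a few table entries under that reading. The function name `adjust_for_chance` is hypothetical, and the formula is inferred from the numbers on this page rather than quoted from it.

```python
def adjust_for_chance(value, n_outliers, n_objects):
    """Rescale a measure so that the outlier rate maps to 0 and 1 stays at 1."""
    expected = n_outliers / n_objects  # assumed expected value of a random ranking
    return (value - expected) / (1.0 - expected)

# Raw values taken from the table: P@n of 0.0 and 0.5, AP of 0.57751, and a perfect 1.0.
for raw in (0.0, 0.5, 0.57751, 1.0):
    print(f"{raw:.5f} -> {adjust_for_chance(raw, 10, 367):.5f}")
# 0.00000 -> -0.02801
# 0.50000 -> 0.48599
# 0.57751 -> 0.56568
# 1.00000 -> 1.00000
```

A measure below the outlier rate becomes negative after adjustment, which is why rows with P@n = 0 show an adjusted value of -0.02801.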

Plots

[Figures: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Method labels: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO

Not normalized, without duplicates

This version contains 30 attributes, 367 objects, 10 outliers (2.72%)

Download raw algorithm results (3.1 MB) Download raw algorithm evaluation table (19.1 kB)

Best Parameters

The following table contains the best (overall and per-method) results for each method and evaluation measure (when the same score was achieved more than once, only the smallest k is given).
The Maximum F1-Measure is reported in addition to the measures in the original publication.

Algorithm        k       P@n  Adj. P@n        AP   Adj. AP    Max-F1  Adj. Max-F1   ROC AUC
KNN              1   0.90000   0.89720   0.97333   0.97259   0.90909      0.90654   0.99916
KNN              2   0.90000   0.89720   0.95730   0.95610   0.95238      0.95105   0.99888
KNNW             1   0.90000   0.89720   0.98182   0.98131   0.95238      0.95105   0.99944
LOF             14   0.90000   0.89720   0.97333   0.97259   0.90909      0.90654   0.99916
LOF             15   0.90000   0.89720   0.98091   0.98037   0.95238      0.95105   0.99944
SimplifiedLOF   21   0.90000   0.89720   0.94972   0.94831   0.90909      0.90654   0.99860
SimplifiedLOF   23   0.90000   0.89720   0.96980   0.96895   0.95238      0.95105   0.99916
LoOP            31   0.80000   0.79440   0.88126   0.87794   0.90909      0.90654   0.99692
LoOP            73   0.90000   0.89720   0.96692   0.96600   0.90000      0.89720   0.99888
LDOF            43   0.80000   0.79440   0.85745   0.85346   0.85714      0.85314   0.99608
LDOF            95   0.80000   0.79440   0.96515   0.96418   0.90909      0.90654   0.99888
ODIN            63   0.80000   0.79440   0.93265   0.93077   0.90909      0.90654   0.99832
ODIN            72   0.90000   0.89720   0.96222   0.96116   0.90909      0.90654   0.99902
FastABOD         3   0.90000   0.89720   0.95581   0.95457   0.90000      0.89720   0.99860
FastABOD         4   0.90000   0.89720   0.96980   0.96895   0.95238      0.95105   0.99916
FastABOD         6   0.90000   0.89720   0.98091   0.98037   0.95238      0.95105   0.99944
KDEOS            5   0.10000   0.07479   0.04272   0.01590   0.12500      0.10049   0.46807
KDEOS           63   0.00000  -0.02801   0.11070   0.08579   0.31746      0.29834   0.88151
KDEOS          100   0.00000  -0.02801   0.11373   0.08891   0.30769      0.28830   0.88880
LDF              9   0.80000   0.79440   0.86936   0.86570   0.90909      0.90654   0.99664
LDF             10   0.90000   0.89720   0.91877   0.91649   0.90909      0.90654   0.99804
LDF             12   0.90000   0.89720   0.96222   0.96116   0.90909      0.90654   0.99888
INFLO           20   0.80000   0.79440   0.90864   0.90608   0.90909      0.90654   0.99748
INFLO           22   0.90000   0.89720   0.93544   0.93363   0.90909      0.90654   0.99832
INFLO           24   0.90000   0.89720   0.96222   0.96116   0.90909      0.90654   0.99888
COF             20   1.00000   1.00000   1.00000   1.00000   1.00000      1.00000   1.00000
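
To make the unadjusted measures in the table concrete, the following Python sketch computes P@n, AP, maximum F1, and ROC AUC from a list of outlier scores using their standard definitions. It is not the evaluation code behind the table: the function name `ranking_measures` is hypothetical, and tied scores are simply kept in input order, which the original evaluation may treat differently.

```python
def ranking_measures(scores, is_outlier):
    """Compute P@n, AP, maximum F1 and ROC AUC for a list of outlier scores.

    Higher score means more outlying. Ties keep their input order,
    which is a simplification made for this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [bool(is_outlier[i]) for i in order]
    n_out = sum(ranked)
    n_in = len(ranked) - n_out

    hits = 0           # outliers seen so far
    inliers_seen = 0   # inliers seen so far
    ap_sum = 0.0       # sum of precision values at the ranks of outliers
    auc_pairs = 0      # (outlier, inlier) pairs ranked in the right order
    max_f1 = 0.0
    for rank, flag in enumerate(ranked, start=1):
        if flag:
            hits += 1
            ap_sum += hits / rank
            auc_pairs += n_in - inliers_seen
        else:
            inliers_seen += 1
        if hits:
            precision, recall = hits / rank, hits / n_out
            max_f1 = max(max_f1, 2 * precision * recall / (precision + recall))
    return {
        "P@n": sum(ranked[:n_out]) / n_out,
        "AP": ap_sum / n_out,
        "Max-F1": max_f1,
        "ROC AUC": auc_pairs / (n_out * n_in),
    }

# Toy ranking: 2 outliers among 6 objects, one inlier ranked between them.
toy = ranking_measures([0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
                       [True, False, True, False, False, False])
# A perfect ranking gives 1.0 on every measure, as in the COF row above.
perfect = ranking_measures([3, 2, 1, 0], [True, True, False, False])
print(toy)
print(perfect)
```

In the toy ranking, P@n is 0.5 because only one of the top two objects is an outlier, while ROC AUC is 0.875 because seven of the eight outlier-inlier pairs are ordered correctly.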

Plots

[Figures: precision at n, adjusted precision at n, average precision, adjusted average precision, maximum F1 score, adjusted maximum F1 score, ROC AUC, and diversity.]
Method labels: A: KNN, B: KNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO