## Objectives

Clustering real-world data sets is often hampered by the so-called curse of dimensionality: many real-world data sets consist of a very high dimensional feature space. In general, most of the common algorithms fail to generate meaningful results because of the inherent sparsity of the data space. Usually, clusters cannot be found in the original feature space because several features may be irrelevant for clustering due to correlation and/or redundancy. However, clusters are usually embedded in lower dimensional subspaces. In addition, different sets of features may be relevant for different sets of objects. Thus, objects can often be clustered differently in varying subspaces of the original feature space.

## Techniques

### Arbitrarily-oriented Correlation Clustering

Correlation clustering aims at partitioning the data objects into distinct sets of points that exhibit a dense, arbitrarily linear correlation, so called correlation sets. Points that do not belong to any correlation set are classified as noise.

4C (Computing Correlation Connected Clusters)4C computes so-called correlation connected clusters, i.e.sets of neighboring points that exhibit an arbitrary linear correlation in a subset of the attributes. For that purpose, 4C examines the correlation in the neighborhood of each point and groups points with similar local correlation into clusters. | |

COPAC (COrrelation PArtition Clustering)Description | |

HICO (HIrachical Correlation Clustering)Description | |

ERiC (Exploring Relationships among Correlation clusters)Description | |

Increasing the Robustness of PCA-based AlgorithmsDescription | |

CASH (Clustering in Arbitrary Subspaces based on the Hough transform)Description |

### Axis-parallel Projected Clustering

Projected clustering aims at partitioning the data into a set of clusters and a noise set while allowing the clusters to exist in different subspaces. Thus, for each cluster, a different set of features is relevant, whith the remaining features being irrelevant. The points of each cluster are densely packed in the subspace spanned by the corresponding relevant features but are sparsely distributed in the subspace spanned by the other dimensions.

PreDeCon (subspace PREference weighted DEnsity CONnnected clustering)PreDeCon extends the 'full-dimensional' clustering algorithm DBSCAN with the concept of subspace preference weights. For each point, the local neighborhood is examined to determine a subspace tendency, i.e. an axis parallel projection onto which the point is located in a dense area. If the point is noise, this subspace is the entire feature space. Neighboring points having a similar subspace preference weight are grouped into clusters using the density-connected clustering notion. | |

HiSC (HIerarchical Subspace Clustering)Description | |

DiSH (DerIving Subspace Hierarchies)Description |

### Axis-parallel Subspace Clustering

Subspace clustering aims at computing all clusters in all subspaces of the feature space. The information of objects clustered differently in varying subspaces is conserved. Objects may be assigned to several clusters (in different subspaces). Subspace ranking aims at identifying all subspaces of a (high dimensional) feature space that contain interesting clustering structures. The subspaces should be ranked according to this interestingness.

SUBCLU (density-based SUBspace CLUstering)SUBCLU is a subspace clustering algorithm that relies on the density-based clustering notion of DBSCAN. It efficiently computes all clusters DBSCAN would have found if applied to the set of all possible subspaces of the feature space. SUBCLU uses an Apriori-like subspace generation procedure which is based on the monotonicity property of density-connected sets and which ensures that only those subspaces are examined that contain clusters. | |

RIS (Ranking Interesting Subspaces)RIS is a subspace ranking algorithm that uses a quality criterion to rate the interestingness of subspaces. This criterion is based on the monotonicity of core points which are the central concept of the density-based clustering notion of DBSCAN. An Apriori-like subspace generation method (similar to SUBCLU) is used to compute all relevant subspaces and rank them by interestingness. The clusters can be computed in the generated subspaces using any clustering method of choice. | |

SURFING (SUbspaces Relevant For clusterING)SURFING is a subspace ranking algorithm that does not rely on a global density threshold. It computes the interstingness of a subspace based on the distribution of the k-nearest neighbors of all data points in the corresponding projection. An efficient, bottom-up subspace expansion heuristics ensures that less interesting subspaces are not generated for examination. | |

FIRES (FIlter/REfinement Subspace clustering)Description |

## Publications

2015 | |

36 | A. Zimek, J. VreekenThe Blind Men and the Elephant: On Meeting the Problem of Multiple Truths in Data from Clustering and Pattern Mining PerspectivesMachine Learning, 98(1–2): 121–155, 2015. |

2014 | |

35 | A. Zimek, I. Assent, J. VreekenFrequent Pattern Mining Algorithms for Data ClusteringIn C. C. Aggarwal, J. Han (ed.): Frequent Pattern Mining, Springer: 403–423, 2014. |

2013 | |

34 | K. Sim, V. Gopalkrishnan, A. Zimek, G. CongA survey on enhanced subspace clusteringData Mining and Knowledge Discovery, 26(2): 332–397, 2013. |

33 | A. ZimekClustering High-Dimensional DataIn C. C. Aggarwal, C. K. Reddy (ed.): Data Clustering: Algorithms and Applications, CRC Press: 201–230, 2013. |

32 | E. Achtert, H.-P. Kriegel, E. Schubert, A. ZimekInteractive Data Mining with 3D-Parallel-Coordinate-TreesIn Proceedings of the ACM International Conference on Management of Data (SIGMOD), New York City, NY: 1009–1012, 2013. |

2012 | |

31 | H.-P. Kriegel, P. Kröger, A. ZimekSubspace ClusteringWiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(4): 351–364, 2012. |

30 | E. Ntoutsi, A. Zimek, T. Palpanas, P. Kröger, H.-P. KriegelDensity-based Projected Clustering over High Dimensional Data StreamsIn Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA: 987–998, 2012. |

2011 | |

29 | H.-P. Kriegel, P. Kröger, E. Ntoutsi, A. ZimekDensity Based Subspace Clustering over Dynamic DataIn Proceedings of the 23rd International Conference on Scientific and Statistical Database Management (SSDBM), Portland, OR: 387–404, 2011. |

28 | H.-P. Kriegel, E. Ntoutsi, M. Spiliopoulou, G. Tsoumakas, A. ZimekMining Complex Dynamic DataTutorial at the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Athens, Greece, 2011. |

2010 | |

27 | I. Färber, S. Günnemann, H.-P. Kriegel, P. Kröger, E. Müller, E. Schubert, T. Seidl, A. ZimekOn Using Class-Labels in Evaluation of ClusteringsIn MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC, 2010. |

26 | H.-P. Kriegel, A. ZimekSubspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other?In MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC, 2010. |

2009 | |

25 | H.-P. Kriegel, P. Kröger, A. ZimekClustering High Dimensional Data: A Survey on Subspace Clustering, Pattern-based Clustering, and Correlation ClusteringACM Transactions on Knowledge Discovery from Data (TKDD), 3(1): 1–58, 2009. |

24 | A. ZimekCorrelation ClusteringACM SIGKDD Explorations, 11(1): 53–54, 2009. |

23 | G. Moise, A. Zimek, P. Kröger, H.-P. Kriegel, J. SanderSubspace and Projected Clustering: Experimental Evaluation and AnalysisKnowledge and Information Systems (KAIS), 21(3): 299–326, 2009. |

22 | P. Kröger, A. ZimekSubspace Clustering TechniquesIn L. Liu, M. T. Özsu (ed.): Encyclopedia of Database Systems, Springer: 2873–2875, 2009. |

2008 | |

21 | H.-P. Kriegel, P. Kröger, A. ZimekDetecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clusteringProceedings of the VLDB Endowment, 1(2): 1528–1529, 2008. |

20 | E. Achtert, C. Böhm, J. David, P. Kröger, A. ZimekGlobal Correlation Clustering Based on the Hough TransformStatistical Analysis and Data Mining, 1(3): 111–127, 2008. |

19 | H.-P. Kriegel, P. Kröger, E. Schubert, A. ZimekA General Framework for Increasing the Robustness of PCA-based Correlation Clustering AlgorithmsIn Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China: 418–435, 2008. |

18 | E. Achtert, H.-P. Kriegel, A. ZimekELKI: A Software System for Evaluation of Subspace Clustering AlgorithmsIn Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China: 580–585, 2008. |

17 | E. Achtert, C. Böhm, J. David, P. Kröger, A. ZimekRobust Clustering in Arbitrarily Oriented SubspacesIn Proceedings of the 8th SIAM International Conference on Data Mining (SDM), Atlanta, GA: 763–774, 2008. |

16 | H.-P. Kriegel, P. Kröger, A. ZimekDetecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation ClusteringTutorial at the 14th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Las Vegas, NV, 2008. |

15 | H.-P. Kriegel, P. Kröger, A. ZimekDetecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation ClusteringTutorial at the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Osaka, Japan, 2008. |

14 | H.-P. Kriegel, P. Kröger, A. ZimekDetecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation ClusteringTutorial at the 34nd International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008. |

13 | A. ZimekCorrelation ClusteringPhD Thesis, Ludwig-Maximilians-Universität München, Munich, Germany, 2008. |

2007 | |

12 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. ZimekDetection and Visualization of Subspace Cluster HierarchiesIn Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand: 152–163, 2007. |

11 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. ZimekOn Exploring Complex Relationships of Correlation ClustersIn Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM), Banff, Canada: 7–16, 2007. |

10 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. ZimekRobust, Complete, and Efficient Correlation ClusteringIn Proceedings of the 7th SIAM International Conference on Data Mining (SDM), Minneapolis, MN: 413–418, 2007. |

9 | H.-P. Kriegel, P. Kröger, A. Zimek Tutorial at the 7th International Conference on Data Mining (ICDM), Omaha, NE, 2007. |

2006 | |

8 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. ZimekFinding Hierarchies of Subspace ClustersIn Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Berlin, Germany: 446–453, 2006. |

7 | E. Achtert, C. Böhm, P. Kröger, A. ZimekMining Hierarchies of Correlation ClustersIn Proceedings of the 18th International Conference on Scientific and Statistical Database Management (SSDBM), Vienna, Austria: 119–128, 2006. |

2005 | |

6 | H.-P. Kriegel, P. Kröger, M. Renz, S. WurstA Generic Framework for Efficient Subspace Clustering of High-Dimensional DataIn Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), Houston, TX: 250–257, 2005. |

2004 | |

5 | C. Böhm, K. Kailing, P. Kröger, H.-P. KriegelImmer größere und komplexere Datenmengen: Herausforderungen für Clustering-AlgorithmenDatenbank-Spektrum, 4(9): 11–17, 2004. |

4 | C. Böhm, K. Kailing, H.-P. Kriegel, P. KrögerDensity Connected Clustering with Local Subspace PreferencesIn Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK: 27–34, 2004. |

3 | C. Baumgartner, K. Kailing, H.-P. Kriegel, P. Kröger, C. PlantSubspace Selection for Clustering High-Dimensional DataIn Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK: 11–18, 2004. |

2 | K. Kailing, H.-P. Kriegel, P. KrögerDensity-Connected Subspace Clustering for High-Dimensional DataIn Proceedings of the 4th SIAM International Conference on Data Mining (SDM), Lake Buena Vista, FL: 246–257, 2004. |

1 | C. Böhm, K. Kailing, P. Kröger, A. ZimekComputing Clusters of Correlation Connected ObjectsIn Proceedings of the ACM International Conference on Management of Data (SIGMOD), Paris, France: 455–466, 2004. |

## Team

Scientific Head: | Prof. Dr. Hans-Peter Kriegel |

Project Leaders: | |

Current Members: |