Search:
Lehrstuhl  |  Institut  |  Fakultät  |  LMU
print


Objectives

Clustering real-world data sets is often hampered by the so-called curse of dimensionality: many real-world data sets consist of a very high dimensional feature space. In general, most of the common algorithms fail to generate meaningful results because of the inherent sparsity of the data space. Usually, clusters cannot be found in the original feature space because several features may be irrelevant for clustering due to correlation and/or redundancy. However, clusters are usually embedded in lower dimensional subspaces. In addition, different sets of features may be relevant for different sets of objects. Thus, objects can often be clustered differently in varying subspaces of the original feature space.


Techniques

Arbitrarily-oriented Correlation Clustering

Correlation clustering aims at partitioning the data objects into distinct sets of points that exhibit a dense, arbitrarily linear correlation, so called correlation sets. Points that do not belong to any correlation set are classified as noise.

4C (Computing Correlation Connected Clusters)
4C computes so-called correlation connected clusters, i.e.sets of neighboring points that exhibit an arbitrary linear correlation in a subset of the attributes. For that purpose, 4C examines the correlation in the neighborhood of each point and groups points with similar local correlation into clusters.
COPAC (COrrelation PArtition Clustering)
Description
HICO (HIrachical Correlation Clustering)
Description
ERiC (Exploring Relationships among Correlation clusters)
Description
Increasing the Robustness of PCA-based Algorithms
Description
CASH (Clustering in Arbitrary Subspaces based on the Hough transform)
Description

Axis-parallel Projected Clustering

Projected clustering aims at partitioning the data into a set of clusters and a noise set while allowing the clusters to exist in different subspaces. Thus, for each cluster, a different set of features is relevant, whith the remaining features being irrelevant. The points of each cluster are densely packed in the subspace spanned by the corresponding relevant features but are sparsely distributed in the subspace spanned by the other dimensions.

PreDeCon (subspace PREference weighted DEnsity CONnnected clustering)
PreDeCon extends the 'full-dimensional' clustering algorithm DBSCAN with the concept of subspace preference weights. For each point, the local neighborhood is examined to determine a subspace tendency, i.e. an axis parallel projection onto which the point is located in a dense area. If the point is noise, this subspace is the entire feature space. Neighboring points having a similar subspace preference weight are grouped into clusters using the density-connected clustering notion.
HiSC (HIerarchical Subspace Clustering)
Description
DiSH (DerIving Subspace Hierarchies)
Description

Axis-parallel Subspace Clustering

Subspace clustering aims at computing all clusters in all subspaces of the feature space. The information of objects clustered differently in varying subspaces is conserved. Objects may be assigned to several clusters (in different subspaces). Subspace ranking aims at identifying all subspaces of a (high dimensional) feature space that contain interesting clustering structures. The subspaces should be ranked according to this interestingness.

SUBCLU (density-based SUBspace CLUstering)
SUBCLU is a subspace clustering algorithm that relies on the density-based clustering notion of DBSCAN. It efficiently computes all clusters DBSCAN would have found if applied to the set of all possible subspaces of the feature space. SUBCLU uses an Apriori-like subspace generation procedure which is based on the monotonicity property of density-connected sets and which ensures that only those subspaces are examined that contain clusters.
RIS (Ranking Interesting Subspaces)
RIS is a subspace ranking algorithm that uses a quality criterion to rate the interestingness of subspaces. This criterion is based on the monotonicity of core points which are the central concept of the density-based clustering notion of DBSCAN. An Apriori-like subspace generation method (similar to SUBCLU) is used to compute all relevant subspaces and rank them by interestingness. The clusters can be computed in the generated subspaces using any clustering method of choice.
SURFING (SUbspaces Relevant For clusterING)
SURFING is a subspace ranking algorithm that does not rely on a global density threshold. It computes the interstingness of a subspace based on the distribution of the k-nearest neighbors of all data points in the corresponding projection. An efficient, bottom-up subspace expansion heuristics ensures that less interesting subspaces are not generated for examination.
FIRES (FIlter/REfinement Subspace clustering)
Description


Publications

2013
35K. Sim, V. Gopalkrishnan, A. Zimek, G. Cong
A survey on enhanced subspace clustering
Data Mining and Knowledge Discovery, 26(2): 332–397, 2013.
34A. Zimek, J. Vreeken
The Blind Men and the Elephant: On Meeting the Problem of Multiple Truths in Data from Clustering and Pattern Mining Perspectives
Machine Learning, 2013.
33A. Zimek
Clustering High-Dimensional Data
In C. C. Aggarwal, C. K. Reddy (ed.): Data Clustering: Algorithms and Applications, CRC Press: 201–230, 2013.
32E. Achtert, H.-P. Kriegel, E. Schubert, A. Zimek
Interactive Data Mining with 3D-Parallel-Coordinate-Trees
In Proceedings of the ACM International Conference on Management of Data (SIGMOD), New York City, NY: 1009–1012, 2013.
2012
31H.-P. Kriegel, P. Kröger, A. Zimek
Subspace Clustering
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(4): 351–364, 2012.
30E. Ntoutsi, A. Zimek, T. Palpanas, P. Kröger, H.-P. Kriegel
Density-based Projected Clustering over High Dimensional Data Streams
In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA: 987–998, 2012.
2011
29H.-P. Kriegel, P. Kröger, E. Ntoutsi, A. Zimek
Density Based Subspace Clustering over Dynamic Data
In Proceedings of the 23rd International Conference on Scientific and Statistical Database Management (SSDBM), Portland, OR: 387–404, 2011.
28H.-P. Kriegel, E. Ntoutsi, M. Spiliopoulou, G. Tsoumakas, A. Zimek
Mining Complex Dynamic Data
Tutorial at the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Athens, Greece, 2011.
2010
27I. Färber, S. Günnemann, H.-P. Kriegel, P. Kröger, E. Müller, E. Schubert, T. Seidl, A. Zimek
On Using Class-Labels in Evaluation of Clusterings
In MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC, 2010.
26H.-P. Kriegel, A. Zimek
Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other?
In MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC, 2010.
2009
25H.-P. Kriegel, P. Kröger, A. Zimek
Clustering High Dimensional Data: A Survey on Subspace Clustering, Pattern-based Clustering, and Correlation Clustering
ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1): 1–58, 2009.
24A. Zimek
Correlation Clustering
ACM SIGKDD Explorations, 11(1): 53–54, 2009.
23G. Moise, A. Zimek, P. Kröger, H.-P. Kriegel, J. Sander
Subspace and Projected Clustering: Experimental Evaluation and Analysis
Knowledge and Information Systems (KAIS), 21(3): 299–326, 2009.
22P. Kröger, A. Zimek
Subspace Clustering Techniques
In L. Liu, M. T. Özsu (ed.): Encyclopedia of Database Systems, Springer: 2873–2875, 2009.
2008
21H.-P. Kriegel, P. Kröger, A. Zimek
Detecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clustering
Proceedings of the VLDB Endowment, 1(2): 1528–1529, 2008.
20E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek
Global Correlation Clustering Based on the Hough Transform
Statistical Analysis and Data Mining, 1(3): 111–127, 2008.
19H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek
A General Framework for Increasing the Robustness of PCA-based Correlation Clustering Algorithms
In Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China: 418–435, 2008.
18E. Achtert, H.-P. Kriegel, A. Zimek
ELKI: A Software System for Evaluation of Subspace Clustering Algorithms
In Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China: 580–585, 2008.
17E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek
Robust Clustering in Arbitrarily Oriented Subspaces
In Proceedings of the 8th SIAM International Conference on Data Mining (SDM), Atlanta, GA: 763–774, 2008.
16H.-P. Kriegel, P. Kröger, A. Zimek
Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering
Tutorial at the 14th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Las Vegas, NV, 2008.
15H.-P. Kriegel, P. Kröger, A. Zimek
Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering
Tutorial at the 34nd International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008.
14H.-P. Kriegel, P. Kröger, A. Zimek
Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering
Tutorial at the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Osaka, Japan, 2008.
13A. Zimek
Correlation Clustering
PhD Thesis, Ludwig-Maximilians-Universität München, Munich, Germany, 2008.
2007
12E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek
Detection and Visualization of Subspace Cluster Hierarchies
In Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand: 152–163, 2007.
11E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek
On Exploring Complex Relationships of Correlation Clusters
In Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM), Banff, Canada: 7–16, 2007.
10E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek
Robust, Complete, and Efficient Correlation Clustering
In Proceedings of the 7th SIAM International Conference on Data Mining (SDM), Minneapolis, MN: 413–418, 2007.
9H.-P. Kriegel, P. Kröger, A. Zimek
Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering
Tutorial at the 7th International Conference on Data Mining (ICDM), Omaha, NE, 2007.
2006
8E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek
Finding Hierarchies of Subspace Clusters
In Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Berlin, Germany: 446–453, 2006.
7E. Achtert, C. Böhm, P. Kröger, A. Zimek
Mining Hierarchies of Correlation Clusters
In Proceedings of the 18th International Conference on Scientific and Statistical Database Management (SSDBM), Vienna, Austria: 119–128, 2006.
2005
6 H.-P. Kriegel, P. Kröger, M. Renz, S. Wurst
A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data
In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), Houston, TX: 250–257, 2005.
2004
5C. Böhm, K. Kailing, P. Kröger, H.-P. Kriegel
Immer größere und komplexere Datenmengen: Herausforderungen für Clustering-Algorithmen
Datenbank-Spektrum, 4(9): 11–17, 2004.
4C. Böhm, K. Kailing, H.-P. Kriegel, P. Kröger
Density Connected Clustering with Local Subspace Preferences
In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK: 27–34, 2004.
3C. Baumgartner, K. Kailing, H.-P. Kriegel, P. Kröger, C. Plant
Subspace Selection for Clustering High-Dimensional Data
In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK: 11–18, 2004.
2K. Kailing, H.-P. Kriegel, P. Kröger
Density-Connected Subspace Clustering for High-Dimensional Data
In Proceedings of the 4th SIAM International Conference on Data Mining (SDM), Lake Buena Vista, FL: 246–257, 2004.
1C. Böhm, K. Kailing, P. Kröger, A. Zimek
Computing Clusters of Correlation Connected Objects
In Proceedings of the ACM International Conference on Management of Data (SIGMOD), Paris, France: 455–466, 2004.

Team

Scientific Head: Prof. Dr. Hans-Peter Kriegel
Project Leaders:
Current Members:
blank
Datenschutz   Impressum