3rd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings

Emmanuel Müller; Thomas Seidl; Suresh Venkatasubramanian; Arthur Zimek

In conjunction with the 2012 SIAM International Conference on Data Mining, April 26-28, 2012, Anaheim, California, USA

Objectives of the MultiClust Workshop

The cross-disciplinary research topic of multiple clustering solutions has received significant attention in recent years. However, since it is relatively young, important research challenges remain. Specifically, we observe an emerging interest in discovering multiple clustering solutions from very high dimensional and complex databases. Detecting alternatives while avoiding redundancy is a key challenge for multiple clustering solutions. Toward this goal, important research issues include how to define redundancy among clusterings, whether existing algorithms can be modified to accommodate this goal, how many solutions should be extracted, how to select among far too many possible solutions, and how to evaluate and visualize results; in brief, how to most effectively help the data analyst find what he or she is looking for. Recent work approaches this problem by looking for non-redundant, alternative, disparate, or orthogonal clusterings. Research in this area benefits from well-established related areas, such as ensemble clustering, constraint-based clustering, frequent pattern mining, theory on summarization of results, consensus mining, and general techniques for coping with complex and high dimensional databases.

The aim of this workshop is to establish a venue for the growing community interested in multiple clustering solutions. It should increase the visibility of the topic itself and also connect it to closely related research areas such as ensemble clustering, co-clustering, clustering with constraints, and frequent pattern mining. As a platform for the exchange of ideas, the workshop should attract both newcomers and experts in this area.

Description

Content

In today's applications, data is collected for multiple analysis tasks. For any data object, several features or measurements provide a variety of information in complex and high dimensional databases. In such data, one typically observes several valid groupings for each object, i.e. objects fit in different roles. For example, in customer segmentation, any customer might show multiple behaviors or properties that suggest that the customer is part of several distinct clusters based on the respective aspect considered. In domains such as sensor networks, each sensor node can be a member of multiple clusters according to different environmental events. In gene expression analysis, objects should be detected in multiple clusters due to the various functions of each gene. In general, multiple groupings are desired by many applications as they characterize different views of the data. In contrast to these application demands, traditional clustering techniques detect only a single grouping and miss the alternative clusterings.

Similarly, the topic of multiple clustering solutions fits into several roles. Both multiple alternative solutions and a single consensus derived from multiple clusterings by ensemble techniques are important perspectives on this research field. Looking at the given information, one can distinguish two further perspectives: the use of given views in multi-source clustering, in contrast to the detection of novel views by feature selection and space transformation techniques. Further perspectives can be derived by looking at the underlying data: from traditional continuous valued vector spaces up to complex databases (e.g. graphs, sequences, or streams). In all of these areas, multiple clustering solutions have opened novel research challenges. Ideas to solve these problems come from a variety of traditional mining paradigms. Frequent itemset mining, ensemble mining, and constraint-based mining are only a few of the related fields from machine learning and knowledge discovery.

Topics of Interest

The workshop covers several aspects of multiple clustering solutions and of related research fields. A non-exhaustive list of topics of interest is given below:

Discovering multiple clustering solutions

Alternative clusters / disparate clusters / orthogonal clusters
Multi-view clustering / subspace clustering / co-clustering
Multi-source clustering / clustering in parallel universes / multi-represented clustering
Feature selection and space transformation techniques
Constraint-based mining for the detection of alternatives
Non-redundant view detection and non-redundant cluster detection
Model selection problem: how many clusterings / how many clusters
Iterative vs. simultaneous processing of multiple views
Scalability to large and high dimensional databases
Tackling complex databases (e.g. graphs, sequences, or streams)

Summarizing multiple clustering solutions

Ensemble techniques
Meta clustering
Consensus mining
Summarization and compression theory

Using and evaluating multiple clustering solutions

Classification based on multiple clusterings
Evaluation metrics / evaluation methodology for multiple clustering solutions
Visualization and exploration of multiple clusterings

Related research fields

Frequent itemset mining
Subgroup mining
Subspace learning
Multilabel classification
Relational data mining
Transfer mining

Applications of multiple clustering solutions

Bioinformatics: gene expression analysis / proteomics / ...
Sensor network analysis
Social network analysis
Health surveillance
Customer segmentation
... and many more ...

Format

The workshop shall comprise invited talks, technical talks for peer reviewed paper contributions, and a panel discussion. The talks shall provide deep technical insights into related fields (by invited talks) and emerging research work (by paper presentations). In the spirit of the previous workshops (held at KDD 2010 and ECML PKDD 2011), the panel opens a discussion of the state of the art, open challenges, and visions for future research.

Target Audience

The target audience consists of researchers and practitioners working on clustering. Besides the researchers directly working on non-redundant clustering, alternative clustering, ensemble clustering, subspace clustering, and clustering with constraints, we will also actively encourage other researchers to submit and attend the workshop. Overall we cover three major groups of potential attendees:

Researchers already working on multiple clustering solutions
Experts in related fields interested in this topic
Newcomers attracted by the past tutorials on the topic (e.g. the tutorial on "Discovering Multiple Clustering Solutions" by Müller et al. at SDM 2011)

Submission Guidelines

We invite submission of unpublished original research papers that are not under review elsewhere. All papers will be peer reviewed. Papers may be up to 8 pages long. We also invite vision papers and descriptions of work-in-progress or case studies on benchmark data as short paper submissions of up to 4 pages. If accepted, at least one of the authors must attend the workshop to present the work.

Contributions should be submitted in PDF format using the workshop’s EasyChair submission site at http://www.easychair.org/conferences/?conf=multiclust2012. The submitted papers must be written in English and formatted according to the SDM 2012 submission guidelines. We encourage you to prepare your paper in LaTeX2e. Papers should be formatted using the SIAM SODA macro, which is available through the SIAM website at http://www.siam.org/proceedings/macros.php. The filename is soda2e.all. Make sure you use the macros for SODA and Data Mining Proceedings; papers prepared using other proceedings macros will not be accepted.

If you are considering submitting to the workshop and have questions regarding the workshop scope or need further information, please do not hesitate to contact the PC chairs.

Important Dates

paper submissions due	Jan 25, 2012 (extended from Jan 13, 2012)
notification of acceptance	Feb 7, 2012
camera-ready due	Feb 14, 2012

Proceedings

Proceedings are available here: MultiClust 2012 proceedings

Note that a special issue of the Machine Learning journal will cover the MultiClust topics and is open for submissions.

Information for Participants

The workshop is planned for Saturday, April 28, 2012, 8:30AM - 12:00PM (see SIAM SDM conference program for location information).

Participants who do not plan to attend the entire SIAM Data Mining conference can register for the workshop only (Saturday). Students can register at a reduced rate. More information is available at: http://www.siam.org/meetings/sdm12/reginfo.php

For general information regarding the main conference, SIAM Data Mining, see http://www.siam.org/meetings/sdm12/general.php. For international speakers, visa information is available at the bottom of that page.

Invited Speaker

Carlotta Domeniconi (George Mason University, USA):
"Subspace Clustering Ensembles"

Schedule

Saturday, April 28, 2012, 8:30AM - 12:00PM (see SIAM SDM conference program for location information).

Talks are 20+5 minutes for long papers and 10+5 minutes for short papers. We ask the authors to prepare talks that encourage discussion.

The program is as follows:

08:30 Invited Talk: Carlotta Domeniconi: "Subspace Clustering Ensembles"
slides (pdf)
09:00 Discussion
09:10 Paper 1: Shehroz Khan and Amir Ahmad: "Cluster Center Initialization for Categorical Data Using Multiple Attribute Clustering Approach"
slides (pdf)
09:35 Paper 2: Matthias Schubert and Hans-Peter Kriegel: "Co-RCA: Unsupervised Distance-Learning for Multi-View Clustering"
10:00 coffee break
10:30 Paper 3: Dimitrios Gunopulos and Vana Kalogeraki: "New Subspace Clustering Problems in the Smartphone Era"
10:45 Paper 4: Xuan-Hong Dang, Ira Assent and James Bailey: "Multiple Clustering Views via Constrained Projection"
slides (pdf)
11:10 Paper 5: Martin Hahmann, Markus Dumat, Dirk Habich and Wolfgang Lehner: "Explorative Multi-View Clustering Using Frequent-Groupings"
slides (pdf)
11:35 Discussion and Conclusion

Organizers

Emmanuel Müller
Karlsruhe Institute of Technology (KIT)
Am Fasanengarten 5, 76131 Karlsruhe, Germany
emmanuel.mueller [at] kit.edu
http://www.ipd.kit.edu/~muellere/

Thomas Seidl
RWTH Aachen University
Ahornstrasse 55, 52056 Aachen, Germany
seidl [at] cs.rwth-aachen.de
http://www.dme.rwth-aachen.de/team/seidl

Suresh Venkatasubramanian
School of Computing
University of Utah
Salt Lake City, UT 84112, USA
suresh [at] cs.utah.edu
http://www.cs.utah.edu/~suresh

Arthur Zimek
Department of Computing Science
University of Alberta
Athabasca Hall 423
Edmonton, AB T6G 2E8, Canada
zimek [at] dbs.ifi.lmu.de
http://webdocs.cs.ualberta.ca/~zimek

Program Committee

Ira Assent
James Bailey
Carlotta Domeniconi
Xiaoli Fern
Stephan Günnemann
Francesco Gullo
Shahriar Hossain
Michael Houle
Daniel Keim
Themis Palpanas
Jörg Sander
Andrea Tagarelli
Alexander Topchy
Jilles Vreeken

History of the MultiClust Workshop

1st MultiClust at KDD 2010
- Organizers: Xiaoli Z. Fern, Ian Davidson, Jennifer G. Dy
- webpage
- Summary of the Workshop in SIGKDD Explorations
2nd MultiClust at ECML PKDD 2011
- Organizers: Emmanuel Müller, Stephan Günnemann, Ira Assent, Thomas Seidl
- webpage
- proceedings