3rd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings

in conjunction with 2012 SIAM International Conference on Data Mining, April 26-28, 2012, Anaheim, California, USA

Objectives of the MultiClust Workshop

This cross-disciplinary research topic on multiple clustering solutions has received significant attention in recent years. However, since it is relatively young, important research challenges remain. Specifically, we observe an emerging interest in discovering multiple clustering solutions from very high-dimensional and complex databases. Detecting alternatives while avoiding redundancy is a key challenge for multiple clustering solutions. Toward this goal, important research issues include how to define redundancy among clusterings, whether existing algorithms can be modified to accommodate this goal, how many solutions should be extracted, how to select among far too many possible solutions, how to evaluate and visualize results, and, in brief, how to most effectively help the data analyst find what he or she is looking for. Recent work approaches this problem by looking for non-redundant, alternative, disparate, or orthogonal clusterings. Research in this area benefits from well-established related areas, such as ensemble clustering, constraint-based clustering, frequent pattern mining, theory on summarization of results, consensus mining, and general techniques for coping with complex and high-dimensional databases.

The aim of this workshop is to establish a venue for the growing community interested in multiple clustering solutions. It should increase the visibility of the topic itself but also bridge it to closely related research areas such as ensemble clustering, co-clustering, clustering with constraints, and frequent pattern mining. As a platform for exchange of ideas, the workshop should attract both newcomers and experts in this area.

Description

Content

In today's applications, data is collected for multiple analysis tasks. For any data object, several features or measurements provide a variety of information in complex and high-dimensional databases. In such data, one typically observes several valid groupings for each object, i.e., objects fit different roles. For example, in customer segmentation, any customer might show multiple behaviors or properties that suggest that the customer is part of several distinct clusters, depending on the aspect considered. In domains such as sensor networks, each sensor node can be a member of multiple clusters according to different environmental events. In gene expression analysis, objects should be detected in multiple clusters due to the various functions of each gene. In general, multiple groupings are desired by many applications as they characterize different views of the data. In contrast to these application demands, traditional clustering techniques detect only a single grouping and miss the alternative clusterings.

Similarly, the topic of multiple clustering solutions itself fits into several roles. Both multiple alternative solutions and a single consensus derived from multiple clusterings by ensemble techniques are important perspectives on this research field. Looking at the given information, one can distinguish two perspectives: the use of given views in multi-source clustering, in contrast to the detection of novel views by feature selection and space transformation techniques. Further perspectives can be derived by looking at the underlying data: from traditional continuous-valued vector spaces up to complex databases (e.g. graphs, sequences, or streams). In all of these areas, multiple clustering solutions have opened novel research challenges. Ideas to solve these problems come from a variety of traditional mining paradigms. Frequent itemset mining, ensemble mining, and constraint-based mining are only a few of the related fields from machine learning and knowledge discovery.

Topics of Interest

The workshop covers several aspects of multiple clustering solutions and of related research fields. A non-exhaustive list of topics of interest is given below:

Format

The workshop shall comprise invited talks, technical talks for peer-reviewed paper contributions, and a panel discussion. The talks shall provide deep technical insights into related fields (invited talks) and emerging research work (paper presentations). In the spirit of the previous workshops (held at KDD 2010 and ECML PKDD 2011), the panel opens a discussion of the state of the art, open challenges, and visions for future research.

Target Audience

The target audience consists of researchers and practitioners working on clustering. Besides the researchers directly working on non-redundant clustering, alternative clustering, ensemble clustering, subspace clustering, and clustering with constraints, we will also actively encourage other researchers to submit and attend the workshop. Overall we cover three major groups of potential attendees:

Submission Guidelines

We invite submission of unpublished original research papers that are not under review elsewhere. All papers will be peer reviewed. Papers may be up to 8 pages long. We also invite vision papers and descriptions of work-in-progress or case studies on benchmark data as short paper submissions of up to 4 pages. If accepted, at least one of the authors must attend the workshop to present the work.

Contributions should be submitted in PDF format using the workshop’s EasyChair submission site at http://www.easychair.org/conferences/?conf=multiclust2012. The submitted papers must be written in English and formatted according to the SDM 2012 submission guidelines. We would like to encourage you to prepare your paper in LaTeX2e. Papers should be formatted using the SIAM SODA macro, which is available through the SIAM website at http://www.siam.org/proceedings/macros.php. The filename is soda2e.all. Make sure you use the macros for SODA and Data Mining Proceedings; papers prepared using other proceedings macros will not be accepted.

If you are considering submitting to the workshop and have questions regarding the workshop scope or need further information, please do not hesitate to contact the PC chairs.

Important Dates

paper submissions due Jan 13, 2012 (EXTENDED: Jan 25, 2012)
notification of acceptance Feb 7, 2012
camera-ready due Feb 14, 2012

Proceedings

Proceedings are available here: MultiClust 2012 proceedings

Note that a special issue of the Machine Learning journal will cover the MultiClust topics and is open for submissions.

Information for Participants

The workshop is planned for Saturday, April 28, 2012, 8:30AM - 12:00PM (see SIAM SDM conference program for location information).

Participants who do not plan to attend the entire SIAM Data Mining conference can register for the workshop only (Saturday only). Students can register at a reduced price. More information is available at: http://www.siam.org/meetings/sdm12/reginfo.php

For general information regarding the main conference, SIAM Data Mining, see http://www.siam.org/meetings/sdm12/general.php. Visa information for international speakers is available at the bottom of that page.

Invited Speaker

Carlotta Domeniconi (George Mason University, USA):
"Subspace Clustering Ensembles"

Schedule

Saturday, April 28, 2012, 8:30AM - 12:00PM (see SIAM SDM conference program for location information).

The talks are 20+5 minutes for long papers and 10+5 minutes for short papers. We ask the authors to prepare talks that encourage discussion.

The program is as follows:

Organizers

Program Committee


History of the MultiClust Workshop