Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class DiSH<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<SubspaceModel<V>>>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.DiSH<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<SubspaceModel<V>>>, ClusteringAlgorithm<Clustering<SubspaceModel<V>>,V>, Parameterizable

@Title(value="DiSH: Detecting Subspace cluster Hierarchies")
@Description(value="Algorithm to find hierarchical correlation clusters in subspaces.")
@Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek",
           title="Detection and Visualization of Subspace Cluster Hierarchies",
           booktitle="Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007",
           url="http://dx.doi.org/10.1007/978-3-540-71703-4_15")
public class DiSH<V extends NumberVector<V,?>>
extends AbstractAlgorithm<V,Clustering<SubspaceModel<V>>>
implements ClusteringAlgorithm<Clustering<SubspaceModel<V>>,V>

Algorithm for detecting subspace hierarchies.

Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek: Detection and Visualization of Subspace Cluster Hierarchies.
In Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007.

Author:
Elke Achtert

Field Summary
private  double epsilon
          Holds the value of EPSILON_PARAM.
static OptionID EPSILON_ID
          OptionID for EPSILON_PARAM
private  DoubleParameter EPSILON_PARAM
          Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.
static OptionID MU_ID
          OptionID for MU_PARAM
private  IntParameter MU_PARAM
          Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.
private  OPTICS<V,PreferenceVectorBasedCorrelationDistance> optics
          The optics algorithm to determine the cluster order.
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
DiSH(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
private  void buildHierarchy(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction, List<Cluster<SubspaceModel<V>>> clusters, int dimensionality)
          Builds the cluster hierarchy
private  void checkClusters(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction, Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
          Removes the clusters with size < minpts from the cluster map and adds them to their parents.
private  Clustering<SubspaceModel<V>> computeClusters(Database<V> database, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
          Computes the hierarchical clusters according to the cluster order.
private  Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> extractClusters(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
          Extracts the clusters from the cluster order.
private  Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> findParent(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction, Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> child, Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
          Returns the parent of the specified cluster
private  boolean isParent(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction, Cluster<SubspaceModel<V>> parent, List<Cluster<SubspaceModel<V>>> children)
          Returns true, if the specified parent cluster is a parent of one child of the children clusters.
protected  Clustering<SubspaceModel<V>> runInTime(Database<V> database)
          Performs the DiSH algorithm on the given database.
private  List<Cluster<SubspaceModel<V>>> sortClusters(Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap, int dimensionality)
          Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 

Field Detail

EPSILON_ID

public static final OptionID EPSILON_ID
OptionID for EPSILON_PARAM


EPSILON_PARAM

private final DoubleParameter EPSILON_PARAM
Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.

Default value: 0.001

Key: -dish.epsilon


epsilon

private double epsilon
Holds the value of EPSILON_PARAM.


MU_ID

public static final OptionID MU_ID
OptionID for MU_PARAM


MU_PARAM

private final IntParameter MU_PARAM
Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.

Default value: 1

Key: -dish.mu


optics

private OPTICS<V extends NumberVector<V,?>,PreferenceVectorBasedCorrelationDistance> optics
The optics algorithm to determine the cluster order.

Constructor Detail

DiSH

public DiSH(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

protected Clustering<SubspaceModel<V>> runInTime(Database<V> database)
                                                                    throws IllegalStateException
Performs the DiSH algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

computeClusters

private Clustering<SubspaceModel<V>> computeClusters(Database<V> database,
                                                     ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
Computes the hierarchical clusters according to the cluster order.

Parameters:
database - the database holding the objects
clusterOrder - the cluster order

extractClusters

private Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> extractClusters(Database<V> database,
                                                                                                    DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction,
                                                                                                    ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
Extracts the clusters from the cluster order.

Parameters:
database - the database storing the objects
distanceFunction - the distance function
clusterOrder - the cluster order to extract the clusters from
Returns:
the extracted clusters

sortClusters

private List<Cluster<SubspaceModel<V>>> sortClusters(Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap,
                                                     int dimensionality)
Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.

Parameters:
clustersMap - the mapping of bits sets to clusters
dimensionality - the dimensionality of the data objects
Returns:
a sorted list of the clusters

checkClusters

private void checkClusters(Database<V> database,
                           DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction,
                           Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
Removes the clusters with size < minpts from the cluster map and adds them to their parents.

Parameters:
database - the database storing the objects
distanceFunction - the distance function
clustersMap - the map containing the clusters

findParent

private Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> findParent(Database<V> database,
                                                                             DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction,
                                                                             Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> child,
                                                                             Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
Returns the parent of the specified cluster

Parameters:
database - the database storing the objects
distanceFunction - the distance function
child - the child to search the parent for
clustersMap - the map containing the clusters
Returns:
the parent of the specified cluster

buildHierarchy

private void buildHierarchy(Database<V> database,
                            DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction,
                            List<Cluster<SubspaceModel<V>>> clusters,
                            int dimensionality)
Builds the cluster hierarchy

Parameters:
distanceFunction - the distance function
clusters - the sorted list of clusters
dimensionality - the dimensionality of the data
database - the database containing the data objects

isParent

private boolean isParent(Database<V> database,
                         DiSHDistanceFunction<V,DiSHPreprocessor<V>> distanceFunction,
                         Cluster<SubspaceModel<V>> parent,
                         List<Cluster<SubspaceModel<V>>> children)
Returns true, if the specified parent cluster is a parent of one child of the children clusters.

Parameters:
database - the database containing the objects
distanceFunction - the distance function for distance computation between the clusters
parent - the parent to be tested
children - the list of children to be tested
Returns:
true, if the specified parent cluster is a parent of one child of the children clusters, false otherwise

Release 0.3 (2010-03-31_1612)