Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class DiSH<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<AxesModel>>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.DiSH<V>
Type Parameters:
V - the type of Realvector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<AxesModel>>, ClusteringAlgorithm<Clustering<AxesModel>,V>, Parameterizable

public class DiSH<V extends RealVector<V,?>>
extends AbstractAlgorithm<V,Clustering<AxesModel>>
implements ClusteringAlgorithm<Clustering<AxesModel>,V>

Algorithm for detecting subspace hierarchies.

Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek: Detection and Visualization of Subspace Cluster Hierarchies.
In Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007.

Author:
Elke Achtert

Field Summary
private  double epsilon
          Holds the value of EPSILON_PARAM.
static OptionID EPSILON_ID
          OptionID for EPSILON_PARAM
private  DoubleParameter EPSILON_PARAM
          Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.
static OptionID MU_ID
          OptionID for MU_PARAM
private  IntParameter MU_PARAM
          Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.
private  OPTICS<V,PreferenceVectorBasedCorrelationDistance> optics
          The optics algorithm to determine the cluster order.
private  Clustering<AxesModel> result
          Holds the result;
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
DiSH()
          Provides the DiSH algorithm, adding parameters EPSILON_PARAM and MU_PARAM to the option handler additionally to parameters of super class.
 
Method Summary
private  void buildHierarchy(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, List<Cluster<AxesModel>> clusters, int dimensionality)
          Builds the cluster hierarchy
private  void checkClusters(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
          Removes the clusters with size < minpts from the cluster map and adds them to their parents.
private  void computeClusters(Database<V> database, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
          Computes the hierarchical clusters according to the cluster order.
private  Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> extractClusters(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
          Extracts the clusters from the cluster order.
private  Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> findParent(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> child, Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
          Returns the parent of the specified cluster
 Description getDescription()
          Returns a description of the algorithm.
 Clustering<AxesModel> getResult()
          Returns the result of the algorithm.
private  boolean isParent(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, Cluster<AxesModel> parent, List<Cluster<AxesModel>> children)
          Returns true, if the specified parent cluster is a parent of one child of the children clusters.
protected  Clustering<AxesModel> runInTime(Database<V> database)
          Performs the DiSH algorithm on the given database.
 List<String> setParameters(List<String> args)
          Calls the super method and sets additionally the value of the parameters EPSILON_PARAM and MU_PARAM.
private  List<Cluster<AxesModel>> sortClusters(Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap, int dimensionality)
          Sets the levels and indices in the clusters and returns a sorted list of the clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, addParameterizable, addParameterizable, checkGlobalParameterConstraints, collectOptions, getAttributeSettings, getParameters, rememberParametersExcept, removeOption, removeParameterizable, shortDescription
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, collectOptions, getParameters, shortDescription
 

Field Detail

EPSILON_ID

public static final OptionID EPSILON_ID
OptionID for EPSILON_PARAM


EPSILON_PARAM

private final DoubleParameter EPSILON_PARAM
Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.

Default value: 0.001

Key: -dish.epsilon


epsilon

private double epsilon
Holds the value of EPSILON_PARAM.


MU_ID

public static final OptionID MU_ID
OptionID for MU_PARAM


MU_PARAM

private final IntParameter MU_PARAM
Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.

Default value: 1

Key: -dish.mu


optics

private OPTICS<V extends RealVector<V,?>,PreferenceVectorBasedCorrelationDistance> optics
The optics algorithm to determine the cluster order.


result

private Clustering<AxesModel> result
Holds the result;

Constructor Detail

DiSH

public DiSH()
Provides the DiSH algorithm, adding parameters EPSILON_PARAM and MU_PARAM to the option handler additionally to parameters of super class.

Method Detail

runInTime

protected Clustering<AxesModel> runInTime(Database<V> database)
                                   throws IllegalStateException
Performs the DiSH algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>,Clustering<AxesModel>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

getResult

public Clustering<AxesModel> getResult()
Returns the result of the algorithm.

Specified by:
getResult in interface Algorithm<V extends RealVector<V,?>,Clustering<AxesModel>>
Specified by:
getResult in interface ClusteringAlgorithm<Clustering<AxesModel>,V extends RealVector<V,?>>
Returns:
the result of the algorithm

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Specified by:
getDescription in interface Algorithm<V extends RealVector<V,?>,Clustering<AxesModel>>
Returns:
a description of the algorithm

setParameters

public List<String> setParameters(List<String> args)
                           throws ParameterException
Calls the super method and sets additionally the value of the parameters EPSILON_PARAM and MU_PARAM. Then the parameters for the algorithm optics are set.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class AbstractAlgorithm<V extends RealVector<V,?>,Clustering<AxesModel>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
a list containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting

computeClusters

private void computeClusters(Database<V> database,
                             ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
Computes the hierarchical clusters according to the cluster order.

Parameters:
database - the database holding the objects
clusterOrder - the cluster order

extractClusters

private Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> extractClusters(Database<V> database,
                                                                                                    DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                                                                                                    ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
Extracts the clusters from the cluster order.

Parameters:
database - the database storing the objects
distanceFunction - the distance function
clusterOrder - the cluster order to extract the clusters from
Returns:
the extracted clusters

sortClusters

private List<Cluster<AxesModel>> sortClusters(Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap,
                                              int dimensionality)
Sets the levels and indices in the clusters and returns a sorted list of the clusters.

Parameters:
clustersMap - the mapping of bits sets to clusters
dimensionality - the dimensionality of the data
Returns:
a sorted list of the clusters

checkClusters

private void checkClusters(Database<V> database,
                           DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                           Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
Removes the clusters with size < minpts from the cluster map and adds them to their parents.

Parameters:
database - the database storing the objects
distanceFunction - the distance function
clustersMap - the map containing the clusters

findParent

private Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> findParent(Database<V> database,
                                                                             DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                                                                             Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>> child,
                                                                             Map<BitSet,List<Pair<BitSet,DatabaseObjectGroupCollection<List<Integer>>>>> clustersMap)
Returns the parent of the specified cluster

Parameters:
database - the database storing the objects
distanceFunction - the distance function
child - the child to search teh parent for
clustersMap - the map containing the clusters
Returns:
the parent of the specified cluster

buildHierarchy

private void buildHierarchy(Database<V> database,
                            DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                            List<Cluster<AxesModel>> clusters,
                            int dimensionality)
Builds the cluster hierarchy

Parameters:
distanceFunction - the distance function
clusters - the sorted list of clusters
dimensionality - the dimensionality of the data
database - the database containing the data objects

isParent

private boolean isParent(Database<V> database,
                         DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                         Cluster<AxesModel> parent,
                         List<Cluster<AxesModel>> children)
Returns true, if the specified parent cluster is a parent of one child of the children clusters.

Parameters:
database - the database containing the objects
distanceFunction - the distance function for distance computation between the clusters
parent - the parent to be tested
children - the list of children to be tested
Returns:
true, if the specified parent cluster is a parent of one child of the children clusters, false otherwise

Release 0.2 (2009-07-06_1820)