Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class DiSH<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.DiSH<V>
Type Parameters:
V - the type of Realvector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V>, Loggable, Parameterizable

public class DiSH<V extends RealVector<V,?>>
extends AbstractAlgorithm<V>

Algorithm for detecting subspace hierarchies.

Author:
Elke Achtert

Field Summary
private  double epsilon
          Holds the value of epsilon parameter.
private  DoubleParameter EPSILON_PARAM
          Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.
private  IntParameter MU_PARAM
          Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.
private  OPTICS<V,PreferenceVectorBasedCorrelationDistance> optics
          The optics algorithm to determine the cluster order.
private  Result<V> result
          Holds the result;
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug
 
Constructor Summary
DiSH()
          Provides a new algorithm for detecting supspace hierarchies.
 
Method Summary
private  void buildHierarchy(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, List<HierarchicalAxesParallelCorrelationCluster> clusters, int dimensionality)
          Builds the cluster hierarchy
private  void checkClusters(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> clustersMap)
          Removes the clusters with size < minpts from the cluster map and adds them to their parents.
private  void computeClusters(Database<V> database, ClusterOrder<V,PreferenceVectorBasedCorrelationDistance> clusterOrder)
          Computes the hierarchical clusters according to the cluster order.
private  Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> extractClusters(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, ClusterOrder<V,PreferenceVectorBasedCorrelationDistance> clusterOrder)
          Extracts the clusters from the cluster order.
private  HierarchicalAxesParallelCorrelationCluster findParent(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, HierarchicalAxesParallelCorrelationCluster child, Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> clustersMap)
          Returns the parent of the specified cluster
 List<AttributeSettings> getAttributeSettings()
          Returns the settings of all options assigned to the option handler.
 Description getDescription()
          Returns a description of the algorithm.
 Result<V> getResult()
          Returns the result of the algorithm.
private  boolean isParent(Database<V> database, DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction, HierarchicalAxesParallelCorrelationCluster parent, List<HierarchicalAxesParallelCorrelationCluster> children)
          Returns true, if the specified parent cluster is a parent of one child of the children clusters.
protected  void runInTime(Database<V> database)
          The run method encapsulated in measure of runtime.
 String[] setParameters(String[] args)
          Grabs all specified options from the option handler and sets the values for the flags AbstractAlgorithm.VERBOSE_FLAG and AbstractAlgorithm.TIME_FLAG.
private  List<HierarchicalAxesParallelCorrelationCluster> sortClusters(Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> clustersMap, int dimensionality)
          Sets the levels and indices in the clusters and returns a sorted list of the clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
description, isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, checkGlobalParameterConstraints, deleteOption, description, description, getParameters, getParameterValue, getPossibleOptions, inlineDescription, isSet, setParameters
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, message, progress, progress, progress, verbose, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, getParameters, getPossibleOptions, inlineDescription
 

Field Detail

EPSILON_PARAM

private final DoubleParameter EPSILON_PARAM
Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.

Default value: 0.001

Key: -dish.epsilon


MU_PARAM

private final IntParameter MU_PARAM
Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.

Default value: 1

Key: -dish.mu


optics

private OPTICS<V extends RealVector<V,?>,PreferenceVectorBasedCorrelationDistance> optics
The optics algorithm to determine the cluster order.


result

private Result<V extends RealVector<V,?>> result
Holds the result;


epsilon

private double epsilon
Holds the value of epsilon parameter.

Constructor Detail

DiSH

public DiSH()
Provides a new algorithm for detecting supspace hierarchies.

Method Detail

runInTime

protected void runInTime(Database<V> database)
                  throws IllegalStateException
The run method encapsulated in measure of runtime. An extending class needs not to take care of runtime itself.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

getResult

public Result<V> getResult()
Returns the result of the algorithm.

Returns:
the result of the algorithm

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Returns:
a description of the algorithm
See Also:
Algorithm.getDescription()

setParameters

public String[] setParameters(String[] args)
                       throws ParameterException
Description copied from class: AbstractAlgorithm
Grabs all specified options from the option handler and sets the values for the flags AbstractAlgorithm.VERBOSE_FLAG and AbstractAlgorithm.TIME_FLAG. Any extending class should call this method first and return the returned array without further changes, but after setting further required parameters. An example for overwritting this method taking advantage from the previously (in superclasses) defined options would be:

 {
   String[] remainingParameters = super.setParameters(args);
   // set parameters for your class
   // for example like this:
   if(isSet(MY_PARAM_VALUE_PARAM))
   {
      myParamValue = getParameterValue(MY_PARAM_VALUE_PARAM);
   }
   .
   .
   .
   return remainingParameters;
   // or in case of attributes requesting parameters themselves
   // return parameterizableAttribbute.setParameters(remainingParameters);
 }
 

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class AbstractAlgorithm<V extends RealVector<V,?>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
String[] an array containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting
See Also:
AbstractAlgorithm.setParameters(String[])

getAttributeSettings

public List<AttributeSettings> getAttributeSettings()
Description copied from class: AbstractParameterizable
Returns the settings of all options assigned to the option handler.

Specified by:
getAttributeSettings in interface Parameterizable
Overrides:
getAttributeSettings in class AbstractParameterizable
Returns:
the setting of the attributes of the parameterizable
See Also:
Parameterizable.getAttributeSettings()

computeClusters

private void computeClusters(Database<V> database,
                             ClusterOrder<V,PreferenceVectorBasedCorrelationDistance> clusterOrder)
Computes the hierarchical clusters according to the cluster order.

Parameters:
database - the database holding the objects
clusterOrder - the cluster order

extractClusters

private Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> extractClusters(Database<V> database,
                                                                                     DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                                                                                     ClusterOrder<V,PreferenceVectorBasedCorrelationDistance> clusterOrder)
Extracts the clusters from the cluster order.

Parameters:
database - the database storing the objects
distanceFunction - the distance function
clusterOrder - the cluster order to extract the clusters from
Returns:
the extracted clusters

sortClusters

private List<HierarchicalAxesParallelCorrelationCluster> sortClusters(Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> clustersMap,
                                                                      int dimensionality)
Sets the levels and indices in the clusters and returns a sorted list of the clusters.

Parameters:
clustersMap - the mapping of bits sets to clusters
dimensionality - the dimensionality of the data
Returns:
a sorted list of the clusters

checkClusters

private void checkClusters(Database<V> database,
                           DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                           Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> clustersMap)
Removes the clusters with size < minpts from the cluster map and adds them to their parents.

Parameters:
database - the database storing the objects
distanceFunction - the distance function
clustersMap - the map containing the clusters

findParent

private HierarchicalAxesParallelCorrelationCluster findParent(Database<V> database,
                                                              DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                                                              HierarchicalAxesParallelCorrelationCluster child,
                                                              Map<BitSet,List<HierarchicalAxesParallelCorrelationCluster>> clustersMap)
Returns the parent of the specified cluster

Parameters:
database - the database storing the objects
distanceFunction - the distance function
child - the child to search teh parent for
clustersMap - the map containing the clusters
Returns:
the parent of the specified cluster

buildHierarchy

private void buildHierarchy(Database<V> database,
                            DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                            List<HierarchicalAxesParallelCorrelationCluster> clusters,
                            int dimensionality)
Builds the cluster hierarchy

Parameters:
distanceFunction - the distance function
clusters - the sorted list of clusters
dimensionality - the dimensionality of the data
database - the fatabase containing the data objects

isParent

private boolean isParent(Database<V> database,
                         DiSHDistanceFunction<V,DiSHPreprocessor<V,?>> distanceFunction,
                         HierarchicalAxesParallelCorrelationCluster parent,
                         List<HierarchicalAxesParallelCorrelationCluster> children)
Returns true, if the specified parent cluster is a parent of one child of the children clusters.

Parameters:
database - the database containing the objects
distanceFunction - the distance function for distance computation between the clusters
parent - the parent to be tested
children - the list of children to be tested
Returns:
true, if the specified parent cluster is a parent of one child of the children clusters, false otherwise

Release 0.1 (2008-07-10_1838)