Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.correlation
Class CASH

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<ParameterizationFunction>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.CASH
All Implemented Interfaces:
Algorithm<ParameterizationFunction>, Loggable, Parameterizable

public class CASH
extends AbstractAlgorithm<ParameterizationFunction>

Subspace clustering algorithm based on the hough transform. todo elke hierarchy (later)

Author:
Elke Achtert

Field Summary
private  boolean adjust
          Flag indicating that an adjustment of the applied heuristic for choosing an interval is performed after an interval is selected.
private  Flag ADJUST_FLAG
          Flag to indicate that an adjustment of the applied heuristic for choosing an interval is performed after an interval is selected.
private  Database<ParameterizationFunction> database
          The database holding the objects.
private  double jitter
          The maximum allowed jitter for distance values.
private  DoubleParameter JITTER_PARAM
          Parameter to specify the maximum jitter for distance values, must be a double greater than 0.
private  int maxLevel
          The maximum level for splitting the hypercube.
private  IntParameter MAXLEVEL_PARAM
          Parameter to specify the maximum level for splitting the hypercube, must be an integer greater than 0.
private  int minDim
          The minmum dimensionality for the subspaces to be found.
private  IntParameter MINDIM_PARAM
          Parameter to specify the minimum dimensionality of the subspaces to be found, must be an integer greater than 0.
private  int minPts
          Minimum points in a cluster.
private  IntParameter MINPTS_PARAM
          Parameter to specify the threshold for minimum number of points in a cluster, must be an integer greater than 0.
private  int noiseDim
          Holds the dimensionality for noise.
private  Set<Integer> processedIDs
          Holds a set of processed ids.
private  CASHResult result
          The result.
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug
 
Constructor Summary
CASH()
          Provides a new CASH algorithm.
 
Method Summary
private  Database<ParameterizationFunction> buildDB(int dim, Matrix basis, Set<Integer> ids, Database<ParameterizationFunction> database)
          Builds a dim-1 dimensional database where the objects are projected into the specified subspace.
private  Database<RealVector> buildDerivatorDB(Database<ParameterizationFunction> database, CASHInterval interval)
          Builds a database for the derivator consisting of the ids in the specified interval.
private  Matrix determineBasis(double[] alpha)
          Determines a basis defining a subspace described by the specified alpha values.
private  double[] determineMinMaxDistance(Database<ParameterizationFunction> database, int dimensionality)
          Determines the minimum and maximum function value of all parametrization functions stored in the specified database.
private  CASHInterval determineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap)
          Determines the next ''best'' interval at maximum level, i.e. the next interval containing the most unprocessed obejcts.
private  CASHInterval doDetermineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap)
          Recursive helper method to determine the next ''best'' interval at maximum level, i.e. the next interval containing the most unprocessed obejcts
private  SubspaceClusterMap doRun(Database<ParameterizationFunction> database, Progress progress)
          Runs the CASH algorithm on the specified database, this method is recursively called until only noise is left.
private  Set<Integer> getDatabaseIDs(Database<ParameterizationFunction> database)
          Returns the set of ids belonging to the specified database.
 Description getDescription()
          Returns a description of the algorithm.
 Result<ParameterizationFunction> getResult()
          Returns the result of the algorithm.
private  void initHeap(DefaultHeap<Integer,CASHInterval> heap, Database<ParameterizationFunction> database, int dim, Set<Integer> ids)
          Initializes the heap with the root intervals.
private  ParameterizationFunction project(Matrix basis, ParameterizationFunction f)
          Projects the specified parametrization function into the subspace described by the given basis.
private  Matrix runDerivator(Database<ParameterizationFunction> database, int dim, CASHInterval interval, Set<Integer> ids)
          Runs the derivator on the specified inerval and assigns all points having a distance less then the standard deviation of the derivator model to the model to this model.
protected  void runInTime(Database<ParameterizationFunction> database)
          The run method encapsulated in measure of runtime.
 String[] setParameters(String[] args)
          Grabs all specified options from the option handler and sets the values for the flags AbstractAlgorithm.VERBOSE_FLAG and AbstractAlgorithm.TIME_FLAG.
private  double sinusProduct(int start, int end, double[] alpha)
          Computes the product of all sinus values of the specified angles from start to end index.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
description, isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, checkGlobalParameterConstraints, deleteOption, description, description, getAttributeSettings, getParameters, getParameterValue, getPossibleOptions, inlineDescription, isSet, setParameters
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, message, progress, progress, progress, verbose, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, getAttributeSettings, getParameters, getPossibleOptions, inlineDescription
 

Field Detail

MINPTS_PARAM

private final IntParameter MINPTS_PARAM
Parameter to specify the threshold for minimum number of points in a cluster, must be an integer greater than 0.

Key: -cash.minpts


MAXLEVEL_PARAM

private final IntParameter MAXLEVEL_PARAM
Parameter to specify the maximum level for splitting the hypercube, must be an integer greater than 0.

Key: -cash.maxlevel


MINDIM_PARAM

private final IntParameter MINDIM_PARAM
Parameter to specify the minimum dimensionality of the subspaces to be found, must be an integer greater than 0.

Default value: 1

Key: -cash.mindim


JITTER_PARAM

private final DoubleParameter JITTER_PARAM
Parameter to specify the maximum jitter for distance values, must be a double greater than 0.

Key: -cash.jitter


ADJUST_FLAG

private final Flag ADJUST_FLAG
Flag to indicate that an adjustment of the applied heuristic for choosing an interval is performed after an interval is selected.

Key: -cash.adjust


result

private CASHResult result
The result.


minPts

private int minPts
Minimum points in a cluster.


adjust

private boolean adjust
Flag indicating that an adjustment of the applied heuristic for choosing an interval is performed after an interval is selected.


maxLevel

private int maxLevel
The maximum level for splitting the hypercube.


minDim

private int minDim
The minmum dimensionality for the subspaces to be found.


jitter

private double jitter
The maximum allowed jitter for distance values.


noiseDim

private int noiseDim
Holds the dimensionality for noise.


processedIDs

private Set<Integer> processedIDs
Holds a set of processed ids.


database

private Database<ParameterizationFunction> database
The database holding the objects.

Constructor Detail

CASH

public CASH()
Provides a new CASH algorithm.

Method Detail

runInTime

protected void runInTime(Database<ParameterizationFunction> database)
                  throws IllegalStateException
The run method encapsulated in measure of runtime. An extending class needs not to take care of runtime itself.

Specified by:
runInTime in class AbstractAlgorithm<ParameterizationFunction>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

getResult

public Result<ParameterizationFunction> getResult()
Returns the result of the algorithm.

Returns:
the result of the algorithm

getDescription

public Description getDescription()
Returns a description of the algorithm.

Returns:
a description of the algorithm

setParameters

public String[] setParameters(String[] args)
                       throws ParameterException
Description copied from class: AbstractAlgorithm
Grabs all specified options from the option handler and sets the values for the flags AbstractAlgorithm.VERBOSE_FLAG and AbstractAlgorithm.TIME_FLAG. Any extending class should call this method first and return the returned array without further changes, but after setting further required parameters. An example for overwritting this method taking advantage from the previously (in superclasses) defined options would be:

 {
   String[] remainingParameters = super.setParameters(args);
   // set parameters for your class
   // for example like this:
   if(isSet(MY_PARAM_VALUE_PARAM))
   {
      myParamValue = getParameterValue(MY_PARAM_VALUE_PARAM);
   }
   .
   .
   .
   return remainingParameters;
   // or in case of attributes requesting parameters themselves
   // return parameterizableAttribbute.setParameters(remainingParameters);
 }
 

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class AbstractAlgorithm<ParameterizationFunction>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
String[] an array containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting
See Also:
Parameterizable.setParameters(String[])

doRun

private SubspaceClusterMap doRun(Database<ParameterizationFunction> database,
                                 Progress progress)
                          throws UnableToComplyException,
                                 ParameterException,
                                 NonNumericFeaturesException
Runs the CASH algorithm on the specified database, this method is recursively called until only noise is left.

Parameters:
database - the current database to run the CASH algorithm on
progress - the progress object for verbose messages
Returns:
a mapping of subspace dimensionalites to clusters
Throws:
UnableToComplyException - if an error according to the database occurs
ParameterException - if the parameter setting is wrong
NonNumericFeaturesException - if non numeric feature vectors are used

initHeap

private void initHeap(DefaultHeap<Integer,CASHInterval> heap,
                      Database<ParameterizationFunction> database,
                      int dim,
                      Set<Integer> ids)
Initializes the heap with the root intervals.

Parameters:
heap - the heap to be initialized
database - the database storing the paramterization functions
dim - the dimensionality of the database
ids - the ids of the database

buildDB

private Database<ParameterizationFunction> buildDB(int dim,
                                                   Matrix basis,
                                                   Set<Integer> ids,
                                                   Database<ParameterizationFunction> database)
                                            throws UnableToComplyException
Builds a dim-1 dimensional database where the objects are projected into the specified subspace.

Parameters:
dim - the dimensionality of the database
basis - the basis defining the subspace
ids - the ids for the new database
database - the database storing the paramterization functions
Returns:
a dim-1 dimensional database where the objects are projected into the specified subspace
Throws:
UnableToComplyException - if an error according to the database occurs

project

private ParameterizationFunction project(Matrix basis,
                                         ParameterizationFunction f)
Projects the specified parametrization function into the subspace described by the given basis.

Parameters:
basis - the basis defining he subspace
f - the parametrization function to be projected
Returns:
the projected parametrization function

determineBasis

private Matrix determineBasis(double[] alpha)
Determines a basis defining a subspace described by the specified alpha values.

Parameters:
alpha - the alpha values
Returns:
a basis defining a subspace described by the specified alpha values

sinusProduct

private double sinusProduct(int start,
                            int end,
                            double[] alpha)
Computes the product of all sinus values of the specified angles from start to end index.

Parameters:
start - the index to start
end - the index to end
alpha - the array of angles
Returns:
the product of all sinus values of the specified angles from start to end index

determineNextIntervalAtMaxLevel

private CASHInterval determineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap)
Determines the next ''best'' interval at maximum level, i.e. the next interval containing the most unprocessed obejcts.

Parameters:
heap - the heap storing the intervals
Returns:
the next ''best'' interval at maximum level

doDetermineNextIntervalAtMaxLevel

private CASHInterval doDetermineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap)
Recursive helper method to determine the next ''best'' interval at maximum level, i.e. the next interval containing the most unprocessed obejcts

Parameters:
heap - the heap storing the intervals
Returns:
the next ''best'' interval at maximum level

getDatabaseIDs

private Set<Integer> getDatabaseIDs(Database<ParameterizationFunction> database)
Returns the set of ids belonging to the specified database.

Parameters:
database - the database containing the parametrization functions.
Returns:
the set of ids belonging to the specified database

determineMinMaxDistance

private double[] determineMinMaxDistance(Database<ParameterizationFunction> database,
                                         int dimensionality)
Determines the minimum and maximum function value of all parametrization functions stored in the specified database.

Parameters:
database - the database containing the parametrization functions.
dimensionality - the dimensionality of the database
Returns:
an array containing the minimum and maximum function value of all parametrization functions stored in the specified database

runDerivator

private Matrix runDerivator(Database<ParameterizationFunction> database,
                            int dim,
                            CASHInterval interval,
                            Set<Integer> ids)
                     throws UnableToComplyException,
                            ParameterException
Runs the derivator on the specified inerval and assigns all points having a distance less then the standard deviation of the derivator model to the model to this model.

Parameters:
database - the database containing the parametrization functions
interval - the interval to build the model
dim - the dimensinality of the database
ids - an empty set to assign the ids
Returns:
a basis of the found subspace
Throws:
UnableToComplyException - if an error according to the database occurs
ParameterException - if the parameter setting is wrong

buildDerivatorDB

private Database<RealVector> buildDerivatorDB(Database<ParameterizationFunction> database,
                                              CASHInterval interval)
                                       throws UnableToComplyException
Builds a database for the derivator consisting of the ids in the specified interval.

Parameters:
database - the database storing the paramterization functions
interval - the interval to build the database from
Returns:
a database for the derivator consisting of the ids in the specified interval
Throws:
UnableToComplyException - if an error according to the database occurs

Release 0.1 (2008-07-10_1838)