Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class CLIQUE<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.CLIQUE<V>
Type Parameters:
V - the type of RealVector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V>, Clustering<V>, Loggable, Parameterizable

public class CLIQUE<V extends RealVector<V,?>>
extends AbstractAlgorithm<V>
implements Clustering<V>

Implementation of the CLIQUE algorithm, a grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.

The implementation consists of two steps:br> 1. Identification of subspaces that contain clusters
2. Identification of clusters

The third step of the original algorithm (Generation of minimal description for the clusters) is not (yet) implemented.

Author:
Elke Achtert

Field Summary
private  boolean prune
          Flag indicating that subspaces with low coverage are pruned.
private  Flag PRUNE_FLAG
          Flag to indicate that that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.
private  Clusters<V> result
          The result of the algorithm;
private  double tau
          Threshold for the selectivity of a unit.
private  DoubleParameter TAU_PARAM
          Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.
private  int xsi
          Number of units in each dimension.
private  IntParameter XSI_PARAM
          Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug
 
Constructor Summary
CLIQUE()
           
 
Method Summary
private  double[][] computeDiffs(SortedSet<CLIQUESubspace<V>> denseSubspaces, int[] mi, int[] mp)
          The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P.
private  int[][] computeMeans(SortedSet<CLIQUESubspace<V>> denseSubspaces)
          The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P.
private  Map<CLIQUEModel<V>,Set<Integer>> determineClusters(Database<V> database, SortedSet<CLIQUESubspace<V>> denseSubspaces)
          Determines the clusters in the specified dense subspaces.
private  SortedSet<CLIQUESubspace<V>> findDenseSubspaceCandidates(Database<V> database, Set<CLIQUESubspace<V>> denseSubspaces)
          Determines the k-dimensional dense subspace candidates.
private  SortedSet<CLIQUESubspace<V>> findDenseSubspaces(Database<V> database, SortedSet<CLIQUESubspace<V>> denseSubspaces)
          Determines the k>1 dimensional dense subspaces and performs a pruning if this option is chosen.
private  SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Database<V> database)
          Determines the one-dimensional dense subspace candidates by making a pass over the database.
private  SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Database<V> database)
          Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.
 Description getDescription()
          Returns a description of the algorithm.
 ClusteringResult<V> getResult()
          Returns the result of the algorithm.
private  Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Database<V> database)
          Initializes and returns the one dimensional units.
private  SortedSet<CLIQUESubspace<V>> pruneDenseSubspaces(SortedSet<CLIQUESubspace<V>> denseSubspaces)
          Performs a MDL-based pruning of the specified dense sunspaces as described in the CLIQUE algorithm.
protected  void runInTime(Database<V> database)
          Performs the CLIQUE algorithm on the given database.
 String[] setParameters(String[] args)
          Sets the parameters xsi, tau and prune additionally to the parameters set by the super-class' method.
private  void updateMinMax(V featureVector, double[] minima, double[] maxima)
          Updates the minima and maxima array according to the specified feature vector.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
description, isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, checkGlobalParameterConstraints, deleteOption, description, description, getAttributeSettings, getParameters, getParameterValue, getPossibleOptions, inlineDescription, isSet, setParameters
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, message, progress, progress, progress, verbose, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
run, setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, description, getAttributeSettings, getParameters, getPossibleOptions, inlineDescription
 

Field Detail

XSI_PARAM

private final IntParameter XSI_PARAM
Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.

Key: -clique.xsi


TAU_PARAM

private final DoubleParameter TAU_PARAM
Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.

Key: -clique.tau


PRUNE_FLAG

private final Flag PRUNE_FLAG
Flag to indicate that that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.

Key: -clique.prune


xsi

private int xsi
Number of units in each dimension.


tau

private double tau
Threshold for the selectivity of a unit.


prune

private boolean prune
Flag indicating that subspaces with low coverage are pruned.


result

private Clusters<V extends RealVector<V,?>> result
The result of the algorithm;

Constructor Detail

CLIQUE

public CLIQUE()
Method Detail

getResult

public ClusteringResult<V> getResult()
Returns the result of the algorithm.

Specified by:
getResult in interface Algorithm<V extends RealVector<V,?>>
Specified by:
getResult in interface Clustering<V extends RealVector<V,?>>
Returns:
the result of the algorithm
See Also:
Algorithm.getResult()

getDescription

public Description getDescription()
Returns a description of the algorithm.

Specified by:
getDescription in interface Algorithm<V extends RealVector<V,?>>
Returns:
a description of the algorithm

runInTime

protected void runInTime(Database<V> database)
                  throws IllegalStateException
Performs the CLIQUE algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

setParameters

public String[] setParameters(String[] args)
                       throws ParameterException
Sets the parameters xsi, tau and prune additionally to the parameters set by the super-class' method.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class AbstractAlgorithm<V extends RealVector<V,?>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
String[] an array containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting
See Also:
Parameterizable.setParameters(String[])

determineClusters

private Map<CLIQUEModel<V>,Set<Integer>> determineClusters(Database<V> database,
                                                           SortedSet<CLIQUESubspace<V>> denseSubspaces)
Determines the clusters in the specified dense subspaces.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the dense subspaces
Returns:
the clusters in the specified dense subspaces and the corresponding cluster models

findOneDimensionalDenseSubspaces

private SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Database<V> database)
Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
Returns:
the one dimensional dense subspaces

findDenseSubspaces

private SortedSet<CLIQUESubspace<V>> findDenseSubspaces(Database<V> database,
                                                        SortedSet<CLIQUESubspace<V>> denseSubspaces)
Determines the k>1 dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspaces
Returns:
the k-dimensional dense subspaces

initOneDimensionalUnits

private Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Database<V> database)
Initializes and returns the one dimensional units.

Parameters:
database - the database to run the algorithm on
Returns:
the created one dimensional units

updateMinMax

private void updateMinMax(V featureVector,
                          double[] minima,
                          double[] maxima)
Updates the minima and maxima array according to the specified feature vector.

Parameters:
featureVector - the feature vector
minima - the array of minima
maxima - the array of maxima

findOneDimensionalDenseSubspaceCandidates

private SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Database<V> database)
Determines the one-dimensional dense subspace candidates by making a pass over the database.

Parameters:
database - the database to run the algorithm on
Returns:
the one-dimensional dense subspace candidates

findDenseSubspaceCandidates

private SortedSet<CLIQUESubspace<V>> findDenseSubspaceCandidates(Database<V> database,
                                                                 Set<CLIQUESubspace<V>> denseSubspaces)
Determines the k-dimensional dense subspace candidates.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspace
Returns:
the k-dimensional dense subspace candidates

pruneDenseSubspaces

private SortedSet<CLIQUESubspace<V>> pruneDenseSubspaces(SortedSet<CLIQUESubspace<V>> denseSubspaces)
Performs a MDL-based pruning of the specified dense sunspaces as described in the CLIQUE algorithm.

Parameters:
denseSubspaces - the subspaces to be pruned
Returns:
the subspaces which are not pruned

computeMeans

private int[][] computeMeans(SortedSet<CLIQUESubspace<V>> denseSubspaces)
The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P. For each set the mean of the cover fractions is computed.

Parameters:
denseSubspaces - the dense subspaces
Returns:
the mean of the cover fractions, the first value is the mean of the selected set I, the second value is the mean of the pruned set P.

computeDiffs

private double[][] computeDiffs(SortedSet<CLIQUESubspace<V>> denseSubspaces,
                                int[] mi,
                                int[] mp)
The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P. For each set the difference from the specified mean values is computed.

Parameters:
denseSubspaces - the dense subspaces
mi - the mean of the selected sets I
mp - the mean of the pruned sets P
Returns:
the difference from the specified mean values, the first value is the difference from the mean of the selected set I, the second value is the difference from the mean of the pruned set P.

Release 0.1 (2008-07-10_1838)