Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class CLIQUE<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<CLIQUESubspace<V>>>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.CLIQUE<V>
Type Parameters:
V - the type of RealVector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<CLIQUESubspace<V>>>, ClusteringAlgorithm<Clustering<CLIQUESubspace<V>>,V>, Parameterizable

public class CLIQUE<V extends RealVector<V,?>>
extends AbstractAlgorithm<V,Clustering<CLIQUESubspace<V>>>
implements ClusteringAlgorithm<Clustering<CLIQUESubspace<V>>,V>

Implementation of the CLIQUE algorithm, a grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.

The implementation consists of two steps:
1. Identification of subspaces that contain clusters
2. Identification of clusters

The third step of the original algorithm (Generation of minimal description for the clusters) is not (yet) implemented.

Reference:
R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan:: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.
In Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998.

Author:
Elke Achtert

Field Summary
private  boolean prune
          Holds the value of PRUNE_FLAG.
private  Flag PRUNE_FLAG
          Flag to indicate that that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.
static OptionID PRUNE_ID
          OptionID for PRUNE_FLAG
private  Clustering<CLIQUESubspace<V>> result
          The result of the algorithm;
private  double tau
          Holds the value of TAU_PARAM.
static OptionID TAU_ID
          OptionID for TAU_PARAM
private  DoubleParameter TAU_PARAM
          Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.
private  int xsi
          Holds the value of XSI_PARAM.
static OptionID XSI_ID
          OptionID for XSI_PARAM
private  IntParameter XSI_PARAM
          Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
CLIQUE()
          Provides the CLIQUE algorithm, adding parameters XSI_PARAM, TAU_PARAM, and flag PRUNE_FLAG to the option handler additionally to parameters of super class.
 
Method Summary
private  double[][] computeDiffs(SortedSet<CLIQUESubspace<V>> denseSubspaces, int[] mi, int[] mp)
          The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P.
private  int[][] computeMeans(SortedSet<CLIQUESubspace<V>> denseSubspaces)
          The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P.
private  Map<CLIQUESubspace<V>,Set<Integer>> determineClusters(Database<V> database, SortedSet<CLIQUESubspace<V>> denseSubspaces)
          Determines the clusters in the specified dense subspaces.
private  SortedSet<CLIQUESubspace<V>> findDenseSubspaceCandidates(Database<V> database, Set<CLIQUESubspace<V>> denseSubspaces)
          Determines the k-dimensional dense subspace candidates.
private  SortedSet<CLIQUESubspace<V>> findDenseSubspaces(Database<V> database, SortedSet<CLIQUESubspace<V>> denseSubspaces)
          Determines the k>1 dimensional dense subspaces and performs a pruning if this option is chosen.
private  SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Database<V> database)
          Determines the one-dimensional dense subspace candidates by making a pass over the database.
private  SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Database<V> database)
          Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.
 Description getDescription()
          Returns a description of the algorithm.
 Clustering<CLIQUESubspace<V>> getResult()
          Returns the result of the algorithm.
private  Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Database<V> database)
          Initializes and returns the one dimensional units.
private  SortedSet<CLIQUESubspace<V>> pruneDenseSubspaces(SortedSet<CLIQUESubspace<V>> denseSubspaces)
          Performs a MDL-based pruning of the specified dense sunspaces as described in the CLIQUE algorithm.
protected  Clustering<CLIQUESubspace<V>> runInTime(Database<V> database)
          Performs the CLIQUE algorithm on the given database.
 List<String> setParameters(List<String> args)
          Calls the super method and sets additionally the value of the parameters XSI_PARAM, TAU_PARAM, and {flag @link #PRUNE_FLAG}.
private  void updateMinMax(V featureVector, double[] minima, double[] maxima)
          Updates the minima and maxima array according to the specified feature vector.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, addParameterizable, addParameterizable, checkGlobalParameterConstraints, collectOptions, getAttributeSettings, getParameters, rememberParametersExcept, removeOption, removeParameterizable, shortDescription
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, collectOptions, getParameters, shortDescription
 

Field Detail

XSI_ID

public static final OptionID XSI_ID
OptionID for XSI_PARAM


XSI_PARAM

private final IntParameter XSI_PARAM
Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.

Key: -clique.xsi


xsi

private int xsi
Holds the value of XSI_PARAM.


TAU_ID

public static final OptionID TAU_ID
OptionID for TAU_PARAM


TAU_PARAM

private final DoubleParameter TAU_PARAM
Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.

Key: -clique.tau


tau

private double tau
Holds the value of TAU_PARAM.


PRUNE_ID

public static final OptionID PRUNE_ID
OptionID for PRUNE_FLAG


PRUNE_FLAG

private final Flag PRUNE_FLAG
Flag to indicate that that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.

Key: -clique.prune


prune

private boolean prune
Holds the value of PRUNE_FLAG.


result

private Clustering<CLIQUESubspace<V extends RealVector<V,?>>> result
The result of the algorithm;

Constructor Detail

CLIQUE

public CLIQUE()
Provides the CLIQUE algorithm, adding parameters XSI_PARAM, TAU_PARAM, and flag PRUNE_FLAG to the option handler additionally to parameters of super class.

Method Detail

getResult

public Clustering<CLIQUESubspace<V>> getResult()
Returns the result of the algorithm.

Specified by:
getResult in interface Algorithm<V extends RealVector<V,?>,Clustering<CLIQUESubspace<V extends RealVector<V,?>>>>
Specified by:
getResult in interface ClusteringAlgorithm<Clustering<CLIQUESubspace<V extends RealVector<V,?>>>,V extends RealVector<V,?>>
Returns:
the result of the algorithm

getDescription

public Description getDescription()
Returns a description of the algorithm.

Specified by:
getDescription in interface Algorithm<V extends RealVector<V,?>,Clustering<CLIQUESubspace<V extends RealVector<V,?>>>>
Returns:
a description of the algorithm

runInTime

protected Clustering<CLIQUESubspace<V>> runInTime(Database<V> database)
                                                                   throws IllegalStateException
Performs the CLIQUE algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>,Clustering<CLIQUESubspace<V extends RealVector<V,?>>>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

setParameters

public List<String> setParameters(List<String> args)
                           throws ParameterException
Calls the super method and sets additionally the value of the parameters XSI_PARAM, TAU_PARAM, and {flag @link #PRUNE_FLAG}.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class AbstractAlgorithm<V extends RealVector<V,?>,Clustering<CLIQUESubspace<V extends RealVector<V,?>>>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
a list containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting

determineClusters

private Map<CLIQUESubspace<V>,Set<Integer>> determineClusters(Database<V> database,
                                                              SortedSet<CLIQUESubspace<V>> denseSubspaces)
Determines the clusters in the specified dense subspaces.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the dense subspaces
Returns:
the clusters in the specified dense subspaces and the corresponding cluster models

findOneDimensionalDenseSubspaces

private SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Database<V> database)
Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
Returns:
the one dimensional dense subspaces

findDenseSubspaces

private SortedSet<CLIQUESubspace<V>> findDenseSubspaces(Database<V> database,
                                                        SortedSet<CLIQUESubspace<V>> denseSubspaces)
Determines the k>1 dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspaces
Returns:
the k-dimensional dense subspaces

initOneDimensionalUnits

private Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Database<V> database)
Initializes and returns the one dimensional units.

Parameters:
database - the database to run the algorithm on
Returns:
the created one dimensional units

updateMinMax

private void updateMinMax(V featureVector,
                          double[] minima,
                          double[] maxima)
Updates the minima and maxima array according to the specified feature vector.

Parameters:
featureVector - the feature vector
minima - the array of minima
maxima - the array of maxima

findOneDimensionalDenseSubspaceCandidates

private SortedSet<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Database<V> database)
Determines the one-dimensional dense subspace candidates by making a pass over the database.

Parameters:
database - the database to run the algorithm on
Returns:
the one-dimensional dense subspace candidates

findDenseSubspaceCandidates

private SortedSet<CLIQUESubspace<V>> findDenseSubspaceCandidates(Database<V> database,
                                                                 Set<CLIQUESubspace<V>> denseSubspaces)
Determines the k-dimensional dense subspace candidates.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspace
Returns:
the k-dimensional dense subspace candidates

pruneDenseSubspaces

private SortedSet<CLIQUESubspace<V>> pruneDenseSubspaces(SortedSet<CLIQUESubspace<V>> denseSubspaces)
Performs a MDL-based pruning of the specified dense sunspaces as described in the CLIQUE algorithm.

Parameters:
denseSubspaces - the subspaces to be pruned
Returns:
the subspaces which are not pruned

computeMeans

private int[][] computeMeans(SortedSet<CLIQUESubspace<V>> denseSubspaces)
The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P. For each set the mean of the cover fractions is computed.

Parameters:
denseSubspaces - the dense subspaces
Returns:
the mean of the cover fractions, the first value is the mean of the selected set I, the second value is the mean of the pruned set P.

computeDiffs

private double[][] computeDiffs(SortedSet<CLIQUESubspace<V>> denseSubspaces,
                                int[] mi,
                                int[] mp)
The specified sorted set of dense subspaces is divided into the selected set I and the pruned set P. For each set the difference from the specified mean values is computed.

Parameters:
denseSubspaces - the dense subspaces
mi - the mean of the selected sets I
mp - the mean of the pruned sets P
Returns:
the difference from the specified mean values, the first value is the difference from the mean of the selected set I, the second value is the difference from the mean of the pruned set P.

Release 0.2.1 (2009-07-13_1605)