Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class CLIQUE<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<SubspaceModel<V>>>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.CLIQUE<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<SubspaceModel<V>>>, ClusteringAlgorithm<Clustering<SubspaceModel<V>>,V>, Parameterizable

@Title(value="CLIQUE: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications")
@Description(value="Grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.")
@Reference(authors="R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan",
           title="Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications",
           booktitle="Proc. SIGMOD Conference, Seattle, WA, 1998",
           url="http://dx.doi.org/10.1145/276304.276314")
public class CLIQUE<V extends NumberVector<V,?>>
extends AbstractAlgorithm<V,Clustering<SubspaceModel<V>>>
implements ClusteringAlgorithm<Clustering<SubspaceModel<V>>,V>

Implementation of the CLIQUE algorithm, a grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.

The implementation consists of two steps:
1. Identification of subspaces that contain clusters
2. Identification of clusters

The third step of the original algorithm (Generation of minimal description for the clusters) is not (yet) implemented.

Reference:
R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan:: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.
In Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998.

Author:
Elke Achtert

Field Summary
private  boolean prune
          Holds the value of PRUNE_FLAG.
private  Flag PRUNE_FLAG
          Flag to indicate that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.
static OptionID PRUNE_ID
          OptionID for PRUNE_FLAG
private  double tau
          Holds the value of TAU_PARAM.
static OptionID TAU_ID
          OptionID for TAU_PARAM
private  DoubleParameter TAU_PARAM
          Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.
private  int xsi
          Holds the value of XSI_PARAM.
static OptionID XSI_ID
          OptionID for XSI_PARAM
private  IntParameter XSI_PARAM
          Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
CLIQUE(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
private  double[][] computeDiffs(List<CLIQUESubspace<V>> denseSubspaces, int[] mi, int[] mp)
          The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P.
private  int[][] computeMeans(List<CLIQUESubspace<V>> denseSubspaces)
          The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P.
private  List<Pair<Subspace<V>,Set<Integer>>> determineClusters(Database<V> database, List<CLIQUESubspace<V>> denseSubspaces)
          Determines the clusters in the specified dense subspaces.
private  List<CLIQUESubspace<V>> findDenseSubspaceCandidates(Database<V> database, List<CLIQUESubspace<V>> denseSubspaces)
          Determines the k-dimensional dense subspace candidates from the specified (k-1)-dimensional dense subspaces.
private  List<CLIQUESubspace<V>> findDenseSubspaces(Database<V> database, List<CLIQUESubspace<V>> denseSubspaces)
          Determines the k-dimensional dense subspaces and performs a pruning if this option is chosen.
private  List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Database<V> database)
          Determines the one-dimensional dense subspace candidates by making a pass over the database.
private  List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Database<V> database)
          Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.
private  Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Database<V> database)
          Initializes and returns the one dimensional units.
private  List<CLIQUESubspace<V>> pruneDenseSubspaces(List<CLIQUESubspace<V>> denseSubspaces)
          Performs a MDL-based pruning of the specified dense subspaces as described in the CLIQUE algorithm.
protected  Clustering<SubspaceModel<V>> runInTime(Database<V> database)
          Performs the CLIQUE algorithm on the given database.
private  void updateMinMax(V featureVector, double[] minima, double[] maxima)
          Updates the minima and maxima array according to the specified feature vector.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 

Field Detail

XSI_ID

public static final OptionID XSI_ID
OptionID for XSI_PARAM


XSI_PARAM

private final IntParameter XSI_PARAM
Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.

Key: -clique.xsi


xsi

private int xsi
Holds the value of XSI_PARAM.


TAU_ID

public static final OptionID TAU_ID
OptionID for TAU_PARAM


TAU_PARAM

private final DoubleParameter TAU_PARAM
Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.

Key: -clique.tau


tau

private double tau
Holds the value of TAU_PARAM.


PRUNE_ID

public static final OptionID PRUNE_ID
OptionID for PRUNE_FLAG


PRUNE_FLAG

private final Flag PRUNE_FLAG
Flag to indicate that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.

Key: -clique.prune


prune

private boolean prune
Holds the value of PRUNE_FLAG.

Constructor Detail

CLIQUE

public CLIQUE(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

protected Clustering<SubspaceModel<V>> runInTime(Database<V> database)
                                                                    throws IllegalStateException
Performs the CLIQUE algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

determineClusters

private List<Pair<Subspace<V>,Set<Integer>>> determineClusters(Database<V> database,
                                                               List<CLIQUESubspace<V>> denseSubspaces)
Determines the clusters in the specified dense subspaces.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the dense subspaces in reverse order by their coverage
Returns:
the clusters in the specified dense subspaces and the corresponding cluster models

findOneDimensionalDenseSubspaces

private List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Database<V> database)
Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
Returns:
the one dimensional dense subspaces reverse ordered by their coverage

findDenseSubspaces

private List<CLIQUESubspace<V>> findDenseSubspaces(Database<V> database,
                                                   List<CLIQUESubspace<V>> denseSubspaces)
Determines the k-dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspaces
Returns:
a list of the k-dimensional dense subspaces sorted in reverse order by their coverage

initOneDimensionalUnits

private Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Database<V> database)
Initializes and returns the one dimensional units.

Parameters:
database - the database to run the algorithm on
Returns:
the created one dimensional units

updateMinMax

private void updateMinMax(V featureVector,
                          double[] minima,
                          double[] maxima)
Updates the minima and maxima array according to the specified feature vector.

Parameters:
featureVector - the feature vector
minima - the array of minima
maxima - the array of maxima

findOneDimensionalDenseSubspaceCandidates

private List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Database<V> database)
Determines the one-dimensional dense subspace candidates by making a pass over the database.

Parameters:
database - the database to run the algorithm on
Returns:
the one-dimensional dense subspace candidates reverse ordered by their coverage

findDenseSubspaceCandidates

private List<CLIQUESubspace<V>> findDenseSubspaceCandidates(Database<V> database,
                                                            List<CLIQUESubspace<V>> denseSubspaces)
Determines the k-dimensional dense subspace candidates from the specified (k-1)-dimensional dense subspaces.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspaces
Returns:
a list of the k-dimensional dense subspace candidates reverse ordered by their coverage

pruneDenseSubspaces

private List<CLIQUESubspace<V>> pruneDenseSubspaces(List<CLIQUESubspace<V>> denseSubspaces)
Performs a MDL-based pruning of the specified dense subspaces as described in the CLIQUE algorithm.

Parameters:
denseSubspaces - the subspaces to be pruned sorted in reverse order by their coverage
Returns:
the subspaces which are not pruned reverse ordered by their coverage

computeMeans

private int[][] computeMeans(List<CLIQUESubspace<V>> denseSubspaces)
The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P. For each set the mean of the cover fractions is computed.

Parameters:
denseSubspaces - the dense subspaces in reverse order by their coverage
Returns:
the mean of the cover fractions, the first value is the mean of the selected set I, the second value is the mean of the pruned set P.

computeDiffs

private double[][] computeDiffs(List<CLIQUESubspace<V>> denseSubspaces,
                                int[] mi,
                                int[] mp)
The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P. For each set the difference from the specified mean values is computed.

Parameters:
denseSubspaces - denseSubspaces the dense subspaces in reverse order by their coverage
mi - the mean of the selected sets I
mp - the mean of the pruned sets P
Returns:
the difference from the specified mean values, the first value is the difference from the mean of the selected set I, the second value is the difference from the mean of the pruned set P.

Release 0.3 (2010-03-31_1612)