Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class SUBCLU<V extends NumberVector<V,?>,D extends Distance<D>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<SubspaceModel<V>>>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.SUBCLU<V,D>
Type Parameters:
V - the type of FeatureVector handled by this Algorithm
D - the type of Distance used
All Implemented Interfaces:
Algorithm<V,Clustering<SubspaceModel<V>>>, ClusteringAlgorithm<Clustering<SubspaceModel<V>>,V>, Parameterizable

@Title(value="SUBCLU: Density connected Subspace Clustering")
@Description(value="Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.")
@Reference(authors="K. Kailing, H.-P. Kriegel, P. Kr\u00f6ger",
           title="Density connected Subspace Clustering for High Dimensional Data. ",
           booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04), Lake Buena Vista, FL, 2004")
public class SUBCLU<V extends NumberVector<V,?>,D extends Distance<D>>
extends AbstractAlgorithm<V,Clustering<SubspaceModel<V>>>
implements ClusteringAlgorithm<Clustering<SubspaceModel<V>>,V>

Implementation of the SUBCLU algorithm, an algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.

Reference:
K. Kailing, H.-P. Kriegel, P. Kroeger: Density connected Subspace Clustering for High Dimensional Data.
In Proc. SIAM Int. Conf. on Data Mining (SDM'04), Lake Buena Vista, FL, 2004.

Author:
Elke Achtert

Field Summary
static OptionID DISTANCE_FUNCTION_ID
          OptionID for DISTANCE_FUNCTION_PARAM
private  ObjectParameter<AbstractDimensionsSelectingDoubleDistanceFunction<V>> DISTANCE_FUNCTION_PARAM
          The distance function to determine the distance between database objects.
private  AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction
          Holds the instance of the distance function specified by DISTANCE_FUNCTION_PARAM.
private  DoubleDistance epsilon
          Holds the value of EPSILON_PARAM.
static OptionID EPSILON_ID
          OptionID for EPSILON_PARAM
private  DistanceParameter<DoubleDistance> EPSILON_PARAM
          Parameter to specify the maximum radius of the neighborhood to be considered, must be suitable to AbstractDimensionsSelectingDoubleDistanceFunction.
private  int minpts
          Holds the value of MINPTS_PARAM.
static OptionID MINPTS_ID
          OptionID for MINPTS_PARAM
private  IntParameter MINPTS_PARAM
          Parameter to specify the threshold for minimum number of points in the epsilon-neighborhood of a point, must be an integer greater than 0.
private  Clustering<SubspaceModel<V>> result
          Holds the result;
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
SUBCLU(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
private  Subspace<V> bestSubspace(List<Subspace<V>> subspaces, Subspace<V> candidate, TreeMap<Subspace<V>,List<Cluster<Model>>> clusterMap)
          Determines the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster.
private  List<Subspace<V>> generateSubspaceCandidates(List<Subspace<V>> subspaces)
          Generates d+1-dimensional subspace candidates from the specified d-dimensional subspaces.
 Clustering<SubspaceModel<V>> getResult()
          Returns the result of the algorithm.
private  List<Subspace<V>> lowerSubspaces(Subspace<V> subspace)
          Returns the list of all (d-1)-dimensional subspaces of the specified d-dimensional subspace.
private  List<Cluster<Model>> runDBSCAN(Database<V> database, List<Integer> ids, Subspace<V> subspace)
          Runs the DBSCAN algorithm on the specified partition of the database in the given subspace.
protected  Clustering<SubspaceModel<V>> runInTime(Database<V> database)
          Performs the SUBCLU algorithm on the given database.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 

Field Detail

DISTANCE_FUNCTION_ID

public static final OptionID DISTANCE_FUNCTION_ID
OptionID for DISTANCE_FUNCTION_PARAM


DISTANCE_FUNCTION_PARAM

private final ObjectParameter<AbstractDimensionsSelectingDoubleDistanceFunction<V extends NumberVector<V,?>>> DISTANCE_FUNCTION_PARAM
The distance function to determine the distance between database objects.

Default value: DimensionsSelectingEuclideanDistanceFunction

Key: -subclu.distancefunction


distanceFunction

private AbstractDimensionsSelectingDoubleDistanceFunction<V extends NumberVector<V,?>> distanceFunction
Holds the instance of the distance function specified by DISTANCE_FUNCTION_PARAM.


EPSILON_ID

public static final OptionID EPSILON_ID
OptionID for EPSILON_PARAM


EPSILON_PARAM

private final DistanceParameter<DoubleDistance> EPSILON_PARAM
Parameter to specify the maximum radius of the neighborhood to be considered, must be suitable to AbstractDimensionsSelectingDoubleDistanceFunction.

Key: -subclu.epsilon


epsilon

private DoubleDistance epsilon
Holds the value of EPSILON_PARAM.


MINPTS_ID

public static final OptionID MINPTS_ID
OptionID for MINPTS_PARAM


MINPTS_PARAM

private final IntParameter MINPTS_PARAM
Parameter to specify the threshold for minimum number of points in the epsilon-neighborhood of a point, must be an integer greater than 0.

Key: -subclu.minpts


minpts

private int minpts
Holds the value of MINPTS_PARAM.


result

private Clustering<SubspaceModel<V extends NumberVector<V,?>>> result
Holds the result;

Constructor Detail

SUBCLU

public SUBCLU(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

protected Clustering<SubspaceModel<V>> runInTime(Database<V> database)
                                                                    throws IllegalStateException
Performs the SUBCLU algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

getResult

public Clustering<SubspaceModel<V>> getResult()
Returns the result of the algorithm.

Returns:
the result of the algorithm

runDBSCAN

private List<Cluster<Model>> runDBSCAN(Database<V> database,
                                       List<Integer> ids,
                                       Subspace<V> subspace)
                                throws ParameterException,
                                       UnableToComplyException
Runs the DBSCAN algorithm on the specified partition of the database in the given subspace. If parameter ids is null DBSCAN will be applied to the whole database.

Parameters:
database - the database holding the objects to run DBSCAN on
ids - the IDs of the database defining the partition to run DBSCAN on - if this parameter is null DBSCAN will be applied to the whole database
subspace - the subspace to run DBSCAN on
Returns:
the clustering result of the DBSCAN run
Throws:
ParameterException - in case of wrong parameter-setting
UnableToComplyException - in case of problems during the creation of the database partition

generateSubspaceCandidates

private List<Subspace<V>> generateSubspaceCandidates(List<Subspace<V>> subspaces)
Generates d+1-dimensional subspace candidates from the specified d-dimensional subspaces.

Parameters:
subspaces - the d-dimensional subspaces
Returns:
the d+1-dimensional subspace candidates

lowerSubspaces

private List<Subspace<V>> lowerSubspaces(Subspace<V> subspace)
Returns the list of all (d-1)-dimensional subspaces of the specified d-dimensional subspace.

Parameters:
subspace - the d-dimensional subspace
Returns:
a list of all (d-1)-dimensional subspaces

bestSubspace

private Subspace<V> bestSubspace(List<Subspace<V>> subspaces,
                                 Subspace<V> candidate,
                                 TreeMap<Subspace<V>,List<Cluster<Model>>> clusterMap)
Determines the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster.

Parameters:
subspaces - the list of d-dimensional subspaces containing clusters
candidate - the (d+1)-dimensional candidate subspace
clusterMap - the mapping of subspaces to clusters
Returns:
the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster

Release 0.3 (2010-03-31_1612)