Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class PROCLUS<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<Model>>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering<V>
                  extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.PROCLUS<V>
Type Parameters:
V - the type of Realvector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<Model>>, ClusteringAlgorithm<Clustering<Model>,V>, Parameterizable

public class PROCLUS<V extends RealVector<V,?>>
extends ProjectedClustering<V>

Provides the PROCLUS algorithm, an algorithm to find subspace clusters in high dimensional spaces.

Reference:
C. C. Aggrawal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park: Fast Algorithms for Projected Clustering.
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).

Author:
Elke Achtert

Nested Class Summary
private  class PROCLUS.PROCLUSCluster
          Encapsulates the attributes of a cluster.
 
Field Summary
private  int m_i
          Holds the value of M_I_PARAM.
static OptionID M_I_ID
          OptionID for M_I_PARAM
private  IntParameter M_I_PARAM
          Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
K_I_ID, K_ID, L_ID
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
PROCLUS()
          Provides the PROCLUS algorithm, adding parameter M_I_PARAM to the option handler additionally to parameters of super class.
 
Method Summary
private  Map<Integer,PROCLUS.PROCLUSCluster> assignPoints(Map<Integer,Set<Integer>> dimensions, Database<V> database)
          Assigns the objects to the clusters.
private  double avgDistance(V centroid, Set<Integer> objectIDs, Database<V> database, int dimension)
          Computes the average distance of the objects to the centroid along the specified dimension.
private  Set<Integer> computeBadMedoids(Map<Integer,PROCLUS.PROCLUSCluster> clusters, int threshold)
          Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.
private  Set<Integer> computeM_current(Set<Integer> m, Set<Integer> m_best, Set<Integer> m_bad)
          Computes the set of medoids in current iteration.
private  double evaluateClusters(Map<Integer,PROCLUS.PROCLUSCluster> clusters, Map<Integer,Set<Integer>> dimensions, Database<V> database)
          Evaluates the quality of the clusters.
private  Map<Integer,Set<Integer>> findDimensions(Set<Integer> medoids, Database<V> database)
          Determines the set of correlated dimensions for each medoid in the specified medoid set.
 Description getDescription()
          Returns a description of the algorithm.
private  Map<Integer,List<DistanceResultPair<DoubleDistance>>> getLocalities(Set<Integer> m_c, Database<V> database)
          Computes the localities of the specified medoids.
private  Set<Integer> greedy(Set<Integer> sampleSet, int m)
          Returns a piercing set of k medoids from the specified sample set.
private  Set<Integer> initialSet(Set<Integer> sampleSet, int k)
          Returns a set of k elements from the specified sample set.
private  double manhattanSegmentalDistance(V o1, V o2, Set<Integer> dimensions)
          Returns the Manhattan segmental distance between o1 and o2 realtive to the specified dimensions.
protected  Clustering<Model> runInTime(Database<V> database)
          Performs the PROCLUS algorithm on the given database.
 List<String> setParameters(List<String> args)
          Calls the super method and sets additionally the value of the parameter M_I_PARAM.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
getDistanceFunction, getK_i, getK, getL, getResult, setResult
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, addParameterizable, addParameterizable, checkGlobalParameterConstraints, collectOptions, getAttributeSettings, getParameters, rememberParametersExcept, removeOption, removeParameterizable, shortDescription
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, collectOptions, getParameters, shortDescription
 

Field Detail

M_I_ID

public static final OptionID M_I_ID
OptionID for M_I_PARAM


M_I_PARAM

private final IntParameter M_I_PARAM
Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0.

Default value: 10

Key: -proclus.mi


m_i

private int m_i
Holds the value of M_I_PARAM.

Constructor Detail

PROCLUS

public PROCLUS()
Provides the PROCLUS algorithm, adding parameter M_I_PARAM to the option handler additionally to parameters of super class.

Method Detail

runInTime

protected Clustering<Model> runInTime(Database<V> database)
                               throws IllegalStateException
Performs the PROCLUS algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>,Clustering<Model>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Returns:
a description of the algorithm

setParameters

public List<String> setParameters(List<String> args)
                           throws ParameterException
Calls the super method and sets additionally the value of the parameter M_I_PARAM.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class ProjectedClustering<V extends RealVector<V,?>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
a list containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting

greedy

private Set<Integer> greedy(Set<Integer> sampleSet,
                            int m)
Returns a piercing set of k medoids from the specified sample set.

Parameters:
sampleSet - the sample set
m - the number of medoids to be returned
Returns:
a piercing set of m medoids from the specified sample set

initialSet

private Set<Integer> initialSet(Set<Integer> sampleSet,
                                int k)
Returns a set of k elements from the specified sample set.

Parameters:
sampleSet - the sample set
k - the number of samples to be returned
Returns:
a set of k elements from the specified sample set

computeM_current

private Set<Integer> computeM_current(Set<Integer> m,
                                      Set<Integer> m_best,
                                      Set<Integer> m_bad)
Computes the set of medoids in current iteration.

Parameters:
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids
Returns:
m_current, the set of medoids in current iteration

getLocalities

private Map<Integer,List<DistanceResultPair<DoubleDistance>>> getLocalities(Set<Integer> m_c,
                                                                            Database<V> database)
Computes the localities of the specified medoids.

Parameters:
m_c - the ids of the medoids
database - the database holding the objects
Returns:
a mapping of the medoid's id to its locality

findDimensions

private Map<Integer,Set<Integer>> findDimensions(Set<Integer> medoids,
                                                 Database<V> database)
Determines the set of correlated dimensions for each medoid in the specified medoid set.

Parameters:
medoids - the set of medoids
database - the database containing the objects
Returns:
the set of correlated dimensions for each medoid in the specified medoid set

assignPoints

private Map<Integer,PROCLUS.PROCLUSCluster> assignPoints(Map<Integer,Set<Integer>> dimensions,
                                                         Database<V> database)
Assigns the objects to the clusters.

Parameters:
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects
Returns:
the assignments of the object to the clusters

manhattanSegmentalDistance

private double manhattanSegmentalDistance(V o1,
                                          V o2,
                                          Set<Integer> dimensions)
Returns the Manhattan segmental distance between o1 and o2 realtive to the specified dimensions.

Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered
Returns:
the Manhattan segmental distance between o1 and o2 realtive to the specified dimensions

evaluateClusters

private double evaluateClusters(Map<Integer,PROCLUS.PROCLUSCluster> clusters,
                                Map<Integer,Set<Integer>> dimensions,
                                Database<V> database)
Evaluates the quality of the clusters.

Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects
Returns:
a measure for the cluster quality

avgDistance

private double avgDistance(V centroid,
                           Set<Integer> objectIDs,
                           Database<V> database,
                           int dimension)
Computes the average distance of the objects to the centroid along the specified dimension.

Parameters:
centroid - the centroid
objectIDs - the set of objects ids
database - the database holding the objects
dimension - the dimension for which the average distance is computed
Returns:
the average distance of the objects to the centroid along the specified dimension

computeBadMedoids

private Set<Integer> computeBadMedoids(Map<Integer,PROCLUS.PROCLUSCluster> clusters,
                                       int threshold)
Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.

Parameters:
clusters - the clusters
threshold - the threshold
Returns:
the bad medoids

Release 0.2.1 (2009-07-13_1605)