Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class PROCLUS<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering<V>
                  extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.PROCLUS<V>
Type Parameters:
V - the type of Realvector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V>, Clustering<V>, Loggable, Parameterizable

public class PROCLUS<V extends RealVector<V,?>>
extends ProjectedClustering<V>

PROCLUS provides the PROCLUS algorithm.

Author:
Elke Achtert

Nested Class Summary
private  class PROCLUS.Cluster
          Encapsulates the attributes of a cluster.
 
Field Summary
private  int m_i
          Holds m_i.
private  IntParameter M_I_PARAM
          Parameter to specify the the multiplier for the initial number of medoids, must be an integer greater than 0.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
K_I_ID, K_ID, L_ID
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug
 
Constructor Summary
PROCLUS()
          Adds the parameter M_I_PARAM to the option handler additionally to the parameters provided by super-classes.
 
Method Summary
private  Map<Integer,PROCLUS.Cluster> assignPoints(Map<Integer,Set<Integer>> dimensions, Database<V> database)
          Assigns the objects to the clusters.
private  double avgDistance(V centroid, Set<Integer> objectIDs, Database<V> database, int dimension)
          Computes the average distance of the objects to the centroid along the specified dimension.
private  Set<Integer> computeBadMedoids(Map<Integer,PROCLUS.Cluster> clusters, int threshold)
          Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objkects is bad.
private  Set<Integer> computeM_current(Set<Integer> m, Set<Integer> m_best, Set<Integer> m_bad)
           
private  double evaluateClusters(Map<Integer,PROCLUS.Cluster> clusters, Map<Integer,Set<Integer>> dimensions, Database<V> database)
          Evaluates the quality of the clusters.
private  Map<Integer,Set<Integer>> findDimensions(Set<Integer> medoids, Database<V> database)
          Determines the set of correlated dimensions for each medoid in the specified medoid set.
 Description getDescription()
          Returns a description of the algorithm.
private  Map<Integer,List<QueryResult<DoubleDistance>>> getLocalities(Set<Integer> m_c, Database<V> database)
           
private  Set<Integer> greedy(Set<Integer> sampleSet, int m)
          Returns a piercing set of k medoids from the specified sample set.
private  Set<Integer> initialSet(Set<Integer> sampleSet, int k)
          Returns a set of k elements from the specified sample set.
private  double manhattanSegmentalDistance(V o1, V o2, Set<Integer> dimensions)
          Returns the Manhattan segmental distance between o1 and o2 realtive to the specified dimensions.
protected  void runInTime(Database<V> database)
          The run method encapsulated in measure of runtime.
 String[] setParameters(String[] args)
          Sets the parameters k and l additionally to the parameters set by the super-class' method.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
getDistanceFunction, getK_i, getK, getL, getResult, setResult
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
description, isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, checkGlobalParameterConstraints, deleteOption, description, description, getAttributeSettings, getParameters, getParameterValue, getPossibleOptions, inlineDescription, isSet, setParameters
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, message, progress, progress, progress, verbose, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
run, setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, description, getAttributeSettings, getParameters, getPossibleOptions, inlineDescription
 

Field Detail

M_I_PARAM

private final IntParameter M_I_PARAM
Parameter to specify the the multiplier for the initial number of medoids, must be an integer greater than 0.

Default value: 10

Key: -proclus.mi


m_i

private int m_i
Holds m_i.

Constructor Detail

PROCLUS

public PROCLUS()
Adds the parameter M_I_PARAM to the option handler additionally to the parameters provided by super-classes.

Method Detail

runInTime

protected void runInTime(Database<V> database)
                  throws IllegalStateException
Description copied from class: AbstractAlgorithm
The run method encapsulated in measure of runtime. An extending class needs not to take care of runtime itself.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).
See Also:
AbstractAlgorithm.runInTime(de.lmu.ifi.dbs.elki.database.Database)

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Returns:
a description of the algorithm
See Also:
Algorithm.getDescription()

setParameters

public String[] setParameters(String[] args)
                       throws ParameterException
Sets the parameters k and l additionally to the parameters set by the super-class' method. Both k and l are required parameters.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class ProjectedClustering<V extends RealVector<V,?>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
String[] an array containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting
See Also:
Parameterizable.setParameters(String[])

greedy

private Set<Integer> greedy(Set<Integer> sampleSet,
                            int m)
Returns a piercing set of k medoids from the specified sample set.

Parameters:
sampleSet - the sample set
m - the number of medoids to be returned
Returns:
a piercing set of m medoids from the specified sample set

initialSet

private Set<Integer> initialSet(Set<Integer> sampleSet,
                                int k)
Returns a set of k elements from the specified sample set.

Parameters:
sampleSet - the sample set
k - the number of samples to be returned
Returns:
a set of k elements from the specified sample set

computeM_current

private Set<Integer> computeM_current(Set<Integer> m,
                                      Set<Integer> m_best,
                                      Set<Integer> m_bad)

getLocalities

private Map<Integer,List<QueryResult<DoubleDistance>>> getLocalities(Set<Integer> m_c,
                                                                     Database<V> database)

findDimensions

private Map<Integer,Set<Integer>> findDimensions(Set<Integer> medoids,
                                                 Database<V> database)
Determines the set of correlated dimensions for each medoid in the specified medoid set.

Parameters:
medoids - the set of medoids
database - the database containing the objects
Returns:
the set of correlated dimensions for each medoid in the specified medoid set

assignPoints

private Map<Integer,PROCLUS.Cluster> assignPoints(Map<Integer,Set<Integer>> dimensions,
                                                  Database<V> database)
Assigns the objects to the clusters.

Parameters:
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects
Returns:
the assignments of the object to the clusters

manhattanSegmentalDistance

private double manhattanSegmentalDistance(V o1,
                                          V o2,
                                          Set<Integer> dimensions)
Returns the Manhattan segmental distance between o1 and o2 realtive to the specified dimensions.

Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered
Returns:
the Manhattan segmental distance between o1 and o2 realtive to the specified dimensions

evaluateClusters

private double evaluateClusters(Map<Integer,PROCLUS.Cluster> clusters,
                                Map<Integer,Set<Integer>> dimensions,
                                Database<V> database)
Evaluates the quality of the clusters.

Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects
Returns:
a measure for the cluster quality

avgDistance

private double avgDistance(V centroid,
                           Set<Integer> objectIDs,
                           Database<V> database,
                           int dimension)
Computes the average distance of the objects to the centroid along the specified dimension.

Parameters:
centroid - the centroid
objectIDs - the set of objects ids
database - the database holding the objects
dimension - the dimension for which the average distance is computed
Returns:
the average distance of the objects to the centroid along the specified dimension

computeBadMedoids

private Set<Integer> computeBadMedoids(Map<Integer,PROCLUS.Cluster> clusters,
                                       int threshold)
Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objkects is bad.

Parameters:
clusters - the clusters
threshold - the threshold
Returns:
the bad medoids

Release 0.1 (2008-07-10_1838)