Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.correlation
Class ORCLUS<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<Model>>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering<V>
                  extended by de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.ORCLUS<V>
Type Parameters:
V - the type of Realvector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<Model>>, ClusteringAlgorithm<Clustering<Model>,V>, Parameterizable

public class ORCLUS<V extends RealVector<V,?>>
extends ProjectedClustering<V>

ORCLUS provides the ORCLUS algorithm, an algorithm to find clusters in high dimensional spaces.

Reference: C. C. Aggrawal, P. S. Yu: Finding Generalized Projected Clusters in High Dimensional Spaces.
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '00).

Author:
Elke Achtert

Nested Class Summary
private  class ORCLUS.ORCLUSCluster
          Encapsulates the attributes of a cluster.
private  class ORCLUS.ProjectedEnergy
          Encapsulates the projected energy for a cluster.
 
Field Summary
private  double alpha
          Holds the value of ALPHA_PARAM.
static OptionID ALPHA_ID
          OptionID for ALPHA_PARAM.
private  DoubleParameter ALPHA_PARAM
          Parameter to specify the factor for reducing the number of current clusters in each iteration, must be an integer greater than 0 and less than 1.
private  PCARunner<V,DoubleDistance> pca
          The PCA utility object.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
K_I_ID, K_ID, L_ID
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
ORCLUS()
          Provides the ORCLUS algorithm, adding parameter ALPHA_PARAM to the option handler additionally to parameters of super class.
 
Method Summary
private  void assign(Database<V> database, List<ORCLUS.ORCLUSCluster> clusters)
          Creates a partitioning of the database by assigning each object to its closest seed.
private  Matrix findBasis(Database<V> database, ORCLUS.ORCLUSCluster cluster, int dim)
          Finds the basis of the subspace of dimensionality dim for the specified cluster.
 Description getDescription()
          Returns a description of the algorithm.
private  List<ORCLUS.ORCLUSCluster> initialSeeds(Database<V> database, int k)
          Initializes the list of seeds wit a random sample of size k.
private  void merge(Database<V> database, List<ORCLUS.ORCLUSCluster> clusters, int k_new, int d_new)
          Reduces the number of seeds to k_new
private  ORCLUS.ProjectedEnergy projectedEnergy(Database<V> database, ORCLUS.ORCLUSCluster c_i, ORCLUS.ORCLUSCluster c_j, int i, int j, int dim)
          Computes the projected energy of the specified clusters.
private  V projection(ORCLUS.ORCLUSCluster c, V o)
          Returns the projection of real vector o in the subspace of cluster c.
protected  Clustering<Model> runInTime(Database<V> database)
          Performs the ORCLUS algorithm on the given database.
 List<String> setParameters(List<String> args)
          Calls the super method and sets additionally the value of the parameter ALPHA_PARAM.
private  ORCLUS.ORCLUSCluster union(Database<V> database, ORCLUS.ORCLUSCluster c1, ORCLUS.ORCLUSCluster c2, int dim)
          Returns the union of the two specified clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
getDistanceFunction, getK_i, getK, getL, getResult, setResult
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, addParameterizable, addParameterizable, checkGlobalParameterConstraints, collectOptions, getAttributeSettings, getParameters, rememberParametersExcept, removeOption, removeParameterizable, shortDescription
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, collectOptions, getParameters, shortDescription
 

Field Detail

ALPHA_ID

public static final OptionID ALPHA_ID
OptionID for ALPHA_PARAM.


ALPHA_PARAM

private final DoubleParameter ALPHA_PARAM
Parameter to specify the factor for reducing the number of current clusters in each iteration, must be an integer greater than 0 and less than 1.

Default value: 0.5

Key: -orclus.alpha


alpha

private double alpha
Holds the value of ALPHA_PARAM.


pca

private PCARunner<V extends RealVector<V,?>,DoubleDistance> pca
The PCA utility object.

Constructor Detail

ORCLUS

public ORCLUS()
Provides the ORCLUS algorithm, adding parameter ALPHA_PARAM to the option handler additionally to parameters of super class.

Method Detail

runInTime

protected Clustering<Model> runInTime(Database<V> database)
                               throws IllegalStateException
Performs the ORCLUS algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>,Clustering<Model>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Returns:
a description of the algorithm

setParameters

public List<String> setParameters(List<String> args)
                           throws ParameterException
Calls the super method and sets additionally the value of the parameter ALPHA_PARAM.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class ProjectedClustering<V extends RealVector<V,?>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
a list containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting

initialSeeds

private List<ORCLUS.ORCLUSCluster> initialSeeds(Database<V> database,
                                                int k)
Initializes the list of seeds wit a random sample of size k.

Parameters:
database - the database holding the objects
k - the size of the random sample
Returns:
the initial seed list

assign

private void assign(Database<V> database,
                    List<ORCLUS.ORCLUSCluster> clusters)
Creates a partitioning of the database by assigning each object to its closest seed.

Parameters:
database - the database holding the objects
clusters - the array of clusters to which the objects should be assigned to

findBasis

private Matrix findBasis(Database<V> database,
                         ORCLUS.ORCLUSCluster cluster,
                         int dim)
Finds the basis of the subspace of dimensionality dim for the specified cluster.

Parameters:
database - the database to run the algorithm on
cluster - the cluster
dim - the dimensionality of the subspace
Returns:
matrix defining the basis of the subspace for the specified cluster

merge

private void merge(Database<V> database,
                   List<ORCLUS.ORCLUSCluster> clusters,
                   int k_new,
                   int d_new)
Reduces the number of seeds to k_new

Parameters:
database - the database holding the objects
clusters - the set of current seeds
k_new - the new number of seeds
d_new - the new dimensionality of the subspaces for each seed

projectedEnergy

private ORCLUS.ProjectedEnergy projectedEnergy(Database<V> database,
                                               ORCLUS.ORCLUSCluster c_i,
                                               ORCLUS.ORCLUSCluster c_j,
                                               int i,
                                               int j,
                                               int dim)
Computes the projected energy of the specified clusters. The projected energy is given by the mean square distance of the points to the centroid of the union cluster c, when all points in c are projected to the subspace of c.

Parameters:
database - the database holding the objects
c_i - the first cluster
c_j - the second cluster
i - the index of cluster c_i in the cluster list
j - the index of cluster c_j in the cluster list
dim - the dimensionality of the clusters
Returns:
the projected energy of the specified cluster

union

private ORCLUS.ORCLUSCluster union(Database<V> database,
                                   ORCLUS.ORCLUSCluster c1,
                                   ORCLUS.ORCLUSCluster c2,
                                   int dim)
Returns the union of the two specified clusters.

Parameters:
database - the database holding the objects
c1 - the first cluster
c2 - the second cluster
dim - the dimensionality of the union cluster
Returns:
the union of the two specified clusters

projection

private V projection(ORCLUS.ORCLUSCluster c,
                     V o)
Returns the projection of real vector o in the subspace of cluster c.

Parameters:
c - the cluster
o - the double vector
Returns:
the projection of double vector o in the subspace of cluster c

Release 0.2 (2009-07-06_1820)