Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering.correlation
Class ORCLUS<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<Model>>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering<V>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.ORCLUS<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<Model>>, ClusteringAlgorithm<Clustering<Model>,V>, Parameterizable

@Title(value="ORCLUS: Arbitrarily ORiented projected CLUSter generation")
@Description(value="Algorithm to find correlation clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggrawal, P. S. Yu",
           title="Finding Generalized Projected Clusters in High Dimensional Spaces",
           booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'00)",
           url="http://dx.doi.org/10.1145/342009.335383")
public class ORCLUS<V extends NumberVector<V,?>>
extends ProjectedClustering<V>

ORCLUS provides the ORCLUS algorithm, an algorithm to find clusters in high dimensional spaces.

Reference: C. C. Aggrawal, P. S. Yu: Finding Generalized Projected Clusters in High Dimensional Spaces.
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '00).

Author:
Elke Achtert

Nested Class Summary
private  class ORCLUS.ORCLUSCluster
          Encapsulates the attributes of a cluster.
private  class ORCLUS.ProjectedEnergy
          Encapsulates the projected energy for a cluster.
 
Field Summary
private  double alpha
          Holds the value of ALPHA_PARAM.
static OptionID ALPHA_ID
          OptionID for ALPHA_PARAM.
private  DoubleParameter ALPHA_PARAM
          Parameter to specify the factor for reducing the number of current clusters in each iteration, must be an integer greater than 0 and less than 1.
private  PCARunner<V,DoubleDistance> pca
          The PCA utility object.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
K_I_ID, K_ID, L_ID
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
ORCLUS(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
private  void assign(Database<V> database, List<ORCLUS.ORCLUSCluster> clusters)
          Creates a partitioning of the database by assigning each object to its closest seed.
private  Matrix findBasis(Database<V> database, ORCLUS.ORCLUSCluster cluster, int dim)
          Finds the basis of the subspace of dimensionality dim for the specified cluster.
private  List<ORCLUS.ORCLUSCluster> initialSeeds(Database<V> database, int k)
          Initializes the list of seeds wit a random sample of size k.
private  void merge(Database<V> database, List<ORCLUS.ORCLUSCluster> clusters, int k_new, int d_new, IndefiniteProgress cprogress)
          Reduces the number of seeds to k_new
private  ORCLUS.ProjectedEnergy projectedEnergy(Database<V> database, ORCLUS.ORCLUSCluster c_i, ORCLUS.ORCLUSCluster c_j, int i, int j, int dim)
          Computes the projected energy of the specified clusters.
private  V projection(ORCLUS.ORCLUSCluster c, V o)
          Returns the projection of real vector o in the subspace of cluster c.
protected  Clustering<Model> runInTime(Database<V> database)
          Performs the ORCLUS algorithm on the given database.
private  ORCLUS.ORCLUSCluster union(Database<V> database, ORCLUS.ORCLUSCluster c1, ORCLUS.ORCLUSCluster c2, int dim)
          Returns the union of the two specified clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering
getDistanceFunction, getK_i, getK, getL
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 

Field Detail

ALPHA_ID

public static final OptionID ALPHA_ID
OptionID for ALPHA_PARAM.


ALPHA_PARAM

private final DoubleParameter ALPHA_PARAM
Parameter to specify the factor for reducing the number of current clusters in each iteration, must be an integer greater than 0 and less than 1.

Default value: 0.5

Key: -orclus.alpha


alpha

private double alpha
Holds the value of ALPHA_PARAM.


pca

private PCARunner<V extends NumberVector<V,?>,DoubleDistance> pca
The PCA utility object.

Constructor Detail

ORCLUS

public ORCLUS(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

protected Clustering<Model> runInTime(Database<V> database)
                               throws IllegalStateException
Performs the ORCLUS algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<Model>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

initialSeeds

private List<ORCLUS.ORCLUSCluster> initialSeeds(Database<V> database,
                                                int k)
Initializes the list of seeds wit a random sample of size k.

Parameters:
database - the database holding the objects
k - the size of the random sample
Returns:
the initial seed list

assign

private void assign(Database<V> database,
                    List<ORCLUS.ORCLUSCluster> clusters)
Creates a partitioning of the database by assigning each object to its closest seed.

Parameters:
database - the database holding the objects
clusters - the array of clusters to which the objects should be assigned to

findBasis

private Matrix findBasis(Database<V> database,
                         ORCLUS.ORCLUSCluster cluster,
                         int dim)
Finds the basis of the subspace of dimensionality dim for the specified cluster.

Parameters:
database - the database to run the algorithm on
cluster - the cluster
dim - the dimensionality of the subspace
Returns:
matrix defining the basis of the subspace for the specified cluster

merge

private void merge(Database<V> database,
                   List<ORCLUS.ORCLUSCluster> clusters,
                   int k_new,
                   int d_new,
                   IndefiniteProgress cprogress)
Reduces the number of seeds to k_new

Parameters:
database - the database holding the objects
clusters - the set of current seeds
k_new - the new number of seeds
d_new - the new dimensionality of the subspaces for each seed

projectedEnergy

private ORCLUS.ProjectedEnergy projectedEnergy(Database<V> database,
                                               ORCLUS.ORCLUSCluster c_i,
                                               ORCLUS.ORCLUSCluster c_j,
                                               int i,
                                               int j,
                                               int dim)
Computes the projected energy of the specified clusters. The projected energy is given by the mean square distance of the points to the centroid of the union cluster c, when all points in c are projected to the subspace of c.

Parameters:
database - the database holding the objects
c_i - the first cluster
c_j - the second cluster
i - the index of cluster c_i in the cluster list
j - the index of cluster c_j in the cluster list
dim - the dimensionality of the clusters
Returns:
the projected energy of the specified cluster

union

private ORCLUS.ORCLUSCluster union(Database<V> database,
                                   ORCLUS.ORCLUSCluster c1,
                                   ORCLUS.ORCLUSCluster c2,
                                   int dim)
Returns the union of the two specified clusters.

Parameters:
database - the database holding the objects
c1 - the first cluster
c2 - the second cluster
dim - the dimensionality of the union cluster
Returns:
the union of the two specified clusters

projection

private V projection(ORCLUS.ORCLUSCluster c,
                     V o)
Returns the projection of real vector o in the subspace of cluster c.

Parameters:
c - the cluster
o - the double vector
Returns:
the projection of double vector o in the subspace of cluster c

Release 0.3 (2010-03-31_1612)