Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering
Class EM<V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.EM<V>
Type Parameters:
V - a type of RealVector as a suitable datatype for this algorithm
All Implemented Interfaces:
Algorithm<V>, Clustering<V>, Loggable, Parameterizable

public class EM<V extends RealVector<V,?>>
extends AbstractAlgorithm<V>
implements Clustering<V>

Provides the EM algorithm (clustering by expectation maximization).

Initialization is implemented as random initialization of means (uniformly distributed within the attribute ranges of the given database) and initial zero-covariance and variance=1 in covariance matrices.

Author:
Arthur Zimek

Field Summary
private  double delta
          Keeps delta - a small value as termination criterion in expectation maximization
private  DoubleParameter DELTA_PARAM
          Parameter to specify the termination criterion for maximization of E(M): E(M) - E(M') < em.delta, must be a double equal to or greater than 0.
private  int k
          Keeps k - the number of clusters to find.
private  IntParameter K_PARAM
          Parameter to specify the number of clusters to find, must be an integer greater than 0.
private  EMClusters<V> result
          Keeps the result.
private static double SINGULARITY_CHEAT
          Small value to increment diagonally of a matrix in order to avoid singularity befor building the inverse.
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug
 
Constructor Summary
EM()
          Provides the EM algorithm (clustering by expectation maximization).
 
Method Summary
protected  void assignProbabilitiesToInstances(Database<V> database, List<Double> normDistrFactor, List<V> means, List<Matrix> invCovMatr, List<Double> clusterWeights)
          Assigns the current probability values to the instances in the database.
protected  double expectationOfMixture(Database<V> database)
          The expectation value of the current mixture of distributions.
 Description getDescription()
          Returns a description of the algorithm.
 Clusters<V> getResult()
          Returns the result of the algorithm.
protected  List<V> initialMeans(Database<V> database)
          Creates k random points distributed uniformly within the attribute ranges of the given database.
protected  void runInTime(Database<V> database)
          Performs the EM clustering algorithm on the given database.
 String[] setParameters(String[] args)
          Sets parameters k and delta.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
description, isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, checkGlobalParameterConstraints, deleteOption, description, description, getAttributeSettings, getParameters, getParameterValue, getPossibleOptions, inlineDescription, isSet, setParameters
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, message, progress, progress, progress, verbose, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
run, setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, description, getAttributeSettings, getParameters, getPossibleOptions, inlineDescription
 

Field Detail

SINGULARITY_CHEAT

private static final double SINGULARITY_CHEAT
Small value to increment diagonally of a matrix in order to avoid singularity befor building the inverse.

See Also:
Constant Field Values

K_PARAM

private final IntParameter K_PARAM
Parameter to specify the number of clusters to find, must be an integer greater than 0.

Key: -em.k


DELTA_PARAM

private final DoubleParameter DELTA_PARAM
Parameter to specify the termination criterion for maximization of E(M): E(M) - E(M') < em.delta, must be a double equal to or greater than 0.

Default value: 0.0

Key: -em.delta


k

private int k
Keeps k - the number of clusters to find.


delta

private double delta
Keeps delta - a small value as termination criterion in expectation maximization


result

private EMClusters<V extends RealVector<V,?>> result
Keeps the result.

Constructor Detail

EM

public EM()
Provides the EM algorithm (clustering by expectation maximization).

Method Detail

runInTime

protected void runInTime(Database<V> database)
                  throws IllegalStateException
Performs the EM clustering algorithm on the given database.

Finally a hard clustering is provided where each clusters gets assigned the points exhibiting the highest probability to belong to this cluster. But still, the database objects hold associated the complete probability-vector for all models.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).
See Also:
AbstractAlgorithm.runInTime(de.lmu.ifi.dbs.elki.database.Database)

assignProbabilitiesToInstances

protected void assignProbabilitiesToInstances(Database<V> database,
                                              List<Double> normDistrFactor,
                                              List<V> means,
                                              List<Matrix> invCovMatr,
                                              List<Double> clusterWeights)
Assigns the current probability values to the instances in the database.

Parameters:
database - the database used for assignment to instances
normDistrFactor - normalization factor for density function, based on current covariance matrix
means - the current means
invCovMatr - the inverse covariance matrices
clusterWeights - the weights of the current clusters

expectationOfMixture

protected double expectationOfMixture(Database<V> database)
The expectation value of the current mixture of distributions.

Computed as the sum of the logarithms of the prior probability of each instance.

Parameters:
database - the database where the prior probability of each instance is associated
Returns:
the expectation value of the current mixture of distributions

initialMeans

protected List<V> initialMeans(Database<V> database)
Creates k random points distributed uniformly within the attribute ranges of the given database.

Parameters:
database - the database must contain enough points in order to ascertain the range of attribute values. Less than two points would make no sense. The content of the database is not touched otherwise.
Returns:
a list of k random points distributed uniformly within the attribute ranges of the given database

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Specified by:
getDescription in interface Algorithm<V extends RealVector<V,?>>
Returns:
a description of the algorithm
See Also:
Algorithm.getDescription()

getResult

public Clusters<V> getResult()
Description copied from interface: Algorithm
Returns the result of the algorithm.

Specified by:
getResult in interface Algorithm<V extends RealVector<V,?>>
Specified by:
getResult in interface Clustering<V extends RealVector<V,?>>
Returns:
the result of the algorithm
See Also:
Algorithm.getResult()

setParameters

public String[] setParameters(String[] args)
                       throws ParameterException
Sets parameters k and delta.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class AbstractAlgorithm<V extends RealVector<V,?>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
String[] an array containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting
See Also:
Parameterizable.setParameters(String[])

Release 0.1 (2008-07-10_1838)