
V - a type of NumberVector as a suitable datatype for this algorithm

@Title(value="EM-Clustering: Clustering by Expectation Maximization")
@Description(value="Provides k Gaussian mixtures maximizing the probability of the given data")
@Reference(authors="A. P. Dempster, N. M. Laird, D. B. Rubin", title="Maximum Likelihood from Incomplete Data via the EM algorithm", booktitle="Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31", url="http://www.jstor.org/stable/2984875")
public class EM<V extends NumberVector<?>> extends AbstractAlgorithm<Clustering<EMModel<V>>> implements ClusteringAlgorithm<Clustering<EMModel<V>>>

 Reference: A. P. Dempster, N. M. Laird, D. B. Rubin: Maximum Likelihood from
 Incomplete Data via the EM algorithm. 
 In Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31
 
| Modifier and Type | Class | Description |
|---|---|---|
| static class | EM.Parameterizer<V extends NumberVector<?>> | Parameterization class. |

| Modifier and Type | Field | Description |
|---|---|---|
| private double | delta | Holds the value of DELTA_ID. |
| static OptionID | DELTA_ID | Parameter to specify the termination criterion for maximization of E(M): E(M) - E(M') < em.delta. Must be a double equal to or greater than 0. |
| static OptionID | INIT_ID | Parameter to specify the initialization method. |
| private KMeansInitialization<V> | initializer | Class to choose the initial means. |
| private int | k | Holds the value of K_ID. |
| static OptionID | K_ID | Parameter to specify the number of clusters to find. Must be an integer greater than 0. |
| private static Logging | LOG | The logger for this class. |
| private int | maxiter | Maximum number of iterations to allow. |
| private static double | MIN_LOGLIKELIHOOD | |
| private WritableDataStore<double[]> | probClusterIGivenX | Stores the individual probabilities, for use by EMOutlierDetection etc. |
| private static double | SINGULARITY_CHEAT | Small value added to the diagonal of a matrix to avoid singularity before computing the inverse. |

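The purpose of SINGULARITY_CHEAT can be illustrated with a minimal sketch that uses plain arrays instead of the library's Matrix class. The class name, the helper methods, and the value 1e-9 are assumptions made for this example; the documentation above does not state the constant's actual value.

```java
final class DiagonalRegularizationSketch {
  // Hypothetical value for illustration only; the real constant's value is not documented above.
  public static final double SINGULARITY_CHEAT = 1e-9;

  /** Returns a copy of the square matrix with epsilon added to every diagonal entry. */
  public static double[][] addToDiagonal(double[][] matrix, double epsilon) {
    double[][] result = new double[matrix.length][];
    for (int i = 0; i < matrix.length; i++) {
      result[i] = matrix[i].clone();
      result[i][i] += epsilon;
    }
    return result;
  }

  /** Determinant of a 2x2 matrix, used here only to show (non-)singularity. */
  public static double det2(double[][] m) {
    return m[0][0] * m[1][1] - m[0][1] * m[1][0];
  }

  public static void main(String[] args) {
    // A covariance matrix of perfectly correlated attributes is singular.
    double[][] singular = {{1.0, 1.0}, {1.0, 1.0}};
    System.out.println("det before: " + det2(singular));
    System.out.println("det after:  " + det2(addToDiagonal(singular, SINGULARITY_CHEAT)));
  }
}
```

After the small increment the determinant is strictly positive, so the matrix can be inverted, at the cost of a slight bias in the estimated covariance.
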
| Constructor | Description |
|---|---|
| EM(int k, double delta, KMeansInitialization<V> initializer, int maxiter) | Constructor. |

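How delta and maxiter might jointly bound the iteration can be sketched as follows. This is not the library's code: the class and method names are hypothetical, one EM round is abstracted as a DoubleSupplier that returns the new expectation value, and the sketch assumes that E(M) denotes the new value and E(M') the previous one, and that maxiter is positive.

```java
import java.util.function.DoubleSupplier;

final class EmTerminationSketch {
  /**
   * Repeats emStep until the improvement of the expectation value drops
   * below delta or maxiter rounds have been performed.
   *
   * @return the number of rounds performed
   */
  public static int iterate(DoubleSupplier emStep, double initialExpectation,
      double delta, int maxiter) {
    if (delta < 0) {
      throw new IllegalArgumentException("delta must be >= 0");
    }
    double previous = initialExpectation;
    int rounds = 0;
    while (rounds < maxiter) {
      double current = emStep.getAsDouble(); // one E-step plus M-step
      rounds++;
      if (current - previous < delta) { // assumed reading of E(M) - E(M') < em.delta
        break;
      }
      previous = current;
    }
    return rounds;
  }

  /** Test helper: a step that replays fixed values and then repeats the last one. */
  public static DoubleSupplier fromSequence(double... values) {
    int[] next = {0};
    return () -> values[Math.min(next[0]++, values.length - 1)];
  }

  public static void main(String[] args) {
    DoubleSupplier step = fromSequence(-10.0, -5.0, -4.5, -4.49, -4.489);
    System.out.println(iterate(step, -20.0, 0.1, 100));
  }
}
```

With delta equal to 0 and a non-decreasing expectation value, only maxiter stops this loop, which is one reason to supply both parameters.
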
| Modifier and Type | Method | Description |
|---|---|---|
| protected double | assignProbabilitiesToInstances(Relation<V> database, double[] normDistrFactor, List<Vector> means, List<Matrix> invCovMatr, double[] clusterWeights, WritableDataStore<double[]> probClusterIGivenX) | Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions. |
| TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger() | Get the (STATIC) logger for this class. |
| double[] | getProbClusterIGivenX(DBIDRef index) | Get the probabilities for a given point. |
| Clustering<EMModel<V>> | run(Database database, Relation<V> relation) | Performs the EM clustering algorithm on the given database. |

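The expectation step described for assignProbabilitiesToInstances can be sketched with plain arrays. This is an illustration under stated assumptions, not the library's implementation: Relation, Vector, Matrix and WritableDataStore are replaced by arrays, the class and helper names are hypothetical, the return value is taken to be the average log-likelihood, and numerical underflow (a point with zero density under every cluster) is not handled.

```java
final class EStepSketch {
  /** Squared Mahalanobis distance (x - mean)^T * invCov * (x - mean). */
  public static double mahalanobisSquared(double[] x, double[] mean, double[][] invCov) {
    double sum = 0.0;
    for (int r = 0; r < x.length; r++) {
      double rowSum = 0.0;
      for (int c = 0; c < x.length; c++) {
        rowSum += invCov[r][c] * (x[c] - mean[c]);
      }
      sum += (x[r] - mean[r]) * rowSum;
    }
    return sum;
  }

  /**
   * Fills probClusterIGivenX[n][i] with P(cluster i | point n) and returns
   * the average log-likelihood of the data under the current mixture.
   */
  public static double assignProbabilities(double[][] data, double[] normDistrFactor,
      double[][] means, double[][][] invCovMatr, double[] clusterWeights,
      double[][] probClusterIGivenX) {
    final int k = means.length;
    double logLikelihoodSum = 0.0;
    for (int n = 0; n < data.length; n++) {
      double[] joint = new double[k];
      double evidence = 0.0; // P(x) = sum over clusters of weight * density
      for (int i = 0; i < k; i++) {
        double density = normDistrFactor[i]
            * Math.exp(-0.5 * mahalanobisSquared(data[n], means[i], invCovMatr[i]));
        joint[i] = clusterWeights[i] * density;
        evidence += joint[i];
      }
      for (int i = 0; i < k; i++) {
        probClusterIGivenX[n][i] = joint[i] / evidence; // assumes evidence > 0
      }
      logLikelihoodSum += Math.log(evidence);
    }
    return logLikelihoodSum / data.length;
  }

  public static void main(String[] args) {
    // Two 1-dimensional unit-variance clusters at 0 and 10 with equal weights.
    double f = 1.0 / Math.sqrt(2.0 * Math.PI);
    double[][] data = {{0.0}, {5.0}, {10.0}};
    double[][] prob = new double[data.length][2];
    double avg = assignProbabilities(data, new double[] {f, f},
        new double[][] {{0.0}, {10.0}}, new double[][][] {{{1.0}}, {{1.0}}},
        new double[] {0.5, 0.5}, prob);
    System.out.println(java.util.Arrays.deepToString(prob) + " avg log-likelihood: " + avg);
  }
}
```

In this example the point halfway between the two means receives equal probability for both clusters, while the points located at a mean are assigned almost entirely to that cluster.
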
Methods inherited from class AbstractAlgorithm: makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ClusteringAlgorithm: run

private static final Logging LOG

private static final double SINGULARITY_CHEAT

public static final OptionID K_ID

private int k
Holds the value of K_ID.

public static final OptionID DELTA_ID

public static final OptionID INIT_ID

private static final double MIN_LOGLIKELIHOOD

private double delta
Holds the value of DELTA_ID.

private WritableDataStore<double[]> probClusterIGivenX

private KMeansInitialization<V extends NumberVector<?>> initializer

private int maxiter

public EM(int k, double delta, KMeansInitialization<V> initializer, int maxiter)
Parameters:
k - k parameter
delta - delta parameter
initializer - Class to choose the initial means
maxiter - Maximum number of iterations

public Clustering<EMModel<V>> run(Database database, Relation<V> relation)
Parameters:
database - Database
relation - Relation

protected double assignProbabilitiesToInstances(Relation<V> database, double[] normDistrFactor, List<Vector> means, List<Matrix> invCovMatr, double[] clusterWeights, WritableDataStore<double[]> probClusterIGivenX)
Parameters:
database - the database used for assignment to instances
normDistrFactor - normalization factor for density function, based on current covariance matrix
means - the current means
invCovMatr - the inverse covariance matrices
clusterWeights - the weights of the current clusters

public double[] getProbClusterIGivenX(DBIDRef index)
Parameters:
index - Point ID

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<EMModel<V extends NumberVector<?>>>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
getLogger in class AbstractAlgorithm<Clustering<EMModel<V extends NumberVector<?>>>>