Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering
Class KMeans<D extends Distance<D>,V extends RealVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<O,R>
              extended by de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm<V,D,Clustering<Model>>
                  extended by de.lmu.ifi.dbs.elki.algorithm.clustering.KMeans<D,V>
Type Parameters:
D - a type of Distance as returned by the used distance function
V - a type of RealVector as a suitable datatype for this algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<Model>>, ClusteringAlgorithm<Clustering<Model>,V>, Parameterizable

public class KMeans<D extends Distance<D>,V extends RealVector<V,?>>
extends DistanceBasedAlgorithm<V,D,Clustering<Model>>
implements ClusteringAlgorithm<Clustering<Model>,V>

Provides the k-means algorithm.

Reference: J. McQueen: Some Methods for Classification and Analysis of Multivariate Observations.
In 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297.

Author:
Arthur Zimek

Field Summary
private  int k
          Holds the value of K_PARAM.
static OptionID K_ID
          OptionID for K_PARAM
private  IntParameter K_PARAM
          Parameter to specify the number of clusters to find, must be an integer greater than 0.
private  Clustering<Model> result
          Keeps the result.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
DISTANCE_FUNCTION_ID, DISTANCE_FUNCTION_PARAM
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
KMeans()
          Provides the k-means algorithm, adding parameter K_PARAM to the option handler additionally to parameters of super class.
 
Method Summary
 Description getDescription()
          Returns a description of the algorithm.
 Clustering<Model> getResult()
          Retrieve the result.
protected  List<V> means(List<List<Integer>> clusters, List<V> means, Database<V> database)
          Returns the mean vectors of the given clusters in the given database.
protected  Clustering<Model> runInTime(Database<V> database)
          Performs the k-means algorithm on the given database.
 List<String> setParameters(List<String> args)
          Calls the super method and sets additionally the value of the parameter K_PARAM.
protected  List<List<Integer>> sort(List<V> means, Database<V> database)
          Returns a list of clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, addParameterizable, addParameterizable, checkGlobalParameterConstraints, collectOptions, getAttributeSettings, getParameters, rememberParametersExcept, removeOption, removeParameterizable, shortDescription
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, collectOptions, getParameters, shortDescription
 

Field Detail

K_ID

public static final OptionID K_ID
OptionID for K_PARAM


K_PARAM

private final IntParameter K_PARAM
Parameter to specify the number of clusters to find, must be an integer greater than 0.

Key: -kmeans.k


k

private int k
Holds the value of K_PARAM.


result

private Clustering<Model> result
Keeps the result.

Constructor Detail

KMeans

public KMeans()
Provides the k-means algorithm, adding parameter K_PARAM to the option handler additionally to parameters of super class.

Method Detail

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Specified by:
getDescription in interface Algorithm<V extends RealVector<V,?>,Clustering<Model>>
Returns:
a description of the algorithm

getResult

public Clustering<Model> getResult()
Description copied from interface: ClusteringAlgorithm
Retrieve the result.

Specified by:
getResult in interface Algorithm<V extends RealVector<V,?>,Clustering<Model>>
Specified by:
getResult in interface ClusteringAlgorithm<Clustering<Model>,V extends RealVector<V,?>>
Returns:
the result of the algorithm

runInTime

protected Clustering<Model> runInTime(Database<V> database)
                               throws IllegalStateException
Performs the k-means algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>,Clustering<Model>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

means

protected List<V> means(List<List<Integer>> clusters,
                        List<V> means,
                        Database<V> database)
Returns the mean vectors of the given clusters in the given database.

Parameters:
clusters - the clusters to compute the means
means - the recent means
database - the database containing the vectors
Returns:
the mean vectors of the given clusters in the given database

sort

protected List<List<Integer>> sort(List<V> means,
                                   Database<V> database)
Returns a list of clusters. The kth cluster contains the ids of those FeatureVectors, that are nearest to the kth mean.

Parameters:
means - a list of k means
database - the database to cluster
Returns:
list of k clusters

setParameters

public List<String> setParameters(List<String> args)
                           throws ParameterException
Calls the super method and sets additionally the value of the parameter K_PARAM.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class DistanceBasedAlgorithm<V extends RealVector<V,?>,D extends Distance<D>,Clustering<Model>>
Parameters:
args - parameters to set the attributes accordingly to
Returns:
a list containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting

Release 0.2 (2009-07-06_1820)