Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.clustering
Class KMeans<D extends Distance<D>,V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<O,R>
          extended by de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm<V,D,Clustering<MeanModel<V>>>
              extended by de.lmu.ifi.dbs.elki.algorithm.clustering.KMeans<D,V>
Type Parameters:
D - a type of Distance as returned by the used distance function
V - a type of NumberVector as a suitable datatype for this algorithm
All Implemented Interfaces:
Algorithm<V,Clustering<MeanModel<V>>>, ClusteringAlgorithm<Clustering<MeanModel<V>>,V>, Parameterizable

@Title(value="K-Means")
@Description(value="Finds a partitioning into k clusters.")
@Reference(authors="J. MacQueen",
           title="Some Methods for Classification and Analysis of Multivariate Observations",
           booktitle="5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297",
           url="http://projecteuclid.org/euclid.bsmsp/1200512992")
public class KMeans<D extends Distance<D>,V extends NumberVector<V,?>>
extends DistanceBasedAlgorithm<V,D,Clustering<MeanModel<V>>>
implements ClusteringAlgorithm<Clustering<MeanModel<V>>,V>

Provides the k-means algorithm.

Reference: J. MacQueen: Some Methods for Classification and Analysis of Multivariate Observations.
In 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297.

Author:
Arthur Zimek

Field Summary
private  int k
          Holds the value of K_PARAM.
static OptionID K_ID
          OptionID for K_PARAM
private  IntParameter K_PARAM
          Parameter to specify the number of clusters to find, must be an integer greater than 0.
private  int maxiter
          Holds the value of MAXITER_PARAM.
static OptionID MAXITER_ID
          OptionID for MAXITER_PARAM
private  IntParameter MAXITER_PARAM
          Parameter to specify the number of clusters to find, must be an integer greater or equal to 0, where 0 means no limit.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
DISTANCE_FUNCTION_ID, DISTANCE_FUNCTION_PARAM
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
KMeans(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
protected  List<V> means(List<List<Integer>> clusters, List<V> means, Database<V> database)
          Returns the mean vectors of the given clusters in the given database.
protected  Clustering<MeanModel<V>> runInTime(Database<V> database)
          Performs the k-means algorithm on the given database.
protected  List<List<Integer>> sort(List<V> means, Database<V> database)
          Returns a list of clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
getDistanceFactory, getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm
setTime, setVerbose
 

Field Detail

K_ID

public static final OptionID K_ID
OptionID for K_PARAM


K_PARAM

private final IntParameter K_PARAM
Parameter to specify the number of clusters to find, must be an integer greater than 0.

Key: -kmeans.k


MAXITER_ID

public static final OptionID MAXITER_ID
OptionID for MAXITER_PARAM


MAXITER_PARAM

private final IntParameter MAXITER_PARAM
Parameter to specify the number of clusters to find, must be an integer greater or equal to 0, where 0 means no limit.

Key: -kmeans.maxiter


k

private int k
Holds the value of K_PARAM.


maxiter

private int maxiter
Holds the value of MAXITER_PARAM.

Constructor Detail

KMeans

public KMeans(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

protected Clustering<MeanModel<V>> runInTime(Database<V> database)
                                                                throws IllegalStateException
Performs the k-means algorithm on the given database.

Specified by:
runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<MeanModel<V extends NumberVector<V,?>>>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

means

protected List<V> means(List<List<Integer>> clusters,
                        List<V> means,
                        Database<V> database)
Returns the mean vectors of the given clusters in the given database.

Parameters:
clusters - the clusters to compute the means
means - the recent means
database - the database containing the vectors
Returns:
the mean vectors of the given clusters in the given database

sort

protected List<List<Integer>> sort(List<V> means,
                                   Database<V> database)
Returns a list of clusters. The kth cluster contains the ids of those FeatureVectors, that are nearest to the kth mean.

Parameters:
means - a list of k means
database - the database to cluster
Returns:
list of k clusters

Release 0.3 (2010-03-31_1612)