weka.clusterers
Class FarthestFirst

java.lang.Object
  extended byweka.clusterers.Clusterer
      extended byweka.clusterers.FarthestFirst
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable

public class FarthestFirst
extends Clusterer
implements OptionHandler

Implements the "Farthest First Traversal Algorithm" by Hochbaum and Shmoys 1985: A best possible heuristic for the k-center problem, Mathematics of Operations Research, 10(2):180-184, as cited by Sanjoy Dasgupta "performance guarantees for hierarchical clustering", colt 2002, sydney works as a fast simple approximate clusterer modelled after SimpleKMeans, might be a useful initializer for it Valid options are:

-N
Specify the number of clusters to generate.

-S
Specify random number seed.

Version:
$Revision: 1.1 $
Author:
Bernhard Pfahringer (bernhard@cs.waikato.ac.nz)
See Also:
Clusterer, OptionHandler, Serialized Form

Field Summary
protected  Instances m_ClusterCentroids
          holds the cluster centroids
protected  Instances m_instances
          training instances, not necessary to keep, could be replaced by m_ClusterCentroids where needed for header info
private  double[] m_Max
          attribute max values
private  double[] m_Min
          attribute min values
protected  int m_NumClusters
          number of clusters to generate
protected  ReplaceMissingValues m_ReplaceMissingFilter
          replace missing values in training instances
protected  int m_Seed
          random seed
 
Constructor Summary
FarthestFirst()
           
 
Method Summary
 void buildClusterer(Instances data)
          Generates a clusterer.
 int clusterInstance(Instance instance)
          Classifies a given instance.
protected  int clusterProcessedInstance(Instance instance)
          clusters an instance that has been through the filters
protected  double difference(int index, double val1, double val2)
          Computes the difference between two given attribute values.
protected  double distance(Instance first, Instance second)
          Calculates the distance between two instances
protected  int farthestAway(double[] minDistance, boolean[] selected)
           
 int getNumClusters()
          gets the number of clusters to generate
 java.lang.String[] getOptions()
          Gets the current settings of FarthestFirst
 int getSeed()
          Get the random number seed
 java.lang.String globalInfo()
          Returns a string describing this clusterer
protected  void initMinMax(Instances data)
           
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options..
static void main(java.lang.String[] argv)
          Main method for testing this class.
protected  double norm(double x, int i)
          Normalizes a given value of a numeric attribute.
 int numberOfClusters()
          Returns the number of clusters.
 java.lang.String numClustersTipText()
          Returns the tip text for this property
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setNumClusters(int n)
          set the number of clusters to generate
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(int s)
          Set the random number seed
 java.lang.String toString()
          return a string describing this clusterer
protected  void updateMinDistance(double[] minDistance, boolean[] selected, Instances data, Instance center)
           
private  void updateMinMax(Instance instance)
          Updates the minimum and maximum values for all the attributes based on a new instance.
 
Methods inherited from class weka.clusterers.Clusterer
distributionForInstance, forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_instances

protected Instances m_instances
training instances, not necessary to keep, could be replaced by m_ClusterCentroids where needed for header info


m_ReplaceMissingFilter

protected ReplaceMissingValues m_ReplaceMissingFilter
replace missing values in training instances


m_NumClusters

protected int m_NumClusters
number of clusters to generate


m_ClusterCentroids

protected Instances m_ClusterCentroids
holds the cluster centroids


m_Min

private double[] m_Min
attribute min values


m_Max

private double[] m_Max
attribute max values


m_Seed

protected int m_Seed
random seed

Constructor Detail

FarthestFirst

public FarthestFirst()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this clusterer

Returns:
a description of the evaluator suitable for displaying in the explorer/experimenter gui

buildClusterer

public void buildClusterer(Instances data)
                    throws java.lang.Exception
Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.

Specified by:
buildClusterer in class Clusterer
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the clusterer has not been generated successfully

updateMinDistance

protected void updateMinDistance(double[] minDistance,
                                 boolean[] selected,
                                 Instances data,
                                 Instance center)

farthestAway

protected int farthestAway(double[] minDistance,
                           boolean[] selected)

initMinMax

protected void initMinMax(Instances data)

updateMinMax

private void updateMinMax(Instance instance)
Updates the minimum and maximum values for all the attributes based on a new instance.

Parameters:
instance - the new instance

clusterProcessedInstance

protected int clusterProcessedInstance(Instance instance)
clusters an instance that has been through the filters

Parameters:
instance - the instance to assign a cluster to
Returns:
a cluster number

clusterInstance

public int clusterInstance(Instance instance)
                    throws java.lang.Exception
Classifies a given instance.

Overrides:
clusterInstance in class Clusterer
Parameters:
instance - the instance to be assigned to a cluster
Returns:
the number of the assigned cluster as an integer if the class is enumerated, otherwise the predicted value
Throws:
java.lang.Exception - if instance could not be classified successfully

distance

protected double distance(Instance first,
                          Instance second)
Calculates the distance between two instances

Returns:
the distance between the two given instances, between 0 and 1

difference

protected double difference(int index,
                            double val1,
                            double val2)
Computes the difference between two given attribute values.


norm

protected double norm(double x,
                      int i)
Normalizes a given value of a numeric attribute.

Parameters:
x - the value to be normalized
i - the attribute's index

numberOfClusters

public int numberOfClusters()
                     throws java.lang.Exception
Returns the number of clusters.

Specified by:
numberOfClusters in class Clusterer
Returns:
the number of clusters generated for a training dataset.
Throws:
java.lang.Exception - if number of clusters could not be returned successfully

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options..

Valid options are:

-N
Specify the number of clusters to generate. If omitted, FarthestFirst will use cross validation to select the number of clusters automatically.

-S
Specify random number seed.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

numClustersTipText

public java.lang.String numClustersTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumClusters

public void setNumClusters(int n)
set the number of clusters to generate

Parameters:
n - the number of clusters to generate

getNumClusters

public int getNumClusters()
gets the number of clusters to generate

Returns:
the number of clusters to generate

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int s)
Set the random number seed

Parameters:
s - the seed

getSeed

public int getSeed()
Get the random number seed

Returns:
the seed

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of FarthestFirst

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

toString

public java.lang.String toString()
return a string describing this clusterer

Returns:
a description of the clusterer as a string

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain the following arguments:

-t training file [-N number of clusters]