weka.classifiers.meta
Class CVParameterSelection

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.SingleClassifierEnhancer
          extended by weka.classifiers.RandomizableSingleClassifierEnhancer
              extended by weka.classifiers.meta.CVParameterSelection
All Implemented Interfaces:
java.lang.Cloneable, Drawable, OptionHandler, Randomizable, java.io.Serializable, Summarizable

public class CVParameterSelection
extends RandomizableSingleClassifierEnhancer
implements Drawable, Summarizable

Class for performing parameter selection by cross-validation for any classifier. For more information, see:

R. Kohavi (1995). Wrappers for Performance Enhancement and Oblivious Decision Graphs. PhD Thesis. Department of Computer Science, Stanford University.

Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of the classifier to perform cross-validation selection on.

-X num
Number of folds used for cross validation (default 10).

-S seed
Random number seed (default 1).

-P "N 1 5 10"
Sets an optimisation parameter for the classifier with name -N, lower bound 1, upper bound 5, and 10 optimisation steps. The upper bound may be the character 'A' or 'I' to substitute the number of attributes or instances in the training data, respectively. This parameter may be supplied more than once to optimise over several classifier options simultaneously.

Options after -- are passed to the designated sub-classifier.
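
The "--" convention above can be pictured with a small stand-alone sketch. The class OptionSplitSketch and its two methods are hypothetical helpers written for illustration, not part of Weka, and the option values in main (a J48 base classifier with -C and -M) are example inputs only. Note that when options are supplied as a Java array, the quoted -P specification is a single array element.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Weka) illustrating the "--" convention. */
final class OptionSplitSketch {

    /** Returns the options before the first "--": those meant for the wrapper itself. */
    static String[] wrapperOptions(String[] options) {
        int sep = Arrays.asList(options).indexOf("--");
        return sep < 0 ? options.clone() : Arrays.copyOfRange(options, 0, sep);
    }

    /** Returns the options after the first "--": those passed to the sub-classifier. */
    static String[] subClassifierOptions(String[] options) {
        int sep = Arrays.asList(options).indexOf("--");
        return sep < 0 ? new String[0] : Arrays.copyOfRange(options, sep + 1, options.length);
    }

    public static void main(String[] args) {
        // Example inputs only; "C 0.1 0.5 5" is one array element, as the shell quotes imply.
        String[] options = {
            "-W", "weka.classifiers.trees.J48",
            "-X", "5",
            "-P", "C 0.1 0.5 5",
            "--", "-M", "2"
        };
        System.out.println(Arrays.toString(wrapperOptions(options)));
        System.out.println(Arrays.toString(subClassifierOptions(options))); // prints [-M, 2]
    }
}
```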

Version:
$Revision: 1.26 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz)
See Also:
Serialized Form

Nested Class Summary
protected  class CVParameterSelection.CVParameter
           
 
Field Summary
protected  java.lang.String[] m_BestClassifierOptions
          The set of all classifier options as determined by cross-validation
protected  double m_BestPerformance
          The cross-validated performance of the best options
protected  java.lang.String[] m_ClassifierOptions
          The base classifier options (not including those being set by cross-validation)
protected  FastVector m_CVParams
          The set of parameters to cross-validate over
protected  java.lang.String[] m_InitOptions
          The set of all options at initialization time.
protected  int m_NumAttributes
          The number of attributes in the data
protected  int m_NumFolds
          The number of folds used in cross-validation
protected  int m_TrainFoldSize
          The number of instances in a training fold
 
Fields inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
m_Seed
 
Fields inherited from class weka.classifiers.SingleClassifierEnhancer
m_Classifier
 
Fields inherited from class weka.classifiers.Classifier
m_Debug
 
Fields inherited from interface weka.core.Drawable
BayesNet, NOT_DRAWABLE, TREE
 
Constructor Summary
CVParameterSelection()
           
 
Method Summary
 void addCVParameter(java.lang.String cvParam)
          Adds a scheme parameter to the list of parameters to be set by cross-validation
 void buildClassifier(Instances instances)
          Generates the classifier.
protected  java.lang.String[] createOptions()
          Create the options array to pass to the classifier.
 java.lang.String CVParametersTipText()
          Returns the tip text for this property
 double[] distributionForInstance(Instance instance)
          Predicts the class distribution for the given test instance.
protected  void findParamsByCrossValidation(int depth, Instances trainData, java.util.Random random)
          Finds the best parameter combination.
 java.lang.String getCVParameter(int index)
          Gets the scheme parameter with the given index.
 java.lang.Object[] getCVParameters()
          Get method for CVParameters.
 int getNumFolds()
          Gets the number of folds for the cross-validation.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 java.lang.String globalInfo()
          Returns a string describing this classifier
 java.lang.String graph()
          Returns graph describing the classifier (if possible).
 int graphType()
          Returns the type of graph this classifier represents.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String numFoldsTipText()
          Returns the tip text for this property
 void setCVParameters(java.lang.Object[] params)
          Set method for CVParameters.
 void setNumFolds(int numFolds)
          Sets the number of folds for the cross-validation.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
          Returns description of the cross-validated classifier.
 java.lang.String toSummaryString()
          Returns a string that summarizes the object.
 
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, defaultClassifierString, getClassifier, getClassifierSpec, setClassifier
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, setDebug
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_ClassifierOptions

protected java.lang.String[] m_ClassifierOptions
The base classifier options (not including those being set by cross-validation)


m_BestClassifierOptions

protected java.lang.String[] m_BestClassifierOptions
The set of all classifier options as determined by cross-validation


m_InitOptions

protected java.lang.String[] m_InitOptions
The set of all options at initialization time, kept so that getOptions can return them.


m_BestPerformance

protected double m_BestPerformance
The cross-validated performance of the best options


m_CVParams

protected FastVector m_CVParams
The set of parameters to cross-validate over


m_NumAttributes

protected int m_NumAttributes
The number of attributes in the data


m_TrainFoldSize

protected int m_TrainFoldSize
The number of instances in a training fold
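
As a rough illustration of how a training-fold size relates to the total instance count and the number of folds, the sketch below assumes folds are made as equal as possible, with the first n % k folds receiving one extra instance. That assignment rule, the class FoldSizeSketch, and its method names are assumptions for illustration; this page does not state how the class computes m_TrainFoldSize.

```java
/** Hypothetical arithmetic sketch; the fold-assignment rule is an assumption. */
final class FoldSizeSketch {

    /** Size of test fold number fold when n instances are split into k near-equal folds. */
    static int testFoldSize(int n, int k, int fold) {
        if (k < 2 || k > n || fold < 0 || fold >= k) {
            throw new IllegalArgumentException("need 2 <= k <= n and 0 <= fold < k");
        }
        // Assumed rule: the first (n % k) folds receive one extra instance.
        return n / k + (fold < n % k ? 1 : 0);
    }

    /** Size of the training set that remains when test fold number fold is held out. */
    static int trainFoldSize(int n, int k, int fold) {
        return n - testFoldSize(n, k, fold);
    }

    public static void main(String[] args) {
        System.out.println(trainFoldSize(150, 10, 0)); // prints 135
    }
}
```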


m_NumFolds

protected int m_NumFolds
The number of folds used in cross-validation

Constructor Detail

CVParameterSelection

public CVParameterSelection()
Method Detail

createOptions

protected java.lang.String[] createOptions()
Create the options array to pass to the classifier. The parameter values and positions are taken from m_ClassifierOptions and m_CVParams.

Returns:
the options array

findParamsByCrossValidation

protected void findParamsByCrossValidation(int depth,
                                           Instances trainData,
                                           java.util.Random random)
                                    throws java.lang.Exception
Finds the best parameter combination (recursing once for each parameter being optimised).

Parameters:
depth - the index of the parameter to be optimised at this level
trainData - the training data used for the cross-validation
random - the random number generator used for the cross-validation
Throws:
java.lang.Exception - if an error occurs
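
The depth-indexed recursion described above can be sketched generically: each level fixes one parameter and recurses, and the innermost level scores the full combination. RecursiveGridSketch and all of its members are hypothetical, and the error function is a stand-in for a cross-validated error estimate; this is not the class's actual code.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of a depth-indexed recursive search; not Weka's code. */
final class RecursiveGridSketch {
    private final double[][] candidates;            // candidates[d] = values tried for parameter d
    private final ToDoubleFunction<double[]> error; // stand-in for a cross-validated error estimate
    private final double[] current;                 // combination currently being assembled
    double[] best;
    double bestError = Double.POSITIVE_INFINITY;
    int evaluations = 0;

    RecursiveGridSketch(double[][] candidates, ToDoubleFunction<double[]> error) {
        this.candidates = candidates;
        this.error = error;
        this.current = new double[candidates.length];
    }

    /** depth is the index of the parameter fixed at this level of the recursion. */
    void search(int depth) {
        if (depth < candidates.length) {
            for (double value : candidates[depth]) {
                current[depth] = value;
                search(depth + 1);
            }
            return;
        }
        // All parameters are fixed: score this combination and keep it if it is the best so far.
        evaluations++;
        double e = error.applyAsDouble(current);
        if (e < bestError) {
            bestError = e;
            best = current.clone();
        }
    }

    public static void main(String[] args) {
        double[][] grid = {{1, 2, 3}, {10, 20}};
        RecursiveGridSketch sketch = new RecursiveGridSketch(
            grid, p -> Math.abs(p[0] - 2) + Math.abs(p[1] - 20));
        sketch.search(0);
        System.out.println(sketch.evaluations); // prints 6
    }
}
```

The number of evaluations is the product of the candidate counts, which is why optimising over several options at once grows expensive quickly.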

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier

Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableSingleClassifierEnhancer
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of classifier to perform cross-validation selection on.

-X num
Number of folds used for cross validation (default 10).

-S seed
Random number seed (default 1).

-P "N 1 5 10"
Sets an optimisation parameter for the classifier with name -N, lower bound 1, upper bound 5, and 10 optimisation steps. The upper bound may be the character 'A' or 'I' to substitute the number of attributes or instances in the training data, respectively. This parameter may be supplied more than once to optimise over several classifier options simultaneously.

Options after -- are passed to the designated sub-classifier.

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableSingleClassifierEnhancer
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableSingleClassifierEnhancer
Returns:
an array of strings suitable for passing to setOptions

buildClassifier

public void buildClassifier(Instances instances)
                     throws java.lang.Exception
Generates the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
instances - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Predicts the class distribution for the given test instance.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
the predicted class probability distribution
Throws:
java.lang.Exception - if an error occurred during the prediction
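
A caller that needs a single class label from the returned distribution can take the index of its largest entry. The helper below is a hypothetical stand-alone sketch (ties go to the lowest index); it makes no claim about how the inherited classifyInstance method chooses its result.

```java
/** Hypothetical helper: picks the index of the largest probability. */
final class ArgMaxSketch {

    /** Returns the index of the largest entry; ties resolve to the lowest index. */
    static int mostProbableClass(double[] distribution) {
        if (distribution.length == 0) {
            throw new IllegalArgumentException("empty distribution");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostProbableClass(new double[] {0.2, 0.7, 0.1})); // prints 1
    }
}
```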

addCVParameter

public void addCVParameter(java.lang.String cvParam)
                    throws java.lang.Exception
Adds a scheme parameter to the list of parameters to be set by cross-validation

Parameters:
cvParam - the string representation of a scheme parameter. The format is:
param_char lower_bound upper_bound number_of_steps
e.g. to search a parameter -P from 1 to 10 in 10 steps:
P 1 10 10
Throws:
java.lang.Exception - if the parameter specifier is of the wrong format
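
To make the four-field specifier concrete, here is a minimal parsing sketch. Following the -P option description, it reads the fourth field as a number of steps. CVParamSpecSketch is a hypothetical class, not Weka's CVParameter; it assumes the steps are spread evenly over the closed interval from the lower to the upper bound, and it does not handle the 'A' or 'I' upper bounds.

```java
import java.util.Arrays;

/** Hypothetical parser for "param_char lower upper steps"; not Weka's CVParameter. */
final class CVParamSpecSketch {
    final char paramChar;
    final double lower;
    final double upper;
    final int steps;

    CVParamSpecSketch(String spec) {
        String[] tokens = spec.trim().split("\\s+");
        if (tokens.length != 4 || tokens[0].length() != 1) {
            throw new IllegalArgumentException("expected 'param_char lower upper steps': " + spec);
        }
        paramChar = tokens[0].charAt(0);
        // NumberFormatException is an IllegalArgumentException, so bad numbers are rejected too.
        lower = Double.parseDouble(tokens[1]);
        upper = Double.parseDouble(tokens[2]);
        steps = Integer.parseInt(tokens[3]);
        if (steps < 1 || upper < lower) {
            throw new IllegalArgumentException("need steps >= 1 and lower <= upper: " + spec);
        }
    }

    /** Candidate values, assuming even spacing that includes both bounds. */
    double[] values() {
        double[] out = new double[steps];
        for (int i = 0; i < steps; i++) {
            out[i] = steps == 1 ? lower : lower + (upper - lower) * i / (steps - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        CVParamSpecSketch spec = new CVParamSpecSketch("P 1 10 10");
        System.out.println("-" + spec.paramChar + " " + Arrays.toString(spec.values()));
    }
}
```

Under the even-spacing assumption, "P 1 10 10" yields the ten values 1 through 10.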

getCVParameter

public java.lang.String getCVParameter(int index)
Gets the scheme parameter with the given index.


CVParametersTipText

public java.lang.String CVParametersTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getCVParameters

public java.lang.Object[] getCVParameters()
Get method for CVParameters.


setCVParameters

public void setCVParameters(java.lang.Object[] params)
                     throws java.lang.Exception
Set method for CVParameters.

Throws:
java.lang.Exception

numFoldsTipText

public java.lang.String numFoldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumFolds

public int getNumFolds()
Gets the number of folds for the cross-validation.

Returns:
the number of folds for the cross-validation

setNumFolds

public void setNumFolds(int numFolds)
                 throws java.lang.Exception
Sets the number of folds for the cross-validation.

Parameters:
numFolds - the number of folds for the cross-validation
Throws:
java.lang.Exception - if parameter illegal

graphType

public int graphType()
Returns the type of graph this classifier represents.

Specified by:
graphType in interface Drawable
Returns:
the type of graph representing the object

graph

public java.lang.String graph()
                       throws java.lang.Exception
Returns graph describing the classifier (if possible).

Specified by:
graph in interface Drawable
Returns:
the graph of the classifier in dotty format
Throws:
java.lang.Exception - if the classifier cannot be graphed

toString

public java.lang.String toString()
Returns description of the cross-validated classifier.

Returns:
description of the cross-validated classifier as a string

toSummaryString

public java.lang.String toSummaryString()
Description copied from interface: Summarizable
Returns a string that summarizes the object.

Specified by:
toSummaryString in interface Summarizable
Returns:
the object summarized as a string

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options