weka.attributeSelection
Class WrapperSubsetEval

java.lang.Object
  extended by weka.attributeSelection.ASEvaluation
      extended by weka.attributeSelection.SubsetEvaluator
          extended by weka.attributeSelection.WrapperSubsetEval
All Implemented Interfaces:
OptionHandler, java.io.Serializable

public class WrapperSubsetEval
extends SubsetEvaluator
implements OptionHandler

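Wrapper attribute subset evaluator. Evaluates attribute subsets by using a learning scheme (the base learner), with cross validation used to estimate the accuracy of the learning scheme on each subset.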

For more information see:
Kohavi, R., John, G.: Wrappers for Feature Subset Selection. Artificial Intelligence (journal), special issue on relevance, Vol. 97, Nos. 1-2, pp. 273-324.

Valid options are:

-B
Class name of the base learner to use for accuracy estimation. Place any classifier options last on the command line, following a "--". E.g. -B weka.classifiers.bayes.NaiveBayes ... -- -K

-F
Number of cross validation folds to use for estimating accuracy.

-T
Threshold for executing another cross validation: another repetition is run if the standard deviation of the results, expressed as a percentage of their mean, exceeds this value.

-R
Seed for cross validation accuracy estimation. (default = 1)
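The "--" convention for -B can be illustrated with a minimal plain-Java sketch that splits an argument array into the evaluator's own options and the options handed on to the base learner. `OptionSplitter` and its methods are hypothetical names used only for illustration; this is not Weka's own parsing code.

```java
import java.util.Arrays;

/** Hypothetical helper: splits a command line at the first "--" separator. */
public class OptionSplitter {

    /** Returns the options before "--" (meant for the evaluator itself). */
    public static String[] evaluatorPart(String[] args) {
        int sep = Arrays.asList(args).indexOf("--");
        return sep < 0 ? args.clone() : Arrays.copyOfRange(args, 0, sep);
    }

    /** Returns the options after "--" (meant for the base learner). */
    public static String[] classifierPart(String[] args) {
        int sep = Arrays.asList(args).indexOf("--");
        return sep < 0 ? new String[0] : Arrays.copyOfRange(args, sep + 1, args.length);
    }

    public static void main(String[] argv) {
        String[] args = {"-B", "weka.classifiers.bayes.NaiveBayes", "-F", "5", "--", "-K"};
        System.out.println(Arrays.toString(evaluatorPart(args)));  // [-B, weka.classifiers.bayes.NaiveBayes, -F, 5]
        System.out.println(Arrays.toString(classifierPart(args))); // [-K]
    }
}
```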

Version:
$Revision: 1.21 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
private  Classifier m_BaseClassifier
          holds the base classifier object
private  int m_classIndex
          class index
private  Evaluation m_Evaluation
          holds an evaluation object
private  int m_folds
          number of folds to use for cross validation
private  int m_numAttribs
          number of attributes in the training data
private  int m_numInstances
          number of instances in the training data
private  int m_seed
          random number seed
private  double m_threshold
          the threshold by which to do further cross validations when estimating the accuracy of a subset
private  Instances m_trainInstances
          training instances
 
Constructor Summary
WrapperSubsetEval()
          Constructor.
 
Method Summary
 void buildEvaluator(Instances data)
          Generates an attribute evaluator.
 java.lang.String classifierTipText()
          Returns the tip text for this property
 double evaluateSubset(java.util.BitSet subset)
          Evaluates a subset of attributes
 java.lang.String foldsTipText()
          Returns the tip text for this property
 Classifier getClassifier()
          Get the classifier used as the base learner.
 int getFolds()
          Get the number of folds used for accuracy estimation
 java.lang.String[] getOptions()
          Gets the current settings of WrapperSubsetEval.
 int getSeed()
          Get the random number seed used for cross validation
 double getThreshold()
          Get the value of the threshold
 java.lang.String globalInfo()
          Returns a string describing this attribute evaluator
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for testing this class.
private  boolean repeat(double[] repError, int entries)
          decides whether to do another repeat of cross validation.
protected  void resetOptions()
          Resets the options to their default values.
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setClassifier(Classifier newClassifier)
          Set the classifier to use for accuracy estimation
 void setFolds(int f)
          Set the number of folds to use for accuracy estimation
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(int s)
          Set the seed to use for cross validation
 void setThreshold(double t)
          Set the value of the threshold for repeating cross validation
 java.lang.String thresholdTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Returns a string describing the wrapper
 
Methods inherited from class weka.attributeSelection.ASEvaluation
forName, makeCopies, postProcess
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_trainInstances

private Instances m_trainInstances
training instances


m_classIndex

private int m_classIndex
class index


m_numAttribs

private int m_numAttribs
number of attributes in the training data


m_numInstances

private int m_numInstances
number of instances in the training data


m_Evaluation

private Evaluation m_Evaluation
holds an evaluation object


m_BaseClassifier

private Classifier m_BaseClassifier
holds the base classifier object


m_folds

private int m_folds
number of folds to use for cross validation


m_seed

private int m_seed
random number seed


m_threshold

private double m_threshold
the threshold by which to do further cross validations when estimating the accuracy of a subset

Constructor Detail

WrapperSubsetEval

public WrapperSubsetEval()
Constructor. Calls resetOptions to set the default options.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this attribute evaluator

Returns:
a description of the evaluator suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-B
Class name of the base learner to use for accuracy estimation. Place any classifier options last on the command line, following a "--". E.g. -B weka.classifiers.bayes.NaiveBayes ... -- -K

-F
Number of cross validation folds to use for estimating accuracy.

-T
Threshold for executing another cross validation: another repetition is run if the standard deviation of the results, expressed as a percentage of their mean, exceeds this value.

-R
Seed for cross validation accuracy estimation. (default = 1)

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
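As a rough sketch of the parsing step, the following plain-Java class reads the -F, -T and -R values and stops scanning at the "--" separator, since later tokens belong to the base learner. `WrapperSettings` is hypothetical; the starting value of 5 folds is an assumption (no default is stated for -F above), and the threshold is stored as a fraction (0.01 = 1%), which is also an assumption. Unlike the real method, which throws an exception for unsupported options, this sketch silently ignores flags it does not know.

```java
/** Hypothetical sketch of parsing -F, -T and -R into typed settings (not Weka's code). */
public class WrapperSettings {
    public int folds = 5;            // Assumption: starting value; the default for -F is not stated here
    public double threshold = 0.01;  // 1%, the default mentioned for repeat(), stored as a fraction
    public int seed = 1;             // documented default for -R

    /** Returns the token following `flag`, or null if the flag is absent before "--". */
    static String flagValue(String flag, String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals("--")) break;           // the rest belongs to the base learner
            if (options[i].equals(flag)) return options[i + 1];
        }
        return null;
    }

    /** Builds settings from an option array; absent flags keep their starting values. */
    public static WrapperSettings parse(String[] options) {
        WrapperSettings s = new WrapperSettings();
        String v;
        if ((v = flagValue("-F", options)) != null) s.folds = Integer.parseInt(v);
        if ((v = flagValue("-T", options)) != null) s.threshold = Double.parseDouble(v);
        if ((v = flagValue("-R", options)) != null) s.seed = Integer.parseInt(v);
        return s;
    }

    public static void main(String[] args) {
        // The "-F 3" after "--" is a classifier option and must not override the evaluator's folds.
        WrapperSettings s = parse(new String[] {"-F", "10", "-R", "42", "--", "-F", "3"});
        System.out.println(s.folds + " " + s.threshold + " " + s.seed); // 10 0.01 42
    }
}
```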

thresholdTipText

public java.lang.String thresholdTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setThreshold

public void setThreshold(double t)
Set the value of the threshold for repeating cross validation

Parameters:
t - the value of the threshold

getThreshold

public double getThreshold()
Get the value of the threshold

Returns:
the threshold as a double

foldsTipText

public java.lang.String foldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setFolds

public void setFolds(int f)
Set the number of folds to use for accuracy estimation

Parameters:
f - the number of folds
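To make the folds setting concrete, here is a sketch of splitting n instances into k folds of near-equal size; during cross validation each fold serves once as the test set while the remaining folds train the base learner. `FoldSplit` is a hypothetical helper that works on unshuffled indices; Weka's own fold construction (for example, any stratification by class) may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assign n instance indices to k folds of near-equal size. */
public class FoldSplit {

    /** Returns k lists of instance indices; fold sizes differ by at most one. */
    public static List<List<Integer>> folds(int n, int k) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        List<List<Integer>> result = new ArrayList<>();
        int start = 0;
        for (int f = 0; f < k; f++) {
            // the first (n % k) folds each receive one extra instance
            int size = n / k + (f < n % k ? 1 : 0);
            List<Integer> fold = new ArrayList<>();
            for (int i = start; i < start + size; i++) fold.add(i);
            result.add(fold);
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 instances, 3 folds
        System.out.println(folds(10, 3)); // [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    }
}
```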

getFolds

public int getFolds()
Get the number of folds used for accuracy estimation

Returns:
the number of folds

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int s)
Set the seed to use for cross validation

Parameters:
s - the seed
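The seed matters because cross validation typically shuffles the instances before dividing them into folds; fixing it makes the accuracy estimates, and hence the selected subset, reproducible. A minimal sketch using `java.util.Random` and `Collections.shuffle` (the class name `SeededShuffle` is hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: the seed fixes how instances are shuffled before folding. */
public class SeededShuffle {

    /** Returns the indices 0..n-1 in an order determined entirely by the seed. */
    public static List<Integer> shuffledIndices(int n, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        return idx;
    }

    public static void main(String[] args) {
        // Same seed, same order: repeated runs would produce identical folds.
        System.out.println(shuffledIndices(8, 1).equals(shuffledIndices(8, 1))); // true
    }
}
```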

getSeed

public int getSeed()
Get the random number seed used for cross validation

Returns:
the seed

classifierTipText

public java.lang.String classifierTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setClassifier

public void setClassifier(Classifier newClassifier)
Set the classifier to use for accuracy estimation

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Get the classifier used as the base learner.

Returns:
the classifier used as the base learner

getOptions

public java.lang.String[] getOptions()
Gets the current settings of WrapperSubsetEval.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()
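Because the returned array must be acceptable to setOptions(), it has to follow the same layout: the evaluator's flags, then "--", then the base learner's options. The sketch below builds such an array from plain values; `OptionsBuilder` is hypothetical, and the flag order is an assumption, not necessarily the order Weka emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a getOptions()-style array (flag order is an assumption). */
public class OptionsBuilder {

    public static String[] buildOptions(String classifierName, int folds, double threshold,
                                        int seed, String[] classifierOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-B"); opts.add(classifierName);
        opts.add("-F"); opts.add(Integer.toString(folds));
        opts.add("-T"); opts.add(Double.toString(threshold));
        opts.add("-R"); opts.add(Integer.toString(seed));
        // base-learner options go last, after the "--" separator
        if (classifierOptions.length > 0) {
            opts.add("--");
            opts.addAll(Arrays.asList(classifierOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions("weka.classifiers.bayes.NaiveBayes", 5, 0.01, 1, new String[] {"-K"});
        System.out.println(String.join(" ", opts));
        // -B weka.classifiers.bayes.NaiveBayes -F 5 -T 0.01 -R 1 -- -K
    }
}
```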

resetOptions

protected void resetOptions()
Resets the options to their default values.

buildEvaluator

public void buildEvaluator(Instances data)
                    throws java.lang.Exception
Generates an attribute evaluator. Initializes all fields of the evaluator that are not set via options.

Specified by:
buildEvaluator in class ASEvaluation
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the evaluator has not been generated successfully

evaluateSubset

public double evaluateSubset(java.util.BitSet subset)
                      throws java.lang.Exception
Evaluates a subset of attributes

Specified by:
evaluateSubset in class SubsetEvaluator
Parameters:
subset - a bitset representing the attribute subset to be evaluated
Returns:
the "merit" of the subset
Throws:
java.lang.Exception - if the subset could not be evaluated
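In the BitSet passed to evaluateSubset, a set bit i means attribute i belongs to the candidate subset. A wrapper has to train the base learner on exactly those attributes plus the class attribute, so a first step might look like the following sketch. `SubsetIndices` is hypothetical, and it assumes the class attribute is always kept regardless of its bit.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch: turn a candidate subset into the attribute indices to keep. */
public class SubsetIndices {

    /**
     * Returns, in ascending order, the indices whose bit is set, plus the class index.
     * Assumption: the class is always kept so the base learner can be trained.
     */
    public static int[] keptAttributes(BitSet subset, int numAttribs, int classIndex) {
        BitSet keep = (BitSet) subset.clone();
        keep.clear(numAttribs, Math.max(numAttribs, keep.length())); // drop bits beyond the data
        keep.set(classIndex);
        return keep.stream().toArray();
    }

    public static void main(String[] args) {
        BitSet subset = new BitSet();
        subset.set(0);
        subset.set(2);
        // 5 attributes, with the class as the last one (index 4)
        System.out.println(Arrays.toString(keptAttributes(subset, 5, 4))); // [0, 2, 4]
    }
}
```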

toString

public java.lang.String toString()
Returns a string describing the wrapper

Returns:
the description as a string

repeat

private boolean repeat(double[] repError,
                       int entries)
Decides whether to do another repetition of cross validation. If the standard deviation of the cross validation results is greater than threshold% of their mean (default 1%), another repetition is done.

Parameters:
repError - an array of cross validation results
entries - the number of cross validations done so far
Returns:
true if another cross validation is to be done
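The repeat rule and the loop it controls can be sketched as follows. The sketch treats the threshold as a fraction (0.01 = 1%), uses the population standard deviation, always asks for a second run after the first, and caps the number of repetitions with a `maxRepeats` parameter. All of these, along with the names `RepeatedCV` and `runCV` (a stand-in for one full cross validation returning an error rate), are assumptions rather than details confirmed by this page.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch of the repeat rule and a capped loop of repeated cross validations. */
public class RepeatedCV {

    /**
     * True if another cross validation should run: the standard deviation of the first
     * `entries` results exceeds `threshold` times their mean (0.01 = 1%).
     * Assumption: with a single result there is no spread yet, so this always repeats.
     */
    public static boolean repeat(double[] repError, int entries, double threshold) {
        if (entries < 2) return true;
        double mean = 0;
        for (int i = 0; i < entries; i++) mean += repError[i];
        mean /= entries;
        double variance = 0;
        for (int i = 0; i < entries; i++) variance += (repError[i] - mean) * (repError[i] - mean);
        double stdDev = Math.sqrt(variance / entries); // population standard deviation
        return stdDev > threshold * mean;
    }

    /** Runs up to maxRepeats cross validations via runCV and returns the mean error rate. */
    public static double averagedError(IntToDoubleFunction runCV, int maxRepeats, double threshold) {
        if (maxRepeats < 1) throw new IllegalArgumentException("maxRepeats must be >= 1");
        double[] repError = new double[maxRepeats];
        int done = 0;
        while (done < maxRepeats) {
            repError[done] = runCV.applyAsDouble(done);
            done++;
            if (!repeat(repError, done, threshold)) break; // estimates are stable enough
        }
        double sum = 0;
        for (int i = 0; i < done; i++) sum += repError[i];
        return sum / done;
    }

    public static void main(String[] args) {
        // Errors 0.20 and 0.22: mean 0.21, std dev 0.01 (about 4.8% of the mean) > 1%
        System.out.println(repeat(new double[] {0.20, 0.22}, 2, 0.01)); // true
        System.out.println(repeat(new double[] {0.20, 0.20}, 2, 0.01)); // false
    }
}
```

With the default 1% threshold, only results whose spread is small relative to their mean stop the loop early; noisy estimates run until the cap.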

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - the options