weka.classifiers.meta
Class Bagging

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.SingleClassifierEnhancer
          extended by weka.classifiers.IteratedSingleClassifierEnhancer
              extended by weka.classifiers.RandomizableIteratedSingleClassifierEnhancer
                  extended by weka.classifiers.meta.Bagging
All Implemented Interfaces:
AdditionalMeasureProducer, java.lang.Cloneable, OptionHandler, Randomizable, java.io.Serializable, WeightedInstancesHandler

public class Bagging
extends RandomizableIteratedSingleClassifierEnhancer
implements WeightedInstancesHandler, AdditionalMeasureProducer

Class for bagging a classifier. For more information, see

Leo Breiman (1996). Bagging predictors. Machine Learning, 24(2):123-140.

Valid options are:

-W classname
Specify the full class name of a weak classifier as the basis for bagging (required).

-I num
Set the number of bagging iterations (default 10).

-S seed
Random number seed for resampling (default 1).

-P num
Size of each bag, as a percentage of the training size (default 100).

-O
Compute out of bag error.

Options after -- are passed to the designated classifier.

Version:
$Revision: 1.29 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (len@reeltwo.com), Richard Kirkby (rkirkby@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
protected  int m_BagSizePercent
          The size of each bag sample, as a percentage of the training size
protected  boolean m_CalcOutOfBag
          Whether to calculate the out of bag error
protected  double m_OutOfBagError
          The out of bag error that has been calculated
 
Fields inherited from class weka.classifiers.RandomizableIteratedSingleClassifierEnhancer
m_Seed
 
Fields inherited from class weka.classifiers.IteratedSingleClassifierEnhancer
m_Classifiers, m_NumIterations
 
Fields inherited from class weka.classifiers.SingleClassifierEnhancer
m_Classifier
 
Fields inherited from class weka.classifiers.Classifier
m_Debug
 
Constructor Summary
Bagging()
          Constructor.
 
Method Summary
 java.lang.String bagSizePercentTipText()
          Returns the tip text for this property
 void buildClassifier(Instances data)
          Builds the bagged classifier from the given training data.
 java.lang.String calcOutOfBagTipText()
          Returns the tip text for this property
protected  java.lang.String defaultClassifierString()
          String describing default classifier.
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of the additional measure names.
 int getBagSizePercent()
          Gets the size of each bag, as a percentage of the training set size.
 boolean getCalcOutOfBag()
          Get whether the out of bag error is calculated.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 java.lang.String globalInfo()
          Returns a string describing the classifier.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 double measureOutOfBagError()
          Gets the out of bag error that was calculated as the classifier was built.
 Instances resampleWithWeights(Instances data, java.util.Random random, boolean[] sampled)
          Creates a new dataset of the same size using random sampling with replacement according to the given weight vector.
 void setBagSizePercent(int newBagSizePercent)
          Sets the size of each bag, as a percentage of the training set size.
 void setCalcOutOfBag(boolean calcOutOfBag)
          Set whether the out of bag error is calculated.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
          Returns description of the bagged classifier.
 
Methods inherited from class weka.classifiers.RandomizableIteratedSingleClassifierEnhancer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.IteratedSingleClassifierEnhancer
getNumIterations, numIterationsTipText, setNumIterations
 
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, getClassifierSpec, setClassifier
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, setDebug
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_BagSizePercent

protected int m_BagSizePercent
The size of each bag sample, as a percentage of the training size


m_CalcOutOfBag

protected boolean m_CalcOutOfBag
Whether to calculate the out of bag error


m_OutOfBagError

protected double m_OutOfBagError
The out of bag error that has been calculated

Constructor Detail

Bagging

public Bagging()
Constructor.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing the classifier.

Returns:
a description suitable for displaying in the explorer/experimenter gui

defaultClassifierString

protected java.lang.String defaultClassifierString()
String describing default classifier.

Overrides:
defaultClassifierString in class SingleClassifierEnhancer

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableIteratedSingleClassifierEnhancer
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-W classname
Specify the full class name of a weak classifier as the basis for bagging (required).

-I num
Set the number of bagging iterations (default 10).

-S seed
Random number seed for resampling (default 1).

-P num
Size of each bag, as a percentage of the training size (default 100).

-O
Compute out of bag error.

Options after -- are passed to the designated classifier.

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableIteratedSingleClassifierEnhancer
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
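
The rule that options after -- are passed to the designated classifier can be illustrated with a small standalone sketch. OptionSplitSketch and its methods are hypothetical names invented for this example and are not part of Weka; the sketch only shows one way to split an option array at the first "--" and does not claim to reproduce how setOptions is implemented.

```java
/** Hypothetical helper for illustration only; not part of Weka. */
final class OptionSplitSketch {

    /** Options before the first "--": these belong to the bagger itself. */
    static String[] ownOptions(String[] options) {
        return java.util.Arrays.copyOfRange(options, 0, separatorIndex(options));
    }

    /** Options after the first "--": these are forwarded to the base classifier. */
    static String[] baseClassifierOptions(String[] options) {
        int split = separatorIndex(options);
        if (split == options.length) {
            return new String[0]; // no "--" present, nothing to forward
        }
        return java.util.Arrays.copyOfRange(options, split + 1, options.length);
    }

    /** Index of the first "--", or options.length if there is none. */
    private static int separatorIndex(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if ("--".equals(options[i])) {
                return i;
            }
        }
        return options.length;
    }

    public static void main(String[] args) {
        String[] opts = {"-W", "weka.classifiers.trees.J48", "-I", "25", "--", "-C", "0.25"};
        System.out.println(java.util.Arrays.toString(ownOptions(opts)));
        System.out.println(java.util.Arrays.toString(baseClassifierOptions(opts)));
    }
}
```

In this example the first four strings configure the bagger and the last two would reach the base classifier.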

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableIteratedSingleClassifierEnhancer
Returns:
an array of strings suitable for passing to setOptions

bagSizePercentTipText

public java.lang.String bagSizePercentTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getBagSizePercent

public int getBagSizePercent()
Gets the size of each bag, as a percentage of the training set size.

Returns:
the bag size, as a percentage.

setBagSizePercent

public void setBagSizePercent(int newBagSizePercent)
Sets the size of each bag, as a percentage of the training set size.

Parameters:
newBagSizePercent - the bag size, as a percentage.
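
To make the percentage concrete, the following sketch converts a bag size percentage into a number of instances. BagSizeSketch is a hypothetical name, and truncating integer division is an assumption of this example; the documentation above does not state how fractional sizes are rounded.

```java
/** Hypothetical helper for illustration only; not part of Weka. */
final class BagSizeSketch {

    /**
     * Number of instances per bag for a given training set size and percentage.
     * Assumption: fractional results are truncated toward zero.
     */
    static int bagSize(int numTrainingInstances, int bagSizePercent) {
        // Widen to long so the multiplication cannot overflow for large datasets.
        return (int) ((long) numTrainingInstances * bagSizePercent / 100);
    }

    public static void main(String[] args) {
        System.out.println(bagSize(150, 100)); // prints 150
        System.out.println(bagSize(150, 66));  // prints 99
    }
}
```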

calcOutOfBagTipText

public java.lang.String calcOutOfBagTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCalcOutOfBag

public void setCalcOutOfBag(boolean calcOutOfBag)
Set whether the out of bag error is calculated.

Parameters:
calcOutOfBag - whether to calculate the out of bag error

getCalcOutOfBag

public boolean getCalcOutOfBag()
Get whether the out of bag error is calculated.

Returns:
whether the out of bag error is calculated

measureOutOfBagError

public double measureOutOfBagError()
Gets the out of bag error that was calculated as the classifier was built.

Returns:
the out of bag error
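
As a rough illustration of what an out of bag error measures, the sketch below evaluates each training instance using only the ensemble members whose bag did not contain it. OutOfBagSketch, its array-based inputs, the majority vote, and the choice to skip instances that appear in every bag are all assumptions of this example, not a description of how this class computes the value.

```java
/** Hypothetical helper for illustration only; not part of Weka. */
final class OutOfBagSketch {

    /**
     * @param inBag      inBag[m][i] is true if instance i was drawn into the bag of member m
     * @param predicted  predicted[m][i] is the class index member m predicts for instance i
     * @param actual     actual[i] is the true class index of instance i
     * @param numClasses number of class values
     * @return fraction of evaluated instances misclassified by their out-of-bag members,
     *         or NaN if no instance was out of bag for any member
     */
    static double outOfBagError(boolean[][] inBag, int[][] predicted,
                                int[] actual, int numClasses) {
        int evaluated = 0;
        int errors = 0;
        for (int i = 0; i < actual.length; i++) {
            int[] votes = new int[numClasses];
            int voters = 0;
            for (int m = 0; m < inBag.length; m++) {
                if (!inBag[m][i]) { // only members that never saw instance i may vote
                    votes[predicted[m][i]]++;
                    voters++;
                }
            }
            if (voters == 0) {
                continue; // sketch choice: skip instances that were in every bag
            }
            int best = 0;
            for (int c = 1; c < numClasses; c++) {
                if (votes[c] > votes[best]) {
                    best = c;
                }
            }
            evaluated++;
            if (best != actual[i]) {
                errors++;
            }
        }
        return evaluated == 0 ? Double.NaN : (double) errors / evaluated;
    }
}
```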

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of the additional measure names.

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure.

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported
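
The contract of enumerateMeasures and getMeasure can be sketched without Weka on the classpath. MeasureSketch is a hypothetical stand-in class; the measure name "measureOutOfBagError" and the case-insensitive comparison are assumptions chosen to mirror the method name documented above.

```java
/** Hypothetical stand-in for illustration only; not part of Weka. */
final class MeasureSketch {

    private final double outOfBagError;

    MeasureSketch(double outOfBagError) {
        this.outOfBagError = outOfBagError;
    }

    double measureOutOfBagError() {
        return outOfBagError;
    }

    /** Lists the names of the additional measures this object can report. */
    java.util.Enumeration<String> enumerateMeasures() {
        return java.util.Collections.enumeration(
            java.util.Collections.singletonList("measureOutOfBagError"));
    }

    /** Looks up a measure by name and rejects names that are not supported. */
    double getMeasure(String additionalMeasureName) {
        if ("measureOutOfBagError".equalsIgnoreCase(additionalMeasureName)) {
            return measureOutOfBagError();
        }
        throw new IllegalArgumentException(additionalMeasureName + " not supported");
    }

    public static void main(String[] args) {
        MeasureSketch sketch = new MeasureSketch(0.25);
        java.util.Enumeration<String> names = sketch.enumerateMeasures();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            System.out.println(name + " = " + sketch.getMeasure(name));
        }
    }
}
```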

resampleWithWeights

public final Instances resampleWithWeights(Instances data,
                                           java.util.Random random,
                                           boolean[] sampled)
Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. The weights of the instances in the new dataset are set to one. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive.

Parameters:
data - the data to be sampled from
random - a random number generator
sampled - indicating which instance has been sampled
Returns:
the new dataset
Throws:
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights.
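
Weighted sampling with replacement, together with the bookkeeping done through the sampled array, can be sketched as follows. WeightedResampleSketch is a hypothetical name; it takes the weights as an explicit array and returns sampled indices rather than a dataset, and it uses a simple cumulative-sum scan, which may differ from the technique this method actually uses.

```java
/** Hypothetical helper for illustration only; not part of Weka. */
final class WeightedResampleSketch {

    /**
     * Draws sampleSize indices with replacement, each index chosen with probability
     * proportional to its weight, and marks every drawn index in sampled.
     */
    static int[] resampleIndices(double[] weights, int sampleSize,
                                 java.util.Random random, boolean[] sampled) {
        if (sampled.length != weights.length) {
            throw new IllegalArgumentException("sampled and weights differ in length");
        }
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        int[] picked = new int[sampleSize];
        for (int k = 0; k < sampleSize; k++) {
            double r = random.nextDouble() * total;
            int idx = 0;
            // Linear scan for the first cumulative weight above r (O(n) per draw).
            while (idx < weights.length - 1 && cumulative[idx] <= r) {
                idx++;
            }
            picked[k] = idx;
            sampled[idx] = true; // instances left false are "out of bag"
        }
        return picked;
    }
}
```

Entries of sampled that remain false after the call identify the instances that were never drawn, which is the information an out of bag estimate needs.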

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the bagged classifier from the given training data.

Overrides:
buildClassifier in class IteratedSingleClassifierEnhancer
Parameters:
data - the training data to be used for generating the bagged classifier.
Throws:
java.lang.Exception - if the classifier could not be built successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if distribution can't be computed successfully
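
One common way for a bagged ensemble to produce class membership probabilities is to average the distributions predicted by its members and normalize the result. The sketch below shows that idea only; AverageDistributionSketch is a hypothetical name, and the documentation above does not confirm that this class combines its members in exactly this way.

```java
/** Hypothetical helper for illustration only; not part of Weka. */
final class AverageDistributionSketch {

    /**
     * Sums the per-member class distributions and normalizes the sums to 1.
     * Returns all zeros if every member predicted all zeros.
     */
    static double[] average(double[][] memberDistributions) {
        if (memberDistributions.length == 0) {
            throw new IllegalArgumentException("need at least one member distribution");
        }
        int numClasses = memberDistributions[0].length;
        double[] sums = new double[numClasses];
        for (double[] dist : memberDistributions) {
            if (dist.length != numClasses) {
                throw new IllegalArgumentException("distributions differ in length");
            }
            for (int c = 0; c < numClasses; c++) {
                sums[c] += dist[c];
            }
        }
        double total = 0;
        for (double s : sums) {
            total += s;
        }
        if (total == 0) {
            return sums; // nothing to normalize
        }
        for (int c = 0; c < numClasses; c++) {
            sums[c] /= total;
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] members = {{0.9, 0.1}, {0.6, 0.4}, {0.3, 0.7}};
        System.out.println(java.util.Arrays.toString(average(members)));
    }
}
```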

toString

public java.lang.String toString()
Returns description of the bagged classifier.

Returns:
description of the bagged classifier as a string

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options