weka.classifiers.meta
Class RacedIncrementalLogitBoost

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.meta.RacedIncrementalLogitBoost
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable, UpdateableClassifier

public class RacedIncrementalLogitBoost
extends Classifier
implements OptionHandler, UpdateableClassifier

Classifier for incremental learning of large datasets by way of racing logit-boosted committees. Valid options are:

-C num
Set the minimum chunk size (default 500).

-M num
Set the maximum chunk size (default 8000).

-V num
Set the validation set size (default 5000).

-D
Turn on debugging output.

-W classname
Specify the full class name of a weak learner as the basis for boosting (required).

-Q
Use resampling instead of reweighting.

-S seed
Random number seed for resampling (default 1).

-P type
The type of pruning to use: none (PRUNETYPE_NONE) or log likelihood (PRUNETYPE_LOGLIKELIHOOD).

Options after -- are passed to the designated learner.
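Everything after `--` is handed to the weak learner named by -W. The plain-Java sketch below shows that split on an option array. `OptionSplitSketch`, `schemeOptions` and `learnerOptions` are hypothetical names, not Weka API, and `weka.classifiers.trees.DecisionStump` is only an example value for -W (the learner must be able to handle numeric class attributes).

```java
import java.util.Arrays;

/** Hypothetical helper (not Weka API): splits a Weka-style option array at the first "--". */
class OptionSplitSketch {

    /** Options for the boosting scheme itself: everything before "--". */
    static String[] schemeOptions(String[] options) {
        int split = Arrays.asList(options).indexOf("--");
        return split < 0 ? options.clone() : Arrays.copyOfRange(options, 0, split);
    }

    /** Options forwarded to the weak learner: everything after "--" (empty if absent). */
    static String[] learnerOptions(String[] options) {
        int split = Arrays.asList(options).indexOf("--");
        return split < 0 ? new String[0] : Arrays.copyOfRange(options, split + 1, options.length);
    }

    public static void main(String[] args) {
        String[] options = {
            "-C", "500", "-M", "8000", "-V", "5000",
            "-W", "weka.classifiers.trees.DecisionStump", // example learner only
            "--", "-D"                                     // -D goes to the learner
        };
        System.out.println(Arrays.toString(schemeOptions(options)));
        System.out.println(Arrays.toString(learnerOptions(options)));
    }
}
```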

Version:
$Revision: 1.3 $
Author:
Richard Kirkby (rkirkby@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
See Also:
Serialized Form

Nested Class Summary
protected  class RacedIncrementalLogitBoost.Committee
           
 
Field Summary
protected  RacedIncrementalLogitBoost.Committee m_bestCommittee
          The current best committee
protected  Attribute m_ClassAttribute
          The actual class attribute (for getting class names)
protected  Classifier m_Classifier
          The model base classifier to use
protected  FastVector m_committees
          The committees
protected  Instances m_currentSet
          The instances currently in memory for training
protected  boolean m_Debug
          Whether to output debug messages
protected  int m_maxBatchSizeRequired
          The maximum number of instances required for processing
protected  int m_maxChunkSize
          The maximum chunk size used for training
protected  int m_minChunkSize
          The minimum chunk size used for training
protected  int m_NumClasses
          The number of classes
protected  Instances m_NumericClassData
          Dummy dataset with a numeric class
protected  int m_numInstancesConsumed
          The number of instances consumed
protected  int m_PruningType
          The pruning type used
protected  java.util.Random m_RandomInstance
          The random number generator used
protected  int m_Seed
          Seed for boosting with resampling.
protected  boolean m_UseResampling
          Whether to use resampling
protected  int m_validationChunkSize
          The size of the validation set
protected  Instances m_validationSet
          The instances used for validation
protected  boolean m_validationSetChanged
          Whether the validation set has recently been changed
protected  ZeroR m_zeroR
          The default scheme used when committees aren't ready
static int PRUNETYPE_LOGLIKELIHOOD
           
static int PRUNETYPE_NONE
          The pruning types
static Tag[] TAGS_PRUNETYPE
           
protected static double Z_MAX
          A threshold for responses (Friedman suggests between 2 and 4)
 
Constructor Summary
RacedIncrementalLogitBoost()
           
 
Method Summary
 void buildClassifier(Instances data)
          Builds the classifier.
 java.lang.String classifierTipText()
          Returns the tip text for this property
 java.lang.String debugTipText()
          Returns the tip text for this property
 double[] distributionForInstance(Instance instance)
          Computes class distribution of an instance using the best committee.
 int getBestCommitteeChunkSize()
          Get the best committee chunk size
 double getBestCommitteeErrorEstimate()
          Get the best committee's error on the validation data
 double getBestCommitteeLLEstimate()
          Get the best committee's log likelihood on the validation data
 int getBestCommitteeSize()
          Get the number of members in the best committee
 Classifier getClassifier()
          Get the base classifier used for boosting
 boolean getDebug()
          Get whether debugging is turned on
 int getMaxChunkSize()
          Get the maximum chunk size
 int getMinChunkSize()
          Get the minimum chunk size
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 SelectedTag getPruningType()
          Get the pruning type
 int getSeed()
          Get seed for resampling.
 boolean getUseResampling()
          Get whether resampling is turned on
 int getValidationChunkSize()
          Get the validation chunk size
 java.lang.String globalInfo()
          Returns a description of the classifier
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for this class.
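 java.lang.String maxChunkSizeTipText()
          Returns the tip text for this property
 java.lang.String minChunkSizeTipText()
          Returns the tip text for this property
 java.lang.String pruningTypeTipText()
          Returns the tip text for this property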
protected static double RtoP(double[] Fs, int j)
          Convert from function responses to probabilities
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setClassifier(Classifier newClassifier)
          Set the classifier for boosting.
 void setDebug(boolean debug)
          Set debugging mode
 void setMaxChunkSize(int chunkSize)
          Set the maximum chunk size
 void setMinChunkSize(int chunkSize)
          Set the minimum chunk size
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPruningType(SelectedTag pruneType)
          Set the pruning type
 void setSeed(int seed)
          Set seed for resampling.
 void setUseResampling(boolean r)
          Set resampling mode
 void setValidationChunkSize(int chunkSize)
          Set the validation chunk size
 java.lang.String toString()
          Returns description of the boosted classifier.
 void updateClassifier(Instance instance)
          Updates the classifier.
 java.lang.String useResamplingTipText()
          Returns the tip text for this property
 java.lang.String validationChunkSizeTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PRUNETYPE_NONE

public static final int PRUNETYPE_NONE
The pruning types

See Also:
Constant Field Values

PRUNETYPE_LOGLIKELIHOOD

public static final int PRUNETYPE_LOGLIKELIHOOD
See Also:
Constant Field Values

TAGS_PRUNETYPE

public static final Tag[] TAGS_PRUNETYPE

m_Classifier

protected Classifier m_Classifier
The model base classifier to use


m_committees

protected FastVector m_committees
The committees


m_PruningType

protected int m_PruningType
The pruning type used


m_UseResampling

protected boolean m_UseResampling
Whether to use resampling


m_Seed

protected int m_Seed
Seed for boosting with resampling.


m_NumClasses

protected int m_NumClasses
The number of classes


Z_MAX

protected static final double Z_MAX
A threshold for responses (Friedman suggests between 2 and 4)

See Also:
Constant Field Values
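Z_MAX caps the working responses that the weak learner is asked to fit. As a sketch, assuming the two-class working response of Friedman, Hastie and Tibshirani's LogitBoost, z = (y* - p) / (p(1 - p)) with y* in {0, 1}, the clipping looks like this. The `zMax` argument stands in for Z_MAX (3.0 is an assumed value inside Friedman's suggested range, not necessarily Weka's constant), and the class name is hypothetical.

```java
/** Hypothetical sketch of LogitBoost's clipped working response (not Weka's code). */
class WorkingResponseSketch {

    /**
     * y is 1 if the instance belongs to the class being fitted, else 0;
     * p is the current probability estimate for that class.
     */
    static double response(int y, double p, double zMax) {
        if (y == 1) {
            // (1 - p) / (p (1 - p)) simplifies to 1 / p; cap at +zMax (also handles p == 0)
            return Math.min(1.0 / p, zMax);
        }
        // (0 - p) / (p (1 - p)) simplifies to -1 / (1 - p); cap at -zMax
        return Math.max(-1.0 / (1.0 - p), -zMax);
    }

    /** The matching LogitBoost instance weight, p (1 - p). */
    static double weight(double p) {
        return p * (1.0 - p);
    }

    public static void main(String[] args) {
        double zMax = 3.0; // assumed value for illustration
        System.out.println(response(1, 0.5, zMax)); // 2.0
        System.out.println(response(1, 0.1, zMax)); // 3.0 (10.0 clipped)
        System.out.println(response(0, 0.9, zMax)); // -3.0 (-10.0 clipped)
    }
}
```

Without the cap, instances the model already gets badly wrong (p close to 0 for the true class) would produce huge targets for the regression learner, which is the instability the threshold guards against.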

m_NumericClassData

protected Instances m_NumericClassData
Dummy dataset with a numeric class


m_ClassAttribute

protected Attribute m_ClassAttribute
The actual class attribute (for getting class names)


m_minChunkSize

protected int m_minChunkSize
The minimum chunk size used for training


m_maxChunkSize

protected int m_maxChunkSize
The maximum chunk size used for training


m_validationChunkSize

protected int m_validationChunkSize
The size of the validation set


m_numInstancesConsumed

protected int m_numInstancesConsumed
The number of instances consumed


m_validationSet

protected Instances m_validationSet
The instances used for validation


m_currentSet

protected Instances m_currentSet
The instances currently in memory for training


m_bestCommittee

protected RacedIncrementalLogitBoost.Committee m_bestCommittee
The current best committee


m_zeroR

protected ZeroR m_zeroR
The default scheme used when committees aren't ready


m_validationSetChanged

protected boolean m_validationSetChanged
Whether the validation set has recently been changed


m_maxBatchSizeRequired

protected int m_maxBatchSizeRequired
The maximum number of instances required for processing


m_Debug

protected boolean m_Debug
Whether to output debug messages


m_RandomInstance

protected java.util.Random m_RandomInstance
The random number generator used

Constructor Detail

RacedIncrementalLogitBoost

public RacedIncrementalLogitBoost()

Method Detail

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if something goes wrong

updateClassifier

public void updateClassifier(Instance instance)
                      throws java.lang.Exception
Updates the classifier.

Specified by:
updateClassifier in interface UpdateableClassifier
Parameters:
instance - the next instance in the stream of training data
Throws:
java.lang.Exception - if something goes wrong
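The fields documented above (m_validationSet, m_validationChunkSize, m_currentSet, m_committees) hint at how the stream is consumed. The following is a simplified sketch, assuming that the first validationChunkSize instances fill the validation set and that later instances are buffered until each committee has a complete chunk of its own size. Memory release (m_maxBatchSizeRequired) and the boosting step itself are omitted, `build` assumes without confirmation from this page that batch training simply streams every instance through the update, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of stream routing for a raced, chunk-based learner.
 * Assumption: validation data is filled first; eviction and boosting are omitted.
 */
class RacedStreamSketch<T> {

    /** A committee that consumes consecutive training chunks of one fixed size. */
    static final class Committee {
        final int chunkSize;
        int consumed;      // buffered training instances this committee has used
        int chunksTrained; // stands in for the boosting a real committee would run

        Committee(int chunkSize) {
            this.chunkSize = chunkSize;
        }

        /** Consumes every complete, not yet used chunk of the training buffer. */
        void update(int available) {
            while (available - consumed >= chunkSize) {
                consumed += chunkSize; // a real committee would boost on this chunk here
                chunksTrained++;
            }
        }
    }

    final int validationChunkSize;
    final List<T> validationSet = new ArrayList<>();
    final List<T> currentSet = new ArrayList<>();
    final List<Committee> committees = new ArrayList<>();

    RacedStreamSketch(int validationChunkSize, int... chunkSizes) {
        this.validationChunkSize = validationChunkSize;
        for (int size : chunkSizes) {
            committees.add(new Committee(size));
        }
    }

    /** Analogue of updateClassifier: fill the validation set, then buffer training data. */
    void update(T instance) {
        if (validationSet.size() < validationChunkSize) {
            validationSet.add(instance);
            return;
        }
        currentSet.add(instance);
        for (Committee c : committees) {
            c.update(currentSet.size());
        }
    }

    /** Analogue of buildClassifier, under the assumption that it streams the batch. */
    void build(List<T> data) {
        validationSet.clear();
        currentSet.clear();
        for (Committee c : committees) {
            c.consumed = 0;
            c.chunksTrained = 0;
        }
        for (T instance : data) {
            update(instance);
        }
    }

    public static void main(String[] args) {
        RacedStreamSketch<Integer> sketch = new RacedStreamSketch<>(2, 2, 4);
        for (int i = 0; i < 10; i++) {
            sketch.update(i);
        }
        for (Committee c : sketch.committees) {
            System.out.println("chunk size " + c.chunkSize + ": " + c.chunksTrained + " chunks");
        }
    }
}
```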

RtoP

protected static double RtoP(double[] Fs,
                             int j)
                      throws java.lang.Exception
Convert from function responses to probabilities

Parameters:
Fs - the function responses, one per class
j - the class value of interest
Returns:
the probability prediction for j
Throws:
java.lang.Exception
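"Convert from function responses to probabilities" matches the multiclass LogitBoost link of Friedman, Hastie and Tibshirani, p_j = exp(F_j) / sum_k exp(F_k). A minimal sketch, assuming that formula (Weka's own code may differ, for example in how it reports a normalization failure); subtracting the largest response first keeps `Math.exp` from overflowing. The class and method names are hypothetical.

```java
/** Hypothetical sketch: map per-class function responses F_1..F_J to a probability. */
class ResponseToProbability {

    /** Returns exp(F_j) / sum_k exp(F_k), computed after shifting every F by max F. */
    static double rToP(double[] fs, int j) {
        double maxF = Double.NEGATIVE_INFINITY;
        for (double f : fs) {
            maxF = Math.max(maxF, f);
        }
        double sum = 0.0;
        for (double f : fs) {
            sum += Math.exp(f - maxF); // every term is at most 1, so no overflow
        }
        return Math.exp(fs[j] - maxF) / sum;
    }

    public static void main(String[] args) {
        System.out.println(rToP(new double[] {Math.log(3.0), 0.0}, 0)); // approximately 0.75
        System.out.println(rToP(new double[] {1000.0, 0.0}, 0));        // 1.0, no overflow
    }
}
```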

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Computes class distribution of an instance using the best committee.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
an array containing the estimated membership probabilities of the test instance in each class or the numeric prediction
Throws:
java.lang.Exception - if distribution could not be computed successfully
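A sketch of how a committee's members could be combined into a class distribution for one instance. It assumes each member contributes one regression output per class and that the J-class LogitBoost update of Friedman, Hastie and Tibshirani is applied (centre the outputs, scale by (J - 1) / J, add to F_j), followed by the softmax link. When no committee is ready, it returns a supplied default distribution, standing in for the m_zeroR fallback from the field summary. All names are hypothetical and the number of classes is taken from `fallback.length`.

```java
import java.util.Arrays;

/** Hypothetical sketch: combine committee members' per-class outputs into a distribution. */
class CommitteeDistributionSketch {

    /**
     * memberResponses[m][j] is member m's regression output for class j on one instance.
     * fallback is the default distribution (the ZeroR role); its length is the class count.
     */
    static double[] distribution(double[][] memberResponses, double[] fallback) {
        if (memberResponses == null || memberResponses.length == 0) {
            return fallback.clone(); // no committee ready yet
        }
        int numClasses = fallback.length;
        double[] fs = new double[numClasses];
        for (double[] f : memberResponses) {
            double mean = Arrays.stream(f).average().orElse(0.0);
            for (int j = 0; j < numClasses; j++) {
                // J-class LogitBoost: F_j += (J - 1) / J * (f_j - mean over k of f_k)
                fs[j] += (numClasses - 1.0) / numClasses * (f[j] - mean);
            }
        }
        // p_j = exp(F_j) / sum_k exp(F_k), shifted by max F to avoid overflow
        double maxF = Arrays.stream(fs).max().getAsDouble();
        double[] p = new double[numClasses];
        double sum = 0.0;
        for (int j = 0; j < numClasses; j++) {
            p[j] = Math.exp(fs[j] - maxF);
            sum += p[j];
        }
        for (int j = 0; j < numClasses; j++) {
            p[j] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] prior = {0.5, 0.5};
        System.out.println(Arrays.toString(distribution(new double[0][], prior)));
        System.out.println(Arrays.toString(distribution(new double[][] {{1.0, -1.0}}, prior)));
    }
}
```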

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. The valid options (-C, -M, -V, -D, -W, -Q, -S, -P, and learner options after --) are listed in the class description.

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
an array of strings suitable for passing to setOptions

globalInfo

public java.lang.String globalInfo()
Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

classifierTipText

public java.lang.String classifierTipText()
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setClassifier

public void setClassifier(Classifier newClassifier)
Set the classifier for boosting. The learner should be able to handle numeric class attributes.

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Get the base classifier used for boosting

Returns:
the base classifier used for boosting

minChunkSizeTipText

public java.lang.String minChunkSizeTipText()
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMinChunkSize

public void setMinChunkSize(int chunkSize)
Set the minimum chunk size

Parameters:
chunkSize - the minimum chunk size

getMinChunkSize

public int getMinChunkSize()
Get the minimum chunk size

Returns:
the chunk size

maxChunkSizeTipText

public java.lang.String maxChunkSizeTipText()
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMaxChunkSize

public void setMaxChunkSize(int chunkSize)
Set the maximum chunk size

Parameters:
chunkSize - the maximum chunk size

getMaxChunkSize

public int getMaxChunkSize()
Get the maximum chunk size

Returns:
the chunk size
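The minimum and maximum chunk sizes bound the committees that take part in the race. Assuming, as in the racing-committees approach, that one committee is created per chunk size and that sizes double from the minimum up to the maximum, the documented defaults (500 and 8000) yield five competitors. The doubling rule is an assumption not stated on this page, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the chunk sizes raced between the minimum and maximum. */
class ChunkScheduleSketch {

    /** Doubles from minChunkSize while the size does not exceed maxChunkSize (assumed rule). */
    static List<Integer> chunkSizes(int minChunkSize, int maxChunkSize) {
        if (minChunkSize <= 0) {
            throw new IllegalArgumentException("minChunkSize must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        // long loop variable so doubling near Integer.MAX_VALUE cannot overflow
        for (long size = minChunkSize; size <= maxChunkSize; size *= 2) {
            sizes.add((int) size);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // defaults documented for -C and -M
        System.out.println(chunkSizes(500, 8000)); // [500, 1000, 2000, 4000, 8000]
    }
}
```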

validationChunkSizeTipText

public java.lang.String validationChunkSizeTipText()
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setValidationChunkSize

public void setValidationChunkSize(int chunkSize)
Set the validation chunk size

Parameters:
chunkSize - the validation chunk size (the number of instances in the validation set)

getValidationChunkSize

public int getValidationChunkSize()
Get the validation chunk size

Returns:
the chunk size

pruningTypeTipText

public java.lang.String pruningTypeTipText()
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setPruningType

public void setPruningType(SelectedTag pruneType)
Set the pruning type

Parameters:
pruneType - the pruning type (none or log likelihood)

getPruningType

public SelectedTag getPruningType()
Get the pruning type

Returns:
the type
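PRUNETYPE_LOGLIKELIHOOD suggests that committee members are judged by their effect on the log-likelihood of the validation data (compare getBestCommitteeLLEstimate). A sketch under the assumption that a newly boosted member is kept only if it raises the validation log-likelihood; whether Weka applies exactly this rule, and how it treats ties, is not stated here. All names are hypothetical.

```java
/** Hypothetical sketch of log-likelihood based pruning on validation data. */
class LogLikelihoodPruningSketch {

    /** Sum over validation instances of log p(true class); probs[i][c], labels[i] = c. */
    static double logLikelihood(double[][] probs, int[] labels) {
        double ll = 0.0;
        for (int i = 0; i < labels.length; i++) {
            ll += Math.log(probs[i][labels[i]]);
        }
        return ll;
    }

    /** Assumed rule: keep the new member only if validation log-likelihood strictly improves. */
    static boolean keepNewMember(double[][] before, double[][] after, int[] labels) {
        return logLikelihood(after, labels) > logLikelihood(before, labels);
    }

    public static void main(String[] args) {
        int[] labels = {0, 1};
        double[][] before = {{0.6, 0.4}, {0.5, 0.5}};
        double[][] after = {{0.7, 0.3}, {0.4, 0.6}};
        System.out.println(keepNewMember(before, after, labels)); // true
    }
}
```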

debugTipText

public java.lang.String debugTipText()
Description copied from class: Classifier
Returns the tip text for this property

Overrides:
debugTipText in class Classifier
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean debug)
Set debugging mode

Overrides:
setDebug in class Classifier
Parameters:
debug - true if debug output should be printed

getDebug

public boolean getDebug()
Get whether debugging is turned on

Overrides:
getDebug in class Classifier
Returns:
true if debugging output is on

useResamplingTipText

public java.lang.String useResamplingTipText()
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setUseResampling

public void setUseResampling(boolean r)
Set resampling mode

Parameters:
r - true to use resampling instead of reweighting

getUseResampling

public boolean getUseResampling()
Get whether resampling is turned on

Returns:
true if resampling is turned on

seedTipText

public java.lang.String seedTipText()
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int seed)
Set seed for resampling.

Parameters:
seed - the seed for resampling

getSeed

public int getSeed()
Get seed for resampling.

Returns:
the seed for resampling
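With -Q the weak learner is trained on a sample drawn according to the boosting weights rather than on weighted instances, and -S fixes the random seed. The sketch below shows generic weighted sampling with replacement under a fixed `java.util.Random` seed; it is an illustration of the idea, not Weka's resampling routine, and the names are hypothetical. Reusing the same seed reproduces the same sample.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: seeded weighted sampling with replacement. */
class WeightedResampleSketch {

    /** Draws n indices with replacement; index i is chosen with probability weights[i] / total. */
    static int[] resample(double[] weights, int n, long seed) {
        double[] cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight");
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (!(total > 0)) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        Random random = new Random(seed); // same seed, same sample
        int[] chosen = new int[n];
        for (int k = 0; k < n; k++) {
            // r lies in [0, total); the min guards against rounding up to total
            double r = Math.min(random.nextDouble() * total, Math.nextDown(total));
            int idx = 0;
            while (cumulative[idx] <= r) {
                idx++; // stops at the first index whose cumulative weight exceeds r
            }
            chosen[k] = idx;
        }
        return chosen;
    }

    public static void main(String[] args) {
        int[] sample = resample(new double[] {1.0, 2.0, 3.0}, 10, 1L);
        System.out.println(Arrays.toString(sample));
    }
}
```

The linear scan keeps the sketch short; a binary search over `cumulative` would make each draw logarithmic in the number of instances.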

getBestCommitteeChunkSize

public int getBestCommitteeChunkSize()
Get the best committee chunk size


getBestCommitteeSize

public int getBestCommitteeSize()
Get the number of members in the best committee


getBestCommitteeErrorEstimate

public double getBestCommitteeErrorEstimate()
Get the best committee's error on the validation data


getBestCommitteeLLEstimate

public double getBestCommitteeLLEstimate()
Get the best committee's log likelihood on the validation data
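getBestCommitteeChunkSize, getBestCommitteeSize and the two estimates above all describe whichever committee currently leads the race. A minimal sketch of picking the leader, assuming it is the committee with the lowest validation error and that ties go to the committee listed first; both rules and all names are assumptions rather than documented behaviour.

```java
/** Hypothetical sketch: pick the raced committee with the lowest validation error. */
class BestCommitteeSketch {

    /** Index of the smallest error; earlier committees win ties; -1 if none is ready. */
    static int bestIndex(double[] validationErrors) {
        if (validationErrors.length == 0) {
            return -1; // no committee yet: a default scheme (the ZeroR role) would be used
        }
        int best = 0;
        for (int i = 1; i < validationErrors.length; i++) {
            if (validationErrors[i] < validationErrors[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestIndex(new double[] {0.30, 0.20, 0.25})); // 1
    }
}
```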


toString

public java.lang.String toString()
Returns description of the boosted classifier.

Returns:
description of the boosted classifier as a string

main

public static void main(java.lang.String[] argv)
Main method for this class.