weka.classifiers.meta
Class Decorate

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.meta.Decorate
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable

public class Decorate
extends Classifier
implements OptionHandler

DECORATE is a meta-learner for building diverse ensembles of classifiers by using specially constructed artificial training examples. Comprehensive experiments have demonstrated that this technique is consistently more accurate than the base classifier, Bagging and Random Forests. Decorate also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. For more details see:

Prem Melville and Raymond J. Mooney. Constructing diverse classifier ensembles using artificial training examples. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2003.

Prem Melville and Raymond J. Mooney. Creating diversity in ensembles using artificial data. Submitted.

Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a weak classifier as the basis for Decorate (default weka.classifiers.trees.J48).

-I num
Specify the desired size of the committee (default 15).

-M iterations
Set the maximum number of Decorate iterations (default 50).

-S seed
Seed for the random number generator (default 0).

-R factor
Factor that determines the number of artificial examples to generate, specified as a fraction of the training data size.

Options after -- are passed to the designated classifier.
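
The option split described above can be sketched with the standard library alone. The class and method names below are hypothetical, and this is not Weka's own option parser; it only illustrates which options Decorate keeps and which ones are forwarded to the base classifier.

```java
import java.util.Arrays;

class OptionSplitSketch {
    /** Options that Decorate itself consumes: everything before the first "--". */
    static String[] decorateOptions(String[] argv) {
        int sep = Arrays.asList(argv).indexOf("--");
        return sep < 0 ? argv.clone() : Arrays.copyOfRange(argv, 0, sep);
    }

    /** Options forwarded to the base classifier: everything after the first "--". */
    static String[] baseClassifierOptions(String[] argv) {
        int sep = Arrays.asList(argv).indexOf("--");
        return sep < 0 ? new String[0] : Arrays.copyOfRange(argv, sep + 1, argv.length);
    }
}
```

For example, given -I 15 -W weka.classifiers.trees.J48 -- -C 0.25, Decorate keeps -I 15 -W weka.classifiers.trees.J48 and J48 receives -C 0.25.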

Version:
$Revision: 1.2 $
Author:
Prem Melville (melville@cs.utexas.edu)
See Also:
Serialized Form

Field Summary
protected  double m_ArtSize
          Amount of artificial/random instances to use - specified as a fraction of the training data size.
protected  java.util.Vector m_AttributeStats
          Attribute statistics - used for generating artificial examples.
protected  Classifier m_Classifier
          The model base classifier to use.
protected  java.util.Vector m_Committee
          Vector of classifiers that make up the committee/ensemble.
protected  boolean m_Debug
          Set to true to get debugging output.
protected  int m_DesiredSize
          The desired ensemble size.
protected  int m_NumIterations
          The maximum number of Decorate iterations to run.
protected  java.util.Random m_Random
          The random number generator.
protected  int m_Seed
          The seed for random number generation.
 
Constructor Summary
Decorate()
           
 
Method Summary
protected  void addInstances(Instances data, Instances newData)
          Add new instances to the given set of instances.
 java.lang.String artificialSizeTipText()
          Returns the tip text for this property
 void buildClassifier(Instances data)
          Builds the Decorate classifier.
 java.lang.String classifierTipText()
          Returns the tip text for this property
protected  double computeError(Instances data)
          Computes the error in classification on the given data.
protected  void computeStats(Instances data)
          Compute and store statistics required for generating artificial data.
 java.lang.String desiredSizeTipText()
          Returns the tip text for this property
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
protected  Instances generateArtificialData(int artSize, Instances data)
          Generate artificial training examples.
 double getArtificialSize()
          Gets the factor that determines the number of artificial examples to generate.
 Classifier getClassifier()
          Get the classifier used as the base classifier
 boolean getDebug()
          Get whether debugging is turned on
 int getDesiredSize()
          Gets the desired size of the committee.
 int getNumIterations()
          Gets the max number of Decorate iterations to run.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 int getSeed()
          Gets the seed for the random number generator.
 java.lang.String globalInfo()
          Returns a string describing the classifier.
protected  int inverseLabel(double[] probs)
          Select class label such that the probability of selection is inversely proportional to the ensemble's predictions.
protected  void labelData(Instances artData)
          Labels the artificially generated data.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String numIterationsTipText()
          Returns the tip text for this property
protected  void removeInstances(Instances data, int numRemove)
          Removes a specified number of instances from the given set of instances.
 java.lang.String seedTipText()
          Returns the tip text for this property
protected  int selectIndexProbabilistically(double[] cdf)
          Given cumulative probabilities, selects a nominal attribute value index.
 void setArtificialSize(double newArtSize)
          Sets the factor that determines the number of artificial examples to generate.
 void setClassifier(Classifier newClassifier)
          Set the base classifier for Decorate.
 void setDebug(boolean debug)
          Set debugging mode
 void setDesiredSize(int newDesiredSize)
          Sets the desired size of the committee.
 void setNumIterations(int numIterations)
          Sets the max number of Decorate iterations to run.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(int seed)
          Set the seed for random number generator.
 java.lang.String toString()
          Returns description of the Decorate classifier.
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Debug

protected boolean m_Debug
Set to true to get debugging output.


m_Classifier

protected Classifier m_Classifier
The model base classifier to use.


m_Committee

protected java.util.Vector m_Committee
Vector of classifiers that make up the committee/ensemble.


m_DesiredSize

protected int m_DesiredSize
The desired ensemble size.


m_NumIterations

protected int m_NumIterations
The maximum number of Decorate iterations to run.


m_Seed

protected int m_Seed
The seed for random number generation.


m_ArtSize

protected double m_ArtSize
Amount of artificial/random instances to use - specified as a fraction of the training data size.


m_Random

protected java.util.Random m_Random
The random number generator.


m_AttributeStats

protected java.util.Vector m_AttributeStats
Attribute statistics - used for generating artificial examples.

Constructor Detail

Decorate

public Decorate()
Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a weak classifier as the basis for Decorate (required).

-I num
Specify the desired size of the committee (default 15).

-M iterations
Set the maximum number of Decorate iterations (default 50).

-S seed
Seed for the random number generator (default 0).

-R factor
Factor that determines the number of artificial examples to generate, specified as a fraction of the training data size.

Options after -- are passed to the designated classifier.

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
an array of strings suitable for passing to setOptions

desiredSizeTipText

public java.lang.String desiredSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

numIterationsTipText

public java.lang.String numIterationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

artificialSizeTipText

public java.lang.String artificialSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

classifierTipText

public java.lang.String classifierTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

globalInfo

public java.lang.String globalInfo()
Returns a string describing the classifier.

Returns:
a description suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean debug)
Set debugging mode

Overrides:
setDebug in class Classifier
Parameters:
debug - true if debug output should be printed

getDebug

public boolean getDebug()
Get whether debugging is turned on

Overrides:
getDebug in class Classifier
Returns:
true if debugging output is on

setClassifier

public void setClassifier(Classifier newClassifier)
Set the base classifier for Decorate.

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Get the classifier used as the base classifier

Returns:
the classifier used as the base classifier

getArtificialSize

public double getArtificialSize()
Gets the factor that determines the number of artificial examples to generate.

Returns:
the factor that determines the number of artificial examples to generate

setArtificialSize

public void setArtificialSize(double newArtSize)
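Sets the factor that determines the number of artificial examples to generate.

Parameters:
newArtSize - the factor that determines the number of artificial examples to generate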

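Since the factor is described as a fraction of the training data size, the number of artificial examples per iteration can be sketched as below. Truncation toward zero and the floor of one example are assumptions of this sketch, not documented behavior of Decorate.

```java
class ArtificialSizeSketch {
    /**
     * Number of artificial examples for one Decorate iteration.
     * Assumption: factor * training-set size, truncated, but never fewer than one.
     */
    static int artificialSize(double factor, int numTrainingInstances) {
        if (factor < 0) {
            throw new IllegalArgumentException("factor must be non-negative");
        }
        int n = (int) (factor * numTrainingInstances);
        return Math.max(1, n);
    }
}
```

Under these assumptions, a factor of 1.0 with 150 training instances adds 150 artificial examples per iteration, and a factor of 0.5 adds 75.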

getDesiredSize

public int getDesiredSize()
Gets the desired size of the committee.

Returns:
the desired size of the committee

setDesiredSize

public void setDesiredSize(int newDesiredSize)
Sets the desired size of the committee.

Parameters:
newDesiredSize - the desired size of the committee

setNumIterations

public void setNumIterations(int numIterations)
Sets the max number of Decorate iterations to run.

Parameters:
numIterations - max number of Decorate iterations to run

getNumIterations

public int getNumIterations()
Gets the max number of Decorate iterations to run.

Returns:
the max number of Decorate iterations to run

setSeed

public void setSeed(int seed)
Set the seed for random number generator.

Parameters:
seed - the random number seed

getSeed

public int getSeed()
Gets the seed for the random number generator.

Returns:
the seed for the random number generator

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the Decorate classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
data - the training data to be used for generating the classifier
Throws:
java.lang.Exception - if the classifier could not be built successfully
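
The training loop, following the algorithm as described in the Melville and Mooney paper cited above, can be sketched generically. A candidate member is trained on the real data plus inversely labeled artificial data, and it is kept only if the ensemble's training error does not increase. The functional parameters are hypothetical stand-ins: baseLearner for the base classifier, artificialLabeled for generateArtificialData followed by labelData, and trainingError for computeError on the real training data. Copying the training list replaces the addInstances/removeInstances bookkeeping; this is a sketch, not Weka's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

class DecorateLoopSketch {
    static <T, M> List<M> build(List<T> training,
                                Function<List<T>, M> baseLearner,
                                Function<List<M>, List<T>> artificialLabeled,
                                ToDoubleFunction<List<M>> trainingError,
                                int desiredSize, int maxIterations) {
        // The first member is trained on the real data only.
        List<M> committee = new ArrayList<>();
        committee.add(baseLearner.apply(training));
        double error = trainingError.applyAsDouble(committee);
        int trials = 1;
        while (committee.size() < desiredSize && trials < maxIterations) {
            // Real data plus artificial examples labeled against the current committee.
            List<T> augmented = new ArrayList<>(training);
            augmented.addAll(artificialLabeled.apply(committee));
            committee.add(baseLearner.apply(augmented));
            double newError = trainingError.applyAsDouble(committee);
            if (newError <= error) {
                error = newError;                       // keep the new member
            } else {
                committee.remove(committee.size() - 1); // reject it
            }
            trials++;
        }
        return committee;
    }
}
```

The two stopping conditions correspond to the desired committee size (-I) and the maximum number of iterations (-M): rejected candidates still consume iterations, so the committee can end up smaller than requested.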

computeStats

protected void computeStats(Instances data)
                     throws java.lang.Exception
Compute and store statistics required for generating artificial data.

Parameters:
data - training instances
Throws:
java.lang.Exception - if statistics could not be calculated successfully
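
Assuming the data model from the cited paper (mean and standard deviation for each numeric attribute, value frequencies for each nominal attribute), the statistics can be sketched over plain arrays. The sample (n - 1) standard deviation and the Laplace smoothing of nominal counts (adding one to every count so unseen values keep a non-zero probability) are assumptions of this sketch; the class and method names are hypothetical.

```java
class AttributeStatsSketch {
    /** Returns {mean, standard deviation} of a numeric attribute column. */
    static double[] numericStats(double[] column) {
        double sum = 0;
        for (double v : column) sum += v;
        double mean = sum / column.length;
        double sq = 0;
        for (double v : column) sq += (v - mean) * (v - mean);
        // Sample standard deviation (n - 1 denominator); zero for a single value.
        double sd = column.length > 1 ? Math.sqrt(sq / (column.length - 1)) : 0.0;
        return new double[] {mean, sd};
    }

    /** Cumulative distribution over nominal value indices 0..numValues-1, Laplace-smoothed. */
    static double[] nominalCdf(int[] valueIndices, int numValues) {
        double[] counts = new double[numValues];
        for (int v : valueIndices) counts[v]++;
        double total = 0;
        for (int k = 0; k < numValues; k++) {
            counts[k] += 1.0; // Laplace smoothing
            total += counts[k];
        }
        double[] cdf = new double[numValues];
        double running = 0;
        for (int k = 0; k < numValues; k++) {
            running += counts[k] / total;
            cdf[k] = running;
        }
        return cdf;
    }
}
```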

generateArtificialData

protected Instances generateArtificialData(int artSize,
                                           Instances data)
Generate artificial training examples.

Parameters:
artSize - number of artificial examples to create
data - training data
Returns:
the set of unlabeled artificial examples
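
Under the same assumed model, each artificial example can be drawn attribute by attribute: numeric values from a Gaussian with the stored mean and standard deviation, nominal values by inverting the cumulative distribution. The sketch below returns unlabeled feature vectors with nominal values encoded as value indices; the array layout and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ArtificialDataSketch {
    /**
     * Draws artSize unlabeled examples. For attribute j, stats[j] holds {mean, sd}
     * when isNumeric[j] is true, otherwise the cumulative distribution of its values.
     */
    static List<double[]> generate(int artSize, boolean[] isNumeric, double[][] stats, Random rnd) {
        List<double[]> examples = new ArrayList<>(artSize);
        for (int i = 0; i < artSize; i++) {
            double[] values = new double[isNumeric.length];
            for (int j = 0; j < values.length; j++) {
                values[j] = isNumeric[j]
                        ? stats[j][0] + stats[j][1] * rnd.nextGaussian()
                        : pickIndex(stats[j], rnd.nextDouble());
            }
            examples.add(values);
        }
        return examples;
    }

    /** Smallest index whose cumulative probability exceeds u, for u in [0, 1). */
    static int pickIndex(double[] cdf, double u) {
        for (int k = 0; k < cdf.length; k++) {
            if (u < cdf[k]) return k;
        }
        return cdf.length - 1; // guards against cdf[last] rounding to slightly below 1
    }
}
```

The class attribute is deliberately left out here, since the examples are labeled afterwards from the ensemble's predictions.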

labelData

protected void labelData(Instances artData)
                  throws java.lang.Exception
Labels the artificially generated data.

Parameters:
artData - the artificially generated instances
Throws:
java.lang.Exception - if instances cannot be labeled successfully

inverseLabel

protected int inverseLabel(double[] probs)
                    throws java.lang.Exception
Select class label such that the probability of selection is inversely proportional to the ensemble's predictions.

Parameters:
probs - class membership probabilities of instance
Returns:
index of class label selected
Throws:
java.lang.Exception - if instances cannot be labeled successfully
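
A minimal sketch of inverse labeling: each class gets a weight proportional to 1/p, where p is the ensemble's predicted probability for that class, and a label is drawn from the normalized weights. Giving all of the mass to zero-probability classes (shared uniformly among them) is an assumption of this sketch, chosen to avoid dividing by zero. The uniform draw u is passed in explicitly so the method is deterministic; the names are hypothetical.

```java
class InverseLabelSketch {
    /** Normalized weights proportional to 1/p for each class probability p. */
    static double[] inverseDistribution(double[] probs) {
        boolean anyZero = false;
        for (double p : probs) {
            if (p == 0.0) anyZero = true;
        }
        double[] w = new double[probs.length];
        double total = 0;
        for (int i = 0; i < probs.length; i++) {
            // With any zero probability, only zero-probability classes receive weight.
            w[i] = anyZero ? (probs[i] == 0.0 ? 1.0 : 0.0) : 1.0 / probs[i];
            total += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= total;
        return w;
    }

    /** Draws a class index from the inverse distribution using u in [0, 1). */
    static int inverseLabel(double[] probs, double u) {
        double[] w = inverseDistribution(probs);
        double cumulative = 0;
        for (int i = 0; i < w.length; i++) {
            cumulative += w[i];
            if (u < cumulative) return i;
        }
        return w.length - 1;
    }
}
```

With ensemble predictions of {0.8, 0.2}, the weights become {0.2, 0.8}, so an artificial example is labeled with the second class 80% of the time. This is what pushes the next committee member to disagree with the current ensemble.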

selectIndexProbabilistically

protected int selectIndexProbabilistically(double[] cdf)
Given cumulative probabilities, selects a nominal attribute value index.

Parameters:
cdf - array of cumulative probabilities
Returns:
index of the attribute value selected according to the probability distribution

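Selecting an index from cumulative probabilities amounts to drawing u uniformly from [0, 1) and returning the first index whose cumulative probability exceeds u. The sketch below uses a linear scan, which is adequate for the small domains of nominal attributes; the names and the explicit Random argument are hypothetical.

```java
import java.util.Random;

class CdfSelectSketch {
    /** First index k with u < cdf[k]; the last index absorbs rounding error at the top. */
    static int selectIndex(double[] cdf, double u) {
        for (int k = 0; k < cdf.length; k++) {
            if (u < cdf[k]) {
                return k;
            }
        }
        return cdf.length - 1;
    }

    /** Draws u from the given generator, so a fixed seed gives reproducible choices. */
    static int selectIndex(double[] cdf, Random rnd) {
        return selectIndex(cdf, rnd.nextDouble());
    }
}
```

With cdf = {0.2, 0.5, 1.0}, repeated draws select index 0, 1, and 2 roughly 20%, 30%, and 50% of the time.
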
removeInstances

protected void removeInstances(Instances data,
                               int numRemove)
Removes a specified number of instances from the given set of instances.

Parameters:
data - given instances
numRemove - number of instances to delete from the given instances

addInstances

protected void addInstances(Instances data,
                            Instances newData)
Add new instances to the given set of instances.

Parameters:
data - given instances
newData - set of instances to add to given instances
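
Together, addInstances and removeInstances support a simple pattern: append the artificial examples, train, then delete them again. The sketch below shows that pattern on a plain list, under the assumption (not stated above) that removeInstances deletes the last numRemove instances, i.e. the ones most recently appended. The class and method names are hypothetical.

```java
import java.util.List;

class AppendAndTrimSketch {
    /** Appends every element of newData to data. */
    static <T> void append(List<T> data, List<T> newData) {
        data.addAll(newData);
    }

    /** Deletes the last numRemove elements of data (assumed to be the appended ones). */
    static <T> void trimLast(List<T> data, int numRemove) {
        data.subList(data.size() - numRemove, data.size()).clear();
    }
}
```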

computeError

protected double computeError(Instances data)
                       throws java.lang.Exception
Computes the error in classification on the given data.

Parameters:
data - the instances to be classified
Returns:
classification error
Throws:
java.lang.Exception - if error cannot be computed successfully

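Computing the classification error amounts to counting the instances whose predicted class (the index of the largest probability in the ensemble's distribution) differs from the true class. The sketch below returns that count as a fraction of the instances; whether computeError returns a fraction or a raw count is not stated above, so the fraction is an assumption. Plain arrays stand in for Instances, and the names are hypothetical.

```java
class EnsembleErrorSketch {
    /** Fraction of instances whose most probable class differs from the true class. */
    static double error(double[][] distributions, int[] trueLabels) {
        int wrong = 0;
        for (int i = 0; i < trueLabels.length; i++) {
            if (argmax(distributions[i]) != trueLabels[i]) {
                wrong++;
            }
        }
        return (double) wrong / trueLabels.length;
    }

    /** Index of the largest entry; ties go to the lowest index. */
    static int argmax(double[] d) {
        int best = 0;
        for (int k = 1; k < d.length; k++) {
            if (d[k] > d[best]) {
                best = k;
            }
        }
        return best;
    }
}
```
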
distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if distribution can't be computed successfully
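
Combining the committee can be sketched as averaging the members' class probability distributions. Averaging is an assumption here (the description above says only that class membership probabilities are calculated), and the list of per-member distributions stands in for calling each member classifier on the instance.

```java
import java.util.List;

class CommitteeAverageSketch {
    /** Element-wise mean of the members' class probability distributions. */
    static double[] average(List<double[]> memberDistributions) {
        int numClasses = memberDistributions.get(0).length;
        double[] mean = new double[numClasses];
        for (double[] dist : memberDistributions) {
            for (int k = 0; k < numClasses; k++) {
                mean[k] += dist[k] / memberDistributions.size();
            }
        }
        return mean;
    }
}
```

Because each member's distribution sums to one, the average does too, so it can be returned directly as a probability distribution.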

toString

public java.lang.String toString()
Returns description of the Decorate classifier.

Returns:
description of the Decorate classifier as a string

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options