weka.filters.unsupervised.instance
Class RemoveMisclassified

java.lang.Object
  extended by weka.filters.Filter
      extended by weka.filters.unsupervised.instance.RemoveMisclassified
All Implemented Interfaces:
OptionHandler, java.io.Serializable, UnsupervisedFilter

public class RemoveMisclassified
extends Filter
implements UnsupervisedFilter, OptionHandler

A filter that removes instances which are incorrectly classified. Useful for removing outliers.

Valid filter-specific options are:

-W classifier string
Full class name of classifier to use, followed by scheme options. (required)

-C class index
Attribute on which misclassifications are based. If < 0, the currently set class attribute is used, defaulting to the last attribute if none is set.

-F number of folds
The number of folds to use for cross-validation cleansing. (<2 = no cross-validation - default)

-T threshold
Threshold for the max error when predicting numeric class. (Value should be >= 0, default = 0.1)

-I max iterations
The maximum number of cleansing iterations to perform. (<1 = until fully cleansed - default)

-V
Invert the match so that correctly classified instances are discarded.

Version:
$Revision: 1.2 $
Author:
Richard Kirkby (rkirkby@cs.waikato.ac.nz), Malcolm Ware (mfw4@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
protected  int m_classIndex
          The attribute to treat as the class for purposes of cleansing.
protected  Classifier m_cleansingClassifier
          The classifier used to do the cleansing
protected  boolean m_firstBatchFinished
          Have we processed the first batch (i.e. training data)?
protected  boolean m_invertMatching
          Whether to invert the match so the correctly classified instances are discarded
protected  double m_numericClassifyThreshold
          The threshold for deciding when a numeric value is correctly classified
protected  int m_numOfCleansingIterations
          The maximum number of cleansing iterations to perform (<1 = until fully cleansed)
protected  int m_numOfCrossValidationFolds
          The number of cross validation folds to perform (<2 = no cross validation)
 
Fields inherited from class weka.filters.Filter
m_NewBatch
 
Constructor Summary
RemoveMisclassified()
           
 
Method Summary
 boolean batchFinished()
          Signify that this batch of input to the filter is finished.
 java.lang.String classifierTipText()
          Returns the tip text for this property
 java.lang.String classIndexTipText()
          Returns the tip text for this property
private  Instances cleanseCross(Instances data)
          Cleanses the data based on misclassifications when performing cross-validation.
private  Instances cleanseTrain(Instances data)
          Cleanses the data based on misclassifications when the classifier is evaluated on the training data.
 Classifier getClassifier()
          Gets the classifier used by the filter.
protected  java.lang.String getClassifierSpec()
          Gets the classifier specification string, which contains the class name of the classifier and any options to the classifier.
 int getClassIndex()
          Gets the attribute on which misclassifications are based.
 boolean getInvert()
          Get whether selection is inverted.
 int getMaxIterations()
          Gets the maximum number of cleansing iterations performed
 int getNumFolds()
          Gets the number of cross-validation folds used by the filter.
 java.lang.String[] getOptions()
          Gets the current settings of the filter.
 double getThreshold()
          Gets the threshold for the max error when predicting a numeric class.
 java.lang.String globalInfo()
          Returns a string describing this filter
 boolean input(Instance instance)
          Input an instance for filtering.
 java.lang.String invertTipText()
          Returns the tip text for this property
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String maxIterationsTipText()
          Returns the tip text for this property
 java.lang.String numFoldsTipText()
          Returns the tip text for this property
 void setClassifier(Classifier classifier)
          Sets the classifier to classify instances with.
 void setClassIndex(int classIndex)
          Sets the attribute on which misclassifications are based.
 boolean setInputFormat(Instances instanceInfo)
          Sets the format of the input instances.
 void setInvert(boolean invert)
          Set whether selection is inverted.
 void setMaxIterations(int iterations)
          Sets the maximum number of cleansing iterations to perform - < 1 means go until fully cleansed
 void setNumFolds(int numOfFolds)
          Sets the number of cross-validation folds to use - < 2 means no cross-validation.
 void setOptions(java.lang.String[] options)
          Parses the options for this object.
 void setThreshold(double threshold)
          Sets the threshold for the max error when predicting a numeric class.
 java.lang.String thresholdTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, inputFormat, inputFormatPeek, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_cleansingClassifier

protected Classifier m_cleansingClassifier
The classifier used to do the cleansing


m_classIndex

protected int m_classIndex
The attribute to treat as the class for purposes of cleansing.


m_numOfCrossValidationFolds

protected int m_numOfCrossValidationFolds
The number of cross validation folds to perform (<2 = no cross validation)


m_numOfCleansingIterations

protected int m_numOfCleansingIterations
The maximum number of cleansing iterations to perform (<1 = until fully cleansed)


m_numericClassifyThreshold

protected double m_numericClassifyThreshold
The threshold for deciding when a numeric value is correctly classified


m_invertMatching

protected boolean m_invertMatching
Whether to invert the match so the correctly classified instances are discarded


m_firstBatchFinished

protected boolean m_firstBatchFinished
Have we processed the first batch (i.e. training data)?

Constructor Detail

RemoveMisclassified

public RemoveMisclassified()
Method Detail

setInputFormat

public boolean setInputFormat(Instances instanceInfo)
                       throws java.lang.Exception
Sets the format of the input instances.

Overrides:
setInputFormat in class Filter
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
true if the outputFormat may be collected immediately
Throws:
java.lang.Exception - if the inputFormat can't be set successfully

cleanseTrain

private Instances cleanseTrain(Instances data)
                        throws java.lang.Exception
Cleanses the data based on misclassifications when the classifier is evaluated on the training data.

Parameters:
data - the data to train with and cleanse
Throws:
java.lang.Exception
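
The following is a minimal, library-free sketch of the idea behind training-data cleansing, not the Weka implementation. It assumes a toy "classifier" that predicts the mean of the numeric class values it was trained on; the class name CleanseTrainSketch and both method names are hypothetical. The loop retrains on the surviving instances after each pass and stops when a pass removes nothing or the iteration limit is reached (a limit below 1 means no limit).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CleanseTrainSketch {

    /** Toy "model" (an assumption for this sketch): the mean of the class values. */
    static double trainMean(List<Double> classValues) {
        double sum = 0.0;
        for (double v : classValues) {
            sum += v;
        }
        return classValues.isEmpty() ? 0.0 : sum / classValues.size();
    }

    /**
     * Repeatedly trains on the current data and keeps only the values whose
     * prediction error is within the threshold.
     * maxIterations < 1 means "repeat until a pass removes nothing".
     */
    static List<Double> cleanse(List<Double> data, double threshold, int maxIterations) {
        List<Double> current = new ArrayList<>(data);
        int iterations = 0;
        while (true) {
            iterations++;
            if (maxIterations > 0 && iterations > maxIterations) {
                break; // iteration limit reached
            }
            double prediction = trainMean(current);
            List<Double> kept = new ArrayList<>();
            for (double actual : current) {
                if (Math.abs(prediction - actual) <= threshold) {
                    kept.add(actual);
                }
            }
            boolean nothingRemoved = kept.size() == current.size();
            current = kept;
            if (nothingRemoved) {
                break; // fully cleansed with respect to this model
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(0.0, 0.0, 0.0, 0.0, 8.0, 22.0);
        System.out.println(cleanse(data, 5.5, 1)); // a single pass
        System.out.println(cleanse(data, 5.5, 0)); // until nothing more is removed
    }
}
```

In this toy data set the first pass removes only 22.0; once it is gone the mean drops and 8.0 falls outside the threshold on the second pass, which is why limiting the number of iterations can change the result.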

cleanseCross

private Instances cleanseCross(Instances data)
                        throws java.lang.Exception
Cleanses the data based on misclassifications when performing cross-validation.

Parameters:
data - the data to train with and cleanse
Throws:
java.lang.Exception
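
As a rough illustration of cross-validation cleansing, the sketch below judges each instance with a model trained only on the other folds. It is a single pass and does not use Weka: the mean predictor, the round-robin fold assignment (index modulo fold count) and the name CleanseCrossSketch are assumptions made for the example, and Weka's own fold construction may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CleanseCrossSketch {

    /**
     * One cross-validation cleansing pass over numeric class values.
     * Instance i belongs to fold (i % folds); it is kept only if the mean of
     * all values outside its fold predicts it within the threshold.
     */
    static List<Double> cleanseOnce(List<Double> data, int folds, double threshold) {
        if (folds < 2 || data.size() < folds) {
            throw new IllegalArgumentException("need at least 2 folds and one instance per fold");
        }
        boolean[] keep = new boolean[data.size()];
        for (int fold = 0; fold < folds; fold++) {
            // "Train" on every instance that is not in the held-out fold.
            double sum = 0.0;
            int n = 0;
            for (int i = 0; i < data.size(); i++) {
                if (i % folds != fold) {
                    sum += data.get(i);
                    n++;
                }
            }
            double prediction = sum / n;
            // Judge only the held-out instances.
            for (int i = fold; i < data.size(); i += folds) {
                keep[i] = Math.abs(prediction - data.get(i)) <= threshold;
            }
        }
        List<Double> kept = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            if (keep[i]) {
                kept.add(data.get(i)); // original order is preserved
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 1.0, 1.0, 1.0, 1.0, 50.0);
        System.out.println(cleanseOnce(data, 3, 15.0));
    }
}
```

The point of the held-out evaluation is that an outlier cannot influence the model that judges it, although it still affects the models that judge the other folds.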

input

public boolean input(Instance instance)
              throws java.lang.Exception
Input an instance for filtering.

Overrides:
input in class Filter
Parameters:
instance - the input instance
Returns:
true if the filtered instance may now be collected with output().
Throws:
java.lang.NullPointerException - if the input format has not been defined.
java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.

batchFinished

public boolean batchFinished()
                      throws java.lang.Exception
Signify that this batch of input to the filter is finished.

Overrides:
batchFinished in class Filter
Returns:
true if there are instances pending output
Throws:
java.lang.IllegalStateException - if no input structure has been defined
java.lang.Exception - if there was a problem finishing the batch.

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses the options for this object. Valid options are:

-W classifier string
Full class name of classifier to use, followed by scheme options. (required)

-C class index
Attribute on which misclassifications are based. If < 0, the currently set class attribute is used, defaulting to the last attribute if none is set.

-F number of folds
The number of folds to use for cross-validation cleansing. (<2 = no cross-validation - default)

-T threshold
Threshold for the max error when predicting numeric class. (Value should be >= 0, default = 0.1)

-I max iterations
The maximum number of cleansing iterations to perform. (<1 = until fully cleansed - default)

-V
Invert the match so that correctly classified instances are discarded.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
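
A small sketch of how an option array with the flags listed above might be assembled before passing it to setOptions. It deliberately does not call Weka, so it runs on its own; the helper name buildOptions and the class name are hypothetical, and the classifier string weka.classifiers.trees.J48 -C 0.25 is only an example that assumes this classifier exists in your Weka version. The classifier name and its scheme options travel together as one string after -W.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveMisclassifiedOptionsSketch {

    /**
     * Builds an option array from the documented flags.
     * classifierSpec is the full classifier class name followed by its own
     * scheme options, given as a single string.
     */
    static String[] buildOptions(String classifierSpec, int classIndex, int folds,
                                 double threshold, int maxIterations, boolean invert) {
        if (classifierSpec == null || classifierSpec.trim().isEmpty()) {
            throw new IllegalArgumentException("-W (classifier) is required");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-W");
        opts.add(classifierSpec);
        opts.add("-C");
        opts.add(Integer.toString(classIndex));    // < 0: use the set class or the last attribute
        opts.add("-F");
        opts.add(Integer.toString(folds));         // < 2: no cross-validation
        opts.add("-T");
        opts.add(Double.toString(threshold));      // max error for a numeric class
        opts.add("-I");
        opts.add(Integer.toString(maxIterations)); // < 1: until fully cleansed
        if (invert) {
            opts.add("-V");                        // discard correctly classified instances
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions("weka.classifiers.trees.J48 -C 0.25", -1, 10, 0.1, 0, false);
        System.out.println(Arrays.toString(opts));
    }
}
```

With Weka on the classpath, an array like this would be handed to setOptions on a RemoveMisclassified instance; the same values can also be set individually through setClassifier, setClassIndex, setNumFolds, setThreshold, setMaxIterations and setInvert.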

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

globalInfo

public java.lang.String globalInfo()
Returns a string describing this filter

Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui

classifierTipText

public java.lang.String classifierTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setClassifier

public void setClassifier(Classifier classifier)
Sets the classifier to classify instances with.

Parameters:
classifier - The classifier to be used (with its options set).

getClassifier

public Classifier getClassifier()
Gets the classifier used by the filter.

Returns:
The classifier to be used.

getClassifierSpec

protected java.lang.String getClassifierSpec()
Gets the classifier specification string, which contains the class name of the classifier and any options to the classifier.

Returns:
the classifier string.

classIndexTipText

public java.lang.String classIndexTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setClassIndex

public void setClassIndex(int classIndex)
Sets the attribute on which misclassifications are based. If < 0, the currently set class attribute is used, defaulting to the last attribute if none is set.

Parameters:
classIndex - the class index.
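
The fallback rule described above can be written as a small function. This is an illustrative sketch, not code from the filter: the name resolve and its parameters are hypothetical, with currentClassIndex standing for the class index already set on the data (negative if none) and numAttributes for the number of attributes.

```java
class ClassIndexSketch {

    /**
     * Chooses the attribute used as the class for cleansing:
     * the configured index if it is >= 0, otherwise the class index already
     * set on the data, otherwise the last attribute.
     */
    static int resolve(int configured, int currentClassIndex, int numAttributes) {
        if (configured >= 0) {
            return configured;
        }
        if (currentClassIndex >= 0) {
            return currentClassIndex;
        }
        return numAttributes - 1;
    }

    public static void main(String[] args) {
        System.out.println(resolve(2, 0, 5));   // explicit index
        System.out.println(resolve(-1, 3, 5));  // class already set on the data
        System.out.println(resolve(-1, -1, 5)); // falls back to the last attribute
    }
}
```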

getClassIndex

public int getClassIndex()
Gets the attribute on which misclassifications are based.

Returns:
the class index.

numFoldsTipText

public java.lang.String numFoldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumFolds

public void setNumFolds(int numOfFolds)
Sets the number of cross-validation folds to use - < 2 means no cross-validation.

Parameters:
numOfFolds - the number of folds.

getNumFolds

public int getNumFolds()
Gets the number of cross-validation folds used by the filter.

Returns:
the number of folds.

thresholdTipText

public java.lang.String thresholdTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setThreshold

public void setThreshold(double threshold)
Sets the threshold for the max error when predicting a numeric class. The value should be >= 0.

Parameters:
threshold - the numeric threshold.
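
To make the role of the threshold concrete, the sketch below shows one plausible correctness test. It assumes that a numeric prediction counts as correct when its absolute error does not exceed the threshold (boundary included) and that a nominal prediction must match the actual class value exactly; the class and method names are hypothetical, and rejecting a negative threshold is a choice made for this example only.

```java
class ThresholdSketch {

    /**
     * Decides whether a prediction counts as correctly classified.
     * For a numeric class the absolute error must be within the threshold;
     * for a nominal class the predicted value index must equal the actual one.
     */
    static boolean isCorrectlyClassified(double predicted, double actual,
                                         boolean numericClass, double threshold) {
        if (numericClass) {
            if (threshold < 0) {
                throw new IllegalArgumentException("threshold should be >= 0");
            }
            return Math.abs(predicted - actual) <= threshold;
        }
        return predicted == actual; // the threshold is not used for nominal classes
    }

    public static void main(String[] args) {
        System.out.println(isCorrectlyClassified(3.0, 2.0, true, 1.0));
        System.out.println(isCorrectlyClassified(3.0, 2.0, true, 0.5));
        System.out.println(isCorrectlyClassified(1.0, 1.0, false, 0.1));
    }
}
```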

getThreshold

public double getThreshold()
Gets the threshold for the max error when predicting a numeric class.

Returns:
the numeric threshold.

maxIterationsTipText

public java.lang.String maxIterationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMaxIterations

public void setMaxIterations(int iterations)
Sets the maximum number of cleansing iterations to perform - < 1 means go until fully cleansed

Parameters:
iterations - the maximum number of iterations.

getMaxIterations

public int getMaxIterations()
Gets the maximum number of cleansing iterations performed

Returns:
the maximum number of iterations.

invertTipText

public java.lang.String invertTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setInvert

public void setInvert(boolean invert)
Set whether selection is inverted.

Parameters:
invert - whether or not to invert selection.
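
The effect of inverting the selection can be sketched independently of any classifier. In the hypothetical helper below, the predicate stands in for "this instance was correctly classified"; without inversion the correctly classified instances are output, and with inversion the misclassified ones are output instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class InvertSketch {

    /**
     * Returns the correctly classified instances, or, when invert is true,
     * the misclassified ones (the correctly classified ones are discarded).
     */
    static <T> List<T> select(List<T> data, Predicate<T> correctlyClassified, boolean invert) {
        List<T> out = new ArrayList<>();
        for (T instance : data) {
            boolean correct = correctlyClassified.test(instance);
            if (correct != invert) {
                out.add(instance);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        // Stand-in rule for the example: even numbers count as "correctly classified".
        Predicate<Integer> correct = x -> x % 2 == 0;
        System.out.println(select(data, correct, false));
        System.out.println(select(data, correct, true));
    }
}
```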

getInvert

public boolean getInvert()
Get whether selection is inverted.

Returns:
whether or not selection is inverted.

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain arguments to the filter: use -h for help