weka.classifiers
Class CheckClassifier

java.lang.Object
  extended by weka.classifiers.CheckClassifier
All Implemented Interfaces:
OptionHandler

public class CheckClassifier
extends java.lang.Object
implements OptionHandler

Class for examining the capabilities of classifiers and finding problems with them. If you implement a classifier using the WEKA libraries, you should run these checks on it to ensure robustness and correct operation. Passing all the tests does not mean the classifier is free of bugs, but the tests will help find some common ones.

Typical usage:

java weka.classifiers.CheckClassifier -W classifier_name classifier_options

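CheckClassifier reports on the classifier's declared abilities (command line options, incremental training, instance weights) and on the outcome of each check described in the Method Detail below, including basic prediction, handling of multiple classes, zero training instances and missing values, correct build initialisation, use of test class values, and dataset integrity.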

Running CheckClassifier with the debug option set will output the training and test datasets for any failed tests.

Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a classifier to perform the tests on (required).

Options after -- are passed to the designated classifier.
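
The option layout above can be illustrated with a small sketch. It is not CheckClassifier's own parsing code: OptionSplitSketch and its split method are hypothetical names, and the class name and classifier options in main are placeholders. The sketch only shows the convention that everything after a standalone -- is set aside for the classifier.

```java
import java.util.Arrays;

// Hypothetical helper, not part of WEKA: separates the checker's own
// options from the options that follow a standalone "--".
final class OptionSplitSketch {

    // Returns {checkerOptions, classifierOptions}.
    static String[][] split(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--")) {
                return new String[][] {
                    Arrays.copyOfRange(args, 0, i),
                    Arrays.copyOfRange(args, i + 1, args.length)
                };
            }
        }
        // No "--" present: every argument belongs to the checker.
        return new String[][] { args.clone(), new String[0] };
    }

    public static void main(String[] argv) {
        // "some.pkg.MyClassifier" and "-X 5" are placeholder values.
        String[] args = {"-D", "-W", "some.pkg.MyClassifier", "--", "-X", "5"};
        String[][] parts = split(args);
        System.out.println("checker:    " + Arrays.toString(parts[0]));
        System.out.println("classifier: " + Arrays.toString(parts[1]));
    }
}
```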

Version:
$Revision: 1.15 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz)

Field Summary
protected  java.lang.String m_AnalysisResults
          The results of the analysis as a string
protected  Classifier m_Classifier
          The classifier to be examined
protected  java.lang.String[] m_ClassifierOptions
          The options to be passed to the base classifier.
protected  boolean m_Debug
          Debugging mode, gives extra output if true
 
Constructor Summary
CheckClassifier()
           
 
Method Summary
protected  void addMissing(Instances data, int level, boolean predictorMissing, boolean classMissing)
          Add missing values to a dataset.
protected  boolean canHandleMissing(boolean nominalPredictor, boolean numericPredictor, boolean numericClass, boolean predictorMissing, boolean classMissing, int missingLevel)
          Checks basic missing value handling of the scheme.
protected  boolean canHandleNClasses(boolean nominalPredictor, boolean numericPredictor, int numClasses)
          Checks whether nominal schemes can handle more than two classes.
protected  boolean canHandleZeroTraining(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)
          Checks whether the scheme can handle zero training instances.
protected  boolean canPredict(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)
          Checks basic prediction of the scheme, for simple non-troublesome datasets.
protected  boolean canTakeOptions()
          Checks whether the scheme can take command line options.
protected  void compareDatasets(Instances data1, Instances data2)
          Compare two datasets to see if they differ.
protected  boolean correctBuildInitialisation(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)
          Checks whether the scheme correctly initialises models when buildClassifier is called.
protected  boolean datasetIntegrity(boolean nominalPredictor, boolean numericPredictor, boolean numericClass, boolean predictorMissing, boolean classMissing)
          Checks whether the scheme alters the training dataset during training.
protected  boolean doesntUseTestClassVal(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)
          Checks whether the classifier erroneously uses the class value of test instances (if provided).
 void doTests()
          Begin the tests, reporting results to System.out
 Classifier getClassifier()
          Get the classifier being tested
 boolean getDebug()
          Get whether debugging is turned on
 java.lang.String[] getOptions()
          Gets the current settings of the CheckClassifier.
protected  boolean instanceWeights(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)
          Checks whether the classifier can handle instance weights.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Test method for this class
protected  Instances makeTestDataset(int seed, int numInstances, int numNominal, int numNumeric, int numClasses, boolean numericClass)
          Make a simple set of instances, which can later be modified for use in specific tests.
protected  void printAttributeSummary(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)
          Print out a short summary string for the dataset characteristics
protected  boolean runBasicTest(boolean nominalPredictor, boolean numericPredictor, boolean numericClass, int missingLevel, boolean predictorMissing, boolean classMissing, int numTrain, int numTest, int numClasses, FastVector accepts)
          Runs a test on the datasets with the given characteristics.
 void setClassifier(Classifier newClassifier)
          Set the classifier to be tested.
 void setDebug(boolean debug)
          Set debugging mode
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
protected  void testsPerClassType(boolean numericClass, boolean updateable, boolean weighted)
          Run a battery of tests for a given class attribute type
protected  boolean testWRTZeroR(Classifier classifier, Evaluation evaluation, Instances train, Instances test)
          Determine whether the scheme performs worse than ZeroR during testing
protected  boolean updateableClassifier()
          Checks whether the scheme can build models incrementally.
protected  boolean updatingEquality(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)
          Checks whether an updateable scheme produces the same model when trained incrementally as when batch trained.
protected  boolean weightedInstancesHandler()
          Checks whether the scheme says it can handle instance weights.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_Classifier

protected Classifier m_Classifier
The classifier to be examined


m_ClassifierOptions

protected java.lang.String[] m_ClassifierOptions
The options to be passed to the base classifier.


m_AnalysisResults

protected java.lang.String m_AnalysisResults
The results of the analysis as a string


m_Debug

protected boolean m_Debug
Debugging mode, gives extra output if true

Constructor Detail

CheckClassifier

public CheckClassifier()
Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a classifier to perform the tests on (required).

Options after -- are passed to the designated classifier

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the CheckClassifier.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

doTests

public void doTests()
Begin the tests, reporting results to System.out


setDebug

public void setDebug(boolean debug)
Set debugging mode

Parameters:
debug - true if debug output should be printed

getDebug

public boolean getDebug()
Get whether debugging is turned on

Returns:
true if debugging output is on

setClassifier

public void setClassifier(Classifier newClassifier)
Set the classifier to be tested.

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Get the classifier being tested

Returns:
the classifier being tested

main

public static void main(java.lang.String[] args)
Test method for this class


testsPerClassType

protected void testsPerClassType(boolean numericClass,
                                 boolean updateable,
                                 boolean weighted)
Run a battery of tests for a given class attribute type

Parameters:
numericClass - true if the class attribute should be numeric
updateable - true if the classifier is updateable
weighted - true if the classifier says it handles weights

canTakeOptions

protected boolean canTakeOptions()
Checks whether the scheme can take command line options.

Returns:
true if the classifier can take options

updateableClassifier

protected boolean updateableClassifier()
Checks whether the scheme can build models incrementally.

Returns:
true if the classifier can train incrementally

weightedInstancesHandler

protected boolean weightedInstancesHandler()
Checks whether the scheme says it can handle instance weights.

Returns:
true if the classifier handles instance weights

canPredict

protected boolean canPredict(boolean nominalPredictor,
                             boolean numericPredictor,
                             boolean numericClass)
Checks basic prediction of the scheme, for simple non-troublesome datasets.

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:
true if the test was passed

canHandleNClasses

protected boolean canHandleNClasses(boolean nominalPredictor,
                                    boolean numericPredictor,
                                    int numClasses)
Checks whether nominal schemes can handle more than two classes. If a scheme is only designed for two-class problems it should throw an appropriate exception for multi-class problems.

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numClasses - the number of classes to test
Returns:
true if the test was passed

canHandleZeroTraining

protected boolean canHandleZeroTraining(boolean nominalPredictor,
                                        boolean numericPredictor,
                                        boolean numericClass)
Checks whether the scheme can handle zero training instances.

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:
true if the test was passed

correctBuildInitialisation

protected boolean correctBuildInitialisation(boolean nominalPredictor,
                                             boolean numericPredictor,
                                             boolean numericClass)
Checks whether the scheme correctly initialises models when buildClassifier is called. This test calls buildClassifier with one training dataset and records performance on a test set. buildClassifier is then called on a training set with different structure, and then again with the original training set. The performance on the test set is compared with the original results and any performance difference noted as incorrect build initialisation.

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:
true if the test was passed
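
The build, rebuild, compare sequence described above might look like the following sketch. It does not use WEKA types: ToyLearner, CleanLearner and LeakyLearner are hypothetical stand-ins, and the second training set differs only in content rather than structure, which is a simplification of the real test.

```java
// Hypothetical stand-ins, not WEKA types: toy learners that predict
// the most frequent training label of a two-class problem.
final class BuildInitSketch {

    interface ToyLearner {
        void build(int[] labels);   // plays the role of buildClassifier
        int predict();
    }

    // Resets its counts on every build, as a well-behaved scheme should.
    static final class CleanLearner implements ToyLearner {
        private int[] counts = new int[2];
        public void build(int[] labels) {
            counts = new int[2];
            for (int label : labels) counts[label]++;
        }
        public int predict() { return counts[1] > counts[0] ? 1 : 0; }
    }

    // Forgets to reset, so earlier training data leaks into later models.
    static final class LeakyLearner implements ToyLearner {
        private final int[] counts = new int[2];
        public void build(int[] labels) {
            for (int label : labels) counts[label]++;
        }
        public int predict() { return counts[1] > counts[0] ? 1 : 0; }
    }

    // Build on A, build on B, build on A again; both A models must agree.
    static boolean initialisesCorrectly(ToyLearner learner, int[] a, int[] b) {
        learner.build(a);
        int first = learner.predict();
        learner.build(b);
        learner.build(a);
        return learner.predict() == first;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 1};          // majority label 0
        int[] b = {1, 1, 1, 1, 1};    // majority label 1
        System.out.println(initialisesCorrectly(new CleanLearner(), a, b));
        System.out.println(initialisesCorrectly(new LeakyLearner(), a, b));
    }
}
```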

canHandleMissing

protected boolean canHandleMissing(boolean nominalPredictor,
                                   boolean numericPredictor,
                                   boolean numericClass,
                                   boolean predictorMissing,
                                   boolean classMissing,
                                   int missingLevel)
Checks basic missing value handling of the scheme. If the missing values cause an exception to be thrown by the scheme, this will be recorded.

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
predictorMissing - true if the missing values may be in the predictors
classMissing - true if the missing values may be in the class
Returns:
true if the test was passed

updatingEquality

protected boolean updatingEquality(boolean nominalPredictor,
                                   boolean numericPredictor,
                                   boolean numericClass)
Checks whether an updateable scheme produces the same model when trained incrementally as when batch trained. The model itself cannot be compared, so we compare the evaluation on test data for both models. It is possible to get a false positive on this test (likelihood depends on the classifier).

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:
true if the test was passed
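
As a rough illustration of comparing evaluations rather than models, the sketch below trains a toy scheme both ways and compares the resulting test error. The scheme (predicting the mean training target), the class name and the tolerance of 1e-9 are all assumptions made for the example, not WEKA behaviour.

```java
// Hypothetical toy scheme, not a WEKA class: predicts the mean of the
// training targets, trained either in one batch or one value at a time.
final class UpdatingEqualitySketch {

    static double batchMean(double[] targets) {
        double sum = 0;
        for (double t : targets) sum += t;
        return sum / targets.length;
    }

    static double incrementalMean(double[] targets) {
        double mean = 0;
        int seen = 0;
        for (double t : targets) {
            seen++;
            mean += (t - mean) / seen;   // running-mean update
        }
        return mean;
    }

    // Mean absolute error of a constant prediction on the test targets.
    static double meanAbsoluteError(double prediction, double[] test) {
        double total = 0;
        for (double t : test) total += Math.abs(t - prediction);
        return total / test.length;
    }

    // Compare evaluations rather than models, allowing for rounding error.
    static boolean sameEvaluation(double[] train, double[] test) {
        double batch = meanAbsoluteError(batchMean(train), test);
        double incremental = meanAbsoluteError(incrementalMean(train), test);
        return Math.abs(batch - incremental) < 1e-9;
    }

    public static void main(String[] args) {
        double[] train = {1.0, 2.0, 4.0, 7.0};
        double[] test = {3.0, 5.0};
        System.out.println(sameEvaluation(train, test));
    }
}
```

Two different models can still score identically on a small test set, which is why a comparison of this kind can pass when the models are not in fact the same.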

doesntUseTestClassVal

protected boolean doesntUseTestClassVal(boolean nominalPredictor,
                                        boolean numericPredictor,
                                        boolean numericClass)
Checks whether the classifier erroneously uses the class value of test instances (if provided). Runs the classifier with test instance class values set to missing and compares with results when test instance class values are left intact.

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:
true if the test was passed
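
The comparison described above can be sketched without WEKA as follows. ToyPredictor, HONEST and PEEKING are hypothetical names, each instance is assumed to be a {feature, classValue} pair, and Double.NaN is used here as the missing-value marker.

```java
// Hypothetical toy predictors, not WEKA classes. Each instance is
// {feature, classValue}; Double.NaN marks a missing class value.
final class TestClassLeakSketch {

    interface ToyPredictor {
        double predict(double[] instance);
    }

    // Uses only the feature.
    static final ToyPredictor HONEST = inst -> inst[0] > 0.5 ? 1.0 : 0.0;

    // Peeks at the class value whenever it is present.
    static final ToyPredictor PEEKING =
        inst -> Double.isNaN(inst[1]) ? (inst[0] > 0.5 ? 1.0 : 0.0) : inst[1];

    // Predict once with class values intact and once with them blanked;
    // any difference means the class value influenced the prediction.
    static boolean ignoresTestClass(ToyPredictor p, double[][] test) {
        for (double[] inst : test) {
            double intact = p.predict(inst);
            double blanked = p.predict(new double[] {inst[0], Double.NaN});
            if (intact != blanked) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] test = {{0.9, 0.0}, {0.1, 1.0}};
        System.out.println(ignoresTestClass(HONEST, test));
        System.out.println(ignoresTestClass(PEEKING, test));
    }
}
```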

instanceWeights

protected boolean instanceWeights(boolean nominalPredictor,
                                  boolean numericPredictor,
                                  boolean numericClass)
Checks whether the classifier can handle instance weights. This test compares the classifier performance on two datasets that are identical except for the training weights. If the results change, then the classifier must be using the weights. It may be possible to get a false positive from this test if the weight changes aren't significant enough to induce a change in classifier performance (but the weights are chosen to minimize the likelihood of this).

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:
true if the test was passed
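
A minimal sketch of the idea, assuming a two-class problem and toy learners that are not WEKA classes: the same labels are trained under two weightings, and a change in the result is taken as evidence that the weights are used. The skewed weights are picked so that the weighted majority flips.

```java
// Hypothetical toy learners, not WEKA classes: both produce a single
// predicted label from training labels and per-instance weights.
final class InstanceWeightSketch {

    interface ToyLearner {
        int train(int[] labels, double[] weights);  // returns predicted label
    }

    // Sums weights per label, so the weights matter.
    static final ToyLearner WEIGHTED = (labels, weights) -> {
        double[] mass = new double[2];
        for (int i = 0; i < labels.length; i++) mass[labels[i]] += weights[i];
        return mass[1] > mass[0] ? 1 : 0;
    };

    // Counts instances and never looks at the weights.
    static final ToyLearner UNWEIGHTED = (labels, weights) -> {
        int[] count = new int[2];
        for (int label : labels) count[label]++;
        return count[1] > count[0] ? 1 : 0;
    };

    // Same labels, two weightings: a changed result shows the weights are used.
    static boolean usesWeights(ToyLearner learner, int[] labels,
                               double[] weightsA, double[] weightsB) {
        return learner.train(labels, weightsA) != learner.train(labels, weightsB);
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1};
        double[] uniform = {1.0, 1.0, 1.0};
        double[] skewed = {1.0, 1.0, 10.0};   // chosen to flip the majority
        System.out.println(usesWeights(WEIGHTED, labels, uniform, skewed));
        System.out.println(usesWeights(UNWEIGHTED, labels, uniform, skewed));
    }
}
```

With a milder weighting such as {1.0, 1.0, 1.5} the weighted learner would give the same answer under both weightings, which is the kind of misleading outcome the description warns about.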

datasetIntegrity

protected boolean datasetIntegrity(boolean nominalPredictor,
                                   boolean numericPredictor,
                                   boolean numericClass,
                                   boolean predictorMissing,
                                   boolean classMissing)
Checks whether the scheme alters the training dataset during training. If the scheme needs to modify the training data, it should work on a copy. Currently checks for changes to header structure, number of instances, order of instances, and instance weights.

Parameters:
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
numericClass - if true use a numeric class attribute otherwise a nominal class attribute
predictorMissing - true if we know the classifier can handle (at least) moderate missing predictor values
classMissing - true if we know the classifier can handle (at least) moderate missing class values
Returns:
true if the test was passed
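
The snapshot-and-compare approach can be illustrated with plain arrays. This is a sketch under simplifying assumptions, not WEKA code: a dataset is reduced to a double[], a scheme is any routine that consumes it, and only the order and values of the entries are checked.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical sketch, not WEKA code: a "dataset" is a double[] of values
// and a "scheme" is any routine that trains on it.
final class DatasetIntegritySketch {

    // Sorts the caller's array directly, reordering the training data.
    static final Consumer<double[]> IN_PLACE = data -> Arrays.sort(data);

    // Works on a private copy, leaving the caller's array alone.
    static final Consumer<double[]> ON_COPY = data -> Arrays.sort(data.clone());

    // Snapshot the data, train, then compare the data with the snapshot.
    static boolean leavesDataIntact(Consumer<double[]> scheme, double[] data) {
        double[] snapshot = data.clone();
        scheme.accept(data);
        return Arrays.equals(snapshot, data);
    }

    public static void main(String[] args) {
        System.out.println(leavesDataIntact(IN_PLACE, new double[] {3, 1, 2}));
        System.out.println(leavesDataIntact(ON_COPY, new double[] {3, 1, 2}));
    }
}
```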

runBasicTest

protected boolean runBasicTest(boolean nominalPredictor,
                               boolean numericPredictor,
                               boolean numericClass,
                               int missingLevel,
                               boolean predictorMissing,
                               boolean classMissing,
                               int numTrain,
                               int numTest,
                               int numClasses,
                               FastVector accepts)
Runs a test on the datasets with the given characteristics.


testWRTZeroR

protected boolean testWRTZeroR(Classifier classifier,
                               Evaluation evaluation,
                               Instances train,
                               Instances test)
                        throws java.lang.Exception
Determine whether the scheme performs worse than ZeroR during testing

Parameters:
classifier - the pre-trained classifier
evaluation - the classifier evaluation object
train - the training data
test - the test data
Returns:
true if the scheme performs better than ZeroR
Throws:
java.lang.Exception - if there was a problem during the scheme's testing
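
For readers unfamiliar with the baseline, the sketch below models ZeroR as "always predict the most frequent training label" on a two-class problem and compares error rates on the test labels. It does not use WEKA's Classifier or Evaluation classes, and counting a tie as a pass is an assumption of this example rather than documented behaviour.

```java
import java.util.Arrays;

// Hypothetical sketch, not WEKA code. The baseline always predicts the
// most frequent training label of a two-class (0/1) problem.
final class ZeroRBaselineSketch {

    static int majorityLabel(int[] trainLabels) {
        int ones = 0;
        for (int label : trainLabels) ones += label;
        return 2 * ones > trainLabels.length ? 1 : 0;
    }

    static double errorRate(int[] predicted, int[] actual) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] != actual[i]) wrong++;
        }
        return (double) wrong / actual.length;
    }

    // True if the scheme's test error is no worse than the baseline's.
    // ASSUMPTION: a tie counts as a pass.
    static boolean atLeastAsGoodAsBaseline(int[] schemePredictions,
                                           int[] trainLabels,
                                           int[] testLabels) {
        int[] baseline = new int[testLabels.length];
        Arrays.fill(baseline, majorityLabel(trainLabels));
        return errorRate(schemePredictions, testLabels)
            <= errorRate(baseline, testLabels);
    }

    public static void main(String[] args) {
        int[] train = {0, 0, 1};
        int[] test = {0, 1, 1, 0};
        System.out.println(
            atLeastAsGoodAsBaseline(new int[] {0, 1, 1, 1}, train, test));
        System.out.println(
            atLeastAsGoodAsBaseline(new int[] {1, 0, 0, 1}, train, test));
    }
}
```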

compareDatasets

protected void compareDatasets(Instances data1,
                               Instances data2)
                        throws java.lang.Exception
Compare two datasets to see if they differ.

Parameters:
data1 - one set of instances
data2 - the other set of instances
Throws:
java.lang.Exception - if the datasets differ

addMissing

protected void addMissing(Instances data,
                          int level,
                          boolean predictorMissing,
                          boolean classMissing)
Add missing values to a dataset.

Parameters:
data - the instances to add missing values to
level - the level of missing values to add. If positive, this is the probability that a value will be set to missing; if negative, all but one value will be set to missing (not yet implemented).
predictorMissing - if true, predictor attributes will be modified
classMissing - if true, the class attribute will be modified
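
The effect of the parameters can be sketched on a plain array. This is not the WEKA implementation: rows are assumed to be {predictors..., classValue}, Double.NaN stands in for a missing value, the seed parameter is added for the example, and level is assumed to be a percentage between 0 and 100 because the parameter is an int.

```java
import java.util.Random;

// Hypothetical sketch, not WEKA code. Rows are {predictors..., classValue};
// Double.NaN marks a missing value.
// ASSUMPTION: level is a percentage (0-100) chance of blanking each value.
final class AddMissingSketch {

    static void addMissing(double[][] data, int level,
                           boolean predictorMissing, boolean classMissing,
                           long seed) {
        Random random = new Random(seed);
        for (double[] row : data) {
            int classIndex = row.length - 1;
            for (int j = 0; j < row.length; j++) {
                // Only touch the columns the caller asked for.
                boolean eligible =
                    (j == classIndex) ? classMissing : predictorMissing;
                if (eligible && random.nextInt(100) < level) {
                    row[j] = Double.NaN;
                }
            }
        }
    }

    static int countMissing(double[][] data) {
        int missing = 0;
        for (double[] row : data) {
            for (double value : row) {
                if (Double.isNaN(value)) missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0, 0.0}, {3.0, 4.0, 1.0}, {5.0, 6.0, 0.0}};
        addMissing(data, 50, true, false, 1L);
        System.out.println("missing values: " + countMissing(data));
    }
}
```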

makeTestDataset

protected Instances makeTestDataset(int seed,
                                    int numInstances,
                                    int numNominal,
                                    int numNumeric,
                                    int numClasses,
                                    boolean numericClass)
                             throws java.lang.Exception
Make a simple set of instances, which can later be modified for use in specific tests.

Parameters:
seed - the random number seed
numInstances - the number of instances to generate
numNominal - the number of nominal attributes
numNumeric - the number of numeric attributes
numClasses - the number of classes (if nominal class)
numericClass - true if the class attribute should be numeric
Returns:
the test dataset
Throws:
java.lang.Exception - if the dataset couldn't be generated

printAttributeSummary

protected void printAttributeSummary(boolean nominalPredictor,
                                     boolean numericPredictor,
                                     boolean numericClass)
Print out a short summary string for the dataset characteristics

Parameters:
nominalPredictor - true if nominal predictor attributes are present
numericPredictor - true if numeric predictor attributes are present
numericClass - true if the class attribute is numeric