CheckClassifier (Documentation for extended WEKA including Ensembles of Hierarchically Nested Dichotomies)

weka.classifiers
Class CheckClassifier

java.lang.Object
  weka.classifiers.CheckClassifier

All Implemented Interfaces:: OptionHandler

public class CheckClassifier
extends java.lang.Object
implements OptionHandler

Class for examining the capabilities of classifiers and finding problems with them. If you implement a classifier using the WEKA libraries, you should run these checks on it to ensure robustness and correct operation. Passing all the tests does not guarantee that the classifier is bug-free, but the checks will help find some common problems.

Typical usage:

java weka.classifiers.CheckClassifier -W classifier_name -- classifier_options

CheckClassifier reports on the following:

Classifier abilities
- Possible command line options to the classifier
- Whether the classifier can predict nominal and/or numeric class attributes. Warnings will be displayed if performance is worse than ZeroR.
- Whether the classifier can be trained incrementally
- Whether the classifier can handle numeric predictor attributes
- Whether the classifier can handle nominal predictor attributes
- Whether the classifier can handle string predictor attributes
- Whether the classifier can handle missing predictor values
- Whether the classifier can handle missing class values
- Whether a nominal classifier only handles two-class problems
- Whether the classifier can handle instance weights
Correct functioning
- Correct initialisation during buildClassifier (i.e. results do not change when buildClassifier is called repeatedly)
- Whether incremental training produces the same results as non-incremental training (which may or may not be OK)
- Whether the classifier alters the data passed to it (number of instances, instance order, instance weights, etc.)
Degenerate cases
- building a classifier with zero training instances
- all but one predictor attribute values missing
- all predictor attribute values missing
- all but one class value missing
- all class values missing

Running CheckClassifier with the debug option set will output the training and test datasets for any failed tests.

Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a classifier to perform the tests on (required).

Options after -- are passed to the designated classifier.
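To make the `--` convention concrete, here is a minimal standalone sketch of splitting a command line at the first `--`: everything before it stays with CheckClassifier and everything after it is handed to the classifier. The names `OptionSplitSketch` and `partition` are hypothetical and not part of WEKA, and `my.pkg.MyClassifier` / `-X 1` are placeholder arguments.

```java
import java.util.Arrays;

// Hypothetical helper (not WEKA API): splits a command line at the first "--"
// into CheckClassifier's own options and the options for the tested classifier.
final class OptionSplitSketch {

    private OptionSplitSketch() {}

    /** Returns {ownOptions, classifierOptions}; classifierOptions is empty if there is no "--". */
    static String[][] partition(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("--".equals(args[i])) {
                String[] own = Arrays.copyOfRange(args, 0, i);
                String[] rest = Arrays.copyOfRange(args, i + 1, args.length);
                return new String[][] {own, rest};
            }
        }
        // No separator: everything belongs to CheckClassifier itself.
        return new String[][] {args.clone(), new String[0]};
    }
}
```

For `-D -W my.pkg.MyClassifier -- -X 1` this yields `[-D, -W, my.pkg.MyClassifier]` for CheckClassifier and `[-X, 1]` for the classifier.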

Version:: $Revision: 1.15 $
Author:: Len Trigg (trigg@cs.waikato.ac.nz)

Field Summary
`protected java.lang.String`	`m_AnalysisResults` The results of the analysis as a string
`protected Classifier`	`m_Classifier` The classifier to be examined
`protected java.lang.String[]`	`m_ClassifierOptions` The options to be passed to the base classifier.
`protected boolean`	`m_Debug` Debugging mode, gives extra output if true

Constructor Summary
`CheckClassifier()`

Method Summary
`protected void`	`addMissing(Instances data, int level, boolean predictorMissing, boolean classMissing)` Add missing values to a dataset.
`protected boolean`	`canHandleMissing(boolean nominalPredictor, boolean numericPredictor, boolean numericClass, boolean predictorMissing, boolean classMissing, int missingLevel)` Checks basic missing value handling of the scheme.
`protected boolean`	`canHandleNClasses(boolean nominalPredictor, boolean numericPredictor, int numClasses)` Checks whether nominal schemes can handle more than two classes.
`protected boolean`	`canHandleZeroTraining(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)` Checks whether the scheme can handle zero training instances.
`protected boolean`	`canPredict(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)` Checks basic prediction of the scheme, for simple non-troublesome datasets.
`protected boolean`	`canTakeOptions()` Checks whether the scheme can take command line options.
`protected void`	`compareDatasets(Instances data1, Instances data2)` Compare two datasets to see if they differ.
`protected boolean`	`correctBuildInitialisation(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)` Checks whether the scheme correctly initialises models when buildClassifier is called.
`protected boolean`	`datasetIntegrity(boolean nominalPredictor, boolean numericPredictor, boolean numericClass, boolean predictorMissing, boolean classMissing)` Checks whether the scheme alters the training dataset during training.
`protected boolean`	`doesntUseTestClassVal(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)` Checks whether the classifier erroneously uses the class value of test instances (if provided).
`void`	`doTests()` Begin the tests, reporting results to System.out
`Classifier`	`getClassifier()` Get the classifier being tested
`boolean`	`getDebug()` Get whether debugging is turned on
`java.lang.String[]`	`getOptions()` Gets the current settings of the CheckClassifier.
`protected boolean`	`instanceWeights(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)` Checks whether the classifier can handle instance weights.
`java.util.Enumeration`	`listOptions()` Returns an enumeration describing the available options.
`static void`	`main(java.lang.String[] args)` Test method for this class
`protected Instances`	`makeTestDataset(int seed, int numInstances, int numNominal, int numNumeric, int numClasses, boolean numericClass)` Make a simple set of instances, which can later be modified for use in specific tests.
`protected void`	`printAttributeSummary(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)` Print out a short summary string for the dataset characteristics
`protected boolean`	`runBasicTest(boolean nominalPredictor, boolean numericPredictor, boolean numericClass, int missingLevel, boolean predictorMissing, boolean classMissing, int numTrain, int numTest, int numClasses, FastVector accepts)` Runs a test on datasets with the given characteristics.
`void`	`setClassifier(Classifier newClassifier)` Set the classifier to be tested.
`void`	`setDebug(boolean debug)` Set debugging mode
`void`	`setOptions(java.lang.String[] options)` Parses a given list of options.
`protected void`	`testsPerClassType(boolean numericClass, boolean updateable, boolean weighted)` Run a battery of tests for a given class attribute type
`protected boolean`	`testWRTZeroR(Classifier classifier, Evaluation evaluation, Instances train, Instances test)` Determine whether the scheme performs worse than ZeroR during testing
`protected boolean`	`updateableClassifier()` Checks whether the scheme can build models incrementally.
`protected boolean`	`updatingEquality(boolean nominalPredictor, boolean numericPredictor, boolean numericClass)` Checks whether an updateable scheme produces the same model when trained incrementally as when batch trained.
`protected boolean`	`weightedInstancesHandler()` Checks whether the scheme says it can handle instance weights.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

m_Classifier

protected Classifier m_Classifier

The classifier to be examined

m_ClassifierOptions

protected java.lang.String[] m_ClassifierOptions

The options to be passed to the base classifier.

m_AnalysisResults

protected java.lang.String m_AnalysisResults

The results of the analysis as a string

m_Debug

protected boolean m_Debug

Debugging mode, gives extra output if true

Constructor Detail

CheckClassifier

public CheckClassifier()

Method Detail

listOptions

public java.util.Enumeration listOptions()

Returns an enumeration describing the available options.

Specified by:: listOptions in interface OptionHandler

Returns:: an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception

Parses a given list of options. Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a classifier to perform the tests on (required).

Options after -- are passed to the designated classifier.

Specified by:: setOptions in interface OptionHandler

Parameters:: options - the list of options as an array of strings
Throws:: java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()

Gets the current settings of the CheckClassifier.

Specified by:: getOptions in interface OptionHandler

Returns:: an array of strings suitable for passing to setOptions

doTests

public void doTests()

Begin the tests, reporting results to System.out

setDebug

public void setDebug(boolean debug)

Set debugging mode

Parameters:: debug - true if debug output should be printed

getDebug

public boolean getDebug()

Get whether debugging is turned on

Returns:: true if debugging output is on

setClassifier

public void setClassifier(Classifier newClassifier)

Set the classifier to be tested.

Parameters:: newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()

Get the classifier being tested

Returns:: the classifier being tested

main

public static void main(java.lang.String[] args)

Runs the checks from the command line, using the options described under setOptions.

testsPerClassType

protected void testsPerClassType(boolean numericClass,
                                 boolean updateable,
                                 boolean weighted)

Run a battery of tests for a given class attribute type

Parameters:: numericClass - true if the class attribute should be numeric; updateable - true if the classifier is updateable; weighted - true if the classifier says it handles weights

canTakeOptions

protected boolean canTakeOptions()

Checks whether the scheme can take command line options.

Returns:: true if the classifier can take options

updateableClassifier

protected boolean updateableClassifier()

Checks whether the scheme can build models incrementally.

Returns:: true if the classifier can train incrementally

weightedInstancesHandler

protected boolean weightedInstancesHandler()

Checks whether the scheme says it can handle instance weights.

Returns:: true if the classifier handles instance weights

canPredict

protected boolean canPredict(boolean nominalPredictor,
                             boolean numericPredictor,
                             boolean numericClass)

Checks basic prediction of the scheme, for simple non-troublesome datasets.

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:: true if the test was passed

canHandleNClasses

protected boolean canHandleNClasses(boolean nominalPredictor,
                                    boolean numericPredictor,
                                    int numClasses)

Checks whether nominal schemes can handle more than two classes. If a scheme is only designed for two-class problems it should throw an appropriate exception for multi-class problems.

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numClasses - the number of classes to test
Returns:: true if the test was passed
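The exception-based contract above can be illustrated with a small standalone sketch (not WEKA code; `NClassesSketch`, `Learner` and `BinaryOnlyLearner` are hypothetical names). A two-class-only learner rejects multi-class data with an `IllegalArgumentException`, and the check records the rejection as a missing capability instead of letting the exception escape.

```java
// Hypothetical sketch, not WEKA code.
final class NClassesSketch {

    interface Learner {
        void build(int[] labels, int numClasses);
    }

    /** A learner that is only designed for two-class problems. */
    static final class BinaryOnlyLearner implements Learner {
        @Override
        public void build(int[] labels, int numClasses) {
            if (numClasses > 2) {
                // Fail loudly and clearly instead of silently mis-training.
                throw new IllegalArgumentException(
                    "Only two-class problems are supported, got " + numClasses);
            }
            // ... real training would happen here ...
        }
    }

    /** Returns true if the learner accepted data with numClasses (>= 1) classes, false if it threw. */
    static boolean canHandle(Learner learner, int numClasses) {
        int[] labels = new int[10];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = i % numClasses;   // make sure every class value appears
        }
        try {
            learner.build(labels, numClasses);
            return true;
        } catch (RuntimeException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }
}
```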

canHandleZeroTraining

protected boolean canHandleZeroTraining(boolean nominalPredictor,
                                        boolean numericPredictor,
                                        boolean numericClass)

Checks whether the scheme can handle zero training instances.

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:: true if the test was passed

correctBuildInitialisation

protected boolean correctBuildInitialisation(boolean nominalPredictor,
                                             boolean numericPredictor,
                                             boolean numericClass)

Checks whether the scheme correctly initialises models when buildClassifier is called. This test calls buildClassifier with one training dataset and records performance on a test set. buildClassifier is then called on a training set with different structure, and then again with the original training set. The performance on the test set is compared with the original results and any performance difference noted as incorrect build initialisation.

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:: true if the test was passed
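The build, rebuild and compare protocol can be sketched with a toy majority-class learner. This is an illustration under simplifying assumptions, not WEKA's implementation: datasets are plain label arrays, test-set performance is reduced to the single majority prediction, and `BuildInitSketch` / `MajorityLearner` are hypothetical. Passing `false` to the constructor simulates the classic bug of keeping counts from a previous build.

```java
import java.util.Arrays;

// Hypothetical sketch, not WEKA code.
final class BuildInitSketch {

    /** Toy majority-class learner; resetOnBuild == false simulates the bug. */
    static final class MajorityLearner {
        private final boolean resetOnBuild;
        private int[] counts = new int[0];

        MajorityLearner(boolean resetOnBuild) {
            this.resetOnBuild = resetOnBuild;
        }

        void build(int[] labels, int numClasses) {
            // Correct: start from fresh counts. Buggy: keep (and grow) the old counts.
            counts = resetOnBuild
                ? new int[numClasses]
                : Arrays.copyOf(counts, Math.max(counts.length, numClasses));
            for (int y : labels) counts[y]++;
        }

        int predict() {
            int best = 0;
            for (int c = 1; c < counts.length; c++) {
                if (counts[c] > counts[best]) best = c;
            }
            return best;
        }
    }

    /** Build on A, record the result, build on B, rebuild on A, then compare. */
    static boolean correctBuildInitialisation(MajorityLearner learner) {
        int[] a = {0, 0, 1};          // dataset A: two classes, majority class 0
        int[] b = {2, 2, 2, 2, 2};    // dataset B: three classes, different structure
        learner.build(a, 2);
        int first = learner.predict();
        learner.build(b, 3);
        learner.build(a, 2);
        int second = learner.predict();
        return first == second;
    }
}
```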

canHandleMissing

protected boolean canHandleMissing(boolean nominalPredictor,
                                   boolean numericPredictor,
                                   boolean numericClass,
                                   boolean predictorMissing,
                                   boolean classMissing,
                                   int missingLevel)

Checks basic missing value handling of the scheme. If the missing values cause an exception to be thrown by the scheme, this will be recorded.

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute; predictorMissing - true if the missing values may be in the predictors; classMissing - true if the missing values may be in the class; missingLevel - the level of missing values to introduce (see addMissing)
Returns:: true if the test was passed

updatingEquality

protected boolean updatingEquality(boolean nominalPredictor,
                                   boolean numericPredictor,
                                   boolean numericClass)

Checks whether an updateable scheme produces the same model when trained incrementally as when batch trained. The model itself cannot be compared, so we compare the evaluation on test data for both models. It is possible to get a false positive on this test (likelihood depends on the classifier).

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:: true if the test was passed
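A minimal sketch of the batch-versus-incremental comparison, using a hypothetical toy learner that predicts the mean of a numeric class (none of these names are WEKA API). Because the two training paths add floating-point numbers in a different order, the sketch compares predictions within a tolerance rather than requiring exact equality; the tolerance is a choice of this sketch.

```java
// Hypothetical sketch, not WEKA code.
final class UpdatingEqualitySketch {

    /** Toy updateable learner that predicts the mean of the training targets. */
    static final class MeanLearner {
        private long n;
        private double mean;

        void buildBatch(double[] targets) {
            double sum = 0.0;
            for (double t : targets) sum += t;
            n = targets.length;
            mean = (n == 0) ? 0.0 : sum / n;
        }

        /** Incremental update of a running mean. */
        void update(double target) {
            n++;
            mean += (target - mean) / n;
        }

        double predict() {
            return mean;
        }
    }

    static boolean updatingEquality(double[] train, double tolerance) {
        MeanLearner batch = new MeanLearner();
        batch.buildBatch(train);

        MeanLearner incremental = new MeanLearner();
        incremental.buildBatch(new double[0]);   // start from an empty model
        for (double t : train) incremental.update(t);

        // The models are compared only through their predictions.
        return Math.abs(batch.predict() - incremental.predict()) <= tolerance;
    }
}
```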

doesntUseTestClassVal

protected boolean doesntUseTestClassVal(boolean nominalPredictor,
                                        boolean numericPredictor,
                                        boolean numericClass)

Checks whether the classifier erroneously uses the class value of test instances (if provided). Runs the classifier with test instance class values set to missing and compares with results when test instance class values are left intact.

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:: true if the test was passed
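The masking idea can be sketched as follows, assuming (hypothetically) that each test row is a `double[]` with the class value in the last column and that `Double.NaN` marks a missing value. `TestClassValSketch` and `Predictor` are illustrative names, not WEKA classes.

```java
// Hypothetical sketch, not WEKA code.
final class TestClassValSketch {

    interface Predictor {
        double predict(double[] row);
    }

    static boolean doesntUseTestClassVal(Predictor model, double[][] test) {
        for (double[] row : test) {
            double[] masked = row.clone();
            masked[masked.length - 1] = Double.NaN;   // hide the class value
            // Double.compare handles NaN consistently, unlike ==.
            if (Double.compare(model.predict(row), model.predict(masked)) != 0) {
                return false;   // the prediction depended on the test class value
            }
        }
        return true;
    }
}
```

A model that reads only the predictors passes; one that returns `row[row.length - 1]` fails, because its prediction changes once the class value is hidden.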

instanceWeights

protected boolean instanceWeights(boolean nominalPredictor,
                                  boolean numericPredictor,
                                  boolean numericClass)

Checks whether the classifier can handle instance weights. This test compares the classifier performance on two datasets that are identical except for the training weights. If the results change, then the classifier must be using the weights. It may be possible to get a false positive from this test if the weight changes aren't significant enough to induce a change in classifier performance (but the weights are chosen to minimize the likelihood of this).

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute
Returns:: true if the test was passed
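A sketch of the weight-sensitivity test, with hypothetical names throughout. The same targets are used for training under uniform and under strongly skewed weights; a learner whose output changes is taken to be using the weights. As noted above, identical outputs do not prove that weights are ignored (for instance, if all targets are equal, even a weighted mean cannot change).

```java
// Hypothetical sketch, not WEKA code.
final class InstanceWeightsSketch {

    interface WeightedLearner {
        /** Trains on targets with per-instance weights and returns a prediction. */
        double trainAndPredict(double[] targets, double[] weights);
    }

    /** Uses the weights: weighted mean of the targets. */
    static double weightedMean(double[] targets, double[] weights) {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < targets.length; i++) {
            num += weights[i] * targets[i];
            den += weights[i];
        }
        return num / den;
    }

    /** Ignores the weights: plain mean of the targets. */
    static double plainMean(double[] targets, double[] weights) {
        double sum = 0.0;
        for (double t : targets) sum += t;
        return sum / targets.length;
    }

    static boolean usesWeights(WeightedLearner learner, double[] targets) {
        double[] uniform = new double[targets.length];
        double[] skewed = new double[targets.length];
        for (int i = 0; i < targets.length; i++) {
            uniform[i] = 1.0;
            // Strongly non-uniform weights make a change in output likely.
            skewed[i] = (i % 2 == 0) ? 10.0 : 0.1;
        }
        double a = learner.trainAndPredict(targets, uniform);
        double b = learner.trainAndPredict(targets, skewed);
        return Double.compare(a, b) != 0;
    }
}
```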

datasetIntegrity

protected boolean datasetIntegrity(boolean nominalPredictor,
                                   boolean numericPredictor,
                                   boolean numericClass,
                                   boolean predictorMissing,
                                   boolean classMissing)

Checks whether the scheme alters the training dataset during training. If the scheme needs to modify the training data, it should take a copy of it. Currently checks for changes to the header structure, the number of instances, the order of instances, and instance weights.

Parameters:: nominalPredictor - if true use nominal predictor attributes; numericPredictor - if true use numeric predictor attributes; numericClass - if true use a numeric class attribute otherwise a nominal class attribute; predictorMissing - true if we know the classifier can handle (at least) moderate missing predictor values; classMissing - true if we know the classifier can handle (at least) moderate missing class values
Returns:: true if the test was passed
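The copy, train and compare pattern behind this check might look like the sketch below. It is not WEKA code: instances are `double[]` rows with a parallel weight array, `DatasetIntegritySketch` and `Trainer` are hypothetical, and header structure is not modelled. The `compare` helper throws on the first difference it finds, in the spirit of `compareDatasets`.

```java
import java.util.Arrays;

// Hypothetical sketch, not WEKA code.
final class DatasetIntegritySketch {

    interface Trainer {
        void train(double[][] rows, double[] weights);
    }

    /** Throws if the two datasets differ in size, order/values, or weights. */
    static void compare(double[][] rows1, double[] w1,
                        double[][] rows2, double[] w2) throws Exception {
        if (rows1.length != rows2.length) {
            throw new Exception("number of instances has changed");
        }
        if (!Arrays.deepEquals(rows1, rows2)) {
            throw new Exception("instance order or values have changed");
        }
        if (!Arrays.equals(w1, w2)) {
            throw new Exception("instance weights have changed");
        }
    }

    static boolean datasetIntegrity(Trainer trainer, double[][] rows, double[] weights) {
        // Deep copy: clone each row, not just the outer array.
        double[][] rowsCopy = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) rowsCopy[i] = rows[i].clone();
        double[] weightsCopy = weights.clone();

        trainer.train(rows, weights);
        try {
            compare(rowsCopy, weightsCopy, rows, weights);
            return true;
        } catch (Exception e) {
            System.out.println("Training data altered: " + e.getMessage());
            return false;
        }
    }
}
```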

runBasicTest

protected boolean runBasicTest(boolean nominalPredictor,
                               boolean numericPredictor,
                               boolean numericClass,
                               int missingLevel,
                               boolean predictorMissing,
                               boolean classMissing,
                               int numTrain,
                               int numTest,
                               int numClasses,
                               FastVector accepts)

Runs a test on datasets with the given characteristics.

testWRTZeroR

protected boolean testWRTZeroR(Classifier classifier,
                               Evaluation evaluation,
                               Instances train,
                               Instances test)
                        throws java.lang.Exception

Determine whether the scheme performs worse than ZeroR during testing

Parameters:: classifier - the pre-trained classifier; evaluation - the classifier evaluation object; train - the training data; test - the test data
Returns:: true if the scheme performs better than ZeroR
Throws:: java.lang.Exception - if there was a problem during the scheme's testing
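A standalone sketch of the ZeroR comparison for a nominal class. ZeroR ignores the predictors and always predicts the most frequent training class (for a numeric class it predicts the training mean, which this sketch does not cover). All names are hypothetical; ties in error rate count as passing here, which is a choice of this sketch, since the documentation above phrases the return value as "better than ZeroR".

```java
import java.util.Arrays;

// Hypothetical sketch, not WEKA code.
final class ZeroRSketch {

    /** Most frequent label in the training data (ties go to the lowest label). */
    static int majorityClass(int[] trainLabels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int y : trainLabels) counts[y]++;
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[best]) best = c;
        }
        return best;
    }

    static double errorRate(int[] predicted, int[] actual) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] != actual[i]) wrong++;
        }
        return (double) wrong / actual.length;
    }

    /** Returns false (and warns) if the scheme's test error exceeds ZeroR's. */
    static boolean notWorseThanZeroR(int[] predicted, int[] trainLabels,
                                     int[] testLabels, int numClasses) {
        int[] baseline = new int[testLabels.length];
        Arrays.fill(baseline, majorityClass(trainLabels, numClasses));
        double schemeError = errorRate(predicted, testLabels);
        double zeroRError = errorRate(baseline, testLabels);
        if (schemeError > zeroRError) {
            System.out.println("Warning: error " + schemeError
                + " is worse than ZeroR's " + zeroRError);
            return false;
        }
        return true;
    }
}
```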

compareDatasets

protected void compareDatasets(Instances data1,
                               Instances data2)
                        throws java.lang.Exception

Compare two datasets to see if they differ.

Parameters:: data1 - one set of instances; data2 - the other set of instances
Throws:: java.lang.Exception - if the datasets differ

addMissing

protected void addMissing(Instances data,
                          int level,
                          boolean predictorMissing,
                          boolean classMissing)

Add missing values to a dataset.

Parameters:: data - the instances to add missing values to; level - the level of missing values to add (if positive, this is the probability that a value will be set to missing, if negative all but one value will be set to missing (not yet implemented)); predictorMissing - if true, predictor attributes will be modified; classMissing - if true, the class attribute will be modified
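A minimal sketch of injecting missing values, under assumptions that do not come from the original: rows are `double[]` with the class in the last column, `Double.NaN` marks a missing value, an explicit `seed` parameter is added for reproducibility, and the integer `level` is read as a percentage (the documentation calls it a probability, so the exact scale should be checked against the source). The unimplemented negative case simply throws.

```java
import java.util.Random;

// Hypothetical sketch, not WEKA code.
final class AddMissingSketch {

    static void addMissing(double[][] data, int level, boolean predictorMissing,
                           boolean classMissing, long seed) {
        if (level < 0) {
            throw new UnsupportedOperationException("negative levels are not implemented");
        }
        Random random = new Random(seed);
        for (double[] row : data) {
            int classIndex = row.length - 1;
            for (int j = 0; j < row.length; j++) {
                boolean isClass = (j == classIndex);
                if ((isClass && classMissing) || (!isClass && predictorMissing)) {
                    // nextInt(100) is uniform on 0..99, so this fires with probability level/100.
                    if (random.nextInt(100) < level) {
                        row[j] = Double.NaN;
                    }
                }
            }
        }
    }
}
```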

makeTestDataset

protected Instances makeTestDataset(int seed,
                                    int numInstances,
                                    int numNominal,
                                    int numNumeric,
                                    int numClasses,
                                    boolean numericClass)
                             throws java.lang.Exception

Make a simple set of instances, which can later be modified for use in specific tests.

Parameters:: seed - the random number seed; numInstances - the number of instances to generate; numNominal - the number of nominal attributes; numNumeric - the number of numeric attributes; numClasses - the number of classes (if nominal class); numericClass - true if the class attribute should be numeric
Returns:: the test dataset
Throws:: java.lang.Exception - if the dataset couldn't be generated
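A sketch of seeded dataset generation, assuming a plain `double[][]` representation rather than WEKA's `Instances`: nominal attributes are coded as small integers, numeric attributes are Gaussian, and the class occupies the last column. The fixed count of three nominal values and the purely random class are assumptions of this sketch; because the generator is seeded, the same arguments always reproduce the same dataset.

```java
import java.util.Random;

// Hypothetical sketch, not WEKA code.
final class MakeDatasetSketch {

    /** Number of values per nominal attribute (an assumption of this sketch). */
    static final int NOMINAL_VALUES = 3;

    static double[][] makeTestDataset(int seed, int numInstances, int numNominal,
                                      int numNumeric, int numClasses, boolean numericClass) {
        Random random = new Random(seed);
        int numAttributes = numNominal + numNumeric + 1;   // +1 for the class
        double[][] data = new double[numInstances][numAttributes];
        for (double[] row : data) {
            int j = 0;
            for (int k = 0; k < numNominal; k++) row[j++] = random.nextInt(NOMINAL_VALUES);
            for (int k = 0; k < numNumeric; k++) row[j++] = random.nextGaussian();
            // Class: Gaussian if numeric, otherwise a class index in 0..numClasses-1.
            row[j] = numericClass ? random.nextGaussian() : random.nextInt(numClasses);
        }
        return data;
    }
}
```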

printAttributeSummary

protected void printAttributeSummary(boolean nominalPredictor,
                                     boolean numericPredictor,
                                     boolean numericClass)

Print out a short summary string for the dataset characteristics

Parameters:: nominalPredictor - true if nominal predictor attributes are present; numericPredictor - true if numeric predictor attributes are present; numericClass - true if the class attribute is numeric