weka.classifiers.rules
Class JRip

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.rules.JRip
All Implemented Interfaces:
AdditionalMeasureProducer, java.lang.Cloneable, OptionHandler, java.io.Serializable, WeightedInstancesHandler

public class JRip
extends Classifier
implements OptionHandler, AdditionalMeasureProducer, WeightedInstancesHandler

This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which is proposed by William W. Cohen as an optimized version of IREP.

The algorithm is briefly described as follows:

Initialize RS = {}, and for each class, from the least prevalent to the most frequent, DO:

1. Building stage: repeat 1.1 and 1.2 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL found so far, there are no more positive examples, or the error rate is >= 50%.

1.1. Grow phase:
Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with the highest information gain: p(log(p/t) - log(P/T)).
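The gain criterion above can be sketched in a few lines of Java (an illustrative helper, not Weka's internal code; here p and t are the positive and total counts covered after adding the candidate condition, while P and T are the counts covered before):

```java
public class GrowGain {
    // Information gain of adding a condition: p * (log2(p/t) - log2(P/T)).
    // A condition that raises the proportion of positives among the covered
    // instances yields a positive gain; covering no positives yields zero.
    static double gain(double p, double t, double P, double T) {
        if (p == 0) return 0.0;           // no positives covered: no gain
        double log2 = Math.log(2);
        return p * (Math.log(p / t) / log2 - Math.log(P / T) / log2);
    }

    public static void main(String[] args) {
        // Keeping 20 of 40 positives while sharply raising purity: positive gain
        System.out.println(gain(20, 22, 40, 100));
        // A condition covering no positives gives zero gain
        System.out.println(gain(0, 10, 40, 100));
    }
}
```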

1.2. Prune phase:
Incrementally prune each rule, allowing the pruning of any final sequence of antecedents.
The pruning metric is (p-n)/(p+n) -- but since this equals 2p/(p+n) - 1, this implementation simply uses p/(p+n) (actually (p+1)/(p+n+2), so that the value is 0.5 when p+n is 0).
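The smoothed pruning value described above can be written out as a small helper (an illustrative sketch, not the class's actual implementation):

```java
public class PruneMetric {
    // Smoothed pruning value (p+1)/(p+n+2): ranks rules the same way as
    // (p-n)/(p+n), but is still defined (0.5) when a rule covers nothing.
    static double pruneValue(double p, double n) {
        return (p + 1) / (p + n + 2);
    }

    public static void main(String[] args) {
        System.out.println(pruneValue(0, 0)); // 0.5 when no coverage
        System.out.println(pruneValue(9, 1)); // close to 1: mostly positives
    }
}
```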

2. Optimization stage: after generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedures 1.1 and 1.2. One variant is generated from an empty rule, while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N).
Then the smallest possible DL for each variant and for the original rule is computed. The candidate with the minimal DL is selected as the final representative of Ri in the ruleset.
After all the rules in {Ri} have been examined, if there are still residual positive examples, more rules are generated for them using the building stage again.
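The optimization-stage pruning metric is plain accuracy over the pruning data. As a hypothetical helper (names are illustrative, not Weka's):

```java
public class OptMetric {
    // Optimization-stage pruning metric: accuracy (TP+TN)/(P+N), where
    // TP/TN are true positives/negatives covered by the rule on the
    // pruning data, and P/N are the total positives/negatives in it.
    static double accuracy(double tp, double tn, double P, double N) {
        return (tp + tn) / (P + N);
    }

    public static void main(String[] args) {
        // 40 of 45 positives and 50 of 55 negatives handled correctly
        System.out.println(accuracy(40, 50, 45, 55)); // 0.9
    }
}
```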

3. Delete from the ruleset any rule that would increase the DL of the whole ruleset if it were kept, and add the resultant ruleset to RS.

ENDDO
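The building-stage stopping conditions from step 1 can be collected into a single check. This is a hypothetical sketch; the actual private checkStop method takes an array of ruleset statistics rather than these scalar arguments:

```java
public class StopCheck {
    static final double MAX_DL_SURPLUS = 64.0; // bits, as stated in step 1

    // Stop adding rules when the current DL exceeds the smallest DL seen
    // so far by more than 64 bits, when no positive examples remain, or
    // when the error rate reaches 50%.
    static boolean shouldStop(double dl, double minDL,
                              double remainingPositives, double errorRate) {
        return dl > minDL + MAX_DL_SURPLUS
            || remainingPositives <= 0
            || errorRate >= 0.5;
    }

    public static void main(String[] args) {
        System.out.println(shouldStop(200.0, 100.0, 10, 0.1)); // DL surplus too big
        System.out.println(shouldStop(150.0, 100.0, 10, 0.1)); // within surplus: keep going
    }
}
```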

Note that there appear to be two bugs in the original ripper program that slightly affect ruleset size and accuracy. This implementation avoids these bugs and therefore differs slightly from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, some trivial differences remain between this implementation and the original ripper, especially for the audiology data in the UCI repository, which contains many classes with few instances.

If wrapped by other classes, typical usage of this class is:
  JRip rip = new JRip();
  Instances data = ...          // Data from somewhere
  double[] orderedClasses = ... // Get the ordered class counts for the data
  double expFPRate = ...        // Calculate the expected FP/(FP+FN) rate
  double classIndex = ...       // The class index for which the ruleset is built

  // DL of the default rule: no theory DL, only data DL
  double defDL = RuleStats.dataDL(expFPRate, 0.0, data.sumOfWeights(),
                                  0.0, orderedClasses[(int) classIndex]);

  rip.rulesetForOneClass(expFPRate, data, classIndex, defDL);
  RuleStats rulesetStats = rip.getRuleStats(0);

  // RuleStats provides plenty of information, e.g. combined DL, simple stats, etc.
  double comDL = rulesetStats.combinedDL(expFPRate, classIndex);
  int whichRule = ...           // The rule whose simple stats are wanted
  double[] simpleStats = rulesetStats.getSimpleStats(whichRule);
  ...

For details, see "Fast Effective Rule Induction", William W. Cohen, Machine Learning: Proceedings of the Twelfth International Conference (ML95).

PS. We have compared this implementation with the original ripper implementation with respect to accuracy, ruleset size, and running time on both the artificial data "ab+bcd+defg" and UCI datasets. In all these respects it appears to be quite comparable to the original ripper implementation. However, memory consumption was not optimized in this implementation.

Version:
$Revision: 1.13 $
Author:
Xin Xu (xx5@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
See Also:
Serialized Form

Nested Class Summary
private  class JRip.Antd
          The single antecedent in the rule, which is composed of an attribute and the corresponding value.
private  class JRip.NominalAntd
          The antecedent with nominal attribute
private  class JRip.NumericAntd
          The antecedent with numeric attribute
protected  class JRip.RipperRule
          This class implements a single rule that predicts the specified class.
 
Field Summary
private  boolean m_CheckErr
          Whether to check that the error rate >= 0.5 in the stopping criterion
private  Attribute m_Class
          The class attribute of the data
private  boolean m_Debug
          Whether in a debug mode
private  FastVector m_Distributions
          The predicted class distribution
private  Filter m_Filter
          The filter used to randomize the class order
private  int m_Folds
          The number of folds used to split the data into grow and prune sets for IREP
private  double m_MinNo
          The minimal number of instance weights within a split
private  int m_Optimizations
          Runs of optimizations
private  java.util.Random m_Random
          Random object used in this class
private  FastVector m_Ruleset
          The ruleset
private  FastVector m_RulesetStats
          The RuleStats for the ruleset of each class value
private  long m_Seed
          The seed to perform randomization
private  double m_Total
          The number of all possible conditions in a rule
private  boolean m_UsePruning
          Whether to use pruning, i.e. whether the data is not clean
private static double MAX_DL_SURPLUS
          The limit of description length surplus in ruleset generation
 
Constructor Summary
JRip()
           
 
Method Summary
 void buildClassifier(Instances instances)
          Builds Ripper in the order of class frequencies.
 java.lang.String checkErrorRateTipText()
          Returns the tip text for this property
private  boolean checkStop(double[] rst, double minDL, double dl)
          Checks whether the stopping criterion is met
 java.lang.String debugTipText()
          Returns the tip text for this property
 double[] distributionForInstance(Instance datum)
          Classifies the test instance with the rule learner and provides the class distribution
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of the additional measure names
 java.lang.String foldsTipText()
          Returns the tip text for this property
 boolean getCheckErrorRate()
           
 boolean getDebug()
          Get whether debugging is turned on.
 int getFolds()
           
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure
 double getMinNo()
           
 int getOptimizations()
           
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 FastVector getRuleset()
          Get the ruleset generated by Ripper
 RuleStats getRuleStats(int pos)
          Get the statistics of the ruleset in the given position
 long getSeed()
           
 boolean getUsePruning()
           
 java.lang.String globalInfo()
          Returns a string describing classifier
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method.
 java.lang.String minNoTipText()
          Returns the tip text for this property
 java.lang.String optimizationsTipText()
          Returns the tip text for this property
protected  Instances rulesetForOneClass(double expFPRate, Instances data, double classIndex, double defDL)
          Build a ruleset for the given class according to the given data
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setCheckErrorRate(boolean d)
           
 void setDebug(boolean d)
          Set debugging mode.
 void setFolds(int fold)
           
 void setMinNo(double m)
           
 void setOptimizations(int run)
           
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(long s)
           
 void setUsePruning(boolean d)
           
 java.lang.String toString()
          Prints all the rules of the rule learner.
 java.lang.String usePruningTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MAX_DL_SURPLUS

private static double MAX_DL_SURPLUS
The limit of description length surplus in ruleset generation


m_Class

private Attribute m_Class
The class attribute of the data


m_Ruleset

private FastVector m_Ruleset
The ruleset


m_Distributions

private FastVector m_Distributions
The predicted class distribution


m_Optimizations

private int m_Optimizations
Runs of optimizations


m_Random

private java.util.Random m_Random
Random object used in this class


m_Total

private double m_Total
The number of all possible conditions in a rule


m_Seed

private long m_Seed
The seed to perform randomization


m_Folds

private int m_Folds
The number of folds used to split the data into grow and prune sets for IREP


m_MinNo

private double m_MinNo
The minimal number of instance weights within a split


m_Debug

private boolean m_Debug
Whether in a debug mode


m_CheckErr

private boolean m_CheckErr
Whether to check that the error rate >= 0.5 in the stopping criterion


m_UsePruning

private boolean m_UsePruning
Whether to use pruning, i.e. whether the data is not clean


m_Filter

private Filter m_Filter
The filter used to randomize the class order


m_RulesetStats

private FastVector m_RulesetStats
The RuleStats for the ruleset of each class value

Constructor Detail

JRip

public JRip()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing classifier

Returns:
a description suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options. Valid options are:

-F number
The number of folds for reduced error pruning. One fold is used as the pruning set. (Default: 3)

-N number
The minimal weights of instances within a split. (Default: 2)

-O number
Set the number of runs of optimizations. (Default: 2)

-D
Whether to turn on debug mode.

-S number
The seed for randomization used in Ripper. (Default: 1)

-E
Whether NOT to check the error rate >= 0.5 in the stopping criterion. (Default: check)

-P
Whether NOT to use pruning. (Default: use pruning)

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
an array of strings suitable for passing to setOptions

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of the additional measure names

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported

foldsTipText

public java.lang.String foldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setFolds

public void setFolds(int fold)

getFolds

public int getFolds()

minNoTipText

public java.lang.String minNoTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMinNo

public void setMinNo(double m)

getMinNo

public double getMinNo()

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(long s)

getSeed

public long getSeed()

optimizationsTipText

public java.lang.String optimizationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setOptimizations

public void setOptimizations(int run)

getOptimizations

public int getOptimizations()

debugTipText

public java.lang.String debugTipText()
Returns the tip text for this property

Overrides:
debugTipText in class Classifier
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean d)
Description copied from class: Classifier
Set debugging mode.

Overrides:
setDebug in class Classifier
Parameters:
d - true if debug output should be printed

getDebug

public boolean getDebug()
Description copied from class: Classifier
Get whether debugging is turned on.

Overrides:
getDebug in class Classifier
Returns:
true if debugging output is on

checkErrorRateTipText

public java.lang.String checkErrorRateTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCheckErrorRate

public void setCheckErrorRate(boolean d)

getCheckErrorRate

public boolean getCheckErrorRate()

usePruningTipText

public java.lang.String usePruningTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setUsePruning

public void setUsePruning(boolean d)

getUsePruning

public boolean getUsePruning()

getRuleset

public FastVector getRuleset()
Get the ruleset generated by Ripper

Returns:
the ruleset

getRuleStats

public RuleStats getRuleStats(int pos)
Get the statistics of the ruleset in the given position

Parameters:
pos - the position of the stats, assumed to be valid

buildClassifier

public void buildClassifier(Instances instances)
                     throws java.lang.Exception
Builds Ripper in the order of class frequencies. For each class, the ruleset is built in two stages: building and optimization.

Specified by:
buildClassifier in class Classifier
Parameters:
instances - the training data
Throws:
java.lang.Exception - if classifier can't be built successfully

distributionForInstance

public double[] distributionForInstance(Instance datum)
Classifies the test instance with the rule learner and provides the class distribution

Overrides:
distributionForInstance in class Classifier
Parameters:
datum - the instance to be classified
Returns:
the distribution

rulesetForOneClass

protected Instances rulesetForOneClass(double expFPRate,
                                       Instances data,
                                       double classIndex,
                                       double defDL)
                                throws java.lang.Exception
Build a ruleset for the given class according to the given data

Parameters:
expFPRate - the expected FP/(FP+FN) used in DL calculation
data - the given data
classIndex - the given class index
defDL - the default DL in the data
Throws:
java.lang.Exception - if the ruleset cannot be built properly

checkStop

private boolean checkStop(double[] rst,
                          double minDL,
                          double dl)
Checks whether the stopping criterion is met

Parameters:
rst - the statistic of the ruleset
minDL - the min description length so far
dl - the current description length of the ruleset
Returns:
true if the stopping criterion is met, false otherwise

toString

public java.lang.String toString()
Prints all the rules of the rule learner.

Returns:
a textual description of the classifier

main

public static void main(java.lang.String[] args)
Main method.

Parameters:
args - the options for the classifier