RaceSearch (Documentation for extended WEKA including Ensembles of Hierarchically Nested Dichotomies)

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

weka.attributeSelection
Class RaceSearch

java.lang.Object
  weka.attributeSelection.ASSearch
      weka.attributeSelection.RaceSearch

All Implemented Interfaces:: OptionHandler, RankedOutputSearch, java.io.Serializable

public class RaceSearch
extends ASSearch
implements RankedOutputSearch, OptionHandler

Class for performing a racing search.

For more information see:
Moore, A. W. and Lee, M. S. (1994). Efficient algorithms for minimising cross validation error. Proceedings of the Eleventh International Conference on Machine Learning. pp 190--198.

Valid options are:

-R
0 = forward, 1 = backward, 2 = schemata, 3 = rank.

-L
significance level to use for t-tests.

-T
threshold for considering mean errors of two subsets the same

-F
0 = 10 fold, 1 = leave-one-out (selected automatically for schemata race

-A
the attribute evaluator to use when doing a rank search

-Q
produce a ranked list of attributes. Selecting this option forces the race type to be forward. Racing continues until *all* attributes have been selected, thus producing a ranked list of attributes.

-N
Specify the number of attributes to retain. Overides any threshold. Use in conjunction with -Q.

-J
Specify a threshold by which the AttributeSelection module can discard attributes. Use in conjunction with -Q.

-Z
Turn on verbose output for monitoring the search

Version:: $Revision: 1.12 $
Author:: Mark Hall (mhall@cs.waikato.ac.nz)
See Also:: Serialized Form

Field Summary
`private static int`	`BACKWARD_RACE`
`private static int`	`FORWARD_RACE` search types
`private static int`	`LEAVE_ONE_OUT`
`private ASEvaluation`	`m_ASEval` the attribute evaluator to generate the initial ranking when doing a rank race
`private double`	`m_bestMerit` holds the merit of the best subset found
`private int`	`m_calculatedNumToSelect`
`private int`	`m_classIndex` the class index
`private boolean`	`m_debug` verbose output for monitoring the search and debugging
`private double`	`m_delta` threshold for comparisons
`private Instances`	`m_Instances`
`private int`	`m_numAttribs` the number of attributes in the data
`private int`	`m_numFolds` number of cross validation folds---equal to the number of instances for leave-one-out cv
`private int`	`m_numToSelect` The number of attributes to retain if a ranking is requested. -1 indicates that all attributes are to be retained.
`private int`	`m_raceType` the selected search type
`private double[][]`	`m_rankedAtts` The ranked list of attributes produced if m_rankingRequested is true
`private int`	`m_rankedSoFar` The number of attributes ranked so far (if ranking is requested)
`private int[]`	`m_Ranking` will hold the attribute ranking produced by the above attribute evaluator if doing a rank search
`private boolean`	`m_rankingRequested` If true then produce a ranked list of attributes by fully traversing a forward hillclimb race
`private int`	`m_samples` the number of samples above which to begin testing for similarity between competing subsets
`private double`	`m_sigLevel` the significance level for comparisons
`private HoldOutSubsetEvaluator`	`m_theEvaluator` the subset evaluator to use
`private double`	`m_threshold` the threshold for removing attributes if ranking is requested
`private int`	`m_totalEvals` the total number of partially/fully evaluated subsets
`private int`	`m_xvalType` the selected xval type
`private static int`	`RANK_RACE`
`private static int`	`SCHEMATA_RACE`
`static Tag[]`	`TAGS_SELECTION`
`private static int`	`TEN_FOLD` xval types
`static Tag[]`	`XVALTAGS_SELECTION`

Constructor Summary
`RaceSearch()`

Method Summary
`java.lang.String`	`attributeEvaluatorTipText()` Returns the tip text for this property
`private int[]`	`attributeList(char[] list)` Convert an attribute set to an array of indices
`java.lang.String`	`debugTipText()` Returns the tip text for this property
`private void`	`determineNumToSelectFromThreshold(double[][] ranking)`
`java.lang.String`	`foldsTipText()` Returns the tip text for this property
`java.lang.String`	`generateRankingTipText()` Returns the tip text for this property
`ASEvaluation`	`getAttributeEvaluator()` Get the attribute evaluator used to generate the ranking.
`int`	`getCalculatedNumToSelect()` Gets the calculated number of attributes to retain.
`boolean`	`getDebug()` Get whether output is to be verbose
`SelectedTag`	`getFoldsType()` Get the xfold type
`boolean`	`getGenerateRanking()` Gets whether ranking has been requested.
`int`	`getNumToSelect()` Gets the number of attributes to be retained.
`java.lang.String[]`	`getOptions()` Gets the current settings of BestFirst.
`SelectedTag`	`getRaceType()` Get the race type
`double`	`getSelectionThreshold()` Returns the threshold so that the AttributeSelection module can discard attributes from the ranking.
`double`	`getSignificanceLevel()` Get the significance level
`double`	`getThreshold()` Get the threshold
`java.lang.String`	`globalInfo()` Returns a string describing this search method
`private int[]`	`hillclimbRace(Instances data, java.util.Random random)` Performs a hill climbing race---all single attribute changes to a base subset are raced in parallel.
`java.util.Enumeration`	`listOptions()` Returns an enumeration describing the available options.
`java.lang.String`	`numToSelectTipText()` Returns the tip text for this property
`private java.lang.String`	`printSets(char[][] raceSets)` Print an attribute set.
`private double[]`	`raceSubsets(char[][] raceSets, Instances data, boolean baseSetIncluded, java.util.Random random)` Races the leave-one-out cross validation errors of a set of attribute subsets on a set of instances.
`java.lang.String`	`raceTypeTipText()` Returns the tip text for this property
`double[][]`	`rankedAttributes()` Returns a X by 2 list of attribute indexes and corresponding evaluations from best (highest) to worst.
`private int[]`	`rankRace(Instances data, java.util.Random random)` Performs a rank race---race consisting of no attributes, the top ranked attribute, the top two attributes etc.
`protected void`	`resetOptions()` Reset the search method.
`private int[]`	`schemataRace(Instances data, java.util.Random random)` Performs a schemata race---a series of races in parallel.
`int[]`	`search(ASEvaluation ASEval, Instances data)` Searches the attribute subset space by racing cross validation errors of competing subsets
`java.lang.String`	`selectionThresholdTipText()` Returns the tip text for this property
`void`	`setAttributeEvaluator(ASEvaluation newEvaluator)` Set the attribute evaluator to use for generating the ranking.
`void`	`setDebug(boolean d)` Set whether verbose output should be generated.
`void`	`setFoldsType(SelectedTag d)` Set the xfold type
`void`	`setGenerateRanking(boolean doRank)` Records whether the user has requested a ranked list of attributes.
`void`	`setNumToSelect(int n)` Specify the number of attributes to select from the ranked list (if generating a ranking). -1 indicates that all attributes are to be retained.
`void`	`setOptions(java.lang.String[] options)` Parses a given list of options.
`void`	`setRaceType(SelectedTag d)` Set the race type
`void`	`setSelectionThreshold(double threshold)` Set the threshold by which the AttributeSelection module can discard attributes.
`void`	`setSignificanceLevel(double sig)` Sets the significance level to use
`void`	`setThreshold(double t)` Sets the threshold for comparisons
`java.lang.String`	`significanceLevelTipText()` Returns the tip text for this property
`java.lang.String`	`thresholdTipText()` Returns the tip text for this property
`java.lang.String`	`toString()`
`private double`	`ttest(Stats c1, Stats c2)`

Methods inherited from class weka.attributeSelection.ASSearch

forName

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail

m_Instances

private Instances m_Instances

FORWARD_RACE

private static final int FORWARD_RACE

search types

See Also:: Constant Field Values

BACKWARD_RACE

private static final int BACKWARD_RACE

See Also:: Constant Field Values

SCHEMATA_RACE

private static final int SCHEMATA_RACE

See Also:: Constant Field Values

RANK_RACE

private static final int RANK_RACE

See Also:: Constant Field Values

TAGS_SELECTION

public static final Tag[] TAGS_SELECTION

m_raceType

private int m_raceType

the selected search type

TEN_FOLD

private static final int TEN_FOLD

xval types

See Also:: Constant Field Values

LEAVE_ONE_OUT

private static final int LEAVE_ONE_OUT

See Also:: Constant Field Values

XVALTAGS_SELECTION

public static final Tag[] XVALTAGS_SELECTION

m_xvalType

private int m_xvalType

the selected xval type

m_classIndex

private int m_classIndex

the class index

m_numAttribs

private int m_numAttribs

the number of attributes in the data

m_totalEvals

private int m_totalEvals

the total number of partially/fully evaluated subsets

m_bestMerit

private double m_bestMerit

holds the merit of the best subset found

m_theEvaluator

private HoldOutSubsetEvaluator m_theEvaluator

the subset evaluator to use

m_sigLevel

private double m_sigLevel

the significance level for comparisons

m_delta

private double m_delta

threshold for comparisons

m_samples

private int m_samples

the number of samples above which to begin testing for similarity between competing subsets

m_numFolds

private int m_numFolds

number of cross validation folds---equal to the number of instances for leave-one-out cv

m_ASEval

private ASEvaluation m_ASEval

the attribute evaluator to generate the initial ranking when doing a rank race

m_Ranking

private int[] m_Ranking

will hold the attribute ranking produced by the above attribute evaluator if doing a rank search

m_debug

private boolean m_debug

verbose output for monitoring the search and debugging

m_rankingRequested

private boolean m_rankingRequested

If true then produce a ranked list of attributes by fully traversing a forward hillclimb race

m_rankedAtts

private double[][] m_rankedAtts

The ranked list of attributes produced if m_rankingRequested is true

m_rankedSoFar

private int m_rankedSoFar

The number of attributes ranked so far (if ranking is requested)

m_numToSelect

private int m_numToSelect

The number of attributes to retain if a ranking is requested. -1 indicates that all attributes are to be retained. Has precedence over m_threshold

m_calculatedNumToSelect

private int m_calculatedNumToSelect

m_threshold

private double m_threshold

the threshold for removing attributes if ranking is requested

Constructor Detail

RaceSearch

public RaceSearch()

Method Detail

globalInfo

public java.lang.String globalInfo()

Returns a string describing this search method

Returns:: a description of the search method suitable for displaying in the explorer/experimenter gui

raceTypeTipText

public java.lang.String raceTypeTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setRaceType

public void setRaceType(SelectedTag d)

Set the race type

Parameters:: d - the type of race

getRaceType

public SelectedTag getRaceType()

Get the race type

Returns:: the type of race

significanceLevelTipText

public java.lang.String significanceLevelTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setSignificanceLevel

public void setSignificanceLevel(double sig)

Sets the significance level to use

Parameters:: sig - the significance level

getSignificanceLevel

public double getSignificanceLevel()

Get the significance level

Returns:: the current significance level

thresholdTipText

public java.lang.String thresholdTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setThreshold

public void setThreshold(double t)

Sets the threshold for comparisons

Specified by:: setThreshold in interface RankedOutputSearch

Parameters:: t - the threshold to use

getThreshold

public double getThreshold()

Get the threshold

Specified by:: getThreshold in interface RankedOutputSearch

Returns:: the current threshold

foldsTipText

public java.lang.String foldsTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setFoldsType

public void setFoldsType(SelectedTag d)

Set the xfold type

Parameters:: d - the type of xval

getFoldsType

public SelectedTag getFoldsType()

Get the xfold type

Returns:: the type of xval

debugTipText

public java.lang.String debugTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean d)

Set whether verbose output should be generated.

Parameters:: d - true if output is to be verbose.

getDebug

public boolean getDebug()

Get whether output is to be verbose

Returns:: true if output will be verbose

attributeEvaluatorTipText

public java.lang.String attributeEvaluatorTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setAttributeEvaluator

public void setAttributeEvaluator(ASEvaluation newEvaluator)

Set the attribute evaluator to use for generating the ranking.

Parameters:: newEvaluator - the attribute evaluator to use.

getAttributeEvaluator

public ASEvaluation getAttributeEvaluator()

Get the attribute evaluator used to generate the ranking.

Returns:: the evaluator used to generate the ranking.

generateRankingTipText

public java.lang.String generateRankingTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setGenerateRanking

public void setGenerateRanking(boolean doRank)

Records whether the user has requested a ranked list of attributes.

Specified by:: setGenerateRanking in interface RankedOutputSearch

Parameters:: doRank - true if ranking is requested

getGenerateRanking

public boolean getGenerateRanking()

Gets whether ranking has been requested. This is used by the AttributeSelection module to determine if rankedAttributes() should be called.

Specified by:: getGenerateRanking in interface RankedOutputSearch

Returns:: true if ranking has been requested.

numToSelectTipText

public java.lang.String numToSelectTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setNumToSelect

public void setNumToSelect(int n)

Specify the number of attributes to select from the ranked list (if generating a ranking). -1 indicates that all attributes are to be retained.

Specified by:: setNumToSelect in interface RankedOutputSearch

Parameters:: n - the number of attributes to retain

getNumToSelect

public int getNumToSelect()

Gets the number of attributes to be retained.

Specified by:: getNumToSelect in interface RankedOutputSearch

Returns:: the number of attributes to retain

getCalculatedNumToSelect

public int getCalculatedNumToSelect()

Gets the calculated number of attributes to retain. This is the actual number of attributes to retain. This is the same as getNumToSelect if the user specifies a number which is not less than zero. Otherwise it should be the number of attributes in the (potentially transformed) data.

Specified by:: getCalculatedNumToSelect in interface RankedOutputSearch

selectionThresholdTipText

public java.lang.String selectionThresholdTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setSelectionThreshold

public void setSelectionThreshold(double threshold)

Set the threshold by which the AttributeSelection module can discard attributes.

Parameters:: threshold - the threshold.

getSelectionThreshold

public double getSelectionThreshold()

Returns the threshold so that the AttributeSelection module can discard attributes from the ranking.

listOptions

public java.util.Enumeration listOptions()

Returns an enumeration describing the available options.

Specified by:: listOptions in interface OptionHandler

Returns:: an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception

Parses a given list of options. Valid options are:

-R
0 = forward, 1 = backward, 2 = schemata, 3 = rank.

-L
significance level to use for t-tests.

-T
threshold for considering mean errors of two subsets the same

-F
0 = 10 fold, 1 = leave-one-out (selected automatically for schemata race

-A
the attribute evaluator to use when doing a rank search

-N
Specify the number of attributes to retain. Overides any threshold. Use in conjunction with -Q.

-J
Specify a threshold by which the AttributeSelection module can discard attributes. Use in conjunction with -Q.

-Z
Turn on verbose output for monitoring the search

Specified by:: setOptions in interface OptionHandler

Parameters:: options - the list of options as an array of strings
Throws:: java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()

Gets the current settings of BestFirst.

Specified by:: getOptions in interface OptionHandler

Returns:: an array of strings suitable for passing to setOptions()

search

public int[] search(ASEvaluation ASEval,
                    Instances data)
             throws java.lang.Exception

Searches the attribute subset space by racing cross validation errors of competing subsets

Specified by:: search in class ASSearch

Parameters:: data - the training instances.; ASEval - the attribute evaluator to guide the search
Returns:: an array (not necessarily ordered) of selected attribute indexes
Throws:: java.lang.Exception - if the search can't be completed

rankedAttributes

public double[][] rankedAttributes()
                            throws java.lang.Exception

Description copied from interface: RankedOutputSearch

Returns a X by 2 list of attribute indexes and corresponding evaluations from best (highest) to worst.

Specified by:: rankedAttributes in interface RankedOutputSearch

Returns:: the ranked list of attribute indexes in an array of ints
Throws:: java.lang.Exception - if the ranking can't be produced

determineNumToSelectFromThreshold

private void determineNumToSelectFromThreshold(double[][] ranking)

printSets

private java.lang.String printSets(char[][] raceSets)

Print an attribute set.

schemataRace

private int[] schemataRace(Instances data,
                           java.util.Random random)
                    throws java.lang.Exception

Performs a schemata race---a series of races in parallel.

Parameters:: data - the instances to estimate accuracy over.; random - a random number generator
Returns:: an array of selected attribute indices.
Throws:: java.lang.Exception

ttest

private double ttest(Stats c1,
                     Stats c2)
              throws java.lang.Exception

Throws:: java.lang.Exception

rankRace

private int[] rankRace(Instances data,
                       java.util.Random random)
                throws java.lang.Exception

Performs a rank race---race consisting of no attributes, the top ranked attribute, the top two attributes etc. The initial ranking is determined by an attribute evaluator.

Parameters:: data - the instances to estimate accuracy over; random - a random number generator
Returns:: an array of selected attribute indices.
Throws:: java.lang.Exception

hillclimbRace

private int[] hillclimbRace(Instances data,
                            java.util.Random random)
                     throws java.lang.Exception

Performs a hill climbing race---all single attribute changes to a base subset are raced in parallel. The winner is chosen and becomes the new base subset and the process is repeated until there is no improvement in error over the base subset.

Parameters:: data - the instances to estimate accuracy over; random - a random number generator
Returns:: an array of selected attribute indices.
Throws:: java.lang.Exception

attributeList

private int[] attributeList(char[] list)

Convert an attribute set to an array of indices

raceSubsets

private double[] raceSubsets(char[][] raceSets,
                             Instances data,
                             boolean baseSetIncluded,
                             java.util.Random random)
                      throws java.lang.Exception

Races the leave-one-out cross validation errors of a set of attribute subsets on a set of instances.

Parameters:: raceSets - a set of attribute subset specifications; data - the instances to use when cross validating; baseSetIncluded - true if the first attribute set is a base set generated from the previous race; random - a random number generator
Returns:: the index of the winning subset
Throws:: java.lang.Exception - if an error occurs during cross validation

toString

public java.lang.String toString()

resetOptions

protected void resetOptions()

Reset the search method.

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

weka.attributeSelection Class RaceSearch

m_Instances

FORWARD_RACE

BACKWARD_RACE

SCHEMATA_RACE

RANK_RACE

TAGS_SELECTION

m_raceType

TEN_FOLD

LEAVE_ONE_OUT

XVALTAGS_SELECTION

m_xvalType

m_classIndex

m_numAttribs

m_totalEvals

m_bestMerit

m_theEvaluator

m_sigLevel

m_delta

m_samples

m_numFolds

m_ASEval

m_Ranking

m_debug

m_rankingRequested

m_rankedAtts

m_rankedSoFar

m_numToSelect

m_calculatedNumToSelect

m_threshold

RaceSearch

globalInfo

raceTypeTipText

setRaceType

getRaceType

significanceLevelTipText

setSignificanceLevel

getSignificanceLevel

thresholdTipText

setThreshold

getThreshold

foldsTipText

setFoldsType

getFoldsType

debugTipText

setDebug

getDebug

attributeEvaluatorTipText

setAttributeEvaluator

getAttributeEvaluator

generateRankingTipText

setGenerateRanking

getGenerateRanking

numToSelectTipText

setNumToSelect

getNumToSelect

getCalculatedNumToSelect

selectionThresholdTipText

setSelectionThreshold

getSelectionThreshold

listOptions

setOptions

getOptions

search

rankedAttributes

determineNumToSelectFromThreshold

printSets

schemataRace

ttest

rankRace

hillclimbRace

attributeList

raceSubsets

toString

resetOptions

weka.attributeSelection
Class RaceSearch