weka.attributeSelection
Class AttributeSelection

java.lang.Object
  extended by weka.attributeSelection.AttributeSelection
All Implemented Interfaces:
java.io.Serializable

public class AttributeSelection
extends java.lang.Object
implements java.io.Serializable

Attribute selection class. Takes the name of a search class and an evaluation class on the command line.

Valid options are:

-h
Display help.

-I
Specify the training arff file.

-C
The index of the attribute to use as the class.

-S
The full class name of the search method followed by search method options (if any).
E.g. -S "weka.attributeSelection.BestFirst -N 10"

-X
Perform a cross validation.

-N
Specify a random number seed. Use in conjunction with -X. (Default = 1).
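
The -S value packs the search class name and that class's own options into one quoted string. A minimal sketch of how such a value could be split, assuming plain whitespace separation with no nested quoting. SearchSpec and parse are hypothetical names used for illustration and are not part of Weka:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Weka: splits a -S style value into
// the search class name and the remaining search-method options.
class SearchSpec {
    final String className;
    final String[] options;

    SearchSpec(String className, String[] options) {
        this.className = className;
        this.options = options;
    }

    // Assumes tokens are separated by whitespace and contain no nested quotes.
    static SearchSpec parse(String value) {
        String[] tokens = value.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty search specification");
        }
        return new SearchSpec(tokens[0], Arrays.copyOfRange(tokens, 1, tokens.length));
    }

    public static void main(String[] args) {
        SearchSpec spec = parse("weka.attributeSelection.BestFirst -N 10");
        System.out.println(spec.className + " " + Arrays.toString(spec.options));
    }
}
```

The first token is treated as the class name and everything after it is handed on as the search method's options.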

------------------------------------------------------------------------

Example usage as the main of an attribute evaluator (called FunkyEvaluator):

 public static void main(String [] args) {
   try {
     // FunkyEvaluator stands in for your own ASEvaluation subclass
     ASEvaluation eval = new FunkyEvaluator();
     System.out.println(AttributeSelection.SelectAttributes(eval, args));
   } catch (Exception e) {
     System.err.println(e.getMessage());
   }
 }
  

------------------------------------------------------------------------

Version:
$Revision: 1.31 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
private  ASEvaluation m_ASEvaluator
          the attribute/subset evaluator
private  Remove m_attributeFilter
          the attribute filter for processing instances with respect to the most recent feature selection run
private  double[][] m_attributeRanking
          the attribute indexes and associated merits if a ranking is produced
private  boolean m_doRank
          rank features (if allowed by the search method)
private  boolean m_doXval
          do cross validation
private  int m_numFolds
          the number of folds to use for cross validation
private  int m_numToSelect
          number of attributes requested from ranked results
private  double[][] m_rankResults
          hold statistics for repeated feature selection, such as under cross validation
private  ASSearch m_searchMethod
          the search method
private  int m_seed
          seed used to randomly shuffle instances for cross validation
private  int[] m_selectedAttributeSet
          the selected attributes
private  java.lang.StringBuffer m_selectionResults
          holds a string describing the results of the attribute selection
private  double[] m_subsetResults
           
private  double m_threshold
          cutoff value by which to select attributes for ranked results
private  Instances m_trainInstances
          the instances to select attributes from
private  AttributeTransformer m_transformer
          if a feature selection run involves an attribute transformer
private  int m_trials
           
 
Constructor Summary
AttributeSelection()
          constructor.
 
Method Summary
 java.lang.String CrossValidateAttributes()
          Perform a cross validation for attribute selection.
 java.lang.String CVResultsString()
          returns a string summarizing the results of repeated attribute selection runs on splits of a dataset.
static void main(java.lang.String[] args)
          Main method for testing this class.
private static java.lang.String makeOptionString(ASEvaluation ASEvaluator, ASSearch searchMethod)
          Make up the help string giving all the command line options
 int numberAttributesSelected()
          Return the number of attributes selected from the most recent run of attribute selection
private  java.lang.String printSelectionResults()
          Assembles a text description of the attribute selection results.
 double[][] rankedAttributes()
          get the final ranking of the attributes.
 Instance reduceDimensionality(Instance in)
          reduce the dimensionality of a single instance to include only those attributes chosen by the last run of attribute selection.
 Instances reduceDimensionality(Instances in)
          reduce the dimensionality of a set of instances to include only those attributes chosen by the last run of attribute selection.
static java.lang.String SelectAttributes(ASEvaluation ASEvaluator, java.lang.String[] options)
          Perform attribute selection with a particular evaluator and a set of options specifying search method and input file etc.
static java.lang.String SelectAttributes(ASEvaluation ASEvaluator, java.lang.String[] options, Instances train)
          Perform attribute selection with a particular evaluator and a set of options specifying search method and options for the search method and evaluator.
 void SelectAttributes(Instances data)
          Perform attribute selection on the supplied training instances.
 void selectAttributesCVSplit(Instances split)
          Select attributes for a split of the data.
 int[] selectedAttributes()
          get the final selected set of attributes.
 void setEvaluator(ASEvaluation evaluator)
          set the attribute/subset evaluator
 void setFolds(int folds)
          set the number of folds for cross validation
 void setRanking(boolean r)
          produce a ranking (if possible with the set search and evaluator)
 void setSearch(ASSearch search)
          set the search method
 void setSeed(int s)
          set the seed for use in cross validation
 void setThreshold(double t)
          set the threshold by which to select features from a ranked list
 void setXval(boolean x)
          do a cross validation
 java.lang.String toResultsString()
          get a description of the attribute selection
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_trainInstances

private Instances m_trainInstances
the instances to select attributes from


m_ASEvaluator

private ASEvaluation m_ASEvaluator
the attribute/subset evaluator


m_searchMethod

private ASSearch m_searchMethod
the search method


m_numFolds

private int m_numFolds
the number of folds to use for cross validation


m_selectionResults

private java.lang.StringBuffer m_selectionResults
holds a string describing the results of the attribute selection


m_doRank

private boolean m_doRank
rank features (if allowed by the search method)


m_doXval

private boolean m_doXval
do cross validation


m_seed

private int m_seed
seed used to randomly shuffle instances for cross validation


m_threshold

private double m_threshold
cutoff value by which to select attributes for ranked results


m_numToSelect

private int m_numToSelect
number of attributes requested from ranked results


m_selectedAttributeSet

private int[] m_selectedAttributeSet
the selected attributes


m_attributeRanking

private double[][] m_attributeRanking
the attribute indexes and associated merits if a ranking is produced


m_transformer

private AttributeTransformer m_transformer
if a feature selection run involves an attribute transformer


m_attributeFilter

private Remove m_attributeFilter
the attribute filter for processing instances with respect to the most recent feature selection run


m_rankResults

private double[][] m_rankResults
hold statistics for repeated feature selection, such as under cross validation


m_subsetResults

private double[] m_subsetResults


m_trials

private int m_trials

Constructor Detail

AttributeSelection

public AttributeSelection()
constructor. Sets defaults for each member variable. Default attribute evaluator is CfsSubsetEval; default search method is BestFirst.

Method Detail

numberAttributesSelected

public int numberAttributesSelected()
                             throws java.lang.Exception
Return the number of attributes selected from the most recent run of attribute selection

Returns:
the number of attributes selected
Throws:
java.lang.Exception

selectedAttributes

public int[] selectedAttributes()
                         throws java.lang.Exception
get the final selected set of attributes.

Returns:
an array of attribute indexes
Throws:
java.lang.Exception - if attribute selection has not been performed yet

rankedAttributes

public double[][] rankedAttributes()
                            throws java.lang.Exception
get the final ranking of the attributes.

Returns:
a two dimensional array of ranked attribute indexes and their associated merit scores as doubles.
Throws:
java.lang.Exception - if a ranking has not been produced
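
The returned array can be read one row per ranked attribute. A minimal sketch, assuming each row stores the attribute index in column 0 and its merit in column 1, with rows ordered best first. That layout is an assumption: the description above only says that both values are held as doubles. RankingSketch and describe are hypothetical names:

```java
import java.util.Locale;

// Hypothetical helper, not part of Weka.
class RankingSketch {
    // Assumption: ranking[i][0] is the attribute index, ranking[i][1] its merit,
    // and rows are already ordered from best to worst.
    static String describe(double[][] ranking) {
        StringBuilder sb = new StringBuilder();
        for (int rank = 0; rank < ranking.length; rank++) {
            int attIndex = (int) ranking[rank][0]; // the index is stored as a double
            double merit = ranking[rank][1];
            sb.append(String.format(Locale.ROOT, "%d. attribute %d (merit %.3f)\n",
                    rank + 1, attIndex, merit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] ranking = { {2, 0.9}, {0, 0.5} };
        System.out.print(describe(ranking));
    }
}
```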

setEvaluator

public void setEvaluator(ASEvaluation evaluator)
set the attribute/subset evaluator

Parameters:
evaluator - the evaluator to use

setSearch

public void setSearch(ASSearch search)
set the search method

Parameters:
search - the search method to use

setFolds

public void setFolds(int folds)
set the number of folds for cross validation

Parameters:
folds - the number of folds

setRanking

public void setRanking(boolean r)
produce a ranking (if possible with the set search and evaluator)

Parameters:
r - true if a ranking is to be produced

setXval

public void setXval(boolean x)
do a cross validation

Parameters:
x - true if a cross validation is to be performed

setSeed

public void setSeed(int s)
set the seed for use in cross validation

Parameters:
s - the seed
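
The seed matters because the instances are shuffled before being divided into folds, so a fixed seed gives repeatable folds. A minimal sketch of that idea using only the Java standard library. FoldSketch is a hypothetical name, and the round-robin assignment shown here is an illustrative choice, not necessarily how this class partitions the data:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Weka.
class FoldSketch {
    // Returns fold assignments: result.get(f) lists the row indexes placed in fold f.
    static List<List<Integer>> folds(int numRows, int numFolds, int seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed)); // same seed, same order
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < order.size(); i++) {
            result.get(i % numFolds).add(order.get(i)); // deal rows out round-robin
        }
        return result;
    }

    public static void main(String[] args) {
        // Two calls with the same seed produce identical folds.
        System.out.println(folds(10, 3, 1).equals(folds(10, 3, 1)));
    }
}
```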

setThreshold

public void setThreshold(double t)
set the threshold by which to select features from a ranked list

Parameters:
t - the threshold
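
A threshold turns a ranked list into a selected set by cutting the list at a merit value. A minimal sketch, assuming rows of {attribute index, merit} and that attributes with merit at or above the threshold are kept. Whether the real comparison is inclusive is not stated here, so treat that as an assumption. ThresholdSketch is a hypothetical name:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Weka.
class ThresholdSketch {
    // Assumption: ranked[i] = {attributeIndex, merit}; attributes whose merit
    // is at or above the threshold are kept, in their ranked order.
    static int[] select(double[][] ranked, double threshold) {
        return Arrays.stream(ranked)
                .filter(row -> row[1] >= threshold)
                .mapToInt(row -> (int) row[0])
                .toArray();
    }

    public static void main(String[] args) {
        double[][] ranked = { {3, 0.8}, {1, 0.4}, {0, 0.1} };
        System.out.println(Arrays.toString(select(ranked, 0.4)));
    }
}
```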

toResultsString

public java.lang.String toResultsString()
get a description of the attribute selection

Returns:
a String describing the results of attribute selection

reduceDimensionality

public Instances reduceDimensionality(Instances in)
                               throws java.lang.Exception
reduce the dimensionality of a set of instances to include only those attributes chosen by the last run of attribute selection.

Parameters:
in - the instances to be reduced
Returns:
a dimensionality reduced set of instances
Throws:
java.lang.Exception - if the instances can't be reduced

reduceDimensionality

public Instance reduceDimensionality(Instance in)
                              throws java.lang.Exception
reduce the dimensionality of a single instance to include only those attributes chosen by the last run of attribute selection.

Parameters:
in - the instance to be reduced
Returns:
a dimensionality reduced instance
Throws:
java.lang.Exception - if the instance can't be reduced
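
Conceptually, reducing dimensionality means keeping only the values at the selected attribute indexes. A minimal sketch on a plain double[] row rather than an Instance. ProjectionSketch is a hypothetical name, and the class itself works through its attribute filter rather than through code like this:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Weka.
class ProjectionSketch {
    // Keeps only the values at the selected indexes, in the order given.
    static double[] project(double[] row, int[] selected) {
        double[] out = new double[selected.length];
        for (int i = 0; i < selected.length; i++) {
            out[i] = row[selected[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] row = {1.0, 2.0, 3.0, 4.0};
        int[] selected = {0, 3};
        System.out.println(Arrays.toString(project(row, selected)));
    }
}
```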

SelectAttributes

public static java.lang.String SelectAttributes(ASEvaluation ASEvaluator,
                                                java.lang.String[] options)
                                         throws java.lang.Exception
Perform attribute selection with a particular evaluator and a set of options specifying search method and input file etc.

Parameters:
ASEvaluator - an evaluator object
options - an array of options, not only for the evaluator but also the search method (if any) and an input data file
Returns:
the results of attribute selection as a String
Throws:
java.lang.Exception - if no training file is set

CVResultsString

public java.lang.String CVResultsString()
                                 throws java.lang.Exception
returns a string summarizing the results of repeated attribute selection runs on splits of a dataset.

Returns:
a summary of attribute selection results
Throws:
java.lang.Exception - if no attribute selection has been performed.

selectAttributesCVSplit

public void selectAttributesCVSplit(Instances split)
                             throws java.lang.Exception
Select attributes for a split of the data. Calling this function updates the statistics on attribute selection. CVResultsString() returns a string summarizing the results of repeated calls to this function. Assumes that all splits come from the same dataset, i.e. each has the same number and types of attributes as previous splits.

Parameters:
split - the instances to select attributes from
Throws:
java.lang.Exception - if an error occurs
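
The accumulate-then-summarize pattern described above can be sketched as a small counter that is updated once per split. SplitStatsSketch and its methods are hypothetical names, and the summary format is invented for illustration. It is not the output of CVResultsString():

```java
// Hypothetical helper, not part of Weka.
class SplitStatsSketch {
    private final int[] timesSelected; // per-attribute selection counts
    private int trials = 0;            // number of splits seen so far

    SplitStatsSketch(int numAttributes) {
        timesSelected = new int[numAttributes];
    }

    // Record the attribute indexes chosen on one split.
    void addSplit(int[] selected) {
        for (int index : selected) {
            timesSelected[index]++;
        }
        trials++;
    }

    // One line per attribute: how many of the splits selected it.
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < timesSelected.length; i++) {
            sb.append("attribute ").append(i).append(": ")
              .append(timesSelected[i]).append(" of ").append(trials)
              .append(" splits\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SplitStatsSketch stats = new SplitStatsSketch(3);
        stats.addSplit(new int[] {0, 2});
        stats.addSplit(new int[] {2});
        System.out.print(stats.summary());
    }
}
```

The counter is sized once from the attribute count, which is why every split must have the same attributes.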

CrossValidateAttributes

public java.lang.String CrossValidateAttributes()
                                         throws java.lang.Exception
Perform a cross validation for attribute selection. With subset evaluators, the number of times each attribute is selected over the cross validation is reported. For attribute evaluators, the average merit and average ranking, plus standard deviation, are reported for each attribute.

Returns:
the results of cross validation as a String
Throws:
java.lang.Exception - if an error occurs during cross validation
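
For attribute evaluators, the per-attribute figures are averages over the folds. A minimal sketch of computing a mean and a standard deviation from one attribute's per-fold values. FoldAverageSketch is a hypothetical name, and the population form of the standard deviation (dividing by the number of folds) is an assumption. The class may use a different estimator:

```java
// Hypothetical helper, not part of Weka.
class FoldAverageSketch {
    // Returns {mean, population standard deviation} of the per-fold values.
    static double[] meanAndStd(double[] perFold) {
        double sum = 0.0;
        for (double v : perFold) {
            sum += v;
        }
        double mean = sum / perFold.length;
        double squaredDeviations = 0.0;
        for (double v : perFold) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        return new double[] {mean, Math.sqrt(squaredDeviations / perFold.length)};
    }

    public static void main(String[] args) {
        // Merit of one attribute as measured on each of eight folds.
        double[] merits = {2, 4, 4, 4, 5, 5, 7, 9};
        double[] result = meanAndStd(merits);
        System.out.println("mean=" + result[0] + " std=" + result[1]);
    }
}
```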

SelectAttributes

public void SelectAttributes(Instances data)
                      throws java.lang.Exception
Perform attribute selection on the supplied training instances.

Parameters:
data - the instances to select attributes from
Throws:
java.lang.Exception - if there is a problem during selection
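
Attribute selection pairs an evaluator, which scores candidates, with a search method, which decides what to try next. A minimal sketch of that division of labour, using greedy forward selection and a toy merit function. Both are stand-ins chosen for brevity: this is neither BestFirst nor CfsSubsetEval, and ForwardSearchSketch is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of Weka.
class ForwardSearchSketch {
    // Greedy forward selection: repeatedly add the attribute that most improves
    // the merit reported by the evaluator; stop when no addition improves it.
    static List<Integer> search(int numAttributes,
                                ToDoubleFunction<List<Integer>> evaluator) {
        List<Integer> selected = new ArrayList<>();
        double best = evaluator.applyAsDouble(selected);
        boolean improved = true;
        while (improved) {
            improved = false;
            int bestAtt = -1;
            for (int a = 0; a < numAttributes; a++) {
                if (selected.contains(a)) {
                    continue;
                }
                selected.add(a);                      // try the candidate
                double merit = evaluator.applyAsDouble(selected);
                selected.remove(selected.size() - 1); // undo (removes by position)
                if (merit > best) {
                    best = merit;
                    bestAtt = a;
                    improved = true;
                }
            }
            if (improved) {
                selected.add(bestAtt);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Toy evaluator: attributes 1 and 3 are useful; every attribute costs 0.1.
        ToDoubleFunction<List<Integer>> toy = subset -> {
            double merit = 0.0;
            for (int a : subset) {
                if (a == 1 || a == 3) {
                    merit += 1.0;
                }
            }
            return merit - 0.1 * subset.size();
        };
        System.out.println(search(4, toy));
    }
}
```

Keeping the evaluator behind a single scoring function means the search loop never needs to know how merit is computed, which is the same separation that setEvaluator and setSearch expose.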

SelectAttributes

public static java.lang.String SelectAttributes(ASEvaluation ASEvaluator,
                                                java.lang.String[] options,
                                                Instances train)
                                         throws java.lang.Exception
Perform attribute selection with a particular evaluator and a set of options specifying search method and options for the search method and evaluator.

Parameters:
ASEvaluator - an evaluator object
options - an array of options, not only for the evaluator but also the search method (if any) and an input data file
train - the input instances
Returns:
the results of attribute selection as a String
Throws:
java.lang.Exception - if incorrect options are supplied

printSelectionResults

private java.lang.String printSelectionResults()
Assembles a text description of the attribute selection results.

Returns:
a string describing the results of attribute selection.

makeOptionString

private static java.lang.String makeOptionString(ASEvaluation ASEvaluator,
                                                 ASSearch searchMethod)
                                          throws java.lang.Exception
Make up the help string giving all the command line options

Parameters:
ASEvaluator - the attribute evaluator to include options for
searchMethod - the search method to include options for
Returns:
a string detailing the valid command line options
Throws:
java.lang.Exception

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - the options