weka.attributeSelection
Class ConsistencySubsetEval

java.lang.Object
  extended by weka.attributeSelection.ASEvaluation
      extended by weka.attributeSelection.SubsetEvaluator
          extended by weka.attributeSelection.ConsistencySubsetEval
All Implemented Interfaces:
java.io.Serializable

public class ConsistencySubsetEval
extends SubsetEvaluator

Consistency attribute subset evaluator.

For more information see:
Liu, H., and Setiono, R., (1996). A probabilistic approach to feature selection - A filter solution. In 13th International Conference on Machine Learning (ICML'96), July 1996, pp. 319-327. Bari, Italy.

Version:
$Revision: 1.10 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Serialized Form

Nested Class Summary
 class ConsistencySubsetEval.hashKey
          Class providing keys to the hash table.
 
Field Summary
private  int m_classIndex
          class index
private  Discretize m_disTransform
          Filter used to discretise numeric attributes
private  int m_numAttribs
          number of attributes in the training data
private  int m_numInstances
          number of instances in the training data
private  java.util.Hashtable m_table
          Hash table for evaluating feature subsets
private  Instances m_trainInstances
          training instances
 
Constructor Summary
ConsistencySubsetEval()
          Constructor.
 
Method Summary
 void buildEvaluator(Instances data)
          Generates an attribute evaluator.
private  double consistencyCount()
          Calculates the level of consistency in a dataset using a subset of features.
 double evaluateSubset(java.util.BitSet subset)
          Evaluates a subset of attributes
 java.lang.String globalInfo()
          Returns a string describing this attribute subset evaluator
private  void insertIntoTable(Instance inst, double[] instA)
          Inserts an instance into the hash table
static void main(java.lang.String[] args)
          Main method for testing this class.
private  void resetOptions()
          Resets options to their defaults
 java.lang.String toString()
          Returns a description of the evaluator
 
Methods inherited from class weka.attributeSelection.ASEvaluation
forName, makeCopies, postProcess
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_trainInstances

private Instances m_trainInstances
training instances


m_classIndex

private int m_classIndex
class index


m_numAttribs

private int m_numAttribs
number of attributes in the training data


m_numInstances

private int m_numInstances
number of instances in the training data


m_disTransform

private Discretize m_disTransform
Filter used to discretise numeric attributes


m_table

private java.util.Hashtable m_table
Hash table for evaluating feature subsets

Constructor Detail

ConsistencySubsetEval

public ConsistencySubsetEval()
Constructor. Calls resetOptions to set default options.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this attribute subset evaluator

Returns:
a description of the evaluator suitable for displaying in the explorer/experimenter gui

resetOptions

private void resetOptions()
Resets options to their defaults


buildEvaluator

public void buildEvaluator(Instances data)
                    throws java.lang.Exception
Generates an attribute evaluator. Has to initialize all fields of the evaluator that are not being set via options.

Specified by:
buildEvaluator in class ASEvaluation
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the evaluator has not been generated successfully

evaluateSubset

public double evaluateSubset(java.util.BitSet subset)
                      throws java.lang.Exception
Evaluates a subset of attributes

Specified by:
evaluateSubset in class SubsetEvaluator
Parameters:
subset - a bitset representing the attribute subset to be evaluated
Returns:
the "merit" of the subset
Throws:
java.lang.Exception - if the subset could not be evaluated
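
As a rough illustration of the subset parameter, the sketch below assumes the usual convention that bit i of the BitSet is set when attribute i is included in the subset. The class SubsetProjectionSketch and its project method are hypothetical names introduced here and are not part of Weka; they only show how such a bitset could select attribute values from an instance held as a double array, skipping the class attribute.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper, not part of Weka. Assumption: bit i set in the BitSet
// means attribute i belongs to the subset being evaluated.
final class SubsetProjectionSketch {

    /** Returns the values of the selected attributes, leaving out the class attribute. */
    static double[] project(double[] instance, BitSet subset, int classIndex) {
        double[] out = new double[instance.length];
        int n = 0;
        // Walk over the set bits only; stop at bits beyond the instance length.
        for (int i = subset.nextSetBit(0);
             i >= 0 && i < instance.length;
             i = subset.nextSetBit(i + 1)) {
            if (i != classIndex) {
                out[n++] = instance[i];
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

A projected array of this kind is the sort of value that could then be hashed to group instances sharing the same attribute pattern.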

consistencyCount

private double consistencyCount()
Calculates the level of consistency in a dataset using a subset of features. The inconsistency count of a hash table entry is the total number of instances hashed to that location minus the number of instances in the largest class hashed to that location. The total consistency is 1.0 minus the ratio of the summed inconsistency counts to the total number of instances.

Returns:
the consistency of the hash table as a value between 0 and 1.
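
The calculation described above can be sketched in plain Java as follows. This is a minimal illustration and not the Weka source: the class ConsistencySketch, its consistency method, and the use of a string key built with Arrays.toString are assumptions made for the example, and the attribute values are assumed to be already discretised into integers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Weka implementation.
final class ConsistencySketch {

    /**
     * rows[i]   : discretised values of the selected attributes for instance i
     * labels[i] : class index of instance i, in the range [0, numClasses)
     */
    static double consistency(int[][] rows, int[] labels, int numClasses) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("no instances");
        }
        // One table entry per distinct attribute pattern, holding class counts.
        Map<String, int[]> table = new HashMap<>();
        for (int i = 0; i < rows.length; i++) {
            int[] counts = table.computeIfAbsent(
                    Arrays.toString(rows[i]), k -> new int[numClasses]);
            counts[labels[i]]++;
        }
        // Per entry: instances in the entry minus those in its largest class.
        int inconsistent = 0;
        for (int[] counts : table.values()) {
            int total = 0;
            int max = 0;
            for (int c : counts) {
                total += c;
                max = Math.max(max, c);
            }
            inconsistent += total - max;
        }
        return 1.0 - (double) inconsistent / rows.length;
    }
}
```

For example, with four instances whose patterns are (0,1), (0,1), (0,1), (1,0) and class labels 0, 0, 1, 1, the first pattern contributes 3 - 2 = 1 and the second contributes 0, giving a consistency of 1.0 - 1/4 = 0.75.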

insertIntoTable

private void insertIntoTable(Instance inst,
                             double[] instA)
                      throws java.lang.Exception
Inserts an instance into the hash table

Parameters:
inst - instance to be inserted
instA - the instance to be inserted as an array of attribute values.
Throws:
java.lang.Exception - if the instance can't be inserted
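
Java arrays use identity-based equals and hashCode, so an array of attribute values cannot serve directly as a hash table key. The sketch below shows one way a key wrapper and an insert step could look. ArrayKey and TableInsertSketch are hypothetical names, and the layout of the stored value (one count per class) is an assumption; the actual ConsistencySubsetEval.hashKey implementation is not shown on this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key wrapper: compares and hashes by array contents.
final class ArrayKey {
    private final double[] values;

    ArrayKey(double[] values) {
        this.values = values.clone(); // defensive copy so the key stays stable
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ArrayKey && Arrays.equals(values, ((ArrayKey) o).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}

// Hypothetical insert step: keeps a per-class count for each attribute pattern.
final class TableInsertSketch {
    static void insert(Map<ArrayKey, double[]> table,
                       double[] attrValues, int classValue, int numClasses) {
        double[] dist = table.computeIfAbsent(
                new ArrayKey(attrValues), k -> new double[numClasses]);
        dist[classValue] += 1.0;
    }
}
```

Two instances with equal attribute values then land in the same entry even when their arrays are distinct objects.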

toString

public java.lang.String toString()
Returns a description of the evaluator

Returns:
a description of the evaluator as a String.

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - the options