weka.attributeSelection
Class ReliefFAttributeEval

java.lang.Object
  extended by weka.attributeSelection.ASEvaluation
      extended by weka.attributeSelection.AttributeEvaluator
          extended by weka.attributeSelection.ReliefFAttributeEval
All Implemented Interfaces:
OptionHandler, java.io.Serializable

public class ReliefFAttributeEval
extends AttributeEvaluator
implements OptionHandler

Class for evaluating attributes individually using ReliefF.

For more information see:

Kira, K. and Rendell, L. A. (1992). A practical approach to feature selection. In D. Sleeman and P. Edwards, editors, Proceedings of the International Conference on Machine Learning, pages 249-256. Morgan Kaufmann.

Kononenko, I. (1994). Estimating attributes: analysis and extensions of Relief. In De Raedt, L. and Bergadano, F., editors, Machine Learning: ECML-94, pages 171-182. Springer Verlag.

Robnik-Sikonja, M. and Kononenko, I. (1997). An adaptation of Relief for attribute estimation on regression. In D. Fisher, editor, Machine Learning: Proceedings of the 14th International Conference on Machine Learning (ICML'97), Nashville, TN.

Valid options are:

-M
Specify the number of instances to sample when estimating attributes.
If not specified then all instances will be used.

-D
Seed for randomly sampling instances.

-K
Number of nearest neighbours to use for estimating attributes.
(Default is 10).

-W
Weight nearest neighbours by distance.

-A
Specify sigma value (used in an exp function to control how quickly
weights decrease for more distant instances). Use in conjunction with
-W. Sensible values are 1/5 to 1/10 of the number of nearest neighbours.
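
As a rough illustration of how -M and -D might interact, the sketch below draws a fixed number of instance indices with a seeded java.util.Random. The class name SampleSketch, the method sampleIndices, and the choice to sample with replacement are assumptions made for illustration; they are not taken from the Weka source.

```java
import java.util.Arrays;
import java.util.Random;

class SampleSketch {
    /**
     * Returns the indices of the instances to visit.
     * A sampleSize that is non-positive (e.g. the -1 default) or not smaller
     * than numInstances is treated as "use every instance" (assumed convention).
     */
    static int[] sampleIndices(int numInstances, int sampleSize, long seed) {
        if (sampleSize <= 0 || sampleSize >= numInstances) {
            int[] all = new int[numInstances];
            for (int i = 0; i < numInstances; i++) {
                all[i] = i;
            }
            return all;
        }
        Random random = new Random(seed);
        int[] picked = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            // Sampling with replacement is an assumption of this sketch.
            picked[i] = random.nextInt(numInstances);
        }
        return picked;
    }

    public static void main(String[] args) {
        // The same seed always yields the same sample.
        System.out.println(Arrays.toString(sampleIndices(150, 5, 1L)));
    }
}
```

Fixing the seed makes the attribute ranking reproducible from run to run when only a subset of the instances is sampled.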

Version:
$Revision: 1.15 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
private  int m_classIndex
          The class index
private  double[] m_classProbs
          Prior class probabilities (discrete class case)
private  int[] m_index
          Index in the m_karray of the farthest instance for each class
private  double[][][] m_karray
          k nearest scores + instance indexes for n classes
private  int m_Knn
          The number of nearest hits/misses
private  double[] m_maxArray
          Upper bound for numeric attributes
private  double[] m_minArray
          Lower bound for numeric attributes
private  double[] m_nda
          Used to hold the prob of different value of an attribute given nearest instances (numeric class case)
private  double m_ndc
          Used to hold the probability of a different class val given nearest instances (numeric class)
private  double[] m_ndcda
          Used to hold the prob of a different class val and different att val given nearest instances (numeric class case)
private  int m_numAttribs
          The number of attributes
private  int m_numClasses
          The number of classes if class is nominal
private  boolean m_numericClass
          Numeric class
private  int m_numInstances
          The number of instances
private  int m_sampleM
          The number of instances to sample when estimating attributes. Default is -1 (use all instances).
private  int m_seed
          Random number seed used for sampling instances
private  int m_sigma
           
private  int[] m_stored
          Number of nearest neighbours stored of each class
private  Instances m_trainInstances
          The training instances
private  boolean m_weightByDistance
          Weight by distance rather than equal weights
private  double[] m_weights
          Holds the weights that relief assigns to attributes
private  double[] m_weightsByRank
          Used to (optionally) weight nearest neighbours by their distance from the instance in question.
private  double[] m_worst
          Keep track of the farthest instance for each class
 
Constructor Summary
ReliefFAttributeEval()
          Constructor
 
Method Summary
 void buildEvaluator(Instances data)
          Initializes a ReliefF attribute evaluator.
private  double difference(int index, double val1, double val2)
          Computes the difference between two given attribute values.
private  double distance(Instance first, Instance second)
          Calculates the distance between two instances
 double evaluateAttribute(int attribute)
          Evaluates an individual attribute using ReliefF's instance based approach.
private  void findKHitMiss(int instNum)
          Find the K nearest instances to supplied instance if the class is numeric, or the K nearest Hits (same class) and Misses (K from each of the other classes) if the class is discrete.
 int getNumNeighbours()
          Get the number of nearest neighbours
 java.lang.String[] getOptions()
          Gets the current settings of ReliefFAttributeEval.
 int getSampleSize()
          Get the number of instances used for estimating attributes
 int getSeed()
          Get the seed used for randomly sampling instances.
 int getSigma()
          Get the value of sigma.
 boolean getWeightByDistance()
          Get whether nearest neighbours are being weighted by distance
 java.lang.String globalInfo()
          Returns a string describing this attribute evaluator
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for testing this class.
private  double norm(double x, int i)
          Normalizes a given value of a numeric attribute.
 java.lang.String numNeighboursTipText()
          Returns the tip text for this property
protected  void resetOptions()
          Reset options to their default values
 java.lang.String sampleSizeTipText()
          Returns the tip text for this property
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setNumNeighbours(int n)
          Set the number of nearest neighbours
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSampleSize(int s)
          Set the number of instances to sample for attribute estimation
 void setSeed(int s)
          Set the random number seed for randomly sampling instances.
 void setSigma(int s)
          Sets the sigma value.
 void setWeightByDistance(boolean b)
          Set the nearest neighbour weighting method
 java.lang.String sigmaTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Return a description of the ReliefF attribute evaluator.
private  void updateMinMax(Instance instance)
          Updates the minimum and maximum values for all the attributes based on a new instance.
private  void updateWeightsDiscreteClass(int instNum)
          Update attribute weights given an instance when the class is discrete
private  void updateWeightsNumericClass(int instNum)
          Update attribute weights given an instance when the class is numeric
 java.lang.String weightByDistanceTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.attributeSelection.ASEvaluation
forName, makeCopies, postProcess
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_trainInstances

private Instances m_trainInstances
The training instances


m_classIndex

private int m_classIndex
The class index


m_numAttribs

private int m_numAttribs
The number of attributes


m_numInstances

private int m_numInstances
The number of instances


m_numericClass

private boolean m_numericClass
Numeric class


m_numClasses

private int m_numClasses
The number of classes if class is nominal


m_ndc

private double m_ndc
Used to hold the probability of a different class val given nearest instances (numeric class)


m_nda

private double[] m_nda
Used to hold the prob of different value of an attribute given nearest instances (numeric class case)


m_ndcda

private double[] m_ndcda
Used to hold the prob of a different class val and different att val given nearest instances (numeric class case)


m_weights

private double[] m_weights
Holds the weights that relief assigns to attributes


m_classProbs

private double[] m_classProbs
Prior class probabilities (discrete class case)


m_sampleM

private int m_sampleM
The number of instances to sample when estimating attributes. Default is -1 (use all instances).


m_Knn

private int m_Knn
The number of nearest hits/misses


m_karray

private double[][][] m_karray
k nearest scores + instance indexes for n classes


m_maxArray

private double[] m_maxArray
Upper bound for numeric attributes


m_minArray

private double[] m_minArray
Lower bound for numeric attributes


m_worst

private double[] m_worst
Keep track of the farthest instance for each class


m_index

private int[] m_index
Index in the m_karray of the farthest instance for each class


m_stored

private int[] m_stored
Number of nearest neighbours stored of each class


m_seed

private int m_seed
Random number seed used for sampling instances


m_weightsByRank

private double[] m_weightsByRank
Used to (optionally) weight nearest neighbours by their distance from the instance in question. Each entry holds exp(-((rank(r_i, i_j)/sigma)^2)), where rank(r_i, i_j) is the rank of instance i_j in a sequence of instances ordered by their distance from r_i. sigma is a user-defined parameter (default = 20).
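
The rank-based weighting formula above can be sketched as follows. This is a minimal illustration, not the class's actual code: the names RankWeightSketch and rankWeights are hypothetical, and starting ranks at 0 for the nearest neighbour is an assumption.

```java
class RankWeightSketch {
    /**
     * Computes weights[r] = exp(-((r / sigma)^2)) for ranks r = 0 .. k-1.
     * Rank 0 is taken to be the nearest neighbour (assumption).
     */
    static double[] rankWeights(int k, int sigma) {
        if (k <= 0 || sigma <= 0) {
            throw new IllegalArgumentException("k and sigma must be positive");
        }
        double[] weights = new double[k];
        for (int rank = 0; rank < k; rank++) {
            double scaled = rank / (double) sigma;
            weights[rank] = Math.exp(-(scaled * scaled));
        }
        return weights;
    }

    public static void main(String[] args) {
        // With k = 10 and sigma = 2 the weights fall off quickly with rank.
        for (double w : rankWeights(10, 2)) {
            System.out.println(w);
        }
    }
}
```

A smaller sigma makes the weights decay faster, so only the closest few neighbours contribute noticeably.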


m_sigma

private int m_sigma

m_weightByDistance

private boolean m_weightByDistance
Weight by distance rather than equal weights

Constructor Detail

ReliefFAttributeEval

public ReliefFAttributeEval()
Constructor

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this attribute evaluator

Returns:
a description of the evaluator suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-M
Specify the number of instances to sample when estimating attributes.
If not specified then all instances will be used.

-D
Seed for randomly sampling instances.

-K
Number of nearest neighbours to use for estimating attributes.
(Default is 10).

-W
Weight nearest neighbours by distance.

-A
Specify sigma value (used in an exp function to control how quickly
weights decrease for more distant instances). Use in conjunction with
-W. Sensible values are 1/5 to 1/10 of the number of nearest neighbours.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

sigmaTipText

public java.lang.String sigmaTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSigma

public void setSigma(int s)
              throws java.lang.Exception
Sets the sigma value.

Parameters:
s - the value of sigma (> 0)
Throws:
java.lang.Exception - if s is not positive

getSigma

public int getSigma()
Get the value of sigma.

Returns:
the sigma value.

numNeighboursTipText

public java.lang.String numNeighboursTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumNeighbours

public void setNumNeighbours(int n)
Set the number of nearest neighbours

Parameters:
n - the number of nearest neighbours.

getNumNeighbours

public int getNumNeighbours()
Get the number of nearest neighbours

Returns:
the number of nearest neighbours

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int s)
Set the random number seed for randomly sampling instances.

Parameters:
s - the random number seed.

getSeed

public int getSeed()
Get the seed used for randomly sampling instances.

Returns:
the random number seed.

sampleSizeTipText

public java.lang.String sampleSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSampleSize

public void setSampleSize(int s)
Set the number of instances to sample for attribute estimation

Parameters:
s - the number of instances to sample.

getSampleSize

public int getSampleSize()
Get the number of instances used for estimating attributes

Returns:
the number of instances.

weightByDistanceTipText

public java.lang.String weightByDistanceTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setWeightByDistance

public void setWeightByDistance(boolean b)
Set the nearest neighbour weighting method

Parameters:
b - true if nearest neighbours are to be weighted by distance.

getWeightByDistance

public boolean getWeightByDistance()
Get whether nearest neighbours are being weighted by distance

Returns:
true if nearest neighbours are being weighted by distance (the value of m_weightByDistance)

getOptions

public java.lang.String[] getOptions()
Gets the current settings of ReliefFAttributeEval.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

toString

public java.lang.String toString()
Return a description of the ReliefF attribute evaluator.

Returns:
a description of the evaluator as a String.

buildEvaluator

public void buildEvaluator(Instances data)
                    throws java.lang.Exception
Initializes a ReliefF attribute evaluator.

Specified by:
buildEvaluator in class ASEvaluation
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the evaluator has not been generated successfully

evaluateAttribute

public double evaluateAttribute(int attribute)
                         throws java.lang.Exception
Evaluates an individual attribute using ReliefF's instance-based approach. The actual work is done by buildEvaluator, which evaluates all features.

Specified by:
evaluateAttribute in class AttributeEvaluator
Parameters:
attribute - the index of the attribute to be evaluated
Returns:
the "merit" of the attribute
Throws:
java.lang.Exception - if the attribute could not be evaluated

resetOptions

protected void resetOptions()
Reset options to their default values


norm

private double norm(double x,
                    int i)
Normalizes a given value of a numeric attribute.

Parameters:
x - the value to be normalized
i - the attribute's index
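
A minimal sketch of min-max normalization, assuming the lower and upper bounds tracked for each numeric attribute are used to map a value into [0, 1]. The standalone signature below (bounds passed directly rather than looked up by attribute index) and the fallback of returning 0 when the range is empty or unknown are assumptions of this illustration.

```java
class NormSketch {
    /**
     * Maps x into [0, 1] relative to the observed range [min, max].
     * Returns 0 when the range is undefined or has zero width (assumption).
     */
    static double norm(double x, double min, double max) {
        if (Double.isNaN(min) || Double.isNaN(max) || max == min) {
            return 0.0;
        }
        return (x - min) / (max - min);
    }

    public static void main(String[] args) {
        System.out.println(norm(5.0, 0.0, 10.0));
    }
}
```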

updateMinMax

private void updateMinMax(Instance instance)
Updates the minimum and maximum values for all the attributes based on a new instance.

Parameters:
instance - the new instance

difference

private double difference(int index,
                          double val1,
                          double val2)
Computes the difference between two given attribute values.
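
One common way to define such a difference in Relief-style algorithms is 0/1 for nominal attributes and a range-normalized absolute difference for numeric ones. The sketch below follows that convention; the signature is hypothetical (it takes the attribute type and bounds directly instead of an attribute index), and missing values are deliberately not handled.

```java
class DifferenceSketch {
    /**
     * Difference between two values of one attribute, in [0, 1].
     * Nominal: 0 if equal, 1 otherwise.
     * Numeric: |val1 - val2| divided by the attribute's observed range.
     * Missing values are not handled in this sketch.
     */
    static double difference(boolean nominal, double val1, double val2,
                             double min, double max) {
        if (nominal) {
            return (val1 == val2) ? 0.0 : 1.0;
        }
        double range = max - min;
        if (Double.isNaN(range) || range == 0.0) {
            return 0.0; // no usable range: treat values as identical (assumption)
        }
        return Math.abs(val1 - val2) / range;
    }

    public static void main(String[] args) {
        System.out.println(difference(false, 2.0, 7.0, 0.0, 10.0));
        System.out.println(difference(true, 1.0, 2.0, 0.0, 0.0));
    }
}
```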


distance

private double distance(Instance first,
                        Instance second)
Calculates the distance between two instances

Returns:
the distance between the two given instances, between 0 and 1
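
A rough sketch of a distance built from per-attribute differences. To keep the result between 0 and 1, this illustration averages the differences over the non-class attributes; that averaging, the array-based representation of instances, and the name DistanceSketch are all assumptions rather than the documented behaviour.

```java
class DistanceSketch {
    /**
     * Averages per-attribute differences (each in [0, 1]) over all
     * attributes except the class attribute.
     */
    static double distance(double[] first, double[] second, boolean[] nominal,
                           double[] min, double[] max, int classIndex) {
        double sum = 0.0;
        int counted = 0;
        for (int a = 0; a < first.length; a++) {
            if (a == classIndex) {
                continue; // the class value never contributes to the distance
            }
            double diff;
            if (nominal[a]) {
                diff = (first[a] == second[a]) ? 0.0 : 1.0;
            } else {
                double range = max[a] - min[a];
                diff = (range > 0.0) ? Math.abs(first[a] - second[a]) / range : 0.0;
            }
            sum += diff;
            counted++;
        }
        return (counted == 0) ? 0.0 : sum / counted;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 1.0, 0.0};
        double[] y = {10.0, 1.0, 1.0};
        boolean[] nominal = {false, true, true};
        double[] min = {0.0, 0.0, 0.0};
        double[] max = {10.0, 0.0, 0.0};
        System.out.println(distance(x, y, nominal, min, max, 2));
    }
}
```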

updateWeightsNumericClass

private void updateWeightsNumericClass(int instNum)
Update attribute weights given an instance when the class is numeric

Parameters:
instNum - the index of the instance to use when updating weights
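
For a numeric class, the accumulators described for m_ndc, m_nda and m_ndcda can be combined into final attribute weights. The sketch below assumes the RReliefF combination from the 1997 regression paper cited above, W[a] = ndcda[a] / ndc - (nda[a] - ndcda[a]) / (m - ndc), where m is the number of instances processed. The class and method names are hypothetical, and whether this class applies exactly this expression is not stated in this documentation.

```java
class NumericClassWeightSketch {
    /**
     * Combines the accumulated estimates into one weight per attribute.
     * ndc   - accumulated "different class value" estimate
     * nda   - accumulated "different attribute value" estimate, per attribute
     * ndcda - accumulated "different class and different attribute value" estimate
     * m     - number of instances processed (assumed meaning)
     */
    static double[] finalWeights(double ndc, double[] nda, double[] ndcda, int m) {
        double[] weights = new double[nda.length];
        for (int a = 0; a < nda.length; a++) {
            weights[a] = ndcda[a] / ndc - (nda[a] - ndcda[a]) / (m - ndc);
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] w = finalWeights(2.0, new double[] {3.0}, new double[] {1.5}, 10);
        System.out.println(w[0]);
    }
}
```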

updateWeightsDiscreteClass

private void updateWeightsDiscreteClass(int instNum)
Update attribute weights given an instance when the class is discrete

Parameters:
instNum - the index of the instance to use when updating weights
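
A minimal sketch of the ReliefF update for a discrete class, following the general scheme in the Kononenko (1994) paper cited above: differences to nearest hits lower a weight, and differences to nearest misses raise it, with each miss class scaled by its prior probability relative to the other miss classes. The data layout (precomputed difference arrays), the names, and the absence of distance weighting are assumptions of this illustration.

```java
class DiscreteUpdateSketch {
    /**
     * Updates weights in place for one sampled instance.
     * diffHits[j][a]      - difference on attribute a to the j-th nearest hit
     * diffMisses[c][j][a] - difference on attribute a to the j-th nearest miss
     *                       of class c (the entry for ownClass is ignored)
     * classProbs[c]       - prior probability of class c
     * m                   - number of sampled instances
     */
    static void update(double[] weights, double[][] diffHits, double[][][] diffMisses,
                       double[] classProbs, int ownClass, int m) {
        for (double[] hit : diffHits) {
            for (int a = 0; a < weights.length; a++) {
                weights[a] -= hit[a] / (m * (double) diffHits.length);
            }
        }
        double otherClassMass = 1.0 - classProbs[ownClass];
        for (int c = 0; c < diffMisses.length; c++) {
            double[][] misses = diffMisses[c];
            if (c == ownClass || misses == null || misses.length == 0) {
                continue;
            }
            double prior = classProbs[c] / otherClassMass;
            for (double[] miss : misses) {
                for (int a = 0; a < weights.length; a++) {
                    weights[a] += prior * miss[a] / (m * (double) misses.length);
                }
            }
        }
    }

    public static void main(String[] args) {
        double[] weights = {0.0};
        double[][] hits = {{0.0}, {0.0}};
        double[][][] misses = {null, {{1.0}, {1.0}}};
        update(weights, hits, misses, new double[] {0.5, 0.5}, 0, 1);
        System.out.println(weights[0]);
    }
}
```

An attribute that never differs among hits but always differs among misses ends up with the highest possible weight, which is the intuition behind the method.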

findKHitMiss

private void findKHitMiss(int instNum)
Find the K nearest instances to supplied instance if the class is numeric, or the K nearest Hits (same class) and Misses (K from each of the other classes) if the class is discrete.

Parameters:
instNum - the index of the instance to find nearest neighbours of
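
The discrete-class case described above can be sketched as a per-class nearest-neighbour search. This illustration takes precomputed distances and class labels as plain arrays and sorts the candidates of each class; the names are hypothetical, and a real implementation would more likely maintain bounded per-class buffers (as the m_karray, m_worst, m_index and m_stored fields suggest) than sort everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class KNearestSketch {
    /**
     * For each class, returns the indices of up to k instances of that class
     * that are nearest to instance `self`. The row for self's own class holds
     * the nearest hits; every other row holds nearest misses.
     */
    static int[][] nearestPerClass(double[] distances, int[] classes,
                                   int numClasses, int self, int k) {
        List<List<Integer>> byClass = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) {
            byClass.add(new ArrayList<>());
        }
        for (int i = 0; i < distances.length; i++) {
            if (i != self) {
                byClass.get(classes[i]).add(i);
            }
        }
        int[][] result = new int[numClasses][];
        for (int c = 0; c < numClasses; c++) {
            List<Integer> candidates = byClass.get(c);
            candidates.sort(Comparator.comparingDouble((Integer i) -> distances[i]));
            int n = Math.min(k, candidates.size());
            result[c] = new int[n];
            for (int j = 0; j < n; j++) {
                result[c][j] = candidates.get(j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] distances = {0.0, 0.3, 0.1, 0.5, 0.2};
        int[] classes = {0, 0, 1, 1, 0};
        System.out.println(Arrays.deepToString(nearestPerClass(distances, classes, 2, 0, 1)));
    }
}
```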

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - the options