weka.classifiers.functions
Class LeastMedSq

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.functions.LeastMedSq
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable

public class LeastMedSq
extends Classifier
implements OptionHandler

Implements least median of squares linear regression, using the existing Weka LinearRegression class to form predictions. The algorithm is based on: Peter J. Rousseeuw and Annick M. Leroy, Robust Regression and Outlier Detection, 1987.

Version:
$Revision: 1.9 $
Author:
Tony Voyle (tv6@waikato.ac.nz)
See Also:
Serialized Form

Field Summary
private  double m_bestMedian
           
private  LinearRegression m_bestRegression
           
private  LinearRegression m_currentRegression
           
private  Instances m_Data
           
private  boolean m_debug
           
private  boolean m_israndom
           
private  LinearRegression m_ls
           
private  ReplaceMissingValues m_MissingFilter
           
private  java.util.Random m_random
           
private  long m_randomseed
           
private  double[] m_Residuals
           
private  Instances m_RLSData
           
private  int m_samples
           
private  int m_samplesize
           
private  double m_scalefactor
           
private  RemoveRange m_SplitFilter
           
private  double m_SSR
           
private  Instances m_SubSample
           
private  NominalToBinary m_TransformFilter
           
private  double[] m_weight
           
 
Fields inherited from class weka.classifiers.Classifier
m_Debug
 
Constructor Summary
LeastMedSq()
           
 
Method Summary
 void buildClassifier(Instances data)
          Builds the least median of squares (LMS) regression
private  void buildRLSRegression()
          Builds a new LinearRegression without the 'bad' data found by buildWeight
private  void buildWeight()
          Builds a weight function removing instances with an abnormally high scaled residual
 double classifyInstance(Instance instance)
          Classify a given instance using the best generated LinearRegression Classifier.
private  void cleanUpData(Instances data)
          Cleans up data
static int combinations(int n, int r)
          Produces the combination nCr
private  void findBestRegression()
          Finds the best regression generated from m_samples random samples from the training data
private  void findResiduals()
          Finds residuals (squared) for the current regression.
private  void genRegression()
          Generates a LinearRegression classifier from the current m_SubSample
 boolean getDebug()
          Returns whether or not debugging output should be printed
private  void getMedian()
          Finds the median squared residual for the current regression
 java.lang.String[] getOptions()
          Gets the current option settings for the OptionHandler.
 long getRandomSeed()
          Gets the seed for the random number generator
private  void getSamples()
          Gets the number of samples to use.
 int getSampleSize()
          Gets the sample size (the number of instances in each random subsample)
 java.lang.String globalInfo()
          Returns a string describing this classifier
 java.util.Enumeration listOptions()
          Returns an enumeration of all the available options.
static void main(java.lang.String[] argv)
          Generates a linear regression predictor for testing
private static int partition(double[] a, int l, int r)
          Partitions the elements between indices l and r around the value at index r, so that all smaller values end up at lower indices and all larger values at higher indices
 java.lang.String randomSeedTipText()
          Returns the tip text for this property
 java.lang.String sampleSizeTipText()
          Returns the tip text for this property
private static void select(double[] a, int l, int r, int k)
          Finds the kth number in an array
private  java.lang.String selectIndices(Instances data)
          Returns a string suitable for passing to RemoveRange consisting of m_samplesize indices.
private  void selectSubSample(Instances data)
          Produces a random sample from m_Data in m_SubSample
 void setDebug(boolean debug)
          Sets whether or not debugging output should be printed
 void setOptions(java.lang.String[] options)
          Sets the OptionHandler's options using the given list.
private  void setRandom()
          Set up the random number generator
 void setRandomSeed(long randomseed)
          Set the seed for the random number generator
 void setSampleSize(int samplesize)
          Sets the sample size (the number of instances in each random subsample)
 java.lang.String toString()
          Returns a string representing the best LinearRegression classifier found.
 
Methods inherited from class weka.classifiers.Classifier
debugTipText, distributionForInstance, forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Residuals

private double[] m_Residuals

m_weight

private double[] m_weight

m_SSR

private double m_SSR

m_scalefactor

private double m_scalefactor

m_bestMedian

private double m_bestMedian

m_currentRegression

private LinearRegression m_currentRegression

m_bestRegression

private LinearRegression m_bestRegression

m_ls

private LinearRegression m_ls

m_Data

private Instances m_Data

m_RLSData

private Instances m_RLSData

m_SubSample

private Instances m_SubSample

m_MissingFilter

private ReplaceMissingValues m_MissingFilter

m_TransformFilter

private NominalToBinary m_TransformFilter

m_SplitFilter

private RemoveRange m_SplitFilter

m_samplesize

private int m_samplesize

m_samples

private int m_samples

m_israndom

private boolean m_israndom

m_debug

private boolean m_debug

m_random

private java.util.Random m_random

m_randomseed

private long m_randomseed
Constructor Detail

LeastMedSq

public LeastMedSq()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier

Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the least median of squares (LMS) regression.

Specified by:
buildClassifier in class Classifier
Parameters:
data - training data
Throws:
java.lang.Exception - if an error occurs

classifyInstance

public double classifyInstance(Instance instance)
                        throws java.lang.Exception
Classify a given instance using the best generated LinearRegression Classifier.

Overrides:
classifyInstance in class Classifier
Parameters:
instance - instance to be classified
Returns:
class value
Throws:
java.lang.Exception - if an error occurs

cleanUpData

private void cleanUpData(Instances data)
                  throws java.lang.Exception
Cleans up data

Parameters:
data - data to be cleaned up
Throws:
java.lang.Exception - if an error occurs

getSamples

private void getSamples()
                 throws java.lang.Exception
Gets the number of samples to use.

Throws:
java.lang.Exception

setRandom

private void setRandom()
Set up the random number generator


findBestRegression

private void findBestRegression()
                         throws java.lang.Exception
Finds the best regression generated from m_samples random samples from the training data

Throws:
java.lang.Exception - if an error occurs
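
The search described above can be illustrated with a minimal, self-contained sketch. It is not the code of this class: it assumes a single predictor, fits an exact line through two randomly drawn points instead of calling LinearRegression on a larger subsample, and the names LmsLineSketch, fit and medianSquaredResidual are hypothetical. The idea it shows is the one documented here: generate many candidate regressions from random subsamples and keep the one whose median squared residual over the full data is smallest.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of a least median of squares search for a line y = a + b*x. */
final class LmsLineSketch {

    /** Returns the median of the squared residuals of the line (a, b) over all points. */
    static double medianSquaredResidual(double[] x, double[] y, double a, double b) {
        double[] squared = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - (a + b * x[i]);
            squared[i] = residual * residual;
        }
        Arrays.sort(squared);
        int n = squared.length;
        // For an even count, average the two middle values.
        return (n % 2 == 1) ? squared[n / 2] : 0.5 * (squared[n / 2 - 1] + squared[n / 2]);
    }

    /** Tries `samples` random two-point fits and returns {intercept, slope} of the best one. */
    static double[] fit(double[] x, double[] y, int samples, long seed) {
        Random random = new Random(seed);
        double bestMedian = Double.POSITIVE_INFINITY;
        double[] best = {0.0, 0.0}; // returned unchanged if every draw is degenerate
        for (int s = 0; s < samples; s++) {
            int i = random.nextInt(x.length);
            int j = random.nextInt(x.length);
            if (x[i] == x[j]) {
                continue; // degenerate pair: no unique line passes through it
            }
            double slope = (y[j] - y[i]) / (x[j] - x[i]);
            double intercept = y[i] - slope * x[i];
            double median = medianSquaredResidual(x, y, intercept, slope);
            if (median < bestMedian) {
                bestMedian = median;
                best = new double[] {intercept, slope};
            }
        }
        return best;
    }
}
```

Because the median ignores the largest squared residuals, a candidate line that fits the majority of the points is preferred even when a minority of points are gross outliers.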

genRegression

private void genRegression()
                    throws java.lang.Exception
Generates a LinearRegression classifier from the current m_SubSample

Throws:
java.lang.Exception - if an error occurs

findResiduals

private void findResiduals()
                    throws java.lang.Exception
Finds residuals (squared) for the current regression.

Throws:
java.lang.Exception - if an error occurs

getMedian

private void getMedian()
                throws java.lang.Exception
Finds the median squared residual for the current regression.

Throws:
java.lang.Exception - if an error occurs

toString

public java.lang.String toString()
Returns a string representing the best LinearRegression classifier found.

Returns:
String representing the regression

buildWeight

private void buildWeight()
                  throws java.lang.Exception
Builds a weight function removing instances with an abnormally high scaled residual

Throws:
java.lang.Exception
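
A weight function of this kind might look like the sketch below. The cutoff of 2.5 is the value commonly used in the robust regression literature and is an assumption here, as are the names WeightSketch and weights. The sketch also takes raw (unsquared) residuals and a precomputed scale estimate as inputs, which may differ from how this class stores them.

```java
/** Hypothetical sketch of a hard-rejection weight function on scaled residuals. */
final class WeightSketch {

    /** Assumed cutoff for "abnormally high"; not confirmed for this class. */
    static final double CUTOFF = 2.5;

    /** Returns weight 1.0 for instances to keep and 0.0 for instances to drop. */
    static double[] weights(double[] residuals, double scale) {
        if (scale <= 0.0) {
            throw new IllegalArgumentException("scale must be positive");
        }
        double[] weight = new double[residuals.length];
        for (int i = 0; i < residuals.length; i++) {
            double scaled = Math.abs(residuals[i] / scale);
            weight[i] = (scaled <= CUTOFF) ? 1.0 : 0.0;
        }
        return weight;
    }
}
```

Instances with weight 0.0 are the 'bad' data that a subsequent least squares fit would leave out.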

buildRLSRegression

private void buildRLSRegression()
                         throws java.lang.Exception
Builds a new LinearRegression without the 'bad' data found by buildWeight

Throws:
java.lang.Exception

select

private static void select(double[] a,
                           int l,
                           int r,
                           int k)
Finds the kth number in an array

Parameters:
a - an array of numbers
l - left pointer
r - right pointer
k - position of number to be found

partition

private static int partition(double[] a,
                             int l,
                             int r)
Partitions the elements between indices l and r around the value at index r, so that all smaller values end up at lower indices and all larger values at higher indices.

Parameters:
a - an array of numbers
l - left pointer
r - right pointer
Returns:
final index of number originally at r
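
The select and partition helpers together describe a quickselect. The following sketch shows one common way to write the pair; it assumes a Lomuto-style partition and a zero-based k, neither of which is confirmed by this page, and the class name SelectSketch is hypothetical.

```java
/** Hypothetical quickselect sketch built on a partition around the last element. */
final class SelectSketch {

    /** Partitions a[l..r] around the pivot a[r] and returns the pivot's final index. */
    static int partition(double[] a, int l, int r) {
        double pivot = a[r];
        int store = l; // next slot for a value smaller than the pivot
        for (int i = l; i < r; i++) {
            if (a[i] < pivot) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, r); // move the pivot between the two groups
        return store;
    }

    /** Rearranges a[l..r] in place so that a[k] holds the value it would have if sorted. */
    static void select(double[] a, int l, int r, int k) {
        while (l < r) {
            int p = partition(a, l, r);
            if (p == k) {
                return;
            }
            if (k < p) {
                r = p - 1; // target lies in the left part
            } else {
                l = p + 1; // target lies in the right part
            }
        }
    }

    private static void swap(double[] a, int i, int j) {
        double tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Selecting the middle position this way finds a median without fully sorting the array, which suits a method that recomputes a median for every candidate regression.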

selectSubSample

private void selectSubSample(Instances data)
                      throws java.lang.Exception
Produces a random sample from m_Data in m_SubSample

Parameters:
data - data from which to take sample
Throws:
java.lang.Exception - if an error occurs

selectIndices

private java.lang.String selectIndices(Instances data)
Returns a string suitable for passing to RemoveRange consisting of m_samplesize indices.

Parameters:
data - dataset from which to take indices
Returns:
string of indices suitable for passing to RemoveRange
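
As an illustration only, an index string of this kind could be produced as follows. The sketch assumes one-based, comma-separated indices drawn without repetition. Whether this class draws with or without repetition, and whether the listed instances are kept or removed by the filter, is not stated on this page. The class name IndexStringSketch and its parameters are hypothetical.

```java
import java.util.Random;
import java.util.TreeSet;

/** Hypothetical sketch: builds a comma-separated list of distinct one-based indices. */
final class IndexStringSketch {

    static String selectIndices(int numInstances, int sampleSize, Random random) {
        if (sampleSize < 0 || sampleSize > numInstances) {
            throw new IllegalArgumentException("sampleSize must be between 0 and numInstances");
        }
        // A sorted set keeps the indices distinct and in ascending order.
        TreeSet<Integer> chosen = new TreeSet<>();
        while (chosen.size() < sampleSize) {
            chosen.add(random.nextInt(numInstances) + 1); // assumed one-based indexing
        }
        StringBuilder indices = new StringBuilder();
        for (int index : chosen) {
            if (indices.length() > 0) {
                indices.append(',');
            }
            indices.append(index);
        }
        return indices.toString();
    }
}
```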

sampleSizeTipText

public java.lang.String sampleSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSampleSize

public void setSampleSize(int samplesize)
Sets the sample size (the number of instances in each random subsample).

Parameters:
samplesize - value

getSampleSize

public int getSampleSize()
Gets the sample size (the number of instances in each random subsample).

Returns:
value

randomSeedTipText

public java.lang.String randomSeedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setRandomSeed

public void setRandomSeed(long randomseed)
Set the seed for the random number generator

Parameters:
randomseed - the seed

getRandomSeed

public long getRandomSeed()
Gets the seed for the random number generator

Returns:
the seed value

setDebug

public void setDebug(boolean debug)
Sets whether or not debugging output should be printed

Overrides:
setDebug in class Classifier
Parameters:
debug - true if debugging output selected

getDebug

public boolean getDebug()
Returns whether or not debugging output should be printed

Overrides:
getDebug in class Classifier
Returns:
true if debugging output selected

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration of all the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current option settings for the OptionHandler.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
the list of current option settings as an array of strings

combinations

public static int combinations(int n,
                               int r)
                        throws java.lang.Exception
Produces the combination nCr

Parameters:
n - the total number of items
r - the number of items to choose
Returns:
the combination
Throws:
java.lang.Exception - if r is greater than n
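
One way to compute nCr with integer arithmetic is sketched below. It is an assumption about the approach, not the code of this class, and the class name CombinationsSketch is hypothetical. Unlike the documented method, which declares java.lang.Exception, the sketch throws an unchecked IllegalArgumentException for an invalid r and an ArithmeticException when the result does not fit in an int.

```java
/** Hypothetical sketch: number of ways to choose r items from n. */
final class CombinationsSketch {

    static int combinations(int n, int r) {
        if (r < 0 || r > n) {
            throw new IllegalArgumentException("r must be between 0 and n");
        }
        r = Math.min(r, n - r); // nCr equals nC(n-r); use the smaller loop count
        long result = 1;
        for (int i = 1; i <= r; i++) {
            // After step i, result equals C(n - r + i, i), so the division is exact.
            result = result * (n - r + i) / i;
        }
        return Math.toIntExact(result); // throws ArithmeticException on int overflow
    }
}
```

For example, combinations(5, 2) evaluates to 10. A count like this gives the number of distinct subsamples of a given size that can be drawn from a dataset.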

main

public static void main(java.lang.String[] argv)
Generates a linear regression predictor for testing

Parameters:
argv - options