weka.attributeSelection
Class GeneticSearch

java.lang.Object
  extended byweka.attributeSelection.ASSearch
      extended byweka.attributeSelection.GeneticSearch
All Implemented Interfaces:
OptionHandler, java.io.Serializable, StartSetHandler

public class GeneticSearch
extends ASSearch
implements StartSetHandler, OptionHandler

Class for performing a genetic based search.

For more information see:

David E. Goldberg (1989). Genetic algorithms in search, optimization and machine learning. Addison-Wesley.

Valid options are:

-Z
Sets the size of the population. (default = 20).

-G
Sets the number of generations to perform. (default = 5).

-C
Sets the probability that crossover will occur. (default = .6).

-M
Sets the probability that a feature will be toggled on/off.

-R
Sets how frequently reports will be generated. Eg, setting the value to 5 will generate a report every 5th generation.

(default = number of generations).

-S
Sets the seed for random number generation.

Version:
$Revision: 1.14 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Serialized Form

Nested Class Summary
protected  class GeneticSearch.GABitSet
           
 
Field Summary
private  double m_avgFitness
           
private  GeneticSearch.GABitSet m_best
          the best population member found during the search
private  int m_bestFeatureCount
          the number of features in the best population member
private  int m_classIndex
          holds the class index
private  java.lang.StringBuffer m_generationReports
          holds the generation reports
private  boolean m_hasClass
          does the data have a class
private  java.util.Hashtable m_lookupTable
          the lookup table
private  int m_lookupTableSize
          the number of entries to cache for lookup
private  double m_maxFitness
           
private  int m_maxGenerations
          the maximum number of generations to evaluate
private  double m_minFitness
           
private  int m_numAttribs
          number of attributes in the data
private  double m_pCrossover
          the probability of crossover occuring
private  double m_pMutation
          the probability of mutation occuring
private  int m_popSize
          the number of individual solutions
private  GeneticSearch.GABitSet[] m_population
          the current population
private  java.util.Random m_random
          random number generation
private  int m_reportFrequency
          how often reports are generated
private  int m_seed
          seed for random number generation
private  int[] m_starting
          holds a starting set as an array of attributes.
private  Range m_startRange
          holds the start set for the search as a Range
private  double m_sumFitness
          sum of the current population fitness
 
Constructor Summary
GeneticSearch()
          Constructor.
 
Method Summary
private  int[] attributeList(java.util.BitSet group)
          converts a BitSet into a list of attribute indexes
private  boolean checkBest()
          checks to see if any population members in the current population are better than the best found so far.
private  int countFeatures(java.util.BitSet featureSet)
          counts the number of features in a subset
 java.lang.String crossoverProbTipText()
          Returns the tip text for this property
private  void evaluatePopulation(SubsetEvaluator ASEvaluator)
          evaluates an entire population.
private  void generation()
          performs a single generation---selection, crossover, and mutation
 double getCrossoverProb()
          get the probability of crossover
 int getMaxGenerations()
          get the number of generations
 double getMutationProb()
          get the probability of mutation
 java.lang.String[] getOptions()
          Gets the current settings of ReliefFAttributeEval.
 int getPopulationSize()
          get the size of the population
 int getReportFrequency()
          get how often repports are generated
 int getSeed()
          get the value of the random number generator's seed
 java.lang.String getStartSet()
          Returns a list of attributes (and or attribute ranges) as a String
 java.lang.String globalInfo()
          Returns a string describing this search method
private  void initPopulation()
          creates random population members for the initial population.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String maxGenerationsTipText()
          Returns the tip text for this property
 java.lang.String mutationProbTipText()
          Returns the tip text for this property
private  java.lang.String populationReport(int genNum)
          generates a report on the current population
 java.lang.String populationSizeTipText()
          Returns the tip text for this property
private  void populationStatistics()
          calculates summary statistics for the current population
private  java.lang.String printPopChrom(java.util.BitSet temp)
          prints a population member's chromosome
private  java.lang.String printPopMember(java.util.BitSet temp)
          prints a population member as a series of attribute numbers
 java.lang.String reportFrequencyTipText()
          Returns the tip text for this property
private  void resetOptions()
          reset to default values for options
private  void scalePopulation()
          scales the raw (objective) merit of the population members
 int[] search(ASEvaluation ASEval, Instances data)
          Searches the attribute subset space using a genetic algorithm.
 java.lang.String seedTipText()
          Returns the tip text for this property
private  int select()
          selects a population member to be considered for crossover
 void setCrossoverProb(double c)
          set the probability of crossover
 void setMaxGenerations(int m)
          set the number of generations to evaluate
 void setMutationProb(double m)
          set the probability of mutation
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPopulationSize(int p)
          set the population size
 void setReportFrequency(int f)
          set how often reports are generated
 void setSeed(int s)
          set the seed for random number generation
 void setStartSet(java.lang.String startSet)
          Sets a starting set of attributes for the search.
 java.lang.String startSetTipText()
          Returns the tip text for this property
private  java.lang.String startSetToString()
          converts the array of starting attributes to a string.
 java.lang.String toString()
          returns a description of the search
 
Methods inherited from class weka.attributeSelection.ASSearch
forName
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_starting

private int[] m_starting
holds a starting set as an array of attributes. Becomes one member of the initial random population


m_startRange

private Range m_startRange
holds the start set for the search as a Range


m_hasClass

private boolean m_hasClass
does the data have a class


m_classIndex

private int m_classIndex
holds the class index


m_numAttribs

private int m_numAttribs
number of attributes in the data


m_population

private GeneticSearch.GABitSet[] m_population
the current population


m_popSize

private int m_popSize
the number of individual solutions


m_best

private GeneticSearch.GABitSet m_best
the best population member found during the search


m_bestFeatureCount

private int m_bestFeatureCount
the number of features in the best population member


m_lookupTableSize

private int m_lookupTableSize
the number of entries to cache for lookup


m_lookupTable

private java.util.Hashtable m_lookupTable
the lookup table


m_random

private java.util.Random m_random
random number generation


m_seed

private int m_seed
seed for random number generation


m_pCrossover

private double m_pCrossover
the probability of crossover occuring


m_pMutation

private double m_pMutation
the probability of mutation occuring


m_sumFitness

private double m_sumFitness
sum of the current population fitness


m_maxFitness

private double m_maxFitness

m_minFitness

private double m_minFitness

m_avgFitness

private double m_avgFitness

m_maxGenerations

private int m_maxGenerations
the maximum number of generations to evaluate


m_reportFrequency

private int m_reportFrequency
how often reports are generated


m_generationReports

private java.lang.StringBuffer m_generationReports
holds the generation reports

Constructor Detail

GeneticSearch

public GeneticSearch()
Constructor. Make a new GeneticSearch object

Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-Z
Sets the size of the population. (default = 20).

-G
Sets the number of generations to perform. (default = 5).

-C
Sets the probability that crossover will occur. (default = .6).

-M
Sets the probability that a feature will be toggled on/off.

-R
Sets how frequently reports will be generated. Eg, setting the value to 5 will generate a report every 5th generation.

(default = number of generations).

-S
Sets the seed for random number generation.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of ReliefFAttributeEval.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

startSetTipText

public java.lang.String startSetTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setStartSet

public void setStartSet(java.lang.String startSet)
                 throws java.lang.Exception
Sets a starting set of attributes for the search. It is the search method's responsibility to report this start set (if any) in its toString() method.

Specified by:
setStartSet in interface StartSetHandler
Parameters:
startSet - a string containing a list of attributes (and or ranges), eg. 1,2,6,10-15.
Throws:
java.lang.Exception - if start set can't be set.

getStartSet

public java.lang.String getStartSet()
Returns a list of attributes (and or attribute ranges) as a String

Specified by:
getStartSet in interface StartSetHandler
Returns:
a list of attributes (and or attribute ranges)

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int s)
set the seed for random number generation

Parameters:
s - seed value

getSeed

public int getSeed()
get the value of the random number generator's seed

Returns:
the seed for random number generation

reportFrequencyTipText

public java.lang.String reportFrequencyTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setReportFrequency

public void setReportFrequency(int f)
set how often reports are generated

Parameters:
f - generate reports every f generations

getReportFrequency

public int getReportFrequency()
get how often repports are generated

Returns:
how often reports are generated

mutationProbTipText

public java.lang.String mutationProbTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMutationProb

public void setMutationProb(double m)
set the probability of mutation

Parameters:
m - the probability for mutation occuring

getMutationProb

public double getMutationProb()
get the probability of mutation

Returns:
the probability of mutation occuring

crossoverProbTipText

public java.lang.String crossoverProbTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCrossoverProb

public void setCrossoverProb(double c)
set the probability of crossover

Parameters:
c - the probability that two population members will exchange genetic material

getCrossoverProb

public double getCrossoverProb()
get the probability of crossover

Returns:
the probability of crossover

maxGenerationsTipText

public java.lang.String maxGenerationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMaxGenerations

public void setMaxGenerations(int m)
set the number of generations to evaluate

Parameters:
m - the number of generations

getMaxGenerations

public int getMaxGenerations()
get the number of generations

Returns:
the maximum number of generations

populationSizeTipText

public java.lang.String populationSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setPopulationSize

public void setPopulationSize(int p)
set the population size

Parameters:
p - the size of the population

getPopulationSize

public int getPopulationSize()
get the size of the population

Returns:
the population size

globalInfo

public java.lang.String globalInfo()
Returns a string describing this search method

Returns:
a description of the search suitable for displaying in the explorer/experimenter gui

startSetToString

private java.lang.String startSetToString()
converts the array of starting attributes to a string. This is used by getOptions to return the actual attributes specified as the starting set. This is better than using m_startRanges.getRanges() as the same start set can be specified in different ways from the command line---eg 1,2,3 == 1-3. This is to ensure that stuff that is stored in a database is comparable.

Returns:
a comma seperated list of individual attribute numbers as a String

toString

public java.lang.String toString()
returns a description of the search

Returns:
a description of the search as a String

search

public int[] search(ASEvaluation ASEval,
                    Instances data)
             throws java.lang.Exception
Searches the attribute subset space using a genetic algorithm.

Specified by:
search in class ASSearch
Parameters:
data - the training instances.
ASEval - the attribute evaluator to guide the search
Returns:
an array (not necessarily ordered) of selected attribute indexes
Throws:
java.lang.Exception - if the search can't be completed

attributeList

private int[] attributeList(java.util.BitSet group)
converts a BitSet into a list of attribute indexes

Parameters:
group - the BitSet to convert
Returns:
an array of attribute indexes

checkBest

private boolean checkBest()
                   throws java.lang.Exception
checks to see if any population members in the current population are better than the best found so far. Also checks to see if the search has converged---that is there is no difference in fitness between the best and worse population member

Returns:
true is the search has converged
Throws:
java.lang.Exception - if something goes wrong

countFeatures

private int countFeatures(java.util.BitSet featureSet)
counts the number of features in a subset

Parameters:
featureSet - the feature set for which to count the features
Returns:
the number of features in the subset

generation

private void generation()
                 throws java.lang.Exception
performs a single generation---selection, crossover, and mutation

Throws:
java.lang.Exception - if an error occurs

select

private int select()
selects a population member to be considered for crossover

Returns:
the index of the selected population member

evaluatePopulation

private void evaluatePopulation(SubsetEvaluator ASEvaluator)
                         throws java.lang.Exception
evaluates an entire population. Population members are looked up in a hash table and if they are not found then they are evaluated using ASEvaluator.

Parameters:
ASEvaluator - the subset evaluator to use for evaluating population members
Throws:
java.lang.Exception - if something goes wrong during evaluation

initPopulation

private void initPopulation()
                     throws java.lang.Exception
creates random population members for the initial population. Also sets the first population member to be a start set (if any) provided by the user

Throws:
java.lang.Exception - if the population can't be created

populationStatistics

private void populationStatistics()
calculates summary statistics for the current population


scalePopulation

private void scalePopulation()
scales the raw (objective) merit of the population members


populationReport

private java.lang.String populationReport(int genNum)
generates a report on the current population

Returns:
a report as a String

printPopMember

private java.lang.String printPopMember(java.util.BitSet temp)
prints a population member as a series of attribute numbers

Parameters:
temp - the chromosome of a population member
Returns:
a population member as a String of attribute numbers

printPopChrom

private java.lang.String printPopChrom(java.util.BitSet temp)
prints a population member's chromosome

Parameters:
temp - the chromosome of a population member
Returns:
a population member's chromosome as a String

resetOptions

private void resetOptions()
reset to default values for options