weka.datagenerators
Class RDG1

java.lang.Object
  extended by weka.datagenerators.Generator
      extended by weka.datagenerators.RDG1
All Implemented Interfaces:
OptionHandler, java.io.Serializable

public class RDG1
extends Generator
implements OptionHandler, java.io.Serializable

Class to generate data randomly by producing a decision list. The decision list consists of rules. Instances are generated randomly one by one. If the decision list fails to classify the current instance, a new rule based on this instance is generated and added to the decision list.

The option -V switches on voting, which means that at the end of the generation all instances are reclassified to the class value that is supported by the most rules.

This data generator can generate 'boolean' attributes (= nominal with the values {true, false}) and numeric attributes. The rules can be 'A' or 'NOT A' for boolean values and 'B < random_value' or 'B >= random_value' for numeric values.
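The generation loop described above can be sketched in plain Java without any Weka classes. This is a minimal illustration, not the RDG1 source: the names `DecisionListSketch`, `Test`, `Rule`, `firstCovering`, `newRule` and `generate` are hypothetical, boolean values are stored as 0.0/1.0, and the way a new rule picks its attributes, thresholds and class label is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class DecisionListSketch {

    /** One test: 'A' / 'NOT A' on a boolean attribute, 'B >= t' / 'B < t' on a numeric one. */
    static final class Test {
        final int attr;
        final boolean numeric;
        final double threshold;  // only meaningful for numeric tests
        final boolean positive;  // boolean: A (true) vs NOT A; numeric: >= (true) vs <

        Test(int attr, boolean numeric, double threshold, boolean positive) {
            this.attr = attr;
            this.numeric = numeric;
            this.threshold = threshold;
            this.positive = positive;
        }

        boolean holds(double[] x) {
            boolean value = numeric ? x[attr] >= threshold : x[attr] == 1.0;
            return value == positive;
        }
    }

    /** A rule is a class label plus a conjunction of tests. */
    static final class Rule {
        final int label;
        final List<Test> tests = new ArrayList<>();

        Rule(int label) { this.label = label; }

        boolean covers(double[] x) {
            for (Test t : tests) {
                if (!t.holds(x)) return false;
            }
            return true;
        }
    }

    /** Returns the first rule that covers x, or null if the list fails to classify it. */
    static Rule firstCovering(List<Rule> list, double[] x) {
        for (Rule r : list) {
            if (r.covers(x)) return r;
        }
        return null;
    }

    /** Builds a rule that covers x by construction (assumed strategy, random label). */
    static Rule newRule(double[] x, boolean[] numeric, int numClasses,
                        int minSize, int maxSize, Random rnd) {
        int numAtts = numeric.length;
        int hi = Math.min(maxSize, numAtts);
        int lo = Math.min(minSize, hi);
        int size = lo + rnd.nextInt(hi - lo + 1);
        List<Integer> atts = new ArrayList<>();
        for (int a = 0; a < numAtts; a++) atts.add(a);
        Collections.shuffle(atts, rnd);
        Rule rule = new Rule(rnd.nextInt(numClasses));
        for (int a : atts.subList(0, size)) {
            double t = numeric[a] ? rnd.nextDouble() : 0.0;
            // Pick the test direction that the current instance satisfies.
            boolean positive = numeric[a] ? x[a] >= t : x[a] == 1.0;
            rule.tests.add(new Test(a, numeric[a], t, positive));
        }
        return rule;
    }

    /** Generates n instances; slot numeric.length of each row holds the class index. */
    static double[][] generate(int n, boolean[] numeric, int numClasses,
                               int minSize, int maxSize, Random rnd, List<Rule> list) {
        int numAtts = numeric.length;
        double[][] data = new double[n][];
        for (int i = 0; i < n; i++) {
            double[] x = new double[numAtts + 1];
            for (int a = 0; a < numAtts; a++) {
                x[a] = numeric[a] ? rnd.nextDouble() : (rnd.nextBoolean() ? 1.0 : 0.0);
            }
            Rule hit = firstCovering(list, x);
            if (hit == null) {            // decision list failed: extend it
                hit = newRule(x, numeric, numClasses, minSize, maxSize, rnd);
                list.add(hit);
            }
            x[numAtts] = hit.label;
            data[i] = x;
        }
        return data;
    }
}
```

Because new rules are only ever appended, the label of every generated row still equals the label of the first covering rule in the final list.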

Valid options are:

-R num
The maximum number of attributes chosen to form a rule (default 10).

-M num
The minimum number of attributes chosen to form a rule (default 1).

-I num
The number of irrelevant attributes (default 0).

-N num
The number of numeric attributes (default 0).

-S seed
Random number seed for random function used (default 1).

-V
Flag to use voting.

The following is an example of a generated dataset:
%
% weka.datagenerators.RDG1 -r expl -a 2 -c 3 -n 4 -N 1 -I 0 -M 2 -R 10 -S 2
%
@relation expl

@attribute a0 {false,true}
@attribute a1 numeric
@attribute class {c0,c1,c2}

@data

true,0.496823,c0
false,0.743158,c1
false,0.408285,c1
false,0.993687,c2
%
% Number of attributes chosen as irrelevant = 0
%
% DECISIONLIST (number of rules = 3):
% RULE 0: c0 := a1 < 0.986, a0
% RULE 1: c1 := a1 < 0.95, not(a0)
% RULE 2: c2 := not(a0), a1 >= 0.562
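
The three rules printed above can be checked against the four data rows by hand. The sketch below does this in plain Java, assuming the decision list is evaluated in order and the first rule whose tests all hold assigns the class; the class and method names are hypothetical and not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class ExampleDecisionList {

    /** One rule: a class label plus a conjunction of tests on (a0, a1). */
    static final class Rule {
        final String label;
        final BiPredicate<Boolean, Double> tests;

        Rule(String label, BiPredicate<Boolean, Double> tests) {
            this.label = label;
            this.tests = tests;
        }
    }

    // The three rules from the DECISIONLIST comment of the example dataset.
    static final List<Rule> RULES = new ArrayList<>();
    static {
        RULES.add(new Rule("c0", (a0, a1) -> a1 < 0.986 && a0));
        RULES.add(new Rule("c1", (a0, a1) -> a1 < 0.95 && !a0));
        RULES.add(new Rule("c2", (a0, a1) -> !a0 && a1 >= 0.562));
    }

    /** Returns the label of the first rule that fires, or null if none does. */
    static String classify(boolean a0, double a1) {
        for (Rule rule : RULES) {
            if (rule.tests.test(a0, a1)) {
                return rule.label;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, 0.496823));   // c0
        System.out.println(classify(false, 0.743158));  // c1
        System.out.println(classify(false, 0.408285));  // c1
        System.out.println(classify(false, 0.993687));  // c2
    }
}
```

The last row shows why RULE 2 exists: its a1 value is too large for RULE 0 and RULE 1, so only RULE 2 covers it.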

Version:
$Revision: 1.2 $
Author:
Gabi Schmidberger (gabi@cs.waikato.ac.nz)
See Also:
Serialized Form

Nested Class Summary
private  class RDG1.RuleList
           
 
Field Summary
(package private)  boolean[] m_AttList_Irr
           
private  Instances m_DatasetFormat
           
private  int m_Debug
           
private  FastVector m_DecisionList
           
private  int m_MaxRuleSize
           
private  int m_MinRuleSize
           
private  int m_NumIrrelevant
           
private  int m_NumNumeric
           
private  java.util.Random m_Random
           
private  int m_Seed
           
private  boolean m_VoteFlag
           
 
Fields inherited from class weka.datagenerators.Generator
 
Constructor Summary
RDG1()
           
 
Method Summary
private  boolean classifyExample(Instance example)
          Tries to classify an example.
 Instances defineDataFormat()
          Initializes the format for the dataset produced.
private  Instances defineDataset(java.util.Random random)
          Returns a dataset header.
private  boolean[] defineIrrelevant(java.util.Random random)
          Randomly defines which attributes are irrelevant.
private  int[] defineNumeric(java.util.Random random)
          Randomly chooses the attributes that get the data type numeric.
 Instance generateExample()
          Generates an example of the dataset.
private  Instance generateExample(java.util.Random random, Instances format)
          Generates an example with its class value set to missing and binds it to the dataset.
 Instances generateExamples()
          Generate all examples of the dataset.
 Instances generateExamples(int num, java.util.Random random, Instances format)
          Generate all examples of the dataset.
 java.lang.String generateFinished()
          Compiles documentation about the data generation.
private  FastVector generateTestList(java.util.Random random, Instance example)
          Generates a new rule for the decision list and classifies the new example.
 boolean[] getAttList_Irr()
          Gets the array that defines which of the attributes are seen to be irrelevant.
 Instances getDatasetFormat()
          Gets the dataset format.
 int getMaxRuleSize()
          Gets the maximum number of tests in rules.
 int getMinRuleSize()
          Gets the minimum number of tests in rules.
 int getNumIrrelevant()
          Gets the number of irrelevant attributes.
 int getNumNumeric()
          Gets the number of numerical attributes.
 java.lang.String[] getOptions()
          Gets the current settings of the datagenerator RDG1.
 java.util.Random getRandom()
          Gets the random generator.
 int getSeed()
          Gets the random number seed.
 boolean getSingleModeFlag()
          Gets the single mode flag.
 boolean getVoteFlag()
          Gets the vote flag.
 java.lang.String globalInfo()
          Returns a string describing this data generator.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 void setAttList_Irr(boolean[] newAttList_Irr)
          Sets the array that defines which of the attributes are seen to be irrelevant.
 void setDatasetFormat(Instances newDatasetFormat)
          Sets the dataset format.
 void setMaxRuleSize(int newMaxRuleSize)
          Sets the maximum number of tests in rules.
 void setMinRuleSize(int newMinRuleSize)
          Sets the minimum number of tests in rules.
 void setNumIrrelevant(int newNumIrrelevant)
          Sets the number of irrelevant attributes.
 void setNumNumeric(int newNumNumeric)
          Sets the number of numerical attributes.
 void setOptions(java.lang.String[] options)
          Parses a list of options for this object.
 void setRandom(java.util.Random newRandom)
          Sets the random generator.
 void setSeed(int newSeed)
          Sets the random number seed.
 void setVoteFlag(boolean newVoteFlag)
          Sets the vote flag.
private  Instance updateDecisionList(java.util.Random random, Instance example)
          Generates a new rule for the decision list.
private  Instances voteDataset(Instances dataset)
          Resets the class values of all instances using voting.
private  Instance votedReclassifyExample(Instance example)
          Classifies an example by maximum vote.
 
Methods inherited from class weka.datagenerators.Generator
getDebug, getFormat, getNumAttributes, getNumClasses, getNumExamples, getNumExamplesAct, getOutput, getRelationName, makeData, setDebug, setFormat, setNumAttributes, setNumClasses, setNumExamples, setNumExamplesAct, setOutput, setRelationName, toStringFormat
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_MaxRuleSize

private int m_MaxRuleSize

m_MinRuleSize

private int m_MinRuleSize

m_NumIrrelevant

private int m_NumIrrelevant

m_NumNumeric

private int m_NumNumeric

m_Seed

private int m_Seed

m_VoteFlag

private boolean m_VoteFlag

m_DatasetFormat

private Instances m_DatasetFormat

m_Random

private java.util.Random m_Random

m_DecisionList

private FastVector m_DecisionList

m_AttList_Irr

boolean[] m_AttList_Irr

m_Debug

private int m_Debug
Constructor Detail

RDG1

public RDG1()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this data generator.

Returns:
a description of the data generator suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a list of options for this object.

For a list of valid options, see the class description.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the datagenerator RDG1.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

getRandom

public java.util.Random getRandom()
Gets the random generator.

Returns:
the random generator

setRandom

public void setRandom(java.util.Random newRandom)
Sets the random generator.

Parameters:
newRandom - the random generator.

getMaxRuleSize

public int getMaxRuleSize()
Gets the maximum number of tests in rules.

Returns:
the maximum number of tests allowed in rules

setMaxRuleSize

public void setMaxRuleSize(int newMaxRuleSize)
Sets the maximum number of tests in rules.

Parameters:
newMaxRuleSize - new maximum number of tests allowed in rules.

getMinRuleSize

public int getMinRuleSize()
Gets the minimum number of tests in rules.

Returns:
the minimum number of tests allowed in rules

setMinRuleSize

public void setMinRuleSize(int newMinRuleSize)
Sets the minimum number of tests in rules.

Parameters:
newMinRuleSize - new minimum number of tests allowed in rules.

getNumIrrelevant

public int getNumIrrelevant()
Gets the number of irrelevant attributes.

Returns:
the number of irrelevant attributes

setNumIrrelevant

public void setNumIrrelevant(int newNumIrrelevant)
Sets the number of irrelevant attributes.


getNumNumeric

public int getNumNumeric()
Gets the number of numerical attributes.

Returns:
the number of numerical attributes.

setNumNumeric

public void setNumNumeric(int newNumNumeric)
Sets the number of numerical attributes.


getVoteFlag

public boolean getVoteFlag()
Gets the vote flag.

Returns:
voting flag.

setVoteFlag

public void setVoteFlag(boolean newVoteFlag)
Sets the vote flag.

Parameters:
newVoteFlag - boolean with the new setting of the vote flag.

getSingleModeFlag

public boolean getSingleModeFlag()
Gets the single mode flag.

Specified by:
getSingleModeFlag in class Generator
Returns:
true if the method generateExample can be used.

getSeed

public int getSeed()
Gets the random number seed.

Returns:
the random number seed.

setSeed

public void setSeed(int newSeed)
Sets the random number seed.

Parameters:
newSeed - the new random number seed.
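
The fields listed above include an `int` seed and a `java.util.Random`. Assuming the seed is simply passed to the `Random` constructor (the class description does not spell this out), the same seed reproduces the same sequence of draws, which is what makes a generated dataset repeatable. The helper below is a hypothetical illustration using only the JDK.

```java
import java.util.Random;

class SeedDemo {

    /** Draws n doubles from a java.util.Random created with the given seed. */
    static double[] draw(int seed, int n) {
        Random rnd = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rnd.nextDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed, same sequence; a different seed gives a different one.
        System.out.println(java.util.Arrays.equals(draw(1, 5), draw(1, 5)));
        System.out.println(java.util.Arrays.equals(draw(1, 5), draw(2, 5)));
    }
}
```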

getDatasetFormat

public Instances getDatasetFormat()
Gets the dataset format.

Returns:
the dataset format.

setDatasetFormat

public void setDatasetFormat(Instances newDatasetFormat)
Sets the dataset format.

Parameters:
newDatasetFormat - the new dataset format.

getAttList_Irr

public boolean[] getAttList_Irr()
Gets the array that defines which of the attributes are seen to be irrelevant.

Returns:
the array that defines the irrelevant attributes

setAttList_Irr

public void setAttList_Irr(boolean[] newAttList_Irr)
Sets the array that defines which of the attributes are seen to be irrelevant.

Parameters:
newAttList_Irr - array that defines the irrelevant attributes.

defineDataFormat

public Instances defineDataFormat()
                           throws java.lang.Exception
Initializes the format for the dataset produced.

Specified by:
defineDataFormat in class Generator
Returns:
the output data format
Throws:
java.lang.Exception - data format could not be defined

generateExample

public Instance generateExample()
                         throws java.lang.Exception
Generates an example of the dataset.

Specified by:
generateExample in class Generator
Returns:
the instance generated
Throws:
java.lang.Exception - if the format is not defined, or if examples
cannot be generated one by one because voting is chosen

generateExamples

public Instances generateExamples()
                           throws java.lang.Exception
Generate all examples of the dataset.

Specified by:
generateExamples in class Generator
Returns:
the instances generated
Throws:
java.lang.Exception - if the format is not defined, or if examples
cannot be generated one by one because voting is chosen

generateExamples

public Instances generateExamples(int num,
                                  java.util.Random random,
                                  Instances format)
                           throws java.lang.Exception
Generate all examples of the dataset.

Returns:
the instances generated
Throws:
java.lang.Exception - if the format is not defined, or if examples
cannot be generated one by one because voting is chosen

updateDecisionList

private Instance updateDecisionList(java.util.Random random,
                                    Instance example)
                             throws java.lang.Exception
Generates a new rule for the decision list and classifies the new example.

Parameters:
random - random number generator
example - example used to update decision list
Throws:
java.lang.Exception

generateTestList

private FastVector generateTestList(java.util.Random random,
                                    Instance example)
                             throws java.lang.Exception
Generates a new rule for the decision list and classifies the new example.

Parameters:
random - random number generator
example -
Throws:
java.lang.Exception

generateExample

private Instance generateExample(java.util.Random random,
                                 Instances format)
                          throws java.lang.Exception
Generates an example with its class value set to missing and binds it to the dataset.

Parameters:
random - random number generator
Throws:
java.lang.Exception

classifyExample

private boolean classifyExample(Instance example)
                         throws java.lang.Exception
Tries to classify an example.

Parameters:
example -
Throws:
java.lang.Exception

votedReclassifyExample

private Instance votedReclassifyExample(Instance example)
                                 throws java.lang.Exception
Classifies an example by maximum vote. For every rule in the decision list, the method checks whether the rule covers the given instance; each covering rule casts one vote for its class. The class value that receives the most votes is then assigned to the example.

Parameters:
example - example to be reclassified
Returns:
instance with new class value
Throws:
java.lang.Exception
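
The voting step can be sketched as follows, again in plain Java with hypothetical names (`VoteSketch`, `Rule`, `vote`, `demoRules`). Each rule that covers the instance adds one vote for its class. How RDG1 breaks ties or handles an instance that no rule covers is not stated in this documentation, so the choices below (lowest class index wins, -1 for no votes) are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class VoteSketch {

    /** A rule: class index plus a conjunction of tests over the attribute values. */
    static final class Rule {
        final int label;
        final Predicate<double[]> tests;

        Rule(int label, Predicate<double[]> tests) {
            this.label = label;
            this.tests = tests;
        }
    }

    /**
     * Returns the class index supported by the most covering rules.
     * Assumptions: ties go to the lowest class index, and -1 means no rule covers x.
     */
    static int vote(List<Rule> rules, int numClasses, double[] x) {
        int[] votes = new int[numClasses];
        for (Rule r : rules) {
            if (r.tests.test(x)) {
                votes[r.label]++;
            }
        }
        int best = -1;
        for (int c = 0; c < numClasses; c++) {
            if (votes[c] > 0 && (best < 0 || votes[c] > votes[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Three illustrative rules over (x[0] numeric, x[1] boolean stored as 0.0/1.0). */
    static List<Rule> demoRules() {
        return Arrays.asList(
            new Rule(0, x -> x[0] < 0.9),
            new Rule(1, x -> x[0] < 0.5),
            new Rule(1, x -> x[1] == 1.0));
    }

    public static void main(String[] args) {
        // First-match classification would pick class 0 here; voting picks class 1.
        System.out.println(vote(demoRules(), 2, new double[] {0.3, 1.0}));  // 1
    }
}
```

The demo instance is covered by all three rules, so class 1 collects two votes against one for class 0. This is how voting can change a label that in-order evaluation of the list would have assigned.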

defineDataset

private Instances defineDataset(java.util.Random random)
                         throws java.lang.Exception
Returns a dataset header.

Parameters:
random - random number generator
Returns:
dataset header
Throws:
java.lang.Exception

defineIrrelevant

private boolean[] defineIrrelevant(java.util.Random random)
Randomly defines which attributes are irrelevant. The number of attributes to be marked irrelevant is set by a preceding call to setNumIrrelevant(); the default is 0.

Parameters:
random -
Returns:
array of boolean values, one per attribute, where true means the corresponding attribute was defined as irrelevant
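
One way to produce such an array is to keep drawing random attribute indices until the requested number of distinct attributes has been marked. The sketch below is an assumption about the approach, not the RDG1 source, and `chooseIrrelevant` is a hypothetical name.

```java
import java.util.Random;

class IrrelevantSketch {

    /** Marks exactly numIrrelevant distinct attributes as irrelevant, chosen at random. */
    static boolean[] chooseIrrelevant(int numAttributes, int numIrrelevant, Random rnd) {
        if (numIrrelevant < 0 || numIrrelevant > numAttributes) {
            throw new IllegalArgumentException("numIrrelevant must be in [0, numAttributes]");
        }
        boolean[] irrelevant = new boolean[numAttributes];
        int chosen = 0;
        while (chosen < numIrrelevant) {
            int a = rnd.nextInt(numAttributes);
            if (!irrelevant[a]) {      // skip attributes that are already marked
                irrelevant[a] = true;
                chosen++;
            }
        }
        return irrelevant;
    }
}
```

With the default of 0 irrelevant attributes the loop never runs and every entry stays false.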

defineNumeric

private int[] defineNumeric(java.util.Random random)
Randomly chooses the attributes that get the data type numeric.

Parameters:
random -
Returns:
array of integer values, one per attribute, each set to Attribute.NOMINAL or Attribute.NUMERIC

generateFinished

public java.lang.String generateFinished()
                                  throws java.lang.Exception
Compiles documentation about the data generation: the number of irrelevant attributes and the decision list with all its rules. Because the decision list may be extended until the last instance is generated, this method should be called at the end of the data generation process.

Specified by:
generateFinished in class Generator
Returns:
string with additional information about generated dataset
Throws:
java.lang.Exception - no input structure has been defined

voteDataset

private Instances voteDataset(Instances dataset)
                       throws java.lang.Exception
Resets the class values of all instances using voting. For each instance, the class value supported by the most rules is chosen as the new class value.

Parameters:
dataset -
Returns:
the changed instances
Throws:
java.lang.Exception

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain arguments for the data producer: