weka.datagenerators
Class ClusterGenerator

java.lang.Object
  extended by weka.datagenerators.ClusterGenerator
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
BIRCHCluster

public abstract class ClusterGenerator
extends java.lang.Object
implements java.io.Serializable

Abstract class for cluster data generators.

-------------------------------------------------------------------

General options are:

-r string
Name of the relation of the generated dataset.
(default = a name built from the name of the generator used and its options)

-a num
Number of attributes. (default = 2)

-k num
Number of clusters. (default = 4)

-c
Class flag. If set, the cluster is listed in an extra class attribute.

-o filename
Writes the generated dataset to the given file in ARFF format. (default = stdout)

-------------------------------------------------------------------
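
As an illustration of the general options listed above, here is a minimal, self-contained sketch of how they could be parsed with the documented defaults. The class GenericOptionsSketch and its fields are hypothetical and are not part of Weka. The real option handling is done by the private setOptions method, whose implementation is not shown on this page. As a simplification, the sketch rejects any option it does not know, whereas the real class also accepts generator-specific options.

```java
/** Hypothetical holder for the generic options; not part of Weka. */
final class GenericOptionsSketch {
    String relationName = null;  // -r; null stands for "build a name from generator and options"
    int numAttributes = 2;       // -a; documented default
    int numClusters = 4;         // -k; documented default
    boolean classFlag = false;   // -c; off unless the flag is given
    String outputFile = null;    // -o; null stands for stdout

    /** Parses the documented generic options (assumed behavior, simplified). */
    static GenericOptionsSketch parse(String[] options) {
        GenericOptionsSketch result = new GenericOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            String opt = options[i];
            switch (opt) {
                case "-c":
                    result.classFlag = true;
                    break;
                case "-r":
                    result.relationName = value(options, ++i, opt);
                    break;
                case "-a":
                    result.numAttributes = Integer.parseInt(value(options, ++i, opt));
                    break;
                case "-k":
                    result.numClusters = Integer.parseInt(value(options, ++i, opt));
                    break;
                case "-o":
                    result.outputFile = value(options, ++i, opt);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + opt);
            }
        }
        return result;
    }

    /** Returns the value that follows an option, or fails if it is missing. */
    private static String value(String[] options, int index, String opt) {
        if (index >= options.length) {
            throw new IllegalArgumentException("Option " + opt + " requires a value");
        }
        return options[index];
    }
}
```

With this sketch, parsing the options -a 5 -c would yield five attributes, the default of four clusters, the class flag set, and output to stdout.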

Example usage as the main of a datagenerator called RandomGenerator:

 public static void main(String[] args) {
   try {
     // Parse the options and run the generator.
     ClusterGenerator.makeData(new RandomGenerator(), args);
   } catch (Exception e) {
     System.err.println(e.getMessage());
   }
 }
 

------------------------------------------------------------------

Version:
$Revision: 1.1 $
Author:
Gabi Schmidberger (gabi@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
private  boolean m_ClassFlag
           
private  boolean m_Debug
           
private  Instances m_Format
           
protected  int m_NumAttributes
           
protected  int m_NumClusters
           
private  int m_NumExamplesAct
           
private  java.io.PrintWriter m_Output
           
private  java.lang.String m_RelationName
           
 
Constructor Summary
ClusterGenerator()
           
 
Method Summary
(package private) abstract  Instances defineDataFormat()
          Initializes the format for the dataset produced.
(package private) abstract  Instance generateExample()
          Generates one example of the dataset.
(package private) abstract  Instances generateExamples()
          Generates all examples of the dataset.
(package private) abstract  java.lang.String generateFinished()
          Generates a comment string that documents the data generator.
(package private) abstract  java.lang.String generateStart()
          Generates a comment string that documents the data generator.
 boolean getClassFlag()
          Gets the class flag.
 boolean getDebug()
          Gets the debug flag.
protected  Instances getFormat()
          Gets the format of the dataset that is to be generated.
private  java.lang.String[] getGenericOptions()
          Gets the current generic settings of the datagenerator.
 int getNumAttributes()
          Gets the number of attributes that should be produced.
 int getNumClusters()
          Gets the number of clusters the dataset should have.
 int getNumExamplesAct()
          Gets the number of examples the dataset should have.
 java.io.PrintWriter getOutput()
          Gets the print writer.
 java.lang.String getRelationName()
          Gets the relation name the dataset should have.
(package private) abstract  boolean getSingleModeFlag()
          Returns whether single mode is set for the given data generator; the mode depends on the option settings and/or the generator type.
private static java.lang.String listGenericOptions(ClusterGenerator generator)
          Method for listing generic options.
private  java.lang.String listSpecificOptions(ClusterGenerator generator)
          Makes a string with the options of the specific data generator.
static void makeData(ClusterGenerator generator, java.lang.String[] options)
          Calls the data generator.
 void setClassFlag(boolean classFlag)
          Sets the class flag. If the class flag is set, the cluster is listed as the class attribute in an extra attribute.
 void setDebug(boolean debug)
          Sets the debug flag.
protected  void setFormat(Instances newFormat)
          Sets the format of the dataset that is to be generated.
 void setNumAttributes(int numAttributes)
          Sets the number of attributes the dataset should have.
 void setNumClusters(int numClusters)
          Sets the number of clusters the dataset should have.
 void setNumExamplesAct(int numExamplesAct)
          Sets the number of examples the dataset should have.
private static void setOptions(ClusterGenerator generator, java.lang.String[] options)
          Sets the generic options and specific options.
 void setOutput(java.io.PrintWriter newOutput)
          Sets the print writer.
 void setRelationName(java.lang.String relationName)
          Sets the relation name the dataset should have.
protected  java.lang.String toStringFormat()
          Returns a string representing the dataset in the instance queue.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_Debug

private boolean m_Debug

m_Format

private Instances m_Format

m_RelationName

private java.lang.String m_RelationName

m_NumAttributes

protected int m_NumAttributes

m_NumClusters

protected int m_NumClusters

m_ClassFlag

private boolean m_ClassFlag

m_NumExamplesAct

private int m_NumExamplesAct

m_Output

private java.io.PrintWriter m_Output

Constructor Detail

ClusterGenerator

public ClusterGenerator()

Method Detail

defineDataFormat

abstract Instances defineDataFormat()
                             throws java.lang.Exception
Initializes the format for the dataset produced. Must be called before the generateExample or generateExamples methods are used.

Returns:
the format for the dataset
Throws:
java.lang.Exception - if generating the format fails

generateExample

abstract Instance generateExample()
                           throws java.lang.Exception
Generates one example of the dataset.

Returns:
the generated example
Throws:
java.lang.Exception - if the format of the dataset is not yet defined
java.lang.Exception - if the generator only works with generateExamples, that is, in non-single mode

generateExamples

abstract Instances generateExamples()
                             throws java.lang.Exception
Generates all examples of the dataset.

Returns:
the generated dataset
Throws:
java.lang.Exception - if the format of the dataset is not yet defined
java.lang.Exception - if the generator only works with generateExample, that is, in single mode

generateStart

abstract java.lang.String generateStart()
                                 throws java.lang.Exception
Generates a comment string that documents the data generator. By default, this string is added at the beginning of the output, in ARFF format, immediately after the options.

Returns:
a string containing information about the generated rules
Throws:
java.lang.Exception - if generating the documentation fails

generateFinished

abstract java.lang.String generateFinished()
                                    throws java.lang.Exception
Generates a comment string that documents the data generator. By default, this string is added at the end of the output, in ARFF format.

Returns:
a string containing information about the generated rules
Throws:
java.lang.Exception - if generating the documentation fails
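
To make the placement of the generateStart and generateFinished strings concrete, the following sketch wraps a dataset body in two comment blocks. ARFF treats lines beginning with % as comments. Everything else here is an assumption: the class ArffCommentSketch is hypothetical, and the sketch assumes the strings arrive without a comment prefix, which this page does not state.

```java
/** Hypothetical helper; not part of Weka. */
final class ArffCommentSketch {
    /** Prefixes every line of text with "% " so that ARFF readers treat it as a comment. */
    static String toComment(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\n", -1)) {
            sb.append("% ").append(line).append('\n');
        }
        return sb.toString();
    }

    /** Assembles the output in the order: start comment, dataset body, finished comment. */
    static String assemble(String start, String arffBody, String finished) {
        return toComment(start) + arffBody + toComment(finished);
    }
}
```

A call such as ArffCommentSketch.assemble(startText, body, finishedText) would give a file that any ARFF reader can still load, because the documentation strings appear only as comments.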

getSingleModeFlag

abstract boolean getSingleModeFlag()
                            throws java.lang.Exception
Returns whether single mode is set for the given data generator. The mode depends on the option settings and/or the generator type.

Returns:
single mode flag
Throws:
java.lang.Exception - if mode is not set yet
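
The relationship between getSingleModeFlag, generateExample and generateExamples can be sketched as follows. This is only an illustration of the contract described above, not the logic of makeData, which is not shown on this page. The interface ExampleSourceSketch is a hypothetical stand-in for a generator, and rows are plain double arrays rather than Weka Instance objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a generator; not part of Weka. */
interface ExampleSourceSketch {
    boolean getSingleModeFlag();
    double[] generateExample();         // valid only in single mode
    List<double[]> generateExamples();  // valid only in non-single mode
}

final class ModeDispatchSketch {
    /** Collects the rows by calling whichever method the mode allows. */
    static List<double[]> collect(ExampleSourceSketch source, int numExamples) {
        if (!source.getSingleModeFlag()) {
            // Non-single mode: the generator produces the whole dataset in one call.
            return source.generateExamples();
        }
        // Single mode: request one row per call.
        List<double[]> rows = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            rows.add(source.generateExample());
        }
        return rows;
    }
}
```

Checking the flag first avoids calling the method that the current mode does not support, which the descriptions above say results in an exception.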

setClassFlag

public void setClassFlag(boolean classFlag)
Sets the class flag. If the class flag is set, the cluster is listed as the class attribute in an extra attribute.

Parameters:
classFlag - the new class flag
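
The effect of the class flag on the dataset format can be pictured with a small sketch that builds an ARFF header with and without the extra attribute. The ARFF keywords (@relation, @attribute, @data) are standard. The class ClassFlagHeaderSketch, the attribute names X0, X1, ... and the cluster labels c0, c1, ... are made up for this illustration and need not match what an actual generator emits.

```java
/** Hypothetical helper; not part of Weka. */
final class ClassFlagHeaderSketch {
    /** Builds an ARFF header; adds a nominal class attribute only if classFlag is set. */
    static String header(String relation, int numAttributes, int numClusters, boolean classFlag) {
        StringBuilder sb = new StringBuilder("@relation ").append(relation).append('\n');
        for (int i = 0; i < numAttributes; i++) {
            sb.append("@attribute X").append(i).append(" numeric\n"); // made-up attribute names
        }
        if (classFlag) {
            sb.append("@attribute class {");
            for (int c = 0; c < numClusters; c++) {
                if (c > 0) {
                    sb.append(',');
                }
                sb.append('c').append(c); // made-up cluster labels
            }
            sb.append("}\n");
        }
        return sb.append("@data\n").toString();
    }
}
```

With the defaults of two attributes and four clusters, setting the flag adds one nominal attribute with four possible values to the header; without the flag, only the two numeric attributes appear.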

getClassFlag

public boolean getClassFlag()
Gets the class flag.

Returns:
the class flag

setDebug

public void setDebug(boolean debug)
Sets the debug flag.

Parameters:
debug - the new debug flag

getDebug

public boolean getDebug()
Gets the debug flag.

Returns:
the debug flag

setRelationName

public void setRelationName(java.lang.String relationName)
Sets the relation name the dataset should have.

Parameters:
relationName - the new relation name

getRelationName

public java.lang.String getRelationName()
Gets the relation name the dataset should have.

Returns:
the relation name the dataset should have

setNumClusters

public void setNumClusters(int numClusters)
Sets the number of clusters the dataset should have.

Parameters:
numClusters - the new number of clusters

getNumClusters

public int getNumClusters()
Gets the number of clusters the dataset should have.

Returns:
the number of clusters the dataset should have

setNumAttributes

public void setNumAttributes(int numAttributes)
Sets the number of attributes the dataset should have.

Parameters:
numAttributes - the new number of attributes

getNumAttributes

public int getNumAttributes()
Gets the number of attributes that should be produced.

Returns:
the number of attributes that should be produced

setNumExamplesAct

public void setNumExamplesAct(int numExamplesAct)
Sets the number of examples the dataset should have.

Parameters:
numExamplesAct - the new number of examples

getNumExamplesAct

public int getNumExamplesAct()
Gets the number of examples the dataset should have.

Returns:
the number of examples the dataset should have

setOutput

public void setOutput(java.io.PrintWriter newOutput)
Sets the print writer.

Parameters:
newOutput - the new print writer

getOutput

public java.io.PrintWriter getOutput()
Gets the print writer.

Returns:
print writer object

setFormat

protected void setFormat(Instances newFormat)
Sets the format of the dataset that is to be generated.

Parameters:
newFormat - the new dataset format

getFormat

protected Instances getFormat()
Gets the format of the dataset that is to be generated.

Returns:
the format of the dataset

toStringFormat

protected java.lang.String toStringFormat()
Returns a string representing the dataset in the instance queue.

Returns:
the string representing the output data format

makeData

public static void makeData(ClusterGenerator generator,
                            java.lang.String[] options)
                     throws java.lang.Exception
Calls the data generator.

Parameters:
generator - the data generator to use
options - options of the data generator
Throws:
java.lang.Exception - if there was an error in the option list

listSpecificOptions

private java.lang.String listSpecificOptions(ClusterGenerator generator)
Makes a string with the options of the specific data generator.

Parameters:
generator - the datagenerator that is used
Returns:
string with the options of the data generator used

setOptions

private static void setOptions(ClusterGenerator generator,
                               java.lang.String[] options)
                        throws java.lang.Exception
Sets the generic options and specific options.

Parameters:
generator - the data generator used
options - the generic options and the specific options
Throws:
java.lang.Exception - if help is requested or an option is invalid

listGenericOptions

private static java.lang.String listGenericOptions(ClusterGenerator generator)
Method for listing generic options.

Parameters:
generator - the data generator
Returns:
string with the generic data generator options

getGenericOptions

private java.lang.String[] getGenericOptions()
Gets the current generic settings of the datagenerator.

Returns:
an array of strings suitable for passing to setOptions