weka.core.converters
Class C45Loader

java.lang.Object
  extended by weka.core.converters.AbstractLoader
      extended by weka.core.converters.C45Loader
All Implemented Interfaces:
BatchLoader, IncrementalLoader, Loader, java.io.Serializable

public class C45Loader
extends AbstractLoader
implements BatchLoader, IncrementalLoader

Reads C4.5 input files. Takes a filestem, or a filestem with .names or .data appended. Assumes that both the .names and the .data file exist in the directory of the supplied filestem.
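As a sketch of how both companion files can be derived from a single argument, the hypothetical helper below (not part of Weka) strips a .names or .data suffix to recover the filestem and rebuilds both paths from it. Like the loader description above, it assumes both files sit next to the stem; it does not check that they exist.

```java
import java.io.File;

public class C45FileStem {
    // Hypothetical helper: strips a ".names" or ".data" suffix, if present,
    // so that both companion files can be located from one argument.
    static String stem(String path) {
        if (path.endsWith(".names")) {
            return path.substring(0, path.length() - ".names".length());
        }
        if (path.endsWith(".data")) {
            return path.substring(0, path.length() - ".data".length());
        }
        return path; // already a bare filestem
    }

    // Returns {namesFile, dataFile}, both assumed to live next to the stem.
    static File[] companions(File source) {
        String stem = stem(source.getPath());
        return new File[] { new File(stem + ".names"), new File(stem + ".data") };
    }

    public static void main(String[] args) {
        File[] files = companions(new File("golf.data"));
        System.out.println(files[0].getPath()); // golf.names
        System.out.println(files[1].getPath()); // golf.data
    }
}
```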

Version:
$Revision: 1.5 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Loader, Serialized Form

Field Summary
private  java.io.Reader m_dataReader
          Reader for data file
private  java.lang.String m_fileStem
          Holds the filestem.
private  boolean[] m_ignore
          Which attributes are of type ignore or label.
private  java.io.Reader m_namesReader
          Reader for names file
private  int m_numAttribs
          Number of attributes in the data (including ignore and label attributes).
protected  java.io.File m_sourceFile
          Holds the source of the data set.
private  java.io.File m_sourceFileData
          Holds the data (.data) file of the data set.
protected  Instances m_structure
          Holds the determined structure (header) of the data set.
 
Fields inherited from class weka.core.converters.AbstractLoader
BATCH, INCREMENTAL, m_retrieval, NONE
 
Constructor Summary
C45Loader()
           
 
Method Summary
 Instances getDataSet()
          Returns the full data set.
private  Instance getInstance(java.io.StreamTokenizer tokenizer)
          Reads an instance using the supplied tokenizer.
 Instance getNextInstance()
          Reads the data set incrementally: gets the next instance in the data set, or returns null if there are no more instances to get.
 Instances getStructure()
          Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
 java.lang.String globalInfo()
          Returns a string describing this loader.
private  void initTokenizer(java.io.StreamTokenizer tokenizer)
          Initializes the stream tokenizer
static void main(java.lang.String[] args)
          Main method for testing this class.
private  void readHeader(java.io.StreamTokenizer tokenizer)
          Reads the header (from the .names file) using the supplied tokenizer.
private  java.lang.String removeTrailingPeriod(java.lang.String val)
          Removes a trailing period, if present, from the supplied value.
 void reset()
          Resets the Loader so that it is ready to read a new data set.
 void setSource(java.io.File file)
          Resets the Loader object and sets the source of the data set to be the supplied File object.
 
Methods inherited from class weka.core.converters.AbstractLoader
getRetrieval, setRetrieval, setSource
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_structure

protected Instances m_structure
Holds the determined structure (header) of the data set.


m_sourceFile

protected java.io.File m_sourceFile
Holds the source of the data set: in this case, the .names file of the data set. m_sourceFileData holds the .data file.


m_sourceFileData

private java.io.File m_sourceFileData
Holds the data (.data) file of the data set.


m_namesReader

private transient java.io.Reader m_namesReader
Reader for names file


m_dataReader

private transient java.io.Reader m_dataReader
Reader for data file


m_fileStem

private java.lang.String m_fileStem
Holds the filestem.


m_numAttribs

private int m_numAttribs
Number of attributes in the data (including ignore and label attributes).


m_ignore

private boolean[] m_ignore
Which attributes are of type ignore or label. These are *not* included in the ARFF representation.
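The effect of m_ignore can be sketched as a boolean mask over each row of raw C4.5 values: entries flagged true are skipped, so the remaining values line up with the ARFF attributes. The class and method names below are hypothetical, and the sketch assumes a row holds exactly m_numAttribs values (ignore and label attributes included).

```java
import java.util.ArrayList;
import java.util.List;

public class IgnoreMask {
    // Keeps only the values whose attribute is not flagged in the mask.
    // mask[i] == true plays the role of m_ignore[i]: attribute i is of type
    // ignore or label and is left out of the ARFF-style representation.
    static List<String> project(String[] row, boolean[] mask) {
        if (row.length != mask.length) {
            throw new IllegalArgumentException(
                "expected " + mask.length + " values, got " + row.length);
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < row.length; i++) {
            if (!mask[i]) {
                kept.add(row[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical row: an ID column declared "ignore", then two real attributes.
        String[] row = { "id-17", "sunny", "85" };
        boolean[] mask = { true, false, false };
        System.out.println(project(row, mask)); // [sunny, 85]
    }
}
```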

Constructor Detail

C45Loader

public C45Loader()

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this loader.

Returns:
a description of the loader suitable for displaying in the Explorer/Experimenter GUI

reset

public void reset()
Resets the Loader so that it is ready to read a new data set.


setSource

public void setSource(java.io.File file)
               throws java.io.IOException
Resets the Loader object and sets the source of the data set to be the supplied File object.

Specified by:
setSource in interface Loader
Overrides:
setSource in class AbstractLoader
Parameters:
file - the source file.
Throws:
java.io.IOException - if an error occurs

getStructure

public Instances getStructure()
                       throws java.io.IOException
Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.

Specified by:
getStructure in interface Loader
Specified by:
getStructure in class AbstractLoader
Returns:
the structure of the data set as an empty set of Instances
Throws:
java.io.IOException - if an error occurs

getDataSet

public Instances getDataSet()
                     throws java.io.IOException
Returns the full data set. If the structure hasn't yet been determined by a call to getStructure, then the method should do so before processing the rest of the data set.

Specified by:
getDataSet in interface Loader
Specified by:
getDataSet in class AbstractLoader
Returns:
the full data set as an Instances object
Throws:
java.io.IOException - if there is no source or parsing fails

getNextInstance

public Instance getNextInstance()
                         throws java.io.IOException
Reads the data set incrementally: gets the next instance in the data set, or returns null if there are no more instances to get. If the structure hasn't yet been determined by a call to getStructure, then the method should do so before returning the next instance in the data set. If it is not possible to read the data set incrementally (i.e., in cases where the data set structure cannot be fully established before all instances have been seen), then an exception should be thrown.

Specified by:
getNextInstance in interface Loader
Specified by:
getNextInstance in class AbstractLoader
Returns:
the next instance in the data set as an Instance object or null if there are no more instances to be read
Throws:
java.io.IOException - if there is an error during parsing
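The null-terminated contract of getNextInstance can be illustrated without Weka. The hypothetical class below (not Weka's API) returns one row of string values per call and null at end of input; readAll then shows how a caller might collect a whole data set by looping until null. It assumes one instance per line, comma-separated values, an optional terminating period, and `|` comments. It does not determine the structure or convert values to attribute types.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class IncrementalRows {
    private final BufferedReader in;

    IncrementalRows(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    // Returns the values of the next non-blank line, or null once the input
    // is exhausted (the same end-of-data signal getNextInstance() uses).
    String[] next() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            int bar = line.indexOf('|');          // drop "|" comments
            if (bar >= 0) {
                line = line.substring(0, bar);
            }
            line = line.trim();
            if (line.endsWith(".")) {             // optional terminating period
                line = line.substring(0, line.length() - 1);
            }
            if (line.isEmpty()) {
                continue;                         // skip blank and comment-only lines
            }
            String[] values = line.split(",");
            for (int i = 0; i < values.length; i++) {
                values[i] = values[i].trim();
            }
            return values;
        }
        return null;
    }

    // A batch read expressed as an incremental loop that stops at null.
    static List<String[]> readAll(IncrementalRows rows) throws IOException {
        List<String[]> all = new ArrayList<>();
        String[] row;
        while ((row = rows.next()) != null) {
            all.add(row);
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        String data = "sunny, 85, no.\n| comment only\n\novercast, 83, yes\n";
        List<String[]> all = readAll(new IncrementalRows(new StringReader(data)));
        System.out.println(all.size()); // 2
    }
}
```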

getInstance

private Instance getInstance(java.io.StreamTokenizer tokenizer)
                      throws java.io.IOException
Reads an instance using the supplied tokenizer.

Parameters:
tokenizer - the tokenizer to use
Returns:
an Instance or null if there are no more instances to read
Throws:
java.io.IOException - if an error occurs

removeTrailingPeriod

private java.lang.String removeTrailingPeriod(java.lang.String val)
Removes a trailing period, if present, from the supplied value.
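C4.5 files terminate entries with a period, so the last token on a line can carry it (for example `no.`). A minimal sketch of what a method with this name would do, written as a standalone re-implementation rather than Weka's code, and assuming only a single trailing '.' needs removing:

```java
public class TrailingPeriod {
    // Hypothetical re-implementation for illustration: drops one trailing '.'
    // so that a value such as "no." matches the declared value "no".
    static String removeTrailingPeriod(String val) {
        if (val.endsWith(".")) {
            return val.substring(0, val.length() - 1);
        }
        return val;
    }

    public static void main(String[] args) {
        System.out.println(removeTrailingPeriod("no."));  // no
        System.out.println(removeTrailingPeriod("yes"));  // yes
    }
}
```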

readHeader

private void readHeader(java.io.StreamTokenizer tokenizer)
                 throws java.io.IOException
Reads the header (from the .names file) using the supplied tokenizer.

Parameters:
tokenizer - the tokenizer to use
Throws:
java.io.IOException - if an error occurs
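A simplified sketch of the kind of parsing a .names reader performs, assuming the standard C4.5 layout: the first entry lists the class values, and each following entry is `name: continuous.`, `name: ignore.`, or `name: v1, v2, ... .`, with `|` starting a comment. The `NamesParser` class is hypothetical and is not Weka's readHeader. It uses a line-based reader instead of a StreamTokenizer, assumes one declaration per line, and does not handle `discrete N` declarations or label attributes. Its ignoreMask() plays the role described for m_ignore.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class NamesParser {
    /** One parsed declaration; values == null means continuous or ignored. */
    static final class Attr {
        final String name;
        final boolean ignored;
        final List<String> values;

        Attr(String name, boolean ignored, List<String> values) {
            this.name = name;
            this.ignored = ignored;
            this.values = values;
        }
    }

    final List<String> classValues = new ArrayList<>();
    final List<Attr> attributes = new ArrayList<>();

    // Strips a '|' comment, surrounding blanks and the terminating period.
    private static String clean(String line) {
        int bar = line.indexOf('|');
        if (bar >= 0) {
            line = line.substring(0, bar);
        }
        line = line.trim();
        return line.endsWith(".") ? line.substring(0, line.length() - 1).trim() : line;
    }

    private static List<String> splitValues(String s) {
        List<String> out = new ArrayList<>();
        for (String v : s.split(",")) {
            out.add(v.trim());
        }
        return out;
    }

    static NamesParser parse(Reader reader) throws IOException {
        NamesParser p = new NamesParser();
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            line = clean(line);
            if (line.isEmpty()) {
                continue;
            }
            if (p.classValues.isEmpty()) {          // first entry: the class values
                p.classValues.addAll(splitValues(line));
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IOException("expected 'name: type' in: " + line);
            }
            String name = line.substring(0, colon).trim();
            String spec = line.substring(colon + 1).trim();
            if (spec.equals("continuous")) {
                p.attributes.add(new Attr(name, false, null));
            } else if (spec.equals("ignore")) {
                p.attributes.add(new Attr(name, true, null));
            } else {                                // nominal: list of values
                p.attributes.add(new Attr(name, false, splitValues(spec)));
            }
        }
        return p;
    }

    // true for attributes that are left out of the ARFF representation.
    boolean[] ignoreMask() {
        boolean[] mask = new boolean[attributes.size()];
        for (int i = 0; i < mask.length; i++) {
            mask[i] = attributes.get(i).ignored;
        }
        return mask;
    }

    public static void main(String[] args) throws IOException {
        String names = "yes, no.  | class values\n"
                     + "outlook: sunny, overcast, rain.\n"
                     + "temperature: continuous.\n"
                     + "id: ignore.\n";
        NamesParser p = parse(new StringReader(names));
        System.out.println(p.classValues);              // [yes, no]
        System.out.println(p.attributes.size());        // 3
        System.out.println(p.attributes.get(0).values); // [sunny, overcast, rain]
    }
}
```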

initTokenizer

private void initTokenizer(java.io.StreamTokenizer tokenizer)
Initializes the stream tokenizer

Parameters:
tokenizer - the tokenizer to initialize
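One plausible tokenizer configuration for C4.5 .data lines, sketched with java.io.StreamTokenizer. The settings Weka actually applies in initTokenizer are not given on this page, so treat these as assumptions. Commas and blanks separate values, `|` starts a comment, and end of line is reported so that each line can be read as one instance. Because '.' is a word character here, the last value on a line keeps its terminating period ("no."), which is the kind of token removeTrailingPeriod exists to clean up. Values containing blanks would be split by this configuration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class C45Tokenizer {
    // Assumed configuration, not necessarily the one Weka uses.
    static void init(StreamTokenizer t) {
        t.resetSyntax();                 // start with every character "ordinary"
        t.whitespaceChars(0, ' ');       // control characters and blanks separate tokens
        t.wordChars(' ' + 1, '\u00FF');  // every other character builds words...
        t.whitespaceChars(',', ',');     // ...except commas, which separate values
        t.commentChar('|');              // '|' starts a comment running to end of line
        t.eolIsSignificant(true);        // report TT_EOL: one line = one instance
    }

    // Collects the word tokens of one line, stopping at end of line or input.
    static List<String> readLine(StreamTokenizer t) throws IOException {
        List<String> words = new ArrayList<>();
        while (t.nextToken() != StreamTokenizer.TT_EOL && t.ttype != StreamTokenizer.TT_EOF) {
            words.add(t.sval);           // every non-EOL token is a TT_WORD here
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        StreamTokenizer t = new StreamTokenizer(
            new StringReader("sunny, 85, FALSE, no. | comment\n"));
        init(t);
        System.out.println(readLine(t)); // [sunny, 85, FALSE, no.]
    }
}
```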

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - should contain a filestem, optionally with .names or .data appended