weka.core.converters
Class CSVLoader

java.lang.Object
  extended by weka.core.converters.AbstractLoader
      extended by weka.core.converters.CSVLoader
All Implemented Interfaces:
BatchLoader, Loader, java.io.Serializable

public class CSVLoader
extends AbstractLoader
implements BatchLoader

Reads a text file that is comma or tab delimited.

Version:
$Revision: 1.4 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Loader, Serialized Form

Field Summary
private  FastVector m_cumulativeInstances
          Holds instances accumulated so far
private  FastVector m_cumulativeStructure
          A list of hash tables for accumulating nominal values during parsing.
protected  java.io.File m_sourceFile
          Holds the source of the data set.
protected  Instances m_structure
          Holds the determined structure (header) of the data set.
 
Fields inherited from class weka.core.converters.AbstractLoader
BATCH, INCREMENTAL, m_retrieval, NONE
 
Constructor Summary
CSVLoader()
           
 
Method Summary
private  void checkStructure(FastVector current)
          Checks the current instance against what is known about the structure of the data set so far.
 Instances getDataSet()
          Return the full data set.
private  FastVector getInstance(java.io.StreamTokenizer tokenizer)
          Attempts to parse a line of the data set.
 Instance getNextInstance()
          CSVLoader is unable to process a data set incrementally.
 Instances getStructure()
          Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
 java.lang.String globalInfo()
          Returns a string describing this loader
private  void initTokenizer(java.io.StreamTokenizer tokenizer)
          Initializes the stream tokenizer
static void main(java.lang.String[] args)
          Main method.
private  void readHeader(java.io.StreamTokenizer tokenizer)
          Assumes the first line of the file contains the attribute names.
private  void readStructure(java.io.StreamTokenizer st)
           
 void reset()
          Resets the loader ready to read a new data set
 void setSource(java.io.File file)
          Resets the Loader object and sets the source of the data set to be the supplied File object.
 
Methods inherited from class weka.core.converters.AbstractLoader
getRetrieval, setRetrieval, setSource
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_structure

protected Instances m_structure
Holds the determined structure (header) of the data set.


m_sourceFile

protected java.io.File m_sourceFile
Holds the source of the data set.


m_cumulativeStructure

private FastVector m_cumulativeStructure
A list of hash tables for accumulating nominal values during parsing.


m_cumulativeInstances

private FastVector m_cumulativeInstances
Holds instances accumulated so far

Constructor Detail

CSVLoader

public CSVLoader()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this loader

Returns:
a description of the loader suitable for displaying in the explorer/experimenter gui

reset

public void reset()
Resets the loader ready to read a new data set


setSource

public void setSource(java.io.File file)
               throws java.io.IOException
Resets the Loader object and sets the source of the data set to be the supplied File object.

Specified by:
setSource in interface Loader
Overrides:
setSource in class AbstractLoader
Parameters:
file - the source file.
Throws:
java.io.IOException - if an error occurs

getStructure

public Instances getStructure()
                       throws java.io.IOException
Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.

Specified by:
getStructure in interface Loader
Specified by:
getStructure in class AbstractLoader
Returns:
the structure of the data set as an empty set of Instances
Throws:
java.io.IOException - if an error occurs

readStructure

private void readStructure(java.io.StreamTokenizer st)
                    throws java.io.IOException
Throws:
java.io.IOException

getDataSet

public Instances getDataSet()
                     throws java.io.IOException
Returns the full data set. If the structure has not yet been determined by a call to getStructure, this method determines it before processing the rest of the data set.

Specified by:
getDataSet in interface Loader
Specified by:
getDataSet in class AbstractLoader
Returns:
the full data set as a set of Instances
Throws:
java.io.IOException - if there is no source or parsing fails

getNextInstance

public Instance getNextInstance()
                         throws java.io.IOException
CSVLoader is unable to process a data set incrementally.

Specified by:
getNextInstance in interface Loader
Specified by:
getNextInstance in class AbstractLoader
Returns:
never returns without throwing an exception
Throws:
java.io.IOException - always. CSVLoader is unable to process a data set incrementally.

getInstance

private FastVector getInstance(java.io.StreamTokenizer tokenizer)
                        throws java.io.IOException
Attempts to parse a line of the data set.

Parameters:
tokenizer - the tokenizer
Returns:
a FastVector containing String and Double objects representing the values of the instance.
Throws:
java.io.IOException - if an error occurs

    private_normal_behavior
      requires: tokenizer != null;
      ensures: \result  != null;
  also
    private_exceptional_behavior
      requires: tokenizer == null
                || (* unsuccessful parse *);
      signals: (IOException);
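
The returned vector mixes String and Double objects. As a minimal sketch of how such a list might be built from fields that have already been tokenized, the hypothetical helper below (ValueSketch and toValues are illustrative names, not part of Weka) keeps a field as a Double when it parses as a number and as a String otherwise. The rule CSVLoader actually applies may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: converts raw fields to Double or String.
class ValueSketch {

    // Returns a Double if the field parses as a number, otherwise the field itself.
    static Object toValue(String field) {
        try {
            return Double.valueOf(field.trim());
        } catch (NumberFormatException e) {
            return field;
        }
    }

    // Converts every field of one line, preserving the column order.
    static List<Object> toValues(List<String> fields) {
        List<Object> values = new ArrayList<>();
        for (String field : fields) {
            values.add(toValue(field));
        }
        return values;
    }
}
```

Note that Double.valueOf also accepts forms such as "NaN" or "1e3", so a stricter numeric check may be preferable for some data sets.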
 

checkStructure

private void checkStructure(FastVector current)
                     throws java.lang.Exception
Checks the current instance against what is known about the structure of the data set so far. If there is a nominal value for an attribute that was believed to be numeric then all previously seen values for this attribute are stored in a Hashtable.

Parameters:
current - a FastVector value
Throws:
java.lang.Exception - if an error occurs

    private_normal_behavior
      requires: current != null;
  also
    private_exceptional_behavior
      requires: current == null
                || (* unrecognized object type in current *);
      signals: (Exception);
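
The promotion from numeric to nominal described above can be illustrated with a small stand-alone sketch. It is not the Weka implementation: the class StructureSketch and its members are hypothetical, plain java.util collections stand in for FastVector and Hashtable, and an unchecked IllegalArgumentException replaces the declared Exception.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of Weka: tracks which columns have turned nominal.
class StructureSketch {

    // One entry per column: null while the column still looks numeric,
    // otherwise the distinct labels seen so far (in first-seen order).
    private final List<Set<String>> nominalValues = new ArrayList<>();

    // Rows accumulated so far, kept so earlier numeric values can be revisited.
    private final List<List<Object>> rows = new ArrayList<>();

    void check(List<Object> current) {
        if (current == null) {
            throw new IllegalArgumentException("current must not be null");
        }
        while (nominalValues.size() < current.size()) {
            nominalValues.add(null);
        }
        for (int col = 0; col < current.size(); col++) {
            Object value = current.get(col);
            Set<String> labels = nominalValues.get(col);
            if (value instanceof String) {
                if (labels == null) {
                    // First nominal value in a column believed to be numeric:
                    // record every earlier value of that column as a label.
                    labels = new LinkedHashSet<>();
                    for (List<Object> row : rows) {
                        if (col < row.size()) {
                            labels.add(String.valueOf(row.get(col)));
                        }
                    }
                    nominalValues.set(col, labels);
                }
                labels.add((String) value);
            } else if (value instanceof Double) {
                // Numbers in an already nominal column are recorded as labels too.
                if (labels != null) {
                    labels.add(String.valueOf(value));
                }
            } else {
                throw new IllegalArgumentException("unrecognized object type in current");
            }
        }
        rows.add(current);
    }

    boolean isNominal(int col) {
        return col < nominalValues.size() && nominalValues.get(col) != null;
    }

    Set<String> labels(int col) {
        return isNominal(col) ? nominalValues.get(col) : Collections.<String>emptySet();
    }
}
```

Keeping the earlier rows around is what makes the late promotion possible, which is consistent with the loader accumulating instances rather than reading them incrementally.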
 

readHeader

private void readHeader(java.io.StreamTokenizer tokenizer)
                 throws java.io.IOException
Assumes the first line of the file contains the attribute names. Assumes all attributes are real; reading the full data set with getDataSet will establish the true structure.

Parameters:
tokenizer - a StreamTokenizer value
Throws:
java.io.IOException - if an error occurs

    private_normal_behavior
      requires: tokenizer != null;
      modifiable: m_structure;
      ensures: m_structure != null;
  also
    private_exceptional_behavior
      requires: tokenizer == null
                || (* unsuccessful parse *);
      signals: (IOException);
 

initTokenizer

private void initTokenizer(java.io.StreamTokenizer tokenizer)
Initializes the stream tokenizer

Parameters:
tokenizer - the tokenizer to initialize
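
For illustration, one plausible way to set up a java.io.StreamTokenizer for comma- or tab-delimited text is sketched below. The class TokenizerSketch and the exact settings are assumptions made for this example; the configuration CSVLoader actually applies may differ.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Weka.
class TokenizerSketch {

    // Assumed settings for comma/tab delimited input.
    static void configure(StreamTokenizer tokenizer) {
        tokenizer.resetSyntax();                 // drop the default number and word rules
        tokenizer.whitespaceChars(0, ' ' - 1);   // control characters (tab included) separate tokens
        tokenizer.wordChars(' ', '\u00FF');      // everything else is part of a field
        tokenizer.whitespaceChars(',', ',');     // comma is a delimiter
        tokenizer.quoteChar('"');                // quoted fields may contain delimiters
        tokenizer.eolIsSignificant(true);        // line ends are reported as TT_EOL
    }

    // Returns the fields of the first line of the given text.
    static List<String> firstLine(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        configure(tokenizer);
        List<String> fields = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF
                    && tokenizer.ttype != StreamTokenizer.TT_EOL) {
                fields.add(tokenizer.sval);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }
}
```

Because delimiters are treated as whitespace in this sketch, consecutive delimiters collapse and empty fields are not reported; handling missing values would need a different configuration.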

main

public static void main(java.lang.String[] args)
Main method.

Parameters:
args - should contain the name of an input file.