weka.core
Class Instances

java.lang.Object
  extended by weka.core.Instances
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
ClassRemoveableInstances, IndividualInstances, NNge.Exemplar, ReferenceInstances

public class Instances
extends java.lang.Object
implements java.io.Serializable

Class for handling an ordered set of weighted instances.

Typical usage (code from the main() method of this class):

...
// Read all the instances in the file
FileReader reader = new FileReader(filename);
Instances instances = new Instances(reader);

// Make the last attribute be the class
instances.setClassIndex(instances.numAttributes() - 1);

// Print header and instances.
System.out.println("\nDataset:\n");
System.out.println(instances);

...

All methods that change a set of instances are safe, i.e., a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.

Version:
$Revision: 1.49 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
(package private) static java.lang.String ARFF_DATA
          The keyword used to denote the start of the ARFF data section.
(package private) static java.lang.String ARFF_RELATION
          The keyword used to denote the start of an ARFF header.
private  boolean classRemoved
          Set to true if the class attribute was removed, and to false if it was added again.
static java.lang.String FILE_EXTENSION
          The filename extension that should be used for ARFF files.
private  int lastRemoved
          Keeps the index of the last removed attribute position.
protected  FastVector m_Attributes
          The attribute information.
protected  int m_ClassIndex
          The class attribute's index
protected  int[] m_IndicesBuffer
          Buffer of indices for sparse instances.
protected  FastVector m_Instances
          The instances.
protected  java.lang.String m_RelationName
          The dataset's name.
protected  double[] m_ValueBuffer
          Buffer of values for sparse instances.
 
Constructor Summary
Instances(Instances dataset)
          Constructor copying all instances and references to the header information from the given set of instances.
Instances(Instances dataset, int capacity)
          Constructor creating an empty set of instances.
Instances(Instances source, int first, int toCopy)
          Creates a new set of instances by copying a subset of another set.
Instances(java.io.Reader reader)
          Reads an ARFF file from a reader, and assigns a weight of one to each instance.
Instances(java.io.Reader reader, int capacity)
          Reads the header of an ARFF file from a reader and reserves space for the given number of instances.
Instances(java.lang.String name, FastVector attInfo, int capacity)
          Creates an empty set of instances.
 
Method Summary
 void add(Instance instance)
          Adds one instance to the end of the set.
 Attribute attribute(int index)
          Returns an attribute.
 Attribute attribute(java.lang.String name)
          Returns an attribute given its name.
 AttributeStats attributeStats(int index)
          Calculates summary statistics on the values that appear in this set of instances for a specified attribute.
 double[] attributeToDoubleArray(int index)
          Gets the value of all instances in this dataset for a particular attribute.
 boolean checkForStringAttributes()
          Checks for string attributes in the dataset
 boolean checkInstance(Instance instance)
          Checks if the given instance is compatible with this dataset.
 Attribute classAttribute()
          Returns the class attribute.
 int classIndex()
          Returns the class attribute's index.
 void compactify()
          Compactifies the set of instances.
private  void copyInstances(int from, Instances dest, int num)
          Copies instances from one set to the end of another one.
 void delete()
          Removes all instances from the set.
 void delete(int index)
          Removes an instance at the given position from the set.
 void deleteAttributeAt(int position)
          Deletes an attribute at the given position (0 to numAttributes() - 1).
 void deleteStringAttributes()
          Deletes all string attributes in the dataset.
 void deleteWithMissing(Attribute att)
          Removes all instances with missing values for a particular attribute from the dataset.
 void deleteWithMissing(int attIndex)
          Removes all instances with missing values for a particular attribute from the dataset.
 void deleteWithMissingClass()
          Removes all instances with a missing class value from the dataset.
 java.util.Enumeration enumerateAttributes()
          Returns an enumeration of all the attributes.
 java.util.Enumeration enumerateInstances()
          Returns an enumeration of all instances in the dataset.
 boolean equalHeaders(Instances dataset)
          Checks if two headers are equivalent.
private  void errms(java.io.StreamTokenizer tokenizer, java.lang.String theMsg)
          Throws an error message containing the line number and the last token read.
 Instance firstInstance()
          Returns the first instance in the set.
private  void freshAttributeInfo()
          Replaces the attribute information by a clone of itself.
private  void getFirstToken(java.io.StreamTokenizer tokenizer)
          Gets next token, skipping empty lines.
private  void getIndex(java.io.StreamTokenizer tokenizer)
          Gets index, checking for a premature end of line.
protected  boolean getInstance(java.io.StreamTokenizer tokenizer, boolean flag)
          Reads a single instance using the tokenizer and appends it to the dataset.
protected  boolean getInstanceFull(java.io.StreamTokenizer tokenizer, boolean flag)
          Reads a single instance using the tokenizer and appends it to the dataset.
protected  boolean getInstanceSparse(java.io.StreamTokenizer tokenizer, boolean flag)
          Reads a single instance using the tokenizer and appends it to the dataset.
private  void getLastToken(java.io.StreamTokenizer tokenizer, boolean endOfFileOk)
          Gets a token and checks whether it is an end of line.
private  void getNextToken(java.io.StreamTokenizer tokenizer)
          Gets next token, checking for a premature end of line.
 java.util.Random getRandomNumberGenerator(long seed)
          Returns a random number generator.
private  void initTokenizer(java.io.StreamTokenizer tokenizer)
          Initializes the StreamTokenizer used for reading the ARFF file.
 void insertAttributeAt(Attribute att, int position)
          Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing.
 Instance instance(int index)
          Returns the instance at the given position.
private  java.lang.String instancesAndWeights()
          Returns string including all instances, their weights and their indices in the original dataset.
 Instance lastInstance()
          Returns the last instance in the set.
static void main(java.lang.String[] args)
          Main method for this class -- just prints a summary of a set of instances.
 double meanOrMode(Attribute att)
          Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
 double meanOrMode(int attIndex)
          Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
static Instances mergeInstances(Instances first, Instances second)
          Merges two sets of Instances together.
 int numAttributes()
          Returns the number of attributes.
 int numClasses()
          Returns the number of class labels.
 int numDistinctValues(Attribute att)
          Returns the number of distinct values of a given attribute.
 int numDistinctValues(int attIndex)
          Returns the number of distinct values of a given attribute.
 int numInstances()
          Returns the number of instances in the dataset.
private  void quickSort(int attIndex, int lo0, int hi0)
          Implements quicksort.
 void randomize(java.util.Random random)
          Shuffles the instances in the set so that they are ordered randomly.
protected  void readHeader(java.io.StreamTokenizer tokenizer)
          Reads and stores the header of an ARFF file.
 boolean readInstance(java.io.Reader reader)
          Reads a single instance from the reader and appends it to the dataset.
private  void readTillEOL(java.io.StreamTokenizer tokenizer)
          Reads and skips all tokens before the next end-of-line token.
 java.lang.String relationName()
          Returns the relation's name.
 void renameAttribute(Attribute att, java.lang.String name)
          Renames an attribute.
 void renameAttribute(int att, java.lang.String name)
          Renames an attribute.
 void renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name)
          Renames a value of a nominal (or string) attribute.
 void renameAttributeValue(int att, int val, java.lang.String name)
          Renames a value of a nominal (or string) attribute.
 Instances resample(java.util.Random random)
          Creates a new dataset of the same size using random sampling with replacement.
 Instances resampleWithWeights(java.util.Random random)
          Creates a new dataset of the same size using random sampling with replacement according to the current instance weights.
 Instances resampleWithWeights(java.util.Random random, double[] weights)
          Creates a new dataset of the same size using random sampling with replacement according to the given weight vector.
 void setClass(Attribute att)
          Sets the class attribute.
 void setClassIndex(int classIndex)
          Sets the class index of the set.
 void setRelationName(java.lang.String newName)
          Sets the relation's name.
 void sort(Attribute att)
          Sorts the instances based on an attribute.
 void sort(int attIndex)
          Sorts the instances based on an attribute.
 void stratify(int numFolds)
          Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).
private  void stratStep(int numFolds)
          Helper function needed for stratifying the set.
 Instances stringFreeStructure()
          Creates a copy of the structure, but "cleanses" string types (i.e., the copy doesn't contain references to the strings seen in the past).
 double sumOfWeights()
          Computes the sum of all the instances' weights.
private  void swap(int i, int j)
          Swaps two instances in the set.
static void test(java.lang.String[] argv)
          Method for testing this class.
 Instances testCV(int numFolds, int numFold)
          Creates the test set for one fold of a cross-validation on the dataset.
 java.lang.String toString()
          Returns the dataset as a string in ARFF format.
 java.lang.String toSummaryString()
          Generates a string summarizing the set of instances.
 Instances trainCV(int numFolds, int numFold)
          Creates the training set for one fold of a cross-validation on the dataset.
 Instances trainCV(int numFolds, int numFold, java.util.Random random)
          Creates the training set for one fold of a cross-validation on the dataset.
 double variance(Attribute att)
          Computes the variance for a numeric attribute.
 double variance(int attIndex)
          Computes the variance for a numeric attribute.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

lastRemoved

private int lastRemoved
Keeps the index of the last removed attribute position.


classRemoved

private boolean classRemoved
Set to true if the class attribute was removed, and to false if it was added again.


FILE_EXTENSION

public static java.lang.String FILE_EXTENSION
The filename extension that should be used for ARFF files.


ARFF_RELATION

static java.lang.String ARFF_RELATION
The keyword used to denote the start of an ARFF header.


ARFF_DATA

static java.lang.String ARFF_DATA
The keyword used to denote the start of the ARFF data section.


m_RelationName

protected java.lang.String m_RelationName
The dataset's name.


m_Attributes

protected FastVector m_Attributes
The attribute information.


m_Instances

protected FastVector m_Instances
The instances.


m_ClassIndex

protected int m_ClassIndex
The class attribute's index


m_ValueBuffer

protected double[] m_ValueBuffer
Buffer of values for sparse instances.


m_IndicesBuffer

protected int[] m_IndicesBuffer
Buffer of indices for sparse instances.

Constructor Detail

Instances

public Instances(java.io.Reader reader)
          throws java.io.IOException
Reads an ARFF file from a reader and assigns a weight of one to each instance. Leaves the index of the class attribute undefined (negative).

Parameters:
reader - the reader
Throws:
java.io.IOException - if the ARFF file is not read successfully

Instances

public Instances(java.io.Reader reader,
                 int capacity)
          throws java.io.IOException
Reads the header of an ARFF file from a reader and reserves space for the given number of instances. Leaves the class index undefined (negative).

Parameters:
reader - the reader
capacity - the capacity
Throws:
java.lang.IllegalArgumentException - if the header is not read successfully or the capacity is negative.
java.io.IOException - if there is a problem with the reader.

Instances

public Instances(Instances dataset)
Constructor copying all instances and references to the header information from the given set of instances.


Instances

public Instances(Instances dataset,
                 int capacity)
Constructor creating an empty set of instances. Copies references to the header information from the given set of instances. Sets the capacity of the set of instances to 0 if it's negative.

Parameters:
capacity - the capacity of the new dataset

Instances

public Instances(Instances source,
                 int first,
                 int toCopy)
Creates a new set of instances by copying a subset of another set.

Parameters:
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
Throws:
java.lang.IllegalArgumentException - if first and toCopy are out of range

Instances

public Instances(java.lang.String name,
                 FastVector attInfo,
                 int capacity)
Creates an empty set of instances using the given attribute information. Sets the capacity of the set of instances to 0 if it's negative. The given attribute information must not be changed after this constructor has been used.

Parameters:
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set
Method Detail

stringFreeStructure

public Instances stringFreeStructure()
Creates a copy of the structure, but "cleanses" string types (i.e., the copy doesn't contain references to the strings seen in the past).

Returns:
a copy of the instance structure.

add

public final void add(Instance instance)
Adds one instance to the end of the set. Makes a shallow copy of the instance before adding it. Increases the capacity of the dataset if it is not large enough. Does not check whether the instance is compatible with the dataset.

Parameters:
instance - the instance to be added

attribute

public final Attribute attribute(int index)
Returns an attribute.

Parameters:
index - the attribute's index
Returns:
the attribute at the given position

attribute

public final Attribute attribute(java.lang.String name)
Returns an attribute given its name. If there is more than one attribute with the same name, it returns the first one. Returns null if the attribute can't be found.

Parameters:
name - the attribute's name
Returns:
the attribute with the given name, null if the attribute can't be found

checkForStringAttributes

public boolean checkForStringAttributes()
Checks for string attributes in the dataset

Returns:
true if string attributes are present, false otherwise

checkInstance

public final boolean checkInstance(Instance instance)
Checks if the given instance is compatible with this dataset. Only looks at the size of the instance and the ranges of the values for nominal and string attributes.

Returns:
true if the instance is compatible with the dataset
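The compatibility test described above can be pictured with the following standalone sketch. It is not Weka code: the instance is modeled as a plain double[] (Double.NaN standing in for a missing value), and the hypothetical numValues array gives, for each nominal or string attribute, the number of allowed value indices, with a negative entry marking a numeric attribute. Only the two checks named in the description are performed: the number of values and the index range of nominal/string values.

```java
/** Hypothetical stand-in illustrating checkInstance semantics; not Weka's implementation. */
final class CompatibilityCheck {

    /**
     * @param values    one value per attribute; Double.NaN marks a missing value
     * @param numValues for nominal/string attributes, the number of allowed value
     *                  indices; a negative entry marks a numeric attribute
     */
    static boolean isCompatible(double[] values, int[] numValues) {
        // The instance must have exactly one value per attribute.
        if (values.length != numValues.length) {
            return false;
        }
        for (int i = 0; i < values.length; i++) {
            double v = values[i];
            if (Double.isNaN(v) || numValues[i] < 0) {
                continue; // missing values and numeric attributes are not range-checked
            }
            // Nominal/string values are indices 0 .. numValues[i] - 1.
            if (v < 0 || v >= numValues[i] || v != Math.floor(v)) {
                return false;
            }
        }
        return true;
    }
}
```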

classAttribute

public final Attribute classAttribute()
Returns the class attribute.

Returns:
the class attribute
Throws:
UnassignedClassException - if the class is not set

classIndex

public final int classIndex()
Returns the class attribute's index. Returns a negative number if the class index is undefined.

Returns:
the class index as an integer

compactify

public final void compactify()
Compactifies the set of instances. Decreases the capacity of the set so that it matches the number of instances in the set.


delete

public final void delete()
Removes all instances from the set.


delete

public final void delete(int index)
Removes an instance at the given position from the set.

Parameters:
index - the instance's position

deleteStringAttributes

public void deleteStringAttributes()
Deletes all string attributes in the dataset. A deep copy of the attribute information is performed before an attribute is deleted.

Throws:
java.lang.IllegalArgumentException - if a string attribute couldn't be deleted (probably because it is the class attribute).

deleteWithMissing

public final void deleteWithMissing(int attIndex)
Removes all instances with missing values for a particular attribute from the dataset.

Parameters:
attIndex - the attribute's index

deleteWithMissing

public final void deleteWithMissing(Attribute att)
Removes all instances with missing values for a particular attribute from the dataset.

Parameters:
att - the attribute

deleteWithMissingClass

public final void deleteWithMissingClass()
Removes all instances with a missing class value from the dataset.

Throws:
UnassignedClassException - if class is not set

enumerateAttributes

public java.util.Enumeration enumerateAttributes()
Returns an enumeration of all the attributes.

Returns:
enumeration of all the attributes.

enumerateInstances

public final java.util.Enumeration enumerateInstances()
Returns an enumeration of all instances in the dataset.

Returns:
enumeration of all instances in the dataset

equalHeaders

public final boolean equalHeaders(Instances dataset)
Checks if two headers are equivalent.

Parameters:
dataset - another dataset
Returns:
true if the header of the given dataset is equivalent to this header

firstInstance

public final Instance firstInstance()
Returns the first instance in the set.

Returns:
the first instance in the set

getRandomNumberGenerator

public java.util.Random getRandomNumberGenerator(long seed)
Returns a random number generator. The initial seed of the random number generator depends on the given seed and on the hash code of the string representation of an instance chosen based on the given seed.

Parameters:
seed - the given seed
Returns:
the random number generator
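A minimal sketch of one reading of this description, using a hypothetical DataDependentRandom helper and a list of strings standing in for the instances' toString() output: a generator seeded with seed picks one row, and the generator is then reseeded with that row's hash code plus seed. Combining the two values by addition is an assumption; the description only says the seed depends on both.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a data-dependent seed; not Weka's code. */
final class DataDependentRandom {

    static Random forData(List<String> rows, long seed) {
        Random r = new Random(seed);
        // Pick one row using the caller's seed...
        String chosen = rows.get(r.nextInt(rows.size()));
        // ...then fold that row's hash code into the final seed.
        r.setSeed(chosen.hashCode() + seed);
        return r;
    }
}
```

Because the final seed depends on the data, repeating a call on the same rows with the same seed is reproducible, while datasets with different contents generally yield different streams.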

instance

public final Instance instance(int index)
Returns the instance at the given position.

Parameters:
index - the instance's index
Returns:
the instance at the given position

lastInstance

public final Instance lastInstance()
Returns the last instance in the set.

Returns:
the last instance in the set

meanOrMode

public final double meanOrMode(int attIndex)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric, or if all values are missing.

Parameters:
attIndex - the attribute's index
Returns:
the mean or the mode

meanOrMode

public final double meanOrMode(Attribute att)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric, or if all values are missing.

Parameters:
att - the attribute
Returns:
the mean or the mode
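The following sketch spells out the two cases over plain arrays; it is illustrative only. It assumes, as an unconfirmed detail, that instance weights are taken into account (a weighted mean for numeric attributes and weighted label counts for nominal ones), that missing values are encoded as Double.NaN, and that nominal values are label indices. Ties in the mode go to the lowest label index, and an all-missing attribute yields 0 in both cases.

```java
/** Hypothetical sketch of meanOrMode semantics; not Weka's code. */
final class MeanOrMode {

    /** Weighted mean of the non-missing (non-NaN) values; 0 if all are missing. */
    static double mean(double[] values, double[] weights) {
        double sum = 0, found = 0;
        for (int i = 0; i < values.length; i++) {
            if (!Double.isNaN(values[i])) {
                found += weights[i];
                sum += weights[i] * values[i];
            }
        }
        return found <= 0 ? 0 : sum / found;
    }

    /** Index of the most heavily weighted label, as a double; lowest index wins ties. */
    static double mode(double[] labelIndices, double[] weights, int numLabels) {
        double[] counts = new double[numLabels];
        for (int i = 0; i < labelIndices.length; i++) {
            if (!Double.isNaN(labelIndices[i])) {
                counts[(int) labelIndices[i]] += weights[i];
            }
        }
        int best = 0;
        for (int k = 1; k < numLabels; k++) {
            if (counts[k] > counts[best]) {
                best = k;
            }
        }
        return best;
    }
}
```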

numAttributes

public final int numAttributes()
Returns the number of attributes.

Returns:
the number of attributes as an integer

numClasses

public final int numClasses()
Returns the number of class labels.

Returns:
the number of class labels as an integer if the class attribute is nominal, 1 otherwise.
Throws:
UnassignedClassException - if the class is not set

numDistinctValues

public final int numDistinctValues(int attIndex)
Returns the number of distinct values of a given attribute. Returns the number of instances if the attribute is a string attribute. The value 'missing' is not counted.

Parameters:
attIndex - the attribute's index
Returns:
the number of distinct values of a given attribute

numDistinctValues

public final int numDistinctValues(Attribute att)
Returns the number of distinct values of a given attribute. Returns the number of instances if the attribute is a string attribute. The value 'missing' is not counted.

Parameters:
att - the attribute
Returns:
the number of distinct values of a given attribute

numInstances

public final int numInstances()
Returns the number of instances in the dataset.

Returns:
the number of instances in the dataset as an integer

randomize

public final void randomize(java.util.Random random)
Shuffles the instances in the set so that they are ordered randomly.

Parameters:
random - a random number generator
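Shuffling so that every ordering is equally likely is usually done with the Fisher-Yates algorithm, sketched below for a generic java.util.List. This is the standard technique rather than a confirmed copy of Weka's implementation; the Shuffle class name is hypothetical.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of an in-place random shuffle; not Weka's code. */
final class Shuffle {

    /** Fisher-Yates: given a uniform generator, every permutation is equally likely. */
    static <T> void randomize(List<T> items, Random random) {
        for (int j = items.size() - 1; j > 0; j--) {
            int k = random.nextInt(j + 1); // 0 <= k <= j
            T tmp = items.get(j);
            items.set(j, items.get(k));
            items.set(k, tmp);
        }
    }
}
```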

readInstance

public final boolean readInstance(java.io.Reader reader)
                           throws java.io.IOException
Reads a single instance from the reader and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance. This method does not check for carriage return at the end of the line.

Parameters:
reader - the reader
Returns:
false if end of file has been reached
Throws:
java.io.IOException - if the information is not read successfully

relationName

public final java.lang.String relationName()
Returns the relation's name.

Returns:
the relation's name as a string

renameAttribute

public final void renameAttribute(int att,
                                  java.lang.String name)
Renames an attribute. This change only affects this dataset.

Parameters:
att - the attribute's index
name - the new name

renameAttribute

public final void renameAttribute(Attribute att,
                                  java.lang.String name)
Renames an attribute. This change only affects this dataset.

Parameters:
att - the attribute
name - the new name

renameAttributeValue

public final void renameAttributeValue(int att,
                                       int val,
                                       java.lang.String name)
Renames a value of a nominal (or string) attribute. This change only affects this dataset.

Parameters:
att - the attribute's index
val - the value's index
name - the new name

renameAttributeValue

public final void renameAttributeValue(Attribute att,
                                       java.lang.String val,
                                       java.lang.String name)
Renames a value of a nominal (or string) attribute. This change only affects this dataset.

Parameters:
att - the attribute
val - the value
name - the new name

resample

public final Instances resample(java.util.Random random)
Creates a new dataset of the same size using random sampling with replacement.

Parameters:
random - a random number generator
Returns:
the new dataset
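Sampling with replacement (a bootstrap sample) draws n indices uniformly from 0 to n - 1, so some instances appear several times and others not at all. A minimal sketch over a generic list, with a hypothetical Bootstrap class standing in for the dataset:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical bootstrap-sampling sketch; not Weka's code. */
final class Bootstrap {

    /** Draws items.size() elements uniformly at random, with replacement. */
    static <T> List<T> resample(List<T> items, Random random) {
        List<T> out = new ArrayList<>(items.size());
        for (int i = 0; i < items.size(); i++) {
            out.add(items.get(random.nextInt(items.size())));
        }
        return out;
    }
}
```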

resampleWithWeights

public final Instances resampleWithWeights(java.util.Random random)
Creates a new dataset of the same size using random sampling with replacement according to the current instance weights. The weights of the instances in the new dataset are set to one.

Parameters:
random - a random number generator
Returns:
the new dataset

resampleWithWeights

public final Instances resampleWithWeights(java.util.Random random,
                                           double[] weights)
Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. The weights of the instances in the new dataset are set to one. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive.

Parameters:
random - a random number generator
weights - the weight vector
Returns:
the new dataset
Throws:
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights.
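Weighted sampling with replacement picks instance i with probability weights[i] / sum(weights). The sketch below uses inverse-CDF sampling (cumulative weights plus a binary search), which is one common way to do this; Weka may use a different scheme internally, and the WeightedBootstrap name is hypothetical. It mirrors the documented argument checks and additionally rejects an all-zero weight vector, a case the description does not cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical weighted-bootstrap sketch using inverse-CDF sampling; not Weka's code. */
final class WeightedBootstrap {

    /** Draws items.size() elements with replacement; element i is chosen with probability weights[i] / total. */
    static <T> List<T> resampleWithWeights(List<T> items, double[] weights, Random random) {
        if (weights.length != items.size()) {
            throw new IllegalArgumentException("weights array has the wrong length");
        }
        // cumulative[i] = weights[0] + ... + weights[i]
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        List<T> out = new ArrayList<>(items.size());
        if (items.isEmpty()) {
            return out;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights sum to zero");
        }
        for (int n = 0; n < items.size(); n++) {
            double u = random.nextDouble() * total; // uniform in [0, total)
            // Binary search for the first index whose cumulative weight exceeds u.
            int lo = 0, hi = cumulative.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            out.add(items.get(lo));
        }
        return out;
    }
}
```

Searching for the first cumulative value strictly greater than u guarantees that zero-weight instances are never drawn.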

setClass

public final void setClass(Attribute att)
Sets the class attribute.

Parameters:
att - attribute to be the class

setClassIndex

public final void setClassIndex(int classIndex)
Sets the class index of the set. If the class index is negative, there is assumed to be no class (i.e., it is undefined).

Parameters:
classIndex - the new class index
Throws:
java.lang.IllegalArgumentException - if the class index is too big

setRelationName

public final void setRelationName(java.lang.String newName)
Sets the relation's name.

Parameters:
newName - the new relation name.

sort

public final void sort(int attIndex)
Sorts the instances based on an attribute. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.

Parameters:
attIndex - the attribute's index

sort

public final void sort(Attribute att)
Sorts the instances based on an attribute. For numeric attributes, instances are sorted into ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.

Parameters:
att - the attribute
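The ordering rule above can be sketched by sorting rows on one column with a comparator. This assumes missing values are stored as Double.NaN and nominal values as label indices (so label order follows the header); under those assumptions Double.compare, which Comparator.comparingDouble uses, already places missing values last. SortByAttribute is a hypothetical name, and unlike a quicksort this List.sort call is stable.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of sorting rows by one attribute; not Weka's code. */
final class SortByAttribute {

    /**
     * Sorts rows ascending by column attIndex. Double.compare orders NaN after
     * every other double, so rows with a missing value end up last.
     */
    static void sort(List<double[]> rows, int attIndex) {
        rows.sort(Comparator.comparingDouble(row -> row[attIndex]));
    }
}
```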

stratify

public final void stratify(int numFolds)
Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).

Parameters:
numFolds - the number of folds in the cross-validation
Throws:
UnassignedClassException - if the class is not set
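One way to stratify, sketched below under stated assumptions, is to group the instances by class label and then deal them out with a stride of numFolds, so that each contiguous block later cut out as a test fold receives roughly the same class proportions. The sketch assumes a nominal class stored as label indices in column classIndex of each double[] row; the Stratify name and the exact dealing order are illustrative rather than a description of Weka's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stratification sketch; not Weka's code. */
final class Stratify {

    static List<double[]> stratify(List<double[]> rows, int classIndex, int numFolds) {
        // Step 1: bring instances of the same class together (stable sort on the label).
        List<double[]> grouped = new ArrayList<>(rows);
        grouped.sort(Comparator.comparingDouble(r -> r[classIndex]));
        // Step 2: take every numFolds-th instance starting at 0, then at 1, and so on.
        List<double[]> out = new ArrayList<>(grouped.size());
        for (int start = 0; start < numFolds && start < grouped.size(); start++) {
            for (int j = start; j < grouped.size(); j += numFolds) {
                out.add(grouped.get(j));
            }
        }
        return out;
    }
}
```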

sumOfWeights

public final double sumOfWeights()
Computes the sum of all the instances' weights.

Returns:
the sum of all the instances' weights as a double

testCV

public Instances testCV(int numFolds,
                        int numFold)
Creates the test set for one fold of a cross-validation on the dataset.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Returns:
the test set as a set of weighted instances
Throws:
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
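The description does not say which instances form fold numFold. A common scheme, assumed here, cuts the (usually randomized or stratified) dataset into numFolds contiguous blocks, giving the first numInstances % numFolds folds one extra instance. The hypothetical helper below only computes the block boundaries; it does not validate numFold.

```java
/** Hypothetical fold-boundary sketch for contiguous cross-validation blocks; not Weka's code. */
final class CrossValidationFolds {

    /** Returns {first, count}: the test fold is the block [first, first + count). */
    static int[] testFold(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException("numFolds must be in [2, numInstances]");
        }
        int base = numInstances / numFolds;
        int extra = numInstances % numFolds;
        int count = base + (numFold < extra ? 1 : 0);
        // Each earlier fold holds `base` instances, plus one for each earlier "extra" fold.
        int first = numFold * base + Math.min(numFold, extra);
        return new int[] {first, count};
    }
}
```

With 10 instances and 3 folds, the test blocks are [0, 4), [4, 7) and [7, 10), so together they cover every instance exactly once.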

toString

public final java.lang.String toString()
Returns the dataset as a string in ARFF format. Strings are quoted if they contain whitespace characters, or if they are a question mark.

Returns:
the dataset in ARFF format as a string
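The quoting rule stated above can be sketched as follows. Only the two stated conditions are implemented; the choice of single quotes is an assumption, and a real ARFF writer also has to escape embedded quote characters, which this hypothetical ArffQuote helper does not do.

```java
/** Hypothetical, simplified version of the quoting rule; not Weka's code. */
final class ArffQuote {

    /** Wraps s in single quotes if it contains whitespace or is exactly "?". */
    static String quote(String s) {
        boolean needsQuotes = s.equals("?"); // a bare ? would be read as a missing value
        for (int i = 0; i < s.length() && !needsQuotes; i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                needsQuotes = true;
            }
        }
        return needsQuotes ? "'" + s + "'" : s;
    }
}
```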

trainCV

public Instances trainCV(int numFolds,
                         int numFold)
Creates the training set for one fold of a cross-validation on the dataset.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Returns:
the training set
Throws:
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.

trainCV

public Instances trainCV(int numFolds,
                         int numFold,
                         java.util.Random random)
Creates the training set for one fold of a cross-validation on the dataset. The data is subsequently randomized based on the given random number generator.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
random - the random number generator
Returns:
the training set
Throws:
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
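Assuming a contiguous-block layout for the test fold (the first numInstances % numFolds folds holding one extra instance), the training set for fold numFold is everything outside that block, shuffled with the supplied generator. The TrainFold helper below is a hypothetical sketch of that reading over a generic list, not Weka's implementation; like the method above, it rejects fewer than 2 or more than numInstances folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical training-fold sketch; not Weka's code. */
final class TrainFold {

    static <T> List<T> trainFold(List<T> items, int numFolds, int numFold, Random random) {
        int n = items.size();
        if (numFolds < 2 || numFolds > n) {
            throw new IllegalArgumentException("numFolds must be in [2, numInstances]");
        }
        // Locate the contiguous test block [first, first + count).
        int base = n / numFolds, extra = n % numFolds;
        int count = base + (numFold < extra ? 1 : 0);
        int first = numFold * base + Math.min(numFold, extra);
        List<T> train = new ArrayList<>(n - count);
        train.addAll(items.subList(0, first));          // instances before the test block
        train.addAll(items.subList(first + count, n));  // instances after the test block
        Collections.shuffle(train, random);             // randomize with the given generator
        return train;
    }
}
```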

variance

public final double variance(int attIndex)
Computes the variance for a numeric attribute.

Parameters:
attIndex - the index of the numeric attribute
Returns:
the variance if the attribute is numeric
Throws:
java.lang.IllegalArgumentException - if the attribute is not numeric

variance

public final double variance(Attribute att)
Computes the variance for a numeric attribute.

Parameters:
att - the numeric attribute
Returns:
the variance if the attribute is numeric
Throws:
java.lang.IllegalArgumentException - if the attribute is not numeric
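For reference, a weighted sample variance over the non-missing values can be computed in one pass as (sum of w*x*x - (sum of w*x)^2 / W) / (W - 1), where W is the sum of the weights. The description does not name an estimator, so the sketch below assumes this one, along with NaN-encoded missing values; WeightedVariance is a hypothetical name. The one-pass formula can lose precision when the mean is large relative to the spread.

```java
/** Hypothetical weighted sample-variance sketch; not Weka's code. */
final class WeightedVariance {

    /** Weighted sample variance (divisor sumOfWeights - 1) over non-missing (non-NaN) values. */
    static double variance(double[] values, double[] weights) {
        double sum = 0, sumSq = 0, sumW = 0;
        for (int i = 0; i < values.length; i++) {
            if (!Double.isNaN(values[i])) {
                sum += weights[i] * values[i];
                sumSq += weights[i] * values[i] * values[i];
                sumW += weights[i];
            }
        }
        if (sumW <= 1) {
            return 0; // too little (weighted) data for the (W - 1) divisor
        }
        double v = (sumSq - sum * sum / sumW) / (sumW - 1);
        return Math.max(v, 0); // guard against tiny negative results from rounding
    }
}
```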

attributeStats

public AttributeStats attributeStats(int index)
Calculates summary statistics on the values that appear in this set of instances for a specified attribute.

Parameters:
index - the index of the attribute to summarize.
Returns:
an AttributeStats object with its fields calculated.

attributeToDoubleArray

public double[] attributeToDoubleArray(int index)
Gets the value of all instances in this dataset for a particular attribute. Useful in conjunction with Utils.sort to allow iterating through the dataset in sorted order for some attribute.

Parameters:
index - the index of the attribute.
Returns:
an array containing the value of the desired attribute for each instance in the dataset.
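The pattern mentioned above (extract one column, sort indices by it, then visit instances in that order) can be sketched with a small argsort. ArgSort is a hypothetical stand-in for Utils.sort, assumed here to return the indices that visit the values in ascending order; with NaN-encoded missing values, Double.compare places them last.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical argsort, standing in for Utils.sort; not Weka's code. */
final class ArgSort {

    /** Returns the indices of values in ascending order of value (NaN last). */
    static int[] argsort(double[] values) {
        Integer[] idx = new Integer[values.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Stable sort of the indices by the value each one points to.
        Arrays.sort(idx, Comparator.comparingDouble((Integer k) -> values[k]));
        int[] out = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = idx[i];
        }
        return out;
    }
}
```

A loop such as `for (int i : ArgSort.argsort(column))` then visits the rows from the smallest value of the attribute to the largest.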

toSummaryString

public java.lang.String toSummaryString()
Generates a string summarizing the set of instances. Gives a breakdown for each attribute indicating the number of missing/discrete/unique values and other information.

Returns:
a string summarizing the dataset

getInstance

protected boolean getInstance(java.io.StreamTokenizer tokenizer,
                              boolean flag)
                       throws java.io.IOException
Reads a single instance using the tokenizer and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance.

Parameters:
tokenizer - the tokenizer to be used
flag - whether the method should test for a carriage return after each instance
Returns:
false if end of file has been reached
Throws:
java.io.IOException - if the information is not read successfully

getInstanceSparse

protected boolean getInstanceSparse(java.io.StreamTokenizer tokenizer,
                                    boolean flag)
                             throws java.io.IOException
Reads a single instance using the tokenizer and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance.

Parameters:
tokenizer - the tokenizer to be used
flag - whether the method should test for a carriage return after each instance
Returns:
false if end of file has been reached
Throws:
java.io.IOException - if the information is not read successfully

getInstanceFull

protected boolean getInstanceFull(java.io.StreamTokenizer tokenizer,
                                  boolean flag)
                           throws java.io.IOException
Reads a single instance using the tokenizer and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance.

Parameters:
tokenizer - the tokenizer to be used
flag - whether the method should test for a carriage return after each instance
Returns:
false if end of file has been reached
Throws:
java.io.IOException - if the information is not read successfully

readHeader

protected void readHeader(java.io.StreamTokenizer tokenizer)
                   throws java.io.IOException
Reads and stores the header of an ARFF file.

Parameters:
tokenizer - the stream tokenizer
Throws:
java.io.IOException - if the information is not read successfully

copyInstances

private void copyInstances(int from,
                           Instances dest,
                           int num)
Copies instances from one set to the end of another one.

Parameters:
from - the position of the first instance to be copied
dest - the destination for the instances
num - the number of instances to be copied

errms

private void errms(java.io.StreamTokenizer tokenizer,
                   java.lang.String theMsg)
            throws java.io.IOException
Throws an IOException whose message combines the given error message with the current line number and the last token read.

Parameters:
tokenizer - the stream tokenizer
theMsg - the error message to be thrown
Throws:
java.io.IOException - always; it carries the error message
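A sketch of how such a message can be built: StreamTokenizer.toString() already reports the current token and line number (for example `Token[foo], line 1`), so appending it to the message is enough. The message layout and the class name `ArffErrors` are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;

// Hypothetical sketch, not Weka's implementation.
class ArffErrors {
  // Always throws; the message ends with the last token and line number.
  static void errms(StreamTokenizer tokenizer, String theMsg) throws IOException {
    throw new IOException(theMsg + ", read " + tokenizer.toString());
  }
}
```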

freshAttributeInfo

private void freshAttributeInfo()
Replaces the attribute information with a clone of itself.


getFirstToken

private void getFirstToken(java.io.StreamTokenizer tokenizer)
                    throws java.io.IOException
Gets the next token, skipping empty lines.

Parameters:
tokenizer - the stream tokenizer
Throws:
java.io.IOException - if reading the next token fails

getIndex

private void getIndex(java.io.StreamTokenizer tokenizer)
               throws java.io.IOException
Gets an index, checking for a premature end of line.

Parameters:
tokenizer - the stream tokenizer
Throws:
java.io.IOException - if it finds a premature end of line

getLastToken

private void getLastToken(java.io.StreamTokenizer tokenizer,
                          boolean endOfFileOk)
                   throws java.io.IOException
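Gets the next token and checks that it is an end of line.

Parameters:
tokenizer - the stream tokenizer
endOfFileOk - whether reaching the end of the file instead of an end of line is acceptable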
Throws:
java.io.IOException - if it doesn't find an end of line

getNextToken

private void getNextToken(java.io.StreamTokenizer tokenizer)
                   throws java.io.IOException
Gets the next token, checking for a premature end of line.

Parameters:
tokenizer - the stream tokenizer
Throws:
java.io.IOException - if it finds a premature end of line
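A minimal sketch of the premature-end-of-line check described for getNextToken (getIndex performs the same check before reading an index), using only java.io.StreamTokenizer. Treating end of file as an error as well, and the wording of the messages, are assumptions; `ArffTokens` is a hypothetical class.

```java
import java.io.IOException;
import java.io.StreamTokenizer;

// Hypothetical sketch, not Weka's implementation.
class ArffTokens {
  // Advances to the next token and fails if the line or file ends first.
  static void getNextToken(StreamTokenizer tokenizer) throws IOException {
    if (tokenizer.nextToken() == StreamTokenizer.TT_EOL) {
      throw new IOException("premature end of line, " + tokenizer);
    }
    if (tokenizer.ttype == StreamTokenizer.TT_EOF) {
      throw new IOException("premature end of file, " + tokenizer);
    }
  }
}
```

The tokenizer must have `eolIsSignificant(true)` set; otherwise line breaks are skipped as whitespace and the first check never fires.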

initTokenizer

private void initTokenizer(java.io.StreamTokenizer tokenizer)
Initializes the StreamTokenizer used for reading the ARFF file.

Parameters:
tokenizer - the stream tokenizer
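The tokenizer settings are not listed here. A plausible configuration for ARFF input, assuming `%` comments, single- or double-quoted names, commas as separators, braces as standalone tokens (for nominal value lists and sparse rows) and significant line ends, could look like this. The exact settings Weka uses are not confirmed by this page, and `ArffTokenizerSetup` is a hypothetical class.

```java
import java.io.StreamTokenizer;

// Hypothetical sketch of an ARFF-oriented tokenizer setup.
class ArffTokenizerSetup {
  static void initTokenizer(StreamTokenizer tokenizer) {
    tokenizer.resetSyntax();                  // start from an empty syntax table
    tokenizer.whitespaceChars(0, ' ');        // control characters and space separate tokens
    tokenizer.wordChars(' ' + 1, 0xFF);       // every other character may form a word
    tokenizer.whitespaceChars(',', ',');      // commas separate values
    tokenizer.commentChar('%');               // '%' comments run to the end of the line
    tokenizer.quoteChar('"');                 // double-quoted strings
    tokenizer.quoteChar('\'');                // single-quoted strings
    tokenizer.ordinaryChar('{');              // braces come back as single-character tokens
    tokenizer.ordinaryChar('}');
    tokenizer.eolIsSignificant(true);         // report TT_EOL so rows can be delimited
  }
}
```

Because `parseNumbers()` is not re-enabled after `resetSyntax()`, numeric values arrive as word tokens and must be converted explicitly.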

instancesAndWeights

private java.lang.String instancesAndWeights()
Returns a string containing all instances, their weights, and their indices in the original dataset.

Returns:
a description of the instances and their weights as a string

quickSort

private void quickSort(int attIndex,
                       int lo0,
                       int hi0)
Sorts the instances from lo0 to hi0 (inclusive) by their values for the given attribute, using quicksort.

Parameters:
attIndex - the attribute's index
lo0 - the first index of the subset to be sorted
hi0 - the last index of the subset to be sorted
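A sketch of the same technique on a plain `double[][]` table, where each row stands in for an instance and `attIndex` selects the column to sort by. It uses the middle element as the pivot and a Hoare-style partition, and assumes no missing (NaN) values; how the real method orders missing values is not covered. `SortSketch` is a hypothetical class, and its `swap` helper mirrors the private swap(int, int) method of this class.

```java
// Hypothetical sketch, not Weka's implementation.
class SortSketch {
  // Sorts rows[lo0..hi0] (inclusive) in ascending order of column attIndex.
  static void quickSort(double[][] rows, int attIndex, int lo0, int hi0) {
    if (lo0 >= hi0) {
      return;
    }
    double pivot = rows[(lo0 + hi0) >>> 1][attIndex];
    int lo = lo0;
    int hi = hi0;
    while (lo <= hi) {
      while (rows[lo][attIndex] < pivot) lo++;   // find a row that belongs on the right
      while (rows[hi][attIndex] > pivot) hi--;   // find a row that belongs on the left
      if (lo <= hi) {
        swap(rows, lo, hi);
        lo++;
        hi--;
      }
    }
    if (lo0 < hi) quickSort(rows, attIndex, lo0, hi);
    if (lo < hi0) quickSort(rows, attIndex, lo, hi0);
  }

  // Swaps two rows in place.
  static void swap(double[][] rows, int i, int j) {
    double[] tmp = rows[i];
    rows[i] = rows[j];
    rows[j] = tmp;
  }
}
```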

readTillEOL

private void readTillEOL(java.io.StreamTokenizer tokenizer)
                  throws java.io.IOException
Reads and skips all tokens up to the next end-of-line token.

Parameters:
tokenizer - the stream tokenizer
Throws:
java.io.IOException

stratStep

private void stratStep(int numFolds)
Helper function for stratifying the set.

Parameters:
numFolds - the number of folds for the stratification
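stratStep is described only as a helper. One common way to stratify, assumed here, is to sort the instances by class and then interleave them: take every numFolds-th element starting at index 0, then every numFolds-th starting at index 1, and so on. Contiguous folds of the result then have roughly the same class distribution. `StratifySketch` is a hypothetical class working on a plain list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Weka's implementation.
class StratifySketch {
  // Interleaves a list that is already sorted by class.
  static <T> List<T> stratStep(List<T> sortedByClass, int numFolds) {
    List<T> result = new ArrayList<>(sortedByClass.size());
    for (int start = 0; start < numFolds && start < sortedByClass.size(); start++) {
      for (int j = start; j < sortedByClass.size(); j += numFolds) {
        result.add(sortedByClass.get(j));
      }
    }
    return result;
  }
}
```

With `[a, a, a, a, b, b, b, b]` and two folds, the result is `[a, a, b, b, a, a, b, b]`, so each half holds two of each class.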

swap

private void swap(int i,
                  int j)
Swaps two instances in the set.

Parameters:
i - the first instance's index
j - the second instance's index

mergeInstances

public static Instances mergeInstances(Instances first,
                                       Instances second)
Merges two sets of Instances together. The resulting set will have all the attributes of the first set plus all the attributes of the second set. The number of instances in both sets must be the same.

Parameters:
first - the first set of Instances
second - the second set of Instances
Returns:
the merged set of Instances
Throws:
java.lang.IllegalArgumentException - if the datasets are not the same size
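A sketch of the row-wise concatenation that mergeInstances describes, using `double[][]` tables in place of Instances objects: row i of the result is row i of `first` followed by row i of `second`. Merging the attribute headers and relation names is left out, and `MergeSketch` is a hypothetical class.

```java
// Hypothetical sketch, not Weka's implementation.
class MergeSketch {
  static double[][] mergeInstances(double[][] first, double[][] second) {
    if (first.length != second.length) {
      throw new IllegalArgumentException("Instance sets must be of the same size");
    }
    double[][] merged = new double[first.length][];
    for (int i = 0; i < first.length; i++) {
      // Attributes of the first set, then attributes of the second set.
      double[] row = new double[first[i].length + second[i].length];
      System.arraycopy(first[i], 0, row, 0, first[i].length);
      System.arraycopy(second[i], 0, row, first[i].length, second[i].length);
      merged[i] = row;
    }
    return merged;
  }
}
```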

test

public static void test(java.lang.String[] argv)
Method for testing this class.

Parameters:
argv - should contain one element: the name of an ARFF file

main

public static void main(java.lang.String[] args)
Main method for this class -- just prints a summary of a set of instances.

Parameters:
args - should contain one element: the name of an ARFF file

deleteAttributeAt

public void deleteAttributeAt(int position)
Deletes an attribute at the given position (0 to numAttributes() - 1). A deep copy of the attribute information is performed before the attribute is deleted. Allows deletion of the class attribute: keeps track of the currently removed position by setting lastRemoved to position, and sets classRemoved to position == m_ClassIndex.

Parameters:
position - the attribute's position
Throws:
java.lang.IllegalArgumentException - if the given index is out of range

insertAttributeAt

public void insertAttributeAt(Attribute att,
                              int position)
Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. Shallow copies the attribute before it is inserted, and performs a deep copy of the existing attribute information. If the class-attribute was removed and position equals lastRemoved, m_ClassIndex is set to position and classRemoved is set to false.

Parameters:
att - the attribute to be inserted
position - the attribute's position
Throws:
java.lang.IllegalArgumentException - if the given index is out of range
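The class-index bookkeeping described for deleteAttributeAt and insertAttributeAt can be sketched on a list of attribute names. The shifting of the class index when other attributes are deleted or inserted, and setting it to -1 while the class attribute is removed, are assumptions; copying attribute information and filling in missing values are omitted. `ClassTrackingHeader` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the lastRemoved / classRemoved bookkeeping.
class ClassTrackingHeader {
  final List<String> attributes = new ArrayList<>();
  int classIndex = -1;
  int lastRemoved = -1;
  boolean classRemoved = false;

  void deleteAttributeAt(int position) {
    if (position < 0 || position >= attributes.size()) {
      throw new IllegalArgumentException("Index out of range");
    }
    attributes.remove(position);
    lastRemoved = position;
    classRemoved = (position == classIndex);
    if (classRemoved) {
      classIndex = -1;      // assumption: no class while it is removed
    } else if (classIndex > position) {
      classIndex--;         // assumption: the class attribute shifts one slot left
    }
  }

  void insertAttributeAt(String att, int position) {
    if (position < 0 || position > attributes.size()) {
      throw new IllegalArgumentException("Index out of range");
    }
    attributes.add(position, att);
    if (classIndex >= position) {
      classIndex++;         // assumption: the class attribute shifts one slot right
    }
    if (classRemoved && position == lastRemoved) {
      classIndex = position; // re-inserted at its old slot: it is the class again
      classRemoved = false;
    }
  }
}
```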