java.lang.Object
  weka.core.Instances
Class for handling an ordered set of weighted instances.
Typical usage (code from the main() method of this class):
...
// Read all the instances in the file
reader = new FileReader(filename);
instances = new Instances(reader);
// Make the last attribute be the class
instances.setClassIndex(instances.numAttributes() - 1);
// Print header and instances.
System.out.println("\nDataset:\n");
System.out.println(instances);
...
All methods that change a set of instances are safe, i.e., a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.
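For orientation, the reader passed to the constructor is expected to supply ARFF text. The following is a minimal standard-library sketch, not Weka's parser. It assumes the conventional ARFF keywords @relation, @attribute and @data (the field summary mentions such keywords without giving their values), and the class ArffHeaderSketch and its methods are hypothetical names used only for illustration.

```java
final class ArffHeaderSketch {

    /** Returns the relation name from ARFF text, or null if no @relation line is found. */
    static String relationName(String arff) {
        for (String line : arff.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Keywords are matched case-insensitively (an assumption of this sketch).
            if (trimmed.toLowerCase(java.util.Locale.ROOT).startsWith("@relation")) {
                return trimmed.substring("@relation".length()).trim();
            }
        }
        return null;
    }

    /** Counts the @attribute declarations that appear before the @data line. */
    static int countAttributes(String arff) {
        int count = 0;
        for (String line : arff.split("\\r?\\n")) {
            String trimmed = line.trim().toLowerCase(java.util.Locale.ROOT);
            if (trimmed.startsWith("@data")) {
                break; // the header ends where the data section starts
            }
            if (trimmed.startsWith("@attribute")) {
                count++;
            }
        }
        return count;
    }

    /** The index that setClassIndex(numAttributes() - 1) would select: the last attribute. */
    static int lastAttributeIndex(String arff) {
        return countAttributes(arff) - 1;
    }
}
```

With a header declaring three attributes, lastAttributeIndex returns 2, which mirrors the "make the last attribute be the class" step in the usage snippet above.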
Field Summary | |
(package private) static java.lang.String |
ARFF_DATA
The keyword used to denote the start of the arff data section |
(package private) static java.lang.String |
ARFF_RELATION
The keyword used to denote the start of an arff header |
private boolean |
classRemoved
Should be set to true, if the class-attribute was removed, and to false, if the class-attribute was added again. |
static java.lang.String |
FILE_EXTENSION
The filename extension that should be used for arff files |
private int |
lastRemoved
Keeps the index of the last removed attribute position. |
protected FastVector |
m_Attributes
The attribute information. |
protected int |
m_ClassIndex
The class attribute's index |
protected int[] |
m_IndicesBuffer
Buffer of indices for sparse instance |
protected FastVector |
m_Instances
The instances. |
protected java.lang.String |
m_RelationName
The dataset's name. |
protected double[] |
m_ValueBuffer
Buffer of values for sparse instance |
Constructor Summary | |
Instances(Instances dataset)
Constructor copying all instances and references to the header information from the given set of instances. |
|
Instances(Instances dataset,
int capacity)
Constructor creating an empty set of instances. |
|
Instances(Instances source,
int first,
int toCopy)
Creates a new set of instances by copying a subset of another set. |
|
Instances(java.io.Reader reader)
Reads an ARFF file from a reader, and assigns a weight of one to each instance. |
|
Instances(java.io.Reader reader,
int capacity)
Reads the header of an ARFF file from a reader and reserves space for the given number of instances. |
|
Instances(java.lang.String name,
FastVector attInfo,
int capacity)
Creates an empty set of instances. |
Method Summary | |
void |
add(Instance instance)
Adds one instance to the end of the set. |
Attribute |
attribute(int index)
Returns an attribute. |
Attribute |
attribute(java.lang.String name)
Returns an attribute given its name. |
AttributeStats |
attributeStats(int index)
Calculates summary statistics on the values that appear in this set of instances for a specified attribute. |
double[] |
attributeToDoubleArray(int index)
Gets the value of all instances in this dataset for a particular attribute. |
boolean |
checkForStringAttributes()
Checks for string attributes in the dataset |
boolean |
checkInstance(Instance instance)
Checks if the given instance is compatible with this dataset. |
Attribute |
classAttribute()
Returns the class attribute. |
int |
classIndex()
Returns the class attribute's index. |
void |
compactify()
Compactifies the set of instances. |
private void |
copyInstances(int from,
Instances dest,
int num)
Copies instances from one set to the end of another one. |
void |
delete()
Removes all instances from the set. |
void |
delete(int index)
Removes an instance at the given position from the set. |
void |
deleteAttributeAt(int position)
Deletes an attribute at the given position (0 to numAttributes() - 1). |
void |
deleteStringAttributes()
Deletes all string attributes in the dataset. |
void |
deleteWithMissing(Attribute att)
Removes all instances with missing values for a particular attribute from the dataset. |
void |
deleteWithMissing(int attIndex)
Removes all instances with missing values for a particular attribute from the dataset. |
void |
deleteWithMissingClass()
Removes all instances with a missing class value from the dataset. |
java.util.Enumeration |
enumerateAttributes()
Returns an enumeration of all the attributes. |
java.util.Enumeration |
enumerateInstances()
Returns an enumeration of all instances in the dataset. |
boolean |
equalHeaders(Instances dataset)
Checks if two headers are equivalent. |
private void |
errms(java.io.StreamTokenizer tokenizer,
java.lang.String theMsg)
Throws error message with line number and last token read. |
Instance |
firstInstance()
Returns the first instance in the set. |
private void |
freshAttributeInfo()
Replaces the attribute information by a clone of itself. |
private void |
getFirstToken(java.io.StreamTokenizer tokenizer)
Gets next token, skipping empty lines. |
private void |
getIndex(java.io.StreamTokenizer tokenizer)
Gets index, checking for a premature end of line. |
protected boolean |
getInstance(java.io.StreamTokenizer tokenizer,
boolean flag)
Reads a single instance using the tokenizer and appends it to the dataset. |
protected boolean |
getInstanceFull(java.io.StreamTokenizer tokenizer,
boolean flag)
Reads a single instance using the tokenizer and appends it to the dataset. |
protected boolean |
getInstanceSparse(java.io.StreamTokenizer tokenizer,
boolean flag)
Reads a single instance using the tokenizer and appends it to the dataset. |
private void |
getLastToken(java.io.StreamTokenizer tokenizer,
boolean endOfFileOk)
Gets token and checks whether it is an end of line. |
private void |
getNextToken(java.io.StreamTokenizer tokenizer)
Gets next token, checking for a premature end of line. |
java.util.Random |
getRandomNumberGenerator(long seed)
Returns a random number generator. |
private void |
initTokenizer(java.io.StreamTokenizer tokenizer)
Initializes the StreamTokenizer used for reading the ARFF file. |
void |
insertAttributeAt(Attribute att,
int position)
Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. |
Instance |
instance(int index)
Returns the instance at the given position. |
private java.lang.String |
instancesAndWeights()
Returns string including all instances, their weights and their indices in the original dataset. |
Instance |
lastInstance()
Returns the last instance in the set. |
static void |
main(java.lang.String[] args)
Main method for this class -- just prints a summary of a set of instances. |
double |
meanOrMode(Attribute att)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
double |
meanOrMode(int attIndex)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
static Instances |
mergeInstances(Instances first,
Instances second)
Merges two sets of Instances together. |
int |
numAttributes()
Returns the number of attributes. |
int |
numClasses()
Returns the number of class labels. |
int |
numDistinctValues(Attribute att)
Returns the number of distinct values of a given attribute. |
int |
numDistinctValues(int attIndex)
Returns the number of distinct values of a given attribute. |
int |
numInstances()
Returns the number of instances in the dataset. |
private void |
quickSort(int attIndex,
int lo0,
int hi0)
Implements quicksort. |
void |
randomize(java.util.Random random)
Shuffles the instances in the set so that they are ordered randomly. |
protected void |
readHeader(java.io.StreamTokenizer tokenizer)
Reads and stores header of an ARFF file. |
boolean |
readInstance(java.io.Reader reader)
Reads a single instance from the reader and appends it to the dataset. |
private void |
readTillEOL(java.io.StreamTokenizer tokenizer)
Reads and skips all tokens before next end of line token. |
java.lang.String |
relationName()
Returns the relation's name. |
void |
renameAttribute(Attribute att,
java.lang.String name)
Renames an attribute. |
void |
renameAttribute(int att,
java.lang.String name)
Renames an attribute. |
void |
renameAttributeValue(Attribute att,
java.lang.String val,
java.lang.String name)
Renames a value of a nominal (or string) attribute. |
void |
renameAttributeValue(int att,
int val,
java.lang.String name)
Renames a value of a nominal (or string) attribute. |
Instances |
resample(java.util.Random random)
Creates a new dataset of the same size using random sampling with replacement. |
Instances |
resampleWithWeights(java.util.Random random)
Creates a new dataset of the same size using random sampling with replacement according to the current instance weights. |
Instances |
resampleWithWeights(java.util.Random random,
double[] weights)
Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. |
void |
setClass(Attribute att)
Sets the class attribute. |
void |
setClassIndex(int classIndex)
Sets the class index of the set. |
void |
setRelationName(java.lang.String newName)
Sets the relation's name. |
void |
sort(Attribute att)
Sorts the instances based on an attribute. |
void |
sort(int attIndex)
Sorts the instances based on an attribute. |
void |
stratify(int numFolds)
Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed). |
private void |
stratStep(int numFolds)
Help function needed for stratification of set. |
Instances |
stringFreeStructure()
Create a copy of the structure, but "cleanse" string types (i.e. |
double |
sumOfWeights()
Computes the sum of all the instances' weights. |
private void |
swap(int i,
int j)
Swaps two instances in the set. |
static void |
test(java.lang.String[] argv)
Method for testing this class. |
Instances |
testCV(int numFolds,
int numFold)
Creates the test set for one fold of a cross-validation on the dataset. |
java.lang.String |
toString()
Returns the dataset as a string in ARFF format. |
java.lang.String |
toSummaryString()
Generates a string summarizing the set of instances. |
Instances |
trainCV(int numFolds,
int numFold)
Creates the training set for one fold of a cross-validation on the dataset. |
Instances |
trainCV(int numFolds,
int numFold,
java.util.Random random)
Creates the training set for one fold of a cross-validation on the dataset. |
double |
variance(Attribute att)
Computes the variance for a numeric attribute. |
double |
variance(int attIndex)
Computes the variance for a numeric attribute. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
private int lastRemoved
private boolean classRemoved
public static java.lang.String FILE_EXTENSION
static java.lang.String ARFF_RELATION
static java.lang.String ARFF_DATA
protected java.lang.String m_RelationName
protected FastVector m_Attributes
protected FastVector m_Instances
protected int m_ClassIndex
protected double[] m_ValueBuffer
protected int[] m_IndicesBuffer
Constructor Detail |
public Instances(java.io.Reader reader) throws java.io.IOException
reader
- the reader
java.io.IOException
- if the ARFF file is not read successfully
public Instances(java.io.Reader reader, int capacity) throws java.io.IOException
reader
- the reader
capacity
- the capacity
java.lang.IllegalArgumentException
- if the header is not read successfully or the capacity is negative.
java.io.IOException
- if there is a problem with the reader.
public Instances(Instances dataset)
public Instances(Instances dataset, int capacity)
capacity
- the capacity of the new dataset
public Instances(Instances source, int first, int toCopy)
source
- the set of instances from which a subset is to be created
first
- the index of the first instance to be copied
toCopy
- the number of instances to be copied
java.lang.IllegalArgumentException
- if first and toCopy are out of range
public Instances(java.lang.String name, FastVector attInfo, int capacity)
name
- the name of the relation
attInfo
- the attribute information
capacity
- the capacity of the set
Method Detail |
public Instances stringFreeStructure()
public final void add(Instance instance)
instance
- the instance to be added
public final Attribute attribute(int index)
index
- the attribute's index
public final Attribute attribute(java.lang.String name)
name
- the attribute's name
public boolean checkForStringAttributes()
public final boolean checkInstance(Instance instance)
public final Attribute classAttribute()
UnassignedClassException
- if the class is not set
public final int classIndex()
public final void compactify()
public final void delete()
public final void delete(int index)
index
- the instance's position
public void deleteStringAttributes()
java.lang.IllegalArgumentException
- if string attribute couldn't be successfully deleted (probably because it is the class attribute).
public final void deleteWithMissing(int attIndex)
attIndex
- the attribute's index
public final void deleteWithMissing(Attribute att)
att
- the attribute
public final void deleteWithMissingClass()
UnassignedClassException
- if class is not set
public java.util.Enumeration enumerateAttributes()
public final java.util.Enumeration enumerateInstances()
public final boolean equalHeaders(Instances dataset)
dataset
- another dataset
public final Instance firstInstance()
public java.util.Random getRandomNumberGenerator(long seed)
seed
- the given seed
public final Instance instance(int index)
index
- the instance's index
public final Instance lastInstance()
public final double meanOrMode(int attIndex)
attIndex
- the attribute's index
public final double meanOrMode(Attribute att)
att
- the attribute
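As the method summary puts it, meanOrMode returns the mean for a numeric attribute and the mode for a nominal one, in both cases as a floating-point value. The sketch below illustrates that idea on plain arrays. It is not the library's code: it assumes values are weighted by instance weight, that NaN marks a missing numeric value, that a negative index marks a missing nominal value, and that ties go to the lowest value index. MeanOrModeSketch is a hypothetical name.

```java
final class MeanOrModeSketch {

    /** Weighted mean of a numeric attribute; NaN entries are treated as missing and skipped. */
    static double weightedMean(double[] values, double[] weights) {
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                continue; // missing value (assumption: NaN encodes "missing")
            }
            sum += weights[i] * values[i];
            totalWeight += weights[i];
        }
        // Returns 0 when nothing contributes any weight (a choice made for this sketch).
        return totalWeight > 0.0 ? sum / totalWeight : 0.0;
    }

    /** Index of the nominal value with the largest total weight, returned as a double. */
    static double weightedMode(int[] valueIndices, double[] weights, int numValues) {
        double[] counts = new double[numValues];
        for (int i = 0; i < valueIndices.length; i++) {
            if (valueIndices[i] >= 0) { // negative index = missing (assumption)
                counts[valueIndices[i]] += weights[i];
            }
        }
        int best = 0;
        for (int v = 1; v < numValues; v++) {
            if (counts[v] > counts[best]) {
                best = v; // strict comparison, so ties keep the lowest index
            }
        }
        return best;
    }
}
```

Returning the mode as the value's index in floating-point form is what lets a single double return type serve both attribute kinds.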
public final int numAttributes()
public final int numClasses()
UnassignedClassException
- if the class is not set
public final int numDistinctValues(int attIndex)
attIndex
- the attribute
public final int numDistinctValues(Attribute att)
att
- the attribute
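One simple way to count the distinct values of an attribute is to sort a copy of the column and count the positions where the value changes. The sketch below does that on a plain double array. It is an illustration under assumptions, not necessarily how numDistinctValues is implemented: NaN stands in for a missing value and is not counted, and DistinctValuesSketch is a hypothetical name.

```java
final class DistinctValuesSketch {

    /** Counts the distinct non-missing values in a column; the input array is left unchanged. */
    static int numDistinctValues(double[] values) {
        double[] sorted = values.clone();
        java.util.Arrays.sort(sorted); // Arrays.sort places NaN after all other values
        int distinct = 0;
        double previous = 0.0;
        for (double v : sorted) {
            if (Double.isNaN(v)) {
                break; // only missing values remain from here on
            }
            if (distinct == 0 || v != previous) {
                distinct++;
                previous = v;
            }
        }
        return distinct;
    }
}
```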
public final int numInstances()
public final void randomize(java.util.Random random)
random
- a random number generator
public final boolean readInstance(java.io.Reader reader) throws java.io.IOException
reader
- the reader
java.io.IOException
- if the information is not read successfully
public final java.lang.String relationName()
public final void renameAttribute(int att, java.lang.String name)
att
- the attribute's index
name
- the new name
public final void renameAttribute(Attribute att, java.lang.String name)
att
- the attribute
name
- the new name
public final void renameAttributeValue(int att, int val, java.lang.String name)
att
- the attribute's index
val
- the value's index
name
- the new name
public final void renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name)
att
- the attribute
val
- the value
name
- the new name
public final Instances resample(java.util.Random random)
random
- a random number generator
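Sampling "of the same size ... with replacement", as the summary for resample describes, can be pictured as drawing one random index per output position. The following is a minimal sketch of that idea on indices only. ResampleSketch is a hypothetical name, and the code is not taken from the library.

```java
final class ResampleSketch {

    /** Draws size indices uniformly at random from 0..size-1, with replacement. */
    static int[] resampleIndices(int size, java.util.Random random) {
        int[] picked = new int[size];
        for (int i = 0; i < size; i++) {
            // Each draw is independent, so an index may appear several times or not at all.
            picked[i] = random.nextInt(size);
        }
        return picked;
    }
}
```

Passing a generator created with a fixed seed makes the draw reproducible, which is presumably why the method takes the java.util.Random as a parameter.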
public final Instances resampleWithWeights(java.util.Random random)
random
- a random number generator
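The two resampleWithWeights variants draw instances in proportion to a weight, either the current instance weights or a supplied weight vector. A common way to do this is to build cumulative weights and binary-search them for each random draw. That technique is sketched below. It is a stand-in chosen for clarity and may differ from the library's own algorithm. WeightedResampleSketch is a hypothetical name, and rejecting an all-zero weight vector is an assumption of the sketch.

```java
final class WeightedResampleSketch {

    /** Draws weights.length indices with replacement, index i having probability weights[i] / sum. */
    static int[] resampleIndices(double[] weights, java.util.Random random) {
        int n = weights.length;
        double[] cumulative = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            if (weights[i] < 0.0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (n > 0 && total <= 0.0) {
            throw new IllegalArgumentException("weights sum to zero");
        }
        int[] picked = new int[n];
        for (int k = 0; k < n; k++) {
            double target = random.nextDouble() * total;
            // Binary search for the first index whose cumulative weight exceeds target.
            int lo = 0;
            int hi = n - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > target) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            picked[k] = lo;
        }
        return picked;
    }
}
```

An entry with weight zero is never selected, because its cumulative weight never exceeds a target that the preceding entry did not already exceed.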
public final Instances resampleWithWeights(java.util.Random random, double[] weights)
random
- a random number generator
weights
- the weight vector
java.lang.IllegalArgumentException
- if the weights array is of the wrong length or contains negative weights.
public final void setClass(Attribute att)
att
- attribute to be the class
public final void setClassIndex(int classIndex)
classIndex
- the new class index
java.lang.IllegalArgumentException
- if the class index is too big or < 0
public final void setRelationName(java.lang.String newName)
newName
- the new relation name.
public final void sort(int attIndex)
attIndex
- the attribute's index
public final void sort(Attribute att)
att
- the attribute
public final void stratify(int numFolds)
numFolds
- the number of folds in the cross-validation
UnassignedClassException
- if the class is not set
public final double sumOfWeights()
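The summary says stratify reorders the instances by class value so that a stratified cross-validation can be performed afterwards. One way to get that effect, sketched below on class labels only, is to group the instances by class and then deal them out round-robin, so that each contiguous block of the result contains a similar class mix. This is an illustrative reconstruction under assumptions, not the library's code. StratifySketch is a hypothetical name, and every label is assumed to lie in 0..numClasses-1.

```java
final class StratifySketch {

    /** Returns a reordering of 0..labels.length-1 whose contiguous fold blocks have a similar class mix. */
    static int[] stratifiedOrder(int[] labels, int numClasses, int numFolds) {
        if (numFolds < 1) {
            throw new IllegalArgumentException("numFolds must be at least 1");
        }
        int n = labels.length;
        // Step 1: group indices by class, keeping the original order within each class.
        int[] grouped = new int[n];
        int next = 0;
        for (int c = 0; c < numClasses; c++) {
            for (int i = 0; i < n; i++) {
                if (labels[i] == c) {
                    grouped[next++] = i;
                }
            }
        }
        // Step 2: take every numFolds-th grouped element, starting at offsets 0, 1, ...
        int[] order = new int[n];
        int pos = 0;
        for (int start = 0; start < numFolds; start++) {
            for (int j = start; j < n; j += numFolds) {
                order[pos++] = grouped[j];
            }
        }
        return order;
    }
}
```

For eight instances with alternating labels 0 and 1 and two folds, the first four positions of the result hold two instances of each class, and so do the last four.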
public Instances testCV(int numFolds, int numFold)
numFolds
- the number of folds in the cross-validation. Must be greater than 1.
numFold
- 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException
- if the number of folds is less than 2 or greater than the number of instances.
public final java.lang.String toString()
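testCV and trainCV split the dataset into a test block and the remaining training instances for a given fold. The sketch below shows one plausible way to compute those blocks on indices, assuming contiguous folds whose sizes differ by at most one, with the larger folds first. It follows the documented constraint that numFolds must be at least 2 and at most the number of instances. CrossValidationSketch and its methods are hypothetical names, and the exact partitioning used by the library may differ.

```java
final class CrossValidationSketch {

    /** Returns {first, size} describing the contiguous test block of fold numFold (0-based). */
    static int[] testRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException(
                "number of folds must be at least 2 and at most the number of instances");
        }
        int base = numInstances / numFolds;
        int remainder = numInstances % numFolds;
        // The first 'remainder' folds receive one extra instance each.
        int size = base + (numFold < remainder ? 1 : 0);
        int first = numFold * base + Math.min(numFold, remainder);
        return new int[] {first, size};
    }

    /** Returns the indices outside the test block of fold numFold, in their original order. */
    static int[] trainIndices(int numInstances, int numFolds, int numFold) {
        int[] range = testRange(numInstances, numFolds, numFold);
        int[] train = new int[numInstances - range[1]];
        int pos = 0;
        for (int i = 0; i < numInstances; i++) {
            if (i < range[0] || i >= range[0] + range[1]) {
                train[pos++] = i;
            }
        }
        return train;
    }
}
```

With 10 instances and 3 folds this yields test blocks 0 to 3, 4 to 6 and 7 to 9, so every instance appears in exactly one test block.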
public Instances trainCV(int numFolds, int numFold)
numFolds
- the number of folds in the cross-validation. Must be greater than 1.
numFold
- 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException
- if the number of folds is less than 2 or greater than the number of instances.
public Instances trainCV(int numFolds, int numFold, java.util.Random random)
numFolds
- the number of folds in the cross-validation. Must be greater than 1.
numFold
- 0 for the first fold, 1 for the second, ...
random
- the random number generator
java.lang.IllegalArgumentException
- if the number of folds is less than 2 or greater than the number of instances.
public final double variance(int attIndex)
attIndex
- the numeric attribute
java.lang.IllegalArgumentException
- if the attribute is not numeric
public final double variance(Attribute att)
att
- the numeric attribute
java.lang.IllegalArgumentException
- if the attribute is not numeric
public AttributeStats attributeStats(int index)
index
- the index of the attribute to summarize.
public double[] attributeToDoubleArray(int index)
index
- the index of the attribute.
public java.lang.String toSummaryString()
protected boolean getInstance(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
tokenizer
- the tokenizer to be used
flag
- if method should test for carriage return after each instance
java.io.IOException
- if the information is not read successfully
protected boolean getInstanceSparse(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
tokenizer
- the tokenizer to be used
flag
- if method should test for carriage return after each instance
java.io.IOException
- if the information is not read successfully
protected boolean getInstanceFull(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
tokenizer
- the tokenizer to be used
flag
- if method should test for carriage return after each instance
java.io.IOException
- if the information is not read successfully
protected void readHeader(java.io.StreamTokenizer tokenizer) throws java.io.IOException
tokenizer
- the stream tokenizer
java.io.IOException
- if the information is not read successfully
private void copyInstances(int from, Instances dest, int num)
from
- the position of the first instance to be copied
dest
- the destination for the instances
num
- the number of instances to be copied
private void errms(java.io.StreamTokenizer tokenizer, java.lang.String theMsg) throws java.io.IOException
theMsg
- the error message to be thrown
tokenizer
- the stream tokenizer
java.io.IOException
- containing the error message
private void freshAttributeInfo()
private void getFirstToken(java.io.StreamTokenizer tokenizer) throws java.io.IOException
tokenizer
- the stream tokenizer
java.io.IOException
- if reading the next token fails
private void getIndex(java.io.StreamTokenizer tokenizer) throws java.io.IOException
tokenizer
- the stream tokenizer
java.io.IOException
- if it finds a premature end of line
private void getLastToken(java.io.StreamTokenizer tokenizer, boolean endOfFileOk) throws java.io.IOException
tokenizer
- the stream tokenizer
java.io.IOException
- if it doesn't find an end of line
private void getNextToken(java.io.StreamTokenizer tokenizer) throws java.io.IOException
tokenizer
- the stream tokenizer
java.io.IOException
- if it finds a premature end of line
private void initTokenizer(java.io.StreamTokenizer tokenizer)
tokenizer
- the stream tokenizer
private java.lang.String instancesAndWeights()
private void quickSort(int attIndex, int lo0, int hi0)
attIndex
- the attribute's index
lo0
- the first index of the subset to be sorted
hi0
- the last index of the subset to be sorted
private void readTillEOL(java.io.StreamTokenizer tokenizer) throws java.io.IOException
tokenizer
- the stream tokenizer
java.io.IOException
private void stratStep(int numFolds)
numFolds
- the number of folds for the stratification
private void swap(int i, int j)
i
- the first instance's index
j
- the second instance's index
public static Instances mergeInstances(Instances first, Instances second)
first
- the first set of Instances
second
- the second set of Instances
java.lang.IllegalArgumentException
- if the datasets are not the same size
public static void test(java.lang.String[] argv)
argv
- should contain one element: the name of an ARFF file
public static void main(java.lang.String[] args)
args
- should contain one element: the name of an ARFF file
public void deleteAttributeAt(int position)
Sets lastRemoved to position, and sets classRemoved to position == m_ClassIndex.
position
- the attribute's position
java.lang.IllegalArgumentException
- if the given index is out of range
public void insertAttributeAt(Attribute att, int position)
If position equals lastRemoved, m_ClassIndex is set to position and classRemoved is set to false.
att
- the attribute to be inserted
position
- the attribute's position
java.lang.IllegalArgumentException
- if the given index is out of range