java.lang.Object
  weka.filters.Filter
An abstract class for instance filters: objects that take instances as input, carry out some transformation on the instance and then output the instance. The method implementations in this class assume that most of the work will be done in the methods overridden by subclasses.
A simple example of filter use follows. It does not remove instances from the output queue until all instances have been input, so it uses more memory than an approach that collects output instances as they become available:
Filter filter = ..some type of filter..
Instances data = ..some instances..
filter.setInputFormat(data);  // the input format must be defined before any input
for (int i = 0; i < data.numInstances(); i++) {
  filter.input(data.instance(i));
}
filter.batchFinished();  // signal that this batch of input is complete
Instances newData = filter.getOutputFormat();  // replaces the deprecated outputFormat()
Instance processed;
while ((processed = filter.output()) != null) {
  newData.add(processed);
}
..do something with newData..
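In contrast to the batch example above, the following is a minimal sketch of the lower-memory approach, in which output is collected as soon as it becomes available. It does not use the Weka classes: ToyFilter, filterStreaming and peakPending are hypothetical stand-ins (strings play the role of instances). It assumes that input() returns true when filtered output can be collected and that output() returns null when nothing is pending.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Queue;

/** Illustrative only: models the input/output protocol, not the Weka Filter class. */
final class StreamingDrainSketch {

    /** Hypothetical toy filter that upper-cases each string it is given. */
    static final class ToyFilter {
        private final Queue<String> outputQueue = new ArrayDeque<>();

        /** Assumption: returns true if filtered output can now be collected. */
        boolean input(String item) {
            outputQueue.add(item.toUpperCase(Locale.ROOT));
            return true;
        }

        /** Returns true if items are still pending output. */
        boolean batchFinished() {
            return !outputQueue.isEmpty();
        }

        /** Removes and returns the next filtered item, or null if none is pending. */
        String output() {
            return outputQueue.poll();
        }

        int numPendingOutput() {
            return outputQueue.size();
        }
    }

    /** Largest queue length seen during the most recent filterStreaming call. */
    static int peakPending;

    static List<String> filterStreaming(List<String> data) {
        ToyFilter filter = new ToyFilter();
        List<String> newData = new ArrayList<>();
        peakPending = 0;
        for (String item : data) {
            boolean ready = filter.input(item);
            peakPending = Math.max(peakPending, filter.numPendingOutput());
            if (ready) {
                drain(filter, newData); // collect output immediately
            }
        }
        // Collect anything the filter only releases at the end of the batch.
        if (filter.batchFinished()) {
            drain(filter, newData);
        }
        return newData;
    }

    private static void drain(ToyFilter filter, List<String> sink) {
        String processed;
        while ((processed = filter.output()) != null) {
            sink.add(processed);
        }
    }

    public static void main(String[] args) {
        System.out.println(filterStreaming(List.of("a", "b", "c")));
        System.out.println("peak pending = " + peakPending);
    }
}
```

Because the queue is drained after every input, it never holds more than one item in this toy case; a filter that needs to see the whole batch first would only release its output after batchFinished().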
Field Summary
private boolean | m_Debug | Debugging mode
private Instances | m_InputFormat | The input format for instances
private int[] | m_InputStringAtts | Indices of string attributes in the input format
protected boolean | m_NewBatch | Record whether the filter is at the start of a batch
private Instances | m_OutputFormat | The output format for instances
private Queue | m_OutputQueue | The output instance queue
private int[] | m_OutputStringAtts | Indices of string attributes in the output format
Constructor Summary
Filter()
Method Summary
static void | batchFilterFile(Filter filter, java.lang.String[] options) | Method for testing a filter's ability to process multiple batches.
boolean | batchFinished() | Signify that this batch of input to the filter is finished.
protected void | bufferInput(Instance instance) | Adds the supplied input instance to the input format dataset for later processing.
protected void | copyStringValues(Instance instance, boolean instSrcCompat, Instances srcDataset, Instances destDataset) | Takes string values referenced by an Instance and copies them from a source dataset to a destination dataset.
protected void | copyStringValues(Instance instance, boolean instSrcCompat, Instances srcDataset, int[] srcStrAtts, Instances destDataset, int[] destStrAtts) | Takes string values referenced by an Instance and copies them from a source dataset to a destination dataset.
private void | copyStringValues(Instance inst, Instances destDataset, int[] strAtts) | Copies string values contained in the instance copied to a new dataset.
static void | filterFile(Filter filter, java.lang.String[] options) | Method for testing filters.
protected void | flushInput() | Removes all buffered instances from the input format dataset.
protected Instances | getInputFormat() | Gets the currently set input format instances.
protected int[] | getInputStringIndex() | Returns an array containing the indices of all string attributes in the input format.
Instances | getOutputFormat() | Gets the format of the output instances.
protected int[] | getOutputStringIndex() | Returns an array containing the indices of all string attributes in the output format.
protected int[] | getStringIndices(Instances insts) | Gets an array containing the indices of all string attributes.
boolean | input(Instance instance) | Input an instance for filtering.
boolean | inputFormat(Instances instanceInfo) | Deprecated. Use setInputFormat(Instances) instead.
protected Instances | inputFormatPeek() | Returns a reference to the current input format without copying it.
boolean | isOutputFormatDefined() | Returns whether the output format is ready to be collected.
static void | main(java.lang.String[] args) | Main method for testing this class.
int | numPendingOutput() | Returns the number of instances pending output.
Instance | output() | Output an instance after filtering and remove it from the output queue.
Instances | outputFormat() | Deprecated. Use getOutputFormat() instead.
protected Instances | outputFormatPeek() | Returns a reference to the current output format without copying it.
Instance | outputPeek() | Output an instance after filtering but do not remove it from the output queue.
protected void | push(Instance instance) | Adds an output instance to the queue.
protected void | resetQueue() | Clears the output queue.
boolean | setInputFormat(Instances instanceInfo) | Sets the format of the input instances.
protected void | setOutputFormat(Instances outputFormat) | Sets the format of output instances.
static Instances | useFilter(Instances data, Filter filter) | Filters an entire set of instances through a filter and returns the new set.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
private boolean m_Debug
private Instances m_OutputFormat
private Queue m_OutputQueue
private int[] m_OutputStringAtts
private int[] m_InputStringAtts
private Instances m_InputFormat
protected boolean m_NewBatch
Constructor Detail
public Filter()
Method Detail
protected void setOutputFormat(Instances outputFormat)
Parameters:
outputFormat - the new output format
protected Instances getInputFormat()
protected Instances inputFormatPeek()
protected Instances outputFormatPeek()
protected void push(Instance instance)
Parameters:
instance - the instance to be added to the queue.
protected void resetQueue()
protected void bufferInput(Instance instance)
Parameters:
instance - the Instance to buffer.
protected int[] getInputStringIndex()
protected int[] getOutputStringIndex()
private void copyStringValues(Instance inst, Instances destDataset, int[] strAtts)
Parameters:
destDataset - the destination set of Instances
strAtts - an array containing the indices of any string attributes in the dataset.
protected void copyStringValues(Instance instance, boolean instSrcCompat, Instances srcDataset, Instances destDataset)
Parameters:
instance - the instance containing references to strings in the source dataset that will have references updated to be valid for the destination dataset.
instSrcCompat - true if the instance structure is the same as the source, or false if it is the same as the destination
srcDataset - the dataset for which the current instance string references are valid (after any position mapping if needed)
destDataset - the dataset for which the current instance string references need to be inserted (after any position mapping if needed)
protected void copyStringValues(Instance instance, boolean instSrcCompat, Instances srcDataset, int[] srcStrAtts, Instances destDataset, int[] destStrAtts)
Parameters:
instance - the instance containing references to strings in the source dataset that will have references updated to be valid for the destination dataset.
instSrcCompat - true if the instance structure is the same as the source, or false if it is the same as the destination (i.e. which of the string attribute indices contains the correct locations for this instance).
srcDataset - the dataset for which the current instance string references are valid (after any position mapping if needed)
srcStrAtts - an array containing the indices of string attributes in the source dataset.
destDataset - the dataset for which the current instance string references need to be inserted (after any position mapping if needed)
destStrAtts - an array containing the indices of string attributes in the destination dataset.
protected void flushInput()
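The reference updating described for copyStringValues above can be pictured with a small model. The sketch below assumes that an instance stores, for each string attribute, an index into a per-attribute string table held by its dataset. ToyDataset and its methods are hypothetical stand-ins rather than the Weka API, and treating NaN as a missing value is likewise an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy model of string-reference copying between datasets. */
final class CopyStringValuesSketch {

    /** Hypothetical dataset header holding one string table per attribute position. */
    static final class ToyDataset {
        private final List<List<String>> stringTables = new ArrayList<>();

        ToyDataset(int numAttributes) {
            for (int i = 0; i < numAttributes; i++) {
                stringTables.add(new ArrayList<>());
            }
        }

        /** Stores a string for the attribute and returns its index in that table. */
        int addStringValue(int attIndex, String value) {
            List<String> table = stringTables.get(attIndex);
            int existing = table.indexOf(value);
            if (existing >= 0) {
                return existing;
            }
            table.add(value);
            return table.size() - 1;
        }

        String stringValue(int attIndex, int valueIndex) {
            return stringTables.get(attIndex).get(valueIndex);
        }
    }

    /** Rewrites the string references in instance so they are valid for dest. */
    static void copyStringValues(double[] instance, boolean instSrcCompat,
                                 ToyDataset src, int[] srcStrAtts,
                                 ToyDataset dest, int[] destStrAtts) {
        if (src == dest) {
            return; // references are already valid
        }
        if (srcStrAtts.length != destStrAtts.length) {
            throw new IllegalArgumentException("source and destination string indices differ in length");
        }
        for (int i = 0; i < srcStrAtts.length; i++) {
            // Pick the index array that matches the layout of the instance itself.
            int instIndex = instSrcCompat ? srcStrAtts[i] : destStrAtts[i];
            if (Double.isNaN(instance[instIndex])) {
                continue; // assumption: NaN marks a missing value
            }
            String value = src.stringValue(srcStrAtts[i], (int) instance[instIndex]);
            instance[instIndex] = dest.addStringValue(destStrAtts[i], value);
        }
    }

    public static void main(String[] args) {
        ToyDataset src = new ToyDataset(2);
        src.addStringValue(1, "x");
        src.addStringValue(1, "hello");
        ToyDataset dest = new ToyDataset(3);
        dest.addStringValue(2, "other");
        dest.addStringValue(2, "more");
        double[] inst = {5.0, 1.0}; // attribute 1 refers to "hello" in src
        copyStringValues(inst, true, src, new int[] {1}, dest, new int[] {2});
        System.out.println("new reference = " + (int) inst[1]);
    }
}
```

In this model the numeric attribute is left untouched, and only the stored index changes so that it points at the copy of the string in the destination table.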
public boolean inputFormat(Instances instanceInfo) throws java.lang.Exception
Deprecated. Use setInputFormat(Instances) instead.
Throws:
java.lang.Exception
public boolean setInputFormat(Instances instanceInfo) throws java.lang.Exception
Overriding methods should call super.setInputFormat(Instances).
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
Throws:
java.lang.Exception - if the inputFormat can't be set successfully
public Instances outputFormat()
Deprecated. Use getOutputFormat() instead.
public Instances getOutputFormat()
Throws:
java.lang.NullPointerException - if no input structure has been defined (or the output format hasn't been determined yet)
public boolean input(Instance instance) throws java.lang.Exception
Parameters:
instance - the input instance
Throws:
java.lang.NullPointerException - if the input format has not been defined.
java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
public boolean batchFinished() throws java.lang.Exception
Throws:
java.lang.NullPointerException - if no input structure has been defined
java.lang.Exception - if there was a problem finishing the batch.
public Instance output()
Throws:
java.lang.NullPointerException - if no output structure has been defined
public Instance outputPeek()
Throws:
java.lang.NullPointerException - if no input structure has been defined
public int numPendingOutput()
Throws:
java.lang.NullPointerException - if no input structure has been defined
public boolean isOutputFormatDefined()
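The difference between output(), outputPeek() and numPendingOutput() described above can be illustrated with a minimal queue model. This is a sketch under stated assumptions, not the Weka implementation: OutputQueueSketch is a hypothetical class, strings stand in for instances, and the NullPointerException cases documented for the real methods are not modelled.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Illustrative only: a toy output queue with remove and peek operations. */
final class OutputQueueSketch {
    private final Queue<String> outputQueue = new ArrayDeque<>();

    /** Adds an output item to the end of the queue. */
    void push(String instance) {
        outputQueue.add(instance);
    }

    /** Removes and returns the next item, or null if the queue is empty. */
    String output() {
        return outputQueue.poll();
    }

    /** Returns the next item without removing it, or null if the queue is empty. */
    String outputPeek() {
        return outputQueue.peek();
    }

    /** Returns how many items are waiting to be collected. */
    int numPendingOutput() {
        return outputQueue.size();
    }

    /** Discards everything that is waiting to be collected. */
    void resetQueue() {
        outputQueue.clear();
    }

    public static void main(String[] args) {
        OutputQueueSketch queue = new OutputQueueSketch();
        queue.push("first");
        queue.push("second");
        System.out.println(queue.outputPeek() + " / pending " + queue.numPendingOutput());
        System.out.println(queue.output() + " / pending " + queue.numPendingOutput());
    }
}
```

Peeking leaves the pending count unchanged, while output() reduces it by one; after resetQueue() both output() and outputPeek() return null in this model.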
protected int[] getStringIndices(Instances insts)
Parameters:
insts - the Instances to scan for string attributes.
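The scan that getStringIndices performs can be sketched roughly as follows. The AttrType enum and the list-based header are hypothetical simplifications introduced for illustration; the real method works on an Instances object, whose attribute API is not shown here.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: collects the positions of string-typed attributes. */
final class StringIndexSketch {

    /** Hypothetical attribute kinds standing in for real attribute metadata. */
    enum AttrType { NUMERIC, NOMINAL, STRING }

    /** Returns the indices of all STRING attributes, in ascending order. */
    static int[] getStringIndices(List<AttrType> attributes) {
        int[] indices = new int[attributes.size()];
        int count = 0;
        for (int i = 0; i < attributes.size(); i++) {
            if (attributes.get(i) == AttrType.STRING) {
                indices[count++] = i;
            }
        }
        // Trim the array to the number of string attributes actually found.
        return Arrays.copyOf(indices, count);
    }

    public static void main(String[] args) {
        List<AttrType> header = List.of(
                AttrType.NUMERIC, AttrType.STRING, AttrType.NOMINAL, AttrType.STRING);
        System.out.println(Arrays.toString(getStringIndices(header)));
    }
}
```

An index array of this kind is what the srcStrAtts and destStrAtts parameters of copyStringValues expect.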
public static Instances useFilter(Instances data, Filter filter) throws java.lang.Exception
Parameters:
data - the data to be filtered
filter - the filter to be used
Throws:
java.lang.Exception - if the filter can't be used successfully
public static void filterFile(Filter filter, java.lang.String[] options) throws java.lang.Exception
Throws:
java.lang.Exception - if something goes wrong or the user requests help on command options
public static void batchFilterFile(Filter filter, java.lang.String[] options) throws java.lang.Exception
Throws:
java.lang.Exception - if something goes wrong or the user requests help on command options
public static void main(java.lang.String[] args)