de.lmu.ifi.dbs.elki.datasource.parser
Class NumberVectorLabelParser<V extends NumberVector<?,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.datasource.parser.AbstractParser
      extended by de.lmu.ifi.dbs.elki.datasource.parser.NumberVectorLabelParser<V>
Type Parameters:
V - the type of NumberVector used
All Implemented Interfaces:
LinebasedParser, Parser, InspectionUtilFrequentlyScanned, Parameterizable
Direct Known Subclasses:
DoubleVectorLabelParser, FloatVectorLabelParser, SparseFloatVectorLabelParser, TermFrequencyParser

public abstract class NumberVectorLabelParser<V extends NumberVector<?,?>>
extends AbstractParser
implements LinebasedParser, Parser

Provides a parser for parsing one point per line, attributes separated by whitespace.

Several labels may be given per point. A label must not be parseable as double. Lines starting with "#" will be ignored.

An index can be specified to mark an entry as a class label. Indices count all entries on a line, numeric values and labels alike, starting with 0.
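The line format described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class LineSketch, its nested Parsed type and the method parseLine are hypothetical names, and trimming the line, skipping blank lines and using Double.parseDouble to decide what counts as a number are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of the line format; not part of ELKI. */
public class LineSketch {
  /** Numeric attributes and string labels recovered from one line. */
  public static final class Parsed {
    public final List<Double> attributes = new ArrayList<>();
    public final List<String> labels = new ArrayList<>();
  }

  /**
   * Splits a line on whitespace. Returns null for comment lines starting
   * with "#" and (an assumption of this sketch) for blank lines.
   */
  public static Parsed parseLine(String line, BitSet labelIndices) {
    String trimmed = line.trim();
    if (trimmed.isEmpty() || trimmed.startsWith("#")) {
      return null;
    }
    Parsed result = new Parsed();
    String[] entries = trimmed.split("\\s+");
    for (int i = 0; i < entries.length; i++) {
      // Entries at a configured label index are labels even if numeric.
      if (!labelIndices.get(i)) {
        try {
          result.attributes.add(Double.parseDouble(entries[i]));
          continue;
        } catch (NumberFormatException e) {
          // Not parseable as double: fall through and keep it as a label.
        }
      }
      result.labels.add(entries[i]);
    }
    return result;
  }
}
```

With label index 0 set, the line "7 1.5 2.5 setosa" yields the attributes 1.5 and 2.5 and the labels "7" and "setosa"; the numeric entry at position 0 is kept as a label only because its index was listed.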


Nested Class Summary
static class NumberVectorLabelParser.Parameterizer<V extends NumberVector<?,?>>
          Parameterization class.
 
Field Summary
static OptionID LABEL_INDICES_ID
          A comma-separated list of the indices of labels (may be numeric), counting whitespace-separated entries in a line starting with 0.
protected  BitSet labelIndices
          Keeps the indices of the attributes to be treated as a string label.
 
Fields inherited from class de.lmu.ifi.dbs.elki.datasource.parser.AbstractParser
ATTRIBUTE_CONCATENATION, COLUMN_SEPARATOR_ID, COMMENT, NUMBER_PATTERN, QUOTE_CHAR, QUOTE_ID, quoteChar, WHITESPACE_PATTERN
 
Constructor Summary
NumberVectorLabelParser(Pattern colSep, char quoteChar, BitSet labelIndices)
          Constructor
 
Method Summary
protected abstract  V createDBObject(List<Double> attributes)
           Creates a database object of type V.
protected abstract  VectorFieldTypeInformation<V> getTypeInformation(int dimensionality)
          Get a prototype object for the given dimensionality.
 MultipleObjectsBundle parse(InputStream in)
          Returns a list of the objects parsed from the specified input stream.
 SingleObjectBundle parseLine(String line)
          Parse a single line into a database object.
protected  Pair<V,LabelList> parseLineInternal(String line)
          Internal method for parsing a single line.
 
Methods inherited from class de.lmu.ifi.dbs.elki.datasource.parser.AbstractParser
getLogger, tokenize, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LABEL_INDICES_ID

public static final OptionID LABEL_INDICES_ID
A comma-separated list of the indices of labels (may be numeric), counting whitespace-separated entries in a line starting with 0. The corresponding entries will be treated as labels.

Key: -parser.labelIndices


labelIndices

protected BitSet labelIndices
Keeps the indices of the attributes to be treated as a string label.
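As a rough illustration of how a comma-separated index list such as the value of -parser.labelIndices could end up in a BitSet like this field, consider the sketch below. The class LabelIndicesSketch and the method parseIndices are hypothetical; how ELKI's Parameterizer actually converts the option value is not documented on this page, so this is an assumption.

```java
import java.util.BitSet;

/** Hypothetical helper; not part of ELKI. */
public class LabelIndicesSketch {
  /** Converts a list such as "0,3" into a BitSet with bits 0 and 3 set. */
  public static BitSet parseIndices(String commaSeparated) {
    BitSet indices = new BitSet();
    for (String part : commaSeparated.split(",")) {
      String s = part.trim();
      if (!s.isEmpty()) {
        // Integer.parseInt rejects non-numeric input with an exception;
        // BitSet.set rejects negative indices.
        indices.set(Integer.parseInt(s));
      }
    }
    return indices;
  }
}
```

A BitSet makes the per-entry check cheap: while walking the entries of a line, get(i) answers whether entry i was configured as a label.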

Constructor Detail

NumberVectorLabelParser

public NumberVectorLabelParser(Pattern colSep,
                               char quoteChar,
                               BitSet labelIndices)
Constructor

Parameters:
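colSep - the pattern that separates columns
quoteChar - the quotation character
labelIndices - the indices of the entries to be treated as labels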
Method Detail

parse

public MultipleObjectsBundle parse(InputStream in)
Description copied from interface: Parser
Returns a list of the objects parsed from the specified input stream.

Specified by:
parse in interface Parser
Parameters:
in - the stream to parse objects from
Returns:
a list containing those objects parsed from the input stream

parseLine

public SingleObjectBundle parseLine(String line)
Description copied from interface: LinebasedParser
Parse a single line into a database object.

Specified by:
parseLine in interface LinebasedParser
Parameters:
line - single line
Returns:
parsing result

parseLineInternal

protected Pair<V,LabelList> parseLineInternal(String line)
Internal method for parsing a single line. Used by both line-based parsing and block parsing, so that metadata does not have to be built for each line.

Parameters:
line - Line to process
Returns:
parsing result
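The split between per-line work and per-block metadata can be sketched as follows. This is an illustration of the design idea only, under several assumptions: the names BlockParseSketch and Block are hypothetical, labels are left out, the dimensionality stands in for the metadata, and rejecting lines of differing dimensionality is a choice of this sketch rather than documented behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of ELKI. */
public class BlockParseSketch {
  /** Per-line work only: no metadata is built here. */
  static double[] parseLineInternal(String line) {
    String[] entries = line.trim().split("\\s+");
    double[] values = new double[entries.length];
    for (int i = 0; i < entries.length; i++) {
      values[i] = Double.parseDouble(entries[i]);
    }
    return values;
  }

  /** Result of a block parse: the rows plus metadata built once. */
  public static final class Block {
    public final List<double[]> rows = new ArrayList<>();
    public int dimensionality = -1;
  }

  /** Reads all lines, reusing parseLineInternal for each of them. */
  public static Block parse(InputStream in) throws IOException {
    Block block = new Block();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.trim().isEmpty() || line.startsWith("#")) {
          continue; // skip blank lines (assumption) and comments
        }
        double[] row = parseLineInternal(line);
        if (block.dimensionality < 0) {
          block.dimensionality = row.length; // metadata set once per block
        } else if (block.dimensionality != row.length) {
          throw new IllegalArgumentException("Differing dimensionality: " + line);
        }
        block.rows.add(row);
      }
    }
    return block;
  }
}
```

A single-line entry point would call the same internal method and wrap its one result, which is why the internal method returns only the parsed values.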

createDBObject

protected abstract V createDBObject(List<Double> attributes)

Creates a database object of type V.

Parameters:
attributes - the attributes of the vector to create.
Returns:
a vector of type V containing the given attribute values
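A concrete subclass has to turn the boxed List<Double> into whatever storage its vector type uses. The sketch below shows only that conversion step for a primitive double[]; the class CreateObjectSketch and the method toArray are hypothetical, and wrapping the array in an actual ELKI vector type is deliberately left out because the relevant constructors are not documented on this page.

```java
import java.util.List;

/** Hypothetical helper; not part of ELKI. */
public class CreateObjectSketch {
  /** Copies the attribute values into a primitive array, in order. */
  public static double[] toArray(List<Double> attributes) {
    double[] values = new double[attributes.size()];
    for (int i = 0; i < values.length; i++) {
      values[i] = attributes.get(i); // auto-unboxing of Double to double
    }
    return values;
  }
}
```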

getTypeInformation

protected abstract VectorFieldTypeInformation<V> getTypeInformation(int dimensionality)
Get a prototype object for the given dimensionality.

Parameters:
dimensionality - Dimensionality
Returns:
Prototype object

Release 0.4.0 (2011-09-20_1324)