de.lmu.ifi.dbs.elki.datasource.parser
Class TermFrequencyParser

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.datasource.parser.AbstractParser
      extended by de.lmu.ifi.dbs.elki.datasource.parser.NumberVectorLabelParser<SparseFloatVector>
          extended by de.lmu.ifi.dbs.elki.datasource.parser.TermFrequencyParser
All Implemented Interfaces:
LinebasedParser, Parser, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="Term frequency parser")
@Description(value="Parse a file containing term frequencies. The expected format is 'label term1 <freq> term2 <freq> ...'. Terms must not contain the separator character!")
public class TermFrequencyParser
extends NumberVectorLabelParser<SparseFloatVector>

A parser to load term frequency data, which essentially are sparse vectors with text keys.
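The idea of sparse vectors with text keys can be illustrated with a small stand-alone sketch. It is not the ELKI implementation: the class TermFrequencySketch, its parseLine method, and the rule that a non-numeric token left without a frequency is treated as a label are all assumptions made for illustration, as is the assumption that each term is followed by its numeric frequency. Only the names keymap and maxdim are borrowed from the field summary of this class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; not part of ELKI. */
class TermFrequencySketch {
  /** Term -> dimension index (plays the role of the keymap field). */
  final Map<String, Integer> keymap = new HashMap<>();

  /** Largest dimension index assigned so far (plays the role of maxdim). */
  int maxdim = 0;

  /** Labels found in the most recently parsed line. */
  final List<String> labels = new ArrayList<>();

  /** Parses one line into a sparse vector: dimension index -> frequency. */
  Map<Integer, Float> parseLine(String line, Pattern colSep) {
    labels.clear();
    Map<Integer, Float> values = new TreeMap<>();
    String pendingTerm = null; // token still waiting for its frequency
    for (String token : colSep.split(line.trim())) {
      if (token.isEmpty()) {
        continue;
      }
      if (pendingTerm == null) {
        pendingTerm = token;
        continue;
      }
      try {
        float freq = Float.parseFloat(token);
        // Assign a new dimension the first time a term is seen.
        // Numbering from 1 is a choice of this sketch only.
        Integer dim = keymap.get(pendingTerm);
        if (dim == null) {
          dim = ++maxdim;
          keymap.put(pendingTerm, dim);
        }
        values.put(dim, freq);
        pendingTerm = null;
      } catch (NumberFormatException e) {
        // Assumption: a token not followed by a number is a label.
        labels.add(pendingTerm);
        pendingTerm = token;
      }
    }
    if (pendingTerm != null) {
      labels.add(pendingTerm);
    }
    return values;
  }
}
```

Because keymap is kept across calls, a term that occurs in several lines is mapped to the same dimension each time, so the resulting sparse vectors stay comparable.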


Nested Class Summary
static class TermFrequencyParser.Parameterizer
          Parameterization class.
 
Field Summary
(package private)  HashMap<String,Integer> keymap
          Map from term to dimension index
private static Logging logger
          Class logger
(package private)  int maxdim
          Maximum dimension used
 
Fields inherited from class de.lmu.ifi.dbs.elki.datasource.parser.NumberVectorLabelParser
LABEL_INDICES_ID, labelIndices
 
Fields inherited from class de.lmu.ifi.dbs.elki.datasource.parser.AbstractParser
ATTRIBUTE_CONCATENATION, COLUMN_SEPARATOR_ID, COMMENT, NUMBER_PATTERN, QUOTE_CHAR, QUOTE_ID, quoteChar, WHITESPACE_PATTERN
 
Constructor Summary
TermFrequencyParser(Pattern colSep, char quoteChar, BitSet labelIndices)
          Constructor.
 
Method Summary
protected  SparseFloatVector createDBObject(List<Double> attributes)
           Creates a database object of type V.
protected  Logging getLogger()
          Get the logger for this class.
protected  VectorFieldTypeInformation<SparseFloatVector> getTypeInformation(int dimensionality)
          Get a prototype object for the given dimensionality.
 MultipleObjectsBundle parse(InputStream in)
          Returns a list of the objects parsed from the specified input stream.
 Pair<SparseFloatVector,LabelList> parseLineInternal(String line)
          Internal method for parsing a single line.
 
Methods inherited from class de.lmu.ifi.dbs.elki.datasource.parser.NumberVectorLabelParser
parseLine
 
Methods inherited from class de.lmu.ifi.dbs.elki.datasource.parser.AbstractParser
tokenize, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

logger

private static final Logging logger
Class logger


maxdim

int maxdim
Maximum dimension used


keymap

HashMap<String,Integer> keymap
Map from term to dimension index

Constructor Detail

TermFrequencyParser

public TermFrequencyParser(Pattern colSep,
                           char quoteChar,
                           BitSet labelIndices)
Constructor.

Parameters:
colSep - column separator pattern
quoteChar - quotation character
labelIndices - indices of the columns to treat as labels

Method Detail

createDBObject

protected SparseFloatVector createDBObject(List<Double> attributes)
Description copied from class: NumberVectorLabelParser

Creates a database object of type V.

Specified by:
createDBObject in class NumberVectorLabelParser<SparseFloatVector>
Parameters:
attributes - the attributes of the vector to create.
Returns:
a RealVector of type V containing the given attribute values

parseLineInternal

public Pair<SparseFloatVector,LabelList> parseLineInternal(String line)
Description copied from class: NumberVectorLabelParser
Internal method for parsing a single line. Used by both line-based parsing and block parsing. This avoids building the metadata for each line.

Overrides:
parseLineInternal in class NumberVectorLabelParser<SparseFloatVector>
Parameters:
line - Line to process
Returns:
parsing result

parse

public MultipleObjectsBundle parse(InputStream in)
Description copied from interface: Parser
Returns a list of the objects parsed from the specified input stream.

Specified by:
parse in interface Parser
Overrides:
parse in class NumberVectorLabelParser<SparseFloatVector>
Parameters:
in - the stream to parse objects from
Returns:
a list containing those objects parsed from the input stream

getTypeInformation

protected VectorFieldTypeInformation<SparseFloatVector> getTypeInformation(int dimensionality)
Description copied from class: NumberVectorLabelParser
Get a prototype object for the given dimensionality.

Specified by:
getTypeInformation in class NumberVectorLabelParser<SparseFloatVector>
Parameters:
dimensionality - Dimensionality
Returns:
Prototype object

getLogger

protected Logging getLogger()
Description copied from class: AbstractParser
Get the logger for this class.

Specified by:
getLogger in class AbstractParser
Returns:
Logger.

Release 0.4.0 (2011-09-20_1324)