Package de.lmu.ifi.dbs.elki.datasource.parser

Parsers for different file formats and data types.



Interface Summary
DistanceParser<D extends Distance<D>> A DistanceParser shall provide a DistanceParsingResult by parsing an InputStream.
LinebasedParser A parser that can parse a single line.
Parser A Parser shall provide a ParsingResult by parsing an InputStream.
 

Class Summary
AbstractParser Abstract superclass for all parsers providing the option handler for handling options.
AbstractParser.Parameterizer Parameterization class.
ArffParser Parser to load WEKA .arff files into ELKI.
ArffParser.Parameterizer Parameterization class.
BitVectorLabelParser Provides a parser for parsing one BitVector per line, bits separated by whitespace.
BitVectorLabelParser.Parameterizer Parameterization class.
DistanceParsingResult<D extends Distance<D>> Provides a list of database objects and labels associated with these objects and a cache of precomputed distances between the database objects.
DoubleVectorLabelParser Provides a parser for parsing one point per line, attributes separated by whitespace.
DoubleVectorLabelParser.Parameterizer Parameterization class.
DoubleVectorLabelTransposingParser Parser that reads points in transposed form.
DoubleVectorLabelTransposingParser.Parameterizer Parameterization class.
FloatVectorLabelParser Provides a parser for parsing one point per line, attributes separated by whitespace.
FloatVectorLabelParser.Parameterizer Parameterization class.
NumberDistanceParser<D extends NumberDistance<D,N>,N extends Number> Provides a parser for parsing one distance value per line.
NumberDistanceParser.Parameterizer<D extends NumberDistance<D,N>,N extends Number> Parameterization class.
NumberVectorLabelParser<V extends NumberVector<?,?>> Provides a parser for parsing one point per line, attributes separated by whitespace.
NumberVectorLabelParser.Parameterizer<V extends NumberVector<?,?>> Parameterization class.
ParameterizationFunctionLabelParser Provides a parser for parsing one point per line, attributes separated by whitespace.
ParameterizationFunctionLabelParser.Parameterizer Parameterization class.
SimplePolygonParser Parser to load polygon data (2D and 3D only) from a simple format.
SimplePolygonParser.Parameterizer Parameterization class.
SparseBitVectorLabelParser Provides a parser for parsing one sparse BitVector per line, where the indices of the one-bits are separated by whitespace.
SparseBitVectorLabelParser.Parameterizer Parameterization class.
SparseFloatVectorLabelParser Provides a parser for parsing one point per line, attributes separated by whitespace.
SparseFloatVectorLabelParser.Parameterizer Parameterization class.
TermFrequencyParser A parser to load term frequency data, which essentially are sparse vectors with text keys.
TermFrequencyParser.Parameterizer Parameterization class.
 

Package de.lmu.ifi.dbs.elki.datasource.parser Description


The general use case for any parser is to create objects from an InputStream (e.g., by reading a data file). The objects are packed into a MultipleObjectsBundle, which in turn is used by a DatabaseConnection to fill a Database with the corresponding objects.
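The flow from an InputStream to a collection of parsed objects can be sketched in plain Java. This is only an illustration under stated assumptions: the class StreamToBundleSketch and its parseAll method are hypothetical names, not part of ELKI, and a plain List stands in for the MultipleObjectsBundle. Skipping blank lines is likewise an assumption of the sketch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a line-based parser; not an ELKI class. */
final class StreamToBundleSketch {
  /**
   * Reads the stream line by line, turns each non-blank line into one object,
   * and collects the objects in a list (standing in for a bundle).
   */
  static <T> List<T> parseAll(InputStream in, Function<String, T> lineParser) {
    List<T> bundle = new ArrayList<>();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.trim().isEmpty()) {
          continue; // assumption of this sketch: blank lines carry no object
        }
        bundle.add(lineParser.apply(line));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bundle;
  }

  public static void main(String[] args) {
    InputStream in =
        new ByteArrayInputStream("1 2\n3 4\n".getBytes(StandardCharsets.UTF_8));
    // Each line becomes an array of whitespace-separated tokens.
    List<String[]> rows = parseAll(in, line -> line.trim().split("\\s+"));
    System.out.println(rows.size() + " objects parsed");
  }
}
```

Passing the per-line logic as a function keeps stream handling separate from the object format, which loosely mirrors the split between Parser and LinebasedParser listed above.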

By default (i.e., if the user does not specify otherwise), any KDDTask uses a StaticArrayDatabase, which in turn uses a FileBasedDatabaseConnection and a DoubleVectorLabelParser to parse the specified data file, producing a StaticArrayDatabase that contains DoubleVector objects.

Thus, the standard procedure for using a data set from a real-valued vector space is to prepare it as a file in the format expected by DoubleVectorLabelParser: one point per line, with attributes separated by whitespace.

This file format is also suitable for gnuplot, for example.

For an example file that follows these requirements, see exampledata.txt.
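To make the line format concrete (one point per line, attributes separated by whitespace, as the DoubleVectorLabelParser summary states), the following minimal sketch splits a single line into numeric attributes and labels. It does not use ELKI: the class WhitespaceVectorSketch and its Row type are hypothetical. Treating every token that cannot be parsed as a double as a label is an assumption of this sketch, and the sample line is made up for illustration rather than taken from exampledata.txt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the line format; not an ELKI class. */
final class WhitespaceVectorSketch {
  /** One parsed line: numeric attributes plus any non-numeric tokens. */
  static final class Row {
    final double[] values;
    final List<String> labels;

    Row(double[] values, List<String> labels) {
      this.values = values;
      this.labels = labels;
    }
  }

  /** Splits a line on whitespace and sorts the tokens into numbers and labels. */
  static Row parseLine(String line) {
    List<Double> numbers = new ArrayList<>();
    List<String> labels = new ArrayList<>();
    for (String token : line.trim().split("\\s+")) {
      if (token.isEmpty()) {
        continue; // an empty line yields a single empty token
      }
      try {
        numbers.add(Double.parseDouble(token));
      } catch (NumberFormatException e) {
        labels.add(token); // assumption: non-numeric tokens are labels
      }
    }
    double[] values = new double[numbers.size()];
    for (int i = 0; i < values.length; i++) {
      values[i] = numbers.get(i);
    }
    return new Row(values, labels);
  }

  public static void main(String[] args) {
    Row row = parseLine("1.0 2.5 -3 classA"); // made-up sample line
    System.out.println(row.values.length + " attributes, labels " + row.labels);
  }
}
```

Note that Double.parseDouble also accepts tokens such as NaN or Infinity, so a label spelled that way would be read as a number in this sketch.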


Release 0.4.0 (2011-09-20_1324)