java.lang.Object
  de.lmu.ifi.dbs.elki.datasource.parser.ArffParser
public class ArffParser

Parser to load WEKA .arff files into ELKI. This parser is quite hackish and contains a lot of magic that is not yet configurable. TODO: Sparse vectors are not yet supported.
| Nested Class Summary | |
|---|---|
| static class | ArffParser.Parameterizer - Parameterization class. |
| Field Summary | |
|---|---|
| static Pattern | ARFF_COMMENT - Comment pattern. |
| static Pattern | ARFF_HEADER_ATTRIBUTE - Arff attribute declaration marker |
| static Pattern | ARFF_HEADER_DATA - Arff data marker |
| static Pattern | ARFF_HEADER_RELATION - Arff file marker |
| static Pattern | ARFF_NUMERIC - Pattern for numeric columns |
| static String | DEFAULT_ARFF_MAGIC_CLASS - Pattern to auto-convert columns to class labels. |
| static String | DEFAULT_ARFF_MAGIC_EID - Pattern to auto-convert columns to external ids. |
| static Pattern | EMPTY - Empty line pattern. |
| private static Logging | logger - Logger |
| (package private) Pattern | magic_class - Pattern to recognize class label columns |
| (package private) Pattern | magic_eid - Pattern to recognize external ids |
| Constructor Summary | |
|---|---|
| ArffParser(Pattern magic_eid, Pattern magic_class) | Constructor. |
| ArffParser(String magic_eid, String magic_class) | Constructor. |
| Method Summary | |
|---|---|
| private Object[] | loadDenseInstance(StreamTokenizer tokenizer, int[] dimsize, TypeInformation[] etyp, int outdim) |
| private Object[] | loadSparseInstance(StreamTokenizer tokenizer, int[] targ, int[] dimsize, TypeInformation[] elkitypes, int metaLength) |
| private StreamTokenizer | makeArffTokenizer(BufferedReader br) - Make a StreamTokenizer for the ARFF format. |
| private void | nextToken(StreamTokenizer tokenizer) - Helper function for token handling. |
| MultipleObjectsBundle | parse(InputStream instream) - Returns a list of the objects parsed from the specified input stream. |
| private void | parseAttributeStatements(BufferedReader br, ArrayList<String> names, ArrayList<String> types) - Parse the "@attribute" section of the ARFF file. |
| private void | processColumnTypes(ArrayList<String> names, ArrayList<String> types, int[] targ, TypeInformation[] etyp, int[] dims) - Process the column types (and names!) |
| private void | readHeader(BufferedReader br) - Read the dataset header part of the ARFF file, to ensure consistency. |
| private void | setupBundleHeaders(ArrayList<String> names, int[] targ, TypeInformation[] etyp, int[] dimsize, MultipleObjectsBundle bundle, boolean sparse) - Setup the headers for the object bundle. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static final Logging logger
public static final Pattern ARFF_HEADER_RELATION
public static final Pattern ARFF_HEADER_ATTRIBUTE
public static final Pattern ARFF_HEADER_DATA
public static final Pattern ARFF_COMMENT
public static final String DEFAULT_ARFF_MAGIC_EID
public static final String DEFAULT_ARFF_MAGIC_CLASS
public static final Pattern ARFF_NUMERIC
public static final Pattern EMPTY
Pattern magic_eid
Pattern magic_class
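The pattern constants above are listed without their values. As a rough illustration of how such line patterns could be used to tell the parts of an ARFF file apart, here is a minimal sketch. The class name `ArffLinePatterns`, the `classify` helper, and every regular expression in it are assumptions made for this example; they are not the expressions defined in ArffParser.

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins for the ARFF_* and EMPTY constants. The actual
// regular expressions used by ArffParser are not shown on this page.
class ArffLinePatterns {
  static final Pattern RELATION = Pattern.compile("^\\s*@relation\\s+.*$", Pattern.CASE_INSENSITIVE);
  static final Pattern ATTRIBUTE = Pattern.compile("^\\s*@attribute\\s+.*$", Pattern.CASE_INSENSITIVE);
  static final Pattern DATA = Pattern.compile("^\\s*@data\\s*$", Pattern.CASE_INSENSITIVE);
  static final Pattern COMMENT = Pattern.compile("^\\s*%.*$");
  static final Pattern EMPTY = Pattern.compile("^\\s*$");

  // Returns a short label describing what kind of ARFF line this is.
  static String classify(String line) {
    if (EMPTY.matcher(line).matches()) return "empty";
    if (COMMENT.matcher(line).matches()) return "comment";
    if (RELATION.matcher(line).matches()) return "relation";
    if (ATTRIBUTE.matcher(line).matches()) return "attribute";
    if (DATA.matcher(line).matches()) return "data";
    return "other";
  }

  public static void main(String[] args) {
    String[] lines = { "% iris data", "@RELATION iris", "@attribute 'sepal length' numeric", "@DATA", "5.1,3.5" };
    for (String line : lines) {
      System.out.println(classify(line) + ": " + line);
    }
  }
}
```

The keyword patterns are case-insensitive in this sketch because ARFF files commonly write the keywords in either case.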
| Constructor Detail |
|---|
public ArffParser(Pattern magic_eid,
Pattern magic_class)
Parameters:
magic_eid - Magic to recognize external IDs
magic_class - Magic to recognize class labels
public ArffParser(String magic_eid,
String magic_class)
Parameters:
magic_eid - Magic to recognize external IDs
magic_class - Magic to recognize class labels
| Method Detail |
|---|
public MultipleObjectsBundle parse(InputStream instream)
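Description copied from interface: Parser
Returns a list of the objects parsed from the specified input stream.
Specified by:
parse in interface Parser
Parameters:
instream - the stream to parse objects from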
private Object[] loadSparseInstance(StreamTokenizer tokenizer,
int[] targ,
int[] dimsize,
TypeInformation[] elkitypes,
int metaLength)
throws IOException
Throws:
IOException
private Object[] loadDenseInstance(StreamTokenizer tokenizer,
int[] dimsize,
TypeInformation[] etyp,
int outdim)
throws IOException
Throws:
IOException
private StreamTokenizer makeArffTokenizer(BufferedReader br)
Parameters:
br - Buffered reader
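How makeArffTokenizer configures its tokenizer is not documented here. The following sketch shows one plausible `java.io.StreamTokenizer` setup for ARFF data rows, in which commas separate values, `%` starts a comment, both quote styles are recognized, and line ends are reported as tokens. The class `ArffTokenizerSketch` and its methods are hypothetical, and the syntax table is an assumption rather than the configuration ELKI uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual body of makeArffTokenizer.
class ArffTokenizerSketch {
  static StreamTokenizer makeTokenizer(Reader reader) {
    StreamTokenizer st = new StreamTokenizer(reader);
    st.resetSyntax();                  // start from an empty syntax table
    st.wordChars(' ' + 1, '\u00FF');   // every printable character is part of a word
    st.whitespaceChars(0, ' ');        // control characters and blanks separate tokens
    st.whitespaceChars(',', ',');      // commas separate values
    st.commentChar('%');               // '%' starts a comment that runs to the line end
    st.quoteChar('"');
    st.quoteChar('\'');
    st.ordinaryChar('{');              // braces delimit sparse rows and nominal value lists
    st.ordinaryChar('}');
    st.eolIsSignificant(true);         // report line ends, since each row is one instance
    return st;
  }

  // Collects all tokens of the input, marking line ends as "<EOL>".
  static List<String> tokenize(String text) {
    List<String> out = new ArrayList<>();
    StreamTokenizer st = makeTokenizer(new StringReader(text));
    try {
      while (st.nextToken() != StreamTokenizer.TT_EOF) {
        if (st.ttype == StreamTokenizer.TT_EOL) {
          out.add("<EOL>");
        } else if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '\'' || st.ttype == '"') {
          out.add(st.sval);
        } else {
          out.add(String.valueOf((char) st.ttype));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("5.1,3.5,'Iris setosa'\n4.9,?,x\n"));
  }
}
```

Numbers are left as word tokens in this sketch, so the caller decides how to convert them and how to treat the `?` missing-value marker.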
private void setupBundleHeaders(ArrayList<String> names,
int[] targ,
TypeInformation[] etyp,
int[] dimsize,
MultipleObjectsBundle bundle,
boolean sparse)
Parameters:
names - Attribute names
targ - Target columns
etyp - ELKI type information
dimsize - Number of dimensions in the individual types
bundle - Output bundle
sparse - Flag to create sparse vectors
private void readHeader(BufferedReader br)
throws IOException
Parameters:
br - Buffered reader
Throws:
IOException
private void parseAttributeStatements(BufferedReader br,
ArrayList<String> names,
ArrayList<String> types)
throws IOException
Parameters:
br - Input
names - List (to fill) of attribute names
types - List (to fill) of attribute types
Throws:
IOException
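The fill-the-lists contract of parseAttributeStatements can be sketched as a loop that reads header lines until the data marker and appends one name and one type per attribute declaration. The class `AttributeSectionSketch`, its regular expressions, and the error handling are assumptions for illustration only; the real method may differ in all of these.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an "@attribute" section reader.
class AttributeSectionSketch {
  // Group 1: attribute name (optionally quoted). Group 2: declared type.
  static final Pattern ATTRIBUTE = Pattern.compile(
      "^\\s*@attribute\\s+('[^']*'|\"[^\"]*\"|\\S+)\\s+(.*?)\\s*$", Pattern.CASE_INSENSITIVE);
  static final Pattern DATA = Pattern.compile("^\\s*@data\\s*$", Pattern.CASE_INSENSITIVE);
  static final Pattern SKIP = Pattern.compile("^\\s*(%.*)?$"); // empty or comment line

  static void parseAttributes(BufferedReader br, List<String> names, List<String> types) throws IOException {
    String line;
    while ((line = br.readLine()) != null) {
      if (SKIP.matcher(line).matches()) continue;
      if (DATA.matcher(line).matches()) return; // header is complete
      Matcher m = ATTRIBUTE.matcher(line);
      if (!m.matches()) throw new IOException("Unexpected line in ARFF header: " + line);
      String name = m.group(1);
      char first = name.charAt(0);
      if (name.length() >= 2 && (first == '\'' || first == '"') && name.charAt(name.length() - 1) == first) {
        name = name.substring(1, name.length() - 1); // strip surrounding quotes
      }
      names.add(name);
      types.add(m.group(2));
    }
    throw new IOException("End of input reached before the @data marker.");
  }

  // Convenience wrapper for experiments: returns "name:type" strings.
  static List<String> parseToPairs(String text) {
    List<String> names = new ArrayList<>();
    List<String> types = new ArrayList<>();
    try {
      parseAttributes(new BufferedReader(new StringReader(text)), names, types);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> pairs = new ArrayList<>();
    for (int i = 0; i < names.size(); i++) pairs.add(names.get(i) + ":" + types.get(i));
    return pairs;
  }

  public static void main(String[] args) {
    System.out.println(parseToPairs("@attribute 'sepal length' numeric\n@attribute class {a,b}\n@data\n"));
  }
}
```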
private void processColumnTypes(ArrayList<String> names,
ArrayList<String> types,
int[] targ,
TypeInformation[] etyp,
int[] dims)
Parameters:
names - Attribute names
types - Attribute types
targ - Target dimension mapping (ARFF to ELKI), return value
etyp - ELKI type information, return value
dims - Number of successive dimensions, return value
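The parameter list suggests that each ARFF column is mapped to an output column, and that runs of successive numeric columns are counted together. A minimal sketch of that idea follows, assuming that column names matching the magic patterns become external IDs or class labels and that only adjacent numeric columns are merged. The class `ColumnTypeSketch`, the string kind labels (used in place of TypeInformation), and the example patterns are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of column type processing; not ELKI's implementation.
class ColumnTypeSketch {
  static final Pattern NUMERIC = Pattern.compile("(numeric|real|integer)", Pattern.CASE_INSENSITIVE);

  // Fills targ (ARFF column -> output column) and dims (columns merged per
  // output column), and returns the kind of each output column.
  static List<String> process(List<String> names, List<String> types,
      Pattern magicEid, Pattern magicClass, int[] targ, int[] dims) {
    List<String> kinds = new ArrayList<>();
    for (int i = 0; i < names.size(); i++) {
      String kind;
      if (magicEid.matcher(names.get(i)).matches()) {
        kind = "externalid";
      } else if (magicClass.matcher(names.get(i)).matches()) {
        kind = "classlabel";
      } else if (NUMERIC.matcher(types.get(i)).matches()) {
        kind = "numeric";
      } else {
        kind = "label";
      }
      int last = kinds.size() - 1;
      if (kind.equals("numeric") && last >= 0 && kinds.get(last).equals("numeric")) {
        targ[i] = last;   // extend the current numeric vector
        dims[last]++;
      } else {
        kinds.add(kind);  // start a new output column
        targ[i] = kinds.size() - 1;
        dims[targ[i]] = 1;
      }
    }
    return kinds;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("id", "x", "y", "class", "z");
    List<String> types = Arrays.asList("string", "numeric", "real", "{a,b}", "numeric");
    int[] targ = new int[names.size()];
    int[] dims = new int[names.size()];
    List<String> kinds = process(names, types,
        Pattern.compile("(?i)id"), Pattern.compile("(?i)class"), targ, dims);
    System.out.println(kinds + " " + Arrays.toString(targ) + " " + Arrays.toString(dims));
  }
}
```

In this sketch `targ` and `dims` are filled in place, mirroring the "return value" wording of the parameter descriptions; trailing unused entries of `dims` stay at zero.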
private void nextToken(StreamTokenizer tokenizer)
throws IOException
Parameters:
tokenizer - Tokenizer
Throws:
IOException