
public class ArffParser extends Object implements Parser
| Modifier and Type | Class and Description |
|---|---|
| `static class` | `ArffParser.Parameterizer`: Parameterization class. |
| Modifier and Type | Field and Description |
|---|---|
| `static Pattern` | `ARFF_COMMENT`: Comment pattern. |
| `static Pattern` | `ARFF_HEADER_ATTRIBUTE`: ARFF attribute declaration (`@attribute`) marker. |
| `static Pattern` | `ARFF_HEADER_DATA`: ARFF data section (`@data`) marker. |
| `static Pattern` | `ARFF_HEADER_RELATION`: ARFF file marker (the `@relation` declaration). |
| `static Pattern` | `ARFF_NUMERIC`: Pattern for numeric columns. |
| `static String` | `DEFAULT_ARFF_MAGIC_CLASS`: Default pattern (as a string) to auto-convert columns to class labels. |
| `static String` | `DEFAULT_ARFF_MAGIC_EID`: Default pattern (as a string) to auto-convert columns to external IDs. |
| `static Pattern` | `EMPTY`: Empty line pattern. |
| `private static Logging` | `LOG`: Logger. |
| `(package private) Pattern` | `magic_class`: Pattern to recognize class label columns. |
| `(package private) Pattern` | `magic_eid`: Pattern to recognize external ID columns. |
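The header `Pattern` fields above recognize the structural lines of an ARFF file. This page does not list their regular expressions, so the following is a minimal sketch that uses hypothetical stand-ins (`ArffHeaderSketch`, `RELATION`, `ATTRIBUTE`, `DATA`, `COMMENT`, `EMPTY_LINE`). It follows the ARFF format in assuming that the `@relation`, `@attribute` and `@data` keywords are case-insensitive and that `%` starts a comment line. It is not ELKI's implementation.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-ins for ArffParser's header patterns (not ELKI's actual regexes). */
final class ArffHeaderSketch {
  // Assumed: ARFF keywords are matched case-insensitively at the start of a line.
  static final Pattern RELATION = Pattern.compile("^\\s*@relation\\b.*", Pattern.CASE_INSENSITIVE);
  static final Pattern ATTRIBUTE = Pattern.compile("^\\s*@attribute\\b.*", Pattern.CASE_INSENSITIVE);
  static final Pattern DATA = Pattern.compile("^\\s*@data\\b.*", Pattern.CASE_INSENSITIVE);
  // Assumed: '%' begins a comment line, as in the ARFF format.
  static final Pattern COMMENT = Pattern.compile("^\\s*%.*");
  // Blank or whitespace-only lines.
  static final Pattern EMPTY_LINE = Pattern.compile("^\\s*$");

  enum LineKind { RELATION, ATTRIBUTE, DATA, COMMENT, EMPTY, OTHER }

  /** Classifies a single header line using the hypothetical patterns above. */
  static LineKind classify(String line) {
    if (EMPTY_LINE.matcher(line).matches()) return LineKind.EMPTY;
    if (COMMENT.matcher(line).matches()) return LineKind.COMMENT;
    if (RELATION.matcher(line).matches()) return LineKind.RELATION;
    if (ATTRIBUTE.matcher(line).matches()) return LineKind.ATTRIBUTE;
    if (DATA.matcher(line).matches()) return LineKind.DATA;
    return LineKind.OTHER;
  }
}
```

A header reader built this way would skip `EMPTY` and `COMMENT` lines and expect `RELATION`, then a run of `ATTRIBUTE` lines, then `DATA`.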
| Constructor and Description |
|---|
| `ArffParser(Pattern magic_eid, Pattern magic_class)`: Constructor taking compiled patterns. |
| `ArffParser(String magic_eid, String magic_class)`: Constructor taking the patterns as strings. |
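Both constructors configure the `magic_eid` and `magic_class` patterns, which decide which attribute names are treated as external IDs or class labels. The sketch below makes three assumptions:

- the `String` variant compiles its arguments with `Pattern.compile`, and the flags ELKI actually uses are not stated here;
- a column is recognized by a full match on its attribute name;
- the example patterns `"(External-?ID)"` and `"(Class|Class-?Label)"` in the usage are illustrative only, and are not necessarily the values of `DEFAULT_ARFF_MAGIC_EID` and `DEFAULT_ARFF_MAGIC_CLASS`.

`MagicColumnSketch` and `Role` are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of ArffParser's "magic" column recognition; not ELKI's code. */
final class MagicColumnSketch {
  enum Role { EXTERNAL_ID, CLASS_LABEL, REGULAR }

  private final Pattern magicEid;
  private final Pattern magicClass;

  /** Mirrors the Pattern-based constructor. */
  MagicColumnSketch(Pattern magicEid, Pattern magicClass) {
    this.magicEid = magicEid;
    this.magicClass = magicClass;
  }

  /** Mirrors the String-based constructor; case-insensitive compilation is an assumption. */
  MagicColumnSketch(String magicEid, String magicClass) {
    this(Pattern.compile(magicEid, Pattern.CASE_INSENSITIVE),
        Pattern.compile(magicClass, Pattern.CASE_INSENSITIVE));
  }

  /** Decides the role of a column from its attribute name (full-name match assumed). */
  Role roleOf(String attributeName) {
    if (magicEid.matcher(attributeName).matches()) return Role.EXTERNAL_ID;
    if (magicClass.matcher(attributeName).matches()) return Role.CLASS_LABEL;
    return Role.REGULAR;
  }
}
```

With the illustrative patterns, `roleOf("Class-Label")` yields `CLASS_LABEL` and `roleOf("petalwidth")` yields `REGULAR`.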
| Modifier and Type | Method and Description |
|---|---|
| `private Object[]` | `loadDenseInstance(StreamTokenizer tokenizer, int[] dimsize, TypeInformation[] etyp, int outdim)` |
| `private Object[]` | `loadSparseInstance(StreamTokenizer tokenizer, int[] targ, int[] dimsize, TypeInformation[] elkitypes, int metaLength)` |
| `private StreamTokenizer` | `makeArffTokenizer(BufferedReader br)`: Make a `StreamTokenizer` for the ARFF format. |
| `private void` | `nextToken(StreamTokenizer tokenizer)`: Helper function for token handling. |
| `MultipleObjectsBundle` | `parse(InputStream instream)`: Returns the objects parsed from the specified input stream, as a `MultipleObjectsBundle`. |
| `private void` | `parseAttributeStatements(BufferedReader br, ArrayList<String> names, ArrayList<String> types)`: Parse the `@attribute` section of the ARFF file. |
| `private void` | `processColumnTypes(ArrayList<String> names, ArrayList<String> types, int[] targ, TypeInformation[] etyp, int[] dims)`: Process the column types (and names). |
| `private void` | `readHeader(BufferedReader br)`: Read the dataset header part of the ARFF file, to ensure consistency. |
| `private void` | `setupBundleHeaders(ArrayList<String> names, int[] targ, TypeInformation[] etyp, int[] dimsize, MultipleObjectsBundle bundle, boolean sparse)`: Set up the headers for the object bundle. |
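`makeArffTokenizer` builds a `java.io.StreamTokenizer`, which `loadDenseInstance` and `loadSparseInstance` then consume. The syntax table that ELKI configures is not documented here, so the sketch below shows one plausible configuration. It assumes:

- values are comma-separated;
- `%` starts a comment;
- strings may be single- or double-quoted;
- `{` and `}` delimit sparse rows;
- line ends are significant.

The names `ArffTokenizerSketch`, `makeTokenizer` and `readRow` are hypothetical, and `readRow` handles dense rows only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an ARFF-style StreamTokenizer setup; not ELKI's makeArffTokenizer. */
final class ArffTokenizerSketch {
  /** Configures a tokenizer for comma-separated ARFF data rows (assumed settings). */
  static StreamTokenizer makeTokenizer(BufferedReader br) {
    StreamTokenizer t = new StreamTokenizer(br);
    t.resetSyntax();               // start from a blank syntax table (no number parsing)
    t.whitespaceChars(0, ' ');     // control characters and space separate tokens
    t.wordChars(' ' + 1, 255);     // every other Latin-1 character is part of a word
    t.whitespaceChars(',', ',');   // commas separate values
    t.commentChar('%');            // '%' starts a comment running to end of line
    t.quoteChar('"');              // quoted values may contain spaces and commas
    t.quoteChar('\'');
    t.ordinaryChar('{');           // braces delimit sparse instances
    t.ordinaryChar('}');
    t.eolIsSignificant(true);      // each data row ends at a line break
    return t;
  }

  /** Reads one dense row as raw strings, stopping at end of line or end of input. */
  static List<String> readRow(StreamTokenizer t) throws IOException {
    List<String> values = new ArrayList<>();
    while (t.nextToken() != StreamTokenizer.TT_EOL && t.ttype != StreamTokenizer.TT_EOF) {
      // Words and quoted strings both carry their text in sval; other tokens are ignored.
      if (t.ttype == StreamTokenizer.TT_WORD || t.ttype == '"' || t.ttype == '\'') {
        values.add(t.sval);
      }
    }
    return values;
  }
}
```

Because `parseNumbers()` is never enabled, numeric values arrive as word tokens (for example `"5.1"`). The ARFF missing-value marker `?` is also an ordinary word token, so converting values is left to the caller.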
- `private static final Logging LOG`
- `public static final Pattern ARFF_HEADER_RELATION`
- `public static final Pattern ARFF_HEADER_ATTRIBUTE`
- `public static final Pattern ARFF_HEADER_DATA`
- `public static final Pattern ARFF_COMMENT`
- `public static final String DEFAULT_ARFF_MAGIC_EID`
- `public static final String DEFAULT_ARFF_MAGIC_CLASS`
- `public static final Pattern ARFF_NUMERIC`
- `public static final Pattern EMPTY`
- `Pattern magic_eid`
- `Pattern magic_class`
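`parseAttributeStatements` collects attribute names and types, and `ARFF_NUMERIC` decides which types count as numeric. The sketch below makes two assumptions:

- declarations follow the ARFF form `@attribute <name> <type>`, with the name optionally quoted;
- a hypothetical stand-in for `ARFF_NUMERIC` accepts the three ARFF numeric type keywords `numeric`, `real` and `integer`.

`AttributeTypeSketch` and its methods are illustrative names, not part of ELKI.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of @attribute parsing and numeric-type detection; not ELKI's code. */
final class AttributeTypeSketch {
  // Assumed stand-in for ARFF_NUMERIC: the ARFF numeric types are numeric, real and integer.
  static final Pattern NUMERIC = Pattern.compile("(numeric|real|integer)", Pattern.CASE_INSENSITIVE);
  // Assumed: "@attribute <name> <type>", where the name may be quoted with ' or ".
  static final Pattern ATTRIBUTE_LINE = Pattern.compile(
      "^\\s*@attribute\\s+('[^']*'|\"[^\"]*\"|\\S+)\\s+(.+?)\\s*$", Pattern.CASE_INSENSITIVE);

  /** Returns {name, type} for an @attribute line, or null if the line does not match. */
  static String[] parseAttribute(String line) {
    Matcher m = ATTRIBUTE_LINE.matcher(line);
    if (!m.matches()) return null;
    String name = m.group(1);
    // Strip surrounding quotes from quoted attribute names.
    if (name.length() >= 2 && (name.charAt(0) == '\'' || name.charAt(0) == '"')) {
      name = name.substring(1, name.length() - 1);
    }
    return new String[] { name, m.group(2) };
  }

  /** True if the declared type is one of the ARFF numeric type keywords. */
  static boolean isNumeric(String type) {
    return NUMERIC.matcher(type).matches();
  }
}
```

Nominal types such as `{Iris-setosa,Iris-versicolor}` fail the numeric check and would need separate handling, for example as class labels.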
`public ArffParser(Pattern magic_eid, Pattern magic_class)`

Parameters:
- `magic_eid`: pattern to recognize external ID columns
- `magic_class`: pattern to recognize class label columns

`public MultipleObjectsBundle parse(InputStream instream)`

Specified by: `parse` in interface `Parser`

`private Object[] loadSparseInstance(StreamTokenizer tokenizer, int[] targ, int[] dimsize, TypeInformation[] elkitypes, int metaLength) throws IOException`

Throws: `IOException`

`private Object[] loadDenseInstance(StreamTokenizer tokenizer, int[] dimsize, TypeInformation[] etyp, int outdim) throws IOException`

Throws: `IOException`

`private StreamTokenizer makeArffTokenizer(BufferedReader br)`

Parameters:
- `br`: buffered reader

`private void setupBundleHeaders(ArrayList<String> names, int[] targ, TypeInformation[] etyp, int[] dimsize, MultipleObjectsBundle bundle, boolean sparse)`

Parameters:
- `names`: attribute names
- `targ`: target columns
- `etyp`: ELKI type information
- `dimsize`: number of dimensions in the individual types
- `bundle`: output bundle
- `sparse`: flag to create sparse vectors

`private void readHeader(BufferedReader br) throws IOException`

Parameters:
- `br`: buffered reader

Throws: `IOException`

`private void parseAttributeStatements(BufferedReader br, ArrayList<String> names, ArrayList<String> types) throws IOException`

Parameters:
- `br`: input
- `names`: list (to fill) of attribute names
- `types`: list (to fill) of attribute types

Throws: `IOException`

`private void processColumnTypes(ArrayList<String> names, ArrayList<String> types, int[] targ, TypeInformation[] etyp, int[] dims)`

Parameters:
- `names`: attribute names
- `types`: attribute types
- `targ`: target dimension mapping (ARFF to ELKI), return value
- `etyp`: ELKI type information, return value
- `dims`: number of successive dimensions, return value

`private void nextToken(StreamTokenizer tokenizer) throws IOException`

Parameters:
- `tokenizer`: tokenizer

Throws: `IOException`
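`loadSparseInstance` reads rows written in ARFF's sparse notation. In that notation each row lists `index value` pairs in braces, indices are 0-based, and omitted attributes are zero rather than missing. The following is a minimal, hypothetical sketch (`SparseRowSketch.parseSparseRow`) that expands such a row into a dense `double[]`. It assumes all attributes are numeric and all values are unquoted. It works on a raw string rather than on ELKI's `StreamTokenizer`, and it does not build ELKI's sparse vector types.

```java
/** Hypothetical sketch of reading an ARFF sparse row into a dense double[]; not ELKI's loadSparseInstance. */
final class SparseRowSketch {
  /**
   * Parses a sparse ARFF row such as "{0 1.5, 3 2.0}". Indices are 0-based and omitted
   * attributes are zero. Assumes every attribute is numeric and no values are quoted.
   */
  static double[] parseSparseRow(String line, int numAttributes) {
    String body = line.trim();
    if (!body.startsWith("{") || !body.endsWith("}")) {
      throw new IllegalArgumentException("Not a sparse row: " + line);
    }
    // Java zero-initializes arrays, which matches the sparse "omitted means 0" rule.
    double[] values = new double[numAttributes];
    String inner = body.substring(1, body.length() - 1).trim();
    if (inner.isEmpty()) {
      return values; // "{}" is an all-zero row
    }
    for (String entry : inner.split(",")) {
      // Each entry is "<index> <value>" separated by whitespace.
      String[] parts = entry.trim().split("\\s+");
      if (parts.length != 2) {
        throw new IllegalArgumentException("Malformed entry: " + entry);
      }
      int index = Integer.parseInt(parts[0]);
      if (index < 0 || index >= numAttributes) {
        throw new IllegalArgumentException("Index out of range: " + index);
      }
      values[index] = Double.parseDouble(parts[1]);
    }
    return values;
  }
}
```

For example, `parseSparseRow("{1 2.5, 3 -1}", 5)` yields `[0.0, 2.5, 0.0, -1.0, 0.0]`. A sparse-aware parser would usually keep only the index/value pairs instead of expanding them, which is presumably why `setupBundleHeaders` takes a `sparse` flag.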