Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.preprocessing
Class DiSHPreprocessor<V extends RealVector<V,N>,N extends Number>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.preprocessing.DiSHPreprocessor<V,N>
All Implemented Interfaces:
Loggable, PreferenceVectorPreprocessor<V>, Preprocessor<V>, Parameterizable

public class DiSHPreprocessor<V extends RealVector<V,N>,N extends Number>
extends AbstractParameterizable
implements PreferenceVectorPreprocessor<V>

Preprocessor for DiSH preference vector assignment to objects of a certain database.

Author:
Elke Achtert

Nested Class Summary
static class DiSHPreprocessor.Strategy
          Available strategies for determination of the preference vector.
 
Field Summary
private static String CONDITION
          Description for the determination of the preference vector.
static DoubleDistance DEFAULT_EPSILON
          The default value for epsilon.
static DiSHPreprocessor.Strategy DEFAULT_STRATEGY
          Default strategy.
private  DoubleDistance[] epsilon
          The epsilon value for each dimension.
static String EPSILON_D
          Description for parameter epsilon.
static String EPSILON_P
          Option string for parameter epsilon.
private  int minpts
          Threshold for minimum number of points in the neighborhood.
static String MINPTS_D
          Description for parameter minimum points.
static String MINPTS_P
          Parameter minimum points.
private  DiSHPreprocessor.Strategy strategy
          The strategy to determine the preference vector.
static String STRATEGY_D
          Description for parameter strategy.
static String STRATEGY_P
          Parameter strategy.
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug
 
Constructor Summary
DiSHPreprocessor()
          Provides a new DiSHPreprocessor that computes the preference vector of objects of a certain database.
 
Method Summary
 String description()
          Returns a description of the class and the required parameters.
private  BitSet determinePreferenceVector(Database<V> database, Set<Integer>[] neighborIDs, StringBuffer msg)
          Determines the preference vector according to the specified neighbor ids.
private  BitSet determinePreferenceVectorByApriori(Database<V> database, Set<Integer>[] neighborIDs, StringBuffer msg)
          Determines the preference vector with the apriori strategy.
private  BitSet determinePreferenceVectorByMaxIntersection(Set<Integer>[] neighborIDs, StringBuffer msg)
          Determines the preference vector with the max intersection strategy.
 DoubleDistance[] getEpsilon()
          Returns the value of the epsilon parameter.
 int getMinpts()
          Returns minpts.
private  DimensionSelectingDistanceFunction<N,V>[] initDistanceFunctions(Database<V> database, int dimensionality, boolean verbose, boolean time)
          Initializes the dimension selecting distance functions to determine the preference vectors.
private  int max(Map<Integer,Set<Integer>> candidates)
          Returns the index of the set with the maximum size contained in the specified map.
private  int maxIntersection(Map<Integer,Set<Integer>> candidates, Set<Integer> set, Set<Integer> result)
          Returns the index of the set having the maximum intersection set with the specified set contained in the specified map.
 void run(Database<V> database, boolean verbose, boolean time)
          This method executes the actual preprocessing step of this Preprocessor for the objects of the specified database.
 String[] setParameters(String[] args)
          Sets the attributes of the class according to the given parameters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, checkGlobalParameterConstraints, deleteOption, description, description, getAttributeSettings, getParameters, getParameterValue, getPossibleOptions, inlineDescription, isSet, setParameters
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, message, progress, progress, progress, verbose, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, getAttributeSettings, getParameters, getPossibleOptions, inlineDescription
 

Field Detail

DEFAULT_EPSILON

public static final DoubleDistance DEFAULT_EPSILON
The default value for epsilon.


EPSILON_P

public static final String EPSILON_P
Option string for parameter epsilon.

See Also:
Constant Field Values

EPSILON_D

public static String EPSILON_D
Description for parameter epsilon.


MINPTS_P

public static final String MINPTS_P
Parameter minimum points.

See Also:
Constant Field Values

CONDITION

private static final String CONDITION
Description for the determination of the preference vector.

See Also:
Constant Field Values

MINPTS_D

public static final String MINPTS_D
Description for parameter minimum points.

See Also:
Constant Field Values

STRATEGY_P

public static final String STRATEGY_P
Parameter strategy.

See Also:
Constant Field Values

DEFAULT_STRATEGY

public static DiSHPreprocessor.Strategy DEFAULT_STRATEGY
Default strategy.


STRATEGY_D

public static final String STRATEGY_D
Description for parameter strategy.


epsilon

private DoubleDistance[] epsilon
The epsilon value for each dimension.


minpts

private int minpts
Threshold for minimum number of points in the neighborhood.


strategy

private DiSHPreprocessor.Strategy strategy
The strategy to determine the preference vector.

Constructor Detail

DiSHPreprocessor

public DiSHPreprocessor()
Provides a new DiSHPreprocessor that computes the preference vector of objects of a certain database.

Method Detail

run

public void run(Database<V> database,
                boolean verbose,
                boolean time)
Description copied from interface: Preprocessor
This method executes the actual preprocessing step of this Preprocessor for the objects of the specified database.

Specified by:
run in interface Preprocessor<V extends RealVector<V,N>>
Parameters:
database - the database for which the preprocessing is performed
verbose - flag to allow verbose messages while performing the algorithm
time - flag to request output of performance time
See Also:
Preprocessor.run(de.lmu.ifi.dbs.elki.database.Database,boolean,boolean)

description

public String description()
Description copied from interface: Parameterizable
Returns a description of the class and the required parameters.

This description should be suitable for a usage description as for a standalone application.

Specified by:
description in interface Parameterizable
Overrides:
description in class AbstractParameterizable
Returns:
String a description of the class and the required parameters
See Also:
Parameterizable.description()

setParameters

public String[] setParameters(String[] args)
                       throws ParameterException
Description copied from interface: Parameterizable
Sets the attributes of the class according to the given parameters. Returns a new String array containing those entries of the given array that are neither expected nor used by this Parameterizable.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class AbstractParameterizable
Parameters:
args - the parameters from which the attributes are set
Returns:
String[] an array containing the unused parameters
Throws:
ParameterException - in case of wrong parameter-setting
See Also:
Parameterizable.setParameters(String[])
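
The contract described above (known options are consumed, everything else is handed back) can be illustrated with a minimal, self-contained sketch. This is not ELKI's option handler: the class RemainingArgsSketch, the method consume, and the option names used in the usage note are hypothetical, and the sketch assumes every known option takes exactly one value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of ELKI: shows the "return what was not used" contract.
final class RemainingArgsSketch {
  /**
   * Stores the value following each known option in {@code values} and
   * returns all other entries of {@code args} in their original order.
   */
  static String[] consume(String[] args, Set<String> known, Map<String, String> values) {
    List<String> rest = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      if (known.contains(args[i]) && i + 1 < args.length) {
        // Known option: consume the option and its single value.
        values.put(args[i], args[++i]);
      } else {
        // Neither expected nor used: pass it back to the caller.
        rest.add(args[i]);
      }
    }
    return rest.toArray(new String[0]);
  }
}
```

With the hypothetical options "-minpts" and "-epsilon" registered as known, the argument list "-minpts 10 -other x -epsilon 0.01" leaves "-other x" as the returned remainder, which a caller could forward to the next Parameterizable.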

determinePreferenceVector

private BitSet determinePreferenceVector(Database<V> database,
                                         Set<Integer>[] neighborIDs,
                                         StringBuffer msg)
                                  throws ParameterException,
                                         UnableToComplyException
Determines the preference vector according to the specified neighbor ids.

Parameters:
database - the database storing the objects
neighborIDs - the list of ids of the neighbors in each dimension
msg - a string buffer for debug messages
Returns:
the preference vector
Throws:
ParameterException
UnableToComplyException

determinePreferenceVectorByApriori

private BitSet determinePreferenceVectorByApriori(Database<V> database,
                                                  Set<Integer>[] neighborIDs,
                                                  StringBuffer msg)
                                           throws ParameterException,
                                                  UnableToComplyException
Determines the preference vector with the apriori strategy.

Parameters:
database - the database storing the objects
neighborIDs - the list of ids of the neighbors in each dimension
msg - a string buffer for debug messages
Returns:
the preference vector
Throws:
ParameterException
UnableToComplyException

determinePreferenceVectorByMaxIntersection

private BitSet determinePreferenceVectorByMaxIntersection(Set<Integer>[] neighborIDs,
                                                          StringBuffer msg)
Determines the preference vector with the max intersection strategy.

Parameters:
neighborIDs - the list of ids of the neighbors in each dimension
msg - a string buffer for debug messages
Returns:
the preference vector
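
A greedy reading of the max intersection strategy can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: it takes a List of sets instead of the Set<Integer>[] array, passes minpts explicitly, treats a dimension as a candidate when its neighbor set has at least minpts entries, and stops as soon as the running intersection would drop below minpts. The class name MaxIntersectionStrategySketch is hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a greedy max-intersection selection of dimensions.
final class MaxIntersectionStrategySketch {
  static BitSet preferenceVector(List<Set<Integer>> neighborIDs, int minpts) {
    BitSet preference = new BitSet(neighborIDs.size());

    // Candidate dimensions: neighborhoods with at least minpts objects (assumed threshold).
    Map<Integer, Set<Integer>> candidates = new HashMap<>();
    for (int d = 0; d < neighborIDs.size(); d++) {
      if (neighborIDs.get(d).size() >= minpts) {
        candidates.put(d, neighborIDs.get(d));
      }
    }
    if (candidates.isEmpty()) {
      return preference;
    }

    // Seed with the dimension that has the largest neighborhood.
    int best = -1;
    for (Map.Entry<Integer, Set<Integer>> e : candidates.entrySet()) {
      if (best < 0 || e.getValue().size() > candidates.get(best).size()) {
        best = e.getKey();
      }
    }
    Set<Integer> intersection = new HashSet<>(candidates.remove(best));
    preference.set(best);

    // Repeatedly add the dimension whose neighborhood overlaps most with the
    // running intersection, as long as at least minpts objects remain.
    while (!candidates.isEmpty()) {
      int next = -1;
      Set<Integer> nextIntersection = null;
      for (Map.Entry<Integer, Set<Integer>> e : candidates.entrySet()) {
        Set<Integer> tmp = new HashSet<>(intersection);
        tmp.retainAll(e.getValue());
        if (nextIntersection == null || tmp.size() > nextIntersection.size()) {
          next = e.getKey();
          nextIntersection = tmp;
        }
      }
      if (nextIntersection.size() < minpts) {
        break;
      }
      candidates.remove(next);
      intersection = nextIntersection;
      preference.set(next);
    }
    return preference;
  }
}
```

A set bit at position d in the returned BitSet marks dimension d as part of the preference vector.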

max

private int max(Map<Integer,Set<Integer>> candidates)
Returns the index of the set with the maximum size contained in the specified map.

Parameters:
candidates - the map containing the sets
Returns:
the index of the set with the maximum size
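
Because the method returns an int, the result is presumably the map key of the largest set rather than the set itself. A minimal sketch of that lookup, assuming the map keys are dimension indices and using -1 for an empty map (both are assumptions, as is the class name MaxSetSketch):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: find the key whose set is largest.
final class MaxSetSketch {
  /** Returns the key of the largest set, or -1 if the map is empty (assumed convention). */
  static int max(Map<Integer, Set<Integer>> candidates) {
    int maxKey = -1;
    int maxSize = -1;
    for (Map.Entry<Integer, Set<Integer>> e : candidates.entrySet()) {
      if (e.getValue().size() > maxSize) {
        maxSize = e.getValue().size();
        maxKey = e.getKey();
      }
    }
    return maxKey;
  }
}
```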

maxIntersection

private int maxIntersection(Map<Integer,Set<Integer>> candidates,
                            Set<Integer> set,
                            Set<Integer> result)
Returns the index of the set having the maximum intersection set with the specified set contained in the specified map.

Parameters:
candidates - the map containing the sets
set - the set to intersect with
result - the set to put the result in
Returns:
the index of the set having the maximum intersection with the specified set
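
The output-parameter pattern described here (index as return value, intersection written into result) might look like the following sketch. It assumes that result is overwritten rather than appended to and that -1 signals an empty map; the class name MaxIntersectionSketch is hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: pick the candidate that overlaps most with a given set.
final class MaxIntersectionSketch {
  /**
   * Returns the key of the candidate set with the largest intersection with
   * {@code set} and stores that intersection in {@code result}.
   * Returns -1 and leaves {@code result} empty if there are no candidates.
   */
  static int maxIntersection(Map<Integer, Set<Integer>> candidates,
                             Set<Integer> set,
                             Set<Integer> result) {
    result.clear();
    int maxKey = -1;
    for (Map.Entry<Integer, Set<Integer>> e : candidates.entrySet()) {
      Set<Integer> intersection = new HashSet<>(set);
      intersection.retainAll(e.getValue());
      if (maxKey == -1 || intersection.size() > result.size()) {
        result.clear();
        result.addAll(intersection);
        maxKey = e.getKey();
      }
    }
    return maxKey;
  }
}
```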

initDistanceFunctions

private DimensionSelectingDistanceFunction<N,V>[] initDistanceFunctions(Database<V> database,
                                                                        int dimensionality,
                                                                        boolean verbose,
                                                                        boolean time)
                                                                 throws ParameterException
Initializes the dimension selecting distance functions to determine the preference vectors.

Parameters:
database - the database storing the objects
dimensionality - the dimensionality of the objects
verbose - flag to allow verbose messages while performing the algorithm
time - flag to request output of performance time
Returns:
the dimension selecting distance functions to determine the preference vectors
Throws:
ParameterException
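
To make the role of one distance function per dimension concrete, the sketch below computes, for a query object, the ids of all objects within epsilon in each single dimension. It is a plain-array illustration, not the ELKI implementation: it assumes the per-dimension distance is the absolute difference of the selected attribute, that object ids are row indices, and that the query object counts as its own neighbor. The class name PerDimensionNeighborsSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: epsilon neighborhoods evaluated one dimension at a time.
final class PerDimensionNeighborsSketch {
  /**
   * For each dimension d, collects the row indices j with
   * |data[j][d] - data[query][d]| <= epsilon[d].
   */
  static List<Set<Integer>> neighbors(double[][] data, int query, double[] epsilon) {
    List<Set<Integer>> result = new ArrayList<>();
    for (int d = 0; d < epsilon.length; d++) {
      Set<Integer> ids = new TreeSet<>();
      for (int j = 0; j < data.length; j++) {
        if (Math.abs(data[j][d] - data[query][d]) <= epsilon[d]) {
          ids.add(j);
        }
      }
      result.add(ids);
    }
    return result;
  }
}
```

The resulting list has one neighbor set per dimension, which is the shape of input the neighborIDs parameter of the determinePreferenceVector methods describes.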

getEpsilon

public DoubleDistance[] getEpsilon()
Returns the value of the epsilon parameter.

Returns:
the value of the epsilon parameter

getMinpts

public int getMinpts()
Returns minpts.

Returns:
minpts

Release 0.1 (2008-07-10_1838)