Environment for DeveLoping KDD-Applications Supported by Index-Structures

de.lmu.ifi.dbs.elki.preprocessing
Class DiSHPreprocessor<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.preprocessing.DiSHPreprocessor<V>
Type Parameters:
V - Vector type
All Implemented Interfaces:
PreferenceVectorPreprocessor<V>, Preprocessor<V>, Parameterizable

@Description(value="Computes the preference vector of objects of a certain database according to the DiSH algorithm.")
public class DiSHPreprocessor<V extends NumberVector<V,?>>
extends AbstractLoggable
implements PreferenceVectorPreprocessor<V>, Parameterizable

Preprocessor that assigns DiSH preference vectors to the objects of a given database.

Author:
Elke Achtert

Nested Class Summary
static class DiSHPreprocessor.Strategy
          Available strategies for determination of the preference vector.
 
Field Summary
private static String CONDITION
          Description for the determination of the preference vector.
static DoubleDistance DEFAULT_EPSILON
          The default value for epsilon.
static DiSHPreprocessor.Strategy DEFAULT_STRATEGY
          Default strategy.
private  DoubleDistance[] epsilon
          The epsilon value for each dimension.
static OptionID EPSILON_ID
          OptionID for EPSILON_PARAM
protected  DoubleListParameter EPSILON_PARAM
          A comma-separated list of positive doubles specifying the maximum radius of the neighborhood to be considered in each dimension when determining the preference vector (default is DEFAULT_EPSILON in each dimension).
private  int minpts
          Threshold for minimum number of points in the neighborhood.
static OptionID MINPTS_ID
          OptionID for MINPTS_PARAM
static String MINPTS_P
          Option name for MINPTS_ID.
protected  IntParameter MINPTS_PARAM
          Positive threshold for the minimum number of points in the epsilon-neighborhood of a point; must satisfy the following CONDITION.
private  DiSHPreprocessor.Strategy strategy
          The strategy to determine the preference vector.
static OptionID STRATEGY_ID
          OptionID for STRATEGY_PARAM
private  StringParameter STRATEGY_PARAM
          The strategy for determining the preference vector; available strategies are DiSHPreprocessor.Strategy.APRIORI and DiSHPreprocessor.Strategy.MAX_INTERSECTION.
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
DiSHPreprocessor(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
private  BitSet determinePreferenceVector(Database<V> database, Set<Integer>[] neighborIDs, StringBuffer msg)
          Determines the preference vector according to the specified neighbor ids.
private  BitSet determinePreferenceVectorByApriori(Database<V> database, Set<Integer>[] neighborIDs, StringBuffer msg)
          Determines the preference vector with the apriori strategy.
private  BitSet determinePreferenceVectorByMaxIntersection(Set<Integer>[] neighborIDs, StringBuffer msg)
          Determines the preference vector with the max intersection strategy.
 DoubleDistance[] getEpsilon()
          Returns the value of the epsilon parameter.
 int getMinpts()
          Returns minpts.
private  DimensionSelectingDistanceFunction<V>[] initDistanceFunctions(Database<V> database, int dimensionality)
          Initializes the dimension selecting distance functions used to determine the preference vectors.
private  int max(Map<Integer,Set<Integer>> candidates)
          Returns the index of the set with the maximum size contained in the specified map.
private  int maxIntersection(Map<Integer,Set<Integer>> candidates, Set<Integer> set, Set<Integer> result)
          Returns the index of the set in the specified map that has the maximum intersection with the specified set.
 void run(Database<V> database, boolean verbose, boolean time)
          This method executes the particular preprocessing step of this Preprocessor for the objects of the specified database.
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_EPSILON

public static final DoubleDistance DEFAULT_EPSILON
The default value for epsilon.


EPSILON_ID

public static final OptionID EPSILON_ID
OptionID for EPSILON_PARAM


MINPTS_P

public static final String MINPTS_P
Option name for MINPTS_ID.

See Also:
Constant Field Values

CONDITION

private static final String CONDITION
Description for the determination of the preference vector.

See Also:
Constant Field Values

MINPTS_ID

public static final OptionID MINPTS_ID
OptionID for MINPTS_PARAM


DEFAULT_STRATEGY

public static DiSHPreprocessor.Strategy DEFAULT_STRATEGY
Default strategy.


STRATEGY_ID

public static final OptionID STRATEGY_ID
OptionID for STRATEGY_PARAM


EPSILON_PARAM

protected final DoubleListParameter EPSILON_PARAM
A comma-separated list of positive doubles specifying the maximum radius of the neighborhood to be considered in each dimension when determining the preference vector (default is DEFAULT_EPSILON in each dimension). If only one value is specified, it is used for every dimension.

Key: -dish.epsilon

Default value: DEFAULT_EPSILON
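The rule that a single epsilon value applies to every dimension can be illustrated with a small standalone sketch. The class EpsilonExpansion and its method expand are hypothetical (not part of ELKI), plain doubles stand in for DoubleDistance, and rejecting a list whose length is neither 1 nor the dimensionality is an assumption about how mismatches are handled.

```java
import java.util.Arrays;

/** Hypothetical helper: expands a -dish.epsilon value into one epsilon per dimension. */
final class EpsilonExpansion {
  private EpsilonExpansion() {}

  static double[] expand(String value, int dimensionality) {
    String[] parts = value.split(",");
    double[] eps = new double[dimensionality];
    if (parts.length == 1) {
      // A single value is reused for every dimension.
      Arrays.fill(eps, parse(parts[0]));
    } else if (parts.length == dimensionality) {
      // One value per dimension, in order.
      for (int d = 0; d < dimensionality; d++) {
        eps[d] = parse(parts[d]);
      }
    } else {
      // Assumption: any other list length is treated as a configuration error.
      throw new IllegalArgumentException(
          "Expected 1 or " + dimensionality + " epsilon values, got " + parts.length);
    }
    return eps;
  }

  private static double parse(String s) {
    double v = Double.parseDouble(s.trim());
    if (!(v > 0)) {
      throw new IllegalArgumentException("epsilon must be positive: " + s);
    }
    return v;
  }
}
```

For example, expand("0.1", 3) yields {0.1, 0.1, 0.1}, while expand("0.1,0.2", 2) keeps the two values as given.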


epsilon

private DoubleDistance[] epsilon
The epsilon value for each dimension.


MINPTS_PARAM

protected final IntParameter MINPTS_PARAM
Positive threshold for the minimum number of points in the epsilon-neighborhood of a point; must satisfy the following CONDITION.

Key: -dish.minpts


minpts

private int minpts
Threshold for minimum number of points in the neighborhood.


STRATEGY_PARAM

private final StringParameter STRATEGY_PARAM
The strategy for determining the preference vector; available strategies are DiSHPreprocessor.Strategy.APRIORI and DiSHPreprocessor.Strategy.MAX_INTERSECTION.

Key: -dish.strategy

Default value: DEFAULT_STRATEGY
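A minimal sketch of how a -dish.strategy string might be mapped onto the two strategies. The enum mirrors the constants of DiSHPreprocessor.Strategy, but the parse method, its case-insensitive matching, and passing the default in as an argument are assumptions; the concrete value of DEFAULT_STRATEGY is not stated on this page.

```java
import java.util.Locale;

/** Sketch mirroring DiSHPreprocessor.Strategy; the parse helper is hypothetical. */
enum Strategy {
  APRIORI,
  MAX_INTERSECTION;

  /** Returns defaultStrategy when no value is given; otherwise matches the name case-insensitively. */
  static Strategy parse(String value, Strategy defaultStrategy) {
    if (value == null || value.trim().isEmpty()) {
      return defaultStrategy;
    }
    try {
      return Strategy.valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "Unknown strategy '" + value + "'; expected APRIORI or MAX_INTERSECTION", e);
    }
  }
}
```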


strategy

private DiSHPreprocessor.Strategy strategy
The strategy to determine the preference vector.

Constructor Detail

DiSHPreprocessor

public DiSHPreprocessor(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization

Method Detail

run

public void run(Database<V> database,
                boolean verbose,
                boolean time)
Description copied from interface: Preprocessor
This method executes the particular preprocessing step of this Preprocessor for the objects of the specified database.

Specified by:
run in interface Preprocessor<V extends NumberVector<V,?>>
Parameters:
database - the database for which the preprocessing is performed
verbose - flag to allow verbose messages while performing the algorithm
time - flag to request output of performance time
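The preprocessing step needs, for each object, the set of neighbors found in each single dimension. The sketch below assumes that a dimension selecting distance is the absolute difference in one attribute and that neighborhoods are inclusive (distance <= epsilon). It scans an in-memory list by brute force, with list positions as hypothetical object IDs, instead of using ELKI's Database and range queries. The result has the shape of the neighborIDs argument taken by the determinePreferenceVector methods.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical brute-force sketch of per-dimension epsilon-neighborhoods. */
final class DimensionNeighborhoods {
  private DimensionNeighborhoods() {}

  /** For object p, returns one set of neighbor IDs per dimension. */
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Set<Integer>[] neighborIDs(List<double[]> data, int p, double[] eps) {
    int dim = eps.length;
    Set<Integer>[] result = new Set[dim];
    double[] query = data.get(p);
    for (int d = 0; d < dim; d++) {
      result[d] = new HashSet<>();
      for (int id = 0; id < data.size(); id++) {
        // Distance restricted to the single attribute d.
        if (Math.abs(query[d] - data.get(id)[d]) <= eps[d]) {
          result[d].add(id);
        }
      }
    }
    return result;
  }
}
```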

determinePreferenceVector

private BitSet determinePreferenceVector(Database<V> database,
                                         Set<Integer>[] neighborIDs,
                                         StringBuffer msg)
                                  throws ParameterException,
                                         UnableToComplyException
Determines the preference vector according to the specified neighbor ids.

Parameters:
database - the database storing the objects
neighborIDs - the list of ids of the neighbors in each dimension
msg - a string buffer for debug messages
Returns:
the preference vector
Throws:
ParameterException
UnableToComplyException

determinePreferenceVectorByApriori

private BitSet determinePreferenceVectorByApriori(Database<V> database,
                                                  Set<Integer>[] neighborIDs,
                                                  StringBuffer msg)
                                           throws ParameterException,
                                                  UnableToComplyException
Determines the preference vector with the apriori strategy.

Parameters:
database - the database storing the objects
neighborIDs - the list of ids of the neighbors in each dimension
msg - a string buffer for debug messages
Returns:
the preference vector
Throws:
ParameterException
UnableToComplyException
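The apriori strategy can be read as a frequent-itemset search in which the items are dimensions and the support of a set of dimensions is the number of objects lying in all of its per-dimension neighborhoods. The sketch below is one interpretation under that assumption, not ELKI's implementation: it grows dimension sets level by level, keeps those whose support is at least minpts, and returns the largest one, preferring higher support among sets of equal size. The class name, the >= comparison, and the tie-breaking rule are assumptions.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an apriori-style preference vector search. */
final class AprioriPreference {
  private AprioriPreference() {}

  static BitSet preferenceVector(Set<Integer>[] neighborIDs, int minpts) {
    int dim = neighborIDs.length;
    // Level 1: single dimensions whose own neighborhood holds at least minpts objects.
    Map<BitSet, Set<Integer>> level = new LinkedHashMap<>();
    for (int d = 0; d < dim; d++) {
      if (neighborIDs[d].size() >= minpts) {
        BitSet single = new BitSet(dim);
        single.set(d);
        level.put(single, new HashSet<>(neighborIDs[d]));
      }
    }
    Map<BitSet, Set<Integer>> lastFrequent = level;
    while (!level.isEmpty()) {
      lastFrequent = level;
      Map<BitSet, Set<Integer>> next = new LinkedHashMap<>();
      for (Map.Entry<BitSet, Set<Integer>> entry : level.entrySet()) {
        BitSet base = entry.getKey();
        // Only append dimensions above the current maximum, so each set is built once.
        for (int d = base.length(); d < dim; d++) {
          BitSet candidate = (BitSet) base.clone();
          candidate.set(d);
          if (!allSubsetsFrequent(candidate, level)) {
            continue; // apriori pruning
          }
          // Support: objects that are neighbors in every dimension of the candidate.
          Set<Integer> support = new HashSet<>(entry.getValue());
          support.retainAll(neighborIDs[d]);
          if (support.size() >= minpts) {
            next.put(candidate, support);
          }
        }
      }
      level = next;
    }
    // All sets in lastFrequent have the same (maximal) size; prefer the highest support.
    BitSet best = new BitSet(dim);
    int bestSupport = -1;
    for (Map.Entry<BitSet, Set<Integer>> entry : lastFrequent.entrySet()) {
      if (entry.getValue().size() > bestSupport) {
        best = entry.getKey();
        bestSupport = entry.getValue().size();
      }
    }
    return (BitSet) best.clone();
  }

  private static boolean allSubsetsFrequent(BitSet candidate, Map<BitSet, Set<Integer>> level) {
    for (int i = candidate.nextSetBit(0); i >= 0; i = candidate.nextSetBit(i + 1)) {
      BitSet subset = (BitSet) candidate.clone();
      subset.clear(i);
      if (!level.containsKey(subset)) {
        return false;
      }
    }
    return true;
  }
}
```

With neighborhoods {1,2,3,4}, {1,2,3,5}, {7,8} and minpts = 3, the sketch returns the preference vector {0, 1}: dimension 2 is too sparse on its own, and dimensions 0 and 1 share exactly three neighbors.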

determinePreferenceVectorByMaxIntersection

private BitSet determinePreferenceVectorByMaxIntersection(Set<Integer>[] neighborIDs,
                                                          StringBuffer msg)
Determines the preference vector with the max intersection strategy.

Parameters:
neighborIDs - the list of ids of the neighbors in each dimension
msg - a string buffer for debug messages
Returns:
the preference vector
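The max intersection strategy is greedy. The sketch below assumes it starts from the dimension with the largest neighborhood and then repeatedly adds the dimension whose neighborhood overlaps most with the running intersection, stopping once that intersection would drop below minpts. The class name is hypothetical, and the threshold comparison (>= minpts) and lowest-index tie-breaking are assumptions.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of the greedy max intersection strategy. */
final class MaxIntersectionPreference {
  private MaxIntersectionPreference() {}

  static BitSet preferenceVector(Set<Integer>[] neighborIDs, int minpts) {
    int dim = neighborIDs.length;
    BitSet preference = new BitSet(dim);
    // Candidate dimensions: those whose own neighborhood is large enough.
    Map<Integer, Set<Integer>> candidates = new TreeMap<>();
    for (int d = 0; d < dim; d++) {
      if (neighborIDs[d].size() >= minpts) {
        candidates.put(d, neighborIDs[d]);
      }
    }
    if (candidates.isEmpty()) {
      return preference;
    }
    // Start with the dimension that has the largest neighborhood.
    int first = -1;
    for (Map.Entry<Integer, Set<Integer>> e : candidates.entrySet()) {
      if (first < 0 || e.getValue().size() > candidates.get(first).size()) {
        first = e.getKey();
      }
    }
    Set<Integer> intersection = new HashSet<>(candidates.remove(first));
    preference.set(first);
    // Greedily add the dimension that overlaps most with the current intersection.
    while (!candidates.isEmpty()) {
      int bestDim = -1;
      Set<Integer> bestIntersection = null;
      for (Map.Entry<Integer, Set<Integer>> e : candidates.entrySet()) {
        Set<Integer> tmp = new HashSet<>(intersection);
        tmp.retainAll(e.getValue());
        if (bestIntersection == null || tmp.size() > bestIntersection.size()) {
          bestDim = e.getKey();
          bestIntersection = tmp;
        }
      }
      candidates.remove(bestDim);
      if (bestIntersection.size() < minpts) {
        break; // even the best extension leaves too few common neighbors
      }
      intersection = bestIntersection;
      preference.set(bestDim);
    }
    return preference;
  }
}
```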

max

private int max(Map<Integer,Set<Integer>> candidates)
Returns the index of the set with the maximum size contained in the specified map.

Parameters:
candidates - the map containing the sets
Returns:
the index of the set with the maximum size

maxIntersection

private int maxIntersection(Map<Integer,Set<Integer>> candidates,
                            Set<Integer> set,
                            Set<Integer> result)
Returns the index of the set in the specified map that has the maximum intersection with the specified set.

Parameters:
candidates - the map containing the sets
set - the set to intersect with
result - the set to put the result in
Returns:
the index of the set having the maximum intersection with the specified set
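The contract of maxIntersection, returning an index while writing the intersection into an output parameter, can be shown in isolation. In this hypothetical sketch, the returned int is the map key of the winning set and result receives the corresponding intersection; clearing result before filling it, the exception on an empty map, and keeping the first of several equally good keys are assumptions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the maxIntersection helper contract. */
final class MaxIntersectionHelper {
  private MaxIntersectionHelper() {}

  /** Returns the key of the candidate sharing the most elements with set; stores that intersection in result. */
  static int maxIntersection(Map<Integer, Set<Integer>> candidates, Set<Integer> set, Set<Integer> result) {
    if (candidates.isEmpty()) {
      throw new IllegalArgumentException("no candidates");
    }
    int bestKey = -1;
    int bestSize = -1;
    for (Map.Entry<Integer, Set<Integer>> e : candidates.entrySet()) {
      Set<Integer> tmp = new HashSet<>(set);
      tmp.retainAll(e.getValue());
      if (tmp.size() > bestSize) {
        bestKey = e.getKey();
        bestSize = tmp.size();
        // Overwrite the output parameter with the new best intersection.
        result.clear();
        result.addAll(tmp);
      }
    }
    return bestKey;
  }
}
```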

initDistanceFunctions

private DimensionSelectingDistanceFunction<V>[] initDistanceFunctions(Database<V> database,
                                                                      int dimensionality)
                                                               throws ParameterException
Initializes the dimension selecting distance functions used to determine the preference vectors.

Parameters:
database - the database storing the objects
dimensionality - the dimensionality of the objects
Returns:
the dimension selecting distance functions used to determine the preference vectors
Throws:
ParameterException

getEpsilon

public DoubleDistance[] getEpsilon()
Returns the value of the epsilon parameter.

Returns:
the value of the epsilon parameter

getMinpts

public int getMinpts()
Returns minpts.

Returns:
minpts

Release 0.3 (2010-03-31_1612)