Environment for DeveLoping KDD-Applications Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm
Class DependencyDerivator<V extends RealVector<V,?>,D extends Distance<D>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
          extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<O,R>
              extended by de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm<V,D,CorrelationAnalysisSolution<V>>
                  extended by de.lmu.ifi.dbs.elki.algorithm.DependencyDerivator<V,D>
Type Parameters:
V - the type of RealVector handled by this Algorithm
D - the type of Distance used by this Algorithm
All Implemented Interfaces:
Algorithm<V,CorrelationAnalysisSolution<V>>, Parameterizable

public class DependencyDerivator<V extends RealVector<V,?>,D extends Distance<D>>
extends DistanceBasedAlgorithm<V,D,CorrelationAnalysisSolution<V>>

The dependency derivator computes quantitative linear dependencies among the attributes of a given dataset, based on a linear correlation PCA.

Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek: Deriving Quantitative Dependencies for Correlation Clusters.
In Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD '06), Philadelphia, PA 2006.

Author:
Arthur Zimek

Field Summary
static OptionID DEPENDENCY_DERIVATOR_RANDOM_SAMPLE
          OptionID for RANDOM_SAMPLE_FLAG
 NumberFormat NF
          Number format for output of solution.
static OptionID OUTPUT_ACCURACY_ID
          OptionID for OUTPUT_ACCURACY_PARAM
private  IntParameter OUTPUT_ACCURACY_PARAM
          Parameter specifying the threshold for the number of fraction digits in the output; must be an integer greater than or equal to 0.
private  PCAFilteredRunner<V,DoubleDistance> pca
          Holds the object performing the PCA.
private  Flag RANDOM_SAMPLE_FLAG
          Flag to use a random sample (if the flag is not set, a kNN query around the centroid is used instead).
static OptionID SAMPLE_SIZE_ID
          OptionID for SAMPLE_SIZE_PARAM
private  IntParameter SAMPLE_SIZE_PARAM
          Optional parameter specifying the threshold for the size of the random sample to use; must be an integer greater than 0.
private  Integer sampleSize
          Holds the value of SAMPLE_SIZE_PARAM.
private  CorrelationAnalysisSolution<V> solution
          Holds the solution.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
DISTANCE_FUNCTION_ID, DISTANCE_FUNCTION_PARAM
 
Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
optionHandler
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
DependencyDerivator()
          Provides a dependency derivator, adding the parameters OUTPUT_ACCURACY_PARAM and SAMPLE_SIZE_PARAM and the flag RANDOM_SAMPLE_FLAG to the option handler, in addition to the parameters of the superclass.
 
Method Summary
 CorrelationAnalysisSolution<V> generateModel(Database<V> db, Collection<Integer> ids)
          Runs the PCA on the given set of IDs.
 CorrelationAnalysisSolution<V> generateModel(Database<V> db, Collection<Integer> ids, V centroidDV)
          Runs the PCA on the given set of IDs and for the given centroid.
 Description getDescription()
          Returns a description of the algorithm.
 CorrelationAnalysisSolution<V> getResult()
          Returns the result of the algorithm.
 CorrelationAnalysisSolution<V> runInTime(Database<V> db)
          Computes quantitative linear dependencies among the attributes of the given database, based on a linear correlation PCA.
 List<String> setParameters(List<String> args)
          Calls the super method and additionally sets the values of the parameters OUTPUT_ACCURACY_PARAM and SAMPLE_SIZE_PARAM.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
addOption, addParameterizable, addParameterizable, checkGlobalParameterConstraints, collectOptions, getAttributeSettings, getParameters, rememberParametersExcept, removeOption, removeParameterizable, shortDescription
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable
checkGlobalParameterConstraints, collectOptions, getParameters, shortDescription
 

Field Detail

DEPENDENCY_DERIVATOR_RANDOM_SAMPLE

public static final OptionID DEPENDENCY_DERIVATOR_RANDOM_SAMPLE
OptionID for RANDOM_SAMPLE_FLAG


OUTPUT_ACCURACY_ID

public static final OptionID OUTPUT_ACCURACY_ID
OptionID for OUTPUT_ACCURACY_PARAM


OUTPUT_ACCURACY_PARAM

private final IntParameter OUTPUT_ACCURACY_PARAM

Parameter specifying the threshold for the number of fraction digits in the output; must be an integer greater than or equal to 0.

Default value: 4

Key: -derivator.accuracy
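
To illustrate what a fraction-digit setting means for output, the following is a minimal sketch using only java.text.NumberFormat. It assumes the accuracy value is applied as both the minimum and maximum number of fraction digits; the class name AccuracyFormatSketch, the helper formatFor and the choice of Locale.US are hypothetical and not taken from ELKI.

```java
import java.text.NumberFormat;
import java.util.Locale;

class AccuracyFormatSketch {
    // Builds a formatter that always prints exactly `fractionDigits`
    // digits after the decimal point.
    static NumberFormat formatFor(int fractionDigits) {
        if (fractionDigits < 0) {
            throw new IllegalArgumentException("fraction digits must be >= 0");
        }
        // Assumption: Locale.US is chosen here only to get '.' as separator.
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        nf.setMinimumFractionDigits(fractionDigits);
        nf.setMaximumFractionDigits(fractionDigits);
        return nf;
    }

    public static void main(String[] args) {
        NumberFormat nf = formatFor(4); // 4 is the documented default value
        System.out.println(nf.format(1.0 / 3.0)); // prints 0.3333
        System.out.println(nf.format(2.5));       // prints 2.5000
    }
}
```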


SAMPLE_SIZE_ID

public static final OptionID SAMPLE_SIZE_ID
OptionID for SAMPLE_SIZE_PARAM


SAMPLE_SIZE_PARAM

private final IntParameter SAMPLE_SIZE_PARAM
Optional parameter specifying the threshold for the size of the random sample to use; must be an integer greater than 0.

Default value: the size of the complete dataset

Key: -derivator.sampleSize


sampleSize

private Integer sampleSize
Holds the value of SAMPLE_SIZE_PARAM.


RANDOM_SAMPLE_FLAG

private final Flag RANDOM_SAMPLE_FLAG
Flag to use a random sample (if the flag is not set, a kNN query around the centroid is used instead).

Key: -derivator.randomSample
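
The two selection strategies named by this flag can be sketched as follows. This is an illustration under stated assumptions, not ELKI's implementation: the database is replaced by a plain Map from id to double[], the centroid is passed in by the caller, the kNN query is approximated by sorting on Euclidean distance (ELKI would use the configured distance function), and all names (SampleSelectionSketch, selectIds, seed) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SampleSelectionSketch {
    // Squared Euclidean distance; sufficient for ranking neighbours.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns at most sampleSize ids: a random subset if randomSample is
    // true, otherwise the ids of the points closest to the centroid.
    static List<Integer> selectIds(Map<Integer, double[]> data, double[] centroid,
                                   int sampleSize, boolean randomSample, long seed) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sample size must be > 0");
        }
        List<Integer> ids = new ArrayList<>(data.keySet());
        int k = Math.min(sampleSize, ids.size());
        if (randomSample) {
            Collections.shuffle(ids, new Random(seed));
        } else {
            ids.sort(Comparator.comparingDouble(
                (Integer id) -> squaredDistance(data.get(id), centroid)));
        }
        return new ArrayList<>(ids.subList(0, k));
    }
}
```

With the flag unset the selection is deterministic and concentrates on the points nearest the centroid; with the flag set, the result depends on the random seed.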


pca

private PCAFilteredRunner<V extends RealVector<V,?>,DoubleDistance> pca
Holds the object performing the PCA.


solution

private CorrelationAnalysisSolution<V extends RealVector<V,?>> solution
Holds the solution.


NF

public final NumberFormat NF
Number format for output of solution.

Constructor Detail

DependencyDerivator

public DependencyDerivator()
Provides a dependency derivator, adding the parameters OUTPUT_ACCURACY_PARAM and SAMPLE_SIZE_PARAM and the flag RANDOM_SAMPLE_FLAG to the option handler, in addition to the parameters of the superclass.

Method Detail

getDescription

public Description getDescription()
Description copied from interface: Algorithm
Returns a description of the algorithm.

Returns:
a description of the algorithm

runInTime

public CorrelationAnalysisSolution<V> runInTime(Database<V> db)
                                                                 throws IllegalStateException
Computes quantitative linear dependencies among the attributes of the given database, based on a linear correlation PCA.

Specified by:
runInTime in class AbstractAlgorithm<V extends RealVector<V,?>,CorrelationAnalysisSolution<V extends RealVector<V,?>>>
Parameters:
db - the database to run this DependencyDerivator on
Returns:
the CorrelationAnalysisSolution computed by this DependencyDerivator
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(List<String>) method was not called).
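
As an illustration of how a PCA can yield a quantitative linear dependency, the following self-contained sketch handles the two-dimensional case only. It does not use ELKI classes and is not the code of this class: it computes the covariance matrix of the points, takes the eigenvector belonging to the smallest eigenvalue in closed form, and reads it as the normal vector of the line through the centroid. The names LinearDependencySketch and deriveEquation are hypothetical.

```java
class LinearDependencySketch {
    // Returns {n0, n1, offset} such that n0*x + n1*y = offset holds
    // (approximately) for points lying near a common line.
    static double[] deriveEquation(double[][] points) {
        int n = points.length;
        double mx = 0.0, my = 0.0;
        for (double[] p : points) {
            mx += p[0];
            my += p[1];
        }
        mx /= n;
        my /= n;

        // Population covariance matrix [[cxx, cxy], [cxy, cyy]].
        double cxx = 0.0, cxy = 0.0, cyy = 0.0;
        for (double[] p : points) {
            double dx = p[0] - mx, dy = p[1] - my;
            cxx += dx * dx;
            cxy += dx * dy;
            cyy += dy * dy;
        }
        cxx /= n;
        cxy /= n;
        cyy /= n;

        // Smallest eigenvalue of a symmetric 2x2 matrix in closed form.
        double half = (cxx + cyy) / 2.0;
        double diff = (cxx - cyy) / 2.0;
        double lambda = half - Math.sqrt(diff * diff + cxy * cxy);

        // Corresponding eigenvector: the direction of least variance.
        double n0, n1;
        if (cxy != 0.0) {
            n0 = cxy;
            n1 = lambda - cxx;
        } else if (cxx <= cyy) {
            n0 = 1.0;
            n1 = 0.0;
        } else {
            n0 = 0.0;
            n1 = 1.0;
        }
        double norm = Math.hypot(n0, n1);
        n0 /= norm;
        n1 /= norm;

        // The line passes through the centroid.
        return new double[] {n0, n1, n0 * mx + n1 * my};
    }

    public static void main(String[] args) {
        // Points on y = 2x + 1.
        double[][] pts = {{0, 1}, {1, 3}, {2, 5}, {3, 7}, {4, 9}};
        double[] eq = deriveEquation(pts);
        System.out.println("slope = " + (-eq[0] / eq[1]));
        System.out.println("intercept = " + (eq[2] / eq[1]));
    }
}
```

In higher dimensions the same idea would need a general eigenvalue decomposition and a rule for deciding which eigenvalues count as small; according to the field summary, this class delegates the PCA to a PCAFilteredRunner.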

generateModel

public CorrelationAnalysisSolution<V> generateModel(Database<V> db,
                                                    Collection<Integer> ids)
Runs the PCA on the given set of IDs. The centroid is computed from the given IDs.

Parameters:
db - the database
ids - the set of ids
Returns:
a matrix of equations describing the dependencies
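
The centroid computation mentioned above can be sketched as a component-wise mean over the vectors selected by the ids. This is an illustration only: the Database is replaced by a plain Map from id to double[], and the names CentroidSketch and centroid are hypothetical rather than part of ELKI.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class CentroidSketch {
    // Component-wise mean of the vectors stored under the given ids.
    static double[] centroid(Map<Integer, double[]> db, Collection<Integer> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("ids must not be empty");
        }
        double[] sum = null;
        for (Integer id : ids) {
            double[] v = db.get(id);
            if (v == null) {
                throw new IllegalArgumentException("unknown id: " + id);
            }
            if (sum == null) {
                sum = new double[v.length];
            }
            for (int i = 0; i < v.length; i++) {
                sum[i] += v[i];
            }
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= ids.size();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> db = new HashMap<>();
        db.put(1, new double[] {0.0, 2.0});
        db.put(2, new double[] {4.0, 6.0});
        db.put(3, new double[] {100.0, 100.0});
        // Only ids 1 and 2 contribute to the centroid.
        System.out.println(Arrays.toString(centroid(db, Arrays.asList(1, 2))));
    }
}
```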

generateModel

public CorrelationAnalysisSolution<V> generateModel(Database<V> db,
                                                    Collection<Integer> ids,
                                                    V centroidDV)
Runs the PCA on the given set of IDs and for the given centroid.

Parameters:
db - the database
ids - the set of ids
centroidDV - the centroid
Returns:
a matrix of equations describing the dependencies

getResult

public CorrelationAnalysisSolution<V> getResult()
Description copied from interface: Algorithm
Returns the result of the algorithm.

Returns:
the result of the algorithm

setParameters

public List<String> setParameters(List<String> args)
                           throws ParameterException
Calls the super method and additionally sets the values of the parameters OUTPUT_ACCURACY_PARAM and SAMPLE_SIZE_PARAM. The remaining parameters are passed to the PCA.

Specified by:
setParameters in interface Parameterizable
Overrides:
setParameters in class DistanceBasedAlgorithm<V extends RealVector<V,?>,D extends Distance<D>,CorrelationAnalysisSolution<V extends RealVector<V,?>>>
Parameters:
args - the parameters used to set the attributes
Returns:
a list containing the unused parameters
Throws:
ParameterException - if a parameter is set incorrectly
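
The contract of consuming known options and returning the unused ones can be sketched as below. This is a simplified stand-in, not ELKI's option handler: it recognises only the three keys documented on this page, throws IllegalArgumentException in place of ParameterException, and treats every other token as unused. The class name RemainingParametersSketch and the option -other.option in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemainingParametersSketch {
    int accuracy = 4;             // documented default value
    Integer sampleSize = null;    // null stands for "complete dataset"
    boolean randomSample = false; // flag is unset by default

    // Consumes the options known to this class and returns the rest.
    List<String> setParameters(List<String> args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            if (arg.equals("-derivator.accuracy")) {
                accuracy = intValue(args, ++i, 0, arg);
            } else if (arg.equals("-derivator.sampleSize")) {
                sampleSize = intValue(args, ++i, 1, arg);
            } else if (arg.equals("-derivator.randomSample")) {
                randomSample = true; // a flag takes no value
            } else {
                remaining.add(arg);  // left for other components
            }
        }
        return remaining;
    }

    // Reads the integer value following a key and checks its lower bound.
    private static int intValue(List<String> args, int index, int min, String key) {
        if (index >= args.size()) {
            throw new IllegalArgumentException("missing value for " + key);
        }
        int value = Integer.parseInt(args.get(index));
        if (value < min) {
            throw new IllegalArgumentException(key + " must be >= " + min);
        }
        return value;
    }

    public static void main(String[] args) {
        RemainingParametersSketch sketch = new RemainingParametersSketch();
        List<String> rest = sketch.setParameters(Arrays.asList(
            "-derivator.accuracy", "2", "-other.option", "x", "-derivator.randomSample"));
        System.out.println(rest); // prints [-other.option, x]
    }
}
```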

Release 0.2.1 (2009-07-13_1605)