Environment for DeveLoping KDD-Applications Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm
Class DependencyDerivator<V extends NumberVector<V,?>,D extends Distance<D>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<O,R>
          extended by de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm<V,D,CorrelationAnalysisSolution<V>>
              extended by de.lmu.ifi.dbs.elki.algorithm.DependencyDerivator<V,D>
Type Parameters:
V - the type of FeatureVector handled by this Algorithm
D - the type of Distance used by this Algorithm
All Implemented Interfaces:
Algorithm<V,CorrelationAnalysisSolution<V>>, Parameterizable

@Title(value="Dependency Derivator: Deriving numerical inter-dependencies on data")
@Description(value="Derives an equality-system describing dependencies between attributes in a correlation-cluster")
@Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, A. Zimek",
           title="Deriving Quantitative Dependencies for Correlation Clusters",
           booktitle="Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD \'06), Philadelphia, PA 2006.",
           url="http://dx.doi.org/10.1145/1150402.1150408")
public class DependencyDerivator<V extends NumberVector<V,?>,D extends Distance<D>>
extends DistanceBasedAlgorithm<V,D,CorrelationAnalysisSolution<V>>

The dependency derivator computes quantitative linear dependencies among the attributes of a given dataset, based on a linear correlation PCA.

Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek: Deriving Quantitative Dependencies for Correlation Clusters.
In Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD '06), Philadelphia, PA 2006.
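
Example (illustrative only, not part of the ELKI API): a minimal sketch of the idea behind a PCA-based dependency derivation, restricted to two attributes. It assumes that the eigenvector with the smallest eigenvalue of the covariance matrix (the "weak" direction) is the normal vector of the fitted line through the centroid, which yields one equation a*x + b*y = c. The class name DependencySketch and the method deriveDependency are hypothetical, and the closed-form 2x2 eigen-decomposition stands in for whatever PCAFilteredRunner does internally.

```java
/** Illustrative 2-D sketch; not the ELKI implementation. */
final class DependencySketch {

  /**
   * Fits a line through 2-D points via PCA.
   * Returns {a, b, c} such that a*x + b*y = c, with (a, b) of unit length.
   */
  static double[] deriveDependency(double[][] pts) {
    int n = pts.length;
    // Centroid of the sample.
    double mx = 0, my = 0;
    for (double[] p : pts) {
      mx += p[0];
      my += p[1];
    }
    mx /= n;
    my /= n;
    // Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    double sxx = 0, sxy = 0, syy = 0;
    for (double[] p : pts) {
      double dx = p[0] - mx, dy = p[1] - my;
      sxx += dx * dx;
      sxy += dx * dy;
      syy += dy * dy;
    }
    sxx /= n;
    sxy /= n;
    syy /= n;
    // Smallest eigenvalue of the symmetric 2x2 matrix (closed form).
    double tr = sxx + syy, det = sxx * syy - sxy * sxy;
    double lambda = tr / 2 - Math.sqrt(Math.max(0, tr * tr / 4 - det));
    // Eigenvector belonging to the smallest eigenvalue (weak direction).
    double a, b;
    if (Math.abs(sxy) > 1e-12) {
      a = sxy;
      b = lambda - sxx;
    } else if (sxx <= syy) {
      a = 1; // x has the smaller variance, so x is (nearly) constant
      b = 0;
    } else {
      a = 0; // y has the smaller variance, so y is (nearly) constant
      b = 1;
    }
    double norm = Math.hypot(a, b);
    a /= norm;
    b /= norm;
    // The line passes through the centroid.
    return new double[] { a, b, a * mx + b * my };
  }

  public static void main(String[] args) {
    // Points lying exactly on y = 2x + 1.
    double[][] pts = { { 0, 1 }, { 1, 3 }, { 2, 5 }, { 3, 7 } };
    double[] eq = deriveDependency(pts);
    System.out.printf("%.4f * x + %.4f * y = %.4f%n", eq[0], eq[1], eq[2]);
  }
}
```

For the sample points in main, the returned coefficients describe the line y = 2x + 1 up to scaling. In higher dimensions each weak eigenvector would contribute one such equation, which is presumably what the "equality-system" in the class description refers to.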

Author:
Arthur Zimek

Field Summary
static OptionID DEPENDENCY_DERIVATOR_RANDOM_SAMPLE
          OptionID for RANDOM_SAMPLE_FLAG
 NumberFormat NF
          Number format for output of solution.
static OptionID OUTPUT_ACCURACY_ID
          OptionID for OUTPUT_ACCURACY_PARAM
private  IntParameter OUTPUT_ACCURACY_PARAM
          Parameter specifying the threshold for the number of fraction digits in the output; must be an integer greater than or equal to 0.
private  PCAFilteredRunner<V,DoubleDistance> pca
          Holds the object performing the PCA.
private  Flag RANDOM_SAMPLE_FLAG
          Flag to use a random sample (if the flag is not set, a kNN query around the centroid is used).
static OptionID SAMPLE_SIZE_ID
          OptionID for SAMPLE_SIZE_PARAM
private  IntParameter SAMPLE_SIZE_PARAM
          Optional parameter specifying the threshold for the size of the random sample to use; must be an integer greater than 0.
private  Integer sampleSize
          Holds the value of SAMPLE_SIZE_PARAM.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
DISTANCE_FUNCTION_ID, DISTANCE_FUNCTION_PARAM
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
DependencyDerivator(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
 CorrelationAnalysisSolution<V> generateModel(Database<V> db, Collection<Integer> ids)
          Runs the PCA on the given set of IDs.
 CorrelationAnalysisSolution<V> generateModel(Database<V> db, Collection<Integer> ids, V centroidDV)
          Runs the PCA on the given set of IDs and for the given centroid.
 CorrelationAnalysisSolution<V> runInTime(Database<V> db)
          Computes quantitative linear dependencies among the attributes of the given database, based on a linear correlation PCA.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
getDistanceFactory, getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEPENDENCY_DERIVATOR_RANDOM_SAMPLE

public static final OptionID DEPENDENCY_DERIVATOR_RANDOM_SAMPLE
OptionID for RANDOM_SAMPLE_FLAG


OUTPUT_ACCURACY_ID

public static final OptionID OUTPUT_ACCURACY_ID
OptionID for OUTPUT_ACCURACY_PARAM


OUTPUT_ACCURACY_PARAM

private final IntParameter OUTPUT_ACCURACY_PARAM

Parameter specifying the threshold for the number of fraction digits in the output; must be an integer greater than or equal to 0.

Default value: 4

Key: -derivator.accuracy
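
Example (illustrative only): a minimal sketch of how a fraction-digit setting such as the default value 4 could be applied to a java.text.NumberFormat like the NF field. It assumes that the minimum and maximum number of fraction digits are both set to the parameter value and that a fixed US locale is used; neither detail is stated on this page. The class name AccuracyFormatSketch and the method formatFor are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Illustrative sketch; not the ELKI implementation. */
final class AccuracyFormatSketch {

  /** Builds a number format with a fixed number of fraction digits. */
  static NumberFormat formatFor(int fractionDigits) {
    if (fractionDigits < 0) {
      throw new IllegalArgumentException("fraction digits must be >= 0");
    }
    // Assumption: a fixed locale keeps the decimal separator stable.
    NumberFormat nf = NumberFormat.getInstance(Locale.US);
    nf.setMinimumFractionDigits(fractionDigits);
    nf.setMaximumFractionDigits(fractionDigits);
    return nf;
  }

  public static void main(String[] args) {
    NumberFormat nf = formatFor(4); // 4 is the documented default
    System.out.println(nf.format(0.5));         // prints 0.5000
    System.out.println(nf.format(-1.23456789)); // prints -1.2346
  }
}
```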


SAMPLE_SIZE_ID

public static final OptionID SAMPLE_SIZE_ID
OptionID for SAMPLE_SIZE_PARAM


SAMPLE_SIZE_PARAM

private final IntParameter SAMPLE_SIZE_PARAM
Optional parameter specifying the threshold for the size of the random sample to use; must be an integer greater than 0.

Default value: the size of the complete dataset

Key: -derivator.sampleSize


sampleSize

private Integer sampleSize
Holds the value of SAMPLE_SIZE_PARAM.


RANDOM_SAMPLE_FLAG

private final Flag RANDOM_SAMPLE_FLAG
Flag to use a random sample (if the flag is not set, a kNN query around the centroid is used).

Key: -derivator.randomSample
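
Example (illustrative only): a minimal sketch of the two sample-selection strategies the flag switches between. It assumes plain double[] points, squared Euclidean distance in place of the configured distance function, and an explicit seed for reproducibility. The class name SampleSelectionSketch and the method select are hypothetical and do not appear in ELKI.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Illustrative sketch; not the ELKI implementation. */
final class SampleSelectionSketch {

  /** Arithmetic mean of all points, per dimension. */
  static double[] centroid(List<double[]> pts) {
    double[] c = new double[pts.get(0).length];
    for (double[] p : pts) {
      for (int i = 0; i < c.length; i++) {
        c[i] += p[i] / pts.size();
      }
    }
    return c;
  }

  /** Squared Euclidean distance (an assumption; ELKI uses a distance function). */
  static double sqDist(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      s += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return s;
  }

  /**
   * Selects at most sampleSize points: a random subset if randomSample is
   * set, otherwise the points nearest to the centroid.
   */
  static List<double[]> select(List<double[]> pts, int sampleSize,
      boolean randomSample, long seed) {
    int k = Math.min(sampleSize, pts.size());
    List<double[]> copy = new ArrayList<>(pts);
    if (randomSample) {
      Collections.shuffle(copy, new Random(seed));
    } else {
      final double[] c = centroid(pts);
      copy.sort(Comparator.comparingDouble((double[] p) -> sqDist(p, c)));
    }
    return new ArrayList<>(copy.subList(0, k));
  }
}
```

With the flag unset, an outlier far from the centroid is the last point to enter the sample, whereas a random sample may include it at any sample size.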


pca

private PCAFilteredRunner<V,DoubleDistance> pca
Holds the object performing the PCA.


NF

public final NumberFormat NF
Number format for output of solution.

Constructor Detail

DependencyDerivator

public DependencyDerivator(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

public CorrelationAnalysisSolution<V> runInTime(Database<V> db)
                                                                   throws IllegalStateException
Computes quantitative linear dependencies among the attributes of the given database, based on a linear correlation PCA.

Specified by:
runInTime in class AbstractAlgorithm<V,CorrelationAnalysisSolution<V>>
Parameters:
db - the database to run this DependencyDerivator on
Returns:
the CorrelationAnalysisSolution computed by this DependencyDerivator
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has not been called).

generateModel

public CorrelationAnalysisSolution<V> generateModel(Database<V> db,
                                                    Collection<Integer> ids)
Runs the PCA on the given set of IDs. The centroid is computed from the given IDs.

Parameters:
db - the database
ids - the set of ids
Returns:
a matrix of equations describing the dependencies

generateModel

public CorrelationAnalysisSolution<V> generateModel(Database<V> db,
                                                    Collection<Integer> ids,
                                                    V centroidDV)
Runs the PCA on the given set of IDs and for the given centroid.

Parameters:
db - the database
ids - the set of ids
centroidDV - the centroid
Returns:
a matrix of equations describing the dependencies

Release 0.3 (2010-03-31_1612)