Environment for
DeveLoping
KDD-Applications
Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.outlier
Class ReferenceBasedOutlierDetection<V extends NumberVector<V,N>,N extends Number>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<O,R>
          extended by de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm<V,DoubleDistance,OutlierResult>
              extended by de.lmu.ifi.dbs.elki.algorithm.outlier.ReferenceBasedOutlierDetection<V,N>
Type Parameters:
V - a type of NumberVector as a suitable data object for this algorithm
N - the type of the attributes of the feature vector
All Implemented Interfaces:
Algorithm<V,OutlierResult>, Parameterizable

@Title(value="An Efficient Reference-based Approach to Outlier Detection in Large Datasets")
@Description(value="Computes kNN distances approximately, using reference points with various reference point strategies.")
@Reference(authors="Y. Pei, O.R.Zaiane, Y. Gao",
           title="An Efficient Reference-based Approach to Outlier Detection in Large Datasets",
           booktitle="Proc. 19th IEEE Int. Conf. on Data Engineering (ICDE \'03), Bangalore, India, 2003",
           url="http://dx.doi.org/10.1109/ICDM.2006.17")
public class ReferenceBasedOutlierDetection<V extends NumberVector<V,N>,N extends Number>
extends DistanceBasedAlgorithm<V,DoubleDistance,OutlierResult>

provides the Reference-Based Outlier Detection algorithm, an algorithm that computes kNN distances approximately, using reference points. There are two subclasses for this algorithm: One computes reference points that lay randomly in the data space, the other one computes reference points that lay on a grid in the data space.

Reference:
Y. Pei, O. R. Zaiane, Y. Gao: An Efficient Reference-Based Approach to Outlier Detection in Large Datasets.
In: Proc. IEEE Int. Conf. on Data Mining (ICDM'06), Hong Kong, China, 2006.

Author:
Lisa Reichert, Erich Schubert

Field Summary
private  int k
          Holds the value of K_PARAM.
static OptionID K_ID
          OptionID for K_PARAM
private  IntParameter K_PARAM
          Parameter to specify the number of nearest neighbors of an object, to be considered for computing its REFOD_SCORE, must be an integer greater than 1.
static AssociationID<Double> REFOD_SCORE
          The association id to associate the REFOD_SCORE of an object for the Reference based outlier detection algorithm.
private  ReferencePointsHeuristic<V> refp
          Stores the reference point strategy
static OptionID REFP_ID
          OptionID for REFP_PARAM
private  ObjectParameter<ReferencePointsHeuristic<V>> REFP_PARAM
          Parameter for the reference points heuristic.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
DISTANCE_FUNCTION_ID, DISTANCE_FUNCTION_PARAM
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
ReferenceBasedOutlierDetection(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
 double computeDensity(List<DistanceResultPair<DoubleDistance>> referenceDists, int index)
          Computes the density of an object.
 List<DistanceResultPair<DoubleDistance>> computeDistanceVector(V refPoint, Database<V> database)
          Computes for each object the distance to one reference point.
 Collection<V> computeReferencePoints(Database<V> database)
          Computes the reference points.
protected  OutlierResult runInTime(Database<V> database)
          Runs the algorithm in the timed evaluation part.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
getDistanceFactory, getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

REFOD_SCORE

public static final AssociationID<Double> REFOD_SCORE
The association id to associate the REFOD_SCORE of an object for the Reference based outlier detection algorithm.


REFP_ID

public static final OptionID REFP_ID
OptionID for REFP_PARAM


REFP_PARAM

private final ObjectParameter<ReferencePointsHeuristic<V extends NumberVector<V,N>>> REFP_PARAM
Parameter for the reference points heuristic.

Key: -refod.refp


K_ID

public static final OptionID K_ID
OptionID for K_PARAM


K_PARAM

private final IntParameter K_PARAM
Parameter to specify the number of nearest neighbors of an object, to be considered for computing its REFOD_SCORE, must be an integer greater than 1.

Key: -refod.k


k

private int k
Holds the value of K_PARAM.


refp

private ReferencePointsHeuristic<V extends NumberVector<V,N>> refp
Stores the reference point strategy

Constructor Detail

ReferenceBasedOutlierDetection

public ReferenceBasedOutlierDetection(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

protected OutlierResult runInTime(Database<V> database)
                           throws IllegalStateException
Runs the algorithm in the timed evaluation part.

Specified by:
runInTime in class AbstractAlgorithm<V extends NumberVector<V,N>,OutlierResult>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has been failed to be called).

computeReferencePoints

public Collection<V> computeReferencePoints(Database<V> database)
Computes the reference points.

Parameters:
database - Database to build reference points for
Returns:
List of Reference Points

computeDistanceVector

public List<DistanceResultPair<DoubleDistance>> computeDistanceVector(V refPoint,
                                                                      Database<V> database)
Computes for each object the distance to one reference point. (one dimensional representation of the data set)

Parameters:
refPoint - Reference Point Feature Vector
database - database to work on
Returns:
array containing the distance to one reference point for each database object and the object id

computeDensity

public double computeDensity(List<DistanceResultPair<DoubleDistance>> referenceDists,
                             int index)
Computes the density of an object. The density of an object is the distances to the k nearest neighbors. Neighbors and distances are computed approximately. (approximation for kNN distance: instead of a normal NN search the NN of an object are those objects that have a similar distance to a reference point. The k- nearest neighbors of an object are those objects that lay close to the object in the reference distance vector)

Parameters:
referenceDists - vector of the reference distances,
index - index of the current object
Returns:
density for one object and reference point

Release 0.3 (2010-03-31_1612)