java.lang.Object
  de.lmu.ifi.dbs.elki.logging.AbstractLoggable
    de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable
      de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<ParameterizationFunction,Clustering<Model>>
        de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.CASH
public class CASH
Provides the CASH algorithm, a subspace clustering algorithm based on the Hough transform.
Reference:
E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek:
Robust clustering in arbitrarily oriented subspaces.
In Proc. 8th SIAM Int. Conf. on Data Mining (SDM'08), Atlanta, GA, 2008
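As a rough illustration of the Hough-transform idea, each d-dimensional point p can be mapped to a function of d-1 angles whose value is the distance from the origin of the hyperplane through p with unit normal n(α). The sketch below assumes the usual hyperspherical normal, f_p(α) = Σ_i p_i · (Π_{j<i} sin α_j) · cos α_i with no cosine on the last coordinate; the class `ParameterizationSketch` is hypothetical and is not ELKI's `ParameterizationFunction`.

```java
/**
 * Hypothetical sketch (not ELKI's ParameterizationFunction): maps a point p in R^d
 * to f_p(alpha) = <p, n(alpha)>, ASSUMING n(alpha) is the hyperspherical unit normal
 * n_i = (prod_{j<i} sin alpha_j) * cos alpha_i, with the last coordinate using no cosine.
 */
final class ParameterizationSketch {

    /** Unit normal vector n(alpha) in R^d built from d-1 angles. */
    static double[] normal(double[] alpha) {
        int d = alpha.length + 1;
        double[] n = new double[d];
        double sinProd = 1.0; // running product sin(alpha_0) * ... * sin(alpha_{i-1})
        for (int i = 0; i < d - 1; i++) {
            n[i] = sinProd * Math.cos(alpha[i]);
            sinProd *= Math.sin(alpha[i]);
        }
        n[d - 1] = sinProd; // final coordinate: product of all sines
        return n;
    }

    /** Signed distance from the origin of the hyperplane through p with normal n(alpha). */
    static double value(double[] p, double[] alpha) {
        double[] n = normal(alpha);
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += p[i] * n[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // In 2-d this reduces to the classic Hough sinusoid p_1 cos(alpha) + p_2 sin(alpha).
        double[] p = {1.0, 1.0};
        System.out.println(value(p, new double[] {Math.PI / 4})); // approximately sqrt(2)
    }
}
```

Points lying on a common hyperplane produce functions that intersect in a common parameter vector, which is what a dense region in parameter space signals.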
| Field Summary | |
|---|---|
| private boolean | adjust: Holds the value of ADJUST_FLAG. |
| private Flag | ADJUST_FLAG: Flag to indicate that an adjustment of the applied heuristic for choosing an interval is performed after an interval is selected. |
| static OptionID | ADJUST_ID: OptionID for ADJUST_FLAG. |
| private Database<ParameterizationFunction> | database: The database holding the objects. |
| private double | jitter: Holds the value of JITTER_PARAM. |
| static OptionID | JITTER_ID: OptionID for JITTER_PARAM. |
| private DoubleParameter | JITTER_PARAM: Parameter to specify the maximum jitter for distance values; must be a double greater than 0. |
| private int | maxLevel: Holds the value of MAXLEVEL_PARAM. |
| static OptionID | MAXLEVEL_ID: OptionID for MAXLEVEL_PARAM. |
| private IntParameter | MAXLEVEL_PARAM: Parameter to specify the maximum level for splitting the hypercube; must be an integer greater than 0. |
| private int | minDim: Holds the value of MINDIM_PARAM. |
| static OptionID | MINDIM_ID: OptionID for MINDIM_PARAM. |
| private IntParameter | MINDIM_PARAM: Parameter to specify the minimum dimensionality of the subspaces to be found; must be an integer greater than 0. |
| private int | minPts: Holds the value of MINPTS_PARAM. |
| static OptionID | MINPTS_ID: OptionID for MINPTS_PARAM. |
| private IntParameter | MINPTS_PARAM: Parameter to specify the threshold for the minimum number of points in a cluster; must be an integer greater than 0. |
| private int | noiseDim: Holds the dimensionality for noise. |
| private Set<Integer> | processedIDs: Holds a set of processed IDs. |
| private Clustering<Model> | result: The result. |
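MAXLEVEL_PARAM bounds how often the hypercube in parameter space is split. The following sketch illustrates level-wise splitting under stated assumptions: each split halves a box along one dimension, the split dimension cycles with the level, and boxes at maxLevel are not split further. `BisectionSketch` is a hypothetical class and is not ELKI's `CASHInterval`.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of level-wise hypercube splitting. ASSUMPTIONS: every split halves the box
 * along one dimension, the dimension cycles with the level (level mod d), and boxes at maxLevel
 * are not split further. This is not ELKI's CASHInterval.
 */
final class BisectionSketch {
    final double[] min;
    final double[] max;
    final int level;

    BisectionSketch(double[] min, double[] max, int level) {
        this.min = min;
        this.max = max;
        this.level = level;
    }

    /** Returns the two halves of this box, or null once maxLevel has been reached. */
    BisectionSketch[] split(int maxLevel) {
        if (level >= maxLevel) {
            return null;
        }
        int dim = level % min.length; // round-robin choice of the split dimension
        double mid = (min[dim] + max[dim]) / 2.0;
        double[] lowerMax = max.clone();
        lowerMax[dim] = mid;
        double[] upperMin = min.clone();
        upperMin[dim] = mid;
        return new BisectionSketch[] {
            new BisectionSketch(min.clone(), lowerMax, level + 1),
            new BisectionSketch(upperMin, max.clone(), level + 1)
        };
    }

    double width(int dim) {
        return max[dim] - min[dim];
    }

    public static void main(String[] args) {
        // Parameter space with one angle in [0, pi) and a distance range [-1, 1].
        BisectionSketch box = new BisectionSketch(new double[] {0.0, -1.0}, new double[] {Math.PI, 1.0}, 0);
        BisectionSketch[] halves;
        while ((halves = box.split(4)) != null) {
            box = halves[0]; // follow the lower half down to maxLevel = 4
        }
        System.out.println(box.level + " " + Arrays.toString(box.min) + " " + Arrays.toString(box.max));
    }
}
```

With two dimensions and maxLevel = 4, each dimension is halved twice, so the final box is a quarter as wide in every dimension; a larger maxLevel gives finer intervals at the cost of more splits.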

| Fields inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable |
|---|
| optionHandler |

| Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
|---|
| debug, logger |

| Constructor Summary | |
|---|---|
| CASH() | Provides a new CASH algorithm, adding the parameters MINPTS_PARAM, MAXLEVEL_PARAM, MINDIM_PARAM, JITTER_PARAM, and the flag ADJUST_FLAG to the option handler in addition to the parameters of the superclass. |

| Method Summary | |
|---|---|
| private Database<ParameterizationFunction> | buildDB(int dim, Matrix basis, Set<Integer> ids, Database<ParameterizationFunction> database): Builds a (dim-1)-dimensional database in which the objects are projected into the specified subspace. |
| private Database<DoubleVector> | buildDerivatorDB(Database<ParameterizationFunction> database, CASHInterval interval): Builds a database for the derivator consisting of the IDs in the specified interval. |
| private Database<DoubleVector> | buildDerivatorDB(Database<ParameterizationFunction> database, Set<Integer> ids): Builds a database for the derivator consisting of the specified IDs. |
| private Matrix | determineBasis(double[] alpha): Determines a basis defining a subspace described by the specified alpha values. |
| private double[] | determineMinMaxDistance(Database<ParameterizationFunction> database, int dimensionality): Determines the minimum and maximum function value of all parameterization functions stored in the specified database. |
| private CASHInterval | determineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap): Determines the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects. |
| private CASHInterval | doDetermineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap): Recursive helper method to determine the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects. |
| private Clustering<Model> | doRun(Database<ParameterizationFunction> database, FiniteProgress progress): Runs the CASH algorithm on the specified database; this method is called recursively until only noise is left. |
| private Set<Integer> | getDatabaseIDs(Database<ParameterizationFunction> database): Returns the set of IDs belonging to the specified database. |
| Description | getDescription(): Returns a description of the algorithm. |
| Clustering<Model> | getResult(): Returns the result of the algorithm. |
| private void | initHeap(DefaultHeap<Integer,CASHInterval> heap, Database<ParameterizationFunction> database, int dim, Set<Integer> ids): Initializes the heap with the root intervals. |
| private ParameterizationFunction | project(Matrix basis, ParameterizationFunction f): Projects the specified parameterization function into the subspace described by the given basis. |
| private Matrix | runDerivator(Database<ParameterizationFunction> database, int dim, CASHInterval interval, Set<Integer> ids): Runs the derivator on the specified interval and assigns all points having a distance less than the standard deviation of the derivator model to this model. |
| private LinearEquationSystem | runDerivator(Database<ParameterizationFunction> database, int dimensionality, Set<Integer> ids): Runs the derivator on the objects with the specified IDs and assigns all points having a distance less than the standard deviation of the derivator model to this model. |
| protected Clustering<Model> | runInTime(Database<ParameterizationFunction> database): Performs the CASH algorithm on the given database. |
| List<String> | setParameters(List<String> args): Calls the super method and additionally sets the values of the parameters MINPTS_PARAM, MAXLEVEL_PARAM, MINDIM_PARAM, JITTER_PARAM, and the flag ADJUST_FLAG. |
| private double | sinusProduct(int start, int end, double[] alpha): Computes the product of the sines of the specified angles from the start to the end index. |

| Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm |
|---|
| isTime, isVerbose, run, setTime, setVerbose |

| Methods inherited from class de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizable |
|---|
| addOption, addParameterizable, addParameterizable, checkGlobalParameterConstraints, collectOptions, getAttributeSettings, getParameters, rememberParametersExcept, removeOption, removeParameterizable, shortDescription |

| Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
|---|
| debugFine, debugFiner, debugFinest, exception, progress, verbose, warning |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm |
|---|
| run |

| Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm |
|---|
| setTime, setVerbose |

| Methods inherited from interface de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable |
|---|
| checkGlobalParameterConstraints, collectOptions, getParameters, shortDescription |

| Field Detail |
|---|
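public static final OptionID MINPTS_ID
OptionID for MINPTS_PARAM.

private final IntParameter MINPTS_PARAM
Parameter to specify the threshold for the minimum number of points in a cluster; must be an integer greater than 0.
Key: -cash.minpts

private int minPts
Holds the value of MINPTS_PARAM.
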
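public static final OptionID MAXLEVEL_ID
OptionID for MAXLEVEL_PARAM.

private final IntParameter MAXLEVEL_PARAM
Parameter to specify the maximum level for splitting the hypercube; must be an integer greater than 0.
Key: -cash.maxlevel

private int maxLevel
Holds the value of MAXLEVEL_PARAM.
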
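public static final OptionID MINDIM_ID
OptionID for MINDIM_PARAM.

private final IntParameter MINDIM_PARAM
Parameter to specify the minimum dimensionality of the subspaces to be found; must be an integer greater than 0.
Default value: 1
Key: -cash.mindim

private int minDim
Holds the value of MINDIM_PARAM.
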
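public static final OptionID JITTER_ID
OptionID for JITTER_PARAM.

private final DoubleParameter JITTER_PARAM
Parameter to specify the maximum jitter for distance values; must be a double greater than 0.
Key: -cash.jitter

private double jitter
Holds the value of JITTER_PARAM.
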
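public static final OptionID ADJUST_ID
OptionID for ADJUST_FLAG.

private final Flag ADJUST_FLAG
Flag to indicate that an adjustment of the applied heuristic for choosing an interval is performed after an interval is selected.
Key: -cash.adjust

private boolean adjust
Holds the value of ADJUST_FLAG.
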
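private Clustering<Model> result
The result.

private int noiseDim
Holds the dimensionality for noise.

private Set<Integer> processedIDs
Holds a set of processed IDs.

private Database<ParameterizationFunction> database
The database holding the objects.
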
| Constructor Detail |
|---|
public CASH()
Provides a new CASH algorithm, adding the parameters MINPTS_PARAM, MAXLEVEL_PARAM, MINDIM_PARAM, JITTER_PARAM, and the flag ADJUST_FLAG to the option handler in addition to the parameters of the superclass.

| Method Detail |
|---|
protected Clustering<Model> runInTime(Database<ParameterizationFunction> database) throws IllegalStateException
Performs the CASH algorithm on the given database.
runInTime in class AbstractAlgorithm<ParameterizationFunction,Clustering<Model>>
Parameters: database - the database to run the algorithm on
Throws: IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(List<String>) method has not been called).

public Clustering<Model> getResult()
Returns the result of the algorithm.
getResult in interface Algorithm<ParameterizationFunction,Clustering<Model>>
getResult in interface ClusteringAlgorithm<Clustering<Model>,ParameterizationFunction>

public Description getDescription()
Returns a description of the algorithm.
getDescription in interface Algorithm<ParameterizationFunction,Clustering<Model>>

public List<String> setParameters(List<String> args) throws ParameterException
Calls the super method and additionally sets the values of the parameters MINPTS_PARAM, MAXLEVEL_PARAM, MINDIM_PARAM, JITTER_PARAM, and the flag ADJUST_FLAG.
setParameters in interface Parameterizable
setParameters in class AbstractAlgorithm<ParameterizationFunction,Clustering<Model>>
Parameters: args - the parameters used to set the attributes accordingly
Throws: ParameterException - in case of a wrong parameter setting

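For orientation, the argument list consumed by setParameters can be assembled from the option keys listed in the field details. The helper `CashArgsSketch` below is hypothetical; the values are illustrative only, the range checks merely mirror the documented "greater than 0" constraints, and it assumes that a flag such as -cash.adjust is enabled by its key alone, without a value.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that assembles an argument list for CASH.setParameters(List<String>)
 * from the documented option keys. The range checks only mirror the documented
 * "greater than 0" constraints; ELKI performs its own validation when parsing.
 */
final class CashArgsSketch {

    static List<String> buildArgs(int minPts, int maxLevel, int minDim, double jitter, boolean adjust) {
        if (minPts <= 0 || maxLevel <= 0 || minDim <= 0) {
            throw new IllegalArgumentException("minpts, maxlevel and mindim must be greater than 0");
        }
        if (!(jitter > 0)) { // also rejects NaN
            throw new IllegalArgumentException("jitter must be greater than 0");
        }
        List<String> args = new ArrayList<>();
        args.add("-cash.minpts");
        args.add(Integer.toString(minPts));
        args.add("-cash.maxlevel");
        args.add(Integer.toString(maxLevel));
        args.add("-cash.mindim");
        args.add(Integer.toString(minDim));
        args.add("-cash.jitter");
        args.add(Double.toString(jitter));
        if (adjust) {
            args.add("-cash.adjust"); // ASSUMPTION: a flag takes no value; its presence enables it
        }
        return args;
    }

    public static void main(String[] args) {
        // Illustrative values only, not recommended settings.
        List<String> cashArgs = buildArgs(50, 6, 1, 0.15, true);
        System.out.println(cashArgs);
        // With ELKI on the classpath, this list would then be passed to new CASH().setParameters(cashArgs).
    }
}
```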
private Clustering<Model> doRun(Database<ParameterizationFunction> database, FiniteProgress progress) throws UnableToComplyException, ParameterException, NonNumericFeaturesException
Runs the CASH algorithm on the specified database; this method is called recursively until only noise is left.
Parameters:
database - the current database to run the CASH algorithm on
progress - the progress object for verbose messages
Throws:
UnableToComplyException - if an error related to the database occurs
ParameterException - if the parameter setting is wrong
NonNumericFeaturesException - if non-numeric feature vectors are used

private void initHeap(DefaultHeap<Integer,CASHInterval> heap, Database<ParameterizationFunction> database, int dim, Set<Integer> ids)
Initializes the heap with the root intervals.
Parameters:
heap - the heap to be initialized
database - the database storing the parameterization functions
dim - the dimensionality of the database
ids - the IDs of the database

private Database<ParameterizationFunction> buildDB(int dim, Matrix basis, Set<Integer> ids, Database<ParameterizationFunction> database) throws UnableToComplyException
Builds a (dim-1)-dimensional database in which the objects are projected into the specified subspace.
Parameters:
dim - the dimensionality of the database
basis - the basis defining the subspace
ids - the IDs for the new database
database - the database storing the parameterization functions
Throws:
UnableToComplyException - if an error related to the database occurs

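private ParameterizationFunction project(Matrix basis, ParameterizationFunction f)
Projects the specified parameterization function into the subspace described by the given basis.
Parameters:
basis - the basis defining the subspace
f - the parameterization function to be projected
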
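Assuming that projecting a parameterization function amounts to projecting its underlying point and that the basis matrix has orthonormal columns, the coordinate computation is the product Bᵀp. The class `ProjectionSketch` is hypothetical and uses a plain `double[][]` (d rows, k columns) instead of ELKI's `Matrix`.

```java
/**
 * Hypothetical sketch: projects a point p in R^d onto a k-dimensional subspace, ASSUMING the
 * basis is stored as a d x k array whose columns are orthonormal, so the coordinates are B^T p.
 */
final class ProjectionSketch {

    static double[] project(double[][] basis, double[] p) {
        int d = basis.length;
        int k = basis[0].length;
        double[] y = new double[k];
        for (int c = 0; c < k; c++) {
            for (int r = 0; r < d; r++) {
                y[c] += basis[r][c] * p[r]; // dot product of column c with p
            }
        }
        return y;
    }

    public static void main(String[] args) {
        // Basis of the x-y plane inside R^3.
        double[][] basis = {{1, 0}, {0, 1}, {0, 0}};
        double[] y = project(basis, new double[] {3, 4, 5});
        System.out.println(y[0] + ", " + y[1]); // 3.0, 4.0
    }
}
```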
private Matrix determineBasis(double[] alpha)
Determines a basis defining a subspace described by the specified alpha values.
Parameters: alpha - the alpha values

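One plausible reading of determineBasis is that the angles α define a unit normal n(α) and the subspace is the (d-1)-dimensional hyperplane orthogonal to it, which would match buildDB producing a (dim-1)-dimensional database. Under that assumption, an orthonormal basis can be obtained by Gram-Schmidt over the standard basis vectors; the pivoting (always taking the candidate with the largest residual) avoids dividing by near-zero norms. `BasisSketch` is hypothetical and not ELKI's code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: ASSUMES the subspace for angles alpha is the (d-1)-dimensional hyperplane
 * orthogonal to the hyperspherical unit normal n(alpha), and returns an orthonormal basis of it as
 * the columns of a d x (d-1) array, using Gram-Schmidt with pivoting over the standard basis.
 */
final class BasisSketch {

    static double[] normal(double[] alpha) {
        int d = alpha.length + 1;
        double[] n = new double[d];
        double sinProd = 1.0; // running product of the sines of the preceding angles
        for (int i = 0; i < d - 1; i++) {
            n[i] = sinProd * Math.cos(alpha[i]);
            sinProd *= Math.sin(alpha[i]);
        }
        n[d - 1] = sinProd;
        return n;
    }

    /** Standard basis vector e minus its components along the already accepted orthonormal vectors. */
    private static double[] residual(int e, List<double[]> accepted, int d) {
        double[] v = new double[d];
        v[e] = 1.0;
        for (double[] u : accepted) {
            double dot = 0.0;
            for (int i = 0; i < d; i++) dot += v[i] * u[i];
            for (int i = 0; i < d; i++) v[i] -= dot * u[i];
        }
        return v;
    }

    static double[][] determineBasis(double[] alpha) {
        double[] n = normal(alpha);
        int d = n.length;
        List<double[]> accepted = new ArrayList<>();
        accepted.add(n); // unit length by construction
        boolean[] used = new boolean[d];
        while (accepted.size() < d) {
            // Pivot: take the unused standard vector with the largest residual for numerical stability.
            int best = -1;
            double[] bestV = null;
            double bestNorm = -1.0;
            for (int e = 0; e < d; e++) {
                if (used[e]) continue;
                double[] v = residual(e, accepted, d);
                double norm = 0.0;
                for (double x : v) norm += x * x;
                norm = Math.sqrt(norm);
                if (norm > bestNorm) {
                    best = e;
                    bestV = v;
                    bestNorm = norm;
                }
            }
            used[best] = true;
            for (int i = 0; i < d; i++) bestV[i] /= bestNorm;
            accepted.add(bestV);
        }
        // Drop n and store the remaining d-1 vectors as columns.
        double[][] basis = new double[d][d - 1];
        for (int c = 1; c < d; c++) {
            for (int r = 0; r < d; r++) basis[r][c - 1] = accepted.get(c)[r];
        }
        return basis;
    }

    public static void main(String[] args) {
        double[][] basis = determineBasis(new double[] {0.3, 1.1});
        System.out.println(basis.length + " x " + basis[0].length); // 3 x 2
    }
}
```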
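private double sinusProduct(int start, int end, double[] alpha)
Computes the product of the sines of the specified angles from the start to the end index.
Parameters:
start - the index to start
end - the index to end
alpha - the array of angles
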
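Whether the end index is inclusive is not stated here. The hypothetical `SinusProductSketch` below assumes the half-open range start ≤ j < end, so an empty range yields the neutral element 1.

```java
/**
 * Hypothetical re-implementation of sinusProduct, ASSUMING the half-open index range
 * start <= j < end; an empty range yields the neutral element 1.
 */
final class SinusProductSketch {

    static double sinusProduct(int start, int end, double[] alpha) {
        double result = 1.0;
        for (int j = start; j < end; j++) {
            result *= Math.sin(alpha[j]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] alpha = {Math.PI / 2, Math.PI / 6};
        System.out.println(sinusProduct(0, 2, alpha)); // approximately 0.5
    }
}
```

Products of this form are the prefix factors of a hyperspherical normal vector, which is presumably why the class keeps such a helper.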
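private CASHInterval determineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap)
Determines the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects.
Parameters: heap - the heap storing the intervals
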
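Because objects become processed as clusters are extracted, a heap key computed earlier can overestimate an interval's current number of unprocessed objects. Assuming the Integer key is such a count, the best interval can be found by lazy re-keying: pop the top, recount, and accept it only if the fresh count is at least the next key (every key is an upper bound); otherwise reinsert it with the fresh count. `NextIntervalSketch` and its records are hypothetical and use `java.util.PriorityQueue` in place of ELKI's `DefaultHeap`.

```java
import java.util.PriorityQueue;
import java.util.Set;

/**
 * Hypothetical sketch of "next best interval" selection with a max-heap whose keys may be stale.
 * ASSUMPTION: an interval's key is the number of its objects that were unprocessed when it was
 * inserted, so keys can only overestimate the current count.
 */
final class NextIntervalSketch {

    record Interval(String name, Set<Integer> ids) {}

    record Entry(int key, Interval interval) {}

    /** Max-heap ordered by key. */
    static PriorityQueue<Entry> newHeap() {
        return new PriorityQueue<>((a, b) -> Integer.compare(b.key(), a.key()));
    }

    static int unprocessed(Interval iv, Set<Integer> processed) {
        int count = 0;
        for (int id : iv.ids()) {
            if (!processed.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the interval with the most unprocessed objects, or null if none has any left. */
    static Interval next(PriorityQueue<Entry> heap, Set<Integer> processed) {
        while (!heap.isEmpty()) {
            Entry top = heap.poll();
            int current = unprocessed(top.interval(), processed);
            if (current == 0) {
                continue; // fully processed: drop it
            }
            // Every remaining key is an upper bound, so a current count at least as large wins.
            if (heap.isEmpty() || current >= heap.peek().key()) {
                return top.interval();
            }
            heap.add(new Entry(current, top.interval())); // stale key: reinsert with the true count
        }
        return null;
    }

    public static void main(String[] args) {
        PriorityQueue<Entry> heap = newHeap();
        Interval a = new Interval("a", Set.of(1, 2, 3, 4));
        Interval b = new Interval("b", Set.of(5, 6, 7));
        heap.add(new Entry(4, a));
        heap.add(new Entry(3, b));
        // After objects 1-3 are assigned to a cluster, a has only one unprocessed object left.
        System.out.println(next(heap, Set.of(1, 2, 3)).name()); // b
    }
}
```

Each reinsertion strictly lowers an entry's key, so the loop terminates.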
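private CASHInterval doDetermineNextIntervalAtMaxLevel(DefaultHeap<Integer,CASHInterval> heap)
Recursive helper method to determine the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects.
Parameters: heap - the heap storing the intervals
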
private Set<Integer> getDatabaseIDs(Database<ParameterizationFunction> database)
Returns the set of IDs belonging to the specified database.
Parameters: database - the database containing the parameterization functions

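private double[] determineMinMaxDistance(Database<ParameterizationFunction> database, int dimensionality)
Determines the minimum and maximum function value of all parameterization functions stored in the specified database.
Parameters:
database - the database containing the parameterization functions
dimensionality - the dimensionality of the database
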
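The exact computation behind determineMinMaxDistance is not shown here. If the function values are f_p(α) = ⟨p, n(α)⟩ with a unit normal n(α), a simple conservative substitute follows from Cauchy-Schwarz: |f_p(α)| ≤ ‖p‖, so [-max‖p‖, max‖p‖] encloses every value. This range may be looser than the true extrema over CASH's angle domain; `DistanceRangeSketch` is hypothetical.

```java
/**
 * Hypothetical sketch: a conservative value range for all parameterization functions
 * f_p(alpha) = <p, n(alpha)>. Because n(alpha) is a unit vector, Cauchy-Schwarz gives
 * |f_p(alpha)| <= ||p||, so [-max ||p||, max ||p||] encloses every function value.
 */
final class DistanceRangeSketch {

    static double[] enclosingRange(double[][] points) {
        double maxNorm = 0.0;
        for (double[] p : points) {
            double sq = 0.0;
            for (double x : p) {
                sq += x * x;
            }
            maxNorm = Math.max(maxNorm, Math.sqrt(sq));
        }
        return new double[] {-maxNorm, maxNorm};
    }

    public static void main(String[] args) {
        double[] range = enclosingRange(new double[][] {{3, 4}, {1, 0}});
        System.out.println(range[0] + " .. " + range[1]); // -5.0 .. 5.0
    }
}
```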
private Matrix runDerivator(Database<ParameterizationFunction> database, int dim, CASHInterval interval, Set<Integer> ids) throws UnableToComplyException, ParameterException
Runs the derivator on the specified interval and assigns all points having a distance less than the standard deviation of the derivator model to this model.
Parameters:
database - the database containing the parameterization functions
interval - the interval to build the model
dim - the dimensionality of the database
ids - an empty set to assign the IDs
Throws:
UnableToComplyException - if an error related to the database occurs
ParameterException - if the parameter setting is wrong

private Database<DoubleVector> buildDerivatorDB(Database<ParameterizationFunction> database, CASHInterval interval) throws UnableToComplyException
Builds a database for the derivator consisting of the IDs in the specified interval.
Parameters:
database - the database storing the parameterization functions
interval - the interval to build the database from
Throws:
UnableToComplyException - if an error related to the database occurs

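private LinearEquationSystem runDerivator(Database<ParameterizationFunction> database, int dimensionality, Set<Integer> ids)
Runs the derivator on the objects with the specified IDs and assigns all points having a distance less than the standard deviation of the derivator model to this model.
Parameters:
database - the database containing the parameterization functions
ids - the IDs to build the model
dimensionality - the dimensionality of the subspace
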
private Database<DoubleVector> buildDerivatorDB(Database<ParameterizationFunction> database, Set<Integer> ids) throws UnableToComplyException
Builds a database for the derivator consisting of the specified IDs.
Parameters:
database - the database storing the parameterization functions
ids - the IDs to build the database from
Throws:
UnableToComplyException - if initialization of the database is not possible