Environment for DeveLoping KDD-Applications Supported by Index-Structures

de.lmu.ifi.dbs.elki.utilities
Class DatabaseUtil

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.utilities.DatabaseUtil

public final class DatabaseUtil
extends Object

Class with Database-related utility functions such as centroid computation, covariances etc.

Author:
Erich Schubert

Constructor Summary
DatabaseUtil()
           
 
Method Summary
static
<O extends NumberVector<O,?>>
O
centroid(Database<O> database)
          Returns the centroid as a NumberVector object of the specified database.
static
<V extends NumberVector<V,?>>
V
centroid(Database<V> database, Collection<Integer> ids)
          Returns the centroid as a NumberVector object of the specified objects stored in the given database.
static
<V extends NumberVector<V,?>>
V
centroid(Database<V> database, Collection<Integer> ids, BitSet dimensions)
          Returns the centroid w.r.t. the dimensions specified by the given BitSet as a NumberVector object of the specified objects stored in the given database.
static
<V extends NumberVector<V,?>>
V
centroid(Database<V> database, Iterator<Integer> iter, BitSet bitSet)
          Returns the centroid w.r.t. the dimensions specified by the given BitSet as a NumberVector object of the specified objects stored in the given database.
static Vector centroid(Matrix data)
          Returns the centroid as a Vector object of the specified data matrix.
static
<NV extends NumberVector<NV,?>>
Pair<NV,NV>
computeMinMax(Database<NV> database)
          Determines the minimum and maximum values in each dimension of all objects stored in the given database.
static
<O extends NumberVector<O,?>>
Matrix
covarianceMatrix(Database<O> database)
          Determines the covariance matrix of the objects stored in the given database.
static
<O extends NumberVector<O,?>>
Matrix
covarianceMatrix(Database<O> database, O centroid)
           Determines the covariance matrix of the objects stored in the given database w.r.t. the given centroid.
static
<V extends NumberVector<V,?>>
Matrix
covarianceMatrix(Database<V> database, Collection<Integer> ids)
          Determines the covariance matrix of the objects stored in the given database.
static Matrix covarianceMatrix(Matrix data)
          Determines the d x d covariance matrix of the given n x d data matrix.
static
<O extends DatabaseObject>
Class<? extends DatabaseObject>
getBaseObjectClassExpensive(Database<O> database)
          Do a full inspection of the database to find the base object class.
static SortedSet<ClassLabel> getClassLabels(Database<?> database)
          Retrieves all class labels within the database.
static String getClassOrObjectLabel(Database<?> database, Integer objid)
          Get the class label or object label of an object in the database.
static Collection<Integer> getObjectsByLabelMatch(Database<?> database, Pattern name_pattern)
          Find objects by matching their labels.
static
<O extends DatabaseObject>
Class<? extends O>
guessObjectClass(Database<O> database)
          Do a cheap guess at the database's object class.
static double[] variances(Database<NumberVector<?,?>> database, NumberVector<?,?> centroid, Collection<Integer>[] ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
static
<O extends NumberVector<O,?>>
double[]
variances(Database<O> database)
          Determines the variances in each dimension of all objects stored in the given database.
static
<V extends NumberVector<V,?>>
double[]
variances(Database<V> database, Collection<Integer> ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
static
<V extends NumberVector<V,?>>
double[]
variances(Database<V> database, V centroid, Collection<Integer> ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DatabaseUtil

public DatabaseUtil()
Method Detail

centroid

public static <V extends NumberVector<V,?>> V centroid(Database<V> database,
                                                       Collection<Integer> ids)
Returns the centroid as a NumberVector object of the specified objects stored in the given database. The objects belonging to the specified ids must be instances of NumberVector.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the id list is empty
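
The computation described above can be sketched without ELKI on the classpath. In this minimal sketch a Map<Integer, double[]> stands in for Database<V> and a double[] for the vector type; both substitutions, and the class name CentroidSketch, are assumptions made for illustration and are not part of the ELKI API.

```java
import java.util.Collection;
import java.util.Map;

/** Sketch only: a Map of plain arrays stands in for ELKI's Database and NumberVector. */
final class CentroidSketch {
  /** Mean vector of the objects selected by ids; rejects an empty id collection. */
  static double[] centroid(Map<Integer, double[]> database, Collection<Integer> ids) {
    if (ids.isEmpty()) {
      throw new IllegalArgumentException("Cannot compute a centroid of an empty id list");
    }
    int dim = database.get(ids.iterator().next()).length;
    double[] sum = new double[dim];
    // Accumulate the per-dimension sums over the selected objects only.
    for (Integer id : ids) {
      double[] v = database.get(id);
      for (int j = 0; j < dim; j++) {
        sum[j] += v[j];
      }
    }
    // Divide by the number of selected objects to obtain the mean.
    for (int j = 0; j < dim; j++) {
      sum[j] /= ids.size();
    }
    return sum;
  }
}
```

With objects 1 = (1, 2) and 2 = (3, 4) selected, the sketch returns (2.0, 3.0); objects whose ids are not listed do not influence the result.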

centroid

public static <V extends NumberVector<V,?>> V centroid(Database<V> database,
                                                       Collection<Integer> ids,
                                                       BitSet dimensions)
Returns the centroid w.r.t. the dimensions specified by the given BitSet as a NumberVector object of the specified objects stored in the given database. The objects belonging to the specified ids must be instances of NumberVector.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
dimensions - the BitSet representing the dimensions to be considered
Returns:
the centroid of the specified objects stored in the given database w.r.t. the specified subspace
Throws:
IllegalArgumentException - if the id list is empty
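
A subspace centroid of this kind can be sketched on plain arrays as follows. The Map<Integer, double[]> stand-in for the database and the class name SubspaceCentroidSketch are assumptions for illustration. The sketch also assumes that bit j of the BitSet selects the 0-based dimension j and that unselected dimensions are left at 0 in the result; this page does not specify either detail.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;

/** Sketch only: a Map of plain arrays stands in for ELKI's Database and NumberVector. */
final class SubspaceCentroidSketch {
  /** Mean of the selected objects in the selected dimensions; other dimensions stay 0. */
  static double[] centroid(Map<Integer, double[]> database, Collection<Integer> ids,
      BitSet dimensions) {
    if (ids.isEmpty()) {
      throw new IllegalArgumentException("Cannot compute a centroid of an empty id list");
    }
    int dim = database.get(ids.iterator().next()).length;
    double[] sum = new double[dim];
    for (Integer id : ids) {
      double[] v = database.get(id);
      // Visit only the set bits; bit j selects the 0-based dimension j (assumption).
      for (int j = dimensions.nextSetBit(0); j >= 0 && j < dim;
          j = dimensions.nextSetBit(j + 1)) {
        sum[j] += v[j];
      }
    }
    for (int j = 0; j < dim; j++) {
      sum[j] /= ids.size();
    }
    return sum;
  }
}
```

For objects (1, 2, 3) and (3, 4, 5) with bits 0 and 2 set, the sketch returns (2.0, 0.0, 4.0).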

centroid

public static <V extends NumberVector<V,?>> V centroid(Database<V> database,
                                                       Iterator<Integer> iter,
                                                       BitSet bitSet)
Returns the centroid w.r.t. the dimensions specified by the given BitSet as a NumberVector object of the specified objects stored in the given database. The objects belonging to the specified ids must be instances of NumberVector.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
iter - iterator over the ids of the objects
bitSet - the bitSet specifying the dimensions to be considered
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the iterator yields no ids

centroid

public static <O extends NumberVector<O,?>> O centroid(Database<O> database)
Returns the centroid as a NumberVector object of the specified database. The objects must be instances of NumberVector.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the database is empty

centroid

public static Vector centroid(Matrix data)
Returns the centroid as a Vector object of the specified data matrix.

Parameters:
data - the data matrix, where the data vectors are column vectors
Returns:
the centroid of the specified data matrix
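
The column-vector layout matters when reading this method: each column is one data vector, so the centroid has one entry per row. A minimal sketch on a double[][] follows; the array stands in for ELKI's Matrix and the class name MatrixCentroidSketch is hypothetical.

```java
/** Sketch only: a double[][] stands in for ELKI's Matrix class. */
final class MatrixCentroidSketch {
  /** data[i][j] is component i of the j-th data vector (vectors are columns). */
  static double[] centroid(double[][] data) {
    int d = data.length;      // number of rows = dimensionality
    int n = data[0].length;   // number of columns = number of data vectors
    double[] centroid = new double[d];
    for (int i = 0; i < d; i++) {
      for (int j = 0; j < n; j++) {
        centroid[i] += data[i][j];
      }
      centroid[i] /= n;
    }
    return centroid;
  }
}
```

For the 2 x 3 matrix with rows (1, 3, 5) and (2, 4, 6), i.e. the three column vectors (1, 2), (3, 4) and (5, 6), the sketch returns (3.0, 4.0).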

covarianceMatrix

public static <V extends NumberVector<V,?>> Matrix covarianceMatrix(Database<V> database,
                                                                    Collection<Integer> ids)
Determines the covariance matrix of the objects stored in the given database.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the covariance matrix of the specified objects

covarianceMatrix

public static <O extends NumberVector<O,?>> Matrix covarianceMatrix(Database<O> database)
Determines the covariance matrix of the objects stored in the given database.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
Returns:
the covariance matrix of the specified objects

covarianceMatrix

public static <O extends NumberVector<O,?>> Matrix covarianceMatrix(Database<O> database,
                                                                    O centroid)

Determines the covariance matrix of the objects stored in the given database w.r.t. the given centroid.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
centroid - the centroid of the database
Returns:
the covariance matrix of the specified objects

covarianceMatrix

public static Matrix covarianceMatrix(Matrix data)
Determines the d x d covariance matrix of the given n x d data matrix.

Parameters:
data - the data matrix
Returns:
the covariance matrix of the given data matrix.
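
The d x d result for an n x d input can be illustrated on plain arrays. This is a sketch under stated assumptions, not ELKI's implementation: rows are taken to be the n objects, as the description above states, and the sums are divided by n (population covariance). Whether the library normalizes by n or by n - 1 is not stated on this page. The class name CovarianceSketch is hypothetical.

```java
/** Sketch only: a double[][] stands in for ELKI's Matrix class. */
final class CovarianceSketch {
  /** data is n x d (one object per row); returns the d x d covariance, normalized by n. */
  static double[][] covarianceMatrix(double[][] data) {
    int n = data.length;
    int d = data[0].length;
    // Per-dimension mean of the n objects.
    double[] mean = new double[d];
    for (double[] row : data) {
      for (int j = 0; j < d; j++) {
        mean[j] += row[j];
      }
    }
    for (int j = 0; j < d; j++) {
      mean[j] /= n;
    }
    // Sum of outer products of the centered objects.
    double[][] cov = new double[d][d];
    for (double[] row : data) {
      for (int a = 0; a < d; a++) {
        for (int b = 0; b < d; b++) {
          cov[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]);
        }
      }
    }
    // Normalization by n is an assumption of this sketch.
    for (int a = 0; a < d; a++) {
      for (int b = 0; b < d; b++) {
        cov[a][b] /= n;
      }
    }
    return cov;
  }
}
```

For the two objects (1, 2) and (3, 6), the mean is (2, 4) and the sketch returns the matrix with rows (1, 2) and (2, 4).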

variances

public static <O extends NumberVector<O,?>> double[] variances(Database<O> database)
Determines the variances in each dimension of all objects stored in the given database.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
Returns:
the variances in each dimension of all objects stored in the given database

variances

public static <V extends NumberVector<V,?>> double[] variances(Database<V> database,
                                                               Collection<Integer> ids)
Determines the variances in each dimension of the specified objects stored in the given database. Equivalent to variances(database, centroid(database, ids), ids).

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the variances in each dimension of the specified objects

variances

public static <V extends NumberVector<V,?>> double[] variances(Database<V> database,
                                                               V centroid,
                                                               Collection<Integer> ids)
Determines the variances in each dimension of the specified objects stored in the given database.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
centroid - the centroid or reference vector of the ids
ids - the ids of the objects
Returns:
the variances in each dimension of the specified objects
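
Per-dimension variances around a given reference vector can be sketched as follows. A Map<Integer, double[]> stands in for the database and a double[] for the centroid; these stand-ins and the class name VarianceSketch are assumptions for illustration. Dividing by the number of ids (rather than that number minus one) is also an assumption, since this page does not state the normalization.

```java
import java.util.Collection;
import java.util.Map;

/** Sketch only: a Map of plain arrays stands in for ELKI's Database and NumberVector. */
final class VarianceSketch {
  /** Mean squared deviation from centroid in each dimension, over the selected ids. */
  static double[] variances(Map<Integer, double[]> database, double[] centroid,
      Collection<Integer> ids) {
    double[] variances = new double[centroid.length];
    for (Integer id : ids) {
      double[] v = database.get(id);
      for (int j = 0; j < centroid.length; j++) {
        double diff = v[j] - centroid[j];
        variances[j] += diff * diff;
      }
    }
    // Normalization by the number of ids is an assumption of this sketch.
    for (int j = 0; j < centroid.length; j++) {
      variances[j] /= ids.size();
    }
    return variances;
  }
}
```

For objects (1, 2) and (3, 6) with the reference vector (2, 4), the sketch returns (1.0, 4.0). The reference vector need not be the true centroid of the ids; any vector of matching dimensionality works.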

variances

public static double[] variances(Database<NumberVector<?,?>> database,
                                 NumberVector<?,?> centroid,
                                 Collection<Integer>[] ids)
Determines the variances in each dimension of the specified objects stored in the given database.

Parameters:
database - the database storing the objects
centroid - the centroid or reference vector of the ids
ids - the array of ids of the objects to be considered in each dimension
Returns:
the variances in each dimension of the specified objects
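
This overload differs from the others in that each dimension has its own set of ids. A minimal sketch of that idea follows, assuming that entry d of the id array lists the objects contributing to dimension d. A List of collections replaces the generic array to keep the example free of unchecked warnings; that choice, the Map<Integer, double[]> stand-in for the database, and the class name PerDimensionVarianceSketch are all assumptions for illustration.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Sketch only: a Map of plain arrays stands in for ELKI's Database and NumberVector. */
final class PerDimensionVarianceSketch {
  /** Variance in dimension d is computed only over the ids listed at position d. */
  static double[] variances(Map<Integer, double[]> database, double[] centroid,
      List<? extends Collection<Integer>> idsPerDimension) {
    double[] variances = new double[centroid.length];
    for (int d = 0; d < centroid.length; d++) {
      Collection<Integer> ids = idsPerDimension.get(d);
      for (Integer id : ids) {
        double diff = database.get(id)[d] - centroid[d];
        variances[d] += diff * diff;
      }
      // Normalization by the number of ids in this dimension is an assumption.
      variances[d] /= ids.size();
    }
    return variances;
  }
}
```

With objects 1 = (1, 10), 2 = (3, 20), 3 = (5, 40), the reference vector (3, 20), ids {1, 3} for the first dimension and {2, 3} for the second, the sketch returns (4.0, 200.0).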

computeMinMax

public static <NV extends NumberVector<NV,?>> Pair<NV,NV> computeMinMax(Database<NV> database)
Determines the minimum and maximum values in each dimension of all objects stored in the given database.

Type Parameters:
NV - vector type
Parameters:
database - the database storing the objects
Returns:
a pair of the minimum and maximum vectors, i.e. the corners of the bounding hyperrectangle
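
The per-dimension minimum and maximum can be sketched on plain arrays. Here a double[][] with one object per row stands in for the database, and a two-row array stands in for the returned Pair; both are assumptions for illustration, as is the class name MinMaxSketch. The sketch assumes a non-empty input.

```java
import java.util.Arrays;

/** Sketch only: plain arrays stand in for ELKI's Database, NumberVector and Pair. */
final class MinMaxSketch {
  /** Returns {min, max}: row 0 holds the per-dimension minima, row 1 the maxima. */
  static double[][] computeMinMax(double[][] data) {
    int dim = data[0].length; // assumes at least one object
    double[] min = new double[dim];
    double[] max = new double[dim];
    Arrays.fill(min, Double.POSITIVE_INFINITY);
    Arrays.fill(max, Double.NEGATIVE_INFINITY);
    for (double[] v : data) {
      for (int j = 0; j < dim; j++) {
        min[j] = Math.min(min[j], v[j]);
        max[j] = Math.max(max[j], v[j]);
      }
    }
    return new double[][] {min, max};
  }
}
```

For the objects (1, 5), (3, 2) and (-1, 4), the sketch returns the minimum vector (-1.0, 2.0) and the maximum vector (3.0, 5.0).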

getClassLabels

public static SortedSet<ClassLabel> getClassLabels(Database<?> database)
Retrieves all class labels within the database.

Parameters:
database - the database to be scanned for class labels
Returns:
a set comprising all class labels that are currently set in the database

guessObjectClass

public static <O extends DatabaseObject> Class<? extends O> guessObjectClass(Database<O> database)
Do a cheap guess at the database's object class.

Type Parameters:
O - Restriction type
Parameters:
database - Database
Returns:
Class of first object in the Database.

getBaseObjectClassExpensive

public static <O extends DatabaseObject> Class<? extends DatabaseObject> getBaseObjectClassExpensive(Database<O> database)
Do a full inspection of the database to find the base object class. Note: this can be an abstract class or interface! TODO: Implement a full search for shared superclasses. Since databases currently always use only one class, this is not yet implemented.

Type Parameters:
O - Restriction type
Parameters:
database - Database
Returns:
Superclass of all objects in the database

getObjectsByLabelMatch

public static Collection<Integer> getObjectsByLabelMatch(Database<?> database,
                                                         Pattern name_pattern)
Find objects by matching their labels.

Parameters:
database - Database to search in
name_pattern - Name to match against class or object label
Returns:
the ids of the objects whose class or object label matches the pattern
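
Label matching with a java.util.regex.Pattern can be sketched as follows. A Map<Integer, String> from object id to label stands in for the database, and the class name LabelMatchSketch is hypothetical. The sketch requires the whole label to match (Matcher.matches()); whether the library uses a full match or a substring search (Matcher.find()) is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch only: a Map from object id to label stands in for ELKI's Database. */
final class LabelMatchSketch {
  /** Collects the ids whose label matches the pattern as a whole. */
  static Collection<Integer> getObjectsByLabelMatch(Map<Integer, String> labels,
      Pattern namePattern) {
    List<Integer> ret = new ArrayList<>();
    for (Map.Entry<Integer, String> e : labels.entrySet()) {
      // Full-match semantics are an assumption of this sketch.
      if (namePattern.matcher(e.getValue()).matches()) {
        ret.add(e.getKey());
      }
    }
    return ret;
  }
}
```

With the labels 1 = "setosa", 2 = "virginica", 3 = "setosa-2" and the pattern setosa.*, the sketch returns the ids 1 and 3.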

getClassOrObjectLabel

public static String getClassOrObjectLabel(Database<?> database,
                                           Integer objid)
Get the class label or object label of an object in the database.

Parameters:
database - Database
objid - Object ID
Returns:
String representation of the class label or object label

Release 0.3 (2010-03-31_1612)