Environment for DeveLoping KDD-Applications Supported by Index-Structures

de.lmu.ifi.dbs.elki.utilities
Class DatabaseUtil

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.utilities.DatabaseUtil

public final class DatabaseUtil
extends Object

Class with Database-related utility functions such as centroid computation, covariances etc.

Author:
Erich Schubert

Constructor Summary
DatabaseUtil()
           
 
Method Summary
static
<O extends RealVector<O,?>>
O
centroid(Database<O> database)
          Returns the centroid as a RealVector object of the specified database.
static
<V extends RealVector<V,?>>
V
centroid(Database<V> database, Collection<Integer> ids)
          Returns the centroid as a RealVector object of the specified objects stored in the given database.
static
<V extends RealVector<V,?>>
V
centroid(Database<V> database, Collection<Integer> ids, BitSet bitSet)
          Returns the centroid w.r.t. the dimensions specified by the given BitSet as a RealVector object of the specified objects stored in the given database.
static
<V extends RealVector<V,?>>
V
centroid(Database<V> database, Iterator<Integer> iter, BitSet bitSet)
          Returns the centroid w.r.t. the dimensions specified by the given BitSet as a RealVector object of the specified objects stored in the given database.
static Vector centroid(Matrix data)
          Returns the centroid as a Vector object of the specified data matrix.
static
<O extends RealVector<O,?>>
Matrix
covarianceMatrix(Database<O> database)
          Determines the covariance matrix of the objects stored in the given database.
static
<O extends RealVector<O,?>>
Matrix
covarianceMatrix(Database<O> database, O centroid)
           Determines the covariance matrix of the objects stored in the given database w.r.t. the given centroid.
static
<V extends RealVector<V,?>>
Matrix
covarianceMatrix(Database<V> database, Collection<Integer> ids)
          Determines the covariance matrix of the objects stored in the given database.
static Matrix covarianceMatrix(Matrix data)
          Determines the d x d covariance matrix of the given n x d data matrix.
static
<O extends DatabaseObject>
Class<? extends DatabaseObject>
getBaseObjectClassExpensive(Database<O> database)
          Do a full inspection of the database to find the base object class.
static SortedSet<ClassLabel> getClassLabels(Database<?> database)
          Retrieves all class labels within the database.
static
<O extends DatabaseObject>
Class<? extends O>
guessObjectClass(Database<O> database)
          Do a cheap guess at the database's object class.
static double[][] min_max(Database<RealVector<?,?>> database)
          Determines the minimum and maximum values in each dimension of all objects stored in the given database.
static
<O extends RealVector<O,?>>
double[]
variances(Database<O> database)
          Determines the variances in each dimension of all objects stored in the given database.
static double[] variances(Database<RealVector<?,?>> database, RealVector<?,?> centroid, Collection<Integer>[] ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
static
<V extends RealVector<V,?>>
double[]
variances(Database<V> database, Collection<Integer> ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
static
<V extends RealVector<V,?>>
double[]
variances(Database<V> database, V centroid, Collection<Integer> ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DatabaseUtil

public DatabaseUtil()
Method Detail

centroid

public static <V extends RealVector<V,?>> V centroid(Database<V> database,
                                                     Collection<Integer> ids)
Returns the centroid as a RealVector object of the specified objects stored in the given database. The objects belonging to the specified ids must be instances of RealVector.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the id list is empty
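
The computation described above can be illustrated with a minimal, self-contained sketch. It does not use the ELKI classes: a plain `Map<Integer, double[]>` stands in for `Database<V>`, and the class and method names are hypothetical. It only shows the arithmetic (a per-dimension mean over the selected ids) and the documented exception on an empty id list.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not ELKI code: a Map plays the role of the database.
class CentroidSketch {
  /** Returns the per-dimension mean of the vectors selected by ids. */
  static double[] centroid(Map<Integer, double[]> db, Collection<Integer> ids) {
    if (ids.isEmpty()) {
      // Mirrors the documented behavior for an empty id list.
      throw new IllegalArgumentException("Cannot compute a centroid, because of empty id list.");
    }
    double[] sum = null;
    for (Integer id : ids) {
      double[] v = db.get(id);
      if (sum == null) {
        sum = new double[v.length];
      }
      for (int d = 0; d < v.length; d++) {
        sum[d] += v[d];
      }
    }
    for (int d = 0; d < sum.length; d++) {
      sum[d] /= ids.size();
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<Integer, double[]> db = new HashMap<>();
    db.put(1, new double[] {1.0, 2.0});
    db.put(2, new double[] {3.0, 4.0});
    db.put(3, new double[] {5.0, 9.0});
    // Only ids 1 and 2 contribute to the centroid.
    System.out.println(Arrays.toString(centroid(db, Arrays.asList(1, 2)))); // [2.0, 3.0]
  }
}
```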

centroid

public static <V extends RealVector<V,?>> V centroid(Database<V> database,
                                                     Collection<Integer> ids,
                                                     BitSet bitSet)
Returns the centroid w.r.t. the dimensions specified by the given BitSet as a RealVector object of the specified objects stored in the given database. The objects belonging to the specified ids must be instances of RealVector.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
bitSet - the bitSet specifying the dimensions to be considered
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the id list is empty
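
A sketch of a centroid restricted to selected dimensions follows. It uses plain arrays instead of the ELKI types, and the class name is hypothetical. One point is an assumption, since this page does not state it: the result keeps the full dimensionality and leaves unselected dimensions at 0, rather than projecting to a shorter vector.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in, not ELKI code.
class SubspaceCentroidSketch {
  /**
   * Averages only the dimensions whose bit is set.
   * Assumption: unselected dimensions stay 0 in the returned array.
   */
  static double[] centroid(List<double[]> rows, BitSet dims) {
    if (rows.isEmpty()) {
      throw new IllegalArgumentException("Cannot compute a centroid, because of empty id list.");
    }
    int dim = rows.get(0).length;
    double[] c = new double[dim];
    for (double[] v : rows) {
      // nextSetBit returns -1 once no further bit is set.
      for (int d = dims.nextSetBit(0); d >= 0 && d < dim; d = dims.nextSetBit(d + 1)) {
        c[d] += v[d];
      }
    }
    for (int d = 0; d < dim; d++) {
      c[d] /= rows.size();
    }
    return c;
  }

  public static void main(String[] args) {
    BitSet dims = new BitSet();
    dims.set(0);
    dims.set(2);
    List<double[]> rows = Arrays.asList(new double[] {1, 2, 3}, new double[] {3, 4, 5});
    System.out.println(Arrays.toString(centroid(rows, dims))); // [2.0, 0.0, 4.0]
  }
}
```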

centroid

public static <V extends RealVector<V,?>> V centroid(Database<V> database,
                                                     Iterator<Integer> iter,
                                                     BitSet bitSet)
Returns the centroid w.r.t. the dimensions specified by the given BitSet as a RealVector object of the specified objects stored in the given database. The objects belonging to the ids returned by the iterator must be instances of RealVector.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
iter - iterator over the ids of the objects
bitSet - the bitSet specifying the dimensions to be considered
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the id list is empty

centroid

public static <O extends RealVector<O,?>> O centroid(Database<O> database)
Returns the centroid as a RealVector object of the specified database. The objects must be instances of RealVector.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the database is empty

centroid

public static Vector centroid(Matrix data)
Returns the centroid as a Vector object of the specified data matrix.

Parameters:
data - the data matrix, where the data vectors are column vectors
Returns:
the centroid of the specified data matrix

covarianceMatrix

public static <V extends RealVector<V,?>> Matrix covarianceMatrix(Database<V> database,
                                                                  Collection<Integer> ids)
Determines the covariance matrix of the objects stored in the given database.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the covariance matrix of the specified objects

covarianceMatrix

public static <O extends RealVector<O,?>> Matrix covarianceMatrix(Database<O> database)
Determines the covariance matrix of the objects stored in the given database.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
Returns:
the covariance matrix of the specified objects

covarianceMatrix

public static <O extends RealVector<O,?>> Matrix covarianceMatrix(Database<O> database,
                                                                  O centroid)

Determines the covariance matrix of the objects stored in the given database w.r.t. the given centroid.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
centroid - the centroid of the database
Returns:
the covariance matrix of the specified objects

covarianceMatrix

public static Matrix covarianceMatrix(Matrix data)
Determines the d x d covariance matrix of the given n x d data matrix.

Parameters:
data - the n x d data matrix
Returns:
the covariance matrix of the given data matrix.
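
To make the n x d to d x d relationship concrete, here is a minimal sketch using `double[][]` in place of the ELKI `Matrix` class (the class name is hypothetical). The normalization is an assumption: this sketch divides by n (population covariance); the page does not say whether the library divides by n or by n - 1.

```java
// Hypothetical stand-in, not ELKI code.
class CovarianceSketch {
  /**
   * Computes the d x d covariance matrix of an n x d data array (one object per row).
   * Assumption: normalizes by n, not n - 1.
   */
  static double[][] covariance(double[][] data) {
    int n = data.length;
    int d = data[0].length;
    // Per-dimension mean.
    double[] mean = new double[d];
    for (double[] row : data) {
      for (int j = 0; j < d; j++) {
        mean[j] += row[j] / n;
      }
    }
    // Accumulate products of deviations from the mean.
    double[][] cov = new double[d][d];
    for (double[] row : data) {
      for (int i = 0; i < d; i++) {
        for (int j = 0; j < d; j++) {
          cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
        }
      }
    }
    for (int i = 0; i < d; i++) {
      for (int j = 0; j < d; j++) {
        cov[i][j] /= n;
      }
    }
    return cov;
  }
}
```

For the two rows (1, 2) and (3, 6), the deviations from the mean (2, 4) are (-1, -2) and (1, 2), so this sketch yields the matrix with rows (1, 2) and (2, 4).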

variances

public static <O extends RealVector<O,?>> double[] variances(Database<O> database)
Determines the variances in each dimension of all objects stored in the given database.

Type Parameters:
O - Vector type
Parameters:
database - the database storing the objects
Returns:
the variances in each dimension of all objects stored in the given database

variances

public static <V extends RealVector<V,?>> double[] variances(Database<V> database,
                                                             Collection<Integer> ids)
Determines the variances in each dimension of the specified objects stored in the given database. Returns variances(database, centroid(database, ids), ids)

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the variances in each dimension of the specified objects

variances

public static <V extends RealVector<V,?>> double[] variances(Database<V> database,
                                                             V centroid,
                                                             Collection<Integer> ids)
Determines the variances in each dimension of the specified objects stored in the given database.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
centroid - the centroid or reference vector of the ids
ids - the ids of the objects
Returns:
the variances in each dimension of the specified objects
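
The per-dimension variance with respect to a given reference vector can be sketched as follows, again with plain arrays and a hypothetical class name. The division by the number of objects (rather than by that number minus one) is an assumption not confirmed by this page.

```java
// Hypothetical stand-in, not ELKI code.
class VarianceSketch {
  /**
   * Mean squared deviation from the given centroid, per dimension.
   * Assumption: normalizes by the number of rows.
   */
  static double[] variances(double[][] rows, double[] centroid) {
    double[] var = new double[centroid.length];
    for (double[] row : rows) {
      for (int d = 0; d < centroid.length; d++) {
        double diff = row[d] - centroid[d];
        var[d] += diff * diff;
      }
    }
    for (int d = 0; d < var.length; d++) {
      var[d] /= rows.length;
    }
    return var;
  }
}
```

Because the centroid is passed in, a caller can reuse an already computed centroid or measure spread around any other reference vector.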

variances

public static double[] variances(Database<RealVector<?,?>> database,
                                 RealVector<?,?> centroid,
                                 Collection<Integer>[] ids)
Determines the variances in each dimension of the specified objects stored in the given database.

Parameters:
database - the database storing the objects
centroid - the centroid or reference vector of the ids
ids - the array of ids of the objects to be considered in each dimension
Returns:
the variances in each dimension of the specified objects
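
This overload differs from the others in that each dimension has its own id collection. A minimal sketch of that idea, under stated assumptions: row indices of a `double[][]` stand in for database ids, a `List` replaces the generic array `Collection<Integer>[]`, the class name is hypothetical, and each dimension is normalized by the size of its own id collection.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in, not ELKI code.
class PerDimensionVarianceSketch {
  /**
   * Variance in dimension d is computed only over the rows listed in idsPerDim.get(d).
   * Assumption: each dimension is divided by the size of its own id collection,
   * so an empty collection yields NaN for that dimension.
   */
  static double[] variances(double[][] rows, double[] centroid,
      List<? extends Collection<Integer>> idsPerDim) {
    double[] var = new double[centroid.length];
    for (int d = 0; d < centroid.length; d++) {
      Collection<Integer> ids = idsPerDim.get(d);
      for (int id : ids) {
        double diff = rows[id][d] - centroid[d];
        var[d] += diff * diff;
      }
      var[d] /= ids.size();
    }
    return var;
  }
}
```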

min_max

public static double[][] min_max(Database<RealVector<?,?>> database)
Determines the minimum and maximum values in each dimension of all objects stored in the given database.

Parameters:
database - the database storing the objects
Returns:
an array consisting of an array of the minimum and an array of the maximum values in each dimension of all objects stored in the given database
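
The shape of the return value can be illustrated with a small sketch over plain arrays (hypothetical class name, not the ELKI implementation). Following the description above, index 0 holds the per-dimension minima and index 1 the per-dimension maxima.

```java
import java.util.Arrays;

// Hypothetical stand-in, not ELKI code.
class MinMaxSketch {
  /** Returns {minima, maxima}, each of length d, for an n x d data array. */
  static double[][] minMax(double[][] rows) {
    int dim = rows[0].length;
    double[] min = new double[dim];
    double[] max = new double[dim];
    Arrays.fill(min, Double.POSITIVE_INFINITY);
    Arrays.fill(max, Double.NEGATIVE_INFINITY);
    for (double[] row : rows) {
      for (int d = 0; d < dim; d++) {
        min[d] = Math.min(min[d], row[d]);
        max[d] = Math.max(max[d], row[d]);
      }
    }
    return new double[][] {min, max};
  }
}
```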

getClassLabels

public static SortedSet<ClassLabel> getClassLabels(Database<?> database)
Retrieves all class labels within the database.

Parameters:
database - the database to be scanned for class labels
Returns:
a set comprising all class labels that are currently set in the database

guessObjectClass

public static <O extends DatabaseObject> Class<? extends O> guessObjectClass(Database<O> database)
Do a cheap guess at the database's object class.

Type Parameters:
O - Restriction type
Parameters:
database - Database
Returns:
Class of first object in the Database.

getBaseObjectClassExpensive

public static <O extends DatabaseObject> Class<? extends DatabaseObject> getBaseObjectClassExpensive(Database<O> database)
Do a full inspection of the database to find the base object class. Note: this can be an abstract class or interface! TODO: Implement a full search for shared superclasses. Since the databases currently always use only one class, this is not yet implemented.

Type Parameters:
O - Restriction type
Parameters:
database - Database
Returns:
Superclass of all objects in the database

Release 0.2 (2009-07-06_1820)