de.lmu.ifi.dbs.elki.utilities
Class DatabaseUtil

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.utilities.DatabaseUtil

public final class DatabaseUtil
extends Object

Utility class with database-related functions such as centroid, covariance, and variance computation.


Nested Class Summary
static class DatabaseUtil.CollectionFromRelation<O>
          Collection view on a database that retrieves the objects when needed.
static class DatabaseUtil.RelationObjectIterator<O>
          Iterator class that retrieves the given objects from the database.
 
Constructor Summary
DatabaseUtil()
           
 
Method Summary
static
<V extends FeatureVector<?,?>>
VectorFieldTypeInformation<V>
assumeVectorField(Relation<V> relation)
          Get the vector field type information of a relation, assuming that it stores a vector field.
static
<V extends NumberVector<? extends V,?>>
V
centroid(Relation<? extends V> relation)
          Returns the centroid as a NumberVector object of the specified database.
static
<V extends NumberVector<? extends V,?>>
V
centroid(Relation<? extends V> relation, DBIDs ids)
          Returns the centroid as a NumberVector object of the specified objects stored in the given database.
static
<V extends NumberVector<? extends V,?>>
V
centroid(Relation<? extends V> relation, DBIDs ids, BitSet dimensions)
          Returns the centroid w.r.t. the dimensions specified by the given BitSet as a NumberVector object of the specified objects stored in the given database.
static
<NV extends NumberVector<NV,?>>
Pair<NV,NV>
computeMinMax(Relation<NV> database)
          Determines the minimum and maximum values in each dimension of all objects stored in the given database.
static Matrix covarianceMatrix(Matrix data)
          Determines the d x d covariance matrix of the given n x d data matrix.
static
<V extends NumberVector<? extends V,?>>
Matrix
covarianceMatrix(Relation<? extends V> database, DBIDs ids)
          Determines the covariance matrix of the objects stored in the given database.
static int dimensionality(Relation<? extends FeatureVector<?,?>> relation)
          Get the dimensionality of a database
static
<V extends NumberVector<?,?>>
double
exactMedian(Relation<V> relation, DBIDs ids, int dimension)
          Returns the median of a data set in the given dimension.
static
<O> Class<?>
getBaseObjectClassExpensive(Relation<O> database)
          Do a full inspection of the database to find the base object class.
static SortedSet<ClassLabel> getClassLabels(Database database)
          Retrieves all class labels within the database.
static SortedSet<ClassLabel> getClassLabels(Relation<? extends ClassLabel> database)
          Retrieves all class labels within the database.
static
<V extends FeatureVector<?,?>>
String
getColumnLabel(Relation<? extends V> rel, int col)
          Get the column name or produce a generic label "Column XY".
static ArrayModifiableDBIDs getObjectsByLabelMatch(Database database, Pattern name_pattern)
          Find objects by matching their labels.
static Relation<String> guessLabelRepresentation(Database database)
          Guess a potentially label-like representation.
static
<O> Class<? extends O>
guessObjectClass(Relation<O> database)
          Do a cheap guess at the database's object class.
static Relation<String> guessObjectLabelRepresentation(Database database)
          Guess a potentially object label-like representation.
static
<V extends NumberVector<?,?>>
double
quickMedian(Relation<V> relation, ArrayDBIDs ids, int dimension, int numberOfSamples)
          Returns the median of a data set in the given dimension by using a sampling method.
static
<V extends NumberVector<?,?>,T extends NumberVector<?,?>>
Relation<V>
relationUglyVectorCast(Relation<T> database)
          An ugly vector type cast unavoidable in some situations due to Generics.
static double[] variances(Relation<? extends NumberVector<?,?>> database, NumberVector<?,?> centroid, DBIDs ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
static
<V extends NumberVector<? extends V,?>>
double[]
variances(Relation<V> database)
          Determines the variances in each dimension of all objects stored in the given database.
static
<V extends NumberVector<? extends V,?>>
double[]
variances(Relation<V> database, DBIDs ids)
          Determines the variances in each dimension of the specified objects stored in the given database.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DatabaseUtil

public DatabaseUtil()
Method Detail

assumeVectorField

public static <V extends FeatureVector<?,?>> VectorFieldTypeInformation<V> assumeVectorField(Relation<V> relation)
Get the vector field type information of a relation, assuming that it stores a vector field.

Parameters:
relation - relation
Returns:
Vector field type information

dimensionality

public static int dimensionality(Relation<? extends FeatureVector<?,?>> relation)
Get the dimensionality of a database

Parameters:
relation - relation
Returns:
Database dimensionality

centroid

public static <V extends NumberVector<? extends V,?>> V centroid(Relation<? extends V> relation)
Returns the centroid as a NumberVector object of the specified database. The objects must be instances of NumberVector.

Type Parameters:
V - Vector type
Parameters:
relation - the Relation storing the objects
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the database is empty
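
The computation described above can be illustrated with a minimal plain-Java sketch. This is not ELKI's implementation: it assumes the vectors are available as double[] arrays of equal length, and the class CentroidSketch and its method are hypothetical names introduced only for this example.

```java
import java.util.List;

class CentroidSketch {
  /** Arithmetic mean of each dimension over all vectors (assumed equal length). */
  static double[] centroid(List<double[]> vectors) {
    if (vectors.isEmpty()) {
      // Mirrors the documented behavior: an empty input is rejected.
      throw new IllegalArgumentException("Cannot compute the centroid of an empty collection.");
    }
    int dim = vectors.get(0).length;
    double[] centroid = new double[dim];
    // Sum up every dimension.
    for (double[] v : vectors) {
      for (int d = 0; d < dim; d++) {
        centroid[d] += v[d];
      }
    }
    // Divide the sums by the number of vectors.
    for (int d = 0; d < dim; d++) {
      centroid[d] /= vectors.size();
    }
    return centroid;
  }
}
```

With the actual API, the equivalent call would presumably be DatabaseUtil.centroid(relation) on a relation of NumberVector objects.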

centroid

public static <V extends NumberVector<? extends V,?>> V centroid(Relation<? extends V> relation,
                                                                 DBIDs ids)
Returns the centroid as a NumberVector object of the specified objects stored in the given database. The objects belonging to the specified ids must be instances of NumberVector.

Type Parameters:
V - Vector type
Parameters:
relation - the relation
ids - the ids of the objects
Returns:
the centroid of the specified objects stored in the given database
Throws:
IllegalArgumentException - if the id list is empty

centroid

public static <V extends NumberVector<? extends V,?>> V centroid(Relation<? extends V> relation,
                                                                 DBIDs ids,
                                                                 BitSet dimensions)
Returns the centroid w.r.t. the dimensions specified by the given BitSet as a NumberVector object of the specified objects stored in the given database. The objects belonging to the specified ids must be instances of NumberVector.

Type Parameters:
V - Vector type
Parameters:
relation - the database storing the objects
ids - the ids of the objects
dimensions - the BitSet representing the dimensions to be considered
Returns:
the centroid of the specified objects stored in the given database w.r.t. the specified subspace
Throws:
IllegalArgumentException - if the id list is empty
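
A subspace centroid of this kind can be sketched in plain Java as follows. The sketch makes two assumptions that this page does not confirm: bit i of the BitSet selects the 0-based dimension i, and dimensions that are not selected are left at 0 in the result. SubspaceCentroidSketch is a hypothetical name.

```java
import java.util.BitSet;
import java.util.List;

class SubspaceCentroidSketch {
  /** Mean of the selected dimensions only; unselected dimensions stay 0 (assumption). */
  static double[] centroid(List<double[]> vectors, BitSet dimensions) {
    if (vectors.isEmpty()) {
      throw new IllegalArgumentException("Cannot compute the centroid of an empty collection.");
    }
    int dim = vectors.get(0).length;
    double[] centroid = new double[dim];
    // Iterate over the set bits, i.e. the selected dimensions.
    for (int d = dimensions.nextSetBit(0); d >= 0 && d < dim; d = dimensions.nextSetBit(d + 1)) {
      double sum = 0.0;
      for (double[] v : vectors) {
        sum += v[d];
      }
      centroid[d] = sum / vectors.size();
    }
    return centroid;
  }
}
```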

covarianceMatrix

public static <V extends NumberVector<? extends V,?>> Matrix covarianceMatrix(Relation<? extends V> database,
                                                                              DBIDs ids)
Determines the covariance matrix of the objects stored in the given database.

Type Parameters:
V - Vector type
Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the covariance matrix of the specified objects

covarianceMatrix

public static Matrix covarianceMatrix(Matrix data)
Determines the d x d covariance matrix of the given n x d data matrix.

Parameters:
data - the n x d data matrix
Returns:
the covariance matrix of the given data matrix.
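
To make the n x d to d x d relationship concrete, here is a minimal sketch on plain double[][] arrays instead of ELKI's Matrix class. The normalization by n - 1 (sample covariance) is an assumption; this page does not state which divisor the library uses. CovarianceSketch is a hypothetical name.

```java
class CovarianceSketch {
  /** Covariance of the columns of an n x d matrix; divides by n - 1 (assumption). */
  static double[][] covariance(double[][] data) {
    int n = data.length;
    if (n < 2) {
      throw new IllegalArgumentException("Need at least two rows.");
    }
    int d = data[0].length;
    // Column means.
    double[] mean = new double[d];
    for (double[] row : data) {
      for (int j = 0; j < d; j++) {
        mean[j] += row[j];
      }
    }
    for (int j = 0; j < d; j++) {
      mean[j] /= n;
    }
    // Accumulate products of the centered values.
    double[][] cov = new double[d][d];
    for (double[] row : data) {
      for (int i = 0; i < d; i++) {
        for (int j = 0; j < d; j++) {
          cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
        }
      }
    }
    for (int i = 0; i < d; i++) {
      for (int j = 0; j < d; j++) {
        cov[i][j] /= (n - 1);
      }
    }
    return cov;
  }
}
```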

variances

public static <V extends NumberVector<? extends V,?>> double[] variances(Relation<V> database)
Determines the variances in each dimension of all objects stored in the given database.

Parameters:
database - the database storing the objects
Returns:
the variances in each dimension of all objects stored in the given database

variances

public static <V extends NumberVector<? extends V,?>> double[] variances(Relation<V> database,
                                                                         DBIDs ids)
Determines the variances in each dimension of the specified objects stored in the given database. Equivalent to variances(database, centroid(database, ids), ids).

Parameters:
database - the database storing the objects
ids - the ids of the objects
Returns:
the variances in each dimension of the specified objects

variances

public static double[] variances(Relation<? extends NumberVector<?,?>> database,
                                 NumberVector<?,?> centroid,
                                 DBIDs ids)
Determines the variances in each dimension of the specified objects stored in the given database.

Parameters:
database - the database storing the objects
ids - the ids of the objects
centroid - the centroid or reference vector of the ids
Returns:
the variances in each dimension of the specified objects
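
The per-dimension variance around a reference vector can be sketched as below. The sketch divides by the number of objects; whether the library uses n or n - 1 is not stated on this page, so treat the divisor as an assumption. VarianceSketch is a hypothetical name, and plain arrays stand in for the relation and the centroid.

```java
class VarianceSketch {
  /** Mean squared deviation from the reference vector in each dimension. */
  static double[] variances(double[][] data, double[] reference) {
    if (data.length == 0) {
      throw new IllegalArgumentException("Need at least one row.");
    }
    double[] variances = new double[reference.length];
    for (double[] row : data) {
      for (int d = 0; d < reference.length; d++) {
        double diff = row[d] - reference[d];
        variances[d] += diff * diff;
      }
    }
    // Divide by n (assumption; a sample variance would divide by n - 1).
    for (int d = 0; d < reference.length; d++) {
      variances[d] /= data.length;
    }
    return variances;
  }
}
```

Passing the centroid of the same objects as the reference vector yields the ordinary variance; any other reference vector yields the mean squared deviation from that vector.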

computeMinMax

public static <NV extends NumberVector<NV,?>> Pair<NV,NV> computeMinMax(Relation<NV> database)
Determines the minimum and maximum values in each dimension of all objects stored in the given database.

Type Parameters:
NV - vector type
Parameters:
database - the database storing the objects
Returns:
Pair of the minimum and the maximum vector, i.e. the two corners of the bounding hyperrectangle
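
A minimal sketch of the bounding-box computation, on plain arrays rather than a Relation, might look like this. The return type (a two-row array instead of a Pair) and the name MinMaxSketch are assumptions made for the example.

```java
import java.util.Arrays;

class MinMaxSketch {
  /** Returns { min, max }, the per-dimension minimum and maximum over all rows. */
  static double[][] minMax(double[][] data) {
    if (data.length == 0) {
      throw new IllegalArgumentException("Need at least one row.");
    }
    int d = data[0].length;
    double[] min = new double[d];
    double[] max = new double[d];
    Arrays.fill(min, Double.POSITIVE_INFINITY);
    Arrays.fill(max, Double.NEGATIVE_INFINITY);
    for (double[] row : data) {
      for (int j = 0; j < d; j++) {
        min[j] = Math.min(min[j], row[j]);
        max[j] = Math.max(max[j], row[j]);
      }
    }
    return new double[][] { min, max };
  }
}
```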

quickMedian

public static <V extends NumberVector<?,?>> double quickMedian(Relation<V> relation,
                                                               ArrayDBIDs ids,
                                                               int dimension,
                                                               int numberOfSamples)
Returns the median of a data set in the given dimension by using a sampling method.

Parameters:
relation - Relation to process
ids - DBIDs to process
dimension - Dimension in which to compute the median
numberOfSamples - Number of samples to draw
Returns:
Median value
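
One way to approximate a median by sampling is sketched below. This is not necessarily the sampling scheme the library uses: the sketch draws numberOfSamples values with replacement, uses a 0-based dimension index, and takes an explicit Random for reproducibility, all of which are assumptions. QuickMedianSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

class QuickMedianSketch {
  /** Approximates the median in one dimension from a random sample of the rows. */
  static double quickMedian(double[][] data, int dimension, int numberOfSamples, Random random) {
    if (data.length == 0 || numberOfSamples < 1) {
      throw new IllegalArgumentException("Need at least one row and one sample.");
    }
    double[] sample = new double[numberOfSamples];
    for (int i = 0; i < numberOfSamples; i++) {
      // Draw a random row (with replacement) and keep its value in the given dimension.
      sample[i] = data[random.nextInt(data.length)][dimension];
    }
    Arrays.sort(sample);
    // Middle element of the sorted sample (the upper middle for an even sample size).
    return sample[numberOfSamples / 2];
  }
}
```

The result is only an estimate: a larger sample is more expensive but tends to land closer to the exact median.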

exactMedian

public static <V extends NumberVector<?,?>> double exactMedian(Relation<V> relation,
                                                               DBIDs ids,
                                                               int dimension)
Returns the median of a data set in the given dimension.

Parameters:
relation - Relation to process
ids - DBIDs to process
dimension - Dimension in which to compute the median
Returns:
Median value
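
For comparison with the sampling variant, an exact median can be sketched by sorting all values of one dimension. The 0-based dimension index and the averaging of the two middle values for an even count are assumptions; the page does not say how the library handles either. ExactMedianSketch is a hypothetical name.

```java
import java.util.Arrays;

class ExactMedianSketch {
  /** Exact median of one dimension, obtained by sorting a copy of the values. */
  static double exactMedian(double[][] data, int dimension) {
    if (data.length == 0) {
      throw new IllegalArgumentException("Need at least one row.");
    }
    double[] values = new double[data.length];
    for (int i = 0; i < data.length; i++) {
      values[i] = data[i][dimension];
    }
    Arrays.sort(values);
    int mid = values.length / 2;
    if (values.length % 2 == 1) {
      return values[mid];
    }
    // Even count: average the two middle values (assumption).
    return (values[mid - 1] + values[mid]) / 2.0;
  }
}
```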

guessLabelRepresentation

public static Relation<String> guessLabelRepresentation(Database database)
                                                 throws NoSupportedDataTypeException
Guess a potentially label-like representation.

Parameters:
database - Database
Returns:
string representation
Throws:
NoSupportedDataTypeException

guessObjectLabelRepresentation

public static Relation<String> guessObjectLabelRepresentation(Database database)
                                                       throws NoSupportedDataTypeException
Guess a potentially object label-like representation.

Parameters:
database - Database
Returns:
string representation
Throws:
NoSupportedDataTypeException

getClassLabels

public static SortedSet<ClassLabel> getClassLabels(Relation<? extends ClassLabel> database)
Retrieves all class labels within the database.

Parameters:
database - the database to be scanned for class labels
Returns:
a set comprising all class labels that are currently set in the database

getClassLabels

public static SortedSet<ClassLabel> getClassLabels(Database database)
Retrieves all class labels within the database.

Parameters:
database - the database to be scanned for class labels
Returns:
a set comprising all class labels that are currently set in the database
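
Collecting the distinct labels into a sorted set can be sketched as follows. Plain strings stand in for ClassLabel objects, and skipping objects without a label is an assumption suggested by the phrase "currently set"; ClassLabelSketch is a hypothetical name.

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

class ClassLabelSketch {
  /** Distinct non-null labels in their natural (here: lexicographic) order. */
  static SortedSet<String> distinctLabels(Collection<String> labels) {
    SortedSet<String> result = new TreeSet<String>();
    for (String label : labels) {
      if (label != null) { // objects without a label are skipped (assumption)
        result.add(label);
      }
    }
    return result;
  }
}
```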

guessObjectClass

public static <O> Class<? extends O> guessObjectClass(Relation<O> database)
Do a cheap guess at the database's object class.

Type Parameters:
O - Restriction type
Parameters:
database - Database
Returns:
Class of first object in the Database.

getBaseObjectClassExpensive

public static <O> Class<?> getBaseObjectClassExpensive(Relation<O> database)
Do a full inspection of the database to find the base object class. Note: this can be an abstract class or interface! TODO: Implement a full search for shared superclasses. Since the databases currently always use only one class, this is not yet implemented.

Type Parameters:
O - Restriction type
Parameters:
database - Database
Returns:
Superclass of all objects in the database

getObjectsByLabelMatch

public static ArrayModifiableDBIDs getObjectsByLabelMatch(Database database,
                                                          Pattern name_pattern)
Find objects by matching their labels.

Parameters:
database - Database to search in
name_pattern - Name to match against class or object label
Returns:
the DBIDs of the objects whose class or object label matches the pattern
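
Selecting objects by a label pattern can be sketched with java.util.regex alone. The sketch uses integer ids and a map of labels as stand-ins for DBIDs and the database, and it accepts partial matches via Matcher.find(); whether the library requires a full match instead is not stated here, so that choice is an assumption. LabelMatchSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class LabelMatchSketch {
  /** Ids of all entries whose label contains a match of the pattern. */
  static List<Integer> matchingIds(Map<Integer, String> labels, Pattern namePattern) {
    List<Integer> result = new ArrayList<Integer>();
    for (Map.Entry<Integer, String> entry : labels.entrySet()) {
      String label = entry.getValue();
      // find() accepts partial matches; use matches() to require a full match.
      if (label != null && namePattern.matcher(label).find()) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```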

relationUglyVectorCast

public static <V extends NumberVector<?,?>,T extends NumberVector<?,?>> Relation<V> relationUglyVectorCast(Relation<T> database)
An ugly vector type cast unavoidable in some situations due to Generics.

Type Parameters:
V - Base vector type
T - Derived vector type (is actually V, too)
Parameters:
database - Database
Returns:
Database
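
The kind of cast this method refers to can be illustrated on a List instead of a Relation. The sketch below is only an analogy under that assumption: the compiler cannot verify the conversion between two generic parameterizations, so the cast is unchecked and the caller must guarantee that T really is a V. UglyCastSketch is a hypothetical name.

```java
import java.util.List;

class UglyCastSketch {
  /** Reinterprets a List of T as a List of V; only safe if every element is a V. */
  @SuppressWarnings("unchecked")
  static <V extends Number, T extends Number> List<V> uglyCast(List<T> list) {
    // Going through the wildcard type makes the cast legal; it remains unchecked.
    return (List<V>) (List<?>) list;
  }
}
```

Because of type erasure the cast itself never fails at runtime; a wrong assumption would only surface later as a ClassCastException when an element is used as a V.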

getColumnLabel

public static <V extends FeatureVector<?,?>> String getColumnLabel(Relation<? extends V> rel,
                                                                   int col)
Get the column name or produce a generic label "Column XY".

Parameters:
rel - Relation
col - Column
Returns:
Label
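
The fallback behavior described above can be sketched as follows. The label array, the 0-based column index, and the exact format of the generic label are assumptions for illustration; ColumnLabelSketch is a hypothetical name.

```java
class ColumnLabelSketch {
  /** Returns the stored label of a column, or a generic "Column N" label if none exists. */
  static String columnLabel(String[] labels, int col) {
    if (labels != null && col >= 0 && col < labels.length && labels[col] != null) {
      return labels[col];
    }
    // No label available: fall back to a generic one.
    return "Column " + col;
  }
}
```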

Release 0.4.0 (2011-09-20_1324)