
O - Object type
D - Distance type

@Title(value="Distance Histogram")
@Description(value="Computes a histogram over the distances occurring in the data set.")
public class DistanceStatisticsWithClasses<O,D extends NumberDistance<D,?>>
extends AbstractDistanceBasedAlgorithm<O,D,CollectionResult<DoubleVector>>
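The description above ("a histogram over the distances occurring in the data set") can be illustrated with a small stand-alone sketch. It does not use ELKI: the class `DistanceHistogramSketch`, its methods, the Euclidean distance, and the equal-width binning over the exact value range are all assumptions made for illustration, not the implementation of this class.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class DistanceHistogramSketch {
  /** Euclidean distance between two points of equal dimensionality (assumed distance). */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Counts all pairwise distances into numbin equal-width bins over the exact [min, max] range. */
  static long[] histogram(double[][] data, int numbin) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    // First pass: exact value range over all unordered pairs.
    for (int i = 0; i < data.length; i++) {
      for (int j = i + 1; j < data.length; j++) {
        double d = distance(data[i], data[j]);
        min = Math.min(min, d);
        max = Math.max(max, d);
      }
    }
    long[] bins = new long[numbin];
    double width = (max - min) / numbin;
    // Second pass: assign each distance to its bin; the maximum goes into the last bin.
    for (int i = 0; i < data.length; i++) {
      for (int j = i + 1; j < data.length; j++) {
        double d = distance(data[i], data[j]);
        int bin = width > 0 ? (int) ((d - min) / width) : 0;
        bins[Math.min(bin, numbin - 1)]++;
      }
    }
    return bins;
  }

  public static void main(String[] args) {
    double[][] data = { {0, 0}, {3, 4}, {6, 8}, {0, 1} };
    System.out.println(Arrays.toString(histogram(data, 3))); // prints [1, 3, 2]
  }
}
```

The two passes make the cost of an exact value range visible: every pair is visited once to find the range and once more to fill the bins.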
| Modifier and Type | Class and Description |
|---|---|
| static class | DistanceStatisticsWithClasses.Parameterizer<O,D extends NumberDistance<D,?>>: Parameterization class. |
| Modifier and Type | Field and Description |
|---|---|
| private boolean | exact: Compute exactly (slower). |
| static OptionID | EXACT_ID: Flag to compute exact value range for binning. |
| static OptionID | HISTOGRAM_BINS_ID: Option to configure the number of bins to use. |
| private static Logging | LOG: The logger for this class. |
| private int | numbin: Number of bins to use in sampling. |
| private boolean | sampling: Sampling flag. |
| static OptionID | SAMPLING_ID: Flag to enable sampling. |
Fields inherited from class AbstractDistanceBasedAlgorithm: DISTANCE_FUNCTION_ID

| Constructor and Description |
|---|
| DistanceStatisticsWithClasses(DistanceFunction<? super O,D> distanceFunction, int numbins, boolean exact, boolean sampling): Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| private DoubleMinMax | exactMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc): Compute the exact maximum and minimum. |
| TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger(): Get the (STATIC) logger for this class. |
| HistogramResult<DoubleVector> | run(Database database): Runs the algorithm. |
| private DoubleMinMax | sampleMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc): Estimate minimum and maximum via sampling. |
| private static void | shrinkHeap(TreeSet<DoubleDBIDPair> hotset, int k): Shrink the heap of "hot" (extreme) items. |
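As a rough illustration of what shrinking a sorted set of "hot" items to a target size can look like, here is a minimal sketch. It is not the body of `shrinkHeap`: `ShrinkHeapSketch`, `ScoredId` (a stand-in for `DoubleDBIDPair`), the ordering, and the choice to discard from the high end are all assumptions, since the documentation only states that the set is shrunk to size `k`.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Hypothetical stand-alone sketch; not the ELKI implementation. */
final class ShrinkHeapSketch {
  /** Hypothetical stand-in for DoubleDBIDPair: a score plus an object id. */
  static final class ScoredId {
    final double score;
    final int id;

    ScoredId(double score, int id) {
      this.score = score;
      this.id = id;
    }
  }

  /** Orders by score, then by id so that equal scores are not collapsed by the TreeSet. */
  static final Comparator<ScoredId> ORDER =
      Comparator.comparingDouble((ScoredId p) -> p.score).thenComparingInt(p -> p.id);

  /** Removes the largest entries until at most k remain (assumed policy). */
  static void shrink(TreeSet<ScoredId> hotset, int k) {
    while (hotset.size() > k) {
      hotset.pollLast();
    }
  }

  public static void main(String[] args) {
    TreeSet<ScoredId> hot = new TreeSet<>(ORDER);
    hot.add(new ScoredId(0.5, 1));
    hot.add(new ScoredId(0.1, 2));
    hot.add(new ScoredId(0.9, 3));
    shrink(hot, 2);
    System.out.println(hot.size() + " items kept, largest score " + hot.last().score);
  }
}
```

A `TreeSet` keeps its elements sorted, so trimming from one end costs O(log n) per removed item and needs no separate sort.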
Methods inherited from class AbstractDistanceBasedAlgorithm: getDistanceFunction, makeParameterDistanceFunction

private static final Logging LOG
public static final OptionID EXACT_ID
public static final OptionID SAMPLING_ID
public static final OptionID HISTOGRAM_BINS_ID
private int numbin
private boolean sampling
private boolean exact
public DistanceStatisticsWithClasses(DistanceFunction<? super O,D> distanceFunction, int numbins, boolean exact, boolean sampling)
distanceFunction - Distance function to use
numbins - Number of bins
exact - Exactness flag
sampling - Sampling flag

public HistogramResult<DoubleVector> run(Database database)
Description copied from interface: Algorithm
run in interface Algorithm
run in class AbstractAlgorithm<CollectionResult<DoubleVector>>
database - the database to run the algorithm on

private DoubleMinMax sampleMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc)
relation - Relation to process
distFunc - Distance function to use

private DoubleMinMax exactMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc)
relation - Relation to process
distFunc - Distance function

private static void shrinkHeap(TreeSet<DoubleDBIDPair> hotset, int k)
hotset - Set of hot items
k - target size

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<CollectionResult<DoubleVector>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
getLogger in class AbstractAlgorithm<CollectionResult<DoubleVector>>
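
The trade-off between `exactMinMax` and `sampleMinMax` documented above can be sketched without ELKI. The class `MinMaxSketch`, the method signatures, the Euclidean distance, and the plain uniform subsample are assumptions made for illustration; the documentation does not describe the sampling strategy this class actually uses.

```java
import java.util.Random;

/** Hypothetical stand-alone sketch; not the ELKI implementation. */
final class MinMaxSketch {
  /** Euclidean distance between two points of equal dimensionality (assumed distance). */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Exact range over all unordered pairs, O(n^2) distance computations. Returns {min, max}. */
  static double[] exactMinMax(double[][] data) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < data.length; i++) {
      for (int j = i + 1; j < data.length; j++) {
        double d = distance(data[i], data[j]);
        min = Math.min(min, d);
        max = Math.max(max, d);
      }
    }
    return new double[] { min, max };
  }

  /** Estimated range from the pairs inside a uniform random subsample of m objects. */
  static double[] sampleMinMax(double[][] data, int m, long seed) {
    Random rnd = new Random(seed);
    int n = data.length;
    m = Math.min(m, n);
    int[] idx = new int[n];
    for (int i = 0; i < n; i++) {
      idx[i] = i;
    }
    double[][] sample = new double[m][];
    // Partial Fisher-Yates shuffle: picks m distinct indices uniformly at random.
    for (int i = 0; i < m; i++) {
      int j = i + rnd.nextInt(n - i);
      int t = idx[i];
      idx[i] = idx[j];
      idx[j] = t;
      sample[i] = data[idx[i]];
    }
    return exactMinMax(sample);
  }

  public static void main(String[] args) {
    double[][] data = { {0}, {1}, {3}, {7}, {15} };
    double[] exact = exactMinMax(data);
    double[] est = sampleMinMax(data, 3, 42L);
    System.out.println("exact: " + exact[0] + " .. " + exact[1]);
    System.out.println("sampled: " + est[0] + " .. " + est[1]);
  }
}
```

Because the sampled pairs are a subset of all pairs, the estimated range always lies inside the exact one; it can only be too narrow, never too wide. A histogram built on a sampled range therefore has to handle distances that fall outside the estimated bounds.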