Some examples for completely parameterized calls of ELKI

Here we provide a few examples of how to call ELKI for some algorithms. From these, it should be straightforward to extend to other algorithms and data sets. Throughout the examples, we assume that the executable jar archive elki.jar is located in a directory reachable from your console as mypath, and that you have downloaded the example data file from http://www.dbs.ifi.lmu.de/research/KDD/ELKI/datasets/example/exampledata.txt to a location reachable from your console as mydata/exampledata.txt.

Example: DBSCAN

Basic Call:

java -jar mypath/elki.jar -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10
This runs the DBSCAN algorithm on the data set with the parameters epsilon=20 and minpts=10. The clustering result is simply printed to the console.
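
When such calls are issued from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch in Python, not part of ELKI: the helper name build_elki_command and its arguments are assumptions made for illustration, and only the options shown in the call above are used.

```python
import shlex


def build_elki_command(jar_path, algorithm, data_file, params=None,
                       out_dir=None, flags=()):
    """Assemble the argument list for an ELKI call (hypothetical helper)."""
    cmd = ["java", "-jar", jar_path,
           "-algorithm", algorithm,
           "-dbc.in", data_file]
    # Options that take a value, e.g. {"dbscan.epsilon": 20}
    for name, value in (params or {}).items():
        cmd += ["-" + name, str(value)]
    # Optional output directory
    if out_dir is not None:
        cmd += ["-out", out_dir]
    # Flags without a value, e.g. ("verbose", "time")
    cmd += ["-" + flag for flag in flags]
    return cmd


cmd = build_elki_command(
    "mypath/elki.jar", "clustering.DBSCAN", "mydata/exampledata.txt",
    params={"dbscan.epsilon": 20, "dbscan.minpts": 10},
)
print(shlex.join(cmd))

# To actually run the call (requires Java and elki.jar at the given path):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than as one string avoids quoting problems if a path contains spaces.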

Call with specified output file/directory:

java -jar mypath/elki.jar -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10 -out myresults/DBSCANeps20min10
Same as before, but this time a directory for collecting the output is explicitly specified. This results in one file per cluster found by DBSCAN within the specified directory myresults/DBSCANeps20min10. Each file starts with metadata and the parameters used, followed by the data points contained in the cluster. For example, in this case, the file for cluster 1 starts like this:
###############################################################
# Settings and meta information:
# db size = 2600
# db dimensionality = 3
#
# KDDTask:
# -algorithm de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN
# -dbc de.lmu.ifi.dbs.elki.database.connection.FileBasedDatabaseConnection
# -description null
# -h false
# -help false
# -norm null
# -normUndo false
# -resulthandler de.lmu.ifi.dbs.elki.result.ResultWriter
#
# DBSCAN:
# -algorithm.distancefunction de.lmu.ifi.dbs.elki.distance.distancefunction.EuclideanDistanceFunction
# -dbscan.epsilon 20
# -dbscan.minpts 10
# -time false
# -verbose false
#
# FileBasedDatabaseConnection:
# -dbc.classLabelClass de.lmu.ifi.dbs.elki.data.SimpleClassLabel
# -dbc.classLabelIndex null
# -dbc.database de.lmu.ifi.dbs.elki.database.SequentialDatabase
# -dbc.externalIDIndex null
# -dbc.in mydata/exampledata.txt
# -dbc.parser de.lmu.ifi.dbs.elki.parser.DoubleVectorLabelParser
#
# DoubleVectorLabelParser:
# -parser.classLabelIndex -1
#
# ResultWriter:
# -out myresults/DBSCANeps20min10
# -out.gzip false
# -out.silentoverwrite false
#
###############################################################
# Group class: de.lmu.ifi.dbs.elki.data.cluster.Cluster
# Serialization class: de.lmu.ifi.dbs.elki.data.cluster.Cluster
# Name: Cluster
# Noise flag: false
# Size: 18
# Model class: de.lmu.ifi.dbs.elki.data.model.ClusterModel
###############################################################
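
To check afterwards which parameters produced a given result file, the header can be read back by a small script. The following Python sketch assumes that the header looks exactly like the excerpt above: every header line starts with #, a section heading ends with a colon, and a setting has the form "-name value". The function name parse_settings_header is hypothetical and not part of ELKI.

```python
def parse_settings_header(lines):
    """Collect '# -name value' lines under their '# Section:' heading."""
    settings = {}
    section = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if not stripped.startswith("#"):
            break  # first data line: the header is over
        text = stripped.lstrip("#").strip()
        if text.startswith("-"):
            # A setting such as "-dbscan.epsilon 20"
            name, _, value = text.partition(" ")
            if section is not None:
                settings[section][name] = value
        elif text.endswith(":"):
            # A section heading such as "DBSCAN:"
            section = text[:-1]
            settings[section] = {}
    return settings


# Shortened copy of the header shown above
sample = """\
###############################################################
# Settings and meta information:
# db size = 2600
#
# DBSCAN:
# -dbscan.epsilon 20
# -dbscan.minpts 10
#
# ResultWriter:
# -out myresults/DBSCANeps20min10
###############################################################
# Name: Cluster
# Size: 18
"""

settings = parse_settings_header(sample.splitlines())
print(settings["DBSCAN"])
```

Lines such as "# db size = 2600" or "# Size: 18" match neither pattern and are ignored by this sketch.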

Most of the parameters shown here are set implicitly with default values or not used (default false or null). For example, it is possible to additionally request verbose messages during the computation by setting the flag -verbose or to request the time used by the core computations by setting the flag -time. This is possible for all algorithms.

Another option not used so far is normalizing the data. Via the option -norm, a normalization procedure can be applied prior to the analysis performed by the algorithm. The option expects a class as its value; ELKI provides, for example, AttributeWiseMinMaxNormalization. Users can add other normalization procedures by implementing the interface de.lmu.ifi.dbs.elki.normalization.Normalization. Setting the flag -normUndo reverts the normalization before the result is written; otherwise, the resulting files list the normalized data vectors.

Example call requesting time and verbose messages and using a normalization:

java -jar mypath/elki.jar -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 0.02 -dbscan.minpts 10 -out myresults/DBSCANeps20min10 -verbose -time -norm AttributeWiseMinMaxNormalization -normUndo
Note that the value for dbscan.epsilon is decreased considerably to suit the normalized data (the AttributeWiseMinMaxNormalization normalizes all attribute values to the range [0, 1]).
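
Why epsilon has to shrink can be illustrated with a small sketch. The Python code below applies the textbook attribute-wise min-max formula (x - min) / (max - min); it is not ELKI's implementation, and the three data points are made up for illustration rather than taken from exampledata.txt.

```python
import math


def min_max_normalize(rows):
    """Scale every attribute (column) separately to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        # A constant attribute (hi == lo) is mapped to 0.0
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up 3-dimensional points
points = [
    [0.0, 100.0, 50.0],
    [500.0, 600.0, 1050.0],
    [1000.0, 1100.0, 550.0],
]
normalized = min_max_normalize(points)

# Euclidean distance between the first two points, before and after
raw = math.dist(points[0], points[1])
scaled = math.dist(normalized[0], normalized[1])
print(raw, scaled)
```

After normalization no two points in d dimensions can be farther apart than the square root of d, so an epsilon chosen for the raw attribute ranges would typically put all points into each other's neighborhood.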

Different algorithms

To become acquainted with an unknown algorithm, try the option -description. For example, here, we request a description of how to use the algorithm clustering.correlation.FourC:
java -jar mypath/elki.jar -description de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.FourC
The output describes the general parameters for any KDDTask and additionally the parameters required for FourC.

Note that here we gave the full name of the class FourC (i.e., including the complete package name), while we omitted the prefix de.lmu.ifi.dbs.elki.algorithm. for clustering.DBSCAN above. The reason for this difference is as follows:

If a class name is expected as a parameter value, usually a restriction class is also known, i.e., an interface or a class which must be implemented or extended by the specified parameter value. For example, the parameter -algorithm expects a class implementing the interface de.lmu.ifi.dbs.elki.algorithm.Algorithm.

If the specified class cannot be initialized by the given name, the initialization tries the same class name using as prefix the package of the restriction class. Thus, for -algorithm, clustering.DBSCAN is completed to de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN. For the parameter -description, in contrast, we have to specify the complete class name in the first place. If, on the other hand, we wanted to use FourC as the algorithm, the value clustering.correlation.FourC would suffice for the parameter -algorithm.
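
The fallback described above can be sketched as follows. This Python code only simulates the lookup against a fixed set of class names taken from this page; it is not how ELKI itself is implemented, and the function name resolve_class_name is hypothetical.

```python
# Package prefix of the restriction class for the parameter -algorithm
ALGORITHM_PACKAGE = "de.lmu.ifi.dbs.elki.algorithm."

# Stand-in for the classes that can be loaded (two names from this page)
KNOWN_CLASSES = {
    "de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN",
    "de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.FourC",
}


def resolve_class_name(name, restriction_package, known_classes):
    """Try the name as given, then with the restriction package as prefix."""
    if name in known_classes:
        return name
    candidate = restriction_package + name
    if candidate in known_classes:
        return candidate
    raise LookupError("cannot resolve class name: " + name)


# The short name is completed with the package prefix
print(resolve_class_name("clustering.DBSCAN", ALGORITHM_PACKAGE, KNOWN_CLASSES))

# A complete name is found at the first attempt
print(resolve_class_name(
    "de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.FourC",
    ALGORITHM_PACKAGE, KNOWN_CLASSES))
```

A name that matches neither way, for example a misspelled class, ends in an error in this sketch.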

The restriction class and already available implementations (suitable as possible values for the parameter) are listed in the parameter description. See, e.g., the description of -algorithm (as provided after using -description as above or using -help):

-algorithm 
   Algorithm to run.
   Implementing de.lmu.ifi.dbs.elki.algorithm.Algorithm
   Known classes (default package de.lmu.ifi.dbs.elki.algorithm.):
   -> APRIORI
   -> DependencyDerivator
   -> KNNDistanceOrder
   -> KNNJoin
   -> clustering.DBSCAN
   -> clustering.DeLiClu
   -> clustering.EM
   -> clustering.KMeans
   -> clustering.OPTICS
   -> clustering.SLINK
   -> clustering.SNNClustering
   -> clustering.ByLabelClustering
   -> clustering.TrivialAllInOne
   -> clustering.TrivialAllNoise
   -> clustering.correlation.CASH
   -> clustering.correlation.COPAC
   -> clustering.correlation.ERiC
   -> clustering.correlation.FourC
   -> clustering.correlation.ORCLUS
   -> clustering.subspace.CLIQUE
   -> clustering.subspace.DiSH
   -> clustering.subspace.PreDeCon
   -> clustering.subspace.PROCLUS