Here we provide a few examples of using ELKI with selected algorithms. Hopefully, from here you can easily extend to other algorithms and data sets.
Throughout all examples, we assume you have the executable jar-archive elki.jar in some directory locally reachable from your console as mypath, and that you have downloaded the example data file from http://www.dbs.ifi.lmu.de/research/KDD/ELKI/datasets/example/exampledata.txt to a location reachable from your console as mydata/exampledata.txt.
java -jar mypath/elki.jar -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10

This requests the algorithm DBSCAN to cluster the data set using the DBSCAN parameters epsilon=20 and minpts=10. The clustering result is simply printed to the console.
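To illustrate what these two parameters mean, the following sketch (plain illustrative Java, not ELKI code) shows DBSCAN's core-point condition: a point is a core point if its epsilon-neighborhood contains at least minpts points (including the point itself).

```java
import java.util.List;

public class CorePointCheck {

    // Returns true if `p` is a core point: at least `minPts` points of the
    // data set (including p itself) lie within distance `epsilon` of p.
    static boolean isCorePoint(double[] p, List<double[]> data,
                               double epsilon, int minPts) {
        int neighbors = 0;
        for (double[] q : data) {
            if (euclidean(p, q) <= epsilon) {
                neighbors++;
            }
        }
        return neighbors >= minPts;
    }

    // Euclidean distance, the default distance function used by DBSCAN in ELKI.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Points satisfying this condition seed and grow clusters; points within epsilon of a core point but not core themselves become border points, and everything else is noise.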
java -jar mypath/elki.jar -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10 -out myresults/DBSCANeps20min10

Same as before, but this time a directory for collecting the output is explicitly specified. This results in one file per cluster found by DBSCAN within the specified directory myresults/DBSCANeps20min10.
Each file starts with metadata and the parameters used before listing the data points contained in the cluster.
For example, in this case, the file for cluster 1 starts like:
###############################################################
# Settings and meta information:
# db size = 2600
# db dimensionality = 3
#
# KDDTask:
# -algorithm de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN
# -dbc de.lmu.ifi.dbs.elki.database.connection.FileBasedDatabaseConnection
# -description null
# -h false
# -help false
# -norm null
# -normUndo false
# -resulthandler de.lmu.ifi.dbs.elki.result.ResultWriter
#
# DBSCAN:
# -algorithm.distancefunction de.lmu.ifi.dbs.elki.distance.distancefunction.EuclideanDistanceFunction
# -dbscan.epsilon 20
# -dbscan.minpts 10
# -time false
# -verbose false
#
# FileBasedDatabaseConnection:
# -dbc.classLabelClass de.lmu.ifi.dbs.elki.data.SimpleClassLabel
# -dbc.classLabelIndex null
# -dbc.database de.lmu.ifi.dbs.elki.database.SequentialDatabase
# -dbc.externalIDIndex null
# -dbc.in mydata/exampledata.txt
# -dbc.parser de.lmu.ifi.dbs.elki.parser.DoubleVectorLabelParser
#
# DoubleVectorLabelParser:
# -parser.classLabelIndex -1
#
# ResultWriter:
# -out myresults/DBSCANeps20min10
# -out.gzip false
# -out.silentoverwrite false
###############################################################
# Group class: de.lmu.ifi.dbs.elki.data.cluster.Cluster
# Serialization class: de.lmu.ifi.dbs.elki.data.cluster.Cluster
# Name: Cluster
# Noise flag: false
# Size: 18
# Model class: de.lmu.ifi.dbs.elki.data.model.ClusterModel
###############################################################
Most of the parameters shown here are set implicitly to default values or are not used (default false or null). For example, it is possible to additionally request verbose messages during the computation by setting the flag -verbose, or to request the time used by the core computations by setting the flag -time. This is possible for all algorithms.
So far, we have also not used the possibility of normalizing the data. Via the option -norm, a normalization procedure can be performed prior to the analysis performed by the algorithm. As option value, a class name is expected; ELKI provides, for example, AttributeWiseMinMaxNormalization. Other normalization procedures can easily be provided by any user by implementing the interface de.lmu.ifi.dbs.elki.normalization.Normalization. Setting the flag -normUndo will revert the normalization before writing the result; otherwise, the resulting files will list the normalized data vectors.
java -jar mypath/elki.jar -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 0.02 -dbscan.minpts 10 -out myresults/DBSCANeps20min10 -verbose -time -norm AttributeWiseMinMaxNormalization -normUndo

Note that the value for dbscan.epsilon is decreased considerably to suit the normalized data (the AttributeWiseMinMaxNormalization normalizes all attribute values to the range [0:1]).
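Conceptually, the attribute-wise min-max normalization maps each attribute linearly to [0:1], and -normUndo applies the inverse mapping before the results are written. A minimal sketch of the idea in plain Java (not the actual ELKI implementation):

```java
public class MinMaxSketch {

    // Linearly map value v from the attribute's observed range
    // [min, max] to [0, 1], as attribute-wise min-max normalization does.
    static double normalize(double v, double min, double max) {
        return (v - min) / (max - min);
    }

    // Inverse mapping: recover the original value from its normalized form.
    // Conceptually, this is what -normUndo does before results are written.
    static double undo(double v, double min, double max) {
        return v * (max - min) + min;
    }
}
```

This also explains why dbscan.epsilon had to shrink: after normalization, all distances live on the [0:1] scale per attribute, so an epsilon of 20 would cover the entire data space.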
A description of the required and available parameters for an algorithm can be requested via the option -description. For example, here, we request a description of how to use the algorithm clustering.correlation.FourC:

java -jar mypath/elki.jar -description de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.FourC

The output describes the general parameters for any KDDTask as well as the parameters required for FourC. Note that here we gave the full name of the class FourC (i.e., including the complete package name), while we omitted the prefix de.lmu.ifi.dbs.elki.algorithm. for clustering.DBSCAN above.
The reason for this difference is as follows: if a class name is expected as a parameter value, usually a restriction class is also known, i.e., an interface or class which the specified parameter value must implement or extend. For example, the restriction class for -description is de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable, while the restriction class for -algorithm is de.lmu.ifi.dbs.elki.algorithm.Algorithm. If the given value is not a valid class name on its own, ELKI tries to complete it with the package prefix of the restriction class. Thus, as a value for -algorithm, clustering.DBSCAN (which is not a valid class name per se) will be automatically completed with the prefix de.lmu.ifi.dbs.elki.algorithm. to yield a valid class name. As a value for -description, however, clustering.correlation.FourC would be automatically completed with the prefix de.lmu.ifi.dbs.elki.utilities.optionhandling., which does not result in a valid class name. Hence, for -description, we are required to specify the complete class name in the first place. On the other hand, if we would like to use FourC as the algorithm, the specification clustering.correlation.FourC would suffice as the parameter value for -algorithm.
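The completion behavior described above can be sketched in plain Java (a simplified illustration, not ELKI's actual option-handling code): try the given name as-is, and if that fails, retry with the restriction class's package prefix prepended.

```java
public class ClassCompletion {

    // Resolve `name` as a class; if it is not a valid class name on its own,
    // retry with the package prefix of the restriction class prepended.
    static Class<?> resolve(String name, Class<?> restriction) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            String prefix = restriction.getPackageName();
            try {
                return Class.forName(prefix + "." + name);
            } catch (ClassNotFoundException e2) {
                throw new IllegalArgumentException(
                    "No valid class name: " + name, e2);
            }
        }
    }
}
```

With Algorithm as the restriction class, the short name clustering.DBSCAN resolves; with Parameterizable as the restriction class, only the fully qualified name does.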
The restriction class and the already available implementations (suitable as possible values for the parameter) are listed in the parameter description. See, e.g., the description of -algorithm (as provided after using -description as above, or after using -help):
-algorithm
Algorithm to run.
Implementing de.lmu.ifi.dbs.elki.algorithm.Algorithm
Known classes (default package de.lmu.ifi.dbs.elki.algorithm.):
-> APRIORI
-> DependencyDerivator
-> KNNDistanceOrder
-> KNNJoin
-> clustering.DBSCAN
-> clustering.DeLiClu
-> clustering.EM
-> clustering.KMeans
-> clustering.OPTICS
-> clustering.SLINK
-> clustering.SNNClustering
-> clustering.ByLabelClustering
-> clustering.TrivialAllInOne
-> clustering.TrivialAllNoise
-> clustering.correlation.CASH
-> clustering.correlation.COPAC
-> clustering.correlation.ERiC
-> clustering.correlation.FourC
-> clustering.correlation.ORCLUS
-> clustering.subspace.CLIQUE
-> clustering.subspace.DiSH
-> clustering.subspace.PreDeCon
-> clustering.subspace.PROCLUS