
Published at 22nd International Conference on Scientific and Statistical Database Management (SSDBM)
Conference Date: June 30th to July 2nd of 2010
Conference Location: Heidelberg, Germany
Conference Title: 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010
Conference Chairs: Andreas Reuter, Michael Gertz
Conference Co-Chairs: Tony Hey, Bertram Ludäscher
Bernecker, T., Emrich, T., Graf, F., Kriegel, H.-P., Kröger, P., Renz, M., Schubert, E., and Zimek, A.: Subspace Similarity Search Using the Ideas of Ranking and Top-k Retrieval. In Proceedings of the 26th International Conference on Data Engineering (ICDE) Workshop on Ranking in Databases (DBRank), Long Beach, CA, 2010.
DOI: 10.1109/ICDEW.2010.5452771.
Bernecker, T., Emrich, T., Graf, F., Kriegel, H.-P., Kröger, P., Renz, M., Schubert, E., and Zimek, A.: Subspace Similarity Search: Efficient k-NN Queries in Arbitrary Subspaces. In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.
M. Gertz and B. Ludäscher (Eds.): SSDBM 2010, LNCS 6187, pp. 555–564, 2010. (c) Springer-Verlag Berlin Heidelberg 2010
DOI: 10.1007/978-3-642-13818-8_38.
This is the authors' version of the work. It is posted here by permission of Springer for your personal use. Not for redistribution.
Paper
@inproceedings{Bernecker2010,
title = {Subspace Similarity Search: Efficient k-NN Queries in Arbitrary Subspaces},
author = {Thomas Bernecker and Tobias Emrich and Franz Graf and Hans-Peter Kriegel and Peer Kröger and Matthias Renz and Erich Schubert and Arthur Zimek},
booktitle = {Scientific and Statistical Database Management},
publisher = {Springer Berlin / Heidelberg},
issn = {0302-9743 (Print) 1611-3349 (Online)},
isbn = {978-3-642-13817-1},
url = {http://www.springerlink.com/content/1232010q607v0113/},
volume = {6187},
series = {Lecture Notes in Computer Science},
year = {2010},
pages = {555--564},
subject_collection = {Computer Science},
doi = {10.1007/978-3-642-13818-8_38}
}
The top-down approach can be tested with ELKI, release 0.5.5 - here is an example call:
java -cp elki-0.5.5.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -dbc.in inputdatafile -db.index tree.spatial.rstarvariants.rstar.RStarTreeFactory -treeindex.pagesize 8192 -spatial.bulkstrategy SortTileRecursiveBulkSplit -algorithm benchmark.KNNBenchmarkAlgorithm -knnbench.k 20 -algorithm.distancefunction subspace.SubspaceEuclideanDistanceFunction -distance.dims 1,2,7,11 -resulthandler DiscardResultHandler -evaluator NoAutomaticEvaluation -time
The runtime will be reported as follows:
de.lmu.ifi.dbs.elki.algorithm.benchmark.KNNBenchmarkAlgorithm runtime : 15928 milliseconds.
To get the linear scan runtime, remove the "-db.index" parameters to run without an index.
Results will depend a lot on the data set, dimensionality, query dimensionality and on the page size! As a rule of thumb, the runtime of an experiment should be on the magnitude of minutes. You should also take the average over runs with different subspaces: "-distance.dims".
Please, consider the hints for benchmarking with ELKI.