Lehr- und Forschungseinheit für Datenbanksysteme

Breadcrumb Navigation


Accepted paper at LWDA 2020

Grace - Limiting the Number of Grid Cells for Clustering High-Dimensional Data



Anna Beer, Daniyal Kazempour, Julian Busch, Alexander Tekles, Thomas Seidl


Lernen. Wissen. Daten. Analysen (LWDA2020),
Bonn, Germany, 09–11 September 2020



Using grid-based clustering algorithms on high-dimensional data has the advantage of being able to summarize datapoints into cells, but usually produces an exponential number of grid cells. In this paper we introduce Grace (using a Grid which is adaptive for clustering), a clustering algorithm which limits the number of cells produced depending
on the number of points in the dataset. A non-equidistant grid is constructed based on the distribution of points in one-dimensional projections of the data. A density threshold is automatically deduced from the data and used to detect dense cells, which are later combined to clusters. The adaptive grid structure makes an ecient but still accurate
clustering of multidimensional data possible. Experiments with synthetic as well as real-world data sets of various size and dimensionality confirm these properties.