Lehr- und Forschungseinheit für Datenbanksysteme
print


Breadcrumb Navigation


Content

Accepted paper at LWDA 2020

Grace - Limiting the Number of Grid Cells for Clustering High-Dimensional Data

30.07.2020

Authors

Anna Beer, Daniyal Kazempour, Julian Busch, Alexander Tekles, Thomas Seidl

lwda2020_logo


Lernen. Wissen. Daten. Analysen (LWDA2020),
Bonn, Germany, 09–11 September 2020

 

Abstract

Using grid-based clustering algorithms on high-dimensional data has the advantage of being able to summarize datapoints into cells, but usually produces an exponential number of grid cells. In this paper we introduce Grace (using a Grid which is adaptive for clustering), a clustering algorithm which limits the number of cells produced depending
on the number of points in the dataset. A non-equidistant grid is constructed based on the distribution of points in one-dimensional projections of the data. A density threshold is automatically deduced from the data and used to detect dense cells, which are later combined to clusters. The adaptive grid structure makes an ecient but still accurate
clustering of multidimensional data possible. Experiments with synthetic as well as real-world data sets of various size and dimensionality confirm these properties.