Accepted paper at LWDA 2020
Grace - Limiting the Number of Grid Cells for Clustering High-Dimensional Data
30.07.2020
Authors
Anna Beer, Daniyal Kazempour, Julian Busch, Alexander Tekles, Thomas Seidl
Lernen. Wissen. Daten. Analysen (LWDA2020),
Bonn, Germany, 09–11 September 2020
Abstract
Using grid-based clustering algorithms on high-dimensional data has the advantage of being able to summarize datapoints into cells, but usually produces an exponential number of grid cells. In this paper we introduce Grace (using a Grid which is adaptive for clustering), a clustering algorithm which limits the number of cells produced depending
on the number of points in the dataset. A non-equidistant grid is constructed based on the distribution of points in one-dimensional projections of the data. A density threshold is automatically deduced from the data and used to detect dense cells, which are later combined to clusters. The adaptive grid structure makes an ecient but still accurate
clustering of multidimensional data possible. Experiments with synthetic as well as real-world data sets of various size and dimensionality confirm these properties.