Big Data Time Series Clustering and Classification
Type: Master's Thesis
Industrial partner: Audi AG
Prerequisites:
• Strong knowledge of Classification and Clustering with Bayesian/Stochastic approaches
• Programming skills in Python and Spark (PySpark)
• Interest in developing industry-relevant solutions
• Background in Big Data Analytics is beneficial
Description:
Audi wants to determine how many different drivers use a given vehicle. To this end, unsupervised clustering shall be investigated on multidimensional live vehicle data, which contain seasonal effects and outliers and may be inaccurate or incomplete. Because the data are high-dimensional, the features (car data) must be selected carefully so that meaningful clusters can be found. A big data set is provided for deriving a suitable inference method that distinguishes the drivers of a vehicle. A labelled sample set is available to validate the clustering approach.
In addition, a classification approach shall be implemented to differentiate between drivers within a data set.
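To make the clustering and validation steps described above more concrete, the following is a minimal Python sketch and not the method prescribed for the thesis. It assumes that each trip has already been reduced to a few numeric features; the feature names, the synthetic data, and the helper functions make_synthetic_trips and cluster_trips are hypothetical. As a stand-in for a Bayesian clustering approach it uses scikit-learn's BayesianGaussianMixture with a Dirichlet process prior, which can leave unneeded components nearly empty, and it compares the result with the known labels via the adjusted Rand index. The temporal structure of the data is not modelled here.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler


def make_synthetic_trips(n_per_driver=200, seed=0):
    """Generate hypothetical per-trip features for three synthetic drivers."""
    rng = np.random.default_rng(seed)
    # Assumed features per trip: mean speed (km/h), mean braking (m/s^2),
    # seat position (cm). These are illustrative, not real vehicle signals.
    driver_means = np.array([
        [45.0, 1.0, 10.0],
        [70.0, 2.5, 25.0],
        [95.0, 4.0, 40.0],
    ])
    scales = np.array([4.0, 0.3, 1.5])
    features = np.vstack([
        rng.normal(mean, scales, size=(n_per_driver, 3)) for mean in driver_means
    ])
    labels = np.repeat(np.arange(len(driver_means)), n_per_driver)
    return features, labels


def cluster_trips(features, max_drivers=6, seed=0):
    """Cluster trips with a Dirichlet process Gaussian mixture.

    max_drivers is only an upper bound on the number of components;
    the prior encourages the model to use fewer of them.
    """
    scaled = StandardScaler().fit_transform(features)
    model = BayesianGaussianMixture(
        n_components=max_drivers,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=0.01,
        max_iter=500,
        random_state=seed,
    )
    return model.fit_predict(scaled)


features, true_drivers = make_synthetic_trips()
assigned = cluster_trips(features)

# Number of clusters that actually received trips: a rough driver-count estimate.
n_drivers_estimated = len(np.unique(assigned))

# Validation against the labelled sample: 1.0 means perfect agreement,
# values near 0 mean agreement no better than chance.
ari = adjusted_rand_score(true_drivers, assigned)
print(f"estimated drivers: {n_drivers_estimated}, adjusted Rand index: {ari:.2f}")
```

On real vehicle data the per-trip features would first have to be extracted and selected, for example with PySpark, and missing or inaccurate values would need handling before a model of this kind is applied.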
Contact:
• Prof. Dr. Matthias Schubert, schubert@dbs.ifi.lmu.de
• Johannes Ziegmann, johannes.ziegmann@audi.de
References:
1. Elizabeth Ann Maharaj, Pierpaolo D'Urso, Jorge Caiado: “Time Series Clustering and Classification” (book), 2021
2. Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach
3. Time Series Clustering with an EM algorithm for Mixtures of Linear Gaussian State Space Models
4. Model-based clustering with Hidden Markov Model regression for time series with regime changes
5. Sensitivity Analysis for Driver Energy Prediction with Environmental Features and Naturalistic Data