Accepted paper at ICDM 2022
Sandra Gilhuber, Philipp Jahn, Yunpu Ma, Thomas Seidl
The 22nd IEEE International Conference on Data Mining (ICDM 2022),
28 November–01 December 2022, Orlando, Florida, USA
Active learning can significantly reduce the amount of labeled data needed to build strong classifiers. Existing active pseudo-labeling methods show high potential for integrating pseudo-labels into the active learning loop, but they depend heavily on the prediction accuracy of the model. In this work, we propose Verips, an algorithm that significantly outperforms existing pseudo-labeling techniques for active learning. At its core, Verips uses a pseudo-label verification mechanism: a second network, trained only on data approved by the oracle, helps to discard questionable pseudo-labels. In particular, this verifier model eliminates all pseudo-labels on which it disagrees with the actual task model. Verips overcomes the problem of poorly performing initial models, e.g., due to imbalanced or too small initial pools, where previous methods select too many incorrect pseudo-labels and recovery is slow or impossible. Moreover, Verips is particularly insensitive to the parameter choices that hamper existing approaches.
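The verification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `verified_pseudo_labels`, the use of class-probability arrays, and the `confidence` threshold for choosing pseudo-label candidates are assumptions made for illustration. Only the agreement rule (discard every pseudo-label on which the verifier and the task model disagree) comes from the text.

```python
import numpy as np


def verified_pseudo_labels(task_probs, verifier_probs, confidence=0.9):
    """Return (indices, labels) of pseudo-labels that pass verification.

    task_probs, verifier_probs: arrays of shape (n_samples, n_classes)
    holding class probabilities from the task model and from the verifier
    (the second network, trained only on oracle-approved data).
    confidence: hypothetical threshold for proposing a pseudo-label;
    this candidate rule is an assumption, not taken from the paper.
    """
    task_probs = np.asarray(task_probs, dtype=float)
    verifier_probs = np.asarray(verifier_probs, dtype=float)

    task_pred = task_probs.argmax(axis=1)
    verifier_pred = verifier_probs.argmax(axis=1)

    # Assumed candidate rule: the task model must be confident enough.
    candidate = task_probs.max(axis=1) >= confidence
    # Verification rule from the text: drop labels the verifier disagrees with.
    agree = task_pred == verifier_pred

    indices = np.flatnonzero(candidate & agree)
    return indices, task_pred[indices]


# Toy unlabeled pool of four samples with two classes.
task = [[0.95, 0.05], [0.10, 0.90], [0.60, 0.40], [0.02, 0.98]]
verifier = [[0.80, 0.20], [0.70, 0.30], [0.90, 0.10], [0.40, 0.60]]
idx, labels = verified_pseudo_labels(task, verifier)
print(idx.tolist(), labels.tolist())  # [0, 3] [0, 1]
```

In this toy pool, sample 1 is a confident candidate but is discarded because the verifier predicts a different class, and sample 2 is never proposed because the task model is not confident enough.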