Accepted paper at EMNLP 2020
An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing
08.10.2020
Authors
Martin Schmitt, Sahand Sharifzadeh, Volker Tresp, Hinrich Schütze
The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020),
16–20 November 2020, Virtual
Abstract
Knowledge graph (KG) schemas can vary greatly from one domain to another. Therefore, supervised approaches to graph-to-text generation and text-to-graph knowledge extraction (semantic parsing) will always suffer from a shortage of domain-specific parallel graph-text data, while adapting a model trained on a different domain is often impossible due to little or no overlap in entities and relations. This situation calls for an approach that (1) does not need large amounts of annotated data and (2) is easy to adapt to new KG schemas. To this end, we present the first approach to fully unsupervised text generation from KGs and KG generation from text. Inspired by recent work on unsupervised machine translation, we serialize a KG as a sequence of facts and frame both tasks as sequence translation. By means of a shared sequence encoder and decoder, our model learns to map both graphs and texts into a joint semantic space and thus generalizes over different surface representations with the same meaning. We evaluate our approach on WebNLG v2.1 and a new benchmark leveraging scene graphs from Visual Genome. Our system outperforms strong baselines for both text↔graph tasks without any manual adaptation from one dataset to the other. In additional experiments, we investigate the impact of using different unsupervised objectives.
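The serialization step mentioned in the abstract can be sketched as follows. This is a minimal illustration of linearizing a set of (subject, predicate, object) triples into a token sequence and recovering them again; the delimiter tokens `<S>`, `<P>`, `<O>` and the helper names are assumptions for illustration, not the paper's exact format.

```python
def serialize_graph(triples):
    """Linearize (subject, predicate, object) triples into one flat sequence,
    so a KG can be fed to a standard sequence encoder."""
    tokens = []
    for subj, pred, obj in triples:
        # Illustrative delimiter tokens marking the role of each span.
        tokens += ["<S>", subj, "<P>", pred, "<O>", obj]
    return " ".join(tokens)


def parse_sequence(sequence):
    """Inverse of serialize_graph: recover triples from a serialized sequence,
    as a semantic parser's output would be read back into a graph."""
    triples = []
    for fact in sequence.split("<S>")[1:]:
        subj, rest = fact.split("<P>")
        pred, obj = rest.split("<O>")
        triples.append((subj.strip(), pred.strip(), obj.strip()))
    return triples


# Example with multi-word entities (WebNLG-style data contains many):
graph = [
    ("Alan Bean", "occupation", "astronaut"),
    ("Alan Bean", "birthPlace", "Wheeler Texas"),
]
sequence = serialize_graph(graph)
```

With both directions expressed over the same token sequences, graph-to-text and text-to-graph can share one encoder and decoder, which is the joint setup the abstract describes.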