Lehr- und Forschungseinheit für Datenbanksysteme

Gengyuan Zhang

Research Assistant


Ludwig-Maximilians-Universität München
Lehrstuhl für Datenbanksysteme und Data Mining
Oettingenstraße 67
80538 München

Room: E U103
Phone: +49-89-2180-9186


I received my bachelor's degree from Zhejiang University in China in 2018 and my master's degree from the Technical University of Munich in 2021, after which I joined LMU as a Ph.D. student in computer science. My research focuses on multimodal learning and reasoning with video/image and language data. If you are interested in a master's thesis or a research project on these topics, please get in touch with me via zhang@dbs.ifi.lmu.de.

Research Interests

  • Multimodal reasoning
  • Video understanding


Teaching

  1. WS23/24: Machine Learning
  2. WS23/24: Master Seminar: Foundation Models in AI
  3. WS23/24: Master Seminar: Knowledge Graph with Machine Learning
  4. SS23:       Machine Learning
  5. SS23:       Master Seminar: Foundation Models in AI
  6. SS23:       Master Seminar: Knowledge Graph with Machine Learning
  7. SS22:       Machine Learning


Publications

  1. Gengyuan Zhang, Jisen Ren, Jindong Gu, and Volker Tresp. Multi-event video-text retrieval. In
    Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22113–22123, 2023.
  2. Zhen Han∗, Gengyuan Zhang∗, Yunpu Ma, and Volker Tresp. Time-dependent entity embedding is
    not all you need: A re-evaluation of temporal knowledge graph completion models under a unified
    framework. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,
    pages 8104–8118. Association for Computational Linguistics, November 2021.
  3. Gengyuan Zhang, Yurui Zhang, Kerui Zhang, and Volker Tresp. Can vision-language models be a
    good guesser? Exploring VLMs for times and location reasoning. arXiv preprint arXiv:2307.06166, 2023.
  4. Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao
    Qin, Volker Tresp, and Philip Torr. A systematic survey of prompt engineering on vision-language
    foundation models. arXiv preprint arXiv:2307.12980, 2023.
  5. Yao Zhang, Haokun Chen, Ahmed Frikha, Yezi Yang, Denis Krompass, Gengyuan Zhang, Jindong
    Gu, and Volker Tresp. CL-CrossVQA: A continual learning benchmark for cross-domain visual question
    answering. arXiv preprint arXiv:2211.10567, 2022.