Database Systems Group (Lehr- und Forschungseinheit für Datenbanksysteme)

Seminar: Connecting Language to Vision (Summer Semester 2025)

News

Organisation

  • Volume: 12 ECTS (double practical course, "Doppelpraktikum")
  • Lecturer: Prof. Dr. Thomas Seidl
  • Contact: Tanveer Hannan
  • Audience: The course is intended for Master's students in Informatics, Media Informatics, or Statistics and Data Science
  • Registration: Moodle

Time and Locations

All times are s.t. (sine tempore, i.e. sessions start exactly on the hour). Please consult uni2work for the up-to-date schedule.

  • When: Mon, 14:00 - 18:00 h
  • Where: Oettingenstr. 67, Room 131
  • Start: 14.04.2025

Content

This seminar explores the intersection of computer vision and natural language processing (NLP). Topics include AI agents, video captioning, retrieval, question answering (QA), and query-based localization of objects and actions. The seminar also examines large reasoning language models for video understanding.

Goal:
Students will gain expertise in Vision-Language Modeling (VLM) research across the full research cycle: problem formulation, literature review, model development, experimental design, and evaluation. The course also offers insight into emerging research areas and potential thesis topics.

Format:

  • Block seminar: attendance at all sessions is mandatory.
  • Key meetings: kick-off, final presentation, and two additional sessions.

Prerequisites:

  • Machine Learning / Deep Learning
  • Computer Vision and/or NLP with Deep Learning
  • Python (PyTorch) & Linux

Students who do not meet these requirements need explicit permission from the instructor to remain in the course.