Seminar: Connecting Language to Vision (Summer Semester 2025)
Organisation
- Volume: 12 ECTS (double lab course, "Doppelpraktikum")
- Lecturer: Prof. Dr. Thomas Seidl
- Contact: Tanveer Hannan
- Audience: Master's students in Informatics, Media Informatics, and Statistics and Data Science
- Registration: Moodle
Time and Location
All times are s.t. (sine tempore). Please consult uni2work for an up-to-date schedule!
| When | Where | Start |
|---|---|---|
| Mon, 14:00 - 18:00 | Oettingenstr. 67, 131 | 14.04.2025 |
Content
This seminar explores the intersection of computer vision and natural language processing (NLP). Topics include AI agents, video captioning, retrieval, question answering (QA), and query-based localization of objects and actions. The seminar also examines large language models with reasoning capabilities for video understanding.
Goal:
Students will gain experience in Vision-Language Modeling (VLM) research across the full research cycle: problem formulation, literature review, model development, experimental design, and evaluation. The course also offers insight into emerging research areas and possible thesis topics.
Format:
- Block seminar: attendance at all sessions is mandatory.
- Key meetings: Kick-off, final presentation, and two additional sessions.
Prerequisites:
- Machine Learning / Deep Learning
- Computer Vision and/or NLP with Deep Learning
- Python (PyTorch) and Linux
Students who do not meet these prerequisites need the explicit permission of the instructor to remain in the course.