Seminar: Connecting Language to Vision (Summer Semester 2025)

News

Organisation

Volume: 12 ECTS (Doppelpraktikum, i.e. double practical course)
Lecturer: Prof. Dr. Thomas Seidl
Contact: Tanveer Hannan
Audience: The course is aimed at master's students in Informatics, Media Informatics, and Statistics and Data Science.
Registration: Moodle

Time and Locations

All times are s.t. (sine tempore). Please consult uni2work for an up-to-date schedule!

When	Where	Start
Mon, 14:00 - 18:00 h	Oettingenstr. 67, 131	14.04.2025

Content

This seminar explores the intersection of computer vision and natural language processing (NLP). Topics include AI agents, video captioning, video retrieval, video question answering (QA), and query-based object and action localization. It also examines large reasoning language models for video understanding.

Goal:
Students will gain expertise in Vision-Language Modeling (VLM) research, covering problem formulation, literature review, model development, experimental design, and evaluation. The course provides insights into emerging research areas and thesis opportunities.

Format:

Block seminar: Mandatory attendance at all sessions.
Key meetings: Kick-off, final presentation, and two additional sessions.

Prerequisites:

Machine Learning / Deep Learning
Computer Vision and/or NLP with Deep Learning
Python (PyTorch) & Linux

Students who do not meet these requirements need the instructor's explicit permission to remain in the course.
