Transformers and Large Language Models
The course begins with an in-depth exploration of the core ideas that gave rise to the Transformer architecture, unpacking its structure, intuition, and evolution. We’ll then dive into the most influential research in the field—both theoretical advances and practical innovations—before examining how Transformers are applied across key domains, including natural language processing, speech recognition, and computer vision.
In the second half of the course, students will take a more active role by presenting selected papers from recent literature and developing a mini-project inspired by their assigned work, gaining hands-on experience with cutting-edge developments in the field.
Meetings
After approximately 7–8 introductory lectures, the course will shift its focus to student-led presentations based on assigned projects. Each student will be assigned one or more recent papers in the field, which will serve as the foundation for their project work.
The project will consist of the following components:
- Reading and understanding the assigned paper(s)
- Implementing any proposed algorithms, if applicable
- Investigating potential extensions or improvements to the original work
- Preparing a presentation summarizing the work (the slides are subject to the lecturer’s approval)
- Delivering the presentation to the class
Grade composition
- 50% of the grade will be determined by the quality of the presentation materials and the lecture delivered to the class
- 50% of the grade will be determined by the quality of the work that goes beyond the paper’s content (implementation, extensions, or improvements)
Transformers and Large Language Models
Transformers are powerful deep learning architectures that have revolutionized how machines understand and generate sequential data. From enabling breakthroughs in language models like ChatGPT to powering cutting-edge systems in speech recognition and computer vision, Transformers have become a cornerstone of modern AI. This course explores their design, capabilities, and impact across a range of domains.