Transformers and Large Language Models
The course begins with an in-depth exploration of the core ideas that gave rise to the Transformer architecture, unpacking its structure, intuition, and evolution. We’ll then dive into the most influential research in the field—both theoretical advances and practical innovations—before examining how Transformers are applied across key domains, including natural language processing, speech recognition, and computer vision.
In the second half of the course, students will take a more active role by presenting selected papers from recent literature and developing a mini-project inspired by their assigned work, gaining hands-on experience with cutting-edge developments in the field.
Meetings
After approximately 7–8 introductory lectures, the course will shift its focus to student-led presentations based on assigned projects. Each student will be assigned one or more recent papers in the field, which will serve as the foundation for their project work.
The project will consist of the following components:
- Reading and understanding the assigned paper(s)
- Implementing any proposed algorithms, if applicable
- Investigating potential extensions or improvements to the original work
- Preparing a presentation summarizing the work (the slides are subject to the lecturer’s approval)
- Delivering the presentation to the class
Grade composition
- 50% of the grade will be determined by the quality of the presentation materials and the lecture delivered to the class
- 50% of the grade will be determined by the quality of the work that goes beyond the paper’s content (implementation, extensions, or improvements)
Transformers and Large Language Models
Transformers are powerful deep learning architectures that have revolutionized how machines understand and generate sequential data. From enabling breakthroughs in language models like ChatGPT to powering cutting-edge systems in speech recognition and computer vision, Transformers have become a cornerstone of modern AI. This course explores their design, capabilities, and impact across a range of domains.