Joseph Keshet

Courses

I teach three courses: Speech Processing with Deep Learning; Deep Learning; and Transformers and Large Language Models. These courses align with my research interests in speech recognition and understanding, conversational AI, and deep learning.

Course Details

Speech Processing with Deep Learning

Course ECE 460747 / DDS 970201
Topics
  • Introduction to the speech signal
  • Mechanical model of speech generation
  • Speech signal representation
  • Single word recognition: Keyword spotting
  • From single words to large-vocabulary continuous speech recognition
  • Acoustic Modeling and Connectionist Temporal Classification (CTC)
  • Modern deep network architectures for sequential data
  • The CPC loss and self-supervised learning
  • Speech synthesis and text-to-speech (TTS)
  • Conversational systems

Speech recognition and synthesis have become foundational components in the development of modern artificial intelligence, bridging the gap between human communication and machine interaction. This course aims to provide a comprehensive introduction to the core principles of speech signal processing and modeling, essential for understanding how machines perceive and generate human language.


We begin with an exploration of the fundamental nature of the speech signal, examining how humans produce speech and the elements that characterize it—such as pitch, formants, and prosody. We delve into the mechanics behind speech generation, using mathematical models to represent the speech process, notably through Linear Predictive Coding (LPC) and autoregressive models. As speech signals are often complex and variable, we explore various techniques to transform these signals into machine-interpretable formats using spectral representations like STFT, Mel-spectrum, and MFCCs.
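
To make these representations concrete, here is a minimal, illustrative sketch (not taken from the course materials) that computes an STFT magnitude, a Mel spectrogram, and MFCCs with PyTorch and torchaudio; the file name and all parameter values are placeholders.

    import torch
    import torchaudio

    # Load a waveform (placeholder file name) and mix down to mono.
    waveform, sr = torchaudio.load("example.wav")        # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)

    # STFT magnitude: a time-frequency view of the signal.
    stft = torch.stft(waveform.squeeze(0), n_fft=400, hop_length=160,
                      window=torch.hann_window(400), return_complex=True)
    magnitude = stft.abs()                               # (freq_bins, frames)

    # Mel spectrogram: STFT energies pooled through a perceptual Mel filter bank.
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sr, n_fft=400, hop_length=160, n_mels=80)(waveform)

    # MFCCs: log-Mel energies decorrelated with a DCT.
    mfcc = torchaudio.transforms.MFCC(
        sample_rate=sr, n_mfcc=13,
        melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 80})(waveform)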

Progressing to practical applications, the course covers keyword spotting and modern automatic speech recognition (ASR). With the advances brought by deep learning, we explore modern architectures such as RNNs, LSTMs, and transformers, along with the Connectionist Temporal Classification (CTC) loss function, which have revolutionized speech processing by leveraging neural networks for sequence modeling and self-supervised learning.
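
As a rough illustration of how CTC is used in practice (a sketch, not course material), PyTorch's built-in CTC loss scores frame-level network outputs against unaligned label sequences; all shapes and values below are arbitrary.

    import torch
    import torch.nn as nn

    T, N, C = 100, 4, 30                                 # frames, batch size, labels incl. blank
    log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
    targets = torch.randint(1, C, (N, 20), dtype=torch.long)   # unaligned transcripts
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 20, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0)                            # index 0 is reserved for the blank symbol
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()                                      # gradients flow back to the acoustic model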

The course culminates with state-of-the-art speech synthesis techniques, including autoregressive models like WaveNet, and GAN-based approaches, leading to more natural and human-like text-to-speech systems. Advanced topics will also cover emerging trends in speech-language models that operate without explicit text, showcasing the future trajectory of research in this field.
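
For intuition only, the sketch below (not course material) shows the generic autoregressive sampling loop that WaveNet-style synthesizers rely on: each new audio sample is drawn conditioned on everything generated so far. The model here is a random stand-in, not a trained network.

    import torch

    def generate(model, length):
        samples = torch.zeros(1, 1, dtype=torch.long)    # seed sample
        for _ in range(length):
            logits = model(samples)[:, -1]               # predict the next sample
            nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
            samples = torch.cat([samples, nxt], dim=1)   # feed it back in
        return samples                                   # e.g., mu-law encoded audio

    # A dummy "model" returning random logits over 256 mu-law levels per step.
    dummy = lambda s: torch.randn(s.shape[0], s.shape[1], 256)
    audio = generate(dummy, length=160)                  # ~10 ms of audio at 16 kHz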

Books
  • Quatieri, Discrete-Time Speech Signal Processing, 2002
  • Rabiner & Schafer, Introduction to Digital Speech Processing, 2007
Grade composition
  • Assignments: 50%
  • Exam: 50%

Deep Learning

Course ECE 460217
Topics
  • Introduction
  • Optimization
  • Efficient Differentiation
  • Single Neuron
  • Multilayer Neural Networks
  • Neural Networks for Vision Tasks
  • Neural Networks for Sequential Tasks
  • Training Methods
  • Impact Analysis of Training Methods
  • Data Efficiency and Pre-training
  • Resource Efficiency and Model Compression

We will explore both the theoretical foundations and practical techniques for designing, building, and analyzing deep neural networks, with a focus on supervised learning. Topics include the behavior and convergence of gradient descent and its variants, efficient methods for automatic differentiation, and the theoretical and empirical properties of multilayer networks—such as approximation capabilities, initialization strategies, generalization, and symmetry. We will also cover convolutional networks and their extensions for visual tasks, advanced training methods and their analysis, and neural architectures for sequential data, including recurrent networks, attention mechanisms, and transformers. Additionally, the course addresses strategies to improve data efficiency (e.g., pre-training, self-supervised learning) and resource efficiency (e.g., model quantization, pruning).
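
For illustration only, the short loop below shows one way several of these themes meet in code: reverse-mode automatic differentiation driving gradient descent on a toy multilayer network. The architecture, data, and hyperparameters are arbitrary and not tied to the course assignments.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(128, 10), torch.randn(128, 1)     # toy supervised data
    for step in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)                      # forward pass
        loss.backward()                                  # reverse-mode automatic differentiation
        optimizer.step()                                 # gradient-descent update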


This course is taught with Prof. Daniel Soudry.

Meetings

There will be 12 meetings: the first 7-8 will be given by the lecturer (me), and the remaining meetings will be given by the course participants (see the note about working in pairs under the grade composition). Each student will present a paper and propose possible future directions. The paper presented in class and the proposed directions will form the basis of the final project. Discussions will take place during all meetings, so attendance at the lectures is mandatory.

Grade composition
  • Final exam: 40%
  • Assignment: 30%
  • Final project: 30%

Transformers and Large Language Models

Course ECE 0480011
Topics
  • Introduction
  • RNN, LSTM, sequence-to-sequence, attention mechanism, LAS
  • Transformers as database query, self-attention mechanism
  • Components of Transformers and their usage
  • Transformers implementation in NLP
  • Transformers Implementation in Speech
  • Transformers Implementation in Vision
  • Prompts
  • In-context learning
  • The alignment problem and its limitations
  • Tuning Transformers using Reinforcement learning from human feedback (RLHF)
  • Flash Networks; RWKV; interpretability

The course begins with an in-depth exploration of the core ideas that gave rise to the Transformer architecture, unpacking its structure, intuition, and evolution. We’ll then dive into the most influential research in the field—both theoretical advances and practical innovations—before examining how Transformers are applied across key domains, including natural language processing, speech recognition, and computer vision.
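
For a concrete picture before the first lecture, here is a minimal, illustrative sketch of scaled dot-product self-attention, the operation at the heart of the Transformer; all dimensions below are placeholders.

    import math
    import torch

    def self_attention(x, W_q, W_k, W_v):
        """x: (seq_len, d_model); W_*: (d_model, d_k) projection matrices."""
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / math.sqrt(K.shape[-1])        # pairwise similarities
        weights = torch.softmax(scores, dim=-1)          # each row sums to 1
        return weights @ V                               # weighted sum of the values

    d_model, d_k, seq_len = 512, 64, 10
    x = torch.randn(seq_len, d_model)
    W_q, W_k, W_v = (torch.randn(d_model, d_k) for _ in range(3))
    out = self_attention(x, W_q, W_k, W_v)               # (seq_len, d_k)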


In the second half of the course, students will take a more active role by presenting selected papers from recent literature and developing a mini-project inspired by their assigned work, gaining hands-on experience with cutting-edge developments in the field.

Meetings

After approximately 7-8 introductory lectures by the lecturer, the course will shift focus to student-led presentations based on assigned projects. Each student will be assigned one or more recent papers in the field, which will serve as the foundation for their project work.

The project will consist of the following components:

  • Reading and understanding the assigned paper(s)

  • Implementing any proposed algorithms, if applicable

  • Investigating potential extensions or improvements to the original work

  • Preparing a presentation summarizing the work (the slides are subject to the lecturer’s approval)

  • Delivering the presentation to the class

 

Grade composition
  • Quality of the prepared presentation and the in-class lecture: 50%
  • Quality of the work that goes beyond the paper’s content: 50%