Joseph Keshet

Courses

I teach three courses: Speech Processing with Deep Learning; Deep Learning; and Transformers and Large Language Models. These courses align with my research interests in speech recognition and understanding, conversational AI, and deep learning.

Course Details

Speech Processing with Deep Learning

Course ECE 460747 / DDS 970201
Topics
  • Introduction to the speech signal
  • Mechanical model of speech generation
  • Speech signal representation
  • Single word recognition: Keyword spotting
  • From single words to large-vocabulary continuous speech recognition
  • Acoustic Modeling and Connectionist Temporal Classification (CTC)
  • Modern deep network architectures for sequential data
  • The CPC loss and self-supervised learning
  • Speech synthesis and text-to-speech (TTS)
  • Conversational systems

Speech recognition and synthesis have become foundational components in the development of modern artificial intelligence, bridging the gap between human communication and machine interaction. This course aims to provide a comprehensive introduction to the core principles of speech signal processing and modeling, essential for understanding how machines perceive and generate human language.


We begin with an exploration of the fundamental nature of the speech signal, examining how humans produce speech and the elements that characterize it—such as pitch, formants, and prosody. We delve into the mechanics behind speech generation, using mathematical models to represent the speech process, notably through Linear Predictive Coding (LPC) and autoregressive models. As speech signals are often complex and variable, we explore various techniques to transform these signals into machine-interpretable formats using spectral representations like STFT, Mel-spectrum, and MFCCs.
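
To make these representations concrete, here is a minimal, illustrative sketch (not taken from the course materials) that computes an STFT magnitude, a Mel spectrogram, and MFCCs with PyTorch and torchaudio; the file name and all parameter values are placeholders.

    import torch
    import torchaudio

    # Load a waveform (placeholder file name) and mix down to mono.
    waveform, sr = torchaudio.load("example.wav")        # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)

    # STFT magnitude: a time-frequency view of the signal.
    stft = torch.stft(waveform.squeeze(0), n_fft=400, hop_length=160,
                      window=torch.hann_window(400), return_complex=True)
    magnitude = stft.abs()                               # (freq_bins, frames)

    # Mel spectrogram: STFT energies pooled through a perceptual Mel filter bank.
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sr, n_fft=400, hop_length=160, n_mels=80)(waveform)

    # MFCCs: log-Mel energies decorrelated with a DCT.
    mfcc = torchaudio.transforms.MFCC(
        sample_rate=sr, n_mfcc=13,
        melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 80})(waveform)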

Progressing to practical applications, the course covers keyword spotting and modern automatic speech recognition (ASR). With the advances brought by deep learning, we explore modern architectures such as RNNs, LSTMs, and transformers, along with the Connectionist Temporal Classification (CTC) loss function, which have revolutionized speech processing by leveraging neural networks for sequence modeling and self-supervised learning.
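
As a rough illustration of how CTC is used in practice (a sketch, not course material), PyTorch's built-in CTC loss scores frame-level network outputs against unaligned label sequences; all shapes and values below are arbitrary.

    import torch
    import torch.nn as nn

    T, N, C = 100, 4, 30                                 # frames, batch size, labels incl. blank
    log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
    targets = torch.randint(1, C, (N, 20), dtype=torch.long)   # unaligned transcripts
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 20, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0)                            # index 0 is reserved for the blank symbol
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()                                      # gradients flow back to the acoustic model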

The course culminates with state-of-the-art speech synthesis techniques, including autoregressive models like WaveNet, and GAN-based approaches, leading to more natural and human-like text-to-speech systems. Advanced topics will also cover emerging trends in speech-language models that operate without explicit text, showcasing the future trajectory of research in this field.
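
For intuition only, the sketch below (not course material) shows the generic autoregressive sampling loop that WaveNet-style synthesizers rely on: each new audio sample is drawn conditioned on everything generated so far. The model here is a random stand-in, not a trained network.

    import torch

    def generate(model, length):
        samples = torch.zeros(1, 1, dtype=torch.long)    # seed sample
        for _ in range(length):
            logits = model(samples)[:, -1]               # predict the next sample
            nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
            samples = torch.cat([samples, nxt], dim=1)   # feed it back in
        return samples                                   # e.g., mu-law encoded audio

    # A dummy "model" returning random logits over 256 mu-law levels per step.
    dummy = lambda s: torch.randn(s.shape[0], s.shape[1], 256)
    audio = generate(dummy, length=160)                  # ~10 ms of audio at 16 kHz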

Books
  • Quatieri, Discrete-Time Speech Signal Processing, 2002
  • Rabiner & Schafer, Introduction to Digital Speech Processing, 2007
Grade composition
  • Assignments: 50%
  • Exam: 50%

Deep Learning

Course ECE 460217
Topics
  • Introduction
  • Optimization
  • Efficient Differentiation
  • Single Neuron
  • Multilayer Neural Networks
  • Neural Networks for Vision Tasks
  • Neural Networks for Sequential Tasks
  • Training Methods
  • Impact Analysis of Training Methods
  • Data Efficiency and Pre-training
  • Resource Efficiency and Model Compression

We will explore both the theoretical foundations and practical techniques for designing, building, and analyzing deep neural networks, with a focus on supervised learning. Topics include the behavior and convergence of gradient descent and its variants, efficient methods for automatic differentiation, and the theoretical and empirical properties of multilayer networks—such as approximation capabilities, initialization strategies, generalization, and symmetry. We will also cover convolutional networks and their extensions for visual tasks, advanced training methods and their analysis, and neural architectures for sequential data, including recurrent networks, attention mechanisms, and transformers. Additionally, the course addresses strategies to improve data efficiency (e.g., pre-training, self-supervised learning) and resource efficiency (e.g., model quantization, pruning).
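
For illustration only, the short loop below shows one way several of these themes meet in code: reverse-mode automatic differentiation driving gradient descent on a toy multilayer network. The architecture, data, and hyperparameters are arbitrary and not tied to the course assignments.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(128, 10), torch.randn(128, 1)     # toy supervised data
    for step in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)                      # forward pass
        loss.backward()                                  # reverse-mode automatic differentiation
        optimizer.step()                                 # gradient-descent update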


This course is taught with Prof. Daniel Soudry.

Meetings

There will be 12 meetings: the first 7-8 will be given by the lecturer (me), and the remaining meetings will be given by the course participants (see the note about working in pairs under the grade composition). Each student will present a paper and propose possible future directions. The paper presented in class and the proposed directions will form the basis of the final project. Discussions will take place during all meetings, so attendance at the lectures is mandatory.

Grade composition
  • Final exam: 40%
  • Assignment: 30%
  • Final project: 30%

Transformers and Large Language Models

Course ECE 0480011
Topics
  • Introduction
  • RNN, LSTM, sequence-to-sequence, attention mechanism, LAS
  • Transformers as database query, self-attention mechanism
  • Components of Transformers and their usage
  • Transformers implementation in NLP
  • Transformers Implementation in Speech
  • Transformers Implementation in Vision
  • Prompts
  • In-context learning
  • The alignment problem and its limitations
  • Tuning Transformers using Reinforcement learning from human feedback (RLHF)
  • Flash Networks; RWKV; interpretability

The course begins with an in-depth exploration of the core ideas that gave rise to the Transformer architecture, unpacking its structure, intuition, and evolution. We’ll then dive into the most influential research in the field—both theoretical advances and practical innovations—before examining how Transformers are applied across key domains, including natural language processing, speech recognition, and computer vision.
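
For a concrete picture before the first lecture, here is a minimal, illustrative sketch of scaled dot-product self-attention, the operation at the heart of the Transformer; all dimensions below are placeholders.

    import math
    import torch

    def self_attention(x, W_q, W_k, W_v):
        """x: (seq_len, d_model); W_*: (d_model, d_k) projection matrices."""
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / math.sqrt(K.shape[-1])        # pairwise similarities
        weights = torch.softmax(scores, dim=-1)          # each row sums to 1
        return weights @ V                               # weighted sum of the values

    d_model, d_k, seq_len = 512, 64, 10
    x = torch.randn(seq_len, d_model)
    W_q, W_k, W_v = (torch.randn(d_model, d_k) for _ in range(3))
    out = self_attention(x, W_q, W_k, W_v)               # (seq_len, d_k)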


In the second half of the course, students will take a more active role by presenting selected papers from recent literature and developing a mini-project inspired by their assigned work, gaining hands-on experience with cutting-edge developments in the field.

Meetings

After approximately 7-8 introductory lectures by the lecturer, the course will shift focus to student-led presentations based on assigned projects. Each student will be assigned one or more recent papers in the field, which will serve as the foundation for their project work.

The project will consist of the following components:

  • Reading and understanding the assigned paper(s)

  • Implementing any proposed algorithms, if applicable

  • Investigating potential extensions or improvements to the original work

  • Preparing a presentation summarizing the work (the slides are subject to the lecturer’s approval)

  • Delivering the presentation to the class

 

Grade composition
  • Quality of the prepared presentation and the in-class lecture: 50%
  • Quality of the work that goes beyond the paper’s content: 50%