Joseph Keshet

Deep Learning

Topics
  • Introduction
  • Optimization
  • Efficient Differentiation
  • Single Neuron
  • Multilayer Neural Networks
  • Neural Networks for Vision Tasks
  • Neural Networks for Sequential Tasks
  • Training Methods
  • Impact Analysis of Training Methods
  • Data Efficiency and Pre-training
  • Resource Efficiency and Model Compression

We will explore both the theoretical foundations and practical techniques for designing, building, and analyzing deep neural networks, with a focus on supervised learning. Topics include the behavior and convergence of gradient descent and its variants, efficient methods for automatic differentiation, and the theoretical and empirical properties of multilayer networks—such as approximation capabilities, initialization strategies, generalization, and symmetry. We will also cover convolutional networks and their extensions for visual tasks, advanced training methods and their analysis, and neural architectures for sequential data, including recurrent networks, attention mechanisms, and transformers. Additionally, the course addresses strategies to improve data efficiency (e.g., pre-training, self-supervised learning) and resource efficiency (e.g., model quantization, pruning).
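As a point of reference for the first of these topics, here is a minimal sketch of gradient descent on a toy quadratic objective; the objective, step size, and iteration count are illustrative choices for this example, not course material.

```python
# Minimal gradient descent sketch on a toy objective f(w) = ||w - 1||^2.
# All constants here (step size, number of steps) are arbitrary example choices.
import numpy as np

def f(w):
    """Convex toy objective: squared distance from the all-ones vector."""
    return np.sum((w - 1.0) ** 2)

def grad_f(w):
    """Gradient of f: 2 * (w - 1)."""
    return 2.0 * (w - 1.0)

w = np.zeros(3)   # initial point
eta = 0.1         # step size (learning rate)
for t in range(100):
    w = w - eta * grad_f(w)   # gradient descent update: w <- w - eta * grad f(w)

print(f(w))  # approaches 0 as w converges to the minimizer (1, 1, 1)
```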

This course is co-taught with Prof. Daniel Soudry.

Meetings

There will be 12 meetings: the first 7-8 will be given by the lecturer (me), and the remaining meetings will be given by the course participants (see the note about working in pairs under the grade composition). Each student will present a paper and propose possible future directions. The paper presented in class and the proposed future directions will serve as the basis for the final project. Discussions will take place during all meetings, so attendance at the lectures is mandatory.

Books
Grade composition
  • Final exam: 40%
  • Assignment: 30%
  • Final project: 30%
