Deep Learning
We will explore both the theoretical foundations and practical techniques for designing, building, and analyzing deep neural networks, with a focus on supervised learning. Topics include the behavior and convergence of gradient descent and its variants, efficient methods for automatic differentiation, and the theoretical and empirical properties of multilayer networks—such as approximation capabilities, initialization strategies, generalization, and symmetry. We will also cover convolutional networks and their extensions for visual tasks, advanced training methods and their analysis, and neural architectures for sequential data, including recurrent networks, attention mechanisms, and transformers. Additionally, the course addresses strategies to improve data efficiency (e.g., pre-training, self-supervised learning) and resource efficiency (e.g., model quantization, pruning).
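To give a flavor of the first topic, here is a minimal sketch (not course material) of plain full-batch gradient descent on a tiny supervised problem in PyTorch; the synthetic data, one-layer model, learning rate, and step count are illustrative assumptions, not something prescribed by the course.

```python
import torch

torch.manual_seed(0)

# Synthetic supervised data: y = 2x + 1 plus a little noise
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1 + 0.1 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)   # a one-layer "network"
lr = 0.1                        # step size for gradient descent

for step in range(200):
    loss = torch.nn.functional.mse_loss(model(x), y)
    model.zero_grad()
    loss.backward()             # automatic differentiation
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad    # plain (full-batch) gradient descent update

print(model.weight.item(), model.bias.item())  # should approach 2 and 1
```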
This course is taught with Prof. Daniel Soudry.
Meetings
There will be 12 meetings. The first 7-8 will be given by the lecturer (me); the remaining meetings will be given by the course participants (see the note about working in pairs under the grade composition). Each student will present a paper and propose possible future directions. The paper presented in class, together with the proposed directions, will serve as the basis for the final project. Discussions will take place during all meetings, so attendance at the lectures is mandatory.
Books
- Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alex Smola, 2020
- Machine Learning with PyTorch and Scikit-Learn, Sebastian Raschka, Packt, 2022
- Understanding Deep Learning, Simon J.D. Prince, MIT Press, 2023
- Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, MIT Press, 2016
Grade composition
- Final exam: 40%
- Assignment: 30%
- Final project: 30%