Date Lecture Topics
9/1 Lecture 1.1:
Course introduction
[ slides | video ]

Research and technical challenges
Course syllabus and requirements

9/3 Lecture 1.2:
Multimodal applications and datasets
[ slides | video ]

Research tasks and datasets
Team projects

9/8 Lecture 2.1:
Basic concepts: neural networks
[ slides | video ]

Language, visual and acoustic modalities
Loss functions and neural networks

9/10 Lecture 2.2:
Basic concepts: network optimization
[ slides | video ]

Gradients and backpropagation
Practical deep model optimization

9/15 Lecture 3.1:
Visual unimodal representations
[ slides | video ]

Convolutional kernels and CNNs
Residual networks and skip connections

9/17 Lecture 3.2:
Language unimodal representations
[ slides | video ]

Gated networks and LSTM
Backpropagation Through Time

9/22 Lecture 4.1:
Multimodal representation learning
[ slides | video ]

Multimodal auto-encoders
Multimodal joint representations

9/24 Lecture 4.2:
Coordinated representations
[ slides | video ]

Deep canonical correlation analysis
Non-negative matrix factorization

9/29 Lecture 5.1:
Multimodal alignment
[ slides | video ]

Explicit - dynamic time warping
Implicit - attention models

10/1 Lecture 5.2:
Alignment and representation
[ slides | video ]

Multi-head attention
Multimodal transformers

10/6 Lecture 6.1: First project assignment (live working sessions instead of lectures)
10/8 Lecture 6.2: First project assignment (live working sessions instead of lectures)
10/13 Lecture 7.1:
Alignment and translation
[ slides | video ]

Module networks
Tree-based and stack models

10/15 Lecture 7.2:
Probabilistic graphical models
[ slides | video ]

Dynamic Bayesian networks
Coupled and factorial HMMs

10/20 Lecture 8.1:
Discriminative graphical models
[ slides | video ]

Conditional random fields
Continuous and fully-connected CRFs

10/22 Lecture 8.2:
Deep generative models
[ slides | video ]

Variational auto-encoder
Generative adversarial networks

10/27 Lecture 9.1:
Reinforcement learning
[ slides | video ]

Markov decision process
Q-learning and deep Q-learning

10/29 Lecture 9.2:
Multimodal RL
[ slides | video ]

Policy gradients
Multimodal applications

11/3 Lecture 10.1:
Fusion and co-learning
[ slides | video ]

Multi-kernel learning and fusion
Few-shot learning and co-learning

11/5 Lecture 10.2:
New research directions
[ slides | video ]

Recent approaches in multimodal ML

11/10 Lecture 11.1: Mid-term project assignment (live working sessions instead of lectures)
11/12 Lecture 11.2: Mid-term project assignment (live working sessions instead of lectures)
11/17 Lecture 12.1:
Embodied Language Grounding
[ slides | video ]

Connecting Language to Action
Guest lecture by Yonatan Bisk

11/19 Lecture 12.2:
Multimodal Human-inspired Language Learning
[ slides | video ]

Grounded language learning
Guest lecture by Graham Neubig

11/24 Lecture 13.1: Thanksgiving week (no lectures)
11/26 Lecture 13.2: Thanksgiving week (no lectures)
12/1 Lecture 14.1:
Learning to connect text and images
[ slides | video ]

Discourse approaches, text & images
Guest lecture by Malihe Alikhani

12/3 Lecture 14.2:
Bias and fairness
[ slides | video ]

Computational Ethics
Guest lecture by Yulia Tsvetkov

12/8 Lecture 15.1: Final project assignment (live working sessions instead of lectures)
12/10 Lecture 15.2: Final project assignment (live working sessions instead of lectures)