11-777 MMML | Schedule

Date	Lecture	Topics
9/1	Lecture 1.1: Course introduction [ slides | video ]	Research and technical challenges; Course syllabus and requirements
9/3	Lecture 1.2: Multimodal applications and datasets [ slides | video ]	Research tasks and datasets; Team projects
9/8	Lecture 2.1: Basic concepts: neural networks [ slides | video ]	Language, visual and acoustic; Loss functions and neural networks
9/10	Lecture 2.2: Basic concepts: network optimization [ slides | video ]	Gradients and backpropagation; Practical deep model optimization
9/15	Lecture 3.1: Visual unimodal representations [ slides | video ]	Convolutional kernels and CNNs; Residual network and skip connection
9/17	Lecture 3.2: Language unimodal representations [ slides | video ]	Gated networks and LSTM; Backpropagation Through Time
9/22	Lecture 4.1: Multimodal representation learning [ slides | video ]	Multimodal auto-encoders; Multimodal joint representations
9/24	Lecture 4.2: Coordinated representations [ slides | video ]	Deep canonical correlation analysis; Non-negative matrix factorization
9/29	Lecture 5.1: Multimodal alignment [ slides | video ]	Explicit - dynamic time warping; Implicit - attention models
10/1	Lecture 5.2: Alignment and representation [ slides | video ]	Multi-head attention; Multimodal transformers
10/6	Lecture 6.1: First project assignment (live working sessions instead of lectures)
10/8	Lecture 6.2: First project assignment (live working sessions instead of lectures)
10/13	Lecture 7.1: Alignment and translation [ slides | video ]	Module networks; Tree-based and stack models
10/15	Lecture 7.2: Probabilistic graphical models [ slides | video ]	Dynamic Bayesian networks; Coupled and factor HMMs
10/20	Lecture 8.1: Discriminative graphical models [ slides | video ]	Conditional random fields; Continuous and fully-connected CRFs
10/22	Lecture 8.2: Deep Generative Models [ slides | video ]	Variational auto-encoder; Generative adversarial networks
10/27	Lecture 9.1: Reinforcement learning [ slides | video ]	Markov decision process; Q learning and Deep Q learning
10/29	Lecture 9.2: Multimodal RL [ slides | video ]	Policy gradients; Multimodal applications
11/3	Lecture 10.1: Fusion and co-learning [ slides | video ]	Multi-kernel learning and fusion; Few shot learning and co-learning
11/5	Lecture 10.2: New research directions [ slides | video ]	Recent approaches in multimodal ML
11/10	Lecture 11.1: Mid-term project assignment (live working sessions instead of lectures)
11/12	Lecture 11.2: Mid-term project assignment (live working sessions instead of lectures)
11/17	Lecture 12.1: Embodied Language Grounding [ slides | video ]	Connecting language to action; Guest lecture by Yonatan Bisk
11/19	Lecture 12.2: Multimodal Human-inspired Language Learning [ slides | video ]	Grounded language learning; Guest lecture by Graham Neubig
11/24	Lecture 13.1: Thanksgiving week (no lectures)
11/26	Lecture 13.2: Thanksgiving week (no lectures)
12/1	Lecture 14.1: Learning to connect text and images [ slides | video ]	Discourse approaches, text & images; Guest lecture by Malihe Alikhani
12/3	Lecture 14.2: Bias and fairness [ slides | video ]	Computational Ethics; Guest lecture by Yulia Tsvetkov
12/8	Lecture 15.1: Final project assignment (live working sessions instead of lectures)
12/10	Lecture 15.2: Final project assignment (live working sessions instead of lectures)
