Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, visual, and acoustic signals. The field poses unique challenges for researchers because of the heterogeneity of the data and the contingency often found between modalities. This graduate-level course covers recent research papers in multimodal machine learning, including technical challenges in representation, alignment, reasoning, generation, co-learning, and quantification. The main goal of the course is to strengthen critical thinking skills, knowledge of recent technical achievements, and understanding of future research directions.


  • Time: Friday 10:10-11:30 am
  • Location: Virtual for the first 2 weeks (the Zoom link is posted on Piazza), GHC 5222 thereafter
  • Discussion and Q&A: Piazza
  • Assignment submissions: Canvas (for registered students only)
  • Contact: Students should ask all course-related questions on Piazza, where announcements are also posted.