** Exact topics and schedule subject to change, based on interests and time. **

Part Topics Readings
1 Introduction [slides] [video]
  • What is Multimodal? Definitions, dimensions of heterogeneity and cross-modal interactions.
  • Historical view and multimodal research tasks.
  • Core technical challenges: representation, alignment, transference, reasoning, generation, and quantification.
2 Representation [slides] [video]
  • Representation fusion: additive, multiplicative, non-linear, complex fusion strategies.
  • Representation coordination: contrastive learning, vector-space models, canonical correlation analysis.
  • Representation fission: factorization, component analysis, clustering.
3 Alignment [slides] [video]
  • Discrete alignment: grounding, optimal transport, distribution matching.
  • Continuous alignment: time warping, CTC, temporal alignment, clustering.
  • Aligned representations: attention models, multimodal transformers.
4 Reasoning [slides] [video]
  • Structure: hierarchical, graphical, temporal, and interactive structure, structure discovery.
  • Concepts: dense and neuro-symbolic.
  • Inference: logical and causal inference.
  • Knowledge: external knowledge bases, commonsense reasoning.
5 Generation [slides] [video]
  • Summarization, translation, and creation.
  • Model evaluation and ethical concerns.
6 Transference [slides] [video]
  • Transfer via pre-trained models: pre-trained models, prefix tuning, representation tuning, multitask models.
  • Co-learning: co-learning via representation and generation.
7 Quantification [slides] [video]
  • Dimensions of heterogeneity: modality importance, dataset biases, social biases, noise topologies, and robustness.
  • Cross-modal interactions: interpreting cross-modal connections and interactions.
  • Learning: learning and optimization challenges.