**Exact topics and schedule subject to change, based on interests and time.**
1 | Introduction [slides] [video]
- What is Multimodal? Definitions, dimensions of heterogeneity and cross-modal interactions.
- Historical view and multimodal research tasks.
- Core technical challenges: representation, alignment, transference, reasoning, generation, and quantification.

2 | Representation [slides] [video]
- Representation fusion: additive, multiplicative, non-linear, complex fusion strategies.
- Representation coordination: contrastive learning, vector-space models, canonical correlation analysis.
- Representation fission: factorization, component analysis, clustering.

3 | Alignment [slides] [video]
- Discrete alignment: grounding, optimal transport, distribution matching.
- Continuous alignment: time warping, CTC, temporal alignment, clustering.
- Aligned representations: attention models, multimodal transformers.

4 | Reasoning [slides] [video]
- Structure: hierarchical, graphical, temporal, and interactive structure, structure discovery.
- Concepts: dense and neuro-symbolic.
- Inference: logical and causal inference.
- Knowledge: external knowledge bases, commonsense reasoning.

5 | Generation [slides] [video]
- Summarization, translation, and creation.
- Model evaluation and ethical concerns.

6 | Transference [slides] [video]
- Transfer via pre-trained models: pre-trained models, prefix tuning, representation tuning, multitask models.
- Co-learning: co-learning via representation and generation.

7 | Quantification [slides] [video]
- Dimensions of heterogeneity: modality importance, dataset biases, social biases, noise topologies and robustness.
- Cross-modal interactions: interpreting cross-modal connections and interactions.
- Learning: learning and optimization challenges.