Resources
MultiBench datasets:
- MUStARD: Castro et al., Towards multimodal sarcasm detection (an obviously perfect paper), ACL 2019
- CMU-MOSI: Zadeh et al., MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos, IEEE Intelligent Systems 2016
- UR-FUNNY: Hasan et al., UR-FUNNY: A multimodal language dataset for understanding humor, EMNLP 2019
- CMU-MOSEI: Zadeh et al., Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph, ACL 2018
- MIMIC: Johnson et al., MIMIC-III, a freely accessible critical care database, Nature Scientific Data 2016
- MuJoCo Push: Lee et al., Multimodal sensor fusion with differentiable filters, IROS 2020
- Vision & Touch: Lee et al., Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks, ICRA 2019
- ENRICO: Leiva et al., Enrico: A dataset for topic modeling of mobile UI designs, MobileHCI 2020
- MM-IMDb: Arevalo et al., Gated multimodal units for information fusion, ICLR workshop 2017
- AV-MNIST: Vielzeuf et al., CentralNet: a multilayer approach for multimodal fusion, ECCV workshop 2018
- Kinetics-400: Kay et al., The Kinetics human action video dataset, arXiv 2017
Other resources: