Readings
Week 2:
- Baltrusaitis et al., Multimodal Machine Learning: A Survey and Taxonomy. TPAMI 2018
- Bengio et al., Representation Learning: A Review and New Perspectives. TPAMI 2013
Week 3:
- Zeiler and Fergus, Visualizing and Understanding Convolutional Networks. ECCV 2014
- Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. ICCV 2017
- Karpathy et al., Visualizing and Understanding Recurrent Networks. arXiv 2015
- Khandelwal et al., Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. ACL 2018
Week 4:
- Owens and Efros, Audio-Visual Scene Analysis with Self-Supervised Multisensory Features. ECCV 2018
- Wang et al., Learning Deep Structure-Preserving Image-Text Embeddings. CVPR 2016
- Eisenschtat and Wolf, Linking Image and Text with 2-Way Nets. CVPR 2017
- Zhang et al., AE2-Nets: Autoencoder in Autoencoder Networks. CVPR 2019
Week 5:
- Anderson et al., Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. CVPR 2018
- Wiegreffe and Pinter, Attention is not not Explanation. EMNLP 2019
- Le et al., Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems. ACL 2019
- Tan and Bansal, LXMERT: Learning Cross-Modality Encoder Representations from Transformers. EMNLP 2019
Week 7:
- Mao et al., The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision. ICLR 2019
- Kottur et al., Visual Coreference Resolution in Visual Dialog using Neural Module Networks. ECCV 2018
- Cuturi and Blondel, Soft-DTW: a Differentiable Loss Function for Time-Series. ICML 2017
- Zhu et al., Toward Multimodal Image-to-Image Translation. NeurIPS 2017
Week 8:
- Sigurdsson et al., Asynchronous Temporal Fields for Action Recognition. CVPR 2017
- Dai et al., Detecting Visual Relationships with Deep Relational Networks. CVPR 2017
- Wu and Goodman, Multimodal Generative Models for Scalable Weakly-Supervised Learning. NeurIPS 2018
- Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017
Week 9:
- Lee et al., Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks. ICRA 2019
- Luketina et al., A Survey of Reinforcement Learning Informed by Natural Language. IJCAI 2019
- Das et al., Neural Modular Control for Embodied Question Answering. CoRL 2018
- Dai et al., Towards Diverse and Natural Image Descriptions via a Conditional GAN. ICCV 2017
Week 10:
- Pang and Wang, Guessing State Tracking for Visual Dialogue. ECCV 2020
- Hu et al., Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA. CVPR 2020
- Hudson and Manning, Learning by Abstraction: The Neural State Machine. NeurIPS 2019
- Hill et al., Grounded Language Learning Fast and Slow. 2020
Week 12:
- Anderson et al., Sim-to-Real Transfer for Vision-and-Language Navigation. CoRL 2020
- Blukis et al., Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction. CoRL 2018
- Kojima et al., What is Learned in Visually Grounded Neural Syntax Acquisition. ACL 2020
- Zhu et al., The Return of Lexical Dependencies: Neural Lexicalized PCFGs. TACL 2020
Week 14:
- Alikhani et al., Clue: Cross-modal Coherence Modeling for Caption Generation. ACL 2020
- Agarwal et al., History for Visual Dialog: Do we really need it? ACL 2020
- Barocas and Selbst, Big Data’s Disparate Impact. California Law Review 2016
- Hovy and Spruit, The Social Impact of Natural Language Processing. ACL 2016