Multimodal Deep Learning
Review Papers
Survey: Transformer-based Models in Data Modality Conversion
Research papers
FLAVA: A Foundational Language And Vision Alignment Model - Mar 29, 2022, Singh et al. at FAIR. Used for visual question answering.
TorchMultimodal Tutorial: Finetuning FLAVA - related PyTorch tutorial with code