Study Targets:
- Understand the current state of machine learning development
- Study graph neural networks (GNNs)
- Study BERT-class multimodal pre-trained models
- Review at least 20 papers
Studied Contents:
NLP:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GNN:
- Literature: Vision GNN: An Image is Worth Graph of Nodes
- Literature: Dynamic Graph CNN for Learning on Point Clouds
- Literature: DeepGCNs: Can GCNs Go as Deep as CNNs?
BERT-class multimodal models:
- Literature: LXMERT: Learning Cross-Modality Encoder Representations from Transformers
- Literature: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Vision Transformer:
- Vision Xformers: Efficient Attention for Image Classification
- A Survey of Visual Transformers
Thesis:
- Visual and Textual Common Semantic Spaces for the Analysis of Multimodal Content
Video:
- CMU's Multimodal Machine Learning course (11-777), Fall 2020 (YouTube)