Image-Sentence Matching Models: A Summary (continuously updated)

斡艾汀余

Article link: https://blog.csdn.net/lry_xueshu/article/details/83030906


Entry format: model name: paper title, venue, year (loss, image encoder, text encoder). The loss abbreviations appear to stand for: tri = triplet ranking loss, LL = log-likelihood, corr = correlation objective, CE = cross-entropy.

DeViSE: DeViSE: A Deep Visual-Semantic Embedding Model, NIPS, 2013 (tri, AlexNet, w2v)
SDT-RNN: Grounded Compositional Semantics for Finding and Describing Images with Sentences, TACL, 2014 (tri, CNN, w2v + RNN*)
VSE0: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, NIPS Workshop, 2014 (tri, CNN, w2v + LSTM)
Deep Fragment: Deep Fragment Embeddings for Bidirectional Image Sentence Mapping, NIPS, 2014 (tri, R-CNN, w2v)
m-RNN: Explain Images with Multimodal Recurrent Neural Networks, arXiv, 2014 (LL, VGG16, one-hot + simple RNN)
DCCA: Deep Correlation for Matching Images and Text, CVPR, 2015 (corr, AlexNet, TF-IDF)
DVSA: Deep Visual-Semantic Alignments for Generating Image Descriptions, CVPR, 2015 (tri, R-CNN, w2v + RNN)
LRCN: Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015 (LL, VGG16, one-hot + LSTM)
m-CNN: Multimodal Convolutional Neural Networks for Matching Image and Sentence, ICCV, 2015 (tri, VGG19, w2v + CNN)
GMM-FV: Associating Neural Word Embeddings with Deep Image Representations Using Fisher Vectors, CVPR, 2015 (VGG19, w2v + GMM + HGLMM)
VQA-A: Leveraging Visual Question Answering for Image-Caption Ranking, ECCV, 2016 (LL, VGG19, BOW + LSTM)
RNN-FV: RNN Fisher Vectors for Action Recognition and Image Annotation, ECCV, 2016 (LL, VGG19, GMM-FV)
SPE: Learning Deep Structure-Preserving Image-Text Embeddings, CVPR, 2016 (tri, VGG19, GMM-FV)
HM-LSTM: Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding, ICCV, 2017 (tri, R-CNN, w2v + LSTM)
sm-LSTM: Instance-aware Image and Sentence Matching with Selective Multimodal LSTM, CVPR, 2017 (tri, VGG19, w2v + Bi-LSTM)
RRF-Net: Learning a Recurrent Residual Fusion Network for Multimodal Matching, ICCV, 2017 (tri, ResNet152, GMM-FV)
2WayNet: Linking Image and Text with 2-Way Nets, CVPR, 2017 (corr, VGG16, GMM-FV)
DAN: Dual Attention Networks for Multimodal Reasoning and Matching, CVPR, 2017 (tri, ResNet152, one-hot + Bi-LSTM)
DPC: Dual-Path Convolutional Image-Text Embedding with Instance Loss, arXiv, 2017 (tri + CE, ResNet152, w2v + ResNet152)
VSE++: VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, BMVC, 2018 (tri, ResNet152, w2v + GRU)
SCO: Learning Semantic Concepts and Order for Image and Sentence Matching, CVPR, 2018 (tri, ResNet152, one-hot + conventional LSTM)
GXN: Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models, CVPR, 2018 (tri + CE + RL + GAN, ResNet152, Bi-GRU)
SCAN: Stacked Cross Attention for Image-Text Matching, ECCV, 2018 (tri, Faster R-CNN (ResNet101), one-hot -> w2v + Bi-GRU)
Multi-task Learning of Hierarchical Vision-Language Representation, CVPR, 2019
Saliency-Guided Attention Network for Image-Sentence Matching, arXiv, 2019 (state of the art at the time of writing)
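
Most entries in the list are tagged "tri", which presumably refers to a bidirectional triplet ranking loss. The following is a minimal sketch of that loss in plain Python, not code from any of the listed papers. The function name `triplet_ranking_loss`, the margin of 0.2, and the toy similarity matrix are all assumptions made for illustration. With `hardest_only=False` the hinge terms are summed over all negatives; with `hardest_only=True` only the largest violation per query is kept, in the spirit of the hard-negative idea named in the VSE++ title.

```python
def triplet_ranking_loss(sim, margin=0.2, hardest_only=False):
    """Bidirectional hinge-based triplet ranking loss (illustrative sketch).

    sim[i][j] is the similarity between image i and sentence j;
    matching image-sentence pairs sit on the diagonal.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Image i as the query: every other sentence j is a negative.
        cost_s = [max(0.0, margin - pos + sim[i][j]) for j in range(n) if j != i]
        # Sentence i as the query: every other image j is a negative.
        cost_im = [max(0.0, margin - pos + sim[j][i]) for j in range(n) if j != i]
        if hardest_only:
            # Keep only the hardest (most violating) negative per direction.
            total += max(cost_s, default=0.0) + max(cost_im, default=0.0)
        else:
            # Sum the violations over all negatives.
            total += sum(cost_s) + sum(cost_im)
    return total


# Toy 3x3 similarity matrix (hypothetical values).
sim = [
    [0.90, 0.80, 0.85],
    [0.20, 0.70, 0.30],
    [0.40, 0.10, 0.60],
]

loss_sum = triplet_ranking_loss(sim)                      # sum over negatives
loss_hard = triplet_ranking_loss(sim, hardest_only=True)  # hardest negative only
print(loss_sum, loss_hard)
```

Because the hardest-negative variant drops every violation except the largest one per query, its value can never exceed the summed variant on the same similarity matrix.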