Classic Image Captioning Papers, Organized by Category (Some with Source Code)

little06960

Last modified 2022-12-27 16:37:10

Category: Paper Reading. Tags: neural networks, deep learning, computer vision, natural language processing, artificial intelligence

First published 2022-12-27 16:28:23

Article link: https://blog.csdn.net/little06960/article/details/128457764

Attention-Based Methods

O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR 2015.

https://github.com/karpathy/neuraltalk

K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML 2015.

https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning, https://github.com/yunjey/show-attend-and-tell

P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. CVPR 2018. https://github.com/peteanderson80/bottom-up-attention

J. Gu, J. Cai, G. Wang, and T. Chen. Stack-captioning: Coarse-to-fine learning for image captioning. AAAI 2018.

https://github.com/showkeyjar/chinese_im2text.pytorch

L. Huang, W. Wang, J. Chen, and X.-Y. Wei. Attention on attention for image captioning. ICCV, 2019.

https://github.com/husthuaan/AoANet

W. Jiang, L. Ma, Y.-G. Jiang, W. Liu, and T. Zhang. Recurrent fusion network for image captioning. ECCV 2018.

Attention-Based Methods that Consider Spatial and Semantic Relations between Image Elements

Image captioning: Transforming objects into words. NeurIPS 2019. S. Herdade, A. Kappeler, K. Boakye, and J. Soares.

https://github.com/yahoo/object_relation_transformer

X-linear attention networks for image captioning. CVPR, 2020. Y. Pan, T. Yao, Y. Li, and T. Mei. https://github.com/Panda-Peter/image-captioning

F. Liu, X. Ren, Y. Liu, K. Lei, and X. Sun. Exploring and distilling cross-modal information for image captioning. IJCAI 2019.

Meshed-memory transformer for image captioning. CVPR 2020. M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara.

https://github.com/aimagelab/meshed-memory-transformer

Oscar: Object semantics aligned pre-training for vision-language tasks. ECCV 2020. X. Li, X. Yin, C. Li, P. Zhang, X. Hu, L. Zhang, L. Wang, H. Hu, L. Dong, F. Wei, et al. https://github.com/microsoft/Oscar

Unified vision-language pre-training for image captioning and vqa. AAAI 2020. L. Zhou, H. Palangi, L. Zhang, H. Hu, J. Corso, and J. Gao.

https://github.com/LuoweiZhou/VLP

Show, control and tell: A framework for generating controllable and grounded captions. CVPR 2019. M. Cornia, L. Baraldi, and R. Cucchiara. https://github.com/aimagelab/show-control-and-tell

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. CVPR 2017. Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher.

https://github.com/jiasenlu/AdaptiveAttention

Graph-Based Methods for Spatial and Semantic Relations between Image Elements

Auto-encoding scene graphs for image captioning. CVPR, 2019. X. Yang, K. Tang, H. Zhang, and J. Cai.

https://github.com/yangxuntu/SGAE

J. Gu, S. Joty, J. Cai, H. Zhao, X. Yang, and G. Wang. Unpaired image captioning via scene graph alignments. ICCV 2019.

Yiwu Zhong, Liwei Wang, et al. Comprehensive Image Captioning via Scene Graph Decomposition. ECCV 2020.

https://github.com/YiwuZhong/Sub-GC

Combining Attention-Based Methods and Graph-Based Methods

T. Yao, Y. Pan, Y. Li, and T. Mei. Exploring visual relationship for image captioning. ECCV 2018. https://github.com/airsplay/VisualRelationships

S. Chen, Q. Jin, P. Wang, and Q. Wu. Say as you wish: Fine-grained control of image caption generation with abstract scene graphs. CVPR 2020.

https://github.com/cshizhe/asg2cap

Convolutional-Based Methods

J. Aneja, A. Deshpande, and A. G. Schwing. Convolutional image captioning. CVPR 2018.

https://github.com/aditya12agd5/convcap

Q. Wang and A. B. Chan. CNN+CNN: Convolutional decoders for image captioning. CoRR, 2018. https://github.com/qingzwang/GHA-ImageCaptioning

Unsupervised Methods and Reinforcement Learning

C. Chen, S. Mu, W. Xiao, Z. Ye, L. Wu, and Q. Ju. Improving image captioning with conditional generative adversarial nets. AAAI 2019. https://github.com/Anjaney1999/image-captioning-seqgan

X. Liu, H. Li, J. Shao, D. Chen, and X. Wang. Show, tell and discriminate: Image captioning by self-retrieval with partially labeled data. ECCV 2018.

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings. ICCV 2019.

Generating Multi-Style Captions

SentiCap: Generating Image Descriptions with Sentiments. Alexander Mathews 2016. (Dataset.)

StyleNet: Generating Attractive Visual Captions with Styles. Chuang Gan et al. CVPR 2017.

“Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention. Tianlang Chen et al. CVPR 2018.

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text. Mathews A et al. CVPR 2018.

https://github.com/computationalmedia/semstyle

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training. ACM MM 2018.

https://github.com/researchmm/img2poem

Engaging image captioning via personality. K. Shuster, S. Humeau, H. Hu, A. Bordes, and J. Weston. CVPR 2019.

Mscap: Multi-style image captioning with unpaired stylized text. L. Guo, J. Liu, P. Yao, J. Li, and H. Lu. CVPR 2019.

Unsupervised Stylish Image Description Generation via Domain Layer Norm. Cheng-Kuan Chen et al. AAAI 2019.

MemCap: Memorizing Style Knowledge for Image Captioning. Wentian Zhao, et al. AAAI 2020.

Human-like Controllable Image Captioning with Verb-specific Semantic Roles. Long Chen, Zhihong Jiang, Jun Xiao, Wei Liu. CVPR 2021.

https://github.com/mad-red/VSR-guided-CIC

3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model. Chengxi Li and Brent Harrison. arXiv 2021.

StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams. Chengxi Li and Brent Harrison. arXiv 2022. (An evaluation metric for stylized captions.)

Additional Image Captioning Papers by Year and Venue

(2015-2020):

https://github.com/zhjohnchan/awesome-image-captioning

CVPR 2019:

Unsupervised Image Captioning - Yang F et al, CVPR 2019. https://github.com/fengyang0317/unsupervised_captioning

Pointing Novel Objects in Image Captioning - Li Y et al, CVPR 2019.

Context and Attribute Grounded Dense Captioning - Yin G et al, CVPR 2019.

Look Back and Predict Forward in Image Captioning - Qin Y et al, CVPR 2019.

Self-critical n-step Training for Image Captioning - Gao J et al, CVPR 2019.

Intention Oriented Image Captions with Guiding Objects - Zheng Y et al, CVPR 2019.

Describing like humans: on diversity in image captioning - Wang Q et al, CVPR 2019. https://github.com/qingzwang/DiversityMetrics

Adversarial Semantic Alignment for Improved Image Captions - Dognin P et al, CVPR 2019.

https://github.com/vacancy/SceneGraphParser

Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech - Deshpande A et al, CVPR 2019.

Good News, Everyone! Context driven entity-aware captioning for news images - Biten A F et al, CVPR 2019.

https://github.com/furkanbiten/GoodNews

CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection - Zhang L et al, CVPR 2019.

https://github.com/zhangludl/code-and-dataset-for-CapSal

Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning - Kim D et al, CVPR 2019.

https://github.com/Dong-JinKim/DenseRelationalCaptioning

Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables - Xu Y et al, CVPR 2019.

https://github.com/wubaoyuan/adversarial-attack-to-caption

AAAI 2019

Meta Learning for Image Captioning - Li N et al, AAAI 2019. https://github.com/facebookresearch/LaMCTS, https://github.com/linnanwang/AlphaX-NASBench101

Learning Object Context for Dense Captioning - Li X et al, AAAI 2019. https://github.com/ttengwang/ESGN

Hierarchical Attention Network for Image Captioning - Wang W et al, AAAI 2019. https://github.com/ltguo19/VSUA-Captioning

Improving Image Captioning with Conditional Generative Adversarial Nets - Chen C et al, AAAI 2019.

https://github.com/Anjaney1999/image-captioning-seqgan

ICCV 2019

Hierarchy Parsing for Image Captioning - Yao T et al, ICCV 2019.

Entangled Transformer for Image Captioning - Li G et al, ICCV 2019.

Reflective Decoding Network for Image Captioning - Ke L et al, ICCV 2019.

https://github.com/researchmm/generate-it

Learning to Collocate Neural Modules for Image Captioning - Yang X et al, ICCV 2019.

NeurIPS 2019

Adaptively Aligned Image Captioning via Adaptive Attention Time - Huang L et al, NeurIPS 2019.

https://github.com/husthuaan/AAT

Variational Structured Semantic Inference for Diverse Image Captioning - Chen F et al, NeurIPS 2019.

Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations - Liu F et al, NeurIPS 2019.

https://github.com/fenglinliu98/MIA

IJCAI 2019

Image Captioning with Compositional Neural Module Networks - Tian J et al, IJCAI 2019.

Exploring and Distilling Cross-Modal Information for Image Captioning - Liu F et al, IJCAI 2019.

Swell-and-Shrink: Decomposing Image Captioning by Transformation and Summarization - Wang H et al, IJCAI 2019.

Hornet: a hierarchical offshoot recurrent network for improving person re-ID via image captioning - Yan S et al, IJCAI 2019.

AAAI 2020

MemCap: Memorizing Style Knowledge for Image Captioning - Zhao et al, AAAI 2020.

https://github.com/entalent/MemCap

Unified Vision-Language Pre-Training for Image Captioning and VQA - Zhou L et al, AAAI 2020.

https://github.com/LuoweiZhou/VLP

Show, Recall, and Tell: Image Captioning with Recall Mechanism - Wang L et al, AAAI 2020.

Reinforcing an Image Caption Generator using Off-line Human Feedback - Hongsuck Seo P et al, AAAI, 2020.

Interactive Dual Generative Adversarial Networks for Image Captioning - Liu et al, AAAI 2020.

Feature Deformation Meta-Networks in Image Captioning of Novel Objects - Cao et al, AAAI 2020.

Joint Commonsense and Relation Reasoning for Image and Video Captioning - Hou et al, AAAI 2020.

Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption - Zhang et al, AAAI 2020.

CVPR 2020

Normalized and Geometry-Aware Self-Attention Network for Image Captioning - Guo L et al, CVPR 2020.

Object Relational Graph with Teacher-Recommended Learning for Video Captioning - Zhang Z et al, CVPR 2020.

More Grounded Image Captioning by Distilling Image-Text Matching Model.

https://github.com/YuanEZhou/Grounded-Image-Captioning

Better Captioning with Sequence-Level Exploration.

ECCV 2020

Length-Controllable Image Captioning - Deng C et al, ECCV 2020.

https://github.com/ruotianluo/self-critical.pytorch

Captioning Images Taken by People Who Are Blind - Gurari D et al, ECCV 2020.

Towards Unique and Informative Captioning of Images - Wang Z et al, ECCV 2020.

https://github.com/princetonvisualai/SPICE-U

Learning Visual Representations with Caption Annotations - Sariyildiz M et al, ECCV 2020. https://github.com/MicPie/clasp

SODA: Story Oriented Dense Video Captioning Evaluation Framework - Fujita S et al, ECCV 2020.

https://github.com/fujiso/SODA

TextCaps: a Dataset for Image Captioning with Reading Comprehension - Sidorov O et al, ECCV 2020.

Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets - Wang J et al, ECCV 2020.

Learning to Generate Grounded Visual Captions without Localization Supervision - Ma C et al, ECCV 2020.

https://github.com/chihyaoma/cyclical-visual-captioning

Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards - Yang X et al, ECCV 2020.

https://github.com/xuewyang/Fashion_Captioning

NeurIPS 2020

Diverse Image Captioning with Context-Object Split Latent Spaces - Mahajan S et al, NeurIPS 2020.

https://github.com/visinf/cos-cvae

RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning - Chiaro R et al, NeurIPS 2020.

https://github.com/delchiaro/RATT

CVPR 2021

Towards Accurate Text-based Image Captioning with Content Diversity Exploration. Guanghui Xu et al. CVPR 2021.

https://github.com/guanghuixu/AnchorCaptioner

Image Change Captioning by Learning from an Auxiliary Task. Mehrdad Hosseinzadeh and Yang Wang.

FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation. Sijin Wang et al.

Improving OCR-based Image Captioning by Incorporating Geometrical Relationship. Jing Wang et al.

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. Dave Zhenyu Chen et al.

https://github.com/daveredrum/Scan2Cap

ICML 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. ICML 2022. https://github.com/salesforce/BLIP

To be continued.

2022-02-13

by littleoo