Week 49 Study Notes

Paper Reading Overview

  • Neural Baby Talk: This paper introduces a model dubbed Neural Baby Talk that exploits object-detection information in image caption generation. To place object words in the caption, it first generates a template in which object words are replaced by pointers to the corresponding regions, then uses an MLP to predict each pointed word's fine-grained form. It achieves SoTA performance.
  • Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?: This paper studies how the unsupervised attention mechanisms of SoTA VQA models relate to human attention by comparing their attention heat maps, concluding that VQA models and humans attend to different regions.
  • GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints: This paper introduces GroupCap, a model that incorporates the relevance among images into caption generation via visual parsing trees, achieving better performance on newly created datasets.
  • SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning: This paper introduces channel-wise and multi-layer attention, computing weighted averages not only spatially but also across channels in multiple CNN layers to obtain a more contextual image feature; it yields a small improvement on COCO and Flickr.
  • Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition: This paper is the first to decompose image captioning into two sub-tasks, skeleton captioning and attribute captioning, making the whole process closer to human intuition and allowing attribute generation to exploit the object information in the skeleton; it yields a small improvement on COCO and the newly created Stock3M.
  • Deep Reinforcement Learning-based Image Captioning with Embedding Reward: This paper introduces an image captioning framework in which a policy network and a value network from reinforcement learning provide local and global guidance respectively during caption generation, outperforming SoTA models.
  • Attend to You: Personalized Image Captioning with Context Sequence Memory Networks: This paper introduces a model with a carefully designed memory mechanism for personalized image captioning, which avoids RNN drawbacks such as the inability to model long-term dependencies and vanishing gradients, outperforming the baselines.
  • Context-aware Captions from Context-agnostic Supervision: This paper introduces a loss that makes each caption more discriminative against others through an emitter-suppressor mechanism, encouraging the model to generate different captions for images of different object classes and to adaptively find the differences between the target and negative images (classes).
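The pointer idea in Neural Baby Talk can be illustrated with a toy decoding step: at each slot position the decoder scores the detected regions and fills the slot with the category word of the winning region. This is only a minimal sketch under assumed shapes (`fill_slots`, the `<slot>` token, and the bilinear score are hypothetical simplifications; the actual model also refines the word's fine-grained form with an MLP, which is omitted here):

```python
import numpy as np

def fill_slots(template, region_feats, region_words, h, W):
    """Fill each "<slot>" token by pointing to the best-matching region.

    template:     list of tokens, with "<slot>" marking visual words
    region_feats: (R, k) one feature vector per detected region
    region_words: list of R category words, aligned with region_feats
    h:            (d,) decoder hidden state at the slot position
    W:            (k, d) bilinear scoring matrix (hypothetical)
    """
    out = []
    for tok in template:
        if tok == "<slot>":
            scores = region_feats @ (W @ h)   # one pointer score per region
            out.append(region_words[int(np.argmax(scores))])
        else:
            out.append(tok)
    return " ".join(out)

# Toy example: two regions; the hidden state matches region 0 ("dog").
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
caption = fill_slots(["a", "<slot>", "on", "the", "grass"],
                     feats, ["dog", "frisbee"],
                     h=np.array([1.0, 0.0]), W=np.eye(2))
```

In the real model the pointer distribution competes with the ordinary vocabulary softmax, so the decoder chooses between emitting a textual word and pointing at a region.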
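The channel-wise attention in SCA-CNN can be sketched as follows: mean-pool each channel of the feature map, score it against the decoder hidden state, and rescale channels by the softmaxed scores. A minimal single-layer illustration with hypothetical weight shapes (the paper additionally applies spatial attention and stacks this over multiple CNN layers):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_wise_attention(feat, h, Wv, Wh, w):
    """Reweight CNN channels conditioned on the decoder hidden state.

    feat: (C, H, W) CNN feature map
    h:    (d,)      decoder hidden state
    Wv:   (k,)      per-channel projection (hypothetical shape)
    Wh:   (k, d)    hidden-state projection
    w:    (k,)      scoring vector
    """
    C = feat.shape[0]
    v = feat.reshape(C, -1).mean(axis=1)          # mean-pool each channel -> (C,)
    hid = Wh @ h                                  # project hidden state -> (k,)
    scores = np.array([w @ np.tanh(Wv * vc + hid) for vc in v])
    beta = softmax(scores)                        # channel weights, sum to 1
    return beta[:, None, None] * feat, beta

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
h = rng.standard_normal(16)
attended, beta = channel_wise_attention(
    feat, h, rng.standard_normal(6), rng.standard_normal((6, 16)),
    rng.standard_normal(6))
```

Each channel of a CNN feature map responds to a visual pattern, so reweighting channels amounts to attending over semantic concepts rather than spatial locations.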
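The emitter-suppressor idea in "Context-aware Captions" can be illustrated with a toy decoding step: a word's score combines its log-probability under the target image (emitter) with a penalty for its log-probability under the distractor image (suppressor). A minimal sketch, where the linear combination with weight `lam` and the toy probabilities are my own simplifying assumptions, not the paper's exact objective:

```python
import numpy as np

def emitter_suppressor_scores(logp_target, logp_distractor, lam=0.5):
    # Favor words likely for the target image but unlikely for the
    # distractor: score(w) = log p_t(w) - lam * log p_d(w).
    return logp_target - lam * logp_distractor

vocab = ["dog", "animal", "grass"]
logp_t = np.log(np.array([0.5, 0.4, 0.1]))   # target image: a dog
logp_d = np.log(np.array([0.1, 0.6, 0.3]))   # distractor: some other animal
scores = emitter_suppressor_scores(logp_t, logp_d)
best = vocab[int(np.argmax(scores))]
```

Here "animal" is likely under both images, so the suppressor pushes the decoder toward the discriminative word "dog"; running this per step inside beam search yields captions that distinguish the target from the distractor.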

Weekly Summary

Last week's goals:

  • Finish reading the 2017-2019 CVPR image captioning papers: 8 remaining
  • Organize recent SoTA image captioning models: not done
  • Organize all previously read papers: not done
  • Organize paper-writing methods: not done
  • Record important references: not done
  • Study the details of CIDEr optimization and the top-down model: not done
  • Run a CIDEr-optimization version of every model: not finished
  • Run models based on top-down features: not done

I spent two days this week preparing the group-meeting presentation, so many tasks were left unfinished; they carry over as next week's goals.

Next week's goals

  • Finish reading the 2017-2019 CVPR image captioning papers
  • Organize recent SoTA image captioning models
  • Organize all previously read papers
  • Organize paper-writing methods
  • Organize important references
  • Study the details of CIDEr optimization and the top-down model
  • Run a CIDEr-optimization version of every model
  • Run models based on top-down features