Week 49 Study Notes

Paper Reading Overview

  • Neural Baby Talk: This paper introduces a model dubbed Neural Baby Talk that exploits object-detection information in image caption generation. To place object words in the caption, it first generates a template in which object words are replaced by pointers to the corresponding regions, then uses an MLP to predict each pointed word's fine-grained form. It achieves SoTA performance.
  • Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?: This paper studies how the unsupervised attention mechanisms of SoTA VQA models relate to human attention by comparing their attention heat maps, concluding that VQA models and humans attend to different regions.
  • GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints: This paper introduces GroupCap, a model that incorporates the relevance among images into caption generation via visual parsing trees, achieving better performance on newly created datasets.
  • SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning: This paper introduces channel-wise and multi-layer attention, computing weighted averages not only spatially but also across channels in multiple CNN layers to obtain a more contextual image feature; it yields a small improvement on COCO and Flickr.
  • Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition: This paper is the first to decompose image captioning into two sub-tasks, skeleton captioning and attribute captioning, making the whole process closer to human intuition and allowing attribute generation to exploit the object information in the skeleton; it yields a small improvement on COCO and the newly created Stock3M.
  • Deep Reinforcement Learning-based Image Captioning with Embedding Reward: This paper introduces an image captioning framework in which a policy network and a value network from reinforcement learning provide local and global guidance respectively during caption generation, outperforming SoTA models.
  • Attend to You: Personalized Image Captioning with Context Sequence Memory Networks: This paper introduces a model with a carefully designed memory mechanism for personalized image captioning, which avoids RNN drawbacks such as the inability to model long-term dependencies and vanishing gradients, outperforming the baselines.
  • Context-aware Captions from Context-agnostic Supervision: This paper introduces a loss that makes each caption more discriminative against others through an emitter-suppressor mechanism, encouraging the model to generate different captions for images of different object classes and to adaptively find the differences between the target and negative images (classes).
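The pointer idea in Neural Baby Talk can be illustrated with a toy decoding step: at each slot position the decoder scores the detected regions and fills the slot with the category word of the winning region. This is only a minimal sketch under assumed shapes (`fill_slots`, the `<slot>` token, and the bilinear score are hypothetical simplifications; the actual model also refines the word's fine-grained form with an MLP, which is omitted here):

```python
import numpy as np

def fill_slots(template, region_feats, region_words, h, W):
    """Fill each "<slot>" token by pointing to the best-matching region.

    template:     list of tokens, with "<slot>" marking visual words
    region_feats: (R, k) one feature vector per detected region
    region_words: list of R category words, aligned with region_feats
    h:            (d,) decoder hidden state at the slot position
    W:            (k, d) bilinear scoring matrix (hypothetical)
    """
    out = []
    for tok in template:
        if tok == "<slot>":
            scores = region_feats @ (W @ h)   # one pointer score per region
            out.append(region_words[int(np.argmax(scores))])
        else:
            out.append(tok)
    return " ".join(out)

# Toy example: two regions; the hidden state matches region 0 ("dog").
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
caption = fill_slots(["a", "<slot>", "on", "the", "grass"],
                     feats, ["dog", "frisbee"],
                     h=np.array([1.0, 0.0]), W=np.eye(2))
```

In the real model the pointer distribution competes with the ordinary vocabulary softmax, so the decoder chooses between emitting a textual word and pointing at a region.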
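The channel-wise attention in SCA-CNN can be sketched as follows: mean-pool each channel of the feature map, score it against the decoder hidden state, and rescale channels by the softmaxed scores. A minimal single-layer illustration with hypothetical weight shapes (the paper additionally applies spatial attention and stacks this over multiple CNN layers):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_wise_attention(feat, h, Wv, Wh, w):
    """Reweight CNN channels conditioned on the decoder hidden state.

    feat: (C, H, W) CNN feature map
    h:    (d,)      decoder hidden state
    Wv:   (k,)      per-channel projection (hypothetical shape)
    Wh:   (k, d)    hidden-state projection
    w:    (k,)      scoring vector
    """
    C = feat.shape[0]
    v = feat.reshape(C, -1).mean(axis=1)          # mean-pool each channel -> (C,)
    hid = Wh @ h                                  # project hidden state -> (k,)
    scores = np.array([w @ np.tanh(Wv * vc + hid) for vc in v])
    beta = softmax(scores)                        # channel weights, sum to 1
    return beta[:, None, None] * feat, beta

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
h = rng.standard_normal(16)
attended, beta = channel_wise_attention(
    feat, h, rng.standard_normal(6), rng.standard_normal((6, 16)),
    rng.standard_normal(6))
```

Each channel of a CNN feature map responds to a visual pattern, so reweighting channels amounts to attending over semantic concepts rather than spatial locations.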
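The emitter-suppressor idea in "Context-aware Captions" can be illustrated with a toy decoding step: a word's score combines its log-probability under the target image (emitter) with a penalty for its log-probability under the distractor image (suppressor). A minimal sketch, where the linear combination with weight `lam` and the toy probabilities are my own simplifying assumptions, not the paper's exact objective:

```python
import numpy as np

def emitter_suppressor_scores(logp_target, logp_distractor, lam=0.5):
    # Favor words likely for the target image but unlikely for the
    # distractor: score(w) = log p_t(w) - lam * log p_d(w).
    return logp_target - lam * logp_distractor

vocab = ["dog", "animal", "grass"]
logp_t = np.log(np.array([0.5, 0.4, 0.1]))   # target image: a dog
logp_d = np.log(np.array([0.1, 0.6, 0.3]))   # distractor: some other animal
scores = emitter_suppressor_scores(logp_t, logp_d)
best = vocab[int(np.argmax(scores))]
```

Here "animal" is likely under both images, so the suppressor pushes the decoder toward the discriminative word "dog"; running this per step inside beam search yields captions that distinguish the target from the distractor.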

Weekly Summary

Last week's goals:

  • Finish reading the 2017-2019 CVPR image captioning papers: 8 remaining
  • Organize recent SoTA image captioning models: not done
  • Organize all previously read papers: not done
  • Organize paper-writing methods: not done
  • Record important references: not done
  • Study the details of CIDEr optimization and the top-down model: not done
  • Run a CIDEr-optimization version of every model: not finished
  • Run models based on top-down features: not done

I spent two days this week preparing the group-meeting presentation, so many tasks were left unfinished; they carry over as next week's goals.

Next week's goals

  • Finish reading the 2017-2019 CVPR image captioning papers
  • Organize recent SoTA image captioning models
  • Organize all previously read papers
  • Organize paper-writing methods
  • Organize important references
  • Study the details of CIDEr optimization and the top-down model
  • Run a CIDEr-optimization version of every model
  • Run models based on top-down features