Week 52 Study Notes


Paper Reading Overview

  • Boosted Attention: Leveraging Human Attention for Image Captioning: This article incorporates object-saliency detection information into a conventional image-captioning model to boost captioning performance, based on the idea that human vision depends not only on task-specific attention but also on task-independent stimuli, achieving comparable performance.
  • Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner: This article addresses domain transfer for image captioning, carrying COCO captions over to images from a different domain. It proposes an adversarial training strategy that separately supervises the caption's domain and its match with the image, using policy gradient to handle the non-differentiable sampling operation, and achieves better performance than the baseline.
  • Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data: This article addresses the conventional captioner's "play it safe" problem and generates more discriminative captions via a self-retrieval mechanism and a margin loss between captions generated for similar images, resulting in a more discriminative captioner, especially on highly similar images.
  • Going Deeper with Convolutions: This article introduces GoogLeNet, built from the Inception module, an optimal local sparse structure with 1×1, 3×3, and 5×5 convolutions. It contributes the ideas of reducing computation with 1×1 convolutions and amplifying gradients with auxiliary classifiers, and won the 2014 ILSVRC.
  • Unpaired Image Captioning by Language Pivoting: This article addresses transferring image captioning to a target language when paired data exists only in the source language, using an autoencoder plus a captioner-and-machine-translation pipeline with word-embedding regularization, achieving slightly better results than Google Translate.
  • Rethinking the Form of Latent States in Image Captioning: This article explores how the form of the captioner's hidden state influences captioning by making the hidden state a multi-scale 2-D tensor, obtaining well-interpretable results such as the spatial relation between hidden states and captions, saliency maps within hidden states, and relationships between specific channels of the hidden state.
  • Women Also Snowboard: Overcoming Bias in Captioning Models: This article addresses two problems in conventional image-captioning models: unwanted dataset bias, and being "right for the right reason", since captioning models often rely on contextual information to decide a person's gender. The proposed method forces the model to be unconfident when the person is masked and confident when the person is visible, achieving better interpretability for gender words in captions.
  • NNEval: Neural Network based Evaluation Metric for Image Captioning: This article introduces NNEval, a learned automatic metric that exploits the scores of traditional automatic metrics and has only indirect contact with the generated captions, achieving better agreement with human judgments.
  • How Does Batch Normalization Help Optimization?: This article investigates the true reason behind batch normalization's success. It shows that an unstable input distribution has little effect on the performance of a CNN with BN, and that batch normalization does not, in fact, reduce internal covariate shift. Instead, its success lies in smoothing the loss landscape, which makes gradients more predictable and training faster and more stable.
  • "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention: This article introduces a style-factual LSTM that adaptively handles stylistic and non-stylistic words, in the spirit of adaptive attention, and uses an adaptive learning strategy to embed more factual information about "how an image really is" into the stylistic weights, compensating for the scarcity of stylistic data and achieving better performance.
  • Recurrent Fusion Network for Image Captioning: This article introduces a multi-encoder design with a fusion LSTM to extract more information from the image, improving performance remarkably.
  • Exploring Visual Relationship for Image Captioning: This article explores how relationship information can boost an image-captioning model by incorporating visual-relationship annotations from Visual Genome and exploiting an object-relation graph that lets region features mutually compensate for missing information, improving performance remarkably.
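The policy-gradient workaround for non-differentiable sampling used in "Show, Adapt and Tell" can be sketched with the REINFORCE score-function estimator. This is a minimal, self-contained illustration with toy logits and a scalar reward of my own choosing, not the paper's implementation:

```python
import math
import random

def reinforce_grad(logits, reward):
    """REINFORCE (score-function) gradient for one sampled word.

    Sampling a word from the softmax is non-differentiable, so instead of
    backpropagating through the sample, the gradient of log p(sampled word)
    is weighted by a scalar reward (e.g. a discriminator score).
    Returns (sampled_index, gradient w.r.t. the logits).
    """
    # numerically stable softmax over the vocabulary logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # the non-differentiable sampling step
    idx = random.choices(range(len(probs)), weights=probs)[0]
    # d log p(idx) / d logits = onehot(idx) - probs, scaled by the reward
    grad = [reward * ((1.0 if i == idx else 0.0) - p)
            for i, p in enumerate(probs)]
    return idx, grad
```

A positive reward raises the sampled word's logit and lowers the rest; the gradient entries always sum to zero.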
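The margin loss between similar images in "Show, Tell and Discriminate" has the general shape of a hinge loss that pushes a caption's similarity to its own image above its similarity to distractor images. The function name and similarity scores below are illustrative, not the paper's exact formulation:

```python
def caption_margin_loss(sim_pos, sim_negs, margin=0.2):
    """Hinge loss for self-retrieval: the generated caption should be
    more similar to its own image (sim_pos) than to each distractor
    image (sim_negs) by at least `margin`."""
    return sum(max(0.0, margin - sim_pos + s) for s in sim_negs)
```

When all distractors are already separated by the margin the loss is zero; only hard, highly similar distractors contribute gradient, which is what makes the resulting captioner more discriminative.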
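The computation-saving effect of GoogLeNet's 1×1 bottleneck is easy to check by counting convolution weights. The 192→16→32 channel sizes below are chosen to resemble a typical Inception 5×5 branch, but the exact numbers are only illustrative:

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a k×k convolution (bias ignored)."""
    return in_ch * out_ch * k * k

# Direct 5×5 convolution: 192 -> 32 channels
direct = conv_params(192, 32, 5)
# Inception-style bottleneck: 1×1 reduce to 16 channels, then 5×5 to 32
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)
```

Here the bottleneck path needs roughly a tenth of the weights of the direct 5×5 convolution, which is why Inception can afford parallel multi-scale branches.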
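For reference, the transform that "How Does Batch Normalization Help Optimization?" analyzes is just normalize-then-affine over a batch. A minimal 1-D sketch (the function name is mine):

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over a 1-D batch of activations:
    subtract the batch mean, divide by the batch standard deviation
    (with eps for numerical stability), then scale by gamma and shift
    by beta, which are learned parameters."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]
```

With gamma = 1 and beta = 0 the output has zero mean and (up to eps) unit variance; the paper's point is that this helps by smoothing the loss landscape, not by stabilizing input distributions per se.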

2017 ICCV Summary

The 2017 ICCV papers cover the following topics:

Topic                                    Papers  Notes
Improved attention mechanism             1
Novel architecture                       1       CNN language model
Incorporating high-level semantics       2
RL                                       1
GAN                                      1
Model interpretability                   1
Transfer learning                        1       also uses a GAN
Total                                    8

2018 ECCV Summary

The 2018 ECCV papers cover the following topics:

Topic                                    Papers  Notes
Do not "play it safe"                    1
Novel architecture                       2       multi-encoder, multi-dimensional hidden state
Incorporating high-level semantics       2
Style                                    1
Model interpretability                   1
Transfer learning                        1
Learned metric                           1
Total                                    9

Experiments

Results of a hand-written, slightly modified GoogLeNet on CIFAR-10:
92.74% accuracy, 0.1% better than VGG.

This Week's Summary

Completion of last week's tasks

  • Finish reading the 2017 ICCV and 2018 ECCV image-captioning papers √
  • Survey recent SoTA image-captioning models ×
  • Organize previously read papers ~
  • Organize paper-writing methods ~
  • Organize important references ~
  • Finish running the basic models and analyze the results in detail against the original papers ×

Next week's goals

  • Read at least 5 papers
  • Organize previously read papers
  • Organize paper-writing methods
  • Organize important references
  • Finish running the basic models and analyze the results in detail against the original papers
  • Study the details of policy gradient, CIDEr optimization, and the top-down model