Week 52 Study Notes
Paper Reading Overview
- Boosted Attention: Leveraging Human Attention for Image Captioning: This paper integrates object saliency detection with a conventional image captioning model to boost captioning performance, based on the idea that human vision depends not only on task-specific attention but also on task-independent stimuli; it achieves comparable performance.
- Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner: This paper addresses domain transfer for image captioning, adapting a captioner trained on COCO captions to images from a different domain. It proposes an adversarial training strategy that separately supervises the caption's domain style and its match with the image, using policy gradient to handle the non-differentiable sampling operation; it outperforms the baseline.
- Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data: This paper tackles the conventional captioner's "play it safe" problem and generates more discriminative captions via a self-retrieval mechanism and a margin loss over captions generated for similar images, yielding a captioner that is more discriminative, especially on highly similar images.
- Going Deeper with Convolutions: This paper introduces GoogLeNet, built from the Inception module, an approximation of an optimal local sparse structure using 1×1, 3×3, and 5×5 convolutions. It reduces computation with 1×1 convolutions and amplifies gradients with auxiliary classifiers, winning the 2014 ILSVRC.
- Unpaired Image Captioning by Language Pivoting: This paper transfers image captioning to a target language using only source-language paired data, combining an autoencoder with a captioner-plus-MT pipeline regularized through word embeddings; it achieves slightly better results than Google Translate.
- Rethinking the Form of Latent States in Image Captioning: This paper explores how the shape of the captioner's hidden state influences captioning by making the hidden state a multi-scale 2-D tensor, producing well-interpretable results such as the spatial relation between hidden states and captions, saliency maps within the hidden state, and relationships between specific hidden-state channels.
- Women Also Snowboard: Overcoming Bias in Captioning Models: This paper addresses two problems in conventional captioning models: unwanted dataset bias, and being "right for the right reason", since captioning models often rely on contextual cues to decide a person's gender. It forces the model to be confused when the person is masked out and confident when the person is visible, achieving better interpretability for gender words in captions.
- NNEval: Neural Network based Evaluation Metric for Image Captioning: This paper introduces NNEval, a learned metric that combines the scores of traditional automatic metrics and thus touches generated captions only indirectly; it agrees with human judgment better than existing metrics.
- How Does Batch Normalization Help Optimization?: This paper investigates the true reason for batch normalization's success. It shows that an unstable input distribution has little effect on the performance of a CNN with BN, and that batch normalization does not necessarily reduce internal covariate shift; instead, its success lies in smoothing the loss landscape, which makes gradients more predictable and training faster and more stable.
- "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention: This paper introduces the style-factual LSTM, which adaptively handles stylistic and non-stylistic words in the spirit of adaptive attention, and uses an adaptive learning strategy to embed more factual information about "how an image really is" into the stylistic weights, compensating for the shortage of stylized data and achieving better performance.
- Recurrent Fusion Network for Image Captioning: This paper introduces a multi-encoder design with a fusion LSTM to extract more information from the image, improving performance remarkably.
- Exploring Visual Relationship for Image Captioning: This paper explores how relationship information can boost captioning by incorporating visual relationships from Visual Genome and using an object-relation graph so that region features mutually compensate each other's information, improving performance remarkably.
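The margin loss used by the self-retrieval idea in "Show, Tell and Discriminate" can be sketched as a standard hinge-style ranking loss. This is a minimal illustration, not the paper's exact formulation; the function name and the similarity inputs are assumptions for the sketch.

```python
def margin_ranking_loss(sim_pos, sim_negs, margin=0.2):
    """Hinge-style margin loss for retrieval-based training (illustrative sketch).

    sim_pos:  similarity between a generated caption and its own image
    sim_negs: similarities between that caption and distractor images
    The loss pushes the matching pair above every distractor by `margin`,
    so captions that also describe similar images get penalized.
    """
    return sum(max(0.0, margin - sim_pos + s) for s in sim_negs)
```

With a clearly discriminative caption (`sim_pos` well above all distractors) the loss is zero; the penalty only kicks in when a distractor image scores within `margin` of the true image.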
ICCV 2017 Summary
Topics of the ICCV 2017 papers:
主题 (Topic) | 篇数 (Count) | 备注 (Notes) |
---|---|---|
Improved attention mechanisms | 1 | |
Novel architecture | 1 | CNN language model |
Fusing high-level semantic information | 2 | |
RL | 1 | |
GAN | 1 | |
Model interpretability | 1 | |
Transfer learning | 1 | also uses GAN |
Total | 8 | |
ECCV 2018 Summary
Topics of the ECCV 2018 papers:
主题 (Topic) | 篇数 (Count) | 备注 (Notes) |
---|---|---|
Do not play it safe | 1 | |
Novel architecture | 2 | multi-encoder, multi-dimensional hidden state |
Fusing high-level semantic information | 2 | |
Style | 1 | |
Model interpretability | 1 | |
Transfer learning | 1 | |
Learned metric | 1 | |
Total | 9 | |
Experiments
Result of a hand-written GoogLeNet (slightly modified) on CIFAR-10:
92.74%, 0.1% better than VGG
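The 1×1-reduction idea behind GoogLeNet's Inception module can be checked with a quick parameter count. The channel sizes below (192 in, reduce to 16, then 5×5 to 32 out) follow the inception(3a) 5×5 branch in the paper; the helper function is just illustrative arithmetic.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution layer (biases ignored)."""
    return c_in * c_out * k * k

# Direct 5x5 convolution: 192 -> 32 channels
direct = conv_params(192, 32, 5)                            # 153,600 weights
# Inception-style: 1x1 reduce 192 -> 16, then 5x5: 16 -> 32
reduced = conv_params(192, 16, 1) + conv_params(16, 32, 5)  # 15,872 weights
```

The 1×1 bottleneck cuts the weight count of this branch by roughly 10×, which is what makes stacking 5×5 filters affordable in the network.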
This Week's Summary
Completion of last week's tasks
- Finish the ICCV 2017 and ECCV 2018 image captioning papers √
- Survey recent SoTA image captioning models ×
- Organize previously read papers ~
- Organize paper-writing methodology ~
- Organize important cited references ~
- Run the baseline models and analyze the results in detail against the original papers ×
Next week's goals
- Read no fewer than 5 papers
- Organize previously read papers
- Organize paper-writing methodology
- Organize important cited references
- Run the baseline models and analyze the results in detail against the original papers
- Study the details of policy gradient, CIDEr optimization, and the top-down model