Week 52 Study Notes
Paper Reading Overview
- Boosted Attention: Leveraging Human Attention for Image Captioning: This paper integrates object saliency detection with a conventional image captioning model to boost captioning performance, based on the idea that human vision depends not only on task-specific attention but also on task-independent stimuli; it achieves comparable performance.
- Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner: This paper addresses domain transfer for image captioning, adapting a captioner trained on COCO captions to images from a different domain. It proposes an adversarial training strategy that separately supervises the caption's domain style and its match with the image, using policy gradient to handle the non-differentiable sampling operation; it outperforms the baseline.
- Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data: This paper tackles the conventional captioner's "play it safe" problem and generates more discriminative captions via a self-retrieval mechanism and a margin loss over captions generated for similar images, yielding a captioner that is more discriminative, especially on highly similar images.
- Going Deeper with Convolutions: This paper introduces GoogLeNet, built from the Inception module, an approximation of an optimal local sparse structure using 1×1, 3×3, and 5×5 convolutions. It reduces computation with 1×1 convolutions and amplifies gradients with auxiliary classifiers, winning the 2014 ILSVRC.
- Unpaired Image Captioning by Language Pivoting: This paper transfers image captioning to a target language using only source-language paired data, combining an autoencoder with a captioner-plus-MT pipeline regularized through word embeddings; it achieves slightly better results than Google Translate.
- Rethinking the Form of Latent States in Image Captioning: This paper explores how the shape of the captioner's hidden state influences captioning by making the hidden state a multi-scale 2-D tensor, producing well-interpretable results such as the spatial relation between hidden states and captions, saliency maps within the hidden state, and relationships between specific hidden-state channels.
- Women Also Snowboard: Overcoming Bias in Captioning Models: This paper addresses two problems in conventional captioning models: unwanted dataset bias, and being "right for the right reason", since captioning models often rely on contextual cues to decide a person's gender. It forces the model to be confused when the person is masked out and confident when the person is visible, achieving better interpretability for gender words in captions.
- NNEval: Neural Network based Evaluation Metric for Image Captioning: This paper introduces NNEval, a learned metric that combines the scores of traditional automatic metrics and thus touches generated captions only indirectly; it agrees with human judgment better than existing metrics.
- How Does Batch Normalization Help Optimization?: This paper investigates the true reason for batch normalization's success. It shows that an unstable input distribution has little effect on the performance of a CNN with BN, and that batch normalization does not necessarily reduce internal covariate shift; instead, its success lies in smoothing the loss landscape, which makes gradients more predictable and training faster and more stable.
- "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention: This paper introduces the style-factual LSTM, which adaptively handles stylistic and non-stylistic words in the spirit of adaptive attention, and uses an adaptive learning strategy to embed more factual information about "how an image really is" into the stylistic weights, compensating for the shortage of stylized data and achieving better performance.
- Recurrent Fusion Network for Image Captioning: This paper introduces a multi-encoder design with a fusion LSTM to extract more information from the image, improving performance remarkably.
- Exploring Visual Relationship for Image Captioning: This paper explores how relationship information can boost captioning by incorporating visual relationships from Visual Genome and using an object-relation graph so that region features mutually compensate each other's information, improving performance remarkably.
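The margin loss used by the self-retrieval idea in "Show, Tell and Discriminate" can be sketched as a standard hinge-style ranking loss. This is a minimal illustration, not the paper's exact formulation; the function name and the similarity inputs are assumptions for the sketch.

```python
def margin_ranking_loss(sim_pos, sim_negs, margin=0.2):
    """Hinge-style margin loss for retrieval-based training (illustrative sketch).

    sim_pos:  similarity between a generated caption and its own image
    sim_negs: similarities between that caption and distractor images
    The loss pushes the matching pair above every distractor by `margin`,
    so captions that also describe similar images get penalized.
    """
    return sum(max(0.0, margin - sim_pos + s) for s in sim_negs)
```

With a clearly discriminative caption (`sim_pos` well above all distractors) the loss is zero; the penalty only kicks in when a distractor image scores within `margin` of the true image.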
ICCV 2017 Summary
Topics of the ICCV 2017 papers:
主题 (Topic) | 篇数 (Count) | 备注 (Notes) |
---|---|---|
Improved attention mechanisms | 1 | |
Novel architecture | 1 | CNN language model |
Fusing high-level semantic information | 2 | |
RL | 1 | |
GAN | 1 | |
Model interpretability | 1 | |
Transfer learning | 1 | also uses GAN |
Total | 8 | |
ECCV 2018 Summary
Topics of the ECCV 2018 papers:
主题 (Topic) | 篇数 (Count) | 备注 (Notes) |
---|---|---|
Do not play it safe | 1 | |
Novel architecture | 2 | multi-encoder, multi-dimensional hidden state |
Fusing high-level semantic information | 2 | |
Style | 1 | |
Model interpretability | 1 | |
Transfer learning | 1 | |
Learned metric | 1 | |
Total | 9 | |
Experiments
Result of a hand-written GoogLeNet (slightly modified) on CIFAR-10:
92.74%, 0.1% better than VGG
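The 1×1-reduction idea behind GoogLeNet's Inception module can be checked with a quick parameter count. The channel sizes below (192 in, reduce to 16, then 5×5 to 32 out) follow the inception(3a) 5×5 branch in the paper; the helper function is just illustrative arithmetic.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution layer (biases ignored)."""
    return c_in * c_out * k * k

# Direct 5x5 convolution: 192 -> 32 channels
direct = conv_params(192, 32, 5)                            # 153,600 weights
# Inception-style: 1x1 reduce 192 -> 16, then 5x5: 16 -> 32
reduced = conv_params(192, 16, 1) + conv_params(16, 32, 5)  # 15,872 weights
```

The 1×1 bottleneck cuts the weight count of this branch by roughly 10×, which is what makes stacking 5×5 filters affordable in the network.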
This Week's Summary
Completion of last week's tasks
- Finish the ICCV 2017 and ECCV 2018 image captioning papers √
- Survey recent SoTA image captioning models ×
- Organize previously read papers ~
- Organize paper-writing methodology ~
- Organize important cited references ~
- Run the baseline models and analyze the results in detail against the original papers ×
Next week's goals
- Read no fewer than 5 papers
- Organize previously read papers
- Organize paper-writing methodology
- Organize important cited references
- Run the baseline models and analyze the results in detail against the original papers
- Study the details of policy gradient, CIDEr optimization, and the top-down model