Week 62 Study Notes

luputo

Category: Study Notes

Article link: https://blog.csdn.net/luo3300612/article/details/102371796

Paper reading overview

Hierarchy Parsing for Image Captioning: This paper introduces a hierarchical encoder for image captioning that combines object, sub-object and semantic segmentation information into a tree structure, then applies a Tree-LSTM to obtain features carrying more semantic and hierarchical information.
Reflective Decoding Network for Image Captioning: This paper introduces an image captioning decoder that attends to all previous hidden states to strengthen the LSTM's ability to model long-term dependencies.
Attention on Attention for Image Captioning: This paper introduces the Attention on Attention (AoA) mechanism, which adaptively decides whether to use the attention information through a gate variable, interpreted as the similarity between the query and the attention vector. It achieves an impressive 129.8 CIDEr without any external labels.
Human Attention in Image Captioning: Dataset and Analysis: This paper studies the difference between human attention and machine attention. It finds that consistency between human and machine attention does not by itself lead to higher performance, whereas using human attention to boost the model does help it perform better.
Unpaired Image Captioning via Scene Graph Alignments: This paper introduces an unsupervised image captioning training strategy. Its key idea is to align the scene graph features of sentences and images with CycleGAN, achieving a SoTA CIDEr of 69.5.
What do different evaluation metrics tell us about saliency models?: This paper examines the metrics used for saliency detection and concludes that NSS and CC are the fairer and more correlated metrics, and recommends using them.
Aesthetic Image Captioning From Weakly-Labelled Photographs: This paper introduces a new aesthetic image captioning dataset and an impressive weakly supervised strategy that trains the encoder of the captioning model on labels extracted from the captions.
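
To make the two metrics recommended by the saliency-metrics paper concrete, here is a minimal NumPy sketch of NSS (Normalized Scanpath Saliency) and CC (Pearson's linear correlation coefficient), following their standard definitions. The function names, the small epsilon and the toy maps are assumptions for illustration, not code from the paper.

```python
import numpy as np

EPS = 1e-12  # assumed guard against division by zero on constant maps


def nss(saliency, fixations):
    """Mean of the z-scored saliency map at the fixated pixels.

    saliency:  2-D array of predicted saliency values.
    fixations: 2-D boolean array, True where a human fixation landed
               (assumed to contain at least one True entry).
    """
    saliency = np.asarray(saliency, dtype=float)
    fixations = np.asarray(fixations, dtype=bool)
    z = (saliency - saliency.mean()) / (saliency.std() + EPS)
    return float(z[fixations].mean())


def cc(saliency, fixation_density):
    """Pearson correlation between a saliency map and a fixation density map."""
    a = np.asarray(saliency, dtype=float).ravel()
    b = np.asarray(fixation_density, dtype=float).ravel()
    a = (a - a.mean()) / (a.std() + EPS)
    b = (b - b.mean()) / (b.std() + EPS)
    return float((a * b).mean())


# Toy example: a 2x2 saliency map with one fixation on its brightest pixel.
toy_saliency = np.arange(4.0).reshape(2, 2)
toy_fixations = np.zeros((2, 2), dtype=bool)
toy_fixations[1, 1] = True
print(nss(toy_saliency, toy_fixations), cc(toy_saliency, toy_saliency))
```

NSS compares the prediction against discrete fixation points, while CC compares it against a continuous (blurred) fixation density map, so the two need different ground-truth inputs.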

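Bottom-up attention visualization

Results

The red, blue and green boxes mark the top-3 attended regions. Each image title shows the word generated at the current time step, together with the cumulative attention weight of these three regions.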
[Figures: attention visualization results]
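
The selection step behind these figures can be sketched roughly as follows: take the attention weights over the bottom-up regions at one time step, keep the three largest, and put their sum in the title. This is only a sketch under assumptions; the function name, the box format (x1, y1, x2, y2), the rank-to-color mapping and the toy numbers are hypothetical and not taken from the original code.

```python
import numpy as np

# Assumed mapping: rank 1 -> red, rank 2 -> blue, rank 3 -> green.
COLORS = ["red", "blue", "green"]


def top_k_regions(attention, boxes, k=3):
    """Return the k most-attended boxes, their weights and the summed weight.

    attention: 1-D array of attention weights over regions (sums to 1).
    boxes:     array of shape (num_regions, 4), one box per region.
    """
    attention = np.asarray(attention, dtype=float)
    boxes = np.asarray(boxes)
    order = np.argsort(-attention, kind="stable")[:k]  # indices, largest first
    weights = attention[order]
    return boxes[order], weights, float(weights.sum())


# Toy example with five regions at one time step (made-up values).
attention = np.array([0.05, 0.40, 0.10, 0.25, 0.20])
boxes = np.array([
    [0, 0, 10, 10],
    [5, 5, 50, 60],
    [20, 0, 40, 15],
    [30, 30, 90, 80],
    [0, 40, 25, 70],
])
word = "surfboard"

top_boxes, top_weights, cumulative = top_k_regions(attention, boxes)
title = f"{word} ({cumulative:.2f})"
for color, box, weight in zip(COLORS, top_boxes, top_weights):
    print(color, box.tolist(), round(float(weight), 2))
print(title)
```

The returned boxes can then be drawn with any plotting library, one rectangle per color, with `title` as the image title.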

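Some observations

Repeated attention regions, as in Figure 2
Attended regions that make no sense (especially for non-visual words), in all figures
Low cumulative probability, in all figures
A word that is correct although it does not appear in the ground truth (surfboard), with correct attention, as in Figure 3
A wrong caption paired with seemingly correct attention (elephant), as in Figure 4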

Weekly summary

Last week's tasks

Finish the ROI attention visualization √
Read more than 5 papers √

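Goals for next week

Finish the attention model with inverted weights
Following the CVPR 2019 paper in which object detection is aided by image captioning, build a saliency-map version
Read more than 5 papers, focusing on attention and cross-level vision tasks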

Appendix (diary)

October 8

Read two papers, one on unsupervised training and one on attention mechanisms
Finished the bottom-up attention visualization

October 9 TODO

Refactor the code
Building on the earlier observations, produce richer visualization results

October 9 summary

Refactoring is done and I am very happy; the old code was badly written
Visualized the top-down model according to visual words
Read one paper

October 10 TODO

Finish the slides
Paper reading

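October 10 summary

Finished the slides for the MICCAI challenge
Held the group meeting and reported my work; the next steps are the inverted attention model and, borrowing from that CVPR 2019 paper, image-captioning-guided saliency prediction
In the afternoon the group discussed papers; the following points are worth noting:
- LSTM dropout, which masks out some of the inputs
- AoA is essentially adaptive attention
- Whether extra layers can be added before the language model to map the features, so that the image features adapt better to the captioning task
Plan to write a simple research-assistant web page on Sunday
Read one paper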
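
The remark that AoA is essentially adaptive attention can be illustrated with a small NumPy sketch of the AoA gate, assuming the formulation in the AoA paper: an information vector and a sigmoid gate are both computed from the query and the attended vector, and the output is their elementwise product. The parameter names, shapes and random initialization below are assumptions for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, rng):
    """Hypothetical parameters: each matrix acts on the concatenation [q; v_hat]."""
    return {
        "W_i": 0.1 * rng.standard_normal((d, 2 * d)),  # information weights
        "b_i": np.zeros(d),
        "W_g": 0.1 * rng.standard_normal((d, 2 * d)),  # gate weights
        "b_g": np.zeros(d),
    }


def attention_on_attention(q, v_hat, params):
    """Gate the attended information by how relevant it is to the query.

    q:     query vector, shape (d,)
    v_hat: attended vector produced by an ordinary attention module, shape (d,)
    """
    x = np.concatenate([q, v_hat])
    info = params["W_i"] @ x + params["b_i"]           # information vector
    gate = sigmoid(params["W_g"] @ x + params["b_g"])  # each entry in (0, 1)
    return gate * info                                 # elementwise gating


rng = np.random.default_rng(0)
d = 4
params = init_params(d, rng)
q = rng.standard_normal(d)
v_hat = rng.standard_normal(d)
print(attention_on_attention(q, v_hat, params))
```

When the gate saturates near 0, the attended information is suppressed and the decoder effectively ignores the visual signal, which is the sense in which the mechanism resembles adaptive attention.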