Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning


fuxin607


Article link: https://blog.csdn.net/fuxin607/article/details/80074384


This is a CVPR 2018 Oral paper on Visual Dialog Generation. Paper link: https://arxiv.org/abs/1711.07613, author's homepage: http://qi-wu.me/home.html. The first author is an Assistant Professor in Chunhua Shen's group at the University of Adelaide. The code has not been released yet.
What the paper does:
Input: image + question (text)　　　Output: answer (text)
The example shown in the paper:
[Figure: example]
The experimental comparison with the state of the art:
[Figure: comparison with SOTA]

method

The paper's framework is as follows.

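A CNN extracts the image features, and LSTMs encode the question, the answer, and the history answers. The information is extracted with co-attention [ https://arxiv.org/abs/1612.05386 ]. The image, question, and history-answer features are then concatenated, and an LSTM followed by a softmax produces the answer to the current question.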
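To make the attend-then-concatenate step concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: a single dot-product attention step stands in for the co-attention module, one linear layer plus softmax stands in for the LSTM decoder, and every name (`attend`, `fuse`, `next_word_distribution`) and every toy vector is an assumption made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(items, guide):
    # Weight each item vector by its dot-product similarity to the guide
    # vector, then return the weighted average and the attention weights.
    weights = softmax([dot(item, guide) for item in items])
    dim = len(items[0])
    pooled = [sum(w * item[d] for w, item in zip(weights, items))
              for d in range(dim)]
    return pooled, weights

def fuse(image_regions, question_vec, history_vecs):
    # Attend over image regions and over history vectors, both guided by
    # the question, then concatenate the three features into one vector.
    img, _ = attend(image_regions, question_vec)
    hist, _ = attend(history_vecs, question_vec)
    return img + question_vec + hist

def next_word_distribution(fused, output_weights):
    # Stand-in for the decoder: one linear layer followed by softmax
    # over a toy vocabulary (one weight row per vocabulary word).
    return softmax([dot(row, fused) for row in output_weights])

# Toy inputs: 2 image regions, 1 question vector, 2 history vectors (dim 2).
regions = [[1.0, 0.0], [0.0, 1.0]]
question = [1.0, 0.0]
history = [[0.5, 0.5], [0.0, 1.0]]
fused = fuse(regions, question, history)
vocab_weights = [[0.1] * 6, [0.2] * 6, [-0.1] * 6]
probs = next_word_distribution(fused, vocab_weights)
```

In the real model the decoder emits one such distribution per time step; the sketch shows only a single step.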
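To make the grammar of the generated answers match the way humans would respond (the usual recipe), the paper adds a GAN. The question and answer are first fed into an LSTM to obtain a new feature, which is then concatenated with the image and history-answer information (I do not understand why the four features are not simply concatenated directly). The concatenated feature is then fed into the GAN.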
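The input construction described above can be sketched as follows. This is only an illustration under loud assumptions: mean pooling over the question and answer tokens stands in for the LSTM, a single logistic unit stands in for the scoring network, and all function names and toy vectors are hypothetical rather than taken from the paper.

```python
import math

def mean_pool(vectors):
    # Average a list of equal-length vectors (stand-in for an LSTM encoder).
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def encode_qa(question_tokens, answer_tokens):
    # Encode question and answer jointly as one sequence, so the resulting
    # feature depends on the pair rather than on each part separately.
    return mean_pool(question_tokens + answer_tokens)

def build_discriminator_input(qa_feat, image_feat, history_feat):
    # Concatenate the joint QA feature with the image and history features.
    return qa_feat + image_feat + history_feat

def discriminator_score(features, weights, bias):
    # Logistic unit: a score in (0, 1), read as "how human-like is this answer".
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy token embeddings (dim 2) for a question and a candidate answer.
q_tokens = [[1.0, 0.0], [0.0, 1.0]]
a_tokens = [[1.0, 1.0]]
qa = encode_qa(q_tokens, a_tokens)
x = build_discriminator_input(qa, [0.2, 0.8], [0.5, 0.5])
score = discriminator_score(x, [0.1] * len(x), 0.0)
```

One possible reading of the design is that encoding the question and answer together lets the score depend on whether the answer fits the question, which a plain concatenation of four independent features would have to learn downstream. This is a guess, not something the paper note confirms.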
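To make the generated answers better suited to visual dialog (in fact, this is a standard recipe for both visual dialog generation and pure dialog generation), the paper adds reinforcement learning with two tricks: rewards are given at the word level (intermediate reward), and the generator is updated with teacher forcing [ https://arxiv.org/abs/1610.09038 ].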
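The two tricks can be illustrated with a small loss computation. The sketch below is a generic REINFORCE-style formulation, not the paper's exact objective: how the per-word rewards are obtained from the discriminator is left out, the mixing weight `mle_weight` is a hypothetical parameter, and all probabilities are toy values.

```python
import math

def policy_gradient_loss(word_probs, word_rewards):
    # REINFORCE-style loss with one reward per generated word:
    # L = -sum_t r_t * log p(w_t).
    return -sum(r * math.log(p) for p, r in zip(word_probs, word_rewards))

def sequence_level_rewards(final_reward, length):
    # Baseline for comparison: every word gets the single reward
    # assigned to the finished answer.
    return [final_reward] * length

def teacher_forcing_loss(gold_word_probs):
    # Negative log-likelihood of the ground-truth answer words, computed
    # while feeding the ground-truth prefix to the generator.
    return -sum(math.log(p) for p in gold_word_probs)

def generator_loss(word_probs, word_rewards, gold_word_probs, mle_weight=1.0):
    # Hypothetical combination of the two terms.
    return (policy_gradient_loss(word_probs, word_rewards)
            + mle_weight * teacher_forcing_loss(gold_word_probs))

# Probabilities the generator assigned to the 3 words it sampled.
sampled_probs = [0.5, 0.25, 0.5]
# Word-level rewards single out the weak second word ...
word_level = policy_gradient_loss(sampled_probs, [0.9, 0.1, 0.8])
# ... whereas a sequence-level reward treats all words alike.
seq_level = policy_gradient_loss(sampled_probs, sequence_level_rewards(0.6, 3))
```

With word-level rewards, a poorly rewarded word contributes little to the loss and so gets little reinforcement, while a single sequence-level reward would push up the probability of all three words equally.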

Summary: the paper uses a lot of tricks, but none of them seems to work all that well (hyperparameter tuning matters a lot).
