cs231n'18: Assignment 3 | RNN Captioning

FortiLZ

Category: cs231n. Tags: AI, CNN, cs231n

Article link: https://blog.csdn.net/FortiLZ/article/details/80935136

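This post walks through using an RNN for image captioning, covering the dataset, the hidden-state transition of a vanilla RNN, the full RNN over all time steps, word embeddings, and the application of the RNN to image captioning. It covers the RNN forward and backward passes as well as the details of parameter updates during training.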

Assignment3 | RNN Captioning

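This part actually does two things: it first builds an RNN and then, on top of that RNN, trains a model to caption images. I found the order of the code in the assignment somewhat confusing, so I have reorganized the material here according to my own understanding.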

Dataset

Both train and val use COCO 2014. The printed data info gives a rough overview of how the data is organized.

train_captions <class 'numpy.ndarray'> (400135, 17) int32
train_image_idxs <class 'numpy.ndarray'> (400135,) int32
train_features <class 'numpy.ndarray'> (82783, 512) float32
idx_to_word <class 'list'> 1004
word_to_idx <class 'dict'> 1004
train_urls <class 'numpy.ndarray'> (82783,) <U63

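The train dataset contains 82783 images, each with several captions, for 400135 captions in total. Each caption is stored as at most 17 integers, and each integer maps to a word through idx_to_word. idx_to_word is a list holding one word per position; positions 0-3 hold the special tokens <NULL>, <START>, <END>, and <UNK>. Every caption begins with <START> and ends with <END>; if it is shorter than 17 tokens, it is padded with <NULL> after <END>, and any word not in idx_to_word is recorded as <UNK>.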
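The shapes above suggest how captions and images are linked: 400135 captions over 82783 images is about 4.8 captions per image, and train_image_idxs has one entry per caption. The sketch below assumes that each entry of train_image_idxs is the row of train_features belonging to that caption's image. The arrays are tiny made-up stand-ins, not the real data, and the names caption_features and captions_per_image are only illustrative.

```python
import numpy as np

# Toy stand-ins for the real arrays (shrunk shapes, made-up values).
train_features = np.arange(3 * 4, dtype=np.float32).reshape(3, 4)  # 3 images, 4-dim features
train_image_idxs = np.array([0, 0, 1, 2, 2, 2], dtype=np.int32)    # 6 captions -> image row
train_captions = np.zeros((6, 17), dtype=np.int32)                 # 6 captions, 17 token ids each

# Assumption: train_image_idxs[i] is the row of train_features for caption i,
# so integer-array indexing gives one feature vector per caption.
caption_features = train_features[train_image_idxs]                # shape (6, 4)

# Count how many captions each image has.
captions_per_image = np.bincount(train_image_idxs, minlength=len(train_features))

print(caption_features.shape)   # (6, 4)
print(captions_per_image)       # [2 1 3]
```

With this layout, a minibatch can be sampled over captions, and the matching image features are obtained with a single indexing operation.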

print(data['train_captions'][1])
print(decode_captions(data['train_captions'][1], data['idx_to_word']))
[  1   4   3 172   6   4  62  10 317   6 114 612   2   0   0   0   0]
<START> a <UNK> view of a kitchen and all of its appliances <END>
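To make the encoding concrete, here is a minimal sketch of the round trip between words and integer ids. It is not the assignment's decode_captions; the helpers encode_caption_sketch and decode_caption_sketch and the eight-word vocabulary are hypothetical, chosen only so that the special tokens sit at positions 0-3 as described above.

```python
import numpy as np

# Hypothetical toy vocabulary; only the positions of the special tokens
# (0-3) follow the layout described in the text.
idx_to_word = ['<NULL>', '<START>', '<END>', '<UNK>', 'a', 'kitchen', 'view', 'of']
word_to_idx = {w: i for i, w in enumerate(idx_to_word)}

def encode_caption_sketch(words, word_to_idx, max_len=17):
    """Wrap words in <START>/<END>, map unknown words to <UNK>, pad with <NULL>."""
    words = words[:max_len - 2]  # leave room for <START> and <END>
    ids = [word_to_idx['<START>']]
    ids += [word_to_idx.get(w, word_to_idx['<UNK>']) for w in words]
    ids.append(word_to_idx['<END>'])
    ids += [word_to_idx['<NULL>']] * (max_len - len(ids))
    return np.array(ids, dtype=np.int32)

def decode_caption_sketch(ids, idx_to_word):
    """Map ids back to words, skipping <NULL> padding and stopping at <END>."""
    words = []
    for i in ids:
        word = idx_to_word[int(i)]
        if word == '<NULL>':
            continue
        words.append(word)
        if word == '<END>':
            break
    return ' '.join(words)

# 'messy' is not in the toy vocabulary, so it becomes <UNK> (id 3).
ids = encode_caption_sketch(['a', 'messy', 'view', 'of', 'a', 'kitchen'], word_to_idx)
text = decode_caption_sketch(ids, idx_to_word)
print(ids)
print(text)  # <START> a <UNK> view of a kitchen <END>
```

The out-of-vocabulary word turns into id 3, and the tail of the array is filled with 0, which mirrors the pattern in the printed caption above.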