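What is the Flickr30k dataset?
At its core, the dataset consists of two things: images, and natural-language captions describing those images.
Take the image 667626.jpg as an example.
Its annotations in the token file are: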
667626.jpg#0 A girl wearing a red and multicolored bikini is laying on her back in shallow water .
667626.jpg#1 Girl wearing a bikini lying on her back in a shallow pool of clear blue water .
667626.jpg#2 A young girl is lying in the sand , while ocean water is surrounding her .
667626.jpg#3 A little girl in a red swimsuit is laying on her back in shallow water .
667626.jpg#4 A girl is stretched out in shallow water
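As you can see, every image comes with five captions, and the five captions all mean roughly the same thing.
Our goal is to train a model that takes an image as input and outputs a reasonably accurate caption for it. In other words, the model describes what it sees in the picture.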
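The annotation layout shown above (image name, `#`, caption index, then the caption) can be grouped per image with a few lines of standard-library Python. This is a minimal sketch: the helper name `parse_token_lines` is made up for illustration, and the tab separator between the two fields is an assumption based on how the token file is read later in this post.

```python
from collections import defaultdict

# Sample lines copied from the token file excerpt above.
# Assumption: the key and the caption are separated by a tab.
SAMPLE_LINES = [
    "667626.jpg#0\tA girl wearing a red and multicolored bikini is laying on her back in shallow water .",
    "667626.jpg#1\tGirl wearing a bikini lying on her back in a shallow pool of clear blue water .",
    "667626.jpg#2\tA young girl is lying in the sand , while ocean water is surrounding her .",
    "667626.jpg#3\tA little girl in a red swimsuit is laying on her back in shallow water .",
    "667626.jpg#4\tA girl is stretched out in shallow water",
]


def parse_token_lines(lines):
    """Group captions by image name (hypothetical helper, not part of the dataset tooling)."""
    captions = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        # "667626.jpg#0" -> image name "667626.jpg"; the index after '#' is dropped here
        image_name, _, _index = key.partition("#")
        captions[image_name].append(caption.strip())
    return dict(captions)


captions = parse_token_lines(SAMPLE_LINES)
print(len(captions["667626.jpg"]))  # 5
```

For a captioning model, a mapping like this gives one image and several reference captions per training or evaluation example.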
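Downloading the dataset
Official download page: see the Flickr30k official site.
Fill in the form at the bottom of that page to get the download. Be aware that the download is very likely to be slow and unreliable.
Baidu Cloud mirror: https://pan.baidu.com/s/1nQ_t-OzuFkxJmfbzRH2vPA (extraction code: md6z). Leave a comment if the link has expired.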
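Dataset file structure
The download consists of two tar archives:
- flickr30k-images.tar
- flickr30k.tar.gz
The first archive holds the images; the second holds the image annotations (the captions that belong to each image).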
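Before unpacking anything, it can help to peek at what an archive contains. The sketch below uses Python's standard `tarfile` module; `list_members` is a hypothetical helper name, and the `limit` parameter is only there so that the large image archive does not have to be listed in full.

```python
import tarfile


def list_members(archive_path, limit=10):
    """Return up to `limit` member names from a tar archive (plain or compressed)."""
    names = []
    # mode "r:*" lets tarfile detect the compression (none, gzip, ...) on its own
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names
```

Assuming both archives sit in the working directory, `list_members("flickr30k.tar.gz")` and `list_members("flickr30k-images.tar")` would show the first few entries of each.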
flickr30k.tar.gz contains a file named results_20130124.token, which you can load and inspect as follows:
import pandas as pd

# The token file is tab-separated and has no header row, so name the two columns here
annotations = pd.read_table('results_20130124.token', sep='\t', header=None,
                            names=['image', 'caption'])
print(annotations)
Output:
     image             caption
0    1000092795.jpg#0  Two young guys with shaggy hair look at their ...
1    1000092795.jpg#1  Two young , White males are outside near many ...
2    1000092795.jpg#2  Two men in green shirts are standing in a yard .
3    1000092795.jpg#3  A man in a blue shirt standing in a garden .
4    1000092795.jpg#4  Two friends enjoy time spent together .
5    10002456.jpg#0    Several men in hard hats are operating a giant...
6    10002456.jpg#1    Workers look down from up above on a piece of ...
7    10002456.jpg#2    Two men working on a machine wearing hard hats .
8    10002456.jpg#3    Four men on top of a tall structure .
9    10002456.jpg#4    Three men on a large rig .
...  ...               ...
[158915 rows x 2 columns]
As the output shows, each image corresponds to five captions, for a total of 158,915 captions (158,915 / 5 = 31,783 images).
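The `image` column still carries the `#<index>` suffix, so grouping captions by image needs one more step. The following is a minimal sketch, assuming pandas: it splits that column into a file name and a caption index, then collects the captions per image. The rows in `sample` are copied from the output above and stand in for the real file; `split_image_column` and `caption_idx` are names invented for this illustration.

```python
import io

import pandas as pd

# A few rows in the same tab-separated layout as results_20130124.token
sample = (
    "1000092795.jpg#2\tTwo men in green shirts are standing in a yard .\n"
    "1000092795.jpg#3\tA man in a blue shirt standing in a garden .\n"
    "1000092795.jpg#4\tTwo friends enjoy time spent together .\n"
    "10002456.jpg#2\tTwo men working on a machine wearing hard hats .\n"
    "10002456.jpg#3\tFour men on top of a tall structure .\n"
    "10002456.jpg#4\tThree men on a large rig .\n"
)


def split_image_column(annotations):
    """Split 'name.jpg#k' into a file name and an integer caption index."""
    parts = annotations["image"].str.split("#", n=1, expand=True)
    tidy = annotations.copy()
    tidy["image"] = parts[0]
    tidy["caption_idx"] = parts[1].astype(int)
    return tidy


annotations = pd.read_csv(io.StringIO(sample), sep="\t", header=None,
                          names=["image", "caption"])
tidy = split_image_column(annotations)

# One list of captions per image file
captions_per_image = tidy.groupby("image")["caption"].apply(list)
print(captions_per_image["10002456.jpg"])
```

On the full file, `tidy.groupby("image").size()` would be one way to check that every image really has five captions.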