Parsing and Visualizing Video Question Answering Datasets

Horizon2333

Tags: visualization, deep learning

Article link: https://blog.csdn.net/Horizon2333/article/details/115387677

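I have recently been studying video question answering (VideoQA). The relevant datasets are hard to track down and there is little introductory material online, so this post summarizes the commonly used datasets, explains how to parse their annotations, and shows some simple visualizations.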

1. MSVD-QA

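The dataset contains 1970 videos and is about 1.7 GB in size.
Official dataset website: link
Video download site: link (the file names are more regular than those on the official site)
Annotation download site: link
The annotation files are in JSON format, and each annotation is a dict.
Annotation loading code: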

import json

with open("val_qa.json") as f:
   annotation = json.load(f)

Example annotation:

>>> annotation[0]
{'answer': 'someone', 'id': 30933, 'question': 'who pours liquid from a plastic container into a ziploc bag containing meat pieces?', 'video_id': 1201}

Get the video name, question, and answer:

index = 0  # position of the annotation to read
# video_id is stored as an int, so convert it before concatenating
video_name = 'vid' + str(annotation[index]['video_id']) + '.avi'
# video_name = 'vid1201.avi'
question = annotation[index]['question']
# question = 'who pours liquid from a plastic container into a ziploc bag containing meat pieces?'
answer = annotation[index]['answer']
# answer = 'someone'
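
The field access above can be wrapped in a small helper. This is a minimal sketch, not part of the dataset's official tooling: the function name `msvd_qa_samples` is hypothetical, and the `vid<id>.avi` naming assumes the renamed video files used in this post.

```python
def msvd_qa_samples(annotation):
    """Yield (video_name, question, answer) for each MSVD-QA annotation dict."""
    for item in annotation:
        # video_id is an int, so format it into the file name
        video_name = f"vid{item['video_id']}.avi"
        yield video_name, item["question"], item["answer"]


# The example annotation from this section, wrapped in a list
annotation = [{
    "answer": "someone",
    "id": 30933,
    "question": "who pours liquid from a plastic container into a ziploc bag containing meat pieces?",
    "video_id": 1201,
}]
samples = list(msvd_qa_samples(annotation))
print(samples[0][0])  # vid1201.avi
```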

Once you have the video name, you can load the video with a vision library such as OpenCV. An example visualization:
[Figure: MSVD-QA visualization result]
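
As a rough illustration of loading a video by name with OpenCV, the sketch below builds the file path and reads the first few frames. The directory `MSVD-QA/video`, the helper names, and the `max_frames` parameter are all assumptions made for this example. `cv2` is imported inside `read_frames` so that the path helper can still be used where OpenCV is not installed.

```python
import os


def build_video_path(video_dir, prefix, video_id, ext):
    """Join directory, name prefix, numeric id and extension into a file path."""
    return os.path.join(video_dir, f"{prefix}{video_id}{ext}")


def read_frames(video_path, max_frames=16):
    """Read up to max_frames frames (BGR arrays) from a video with OpenCV."""
    import cv2  # lazy import: only needed when frames are actually read

    capture = cv2.VideoCapture(video_path)
    frames = []
    try:
        while len(frames) < max_frames:
            ok, frame = capture.read()
            if not ok:  # end of video or unreadable file
                break
            frames.append(frame)
    finally:
        capture.release()
    return frames


# Hypothetical directory layout; adjust to where the videos were extracted
path = build_video_path("MSVD-QA/video", "vid", 1201, ".avi")
print(path)
```

Calling `read_frames(path)` returns a list of frames that can be shown with `cv2.imshow` or passed to a plotting library. If the file cannot be opened, the list is simply empty.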

2. MSRVTT-QA

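The dataset contains 10000 videos and is about 6.3 GB in size.

Video download site: link
Annotation download site: link
The annotations follow the same format as MSVD-QA: the files are JSON, and each annotation is a dict.
Annotation loading code: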

import json

with open("val_qa.json") as f:
   annotation = json.load(f)

Example annotation:

>>> annotation[0]
{'answer': 'couch', 'category_id': 14, 'id': 158581, 'question': 'what are three people sitting on?', 'video_id': 6513}

Get the video name, question, and answer:

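index = 0  # position of the annotation to read
# video_id is stored as an int, so convert it before concatenating
video_name = 'video' + str(annotation[index]['video_id']) + '.mp4'
# video_name = 'video6513.mp4'
question = annotation[index]['question']
# question = 'what are three people sitting on?'
answer = annotation[index]['answer']
# answer = 'couch'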
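
A video usually has several questions attached, so when browsing it can be handy to group the annotations by video. The following is one possible sketch. The function name `group_by_video` is hypothetical, and the `video<id>.mp4` naming is taken from the example above.

```python
from collections import defaultdict


def group_by_video(annotation):
    """Map each video file name to its list of (question, answer) pairs."""
    grouped = defaultdict(list)
    for item in annotation:
        video_name = f"video{item['video_id']}.mp4"
        grouped[video_name].append((item["question"], item["answer"]))
    return dict(grouped)


# The example annotation from this section, wrapped in a list
annotation = [
    {"answer": "couch", "category_id": 14, "id": 158581,
     "question": "what are three people sitting on?", "video_id": 6513},
]
qa_by_video = group_by_video(annotation)
print(qa_by_video["video6513.mp4"])
```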

An example visualization:
[Figure: MSRVTT-QA visualization result]

3. tgif-qa

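The dataset consists of a large number of animated GIFs, about 123 GB in total.
GIF and annotation download site: link
The annotations are stored in tab-separated CSV files; the first row of each file holds the column names for that question type.
The dataset has four question types: action, count, frameqa, and transition. Their annotations differ, so each is described below.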
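
Because the first row holds the column names, the standard-library `csv` module can also read these files into dicts keyed by column. The sketch below parses an in-memory sample built from the count header and example row shown in section 3.2. The function name is hypothetical. For a real file you would pass a file object opened with `open(path, newline="")`, with whatever encoding the files use.

```python
import csv
import io

# Header row first, then one annotation; fields are separated by tabs
sample_tsv = (
    "gif_name\tquestion\tanswer\tvid_id\tkey\n"
    "tumblr_nezfs4uELd1u1a7cmo1_250\t"
    "How many times does the man adjust waistband ?\t3\tCOUNT12\t52\n"
)


def read_tgif_annotations(file_obj):
    """Return one dict per annotation row, keyed by the header column names."""
    # QUOTE_NONE treats any quote characters in a question as literal text
    reader = csv.DictReader(file_obj, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_tgif_annotations(io.StringIO(sample_tsv))
print(rows[0]["gif_name"], rows[0]["answer"])
```

Accessing fields by column name rather than by position keeps the same code usable across question types whose columns differ.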

3.1 action

Load the annotations:

import numpy as np

tgif_test_action_annotation = np.loadtxt("Test_action_question.csv", dtype=str, delimiter='\t')

The first row is:

>>> tgif_test_action_annotation[0]
array(['gif_name', 'question', 'a1', 'a2', 'a3', 'a4', 'a5', 'answer', 'vid_id', 'key'], dtype='<U73')

The output above shows that action is a multiple-choice task with 5 options.
Example annotation:

>>> tgif_test_action_annotation[1]
array(['tumblr_nk172bbdPI1u1lr18o1_250',
       'What does the butterfly do 10 or more than 10 times ?',
       'stuff marshmallow', 'holds a phone towards face', 'fall over',
       'talk', 'flap wings', '4', 'ACTION4', '26'], dtype='<U73')

Code to get the GIF name, question, options, and answer:

index = 1  # row 0 is the header, so annotations start at 1
gif_name = tgif_test_action_annotation[index][0] + '.gif'
# gif_name = 'tumblr_nk172bbdPI1u1lr18o1_250.gif'
question = tgif_test_action_annotation[index][1]
# question = 'What does the butterfly do 10 or more than 10 times ?'
multi_choice = tgif_test_action_annotation[index][2:7]
# multi_choice = array(['stuff marshmallow', 'holds a phone towards face', 'fall over', 'talk', 'flap wings'], dtype='<U73')
answer = tgif_test_action_annotation[index][7]
# answer = '4', i.e. the correct option is multi_choice[4] = 'flap wings'
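
The answer column holds the position of the correct option as a string, so it has to be converted to an int before indexing. A minimal sketch follows. The function name `resolve_choice` is hypothetical, and the 0-based indexing is inferred from the example above, where '4' selects the fifth option.

```python
def resolve_choice(row):
    """Split an action row into question, options and the correct option text."""
    question = row[1]
    options = list(row[2:7])      # columns a1..a5
    answer_index = int(row[7])    # stored as a string, 0-based
    return question, options, options[answer_index]


# The example action annotation from this section
row = ['tumblr_nk172bbdPI1u1lr18o1_250',
       'What does the butterfly do 10 or more than 10 times ?',
       'stuff marshmallow', 'holds a phone towards face', 'fall over',
       'talk', 'flap wings', '4', 'ACTION4', '26']
question, options, correct = resolve_choice(row)
print(correct)  # flap wings
```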

3.2 count

Load the annotations:

import numpy as np

tgif_test_count_annotation = np.loadtxt("Test_count_question.csv", dtype=str, delimiter='\t')

The first row is:

>>> tgif_test_count_annotation[0]
array(['gif_name', 'question', 'answer', 'vid_id', 'key'], dtype='<U97')

The output above shows that count is an open-ended task, so there are no candidate options, only the answer.
Example annotation:

>>> tgif_test_count_annotation[1]
array(['tumblr_nezfs4uELd1u1a7cmo1_250',
       'How many times does the man adjust waistband ?', '3', 'COUNT12',
       '52'], dtype='<U97')

Code to get the GIF name, question, and answer:

index = 1  # row 0 is the header, so annotations start at 1
gif_name = tgif_test_count_annotation[index][0] + '.gif'
# gif_name = 'tumblr_nezfs4uELd1u1a7cmo1_250.gif'
question = tgif_test_count_annotation[index][1]
# question = 'How many times does the man adjust waistband ?'
answer = tgif_test_count_annotation[index][2]
# answer = '3' (a string; convert with int(answer) if a number is needed)

3.3 Others

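Of the other two types, frameqa is open-ended like count, and transition is multiple-choice like action, so their annotations can be parsed as described in the two sections above and are not repeated here.
An example visualization:
[Figure: tgif-qa visualization result]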
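
Since the question types differ mainly in whether option columns are present, one parser can cover both kinds by inspecting the header row. The sketch below is an illustration under assumptions: it treats columns named `a1`, `a2`, ... as options, as in the action header, and assumes every file has `gif_name`, `question`, and `answer` columns. The frameqa and transition headers are not shown in this post, so check the first row of those files before relying on it.

```python
import re


def parse_row(header, row):
    """Parse one TGIF-QA row, using the header to detect option columns."""
    record = dict(zip(header, row))
    # Option columns are those named a1, a2, ... in the header
    option_keys = [name for name in header if re.fullmatch(r"a\d+", name)]
    parsed = {
        "gif_name": record["gif_name"] + ".gif",
        "question": record["question"],
        "answer": record["answer"],
    }
    if option_keys:
        options = [record[key] for key in option_keys]
        parsed["options"] = options
        # For multiple-choice rows the answer is a 0-based option index
        parsed["answer_text"] = options[int(record["answer"])]
    return parsed


action_header = ['gif_name', 'question', 'a1', 'a2', 'a3', 'a4', 'a5',
                 'answer', 'vid_id', 'key']
action_row = ['tumblr_nk172bbdPI1u1lr18o1_250',
              'What does the butterfly do 10 or more than 10 times ?',
              'stuff marshmallow', 'holds a phone towards face', 'fall over',
              'talk', 'flap wings', '4', 'ACTION4', '26']
count_header = ['gif_name', 'question', 'answer', 'vid_id', 'key']
count_row = ['tumblr_nezfs4uELd1u1a7cmo1_250',
             'How many times does the man adjust waistband ?', '3',
             'COUNT12', '52']

action = parse_row(action_header, action_row)
count = parse_row(count_header, count_row)
print(action["answer_text"])  # flap wings
print("options" in count)     # False
```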

Visualization with a GUI

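To make it easier to browse these datasets, I built a visualization program with Qt. The interface looks like this:
[Figure: VideoQA dataset visualization interface]
Usage: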

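First set the dataset root path so that at least one of MSVD-QA, MSRVTT-QA, or tgif-qa exists under it, then click the Search button to find the datasets.
Use the Dataset drop-down to choose a dataset; an annotation is loaded at random and its video is played.
For tgif-qa, you can also pick a question type with the QA Type drop-down.
Click the Refresh button to load another random annotation.
Click the Callback button to replay the video.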
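
The Search and Refresh steps can be approximated without a GUI. The sketch below is not the code from the visualization program. It only illustrates the idea, assuming the dataset folders are named exactly MSVD-QA, MSRVTT-QA, and tgif-qa and sit directly under the root path. Both function names are hypothetical.

```python
import os
import random
import tempfile

DATASET_NAMES = ("MSVD-QA", "MSRVTT-QA", "tgif-qa")


def find_datasets(root):
    """Return the known dataset folders that exist directly under root."""
    return [name for name in DATASET_NAMES
            if os.path.isdir(os.path.join(root, name))]


def pick_random_annotation(annotations, rng=random):
    """Pick one annotation at random, similar to what Refresh does."""
    return rng.choice(annotations)


# Demonstrate the search on a temporary directory holding one dataset folder
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "tgif-qa"))
    found = find_datasets(root)
print(found)  # ['tgif-qa']

picked = pick_random_annotation([{"id": 30933}, {"id": 158581}])
print(picked["id"])
```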

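The code for the visualizations and the interface above, along with instructions for running it, is in my GitHub repository: videoqa_dataset_visualization. If it helps you, please give it a star.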
