I3D Skim Read: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

weixin_47341656

Last modified 2022-04-16 21:49:28

Category: Paper Reading Notes. Tags: video understanding, I3D model, Kinetics dataset, transfer learning, action recognition

First published 2022-04-14 18:37:38

Article link: https://blog.csdn.net/weixin_47341656/article/details/124152968

0. Preface

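In a skim read we focus on four parts of a paper: the title, the abstract, the conclusion, and the figures and tables. The goal is to answer three questions: what method is used, what problem it solves, and what results it achieves. For more papers on video understanding, see the index of our video understanding series, which tracks what we have covered so far.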

1. Title

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

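The paper presents a new model and a new action recognition dataset. "Quo Vadis" is Latin for "Where are you going?" and is also the title of a film.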

2. Abstract

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset. Kinetics has two orders of magnitude more data, with 400 human action classes and over 400 clips per class, and is collected from realistic, challenging YouTube videos. We provide an analysis on how current architectures fare on the task of action classification on this dataset and how much performance improves on the smaller benchmark datasets after pre-training on Kinetics.

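In short: current action classification datasets (UCF-101, HMDB-51) contain too few videos to tell a good architecture from a mediocre one, because most methods reach similar performance on these small benchmarks. The paper re-evaluates state-of-the-art (SOTA) architectures on the new Kinetics dataset. Kinetics-400 is collected from realistic YouTube videos and covers 400 human action classes with more than 400 clips per class. The authors analyze how current architectures perform on Kinetics-400 and how much pre-training on it improves performance on the smaller datasets.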

We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on 2D ConvNet inflation: filters and pooling kernels of very deep image classification ConvNets are expanded into 3D, making it possible to learn seamless spatio-temporal feature extr