Overview
Dataset | Year | Actions | Videos | Video Type | SOTA |
---|---|---|---|---|---|
HMDB51 | ICCV2011 | 51 | 6849 | movie & web video | ≥0.82 |
UCF101 | ICCV2013 | 101 | 13320 | web video | ≥0.98 |
Sports-1M | ICCV2014 | 487 | 1,000,000 | web video | ≥0.91 |
ActivityNet | ICCV2015 | 200 | 20,000 | web video | ≥0.40 |
Charades | ICCV2016 | 157 | 9,848 | controlled settings | mAP≥0.58 |
YouTube-8M (2019) | 2016 | 1000 | 237,000 | movie & web video | mAP≥0.83 |
AVA | CVPR2018 | 80 | 57,600 | movie | mAP≥0.27 |
Kinetics-400 | 2017 | 400 | 306,245 | web video | ≥0.82 |
Something-Something V1 | 2017 | 174 | 108,499 | controlled settings | ≥0.52 |
Something-Something V2 | 2018 | 174 | 220,847 | controlled settings | ≥0.67 |
Kinetics-600 | 2018 | 600 | 495,547 | web video | ≥0.71 |
Kinetics-700 | 2019 | 700 | 650,317 | web video | ≥0.57 |
Epic-Kitchens | ECCV2018 | 149 | 432 | controlled settings | ≥0.36 |
Jester | ICCVW2019 | 27 | 148,092 | controlled settings | ≥0.96 |
Moments in Time | TPAMI 2019 | 339 | 1,000,000 | web video | ≥0.34 |
Multi-Moments in Time | 2019 | 339 | 1,000,000 | web video | ≥0.59 |
By task
video classification
Fully supervised, whole-clip, forced-choice video classifiers.
Trimmed samples that each contain a single action, well suited to training classifiers.
- KTH
- Weizmann
- Hollywood-2
- HMDB
- UCF101
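
To make "forced-choice" concrete: each trimmed clip receives exactly one label, the highest-scoring class, and the classifier is scored by top-1 accuracy. The following is only a minimal sketch; the class names and scores are hypothetical and do not come from any of the datasets above.

```python
def predict_forced_choice(scores):
    """Return the single class with the highest score (one label per clip)."""
    return max(scores, key=scores.get)


def top1_accuracy(predictions, labels):
    """Fraction of clips whose predicted class equals the ground-truth class."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical per-clip class scores; names and values are illustrative only.
clip_scores = [
    {"walk": 0.7, "run": 0.2, "jump": 0.1},
    {"walk": 0.1, "run": 0.3, "jump": 0.6},
    {"walk": 0.5, "run": 0.4, "jump": 0.1},
]
labels = ["walk", "jump", "run"]

predictions = [predict_forced_choice(s) for s in clip_scores]
accuracy = top1_accuracy(predictions, labels)
```

Datasets whose clips can carry several labels at once are typically scored with mAP instead, which is why the SOTA column in the overview table mixes the two kinds of numbers.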
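large-scale video classification
These are also single-action samples, but at a much larger scale and usually with noisier labels, which makes them suitable for pretraining.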
- TrecVid MED
- Sports-1M
- YouTube-8M
- Something-Something
- SLAC
- Moments in Time
- Kinetics
temporal localization
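Large-scale untrimmed videos. A single sample contains multiple actions, and the temporal position of each action is annotated.
The first three are sourced from YouTube videos.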
- ActivityNet
- THUMOS
- MultiTHUMOS
- Charades
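
A temporal annotation of this kind can be thought of as a list of labeled (start, end) segments per video. Predicted segments are commonly compared against the ground truth with temporal intersection-over-union (tIoU). The sketch below assumes times in seconds; the `ActionSegment` type, the labels, and the numbers are hypothetical and do not reflect the file format of any dataset listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """One annotated action inside an untrimmed video (hypothetical layout)."""
    label: str
    start: float  # seconds
    end: float    # seconds


def temporal_iou(a, b):
    """Intersection-over-union of two time intervals."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# One untrimmed video with two annotated actions (illustrative values).
ground_truth = [
    ActionSegment("open door", 2.0, 6.0),
    ActionSegment("sit down", 10.0, 14.0),
]
prediction = ActionSegment("open door", 4.0, 8.0)

iou_first = temporal_iou(ground_truth[0], prediction)   # overlapping segments
iou_second = temporal_iou(ground_truth[1], prediction)  # disjoint segments
```

A prediction is usually counted as correct only if its label matches and its tIoU with a ground-truth segment exceeds a chosen threshold.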
spatio-temporal localization
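These datasets provide both temporal annotations (when an action occurs) and spatial annotations (bounding boxes around the objects). Compared with the last one, the first three are small in scale and short in duration, and their actions are composite actions, which are often not clearly defined. AVA is large in scale, and its annotations are atomic actions, roughly single verbs. Some work also performs spatio-temporal localization in untrimmed videos on UCF101, DALY, and Hollywood2Tubes.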
- CMU
- MSR Actions
- UCF Sports
- JHMDB
- AVA
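
As a rough illustration of what a spatio-temporal annotation carries, each record below combines a timestamp, a bounding box, and one atomic action (a single verb). This is a sketch under assumptions: the `BoxAnnotation` layout, the normalized box coordinates, and the verbs are hypothetical and are not the actual format of AVA or any other dataset listed here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAnnotation:
    """One box with one atomic action at one point in time (hypothetical layout)."""
    timestamp: float  # seconds from the start of the video
    box: tuple        # (x1, y1, x2, y2), assumed normalized to [0, 1]
    action: str       # a single verb


def group_by_timestamp(annotations):
    """Collect all annotations that share a timestamp, preserving input order."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.timestamp].append(annotation)
    return dict(grouped)


# Illustrative values only: two boxes at t=1.0 and one box at t=2.0.
annotations = [
    BoxAnnotation(1.0, (0.1, 0.2, 0.4, 0.9), "stand"),
    BoxAnnotation(1.0, (0.5, 0.2, 0.8, 0.9), "talk"),
    BoxAnnotation(2.0, (0.1, 0.2, 0.4, 0.9), "walk"),
]
per_frame = group_by_timestamp(annotations)
```

Grouping by timestamp makes it easy to see that a single moment can contain several boxes, each with its own atomic action.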