[Video Understanding Dataset Roundup] "A collection of recent video understanding datasets, under construction!" by Yao Zhou. Original source: https://github.com//yoosan/video-understanding-dataset
Video-understanding-dataset
Video Classification
| Dataset | Paper | Website | Category | Examples | Classes | Duration | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UCF101 | PDF | Link | human action | 13,320 | 101 | <10s | UCF | 98% (DeepMind I3D) |
| HMDB51 | PDF | Link | human action | 6,766 | 51 | <10s | SERRE LAB, Brown | - |
| ActivityNet v1.3 | PDF | Link | human activities | ~20,000 | 200 | - | ActivityNet | 8.83% err (iBUG) |
| Charades | PDF | Link | daily human activities | 9,848 | 157 | - | AI2 | - |
| Kinetics | PDF | Link | human action | ~300,000 | 400 | 10s | DeepMind | - |
| Sports-1M | PDF | Link | sports | ~1 million | 478 | 5m36s | Google & Stanford | - |
| YouTube-8M | PDF | Link | visual contents | ~7 million | 4716 | 120-500s | Google Cloud | 85% GAP (WILLOW) |
| FCVID | PDF | Link | visual contents | 91,223 | 239 | 100s+ | Fudan-Columbia | - |
| Something-Something | PDF | Link | action with objects | 108,499 | 174 | ~4s | TwentyBN | - |
| Moments in Time | PDF | Link | action or activity | ~1 million | 339 | 3s | MIT-IBM Watson | - |
Temporal Action Detection
| Dataset | Paper | Website | Examples | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- |
| THUMOS2014 | PDF | Link | 9.682 | UCF | - |
| ActivityNet (v1.3) | PDF | Link | ~20,000 | ActivityNet | 0.344 (SJTU & Columbia) |
Video Captioning
| Dataset | Paper | Website | Context | Examples | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- | --- |
| MPII-MD | PDF | Link | movie | 68,337 clips with 68,375 sentences | MPII | - |
| MSR-VTT | PDF | Link | 20 categories | 10,000 clips with 200,000 sentences | MSR | - |
| Charades | PDF | Link | human activity | 9,848 clips with 27,847 sentences | AI2 | - |
| Densevid | PDF | Link | event | 20k clips and 100k sentences | Stanford, ActivityNet | - |
Video Question Answering
| Dataset | Paper | Website | Task | Examples | Organizer | SOTA performance |
| --- | --- | --- | --- | --- | --- | --- |
| MovieQA | PDF | Link | question-answering in movies | 408 movies & 14944 QAs | UToronto | - |
| MarioQA | PDF | Link | reasoning events in game videos | 187,757 examples with 92,874 QAs | POSTECH | - |