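Before deep learning, the best-performing algorithm for action recognition was iDT (improved Dense Trajectories)^{[1][2]}, and most of the work that followed built on it. iDT uses the optical flow field to obtain trajectories through the video sequence, then extracts four kinds of features along each trajectory: HOF, HOG, MBH, and the trajectory shape. HOG is computed on the grayscale image, while the other three are computed from dense optical flow. The features are then encoded with FV (Fisher Vector), and an SVM classifier is trained on the encoded result. Once deep learning arrived, several families of methods were proposed for the problem, including Two-Stream^{[3][4]}, C3D (Convolution 3 Dimension)^{[6]}, and RNN-based approaches^{[7]}.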
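To make the trajectory step more concrete, here is a minimal sketch of how a single point could be followed through a sequence of dense flow fields and turned into a trajectory-shape descriptor (the displacement vectors normalized by their total magnitude). The function names `track_point` and `trajectory_shape` and the nested-list flow layout are assumptions made for illustration. The sketch leaves out the median filtering of the flow, the camera-motion compensation, and the HOG/HOF/MBH descriptors that a full iDT pipeline would include.

```python
import math

def track_point(start, flows):
    """Follow one point through a sequence of dense flow fields.

    flows[t][y][x] is a hypothetical (dx, dy) displacement at pixel (x, y)
    between frame t and frame t + 1. Tracking stops if the point leaves
    the frame.
    """
    x, y = start
    points = [(x, y)]
    for flow in flows:
        height, width = len(flow), len(flow[0])
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            break
        dx, dy = flow[row][col]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def trajectory_shape(points):
    """Displacement vectors normalized by the total displacement magnitude."""
    deltas = [(x1 - x0, y1 - y0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in deltas)
    if total == 0:
        # A static point has no shape; return an all-zero descriptor.
        return [0.0] * (2 * len(deltas))
    return [value / total for delta in deltas for value in delta]

# Toy data: a 4x4 frame in which every pixel moves 1 px to the right per frame.
uniform_flow = [[(1.0, 0.0)] * 4 for _ in range(4)]
points = track_point((0.0, 1.0), [uniform_flow] * 3)
descriptor = trajectory_shape(points)
```

In a real pipeline the flow fields would come from a dense optical flow estimator, and the descriptors from many trajectories would be pooled by the Fisher Vector encoding before the SVM is trained.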
Reference (blog): Summary of action recognition datasets
Reference (link): Human activity video datasets
ActivityNet
A Large-Scale Video Benchmark for Human Activity Understanding
Dataset introduction (link)
Categories
- Eating and Drinking
- Food and drink preparation
- Kitchen and food clean-up
- Participating in sports, exercises and recreation
- Participation in equestrian sports
- Socializing, relaxing and leisure
- Personal care (brushing teeth, ...)
- Household activities
- Vehicle repair and maintenance
……
Results from 2017
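An untrimmed video is one that has not been cut, so each video contains several action segments. A trimmed video has been cut down to a single action, and the task is to classify that action. Temporal action proposals locate the time spans in which an action occurs.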
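A temporal proposal can be thought of as a `(start, end)` pair in seconds, and a common way to compare a proposal with an annotated action segment is the temporal intersection over union (tIoU). The sketch below shows that computation. The segment values are made up for illustration and are not taken from ActivityNet, and `temporal_iou` is a hypothetical helper name.

```python
def temporal_iou(segment_a, segment_b):
    """Temporal IoU of two (start, end) segments given in seconds."""
    start_a, end_a = segment_a
    start_b, end_b = segment_b
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0

# Hypothetical annotations for one untrimmed video: two action segments.
ground_truth = [(12.0, 20.0), (45.0, 60.0)]
# Hypothetical proposals produced by some proposal method.
proposals = [(10.0, 18.0), (30.0, 35.0), (44.0, 61.0)]

# For each proposal, keep its best overlap with any annotated segment.
best = [max(temporal_iou(p, g) for g in ground_truth) for p in proposals]
```

A proposal is usually counted as a hit when its best tIoU exceeds a chosen threshold. With a threshold of 0.5, the first and third proposals above would count and the second would not.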
UCF101 Action Recognition Data Set
HMDB51
brush hair, cartwheel, catch, chew, clap, climb, climb stairs, dive, draw sword, dribble, drink, eat, fall floor, fencing, flic flac, golf, handstand, hit, hug, jump, kick, kick ball, kiss, laugh, pick, pour, pullup, punch, push, pushup, ride bike, ride horse, run, shake hands, shoot ball, shoot bow, shoot gun, sit, situp, smile, smoke, somersault, stand, swing baseball, sword exercise, sword, talk, throw, turn, walk, wave.