NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
(2019 TPAMI)
Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, and Alex C. Kot
Note
Paper: https://arxiv.org/pdf/1905.04757.pdf
GitHub: https://github.com/shahroudy/NTURGB-D
Dataset links:
- https://rose1.ntu.edu.sg/dataset/actionRecognition/
- https://drive.google.com/open?id=1CUZnBtYwifVXS21yVg62T-vrPVayso5H (only NTU RGB+D skeleton data)
- https://drive.google.com/open?id=1tEbuaEqMxAV7dNc4fqu1O4M7mC6CJ50w (only NTU RGB+D 120 skeleton data)
Related dataset walkthrough:
- [NTU RGB+D dataset] https://blog.csdn.net/qq_36627158/article/details/119907320
Contribution
1、Introduces a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames. The dataset covers 120 action classes, including daily, mutual, and health-related activities.
2、Investigates a novel one-shot 3D activity recognition problem on this dataset; a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for the task, which yields promising results for recognizing novel action classes.
Comparison with Other Datasets
Details of NTU RGB+D 120
1、114,480 RGB+D video samples
2、120 action categories in total
- 82 daily actions (eating, writing, sitting down, moving objects, etc.),
- 12 health-related actions (blowing nose, vomiting, staggering, falling down, etc.), and
- 26 mutual actions (handshaking, pushing, hitting, hugging, etc.).
3、106 distinct human subjects.
4、RGB videos, depth sequences, skeleton data (3D locations of 25 major body joints), and infrared frames
5、hardware: Microsoft Kinect v2
6、155 different camera viewpoints.
7、The subjects in this dataset are in a wide range of age distribution (from 10 to 57) and from different cultural backgrounds (15 countries)
8、various environmental conditions (96 different backgrounds with illumination variation)
9、cross-subject and cross-setup evaluation protocols
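Each sample's filename encodes its metadata. Per the project's GitHub repository, filenames follow the pattern SsssCcccPpppRrrrAaaa (setup, camera, performer/subject, replication, action class), e.g. `S018C001P008R001A120.skeleton`. A minimal parsing sketch (the helper name `parse_ntu_name` is my own):

```python
import re

# NTU RGB+D naming convention: S=setup, C=camera, P=performer, R=replication, A=action.
NAME_RE = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

def parse_ntu_name(filename):
    """Return (setup, camera, performer, replication, action) as ints."""
    m = NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"not an NTU-style filename: {filename}")
    return tuple(int(g) for g in m.groups())

# Example: sample from setup 18, camera 1, subject 8, take 1, action 120.
print(parse_ntu_name("S018C001P008R001A120.skeleton"))  # (18, 1, 8, 1, 120)
```

These parsed IDs are what the cross-subject (performer ID) and cross-setup (setup ID) protocols below are defined over.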
Newly Added Actions Compared to the NTU RGB+D
(1) Fine-grained hand/finger motions.
Fine-grained hand and finger motions, such as “make ok sign” and “snapping fingers”.
(2) Fine-grained object-related individual actions.
The body movements are not significant and the involved objects are relatively small, such as “counting money” and “play magic cube”. [These are likely hard to distinguish from skeleton data alone.]
(3) Object-related mutual actions.
Interactions with objects, such as “wield knife towards other person” and “hit other person with object”. [These are likely hard to distinguish from skeleton data alone.]
(4) Different actions with similar posture patterns but with different motion speeds.
Some actions share similar posture patterns but differ in motion speed. For example, “grab other person’s stuff” is a newly added action whose main difference from “touch other person’s pocket (steal)” is the motion speed.
(5) Different actions with similar body motions but with different objects involved.
Some actions have very similar body motions but involve different objects. For example, the motions in the newly added action “put on bag/backpack” are similar to those in the existing action “put on jacket”. [These are likely hard to distinguish from skeleton data alone.]
(6) Different actions with similar objects involved but with different body motions.
Some actions involve the same objects but differ in body motion, such as “put on bag/backpack” and “take something out of a bag/backpack”.
Benchmark Evaluations
1、Cross-Subject Evaluation
- the 106 subjects are split into training and testing groups; each group consists of 53 subjects.
- The IDs of the training subjects in this evaluation are: 1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25, 27, 28, 31, 34, 35, 38, 45, 46, 47, 49, 50, 52, 53, 54, 55, 56, 57, 58, 59, 70, 74, 78, 80, 81, 82, 83, 84, 85, 86, 89, 91, 92, 93, 94, 95, 97, 98, 100, 103.
- The remaining subjects are reserved for testing.
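The cross-subject split reduces to a membership test on the performer ID. A minimal sketch (the function name `cross_subject_split` is my own; the ID list is copied from the protocol above):

```python
# Training subject IDs for the cross-subject protocol (from the paper).
TRAIN_SUBJECTS = {
    1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25, 27, 28, 31, 34, 35,
    38, 45, 46, 47, 49, 50, 52, 53, 54, 55, 56, 57, 58, 59, 70, 74, 78,
    80, 81, 82, 83, 84, 85, 86, 89, 91, 92, 93, 94, 95, 97, 98, 100, 103,
}

def cross_subject_split(performer_id):
    """Assign a sample to 'train' or 'test' by its performer (P) ID."""
    return "train" if performer_id in TRAIN_SUBJECTS else "test"

print(len(TRAIN_SUBJECTS))       # 53 training subjects
print(cross_subject_split(1))    # train
print(cross_subject_split(3))    # test
```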
2、Cross-Setup Evaluation
- All samples with even collection-setup IDs are used for training, and those with odd setup IDs for testing, i.e., 16 of the 32 setups are used for training and the other 16 are reserved for testing.
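The cross-setup split is just a parity check on the setup ID. A minimal sketch (the function name `cross_setup_split` is my own):

```python
def cross_setup_split(setup_id):
    """Even setup IDs go to training, odd IDs to testing (16 setups each)."""
    return "train" if setup_id % 2 == 0 else "test"

print(cross_setup_split(2))   # train
print(cross_setup_split(17))  # test
```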
APSR FRAMEWORK FOR ONE-SHOT 3D ACTION RECOGNITION (TODO: read up on one-shot learning)