NTU RGB+D Datasets
Basic size
This dataset consists of 56,880 action samples containing 4 different modalities of data for each sample:
- RGB videos: 136 GB
- depth map sequences:
  - masked depth maps: 83 GB
  - full depth maps: 886 GB
- 3D skeletal data: 5.8 GB
- infrared videos: 221 GB
- Total: 1.3 TB
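As a quick sanity check, the per-modality sizes listed above can be tallied to confirm the stated total and to estimate the download size for a subset of modalities. This is a minimal sketch: the dictionary keys and the `subset_size_gb` helper are hypothetical names chosen for illustration, and 1 TB is assumed to mean 1000 GB.

```python
# Sizes in GB, copied from the list above. Key names are illustrative only.
SIZES_GB = {
    "rgb": 136.0,
    "depth_masked": 83.0,
    "depth_full": 886.0,
    "skeleton": 5.8,
    "ir": 221.0,
}


def subset_size_gb(modalities):
    """Return the combined size in GB of the requested modalities."""
    return sum(SIZES_GB[name] for name in modalities)


total_gb = subset_size_gb(SIZES_GB)  # iterating a dict yields its keys
# Assumes decimal units (1 TB = 1000 GB).
print(f"all modalities: {total_gb:.1f} GB (about {total_gb / 1000:.1f} TB)")
print(f"RGB + skeleton: {subset_size_gb(['rgb', 'skeleton']):.1f} GB")
```

Skipping the full depth maps, which account for roughly two thirds of the total, brings the download down to about 446 GB.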
RGB videos have a resolution of 1920×1080; depth maps and IR videos are both 512×424; the 3D skeletal data contains the three-dimensional locations of 25 major body joints at each frame.
File Format
Each file/folder name in the dataset follows the format SsssCcccPpppRrrrAaaa (e.g. S001C002P003R002A013), where
- sss is the setup number, // camera height and distance
- ccc is the camera ID, // 1, 2, 3 => -45, 0, 45 degrees views; 2, 3 -> front and side views.
- ppp is the performer ID, // just performer.
- rrr is the replication number (1 or 2), // perform action twice.
- and aaa is the action class label. // 60 action classes (40 daily actions / 9 health-related actions / 11 mutual actions)
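The naming scheme above can be decoded with a short regular expression. The following is a minimal sketch, not part of any official dataset toolkit: `SampleName` and `parse_sample_name` are hypothetical names, and the pattern assumes each field is exactly three digits, as in the example name. It uses `search` rather than a full match so that names carrying a suffix or extension are also accepted.

```python
import re
from typing import NamedTuple


class SampleName(NamedTuple):
    """Fields encoded in an SsssCcccPpppRrrrAaaa name (hypothetical helper type)."""
    setup: int
    camera: int
    performer: int
    replication: int
    action: int


# Assumption: every field is exactly three digits, as in S001C002P003R002A013.
_NAME_RE = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")


def parse_sample_name(name: str) -> SampleName:
    """Extract the five numeric fields from a file or folder name."""
    match = _NAME_RE.search(name)
    if match is None:
        raise ValueError(f"not an SsssCcccPpppRrrrAaaa name: {name!r}")
    return SampleName(*(int(group) for group in match.groups()))


info = parse_sample_name("S001C002P003R002A013")
print(info)
# SampleName(setup=1, camera=2, performer=3, replication=2, action=13)
```

With the fields available as integers, selecting samples becomes a simple comparison, for example keeping only names whose `camera` is 2 or whose `action` falls in a chosen range.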