ImageNet
overview
(updated on April 30, 2010)
- Total number of non-empty synsets: 21,841
- Total number of images: 14,197,122
- Number of images with bounding box annotations: 1,034,908
- Number of synsets with SIFT features: 1000
- Number of images with SIFT features: 1.2 million
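The overview counts above can be turned into two quick ratios: the share of images that carry bounding box annotations, and the average number of images per synset. This is only a small arithmetic sketch; the variable names are made up for illustration and the numbers are copied from the list.

```python
# Figures copied from the ImageNet overview list (April 30, 2010 update).
total_synsets = 21_841
total_images = 14_197_122
images_with_boxes = 1_034_908

# Share of all images that have at least one bounding box annotation.
box_fraction = images_with_boxes / total_images

# Average number of images per non-empty synset.
images_per_synset = total_images / total_synsets

print(f"Images with bounding boxes: {box_fraction:.1%}")
print(f"Average images per synset: {images_per_synset:.0f}")
```

So only a small minority of ImageNet images come with bounding boxes, which matters when choosing it for localization or detection work.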
tasks
- Scene classification
- Object localization
- Object detection
- Object detection from video
MSCOCO
overview
- Object segmentation
- Recognition in Context
- Multiple objects per image
- More than 300,000 images
- More than 2 Million instances
- 80 object categories
- 5 captions per image
- Keypoints on 100,000 people
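As a rough illustration of the "multiple objects per image" point, the listed MSCOCO figures imply an approximate instance density. Both counts are stated as "more than", so the result is only a ballpark estimate, and the caption total assumes every image really has 5 captions. The variable names are hypothetical.

```python
# Approximate figures copied from the MSCOCO overview ("more than ...").
approx_images = 300_000
approx_instances = 2_000_000
captions_per_image = 5

# Ballpark number of annotated object instances per image.
instances_per_image = approx_instances / approx_images

# Ballpark caption count, assuming 5 captions for every image.
approx_captions = approx_images * captions_per_image

print(f"Roughly {instances_per_image:.1f} instances per image")
print(f"Roughly {approx_captions:,} captions in total")
```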
tasks
- Detection
- Keypoints
- Captioning
Open Images
- Image URLs: ~9,000,000
- 2,000,000 bounding boxes spanning 600 object classes (1.24M in train, 830K in validation+test)
- 4,300,000 human-verified positive image-level labels on the training set
- coming soon: Trained models (both image-level and object detectors).
YouTube-8M
(2017 update)
- Video URLs: 7,000,000
- Video: 450,000 hours
- Audio/Visual Features: 3,200,000,000
- Classes: 4716
- Avg.Labels/Video: 3.4
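The YouTube-8M figures above can be combined into a couple of derived numbers, such as the average video length and the approximate total number of video-level labels. This is a back-of-the-envelope sketch using only the listed values; the variable names are invented for illustration.

```python
# Figures copied from the YouTube-8M list (2017 update).
video_count = 7_000_000
total_hours = 450_000
avg_labels_per_video = 3.4

# Average video length in minutes.
avg_minutes = total_hours * 60 / video_count

# Approximate total number of video-level labels.
approx_total_labels = video_count * avg_labels_per_video

print(f"Average video length: {avg_minutes:.1f} minutes")
print(f"Approximate total labels: {approx_total_labels:,.0f}")
```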
Unfortunately, YouTube cannot be accessed from mainland China.
SUN
overview
- Images: 131,067
- Scene categories: 908
- Segmented objects: 313,884
- Object categories: 4,479
tasks
- Scene Recognition Benchmark
  - scene categories: 397
- Object Detection Benchmark
  - images: 16,873
NUS-WIDE
- images: 269,648
- unique tags: 5,018
- low-level features types: 6
PASCAL VOC 2010
- Training and validation images: 10,103
- Testing images: 9,637
- categories: 33/59