Mapillary
https://www.mapillary.com/dataset
Mapillary Vistas Dataset
A diverse street-level imagery dataset with pixel‑accurate and instance‑specific human annotations for understanding street scenes around the world.
Features
- 25,000 high-resolution images
- 152 object categories
- 100 instance-specifically annotated categories
- Global reach, covering 6 continents
- Variety of weather, season, time of day, camera, and viewpoint
The Cityscapes Dataset
https://github.com/mcordts/cityscapesScripts
This repository contains scripts for inspection, preparation, and evaluation of the Cityscapes dataset. This large-scale dataset contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5 000 frames in addition to a larger set of 20 000 weakly annotated frames.
Details and download are available at: www.cityscapes-dataset.net
Dataset Structure
The folder structure of the Cityscapes dataset is as follows:
{root}/{type}{video}/{split}/{city}/{city}_{seq:0>6}_{frame:0>6}_{type}{ext}
The meaning of the individual elements is:
- root: the root folder of the Cityscapes dataset. Many of our scripts check if an environment variable CITYSCAPES_DATASET pointing to this folder exists and use this as the default choice.
- type: the type/modality of data, e.g. gtFine for fine ground truth, or leftImg8bit for left 8-bit images.
- split: the split, i.e. train/val/test/train_extra/demoVideo. Note that not all kinds of data exist for all splits. Thus, do not be surprised to occasionally find empty folders.
- city: the city in which this part of the dataset was recorded.
- seq: the sequence number using 6 digits.
- frame: the frame number using 6 digits. Note that in some cities very few, albeit very long sequences were recorded, while in some cities many short sequences were recorded, of which only the 19th frame is annotated.
- ext: the extension of the file and optionally a suffix, e.g. _polygons.json for ground truth files.
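As a rough illustration, the template above happens to be valid Python format-string syntax, so a file path can be assembled with str.format. The helper name cityscapes_path, the example root /data/cityscapes, and the "." fallback are assumptions made for this sketch and are not part of cityscapesScripts. The {video} element is not described in the list above, so the sketch simply leaves it empty by default.

```python
import os

# Path template copied from the folder structure above.
TEMPLATE = "{root}/{type}{video}/{split}/{city}/{city}_{seq:0>6}_{frame:0>6}_{type}{ext}"


def cityscapes_path(type_, split, city, seq, frame, ext, video="", root=None):
    """Fill in the template (hypothetical helper, not part of cityscapesScripts)."""
    if root is None:
        # Use the CITYSCAPES_DATASET environment variable if it is set;
        # falling back to "." is an assumption of this sketch.
        root = os.environ.get("CITYSCAPES_DATASET", ".")
    return TEMPLATE.format(root=root, type=type_, video=video, split=split,
                           city=city, seq=seq, frame=frame, ext=ext)


# seq and frame are zero-padded to 6 digits by the "0>6" format spec.
print(cityscapes_path("leftImg8bit", "train", "aachen", 0, 19, ".png",
                      root="/data/cityscapes"))
print(cityscapes_path("gtFine", "train", "aachen", 0, 19, "_polygons.json",
                      root="/data/cityscapes"))
```

Keeping the template as a single string makes it easy to compare against the documented layout if the folder structure ever changes.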
Possible values of type
- gtFine: the fine annotations: 2975 training, 500 validation, and 1525 testing images. This type of annotation is used for validation, testing, and optionally for training. Annotations are encoded as json files containing the individual polygons. Additionally, we provide png images whose pixel values encode labels. Please refer to helpers/labels.py and the scripts in preparation for details.
- gtCoarse: the coarse annotations, available for all training and validation images and for another set of 19998 training images (train_extra). These annotations can be used for training, either together with gtFine or alone in a weakly supervised setup.
- gtBboxCityPersons: pedestrian bounding box annotations, available for all training and validation images. Please refer to helpers/labels_cityPersons.py as well as the CityPersons publication (Zhang et al., CVPR '17) for more details. The four values of a bounding box are (x, y, w, h), where (x, y) is its top-left corner and (w, h) its width and height.
- leftImg8bit: the left images in 8-bit LDR format. These are the standard annotated images.
- leftImg8bit_blurred: the left images in 8-bit LDR format with faces and license plates blurred. Please compute results on the original images but use the blurred ones for visualization. We thank Mapillary for blurring the images.
- leftImg16bit: the left images in 16-bit HDR format. These images offer 16 bits per pixel of color depth and contain more information, especially in very dark or bright parts of the scene. Warning: The images are stored as 16-bit pngs, which is non-standard and not supported by all libraries.
- rightImg8bit: the right stereo views in 8-bit LDR format.
- rightImg16bit: the right stereo views in 16-bit HDR format.
- timestamp: the time of recording in ns. The first frame of each sequence always has a timestamp of 0.
- disparity: precomputed disparity depth maps. To obtain the disparity values, compute d = ( float(p) - 1. ) / 256. for each pixel p with p > 0; a value p = 0 is an invalid measurement. Warning: the images are stored as 16-bit pngs, which is non-standard and not supported by all libraries.
- camera: internal and external camera calibration. For details, please refer to csCalibration.pdf
- vehicle: vehicle odometry, GPS coordinates, and outside temperature. For details, please refer to csCalibration.pdf
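Two of the encodings in the list above are easy to get wrong, so here is a minimal sketch of both: decoding a raw disparity pixel value and converting an (x, y, w, h) box to corner coordinates. The function names decode_disparity and xywh_to_corners are hypothetical, and treating x + w and y + h as the far edges is an assumption about the pixel convention. Reading the 16-bit png itself is left out, since library support for that format varies.

```python
from typing import Optional, Tuple


def decode_disparity(p: int) -> Optional[float]:
    """Convert a raw 16-bit pixel value to a disparity (hypothetical helper).

    Returns None for p = 0, which marks an invalid measurement.
    """
    if p == 0:
        return None
    return (float(p) - 1.0) / 256.0


def xywh_to_corners(box: Tuple[float, float, float, float]) -> Tuple[float, float, float, float]:
    """Convert (x, y, w, h), with (x, y) the top-left corner, to
    (x_min, y_min, x_max, y_max). Using x + w and y + h as the far
    edges is an assumption of this sketch."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


print(decode_disparity(0))                 # None
print(decode_disparity(257))               # 1.0
print(xywh_to_corners((10, 20, 30, 40)))   # (10, 20, 40, 60)
```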
Possible values of split
- train: usually used for training, contains 2975 images with fine and coarse annotations.
- val: should be used for validation of hyper-parameters, contains 500 images with fine and coarse annotations. Can also be used for training.
- test: used for testing on our evaluation server. The annotations are not public, but we include annotations of ego-vehicle and rectification border for convenience.
- train_extra: can be optionally used for training, contains 19998 images with coarse annotations.
- demoVideo: video sequences that could be used for qualitative evaluation; no annotations are available for these videos.
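Because not every type exists for every split, it can help to pair images with annotations explicitly and skip whatever is missing. The sketch below assumes the folder layout described earlier. The helper name list_pairs, its default arguments, and the "." fallback root are illustrative assumptions, not part of cityscapesScripts.

```python
import glob
import os


def list_pairs(root, split, image_type="leftImg8bit", gt_type="gtFine",
               gt_ext="_polygons.json"):
    """Return (image, annotation) path pairs for one split (hypothetical helper)."""
    tail = "_" + image_type + ".png"
    pattern = os.path.join(root, image_type, split, "*", "*" + tail)
    pairs = []
    for image_path in sorted(glob.glob(pattern)):
        city = os.path.basename(os.path.dirname(image_path))
        # Strip the "_leftImg8bit.png" tail to recover "{city}_{seq}_{frame}".
        stem = os.path.basename(image_path)[: -len(tail)]
        gt_path = os.path.join(root, gt_type, split, city,
                               stem + "_" + gt_type + gt_ext)
        # Not all kinds of data exist for all splits, so skip missing files.
        if os.path.exists(gt_path):
            pairs.append((image_path, gt_path))
    return pairs


# Falling back to "." when CITYSCAPES_DATASET is unset is an assumption.
root = os.environ.get("CITYSCAPES_DATASET", ".")
for split in ("train", "val"):
    print(split, len(list_pairs(root, split)))
```

Passing gt_type="gtCoarse" with split="train_extra" would pair the extra training images with their coarse annotations in the same way.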
Scripts
Installation
Install cityscapesscripts with pip
python -m pip install cityscapesscripts
Graphical tools (viewer and label tool) are based on Qt5 and can be installed via
python -m pip install cityscapesscripts[gui]
PASCAL VOC
Download links for the Pascal VOC dataset:
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
https://pjreddie.com/projects/pascal-voc-dataset-mirror/