Because the official COCO site is blocked, the download links cannot be viewed there directly. The content below was copied over via a proxy, so you can download from the links as-is.
The formatting was lost in the copy and differs from the official page; bear with it.
Images
2014 Train images [83K/13GB]: http://images.cocodataset.org/zips/train2014.zip
2014 Val images [41K/6GB]: http://images.cocodataset.org/zips/val2014.zip
2014 Test images [41K/6GB]: http://images.cocodataset.org/zips/test2014.zip
2015 Test images [81K/12GB]: http://images.cocodataset.org/zips/test2015.zip
2017 Train images [118K/18GB]: http://images.cocodataset.org/zips/train2017.zip
2017 Val images [5K/1GB]: http://images.cocodataset.org/zips/val2017.zip
2017 Test images [41K/6GB]: http://images.cocodataset.org/zips/test2017.zip
2017 Unlabeled images [123K/19GB]: http://images.cocodataset.org/zips/unlabeled2017.zip
Annotations
2014 Train/Val annotations [241MB]: http://images.cocodataset.org/annotations/annotations_trainval2014.zip
2014 Testing Image info [1MB]: http://images.cocodataset.org/annotations/image_info_test2014.zip
2015 Testing Image info [2MB]: http://images.cocodataset.org/annotations/image_info_test2015.zip
2017 Train/Val annotations [241MB]: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
2017 Stuff Train/Val annotations [1.1GB]: http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
2017 Panoptic Train/Val annotations [821MB]: http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
2017 Testing Image info [1MB]: http://images.cocodataset.org/annotations/image_info_test2017.zip
2017 Unlabeled Image info [4MB]: http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip
1. Overview
Which dataset splits should you download? Each year's images are associated with different tasks. Specifically:
2014 Train/Val: Detection 2015, Captioning 2015, Detection 2016, Keypoints 2016
2014 Testing and 2015 Testing: Detection 2015, Detection 2016, Keypoints 2016
2017 Train/Val/Test: Detection 2017, Keypoints 2017, Stuff 2017; Detection 2018, Keypoints 2018, Stuff 2018, Panoptic 2018; Detection 2019, Keypoints 2019, Stuff 2019, Panoptic 2019
2017 Unlabeled: optional data for any competition
If you are submitting to a 2017, 2018, or 2019 task, you only need to download the 2017 images. You can disregard earlier splits. Note: the split year refers to the year the image splits were released, not the year in which the annotations were released.
To download the images efficiently, we recommend using gsutil rsync, which avoids downloading the large zip files (see the commands below). Please follow the instructions in the COCO API Readme to set up the downloaded COCO data (the images and annotations should go in coco/images/ and coco/annotations/, respectively). By downloading this dataset, you agree to our Terms of Use.
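For reference, the rsync commands for the 2017 splits take the following form (bucket paths as published on the COCO download page; verify them there before use):

    mkdir -p coco/images && cd coco/images
    gsutil -m rsync gs://images.cocodataset.org/train2017 train2017
    gsutil -m rsync gs://images.cocodataset.org/val2017 val2017
    gsutil -m rsync gs://images.cocodataset.org/test2017 test2017

The annotation files are only distributed as the zip archives listed above; unzip them into coco/annotations/.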
2019 Update: All data for all challenges stays unchanged.
2018 Update: Detection and keypoint data is unchanged. New in 2018, complete stuff and panoptic annotations for all 2017 images are available. Note: if you downloaded the stuff annotations prior to 06/17/2018, please re-download.
2017 Update: The main change in 2017 is that, based on community feedback, the train/val split is now 118K/5K instead of the previous 83K/41K. The exact same images are used, and no new annotations for detection/keypoints are provided. However, new in 2017 are stuff annotations on 40K train images (a subset of the full 118K train images from 2017) and 5K val images. Also, for testing, in 2017 the test set has only two splits (dev/challenge) instead of the four splits (dev/standard/reserve/challenge) used in previous years. Finally, new in 2017, we are releasing 120K unlabeled images from COCO that follow the same class distribution as the labeled images; this may be useful for semi-supervised learning on COCO.
2. COCO API
The COCO API assists in loading, parsing, and visualizing annotations in COCO. The API supports multiple annotation formats (please see the data format page). For additional details see: CocoApi.m, coco.py, and CocoApi.lua for Matlab, Python, and Lua code, respectively, and also the Python API demo.
Throughout the API, "ann"=annotation, "cat"=category, and "img"=image. A short usage sketch in Python follows the function list below.
getAnnIds: Get ann ids that satisfy given filter conditions.
getCatIds: Get cat ids that satisfy given filter conditions.
getImgIds: Get img ids that satisfy given filter conditions.
loadAnns: Load anns with the specified ids.
loadCats: Load cats with the specified ids.
loadImgs: Load imgs with the specified ids.
loadRes: Load algorithm results and create API for accessing them.
showAnns: Display the specified annotations.
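As a minimal sketch of the Python API (pycocotools), assuming the 2017 val annotations and images are laid out under coco/annotations/ and coco/images/ as described above (the 'person' query and file paths are just examples):

    # Find, load, and display annotated images with pycocotools.
    import matplotlib.pyplot as plt
    import skimage.io as io
    from pycocotools.coco import COCO

    coco = COCO('coco/annotations/instances_val2017.json')

    # Filter: all images containing the 'person' category.
    catIds = coco.getCatIds(catNms=['person'])
    imgIds = coco.getImgIds(catIds=catIds)
    img = coco.loadImgs(imgIds[0])[0]

    # Fetch the matching annotations and overlay them on the image.
    I = io.imread('coco/images/val2017/' + img['file_name'])
    plt.imshow(I); plt.axis('off')
    annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
    anns = coco.loadAnns(annIds)
    coco.showAnns(anns)
    plt.show()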
3. MASK API
COCO provides segmentation masks for every object instance. This creates two challenges: storing masks compactly and performing mask computations efficiently. We solve both challenges using a custom Run Length Encoding (RLE) scheme. The size of the RLE representation is proportional to the number of boundary pixels of a mask, and operations such as area, union, or intersection can be computed efficiently directly on the RLE. Specifically, assuming fairly simple shapes, the RLE representation is O(√n), where n is the number of pixels in the object, and common computations are likewise O(√n). Naively computing the same operations on the decoded masks (stored as an array) would be O(n).
The MASK API provides an interface for manipulating masks stored in RLE format. The API is defined below; for additional details see: MaskApi.m, mask.py, or MaskApi.lua, and the Python sketch after the function list. Finally, we note that the majority of ground truth masks are stored as polygons (which are quite compact); these polygons are converted to RLE when needed.
encode: Encode binary masks using RLE.
decode: Decode binary masks encoded via RLE.
merge: Compute union or intersection of encoded masks.
iou: Compute intersection over union between masks.
area: Compute area of encoded masks.
toBbox: Get bounding boxes surrounding encoded masks.
frBbox: Convert bounding boxes to encoded masks.
frPoly: Convert polygon to encoded mask.
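A minimal sketch of the same operations through the Python binding (pycocotools.mask, where the frBbox/frPoly converters are exposed together as frPyObjects); the toy 10x10 mask and polygon are made up for illustration:

    import numpy as np
    from pycocotools import mask as maskUtils

    # encode() expects a Fortran-ordered uint8 array.
    m = np.zeros((10, 10), dtype=np.uint8)
    m[2:8, 3:7] = 1                              # a 6x4 rectangle of foreground
    rle = maskUtils.encode(np.asfortranarray(m))

    print(maskUtils.area(rle))                   # 24, computed directly on the RLE
    print(maskUtils.toBbox(rle))                 # [x y w h] = [3. 2. 4. 6.]
    m2 = maskUtils.decode(rle)                   # back to a dense binary mask

    # Polygons (how most ground truth is stored) are converted to RLE on demand.
    poly = [[3, 2, 7, 2, 7, 8, 3, 8]]            # [x1, y1, x2, y2, ...] per polygon
    rles = maskUtils.frPyObjects(poly, 10, 10)
    merged = maskUtils.merge(rles)               # union of the per-polygon RLEs
    print(maskUtils.iou([rle], [merged], [0]))   # one iscrowd flag per gt mask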