Using the COCO Dataset: Configuring the COCO API

aaon22357

Article link: https://blog.csdn.net/aaon22357/article/details/82963108

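1. Download the dataset from the official COCO website, including the training, validation, and test sets and the annotations.
2. Download the latest version of the API from the address linked here.
3. Go into the PythonAPI/ directory and build the API. The procedure differs between Ubuntu and Windows.
[Configuring on Ubuntu] (recommended, fewer pitfalls!)
Activate the tensorflow environment, go to ~/cocostuffapi/PythonAPI/, and run python setup.py install. If the command finishes without errors, the build succeeded. Then try import pycocotools in a Python session; if that raises no error either, the installation works.
A possible error: the installed Cython is too old. pycocotools requires Cython 0.27.3 or newer.
Fix: run pip install --upgrade Cython (the package is named Cython with a capital C, although pip matches names case-insensitively), then run cython --version to check that the install worked.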
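
The two checks above (Cython version, then importing pycocotools) can also be done from Python. This is only a minimal sketch: the helper names version_tuple, meets_minimum, and check_install are made up for illustration, and the version parsing is deliberately naive (it stops at the first non-numeric component).

```python
import importlib


def version_tuple(version):
    """Turn '0.29.21' into (0, 29, 21); stop at the first non-numeric part."""
    parts = []
    for piece in version.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum='0.27.3'):
    """True if the installed version string is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_install():
    """Report whether Cython is new enough and pycocotools imports."""
    try:
        cython = importlib.import_module('Cython')
    except ImportError:
        print('Cython is not installed')
        return False
    if not meets_minimum(cython.__version__):
        print('Cython %s is older than 0.27.3' % cython.__version__)
        return False
    try:
        importlib.import_module('pycocotools.coco')
    except ImportError as err:
        print('pycocotools failed to import: %s' % err)
        return False
    print('Cython %s and pycocotools look fine' % cython.__version__)
    return True
```

Call check_install() inside the activated environment; a False result tells you which of the two steps to repeat.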

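[Configuring on Windows]
Run python setup.py build_ext --inplace. If it finishes without errors, the build succeeded. If it fails, see the following link for fixes:
https://blog.csdn.net/gxiaoyaya/article/details/78363391

PS:
If the method above does not work, install the Windows port of the API from GitHub by running pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI. This requires the Visual C++ 2015 build tools; if you do not have them, download and install Visual C++ 2015 Build Tools from the link above.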

Reference tutorials:
https://blog.csdn.net/qq_33000225/article/details/78985635
https://blog.csdn.net/chixia1785/article/details/80040172

Update 2018.10.10:

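I came across a few tutorials on processing the COCO dataset and am sharing the links here.
1. pycocotools includes coco.py, a parser for the COCO JSON annotation files (which is why we went to such trouble to install it above). Import it at the top of your program like this: from pycocotools.coco import COCO
coco.py provides the following interfaces (reference linked here):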

 # decodeMask - Decode binary mask M encoded via run-length encoding. 
 # encodeMask - Encode binary mask M using run-length encoding. 
 # getAnnIds - Get ann ids that satisfy given filter conditions. 
 # getCatIds - Get cat ids that satisfy given filter conditions. 
 # getImgIds - Get img ids that satisfy given filter conditions. 
 # loadAnns - Load anns with the specified ids. 
 # loadCats - Load cats with the specified ids. 
 # loadImgs - Load imgs with the specified ids. 
 # annToMask - Convert segmentation in an annotation to binary mask. 
 # showAnns - Display the specified annotations. 
 # loadRes - Load algorithm results and create API for accessing them. 
 # download - Download COCO images from mscoco.org server. 
 # Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
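
As a rough illustration of how several of these calls chain together, the sketch below collects the annotations of one category, grouped by image file name. The function name annotations_for_category is made up for this example; it expects a COCO object created with COCO(annFile) and uses only getCatIds, getImgIds, loadImgs, getAnnIds, and loadAnns from the list above.

```python
def annotations_for_category(coco, category_name):
    """Return {file_name: [annotation dicts]} for one category name."""
    cat_ids = coco.getCatIds(catNms=[category_name])
    if not cat_ids:
        # Unknown category: stop here, because getImgIds with an empty
        # filter would otherwise return every image in the dataset.
        return {}
    result = {}
    img_ids = coco.getImgIds(catIds=cat_ids)
    for img in coco.loadImgs(img_ids):
        ann_ids = coco.getAnnIds(imgIds=img['id'], catIds=cat_ids, iscrowd=None)
        result[img['file_name']] = coco.loadAnns(ann_ids)
    return result
```

For example, annotations_for_category(coco, 'person') would map each file name containing a person to that image's person annotations.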

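Update 2018.10.11:

Using the COCO dataset:
Mapping between an image's file_name and its id: the id is the numeric part of the file name with the leading zeros removed. So to go from a file name to an id, and then to the image information, you can do this: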

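def filename_imgid(filename_list):
    """Convert names like '000000532481.jpg' to integer image ids."""
    imgIds = []
    for i in range(len(filename_list)):
        for j in range(12):  # the numeric part is 12 digits long
            if filename_list[i][j] != '0':  # first non-zero digit
                imgIds.append(int(filename_list[i][j:12]))  # convert the digits to an integer
                break
    return imgIds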

[Note]:
break only exits the innermost loop!
filename_list[i][j] is the character '0', not the number 0!
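
Since Python's int() already ignores leading zeros in a decimal string, the same mapping can be written without the inner loop. The sketch below is an alternative, not the code used in this post; the names filename_to_img_id and filenames_to_img_ids are made up, and it assumes purely numeric file names like the val2017 example (names with a text prefix are skipped).

```python
import os


def filename_to_img_id(filename):
    """'000000532481.jpg' -> 532481; int() drops the leading zeros."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return int(stem)


def filenames_to_img_ids(filenames):
    """Convert many names, skipping any whose stem is not purely digits."""
    return [filename_to_img_id(name) for name in filenames
            if os.path.splitext(os.path.basename(name))[0].isdigit()]
```

Unlike the loop version, this also handles a name made up entirely of zeros and does not fail on stray non-image files in the folder.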

With imgIds in hand, we use loadImgs to pull out the image information. First, let's look at how COCO stores the data:

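from pycocotools.coco import COCO

coco = COCO(annFile)  # annFile: path to the annotation JSON, set in the main program below
imgs = [(img_id, coco.imgs[img_id]) for img_id in coco.imgs]  # (id, info) pairs for every image
print(imgs[0])  # the first image's id and info
print(type(imgs[0]))  # <class 'tuple'>: each entry is an (id, info dict) pair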

Each entry looks like this (the output of print(imgs[0]) above):

(532481, {'width': 640, 'file_name': '000000532481.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000532481.jpg', 'height': 426, 'id': 532481, 'license': 3, 'date_captured': '2013-11-20 16:28:24', 'flickr_url': 'http://farm7.staticflickr.com/6048/5915494136_da3cfa7c5a_z.jpg'})

In other words, coco.imgs is keyed by image id, so you need the id to get at the info dict that follows it. Now that we have imgIds, load the entries with:

# img = coco.loadImgs(imgIds)[0]
img = coco.loadImgs(imgIds)  # imgIds is a list of integers that I built myself and I want the info for every id, so no [0] here; see the note on [0] further down.
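
One caveat when the ids come from file names rather than from the annotation file: as far as I can tell, loadImgs looks each id up in coco.imgs, so an id that is not in the JSON raises a KeyError. A minimal sketch of a guard, with the made-up helper name split_known_ids, could look like this:

```python
def split_known_ids(image_index, img_ids):
    """Split img_ids into (found, missing) by membership in image_index."""
    found = [i for i in img_ids if i in image_index]
    missing = [i for i in img_ids if i not in image_index]
    return found, missing
```

Usage would be found, missing = split_known_ids(coco.imgs, imgIds), followed by coco.loadImgs(found); anything in missing is a file with no entry in the annotations.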

Main program:

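import matplotlib.pyplot as plt
import pylab
import skimage.io as io
from pycocotools.coco import COCO

# _get_img_filename and filename_imgid are the two helpers defined in this post.
pylab.rcParams['figure.figsize'] = (8.0, 10.0)
dataDir = '/mask/data/coco'
dataType = 'val2017'
annFile = '{}/annotations/instances_{}.json'.format(dataDir, dataType)

coco = COCO(annFile)
filename_list = _get_img_filename(dataDir, dataType)
imgIds = filename_imgid(filename_list)
print(imgIds)
img = coco.loadImgs(imgIds)  # list of info dicts, one per id
print(type(img[0]['file_name']))  # <class 'str'>
for i in range(len(imgIds)):
    I = io.imread('%s/%s/%s' % (dataDir, dataType, img[i]['file_name']))
    plt.imshow(I)
    annIds = coco.getAnnIds(imgIds=img[i]['id'])
    anns = coco.loadAnns(annIds)
    coco.showAnns(anns)  # draw the annotations on top of the image
    plt.show()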

Note on [0]:

import numpy as np

imgIds = [1296, 1490, 1000, 1353, 872, 1425, 885, 1503]  # my own image ids
img = coco.loadImgs(imgIds[np.random.randint(0, len(imgIds))])
print(img)
img2 = coco.loadImgs(imgIds[np.random.randint(0, len(imgIds))])[0]
print(img2)

#img:
[{'license': 4, 'file_name': '000000001000.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000001000.jpg', 'height': 480, 'width': 640, 'id': 1000, 'date_captured': '2013-11-21 05:13:59', 'flickr_url': 'http://farm5.staticflickr.com/4115/4906536419_6113bd7de4_z.jpg'}]
#img2:
{'license': 2, 'file_name': '000000001503.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000001503.jpg', 'height': 240, 'width': 320, 'id': 1503, 'date_captured': '2013-11-22 17:22:02', 'flickr_url': 'http://farm1.staticflickr.com/4/4589204_0d42f46fe6_z.jpg'}

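As you can see, loadImgs returns a list even for a single id, and the dict is its first element, so img2 has the same form as img[0]. (The two calls above drew different random ids, which is why the two outputs describe different images.)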
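
To make the list-versus-dict behaviour concrete without the dataset, here is a simplified stand-in written from my understanding of loadImgs; it is not the library's code. The name load_imgs and the tiny index in the usage note are hypothetical.

```python
def load_imgs(image_index, ids):
    """Stand-in for loadImgs: always returns a list of info dicts."""
    if isinstance(ids, int):
        ids = [ids]  # a single id is wrapped, so the result is still a list
    return [image_index[i] for i in ids]
```

With index = {1000: {'id': 1000}}, both load_imgs(index, 1000) and load_imgs(index, [1000]) give a one-element list, and the trailing [0] is what unwraps the dict.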

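The next part will use the extracted images to build tfrecords for training; I will add it later.

Update:

Building on Mask RCNN's download_and_convert_coco.py, I added two functions of my own that get the image ids from the image file names and then load them. Inside the function _add_to_tfrecord() I added the following lines: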

filename_list = _get_img_filename(image_dir, split_name)
imgIds = filename_imgid(filename_list)
imgs = [(img_id, coco.imgs[img_id]) for img_id in coco.imgs if img_id in imgIds]  # this line is the key one
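
The filter in the last line checks img_id in imgIds against a list, which scans the list once per image. For a large folder, converting the ids to a set first should be noticeably faster. A minimal sketch, using the made-up helper name select_images and treating coco.imgs as a plain dict:

```python
def select_images(image_index, wanted_ids):
    """Return (id, info) pairs for the ids in wanted_ids, in index order."""
    wanted = set(wanted_ids)  # set lookup avoids rescanning a list per image
    return [(img_id, info) for img_id, info in image_index.items()
            if img_id in wanted]
```

Calling select_images(coco.imgs, imgIds) would produce the same pairs as the list comprehension above.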

The two helper functions are defined as follows:

import os


def _get_img_filename(image_dir, split_name):
    # List every file name in the image folder, e.g. <image_dir>/val2017
    filename_list = os.listdir(os.path.join(image_dir, split_name))
    return filename_list


def filename_imgid(filename_list):
    imgIds = []
    for i in range(len(filename_list)):
        for j in range(12):
            if filename_list[i][j] != '0':
                imgIds.append(int(filename_list[i][j:12]))  # convert the digits to an integer
                break
    return imgIds
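
One thing to watch with _get_img_filename: os.listdir returns every entry in the folder in arbitrary order, and a stray non-image file would make the int() conversion in filename_imgid fail. The sketch below is a slightly more defensive variant, not part of the original script; the name list_image_files and the extensions parameter are made up for illustration.

```python
import os


def list_image_files(image_dir, split_name, extensions=('.jpg',)):
    """Return the sorted image file names in image_dir/split_name."""
    folder = os.path.join(image_dir, split_name)
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)
        and os.path.isfile(os.path.join(folder, name))
    )
```

It can be dropped in wherever _get_img_filename is called, since it takes the same two positional arguments.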