CenterNet Study Notes (1): COCO Data Processing


Lyang-Never


Article link: https://blog.csdn.net/qq_37690498/article/details/106976743


1. Introduction

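The code for this CenterNet series comes from zzzxxxttt/torch_simple_CenterNet_45.
This post covers the data-processing part, specifically for COCO. The scripts involved are:
datasets/coco.py and utils/image.py
Below I go through the key points and the tricky parts alongside the code.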

2. Main Content

[Figure: the COCO class names (COCO_NAMES), their category IDs (COCO_IDS), and data-processing constants]
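The figure lists the object classes in COCO, with an extra __background__ class added to represent the background, together with the category ID of each class. The IDs are not contiguous: the last ID is 90 while there are only 80 (+1 background) classes, so some numbers are skipped. Below that are a few constants used during data processing (see the figure above).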
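Because the IDs have gaps, the code maps them to contiguous indices (self.cat_ids in __init__ below). Here is a small standalone sketch of that mapping. The ID list is written out with ranges from the standard COCO detection IDs rather than copied from COCO_IDS in datasets/coco.py, so treat it as an assumption to check against the repo; index_to_coco_id is a hypothetical helper for the reverse direction.

```python
# The 80 COCO detection category ids (1..90 with gaps), written with ranges
# here instead of copying COCO_IDS from datasets/coco.py.
COCO_IDS = (list(range(1, 12)) + list(range(13, 26)) + [27, 28]
            + list(range(31, 45)) + list(range(46, 66)) + [67, 70]
            + list(range(72, 83)) + list(range(84, 91)))

# Same idea as self.cat_ids in __init__: COCO id -> contiguous index 0..79
cat_ids = {v: i for i, v in enumerate(COCO_IDS)}

# Hypothetical reverse mapping, e.g. to turn predicted indices back into COCO ids
index_to_coco_id = {i: v for v, i in cat_ids.items()}
```

Usage: cat_ids[13] is 11 because id 12 does not exist, and cat_ids[90] is 79, the last contiguous index.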

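```python
import os

import numpy as np
import pycocotools.coco as coco
import torch.utils.data as data

# COCO_NAMES, COCO_IDS, COCO_EIGEN_VALUES, COCO_EIGEN_VECTORS, COCO_MEAN and COCO_STD
# are module-level constants defined at the top of datasets/coco.py.

class COCO(data.Dataset):
  def __init__(self, data_dir, split, split_ratio=1.0, img_size=512):
    super(COCO, self).__init__()
    self.num_classes = 80
    self.class_name = COCO_NAMES
    self.valid_ids = COCO_IDS
    # map the non-contiguous COCO category ids to contiguous indices 0..79
    self.cat_ids = {v: i for i, v in enumerate(self.valid_ids)}

    # RNG and PCA eigenvalues/eigenvectors used by color_aug
    self.data_rng = np.random.RandomState(123)
    self.eig_val = np.array(COCO_EIGEN_VALUES, dtype=np.float32)
    self.eig_vec = np.array(COCO_EIGEN_VECTORS, dtype=np.float32)
    # per-channel mean/std, shaped (1, 1, 3) to broadcast over an H x W x 3 image
    self.mean = np.array(COCO_MEAN, dtype=np.float32)[None, None, :]
    self.std = np.array(COCO_STD, dtype=np.float32)[None, None, :]

    self.split = split  # 'train', 'val' or 'test'
    self.data_dir = os.path.join(data_dir, 'COCO2017/')
    self.img_dir = os.path.join(self.data_dir, 'images/%s2017' % split)

    if split == 'test':
      self.annot_path = os.path.join(self.data_dir, 'annotations', 'image_info_test-dev2017.json')
    else:
      self.annot_path = os.path.join(self.data_dir, 'annotations', 'instances_%s2017.json' % split)

    self.max_objs = 128  # at most 128 objects per image
    self.padding = 127
    self.down_ratio = 4  # output stride: the feature map is 1/4 of the input size
    self.img_size = {'h': img_size, 'w': img_size}
    self.fmap_size = {'h': img_size // self.down_ratio, 'w': img_size // self.down_ratio}
    self.rand_scales = np.arange(0.6, 1.4, 0.1)  # random scale factors 0.6, 0.7, ..., 1.3
    self.gaussian_iou = 0.7  # minimum IoU used to compute the Gaussian radius

    print('==> initializing coco 2017 %s data.' % split)
    self.coco = coco.COCO(self.annot_path)
    self.images = self.coco.getImgIds()
```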

Next is the class COCO(data.Dataset), the core of this script. __init__() performs the initialization; the most important lines are:

```python
self.annot_path = os.path.join(self.data_dir, 'annotations', 'instances_%s2017.json' % split)
self.coco = coco.COCO(self.annot_path)
self.images = self.coco.getImgIds()
```

First, the coco module: it comes from pycocotools, which has to be installed. For installing it on Windows, see the blog post on installing pycocotools under Windows.
Once it is installed, we can load the COCO data. The COCO2017 directory structure looks like this:
[Figure: COCO2017 directory structure]
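Training uses instances_train2017.json. With this JSON annotation file and the coco API, it is easy to get whatever we need from COCO: the images, the category and ID of every object in them, and the corresponding bboxes. The procedure is as follows: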

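```python
self.coco = coco.COCO(self.annot_path)  # load and index the annotation JSON
self.images = self.coco.getImgIds()     # list of all image ids
# take index = 0 as an example
img_id = self.images[0]
# loadImgs returns a list of image records; file_name + img_dir gives the image path
img_path = os.path.join(self.img_dir, self.coco.loadImgs(ids=[img_id])[0]['file_name'])
ann_ids = self.coco.getAnnIds(imgIds=[img_id])  # ids of all annotations in this image
annotations = self.coco.loadAnns(ids=ann_ids)   # the annotation dicts themselves

labels = np.array([self.cat_ids[anno['category_id']] for anno in annotations])
bboxes = np.array([anno['bbox'] for anno in annotations], dtype=np.float32)  # [x, y, w, h]
```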

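The figure below shows the contents of self.coco:
[Figure: contents of self.coco, including anns]
Here anns holds the object annotations. Each one contains the bbox of the object ([x, y, w, h]: top-left corner plus width and height), category_id, the ID of the object's category, and image_id, the ID of the image it belongs to (id is the annotation's own unique ID). Since one image may contain several objects, different annotations can share the same image_id. There are 860,001 annotations in total.
[Figure: contents of self.coco.imgs]
imgs mainly provides each image's file_name; the image is read by joining the file name with the image root directory. It has 118,287 entries.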
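To make the relationship between images and annotations concrete, here is a plain-Python sketch of the lookup that getImgIds/getAnnIds/loadAnns give us. It is not pycocotools' code (pycocotools builds comparable indexes such as imgs and imgToAnns in COCO.createIndex()); build_index and the toy data below are hypothetical and only mimic the instances_*.json layout.

```python
from collections import defaultdict

# Hypothetical toy data in the instances_*.json layout (real files also carry
# "categories", segmentation, area, iscrowd, ...).
toy = {
    "images": [{"id": 42, "file_name": "000000000042.jpg", "height": 480, "width": 640}],
    "annotations": [
        {"id": 1, "image_id": 42, "category_id": 18, "bbox": [10.0, 20.0, 100.0, 50.0]},
        {"id": 2, "image_id": 42, "category_id": 90, "bbox": [200.0, 100.0, 40.0, 60.0]},
    ],
}

def build_index(dataset):
    # image id -> image record (gives file_name)
    imgs = {img["id"]: img for img in dataset["images"]}
    # image id -> list of its annotations; several annotations share one image_id
    img_to_anns = defaultdict(list)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
    return imgs, img_to_anns
```

Usage: after imgs, img_to_anns = build_index(toy), imgs[42]["file_name"] is the file to read and img_to_anns[42] holds both annotations (annotation ids 1 and 2, both with image_id 42).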

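Next is __getitem__(self, index), the main function for processing and fetching data.
The code below collects the annotations of each image (category and bbox) and finally converts the boxes to [x1, y1, x2, y2] format. (This way of reading COCO annotations is worth remembering.)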

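```python
def __getitem__(self, index):
  img_id = self.images[index]
  # file_name from the image record + image root directory -> full path
  img_path = os.path.join(self.img_dir, self.coco.loadImgs(ids=[img_id])[0]['file_name'])
  # all annotations (objects) that belong to this image
  ann_ids = self.coco.getAnnIds(imgIds=[img_id])
  annotations = self.coco.loadAnns(ids=ann_ids)

  # COCO category id -> contiguous class index; COCO bboxes are [x, y, w, h]
  labels = np.array([self.cat_ids[anno['category_id']] for anno in annotations])
  bboxes = np.array([anno['bbox'] for anno in annotations], dtype=np.float32)

  # images without objects get one all-zero dummy box (skipped later because w = h = 0)
  if len(bboxes) == 0:
    bboxes = np.array([[0., 0., 0., 0.]], dtype=np.float32)
    labels = np.array([0])  # 1-D, like the regular labels array
  bboxes[:, 2:] += bboxes[:, :2]  # xywh to xyxy
```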
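The same label/box steps can be pulled out into a standalone function to try them without the dataset. to_xyxy and its cat_ids argument are hypothetical names for this sketch; the logic mirrors the excerpt above, including the dummy box for images without annotations.

```python
import numpy as np

def to_xyxy(annotations, cat_ids):
    # contiguous class indices and [x, y, w, h] boxes
    labels = np.array([cat_ids[a["category_id"]] for a in annotations])
    bboxes = np.array([a["bbox"] for a in annotations], dtype=np.float32)
    if len(bboxes) == 0:
        # one all-zero placeholder box for images without objects
        bboxes = np.array([[0., 0., 0., 0.]], dtype=np.float32)
        labels = np.array([0])
    # x2 = x + w, y2 = y + h (in place on the last two columns)
    bboxes[:, 2:] += bboxes[:, :2]
    return labels, bboxes
```

Usage: a box [10, 20, 100, 50] (x, y, w, h) becomes [10, 20, 110, 70] (x1, y1, x2, y2).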

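Then comes data augmentation (illustration and code below): flipping and an affine transform, which takes care of scaling and cropping. For flipping: when flipped is True the image is flipped; img = img[:, ::-1, :] flips it horizontally (left-right), whereas img = img[::-1, :, :] would flip it vertically (upside down). This code only uses the horizontal flip. Next is get_border(), located in utils/image.py, which returns the margin to keep between the random center and the image edge: if the side length is greater than 256 it returns 128, if it is between 128 and 256 it returns 64, and for even smaller images it keeps halving. The center is then drawn at random inside these margins (see the illustration below). After that, get_affine_transform() computes the transform matrix that cv2.warpAffine() needs. An affine transform is determined by three point correspondences: the first point is center, the second is center offset by half of scale in a direction rotated by the given angle (computed by get_dir()), and the third is obtained by rotating the vector between the first two points by 90 degrees (computed by get_3rd_point()). All of this lives in get_affine_transform() in utils/image.py.
[Figure: illustration of the random center and the borders returned by get_border()]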
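Here is a sketch of get_border() and of the two flips. get_border follows the halving behavior described above as I remember it from CenterNet-style code, so compare it with utils/image.py; random_center is a hypothetical helper that just bundles the two randint calls from the training code.

```python
import numpy as np

def get_border(border, size):
    # halve the border until the range [border, size - border) is non-empty,
    # so that randint(low=border, high=size - border) is valid
    i = 1
    while size - border // i <= border // i:
        i *= 2
    return border // i

def random_center(width, height, rng):
    # hypothetical helper: random center kept at least one border away from the edges
    w_border = get_border(128, width)
    h_border = get_border(128, height)
    return np.array([rng.randint(w_border, width - w_border),
                     rng.randint(h_border, height - h_border)], dtype=np.float32)

# flips on an H x W x C array: reversing axis 1 is left-right, axis 0 is upside down
img = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # H=2, W=3, C=1
hflip = img[:, ::-1, :]
vflip = img[::-1, :, :]
```

Usage: get_border(128, 640) is 128, get_border(128, 256) is 64 and get_border(128, 100) is 32, so on a 640x480 image the center's x lies in [128, 512).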

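```python
# Context assumed from earlier in __getitem__ (not shown in the excerpt above):
# img = cv2.imread(img_path)
# height, width = img.shape[0], img.shape[1]
# center = np.array([width / 2., height / 2.], dtype=np.float32)  # image center
# scale = max(height, width) * 1.0                                 # longer side

flipped = False
if self.split == 'train':
  # random scale factor picked from rand_scales (0.6 ... 1.3)
  scale = scale * np.random.choice(self.rand_scales)
  # keep the random center at least w_border / h_border pixels away from the edges
  w_border = get_border(128, width)
  h_border = get_border(128, height)

  center[0] = np.random.randint(low=w_border, high=width - w_border)
  center[1] = np.random.randint(low=h_border, high=height - h_border)

  # horizontal flip with probability 0.5; the center is mirrored as well
  if np.random.random() < 0.5:
    flipped = True
    img = img[:, ::-1, :]
    center[0] = width - center[0] - 1

# crop and rescale around center to the network input size
trans_img = get_affine_transform(center, scale, 0, [self.img_size['w'], self.img_size['h']])
img = cv2.warpAffine(img, trans_img, (self.img_size['w'], self.img_size['h']))
```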
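To see what get_affine_transform() computes, here is a NumPy-only sketch. It builds the three point pairs as described above (center; center moved up by half of scale and rotated by rot via get_dir(); a third point from get_3rd_point()), but solves for the 2x3 matrix with np.linalg.solve instead of cv2.getAffineTransform. The name get_affine_transform_sketch is mine, and the repository's function may take extra arguments.

```python
import numpy as np

def get_dir(src_point, rot_rad):
    # rotate a 2D vector by rot_rad radians
    sn, cs = np.sin(rot_rad), np.cos(rot_rad)
    return np.array([src_point[0] * cs - src_point[1] * sn,
                     src_point[0] * sn + src_point[1] * cs], dtype=np.float32)

def get_3rd_point(a, b):
    # rotate (a - b) by 90 degrees around b: a third, non-collinear point
    direct = a - b
    return b + np.array([-direct[1], direct[0]], dtype=np.float32)

def get_affine_transform_sketch(center, scale, rot, output_size):
    dst_w, dst_h = output_size
    rot_rad = np.pi * rot / 180
    src = np.zeros((3, 2))
    dst = np.zeros((3, 2))
    # point 1: center -> middle of the output
    src[0] = center
    dst[0] = [dst_w * 0.5, dst_h * 0.5]
    # point 2: half of scale "up" (rotated) -> half of the output width up
    src[1] = src[0] + get_dir([0, scale * -0.5], rot_rad)
    dst[1] = dst[0] + np.array([0, dst_w * -0.5])
    # point 3: perpendicular construction on both sides
    src[2] = get_3rd_point(src[0], src[1])
    dst[2] = get_3rd_point(dst[0], dst[1])
    # solve [x, y, 1] @ M.T = [x', y'] for the 2x3 matrix M
    A = np.hstack([src, np.ones((3, 1))])
    return np.linalg.solve(A, dst).T

def affine_transform(pt, t):
    # apply a 2x3 affine matrix to one point
    return (t @ np.array([pt[0], pt[1], 1.0]))[:2]
```

Usage: for a 640x480 image with center (320, 240) and scale 640, the 512x512 input gets a uniform factor of 0.8 and the 128x128 feature map a factor of 0.2, which is why trans_fmap can reuse the same center and scale.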

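Next comes the code below: scaling to [0, 1], color augmentation, mean/std normalization, reordering the channels, building the transform matrix for the feature map, initializing the target arrays, applying the same affine transform to the bbox corners, and finally computing the center of each object.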
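The normalization part relies on broadcasting: mean and std are stored with shape (1, 1, 3) so they apply per channel to an H x W x 3 image, and the transpose then gives the C x H x W layout PyTorch expects. A minimal sketch; the mean/std numbers are hypothetical placeholders, not COCO_MEAN/COCO_STD from the repo.

```python
import numpy as np

# hypothetical per-channel statistics, shaped (1, 1, 3) like self.mean / self.std
mean = np.array([0.408, 0.447, 0.470], dtype=np.float32)[None, None, :]
std = np.array([0.289, 0.274, 0.278], dtype=np.float32)[None, None, :]

def normalize(img_hwc_uint8):
    img = img_hwc_uint8.astype(np.float32) / 255.  # [0, 255] -> [0, 1]
    img -= mean                                    # broadcast over H and W
    img /= std
    return img.transpose(2, 0, 1)                  # [H, W, C] -> [C, H, W]
```

Usage: a 4x6 black image comes out with shape (3, 4, 6), and every value in channel 0 equals -0.408 / 0.289.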

```python
img = img.astype(np.float32) / 255.  # scale pixels to [0, 1]

if self.split == 'train':
  # color augmentation, applied in place using the PCA eigenvalues/eigenvectors
  color_aug(self.data_rng, img, self.eig_val, self.eig_vec)

img -= self.mean  # per-channel normalization; mean/std have shape (1, 1, 3)
img /= self.std
img = img.transpose(2, 0, 1)  # from [H, W, C] to [C, H, W]

# same center/scale as for the image, but mapping onto the feature map (input / down_ratio)
trans_fmap = get_affine_transform(center, scale, 0, [self.fmap_size['w'], self.fmap_size['h']])

hmap = np.zeros((self.num_classes, self.fmap_size['h'], self.fmap_size['w']), dtype=np.float32)  # heatmap
w_h_ = np.zeros((self.max_objs, 2), dtype=np.float32)  # width and height
regs = np.zeros((self.max_objs, 2), dtype=np.float32)  # regression (sub-pixel center offset)
inds = np.zeros((self.max_objs,), dtype=np.int64)  # flattened center index on the feature map
ind_masks = np.zeros((self.max_objs,), dtype=np.uint8)  # 1 for real objects, 0 for padding

for k, (bbox, label) in enumerate(zip(bboxes, labels)):
  if flipped:
    # mirror the x coordinates and swap x1/x2 so that x1 <= x2 still holds
    bbox[[0, 2]] = width - bbox[[2, 0]] - 1
  # map both corners into feature-map coordinates, then clip to the feature map
  bbox[:2] = affine_transform(bbox[:2], trans_fmap)
  bbox[2:] = affine_transform(bbox[2:], trans_fmap)
  bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, self.fmap_size['w'] - 1)
  bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, self.fmap_size['h'] - 1)
  h, w = bbox[3] - bbox[1], bbox[2] - bbox[0]
  if h > 0 and w > 0:
    # object center (float) and its integer location on the heatmap
    obj_c = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], dtype=np.float32)
    obj_c_int = obj_c.astype(np.int32)
```
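A worked example of how one box becomes training targets may help. flip_box_x and encode_box are hypothetical helpers repeating the steps of the loop (flip, clip, center, integer center, offset, flattened index); unlike the loop, they copy the box instead of modifying the row of bboxes in place.

```python
import numpy as np

def flip_box_x(bbox, width):
    # mirror x1/x2 around the image width and swap them so x1 <= x2
    out = bbox.copy()
    out[[0, 2]] = width - bbox[[2, 0]] - 1
    return out

def encode_box(bbox, fmap_w, fmap_h):
    # bbox is already in feature-map coordinates (x1, y1, x2, y2)
    bbox = bbox.copy()
    bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, fmap_w - 1)
    bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, fmap_h - 1)
    h, w = bbox[3] - bbox[1], bbox[2] - bbox[0]
    if h <= 0 or w <= 0:
        return None  # box fell outside the crop: no target
    obj_c = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], dtype=np.float32)
    obj_c_int = obj_c.astype(np.int32)
    reg = obj_c - obj_c_int                        # sub-pixel offset (regs)
    ind = int(obj_c_int[1] * fmap_w + obj_c_int[0])  # y * W + x (inds)
    return (w, h), obj_c_int, reg, ind
```

Usage: on a 128x128 feature map, the box [10.3, 20.6, 30.9, 41.2] has center (20.6, 30.9), so it is drawn at pixel (20, 30), stores the offset (0.6, 0.9) and the index 30 * 128 + 20 = 3860.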

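Finally, the Gaussian part (code below). First the "Gaussian radius" is computed by gaussian_radius(); as the figure below shows, it returns the minimum of r1, r2 and r3. When I first read it, the formulas looked a lot like the quadratic formula, and after looking into it, that is indeed where they come from. My hand-written derivation is below, and there are also write-ups explaining how the Gaussian kernel is applied to the heatmap. Once the radius is known, a Gaussian kernel is drawn on the heatmap, which mainly requires understanding the 2D Gaussian expression. At that point data loading is essentially done, and __getitem__ just returns the arrays it needs.
[Figure: gaussian_radius(), returning the minimum of r1, r2 and r3]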
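Here is a typed version of the derivation, assuming the three cases from the CornerNet-style analysis that gaussian_radius is based on (my own reconstruction, not a transcription of the hand-written notes). Let h and w be the box height and width, o the minimum IoU (gaussian_iou = 0.7), and r the radius:

```latex
\begin{aligned}
&\text{Case 1 (same size, shifted by } r\text{):}
&&\frac{(h-r)(w-r)}{2hw-(h-r)(w-r)} \ge o
&&\Rightarrow\; r^2 - (h+w)\,r + \frac{1-o}{1+o}\,hw \ge 0 \\
&\text{Case 2 (shrunk by } r\text{ on every side):}
&&\frac{(h-2r)(w-2r)}{hw} \ge o
&&\Rightarrow\; 4r^2 - 2(h+w)\,r + (1-o)\,hw \ge 0 \\
&\text{Case 3 (grown by } r\text{ on every side):}
&&\frac{hw}{(h+2r)(w+2r)} \ge o
&&\Rightarrow\; 4o\,r^2 + 2o(h+w)\,r - (1-o)\,hw \le 0
\end{aligned}
\qquad
r = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
```

Up to the sign convention for b, the coefficients match a1, b1, c1 through a3, b3, c3 in gaussian_radius, and the radius is the smallest bound over the three cases. One caveat, as far as I understand it: the implementation computes (b + sqrt(b^2 - 4ac)) / 2 in all three cases instead of dividing by 2a and choosing the matching root, so r1, r2 and r3 are not exactly these roots; others have pointed this out for CornerNet/CenterNet as well.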

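```python
# (inside the `if h > 0 and w > 0:` branch of the loop above)
# radius within which a shifted box still overlaps the ground truth with IoU >= gaussian_iou
radius = max(0, int(gaussian_radius((math.ceil(h), math.ceil(w)), self.gaussian_iou)))
draw_umich_gaussian(hmap[label], obj_c_int, radius)  # splat a 2D Gaussian on this class's heatmap
w_h_[k] = 1. * w, 1. * h  # box size on the feature map
regs[k] = obj_c - obj_c_int  # discretization error
inds[k] = obj_c_int[1] * self.fmap_size['w'] + obj_c_int[0]  # flattened index y * W + x
ind_masks[k] = 1  # mark this slot as a real object
```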
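For experimenting, here are gaussian_radius, gaussian2D and draw_umich_gaussian as I recall them from CenterNet-style code; compare them with utils/image.py before relying on the details. The kernel is the 2D Gaussian exp(-(x^2 + y^2) / (2 sigma^2)) with sigma = diameter / 6, and it is merged into the heatmap with an element-wise maximum so that overlapping objects of the same class do not add up.

```python
import numpy as np

def gaussian_radius(det_size, min_overlap=0.7):
    height, width = det_size
    # case 1: box of the same size shifted by r
    a1 = 1
    b1 = height + width
    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
    r1 = (b1 + np.sqrt(b1 ** 2 - 4 * a1 * c1)) / 2
    # case 2: box shrunk by r on every side
    a2 = 4
    b2 = 2 * (height + width)
    c2 = (1 - min_overlap) * width * height
    r2 = (b2 + np.sqrt(b2 ** 2 - 4 * a2 * c2)) / 2
    # case 3: box grown by r on every side
    a3 = 4 * min_overlap
    b3 = -2 * min_overlap * (height + width)
    c3 = (min_overlap - 1) * width * height
    r3 = (b3 + np.sqrt(b3 ** 2 - 4 * a3 * c3)) / 2
    return min(r1, r2, r3)

def gaussian2D(shape, sigma=1):
    m, n = [(ss - 1.) / 2. for ss in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]
    h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))
    h[h < np.finfo(h.dtype).eps * h.max()] = 0  # drop negligible tails
    return h

def draw_umich_gaussian(heatmap, center, radius, k=1):
    diameter = 2 * radius + 1
    gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6)
    x, y = int(center[0]), int(center[1])
    height, width = heatmap.shape[0:2]
    # clip the kernel where it would leave the heatmap
    left, right = min(x, radius), min(width - x, radius + 1)
    top, bottom = min(y, radius), min(height - y, radius + 1)
    masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]  # a view
    masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right]
    if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0:
        np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)
    return heatmap
```

Usage: gaussian_radius((10, 10), 0.7) is about 2.73, so a 10x10 box gets radius 2; drawing it at (3, 4) puts a peak of 1.0 at heatmap[4, 3]. Because hmap[label] is a view, drawing into it updates hmap itself.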

[Figure: hand-written derivation of the Gaussian radius]

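3. Summary

This part of the code is not especially difficult. What matters is understanding the data-processing techniques and the math involved: knowing what the code means and the principles behind it is the hard part, which is why I wrote these notes.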
