[Deep Learning Notes] CenterNet Source Code Walkthrough

Latest recommended article published on 2022-11-18 11:28:13

极客程序设计


Categories: Deep Learning, Object Recognition and Detection, Computer Vision. Tags: deep learning, python, artificial intelligence

Original link: https://blog.csdn.net/FSALICEALEX/article/details/91955759?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522166757659116782391880706%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=166757659116782391880706&biz_id=0&utm_med


This article walks through the most important functions in the CenterNet code base, covering only the detection part.

0. Links

https://github.com/xingyizhou/CenterNet

install:

https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md

dataset:

https://github.com/xingyizhou/CenterNet/blob/master/readme/DATA.md

1. ctdet_decode: decodes the heat_map into bounding boxes

The function that turns the network output into detections is ctdet_decode in lib\models\decode.py.

1.1 First, _nms:

    def _nms(heat, kernel=3):
        pad = (kernel - 1) // 2

        hmax = nn.functional.max_pool2d(
            heat, (kernel, kernel), stride=1, padding=pad)
        keep = (hmax == heat).float()
        return heat * keep

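hmax holds, for every position, the maximum over its 3 x 3 window (the position itself and its 8 neighbors). keep marks the positions where heat equals that maximum, i.e. the local maxima. Returning heat * keep therefore keeps the original value at the local maxima and sets every other position to 0.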
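To make the effect concrete, here is a minimal pure-Python sketch of the same peak-keeping idea on a single 2D heatmap, without PyTorch. The helper name `nms_2d` and the sample values are made up for illustration and are not part of CenterNet. Neighbors outside the map are simply skipped, which assumes that padded cells never win the maximum.

```python
def nms_2d(heat, kernel=3):
    """Keep values equal to the max of their kernel x kernel window; zero the rest."""
    pad = (kernel - 1) // 2
    h, w = len(heat), len(heat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Maximum over the window centred on (y, x), clipped at the borders.
            window_max = max(
                heat[j][i]
                for j in range(max(0, y - pad), min(h, y + pad + 1))
                for i in range(max(0, x - pad), min(w, x + pad + 1))
            )
            if heat[y][x] == window_max:
                out[y][x] = heat[y][x]
    return out


heat = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
peaks = nms_2d(heat)
# Only the two local maxima, 0.9 at (1, 1) and 0.6 at (2, 3), stay non-zero.
```

Unlike box-based NMS, no IoU is computed here: a single max-pooling pass is enough because each object is represented by one peak.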

2.1 Next, _topk:

    def _topk(scores, K=40):
        batch, cat, height, width = scores.size()

        topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)

        topk_inds = topk_inds % (height * width)
        topk_ys   = (topk_inds / width).int().float()
        topk_xs   = (topk_inds % width).int().float()

        topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
        topk_clses = (topk_ind / K).int()
        topk_inds = _gather_feat(
            topk_inds.view(batch, -1, 1), topk_ind).view(batch, K)
        topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K)
        topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K)

        return topk_score, topk_inds, topk_clses, topk_ys, topk_xs

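topk_scores: batch * cat * K, where batch is the batch size, cat is the number of classes, and K is the number of top values kept.

topk_inds: batch * cat * K, with index values in [0, W x H - 1]

topk_scores and topk_inds are the K highest scores and their indices within each heatmap (one heatmap per class) of each batch element.

topk_inds is then converted to row and column coordinates, topk_ys and topk_xs, using division and modulo by width.

Next, for each batch element, the K highest scores and their indices are taken over all heatmaps together, regardless of class.

topk_score: batch * K

topk_ind: batch * K, with index values in [0, cat x K - 1]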
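The two-stage selection and the index arithmetic can be sketched for a single image in plain Python. This is an illustration under simplifying assumptions, not the CenterNet implementation: `topk_flat` and `topk_single_image` are hypothetical helpers, the input is a nested list of shape cat x height x width, and rows and columns are computed after the second stage rather than before it (the result is the same).

```python
def topk_flat(values, k):
    """Return (scores, indices) of the k largest entries of a flat list."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


def topk_single_image(heat, k):
    """heat: cat x height x width nested lists for one image."""
    width = len(heat[0][0])
    # Stage 1: top k inside each class heatmap, indices in [0, H * W - 1].
    per_class_scores, per_class_inds = [], []
    for plane in heat:
        flat = [v for row in plane for v in row]
        s, i = topk_flat(flat, k)
        per_class_scores += s
        per_class_inds += i
    # Stage 2: top k over the cat * k candidates, indices in [0, cat * k - 1].
    scores, cand = topk_flat(per_class_scores, k)
    clses = [c // k for c in cand]            # candidate index -> class
    inds = [per_class_inds[c] for c in cand]  # back to [0, H * W - 1]
    ys = [i // width for i in inds]           # row on the heatmap
    xs = [i % width for i in inds]            # column on the heatmap
    return scores, inds, clses, ys, xs


heat = [
    [[0.1, 0.8, 0.2], [0.3, 0.0, 0.4]],  # class 0
    [[0.0, 0.5, 0.1], [0.9, 0.2, 0.0]],  # class 1
]
scores, inds, clses, ys, xs = topk_single_image(heat, k=2)
# scores == [0.9, 0.8], inds == [3, 1], clses == [1, 0], ys == [1, 0], xs == [0, 1]
```

Dividing the stage-2 candidate index by K recovers the class because the candidates are laid out class by class, K per class.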

After that, _gather_feat, defined in the utils file, is called on topk_inds (after view) and topk_ind:

2.2 _gather_feat

    def _gather_feat(feat, ind, mask=None):
        dim  = feat.size(2)
        ind  = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
        feat = feat.gather(1, ind)
        if mask is not None:
            mask = mask.unsqueeze(2).expand_as(feat)
            feat = feat[mask]
            feat = feat.view(-1, dim)
        return feat

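Input:

feat (topk_inds): batch * (cat x K) * 1 (assuming the inputs are topk_inds and topk_ind)

ind (topk_ind): batch * K

First, ind gets an extra dimension and becomes batch * K * 1.

Then gather picks out the entries of feat at the positions given by ind.

What is returned is the gathered index tensor:

feat: batch * K * 1, with values in [0, W x H - 1], since the gathered entries come from topk_inds (ind itself takes values in [0, cat x K - 1])

The general case is as follows:

feat: A * B * C

ind: A * D

First, ind gets an extra dimension and is expanded to size dim along it, becoming A * D * C. For any i, j, all elements of ind[i, j, :] are identical and equal to ind[i, j] of the original A * D tensor.

Then gather picks out the entries of feat at the positions given by ind.

The resulting feat: A * D * C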
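Because the expanded index repeats the same value along the last axis, gathering along dimension 1 amounts to picking whole rows of length C. A minimal pure-Python sketch of that general case (mask omitted; `gather_feat` here is an illustrative stand-in working on nested lists, not the PyTorch function):

```python
def gather_feat(feat, ind):
    """feat: A x B x C nested lists, ind: A x D nested lists of indices into B.

    Returns an A x D x C nested list with out[a][d] == feat[a][ind[a][d]].
    """
    return [[list(feat[a][b]) for b in ind[a]] for a in range(len(feat))]


feat = [[[10, 11], [20, 21], [30, 31], [40, 41]]]  # A=1, B=4, C=2
ind = [[3, 0]]                                      # A=1, D=2
out = gather_feat(feat, ind)
# out == [[[40, 41], [10, 11]]], i.e. shape A x D x C = 1 x 2 x 2
```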

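2.3 Return values

Five values are returned: topk_score, topk_inds, topk_clses, topk_ys, topk_xs

topk_score: batch * K. The K highest scores in each image.

topk_inds: batch * K. The indices of the K highest scores in each image, with values in [0, W x H - 1].

The remaining three (topk_clses, topk_ys, topk_xs) are analogous.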

3.3 _tranpose_and_gather_feat: uses the indices obtained from _topk to fetch values.

_tranpose_and_gather_feat is called on both reg and wh: reg holds the regressed center offset, and wh holds the width and height of the bbox.

    scores, inds, clses, ys, xs = _topk(heat, K=K)
    if reg is not None:
      reg = _tranpose_and_gather_feat(reg, inds)

    wh = _tranpose_and_gather_feat(wh, inds)

Here is the definition of _tranpose_and_gather_feat:

    def _tranpose_and_gather_feat(feat, ind):
        feat = feat.permute(0, 2, 3, 1).contiguous()
        feat = feat.view(feat.size(0), -1, feat.size(3))
        feat = _gather_feat(feat, ind)
        return feat

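Input:

feat: batch * C (channel) * H * W

ind: batch * K

First, the channel dimension of feat is moved to the last axis, and contiguous makes the memory layout contiguous so that the following view works.

Then feat is reshaped to batch * (H x W) * C, and _gather_feat picks out the elements of feat indicated by ind.

feat: batch * K * C

feat[i, j, k] is the value of channel k, in batch element i, at the position of the j-th highest peak.

Overall this is somewhat involved, so the logic is easiest to follow on a concrete example.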

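Suppose the input is $\begin{bmatrix} 1 & 2 & 3\\ 1 & 2 & 3\\ 1 & 2 & 6 \end{bmatrix}$ (channel 0) and $\begin{bmatrix} 3 & 4 & 5\\ 3 & 4 & 7\\ 3 & 4 & 5 \end{bmatrix}$ (channel 1), with shape batch * C * H * W (the batch size is set to 1 and ignored), i.e. 1 * 2 * 3 * 3, and assume K=2. Treating this tensor as both heat and reg, after the following two steps

    scores, inds, clses, ys, xs = _topk(heat, K=K)
    if reg is not None:
      reg = _tranpose_and_gather_feat(reg, inds)

the final result is $\begin{bmatrix} 3 & 7\\ 6 & 5 \end{bmatrix}$, with shape batch * K * C. The 7 in [3, 7] is the largest element over all channels and the 6 in [6, 5] is the second largest; taking the elements of every channel at those two positions gives the final result.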

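Here _gather_feat removes the distinction between channels: the inds that come out of it are positions shared by all channels.

_tranpose_and_gather_feat then decodes those inds to fetch the final values.

The feat passed to _topk is the localization heat_map. Once inds have been obtained from it, the same inds can be applied to offset_heat_map and size_heat_map.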
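The 1 * 2 * 3 * 3 example can be reproduced in a few lines of plain Python. This is only a sketch of the indexing logic for one image: `transpose_and_gather` is a hypothetical nested-list stand-in for the PyTorch function, and the top-2 positions are found with a simple sort instead of torch.topk.

```python
def transpose_and_gather(feat, inds):
    """feat: C x H x W nested lists (one image), inds: flat positions in [0, H * W - 1].

    Returns a K x C list: for each position, the value of every channel there.
    """
    width = len(feat[0][0])
    return [[plane[i // width][i % width] for plane in feat] for i in inds]


feat = [
    [[1, 2, 3], [1, 2, 3], [1, 2, 6]],  # channel 0
    [[3, 4, 5], [3, 4, 7], [3, 4, 5]],  # channel 1
]
width = 3
# (value, flat position) for every element of every channel.
candidates = [
    (value, y * width + x)
    for plane in feat
    for y, row in enumerate(plane)
    for x, value in enumerate(row)
]
# Positions of the K=2 largest values over all channels: 7 at 5, 6 at 8.
inds = [pos for _, pos in sorted(candidates, reverse=True)[:2]]
result = transpose_and_gather(feat, inds)
# result == [[3, 7], [6, 5]]
```

The same `inds` could be passed with a different `feat` (an offset or size map), which is how one set of peak positions serves every output head.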

[Figure: step-by-step illustration of these two lines of code]

The code of ctdet_decode, with explanatory comments:

    def ctdet_decode(heat, wh, reg=None, cat_spec_wh=False, K=100):
        batch, cat, height, width = heat.size()

        # heat = torch.sigmoid(heat)
        # perform nms on heatmaps
        heat = _nms(heat)

        scores, inds, clses, ys, xs = _topk(heat, K=K)
        # xs and ys are the column and row on the heat_map derived from inds

        if reg is not None:
          reg = _tranpose_and_gather_feat(reg, inds)
          reg = reg.view(batch, K, 2)
          xs = xs.view(batch, K, 1) + reg[:, :, 0:1]
          ys = ys.view(batch, K, 1) + reg[:, :, 1:2]
        else:
          xs = xs.view(batch, K, 1) + 0.5
          ys = ys.view(batch, K, 1) + 0.5

        # both xs and ys now carry an offset

        wh = _tranpose_and_gather_feat(wh, inds)
        # fetch the elements of wh that correspond to inds, as in the example above

        if cat_spec_wh:
          wh = wh.view(batch, K, cat, 2)
          clses_ind = clses.view(batch, K, 1, 1).expand(batch, K, 1, 2).long()
          wh = wh.gather(2, clses_ind).view(batch, K, 2)
        else:
          wh = wh.view(batch, K, 2)
        clses  = clses.view(batch, K, 1).float()
        scores = scores.view(batch, K, 1)
        bboxes = torch.cat([xs - wh[..., 0:1] / 2,
                            ys - wh[..., 1:2] / 2,
                            xs + wh[..., 0:1] / 2,
                            ys + wh[..., 1:2] / 2], dim=2)
        # this gives the bboxes

        detections = torch.cat([bboxes, scores, clses], dim=2)

        return detections
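The last step, turning a peak position plus offset and size into a box, is simple arithmetic. A minimal sketch for one detection, assuming scalar inputs; `decode_box` is a hypothetical helper written for this note, and the numbers are arbitrary:

```python
def decode_box(x, y, w, h, offset=None):
    """Return [x1, y1, x2, y2] for a peak at integer heatmap cell (x, y)."""
    if offset is not None:
        cx, cy = x + offset[0], y + offset[1]  # regressed sub-cell offset
    else:
        cx, cy = x + 0.5, y + 0.5              # fall back to the cell centre
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


box = decode_box(x=10, y=4, w=6.0, h=2.0, offset=(0.25, 0.5))
box_no_reg = decode_box(x=10, y=4, w=6.0, h=2.0)
# box        == [7.25, 3.5, 13.25, 5.5]
# box_no_reg == [7.5, 3.5, 13.5, 5.5]
```

These coordinates are still in heatmap cells; mapping them back to the input image happens later in post-processing.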

2. Post-processing

The steps above produce dets from the heatmap, but further processing is still needed:

1. Line 30 of demo: ret = detector.run(img), where detector is ctdet

2. Line 82 of base_detector, the run function:

images -> output、dets

dets-> dets = self.post_process(dets, meta, scale) -> detections.append(dets)

detections -> results = self.merge_outputs(detections) ->results

3. The two steps above, post_process and merge_outputs, are defined in ctdet

4. post_process:

    dets = dets.detach().cpu().numpy()
    dets = dets.reshape(1, -1, dets.shape[2])
    dets = ctdet_post_process(
        dets.copy(), [meta['c']], [meta['s']],
        meta['out_height'], meta['out_width'], self.opt.num_classes)
    for j in range(1, self.num_classes + 1):
      dets[0][j] = np.array(dets[0][j], dtype=np.float32).reshape(-1, 5)
      dets[0][j][:, :4] /= scale
    return dets[0]

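Judging from the code, this is essentially a coordinate and scale conversion: ctdet_post_process receives the center c and scale s from meta together with the output map size, the result is stored per class j as an N x 5 array (four box coordinates plus the score), and the box coordinates are then divided by scale.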
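The per-class layout and the final division can be sketched as follows. This is a rough illustration, not ctdet_post_process itself: the coordinate transform that uses c and s is left out, `group_and_rescale` is a hypothetical helper, and mapping a 0-based class id to a 1-based key is an assumption inferred from the `range(1, self.num_classes + 1)` loop.

```python
def group_and_rescale(dets, num_classes, scale):
    """dets: list of [x1, y1, x2, y2, score, cls] with 0-based cls (assumed).

    Returns {1..num_classes: list of [x1, y1, x2, y2, score]} with the box
    coordinates divided by scale.
    """
    grouped = {j: [] for j in range(1, num_classes + 1)}
    for x1, y1, x2, y2, score, cls in dets:
        box = [v / scale for v in (x1, y1, x2, y2)]
        grouped[int(cls) + 1].append(box + [score])
    return grouped


dets = [
    [2.0, 4.0, 6.0, 8.0, 0.9, 0.0],
    [1.0, 1.0, 3.0, 3.0, 0.5, 1.0],
]
grouped = group_and_rescale(dets, num_classes=2, scale=2.0)
# grouped[1] == [[1.0, 2.0, 3.0, 4.0, 0.9]]
# grouped[2] == [[0.5, 0.5, 1.5, 1.5, 0.5]]
```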

5. merge_outputs:

    def merge_outputs(self, detections):
      results = {}
      for j in range(1, self.num_classes + 1):
        results[j] = np.concatenate(
          [detection[j] for detection in detections], axis=0).astype(np.float32)
        if len(self.scales) > 1 or self.opt.nms:
           soft_nms(results[j], Nt=0.5, method=2)
      scores = np.hstack(
        [results[j][:, 4] for j in range(1, self.num_classes + 1)])
      if len(scores) > self.max_per_image:
        kth = len(scores) - self.max_per_image
        thresh = np.partition(scores, kth)[kth]
        for j in range(1, self.num_classes + 1):
          keep_inds = (results[j][:, 4] >= thresh)
          results[j] = results[j][keep_inds]
      return results
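The second half of merge_outputs caps the number of detections per image with a score threshold. A minimal pure-Python sketch of that cap, leaving out the concatenation over scales and the soft_nms call; `keep_top_detections` is a hypothetical helper, and a full sort stands in for np.partition:

```python
def keep_top_detections(results, max_per_image):
    """results: {class_id: list of [x1, y1, x2, y2, score]}."""
    scores = sorted(det[4] for dets in results.values() for det in dets)
    if len(scores) > max_per_image:
        # Element at index kth of the ascending order, i.e. the
        # max_per_image-th largest score over all classes.
        kth = len(scores) - max_per_image
        thresh = scores[kth]
        results = {
            j: [det for det in dets if det[4] >= thresh]
            for j, dets in results.items()
        }
    return results


results = {
    1: [[0, 0, 1, 1, 0.9], [0, 0, 1, 1, 0.2]],
    2: [[0, 0, 1, 1, 0.6], [0, 0, 1, 1, 0.4]],
}
kept = keep_top_detections(results, max_per_image=2)
# kept == {1: [[0, 0, 1, 1, 0.9]], 2: [[0, 0, 1, 1, 0.6]]}
```

Because the comparison is `>=`, tied scores at the threshold are all kept, so slightly more than max_per_image detections can survive.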