Reproducing "Distilling Object Detectors with Fine-grained Feature Imitation"

The reproduction is based on the authors' open-source code: https://github.com/twangnh/Distilling-Object-Detectors

Code issues and implementation details can be discussed on my GitHub:

https://github.com/HqWei/Distillation-of-Faster-rcnn

At its core, this paper improves feature-level distillation for object detection, so the first step is plain feature-map distillation for a detector, which is straightforward to implement:

sup_feature = output_teacher['features'][0]
stu_feature = output['features'][0]
# model_adap is a conv layer + ReLU: it adapts the student's feature map to the
# teacher's (matching channel count) so the L2 distance can be computed directly.
stu_feature_adap = model_adap(stu_feature)

start_weight = cfg_feature_distillation.get('start_weight')
end_weight = cfg_feature_distillation.get('end_weight')

# Linearly anneal the imitation weight from start_weight to end_weight over training
imitation_loss_weight = start_weight + (end_weight - start_weight) * (float(epoch) / max_epoch)
# L2 distance: sum of squared differences at corresponding feature-map positions
sup_loss = (torch.pow(sup_feature - stu_feature_adap, 2)).sum()
sup_loss = sup_loss * imitation_loss_weight
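For context, here is a minimal sketch of how this loss would typically be folded into a training step. The detection-loss keys and the optimizer name are illustrative assumptions, not names from the original repo:

# Hypothetical training-step excerpt; det_loss stands in for the detector's
# usual classification + regression losses ('rpn.loss'/'rcnn.loss' are assumed keys).
det_loss = output['rpn.loss'] + output['rcnn.loss']
total_loss = det_loss + sup_loss  # the imitation loss is simply added on top
optimizer.zero_grad()
total_loss.backward()
optimizer.step()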

Then comes the paper's core contribution: the student does not need to imitate the teacher over the entire feature map, only in the regions near the ground-truth (GT) boxes.

The main difficulty is generating the imitation mask. The paper describes it as follows:

Specifically, as shown in Fig. 2, for each ground truth box, we compute the IOU between it and all anchors, which forms a W × H × K IOU map m. Here W and H denote width and height of the feature map, and K indicates the K preset anchor boxes. Then we find the largest IOU value M = max(m), times the thresholding factor ψ to obtain a filter threshold F = ψ ∗ M. With F, we filter the IOU map to keep those larger then F locations and combine them with OR operation to get a W × H mask.

In short: first compute the IOU between each GT box and all anchors, which yields a W × H × K IOU map m. W and H are the feature map's width and height, and K is the number of anchors generated per location (e.g., with anchor ratios 0.5, 1, 2 and scales 2, 4, 8, 16, 32, K = 3 × 5 = 15). Each anchor shape thus produces a W × H IOU score map, where the value at each position is the IOU between the anchor placed there and the GT box; across the K maps, only the largest response per location matters, since a single high IOU already means that location is near the GT. The final output is a binary W × H mask of 0s and 1s. How is it binarized? With a threshold, chosen rather cleverly in this paper as half of the maximum IOU value (F = 0.5 ∗ M); the per-GT masks are then merged with an OR operation, as the toy demonstration below illustrates.
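To make the thresholding concrete, here is a self-contained toy demonstration for a single GT box (the tiny shapes and random IOUs are made up purely for illustration):

import torch

# Toy IOU map for one GT box: H=W=4 locations, K=2 anchor shapes per location
H, W, K = 4, 4, 2
torch.manual_seed(0)
iou_map = torch.rand(H, W, K)
M = iou_map.max()                # largest IOU value for this GT: M = max(m)
F = 0.5 * M                      # filter threshold F = psi * M, with psi = 0.5
# OR over the K anchor shapes: keep a location if any of its anchors passes
mask = (iou_map > F).any(dim=2)  # (H, W) boolean mask
print(mask.int())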

My reproduction:

# Generate the masks: one mask per image in the batch
mask_batch = []
if cfg.get('need_mask', None):
    for i in range(B):
        # K1: number of anchors generated at each feature-map location
        K1 = int(all_anchors.shape[0] / (height * width))
        # A: number of GT boxes for this image
        A = gt_bboxes[i].shape[0]
        gt_boxes = gt_bboxes[i]
        gt_boxes = gt_boxes.view(1, gt_boxes.shape[0], gt_boxes.shape[1])
        # IOU between every anchor and every GT box, reshaped to (H, W, K1, A)
        IOU_map = bbox_overlaps_batch(all_anchors, gt_boxes).view(height, width, K1, A)
        # Largest IOU per GT box over all locations and anchor shapes: M = max(m)
        max_iou, _ = torch.max(IOU_map.view(height * width * K1, A), dim=0)
        mask_per_im = torch.zeros([height, width], dtype=torch.int64).cuda()
        # Walk through every GT box
        for k in range(gt_boxes.shape[1]):
            # Padded (all-zero) boxes mark the end of the valid GT list
            if torch.sum(gt_boxes[0][k]) == 0.:
                break
            # Filter threshold F = psi * M, with psi = 0.5
            max_iou_per_gt = max_iou[k] * 0.5
            # OR over the K1 anchor shapes: keep a location if any anchor passes
            mask_per_gt = torch.sum(IOU_map[:, :, :, k] > max_iou_per_gt, dim=2)
            mask_per_im += mask_per_gt
        mask_batch.append(mask_per_im)
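Note that mask_per_im accumulates integer counts rather than booleans, so a location covered by several GT boxes can exceed 1; the binarization to {0, 1} happens later at the call site, via (mask > 0).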
The IOU map is computed with bbox_overlaps_batch, taken from the original repo's code:
def bbox_overlaps_batch(anchors, gt_boxes):
    """
    anchors: (N, 4) or (b, N, 4/5) tensor of float
    gt_boxes: (b, K, 5) tensor of float

    returns overlaps: (b, N, K) tensor of IOU between anchors and gt_boxes
    """
    batch_size = gt_boxes.size(0)

    if anchors.dim() == 2:

        N = anchors.size(0)
        K = gt_boxes.size(1)

        anchors = anchors.view(1, N, 4).expand(batch_size, N, 4).contiguous()
        gt_boxes = gt_boxes[:,:,:4].contiguous()


        gt_boxes_x = (gt_boxes[:,:,2] - gt_boxes[:,:,0] + 1)
        gt_boxes_y = (gt_boxes[:,:,3] - gt_boxes[:,:,1] + 1)
        gt_boxes_area = (gt_boxes_x * gt_boxes_y).view(batch_size, 1, K)

        anchors_boxes_x = (anchors[:,:,2] - anchors[:,:,0] + 1)
        anchors_boxes_y = (anchors[:,:,3] - anchors[:,:,1] + 1)
        anchors_area = (anchors_boxes_x * anchors_boxes_y).view(batch_size, N, 1)

        gt_area_zero = (gt_boxes_x == 1) & (gt_boxes_y == 1)
        anchors_area_zero = (anchors_boxes_x == 1) & (anchors_boxes_y == 1)

        boxes = anchors.view(batch_size, N, 1, 4).expand(batch_size, N, K, 4)
        query_boxes = gt_boxes.view(batch_size, 1, K, 4).expand(batch_size, N, K, 4)

        iw = (torch.min(boxes[:,:,:,2], query_boxes[:,:,:,2]) -
            torch.max(boxes[:,:,:,0], query_boxes[:,:,:,0]) + 1)
        iw[iw < 0] = 0

        ih = (torch.min(boxes[:,:,:,3], query_boxes[:,:,:,3]) -
            torch.max(boxes[:,:,:,1], query_boxes[:,:,:,1]) + 1)
        ih[ih < 0] = 0
        ua = anchors_area + gt_boxes_area - (iw * ih)
        overlaps = iw * ih / ua

        # mask the overlap here.
        overlaps.masked_fill_(gt_area_zero.view(batch_size, 1, K).expand(batch_size, N, K), 0)
        overlaps.masked_fill_(anchors_area_zero.view(batch_size, N, 1).expand(batch_size, N, K), -1)

    elif anchors.dim() == 3:
        N = anchors.size(1)
        K = gt_boxes.size(1)

        if anchors.size(2) == 4:
            anchors = anchors[:,:,:4].contiguous()
        else:
            anchors = anchors[:,:,1:5].contiguous()

        gt_boxes = gt_boxes[:,:,:4].contiguous()

        gt_boxes_x = (gt_boxes[:,:,2] - gt_boxes[:,:,0] + 1)
        gt_boxes_y = (gt_boxes[:,:,3] - gt_boxes[:,:,1] + 1)
        gt_boxes_area = (gt_boxes_x * gt_boxes_y).view(batch_size, 1, K)

        anchors_boxes_x = (anchors[:,:,2] - anchors[:,:,0] + 1)
        anchors_boxes_y = (anchors[:,:,3] - anchors[:,:,1] + 1)
        anchors_area = (anchors_boxes_x * anchors_boxes_y).view(batch_size, N, 1)

        gt_area_zero = (gt_boxes_x == 1) & (gt_boxes_y == 1)
        anchors_area_zero = (anchors_boxes_x == 1) & (anchors_boxes_y == 1)

        boxes = anchors.view(batch_size, N, 1, 4).expand(batch_size, N, K, 4)
        query_boxes = gt_boxes.view(batch_size, 1, K, 4).expand(batch_size, N, K, 4)

        iw = (torch.min(boxes[:,:,:,2], query_boxes[:,:,:,2]) -
            torch.max(boxes[:,:,:,0], query_boxes[:,:,:,0]) + 1)
        iw[iw < 0] = 0

        ih = (torch.min(boxes[:,:,:,3], query_boxes[:,:,:,3]) -
            torch.max(boxes[:,:,:,1], query_boxes[:,:,:,1]) + 1)
        ih[ih < 0] = 0
        ua = anchors_area + gt_boxes_area - (iw * ih)

        overlaps = iw * ih / ua

        # mask the overlap here.
        overlaps.masked_fill_(gt_area_zero.view(batch_size, 1, K).expand(batch_size, N, K), 0)
        overlaps.masked_fill_(anchors_area_zero.view(batch_size, N, 1).expand(batch_size, N, K), -1)
    else:
        raise ValueError('anchors input dimension is not correct.')

    return overlaps
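A quick shape check for this function with toy inputs (the box coordinates are made up for illustration):

import torch

anchors = torch.tensor([[0., 0., 15., 15.],
                        [8., 8., 23., 23.]])         # (N=2, 4)
gt_boxes = torch.tensor([[[4., 4., 19., 19., 1.]]])  # (b=1, K=1, 5); last column is the label
overlaps = bbox_overlaps_batch(anchors, gt_boxes)
print(overlaps.shape)  # torch.Size([1, 2, 1])
print(overlaps)        # IOU of each anchor with the GT box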
Call site in the main training loop:

# Feature-level distillation
if cfg_distillation.get('feature_distillation', None):
    cfg_feature_distillation = cfg_distillation.get('feature_distillation')
    sup_feature = output_teacher['features'][0]
    stu_feature = output['features'][0]
    stu_feature_adap = model_adap(stu_feature)

    start_weight = cfg_feature_distillation.get('start_weight')
    end_weight = cfg_feature_distillation.get('end_weight')
    # Linearly anneal the imitation weight over training
    imitation_loss_weight = start_weight + (end_weight - start_weight) * (float(epoch) / max_epoch)
    if cfg_feature_distillation.get('need_mask', None):
        # Fine-grained imitation: restrict the L2 loss to the masked regions
        mask_batch = output_teacher['RoINet.mask_batch']
        mask_list = []
        for mask in mask_batch:
            # Binarize the accumulated counts and add a channel dim for broadcasting
            mask = (mask > 0).float().unsqueeze(0)
            mask_list.append(mask)
        mask_batch = torch.stack(mask_list, dim=0)
        # Normalize by 2x the number of masked positions, as in the paper's imitation loss
        norms = mask_batch.sum() * 2
        sup_loss = (torch.pow(sup_feature - stu_feature_adap, 2) * mask_batch).sum() / norms
    else:
        # Fall back to plain full-feature-map imitation
        sup_loss = (torch.pow(sup_feature - stu_feature_adap, 2)).sum()

    sup_loss = sup_loss * imitation_loss_weight
    output['sup.loss'] = sup_loss
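One detail worth flagging: gradients should not flow into the teacher. A minimal sketch of how the teacher forward pass would typically be run (model_teacher is an assumed name for the frozen teacher network, not from the original repo):

# Run the teacher without building a graph, so the imitation loss
# only updates the student and the adaptation layer.
model_teacher.eval()
with torch.no_grad():
    output_teacher = model_teacher(input)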

Design of the adaptation module: the feature-map size is transformed by a single convolution layer, following OUT_W = (IN_W + 2 * PADDING - KERNEL_SIZE) / STRIDE + 1.

import torch.nn as nn


class Stu_Feature_Adap(nn.Module):
    """Adaptation layer: a single conv + ReLU that maps the student's feature
    map to the teacher's channel count and spatial size before the L2 loss."""

    def __init__(self, input_channel=256, output_channel=1024, kernel_size=2, padding=0):
        super(Stu_Feature_Adap, self).__init__()
        self.conv1 = nn.Conv2d(input_channel, output_channel, kernel_size=kernel_size, padding=padding)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        return x
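A quick sanity check of the output-size formula above (the 38 × 38 input size is a made-up example, not a value from the repo):

import torch

adap = Stu_Feature_Adap(input_channel=256, output_channel=1024, kernel_size=2, padding=0)
x = torch.randn(1, 256, 38, 38)
y = adap(x)
print(y.shape)  # torch.Size([1, 1024, 37, 37]); (38 + 2*0 - 2)/1 + 1 = 37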

 
