YOLOv2 Algorithm Notes

This post covers the improvements in YOLOv2 (YOLO9000): adding BN layers to boost accuracy, high-resolution fine-tuning of the feature extractor, a fully convolutional structure, anchor sizes obtained by clustering, and grid-based location prediction. It also touches on YOLOv2's multi-scale training and its joint training strategy on detection and classification datasets.

These are personal study notes, kept to organize my thinking and for later reference. If you spot mistakes, corrections are very welcome!

References:
paper link
project page link
YOLOv2-pytorch
  YOLOv2, also known as YOLO9000, is claimed by its authors to detect more than 9000 object categories. It is an evolution of YOLOv1, and the paper also describes training YOLOv2 jointly on detection and classification datasets.
Error analysis of YOLOv1 shows:
  1. Compared with Fast R-CNN, YOLOv1's predicted box locations are inaccurate;
  2. Compared with region-proposal-based methods, YOLOv1's recall is low.
  The improvements over YOLOv1 and their effect on performance are listed below:
[Figure: table of incremental improvements from YOLOv1 to YOLOv2 and their effect on mAP]
The individual improvements are described below:
  1. Batch normalization: a BN layer is added after every convolutional layer and dropout is no longer used; this improves mAP by about 2%.
  2. High-resolution classifier: the feature extractor is fine-tuned at the full 448 x 448 resolution for 10 epochs before detection training.
  3. Fully convolutional network with anchors: the fully connected layers are removed, the network is made fully convolutional with a final 1 x 1 convolution producing the predictions, and the input is divided into a 13 x 13 grid. The code below follows YOLOv2-pytorch; the Conv helper it relies on is sketched first, then the full detector.
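
The detector below relies on a small Conv helper from the referenced YOLOv2-pytorch repo. A minimal sketch, assuming the usual convolution + BatchNorm + LeakyReLU structure (this is where the BN of improvement 1 lives; the repo's actual implementation may differ in detail):

import torch.nn as nn

class Conv(nn.Module):
    """Conv2d (bias disabled, since BN follows) + BatchNorm2d + LeakyReLU."""
    def __init__(self, in_ch, out_ch, k=1, p=0, s=1):
        super(Conv, self).__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=p, stride=s, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True)
        )

    def forward(self, x):
        return self.convs(x)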

import numpy as np
import torch
import torch.nn as nn

# build_backbone, reorg_layer and tools are helpers from the referenced
# YOLOv2-pytorch repo; the Conv block is sketched above.
class YOLOv2D19(nn.Module):
    def __init__(self, device, input_size=None, num_classes=20, trainable=False, conf_thresh=0.001, nms_thresh=0.5, anchor_size=None):
        super(YOLOv2D19, self).__init__()
        self.device = device
        self.input_size = input_size
        self.num_classes = num_classes
        self.trainable = trainable
        self.conf_thresh = conf_thresh
        self.nms_thresh = nms_thresh
        self.anchor_size = torch.tensor(anchor_size)
        self.num_anchors = len(anchor_size)
        self.stride = 32
        self.grid_cell, self.all_anchor_wh = self.create_grid(input_size)

        # backbone darknet-19
        self.backbone = build_backbone(model_name='darknet19', pretrained=trainable)
        
        # detection head
        self.convsets_1 = nn.Sequential(
            Conv(1024, 1024, k=3, p=1),
            Conv(1024, 1024, k=3, p=1)
        )

        self.route_layer = Conv(512, 64, k=1)
        self.reorg = reorg_layer(stride=2)

        self.convsets_2 = Conv(1280, 1024, k=3, p=1)
        
        # prediction layer
        self.pred = nn.Conv2d(1024, self.num_anchors*(1 + 4 + self.num_classes), kernel_size=1)


    def create_grid(self, input_size):
        w, h = input_size, input_size
        # generate grid cells
        ws, hs = w // self.stride, h // self.stride
        grid_y, grid_x = torch.meshgrid([torch.arange(hs), torch.arange(ws)])
        grid_xy = torch.stack([grid_x, grid_y], dim=-1).float()
        grid_xy = grid_xy.view(1, hs*ws, 1, 2).to(self.device)

        # generate anchor_wh tensor
        anchor_wh = self.anchor_size.repeat(hs*ws, 1, 1).unsqueeze(0).to(self.device)

        return grid_xy, anchor_wh


    def set_grid(self, input_size):
        self.input_size = input_size
        self.grid_cell, self.all_anchor_wh = self.create_grid(input_size)


    def decode_xywh(self, txtytwth_pred):
        """
            Input:
                txtytwth_pred : [B, H*W, anchor_n, 4]
            Output:
                xywh_pred : [B, H*W*anchor_n, 4]
        """
        B, HW, ab_n, _ = txtytwth_pred.size()
        # b_x = sigmoid(tx) + grid_x
        # b_y = sigmoid(ty) + grid_y
        xy_pred = torch.sigmoid(txtytwth_pred[..., :2]) + self.grid_cell
        # b_w = anchor_w * exp(tw)
        # b_h = anchor_h * exp(th)
        wh_pred = torch.exp(txtytwth_pred[..., 2:]) * self.all_anchor_wh
        # [B, H*W, anchor_n, 4] -> [B, H*W*anchor_n, 4]
        xywh_pred = torch.cat([xy_pred, wh_pred], -1).view(B, -1, 4) * self.stride

        return xywh_pred
    

    def decode_boxes(self, txtytwth_pred):
        """
            Input:
                txtytwth_pred : [B, H*W, anchor_n, 4]
            Output:
                x1y1x2y2_pred : [B, H*W*anchor_n, 4]
        """
        # txtytwth -> cxcywh
        xywh_pred = self.decode_xywh(txtytwth_pred)

        # cxcywh -> x1y1x2y2
        x1y1x2y2_pred = torch.zeros_like(xywh_pred)
        x1y1_pred = xywh_pred[..., :2] - xywh_pred[..., 2:] * 0.5
        x2y2_pred = xywh_pred[..., :2] + xywh_pred[..., 2:] * 0.5
        x1y1x2y2_pred = torch.cat([x1y1_pred, x2y2_pred], dim=-1)
        
        return x1y1x2y2_pred


    def nms(self, dets, scores):
        """"Pure Python NMS baseline."""
        x1 = dets[:, 0]  #xmin
        y1 = dets[:, 1]  #ymin
        x2 = dets[:, 2]  #xmax
        y2 = dets[:, 3]  #ymax

        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]

        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(1e-10, xx2 - xx1)
            h = np.maximum(1e-10, yy2 - yy1)
            inter = w * h

            # Cross Area / (bbox + particular area - Cross Area)
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
            #reserve all the boundingbox whose ovr less than thresh
            inds = np.where(ovr <= self.nms_thresh)[0]
            order = order[inds + 1]

        return keep


    def postprocess(self, bboxes, scores):
        """
        bboxes: (H*W*num_anchors, 4), batch size = 1
        scores: (H*W*num_anchors, num_classes), batch size = 1
        """

        cls_inds = np.argmax(scores, axis=1)
        scores = scores[(np.arange(scores.shape[0]), cls_inds)]
        
        # threshold
        keep = np.where(scores >= self.conf_thresh)
        bboxes = bboxes[keep]
        scores = scores[keep]
        cls_inds = cls_inds[keep]

        # NMS
        keep = np.zeros(len(bboxes), dtype=int)  # np.int was removed in recent NumPy
        for i in range(self.num_classes):
            inds = np.where(cls_inds == i)[0]
            if len(inds) == 0:
                continue
            c_bboxes = bboxes[inds]
            c_scores = scores[inds]
            c_keep = self.nms(c_bboxes, c_scores)
            keep[inds[c_keep]] = 1

        keep = np.where(keep > 0)
        bboxes = bboxes[keep]
        scores = scores[keep]
        cls_inds = cls_inds[keep]

        return bboxes, scores, cls_inds


    @ torch.no_grad()
    def inference(self, x):
        # backbone
        feats = self.backbone(x)

        # reorg layer
        p5 = self.convsets_1(feats['layer3'])
        p4 = self.reorg(self.route_layer(feats['layer2']))
        p5 = torch.cat([p4, p5], dim=1)

        # head
        p5 = self.convsets_2(p5)

        # pred
        pred = self.pred(p5)

        B, abC, H, W = pred.size()

        # [B, num_anchor * C, H, W] -> [B, H, W, num_anchor * C] -> [B, H*W, num_anchor*C]
        pred = pred.permute(0, 2, 3, 1).contiguous().view(B, H*W, abC)

        # [B, H*W*num_anchor, 1]
        conf_pred = pred[:, :, :1 * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, 1)
        # [B, H*W, num_anchor, num_cls]
        cls_pred = pred[:, :, 1 * self.num_anchors : (1 + self.num_classes) * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, self.num_classes)
        # [B, H*W, num_anchor, 4]
        reg_pred = pred[:, :, (1 + self.num_classes) * self.num_anchors:].contiguous()
        # decode box
        reg_pred = reg_pred.view(B, H*W, self.num_anchors, 4)
        box_pred = self.decode_boxes(reg_pred)

        # batch size = 1
        conf_pred = conf_pred[0]
        cls_pred = cls_pred[0]
        box_pred = box_pred[0]

        # score
        scores = torch.sigmoid(conf_pred) * torch.softmax(cls_pred, dim=-1)

        # normalize bbox
        bboxes = torch.clamp(box_pred / self.input_size, 0., 1.)

        # to cpu
        scores = scores.to('cpu').numpy()
        bboxes = bboxes.to('cpu').numpy()

        # post-process
        bboxes, scores, cls_inds = self.postprocess(bboxes, scores)

        return bboxes, scores, cls_inds


    def forward(self, x, target=None):
        if not self.trainable:
            return self.inference(x)
        else:
            # backbone
            feats = self.backbone(x)

            # reorg layer
            p5 = self.convsets_1(feats['layer3'])
            p4 = self.reorg(self.route_layer(feats['layer2']))
            p5 = torch.cat([p4, p5], dim=1)

            # head
            p5 = self.convsets_2(p5)

            # pred
            pred = self.pred(p5)

            B, abC, H, W = pred.size()

            # [B, num_anchor * C, H, W] -> [B, H, W, num_anchor * C] -> [B, H*W, num_anchor*C]
            pred = pred.permute(0, 2, 3, 1).contiguous().view(B, H*W, abC)

            # [B, H*W*num_anchor, 1]
            conf_pred = pred[:, :, :1 * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, 1)
            # [B, H*W, num_anchor, num_cls]
            cls_pred = pred[:, :, 1 * self.num_anchors : (1 + self.num_classes) * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, self.num_classes)
            # [B, H*W, num_anchor, 4]
            reg_pred = pred[:, :, (1 + self.num_classes) * self.num_anchors:].contiguous()
            reg_pred = reg_pred.view(B, H*W, self.num_anchors, 4)

            # decode bbox
            x1y1x2y2_pred = (self.decode_boxes(reg_pred) / self.input_size).view(-1, 4)
            x1y1x2y2_gt = target[:, :, 7:].view(-1, 4)
            reg_pred = reg_pred.view(B, H*W*self.num_anchors, 4)

            # set conf target
            iou_pred = tools.iou_score(x1y1x2y2_pred, x1y1x2y2_gt).view(B, -1, 1)
            gt_conf = iou_pred.clone().detach()

            # [obj, cls, txtytwth, x1y1x2y2] -> [conf, obj, cls, txtytwth]
            target = torch.cat([gt_conf, target[:, :, :7]], dim=2)

            # loss
            (
                conf_loss,
                cls_loss,
                bbox_loss,
                iou_loss
            ) = tools.loss(pred_conf=conf_pred,
                           pred_cls=cls_pred,
                           pred_txtytwth=reg_pred,
                           pred_iou=iou_pred,
                           label=target
                           )

            return conf_loss, cls_loss, bbox_loss, iou_loss   
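
A hedged usage sketch of the detector above (the anchor sizes are illustrative grid-unit values standing in for the k-means results of improvement 4 below; build_backbone must come from the referenced repo):

import torch

# five illustrative anchor (w, h) pairs in grid units; real values come from
# clustering the training-set boxes (see improvement 4 below)
anchor_size = [[1.2, 2.0], [2.8, 4.6], [4.5, 8.9], [8.1, 5.3], [10.3, 10.7]]

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = YOLOv2D19(device, input_size=416, num_classes=20, trainable=False,
                  conf_thresh=0.3, nms_thresh=0.5, anchor_size=anchor_size).to(device)
model.eval()

x = torch.randn(1, 3, 416, 416).to(device)  # dummy image batch, B = 1
bboxes, scores, cls_inds = model(x)         # inference branch -> numpy arrays, boxes normalized to [0, 1]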

  4. Anchor clustering: instead of hand-picked anchor sizes, k-means is run on the training-set box dimensions, using 1 - IoU as the distance metric. k = 5 is chosen, and the resulting anchors are mostly tall-thin or short-wide shapes. A sketch of the procedure follows the figure below.
[Figure: k-means clustering of box dimensions used to pick the anchors]
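
A minimal sketch of the clustering step, using the paper's distance d(box, centroid) = 1 - IoU(box, centroid) computed from widths/heights only (how the ground-truth boxes are collected is left out):

import numpy as np

def kmeans_anchors(wh, k=5, iters=100):
    # wh: (N, 2) array of ground-truth box widths/heights (e.g. in grid units)
    wh = np.asarray(wh, dtype=np.float64)
    centroids = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU of every box against every centroid, using w/h only (boxes share a corner)
        inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
                np.minimum(wh[:, None, 1], centroids[None, :, 1])
        union = wh[:, None, 0] * wh[:, None, 1] + \
                centroids[None, :, 0] * centroids[None, :, 1] - inter
        dist = 1.0 - inter / union            # d = 1 - IoU
        assign = dist.argmin(axis=1)          # nearest centroid for each box
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = wh[assign == j].mean(axis=0)
    return centroids

# anchors = kmeans_anchors(all_gt_wh, k=5)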

  5. Direct location prediction: the box center is predicted as a sigmoid-bounded offset inside the grid cell, and the width/height as scalings of the anchor dimensions; this improves mAP by about 5%.
[Figure: predicting the box center as an offset within its grid cell and the size relative to the anchor]

  bx = σ(tx) + cx
  by = σ(ty) + cy
  bw = pw · exp(tw)
  bh = ph · exp(th)
  Pr(object) · IoU(b, object) = σ(to)
  tx, ty, tw, th, to are the raw network outputs;
  cx, cy are the coordinates of the top-left corner of the grid cell;
  pw, ph are the anchor width and height;
  bx, by, bw, bh are the final box center and size.
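
A small worked example of these formulas for a single anchor (all numbers are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# raw predictions for one anchor in the grid cell whose top-left corner is (6, 4)
tx, ty, tw, th, to = 0.2, -0.5, 0.1, -0.3, 1.4
cx, cy = 6.0, 4.0            # grid cell corner
pw, ph = 3.2, 5.1            # anchor width/height in grid units

bx = sigmoid(tx) + cx        # sigmoid keeps the center inside the cell
by = sigmoid(ty) + cy
bw = pw * np.exp(tw)         # width/height rescale the anchor
bh = ph * np.exp(th)
conf = sigmoid(to)           # objectness: Pr(object) * IoU
print(bx, by, bw, bh, conf)  # multiply box values by the stride (32) for pixel coordinates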

  6. Fine-Grained Features (passthrough layer): the higher-resolution 26 x 26 feature map is halved in spatial size by interleaved sampling (code below) and the sampled values are stacked along the channel dimension, so it can be concatenated with the 13 x 13 head features. In the detector code above, the 26 x 26 x 512 map is first reduced to 64 channels by route_layer, becomes 13 x 13 x 256 after the reorg, and concatenating with the 13 x 13 x 1024 map gives the 1280-channel input to convsets_2. This adds about 1% mAP.

# inside reorg_layer.forward(x): (B, C, H, W) -> (B, C*stride*stride, H/stride, W/stride)
batch_size, channels, height, width = x.size()
_height, _width = height // self.stride, width // self.stride

# split each stride x stride neighbourhood apart, then move those samples into the channel dimension
x = x.view(batch_size, channels, _height, self.stride, _width, self.stride).transpose(3, 4).contiguous()
x = x.view(batch_size, channels, _height * _width, self.stride * self.stride).transpose(2, 3).contiguous()
x = x.view(batch_size, channels, self.stride * self.stride, _height, _width).transpose(1, 2).contiguous()
x = x.view(batch_size, -1, _height, _width)
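
For reference, the snippet above wrapped into the reorg_layer module form that the detector code instantiates (a minimal sketch; the referenced repo's own version may differ slightly):

import torch.nn as nn

class reorg_layer(nn.Module):
    def __init__(self, stride=2):
        super(reorg_layer, self).__init__()
        self.stride = stride

    def forward(self, x):
        batch_size, channels, height, width = x.size()
        _height, _width = height // self.stride, width // self.stride
        x = x.view(batch_size, channels, _height, self.stride, _width, self.stride).transpose(3, 4).contiguous()
        x = x.view(batch_size, channels, _height * _width, self.stride * self.stride).transpose(2, 3).contiguous()
        x = x.view(batch_size, channels, self.stride * self.stride, _height, _width).transpose(1, 2).contiguous()
        x = x.view(batch_size, -1, _height, _width)
        return x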

# Example: interleaved downsampling of a (1, 2, 6, 6) tensor into (1, 8, 3, 3)
"""
import torch 

x = torch.randn(1, 2, 6, 6)
print(x)
x = x.view(1, 2, 3, 2, 3,2).transpose(3, 4).contiguous()
x = x.view(1, 2, 9, 4).transpose(2, 3).contiguous()
x = x.view(1, 2, 4, 9).transpose(1, 2).contiguous()
x = x.view(1, -1 ,3, 3)
print(x)

tensor([[[[ 0.9921,  0.9713,  0.9921,  0.1224, -0.0216, -1.0089],
          [ 0.8749,  1.1419,  0.5288,  2.1689, -0.9657,  0.6227],
          [ 0.1907,  0.6249,  1.1909,  0.2477, -0.5035,  1.0367],
          [-1.1894,  0.8621,  0.3574, -1.6438, -0.5417,  1.0281],
          [ 1.0450,  0.8697, -0.1390,  1.2647,  0.0254,  0.1552],
          [ 0.6997, -0.9260, -1.4323,  1.9593,  0.2191,  0.0327]],

         [[-0.6903, -1.2901,  0.0410, -1.8020, -0.8611, -1.0579],
          [ 1.2106, -0.1224,  0.7847, -1.2574, -1.8933, -0.3707],
          [-1.8992,  1.1298,  0.8073, -1.3516,  0.2481,  0.6220],
          [ 0.4965, -0.2016,  0.1038, -0.5439,  0.8829,  0.6646],
          [-0.3316, -0.7143, -0.9919,  1.5706, -0.2006,  1.7617],
          [-0.8411,  0.3575,  0.5623, -0.0685,  0.6562,  0.5704]]]])
tensor([[[[ 0.9921,  0.9921, -0.0216],
          [ 0.1907,  1.1909, -0.5035],
          [ 1.0450, -0.1390,  0.0254]],

         [[-0.6903,  0.0410, -0.8611],
          [-1.8992,  0.8073,  0.2481],
          [-0.3316, -0.9919, -0.2006]],

         [[ 0.9713,  0.1224, -1.0089],
          [ 0.6249,  0.2477,  1.0367],
          [ 0.8697,  1.2647,  0.1552]],

         [[-1.2901, -1.8020, -1.0579],
          [ 1.1298, -1.3516,  0.6220],
          [-0.7143,  1.5706,  1.7617]],

         [[ 0.8749,  0.5288, -0.9657],
          [-1.1894,  0.3574, -0.5417],
          [ 0.6997, -1.4323,  0.2191]],

         [[ 1.2106,  0.7847, -1.8933],
          [ 0.4965,  0.1038,  0.8829],
          [-0.8411,  0.5623,  0.6562]],

         [[ 1.1419,  2.1689,  0.6227],
          [ 0.8621, -1.6438,  1.0281],
          [-0.9260,  1.9593,  0.0327]],

         [[-0.1224, -1.2574, -0.3707],
          [-0.2016, -0.5439,  0.6646],
          [ 0.3575, -0.0685,  0.5704]]]])
"""

  7. Multi-scale training: the network is trained on inputs of several different resolutions. Since there are no fully connected layers left, the input size can change freely as long as it is a multiple of 32 (the total downsampling stride); in the paper a new size from {320, 352, ..., 608} is drawn every 10 batches. A sketch is given after the figure below.
[Figure: multi-scale training results]
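
A hedged sketch of how multi-scale training can be wired to the set_grid method defined above (the size list and the every-10-batches schedule follow the paper; the data loader, target handling and loss bookkeeping are placeholders):

import random
import torch.nn.functional as F

sizes = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]   # multiples of 32

# model: a YOLOv2D19 built with trainable=True; dataloader: a hypothetical detection loader
for iter_i, (images, targets) in enumerate(dataloader):
    if iter_i % 10 == 0:
        train_size = random.choice(sizes)
        model.set_grid(train_size)                             # rebuild grid cells and anchor tensors
    # resize the whole batch to the chosen resolution
    images = F.interpolate(images, size=train_size, mode='bilinear', align_corners=False)
    conf_loss, cls_loss, bbox_loss, iou_loss = model(images, target=targets)
    # ... backprop the summed loss as usual ...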
  More predictions per image: each grid cell has 5 anchors and, unlike YOLOv1, each anchor predicts its own class, so the prediction tensor is 13 x 13 x 5 x (5 + num_classes); for VOC (20 classes) the final 1 x 1 convolution therefore outputs 5 x (1 + 4 + 20) = 125 channels.
  Joint training on detection and classification data: detection samples are trained with the full YOLOv2 loss, while classification samples contribute only to the classification loss.
