YOLOv2 Algorithm Notes

This post covers the improvements in YOLOv2 (YOLO9000): adding BN layers to boost accuracy, high-resolution fine-tuning of the feature extractor, a fully convolutional structure, anchor sizes obtained by clustering, and grid-based location prediction. It also touches on YOLOv2's multi-scale training and its joint training strategy on detection and classification datasets.

These are personal study notes, kept to organize my thinking and for later reference. If you spot mistakes, corrections are very welcome!

References:
paper link
project page link
YOLOv2-pytorch
  YOLOv2, also known as YOLO9000, is claimed by its authors to detect more than 9000 object categories. It is an evolution of YOLOv1, and the paper also describes training YOLOv2 jointly on detection and classification datasets.
Error analysis of YOLOv1 shows:
  1. Compared with Fast R-CNN, YOLOv1's predicted box locations are inaccurate;
  2. Compared with region-proposal-based methods, YOLOv1's recall is low.
  The improvements over YOLOv1 and their effect on performance are listed below:
[Figure: table of incremental improvements from YOLOv1 to YOLOv2 and their effect on mAP]
The individual improvements are described below:
  1. Batch normalization: a BN layer is added after every convolutional layer and dropout is no longer used; this improves mAP by about 2%.
  2. High-resolution classifier: the feature extractor is fine-tuned at the full 448 x 448 resolution for 10 epochs before detection training.
  3. Fully convolutional network with anchors: the fully connected layers are removed, the network is made fully convolutional with a final 1 x 1 convolution producing the predictions, and the input is divided into a 13 x 13 grid. The code below follows YOLOv2-pytorch; the Conv helper it relies on is sketched first, then the full detector.
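
The detector below relies on a small Conv helper from the referenced YOLOv2-pytorch repo. A minimal sketch, assuming the usual convolution + BatchNorm + LeakyReLU structure (this is where the BN of improvement 1 lives; the repo's actual implementation may differ in detail):

import torch.nn as nn

class Conv(nn.Module):
    """Conv2d (bias disabled, since BN follows) + BatchNorm2d + LeakyReLU."""
    def __init__(self, in_ch, out_ch, k=1, p=0, s=1):
        super(Conv, self).__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=p, stride=s, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True)
        )

    def forward(self, x):
        return self.convs(x)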

import numpy as np
import torch
import torch.nn as nn

# build_backbone, reorg_layer and tools are helpers from the referenced
# YOLOv2-pytorch repo; the Conv block is sketched above.
class YOLOv2D19(nn.Module):
    def __init__(self, device, input_size=None, num_classes=20, trainable=False, conf_thresh=0.001, nms_thresh=0.5, anchor_size=None):
        super(YOLOv2D19, self).__init__()
        self.device = device
        self.input_size = input_size
        self.num_classes = num_classes
        self.trainable = trainable
        self.conf_thresh = conf_thresh
        self.nms_thresh = nms_thresh
        self.anchor_size = torch.tensor(anchor_size)
        self.num_anchors = len(anchor_size)
        self.stride = 32
        self.grid_cell, self.all_anchor_wh = self.create_grid(input_size)

        # backbone darknet-19
        self.backbone = build_backbone(model_name='darknet19', pretrained=trainable)
        
        # detection head
        self.convsets_1 = nn.Sequential(
            Conv(1024, 1024, k=3, p=1),
            Conv(1024, 1024, k=3, p=1)
        )

        self.route_layer = Conv(512, 64, k=1)
        self.reorg = reorg_layer(stride=2)

        self.convsets_2 = Conv(1280, 1024, k=3, p=1)
        
        # prediction layer
        self.pred = nn.Conv2d(1024, self.num_anchors*(1 + 4 + self.num_classes), kernel_size=1)


    def create_grid(self, input_size):
        w, h = input_size, input_size
        # generate grid cells
        ws, hs = w // self.stride, h // self.stride
        grid_y, grid_x = torch.meshgrid([torch.arange(hs), torch.arange(ws)])
        grid_xy = torch.stack([grid_x, grid_y], dim=-1).float()
        grid_xy = grid_xy.view(1, hs*ws, 1, 2).to(self.device)

        # generate anchor_wh tensor
        anchor_wh = self.anchor_size.repeat(hs*ws, 1, 1).unsqueeze(0).to(self.device)

        return grid_xy, anchor_wh


    def set_grid(self, input_size):
        self.input_size = input_size
        self.grid_cell, self.all_anchor_wh = self.create_grid(input_size)


    def decode_xywh(self, txtytwth_pred):
        """
            Input:
                txtytwth_pred : [B, H*W, anchor_n, 4]
            Output:
                xywh_pred : [B, H*W*anchor_n, 4]
        """
        B, HW, ab_n, _ = txtytwth_pred.size()
        # b_x = sigmoid(tx) + grid_x
        # b_y = sigmoid(ty) + grid_y
        xy_pred = torch.sigmoid(txtytwth_pred[..., :2]) + self.grid_cell
        # b_w = anchor_w * exp(tw)
        # b_h = anchor_h * exp(th)
        wh_pred = torch.exp(txtytwth_pred[..., 2:]) * self.all_anchor_wh
        # [B, H*W, anchor_n, 4] -> [B, H*W*anchor_n, 4]
        xywh_pred = torch.cat([xy_pred, wh_pred], -1).view(B, -1, 4) * self.stride

        return xywh_pred
    

    def decode_boxes(self, txtytwth_pred):
        """
            Input:
                txtytwth_pred : [B, H*W, anchor_n, 4]
            Output:
                x1y1x2y2_pred : [B, H*W*anchor_n, 4]
        """
        # txtytwth -> cxcywh
        xywh_pred = self.decode_xywh(txtytwth_pred)

        # cxcywh -> x1y1x2y2
        x1y1x2y2_pred = torch.zeros_like(xywh_pred)
        x1y1_pred = xywh_pred[..., :2] - xywh_pred[..., 2:] * 0.5
        x2y2_pred = xywh_pred[..., :2] + xywh_pred[..., 2:] * 0.5
        x1y1x2y2_pred = torch.cat([x1y1_pred, x2y2_pred], dim=-1)
        
        return x1y1x2y2_pred


    def nms(self, dets, scores):
        """"Pure Python NMS baseline."""
        x1 = dets[:, 0]  #xmin
        y1 = dets[:, 1]  #ymin
        x2 = dets[:, 2]  #xmax
        y2 = dets[:, 3]  #ymax

        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]

        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(1e-10, xx2 - xx1)
            h = np.maximum(1e-10, yy2 - yy1)
            inter = w * h

            # Cross Area / (bbox + particular area - Cross Area)
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
            #reserve all the boundingbox whose ovr less than thresh
            inds = np.where(ovr <= self.nms_thresh)[0]
            order = order[inds + 1]

        return keep


    def postprocess(self, bboxes, scores):
        """
        bboxes: (H*W*num_anchors, 4), batch size = 1
        scores: (H*W*num_anchors, num_classes), batch size = 1
        """

        cls_inds = np.argmax(scores, axis=1)
        scores = scores[(np.arange(scores.shape[0]), cls_inds)]
        
        # threshold
        keep = np.where(scores >= self.conf_thresh)
        bboxes = bboxes[keep]
        scores = scores[keep]
        cls_inds = cls_inds[keep]

        # NMS
        keep = np.zeros(len(bboxes), dtype=int)  # np.int was removed in recent NumPy
        for i in range(self.num_classes):
            inds = np.where(cls_inds == i)[0]
            if len(inds) == 0:
                continue
            c_bboxes = bboxes[inds]
            c_scores = scores[inds]
            c_keep = self.nms(c_bboxes, c_scores)
            keep[inds[c_keep]] = 1

        keep = np.where(keep > 0)
        bboxes = bboxes[keep]
        scores = scores[keep]
        cls_inds = cls_inds[keep]

        return bboxes, scores, cls_inds


    @ torch.no_grad()
    def inference(self, x):
        # backbone
        feats = self.backbone(x)

        # reorg layer
        p5 = self.convsets_1(feats['layer3'])
        p4 = self.reorg(self.route_layer(feats['layer2']))
        p5 = torch.cat([p4, p5], dim=1)

        # head
        p5 = self.convsets_2(p5)

        # pred
        pred = self.pred(p5)

        B, abC, H, W = pred.size()

        # [B, num_anchor * C, H, W] -> [B, H, W, num_anchor * C] -> [B, H*W, num_anchor*C]
        pred = pred.permute(0, 2, 3, 1).contiguous().view(B, H*W, abC)

        # [B, H*W*num_anchor, 1]
        conf_pred = pred[:, :, :1 * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, 1)
        # [B, H*W, num_anchor, num_cls]
        cls_pred = pred[:, :, 1 * self.num_anchors : (1 + self.num_classes) * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, self.num_classes)
        # [B, H*W, num_anchor, 4]
        reg_pred = pred[:, :, (1 + self.num_classes) * self.num_anchors:].contiguous()
        # decode box
        reg_pred = reg_pred.view(B, H*W, self.num_anchors, 4)
        box_pred = self.decode_boxes(reg_pred)

        # batch size = 1
        conf_pred = conf_pred[0]
        cls_pred = cls_pred[0]
        box_pred = box_pred[0]

        # score
        scores = torch.sigmoid(conf_pred) * torch.softmax(cls_pred, dim=-1)

        # normalize bbox
        bboxes = torch.clamp(box_pred / self.input_size, 0., 1.)

        # to cpu
        scores = scores.to('cpu').numpy()
        bboxes = bboxes.to('cpu').numpy()

        # post-process
        bboxes, scores, cls_inds = self.postprocess(bboxes, scores)

        return bboxes, scores, cls_inds


    def forward(self, x, target=None):
        if not self.trainable:
            return self.inference(x)
        else:
            # backbone
            feats = self.backbone(x)

            # reorg layer
            p5 = self.convsets_1(feats['layer3'])
            p4 = self.reorg(self.route_layer(feats['layer2']))
            p5 = torch.cat([p4, p5], dim=1)

            # head
            p5 = self.convsets_2(p5)

            # pred
            pred = self.pred(p5)

            B, abC, H, W = pred.size()

            # [B, num_anchor * C, H, W] -> [B, H, W, num_anchor * C] -> [B, H*W, num_anchor*C]
            pred = pred.permute(0, 2, 3, 1).contiguous().view(B, H*W, abC)

            # [B, H*W*num_anchor, 1]
            conf_pred = pred[:, :, :1 * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, 1)
            # [B, H*W, num_anchor, num_cls]
            cls_pred = pred[:, :, 1 * self.num_anchors : (1 + self.num_classes) * self.num_anchors].contiguous().view(B, H*W*self.num_anchors, self.num_classes)
            # [B, H*W, num_anchor, 4]
            reg_pred = pred[:, :, (1 + self.num_classes) * self.num_anchors:].contiguous()
            reg_pred = reg_pred.view(B, H*W, self.num_anchors, 4)

            # decode bbox
            x1y1x2y2_pred = (self.decode_boxes(reg_pred) / self.input_size).view(-1, 4)
            x1y1x2y2_gt = target[:, :, 7:].view(-1, 4)
            reg_pred = reg_pred.view(B, H*W*self.num_anchors, 4)

            # set conf target
            iou_pred = tools.iou_score(x1y1x2y2_pred, x1y1x2y2_gt).view(B, -1, 1)
            gt_conf = iou_pred.clone().detach()

            # [obj, cls, txtytwth, x1y1x2y2] -> [conf, obj, cls, txtytwth]
            target = torch.cat([gt_conf, target[:, :, :7]], dim=2)

            # loss
            (
                conf_loss,
                cls_loss,
                bbox_loss,
                iou_loss
            ) = tools.loss(pred_conf=conf_pred,
                           pred_cls=cls_pred,
                           pred_txtytwth=reg_pred,
                           pred_iou=iou_pred,
                           label=target
                           )

            return conf_loss, cls_loss, bbox_loss, iou_loss   
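
A hedged usage sketch of the detector above (the anchor sizes are illustrative grid-unit values standing in for the k-means results of improvement 4 below; build_backbone must come from the referenced repo):

import torch

# five illustrative anchor (w, h) pairs in grid units; real values come from
# clustering the training-set boxes (see improvement 4 below)
anchor_size = [[1.2, 2.0], [2.8, 4.6], [4.5, 8.9], [8.1, 5.3], [10.3, 10.7]]

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = YOLOv2D19(device, input_size=416, num_classes=20, trainable=False,
                  conf_thresh=0.3, nms_thresh=0.5, anchor_size=anchor_size).to(device)
model.eval()

x = torch.randn(1, 3, 416, 416).to(device)  # dummy image batch, B = 1
bboxes, scores, cls_inds = model(x)         # inference branch -> numpy arrays, boxes normalized to [0, 1]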

  4. Anchor clustering: instead of hand-picked anchor sizes, k-means is run on the training-set box dimensions, using 1 - IoU as the distance metric. k = 5 is chosen, and the resulting anchors are mostly tall-thin or short-wide shapes. A sketch of the procedure follows the figure below.
[Figure: k-means clustering of box dimensions used to pick the anchors]
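
A minimal sketch of the clustering step, using the paper's distance d(box, centroid) = 1 - IoU(box, centroid) computed from widths/heights only (how the ground-truth boxes are collected is left out):

import numpy as np

def kmeans_anchors(wh, k=5, iters=100):
    # wh: (N, 2) array of ground-truth box widths/heights (e.g. in grid units)
    wh = np.asarray(wh, dtype=np.float64)
    centroids = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU of every box against every centroid, using w/h only (boxes share a corner)
        inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
                np.minimum(wh[:, None, 1], centroids[None, :, 1])
        union = wh[:, None, 0] * wh[:, None, 1] + \
                centroids[None, :, 0] * centroids[None, :, 1] - inter
        dist = 1.0 - inter / union            # d = 1 - IoU
        assign = dist.argmin(axis=1)          # nearest centroid for each box
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = wh[assign == j].mean(axis=0)
    return centroids

# anchors = kmeans_anchors(all_gt_wh, k=5)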

  5. Direct location prediction: the box center is predicted as a sigmoid-bounded offset inside the grid cell, and the width/height as scalings of the anchor dimensions; this improves mAP by about 5%.
[Figure: predicting the box center as an offset within its grid cell and the size relative to the anchor]

  bx = σ(tx) + cx
  by = σ(ty) + cy
  bw = pw · exp(tw)
  bh = ph · exp(th)
  Pr(object) · IoU(b, object) = σ(to)
  tx, ty, tw, th, to are the raw network outputs;
  cx, cy are the coordinates of the top-left corner of the grid cell;
  pw, ph are the anchor width and height;
  bx, by, bw, bh are the final box center and size.
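
A small worked example of these formulas for a single anchor (all numbers are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# raw predictions for one anchor in the grid cell whose top-left corner is (6, 4)
tx, ty, tw, th, to = 0.2, -0.5, 0.1, -0.3, 1.4
cx, cy = 6.0, 4.0            # grid cell corner
pw, ph = 3.2, 5.1            # anchor width/height in grid units

bx = sigmoid(tx) + cx        # sigmoid keeps the center inside the cell
by = sigmoid(ty) + cy
bw = pw * np.exp(tw)         # width/height rescale the anchor
bh = ph * np.exp(th)
conf = sigmoid(to)           # objectness: Pr(object) * IoU
print(bx, by, bw, bh, conf)  # multiply box values by the stride (32) for pixel coordinates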

  6. Fine-Grained Features (passthrough layer): the higher-resolution 26 x 26 feature map is halved in spatial size by interleaved sampling (code below) and the sampled values are stacked along the channel dimension, so it can be concatenated with the 13 x 13 head features. In the detector code above, the 26 x 26 x 512 map is first reduced to 64 channels by route_layer, becomes 13 x 13 x 256 after the reorg, and concatenating with the 13 x 13 x 1024 map gives the 1280-channel input to convsets_2. This adds about 1% mAP.

# inside reorg_layer.forward(x): (B, C, H, W) -> (B, C*stride*stride, H/stride, W/stride)
batch_size, channels, height, width = x.size()
_height, _width = height // self.stride, width // self.stride

# split each stride x stride neighbourhood apart, then move those samples into the channel dimension
x = x.view(batch_size, channels, _height, self.stride, _width, self.stride).transpose(3, 4).contiguous()
x = x.view(batch_size, channels, _height * _width, self.stride * self.stride).transpose(2, 3).contiguous()
x = x.view(batch_size, channels, self.stride * self.stride, _height, _width).transpose(1, 2).contiguous()
x = x.view(batch_size, -1, _height, _width)
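
For reference, the snippet above wrapped into the reorg_layer module form that the detector code instantiates (a minimal sketch; the referenced repo's own version may differ slightly):

import torch.nn as nn

class reorg_layer(nn.Module):
    def __init__(self, stride=2):
        super(reorg_layer, self).__init__()
        self.stride = stride

    def forward(self, x):
        batch_size, channels, height, width = x.size()
        _height, _width = height // self.stride, width // self.stride
        x = x.view(batch_size, channels, _height, self.stride, _width, self.stride).transpose(3, 4).contiguous()
        x = x.view(batch_size, channels, _height * _width, self.stride * self.stride).transpose(2, 3).contiguous()
        x = x.view(batch_size, channels, self.stride * self.stride, _height, _width).transpose(1, 2).contiguous()
        x = x.view(batch_size, -1, _height, _width)
        return x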

# Example: interleaved downsampling of a (1, 2, 6, 6) tensor into (1, 8, 3, 3)
"""
import torch 

x = torch.randn(1, 2, 6, 6)
print(x)
x = x.view(1, 2, 3, 2, 3,2).transpose(3, 4).contiguous()
x = x.view(1, 2, 9, 4).transpose(2, 3).contiguous()
x = x.view(1, 2, 4, 9).transpose(1, 2).contiguous()
x = x.view(1, -1 ,3, 3)
print(x)

tensor([[[[ 0.9921,  0.9713,  0.9921,  0.1224, -0.0216, -1.0089],
          [ 0.8749,  1.1419,  0.5288,  2.1689, -0.9657,  0.6227],
          [ 0.1907,  0.6249,  1.1909,  0.2477, -0.5035,  1.0367],
          [-1.1894,  0.8621,  0.3574, -1.6438, -0.5417,  1.0281],
          [ 1.0450,  0.8697, -0.1390,  1.2647,  0.0254,  0.1552],
          [ 0.6997, -0.9260, -1.4323,  1.9593,  0.2191,  0.0327]],

         [[-0.6903, -1.2901,  0.0410, -1.8020, -0.8611, -1.0579],
          [ 1.2106, -0.1224,  0.7847, -1.2574, -1.8933, -0.3707],
          [-1.8992,  1.1298,  0.8073, -1.3516,  0.2481,  0.6220],
          [ 0.4965, -0.2016,  0.1038, -0.5439,  0.8829,  0.6646],
          [-0.3316, -0.7143, -0.9919,  1.5706, -0.2006,  1.7617],
          [-0.8411,  0.3575,  0.5623, -0.0685,  0.6562,  0.5704]]]])
tensor([[[[ 0.9921,  0.9921, -0.0216],
          [ 0.1907,  1.1909, -0.5035],
          [ 1.0450, -0.1390,  0.0254]],

         [[-0.6903,  0.0410, -0.8611],
          [-1.8992,  0.8073,  0.2481],
          [-0.3316, -0.9919, -0.2006]],

         [[ 0.9713,  0.1224, -1.0089],
          [ 0.6249,  0.2477,  1.0367],
          [ 0.8697,  1.2647,  0.1552]],

         [[-1.2901, -1.8020, -1.0579],
          [ 1.1298, -1.3516,  0.6220],
          [-0.7143,  1.5706,  1.7617]],

         [[ 0.8749,  0.5288, -0.9657],
          [-1.1894,  0.3574, -0.5417],
          [ 0.6997, -1.4323,  0.2191]],

         [[ 1.2106,  0.7847, -1.8933],
          [ 0.4965,  0.1038,  0.8829],
          [-0.8411,  0.5623,  0.6562]],

         [[ 1.1419,  2.1689,  0.6227],
          [ 0.8621, -1.6438,  1.0281],
          [-0.9260,  1.9593,  0.0327]],

         [[-0.1224, -1.2574, -0.3707],
          [-0.2016, -0.5439,  0.6646],
          [ 0.3575, -0.0685,  0.5704]]]])
"""

  7. Multi-scale training: the network is trained on inputs of several different resolutions. Since there are no fully connected layers left, the input size can change freely as long as it is a multiple of 32 (the total downsampling stride); in the paper a new size from {320, 352, ..., 608} is drawn every 10 batches. A sketch is given after the figure below.
[Figure: multi-scale training results]
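
A hedged sketch of how multi-scale training can be wired to the set_grid method defined above (the size list and the every-10-batches schedule follow the paper; the data loader, target handling and loss bookkeeping are placeholders):

import random
import torch.nn.functional as F

sizes = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]   # multiples of 32

# model: a YOLOv2D19 built with trainable=True; dataloader: a hypothetical detection loader
for iter_i, (images, targets) in enumerate(dataloader):
    if iter_i % 10 == 0:
        train_size = random.choice(sizes)
        model.set_grid(train_size)                             # rebuild grid cells and anchor tensors
    # resize the whole batch to the chosen resolution
    images = F.interpolate(images, size=train_size, mode='bilinear', align_corners=False)
    conf_loss, cls_loss, bbox_loss, iou_loss = model(images, target=targets)
    # ... backprop the summed loss as usual ...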
  More predictions per image: each grid cell has 5 anchors and, unlike YOLOv1, each anchor predicts its own class, so the prediction tensor is 13 x 13 x 5 x (5 + num_classes); for VOC (20 classes) the final 1 x 1 convolution therefore outputs 5 x (1 + 4 + 20) = 125 channels.
  Joint training on detection and classification data: detection samples are trained with the full YOLOv2 loss, while classification samples contribute only to the classification loss.
