pytorch_yolov3: fixing low mAP caused by class imbalance

小楞

Article link: https://blog.csdn.net/qq_33270279/article/details/105599949

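pytorch_yolov3: designing the loss function to handle sample imbalance

Problem description

1. The number of samples differs greatly between classes, which leads to a low mAP (the AP of under-represented classes is close to 0).

2. The ratio of positive to negative samples is skewed, which leads to a low detection rate.

Original loss function

For an exploration of the yolov3 loss function, see: https://blog.csdn.net/qq_33270279/article/details/102631557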

self.obj_scale = 1        # weight of the confidence loss for positive samples
self.noobj_scale = 100    # weight of the confidence loss for negative samples (the larger the value, the better the model resists false detections)
...
self.mse_loss = nn.MSELoss()  # mean squared error, used for the box coordinate loss
self.bce_loss = nn.BCELoss()  # binary cross-entropy between target and output, the two-class loss that pairs with the sigmoid activation. For how activations pair with loss functions, see: https://blog.csdn.net/qq_33270279/article/details/102631557
...
# Only compute the loss at the positions flagged by obj_mask (every object position in the batch)
loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])  # box regression loss
loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
loss_h = self.mse_loss(h[obj_mask], th[obj_mask])

loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])  # confidence loss
loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj

loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])  # class probability loss

total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls  # total loss

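Problems with this loss:

1. The box regression, confidence and class terms are summed without any weights expressing their relative importance.

2. The class term does not account for the imbalance between classes.

3. There is no mechanism for handling the imbalance between positive and negative samples.

Redesigning the loss function

Adjustment mechanisms are added to the loss along the following five dimensions:

1. Relative weight of the different terms: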

self.clsWeight = 10  # weight of the class term in the total loss
...
loss_cls = self.clsWeight * self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])  # class probability loss
total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls  # total loss

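Why this helps:

Increasing the share of the class term in the total loss makes classification more important during training, which improves detection of under-represented classes to some extent.

In my experiments this gave a large improvement (I am keeping the exact numbers to myself; try it yourself).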
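To make the effect of clsWeight concrete, the sketch below computes the share of the class term in the total loss before and after scaling. The per-term values are made-up numbers chosen only for illustration, not measurements from the experiments, and class_share is a hypothetical helper.

```python
def class_share(components, cls_weight=1.0):
    """Fraction of the total loss contributed by the (weighted) class term."""
    cls_term = cls_weight * components["cls"]
    others = sum(v for k, v in components.items() if k != "cls")
    return cls_term / (cls_term + others)

# Hypothetical per-term loss values, for illustration only.
components = {"x": 0.05, "y": 0.05, "w": 0.10, "h": 0.10, "conf": 1.00, "cls": 0.20}

share_before = class_share(components)                 # clsWeight = 1
share_after = class_share(components, cls_weight=10)   # clsWeight = 10
print(f"class share: {share_before:.3f} -> {share_after:.3f}")
```

Since the gradient of a sum is the sum of the gradients, the gradient coming from the class term grows by the same factor as the term itself.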

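2. Relative weight of samples from different classes:

Idea: adjust the loss weight of each class dynamically according to its share of the samples, so that class imbalance no longer drags down the AP of the rare classes.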

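Base function used: torch.nn.BCELoss(weight=weight)

Its expression, as given in the PyTorch documentation, is:

$$\ell_n = -w_n \left[ y_n \log x_n + (1 - y_n) \log(1 - x_n) \right]$$

where x_n is the predicted probability, y_n the target and w_n the weight of element n.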
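A quick numerical check of how the weight argument enters nn.BCELoss: each element's cross-entropy is multiplied by its weight before the mean is taken. The tensors are random placeholders and the weight values are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.rand(4, 3).clamp(1e-4, 1 - 1e-4)   # probabilities, as after a sigmoid
target = torch.randint(0, 2, (4, 3)).float()    # multi-label 0/1 targets
weight = torch.tensor([1.0, 2.0, 10.5])         # one weight per class, broadcast over rows

builtin = nn.BCELoss(weight=weight)(pred, target)

# The same quantity written out term by term, then averaged (the default reduction).
manual = (-weight * (target * pred.log() + (1 - target) * (1 - pred).log())).mean()

print(builtin.item(), manual.item())
```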

Count how the samples are distributed over the classes:

ratios=[class1_ratio,…,classn_ratio]
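One way to obtain such a list, sketched under the assumption that the class index of every ground-truth box is available as a flat Python list. class_ratios and the label values are hypothetical.

```python
from collections import Counter

def class_ratios(labels, num_classes):
    """Return the fraction of boxes that belong to each class index."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]

# Hypothetical class indices, one per ground-truth box in the training set.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
ratios = class_ratios(labels, num_classes=3)
print(ratios)  # [0.6, 0.3, 0.1]
```

A class with no samples at all gets a ratio of 0, which has to be handled before taking 1/ratio.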

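class YOLOLayer(nn.Module):
    """Detection layer"""
    ....
    def forward(self, newAP, x, targets=None, img_dim=None):
        ...
            # Convert the class ratios into BCELoss weights.
            weight = torch.zeros(tcls.shape)  # [num_samples, num_anchors, grid_h, grid_w, num_classes]
            batchratio = torch.tensor([1.0 / r for r in ratios])  # inverse frequency of each class
            batchratio /= batchratio.max()  # normalise so that the rarest class has weight 1
            batchratio.mul_(10.5)  # enlarge the weight scale
            weight[...] = batchratio  # broadcast to every sample, anchor and grid cell (weight[0, ...] would fill only the first sample of the batch)
            weight = weight.cuda()
            self.bce_clsloss = nn.BCELoss(weight=weight[obj_mask])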
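The weighting can also be tried outside the detection layer. The sketch below is a simplified variant rather than the layer code itself: it passes a per-class weight vector to torch.nn.functional.binary_cross_entropy, which broadcasts it over the selected rows, instead of building a full-size weight tensor. ratio_to_weight, the ratios and all tensor contents are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def ratio_to_weight(ratios, scale=10.5):
    """Inverse-frequency weights, normalised so the rarest class gets `scale`."""
    inv = 1.0 / torch.tensor(ratios, dtype=torch.float32)
    return inv / inv.max() * scale

ratios = [0.6, 0.3, 0.1]                  # hypothetical class distribution
class_weight = ratio_to_weight(ratios)    # approx. [1.75, 3.5, 10.5]

# Dummy tensors shaped like the YOLO layer outputs: [nB, nA, nH, nW, nC].
nB, nA, nH, nW, nC = 2, 3, 4, 4, 3
torch.manual_seed(0)
pred_cls = torch.rand(nB, nA, nH, nW, nC).clamp(1e-4, 1 - 1e-4)
tcls = torch.zeros(nB, nA, nH, nW, nC)
obj_mask = torch.zeros(nB, nA, nH, nW, dtype=torch.bool)
obj_mask[0, 1, 2, 3] = True
tcls[0, 1, 2, 3, 2] = 1                   # one object of the rare class
obj_mask[1, 0, 0, 1] = True
tcls[1, 0, 0, 1, 0] = 1                   # one object of the common class

# pred_cls[obj_mask] has shape [num_objects, nC]; the weight broadcasts over rows.
loss_cls = F.binary_cross_entropy(pred_cls[obj_mask], tcls[obj_mask], weight=class_weight)
print(class_weight, loss_cls.item())
```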

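3. Per-class AP:

Idea: adjust the loss weight of each class dynamically according to its AP, so that class imbalance no longer drags down the AP of the rare classes.

Base function used: torch.nn.BCELoss(weight=weight)

Its expression, as given in the PyTorch documentation, is:

$$\ell_n = -w_n \left[ y_n \log x_n + (1 - y_n) \log(1 - x_n) \right]$$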

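class YOLOLayer(nn.Module):
    """Detection layer"""
    ....
    def forward(self, newAP, x, targets=None, img_dim=None):
        ...
            # Convert the AP values into BCELoss weights.
            # newAP: tensor of per-class AP; batchratio: per-class weights derived from the class ratios
            batchweight = batchratio * torch.clamp(1 - newAP, min=0) ** 2  # the lower the AP, the larger the weight
            # Normalise once by the maximum and scale to 10. Dividing element by element
            # inside a loop lets batchweight.max() change part-way through.
            batchweight = batchweight / batchweight.max() * 10
            batchweight[batchweight == 0] = 10.5  # classes whose weight came out as 0 fall back to 10.5
            weight[...] = batchweight  # broadcast to every sample, anchor and grid cell
            weight = weight.cuda()
            self.bce_clsloss = nn.BCELoss(weight=weight[obj_mask])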
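The AP-to-weight mapping can be isolated into a small function and tried on its own. This is a sketch under assumptions: ap_to_weight is a hypothetical helper, and the ratio weights and AP values are invented for illustration.

```python
import torch

def ap_to_weight(ratio_weight, ap, scale=10.0, fallback=10.5):
    """Combine inverse-frequency weights with (1 - AP)^2, then rescale."""
    w = ratio_weight * torch.clamp(1.0 - ap, min=0.0) ** 2
    if w.max() > 0:
        w = w / w.max() * scale      # the largest weight becomes `scale`
    w[w == 0] = fallback             # zero weights fall back to `fallback`, as in the code above
    return w

ratio_weight = torch.tensor([1.75, 3.5, 10.5])   # hypothetical, derived from class ratios
new_ap = torch.tensor([0.9, 0.5, 0.1])           # hypothetical per-class AP
weight = ap_to_weight(ratio_weight, new_ap)
print(weight)
```

With these inputs the rare, poorly detected class ends up with the largest weight, while the common, well detected class is pushed close to zero.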

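4. Hard example mining:

Hard example mining: samples predicted with high confidence should contribute less to the loss than samples predicted with low confidence.

[Figure: comparison of the candidate hard-example weightings]

The figure shows that the e+log variant gives the strongest hard example mining effect.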

import torch
import torch.nn as nn


class focal_BCELoss(nn.Module):
    def __init__(self, alpha=10, gamma=2):
        super(focal_BCELoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, input, target, eps=1e-7):
        input = torch.clamp(input, eps, 1-eps)  # keep log() finite
        # Focal-style weighting of the positive term:
        # loss = -(target * torch.log(input)) * ( self.alpha * (1 - input) ** self.gamma) - (1 - target) * torch.log(1 - input)
        # Plain BCE:
        # loss = -target * torch.log(input) - (1 - target) * torch.log(1 - input)
        # e+log: the positive term is scaled by exp(alpha * (1 - input) ** gamma)
        loss = -(target * torch.log(input)) * torch.exp(self.alpha * (1 - input) ** self.gamma) - (1 - target) * torch.log(1 - input)
        final_loss = torch.mean(loss)
        return final_loss
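The three candidate formulas in forward differ only in the factor applied to the positive term. The sketch below evaluates that term for a single positive sample at a few predicted probabilities, using the defaults alpha=10 and gamma=2 from the class above. positive_loss is a hypothetical helper written in plain Python for readability.

```python
import math

def positive_loss(p, mode, alpha=10.0, gamma=2.0):
    """Loss of one positive sample that is predicted with probability p."""
    base = -math.log(p)
    if mode == "bce":
        return base
    if mode == "focal":
        return alpha * (1 - p) ** gamma * base
    if mode == "e+log":
        return math.exp(alpha * (1 - p) ** gamma) * base
    raise ValueError(mode)

for p in (0.9, 0.5, 0.1):
    row = {m: positive_loss(p, m) for m in ("bce", "focal", "e+log")}
    print(p, {m: round(v, 3) for m, v in row.items()})
```

The exponential factor is never below 1, so easy positives keep at least their plain BCE loss, while the loss of hard positives (small p) grows much faster than under the other two formulas.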

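5. Matching strategy for positive and negative targets:

Increase the chance that positive samples are predicted: originally each ground-truth box was assigned only to its best-matching anchor; now every anchor is used, as long as its IoU exceeds a threshold.

Two new flags, use_all_anchors and reject, control the matching strategy for positive and negative targets.

Conclusion: with both flags enabled, the problems caused by sample imbalance are effectively reduced.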

def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor  # tensor type for the mask flags
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    nB = pred_boxes.size(0)  # num_samples (batch size)
    nA = pred_boxes.size(1)  # num_anchors
    nC = pred_cls.size(-1)   # pred_cls: [num_samples, num_anchors, grid_h, grid_w, num_classes]; the last dim holds the per-class predictions

    nH = pred_boxes.size(2)
    nW = pred_boxes.size(3)

    obj_mask = ByteTensor(nB, nA, nH, nW).fill_(0)  # same layout as x and y in the YOLO layer forward pass (height x width)
    noobj_mask = ByteTensor(nB, nA, nH, nW).fill_(1)
    class_mask = FloatTensor(nB, nA, nH, nW).fill_(0)
    iou_scores = FloatTensor(nB, nA, nH, nW).fill_(0)
    tx = FloatTensor(nB, nA, nH, nW).fill_(0)
    ty = FloatTensor(nB, nA, nH, nW).fill_(0)
    tw = FloatTensor(nB, nA, nH, nW).fill_(0)
    th = FloatTensor(nB, nA, nH, nW).fill_(0)
    tcls = FloatTensor(nB, nA, nH, nW, nC).fill_(0)  # the last dim has one slot for every label index

    # Convert to position relative to box
    target_boxes = torch.stack([target[:, 2] * nW, target[:, 3] * nH, target[:, 4] * nW, target[:, 5] * nH], 1)  # ground truth in grid units; torch.stack builds a new tensor, so target itself is not modified

    # The coordinates are normalised to 0~1, so they are scaled up above by the feature-map width and height (i.e. the number of grid cells)
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]  # widths and heights of all the target boxes
    gwh2 = gwh  # unfiltered copies, used as the fallback below
    gxy2 = gxy
    #****************************
    # iou of targets-anchors
    t, best_n,tbox = target, [],[]
    nt = len(target)
    use_all_anchors,reject = True, True
    if nt:
        ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])

        if use_all_anchors:
            na = len(anchors)  # number of anchors
            best_n = torch.arange(na).view((-1, 1)).repeat([1, nt]).view(-1)
            t = t.repeat([na, 1])
            gwh = gwh.repeat([na, 1])
            gxy = gxy.repeat([na, 1])
        else:  # use best anchor only
            best_ious, best_n = ious.max(0)  # best iou and anchor

        # reject anchors below iou_thres (OPTIONAL, increases P, lowers R)
        if reject:
            j = ious.view(-1) > 0.225  # iou threshold hyperparameter
            t, best_n, gwh,gxy = t[j], best_n[j], gwh[j],gxy[j]
            if len(best_n) == 0:
                best_ious, best_n = ious.max(0)  # best iou and anchor
                gwh = gwh2
                gxy = gxy2
                t = target
    tbox.append(torch.cat((gxy, gwh), 1))  # xywh (grids)
    # The anchor boxes are all centred at the origin to begin with
    # Decide which anchors are responsible for prediction: they produce predicted boxes and the
    # corresponding losses, while all other anchors only contribute to the conf noobj loss
    # Separate target values
    b, target_labels = t[:, :2].long().t()  # target = [idx, labels, x, y, w, h]
    gx, gy = gxy.t()  # transpose
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()  # .long() truncates to integers: gi is the column index, gj the row index (top-left corner of the cell on the feature map)
    # Set masks
    obj_mask[b, best_n, gj, gi] = 1  # indexed as [image idx, anchor idx that passed the IoU threshold, row, column]; these anchors are responsible for prediction
    noobj_mask[b, best_n, gj, gi] = 0  # set noobj = 0 at the same positions; only the responsible anchors are cleared here
    # gi and gj come from the target x and y and identify the grid cell that contains the target
    # Prediction then proceeds from the selected anchors
    if not use_all_anchors:
        # Set noobj mask to zero where iou exceeds ignore threshold
        # Entries left at 1 contribute to the no-object confidence loss. Entries set to 0 either
        # belong to an anchor that is responsible for a prediction or are ignored in that loss.
        for i, anchor_ious in enumerate(ious.t()):  # one target at a time, with its IoU against every anchor
            noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0
    # Here the noobj loss is taken over all anchors
    # Everything above sets the masks that select which positions enter each loss
    # Everything below computes the ground-truth values
    # Coordinates
    # For the responsible anchors, index their grid cells one by one
    tx[b, best_n, gj, gi] = gx - gx.floor()  # gx minus floor(gx): the ground-truth offset inside the cell
    ty[b, best_n, gj, gi] = gy - gy.floor()  # gy minus floor(gy): the ground-truth offset inside the cell
    # Width and height
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)  # likewise a ground-truth offset, in log space
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
    # Why divide by the anchor width and height: tw and th use the same formula in yolov3 and the faster-rcnn family,
    # the ratio between the object box size and the anchor box size, mapped to log space
    # The anchors here are scaled_anchors: self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
    # i.e. anchor sizes divided by the stride, shape [self.num_anchors, 2], measured in grid cells of the output feature map
    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1  # ground truth: the target's class index is set to 1; used for the class probability loss
    # Compute label correctness and iou at best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()  # argmax(-1) gives the index of the highest class score; class_mask is 1 where it equals the label
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], tbox[0], x1y1x2y2=False)  # IoU between each predicted box and its ground-truth box, in xywh format
    tconf = obj_mask.float()  # ground-truth confidence: 1 wherever a box is assigned
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
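The difference between the two matching strategies can be seen on a toy example. The sketch below is not part of build_targets: wh_iou is a stand-in written for this example (the article's bbox_wh_iou is not shown), the anchor and target sizes are invented, and the fallback to the best anchor when nothing passes the threshold is left out.

```python
import torch

def wh_iou(anchors, gwh):
    """IoU of anchors [na, 2] and target sizes [nt, 2] with aligned centres -> [na, nt]."""
    inter = torch.min(anchors[:, None, :], gwh[None, :, :]).prod(dim=2)
    area_a = anchors.prod(dim=1)[:, None]
    area_t = gwh.prod(dim=1)[None, :]
    return inter / (area_a + area_t - inter)

anchors = torch.tensor([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]])  # hypothetical, in grid units
gwh = torch.tensor([[2.0, 2.5], [4.0, 3.0]])                  # two hypothetical targets
ious = wh_iou(anchors, gwh)

# Best anchor only: exactly one positive per target.
best_only = ious.argmax(dim=0)

# All anchors above the threshold: possibly several positives per target.
iou_thres = 0.225
anchor_idx, target_idx = torch.nonzero(ious > iou_thres, as_tuple=True)

print("best anchor per target:", best_only.tolist())
print("positives with all anchors:", list(zip(anchor_idx.tolist(), target_idx.tolist())))
```

Here the two targets yield two positives with best-anchor matching and four with threshold matching, which is how the strategy raises the number of positive samples.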