YOLOv4 Project Notes 5: Building the Labels

Swayzzu

Category: CV. Tags: deep learning, computer vision, object detection

Article link: https://blog.csdn.net/Swayzzu/article/details/122240134

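I. Building the target data

The input targets have shape (B, m, 5): B images, m boxes per image, and five values per box (x, y, w, h, class). To compute the loss, they have to be rebuilt into tensors with the same shape as the model's training output.

1. Constructing the masks

First, initialize the masks and target tensors: the positive-sample mask, the negative-sample mask, tx, ty, tw, th, t_box, the confidence, and the class targets. Take the positive-sample mask as an example: only the grid cell that contains a target holds a value and everything else is 0, so each tensor can start out as an all-zero tensor with the same shape as the output. The negative-sample mask is the one exception, since it starts out as all ones. Two tensors have a different shape: t_box stores x, y, w, h and therefore needs one extra dimension of size 4, and the class tensor is one-hot encoded and therefore needs one extra dimension of size num_classes.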

mask = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
noobj_mask = torch.ones(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
tx = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
ty = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
tw = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
th = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
t_box = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, 4, requires_grad=False)
tconf = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
tcls = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, self.num_classes, requires_grad=False)
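
As a quick sanity check on the shapes above, here is a minimal plain-Python sketch that only computes the shape tuples, without allocating any tensors. The helper name target_shapes and the example sizes (batch of 2, 19x19 grid, 20 classes) are made up for illustration and are not part of the project code.

```python
def target_shapes(bs, in_h, in_w, num_classes, anchors_per_head=3):
    """Return the shape of each target tensor built for one detection head."""
    base = (bs, anchors_per_head, in_h, in_w)
    return {
        "mask": base,
        "noobj_mask": base,
        "tx": base,
        "ty": base,
        "tw": base,
        "th": base,
        "tconf": base,
        "t_box": base + (4,),            # extra dimension for x, y, w, h
        "tcls": base + (num_classes,),   # extra dimension for the one-hot class
    }


# Hypothetical sizes, chosen only for the example
shapes = target_shapes(bs=2, in_h=19, in_w=19, num_classes=20)
for name, shape in shapes.items():
    print(name, shape)
```

Only t_box and tcls carry a fifth dimension; everything else matches the per-anchor, per-cell layout of the output.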

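2. Computing the labels and IoU

This step loops over every image's labels and over every box in them, converting each label's x, y, w, h into grid units. The integer part of the box center then gives the grid cell that the box belongs to.

Next, compute the IoU between the label box and the width and height of every anchor (prior box), and take the index of the anchor that matches the label box best. For example, if the best match is anchor 6, only the head that owns anchors 6, 7, and 8 records a positive sample; the other heads skip this box. The IoU function was covered in an earlier post in this series.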
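
To make the conversion concrete, here is a small worked example in plain Python. The function name to_grid_units and the sample box are assumptions made for illustration; the arithmetic is the same multiply-then-truncate step described above.

```python
def to_grid_units(box, in_w, in_h):
    """Convert a normalized (x, y, w, h) box to grid units and find its cell."""
    x, y, w, h = box
    gx, gy = x * in_w, y * in_h   # box center, measured in grid cells
    gw, gh = w * in_w, h * in_h   # box size, measured in grid cells
    gi, gj = int(gx), int(gy)     # column and row of the cell holding the center
    return (gx, gy, gw, gh), (gi, gj)


# Hypothetical label: centered at (0.5, 0.25), 20% wide, 40% tall, on a 19x19 grid
grid_box, cell = to_grid_units((0.5, 0.25, 0.2, 0.4), in_w=19, in_h=19)
print(grid_box)
print(cell)  # (9, 4)
```

Note that gi indexes the column (x direction) and gj the row (y direction), which is why the masks are later indexed as [gj, gi].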

# Loop over every image and every ground-truth box
for b in range(bs):
    for t in range(targets[b].shape[0]):
        # Convert x, y, w, h to grid units: gx, gy, gw, gh
        gx = targets[b][t, 0] * in_w
        gy = targets[b][t, 1] * in_h
        gw = targets[b][t, 2] * in_w
        gh = targets[b][t, 3] * in_h
        # Grid cell (gi, gj) that contains the box center
        gi = int(gx)
        gj = int(gy)
        # Move the ground-truth box to the origin: shape [1, 4], i.e. (0, 0, gw, gh)
        gt_box = torch.FloatTensor(np.array([0, 0, gw, gh])).unsqueeze(0)
        # Move the 9 anchors to the origin as well.
        # The anchors arrive as 9 (w, h) pairs, shape [9, 2]; prepend two zero columns to get [9, 4]
        anchors_shapes = torch.FloatTensor(np.concatenate((np.zeros((self.num_anchors, 2)),
                                                           np.array(anchors)), 1))
        # Compute the overlap anch_ious and pick the best-matching anchor index best_n.
        # If that anchor does not belong to the current head, skip the box;
        # otherwise go on to fill in the positive sample
        anch_ious = bbox_iou(gt_box, anchors_shapes)
        best_n = np.argmax(anch_ious)
        if best_n not in anchor_index:
            continue
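
Because both the label box and the anchors are moved to the origin, the IoU here depends only on widths and heights. The plain-Python sketch below illustrates that idea; wh_iou and best_anchor are hypothetical helpers rather than the project's bbox_iou, and the anchor sizes are invented values in grid units.

```python
def wh_iou(w1, h1, w2, h2):
    """IoU of two boxes aligned at the origin, so only their sizes differ."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def best_anchor(gw, gh, anchors):
    """Index of the anchor whose shape overlaps the ground-truth box the most."""
    ious = [wh_iou(gw, gh, aw, ah) for aw, ah in anchors]
    return max(range(len(anchors)), key=ious.__getitem__)


# Nine made-up anchors (w, h) in grid units, ordered small to large
anchors = [(1, 1), (2, 3), (4, 2), (4, 8), (8, 6), (7, 14), (12, 10), (16, 20), (30, 28)]
print(best_anchor(3.8, 7.6, anchors))  # 3
```

With these numbers the 3.8 x 7.6 box matches anchor 3 best, so only the head that owns anchors 3, 4, and 5 would record it as a positive sample.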

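3. Filling the masks

Once the grid cell of a target is known, the masks initialized earlier can be filled in. The positive-sample mask, for example, starts out as all zeros, so only the entry at the matching position needs to be set to 1.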

# Fill in the positive-sample masks
if (gj < in_h) and (gi < in_w):
    # Convert the absolute anchor index to an index within this head (0, 1 or 2)
    best_n = best_n - subtract_index
    # Mark the anchor and cell that contain an object
    mask[b, best_n, gj, gi] = 1
    # Clear the negative flag for this image and anchor only
    noobj_mask[b, best_n, gj, gi] = 0
    # Box center, in grid units
    tx[b, best_n, gj, gi] = gx
    ty[b, best_n, gj, gi] = gy
    # Box width and height, in grid units
    tw[b, best_n, gj, gi] = gw
    th[b, best_n, gj, gi] = gh
    # Objectness confidence
    tconf[b, best_n, gj, gi] = 1
    # One-hot entry for the class
    tcls[b, best_n, gj, gi, int(targets[b][t, 4])] = 1
    # Normalized width and height of the box, kept for scaling the box loss
    box_loss_scale_x[b, best_n, gj, gi] = targets[b][t, 2]
    box_loss_scale_y[b, best_n, gj, gi] = targets[b][t, 3]
else:
    print("Step {} out of bound.".format(b))
    print("gj:{}, height:{} | gi:{}, width:{}".format(gj, in_h, gi, in_w))
    continue
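
The snippet above relies on anchor_index and subtract_index, which are defined in the full listing. As a rough illustration of how an absolute anchor index becomes a head-relative one, here is a plain-Python sketch; local_anchor_index is a hypothetical helper, and the head number is passed in directly instead of being looked up through self.feature_length.

```python
# One group of three anchors per detection head, as in the full listing
ANCHOR_GROUPS = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]


def local_anchor_index(best_n, head):
    """Return the anchor's index within `head`, or None if another head owns it."""
    group = ANCHOR_GROUPS[head]
    if best_n not in group:
        return None            # this head skips the box (the `continue` branch)
    return best_n - group[0]   # same effect as subtracting subtract_index


print(local_anchor_index(7, head=2))  # 1
print(local_anchor_index(7, head=0))  # None
```

Each head's tensors only have three anchor slots, which is why the absolute index 7 has to become 1 before it can index mask, tx, and the rest.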

4. Putting the code together

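import numpy as np
import torch

# bbox_iou is the IoU helper from the earlier post in this series


def get_target(self, targets, anchors, in_w, in_h):
    '''
    :param targets: labels, shape [B, m, 5] (x, y, w, h, class per box)
    :param anchors: anchors in grid units
    :param in_w: width of the input feature map, in grid cells
    :param in_h: height of the input feature map, in grid cells
    :return: mask, noobj_mask, t_box, tconf, tcls, box_loss_scale_x, box_loss_scale_y
    '''
    # 1. Construct the masks
    # Number of images in the batch
    bs = len(targets)
    # Absolute anchor indices owned by this head (anchor_index), and the offset
    # used to turn them into head-relative indices (subtract_index)
    anchor_index = [[0, 1, 2], [3, 4, 5], [6, 7, 8]][self.feature_length.index(in_w)]
    subtract_index = [0, 3, 6][self.feature_length.index(in_w)]
    # Initialize mask, noobj_mask, tx, ty, tw, th, t_box, tconf, tcls,
    # box_loss_scale_x and box_loss_scale_y.
    # None of them needs gradients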
    mask = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    noobj_mask = torch.ones(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    tx = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    ty = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    tw = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    th = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    t_box = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, 4, requires_grad=False)
    tconf = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    tcls = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, self.num_classes, requires_grad=False)

    box_loss_scale_x = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    box_loss_scale_y = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)

    # 2. Compute the IoU between each ground-truth label and the 9 anchors
    # Loop over every image and every ground-truth box
    for b in range(bs):
        for t in range(targets[b].shape[0]):
            # Convert x, y, w, h to grid units: gx, gy, gw, gh
            gx = targets[b][t, 0] * in_w
            gy = targets[b][t, 1] * in_h
            gw = targets[b][t, 2] * in_w
            gh = targets[b][t, 3] * in_h
            # Grid cell (gi, gj) that contains the box center
            gi = int(gx)
            gj = int(gy)
            # Move the ground-truth box to the origin: shape [1, 4], i.e. (0, 0, gw, gh)
            gt_box = torch.FloatTensor(np.array([0, 0, gw, gh])).unsqueeze(0)
            # Move the 9 anchors to the origin as well.
            # The anchors arrive as 9 (w, h) pairs, shape [9, 2]; prepend two zero columns to get [9, 4]
            anchors_shapes = torch.FloatTensor(np.concatenate((np.zeros((self.num_anchors, 2)),
                                                               np.array(anchors)), 1))
            # Compute the overlap anch_ious and pick the best-matching anchor index best_n.
            # If that anchor does not belong to the current head, skip the box;
            # otherwise fill in the positive sample
            anch_ious = bbox_iou(gt_box, anchors_shapes)
            best_n = np.argmax(anch_ious)
            if best_n not in anchor_index:
                continue
            # 3. Fill in the positive-sample masks
            if (gj < in_h) and (gi < in_w):
                # Convert the absolute anchor index to an index within this head (0, 1 or 2)
                best_n = best_n - subtract_index
                # Mark the anchor and cell that contain an object
                mask[b, best_n, gj, gi] = 1
                # Clear the negative flag for this image and anchor only
                noobj_mask[b, best_n, gj, gi] = 0
                # Box center, in grid units
                tx[b, best_n, gj, gi] = gx
                ty[b, best_n, gj, gi] = gy
                # Box width and height, in grid units
                tw[b, best_n, gj, gi] = gw
                th[b, best_n, gj, gi] = gh
                # Objectness confidence
                tconf[b, best_n, gj, gi] = 1
                # One-hot entry for the class
                tcls[b, best_n, gj, gi, int(targets[b][t, 4])] = 1
                # Normalized width and height of the box, kept for scaling the box loss
                box_loss_scale_x[b, best_n, gj, gi] = targets[b][t, 2]
                box_loss_scale_y[b, best_n, gj, gi] = targets[b][t, 3]
            else:
                print("Step {} out of bound.".format(b))
                print("gj:{}, height:{} | gi:{}, width:{}".format(gj, in_h, gi, in_w))
                continue
    t_box[..., 0] = tx
    t_box[..., 1] = ty
    t_box[..., 2] = tw
    t_box[..., 3] = th
    return mask, noobj_mask, t_box, tconf, tcls, box_loss_scale_x, box_loss_scale_y

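II. Filtering negative samples

The negative-sample mask noobj_mask was computed above: it is 0 at the positions that hold an object and 1 everywhere else, so every 1 marks a pure background sample. That leaves far too many negatives, though. A 19*19 output alone has 3*19*19 boxes, and the other two heads add even more. So we set an IoU threshold and use it to filter the negatives: any negative whose predicted box has an IoU above 0.7 with a ground-truth box is changed from 1 to 0, which removes it from the negative set.

To do this, decode the network output and compute the IoU between the decoded boxes and the ground-truth boxes. The decoding is written slightly differently from the earlier version, but the result is the same. After decoding and computing the IoU, clear the negatives whose IoU is above 0.7 and return the updated mask.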
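
To get a feel for how many candidate boxes there are, here is a small arithmetic sketch. Only the 19*19 grid comes from the text above; the other two grid sizes (38 and 76) are an assumption that corresponds to a 608x608 input with strides 32, 16 and 8, and num_candidate_boxes is a made-up helper name.

```python
def num_candidate_boxes(grid_sizes, anchors_per_head=3):
    """Total number of predicted boxes across all heads (square grids assumed)."""
    return sum(anchors_per_head * size * size for size in grid_sizes)


print(num_candidate_boxes([19]))          # 1083
print(num_candidate_boxes([19, 38, 76]))  # 22743
```

Under that assumption an image with a handful of objects has over twenty thousand candidate boxes, almost all of them negatives, which is the imbalance the IoU filter is meant to soften.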

def get_ignore(self, prediction, targets, scaled_anchors, in_w, in_h, noobj_mask):
    bs = len(targets)
    anchor_index = [[0, 1, 2], [3, 4, 5], [6, 7, 8]][self.feature_length.index(in_w)]
    # Select the three anchors that belong to this head
    scaled_anchors = np.array(scaled_anchors)[anchor_index]

    # Decode the predictions, the same way as in the head's decode step
    x = torch.sigmoid(prediction[..., 0])
    y = torch.sigmoid(prediction[..., 1])
    w = prediction[..., 2]
    h = prediction[..., 3]

    FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
    LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor

    # Build the grid index arrays: grid_x holds each cell's column, grid_y its row
    grid_x = torch.linspace(0, in_w - 1, in_w).repeat(in_h, 1).repeat(
        int(bs * self.num_anchors / 3), 1, 1).view(x.shape).type(FloatTensor)
    grid_y = torch.linspace(0, in_h - 1, in_h).repeat(in_w, 1).t().repeat(
        int(bs * self.num_anchors / 3), 1, 1).view(y.shape).type(FloatTensor)

    # Build the anchor width and height arrays
    anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
    anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
    anchor_w = anchor_w.repeat(bs, 1).repeat(1,1,in_w*in_h).view(w.shape)
    anchor_h = anchor_h.repeat(bs, 1).repeat(1,1,in_w*in_h).view(h.shape)
    # Decoded box centers and sizes: grid offset for the center, anchor scaling for the size
    pred_boxes = FloatTensor(prediction[..., :4].shape)
    pred_boxes[..., 0] = x+grid_x
    pred_boxes[..., 1] = y + grid_y
    pred_boxes[..., 2] = torch.exp(w) * anchor_w
    pred_boxes[..., 3] = torch.exp(h) * anchor_h

    # Filter the negative samples
    for i in range(bs):
        pred_boxes_for_ignore = pred_boxes[i]
        pred_boxes_for_ignore = pred_boxes_for_ignore.view(-1,4)
        if len(targets[i]) > 0:
            gx = targets[i][:, 0:1] * in_w
            gy = targets[i][:, 1:2] * in_h
            gw = targets[i][:, 2:3] * in_w
            gh = targets[i][:, 3:4] * in_h
            gt_box = torch.FloatTensor(np.concatenate([gx, gy, gw, gh], axis=-1)).type(FloatTensor)

            anch_ious = iou(gt_box, pred_boxes_for_ignore)
            # Clear the negatives whose IoU exceeds ignore_thresh (0.7)
            for t in range(targets[i].shape[0]):
                anch_iou = anch_ious[t].view(pred_boxes[i].size()[:3])
                noobj_mask[i][anch_iou>self.ignore_thresh] = 0
    return noobj_mask, pred_boxes
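
The same decode-then-filter logic can be traced by hand for a single prediction. The sketch below is plain Python with invented numbers; decode, center_iou and keep_as_negative are hypothetical helpers, and center_iou assumes boxes in (cx, cy, w, h) format, which may differ from the iou function used in the listing above.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode(raw, cell, anchor):
    """Decode raw (tx, ty, tw, th) at grid cell (gi, gj) into a box in grid units."""
    tx, ty, tw, th = raw
    gi, gj = cell
    aw, ah = anchor
    return (sigmoid(tx) + gi, sigmoid(ty) + gj, math.exp(tw) * aw, math.exp(th) * ah)


def center_iou(a, b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def keep_as_negative(pred_box, gt_boxes, ignore_thresh=0.7):
    """False when the prediction overlaps a ground-truth box too well to be background."""
    return all(center_iou(gt, pred_box) <= ignore_thresh for gt in gt_boxes)


gt = (9.5, 4.75, 3.8, 7.6)                      # made-up ground truth, grid units
near = decode((0, 0, 0, 0), (9, 4), (4, 8))     # prediction sitting on the object
far = decode((0, 0, 0, 0), (0, 0), (4, 8))      # prediction in an empty corner
print(keep_as_negative(near, [gt]))  # False
print(keep_as_negative(far, [gt]))   # True
```

The near prediction overlaps the ground truth with an IoU of roughly 0.89, so it is dropped from the negatives; the far one has no overlap and stays a negative.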