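I. Building the Target Data
The input targets are the labels, with shape (B, m, 5): B images, m boxes per image, and for each box its x, y, w, h (normalized to 0-1) followed by a class index. The code indexes them image by image (targets[b]), so m can differ between images. To compute the loss, these labels must first be converted into tensors with the same layout as the model's training output.
1. Constructing the Masks
First we initialize the masks. They hold the positive samples, negative samples, tx, ty, tw, th, t_box, confidence and class. Take the positive-sample mask: only the grid cell containing a target holds data and everything else is 0, so we start from all-zero tensors shaped like the output (the negative-sample mask noobj_mask is the exception and starts as all ones). Two tensors have a different shape. t_box stores x, y, w, h, so it needs one extra dimension of size 4; the class target is one-hot encoded, so tcls needs an extra dimension of size num_classes.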
mask = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
noobj_mask = torch.ones(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
tx = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
ty = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
tw = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
th = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
t_box = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, 4, requires_grad=False)
tconf = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
tcls = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, self.num_classes, requires_grad=False)
# Normalized box w and h, filled in step 3 and used to scale the box loss
box_loss_scale_x = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
box_loss_scale_y = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
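To make the shapes concrete, here is a NumPy stand-in for the allocations above, so it runs without PyTorch. The helper name init_targets and the example sizes (batch of 2, 3 anchors per head, a 13×13 grid, 20 classes) are illustrative assumptions, not values from the original code; only the extra trailing axes of t_box and tcls and the all-ones noobj_mask mirror the torch code.

```python
import numpy as np


def init_targets(bs, anchors_per_head, in_h, in_w, num_classes):
    """Hypothetical helper: allocate one head's target arrays (NumPy version of the torch code)."""
    shape = (bs, anchors_per_head, in_h, in_w)
    return {
        "mask": np.zeros(shape, dtype=np.float32),                   # 1 = anchor responsible for a box
        "noobj_mask": np.ones(shape, dtype=np.float32),              # starts as "everything is background"
        "tconf": np.zeros(shape, dtype=np.float32),                  # objectness target
        "t_box": np.zeros(shape + (4,), dtype=np.float32),           # extra axis for x, y, w, h
        "tcls": np.zeros(shape + (num_classes,), dtype=np.float32),  # extra axis for the one-hot class
    }


tgt = init_targets(bs=2, anchors_per_head=3, in_h=13, in_w=13, num_classes=20)
for name, arr in tgt.items():
    print(name, arr.shape)
```

Everything except noobj_mask starts at zero and is written only at matched positions, which is why an all-zero (or all-one) initialization is enough.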
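2. Computing the Labels and the IoU
This part loops over the labels of every image and over every box in them, converting each box's x, y, w, h into grid units. The integer part of the center coordinates then gives the grid cell the center falls in.
Next, compute the IoU between the label box and the width and height of every anchor (prior box); both are moved to the origin first, so only their shapes are compared. This gives the index of the best-matching anchor. Each head owns three of the nine anchors (0-2, 3-5 or 6-8), and only the head that owns the best anchor records a positive sample for this box. For example, if the best match is anchor 6, only the head owning anchors 6, 7 and 8 fills in a positive sample, at its local index 6 - 6 = 0, and the other two heads skip the box. The IoU function was introduced in an earlier article.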
# Loop over every image and every ground-truth box in it
for b in range(bs):
    for t in range(targets[b].shape[0]):
        # Convert x, y, w, h to grid units: gx, gy, gw, gh
        gx = targets[b][t, 0] * in_w
        gy = targets[b][t, 1] * in_h
        gw = targets[b][t, 2] * in_w
        gh = targets[b][t, 3] * in_h
        # Grid cell (gi, gj) containing the box center
        gi = int(gx)
        gj = int(gy)
        # Move the ground-truth box to the origin: a 4-value box [0, 0, gw, gh], shape (1, 4)
        gt_box = torch.FloatTensor(np.array([0, 0, gw, gh])).unsqueeze(0)
        # Move the 9 anchors to the origin too: their (9, 2) array of w, h gets two
        # leading columns of zeros, giving (9, 4) boxes [0, 0, w, h]
        anchors_shapes = torch.FloatTensor(np.concatenate((np.zeros((self.num_anchors, 2)),
                                                           np.array(anchors)), 1))
        # IoU with every anchor; best_n is the best match. If it is not one of this
        # head's anchors, skip the box; otherwise fill in the positive sample
        anch_ious = bbox_iou(gt_box, anchors_shapes)
        best_n = np.argmax(anch_ious)
        if best_n not in anchor_index:
            continue
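The matching step can be sketched in NumPy. Because both boxes sit at the origin, the IoU reduces to min(w)*min(h) divided by the union of the two areas, so shape_iou below computes it directly instead of calling bbox_iou. The helper names, the box values and the anchor list (the common YOLOv3 COCO anchors, ordered largest-first so that the 13×13 head owns indices 0-2) are assumptions for illustration. All nine anchors are divided by the same stride, so the ranking is the same as in pixel space.

```python
import numpy as np


def shape_iou(gw, gh, anchors):
    """IoU between one box (gw, gh) and each anchor (w, h), all placed with a corner at the origin."""
    anchors = np.asarray(anchors, dtype=np.float64)
    inter = np.minimum(gw, anchors[:, 0]) * np.minimum(gh, anchors[:, 1])
    union = gw * gh + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union


def match_anchor(box, anchors, in_w, in_h, anchor_index):
    """Hypothetical helper: return (gi, gj, local anchor index), or None if another head owns the box.

    box: (x, y, w, h, cls) normalized to 0-1; anchors: all 9 anchors in this head's grid units.
    """
    gx, gy = box[0] * in_w, box[1] * in_h      # center in grid units
    gw, gh = box[2] * in_w, box[3] * in_h      # size in grid units
    gi, gj = int(gx), int(gy)                  # cell that contains the center
    best_n = int(np.argmax(shape_iou(gw, gh, anchors)))
    if best_n not in anchor_index:             # best anchor belongs to another head
        return None
    return gi, gj, best_n - anchor_index[0]    # anchor_index[0] plays the role of subtract_index


# Assumed anchors in pixels, largest first: 13x13 head -> 0-2, 26x26 -> 3-5, 52x52 -> 6-8
anchors_px = np.array([[116, 90], [156, 198], [373, 326],
                       [30, 61], [62, 45], [59, 119],
                       [10, 13], [16, 30], [33, 23]], dtype=np.float64)
box = (0.5, 0.4, 0.3, 0.25, 1)
print(match_anchor(box, anchors_px / 32, 13, 13, [0, 1, 2]))  # (6, 5, 0)
print(match_anchor(box, anchors_px / 16, 26, 26, [3, 4, 5]))  # None
```

The same box is claimed by the stride-32 head (anchor 0 wins with IoU about 0.80) and skipped by the stride-16 head, which is exactly the continue branch in the code above.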
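3. Filling the Masks
Once we know which grid cell holds the target, we can fill in the masks initialized earlier. The positive-sample mask, for example, starts as all zeros, so we only need to set the matching position to 1.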
        # Fill in the positive-sample mask
        if (gj < in_h) and (gi < in_w):
            # Global anchor index (0-8) -> index within this head (0-2)
            best_n = best_n - subtract_index
            # This anchor contains an object...
            mask[b, best_n, gj, gi] = 1
            # ...so it is no longer a negative sample (only this image, anchor and cell)
            noobj_mask[b, best_n, gj, gi] = 0
            # Target box center (grid units)
            tx[b, best_n, gj, gi] = gx
            ty[b, best_n, gj, gi] = gy
            # Target box width and height (grid units)
            tw[b, best_n, gj, gi] = gw
            th[b, best_n, gj, gi] = gh
            # Objectness confidence
            tconf[b, best_n, gj, gi] = 1
            # One-hot class
            tcls[b, best_n, gj, gi, int(targets[b][t, 4])] = 1
            # Normalized w and h of the box, used to scale the box loss
            box_loss_scale_x[b, best_n, gj, gi] = targets[b][t, 2]
            box_loss_scale_y[b, best_n, gj, gi] = targets[b][t, 3]
        else:
            print("Step {} out of bound.".format(b))
            print("gj:{}, height:{} | gi:{}, width:{}".format(gj, in_h, gi, in_w))
            continue
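Filling a single matched box can be sketched the same way. This NumPy version writes one box into freshly allocated arrays and clears noobj_mask only for that image, anchor and cell, so other images in the batch keep their negatives. The array sizes, the box and the matched position (image 0, local anchor 0, cell gi=6, gj=5) are illustrative assumptions; as in the torch code, t_box stores the center and size in grid units.

```python
import numpy as np

bs, n_anchors, in_h, in_w, num_classes = 2, 3, 13, 13, 4   # assumed sizes
mask = np.zeros((bs, n_anchors, in_h, in_w))
noobj_mask = np.ones((bs, n_anchors, in_h, in_w))
tconf = np.zeros((bs, n_anchors, in_h, in_w))
t_box = np.zeros((bs, n_anchors, in_h, in_w, 4))
tcls = np.zeros((bs, n_anchors, in_h, in_w, num_classes))

box = (0.5, 0.4, 0.3, 0.25, 2)   # normalized x, y, w, h, class
b, n, gi, gj = 0, 0, 6, 5        # assumed result of the anchor-matching step

if gj < in_h and gi < in_w:                          # same bounds check as the torch code
    mask[b, n, gj, gi] = 1                           # this anchor is responsible for the box
    noobj_mask[b, n, gj, gi] = 0                     # ...so it is no longer a negative
    t_box[b, n, gj, gi] = (box[0] * in_w, box[1] * in_h,
                           box[2] * in_w, box[3] * in_h)   # grid units
    tconf[b, n, gj, gi] = 1                          # objectness target
    tcls[b, n, gj, gi, int(box[4])] = 1              # one-hot class

print(int(mask.sum()), int((noobj_mask == 0).sum()))   # 1 1
```

Note the index order [b, anchor, row gj, column gi]: the row comes from y and the column from x.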
4. Complete Code
def get_target(self, targets, anchors, in_w, in_h):
    '''
    :param targets: labels, shape [B, m, 5]: normalized x, y, w, h and the class index
    :param anchors: anchors in grid units
    :param in_w: width of this head's feature map (grid cells)
    :param in_h: height of this head's feature map (grid cells)
    :return: mask, noobj_mask, t_box, tconf, tcls, box_loss_scale_x, box_loss_scale_y
    '''
    # 1. Build the masks
    # Number of images in the batch
    bs = len(targets)
    # anchor_index: absolute indices of this head's anchors;
    # subtract_index: offset that turns them into indices relative to this head
    anchor_index = [[0, 1, 2], [3, 4, 5], [6, 7, 8]][self.feature_length.index(in_w)]
    subtract_index = [0, 3, 6][self.feature_length.index(in_w)]
    # Initialize mask, noobj_mask, tx, ty, tw, th, t_box, tconf, tcls, box_loss_scale_x and _y
    # No gradients are needed here
    mask = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    noobj_mask = torch.ones(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    tx = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    ty = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    tw = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    th = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    t_box = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, 4, requires_grad=False)
    tconf = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    tcls = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, self.num_classes, requires_grad=False)
    box_loss_scale_x = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    box_loss_scale_y = torch.zeros(bs, int(self.num_anchors // 3), in_h, in_w, requires_grad=False)
    # 2. Compute the IoU between each ground-truth label and the 9 anchors
    # Loop over every image and every ground-truth box in it
    for b in range(bs):
        for t in range(targets[b].shape[0]):
            # Convert x, y, w, h to grid units: gx, gy, gw, gh
            gx = targets[b][t, 0] * in_w
            gy = targets[b][t, 1] * in_h
            gw = targets[b][t, 2] * in_w
            gh = targets[b][t, 3] * in_h
            # Grid cell (gi, gj) containing the box center
            gi = int(gx)
            gj = int(gy)
            # Move the ground-truth box to the origin: a 4-value box [0, 0, gw, gh], shape (1, 4)
            gt_box = torch.FloatTensor(np.array([0, 0, gw, gh])).unsqueeze(0)
            # Move the 9 anchors to the origin too: their (9, 2) array of w, h gets two
            # leading columns of zeros, giving (9, 4) boxes [0, 0, w, h]
            anchors_shapes = torch.FloatTensor(np.concatenate((np.zeros((self.num_anchors, 2)),
                                                               np.array(anchors)), 1))
            # IoU with every anchor; best_n is the best match. If it is not one of this
            # head's anchors, skip the box; otherwise fill in the positive sample
            anch_ious = bbox_iou(gt_box, anchors_shapes)
            best_n = np.argmax(anch_ious)
            if best_n not in anchor_index:
                continue
            # Fill in the positive-sample mask
            if (gj < in_h) and (gi < in_w):
                # Global anchor index (0-8) -> index within this head (0-2)
                best_n = best_n - subtract_index
                # This anchor contains an object...
                mask[b, best_n, gj, gi] = 1
                # ...so it is no longer a negative sample (only this image, anchor and cell)
                noobj_mask[b, best_n, gj, gi] = 0
                # Target box center (grid units)
                tx[b, best_n, gj, gi] = gx
                ty[b, best_n, gj, gi] = gy
                # Target box width and height (grid units)
                tw[b, best_n, gj, gi] = gw
                th[b, best_n, gj, gi] = gh
                # Objectness confidence
                tconf[b, best_n, gj, gi] = 1
                # One-hot class
                tcls[b, best_n, gj, gi, int(targets[b][t, 4])] = 1
                # Normalized w and h of the box, used to scale the box loss
                box_loss_scale_x[b, best_n, gj, gi] = targets[b][t, 2]
                box_loss_scale_y[b, best_n, gj, gi] = targets[b][t, 3]
            else:
                print("Step {} out of bound.".format(b))
                print("gj:{}, height:{} | gi:{}, width:{}".format(gj, in_h, gi, in_w))
                continue
    # Pack the four box targets into t_box
    t_box[..., 0] = tx
    t_box[..., 1] = ty
    t_box[..., 2] = tw
    t_box[..., 3] = th
    return mask, noobj_mask, t_box, tconf, tcls, box_loss_scale_x, box_loss_scale_y
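The two lookup tables at the top of get_target decide which anchors a head owns. A standalone version, assuming feature_length = [19, 38, 76] (for example a 608×608 input with strides 32, 16 and 8; the real values come from self.feature_length), shows that subtract_index is just the first entry of anchor_index. The function name head_anchor_indices is hypothetical.

```python
def head_anchor_indices(feature_length, in_w):
    """Hypothetical standalone copy of the anchor_index / subtract_index lookup in get_target."""
    head = feature_length.index(in_w)                  # which head this feature map belongs to
    anchor_index = [[0, 1, 2], [3, 4, 5], [6, 7, 8]][head]
    subtract_index = [0, 3, 6][head]
    return anchor_index, subtract_index


feature_length = [19, 38, 76]   # assumed: 608x608 input, strides 32, 16, 8
for in_w in feature_length:
    print(in_w, *head_anchor_indices(feature_length, in_w))
# 19 [0, 1, 2] 0
# 38 [3, 4, 5] 3
# 76 [6, 7, 8] 6

# A global best_n of 4 is owned by the 38x38 head and becomes local index 4 - 3 = 1 there
best_n = 4
anchor_index, subtract_index = head_anchor_indices(feature_length, 38)
print(best_n in anchor_index, best_n - subtract_index)   # True 1
```

Because the lookup keys on the width alone (feature_length.index(in_w)), it assumes the three heads have distinct widths.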
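II. Filtering Negative Samples
The negative-sample mask noobj_mask computed above is 0 where an object was matched and 1 everywhere else, so every other predicted box counts as background. For a 19×19 output that is already 3×19×19 = 1083 boxes, plus the boxes from the other two heads, which is far too many negatives; among them are predictions that overlap a real object well but simply were not the matched anchor. So an IoU threshold is used to filter them: wherever a negative's IoU with a ground-truth box exceeds the threshold (0.7 here, self.ignore_thresh in the code), its entry is changed from 1 to 0, so that box is ignored instead of being trained as background.
In practice this means decoding the output and computing the IoU between the decoded boxes and the ground-truth boxes. The decoding here differs slightly from the earlier version, but gives the same result. After decoding and computing the IoU, the negatives above the threshold are removed and the updated mask is returned.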
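The filtering rule itself can be sketched in NumPy, independent of the decoding. xywh_iou below is a hypothetical stand-in for the iou helper from the earlier article (pairwise IoU of center-format boxes), and ignore_negatives is likewise an illustrative name. Taking the maximum IoU over all ground-truth boxes and comparing it once with the threshold gives the same mask as looping over the boxes one by one, as the torch code does.

```python
import numpy as np


def xywh_iou(a, b):
    """Pairwise IoU between boxes a (M, 4) and b (N, 4) in center-x, center-y, w, h format."""
    a = np.asarray(a, dtype=np.float64)[:, None, :]     # (M, 1, 4)
    b = np.asarray(b, dtype=np.float64)[None, :, :]     # (1, N, 4)
    a_min, a_max = a[..., :2] - a[..., 2:] / 2, a[..., :2] + a[..., 2:] / 2
    b_min, b_max = b[..., :2] - b[..., 2:] / 2, b[..., :2] + b[..., 2:] / 2
    wh = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = wh[..., 0] * wh[..., 1]
    union = a[..., 2] * a[..., 3] + b[..., 2] * b[..., 3] - inter
    return inter / union                                 # (M, N)


def ignore_negatives(noobj_mask_i, pred_boxes_i, gt_boxes, thresh):
    """Hypothetical helper for one image: zero noobj_mask where a prediction overlaps any GT box above thresh.

    noobj_mask_i: (A, H, W); pred_boxes_i: (A, H, W, 4); gt_boxes: (m, 4); all in grid units.
    """
    if len(gt_boxes) == 0:
        return noobj_mask_i
    ious = xywh_iou(gt_boxes, pred_boxes_i.reshape(-1, 4))     # (m, A*H*W)
    best = ious.max(axis=0).reshape(noobj_mask_i.shape)       # best IoU of each prediction
    out = noobj_mask_i.copy()
    out[best > thresh] = 0                                    # ignored: neither positive nor negative
    return out


# One anchor on a 2x2 grid: only the top-left prediction matches the ground truth
pred = np.array([[[[0.5, 0.5, 1, 1], [1.5, 0.5, 1, 1]],
                  [[0.5, 1.5, 1, 1], [1.5, 1.5, 1, 1]]]])    # (1, 2, 2, 4)
gt = np.array([[0.5, 0.5, 1, 1]])
print(ignore_negatives(np.ones((1, 2, 2)), pred, gt, thresh=0.7))
```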
def get_ignore(self, prediction, targets, scaled_anchors, in_w, in_h, noobj_mask):
    bs = len(targets)
    anchor_index = [[0, 1, 2], [3, 4, 5], [6, 7, 8]][self.feature_length.index(in_w)]
    # Pick this head's three anchors by index
    scaled_anchors = np.array(scaled_anchors)[anchor_index]
    # Decode the predictions, with the same goal as the head decoding shown earlier
    x = torch.sigmoid(prediction[..., 0])
    y = torch.sigmoid(prediction[..., 1])
    w = prediction[..., 2]
    h = prediction[..., 3]
    FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
    LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
    # Grid index arrays: grid_x holds each cell's column index, grid_y its row index
    grid_x = torch.linspace(0, in_w - 1, in_w).repeat(in_h, 1).repeat(
        int(bs * self.num_anchors / 3), 1, 1).view(x.shape).type(FloatTensor)
    grid_y = torch.linspace(0, in_h - 1, in_h).repeat(in_w, 1).t().repeat(
        int(bs * self.num_anchors / 3), 1, 1).view(y.shape).type(FloatTensor)
    # Anchor width and height arrays
    anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
    anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
    anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_w * in_h).view(w.shape)
    anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_w * in_h).view(h.shape)
    # Adjusted anchor centers and sizes, i.e. the predicted boxes
    pred_boxes = FloatTensor(prediction[..., :4].shape)
    pred_boxes[..., 0] = x + grid_x
    pred_boxes[..., 1] = y + grid_y
    pred_boxes[..., 2] = torch.exp(w) * anchor_w
    pred_boxes[..., 3] = torch.exp(h) * anchor_h
    # Filter the negatives
    for i in range(bs):
        pred_boxes_for_ignore = pred_boxes[i]
        pred_boxes_for_ignore = pred_boxes_for_ignore.view(-1, 4)
        if len(targets[i]) > 0:
            gx = targets[i][:, 0:1] * in_w
            gy = targets[i][:, 1:2] * in_h
            gw = targets[i][:, 2:3] * in_w
            gh = targets[i][:, 3:4] * in_h
            gt_box = torch.FloatTensor(np.concatenate([gx, gy, gw, gh], axis=-1)).type(FloatTensor)
            # IoU between every ground-truth box and every predicted box of image i
            anch_ious = iou(gt_box, pred_boxes_for_ignore)
            # Ignore (set to 0) the negatives whose IoU exceeds the threshold
            for t in range(targets[i].shape[0]):
                anch_iou = anch_ious[t].view(pred_boxes[i].size()[:3])
                noobj_mask[i][anch_iou > self.ignore_thresh] = 0
    return noobj_mask, pred_boxes
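The decoding step can also be written with broadcasting instead of linspace(...).repeat(...).view(...). In this NumPy sketch the grid offsets and anchor sizes are reshaped so they broadcast against (bs, A, in_h, in_w), which keeps the row and column roles straight even when in_h != in_w. The function name decode and the example sizes are assumptions; the formulas (sigmoid offset plus cell index for the center, exp times the anchor for the size) follow the code above.

```python
import numpy as np


def decode(prediction, anchors):
    """Hypothetical helper: raw head output (bs, A, in_h, in_w, >=4) -> boxes (x, y, w, h) in grid units.

    anchors: (A, 2) widths/heights already scaled to this head's grid.
    """
    _, _, in_h, in_w = prediction.shape[:4]
    grid_x = np.arange(in_w).reshape(1, 1, 1, in_w)   # column index, same for every row
    grid_y = np.arange(in_h).reshape(1, 1, in_h, 1)   # row index, same for every column
    anchors = np.asarray(anchors, dtype=np.float64)
    anchor_w = anchors[:, 0].reshape(1, -1, 1, 1)     # one width per anchor, broadcast over the grid
    anchor_h = anchors[:, 1].reshape(1, -1, 1, 1)

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    boxes = np.empty(prediction.shape[:-1] + (4,))
    boxes[..., 0] = sigmoid(prediction[..., 0]) + grid_x
    boxes[..., 1] = sigmoid(prediction[..., 1]) + grid_y
    boxes[..., 2] = np.exp(prediction[..., 2]) * anchor_w
    boxes[..., 3] = np.exp(prediction[..., 3]) * anchor_h
    return boxes


# All-zero logits on a non-square 2x3 grid: centers land at cell index + 0.5, sizes equal the anchors
pred = np.zeros((1, 3, 2, 3, 6))
boxes = decode(pred, [[1, 2], [3, 4], [5, 6]])
print(boxes.shape)                    # (1, 3, 2, 3, 4)
print(boxes[0, 2, 1, 2].tolist())     # [2.5, 1.5, 5.0, 6.0]
```

Broadcasting avoids materializing the repeated grid and anchor tensors; the decoded boxes are identical to the repeat/view construction.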