While writing the bad-case visualization I needed the confusion matrix, so I had to understand exactly how it is built, which meant another round of debugging...
When debugging, it helps to set the batch size to 1 so you can step through one image at a time.
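For reference, here is a minimal way to run validation with batch size 1 so that each call to process_batch sees a single image. This is only a sketch assuming the standard ultralytics Python API; the weights file and dataset name are placeholders for your own setup:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")                                   # placeholder weights
metrics = model.val(data="coco128.yaml", batch=1, conf=0.5)  # batch=1 -> one image per process_batch call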
Code walkthrough
First, the complete code:
def process_batch(self, detections, labels):
    """
    Update confusion matrix for object detection task.

    Args:
        detections (Array[N, 6]): Detected bounding boxes and their associated information.
            Each row should contain (x1, y1, x2, y2, conf, class).
        labels (Array[M, 5]): Ground truth bounding boxes and their associated class labels.
            Each row should contain (class, x1, y1, x2, y2).
    """
    if labels.size(0) == 0:  # no ground-truth labels for this image
        if detections is not None:  # no labels, but there are predictions
            detections = detections[detections[:, 4] > self.conf]  # keep predictions above the confidence threshold
            detection_classes = detections[:, 5].int()
            for dc in detection_classes:
                self.matrix[dc, self.nc] += 1  # counted as FP
        return
    if detections is None:  # labels exist, but there are no predictions
        gt_classes = labels.int()
        for gc in gt_classes:
            self.matrix[self.nc, gc] += 1  # counted as FN
        return

    detections = detections[detections[:, 4] > self.conf]  # keep predictions above the confidence threshold
    gt_classes = labels[:, 0].int()
    detection_classes = detections[:, 5].int()
    iou = box_iou(labels[:, 1:], detections[:, :4])

    # indices where IoU exceeds self.iou_thres: two 1-D tensors holding the
    # row (ground-truth) and column (prediction) indices of the IoU matrix
    x = torch.where(iou > self.iou_thres)
    if x[0].shape[0]:
        matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()
        if x[0].shape[0] > 1:
            matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
            matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
    else:
        matches = np.zeros((0, 3))

    n = matches.shape[0] > 0
    m0, m1, _ = matches.transpose().astype(int)
    for i, gc in enumerate(gt_classes):
        j = m0 == i
        if n and sum(j) == 1:
            self.matrix[detection_classes[m1[j]], gc] += 1  # correct
        else:
            self.matrix[self.nc, gc] += 1  # true background
    if n:
        for i, dc in enumerate(detection_classes):
            if not any(m1 == i):
                self.matrix[dc, self.nc] += 1  # predicted background
Debugging process
You can go up one frame in the call stack and visualize the ground truth and the predictions.
Run the visualization script below to compare the ground-truth boxes with the predicted ones:
import cv2
import numpy as np
from torchvision.transforms import ToPILImage

# recover the input image and labels from the batch (batch size 1)
img_tensor = batch['img'].cpu().squeeze(0)
gt_cls = batch['cls'].cpu().squeeze(-1).numpy()
gt_bboxes = batch['bboxes'].cpu().numpy()

to_pil = ToPILImage()
ori_img = to_pil(img_tensor)

# draw the ground-truth boxes (normalized xywh -> pixel xyxy)
gt_img = np.copy(np.asarray(ori_img))
gt_color = (0, 0, 255)
for i in range(len(gt_cls)):
    x, y, w, h = gt_bboxes[i]
    x, y, w, h = x * gt_img.shape[1], y * gt_img.shape[0], w * gt_img.shape[1], h * gt_img.shape[0]
    x1, y1, x2, y2 = x - w / 2, y - h / 2, x + w / 2, y + h / 2
    label = self.names[int(gt_cls[i])]
    cv2.rectangle(gt_img, (int(x1), int(y1)), (int(x2), int(y2)), gt_color, 2)
    cv2.putText(gt_img, label, (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, gt_color, 2)

# draw the predicted boxes (already in pixel xyxy)
pred_img = np.copy(np.asarray(ori_img))
pred_color = (0, 0, 255)
pred_cls = preds[0][:, 5]
pred_bboxes = preds[0][:, :4]
pred_conf = preds[0][:, 4]
for i in range(len(pred_cls)):
    x1, y1, x2, y2 = pred_bboxes[i]
    label = self.names[int(pred_cls[i])] + " {:.2f}".format(float(pred_conf[i]))
    cv2.rectangle(pred_img, (int(x1), int(y1)), (int(x2), int(y2)), pred_color, 2)
    cv2.putText(pred_img, label, (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, pred_color, 2)

combined_img = cv2.hconcat([gt_img, pred_img])
cv2.imwrite(r'D:\360MoveData\Users\UNICORN\Desktop\ultralytics-main\ultralytics\models\yolo\detect\confusion_debug.jpg', combined_img)
Now go back to the first stack frame and print the incoming arguments.
Since neither detections nor labels is None, both early-return branches are skipped and we reach the code below:
detections = detections[detections[:, 4] > self.conf]
Because conf is set to 0.5, this filter leaves detections unchanged. Next come the ground-truth classes, the predicted classes, and the IoU between every ground-truth box and every predicted box:
gt_classes = labels[:, 0].int()
detection_classes = detections[:, 5].int()
iou = box_iou(labels[:, 1:], detections[:, :4])
Then find the index pairs whose IoU exceeds the threshold:
x = torch.where(iou > self.iou_thres)  # index pairs whose IoU exceeds the threshold
With iou_thres=0.45, x here is a pair of 1-D tensors holding the row and column indices, so for this batch it returns the two index pairs (0, 0) and (1, 1). Next we check whether at least one match exists, i.e. whether x[0] has non-zero length (a toy example follows right after the code):
if x[0].shape[0]:
    matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()
    if x[0].shape[0] > 1:
        matches = matches[matches[:, 2].argsort()[::-1]]
        matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
        matches = matches[matches[:, 2].argsort()[::-1]]
        matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
else:
    matches = np.zeros((0, 3))
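Before walking through the matches statements one by one, here is a toy illustration of what x and matches look like at this point. The IoU values are made up and are not the ones from this batch:

import torch

# toy IoU matrix: 2 ground-truth boxes (rows) x 3 predictions (columns)
iou = torch.tensor([[0.60, 0.10, 0.05],
                    [0.20, 0.55, 0.30]])
x = torch.where(iou > 0.45)
print(x)  # (tensor([0, 1]), tensor([0, 1])) -> matched pairs (0, 0) and (1, 1)

matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()
print(matches)  # [[0.   0.   0.6 ]
                #  [1.   1.   0.55]]  each row: (gt index, pred index, IoU)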
Let's go through these matches assignments one at a time, starting with what I'll call the 0th:
matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()
- Purpose: build an array of matches, where each row holds the matched row index (ground-truth index), the column index (prediction index), and the corresponding IoU value.
- Logic:
  - torch.stack(x, 1): stacks the two 1-D index tensors into a 2-D tensor; each row is one match's (row index, column index).
  - iou[x[0], x[1]][:, None]: gathers the IoU values of those matches and reshapes them into a single column.
  - torch.cat(..., 1): concatenates the two tensors along the second (column) dimension.
Next comes the deduplication: if there is more than one match, the following four statements are applied:
if x[0].shape[0] > 1:
    matches = matches[matches[:, 2].argsort()[::-1]]
    matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
    matches = matches[matches[:, 2].argsort()[::-1]]
    matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
The 1st matches:
matches = matches[matches[:, 2].argsort()[::-1]]
- Purpose: sort the matches by their IoU values.
- Logic:
  - matches[:, 2]: the IoU values.
  - .argsort(): indices that would sort the IoU values in ascending order.
  - [::-1]: reverse those indices, so the IoU values end up in descending order.
  - matches[...]: reorder the matches array with the sorted indices (a small sketch follows below).
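In isolation, this descending sort works like this. A tiny sketch with made-up values:

import numpy as np

matches = np.array([[0, 1, 0.6],
                    [1, 2, 0.9],
                    [2, 3, 0.75]])
order = matches[:, 2].argsort()[::-1]  # indices that order the IoU column from high to low
print(order)                 # [1 2 0]
print(matches[order][:, 2])  # [0.9 0.75 0.6]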
The 2nd matches:
matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
- Purpose: remove duplicate rows that share the same prediction index.
- Logic:
  - matches[:, 1]: the second column of matches, i.e. the prediction indices.
  - np.unique(matches[:, 1], return_index=True): the unique prediction indices together with the position of each one's first occurrence in the array.
  - np.unique(matches[:, 1], return_index=True)[1]: just those first-occurrence positions.
  - matches[np.unique(matches[:, 1], return_index=True)[1]]: index with those positions, keeping exactly one row per prediction index. Because the array was just sorted by IoU in descending order, the row that survives is the highest-IoU match for that prediction (a small sketch follows below).
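And np.unique with return_index=True in isolation, on a made-up prediction-index column:

import numpy as np

pred_idx = np.array([6, 7, 1, 5, 3, 5])  # column 1 of matches after the descending sort
vals, first_pos = np.unique(pred_idx, return_index=True)
print(vals)       # [1 3 5 6 7]  unique prediction indices (sorted)
print(first_pos)  # [2 4 3 0 1]  position of each index's first occurrence, i.e. its highest-IoU row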
The 3rd and 4th matches statements simply repeat steps 1 and 2, but deduplicate on the ground-truth index (column 0) instead of the prediction index. To make the four steps concrete, here is a fuller worked example:
# Suppose we start with the following matches array
matches = np.array([
[0, 1, 0.8],
[1, 2, 0.6],
[1, 3, 0.7],
[2, 4, 0.5],
[2, 5, 0.48],
[3, 5, 0.6],
[3, 6, 0.9],
[3, 7, 0.85],
[3, 8, 0.75]
])
# Step 1: matches = matches[matches[:, 2].argsort()[::-1]]  sort by IoU from high to low, so the high-IoU matches are kept first
matches
array([[ 3, 6, 0.9],
[ 3, 7, 0.85],
[ 0, 1, 0.8],
[ 3, 8, 0.75],
[ 1, 3, 0.7],
[ 3, 5, 0.6],
[ 1, 2, 0.6],
[ 2, 4, 0.5],
[ 2, 5, 0.48]])
# Step 2: matches = matches[np.unique(matches[:, 1], return_index=True)[1]]  remove duplicate prediction indices, keeping the highest-IoU one
matches
array([[ 0, 1, 0.8],
[ 1, 2, 0.6],
[ 1, 3, 0.7],
[ 2, 4, 0.5],
[ 3, 5, 0.6],
[ 3, 6, 0.9],
[ 3, 7, 0.85],
[ 3, 8, 0.75]])
# Step 3: matches = matches[matches[:, 2].argsort()[::-1]]
matches
array([[ 3, 6, 0.9],
[ 3, 7, 0.85],
[ 0, 1, 0.8],
[ 3, 8, 0.75],
[ 1, 3, 0.7],
[ 3, 5, 0.6],
[ 1, 2, 0.6],
[ 2, 4, 0.5]])
# Step 4: matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
matches
array([[ 0, 1, 0.8],
[ 1, 3, 0.7],
[ 2, 4, 0.5],
[ 3, 6, 0.9]])
This guarantees that each ground-truth box and each prediction is associated with at most one match. Next, n records whether the matches array is non-empty, and matches is transposed and cast to int: m0 holds the ground-truth indices and m1 holds the prediction indices:
n = matches.shape[0] > 0
m0, m1, _ = matches.transpose().astype(int)
Finally, the match information in matches is used to fill in the confusion matrix, which is where the per-class TP, FP and FN counts come from (a small toy run is shown right after the code):
for i, gc in enumerate(gt_classes):
    j = m0 == i
    if n and sum(j) == 1:
        self.matrix[detection_classes[m1[j]], gc] += 1  # "correct": not necessarily a TP; it is a TP only when detection_classes[m1[j]] == gc
    else:
        self.matrix[self.nc, gc] += 1  # true background: i never appears in m0, so this ground-truth box is counted as an FN
if n:
    for i, dc in enumerate(detection_classes):
        if not any(m1 == i):
            self.matrix[dc, self.nc] += 1  # predicted background: a prediction with no matched ground truth is counted as an FP
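To see how these two loops fill the matrix, here is a self-contained toy run with hypothetical classes and matches. With nc = 2 the matrix is 3x3; the extra row/column stands for background:

import numpy as np

nc = 2
matrix = np.zeros((nc + 1, nc + 1), dtype=int)  # rows: predicted class (+background), cols: true class (+background)

gt_classes = np.array([0, 1, 1])      # three ground-truth boxes
detection_classes = np.array([0, 0])  # two predictions above the confidence threshold
m0, m1 = np.array([0, 1]), np.array([0, 1])  # surviving (gt_idx, pred_idx) pairs after deduplication

n = m0.shape[0] > 0
for i, gc in enumerate(gt_classes):
    j = m0 == i
    if n and j.sum() == 1:
        matrix[detection_classes[m1[j]][0], gc] += 1  # matched; a TP only if the predicted class equals gc
    else:
        matrix[nc, gc] += 1                           # unmatched ground truth -> FN (background row)
if n:
    for i, dc in enumerate(detection_classes):
        if not (m1 == i).any():
            matrix[dc, nc] += 1                       # unmatched prediction -> FP (background column)

print(matrix)
# [[1 1 0]   <- GT 0 predicted as class 0 (TP); GT 1 (index 1) predicted as class 0 (misclassification)
#  [0 0 0]
#  [0 1 0]]  <- GT 1 (index 2) had no match, so it lands in the background row (FN)

Note how the misclassified pair ends up at matrix[0, 1], off the diagonal: this is exactly why the "correct" comment in the source does not automatically mean TP.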
Note:
The conf parameter is the one we pass in (the same as the NMS conf), but iou_thres is not the NMS IoU that we set:
self.confusion_matrix = ConfusionMatrix(nc=self.nc, conf=self.args.conf)  # at initialization, only our conf is passed in
class ConfusionMatrix:
    def __init__(self, nc, conf=0.25, iou_thres=0.45, task='detect'):
        """Initialize attributes for the YOLO model."""
        self.task = task
        self.matrix = np.zeros((nc + 1, nc + 1)) if self.task == 'detect' else np.zeros((nc, nc))
        self.nc = nc  # number of classes
        self.conf = 0.25 if conf in (None, 0.001) else conf  # apply 0.25 if default val conf is passed
        self.iou_thres = iou_thres
        # iou_thres is NOT the IoU we set for NMS (that one is prediction-vs-prediction IoU);
        # here it is the prediction-vs-ground-truth IoU used for matching, 0.45 by default.
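So if you wanted the ground-truth matching threshold to differ from the 0.45 default, you would have to pass iou_thres explicitly when the matrix is constructed. A hypothetical sketch, not what the validator does out of the box:

self.confusion_matrix = ConfusionMatrix(nc=self.nc, conf=self.args.conf, iou_thres=0.5)  # hypothetical: override the default 0.45 matching IoU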