Faster R-CNN Loss Functions (3): Understanding How rpn_loss_cls & rpn_loss_box Are Computed

These notes come from reading rpn.py in the PyTorch implementation of Faster R-CNN.

As we know, the RPN sets up its loss computation by building labels and target shifts (the box regression targets). So how exactly are they constructed?


First, the computation of rpn_loss_cls:

The first thing that should come to mind is: rpn_loss_cls = F.cross_entropy(rpn_cls_score, rpn_label)

Dimension analysis

cross_entropy expects Variable inputs, with 2D predictions and 1D labels.

So we can derive the shapes starting from the default initial data format (b, 2*9, h, w).

rpn_cls_score: (b, 2*9, h, w) -> (b*9*h*w, 2)   # binary classification
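As a quick sanity check, here is a minimal sketch of that reshape chain on a toy tensor (the sizes and variable names here are illustrative, not taken from the original code):

```python
import torch

# illustrative sizes: batch 2, 9 anchors per location, a 4x5 feature map
b, A, h, w = 2, 9, 4, 5
rpn_cls_score = torch.randn(b, 2 * A, h, w)        # raw RPN cls output: (b, 2*9, h, w)

scores_2d = (rpn_cls_score
             .view(b, 2, A * h, w)                 # the "reshape layer": (b, 2, 9*h, w)
             .permute(0, 2, 3, 1).contiguous()     # (b, 9*h, w, 2)
             .view(-1, 2))                         # (b*9*h*w, 2): one row of 2 scores per anchor

print(scores_2d.shape)                             # torch.Size([360, 2]) == (2*9*4*5, 2)
```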

Then the anchors we are not interested in are removed:

rpn_cls_score: (b*9*h*w minus the anchors labeled -1, 2)    # binary classification

rpn_label: (b*9*h*w minus the anchors labeled -1,)

# returns outputs: [label, target, inside_weight, outside_weight]
rpn_data = self.RPN_anchor_target((rpn_cls_score.data, gt_boxes, im_info, num_boxes))
rpn_cls_score = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous().view(batch_size, -1, 2)  # (b, 9*h*w, 2)
rpn_label = rpn_data[0].view(batch_size, -1)  # (b, 1, 9*h, w) -> (b, 9*h*w)

Analysis of the data itself:

The labels take the values 1, 0, and -1.

The first step is to remove the -1 entries, i.e. the anchors we are not interested in:

# use ne() to drop the -1 labels and get back the indices of the remaining anchors
rpn_keep = Variable(rpn_label.view(-1).ne(-1).nonzero().view(-1))  # nonzero returns an (n, 1) tensor, so view(-1) flattens it to 1D
rpn_cls_score = torch.index_select(rpn_cls_score.view(-1,2), 0, rpn_keep)  # pick rows of rpn_cls_score (b*9*h*w, 2) along dim 0 using the rpn_keep indices
rpn_label = torch.index_select(rpn_label.view(-1), 0, rpn_keep.data)  # rpn_data above is a plain Tensor, not a Variable
rpn_label = Variable(rpn_label.long())  # wrap the result back into a Variable with Variable(tensor.long())

Note:

rpn_cls_score is a Variable, while rpn_label starts out as a plain Tensor.

Because anchor_target_layer and proposal_layer.py do not need backpropagation, understanding their inputs and outputs is straightforward: they generate rpn_label and friends by applying selection rules, and they do no computation on what they select, so no gradients are required. Their forward inputs are therefore plain Tensors: pass Variables in as variable.data, and wrap the outputs back afterwards with Variable(tensor.long()).
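A minimal sketch of that Tensor/Variable round trip, using the old 0.x-era Variable API the post is written against (the dummy label tensor is just a stand-in for what the target layer would return):

```python
import torch
from torch.autograd import Variable

rpn_cls_score = Variable(torch.randn(2, 18, 4, 5))         # network output: a Variable that needs gradients

# the target layer works on plain Tensors, so it receives rpn_cls_score.data, not the Variable
scores_tensor = rpn_cls_score.data

# stand-in for the labels the (non-differentiable) layer produces: values in {-1, 0, 1}
dummy_labels = torch.randint(-1, 2, (2 * 9 * 4 * 5,)).float()

# wrap the output back into a Variable before it enters the loss
rpn_label = Variable(dummy_labels.long())
print(scores_tensor.shape, rpn_label.shape)
```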

================================================

Analysis:

1. rpn_label.view(-1).ne(-1).nonzero().view(-1)

ne(-1) returns 0 where the value equals -1 and 1 where it does not.

nonzero returns the indices of the nonzero entries, as an (n, 1) tensor (n rows, 1 column).

Putting these together, we get all the indices whose label is not -1, flattened into a 1D array of shape (n,).

2. torch.index_select(rpn_cls_score.view(-1,2), 0, rpn_keep)  # select from rpn_cls_score.view(-1, 2) along dim 0 using the rpn_keep indices; see the toy example below
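A toy example of these two steps on hand-written values (tiny shapes so the result is easy to verify by eye):

```python
import torch

rpn_label = torch.tensor([1., -1., 0., 1., -1.])              # 1 = foreground, 0 = background, -1 = ignore
rpn_cls_score = torch.arange(10.).view(5, 2)                  # fake (n, 2) scores, one row per anchor

rpn_keep = rpn_label.view(-1).ne(-1).nonzero().view(-1)       # indices of the anchors we keep
print(rpn_keep)                                               # tensor([0, 2, 3])

kept_scores = torch.index_select(rpn_cls_score, 0, rpn_keep)  # rows 0, 2, 3 of the scores -> shape (3, 2)
kept_labels = torch.index_select(rpn_label, 0, rpn_keep).long()
print(kept_scores.shape, kept_labels)                         # torch.Size([3, 2]) tensor([1, 0, 1])
```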

==================================================

This gives rpn_loss_cls:

self.rpn_loss_cls = F.cross_entropy(rpn_cls_score, rpn_label)  # shapes after filtering: (n, 2) and (n,)

As for rpn_loss_box, the details are much the same; understanding the idea, and the inputs and outputs of rpn_loss_box, is enough.
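For completeness, here is a hedged sketch of the weighted smooth L1 loss that rpn_loss_box is typically built from in PyTorch Faster R-CNN ports (the function name, sigma value, and toy shapes below are illustrative; check the _smooth_l1_loss helper in your own copy of rpn.py for the exact version):

```python
import torch

def smooth_l1_loss(bbox_pred, bbox_targets, inside_w, outside_w, sigma=3.0):
    """Weighted smooth L1: inside_w masks out non-foreground anchors,
    outside_w normalizes by the number of examples."""
    sigma2 = sigma ** 2
    diff = inside_w * (bbox_pred - bbox_targets)
    abs_diff = torch.abs(diff)
    # quadratic branch for |x| < 1/sigma^2, linear branch elsewhere
    flag = (abs_diff < 1.0 / sigma2).float()
    loss = flag * 0.5 * sigma2 * diff ** 2 + (1.0 - flag) * (abs_diff - 0.5 / sigma2)
    return (outside_w * loss).sum(dim=(1, 2, 3)).mean()

# illustrative shapes matching the RPN layout: (b, 4*9, h, w)
b, h, w = 2, 4, 5
pred    = torch.randn(b, 36, h, w)
targets = torch.randn(b, 36, h, w)
inside  = torch.ones_like(pred)                    # in the real code these two weight tensors
outside = torch.full_like(pred, 1.0 / (9 * h * w)) # come from RPN_anchor_target
print(smooth_l1_loss(pred, targets, inside, outside))
```

The inside and outside weights are exactly the last two items of the rpn_data output shown earlier, which is why the discussion above only tracks label and target.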

To change Faster R-CNN's smooth L1 loss into an IoU loss, you can try an implementation like the following:

```python
import torch

def iou_loss(pred_bbox, gt_bbox, eps=1e-6):
    """
    Compute IoU loss between predicted bboxes and ground truth bboxes.

    Args:
        pred_bbox: predicted bboxes, shape [N, 4]
        gt_bbox: ground truth bboxes, shape [N, 4]
        eps: epsilon to avoid divide by zero

    Returns:
        iou_loss: IoU loss between predicted bboxes and ground truth bboxes, shape [N]
    """
    # compute IoU
    x1 = torch.max(pred_bbox[:, 0], gt_bbox[:, 0])
    y1 = torch.max(pred_bbox[:, 1], gt_bbox[:, 1])
    x2 = torch.min(pred_bbox[:, 2], gt_bbox[:, 2])
    y2 = torch.min(pred_bbox[:, 3], gt_bbox[:, 3])
    w = torch.clamp(x2 - x1, min=0)
    h = torch.clamp(y2 - y1, min=0)
    inter = w * h
    a1 = (pred_bbox[:, 2] - pred_bbox[:, 0]) * (pred_bbox[:, 3] - pred_bbox[:, 1])
    a2 = (gt_bbox[:, 2] - gt_bbox[:, 0]) * (gt_bbox[:, 3] - gt_bbox[:, 1])
    union = a1 + a2 - inter
    iou = inter / (union + eps)

    # compute IoU loss
    threshold = 0.5
    iou_loss = torch.pow(iou - threshold, 2)

    return iou_loss


# example usage
pred_bbox = torch.tensor([[2.0, 3.0, 5.0, 6.0], [1.0, 2.0, 4.0, 5.0]])
gt_bbox = torch.tensor([[1.0, 2.0, 4.0, 5.0], [2.0, 3.0, 5.0, 6.0]])
loss = iou_loss(pred_bbox, gt_bbox)
print(loss)
```

Then replace Faster R-CNN's smooth L1 loss with the IoU loss, as shown below:

```python
import torch
import torch.nn as nn

def iou_loss(pred_bbox, gt_bbox, eps=1e-6):
    """
    Compute IoU loss between predicted bboxes and ground truth bboxes.

    Args:
        pred_bbox: predicted bboxes, shape [N, 4]
        gt_bbox: ground truth bboxes, shape [N, 4]
        eps: epsilon to avoid divide by zero

    Returns:
        iou_loss: mean IoU loss over all boxes (scalar)
    """
    # compute IoU
    x1 = torch.max(pred_bbox[:, 0], gt_bbox[:, 0])
    y1 = torch.max(pred_bbox[:, 1], gt_bbox[:, 1])
    x2 = torch.min(pred_bbox[:, 2], gt_bbox[:, 2])
    y2 = torch.min(pred_bbox[:, 3], gt_bbox[:, 3])
    w = torch.clamp(x2 - x1, min=0)
    h = torch.clamp(y2 - y1, min=0)
    inter = w * h
    a1 = (pred_bbox[:, 2] - pred_bbox[:, 0]) * (pred_bbox[:, 3] - pred_bbox[:, 1])
    a2 = (gt_bbox[:, 2] - gt_bbox[:, 0]) * (gt_bbox[:, 3] - gt_bbox[:, 1])
    union = a1 + a2 - inter
    iou = inter / (union + eps)

    # compute IoU loss and reduce to a scalar
    threshold = 0.5
    iou_loss = torch.pow(iou - threshold, 2)

    return iou_loss.mean()


class FasterRCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes
        self.backbone = ...
        self.rpn = ...
        self.roi_head = ...
        self.bbox_head = nn.Linear(4096, 4 * self.num_classes)
        self.cls_head = nn.Linear(4096, self.num_classes)

    def forward(self, x, gt_bbox=None):
        # backbone
        x = self.backbone(x)

        # RPN
        rpn_cls, rpn_bbox = self.rpn(x)

        # RoI pooling
        rois = self.roi_head(x, rpn_bbox)

        # bbox regression
        bbox_pred = self.bbox_head(rois)
        bbox_pred = bbox_pred.reshape(-1, 4)

        # classification
        cls_score = self.cls_head(rois)
        cls_score = cls_score.reshape(-1, self.num_classes)
        cls_prob = nn.functional.softmax(cls_score, dim=1)

        # test or train
        if self.training:
            # compute loss
            rpn_loss, roi_loss = ...
            bbox_loss = iou_loss(bbox_pred, gt_bbox)
            cls_loss = ...
            total_loss = rpn_loss + roi_loss + bbox_loss + cls_loss
            return total_loss
        else:
            # inference
            result = ...
            return result
```

Note that an IoU loss can run into exploding or vanishing gradients, so some extra handling is needed, such as a progressive training strategy or an added regularization term.