(Part 5) A walkthrough of the MMDetection3D implementation of PointPillars — loss computation
PointPillars is a model that came out of industry. Its core idea is borrowed from image-based detection frameworks: the point cloud is divided, from the bird's-eye view, into vertical columns (pillars), which are encoded into pseudo-image data; a 2D detection framework then performs feature extraction and predicts the detection boxes. This design lets the model strike a very good balance between speed and accuracy. The network architecture of PointPillars is shown in the figure.
This article uses the MMDetection3D codebase to explain the PointPillars implementation line by line, covering both what each line does and why. This is my first code walkthrough, so shortcomings are inevitable; criticism and corrections are welcome, and any suggestions can be left in the comments. Thank you!
Continuing from the previous article in this series, which covered the network architecture of PointPillars, we now move on to its loss computation.
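Before diving into the loss code, the pillars-to-pseudo-image idea mentioned above can be illustrated with a minimal sketch. The dimensions below are toy values chosen purely for illustration (not the real KITTI configuration): per-pillar feature vectors are scattered back onto a dense BEV canvas, which is the "pseudo image" the 2D backbone then consumes.

```python
import torch

# Toy sizes for illustration: 4 non-empty pillars, 64-channel pillar
# features, and a 5x5 BEV grid (the real config is much larger).
num_pillars, channels, H, W = 4, 64, 5, 5
pillar_features = torch.randn(num_pillars, channels)
# (row, col) grid coordinate of each non-empty pillar
coords = torch.tensor([[0, 1], [2, 2], [3, 0], [4, 4]])

# Scatter the pillar features onto a dense BEV canvas; empty cells stay zero.
canvas = torch.zeros(channels, H * W)
indices = coords[:, 0] * W + coords[:, 1]
canvas[:, indices] = pillar_features.t()
canvas = canvas.view(channels, H, W)  # the (C, H, W) pseudo image
```

The 2D detection head discussed below never sees the raw points; it only sees this dense canvas.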
Chapter 5: Loss computation in PointPillars
5.1 Starting from Anchor3DHead's loss_by_feat() function:
The implementation is in mmdet3d/models/dense_heads/anchor3d_head.py in the MMDetection3D repository
# Copyright (c) OpenMMLab. All rights reserved.
import warnings
from typing import List, Tuple
import numpy as np
import torch
from mmdet.models.utils import multi_apply
from torch import Tensor
from torch import nn as nn
from mmdet3d.models.task_modules import PseudoSampler
from mmdet3d.models.test_time_augs import merge_aug_bboxes_3d
from mmdet3d.registry import MODELS, TASK_UTILS
from mmdet3d.utils.typing_utils import (ConfigType, InstanceList,
OptConfigType, OptInstanceList)
from .base_3d_dense_head import Base3DDenseHead
from .train_mixins import AnchorTrainMixin
@MODELS.register_module()
class Anchor3DHead(Base3DDenseHead, AnchorTrainMixin):
"""Anchor-based head for SECOND/PointPillars/MVXNet/PartA2.
Args:
num_classes (int): Number of classes.
in_channels (int): Number of channels in the input feature map.
feat_channels (int): Number of channels of the feature map.
use_direction_classifier (bool): Whether to add a direction classifier.
anchor_generator(dict): Config dict of anchor generator.
assigner_per_size (bool): Whether to do assignment for each separate
anchor size.
assign_per_class (bool): Whether to do assignment for each class.
diff_rad_by_sin (bool): Whether to change the difference into sin
difference for box regression loss.
dir_offset (float | int): The offset of BEV rotation angles.
(TODO: may be moved into box coder)
dir_limit_offset (float | int): The limited range of BEV
rotation angles. (TODO: may be moved into box coder)
bbox_coder (dict): Config dict of box coders.
loss_cls (dict): Config of classification loss.
loss_bbox (dict): Config of localization loss.
loss_dir (dict): Config of direction classifier loss.
train_cfg (dict): Train configs.
test_cfg (dict): Test configs.
init_cfg (dict or list[dict], optional): Initialization config dict.
"""
def __init__(self,
num_classes: int,
in_channels: int,
feat_channels: int = 256,
use_direction_classifier: bool = True,
anchor_generator: ConfigType = dict(
type='Anchor3DRangeGenerator',
range=[0, -39.68, -1.78, 69.12, 39.68, -1.78],
strides=[2],
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.57],
custom_values=[],
reshape_out=False),
assigner_per_size: bool = False,
assign_per_class: bool = False,
diff_rad_by_sin: bool = True,
dir_offset: float = -np.pi / 2,
dir_limit_offset: int = 0,
bbox_coder: ConfigType = dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls: ConfigType = dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
loss_weight=1.0),
loss_bbox: ConfigType = dict(
type='mmdet.SmoothL1Loss',
beta=1.0 / 9.0,
loss_weight=2.0),
loss_dir: ConfigType = dict(
type='mmdet.CrossEntropyLoss', loss_weight=0.2),
train_cfg: OptConfigType = None,
test_cfg: OptConfigType = None,
init_cfg: OptConfigType = None) -> None:
super().__init__(init_cfg=init_cfg)
self.in_channels = in_channels
self.num_classes = num_classes
self.feat_channels = feat_channels
self.diff_rad_by_sin = diff_rad_by_sin
self.use_direction_classifier = use_direction_classifier
self.train_cfg = train_cfg
self.test_cfg = test_cfg
self.assigner_per_size = assigner_per_size
self.assign_per_class = assign_per_class
self.dir_offset = dir_offset
self.dir_limit_offset = dir_limit_offset
warnings.warn(
'dir_offset and dir_limit_offset will be depressed and be '
'incorporated into box coder in the future')
self.fp16_enabled = False
# build anchor generator
self.prior_generator = TASK_UTILS.build(anchor_generator)
# In 3D detection, the anchor stride is connected with anchor size
self.num_anchors = self.prior_generator.num_base_anchors
# build box coder
self.bbox_coder = TASK_UTILS.build(bbox_coder)
self.box_code_size = self.bbox_coder.code_size
# build loss function
self.use_sigmoid_cls = loss_cls.get('use_sigmoid', False)
self.sampling = loss_cls['type'] not in [
'mmdet.FocalLoss', 'mmdet.GHMC'
]
if not self.use_sigmoid_cls:
self.num_classes += 1
self.loss_cls = MODELS.build(loss_cls)
self.loss_bbox = MODELS.build(loss_bbox)
self.loss_dir = MODELS.build(loss_dir)
self.fp16_enabled = False
self._init_layers()
self._init_assigner_sampler()
if init_cfg is None:
self.init_cfg = dict(
type='Normal',
layer='Conv2d',
std=0.01,
override=dict(
type='Normal', name='conv_cls', std=0.01, bias_prob=0.01))
def _init_assigner_sampler(self):
"""Initialize the target assigner and sampler of the head."""
if self.train_cfg is None:
return
if self.sampling:
self.bbox_sampler = TASK_UTILS.build(self.train_cfg.sampler)
else:
self.bbox_sampler = PseudoSampler()
if isinstance(self.train_cfg.assigner, dict):
self.bbox_assigner = TASK_UTILS.build(self.train_cfg.assigner)
elif isinstance(self.train_cfg.assigner, list):
self.bbox_assigner = [
TASK_UTILS.build(res) for res in self.train_cfg.assigner
]
def _init_layers(self):
"""Initialize neural network layers of the head."""
self.cls_out_channels = self.num_anchors * self.num_classes
self.conv_cls = nn.Conv2d(self.feat_channels, self.cls_out_channels, 1)
self.conv_reg = nn.Conv2d(self.feat_channels,
self.num_anchors * self.box_code_size, 1)
if self.use_direction_classifier:
self.conv_dir_cls = nn.Conv2d(self.feat_channels,
self.num_anchors * 2, 1)
def forward_single(self, x: Tensor) -> Tuple[Tensor, Tensor, Tensor]:
"""Forward function on a single-scale feature map.
Args:
x (Tensor): Features of a single scale level.
Returns:
tuple:
cls_score (Tensor): Cls scores for a single scale level
the channels number is num_base_priors * num_classes.
bbox_pred (Tensor): Box energies / deltas for a single scale
level, the channels number is num_base_priors * C.
dir_cls_pred (Tensor | None): Direction classification
prediction for a single scale level, the channels
number is num_base_priors * 2.
"""
cls_score = self.conv_cls(x)
bbox_pred = self.conv_reg(x)
dir_cls_pred = None
if self.use_direction_classifier:
dir_cls_pred = self.conv_dir_cls(x)
return cls_score, bbox_pred, dir_cls_pred
def forward(self, x: Tuple[Tensor]) -> Tuple[List[Tensor]]:
"""Forward pass.
Args:
x (tuple[Tensor]): Features from the upstream network,
each is a 4D-tensor.
Returns:
tuple: A tuple of classification scores, bbox and direction
classification prediction.
- cls_scores (list[Tensor]): Classification scores for all
scale levels, each is a 4D-tensor, the channels number
is num_base_priors * num_classes.
- bbox_preds (list[Tensor]): Box energies / deltas for all
scale levels, each is a 4D-tensor, the channels number
is num_base_priors * C.
- dir_cls_preds (list[Tensor|None]): Direction classification
predictions for all scale levels, each is a 4D-tensor,
the channels number is num_base_priors * 2.
"""
return multi_apply(self.forward_single, x)
# TODO: Support augmentation test
def aug_test(self,
aug_batch_feats,
aug_batch_input_metas,
rescale=False,
**kwargs):
aug_bboxes = []
# only support aug_test for one sample
for x, input_meta in zip(aug_batch_feats, aug_batch_input_metas):
outs = self.forward(x)
bbox_list = self.get_results(*outs, [input_meta], rescale=rescale)
bbox_dict = dict(
bboxes_3d=bbox_list[0].bboxes_3d,
scores_3d=bbox_list[0].scores_3d,
labels_3d=bbox_list[0].labels_3d)
aug_bboxes.append(bbox_dict)
# after merging, bboxes will be rescaled to the original image size
merged_bboxes = merge_aug_bboxes_3d(aug_bboxes, aug_batch_input_metas,
self.test_cfg)
return [merged_bboxes]
def get_anchors(self,
featmap_sizes: List[tuple],
input_metas: List[dict],
device: str = 'cuda') -> list:
"""Get anchors according to feature map sizes.
Args:
featmap_sizes (list[tuple]): Multi-level feature map sizes.
input_metas (list[dict]): contain pcd and img's meta info.
device (str): device of current module.
Returns:
list[list[torch.Tensor]]: Anchors of each image, valid flags
of each image.
"""
num_imgs = len(input_metas)
# since feature map sizes of all images are the same, we only compute
# anchors for one time
multi_level_anchors = self.prior_generator.grid_anchors(
featmap_sizes, device=device)
anchor_list = [multi_level_anchors for _ in range(num_imgs)]
return anchor_list
def _loss_by_feat_single(self, cls_score: Tensor, bbox_pred: Tensor,
dir_cls_pred: Tensor, labels: Tensor,
label_weights: Tensor, bbox_targets: Tensor,
bbox_weights: Tensor, dir_targets: Tensor,
dir_weights: Tensor, num_total_samples: int):
"""Calculate loss of Single-level results.
Args:
cls_score (Tensor): Class score in single-level.
bbox_pred (Tensor): Bbox prediction in single-level.
dir_cls_pred (Tensor): Predictions of direction class
in single-level.
labels (Tensor): Labels of class.
label_weights (Tensor): Weights of class loss.
bbox_targets (Tensor): Targets of bbox predictions.
bbox_weights (Tensor): Weights of bbox loss.
dir_targets (Tensor): Targets of direction predictions.
dir_weights (Tensor): Weights of direction loss.
num_total_samples (int): The number of valid samples.
Returns:
tuple[torch.Tensor]: Losses of class, bbox
and direction, respectively.
"""
# classification loss
if num_total_samples is None:
num_total_samples = int(cls_score.shape[0])
labels = labels.reshape(-1)
label_weights = label_weights.reshape(-1)
cls_score = cls_score.permute(0, 2, 3, 1).reshape(-1, self.num_classes)
assert labels.max().item() <= self.num_classes
loss_cls = self.loss_cls(
cls_score, labels, label_weights, avg_factor=num_total_samples)
# regression loss
bbox_pred = bbox_pred.permute(0, 2, 3,
1).reshape(-1, self.box_code_size)
bbox_targets = bbox_targets.reshape(-1, self.box_code_size)
bbox_weights = bbox_weights.reshape(-1, self.box_code_size)
bg_class_ind = self.num_classes
pos_inds = ((labels >= 0)
& (labels < bg_class_ind)).nonzero(
as_tuple=False).reshape(-1)
num_pos = len(pos_inds)
pos_bbox_pred = bbox_pred[pos_inds]
pos_bbox_targets = bbox_targets[pos_inds]
pos_bbox_weights = bbox_weights[pos_inds]
# dir loss
if self.use_direction_classifier:
dir_cls_pred = dir_cls_pred.permute(0, 2, 3, 1).reshape(-1, 2)
dir_targets = dir_targets.reshape(-1)
dir_weights = dir_weights.reshape(-1)
pos_dir_cls_pred = dir_cls_pred[pos_inds]
pos_dir_targets = dir_targets[pos_inds]
pos_dir_weights = dir_weights[pos_inds]
if num_pos > 0:
code_weight = self.train_cfg.get('code_weight', None)
if code_weight:
pos_bbox_weights = pos_bbox_weights * bbox_weights.new_tensor(
code_weight)
if self.diff_rad_by_sin:
pos_bbox_pred, pos_bbox_targets = self.add_sin_difference(
pos_bbox_pred, pos_bbox_targets)
loss_bbox = self.loss_bbox(
pos_bbox_pred,
pos_bbox_targets,
pos_bbox_weights,
avg_factor=num_total_samples)
# direction classification loss
loss_dir = None
if self.use_direction_classifier:
loss_dir = self.loss_dir(
pos_dir_cls_pred,
pos_dir_targets,
pos_dir_weights,
avg_factor=num_total_samples)
else:
loss_bbox = pos_bbox_pred.sum()
if self.use_direction_classifier:
loss_dir = pos_dir_cls_pred.sum()
return loss_cls, loss_bbox, loss_dir
@staticmethod
def add_sin_difference(boxes1: Tensor, boxes2: Tensor) -> tuple:
"""Convert the rotation difference to difference in sine function.
Args:
boxes1 (torch.Tensor): Original Boxes in shape (NxC), where C>=7
and the 7th dimension is rotation dimension.
boxes2 (torch.Tensor): Target boxes in shape (NxC), where C>=7 and
the 7th dimension is rotation dimension.
Returns:
tuple[torch.Tensor]: ``boxes1`` and ``boxes2`` whose 7th
dimensions are changed.
"""
rad_pred_encoding = torch.sin(boxes1[..., 6:7]) * torch.cos(
boxes2[..., 6:7])
rad_tg_encoding = torch.cos(boxes1[..., 6:7]) * torch.sin(boxes2[...,
6:7])
boxes1 = torch.cat(
[boxes1[..., :6], rad_pred_encoding, boxes1[..., 7:]], dim=-1)
boxes2 = torch.cat([boxes2[..., :6], rad_tg_encoding, boxes2[..., 7:]],
dim=-1)
return boxes1, boxes2
def loss_by_feat(
self,
cls_scores: List[Tensor],
bbox_preds: List[Tensor],
dir_cls_preds: List[Tensor],
batch_gt_instances_3d: InstanceList,
batch_input_metas: List[dict],
batch_gt_instances_ignore: OptInstanceList = None) -> dict:
"""Calculate the loss based on the features extracted by the detection
head.
Args:
cls_scores (list[torch.Tensor]): Multi-level class scores.
bbox_preds (list[torch.Tensor]): Multi-level bbox predictions.
dir_cls_preds (list[torch.Tensor]): Multi-level direction
class predictions.
batch_gt_instances_3d (list[:obj:`InstanceData`]): Batch of
gt_instances. It usually includes ``bboxes_3d``
and ``labels_3d`` attributes.
batch_input_metas (list[dict]): Contain pcd and img's meta info.
batch_gt_instances_ignore (list[:obj:`InstanceData`], optional):
Batch of gt_instances_ignore. It includes ``bboxes`` attribute
data that is ignored during training and testing.
Defaults to None.
Returns:
dict[str, list[torch.Tensor]]: Classification, bbox, and
direction losses of each level.
- loss_cls (list[torch.Tensor]): Classification losses.
- loss_bbox (list[torch.Tensor]): Box regression losses.
- loss_dir (list[torch.Tensor]): Direction classification
losses.
"""
featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
assert len(featmap_sizes) == self.prior_generator.num_levels
device = cls_scores[0].device
anchor_list = self.get_anchors(
featmap_sizes, batch_input_metas, device=device)
label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
cls_reg_targets = self.anchor_target_3d(
anchor_list,
batch_gt_instances_3d,
batch_input_metas,
batch_gt_instances_ignore=batch_gt_instances_ignore,
num_classes=self.num_classes,
label_channels=label_channels,
sampling=self.sampling)
if cls_reg_targets is None:
return None
(labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,
dir_targets_list, dir_weights_list, num_total_pos,
num_total_neg) = cls_reg_targets
num_total_samples = (
num_total_pos + num_total_neg if self.sampling else num_total_pos)
# num_total_samples = None
losses_cls, losses_bbox, losses_dir = multi_apply(
self._loss_by_feat_single,
cls_scores,
bbox_preds,
dir_cls_preds,
labels_list,
label_weights_list,
bbox_targets_list,
bbox_weights_list,
dir_targets_list,
dir_weights_list,
num_total_samples=num_total_samples)
return dict(
loss_cls=losses_cls, loss_bbox=losses_bbox, loss_dir=losses_dir)
- Line 390: read the feature-map size of each level into featmap_sizes;
- Line 391: assert that len(featmap_sizes)=1 equals self.prior_generator.num_levels=1; since they match, execution continues;
- Line 392: set device from cls_scores[0].device;
- Line 393: call self.get_anchors() to generate the anchors;
  - Line 236: set num_imgs from len(input_metas)=8;
  - Line 239: generate the multi-level anchors with self.prior_generator.grid_anchors();
  - Line 241: reuse the same multi-level anchors for every point cloud in the batch;
  - Line 242: return the resulting anchor_list to the caller;
- Line 395: set label_channels;
- Line 396: call self.anchor_target_3d() to obtain the training targets cls_reg_targets;
- Line 405: cls_reg_targets is not None, so the if branch is skipped;
- Line 407: unpack cls_reg_targets into the individual target tensors;
- Line 410: set num_total_samples;
- Line 414: call self._loss_by_feat_single() to compute the losses;
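To make the shapes flowing through loss_by_feat() concrete, here is some back-of-the-envelope arithmetic for a hypothetical 3-class KITTI-style configuration (3 anchor sizes, 2 rotations, num_classes=3 as in the walkthrough). The feature-map size 248x216 is an assumed example value; the real size depends on the voxel grid and backbone strides.

```python
# Hypothetical 3-class KITTI-style setup (illustration only).
feat_h, feat_w = 248, 216
num_sizes, num_rotations, num_classes = 3, 2, 3

anchors_per_location = num_sizes * num_rotations        # 6
total_anchors = feat_h * feat_w * anchors_per_location  # anchors per sample

# Head output channels (see _init_layers above): one score per anchor per
# class, box_code_size=7 regression values per anchor, 2 direction bins.
box_code_size = 7
cls_channels = anchors_per_location * num_classes    # conv_cls output
reg_channels = anchors_per_location * box_code_size  # conv_reg output
dir_channels = anchors_per_location * 2              # conv_dir_cls output
print(total_anchors, cls_channels, reg_channels, dir_channels)
```

These are exactly the channel counts that _loss_by_feat_single() later flattens with permute + reshape.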
5.2 Next, Anchor3DHead's _loss_by_feat_single() function:
- Line 269: num_total_samples is not None, so the if branch is skipped;
- Line 271: flatten labels with reshape;
- Line 272: flatten label_weights with reshape;
- Line 273: permute the channels of cls_score, then flatten it with reshape;
- Line 274: assert labels.max().item() <= self.num_classes; since it holds, execution continues;
- Line 275: compute the classification loss with FocalLoss (detailed in Section 5.3);
- Line 279: permute the channels of bbox_pred, then flatten it with reshape;
- Line 281: flatten bbox_targets with reshape;
- Line 282: flatten bbox_weights with reshape;
- Line 284: set bg_class_ind from self.num_classes=3;
- Line 285: derive pos_inds (indices of positive anchors) from labels;
- Line 288: set num_pos from len(pos_inds);
- Line 290: set pos_bbox_pred from bbox_pred[pos_inds];
- Line 291: set pos_bbox_targets from bbox_targets[pos_inds];
- Line 292: set pos_bbox_weights from bbox_weights[pos_inds];
- Line 295: self.use_direction_classifier is True, so the if branch is entered;
- Line 296: permute the channels of dir_cls_pred, then flatten it with reshape;
- Line 297: flatten dir_targets with reshape;
- Line 298: flatten dir_weights with reshape;
- Line 299: set pos_dir_cls_pred from dir_cls_pred[pos_inds];
- Line 300: set pos_dir_targets from dir_targets[pos_inds];
- Line 301: set pos_dir_weights from dir_weights[pos_inds];
- Line 303: num_pos > 0, so the if branch is entered;
- Line 304: set code_weight from self.train_cfg.get('code_weight', None);
- Line 305: code_weight is None, so the if branch is skipped;
- Line 308: self.diff_rad_by_sin is True, so the if branch is entered;
- Line 309: preprocess pos_bbox_pred and pos_bbox_targets with self.add_sin_difference();
- Line 311: compute the regression loss with SmoothL1Loss (detailed in Section 5.4);
- Line 318: initialize loss_dir = None;
- Line 319: self.use_direction_classifier is True, so the if branch is entered;
- Line 320: compute the direction loss with CrossEntropyLoss (detailed in Section 5.5);
- Line 330: return the computed loss_cls, loss_bbox and loss_dir to the caller;
- Line 426: return the dict of the three losses to the runner.
5.3 Next, FocalLoss's forward() function:
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmcv.ops import sigmoid_focal_loss as _sigmoid_focal_loss
from mmdet.registry import MODELS
from .utils import weight_reduce_loss
# This method is only for debugging
def py_sigmoid_focal_loss(pred,
target,
weight=None,
gamma=2.0,
alpha=0.25,
reduction='mean',
avg_factor=None):
"""PyTorch version of `Focal Loss <https://arxiv.org/abs/1708.02002>`_.
Args:
pred (torch.Tensor): The prediction with shape (N, C), C is the
number of classes
target (torch.Tensor): The learning label of the prediction.
weight (torch.Tensor, optional): Sample-wise loss weight.
gamma (float, optional): The gamma for calculating the modulating
factor. Defaults to 2.0.
alpha (float, optional): A balanced form for Focal Loss.
Defaults to 0.25.
reduction (str, optional): The method used to reduce the loss into
a scalar. Defaults to 'mean'.
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
"""
pred_sigmoid = pred.sigmoid()
target = target.type_as(pred)
pt = (1 - pred_sigmoid) * target + pred_sigmoid * (1 - target)
focal_weight = (alpha * target + (1 - alpha) *
(1 - target)) * pt.pow(gamma)
loss = F.binary_cross_entropy_with_logits(
pred, target, reduction='none') * focal_weight
if weight is not None:
if weight.shape != loss.shape:
if weight.size(0) == loss.size(0):
# For most cases, weight is of shape (num_priors, ),
# which means it does not have the second axis num_class
weight = weight.view(-1, 1)
else:
# Sometimes, weight per anchor per class is also needed. e.g.
# in FSAF. But it may be flattened of shape
# (num_priors x num_class, ), while loss is still of shape
# (num_priors, num_class).
assert weight.numel() == loss.numel()
weight = weight.view(loss.size(0), -1)
assert weight.ndim == loss.ndim
loss = weight_reduce_loss(loss, weight, reduction, avg_factor)
return loss
def py_focal_loss_with_prob(pred,
target,
weight=None,
gamma=2.0,
alpha=0.25,
reduction='mean',
avg_factor=None):
"""PyTorch version of `Focal Loss <https://arxiv.org/abs/1708.02002>`_.
Different from `py_sigmoid_focal_loss`, this function accepts probability
as input.
Args:
pred (torch.Tensor): The prediction probability with shape (N, C),
C is the number of classes.
target (torch.Tensor): The learning label of the prediction.
The target shape support (N,C) or (N,), (N,C) means one-hot form.
weight (torch.Tensor, optional): Sample-wise loss weight.
gamma (float, optional): The gamma for calculating the modulating
factor. Defaults to 2.0.
alpha (float, optional): A balanced form for Focal Loss.
Defaults to 0.25.
reduction (str, optional): The method used to reduce the loss into
a scalar. Defaults to 'mean'.
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
"""
if pred.dim() != target.dim():
num_classes = pred.size(1)
target = F.one_hot(target, num_classes=num_classes + 1)
target = target[:, :num_classes]
target = target.type_as(pred)
pt = (1 - pred) * target + pred * (1 - target)
focal_weight = (alpha * target + (1 - alpha) *
(1 - target)) * pt.pow(gamma)
loss = F.binary_cross_entropy(
pred, target, reduction='none') * focal_weight
if weight is not None:
if weight.shape != loss.shape:
if weight.size(0) == loss.size(0):
# For most cases, weight is of shape (num_priors, ),
# which means it does not have the second axis num_class
weight = weight.view(-1, 1)
else:
# Sometimes, weight per anchor per class is also needed. e.g.
# in FSAF. But it may be flattened of shape
# (num_priors x num_class, ), while loss is still of shape
# (num_priors, num_class).
assert weight.numel() == loss.numel()
weight = weight.view(loss.size(0), -1)
assert weight.ndim == loss.ndim
loss = weight_reduce_loss(loss, weight, reduction, avg_factor)
return loss
def sigmoid_focal_loss(pred,
target,
weight=None,
gamma=2.0,
alpha=0.25,
reduction='mean',
avg_factor=None):
r"""A wrapper of cuda version `Focal Loss
<https://arxiv.org/abs/1708.02002>`_.
Args:
pred (torch.Tensor): The prediction with shape (N, C), C is the number
of classes.
target (torch.Tensor): The learning label of the prediction.
weight (torch.Tensor, optional): Sample-wise loss weight.
gamma (float, optional): The gamma for calculating the modulating
factor. Defaults to 2.0.
alpha (float, optional): A balanced form for Focal Loss.
Defaults to 0.25.
reduction (str, optional): The method used to reduce the loss into
a scalar. Defaults to 'mean'. Options are "none", "mean" and "sum".
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
"""
# Function.apply does not accept keyword arguments, so the decorator
# "weighted_loss" is not applicable
loss = _sigmoid_focal_loss(pred.contiguous(), target.contiguous(), gamma,
alpha, None, 'none')
if weight is not None:
if weight.shape != loss.shape:
if weight.size(0) == loss.size(0):
# For most cases, weight is of shape (num_priors, ),
# which means it does not have the second axis num_class
weight = weight.view(-1, 1)
else:
# Sometimes, weight per anchor per class is also needed. e.g.
# in FSAF. But it may be flattened of shape
# (num_priors x num_class, ), while loss is still of shape
# (num_priors, num_class).
assert weight.numel() == loss.numel()
weight = weight.view(loss.size(0), -1)
assert weight.ndim == loss.ndim
loss = weight_reduce_loss(loss, weight, reduction, avg_factor)
return loss
@MODELS.register_module()
class FocalLoss(nn.Module):
def __init__(self,
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
reduction='mean',
loss_weight=1.0,
activated=False):
"""`Focal Loss <https://arxiv.org/abs/1708.02002>`_
Args:
use_sigmoid (bool, optional): Whether to the prediction is
used for sigmoid or softmax. Defaults to True.
gamma (float, optional): The gamma for calculating the modulating
factor. Defaults to 2.0.
alpha (float, optional): A balanced form for Focal Loss.
Defaults to 0.25.
reduction (str, optional): The method used to reduce the loss into
a scalar. Defaults to 'mean'. Options are "none", "mean" and
"sum".
loss_weight (float, optional): Weight of loss. Defaults to 1.0.
activated (bool, optional): Whether the input is activated.
If True, it means the input has been activated and can be
treated as probabilities. Else, it should be treated as logits.
Defaults to False.
"""
super(FocalLoss, self).__init__()
assert use_sigmoid is True, 'Only sigmoid focal loss supported now.'
self.use_sigmoid = use_sigmoid
self.gamma = gamma
self.alpha = alpha
self.reduction = reduction
self.loss_weight = loss_weight
self.activated = activated
def forward(self,
pred,
target,
weight=None,
avg_factor=None,
reduction_override=None):
"""Forward function.
Args:
pred (torch.Tensor): The prediction.
target (torch.Tensor): The learning label of the prediction.
The target shape support (N,C) or (N,), (N,C) means
one-hot form.
weight (torch.Tensor, optional): The weight of loss for each
prediction. Defaults to None.
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
reduction_override (str, optional): The reduction method used to
override the original reduction method of the loss.
Options are "none", "mean" and "sum".
Returns:
torch.Tensor: The calculated loss
"""
assert reduction_override in (None, 'none', 'mean', 'sum')
reduction = (
reduction_override if reduction_override else self.reduction)
if self.use_sigmoid:
if self.activated:
calculate_loss_func = py_focal_loss_with_prob
else:
if pred.dim() == target.dim():
# this means that target is already in One-Hot form.
calculate_loss_func = py_sigmoid_focal_loss
elif torch.cuda.is_available() and pred.is_cuda:
calculate_loss_func = sigmoid_focal_loss
else:
num_classes = pred.size(1)
target = F.one_hot(target, num_classes=num_classes + 1)
target = target[:, :num_classes]
calculate_loss_func = py_sigmoid_focal_loss
loss_cls = self.loss_weight * calculate_loss_func(
pred,
target,
weight,
gamma=self.gamma,
alpha=self.alpha,
reduction=reduction,
avg_factor=avg_factor)
else:
raise NotImplementedError
return loss_cls
- Line 222: reduction_override is None, so the assertion passes and execution continues;
- Line 223: set reduction;
- Line 225: self.use_sigmoid is True, so the if branch is entered;
- Line 226: self.activated is False, so the if branch is skipped and the else branch is entered;
- Line 229: pred.dim()=2 does not equal target.dim()=1, so the if branch is skipped;
- Line 232: torch.cuda.is_available() and pred.is_cuda are both True, so this branch is entered;
- Line 233: set calculate_loss_func to sigmoid_focal_loss;
- Line 240: call calculate_loss_func, i.e. sigmoid_focal_loss(), to compute the loss;
  - Line 141: call _sigmoid_focal_loss() (the CUDA implementation) to compute the per-element loss;
  - Line 143: weight is not None, so the if branch is entered;
  - Line 144: weight.shape does not equal loss.shape, so the if branch is entered;
  - Line 145: weight.size(0) equals loss.size(0), so the if branch is entered;
  - Line 148: add a second dimension to weight with view();
  - Line 156: assert that weight.ndim equals loss.ndim; since it holds, execution continues;
  - Line 157: call weight_reduce_loss(); its key step is summing the weighted loss and dividing by (avg_factor + eps), which yields the final loss;
  - Line 158: return loss to the caller;
- Line 251: return the computed loss_cls to the caller.
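The per-element focal loss computed above can be reproduced in pure PyTorch, following the formula in py_sigmoid_focal_loss with the head's defaults gamma=2.0 and alpha=0.25 (the logits below are made-up illustration values):

```python
import torch
import torch.nn.functional as F

gamma, alpha = 2.0, 0.25
pred = torch.tensor([[2.0, -1.0, 0.5]])   # logits for 3 classes (made up)
target = torch.tensor([[1.0, 0.0, 0.0]])  # one-hot target: class 0 positive

p = pred.sigmoid()
# pt is the probability assigned to the *wrong* outcome, so easy examples
# (pt close to 0) get their loss scaled down by pt ** gamma.
pt = (1 - p) * target + p * (1 - target)
focal_weight = (alpha * target + (1 - alpha) * (1 - target)) * pt.pow(gamma)
loss = F.binary_cross_entropy_with_logits(
    pred, target, reduction='none') * focal_weight
```

The confident correct positive (logit 2.0) ends up with a far smaller loss than the uncertain negative (logit 0.5), which is exactly the down-weighting of easy examples that motivates focal loss.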
5.4 Next, SmoothL1Loss's forward() function:
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Optional
import torch
import torch.nn as nn
from torch import Tensor
from mmdet.registry import MODELS
from .utils import weighted_loss
@weighted_loss
def smooth_l1_loss(pred: Tensor, target: Tensor, beta: float = 1.0) -> Tensor:
"""Smooth L1 loss.
Args:
pred (Tensor): The prediction.
target (Tensor): The learning target of the prediction.
beta (float, optional): The threshold in the piecewise function.
Defaults to 1.0.
Returns:
Tensor: Calculated loss
"""
assert beta > 0
if target.numel() == 0:
return pred.sum() * 0
assert pred.size() == target.size()
diff = torch.abs(pred - target)
loss = torch.where(diff < beta, 0.5 * diff * diff / beta,
diff - 0.5 * beta)
return loss
@weighted_loss
def l1_loss(pred: Tensor, target: Tensor) -> Tensor:
"""L1 loss.
Args:
pred (Tensor): The prediction.
target (Tensor): The learning target of the prediction.
Returns:
Tensor: Calculated loss
"""
if target.numel() == 0:
return pred.sum() * 0
assert pred.size() == target.size()
loss = torch.abs(pred - target)
return loss
@MODELS.register_module()
class SmoothL1Loss(nn.Module):
"""Smooth L1 loss.
Args:
beta (float, optional): The threshold in the piecewise function.
Defaults to 1.0.
reduction (str, optional): The method to reduce the loss.
Options are "none", "mean" and "sum". Defaults to "mean".
loss_weight (float, optional): The weight of loss.
"""
def __init__(self,
beta: float = 1.0,
reduction: str = 'mean',
loss_weight: float = 1.0) -> None:
super().__init__()
self.beta = beta
self.reduction = reduction
self.loss_weight = loss_weight
def forward(self,
pred: Tensor,
target: Tensor,
weight: Optional[Tensor] = None,
avg_factor: Optional[int] = None,
reduction_override: Optional[str] = None,
**kwargs) -> Tensor:
"""Forward function.
Args:
pred (Tensor): The prediction.
target (Tensor): The learning target of the prediction.
weight (Tensor, optional): The weight of loss for each
prediction. Defaults to None.
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
reduction_override (str, optional): The reduction method used to
override the original reduction method of the loss.
Defaults to None.
Returns:
Tensor: Calculated loss
"""
assert reduction_override in (None, 'none', 'mean', 'sum')
reduction = (
reduction_override if reduction_override else self.reduction)
loss_bbox = self.loss_weight * smooth_l1_loss(
pred,
target,
weight,
beta=self.beta,
reduction=reduction,
avg_factor=avg_factor,
**kwargs)
return loss_bbox
@MODELS.register_module()
class L1Loss(nn.Module):
"""L1 loss.
Args:
reduction (str, optional): The method to reduce the loss.
Options are "none", "mean" and "sum".
loss_weight (float, optional): The weight of loss.
"""
def __init__(self,
reduction: str = 'mean',
loss_weight: float = 1.0) -> None:
super().__init__()
self.reduction = reduction
self.loss_weight = loss_weight
def forward(self,
pred: Tensor,
target: Tensor,
weight: Optional[Tensor] = None,
avg_factor: Optional[int] = None,
reduction_override: Optional[str] = None) -> Tensor:
"""Forward function.
Args:
pred (Tensor): The prediction.
target (Tensor): The learning target of the prediction.
weight (Tensor, optional): The weight of loss for each
prediction. Defaults to None.
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
reduction_override (str, optional): The reduction method used to
override the original reduction method of the loss.
Defaults to None.
Returns:
Tensor: Calculated loss
"""
assert reduction_override in (None, 'none', 'mean', 'sum')
reduction = (
reduction_override if reduction_override else self.reduction)
loss_bbox = self.loss_weight * l1_loss(
pred, target, weight, reduction=reduction, avg_factor=avg_factor)
return loss_bbox
- Line 99: reduction_override is None, so the assertion passes and execution continues;
- Line 100: set reduction;
- Line 102: call smooth_l1_loss() to compute the loss;
  - Line 25: assert beta > 0; since it holds, execution continues;
  - Line 26: target.numel() is not 0, so the if branch is skipped;
  - Line 29: assert that pred.size() equals target.size(); since it holds, execution continues;
  - Line 30: store the element-wise difference between pred and target in diff;
  - Line 31: compute the loss piecewise, depending on whether diff is smaller than beta;
  - the weighted_loss decorator then calls weight_reduce_loss(); its key step is summing the weighted loss and dividing by (avg_factor + eps), which yields the final loss;
- Line 110: return the computed loss_bbox to the caller.
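The piecewise branch on line 31 is easy to verify numerically. With the PointPillars setting beta = 1/9 (from the loss_bbox config above), the loss is quadratic for small residuals and linear for large ones, and the two branches meet at |diff| = beta, so the function is continuous (the residual values below are made up for illustration):

```python
import torch

beta = 1.0 / 9.0
diff = torch.tensor([0.05, 0.5, 2.0])  # |pred - target|, illustration values

# Same piecewise rule as smooth_l1_loss: quadratic below beta, linear above.
loss = torch.where(diff < beta, 0.5 * diff * diff / beta, diff - 0.5 * beta)

# At diff == beta both branches evaluate to 0.5 * beta, hence continuity.
b = torch.tensor(beta)
assert torch.allclose(0.5 * b * b / b, b - 0.5 * b)
```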
5.5 Next, CrossEntropyLoss's forward() function:
# Copyright (c) OpenMMLab. All rights reserved.
import warnings
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmdet.registry import MODELS
from .utils import weight_reduce_loss
def cross_entropy(pred,
label,
weight=None,
reduction='mean',
avg_factor=None,
class_weight=None,
ignore_index=-100,
avg_non_ignore=False):
"""Calculate the CrossEntropy loss.
Args:
pred (torch.Tensor): The prediction with shape (N, C), C is the number
of classes.
label (torch.Tensor): The learning label of the prediction.
weight (torch.Tensor, optional): Sample-wise loss weight.
reduction (str, optional): The method used to reduce the loss.
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
class_weight (list[float], optional): The weight for each class.
ignore_index (int | None): The label index to be ignored.
If None, it will be set to default value. Default: -100.
avg_non_ignore (bool): The flag decides to whether the loss is
only averaged over non-ignored targets. Default: False.
Returns:
torch.Tensor: The calculated loss
"""
# The default value of ignore_index is the same as F.cross_entropy
ignore_index = -100 if ignore_index is None else ignore_index
# element-wise losses
loss = F.cross_entropy(
pred,
label,
weight=class_weight,
reduction='none',
ignore_index=ignore_index)
# average loss over non-ignored elements
# pytorch's official cross_entropy average loss over non-ignored elements
# refer to https://github.com/pytorch/pytorch/blob/56b43f4fec1f76953f15a627694d4bba34588969/torch/nn/functional.py#L2660 # noqa
if (avg_factor is None) and avg_non_ignore and reduction == 'mean':
avg_factor = label.numel() - (label == ignore_index).sum().item()
# apply weights and do the reduction
if weight is not None:
weight = weight.float()
loss = weight_reduce_loss(
loss, weight=weight, reduction=reduction, avg_factor=avg_factor)
return loss
def _expand_onehot_labels(labels, label_weights, label_channels, ignore_index):
"""Expand onehot labels to match the size of prediction."""
bin_labels = labels.new_full((labels.size(0), label_channels), 0)
valid_mask = (labels >= 0) & (labels != ignore_index)
inds = torch.nonzero(
valid_mask & (labels < label_channels), as_tuple=False)
if inds.numel() > 0:
bin_labels[inds, labels[inds]] = 1
valid_mask = valid_mask.view(-1, 1).expand(labels.size(0),
label_channels).float()
if label_weights is None:
bin_label_weights = valid_mask
else:
bin_label_weights = label_weights.view(-1, 1).repeat(1, label_channels)
bin_label_weights *= valid_mask
return bin_labels, bin_label_weights, valid_mask
def binary_cross_entropy(pred,
label,
weight=None,
reduction='mean',
avg_factor=None,
class_weight=None,
ignore_index=-100,
avg_non_ignore=False):
"""Calculate the binary CrossEntropy loss.
Args:
pred (torch.Tensor): The prediction with shape (N, 1) or (N, ).
When the shape of pred is (N, 1), label will be expanded to
one-hot format, and when the shape of pred is (N, ), label
will not be expanded to one-hot format.
label (torch.Tensor): The learning label of the prediction,
with shape (N, ).
weight (torch.Tensor, optional): Sample-wise loss weight.
reduction (str, optional): The method used to reduce the loss.
Options are "none", "mean" and "sum".
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
class_weight (list[float], optional): The weight for each class.
ignore_index (int | None): The label index to be ignored.
If None, it will be set to default value. Default: -100.
avg_non_ignore (bool): The flag that decides whether the loss is
only averaged over non-ignored targets. Default: False.
Returns:
torch.Tensor: The calculated loss.
"""
# The default value of ignore_index is the same as F.cross_entropy
ignore_index = -100 if ignore_index is None else ignore_index
if pred.dim() != label.dim():
label, weight, valid_mask = _expand_onehot_labels(
label, weight, pred.size(-1), ignore_index)
else:
# should mask out the ignored elements
valid_mask = ((label >= 0) & (label != ignore_index)).float()
if weight is not None:
# The inplace writing method will have a mismatched broadcast
# shape error if the weight and valid_mask dimensions
# are inconsistent such as (B,N,1) and (B,N,C).
weight = weight * valid_mask
else:
weight = valid_mask
# average loss over non-ignored elements
if (avg_factor is None) and avg_non_ignore and reduction == 'mean':
avg_factor = valid_mask.sum().item()
# weighted element-wise losses
weight = weight.float()
loss = F.binary_cross_entropy_with_logits(
pred, label.float(), pos_weight=class_weight, reduction='none')
# do the reduction for the weighted loss
loss = weight_reduce_loss(
loss, weight, reduction=reduction, avg_factor=avg_factor)
return loss
def mask_cross_entropy(pred,
target,
label,
reduction='mean',
avg_factor=None,
class_weight=None,
ignore_index=None,
**kwargs):
"""Calculate the CrossEntropy loss for masks.
Args:
pred (torch.Tensor): The prediction with shape (N, C, *), C is the
number of classes. The trailing * indicates arbitrary shape.
target (torch.Tensor): The learning label of the prediction.
label (torch.Tensor): ``label`` indicates the class label of the mask
corresponding object. This will be used to select the mask of the
class which the object belongs to when the mask prediction is not
class-agnostic.
reduction (str, optional): The method used to reduce the loss.
Options are "none", "mean" and "sum".
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
class_weight (list[float], optional): The weight for each class.
ignore_index (None): Placeholder, to be consistent with other loss.
Default: None.
Returns:
torch.Tensor: The calculated loss
Example:
>>> N, C = 3, 11
>>> H, W = 2, 2
>>> pred = torch.randn(N, C, H, W) * 1000
>>> target = torch.rand(N, H, W)
>>> label = torch.randint(0, C, size=(N,))
>>> reduction = 'mean'
>>> avg_factor = None
>>> class_weights = None
>>> loss = mask_cross_entropy(pred, target, label, reduction,
>>> avg_factor, class_weights)
>>> assert loss.shape == (1,)
"""
assert ignore_index is None, 'BCE loss does not support ignore_index'
# TODO: handle these two reserved arguments
assert reduction == 'mean' and avg_factor is None
num_rois = pred.size()[0]
inds = torch.arange(0, num_rois, dtype=torch.long, device=pred.device)
pred_slice = pred[inds, label].squeeze(1)
return F.binary_cross_entropy_with_logits(
pred_slice, target, weight=class_weight, reduction='mean')[None]
@MODELS.register_module()
class CrossEntropyLoss(nn.Module):
def __init__(self,
use_sigmoid=False,
use_mask=False,
reduction='mean',
class_weight=None,
ignore_index=None,
loss_weight=1.0,
avg_non_ignore=False):
"""CrossEntropyLoss.
Args:
use_sigmoid (bool, optional): Whether the prediction uses sigmoid
instead of softmax. Defaults to False.
use_mask (bool, optional): Whether to use mask cross entropy loss.
Defaults to False.
reduction (str, optional): The method used to reduce the loss.
Options are "none", "mean" and "sum". Defaults to 'mean'.
class_weight (list[float], optional): Weight of each class.
Defaults to None.
ignore_index (int | None): The label index to be ignored.
Defaults to None.
loss_weight (float, optional): Weight of the loss. Defaults to 1.0.
avg_non_ignore (bool): The flag that decides whether the loss is
only averaged over non-ignored targets. Default: False.
"""
super(CrossEntropyLoss, self).__init__()
assert (use_sigmoid is False) or (use_mask is False)
self.use_sigmoid = use_sigmoid
self.use_mask = use_mask
self.reduction = reduction
self.loss_weight = loss_weight
self.class_weight = class_weight
self.ignore_index = ignore_index
self.avg_non_ignore = avg_non_ignore
if ((ignore_index is not None) and not self.avg_non_ignore
and self.reduction == 'mean'):
warnings.warn(
'Default ``avg_non_ignore`` is False, if you would like to '
'ignore the certain label and average loss over non-ignore '
'labels, which is the same with PyTorch official '
'cross_entropy, set ``avg_non_ignore=True``.')
if self.use_sigmoid:
self.cls_criterion = binary_cross_entropy
elif self.use_mask:
self.cls_criterion = mask_cross_entropy
else:
self.cls_criterion = cross_entropy
def extra_repr(self):
"""Extra repr."""
s = f'avg_non_ignore={self.avg_non_ignore}'
return s
def forward(self,
cls_score,
label,
weight=None,
avg_factor=None,
reduction_override=None,
ignore_index=None,
**kwargs):
"""Forward function.
Args:
cls_score (torch.Tensor): The prediction.
label (torch.Tensor): The learning label of the prediction.
weight (torch.Tensor, optional): Sample-wise loss weight.
avg_factor (int, optional): Average factor that is used to average
the loss. Defaults to None.
reduction_override (str, optional): The method used to reduce the
loss. Options are "none", "mean" and "sum".
ignore_index (int | None): The label index to be ignored.
If not None, it will override the default value. Default: None.
Returns:
torch.Tensor: The calculated loss.
"""
assert reduction_override in (None, 'none', 'mean', 'sum')
reduction = (
reduction_override if reduction_override else self.reduction)
if ignore_index is None:
ignore_index = self.ignore_index
if self.class_weight is not None:
class_weight = cls_score.new_tensor(
self.class_weight, device=cls_score.device)
else:
class_weight = None
loss_cls = self.loss_weight * self.cls_criterion(
cls_score,
label,
weight,
class_weight=class_weight,
reduction=reduction,
avg_factor=avg_factor,
ignore_index=ignore_index,
avg_non_ignore=self.avg_non_ignore,
**kwargs)
return loss_cls
- Line 280: reduction_override is None, so the assert passes and execution continues;
- Line 281: set the value of reduction (falling back to self.reduction, since reduction_override is None);
- Line 283: ignore_index is None, so the `if` condition holds and the branch is entered;
- Line 284: set ignore_index from self.ignore_index;
- Line 286: self.class_weight is None, so the `if` condition fails and the else branch runs;
- Line 290: set class_weight to None;
- Line 291: call self.cls_criterion, i.e. cross_entropy(), to compute the loss;
- Line 40: set ignore_index (None is replaced with the default -100);
- Line 42: call F.cross_entropy() with reduction='none' to get the element-wise loss;
- Line 52: avg_factor is not None here, so the `if` condition fails and the block is skipped;
- Line 56: weight is not None, so the branch is entered;
- Line 57: cast weight to float;
- Line 58: call weight_reduce_loss(); its key step is to sum the loss and divide it by (avg_factor + a tiny epsilon), which yields our final loss as shown below;
- Line 61: return loss to the caller;
- Line 301: return the computed loss_cls to the caller.
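Putting the traced call chain together, the whole classification-loss path boils down to a few lines of plain torch. The sketch below condenses it with toy tensors; the anchor count, labels, per-anchor weights and avg_factor are all invented for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_anchors, num_classes = 6, 3
cls_score = torch.randn(num_anchors, num_classes)       # raw per-anchor logits
label = torch.tensor([0, 2, 1, 1, 0, 2])                # assigned class per anchor
label_weights = torch.tensor([1., 1., 1., 0., 1., 0.])  # 0 = anchor ignored
avg_factor = 4                                          # e.g. number of sampled anchors

# element-wise cross entropy, reduction='none' (as in the traced code)
loss = F.cross_entropy(cls_score, label, reduction='none')

# apply per-anchor weights, then sum / (avg_factor + eps)
eps = torch.finfo(torch.float32).eps
loss_cls = (loss * label_weights).sum() / (avg_factor + eps)
```

Anchors with a zero weight contribute nothing to loss_cls, which is how ignored anchors are excluded from the classification loss.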
This concludes the walkthrough of the loss-computation code.
Chapter 6: PointPillars test results and ablation studies
6.1 PointPillars results on the KITTI dataset
6.2 PointPillars ablation studies
6.2.1 Spatial resolution
In the implementation, each pillar is 0.16 m x 0.16 m. Enlarging the pillars speeds up inference: larger pillars leave fewer non-empty pillars in the point cloud and produce a smaller pseudo-image, so both the PointNet encoder and the CNN backbone run faster. Smaller pillars, on the other hand, let the network learn finer-grained features and localize boxes more precisely. The test results confirm this trade-off: larger pillars bring higher speed, smaller pillars higher accuracy.
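This trade-off is easy to quantify, since the pseudo-image resolution is simply the detection range divided by the pillar size. As an assumed example, the snippet below uses the KITTI-style point-cloud range [0, -39.68, -3, 69.12, 39.68, 1] found in the standard MMDetection3D PointPillars config:

```python
# Pseudo-image size as a function of pillar size (KITTI-style range assumed).
point_cloud_range = [0.0, -39.68, -3.0, 69.12, 39.68, 1.0]  # x/y/z min, x/y/z max


def pseudo_image_size(pillar_xy):
    """Width and height of the BEV pseudo-image for a given pillar size."""
    w = round((point_cloud_range[3] - point_cloud_range[0]) / pillar_xy)
    h = round((point_cloud_range[4] - point_cloud_range[1]) / pillar_xy)
    return w, h


for size in (0.16, 0.20, 0.24, 0.28):
    w, h = pseudo_image_size(size)
    print(f'pillar {size:.2f} m -> pseudo image {w} x {h} ({w * h} cells)')
```

At 0.16 m this gives a 432 x 496 grid; going to 0.24 m shrinks the cell count by more than half, which is where the speedup comes from.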
6.2.2 Independent per-box data augmentation
Although both VoxelNet and SECOND recommend heavy augmentation of each individual GT box, PointPillars found experimentally that this markedly degrades pedestrian detection, and that milder independent augmentation works better. A likely reason is that pasting real GT samples from a sample database into each point cloud frame already reduces the need for aggressive independent per-box augmentation.
6.2.3 Point-feature decoration
When augmenting each raw point (x, y, z, r), PointPillars follows the same scheme as VoxelNet, decorating every point's features with its offsets x_p, y_p and z_p from the bottom center of its pillar. This decoration improves overall detection performance by 0.5 mAP and also makes the paper's results easier to reproduce.
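The decoration described above can be sketched roughly as follows. This is a simplified stand-in for MMDetection3D's PillarFeatureNet, with hypothetical variable names and pillar center; note the paper's D=9 variant keeps only the x_p/y_p offsets, while the cluster offsets come from VoxelNet:

```python
import torch


def decorate_points(points, pillar_center):
    """Augment raw points (x, y, z, r) of one pillar with:
    - x_c, y_c, z_c: offset from the arithmetic mean of the pillar's points
    - x_p, y_p, z_p: offset from the pillar center
    giving a 10-D feature per point.
    """
    xyz = points[:, :3]
    cluster_offset = xyz - xyz.mean(dim=0, keepdim=True)  # x_c, y_c, z_c
    center_offset = xyz - pillar_center                   # x_p, y_p, z_p
    return torch.cat([points, cluster_offset, center_offset], dim=1)


pts = torch.tensor([[1.0, 2.0, 0.5, 0.3],
                    [1.2, 2.1, 0.4, 0.8]])
center = torch.tensor([1.08, 2.08, 0.0])  # hypothetical pillar center
feats = decorate_points(pts, center)
# feats.shape == (2, 10)
```

The decorated tensor is what the PointNet-style encoder consumes; the raw intensity r passes through unchanged.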
6.2.4 Encoder
A learnable encoder, as opposed to a fixed hand-crafted one, is the key architectural piece that enables end-to-end training of the network. The paper compares the results obtained with different encoders in PointPillars; the results are as follows:
With this, our loss-computation chapter
comes to a close, and the MMDetection3D code walkthrough of the PointPillars paper is finally complete. Thank you all for your support!