Preface:
For the theory behind FCOS, see: FCOS网络解析-CSDN博客
The strong performance of anchor-free detectors hinges on how positive and negative bbox samples are assigned; this is the main difference between anchor-free algorithms, and arguably where their biggest innovations lie. For sample assignment, FCOS introduces:
1. Center sampling
Matching rule: a point (x, y) is treated as a positive sample only if it falls inside the GT box and is also close enough to the GT center (cx, cy).
2. When a location on the feature map falls inside multiple GT boxes at once, it is assigned by default to the GT box with the smallest area (Area).
In addition, the center-ness branch predicts one value at every location of the prediction feature map. Center-ness measures how far that location is from the center of the object; it lies in [0, 1] and approaches 1 as the location gets closer to the object center. When filtering high-quality bboxes in post-processing, the predicted class score is multiplied by the center-ness and the square root is taken; bboxes are ranked by the result and only the highest-scoring ones are kept. This filters out bboxes whose class score is low and whose location is far from the object center, as the sketch below illustrates.
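A rough sketch of this score fusion (not the actual mmdet post-processing code; the tensors are made-up values):

import torch

# made-up per-location scores, already passed through sigmoid
cls_score = torch.tensor([0.9, 0.8, 0.7])
centerness = torch.tensor([1.0, 0.2, 0.9])
# multiply, then take the square root; a high class score far from the
# object center (0.8 * 0.2) is strongly suppressed
ranking_score = torch.sqrt(cls_score * centerness)
print(ranking_score)  # tensor([0.9487, 0.4000, 0.7937])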
Source code walkthrough:
(Figures from: FCOS网络解析-CSDN博客)
Backbone (ResNet50):
ResNet produces a list of features at four different levels. First, the config file:
backbone=dict(
    depth=50,
    frozen_stages=1,
    init_cfg=dict(checkpoint='torchvision://resnet50', type='Pretrained'),
    # in pytorch style, apart from the frozen_stages BN parameters,
    # the BN parameters of the remaining layers are still updated
    norm_cfg=dict(requires_grad=True, type='BN'),
    norm_eval=True,
    num_stages=4,
    # indices of the feature maps this module outputs; (0, 1, 2, 3) means
    # all 4 stage outputs are needed, with strides (4, 8, 16, 32)
    # and channels (256, 512, 1024, 2048)
    out_indices=(0, 1, 2, 3),
    style='pytorch',
    type='ResNet'),
Set a breakpoint in the forward of resnet.py:
def forward(self, x):
    """Forward function."""
    if self.deep_stem:  # torch.Size([1, 3, 480, 546])
        x = self.stem(x)
    else:
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
    x = self.maxpool(x)
    outs = []
    for i, layer_name in enumerate(self.res_layers):
        res_layer = getattr(self, layer_name)
        x = res_layer(x)  # downsampled by 4 / 8 / 16 / 32 relative to the input image
        if i in self.out_indices:  # out_indices decides which stages are kept: 0, 1, 2, 3
            outs.append(x)
    # 4 output feature maps at different scales (a list):
    # 0 torch.Size([1, 256, 256, 256])
    # 1 torch.Size([1, 512, 128, 128])
    # 2 torch.Size([1, 1024, 64, 64])
    # 3 torch.Size([1, 2048, 32, 32])
    return tuple(outs)
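For intuition, the same four-stage extraction can be reproduced with plain torchvision (a self-contained sketch rather than the mmdet module; the 1024x1024 input is an assumed size chosen to match the shapes above):

import torch
from torchvision.models import resnet50

net = resnet50().eval()
x = torch.randn(1, 3, 1024, 1024)  # assumed input size
with torch.no_grad():
    x = net.maxpool(net.relu(net.bn1(net.conv1(x))))  # the stem, stride 4
    outs = []
    for stage in (net.layer1, net.layer2, net.layer3, net.layer4):
        x = stage(x)
        outs.append(x)  # strides 4/8/16/32, channels 256/512/1024/2048
print([tuple(o.shape) for o in outs])
# [(1, 256, 256, 256), (1, 512, 128, 128), (1, 1024, 64, 64), (1, 2048, 32, 32)]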
Neck (FPN):
Config file:
neck=dict(
    add_extra_convs='on_output',
    in_channels=[256, 512, 1024, 2048],
    # number of FPN output feature maps
    num_outs=5,
    # output channels of each FPN feature map
    out_channels=256,
    relu_before_extra_convs=True,
    # index of the input multi-scale feature map to start from
    start_level=1,
    type='FPN'),
Set a breakpoint in the forward of fpn.py; the final outputs are the five feature maps P3, P4, P5, P6 and P7:
def forward(self, inputs: Tuple[Tensor]) -> tuple:
    """Forward function.

    Args:
        inputs (tuple[Tensor]): Features from the upstream network, each
            is a 4D-tensor.

    Returns:
        tuple: Feature maps, each is a 4D-tensor.
    """
    assert len(inputs) == len(self.in_channels)

    # build laterals: project input feature maps 1, 2, 3 (start_level=1)
    # to 256 channels
    # laterals list  0 torch.Size([1, 256, 128, 128])
    #                1 torch.Size([1, 256, 64, 64])
    #                2 torch.Size([1, 256, 32, 32])
    laterals = [
        lateral_conv(inputs[i + self.start_level])
        for i, lateral_conv in enumerate(self.lateral_convs)
    ]

    # build top-down path: first determine the number of backbone levels
    # used (used_backbone_levels), then process from the top level down
    # to the second level
    used_backbone_levels = len(laterals)
    for i in range(used_backbone_levels - 1, 0, -1):
        # In some cases, fixing `scale factor` (e.g. 2) is preferred, but
        # it cannot co-exist with `size` in `F.interpolate`.
        if 'scale_factor' in self.upsample_cfg:
            # fix runtime error of "+=" inplace operation in PyTorch 1.10
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], **self.upsample_cfg)
        else:
            prev_shape = laterals[i - 1].shape[2:]
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=prev_shape, **self.upsample_cfg)

    # build outputs
    # part 1: from original levels -> output feature maps P3, P4, P5
    outs = [
        self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
    ]
    # part 2: add extra levels -> append P6 and P7
    if self.num_outs > len(outs):
        # use max pool to get more levels on top of outputs
        # (e.g., Faster R-CNN, Mask R-CNN)
        if not self.add_extra_convs:
            for i in range(self.num_outs - used_backbone_levels):
                outs.append(F.max_pool2d(outs[-1], 1, stride=2))
        # add conv layers on top of original feature maps (RetinaNet)
        else:
            if self.add_extra_convs == 'on_input':
                extra_source = inputs[self.backbone_end_level - 1]
            elif self.add_extra_convs == 'on_lateral':
                extra_source = laterals[-1]
            elif self.add_extra_convs == 'on_output':
                extra_source = outs[-1]
            else:
                raise NotImplementedError
            outs.append(self.fpn_convs[used_backbone_levels](extra_source))
            for i in range(used_backbone_levels + 1, self.num_outs):
                # whether relu is applied before the extra convs
                if self.relu_before_extra_convs:
                    outs.append(self.fpn_convs[i](F.relu(outs[-1])))
                else:
                    outs.append(self.fpn_convs[i](outs[-1]))
    return tuple(outs)
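To make the control flow concrete, here is a stripped-down sketch of the same wiring under this config's settings (start_level=1, add_extra_convs='on_output', relu_before_extra_convs=True); it is an illustration, not the mmdet implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

in_channels, out_channels = [512, 1024, 2048], 256  # C3, C4, C5 after start_level=1
lateral_convs = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
fpn_convs = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                          for _ in in_channels)
extra_convs = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3,
                                      stride=2, padding=1) for _ in range(2))

def fpn_forward(c3, c4, c5):
    # 1x1 laterals unify the channel dims
    laterals = [conv(x) for conv, x in zip(lateral_convs, (c3, c4, c5))]
    # top-down: add the upsampled higher level onto the level below
    for i in range(2, 0, -1):
        laterals[i - 1] = laterals[i - 1] + F.interpolate(
            laterals[i], size=laterals[i - 1].shape[2:], mode='nearest')
    outs = [conv(lat) for conv, lat in zip(fpn_convs, laterals)]  # P3, P4, P5
    outs.append(extra_convs[0](outs[-1]))          # P6 from the P5 output
    outs.append(extra_convs[1](F.relu(outs[-1])))  # P7, with relu first
    return tuple(outs)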
Head (FCOSHead):
Config file:
bbox_head=dict(
    center_sampling=True,  # whether to enable the center sampling strategy
    centerness_on_reg=True,  # whether the centerness branch hangs off the reg branch (sharing its conv tower)
    conv_bias=True,
    dcn_on_last_conv=False,
    # channels of the intermediate feature maps
    feat_channels=256,
    in_channels=256,
    loss_bbox=dict(loss_weight=1.0, type='GIoULoss'),
    loss_centerness=dict(
        loss_weight=1.0, type='CrossEntropyLoss', use_sigmoid=True),
    loss_cls=dict(
        alpha=0.25,
        gamma=2.0,
        loss_weight=1.0,
        type='FocalLoss',
        use_sigmoid=True),
    norm_on_bbox=True,  # normalize reg targets by the level stride; the bbox outputs then pass through relu
    num_classes=2,
    # each branch stacks 4 conv layers
    stacked_convs=4,
    strides=[8, 16, 32, 64, 128],
    type='FCOSHead'),
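Before stepping into the loss, it helps to picture the head layout this config describes. The sketch below is an approximation of the structure (the GroupNorm choice is an assumption based on the common FCOS setup; the per-level Scale layers and stride handling are omitted):

import torch.nn as nn

def tower(n=4, c=256):  # stacked_convs=4, feat_channels=256
    return nn.Sequential(*[
        nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.GroupNorm(32, c), nn.ReLU())
        for _ in range(n)])

num_classes = 2
cls_tower, reg_tower = tower(), tower()
conv_cls = nn.Conv2d(256, num_classes, 3, padding=1)  # use_sigmoid=True: no BG channel
conv_reg = nn.Conv2d(256, 4, 3, padding=1)            # (l, t, r, b) distances
conv_centerness = nn.Conv2d(256, 1, 3, padding=1)

def head_forward(feat):  # run once for each FPN level P3..P7
    cls_feat, reg_feat = cls_tower(feat), reg_tower(feat)
    # centerness_on_reg=True: centerness is predicted from the reg tower
    return conv_cls(cls_feat), conv_reg(reg_feat), conv_centerness(reg_feat)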
loss:
Set breakpoints in the loss function of base_dense_head.py and in the loss_by_feat function of fcos_head.py:
def loss(self, x: Tuple[Tensor], batch_data_samples: SampleList) -> dict:
    outs = self(x)
    # unpack the batched data samples into the ground-truth instances,
    # ignored instances and image meta information, assigned to
    # batch_gt_instances, batch_gt_instances_ignore and batch_img_metas
    outputs = unpack_gt_instances(batch_data_samples)
    (batch_gt_instances, batch_gt_instances_ignore,
     batch_img_metas) = outputs

    loss_inputs = outs + (batch_gt_instances, batch_img_metas,
                          batch_gt_instances_ignore)
    losses = self.loss_by_feat(*loss_inputs)
    return losses
Passing the input tuple x through FCOSHead yields the tuple outs:
- classification predictions, produced by the conv_cls layer;
- bbox regression predictions, produced by the conv_reg layer;
- center-ness predictions, produced by the conv_centerness layer.
loss_inputs is a tuple containing outs together with the ground-truth instances, the ignored instances and the image meta information.
loss_inputs is then passed into loss_by_feat:
def loss_by_feat(
    self,
    cls_scores: List[Tensor],
    bbox_preds: List[Tensor],
    centernesses: List[Tensor],
    batch_gt_instances: InstanceList,
    batch_img_metas: List[dict],
    batch_gt_instances_ignore: OptInstanceList = None
) -> Dict[str, Tensor]:
    assert len(cls_scores) == len(bbox_preds) == len(centernesses)
    # list of the sizes of the 5 feature-map levels
    featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
    # step 1: generate the prior points (anchors) from the feature-map
    # heights and widths
    all_level_points = self.prior_generator.grid_priors(
        featmap_sizes,
        dtype=bbox_preds[0].dtype,
        device=bbox_preds[0].device)
    labels, bbox_targets = self.get_targets(all_level_points,
                                            batch_gt_instances)

    num_imgs = cls_scores[0].size(0)
    # flatten cls_scores, bbox_preds and centerness
    flatten_cls_scores = [
        cls_score.permute(0, 2, 3, 1).reshape(-1, self.cls_out_channels)
        for cls_score in cls_scores
    ]
    flatten_bbox_preds = [
        bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)
        for bbox_pred in bbox_preds
    ]
    flatten_centerness = [
        centerness.permute(0, 2, 3, 1).reshape(-1)
        for centerness in centernesses
    ]
    # concatenate the flattened tensors of all levels
    flatten_cls_scores = torch.cat(flatten_cls_scores)
    flatten_bbox_preds = torch.cat(flatten_bbox_preds)
    flatten_centerness = torch.cat(flatten_centerness)
    # labels and bbox_targets are lists in which each element concatenates,
    # over the images of the batch, the point (anchor) targets of one level;
    # each element is a tensor and the list length equals the number of
    # levels, so after cat all levels' targets live in a single tensor.
    # These steps mirror the conversion of the predictions above, e.g.:
    # flatten_cls_scores has size (num_all_points, 2)
    # flatten_labels has size (num_all_points,), holding each point's class index
    flatten_labels = torch.cat(labels)
    flatten_bbox_targets = torch.cat(bbox_targets)
    # repeat points to align with bbox_preds:
    # flatten the points of all levels, repeated per image, into one tensor
    flatten_points = torch.cat(
        [points.repeat(num_imgs, 1) for points in all_level_points])

    losses = dict()

    # FG cat_id: [0, num_classes -1], BG cat_id: num_classes
    bg_class_ind = self.num_classes
    # indices in flatten_labels that belong to foreground (non-background)
    # classes, stored in pos_inds
    pos_inds = ((flatten_labels >= 0)
                & (flatten_labels < bg_class_ind)).nonzero().reshape(-1)
    num_pos = torch.tensor(
        len(pos_inds), dtype=torch.float, device=bbox_preds[0].device)
    num_pos = max(reduce_mean(num_pos), 1.0)
    loss_cls = self.loss_cls(  # FocalLoss()
        flatten_cls_scores, flatten_labels, avg_factor=num_pos)
    if getattr(self.loss_cls, 'custom_accuracy', False):
        acc = self.loss_cls.get_accuracy(flatten_cls_scores,
                                         flatten_labels)
        losses.update(acc)

    pos_bbox_preds = flatten_bbox_preds[pos_inds]  # bbox predictions of the positive samples
    pos_centerness = flatten_centerness[pos_inds]  # centerness predictions of the positive samples
    pos_bbox_targets = flatten_bbox_targets[pos_inds]  # bbox targets of the positive samples
    # centerness targets computed from the positive bbox targets via
    # centerness_target; every value lies in [0, 1]
    pos_centerness_targets = self.centerness_target(pos_bbox_targets)
    # centerness weighted iou loss: normalization factor of the
    # centerness-weighted IoU loss
    centerness_denorm = max(
        reduce_mean(pos_centerness_targets.sum().detach()), 1e-6)

    # the loss is computed on positive samples only
    if len(pos_inds) > 0:
        pos_points = flatten_points[pos_inds]
        # decode the positive bbox predictions and targets with the bbox
        # coder (bbox_coder) to obtain actual boxes
        pos_decoded_bbox_preds = self.bbox_coder.decode(
            pos_points, pos_bbox_preds)
        pos_decoded_target_preds = self.bbox_coder.decode(
            pos_points, pos_bbox_targets)
        # bbox regression loss over the decoded predictions and targets,
        # weighted by pos_centerness_targets and normalized by centerness_denorm
        loss_bbox = self.loss_bbox(
            pos_decoded_bbox_preds,
            pos_decoded_target_preds,
            weight=pos_centerness_targets,
            avg_factor=centerness_denorm)
        # centerness loss over the predictions and targets, averaged by num_pos
        loss_centerness = self.loss_centerness(
            pos_centerness, pos_centerness_targets, avg_factor=num_pos)
    else:
        loss_bbox = pos_bbox_preds.sum()
        loss_centerness = pos_centerness.sum()

    losses['loss_cls'] = loss_cls
    losses['loss_bbox'] = loss_bbox
    losses['loss_centerness'] = loss_centerness

    return losses
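A note on self.bbox_coder.decode used above: for FCOS this is a distance-point style coder (DistancePointBBoxCoder in mmdet), and its decode step is essentially the following sketch (clamping to the image border is omitted):

import torch

def decode(points, distances):
    # turn per-point (l, t, r, b) distances back into (x1, y1, x2, y2) boxes
    x1 = points[:, 0] - distances[:, 0]  # left
    y1 = points[:, 1] - distances[:, 1]  # top
    x2 = points[:, 0] + distances[:, 2]  # right
    y2 = points[:, 1] + distances[:, 3]  # bottom
    return torch.stack([x1, y1, x2, y2], dim=-1)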
How the grid_priors function of prior_generator generates the prior points (anchors):
def grid_priors(self,
                featmap_sizes: List[Tuple],
                dtype: torch.dtype = torch.float32,
                device: DeviceType = 'cuda',
                with_stride: bool = False) -> List[Tensor]:
    """Generate grid points of multiple feature levels.

    Args:
        featmap_sizes (list[tuple]): List of feature map sizes in
            multiple feature levels, each size arrange as
            as (h, w).
        dtype (:obj:`dtype`): Dtype of priors. Defaults to torch.float32.
        device (str | torch.device): The device where the anchors will be
            put on.
        with_stride (bool): Whether to concatenate the stride to
            the last dimension of points.

    Return:
        list[torch.Tensor]: Points of multiple feature levels.
        The sizes of each tensor should be (N, 2) when with stride is
        ``False``, where N = width * height, width and height
        are the sizes of the corresponding feature level,
        and the last dimension 2 represent (coord_x, coord_y),
        otherwise the shape should be (N, 4),
        and the last dimension 4 represent
        (coord_x, coord_y, stride_w, stride_h).
    """
    assert self.num_levels == len(featmap_sizes)
    multi_level_priors = []
    for i in range(self.num_levels):  # iterate over each level
        priors = self.single_level_grid_priors(
            featmap_sizes[i],
            level_idx=i,
            dtype=dtype,
            device=device,
            with_stride=with_stride)
        multi_level_priors.append(priors)
    return multi_level_priors
def single_level_grid_priors(self,
                             featmap_size: Tuple[int],
                             level_idx: int,
                             dtype: torch.dtype = torch.float32,
                             device: DeviceType = 'cuda',
                             with_stride: bool = False) -> Tensor:
    feat_h, feat_w = featmap_size  # size of the current feature map
    # horizontal and vertical strides at this pyramid level (level_idx)
    stride_w, stride_h = self.strides[level_idx]
    # horizontal offsets shift_x, shape torch.Size([128])
    shift_x = (torch.arange(0, feat_w, device=device) +
               self.offset) * stride_w
    # keep featmap_size as Tensor instead of int, so that we
    # can convert to ONNX correctly
    shift_x = shift_x.to(dtype)

    # vertical offsets shift_y
    shift_y = (torch.arange(0, feat_h, device=device) +
               self.offset) * stride_h
    # keep featmap_size as Tensor instead of int, so that we
    # can convert to ONNX correctly
    shift_y = shift_y.to(dtype)
    # build every offset combination, which lays out the prior points
    # (anchor centers) on the feature map
    shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
    if not with_stride:
        shifts = torch.stack([shift_xx, shift_yy], dim=-1)
    else:
        # use `shape[0]` instead of `len(shift_xx)` for ONNX export
        stride_w = shift_xx.new_full((shift_xx.shape[0], ),
                                     stride_w).to(dtype)
        stride_h = shift_xx.new_full((shift_yy.shape[0], ),
                                     stride_h).to(dtype)
        shifts = torch.stack([shift_xx, shift_yy, stride_w, stride_h],
                             dim=-1)
    all_points = shifts.to(device)
    # return the center coordinates of the prior points on the feature map
    return all_points
The returned multi_level_priors holds, for every level, the center coordinates of the prior points on the corresponding feature map.
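A toy numeric check (assumed sizes: a 2x3 feature map at stride 8, offset 0.5) of what single_level_grid_priors produces:

import torch

feat_h, feat_w, stride, offset = 2, 3, 8, 0.5
shift_x = (torch.arange(0, feat_w) + offset) * stride  # tensor([ 4., 12., 20.])
shift_y = (torch.arange(0, feat_h) + offset) * stride  # tensor([ 4., 12.])
yy, xx = torch.meshgrid(shift_y, shift_x, indexing='ij')
points = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=-1)
print(points)
# tensor([[ 4.,  4.], [12.,  4.], [20.,  4.],
#         [ 4., 12.], [12., 12.], [20., 12.]])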
How self.get_targets builds the targets:
def get_targets(
        self, points: List[Tensor], batch_gt_instances: InstanceList
) -> Tuple[List[Tensor], List[Tensor]]:
    """Compute regression, classification and centerness targets for points
    in multiple images.

    Args:
        points (list[Tensor]): Points of each fpn level, each has shape
            (num_points, 2).
        batch_gt_instances (list[:obj:`InstanceData`]): Batch of
            gt_instance. It usually includes ``bboxes`` and ``labels``
            attributes.

    Returns:
        tuple: Targets of each level.

        - concat_lvl_labels (list[Tensor]): Labels of each level.
        - concat_lvl_bbox_targets (list[Tensor]): BBox targets of each \
        level.
    """
    assert len(points) == len(self.regress_ranges)
    num_levels = len(points)  # 5
    # expand each level's regression range to match that level's points,
    # so the expanded ranges can be used in the assignment below
    expanded_regress_ranges = [
        points[i].new_tensor(self.regress_ranges[i])[None].expand_as(
            points[i]) for i in range(num_levels)
    ]
    # concat all levels points and regress ranges
    concat_regress_ranges = torch.cat(expanded_regress_ranges, dim=0)
    concat_points = torch.cat(points, dim=0)

    # the number of points per img, per lvl
    num_points = [center.size(0) for center in points]

    # get labels and bbox_targets of each image
    labels_list, bbox_targets_list = multi_apply(
        self._get_targets_single,
        batch_gt_instances,
        points=concat_points,
        regress_ranges=concat_regress_ranges,
        num_points_per_lvl=num_points)

    # split to per img, per level
    labels_list = [labels.split(num_points, 0) for labels in labels_list]
    bbox_targets_list = [
        bbox_targets.split(num_points, 0)
        for bbox_targets in bbox_targets_list
    ]

    # concat per level image
    concat_lvl_labels = []
    concat_lvl_bbox_targets = []
    for i in range(num_levels):
        concat_lvl_labels.append(
            torch.cat([labels[i] for labels in labels_list]))
        bbox_targets = torch.cat(
            [bbox_targets[i] for bbox_targets in bbox_targets_list])
        if self.norm_on_bbox:
            bbox_targets = bbox_targets / self.strides[i]
        concat_lvl_bbox_targets.append(bbox_targets)
    # per-level classification labels and regression targets of every
    # point (anchor), concatenated over the images in the batch
    return concat_lvl_labels, concat_lvl_bbox_targets
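For reference, multi_apply is a small mmdet utility that calls _get_targets_single once per image and transposes the per-image results into per-output lists; it is essentially:

from functools import partial

def multi_apply(func, *args, **kwargs):
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)  # one call per image
    # transpose [(labels_img0, targets_img0), ...] into
    # ([labels_img0, ...], [targets_img0, ...])
    return tuple(map(list, zip(*map_results)))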
def _get_targets_single(
        self, gt_instances: InstanceData, points: Tensor,
        regress_ranges: Tensor,
        num_points_per_lvl: List[int]) -> Tuple[Tensor, Tensor]:
    """Compute regression and classification targets for a single image."""
    num_points = points.size(0)
    num_gts = len(gt_instances)
    gt_bboxes = gt_instances.bboxes
    gt_labels = gt_instances.labels

    if num_gts == 0:
        return gt_labels.new_full((num_points,), self.num_classes), \
               gt_bboxes.new_zeros((num_points, 4))

    areas = (gt_bboxes[:, 2] - gt_bboxes[:, 0]) * (
        gt_bboxes[:, 3] - gt_bboxes[:, 1])
    # TODO: figure out why these two are different
    # areas = areas[None].expand(num_points, num_gts)
    # repeat copies along each dim (a leading dim is added for 1-D tensors);
    # the resulting size is (num_points, num_gts)
    areas = areas[None].repeat(num_points, 1)
    regress_ranges = regress_ranges[:, None, :].expand(
        num_points, num_gts, 2)
    gt_bboxes = gt_bboxes[None].expand(num_points, num_gts, 4)
    xs, ys = points[:, 0], points[:, 1]
    xs = xs[:, None].expand(num_points, num_gts)
    ys = ys[:, None].expand(num_points, num_gts)

    left = xs - gt_bboxes[..., 0]
    right = gt_bboxes[..., 2] - xs
    top = ys - gt_bboxes[..., 1]
    bottom = gt_bboxes[..., 3] - ys
    bbox_targets = torch.stack((left, top, right, bottom), -1)

    if self.center_sampling:
        # condition1: inside a `center bbox`
        radius = self.center_sample_radius
        center_xs = (gt_bboxes[..., 0] + gt_bboxes[..., 2]) / 2
        center_ys = (gt_bboxes[..., 1] + gt_bboxes[..., 3]) / 2
        center_gts = torch.zeros_like(gt_bboxes)
        stride = center_xs.new_zeros(center_xs.shape)

        # project the points on current lvl back to the `original` sizes
        lvl_begin = 0
        for lvl_idx, num_points_lvl in enumerate(num_points_per_lvl):
            lvl_end = lvl_begin + num_points_lvl
            stride[lvl_begin:lvl_end] = self.strides[lvl_idx] * radius
            lvl_begin = lvl_end

        x_mins = center_xs - stride
        y_mins = center_ys - stride
        x_maxs = center_xs + stride
        y_maxs = center_ys + stride
        center_gts[..., 0] = torch.where(x_mins > gt_bboxes[..., 0],
                                         x_mins, gt_bboxes[..., 0])
        center_gts[..., 1] = torch.where(y_mins > gt_bboxes[..., 1],
                                         y_mins, gt_bboxes[..., 1])
        center_gts[..., 2] = torch.where(x_maxs > gt_bboxes[..., 2],
                                         gt_bboxes[..., 2], x_maxs)
        center_gts[..., 3] = torch.where(y_maxs > gt_bboxes[..., 3],
                                         gt_bboxes[..., 3], y_maxs)

        cb_dist_left = xs - center_gts[..., 0]
        cb_dist_right = center_gts[..., 2] - xs
        cb_dist_top = ys - center_gts[..., 1]
        cb_dist_bottom = center_gts[..., 3] - ys
        center_bbox = torch.stack(
            (cb_dist_left, cb_dist_top, cb_dist_right, cb_dist_bottom), -1)
        inside_gt_bbox_mask = center_bbox.min(-1)[0] > 0
    else:
        # condition1: inside a gt bbox
        inside_gt_bbox_mask = bbox_targets.min(-1)[0] > 0

    # condition2: limit the regression range for each location
    max_regress_distance = bbox_targets.max(-1)[0]
    inside_regress_range = (
        (max_regress_distance >= regress_ranges[..., 0])
        & (max_regress_distance <= regress_ranges[..., 1]))

    # if there are still more than one objects for a location,
    # we choose the one with minimal area
    areas[inside_gt_bbox_mask == 0] = INF
    areas[inside_regress_range == 0] = INF
    min_area, min_area_inds = areas.min(dim=1)

    labels = gt_labels[min_area_inds]
    labels[min_area == INF] = self.num_classes  # set as BG
    bbox_targets = bbox_targets[range(num_points), min_area_inds]
    # the return values are the per-point classification label and
    # regression target, with sizes (num_points,) and (num_points, 4)
    return labels, bbox_targets
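A toy check (made-up areas) of the minimal-area tie-breaking at the end: a point inside two GT boxes takes the smaller one, and a point whose entries were all masked to INF becomes background.

import torch

INF = 1e8
# areas after masking, shape (num_points=2, num_gts=2): point 0 falls inside
# both GT boxes, point 1 inside none (both of its entries were set to INF)
areas = torch.tensor([[9600., 2400.],
                      [  INF,   INF]])
min_area, min_area_inds = areas.min(dim=1)
print(min_area_inds)    # tensor([1, 0]): point 0 is assigned to the smaller GT 1
print(min_area == INF)  # tensor([False,  True]): point 1 is set to background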
The center-ness branch predicts one value at each location of the prediction feature map; center-ness measures how far that location is from the object center, lies in [0, 1], and approaches 1 as the location gets closer to the center. Below are the formula for the center-ness ground-truth label and the source code (the loss is computed only on positive samples, i.e. locations inside an object).
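For a positive location with regression targets $(l^{*}, t^{*}, r^{*}, b^{*})$, the FCOS paper defines the target as

$$\text{centerness}^{*} = \sqrt{\frac{\min(l^{*}, r^{*})}{\max(l^{*}, r^{*})} \times \frac{\min(t^{*}, b^{*})}{\max(t^{*}, b^{*})}}$$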
def centerness_target(self, pos_bbox_targets: Tensor) -> Tensor:
    """Compute centerness targets.

    Args:
        pos_bbox_targets (Tensor): BBox targets of positive bboxes in shape
            (num_pos, 4)

    Returns:
        Tensor: Centerness target.
    """
    # only calculate pos centerness targets, otherwise there may be nan
    left_right = pos_bbox_targets[:, [0, 2]]
    top_bottom = pos_bbox_targets[:, [1, 3]]
    if len(left_right) == 0:
        centerness_targets = left_right[..., 0]
    else:
        centerness_targets = (
            left_right.min(dim=-1)[0] / left_right.max(dim=-1)[0]) * (
                top_bottom.min(dim=-1)[0] / top_bottom.max(dim=-1)[0])
    return torch.sqrt(centerness_targets)
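A quick sanity check with made-up distances: a location exactly at the box center gets target 1.0, while an off-center location gets a smaller value.

import torch

# (left, top, right, bottom) distances from a location to the four box borders
pos_bbox_targets = torch.tensor([[10., 10., 10., 10.],   # exactly at the center
                                 [ 2., 10., 18., 10.]])  # shifted to the left
lr = pos_bbox_targets[:, [0, 2]]
tb = pos_bbox_targets[:, [1, 3]]
ct = (lr.min(-1)[0] / lr.max(-1)[0]) * (tb.min(-1)[0] / tb.max(-1)[0])
print(torch.sqrt(ct))  # tensor([1.0000, 0.3333])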
Finally the losses dict is returned.
That concludes the overall training flow.
Reference:
CSDN: FCOS网络解析-CSDN博客