Reference blog:
Reference code: https://github.com/supernotman/Faster-RCNN-with-torchvision
Usage
import torch
from torchvision import transforms
import torchvision
from PIL import Image
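# Build the detector (newer torchvision versions use the `weights=` argument instead of `pretrained=`)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# For training
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
# turn the random values into valid xyxy boxes (x2 >= x1, y2 >= y1)
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)
output = model(images, targets)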
'''
output:
{loss_classifier: *, loss_box_reg: *, loss_objectness: *, loss_rpn_box_reg: *}
'''
# For inference
model.eval()
img1 = transforms.ToTensor()(Image.open('./imgs/000000000036.jpg').convert('RGB'))
img2 = transforms.ToTensor()(Image.open('./imgs/000000000042.jpg').convert('RGB'))
x = [img1, img2]
predictions = model(x)
'''
predictions:
[{'boxes': num_boxes * 4 (xyxy), 'labels': 1 * num_boxes, 'scores': 1 * num_boxes},
{'boxes': num_boxes * 4 (xyxy), 'labels': 1 * num_boxes, 'scores': 1 * num_boxes}]
'''
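Here, fasterrcnn_resnet50_fpn builds a FasterRCNN object that uses ResNet-50 as its default backbone. FasterRCNN first creates its submodules (rpn, roi_heads, transform) and then calls the constructor of its base class, GeneralizedRCNN, to finish building the object.
GeneralizedRCNN main logic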
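# remember the input sizes so detections can be mapped back after resizing
original_image_sizes = [img.shape[-2:] for img in images]
images, targets = self.transform(images, targets)
features = self.backbone(images.tensors)
if isinstance(features, torch.Tensor):
    # a plain backbone returns a single tensor; wrap it like the FPN's OrderedDict
    features = OrderedDict([('0', features)])
proposals, proposal_losses = self.rpn(images, features, targets)
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
# merge the losses of both stages (both dicts are empty at inference time)
losses = {}
losses.update(detector_losses)
losses.update(proposal_losses)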
FasterRCNN submodules
BackboneWithFPN (Backbone)
class BackboneWithFPN(nn.Module):
    """
    Adds a FPN on top of a model.
    Internally, it uses torchvision.models._utils.IntermediateLayerGetter to
    extract a submodel that returns the feature maps specified in return_layers.
    The same limitations of IntermediateLayerGetter apply here.
    Arguments:
        backbone (nn.Module)
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
        in_channels_list (List[int]): number of channels for each feature map
            that is returned, in the order they are present in the OrderedDict
        out_channels (int): number of channels in the FPN.
    Attributes:
        out_channels (int): the number of channels in the FPN
    """
    def __init__(self, backbone, return_layers, in_channels_list, out_channels):
        super(BackboneWithFPN, self).__init__()
        self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
        self.fpn = FeaturePyramidNetwork(
            in_channels_list=in_channels_list,
            out_channels=out_channels,
            extra_blocks=LastLevelMaxPool(),
        )
        self.out_channels = out_channels

    def forward(self, x):
        x = self.body(x)
        x = self.fpn(x)
        return x
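Parameters:
- return_layers: selects which backbone conv layers are returned and gives the new keys used in the returned OrderedDict
- in_channels_list: the number of channels of each feature map fed into the FPN
- out_channels: the number of channels of every feature map produced by the FPN; a single value shared by all levels
In this example, the ResNet50 backbone returns feature maps whose spatial size shrinks to 1/4, 1/8, 1/16 and 1/32 of the input, while the channel count grows to 256, 512, 1024 and 2048. After the FPN, however, every returned feature map has 256 channels.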
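The sizes and channel counts described above can be checked with a little arithmetic. The following is a plain-Python sketch, not torchvision code: `fpn_output_shapes` is a hypothetical helper, the level names ('0' to '3' plus 'pool') are assumed, and it assumes every stride-2 stage rounds the spatial size up and that LastLevelMaxPool adds one extra level by subsampling the coarsest map once more.

```python
import math

# (stride, channels) of the four ResNet-50 stages named in the text.
RESNET50_STAGES = [(4, 256), (8, 512), (16, 1024), (32, 2048)]


def fpn_output_shapes(height, width, out_channels=256):
    """Return {level: (backbone_channels, fpn_channels, h, w)}.

    Hypothetical helper for illustration: it only reproduces the shape
    bookkeeping, not the convolutions themselves.
    """
    shapes = {}
    for idx, (stride, in_channels) in enumerate(RESNET50_STAGES):
        h = math.ceil(height / stride)
        w = math.ceil(width / stride)
        shapes[str(idx)] = (in_channels, out_channels, h, w)
    # The extra "pool" level has no backbone counterpart (hence None);
    # it halves the coarsest FPN map once more.
    _, _, h, w = shapes[str(len(RESNET50_STAGES) - 1)]
    shapes["pool"] = (None, out_channels, math.ceil(h / 2), math.ceil(w / 2))
    return shapes


if __name__ == "__main__":
    for name, shape in fpn_output_shapes(800, 1344).items():
        print(name, shape)
```

Whatever the backbone channel count of a level, the second entry of each tuple stays at out_channels, which is the point made in the paragraph above.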
RegionProposalNetwork (RPN)
class RegionProposalNetwork(torch.nn.Module):
    """
    Implements Region Proposal Network (RPN).
    Arguments:
        anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature
            maps.
        head (nn.Module): module that computes the objectness and regression deltas
        fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be
            considered as positive during training of the RPN.
        bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be
            considered as negative during training of the RPN.
        batch_size_per_image (int): number of anchors that are sampled during training of the RPN
            for computing the loss
        positive_fraction (float): proportion of positive anchors in a mini-batch during training
            of the RPN
        pre_nms_top_n (Dict[int]): number of proposals to keep before applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        post_nms_top_n (Dict[int]): number of proposals to keep after applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        nms_thresh (float): NMS threshold used for postprocessing the RPN proposals
    """
    def forward(self, images, features, targets=None):
        # RPN uses all feature maps that are available
        features = list(features.values())
        objectness, pred_bbox_deltas = self.head(features)
        anchors = self.anchor_generator(images, features)
        num_images = len(anchors)
        num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]
        num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN do not backprop through
        # the proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
        losses = {}
        if self.training:
            assert targets is not None
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
                "loss_objectness": loss_objectness,
                "loss_rpn_box_reg": loss_rpn_box_reg,
            }
        return boxes, losses
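Processing flow:
1. Predict proposals and their foreground probability: head
For each feature map extracted by the backbone, Conv2d layers predict, for every anchor at every feature-map location, the class (probability of being foreground) and the offset (between the predicted box and the anchor). The spatial size of the outputs is unchanged, while the channel counts become num_anchors and 4 * num_anchors respectively, where num_anchors is the number of anchors per location.
2. Generate anchors: anchor_generator
Anchors are generated at every feature-map location from the configured anchor sizes and aspect ratios.
3. Combine the anchors with the predicted anchor deltas to obtain proposals
The decode method of det_utils.BoxCoder turns anchors plus deltas into proposals.
4. Filter the proposals: pre_nms_top_n, nms_thresh, post_nms_top_n
- For each feature level, keep the pre_nms_top_n anchors with the highest foreground probability
- Clip proposals to the image and remove those that are too small
- Remove overlapping proposals with NMS at nms_thresh
- Keep the post_nms_top_n best proposals across all levels
5. (Training) Encode anchor offsets: fg_iou_thresh, bg_iou_thresh
- Match anchors to ground-truth boxes with det_utils.Matcher
- Use the encode method of det_utils.BoxCoder to get each anchor's offsets to its matched ground-truth box
6. (Training) Compute the loss: batch_size_per_image, positive_fraction
- Sample a balanced set of foreground and background anchors: BalancedPositiveNegativeSampler
- Compute the regression loss with L1 and the classification loss with (binary) cross-entropy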
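Steps 2 and 5 above (anchor generation and matching anchors to ground truth) can be illustrated with a small plain-Python sketch. This is not torchvision's AnchorGenerator or Matcher: `generate_anchors`, `iou` and `label_anchors` are hypothetical helpers, the thresholds 0.7 and 0.3 are assumed values, and refinements such as low-quality matches are left out.

```python
import math


def generate_anchors(feat_h, feat_w, stride, sizes, aspect_ratios):
    """Anchors (x1, y1, x2, y2) centred on every feature-map location."""
    anchors = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            cx, cy = gx * stride, gy * stride
            for size in sizes:
                for ratio in aspect_ratios:
                    # ratio is height / width; the area stays size ** 2
                    h = size * math.sqrt(ratio)
                    w = size / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, fg_iou_thresh=0.7, bg_iou_thresh=0.3):
    """Label each anchor: 1 = foreground, 0 = background, -1 = ignored."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= fg_iou_thresh:
            labels.append(1)
        elif best < bg_iou_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # between the thresholds: not used in the loss
    return labels


if __name__ == "__main__":
    anchors = generate_anchors(2, 3, 16, sizes=(32,), aspect_ratios=(0.5, 1.0, 2.0))
    print(len(anchors), label_anchors(anchors, [(-16, -16, 16, 16)]))
```

Anchors whose best IoU falls between the two thresholds are ignored, which is why the balanced sampler in step 6 draws only from the anchors labelled 1 or 0.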
RoIHeads
class RoIHeads(torch.nn.Module):
    def __init__(self,
                 box_roi_pool,
                 box_head,
                 box_predictor,
                 # Faster R-CNN training
                 fg_iou_thresh, bg_iou_thresh,
                 batch_size_per_image, positive_fraction,
                 bbox_reg_weights,
                 # Faster R-CNN inference
                 score_thresh,
                 nms_thresh,
                 detections_per_img,
                 ):
        ...  # attribute assignments omitted in this excerpt

    def forward(self, features, proposals, image_shapes, targets=None):
        if self.training:
            proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
        else:
            labels = None
            regression_targets = None
            matched_idxs = None
        box_features = self.box_roi_pool(features, proposals, image_shapes)
        box_features = self.box_head(box_features)
        class_logits, box_regression = self.box_predictor(box_features)
        result = torch.jit.annotate(List[Dict[str, torch.Tensor]], [])
        losses = {}
        if self.training:
            assert labels is not None and regression_targets is not None
            loss_classifier, loss_box_reg = fastrcnn_loss(
                class_logits, box_regression, labels, regression_targets)
            losses = {
                "loss_classifier": loss_classifier,
                "loss_box_reg": loss_box_reg
            }
        else:
            boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
            num_images = len(boxes)
            for i in range(num_images):
                result.append(
                    {
                        "boxes": boxes[i],
                        "labels": labels[i],
                        "scores": scores[i],
                    }
                )
        return result, losses
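Processing flow:
1. (Training) Select training samples: fg_iou_thresh, bg_iou_thresh, batch_size_per_image, positive_fraction
- Match proposals to ground-truth boxes
- Sample a balanced set of foreground and background proposals
- Encode the regression targets (offsets) of the sampled proposals
2. RoI align: box_roi_pool
MultiScaleRoIAlign crops the feature maps at each proposal and resamples them to a fixed size.
3. Feature extraction: box_head
The fixed-size features pass through a two-layer MLP.
4. Predict box classes and offsets: box_predictor
For every proposal, predict the probability of each class and the corresponding offsets.
5. (Training) Compute the loss
- Classification loss: cross-entropy
- Offset loss: smooth L1
6. (Inference) Decode the offsets and post-process: score_thresh, nms_thresh, detections_per_img
- Decode the offsets into predicted boxes
- Apply softmax to get the class scores
- Drop predictions scoring below score_thresh
- Clip boxes to the image and remove those that are too small
- Run NMS with nms_thresh
- Keep the detections_per_img highest-scoring predictions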
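The inference-time post-processing in step 6 can be sketched in plain Python as follows. This is a simplified illustration rather than torchvision's postprocess_detections: `softmax`, `iou`, `nms` and `postprocess` are hypothetical helpers, each proposal is assumed to carry a single already-decoded box shared by all classes, the clipping and small-box removal steps are skipped, and the default thresholds are assumed values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def iou(a, b):
    """Intersection over union of two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh):
    """Greedy NMS; returns the kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


def postprocess(boxes, class_logits, score_thresh=0.05, nms_thresh=0.5,
                detections_per_img=100):
    """Return a list of (box, label, score) tuples, best score first."""
    candidates = []
    for box, logits in zip(boxes, class_logits):
        probs = softmax(logits)
        for label in range(1, len(probs)):  # class 0 is background
            if probs[label] > score_thresh:
                candidates.append((box, label, probs[label]))
    results = []
    # NMS is run separately for each class
    for label in sorted({c[1] for c in candidates}):
        cls = [c for c in candidates if c[1] == label]
        keep = nms([c[0] for c in cls], [c[2] for c in cls], nms_thresh)
        results.extend(cls[i] for i in keep)
    results.sort(key=lambda c: c[2], reverse=True)
    return results[:detections_per_img]


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    demo_logits = [[0.0, 5.0], [0.0, 4.0], [0.0, 3.0]]
    for det in postprocess(demo_boxes, demo_logits):
        print(det)
```

Running NMS per class keeps overlapping boxes of different classes, while the final sort and slice enforce the detections_per_img limit across all classes.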
GeneralizedRCNNTransform (transform)
class GeneralizedRCNNTransform(nn.Module):
    """
    Performs input / target transformation before feeding the data to a GeneralizedRCNN
    model.
    The transformations it performs are:
        - input normalization (mean subtraction and std division)
        - input / target resizing to match min_size / max_size
    It returns an ImageList for the inputs, and a List[Dict[Tensor]] for the targets
    """
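- Before the images are fed to the network, they are normalized (subtract the mean, divide by the standard deviation) and resized.
- After the network runs, the predicted boxes are rescaled back to the original image size.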
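The resize rule and the rescaling of boxes can be sketched with a few lines of plain Python. This is an illustration, not the GeneralizedRCNNTransform code: `resize_scale` and `scale_box` are hypothetical helpers, min_size=800 and max_size=1333 are assumed values, and the rounding of the resized image to whole pixels is ignored.

```python
def resize_scale(height, width, min_size=800, max_size=1333):
    """Scale factor that brings the short side to min_size,
    unless that would push the long side beyond max_size."""
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return scale


def scale_box(box, ratio_w, ratio_h):
    """Rescale an xyxy box by separate width and height ratios."""
    x1, y1, x2, y2 = box
    return (x1 * ratio_w, y1 * ratio_h, x2 * ratio_w, y2 * ratio_h)


if __name__ == "__main__":
    s = resize_scale(600, 1200)
    resized = scale_box((10, 20, 110, 220), s, s)        # targets follow the image
    restored = scale_box(resized, 1 / s, 1 / s)          # predictions mapped back
    print(s, resized, restored)
```

The same ratio is applied to the targets on the way in and inverted for the predictions on the way out, so the boxes the user sees are always in the coordinates of the original image.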
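Miscellaneous
Box encoding
$$t_x = (x - x_a) / w_a, \quad t_y = (y - y_a) / h_a, \quad t_w = \log(w / w_a), \quad t_h = \log(h / h_a)$$
$$t_x^* = (x^* - x_a) / w_a, \quad t_y^* = (y^* - y_a) / h_a, \quad t_w^* = \log(w^* / w_a), \quad t_h^* = \log(h^* / h_a)$$
Here $x, y, w, h$ are the center coordinates, width and height of a box, and $x$, $x_a$, $x^*$ (likewise for $y, w, h$) refer to the predicted box, the anchor box and the ground-truth box respectively. Hence,
- $t_x$: the horizontal offset of the center, as a fraction of the anchor width
- $t_y$: the vertical offset of the center, as a fraction of the anchor height
- $t_w$: the logarithm of the ratio between the box width and the anchor width
- $t_h$: the logarithm of the ratio between the box height and the anchor height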
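A box encoding of this kind (center offsets divided by the anchor size, width and height as log ratios) can be written out as a pair of plain-Python functions. This is a minimal sketch under that assumption: `to_cxcywh`, `encode` and `decode` are hypothetical helpers, and details of det_utils.BoxCoder such as the per-coordinate weights (bbox_reg_weights) are left out.

```python
import math


def to_cxcywh(box):
    """Convert an xyxy box to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode(box, anchor):
    """Offsets (tx, ty, tw, th) of `box` relative to `anchor`."""
    x, y, w, h = to_cxcywh(box)
    xa, ya, wa, ha = to_cxcywh(anchor)
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(deltas, anchor):
    """Inverse of encode: apply offsets to `anchor`, return an xyxy box."""
    tx, ty, tw, th = deltas
    xa, ya, wa, ha = to_cxcywh(anchor)
    x, y = tx * wa + xa, ty * ha + ya
    w, h = math.exp(tw) * wa, math.exp(th) * ha
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)


if __name__ == "__main__":
    anchor = (0, 0, 10, 10)
    gt = (5, 5, 25, 15)
    deltas = encode(gt, anchor)
    print(deltas, decode(deltas, anchor))
```

Encoding a ground-truth box gives the regression target used in training, and decoding the predicted offsets against the same anchor gives the predicted box, so decode(encode(box, anchor), anchor) should return the original box up to floating-point error.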