Contents
1. References
Object detection algorithm - RetinaNet: https://zhuanlan.zhihu.com/p/67768433
https://zhuanlan.zhihu.com/p/59910080
RetinaNet paper: https://arxiv.org/pdf/1708.02002.pdf
A thorough walkthrough of the RetinaNet paper: https://blog.csdn.net/weixin_40400177/article/details/103449280
Unofficial RetinaNet code: https://github.com/ChingHo97/RetinaNet-Pytorch-36.4AP (the companion code for this article)
Prerequisites:
Online Hard Example Mining (OHEM)
The paper proposes a new loss function to address class imbalance. It is a dynamically scaled cross-entropy loss whose scaling factor decays to 0 as confidence in the correct class increases.
The authors design a one-stage detector, RetinaNet, to demonstrate the effectiveness of Focal Loss.
Related work
Two-stage detectors achieve higher accuracy than one-stage detectors largely because they handle class imbalance: the first (proposal) stage filters out most of the background.
2. RetinaNet
RetinaNet consists of three parts: a ResNet backbone, an FPN, and a RetinaHead.
Focal Loss is designed to solve the severe positive/negative sample imbalance in one-stage detectors. It down-weights the loss of the many easy positive and negative samples, which makes it a form of hard example mining.
The authors argue that one-stage detectors are less accurate than two-stage detectors because of class imbalance, and because easy examples dominate the loss:
(1) Negative examples are far too numerous and account for most of the total loss.
(2) Each easy positive/negative example contributes only a small loss and a small backward gradient, yet collectively they swamp the signal from the hard examples.
Therefore the authors propose Focal Loss.
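For reference, the Focal Loss from the paper scales the standard cross-entropy by a modulating factor (1 - p_t)^γ and a class-balancing weight α_t:

    FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t),   where p_t = p if y = 1 (the ground-truth class), and p_t = 1 - p otherwise

With the paper's defaults α = 0.25 and γ = 2 (also the defaults in the focal_loss code later in this article), a well-classified easy example (p_t close to 1) is scaled down sharply, while a hard example (p_t small) keeps almost its full cross-entropy loss.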
class RetinaNet(nn.Module):
    def __init__(self, config=None):
        super(RetinaNet, self).__init__()
        self.config = config if config is not None else DefaultConfig
        self.backbone = resnet50(pretrained=self.config.pretrained)
        self.fpn = FPN(features=self.config.fpn_out_channels,
                       use_p5=self.config.use_p5)
        self.head = RetinaHead(config=self.config)

    def forward(self, images):
        C3, C4, C5 = self.backbone(images)    # three backbone feature maps
        all_p_level = self.fpn([C3, C4, C5])  # [P3, P4, P5, P6, P7]
        cls_logits, reg_preds = self.head(all_p_level)  # predictions
        return cls_logits, reg_preds
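A minimal smoke-test sketch (the exact DefaultConfig fields such as pretrained, fpn_out_channels and use_p5 come from the repository linked above, so the shapes below are illustrative rather than guaranteed):

import torch

model = RetinaNet()            # DefaultConfig comes from the repo; pretrained=True would download weights
model.eval()
dummy = torch.randn(1, 3, 640, 640)
with torch.no_grad():
    cls_logits, reg_preds = model(dummy)
# cls_logits: (1, sum(H*W)*9 over P3..P7, class_num)
# reg_preds : (1, sum(H*W)*9 over P3..P7, 4)
print(cls_logits.shape, reg_preds.shape)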
3. ResNet
ResNet paper walkthrough
As networks get deeper, two problems appear: vanishing/exploding gradients, and a degradation problem where accuracy drops. The degradation problem means that adding more layers leads to higher training error (not just higher test error). Vanishing/exploding gradients are largely handled by normalization (batch normalization, BN), but that only carries plain networks to roughly 20 layers. One remedy is to introduce an identity mapping. In short, where the original network learns a target mapping H(x), we rewrite it as H(x) = F(x) + x. Since x is an identity shortcut that needs no learning, the layers only have to learn the residual F(x) = H(x) - x, and learning this residual is much easier than learning the original mapping directly.
Why does this address exploding and vanishing gradients?
The general residual formulation, where F is a residual function and f is the activation applied after the addition:
    y_l = h(x_l) + F(x_l, W_l),    x_{l+1} = f(y_l)
ResNet v2 uses an identity shortcut h(x_l) = x_l and also makes f an identity (no activation after the addition), which gives:
    x_{l+1} = x_l + F(x_l, W_l)
Let L be a layer deeper than l. Unrolling the recursion up to layer L, the forward expression is:
    x_L = x_l + Σ_{i=l}^{L-1} F(x_i, W_i)
Taking gradients backward:
    ∂loss/∂x_l = ∂loss/∂x_L · ∂x_L/∂x_l = ∂loss/∂x_L · (1 + ∂(Σ_{i=l}^{L-1} F(x_i, W_i))/∂x_l)
The gradient at layer l therefore contains ∂loss/∂x_L as an additive term, i.e. the gradient of the deep layer L is passed directly to layer l; and since ∂(ΣF)/∂x_l is not equal to -1 for all samples throughout training, the factor in parentheses does not collapse to zero, so the gradient does not vanish.
Does introducing identity mappings let the network adjust its effective depth automatically?
Suppose there exists a best-performing "perfect" network N. Compared with N, some layers in the network we actually train must be redundant (or even harmful). The training target for those redundant layers should then be the identity transform (in other words, turning the redundant layers into identity mappings lets the whole network approach the perfect network N); only by reaching this target can our network perform as well as N.
So how do we turn these redundant layers into identity mappings? Simple: make the target their output has to fit equal to x itself (so H(x) becomes x), i.e. the input is x and the output is still x.
The authors therefore rebuild the mapping with a residual: the input x is added back onto the result, so the weights of the stacked layers are pushed towards zero. In formula form: F(x, w) + x = H(x) = x, i.e. F(x, w) → 0.
To sum up in one sentence: the residual structure builds the identity mapping in by hand, so the network can converge towards identity mappings where needed, which ensures the error does not keep getting worse as the depth grows.
(The explanation above is adapted from the CSDN post by lairongxuan: https://blog.csdn.net/lairongxuan/article/details/91040698)
4. ResNet code and resnet50
resnet18/50 network structure and PyTorch implementation: https://www.jianshu.com/p/085f4c8256f1
The dashed lines in the standard ResNet diagram indicate that a 1x1 convolution is needed to adjust the channels (and stride), because the tensor shapes of x and the block output must match when they are added.
Code:
Call chain, top to bottom (excerpts only):
class RetinaNetDetector(nn.Module):
    self.body = RetinaNet(config=config)          # the detector wraps the RetinaNet body

class RetinaNet(nn.Module):
    self.backbone = resnet50(pretrained=self.config.pretrained)  # the backbone is resnet50

def resnet50(pretrained=False, **kwargs):
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)  # 3, 4, 6, 3 Bottleneck blocks per stage
The basic building block of resnet50: Bottleneck
class Bottleneck(nn.Module):
    # ResNet-B
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # the dashed-line branch: reshapes x to match the block output
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out
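A quick shape check of the dashed-line (downsample) path. This is a small illustrative snippet; the 256-to-512-channel, stride-2 case below is one of the configurations _make_layer builds for resnet50:

import torch
import torch.nn as nn

# stride-2 bottleneck at the start of a new stage: channels 256 -> 512, spatial size halved
downsample = nn.Sequential(
    nn.Conv2d(256, 128 * Bottleneck.expansion, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128 * Bottleneck.expansion),
)
block = Bottleneck(inplanes=256, planes=128, stride=2, downsample=downsample)
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)  # torch.Size([1, 512, 28, 28])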
The basic building block of resnet18: BasicBlock
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
The ResNet network
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, if_include_top=False):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)            # spatial size /2
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # /2
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)  # /2
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  # /2
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)  # /2
        self.avgpool = nn.AvgPool2d(7, stride=1)
        if if_include_top:
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        self.if_include_top = if_include_top

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):         # conv weights
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):  # BN weights
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:  # dashed-line branch
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):  # this forward feeds the FPN, so it returns three levels (out3, out4, out5)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        out3 = self.layer2(x)
        out4 = self.layer3(out3)
        out5 = self.layer4(out4)
        if self.if_include_top:
            x = self.avgpool(out5)
            x = x.view(x.size(0), -1)
            x = self.fc(x)
            return x
        else:
            return (out3, out4, out5)
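A quick shape check of the backbone outputs (illustrative; it assumes torch and math are already imported as in the snippets above):

import torch

backbone = ResNet(Bottleneck, [3, 4, 6, 3])    # resnet50 without the classification head
x = torch.randn(1, 3, 224, 224)
out3, out4, out5 = backbone(x)
print(out3.shape)  # torch.Size([1, 512, 28, 28])    stride 8
print(out4.shape)  # torch.Size([1, 1024, 14, 14])   stride 16
print(out5.shape)  # torch.Size([1, 2048, 7, 7])     stride 32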
5. FPN
The FPN code below assumes a ResNet50 backbone (hence the 512/1024/2048 input channels).
class FPN(nn.Module):
    '''only for resnet50,101,152'''
    def __init__(self, features=256, use_p5=True):
        super(FPN, self).__init__()
        self.prj_5 = nn.Conv2d(2048, features, kernel_size=1)
        self.prj_4 = nn.Conv2d(1024, features, kernel_size=1)
        self.prj_3 = nn.Conv2d(512, features, kernel_size=1)
        self.conv_5 = nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_4 = nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_3 = nn.Conv2d(features, features, kernel_size=3, padding=1)
        if use_p5:
            self.conv_out6 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        else:
            self.conv_out6 = nn.Conv2d(2048, features, kernel_size=3, padding=1, stride=2)
        self.conv_out7 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        self.use_p5 = use_p5
        self.apply(self.init_conv_kaiming)

    # upsampling
    def upsamplelike(self, inputs):
        src, target = inputs
        return F.interpolate(src, size=(target.shape[2], target.shape[3]),
                             mode='nearest')  # torch's interpolation, 'nearest' mode

    def init_conv_kaiming(self, module):
        if isinstance(module, nn.Conv2d):
            nn.init.kaiming_uniform_(module.weight, a=1)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)

    def forward(self, x):  # the FPN top-down pathway
        C3, C4, C5 = x  # the ResNet outputs (out3, out4, out5)
        P5 = self.prj_5(C5)
        P4 = self.prj_4(C4)
        P3 = self.prj_3(C3)
        P4 = P4 + self.upsamplelike([P5, C4])
        P3 = P3 + self.upsamplelike([P4, C3])
        P3 = self.conv_3(P3)
        P4 = self.conv_4(P4)
        P5 = self.conv_5(P5)
        # P6 = P5 if self.use_p5 else C5
        P6 = self.conv_out6(P5) if self.use_p5 else self.conv_out6(C5)
        P7 = self.conv_out7(F.relu(P6))
        return [P3, P4, P5, P6, P7]
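Continuing the shape check from the backbone snippet above (illustrative):

fpn = FPN(features=256, use_p5=True)
P3, P4, P5, P6, P7 = fpn([out3, out4, out5])   # outputs of the ResNet forward above
print([p.shape[-1] for p in [P3, P4, P5, P6, P7]])  # [28, 14, 7, 4, 2] for a 224x224 input
# all five levels have 256 channels; the strides are roughly 8, 16, 32, 64, 128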
6. RetinaHead
How to use GroupNorm() on 3D input, explained: https://blog.csdn.net/qq_40178291/article/details/101615391
The input to this part is the five feature maps [P3, P4, P5, P6, P7]; the head predicts classification and box regression from each feature map. Each branch has four 3x3 conv layers plus one final 3x3 output conv, i.e. five 3x3 convs in total.
Model initialization (paraphrasing the paper):
By default, a binary-classification model is initialized so that y = 0 and y = 1 are equally likely. Under such an initialization, on a dataset with severe class imbalance the frequent class contributes almost all of the loss, which makes training unstable in the early iterations. To counter this, the paper introduces a "prior": p denotes the value the model should predict for the rare class (foreground) at the start of training. This prior is set low, e.g. 0.01. Note that this is a change to the model initialization, not to the loss function. It improves training stability for both the plain cross-entropy loss and Focal Loss.
In the code this shows up as:
class RetinaHead(nn.Module):
    # excerpt: the relevant lines from __init__
    self.prior = self.config.prior  # 0.01, the class prior
    self.apply(self.init_conv_RandomNormal)
    nn.init.constant_(self.cls_out.bias, -math.log((1 - self.prior) / self.prior))

    def init_conv_RandomNormal(self, module, std=0.01):
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, std=std)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
The corresponding file is model/retina_head.py:
class RetinaHead(nn.Module):
    def __init__(self, config=None):
        super(RetinaHead, self).__init__()
        if config is None:
            self.config = DefaultConfig
        else:
            self.config = config
        self.anchor_nums = self.config.anchor_nums  # 9 anchors per location
        cls_branch = []
        reg_branch = []
        for i in range(4):
            cls_branch.append(nn.Conv2d(self.config.fpn_out_channels, self.config.fpn_out_channels,
                                        kernel_size=3, stride=1, padding=1, bias=True))  # 256 channels
            if self.config.use_GN_head:  # True: GroupNorm, splits the channels into 32 groups
                cls_branch.append(nn.GroupNorm(32, self.config.fpn_out_channels))
            cls_branch.append(nn.ReLU(inplace=True))
            reg_branch.append(nn.Conv2d(self.config.fpn_out_channels, self.config.fpn_out_channels,
                                        kernel_size=3, stride=1, padding=1, bias=True))
            if self.config.use_GN_head:
                reg_branch.append(nn.GroupNorm(32, self.config.fpn_out_channels))
            reg_branch.append(nn.ReLU(inplace=True))
        self.cls_conv = nn.Sequential(*cls_branch)
        self.reg_conv = nn.Sequential(*reg_branch)
        self.cls_out = nn.Conv2d(self.config.fpn_out_channels, self.config.class_num * self.anchor_nums,
                                 kernel_size=3, stride=1, padding=1, bias=True)
        self.reg_out = nn.Conv2d(self.config.fpn_out_channels, self.anchor_nums * 4,
                                 kernel_size=3, stride=1, padding=1, bias=True)

        self.prior = self.config.prior  # 0.01, the class prior
        self.apply(self.init_conv_RandomNormal)
        nn.init.constant_(self.cls_out.bias, -math.log((1 - self.prior) / self.prior))

    def init_conv_RandomNormal(self, module, std=0.01):
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, std=std)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)

    def forward(self, inputs):
        """
        inputs: fpn output [P3, P4, P5, P6, P7]
        """
        cls_out = []
        reg_out = []
        for pred in inputs:
            batch_size, channel, H, W = pred.shape
            cls_convput = self.cls_conv(pred)
            cls_output = self.cls_out(cls_convput)  # (batch_size, cls_num*anchor_num, H, W)
            # permute to (batch_size, H, W, cls_num*anchor_num), then view as (batch_size, H*W*9, class_num)
            cls_output = cls_output.permute(0, 2, 3, 1).contiguous().view(batch_size, H * W * self.anchor_nums, -1)
            cls_out.append(cls_output)
            reg_output = self.reg_conv(pred)
            reg_output = self.reg_out(reg_output)  # (batch_size, anchor_num*4, H, W)
            # (batch_size, H*W*9, 4)
            reg_output = reg_output.permute(0, 2, 3, 1).contiguous().view(batch_size, H * W * self.anchor_nums, -1)
            reg_out.append(reg_output)
        # concatenate over the 5 fpn levels [P3, P4, P5, P6, P7]
        cls_logits = torch.cat(cls_out, dim=1)  # (batch_size, H*W*9 summed over the 5 levels, cls_num)
        reg_preds = torch.cat(reg_out, dim=1)   # (batch_size, H*W*9 summed over the 5 levels, 4)
        return cls_logits, reg_preds
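To make the output size concrete, here is a rough count of the anchor positions for a 640x640 input. This is a small illustrative calculation that assumes the usual RetinaNet strides of 8, 16, 32, 64, 128 for P3 to P7:

strides = [8, 16, 32, 64, 128]                       # P3..P7 (assumed, per the RetinaNet paper)
positions = sum((640 // s) ** 2 for s in strides)    # 80^2 + 40^2 + 20^2 + 10^2 + 5^2 = 8525
anchors_per_position = 9
print(positions * anchors_per_position)  # 76725 anchors: the middle dimension of cls_logits / reg_preds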
8. Getting targets from anchors
This part of the code lives in the LOSS class.
model/retina_loss.py
class LOSS(nn.Module):
    def __init__(self, reg_mode='giou'):
        super(LOSS, self).__init__()
        self.reg_mode = reg_mode

    def forward(self, inputs):  # called as self.loss_func([cls_logits, reg_preds, anchors, boxes, classes])
        """
        cls_logits: (n, sum(H*W)*A, class_num)  # (batch_size, H*W*9 summed over the 5 levels, cls_num)
        reg_preds:  (n, sum(H*W)*A, 4)
        anchors:    (sum(H*W)*A, 4)             # all anchors over the 5 levels
        boxes:      (n, max_num, 4)             # ground-truth boxes, padded to max_num per image
        classes:    (n, max_num)                # ground-truth classes
        """
        cls_logits, reg_preds, anchors, boxes, classes = inputs
        anchor_widths = anchors[:, 2] - anchors[:, 0]
        anchor_heights = anchors[:, 3] - anchors[:, 1]
        anchor_ctr_x = anchors[:, 0] + anchor_widths * 0.5
        anchor_ctr_y = anchors[:, 1] + anchor_heights * 0.5
        batch_size = cls_logits.shape[0]
        class_loss = []
        reg_loss = []
        for i in range(batch_size):  # per image in the batch
            per_cls_logit = cls_logits[i, :, :]  # (sum(H*W)*A, class_num)
            per_reg_pred = reg_preds[i, :, :]
            per_boxes = boxes[i, :, :]
            per_classes = classes[i, :]
            mask = per_boxes[:, 0] != -1     # drop the padded (dummy) gt boxes
            per_boxes = per_boxes[mask]      # (?, 4), ? = number of gt boxes in this image
            per_classes = per_classes[mask]  # (?,)
            if per_classes.shape[0] == 0:
                # no gt in this image: every anchor is background; sigmoid is applied here because
                # focal_loss (which applies it internally) is not called in this branch
                per_cls_prob = torch.clamp(per_cls_logit.sigmoid(), min=1e-4, max=1. - 1e-4)
                alpha_factor = torch.ones(per_cls_prob.shape).cuda() * 0.25 if torch.cuda.is_available() else torch.ones(per_cls_prob.shape) * 0.25
                alpha_factor = 1. - alpha_factor
                focal_weights = per_cls_prob
                focal_weights = alpha_factor * torch.pow(focal_weights, 2.0)
                bce = -(torch.log(1.0 - per_cls_prob))
                cls_loss = focal_weights * bce
                class_loss.append(cls_loss.sum())
                reg_loss.append(torch.tensor(0).float())
                continue
            IoU = calc_iou(anchors, per_boxes)  # IoU between anchors and gt boxes, (sum(H*W)*A, ?)
            iou_max, max_ind = torch.max(IoU, dim=1)  # (sum(H*W)*A,), best-matching gt per anchor; max_ind is its index
            targets = torch.ones_like(per_cls_logit) * -1  # (sum(H*W)*A, class_num), -1 = ignored sample
            targets[iou_max < 0.4, :] = 0     # background: IoU < 0.4 -> negative sample
            pos_anchors_ind = iou_max >= 0.5  # IoU >= 0.5 -> positive sample
            num_pos = torch.clamp(pos_anchors_ind.sum().float(), min=1.0)  # number of positives
            # per-anchor label (integer-encoded)
            assigned_classes = per_classes[max_ind]  # (sum(H*W)*A,), the class of each anchor's best gt
            assigned_boxes = per_boxes[max_ind, :]   # (sum(H*W)*A, 4), the box of each anchor's best gt
            # one-hot encoding
            targets[pos_anchors_ind, :] = 0  # zero out all class_num entries first
            targets[pos_anchors_ind, (assigned_classes[pos_anchors_ind]).long() - 1] = 1  # one-hot (classes are 1-indexed, hence the -1)
            # summary: IoU in [0, 0.4) -> negative, [0.4, 0.5) -> ignored, [0.5, 1.0] -> positive
            class_loss.append(focal_loss(per_cls_logit, targets).view(1) / num_pos)
            if self.reg_mode == 'smoothl1':
                reg_loss.append(smooth_l1(pos_anchors_ind, [anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y],
                                          assigned_boxes, per_reg_pred))
            elif self.reg_mode == 'giou':
                reg_loss.append(giou(pos_anchors_ind, [anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y],
                                     assigned_boxes, per_reg_pred))
        cls_loss = torch.stack(class_loss).mean()
        reg_loss = torch.stack(reg_loss).mean()
        total_loss = cls_loss + reg_loss
        return cls_loss, reg_loss, total_loss
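calc_iou is imported from elsewhere in the repository and is not shown in this article. A minimal sketch of what it needs to do (pairwise IoU between anchors and gt boxes, both in (x1, y1, x2, y2) form) could look like this:

def calc_iou(a, b):
    """Sketch: a is (N, 4) anchors, b is (M, 4) gt boxes, both (x1, y1, x2, y2); returns (N, M) IoU."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])   # (N,)
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])   # (M,)
    # pairwise intersection via broadcasting: (N, 1, 2) against (M, 2)
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, :, 0] * wh[:, :, 1]                    # (N, M)
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union.clamp(min=1e-8)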
9. Focal Loss
def focal_loss(preds, targets, alpha=0.25, gamma=2.0):
    preds = preds.sigmoid()
    preds = torch.clamp(preds, min=1e-4, max=1. - 1e-4)
    if torch.cuda.is_available():
        alpha_factor = torch.ones(targets.shape).cuda() * alpha
    else:
        alpha_factor = torch.ones(targets.shape) * alpha
    # alpha for positives, (1 - alpha) for negatives
    alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, (1. - alpha_factor))
    # (1 - p) for positives, p for negatives, i.e. (1 - p_t)
    focal_weights = torch.where(torch.eq(targets, 1.), 1 - preds, preds)
    focal_weights = alpha_factor * torch.pow(focal_weights, gamma)
    bce = -(targets * torch.log(preds) + (1. - targets) * torch.log(1. - preds))
    cls_loss = focal_weights * bce
    # ignored anchors (targets == -1) contribute zero loss
    if torch.cuda.is_available():
        cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros_like(cls_loss).cuda())
    else:
        cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros_like(cls_loss))
    return cls_loss.sum()
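A tiny numerical check of the down-weighting behaviour (illustrative; raw logits go in, since focal_loss applies the sigmoid itself):

import torch

target = torch.tensor([[1.0]])            # one positive anchor, one class
easy_pos = torch.tensor([[3.0]])          # sigmoid(3.0) ≈ 0.95, an easy positive
hard_pos = torch.tensor([[-1.0]])         # sigmoid(-1.0) ≈ 0.27, a hard positive
print(focal_loss(easy_pos, target))       # on the order of 1e-5: heavily down-weighted
print(focal_loss(hard_pos, target))       # orders of magnitude larger: keeps most of its loss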
10. reg_loss (smooth L1)
def smooth_l1(pos_inds, anchor_infos, boxes, reg_pred):
    """
    pos_inds: boolean mask of positive anchors, (sum(H*W)*A,)
    boxes:    assigned gt boxes per anchor, (sum(H*W)*A, 4)
    reg_pred: predicted regression deltas, (sum(H*W)*A, 4)
    """
    anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y = anchor_infos  # each (sum(H*W)*A,)
    if pos_inds.sum() > 0:
        pos_reg_pred = reg_pred[pos_inds, :]  # (num_pos, 4)
        gt_widths = boxes[pos_inds][:, 2] - boxes[pos_inds][:, 0]
        gt_heights = boxes[pos_inds][:, 3] - boxes[pos_inds][:, 1]
        gt_ctr_x = boxes[pos_inds][:, 0] + gt_widths * 0.5
        gt_ctr_y = boxes[pos_inds][:, 1] + gt_heights * 0.5
        pos_anchor_widths = anchor_widths[pos_inds]
        pos_anchor_heights = anchor_heights[pos_inds]
        pos_anchor_ctr_x = anchor_ctr_x[pos_inds]
        pos_anchor_ctr_y = anchor_ctr_y[pos_inds]
        gt_widths = torch.clamp(gt_widths, min=1.0)
        gt_heights = torch.clamp(gt_heights, min=1.0)
        # encode the gt boxes as deltas relative to the anchors
        target_dx = (gt_ctr_x - pos_anchor_ctr_x) / pos_anchor_widths
        target_dy = (gt_ctr_y - pos_anchor_ctr_y) / pos_anchor_heights
        target_dw = torch.log(gt_widths / pos_anchor_widths)
        target_dh = torch.log(gt_heights / pos_anchor_heights)
        targets = torch.stack([target_dx, target_dy, target_dw, target_dh], dim=0).t()  # (num_pos, 4)
        if torch.cuda.is_available():
            targets = targets / torch.FloatTensor([0.1, 0.1, 0.2, 0.2]).cuda()
        else:
            targets = targets / torch.FloatTensor([0.1, 0.1, 0.2, 0.2])
        reg_diff = torch.abs(targets - pos_reg_pred)  # (num_pos, 4)
        # smooth L1 with beta = 1/9
        reg_loss = torch.where(
            torch.le(reg_diff, 1.0 / 9.0),
            0.5 * 9.0 * torch.pow(reg_diff, 2),
            reg_diff - 0.5 / 9.0
        )
        return reg_loss.mean()
    else:
        if torch.cuda.is_available():
            reg_loss = torch.tensor(0).float().cuda()
        else:
            reg_loss = torch.tensor(0).float()
        return reg_loss
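The giou() branch (the default reg_mode in LOSS) is also defined in the repository but not shown here. Below is a minimal sketch of a GIoU regression loss with the same call signature, under the assumption that the predictions use the same [0.1, 0.1, 0.2, 0.2]-scaled delta encoding as smooth_l1 above; the real repository code may decode differently.

def giou(pos_inds, anchor_infos, boxes, reg_pred):
    """Sketch only: 1 - GIoU between decoded predictions and the assigned gt boxes."""
    anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y = anchor_infos
    if pos_inds.sum() == 0:
        zero = torch.tensor(0).float()
        return zero.cuda() if torch.cuda.is_available() else zero
    scale = torch.FloatTensor([0.1, 0.1, 0.2, 0.2]).to(reg_pred.device)
    deltas = reg_pred[pos_inds, :] * scale               # undo the target scaling (assumed encoding)
    aw, ah = anchor_widths[pos_inds], anchor_heights[pos_inds]
    ax, ay = anchor_ctr_x[pos_inds], anchor_ctr_y[pos_inds]
    # decode deltas back to absolute (x1, y1, x2, y2) boxes
    px = ax + deltas[:, 0] * aw
    py = ay + deltas[:, 1] * ah
    pw = aw * torch.exp(deltas[:, 2])
    ph = ah * torch.exp(deltas[:, 3])
    pred = torch.stack([px - pw * 0.5, py - ph * 0.5, px + pw * 0.5, py + ph * 0.5], dim=1)
    gt = boxes[pos_inds, :]
    # intersection and union
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter
    iou = inter / union.clamp(min=1e-8)
    # smallest enclosing box
    lt_c = torch.min(pred[:, :2], gt[:, :2])
    rb_c = torch.max(pred[:, 2:], gt[:, 2:])
    area_c = ((rb_c - lt_c).clamp(min=0).prod(dim=1)).clamp(min=1e-8)
    giou_val = iou - (area_c - union) / area_c
    return (1. - giou_val).mean()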
[PyTorch] A detailed explanation of nn.BatchNorm2d(): https://blog.csdn.net/bigFatCat_Tom/article/details/91619977