RetinaNet Paper Explained with Code (ResNet, FPN)

1. References

Object detection algorithm - RetinaNet: https://zhuanlan.zhihu.com/p/67768433
https://zhuanlan.zhihu.com/p/59910080
Original RetinaNet paper: https://arxiv.org/pdf/1708.02002.pdf
A complete analysis of the RetinaNet paper: https://blog.csdn.net/weixin_40400177/article/details/103449280
Unofficial RetinaNet code: https://github.com/ChingHo97/RetinaNet-Pytorch-36.4AP (the code referenced throughout this article)
Prerequisites:
The hard example mining technique OHEM

The paper proposes a new loss function to address class imbalance. The loss is a dynamically scaled cross-entropy loss, in which the scaling factor decays to 0 as the confidence in the correct class increases.
The authors design RetinaNet, a one-stage object detector, to demonstrate the effectiveness of Focal Loss.
Related work
Two-stage detectors achieve higher accuracy than one-stage detectors partly because they mitigate class imbalance: the first stage filters out the vast majority of background regions.

2. RetinaNet

RetinaNet consists of three parts: a ResNet backbone, an FPN, and the RetinaHead.
Focal Loss addresses the severe imbalance between positive and negative samples in one-stage detectors. It down-weights the loss of the large number of easy positive and negative samples, and can be viewed as a hard example mining strategy.
The authors argue that the lower accuracy of one-stage detectors (compared with two-stage detectors) is caused by class imbalance, and that easy examples dominate the loss:
(1) The number of negative examples is huge, and they account for most of the total loss.
(2) The loss of each easy positive/negative example is small, so the backward gradients they produce are also small.
Therefore the authors propose Focal Loss.
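For reference, the focal loss defined in the paper (in its α-balanced form) is

$FL(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t)$, where $p_t = p$ if $y = 1$ and $p_t = 1 - p$ otherwise.

The paper's default settings are $\gamma = 2$ and $\alpha = 0.25$, which is also what the code in Section 9 uses.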

class RetinaNet(nn.Module):
    def __init__(self, config=None):
        super(RetinaNet, self).__init__()
        self.config = DefaultConfig if config is None else config
        self.backbone = resnet50(pretrained=self.config.pretrained)
        self.fpn = FPN(features=self.config.fpn_out_channels,
                       use_p5=self.config.use_p5)
        self.head = RetinaHead(config=self.config)

    def forward(self, images):
        C3,C4,C5 = self.backbone(images)  # three backbone feature maps
        all_p_level = self.fpn([C3,C4,C5])  # [P3,P4,P5,P6,P7]
        cls_logits, reg_preds = self.head(all_p_level)  # predictions

        return cls_logits, reg_preds
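A rough shape walk-through may help here. The numbers below are a hypothetical example (3×512×512 input, 256 FPN channels, 9 anchors per location, 80 classes); none of them are fixed by the snippet above:

# backbone outputs
# C3: (B, 512, 64, 64)     stride 8
# C4: (B, 1024, 32, 32)    stride 16
# C5: (B, 2048, 16, 16)    stride 32
# FPN outputs P3..P7: 256 channels each, at strides 8, 16, 32, 64, 128
# head outputs, concatenated over all levels:
# cls_logits: (B, sum_i(H_i*W_i)*9, 80)
# reg_preds:  (B, sum_i(H_i*W_i)*9, 4)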

3. ResNet

ResNet paper overview

As networks get deeper, two problems appear: vanishing/exploding gradients, and accuracy degradation. The degradation problem means that adding more layers leads to higher training error. Vanishing and exploding gradients can largely be handled by normalization (batch normalization, BN), but that only carries plain networks to roughly 20 layers. One solution is to introduce an identity mapping. Briefly: if the desired output of a block is $H(x)$, we instead write $H(x) = F(x) + x$. Since the identity path $x$ needs no learning, the block only has to learn the residual $F(x) = H(x) - x$, which is much easier than learning the original mapping directly.
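As an illustration only (this helper is not part of the referenced code), the residual idea in a few lines:

def residual_block(x, F):
    # the stacked layers only have to learn the residual F(x) = H(x) - x;
    # the identity shortcut adds x back without any extra parameters
    return F(x) + x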

How does this solve vanishing and exploding gradients?

The original residual formulation, where $F$ denotes the residual function and $f$ the activation function:

$y_l = h(x_l) + F(x_l, W_l), \quad x_{l+1} = f(y_l)$

ResNet v2 uses an identity shortcut $h(x_l) = x_l$ and also makes $f$ an identity mapping, which gives:

$x_{l+1} = x_l + F(x_l, W_l)$

Let $L$ be a layer deeper than $l$. Unrolling the recursion from layer $l$ up to layer $L$ gives the forward expression:

$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)$

Back-propagating through this expression:

$\frac{\partial\,\mathrm{loss}}{\partial x_l} = \frac{\partial\,\mathrm{loss}}{\partial x_L} \cdot \frac{\partial x_L}{\partial x_l} = \frac{\partial\,\mathrm{loss}}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)$

The gradient at layer $l$ therefore contains the gradient of the deeper layer $L$ as an additive term, i.e. the gradient of layer $L$ is passed directly back to layer $l$. Moreover, the residual term $\frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)$ will not stay equal to $-1$ throughout training, so the factor in parentheses cannot vanish; this is why the vanishing-gradient problem is alleviated.

Why does introducing an identity mapping let the network effectively adjust its own depth?

Suppose there exists a "perfect" network $N$ with the best possible performance. Compared with $N$, some layers of the network we actually train must be redundant (or even harmful). The training target for those redundant layers is then the identity transform: if the redundant layers become identity mappings, the whole network approaches the perfect network $N$, and only then can our network match $N$'s performance.

How do we turn those redundant layers into identity mappings? Simple: make the target that the layer output has to fit be $x$ itself (so $H(x)$ becomes $x$), i.e. the input is $x$ and the output is still $x$.

The authors therefore reconstruct the mapping with a residual: the input $x$ is added back to the output, and the weights of the stacked layers are pushed toward zero. In formula form: $F(x, w) + x = H(x) = x$, i.e. $F(x, w) \to 0$.

In one sentence: the residual structure builds the identity mapping in by hand, so the whole structure can converge toward an identity mapping, which guarantees that the error does not get worse as the depth increases.
(The discussion of identity mappings above is adapted from CSDN blogger lairongxuan; original post: https://blog.csdn.net/lairongxuan/article/details/91040698)

4. ResNet code and resnet50

ResNet-18/50 network structure and PyTorch implementation: https://www.jianshu.com/p/085f4c8256f1
A dashed shortcut (in the ResNet diagrams) means a 1×1 convolution is needed to adjust the number of channels, because when x is added to the block output the tensor shapes must match.
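For example (numbers assumed for illustration, roughly matching the first block of layer2 in ResNet-50 on a 224×224 input): the block input is (B, 256, 56, 56) while its main path produces (B, 512, 28, 28), so the shortcut needs a projection such as:

downsample = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False),  # adjust channels and stride
    nn.BatchNorm2d(512),
)  # maps x to (B, 512, 28, 28) so that out += residual is shape-consistent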

Code:

Call hierarchy (each class constructs the next):
class RetinaNetDetector(nn.Module):
	self.body = RetinaNet(config=config)
class RetinaNet(nn.Module):
	self.backbone = resnet50(pretrained=self.config.pretrained)
def resnet50(pretrained=False, **kwargs):
	model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
The basic building block of resnet50 is the Bottleneck:
class Bottleneck(nn.Module):
    # ResNet-B
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # the dashed-shortcut projection, used so x matches the shape of the block output
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
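A quick sanity-check sketch (illustrative only, assumes torch is imported): because expansion = 4, a Bottleneck built with planes=64 outputs planes * 4 = 256 channels.

block = Bottleneck(inplanes=256, planes=64)   # no downsample needed: 256 channels in, 64*4 = 256 out
y = block(torch.randn(1, 256, 56, 56))        # y.shape == (1, 256, 56, 56)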
The basic building block of resnet18 is the BasicBlock:
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride
   
    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
                     
The ResNet network:
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, if_include_top=False):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)  # spatial size (H, W) -> (H/2, W/2)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # spatial size /2 again (total stride 4)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)  # total stride 8
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  # total stride 16
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)  # total stride 32
        self.avgpool = nn.AvgPool2d(7, stride=1)
        if if_include_top:
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        self.if_include_top=if_include_top
        
        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):  # Kaiming-style init for conv weights
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):  # BN init: weight = 1, bias = 0
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:  # dashed-shortcut case: a 1x1 projection is needed
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)
        
    def forward(self, x):  # this forward is adapted for FPN, so it returns three feature maps (out3, out4, out5)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        out3 = self.layer2(x)
        out4 = self.layer3(out3)
        out5 = self.layer4(out4)

        if self.if_include_top:
            x = self.avgpool(out5)
            x = x.view(x.size(0), -1)
            x = self.fc(x)
            return x
        else:
            return (out3, out4, out5)
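As a sketch of what the backbone returns (hypothetical 3×512×512 input; the shapes follow from the strides noted in the comments above):

backbone = resnet50(pretrained=False)
c3, c4, c5 = backbone(torch.randn(1, 3, 512, 512))
# c3: (1, 512, 64, 64)    stride 8  (layer2, 128*4 channels)
# c4: (1, 1024, 32, 32)   stride 16 (layer3, 256*4 channels)
# c5: (1, 2048, 16, 16)   stride 32 (layer4, 512*4 channels)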

5. FPN

This FPN code assumes a ResNet-50 backbone (C3/C4/C5 have 512/1024/2048 channels).

class FPN(nn.Module):
    '''only for resnet50,101,152'''
    def __init__(self,features=256,use_p5=True):
        super(FPN,self).__init__()
        self.prj_5 = nn.Conv2d(2048, features, kernel_size=1)
        self.prj_4 = nn.Conv2d(1024, features, kernel_size=1)
        self.prj_3 = nn.Conv2d(512, features, kernel_size=1)
        self.conv_5 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_4 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_3 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        if use_p5:
            self.conv_out6 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        else:
            self.conv_out6 = nn.Conv2d(2048, features, kernel_size=3, padding=1, stride=2)
        self.conv_out7 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        self.use_p5=use_p5
        self.apply(self.init_conv_kaiming)


    # upsample src to the spatial size of target
    def upsamplelike(self,inputs):
        src,target=inputs
        return F.interpolate(src, size=(target.shape[2], target.shape[3]),
                    mode='nearest')  # torch upsampling, nearest-neighbour interpolation
    
    def init_conv_kaiming(self,module):
        if isinstance(module, nn.Conv2d):
            nn.init.kaiming_uniform_(module.weight, a=1)

            if module.bias is not None:
                nn.init.constant_(module.bias, 0)

    def forward(self,x):  # the FPN top-down pathway
        C3,C4,C5=x  # the ResNet outputs (out3, out4, out5)
        P5 = self.prj_5(C5)
        P4 = self.prj_4(C4)
        P3 = self.prj_3(C3)
        
        P4 = P4 + self.upsamplelike([P5,C4])
        P3 = P3 + self.upsamplelike([P4,C3])

        P3 = self.conv_3(P3)
        P4 = self.conv_4(P4)
        P5 = self.conv_5(P5)
        
        #P6 = P5 if self.use_p5 else C5
        P6 = self.conv_out6(P5) if self.use_p5 else self.conv_out6(C5)
        P7 = self.conv_out7(F.relu(P6))
        return [P3,P4,P5,P6,P7]
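Continuing the hypothetical 512×512 example with features=256, the pyramid levels come out as:

# P3: (1, 256, 64, 64)   stride 8
# P4: (1, 256, 32, 32)   stride 16
# P5: (1, 256, 16, 16)   stride 32
# P6: (1, 256, 8, 8)     stride 64  (3x3 stride-2 conv on P5, or on C5 if use_p5=False)
# P7: (1, 256, 4, 4)     stride 128 (ReLU then 3x3 stride-2 conv on P6)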

6. RetinaHead

How GroupNorm() works on multi-dimensional inputs, explained: https://blog.csdn.net/qq_40178291/article/details/101615391
The head takes the five FPN feature maps [P3, P4, P5, P6, P7] as input and predicts class scores and box offsets from each of them. Each branch stacks four 3×3 convolutions followed by one final 3×3 output convolution, i.e. five 3×3 convolutions in total.
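With a typical COCO-style configuration (80 classes, 9 anchors per location, 256 FPN channels; these numbers are assumptions, not fixed by the code below), the two output convolutions look like:

# cls_out: nn.Conv2d(256, 80 * 9 = 720, kernel_size=3, padding=1)  -> (B, 720, H, W) per level
# reg_out: nn.Conv2d(256,  9 * 4 =  36, kernel_size=3, padding=1)  -> (B,  36, H, W) per level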
Model initialization (paraphrasing the paper):
By default, a binary classification model is initialized so that y=0 and y=1 have equal probability. Under such an initialization, on a dataset with severe class imbalance the frequent class contributes nearly all of the loss, which makes training unstable in the early iterations. To address this, the paper introduces a "prior" p: the value the model should predict for the rare class (foreground) at the start of training. The prior is set to a low value, e.g. 0.01. This is purely a change to the model initialization, not to the loss function, and it improves early training stability for both the cross-entropy loss and Focal Loss.
In code this corresponds to:

class RetinaHead(nn.Module):
    def __init__(self, config=None):
        ...
        self.prior = self.config.prior  # 0.01, the class prior
        self.apply(self.init_conv_RandomNormal)
        nn.init.constant_(self.cls_out.bias, -math.log((1 - self.prior) / self.prior))

    def init_conv_RandomNormal(self, module, std=0.01):
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, std=std)

            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
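Why this particular bias value: with $b = -\log\frac{1-\pi}{\pi}$ we get $\mathrm{sigmoid}(b) = \pi$, so immediately after initialization every anchor predicts a foreground probability of roughly $\pi = 0.01$. A quick check (illustrative):

import math, torch
prior = 0.01
b = -math.log((1 - prior) / prior)
print(torch.sigmoid(torch.tensor(b)))   # ~0.01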


Corresponding file: model/retina_head.py
class RetinaHead(nn.Module):
    def __init__(self, config = None):
        super(RetinaHead, self).__init__()
        if config is None:
            self.config = DefaultConfig
        else:
            self.config = config
        self.anchor_nums = self.config.anchor_nums  # 9 anchors per location
        cls_branch = []
        reg_branch = []
        for i in range(4):
            cls_branch.append(nn.Conv2d(self.config.fpn_out_channels, self.config.fpn_out_channels,
                                        kernel_size=3, stride=1,padding=1,bias=True))#256
            if self.config.use_GN_head:  # Group Normalization: split the channels into 32 groups and normalize within each group
                cls_branch.append(nn.GroupNorm(32,self.config.fpn_out_channels))
            cls_branch.append(nn.ReLU(inplace=True))

            reg_branch.append(nn.Conv2d(self.config.fpn_out_channels, self.config.fpn_out_channels,
                                        kernel_size=3, stride=1, padding=1, bias=True))
            if self.config.use_GN_head:
                reg_branch.append(nn.GroupNorm(32, self.config.fpn_out_channels))
            reg_branch.append(nn.ReLU(inplace=True))

        self.cls_conv = nn.Sequential(*cls_branch)
        self.reg_conv = nn.Sequential(*reg_branch)
        self.cls_out = nn.Conv2d(self.config.fpn_out_channels, self.config.class_num * self.anchor_nums, kernel_size=3, stride=1,
                                 padding=1, bias=True)
        self.reg_out = nn.Conv2d(self.config.fpn_out_channels, self.anchor_nums * 4, kernel_size= 3, stride=1,padding=1,bias=True)
        self.prior = self.config.prior  # 0.01, the class prior
        self.apply(self.init_conv_RandomNormal)
        nn.init.constant_(self.cls_out.bias, -math.log((1 - self.prior) / self.prior))

    def init_conv_RandomNormal(self,module,std=0.01):
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, std=std)

            if module.bias is not None:
                nn.init.constant_(module.bias, 0)

    def forward(self, inputs):
        """
        inputs:fpn output[P3,P4,P5,P6,P7]
        """

        cls_out = []
        reg_out = []
        for pred in inputs:
            batch_size, channel, H, W = pred.shape
            cls_convput = self.cls_conv(pred)
            cls_output = self.cls_out(cls_convput)  # (batch_size, class_num*anchor_num, H, W)
            # permute -> (batch_size, H, W, class_num*anchor_num), then view -> (batch_size, H*W*9, class_num)
            cls_output = cls_output.permute(0,2,3,1).contiguous().view(batch_size, H * W * self.anchor_nums, -1)
            cls_out.append(cls_output)

            reg_output = self.reg_conv(pred)
            reg_output = self.reg_out(reg_output)  # (batch_size, anchor_num*4, H, W)
            # permute + view -> (batch_size, H*W*9, 4)
            reg_output = reg_output.permute(0, 2, 3, 1).contiguous().view(batch_size, H * W * self.anchor_nums, -1)
            reg_out.append(reg_output)

        # concatenate the predictions from the 5 FPN levels [P3,P4,P5,P6,P7]
        cls_logits = torch.cat(cls_out, dim = 1)  # (batch_size, sum over levels of H*W*9, class_num)
        reg_preds = torch.cat(reg_out, dim = 1)   # (batch_size, sum over levels of H*W*9, 4)
        return cls_logits, reg_preds

8. Obtaining targets from the anchors

This code is in the LOSS class (model/retina_loss.py).

class LOSS(nn.Module):
    def __init__(self,reg_mode = 'giou'):
        super(LOSS, self).__init__()
        self.reg_mode = reg_mode

    def forward(self, inputs):  # called as self.loss_func([cls_logits, reg_preds, anchors, boxes, classes])
        """
        cls_logits: (n, sum(H*W)*A, class_num)  # n = batch size, A = 9 anchors, summed over the 5 levels
        reg_preds:  (n, sum(H*W)*A, 4)
        anchors:    (sum(H*W)*A, 4)
        boxes:      (n, max_num, 4)  # ground-truth boxes, padded with -1 up to max_num per image
        classes:    (n, max_num)     # ground-truth class labels, padded up to max_num
        """
        cls_logits, reg_preds, anchors, boxes, classes = inputs
        anchor_widths = anchors[:, 2] - anchors[:, 0]
        anchor_heights = anchors[:, 3] - anchors[:, 1]
        anchor_ctr_x = anchors[:, 0] + anchor_widths * 0.5
        anchor_ctr_y = anchors[:, 1] + anchor_heights * 0.5

        batch_size = cls_logits.shape[0]
        class_loss = []
        reg_loss = []
        for i in range(batch_size):  # loop over the images in the batch
            per_cls_logit = cls_logits[i,:,:]  # (sum(H*W)*A, class_num)
            per_reg_pred = reg_preds[i,:,:]
            per_boxes = boxes[i,:,:]
            per_classes = classes[i,:]
            mask = per_boxes[:, 0] != -1  # drop the padded (dummy) gt boxes
            per_boxes = per_boxes[mask]  # (?, 4), ? = number of gt boxes in this image
            per_classes = per_classes[mask]  # (?,)
            if per_classes.shape[0] == 0:
                # no gt boxes in this image: every anchor is background (target = 0),
                # so apply the focal loss with the (1 - alpha) weighting and p as the focal factor
                probs = torch.clamp(per_cls_logit.sigmoid(), 1e-4, 1. - 1e-4)
                alpha_factor = torch.ones_like(probs) * 0.25
                alpha_factor = 1. - alpha_factor
                focal_weights = alpha_factor * torch.pow(probs, 2.0)
                bce = -(torch.log(1.0 - probs))
                cls_loss = focal_weights * bce
                class_loss.append(cls_loss.sum())
                reg_loss.append(torch.tensor(0., device=cls_logits.device))
                continue
            IoU = calc_iou(anchors, per_boxes)  # IoU between every anchor and every gt box, (sum(H*W)*A, ?), ? = gt count in this image

            iou_max, max_ind = torch.max(IoU, dim=1)  # (sum(H*W)*A,) best-matching gt for each anchor; max_ind is the index of that gt
            
            
            targets = torch.ones_like(per_cls_logit) * -1  # (sum(H*W)*A, class_num), value -1 = ignored anchor

            targets[iou_max < 0.4, :] = 0  # IoU < 0.4: background (negative) anchor

            pos_anchors_ind = iou_max >= 0.5  # IoU >= 0.5: positive anchor
            num_pos =  torch.clamp(pos_anchors_ind.sum().float(), min=1.0)  # number of positive anchors (at least 1)

            # per-anchor assignment (label-encoded)
            assigned_classes = per_classes[max_ind]  # (sum(H*W)*A,), class label of the matched gt
            assigned_boxes = per_boxes[max_ind,:]    # (sum(H*W)*A, 4), box of the matched gt

            # one-hot encoding for positive anchors
            targets[pos_anchors_ind,:] = 0  # reset all class_num entries to zero
            targets[pos_anchors_ind, (assigned_classes[pos_anchors_ind]).long() - 1] = 1  # class labels start from 1, hence the -1
            # summary: IoU in [0, 0.4) -> negative, [0.4, 0.5) -> ignored, [0.5, 1.0] -> positive
            class_loss.append(focal_loss(per_cls_logit, targets).view(1) / num_pos)
            if self.reg_mode == 'smoothl1':
                reg_loss.append(smooth_l1(pos_anchors_ind, [anchor_widths,anchor_heights,anchor_ctr_x,anchor_ctr_y],
                                 assigned_boxes,per_reg_pred))
            elif self.reg_mode =='giou':
                reg_loss.append(giou(pos_anchors_ind, [anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y],
                                          assigned_boxes, per_reg_pred))

        cls_loss = torch.stack(class_loss).mean()
        reg_loss = torch.stack(reg_loss).mean()
        total_loss = cls_loss + reg_loss
        return cls_loss, reg_loss, total_loss
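A usage sketch with hypothetical shapes (calc_iou, focal_loss, smooth_l1 and giou are assumed to be the helpers defined in model/retina_loss.py):

loss_func = LOSS(reg_mode='smoothl1')
# cls_logits: (2, N, 80), reg_preds: (2, N, 4), anchors: (N, 4)
# boxes: (2, max_num, 4) padded with -1, classes: (2, max_num) padded with -1
cls_loss, reg_loss, total_loss = loss_func([cls_logits, reg_preds, anchors, boxes, classes])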

9. Focal Loss

$FL(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t)$ (the code below uses $\alpha = 0.25$ and $\gamma = 2$)

def focal_loss(preds, targets, alpha=0.25, gamma = 2.0):
    preds = preds.sigmoid()
    preds = torch.clamp(preds, min=1e-4,max = 1. - 1e-4)
    if torch.cuda.is_available():
        alpha_factor = torch.ones(targets.shape).cuda() * alpha
    else:
        alpha_factor = torch.ones(targets.shape) * alpha

    alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, (1.  - alpha_factor))
    focal_weights = torch.where(torch.eq(targets, 1.), 1 - preds, preds)
    focal_weights = alpha_factor * torch.pow(focal_weights, gamma)

    bce = - (targets * torch.log(preds) + (1. - targets) * torch.log(1. - preds))
    cls_loss = focal_weights * bce

    if torch.cuda.is_available():
        cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros_like(cls_loss).cuda())
    else:
        cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros_like(cls_loss))

    return cls_loss.sum()
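To see how the modulating factor down-weights easy examples (numbers are just an illustration, with gamma = 2):

# easy, well-classified example, p_t = 0.9:  (1 - 0.9)**2 = 0.01  -> its CE loss is scaled down 100x
# hard example,                  p_t = 0.5:  (1 - 0.5)**2 = 0.25  -> scaled down only 4x
# so training is dominated by hard, misclassified anchors instead of the huge number of easy negatives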

10. reg_loss (smooth L1)

$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5 \cdot 9 \cdot x^2, & |x| < 1/9 \\ |x| - 0.5/9, & \text{otherwise} \end{cases}$, applied element-wise to $x = t - \hat{t}$, the difference between the encoded regression targets and the predictions.

def smooth_l1(pos_inds,anchor_infos, boxes,reg_pred):
    """
    pos_inds : (num_pos,)
    boxes:(sum(H*W)*A, 4)
    reg_pred: (sum(H*W)*A, 4)
    """
    anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y = anchor_infos #(sum(H*W)*A,)
    if pos_inds.sum() > 0:

        pos_reg_pred = reg_pred[pos_inds,:] #(num_pos, 4)

        gt_widths = boxes[pos_inds][:, 2] - boxes[pos_inds][:, 0]
        gt_heights = boxes[pos_inds][:, 3] - boxes[pos_inds][:, 1]
        gt_ctr_x = boxes[pos_inds][:, 0] + gt_widths * 0.5
        gt_ctr_y = boxes[pos_inds][:, 1] + gt_heights * 0.5

        pos_anchor_widths = anchor_widths[pos_inds]
        pos_anchor_heights = anchor_heights[pos_inds]
        pos_anchor_ctr_x = anchor_ctr_x[pos_inds]
        pos_anchor_ctr_y = anchor_ctr_y[pos_inds]

        gt_widths = torch.clamp(gt_widths, min=1.0)
        gt_heights = torch.clamp(gt_heights, min=1.0)

        target_dx = (gt_ctr_x - pos_anchor_ctr_x) / pos_anchor_widths
        target_dy = (gt_ctr_y - pos_anchor_ctr_y) / pos_anchor_heights
        target_dw = torch.log(gt_widths / pos_anchor_widths)
        target_dh = torch.log(gt_heights / pos_anchor_heights)

        targets = torch.stack([target_dx,target_dy,target_dw,target_dh], dim=0).t() #(num_pos,4)
        if torch.cuda.is_available():
            targets = targets / torch.FloatTensor([0.1,0.1,0.2,0.2]).cuda()
        else:
            targets = targets / torch.FloatTensor([0.1,0.1,0.2,0.2])


        reg_diff = torch.abs(targets - pos_reg_pred) #(num_pos,4)
        reg_loss = torch.where(
            torch.le(reg_diff, 1.0/9.0),
            0.5 * 9.0 * torch.pow(reg_diff, 2),
            reg_diff - 0.5 /9.0
        )
        return reg_loss.mean()
    else:
        if torch.cuda.is_available():
            reg_loss = torch.tensor(0).float().cuda()
        else:
            reg_loss = torch.tensor(0).float()

        return reg_loss
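A worked example of the box encoding above (all numbers hypothetical): suppose an anchor with center (50, 50) and width = height = 40 is matched to a gt box with center (54, 50), width 48 and height 40.

# dx = (54 - 50) / 40 = 0.1, dy = 0.0
# dw = log(48 / 40) ≈ 0.182, dh = 0.0
# after dividing by the variances (0.1, 0.1, 0.2, 0.2): targets ≈ (1.0, 0.0, 0.91, 0.0)
# the smooth L1 loss is then computed between these targets and the network's reg predictions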

[PyTorch] nn.BatchNorm2d() explained in detail: https://blog.csdn.net/bigFatCat_Tom/article/details/91619977
