Training FCOS on Your Own Dataset

超超爱AI

Article link: https://blog.csdn.net/weixin_41803339/article/details/103823805

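1. The FCOS network

Among the common computer vision tasks, I personally find detection to be one of the more complex. Much of that complexity comes from the anchor mechanism: you have to choose hyperparameters such as anchor scale and aspect ratio, match anchors to ground-truth boxes, cope with the imbalance between positive and negative samples, and pay for the extra computation. Despite all this, anchor-based methods have been the mainstream in detection in recent years.

Of course, some people have started to challenge the established approach and proposed anchor-free detection. The idea makes a beginner like me feel I finally have a chance to really master a detection model. So let's first look at the paper: https://arxiv.org/pdf/1904.01355.pdf

First, the network structure: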

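The model is very simple. The backbone outputs three feature maps, C3, C4 and C5 (output strides 8, 16 and 32). Each one goes through a 1x1 convolution that changes its channel count (the inputs have 512, 1024 and 2048 channels). P3 to P7 clearly form an FPN (feature pyramid network); multi-scale FPN features are widely used in detection and help with objects of different sizes. P6 and P7 are obtained by successively downsampling P5. P4 and P3 are built on the top-down path: P5 is upsampled and added to the projected C4 to give P4, which is in turn upsampled and added to the projected C3 to give P3.

The head has just two simple branches, each made of 4 convolution layers. Splitting the head into two branches goes back to RetinaNet: sharing parameters between classification and regression works worse than keeping them separate, which makes sense, since each branch can focus on its own job. The classification branch has an extra small center-ness branch, which measures how far a location is from the center of the box it is responsible for. Its output (between 0 and 1) is multiplied with the score from the class branch at inference time, which effectively suppresses low-quality boxes.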
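To make the effect of center-ness concrete, here is a minimal sketch in plain Python. It assumes the center-ness target defined in the FCOS paper, sqrt(min(l, r)/max(l, r) * min(t, b)/max(t, b)); the function names and the sample numbers are made up for illustration and are not taken from the repository.

```python
import math

def centerness(l, t, r, b):
    """Center-ness target from the FCOS paper for a location inside a box.

    l, t, r, b are the distances from the location to the left, top,
    right and bottom edges of the box (all assumed to be positive).
    """
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def final_score(cls_score, cnt):
    """Ranking score at inference: class probability times center-ness."""
    return cls_score * cnt

# A location at the exact center of a 100x60 box.
center = centerness(50, 30, 50, 30)
# A location close to the top-left corner of the same box.
corner = centerness(10, 6, 90, 54)

print(center)                               # 1.0
print(round(corner, 4))                     # 0.1111
print(round(final_score(0.9, center), 4))   # 0.9
print(round(final_score(0.9, corner), 4))   # 0.1
```

With the same classification score of 0.9, the corner location ends up with a ranking score of about 0.1, so its box is likely to be dropped by score thresholding or NMS.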

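Now, how does the network produce its predicted boxes?

Each box is regressed from a single point. These points are the locations on the P3 to P7 feature maps mapped back to the input image: a location (x, y) on a feature map with stride s maps to $(\lfloor s/2 \rfloor + xs,\ \lfloor s/2 \rfloor + ys)$. The points are also split into positive and negative samples: a point that falls inside a GT box is a positive sample, and a point outside every box is a negative sample. Box regression is trained only on the positive points. (Some of these positive points actually lie on background pixels inside the box, but compared with anchor-based methods the number of negative samples is greatly reduced, which helps training and inference speed.)

A point may fall inside several GT boxes at once. In that case it simply takes the box with the smallest area as its class and regression target. This inevitably produces many low-quality boxes. The paper adds a center-ness branch that predicts a value between 0 and 1 describing how far the point is from the center of its box. Multiplying this value with the classification score suppresses low-quality boxes far from the center. In the paper, the center-ness branch shares its convolution layers with the classification branch, but it was later suggested that sharing with the regression branch works better.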
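The mapping and the smallest-box rule can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions (boxes given as (x1, y1, x2, y2, class_id), a point on a box edge treated as outside); the helper names are hypothetical and the repository's own target generation works on batched tensors.

```python
def location_on_image(x, y, stride):
    """Map feature-map location (x, y) to image coordinates."""
    return stride // 2 + x * stride, stride // 2 + y * stride

def assign_target(px, py, gt_boxes):
    """Pick the ground-truth box for an image point (px, py).

    gt_boxes: list of (x1, y1, x2, y2, class_id).
    Returns (class_id, (l, t, r, b)) for the smallest-area box that
    contains the point, or None if the point is a negative sample.
    """
    best = None
    best_area = float("inf")
    for x1, y1, x2, y2, cls in gt_boxes:
        l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
        if min(l, t, r, b) <= 0:
            continue  # the point is outside this box
        area = (x2 - x1) * (y2 - y1)
        if area < best_area:
            best_area = area
            best = (cls, (l, t, r, b))
    return best

# One large box (class 1) with a smaller box (class 2) inside it.
gt = [(0, 0, 200, 200, 1), (80, 80, 140, 140, 2)]

px, py = location_on_image(12, 12, stride=8)
print((px, py))                     # (100, 100)
print(assign_target(px, py, gt))    # (2, (20, 20, 40, 40)): smaller box wins
print(assign_target(20, 20, gt))    # (1, (20, 20, 180, 180))
print(assign_target(300, 300, gt))  # None: negative sample
```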

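There are some constraints along the way. The four regression targets (l, t, r, b) are limited to a range per level and cannot grow without bound. The paper uses [0, 64, 128, 256, 512, ∞]: each pair of adjacent values is the regression range for one of P3 to P7. These values should of course be tuned for your own task. FCOS has few hyperparameters, but they have a large effect on model performance, so it is not as stable as anchor-based detectors. With well-chosen parameters, though, it can outperform some anchor-based detectors.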
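As a rough sketch of how these ranges route a target to a pyramid level: the level is chosen by the largest of the four distances. The code below assumes the paper's default limits and treats each range as open on the left and closed on the right, which is one possible boundary convention; the names `LIMITS`, `LEVELS` and `assign_level` are made up for illustration.

```python
import math

# Regression limits from the paper. P3..P7 handle max(l, t, r, b) in
# (0, 64], (64, 128], (128, 256], (256, 512] and (512, inf).
LIMITS = [0, 64, 128, 256, 512, math.inf]
LEVELS = ["P3", "P4", "P5", "P6", "P7"]

def assign_level(l, t, r, b, limits=LIMITS):
    """Return the pyramid level responsible for a target, or None."""
    m = max(l, t, r, b)
    for name, lo, hi in zip(LEVELS, limits[:-1], limits[1:]):
        if lo < m <= hi:
            return name
    return None

print(assign_level(20, 20, 40, 40))      # P3
print(assign_level(100, 100, 100, 100))  # P4
print(assign_level(300, 50, 20, 10))     # P6
print(assign_level(600, 10, 10, 10))     # P7
```

For a dataset of mostly small objects, shrinking the limits (a hypothetical example would be [0, 32, 64, 128, 256, ∞]) spreads the targets over more levels; whether that helps has to be checked on your own data.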

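The loss is the sum of focal loss for classification and IoU loss for regression (the paper also trains the center-ness branch with binary cross-entropy). Improved variants of IoU loss such as GIoU can perform better.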
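For boxes expressed as (l, t, r, b) distances from the same point, IoU and GIoU are easy to compute by hand. The sketch below is scalar Python for a single location, using the common forms -ln(IoU) and 1 - GIoU for the losses; it assumes all distances are positive and is not the repository's batched loss code.

```python
import math

def iou_and_giou(pred, target):
    """IoU and GIoU of two boxes given as (l, t, r, b) from one point."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    area_p = (pl + pr) * (pt + pb)
    area_t = (tl + tr) * (tt + tb)
    # Both boxes contain the shared point, so the overlap uses min().
    inter = (min(pl, tl) + min(pr, tr)) * (min(pt, tt) + min(pb, tb))
    union = area_p + area_t - inter
    iou = inter / union
    # Smallest box enclosing both boxes uses max().
    enclose = (max(pl, tl) + max(pr, tr)) * (max(pt, tt) + max(pb, tb))
    giou = iou - (enclose - union) / enclose
    return iou, giou

def iou_loss(pred, target):
    return -math.log(iou_and_giou(pred, target)[0])

def giou_loss(pred, target):
    return 1.0 - iou_and_giou(pred, target)[1]

pred = (20, 20, 20, 20)
target = (10, 10, 30, 30)
iou, giou = iou_and_giou(pred, target)
print(round(iou, 4), round(giou, 4))    # 0.3913 0.3113
print(round(iou_loss(pred, target), 4))
print(round(giou_loss(pred, target), 4))
```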

2. Code

2.1 Backbone

I chose my favorite network, VoVNet.

This backbone is also very simple. The paper is at https://arxiv.org/abs/1904.09730

import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from torch.hub import load_state_dict_from_url

__all__ = ['vovnet39']


model_urls = {
    'vovnet39': 'https://dl.dropbox.com/s/1lnzsgnixd8gjra/vovnet39_torchvision.pth?dl=1'
}


def conv3x3(in_channels, out_channels, module_name, postfix,
            stride=1, groups=1, kernel_size=3, padding=1):
    """3x3 convolution with padding"""
    return [
        ('{}_{}/conv'.format(module_name, postfix),
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=kernel_size,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False)),
        ('{}_{}/norm'.format(module_name, postfix),
            nn.BatchNorm2d(out_channels)),
        ('{}_{}/relu'.format(module_name, postfix),
            nn.ReLU(inplace=True)),
    ]


def conv1x1(in_channels, out_channels, module_name, postfix,
            stride=1, groups=1, kernel_size=1, padding=0):
    """1x1 convolution"""
    return [
        ('{}_{}/conv'.format(module_name, postfix),
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=kernel_size,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False)),
        ('{}_{}/norm'.format(module_name, postfix),
            nn.BatchNorm2d(out_channels)),
        ('{}_{}/relu'.format(module_name, postfix),
            nn.ReLU(inplace=True)),
    ]


class _OSA_module(nn.Module):
    def __init__(self,
                 in_ch,
                 stage_ch,
                 concat_ch,
                 layer_per_block,
                 module_name,
                 identity=False):
        super(_OSA_module, self).__init__()

        self.identity = identity
        self.layers = nn.ModuleList()
        in_channel = in_ch
        for i in range(layer_per_block):
            self.layers.append(nn.Sequential(
                OrderedDict(conv3x3(in_channel, stage_ch, module_name, i))))
            in_channel = stage_ch

        # feature aggregation
        in_channel = in_ch + layer_per_block * stage_ch
        self.concat = nn.Sequential(
            OrderedDict(conv1x1(in_channel, concat_ch, module_name, 'concat')))

    def forward(self, x):
        identity_feat = x
        output = []
        output.append(x)
        for layer in self.layers:
            x = layer(x)
            output.append(x)

        x = torch.cat(output, dim=1)
        xt = self.concat(x)

        if self.identity:
            xt = xt + identity_feat

        return xt


class _OSA_stage(nn.Sequential):
    def __init__(self,
                 in_ch,
                 stage_ch,
                 concat_ch,
                 block_per_stage,
                 layer_per_block,
                 stage_num):
        super(_OSA_stage, self).__init__()

        if not stage_num == 2:
            self.add_module('Pooling',
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))

        module_name = f'OSA{stage_num}_1'
        self.add_module(module_name,
            _OSA_module(in_ch,
                        stage_ch,
                        concat_ch,
                        layer_per_block,
                        module_name))
        for i in range(block_per_stage-1):
            module_name = f'OSA{stage_num}_{i+2}'
            self.add_module(module_name,
                _OSA_module(concat_ch,
                            stage_ch,
                            concat_ch,
                            layer_per_block,
                            module_name,
                            identity=True))


class VoVNet(nn.Module):
    def __init__(self, 
                 config_stage_ch,
                 config_concat_ch,
                 block_per_stage,
                 layer_per_block):
        super(VoVNet, self).__init__()

        # Stem module
        stem = conv3x3(3,   64, 'stem', '1', 2)
        stem += conv3x3(64,  64, 'stem', '2', 1)
        stem += conv3x3(64, 128, 'stem', '3', 2)
        self.add_module('stem', nn.Sequential(OrderedDict(stem)))

        stem_out_ch = [128]
        in_ch_list = stem_out_ch + config_concat_ch[:-1]
        self.stage_names = []
        for i in range(4): #num_stages
            name = 'stage%d' % (i+2)
            self.stage_names.append(name)
            self.add_module(name,
                            _OSA_stage(in_ch_list[i],
                                       config_stage_ch[i],
                                       config_concat_ch[i],
                                       block_per_stage[i],
                                       layer_per_block,
                                       i+2))

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.stem(x)
        outs = []
        for name in self.stage_names:
            x = getattr(self, name)(x)
            outs.append(x)
        return tuple(outs[1:])

    def freeze_bn(self):
        for layer in self.modules():
            if isinstance(layer, nn.BatchNorm2d):
                layer.eval()


def _vovnet(arch,
            config_stage_ch,
            config_concat_ch,
            block_per_stage,
            layer_per_block,
            pretrained,
            progress,
            **kwargs):
    model = VoVNet(config_stage_ch, config_concat_ch,
                   block_per_stage, layer_per_block,
                   **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
        model.load_state_dict(state_dict,strict=False)
    return model

def vovnet39(pretrained=False, progress=True, **kwargs):
    """
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vovnet('vovnet39', [128, 160, 192, 224], [256, 512, 768, 1024],
                    [1,1,2,2], 5, pretrained, progress, **kwargs)

if __name__ == "__main__":
    # Inspect the model outputs (use the GPU when one is available)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    test_inp = torch.randn((1, 3, 480, 640)).to(device)
    model = vovnet39().to(device)
    out = model(test_inp)
    for i in range(len(out)):
        print("Output size of C%d:" % (i + 3), out[i].size())
    # Count the model parameters
    k = 0
    for p in model.parameters():
        print("Layer shape: " + str(list(p.size())))
        print("Layer parameter count: " + str(p.numel()))
        k += p.numel()
    print("Total number of parameters: " + str(k))

# Output size of C3: torch.Size([1, 512, 60, 80])
# Output size of C4: torch.Size([1, 768, 30, 40])
# Output size of C5: torch.Size([1, 1024, 15, 20])
# Total number of parameters: 21575296

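VoVNet is a simplified version of DenseNet, but it performs very well because it takes GPU efficiency into account. I only built vovnet39. If you need a deeper network, just repeat more OSA blocks.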
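To see what each OSA block does to the channel counts, the short sketch below replays the arithmetic of `_OSA_module` with the vovnet39 configuration from the code above: the input and the outputs of all 3x3 layers are concatenated once, then a 1x1 convolution reduces them to `concat_ch`. The helper name `osa_concat_in_channels` is made up for illustration.

```python
def osa_concat_in_channels(in_ch, stage_ch, layer_per_block):
    """Channels entering the 1x1 'concat' conv of one OSA module."""
    return in_ch + layer_per_block * stage_ch

# vovnet39 configuration, copied from the vovnet39() function above
config_stage_ch = [128, 160, 192, 224]
config_concat_ch = [256, 512, 768, 1024]
block_per_stage = [1, 1, 2, 2]
layer_per_block = 5

in_ch = 128  # output channels of the stem
for i in range(4):
    for block in range(block_per_stage[i]):
        cat_ch = osa_concat_in_channels(in_ch, config_stage_ch[i], layer_per_block)
        print(f"stage{i + 2} block{block + 1}: concat {cat_ch} -> {config_concat_ch[i]}")
        in_ch = config_concat_ch[i]

# stage2 block1: concat 768 -> 256
# stage3 block1: concat 1056 -> 512
# stage4 block1: concat 1472 -> 768
# stage4 block2: concat 1728 -> 768
# stage5 block1: concat 1888 -> 1024
# stage5 block2: concat 2144 -> 1024
```

Making the network deeper amounts to raising the entries of `block_per_stage`; the extra blocks in a stage keep the same `concat_ch`, so the C3 to C5 channel counts seen by the FPN do not change.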

Next is the FPN. I provide two optional backbones, resnet18 and vovnet39. My own dataset is small for now, so I have been focusing on lightweight networks.
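Before reading the FPN code, it can help to know how many locations each pyramid level has. The sketch below takes the C3 to C5 sizes printed by the backbone test for a 480x640 input and applies the standard convolution output-size formula for the two stride-2 3x3 convolutions that produce P6 and P7. The helper names are hypothetical.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1

def pyramid_sizes(c3, c4, c5):
    """Spatial sizes of P3..P7; P3-P5 keep the sizes of C3-C5."""
    p6 = tuple(conv_out_size(n) for n in c5)
    p7 = tuple(conv_out_size(n) for n in p6)
    return {"P3": c3, "P4": c4, "P5": c5, "P6": p6, "P7": p7}

# C3, C4, C5 sizes of vovnet39 for a 480x640 input
sizes = pyramid_sizes((60, 80), (30, 40), (15, 20))
for name, (h, w) in sizes.items():
    print(name, h, w, "locations:", h * w)
print("total locations:", sum(h * w for h, w in sizes.values()))

# P3 60 80 locations: 4800
# P4 30 40 locations: 1200
# P5 15 20 locations: 300
# P6 8 10 locations: 80
# P7 4 5 locations: 20
# total locations: 6400
```

Each of these 6400 locations predicts one box, which is far fewer candidates than a typical anchor-based head that places several anchors at every location.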


import torch.nn as nn
import torch.nn.functional as F
import math
from model.config import DefaultConfig as cfg

class FPN(nn.Module):
    '''only support resnet18 or vovnet39'''
    def __init__(self,features=256,use_p5=True):
        super(FPN,self).__init__()
        # (C3, C4, C5) channel counts of each supported backbone
        if cfg.backbone_choice == "resnet18":
            print("backbone use resnet18")
            c3_ch, c4_ch, c5_ch = 128, 256, 512
        elif cfg.backbone_choice == "vovnet39":
            print("backbone use vovnet39")
            c3_ch, c4_ch, c5_ch = 512, 768, 1024
        else:
            raise ValueError("unsupported backbone: %s" % cfg.backbone_choice)
        # 1x1 lateral convs project every level to `features` channels
        self.prj_5 = nn.Conv2d(c5_ch, features, kernel_size=1)
        self.prj_4 = nn.Conv2d(c4_ch, features, kernel_size=1)
        self.prj_3 = nn.Conv2d(c3_ch, features, kernel_size=1)

        self.conv_5 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_4 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_3 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        if use_p5:
            self.conv_out6 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        else:
            # P6 is computed from C5 here, so the input width must match the backbone
            self.conv_out6 = nn.Conv2d(c5_ch, features, kernel_size=3, padding=1, stride=2)
        self.conv_out7 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        self.use_p5=use_p5
        self.apply(self.init_conv_kaiming)
    def upsamplelike(self,inputs):
        src,target=inputs
        return F.interpolate(src, size=(target.shape[2], target.shape[3]),
                    mode='nearest')
    
    def init_conv_kaiming(self,module):
        if isinstance(module, nn.Conv2d):
            nn.init.kaiming_uniform_(module.weight, a=1)

            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
    
    def forward(self,x):
        C3,C4,C5=x
        P5 = self.prj_5(C5)
        P4 = self.prj_4(C4)
        P3 = self.prj_3(C3)
        
        P4 = P4 + self.upsamplelike([P5,C4])
        P3 = P3 + self.upsamplelike([P4,C3])

        P3 = self.conv_3(P3)
        P4 = self.conv_4(P4)
        P5 = self.conv_5(P5)
        
        P5 = P5 if self.use_p5 else C5
        P6 = self.conv_out6(P5)
        P7 = self.conv_out7(F.relu(P6))
        return [P3,P4,P5,P6,P7]

Next is the head, which mainly produces the outputs of the two branches.


import torch.nn as nn
import torch
import math

class ScaleExp(nn.Module):
    def __init__(self,init_value=1.0):
        super(ScaleExp,self).__init__()
        self.scale=nn.Parameter(torch.tensor([init_value],dtype=torch.float32))
    def forward(self,x):
        return torch.exp(x*self.scale)

class ClsCntRegHead(nn.Module):
    def __init__(self,in_channel,class_num,GN=True,cnt_on_reg=True,prior=0.01):
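        '''
        Args
        in_channel: number of channels of each FPN level
        class_num: number of object classes
        GN: if True, add GroupNorm after every conv in both branches
        cnt_on_reg: if True, predict center-ness from the regression branch,
                    otherwise from the classification branch
        prior: initial foreground probability used to set the cls_logits bias
        '''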
        super(ClsCntRegHead,self).__init__()
        self.prior=prior
        self.class_num=class_num
        self.cnt_on_reg=cnt_on_reg
        
        cls_branch=[]
        reg_branch=[]

        for i in range(4):
            cls_branch.append(nn.Conv2d(in_channel,in_channel,kernel_size=3,padding=1,bias=True))
            if GN:
                cls_branch.append(nn.GroupNorm(32,in_channel))
            cls_branch.append(nn.ReLU(True))

            reg_branch.append(nn.Conv2d(in_channel,in_channel,kernel_size=3,padding=1,bias=True))
            if GN:
                reg_branch.append(nn.GroupNorm(32,in_channel))
            reg_branch.append(nn.ReLU(True))

        self.cls_conv=nn.Sequential(*cls_branch)
        self.reg_conv=nn.Sequential(*reg_branch)

        self.cls_logits=nn.Conv2d(in_channel,class_num,kernel_size=3,padding=1)
        self.cnt_logits=nn.Conv2d(in_channel,1,kernel_size=3,padding=1)
        self.reg_pred=nn.Conv2d(in_channel,4,kernel_size=3,padding=1)
        
        self.apply(self.init_conv_RandomNormal)
        
        nn.init.constant_(self.cls_logits.bias,-math.log((1 - prior) / prior))
        self.scale_exp = nn.ModuleList([ScaleExp(1.0) for _ in range(5)])
    
    def init_conv_RandomNormal(self,module,std=0.01):
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, std=std)

            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
    
    def forward(self,inputs):
        '''inputs:[P3~P7]'''
        cls_logits=[]
        cnt_logits=[]
        reg_preds=[]
        for index,P in enumerate(inputs):
            cls_conv_out=self.cls_conv(P)
            reg_conv_out=self.reg_conv(P)

            cls_logits.append(self.cls_logits(cls_conv_out))
            if not self.cnt_on_reg:
                cnt_logits.append(self.cnt_logits(cls_conv_out))
            else:
                cnt_logits.append(self.cnt_logits(reg_conv_out))
            reg_preds.append(self.scale_exp[index](self.reg_pred(reg_conv_out)))
        return cls_logits,cnt_logits,reg_preds

That is the whole model implementation. I use a face mask detection dataset (the download link is on my GitHub; remember to leave a star).
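To connect the three head outputs to an actual detection, here is a minimal single-location sketch in plain Python. It assumes that the regression output is already a set of positive (l, t, r, b) distances in image pixels and that the final score is sigmoid(class logit) times sigmoid(center-ness logit); the repository's post-processing may differ in details such as scaling by the stride, score thresholds and NMS. The function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode(x, y, stride, ltrb, cls_logit, cnt_logit):
    """Turn one location's head outputs into (box, score).

    (x, y) is the feature-map location, ltrb the predicted distances
    (assumed to be in image pixels), and the two logits are raw outputs
    of cls_logits and cnt_logits for one class.
    """
    cx = stride // 2 + x * stride
    cy = stride // 2 + y * stride
    l, t, r, b = ltrb
    box = (cx - l, cy - t, cx + r, cy + b)
    score = sigmoid(cls_logit) * sigmoid(cnt_logit)
    return box, score

box, score = decode(12, 12, 8, (20, 20, 40, 40), cls_logit=0.0, cnt_logit=0.0)
print(box)    # (80, 80, 140, 140)
print(score)  # 0.25

# The bias set on cls_logits in the head makes the initial class
# probability equal to `prior`, as in the focal loss setup.
prior = 0.01
bias = -math.log((1 - prior) / prior)
print(round(sigmoid(bias), 4))  # 0.01
```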

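3. Training on your own dataset

3.1 First, organize your data into VOC format, as shown in the figure below.

Annotations holds the XML annotation files produced by labelImg. JPEGImages holds the original images. ImageSets/Main holds the txt files.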
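For reference, a VOC-style annotation can be read with the standard library alone. The XML below is a made-up example in the layout labelImg writes, and the class names `face` and `face_mask` are assumptions, not necessarily the labels used in the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in VOC layout
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>face_mask</name>
    <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>260</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>face</name>
    <bndbox><xmin>400</xmin><ymin>80</ymin><xmax>520</xmax><ymax>230</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return (filename, [(class_name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(int(float(bndbox.findtext(key)))
                       for key in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return root.findtext("filename"), objects

filename, objects = parse_voc(SAMPLE_XML)
print(filename)
for name, coords in objects:
    print(name, coords)
```

To read a file from the Annotations directory, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.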

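The utils directory on GitHub has two scripts. convert_json2VOCSEG.py converts labels into XML files and npy mask files, because I plan to try instance segmentation with FCOS later; anyone interested is welcome to discuss it with me. maketxt.py is a simple script that generates the txt files.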
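If you want to generate the split files yourself, something along the following lines would do. This is a hypothetical stand-in, not the repository's maketxt.py: the function name, the 80/20 split and the fixed seed are all assumptions. It lists the XML files in Annotations and writes train.txt, val.txt and trainval.txt with one image ID per line.

```python
import os
import random
import tempfile

def make_split_files(ann_dir, out_dir, val_ratio=0.2, seed=0):
    """Write train/val/trainval txt files from the XML names in ann_dir."""
    stems = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(ann_dir) if f.endswith(".xml"))
    shuffled = stems[:]
    random.Random(seed).shuffle(shuffled)  # reproducible split
    n_val = int(len(shuffled) * val_ratio)
    splits = {
        "val": sorted(shuffled[:n_val]),
        "train": sorted(shuffled[n_val:]),
        "trainval": stems,
    }
    os.makedirs(out_dir, exist_ok=True)
    for name, items in splits.items():
        with open(os.path.join(out_dir, name + ".txt"), "w") as f:
            f.write("\n".join(items) + "\n")
    return {name: len(items) for name, items in splits.items()}

# Demo on a temporary VOC-like directory with 10 empty annotation files
with tempfile.TemporaryDirectory() as root:
    ann_dir = os.path.join(root, "Annotations")
    out_dir = os.path.join(root, "ImageSets", "Main")
    os.makedirs(ann_dir)
    for i in range(10):
        open(os.path.join(ann_dir, "%06d.xml" % i), "w").close()
    counts = make_split_files(ann_dir, out_dir)
    written = sorted(os.listdir(out_dir))

print(counts)   # {'val': 2, 'train': 8, 'trainval': 10}
print(written)  # ['train.txt', 'trainval.txt', 'val.txt']
```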

3.2 Code

https://github.com/2anchao/FCOS_DET_MASK

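3.3 Training results

3.3.1 Without a mask

3.3.2 With a mask

A senior labmate of mine says the photo of this girl is heavily retouched. What do you think?

4. Summary

I did not train for long; this was just for fun. After 10 epochs my leader took my workstation away... Training longer would certainly give much better results. I hope this post helps you.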
