Faster Rcnn

Latest recommended article published on 2022-08-30 11:08:07

古风子

Category: PyTorch Practice; Tags: Faster Rcnn

Article link: https://blog.csdn.net/jiadongfengyahoo/article/details/115358440


Reference tutorial project

https://github.com/chenyuntc/simple-faster-rcnn-pytorch

Installing dependencies

conda create --name jdf_pytorch
conda activate jdf_pytorch
# install other dependencies
pip install visdom scikit-image tqdm fire ipdb pprint matplotlib torchnet

Python and PyTorch versions

#Python 3.7.0 (default, Jun 28 2018, 13:15:42)
torch.__version__
'1.6.0'

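Training dataset

PASCAL VOC is used as the training dataset.

The data is read as follows:

Specify the root directory of the VOC dataset.
From the VOC directory, read ImageSets/Main/trainval.txt; trainval lists the file name of each image (without extension).

Then iterate over the dataloader to get, for each image, the original image, all annotation boxes, the class id of each box, and whether each box is a difficult sample.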

bbox: 
[[ 31. 262. 294. 499.]
[ 35.   0. 298. 234.]]

label: 
[18 18]

difficult: 
[0 0]
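To make the bbox / label / difficult triple concrete, here is a minimal sketch of parsing one VOC-style annotation. The XML snippet is a hypothetical example inlined so the code runs without the dataset, and `parse_voc_annotation` is an illustrative name, not the tutorial project's API. It assumes the 20 VOC class names are indexed alphabetically and that the 1-based VOC pixel coordinates are shifted to 0-based.

```python
import xml.etree.ElementTree as ET

# The 20 PASCAL VOC classes in alphabetical order (assumed label indexing).
VOC_LABEL_NAMES = (
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
    'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')

# Hypothetical annotation, inlined for illustration only.
SAMPLE_XML = """
<annotation>
  <object>
    <name>train</name>
    <difficult>0</difficult>
    <bndbox><xmin>263</xmin><ymin>32</ymin><xmax>500</xmax><ymax>295</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc_annotation(xml_text):
    """Return (bbox, label, difficult) lists for one annotation."""
    root = ET.fromstring(xml_text.strip())
    bbox, label, difficult = [], [], []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        # Boxes are stored as (ymin, xmin, ymax, xmax); subtract 1 to make
        # the 1-based VOC pixel indices 0-based.
        bbox.append([float(box.find(tag).text) - 1
                     for tag in ('ymin', 'xmin', 'ymax', 'xmax')])
        label.append(VOC_LABEL_NAMES.index(obj.find('name').text.strip()))
        difficult.append(int(obj.find('difficult').text))
    return bbox, label, difficult

bbox, label, difficult = parse_voc_annotation(SAMPLE_XML)
print(bbox, label, difficult)
# prints: [[31.0, 262.0, 294.0, 499.0]] [18] [0]
```

Under this indexing, label 18 in the output above corresponds to the class "train".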

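Each image that is read goes through a Transform: it is resized so that the shorter side is at most 600 and the longer side is at most 1000, then normalized. The annotation boxes are scaled by the same resize factor.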

def preprocess(img, min_size=600, max_size=1000):
    C, H, W = img.shape
    scale1 = min_size / min(H, W)
    scale2 = max_size / max(H, W)
    scale = min(scale1, scale2)
    img = img / 255.
    img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect',anti_aliasing=False)
    # both the longer and shorter should be less than
    # max_size and min_size
    if opt.caffe_pretrain:
        normalize = caffe_normalize
    else:
        normalize = pytorch_normalze
    return normalize(img)
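The scale selection in preprocess, and the matching box scaling, can be sketched in plain Python. The function names `compute_scale` and `scale_bbox` and the 375x500 image size are assumptions for illustration; the sketch uses a single scale factor for both axes, which matches preprocess because it resizes height and width by the same factor.

```python
def compute_scale(height, width, min_size=600, max_size=1000):
    """Pick the largest scale that keeps the short side <= min_size
    and the long side <= max_size."""
    scale1 = min_size / min(height, width)
    scale2 = max_size / max(height, width)
    return min(scale1, scale2)

def scale_bbox(bbox, scale):
    """Scale every (ymin, xmin, ymax, xmax) box by the same factor."""
    return [[coord * scale for coord in box] for box in bbox]

# Hypothetical 375x500 image with the two boxes shown earlier.
scale = compute_scale(375, 500)
scaled = scale_bbox([[31., 262., 294., 499.], [35., 0., 298., 234.]], scale)
print(round(scale, 3), [round(v, 1) for v in scaled[0]])
```

For this input the short side is the limiting one (600 / 375), so the image becomes 600x800 and the boxes grow by the same factor.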

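Feature extraction network

As a CNN-based object detection method, Faster RCNN first uses a stack of basic conv + relu + pooling layers to extract the feature maps of the image. These feature maps are shared by the subsequent RPN layer and the fully connected layers.

VGG16 is used here to extract image features.
[Figure: VGG16 network diagram]

Code that loads the network structure: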

#faster_rcnn_vgg16.py

def decom_vgg16():
    # the 30th layer of features is relu of conv5_3
    if opt.caffe_pretrain:
        model = vgg16(pretrained=False)
        if not opt.load_path:
            model.load_state_dict(t.load(opt.caffe_pretrain_path))
    else:
        model = vgg16(not opt.load_path)

    features = list(model.features)[:30]
    classifier = model.classifier

    classifier = list(classifier)
    del classifier[6]
    if not opt.use_drop:
        del classifier[5]
        del classifier[2]
    classifier = nn.Sequential(*classifier)

    # freeze top4 conv
    for layer in features[:10]:
        for p in layer.parameters():
            p.requires_grad = False

    return nn.Sequential(*features), classifier

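Main steps:

Load the pretrained model.
Take the first 30 layers of VGG16 as the feature layers, i.e. 13 convolutional layers (each followed by a ReLU) and 4 pooling layers.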

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
  )

Remove the last layer of the classifier, because the VOC dataset has 20 classes rather than 1000.
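The layer count behind the `[:30]` slice can be checked without loading any weights. This is a small sketch that expands the standard VGG16 layout (torchvision's configuration "D") into a flat layer list; `VGG16_CFG` and `expand_layers` are illustrative names, not part of the tutorial project.

```python
# VGG16 layout: numbers are conv output channels, 'M' is a 2x2 max-pool.
# Every conv layer is followed by a ReLU.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def expand_layers(cfg):
    """Flatten the config into the same order as model.features."""
    layers = []
    for item in cfg:
        if item == 'M':
            layers.append('pool')
        else:
            layers.extend(['conv', 'relu'])
    return layers

layers = expand_layers(VGG16_CFG)
kept = layers[:30]  # same slice as list(model.features)[:30]
print(len(layers), kept.count('conv'), kept.count('relu'), kept.count('pool'))
# prints: 31 13 13 4
print(layers[30])
# prints: pool
```

The only layer dropped by the slice is the fifth max-pool, which is why the backbone downsamples by 16 rather than 32.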

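Applying the feature extraction network above:

Input: img_size: (600, 800)
Processing: the first 30 layers of VGG16 are applied; the channel count goes from 3 to 512, and the height goes from 600 to floor(600/(2^4)) = 37 (a standard VGG16 with all five pooling layers would divide by 2^5 instead)
Output: features shape: torch.Size([1, 512, 37, 50])

Note: the convolutional layers of VGG16 do not change the spatial size of the image and only change the number of channels. Each pooling layer halves the height and width, so after 4 pooling layers the size is 1/16 of the original.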
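The 37x50 output size follows from halving (with floor) once per pooling layer. A minimal sketch, assuming 2x2 max-pooling with stride 2 and ceil_mode=False as in the printed model; `vgg16_feature_size` is a hypothetical helper name.

```python
def vgg16_feature_size(height, width, num_pools=4):
    """Spatial size after num_pools 2x2/stride-2 max-pools (floor division).
    The 3x3 convs with padding 1 leave the size unchanged."""
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return height, width

print(vgg16_feature_size(600, 800))               # (37, 50)
print(vgg16_feature_size(600, 800, num_pools=5))  # (18, 25)
```

The height goes 600, 300, 150, 75, 37: the last halving drops the fractional part, which is where the 37 (rather than 37.5) comes from.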

The feature extraction flow is shown below:
[Figure: feature extraction network]

RPN (Region Proposal Network)

Region Proposal Networks. The RPN generates region proposals. It uses softmax to classify each anchor as positive or negative, then applies bounding box regression to refine the anchors into accurate proposals.
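As an illustration of how bounding box regression refines an anchor, here is a sketch of the usual Faster R-CNN box parameterization: the predicted offsets (dy, dx) move the anchor center in units of the anchor size, and (dh, dw) rescale it in log space. `decode_box` is a hypothetical name, and this is an assumption about the decoding step rather than code taken from the tutorial project.

```python
import math

def decode_box(anchor, loc):
    """Apply regression offsets loc = (dy, dx, dh, dw) to an anchor given
    as (y_min, x_min, y_max, x_max)."""
    y_min, x_min, y_max, x_max = anchor
    dy, dx, dh, dw = loc
    h = y_max - y_min
    w = x_max - x_min
    ctr_y = y_min + 0.5 * h
    ctr_x = x_min + 0.5 * w
    # Shift the center proportionally to the anchor size.
    new_ctr_y = dy * h + ctr_y
    new_ctr_x = dx * w + ctr_x
    # Rescale height and width in log space.
    new_h = math.exp(dh) * h
    new_w = math.exp(dw) * w
    return (new_ctr_y - 0.5 * new_h, new_ctr_x - 0.5 * new_w,
            new_ctr_y + 0.5 * new_h, new_ctr_x + 0.5 * new_w)

print(decode_box((0.0, 0.0, 16.0, 16.0), (0.0, 0.0, 0.0, 0.0)))
# prints: (0.0, 0.0, 16.0, 16.0)
print(decode_box((0.0, 0.0, 16.0, 16.0), (0.5, 0.0, 0.0, 0.0)))
# prints: (8.0, 0.0, 24.0, 16.0)
```

All-zero offsets leave the anchor unchanged, and dy = 0.5 moves it down by half its height.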

Generating base anchors

The Python implementation of the anchor formula (derived further below) is:

#bbox_tools.py
import numpy as np
import six
def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):
    py = base_size / 2.
    px = base_size / 2.

    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i in six.moves.range(len(ratios)):
        for j in six.moves.range(len(anchor_scales)):


            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

            print(i,j,base_size, anchor_scales[j], ratios[i], np.sqrt(ratios[i]),'--->',h, w)

            index = i * len(anchor_scales) + j
            anchor_base[index, 0] = py - h / 2.
            anchor_base[index, 1] = px - w / 2.
            anchor_base[index, 2] = py + h / 2.
            anchor_base[index, 3] = px + w / 2.
    return anchor_base

base_size=16

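From the feature extraction network we know that the image shrinks to 1/16 of its original size, so each point on the output feature map corresponds to a 16x16 pixel region in the original image.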

anchor_scales=[8, 16, 32]

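These are the scale factors of the generated anchors relative to the 16x16 base region; that is, anchors of three different areas are ultimately produced: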

16x16  ---(16x8)x(16x8)
16x16  ---(16x16)x(16x16)
16x16  ---(16x32)x(16x32)

ratios=[0.5,1,2]

After scaling, each anchor size yields three anchors with different aspect ratios, where h:w equals the ratio.

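The formula is derived as follows:

Given:
base_size = 16, anchor_scales, ratios

Side lengths after applying the scale anchor_scales[j]:
scl_h = base_size * anchor_scales[j]
scl_w = base_size * anchor_scales[j]

Find dst_h and dst_w for any ratio ratios[i].

The anchor area is the same before and after applying the ratio (for example 128 * 128 = 16384 for scale 8):
scl_h * scl_w = dst_h * dst_w

and dst_h and dst_w satisfy the ratio:

dst_h / dst_w = ratios[i]
==> dst_h = ratios[i] * dst_w

Substituting into scl_h * scl_w = dst_h * dst_w gives:

scl_h * scl_w = ratios[i] * dst_w * dst_w
==> dst_w = sqrt(scl_h * scl_w / ratios[i]), where scl_h and scl_w are equal
==> dst_w = base_size * anchor_scales[j] * sqrt(1 / ratios[i])
==> dst_h = ratios[i] * dst_w = base_size * anchor_scales[j] * sqrt(ratios[i])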
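The derivation can be checked numerically: for any scale and ratio, dst_h * dst_w should equal the squared scaled side and dst_h / dst_w should equal the ratio. A minimal sketch using only the standard library; `anchor_hw` is a hypothetical helper name.

```python
import math

def anchor_hw(base_size, scale, ratio):
    """Height and width of an anchor with the given scale and h:w ratio."""
    side = base_size * scale              # scl_h == scl_w
    dst_h = side * math.sqrt(ratio)
    dst_w = side * math.sqrt(1.0 / ratio)
    return dst_h, dst_w

h, w = anchor_hw(16, 8, 0.5)
print(round(h, 2), round(w, 2), round(h * w), round(h / w, 2))
# prints: 90.51 181.02 16384 0.5
```

The area stays at 128 * 128 = 16384 while the shape changes from square to 1:2.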

The resulting 9 base anchors (base_size = 16, sizes truncated to integers) are:

anchor_scale  ratio  anchor_h  anchor_w
8             0.5    90        181
8             1      128       128
8             2      181       90
16            0.5    181       362
16            1      256       256
16            2      362       181
32            0.5    362       724
32            1      512       512
32            2      724       362

[Figure: anchors]

For a standalone program that generates the figure above, see

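Each point on the feature map corresponds to 9 anchors. How many anchors does one image produce in total?
Assume the image size is 600x800.
Then: anchors_num = (800/16) x floor(600/16) x 9 = 50 x 37 x 9 = 16650 (the unrounded 600/16 = 37.5 would give 16875, but the feature map height is 37)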
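The anchor count can be computed directly from the image size. This sketch assumes the feature map size is the image size floor-divided by the stride, which matches the 37x50 feature map reported earlier for a 600x800 input; `count_anchors` is a hypothetical helper name.

```python
def count_anchors(img_h, img_w, feat_stride=16, anchors_per_location=9):
    """Total anchors = feature map positions * anchors per position."""
    feat_h = img_h // feat_stride   # 600 // 16 = 37 (not 37.5)
    feat_w = img_w // feat_stride   # 800 // 16 = 50
    return feat_h * feat_w * anchors_per_location

print(count_anchors(600, 800))
# prints: 16650
```

Because 600 is not a multiple of 16, the floor matters: 37 x 50 x 9 = 16650, whereas using 37.5 would overcount.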

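Generating all anchors over the original image

The original image is 600x800; after VGG the output feature map is 37x50, a 16x reduction, so one point on the feature map corresponds to a 16x16 pixel region of the original image.
The anchors are therefore placed on the original image with a stride of 16.

The main implementation is _enumerate_shifted_anchor in region_proposal_network.py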

#region_proposal_network.py
def _enumerate_shifted_anchor(anchor_base, feat_stride, height, width):
    import numpy as xp

    # 1. Step by feat_stride to get the tick values that divide the original image
    shift_y_ticks = xp.arange(0, height * feat_stride, feat_stride)
    shift_x_ticks = xp.arange(0, width * feat_stride, feat_stride)

    # 2. From the ticks, build the coordinates of every grid point
    shift_x, shift_y = xp.meshgrid(shift_x_ticks, shift_y_ticks)

    # 3. Offsets to add to anchor_base, in (y_min, x_min, y_max, x_max) order
    shift = xp.stack((shift_y.ravel(), shift_x.ravel(),
                      shift_y.ravel(), shift_x.ravel()), axis=1)

    A = anchor_base.shape[0]  # anchors per location
    K = shift.shape[0]        # number of locations (height * width)
    # print("A-K:", A,K)
    # 4. Shift anchor_base by every offset, giving K * A anchors in total
    anchor = anchor_base.reshape((1, A, 4)) + \
             shift.reshape((1, K, 4)).transpose((1, 0, 2))
    # xp.float32 instead of np.float32: only xp is imported in this function
    anchor = anchor.reshape((K * A, 4)).astype(xp.float32)
    # show() is a visualization helper that is not defined in this excerpt
    # show(shift_y_ticks, shift_x_ticks, shift_x, shift_y, anchor)
    return anchor
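The broadcasting step is easier to follow on a tiny grid. The sketch below reimplements the same idea on a 2x3 feature map with two made-up base anchors; `enumerate_shifted_anchors` and `toy_base` are illustrative names, and `shift.reshape((K, 1, 4))` is used as an equivalent of the reshape-then-transpose in the function above.

```python
import numpy as np

def enumerate_shifted_anchors(anchor_base, feat_stride, height, width):
    """Copy every base anchor to every feature map position."""
    shift_y = np.arange(0, height * feat_stride, feat_stride)
    shift_x = np.arange(0, width * feat_stride, feat_stride)
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # One (y, x, y, x) offset per feature map position, row by row.
    shift = np.stack((shift_y.ravel(), shift_x.ravel(),
                      shift_y.ravel(), shift_x.ravel()), axis=1)
    A = anchor_base.shape[0]
    K = shift.shape[0]
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4).
    anchor = anchor_base.reshape((1, A, 4)) + shift.reshape((K, 1, 4))
    return anchor.reshape((K * A, 4)).astype(np.float32)

# Two hypothetical base anchors in (y_min, x_min, y_max, x_max) order.
toy_base = np.array([[-8, -8, 24, 24], [0, -16, 16, 32]], dtype=np.float32)
toy_anchors = enumerate_shifted_anchors(toy_base, feat_stride=16,
                                        height=2, width=3)
print(toy_anchors.shape)   # (12, 4)
print(toy_anchors[2:4])    # both base anchors shifted right by 16 pixels
```

The output is ordered position by position: rows 0-1 are the base anchors at offset (0, 0), rows 2-3 are the same anchors shifted by one stride in x, and so on.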