Anchor generation in Faster R-CNN (a walkthrough of the generate_anchors source code)

This post walks through the generate_anchors code to help explain how anchors are generated.

Let's start with the main block:

if __name__ == '__main__':
    import time
    t = time.time()
    a = generate_anchors()   # this is the key function
    print time.time() - t
    print a
    from IPython import embed; embed()

Now step into the generate_anchors function:

def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """

    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    print ("base anchors",base_anchor)
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    print ("anchors after ratio",ratio_anchors)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in xrange(ratio_anchors.shape[0])])
    print ("achors after ration and scale",anchors)
    return anchors

The function takes three parameters:

1.base_size=16

This parameter sets the size of the initial, receptive-field-like region. After several layers of convolution and pooling, one point on the feature map corresponds to a region of the original image. Here it is set to 16, meaning one point on the feature map maps back to a 16x16 region of the original image. It can be changed as needed.

2.ratios=[0.5,1,2]

This parameter reshapes the 16x16 region into three aspect ratios, 1:2, 1:1 and 2:1, as illustrated below:

Figure 1: aspect-ratio transformation

3.scales=2**np.arange(3, 6)

This parameter enlarges the width and height of the input region by three factors: 2^3=8, 2^4=16 and 2^5=32. The 16x16 region thus becomes a 128x128 region (16*8=128), a 256x256 region (16*16=256) and a 512x512 region (16*32=512), as illustrated below; a small numerical example follows Figure 2.

Figure 2: scale (area) enlargement
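
To make the three parameters concrete, here is a small numerical illustration (my own addition, not part of the original source). It maps a feature-map location back to the image using base_size as the stride, and lists the three square side lengths produced by the scales:

import numpy as np

base_size = 16                   # stride: one feature-map cell covers a 16x16 image patch
ratios = [0.5, 1, 2]             # target h/w ratios, i.e. roughly 2:1, 1:1, 1:2 in w:h
scales = 2 ** np.arange(3, 6)    # [ 8 16 32]

# a (hypothetical) feature-map location (ix, iy) corresponds to the image patch
# whose top-left corner is (ix * base_size, iy * base_size)
ix, iy = 3, 5
print(ix * base_size, iy * base_size)   # 48 80

# side lengths of the three reference squares obtained by scaling the 16x16 base region
print(base_size * scales)               # [128 256 512]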

Now look at the first line of code:

base_anchor = np.array([1, 1, base_size, base_size]) - 1

'''base_anchor is [ 0,  0, 15, 15]'''

This represents the basic 16x16 region; the four values are the coordinates of its top-left and bottom-right corners.

ratio_anchors = _ratio_enum(base_anchor, ratios)

This line applies the ratio transformation to the 16x16 region, i.e. it produces anchors with the three aspect ratios. It calls the _ratio_enum function, defined as follows:

def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # convert corners to (w, h, center)
    size = w * h                 # size: 16*16 = 256
    size_ratios = size / ratios  # 256 / [0.5, 1, 2] = [512, 256, 128]
    # np.round rounds to the nearest integer; np.sqrt returns the square root
    ws = np.round(np.sqrt(size_ratios))  # ws: [23 16 11]
    hs = np.round(ws * ratios)           # hs: [12 16 22]; ws and hs pair up, e.g. 23 & 12
    # given the width/height vectors, build the anchor windows, i.e. convert
    # (w, h, x_ctr, y_ctr) back to the four-coordinate form
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

Its inputs are one anchor (given as four coordinates) and the three aspect ratios (0.5, 1, 2).

Inside this function, _whctrs is called. It is defined below; its job is to convert the anchor's four coordinates into the (width, height, x center, y center) form.

def _whctrs(anchor):
    """
    Return width, height, x center, and y center for an anchor (window).
    """
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr
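
The code above also relies on _mkanchors, which performs the inverse conversion: from (width, height, x center, y center) back to the four corner coordinates. Its body is not quoted in this post; the sketch below follows the py-faster-rcnn implementation and is consistent with how it is used here:

def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given vectors of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows) as corner coordinates.
    """
    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors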

After this conversion, the original anchor (0, 0, 15, 15) becomes w: 16, h: 16, x_ctr = 7.5, y_ctr = 7.5. The aspect-ratio transformation then proceeds as annotated in the _ratio_enum code above. The function finally outputs the following three anchors with different aspect ratios:

ratio_anchors = _ratio_enum(base_anchor, ratios)
'''[[ -3.5,   2. ,  18.5,  13. ],
    [  0. ,   0. ,  15. ,  15. ],
    [  2.5,  -3. ,  12.5,  18. ]]'''
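
As a quick sanity check (my own addition, not in the original post), converting each ratio anchor back with _whctrs shows that the area stays close to 256 (it drifts slightly because of rounding) and the center stays at (7.5, 7.5):

for a in ratio_anchors:
    w, h, x_ctr, y_ctr = _whctrs(a)
    print(w, h, w * h, x_ctr, y_ctr)
# 23.0 12.0 276.0 7.5 7.5   (w:h = 2:1)
# 16.0 16.0 256.0 7.5 7.5   (1:1)
# 11.0 22.0 242.0 7.5 7.5   (1:2)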

After the aspect-ratio transformation, the scale (area) transformation is applied:

 anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in xrange(ratio_anchors.shape[0])])

The key function here is _scale_enum, defined below. For each of the three aspect-ratio anchors in ratio_anchors, it applies the three scale factors, so three aspect ratios combined with three scales give 9 anchors in total. These are the 9 anchors per location described in the paper.

def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

Like _ratio_enum, _scale_enum first converts each ratio anchor to the (width, height, x center, y center) form, then multiplies the width and height by each scale factor, and finally converts back to the four-coordinate form. The coordinates of the 9 anchors obtained after both transformations are:

anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in xrange(ratio_anchors.shape[0])])
'''
[[ -84.  -40.   99.   55.]
 [-176.  -88.  191.  103.]
 [-360. -184.  375.  199.]
 [ -56.  -56.   71.   71.]
 [-120. -120.  135.  135.]
 [-248. -248.  263.  263.]
 [ -36.  -80.   51.   95.]
 [ -80. -168.   95.  183.]
 [-168. -344.  183.  359.]]
'''
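
For example, the three 1:1 anchors in the middle of this list can be reproduced by hand (a small worked example added for clarity, reusing the helper functions above): the ratio anchor [0, 0, 15, 15] has w = h = 16 and center (7.5, 7.5), the three scales give side lengths 128, 256 and 512, and _mkanchors converts them back to corner coordinates:

scales = 2 ** np.arange(3, 6)                            # [ 8 16 32]
w, h, x_ctr, y_ctr = _whctrs(np.array([0, 0, 15, 15]))   # 16, 16, 7.5, 7.5
print(_mkanchors(w * scales, h * scales, x_ctr, y_ctr))
# [[ -56.  -56.   71.   71.]
#  [-120. -120.  135.  135.]
#  [-248. -248.  263.  263.]]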

The table below compares the 9 anchors; the (w, h, x_ctr, y_ctr) column shows the scaled width and height around the fixed center (7.5, 7.5):

base anchor   ratio (w:h)   after ratio   scale   (w, h, x_ctr, y_ctr)     anchor coordinates
16x16         2:1           23x12         8       (184,  96, 7.5, 7.5)     [ -84.  -40.   99.   55.]
16x16         2:1           23x12         16      (368, 192, 7.5, 7.5)     [-176.  -88.  191.  103.]
16x16         2:1           23x12         32      (736, 384, 7.5, 7.5)     [-360. -184.  375.  199.]
16x16         1:1           16x16         8       (128, 128, 7.5, 7.5)     [ -56.  -56.   71.   71.]
16x16         1:1           16x16         16      (256, 256, 7.5, 7.5)     [-120. -120.  135.  135.]
16x16         1:1           16x16         32      (512, 512, 7.5, 7.5)     [-248. -248.  263.  263.]
16x16         1:2           11x22         8       ( 88, 176, 7.5, 7.5)     [ -36.  -80.   51.   95.]
16x16         1:2           11x22         16      (176, 352, 7.5, 7.5)     [ -80. -168.   95.  183.]
16x16         1:2           11x22         32      (352, 704, 7.5, 7.5)     [-168. -344.  183.  359.]

As I understand it, these anchor coordinates are relative to the original image: a feature map is typically only about 60x40, while the coordinates above run into the hundreds, so these 9 sizes are defined with respect to the full-size input image. They can cover most objects that appear in an image; if extremely large objects appear, the scales would have to be increased accordingly.
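
In the RPN, these 9 base anchors are then tiled over every feature-map location by adding a per-location offset of feat_stride (16) pixels. The following is a minimal sketch of that tiling step (my own summary of the shift logic used in py-faster-rcnn's anchor/proposal layers, not code quoted in this post):

import numpy as np

def shift_anchors(base_anchors, feat_h, feat_w, feat_stride=16):
    """Tile the (A, 4) base anchors over a feat_h x feat_w feature map."""
    shift_x = np.arange(0, feat_w) * feat_stride
    shift_y = np.arange(0, feat_h) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # one (dx, dy, dx, dy) offset per feature-map location
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()
    A = base_anchors.shape[0]   # 9 base anchors
    K = shifts.shape[0]         # feat_h * feat_w locations
    all_anchors = (base_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
    return all_anchors.reshape((K * A, 4))   # all anchors in image coordinates

# e.g. a 40x60 feature map with stride 16 yields 40 * 60 * 9 = 21600 anchors
print(shift_anchors(generate_anchors(), 40, 60).shape)   # (21600, 4)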
