My study of FPN got put off for a week, so I'm writing up my notes. I read the paper first, then this blog post: https://blog.csdn.net/WZZ18191171661/article/details/79494534 It has a high view count and the author explains things clearly.
1. My first question was about how AP50 is computed. This link explains it well: https://blog.csdn.net/qq_41994006/article/details/81051150 (a minimal IoU sketch follows this list).
2. My second question was about image pyramids versus feature pyramids. To be clear, these are two different things; this clicked for me only after reading the paper itself, so I recommend reading the basic-principles part of the original paper.
3. On lateral connections: after reading one blog's explanation, my first reaction was to ask why the fusion is done this way. The output of each ResNet stage is merged with the upsampled higher-level output, but where exactly does this affect the network?
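For point 1, here is a minimal sketch of the IoU test that underlies AP50 (my own toy example, not taken from the linked post): a detection counts as a true positive at AP50 only if its IoU with a ground-truth box is at least 0.5.

def iou(box_a, box_b):
    # boxes given as (x1, y1, x2, y2); returns intersection-over-union
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 1/3, so not a true positive at the 0.5 threshold

AP50 then sorts detections by confidence, builds a precision-recall curve from these TP/FP decisions, and takes the area under that curve.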
Also, I want to study how FPN is used in object detection, and my understanding of anchors still needs to be refined.
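As a starting point on anchors: in the RPN the paper assigns anchors of a single scale to each pyramid level, and for the RoI head its Eq. (1) maps an RoI of width w and height h to level k = floor(k0 + log2(sqrt(wh)/224)) with k0 = 4. A small illustration (the helper name roi_level is mine):

import math

def roi_level(w, h, k0=4, k_min=2, k_max=5):
    # Eq. (1) of the FPN paper: larger RoIs are pooled from coarser levels
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224.0))
    return max(k_min, min(k_max, k))  # clamp to the available levels P2..P5

print(roi_level(224, 224))  # 4 -> P4, the canonical 224x224 RoI
print(roi_level(112, 112))  # 3 -> P3, half the size maps one level finer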
The bottom-up pathway is simply the feed-forward computation of the backbone. Taking ResNet as an example, the output of the last residual block of each stage (conv2, conv3, conv4, and conv5) is taken to build the feature pyramid; relative to the input image these have strides of 4, 8, 16, and 32 pixels (conv1 is skipped because of its large memory footprint). The top-down pathway is realized by upsampling: the spatial size of each higher-level feature map is doubled with nearest-neighbor upsampling. The lateral connection then fuses, by element-wise addition, the upsampled higher-level map with the corresponding bottom-up map, which first passes through a 1x1 convolution to reduce its channel count. Iterating this process yields merged maps at every level, down to the finest one at C2's resolution. A 3x3 convolution is then applied to each merged map to produce the final feature maps and reduce the aliasing effect of upsampling.
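To make the fusion concrete, here is a minimal sketch of a single merge step (my paraphrase of the description above, with made-up tensor sizes, not reference code):

import torch
import torch.nn as nn
import torch.nn.functional as F

c4 = torch.randn(1, 1024, 14, 14)   # bottom-up output of conv4 (stride 16)
p5 = torch.randn(1, 256, 7, 7)      # merged map from the level above
lateral = nn.Conv2d(1024, 256, kernel_size=1)(c4)                 # 1x1 conv reduces channels
p4 = F.interpolate(p5, scale_factor=2, mode='nearest') + lateral  # upsample 2x, element-wise add
p4 = nn.Conv2d(256, 256, kernel_size=3, padding=1)(p4)            # 3x3 conv smooths aliasing
print(p4.shape)  # torch.Size([1, 256, 14, 14])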
FPN is best understood with the help of a figure; the one here is cropped from the blog I referenced.
The corresponding part of the original paper is pasted below.
The original paper mentions that sharing the head's parameters across pyramid levels has little impact on performance (tested with the RPN), so for now I'll take that as given.
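A sketch of what that sharing means (my own illustration, not the paper's code): a single head whose weights are applied unchanged to every pyramid level.

import torch
import torch.nn as nn

# one small head, e.g. objectness scores for 3 anchors per location
head = nn.Sequential(nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
                     nn.Conv2d(256, 3, kernel_size=1))

for size in (56, 28, 14, 7):              # p2..p5 sizes for a 224x224 input
    p = torch.randn(1, 256, size, size)
    print(head(p).shape)                  # the same weights score every level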
import torch.nn as nn
import torch.nn.functional as F
import math
__all__ = ['FPN']
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, in_planes, planes, stride=1, downsample=None):
print("---666---")
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.conv3 = nn.Conv2d(planes, self.expansion * planes, kernel_size=1, bias=False)
self.bn3 = nn.BatchNorm2d(self.expansion * planes)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class FPN(nn.Module):
def __init__(self, block, layers):
super(FPN, self).__init__()
self.inplanes = 64
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
# Bottom-up layers
        self.layer1 = self._make_layer(block, 64, layers[0])  # step 1: enters _make_layer
        print(layers[0])  # step 4: prints the constructor argument layers[0], i.e. 2
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
print(layers[1])
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
print(layers[2])
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
print(layers[3])
# Top layer
self.toplayer = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=0) # Reduce channels
# Smooth layers
self.smooth1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
self.smooth2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
self.smooth3 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
# Lateral layers
self.latlayer1 = nn.Conv2d(1024, 256, kernel_size=1, stride=1, padding=0)
self.latlayer2 = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)
self.latlayer3 = nn.Conv2d(256, 256, kernel_size=1, stride=1, padding=0)
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def _make_layer(self, block, planes, blocks, stride=1):
downsample = None
if stride != 1 or self.inplanes != block.expansion * planes:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, block.expansion * planes, kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(block.expansion * planes)
)
layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
        # This appends the first residual block of the stage (the one that may carry the downsample branch) to the local layers list.
        print(layers[0])  # step 2: prints the first block; another Bottleneck init runs right after
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes))
            # This appends the remaining residual blocks of the stage, completing its construction.
            print(layers[1])  # prints the second block of the stage
        # Both append calls build each residual unit through the Bottleneck class defined above.
return nn.Sequential(*layers)
def _upsample_add(self, x, y):
        _, _, H, W = y.size()
        # Resize x to y's spatial size, then add element-wise. Note: the paper uses
        # nearest-neighbor upsampling, while this implementation uses bilinear;
        # F.upsample is deprecated, so F.interpolate is used here.
        return F.interpolate(x, size=(H, W), mode='bilinear', align_corners=False) + y
def forward(self, x):
# Bottom-up
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
# Top-down
p5 = self.toplayer(c5)
p4 = self._upsample_add(p5, self.latlayer1(c4))
p3 = self._upsample_add(p4, self.latlayer2(c3))
p2 = self._upsample_add(p3, self.latlayer3(c2))
# Smooth
p4 = self.smooth1(p4)
p3 = self.smooth2(p3)
p2 = self.smooth3(p2)
return p2, p3, p4, p5
if __name__ == '__main__':
net = FPN(Bottleneck, [2, 2, 2, 2])
    # [2, 2, 2, 2] determines how many blocks each layer's Sequential holds; with 2, entries (0) and (1) appear in the printout
print(net)
# def FPN101():
#     # note: a true FPN-101 backbone would use ResNet-101's block counts
#     return FPN(Bottleneck, [3, 4, 23, 3])
My initial understanding of the code:
I have not yet worked out the input and output dimensions (a quick shape check follows the walkthrough below); what I do know so far is the order in which the code executes.
1. During initialization, Bottleneck runs and prints ---666---.
2. Execution then jumps to
self.layer1 = self._make_layer(block, 64, layers[0])  # step 1
and goes on to call the _make_layer function, reaching
layers.append(block(self.inplanes, planes, stride, downsample))
at which point the output is
Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
3. Execution then continues inside _make_layer, entering Bottleneck again and printing ---666---. (The downsample branch in the block above appears because inplanes, 64, differs from block.expansion * planes, 256.)
4. Execution continues with
for i in range(1, blocks):
    layers.append(block(self.inplanes, planes))
which prints
Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
)
Then step 4 runs back in __init__:
self.layer1 = self._make_layer(block, 64, layers[0])  # step 1
print(layers[0])  # step 4: prints the value from the constructor's layers argument
This prints 2: the layers here is the [2, 2, 2, 2] list passed to the constructor, not the local layers list built inside _make_layer.
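As for the dimensions I had not worked out, a quick check is to push a dummy input through the network and print the output shapes. Assuming a 224x224 input, the strides 4/8/16/32 predict spatial sizes 56/28/14/7 for p2 through p5, all with 256 channels:

import torch

net = FPN(Bottleneck, [2, 2, 2, 2])
p2, p3, p4, p5 = net(torch.randn(1, 3, 224, 224))
for name, p in zip(('p2', 'p3', 'p4', 'p5'), (p2, p3, p4, p5)):
    print(name, tuple(p.shape))
# p2 (1, 256, 56, 56)
# p3 (1, 256, 28, 28)
# p4 (1, 256, 14, 14)
# p5 (1, 256, 7, 7)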
That's all for now; more to be added.
Other reference links:
1.https://blog.csdn.net/baidu_30594023/article/details/82623623
2.http://www.mamicode.com/info-detail-2602526.html
3.https://blog.csdn.net/Jason_mmt/article/details/82662306 (to study next)
To be continued...