1. Spatial Pyramid Pooling
![](https://i-blog.csdnimg.cn/blog_migrate/5dad84ce7226164a18e1fade2c3aff27.png)
As shown in the figure above:
- Divide all pixels of the feature map into an $n \times n$ grid and pool each cell; the pooling kernel size equals the cell size, with padding when the width does not divide evenly.
- Repeat step 1 with different values of $n$.
- Flatten and concatenate all of the results above into a $C \times N$ feature map, which can be fed directly into a fully connected layer.

The output shape depends only on the chosen $n$ values and the number of channels, and is independent of the shape of the input tensor (as long as the input is not too small, otherwise the pooling result is empty).
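For one pyramid level, the kernel, stride, and padding follow directly from the rule above. A small sketch of the arithmetic for an assumed input of $H = 13$ and $n = 4$ (the variable names here are illustrative, not part of the implementation below):

```python
from math import ceil

H, n = 13, 4                      # input height, grid width of this level
kernel = ceil(H / n)              # 4: each window covers ceil(13/4) rows
stride = kernel                   # 4: windows do not overlap
pad = (kernel * n - H + 1) // 2   # 2: pad so the 4 windows cover all 13 rows
# padded height 13 + 2*2 = 17, so (17 - 4) // 4 + 1 = 4 windows, as required
```

Note that the padding must stay at most half the kernel size for PyTorch's pooling layers, which is another reason very small inputs break the scheme.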
2. Implementation
Link to the full code: 古承风's Gitee
The core code is shown below:
```python
from math import ceil, sqrt

import torch
import torch.nn as nn


def _spp_layer(self, x: torch.Tensor, mode: str = 'max', grid_nums: list = [16]):
    """
    Each entry of grid_nums is the number of cells (n * n) of one pyramid level.

    Steps:
    ---
    1. compute the grid width n = sqrt(grid_num)
    2. compute the pooling kernel_size, stride and padding
    3. pool the original x (this is what makes it pyramid pooling)
    4. flatten and concat all the outputs
    """
    N, C, H, W = x.size()
    for i, grid_num in enumerate(grid_nums):
        # step 1: this level pools onto an n x n grid
        n = int(sqrt(grid_num))
        # step 2: each window covers ceil(H/n) x ceil(W/n) pixels;
        # pad so that n non-overlapping windows cover the whole input
        h = ceil(H / n)
        w = ceil(W / n)
        h_pad = (h * n - H + 1) // 2
        w_pad = (w * n - W + 1) // 2
        if mode == 'max':
            pool = nn.MaxPool2d(kernel_size=(h, w), stride=(h, w), padding=(h_pad, w_pad))
        elif mode == 'avg':
            pool = nn.AvgPool2d(kernel_size=(h, w), stride=(h, w), padding=(h_pad, w_pad))
        else:
            raise ValueError(f"unknown mode {mode!r}, expected 'max' or 'avg'")
        # step 3: always pool the original x, so the levels form a pyramid
        temp = pool(x)
        # step 4: flatten each level and concat, ready for a fully connected layer
        if i == 0:
            output = temp.view(N, -1)
        else:
            output = torch.cat((output, temp.view(N, -1)), dim=-1)
    return output
```
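As a quick check of the shape-invariance claim, here is a minimal standalone sketch of the same logic (max pooling only; the function name `spp` and the grid choices are assumptions for this demo, not part of the original class):

```python
from math import ceil, sqrt

import torch
import torch.nn as nn


def spp(x: torch.Tensor, grid_nums=(1, 4, 16)) -> torch.Tensor:
    # standalone version of the _spp_layer logic, for demonstration
    N, C, H, W = x.size()
    outs = []
    for grid_num in grid_nums:
        n = int(sqrt(grid_num))
        h, w = ceil(H / n), ceil(W / n)
        h_pad = (h * n - H + 1) // 2
        w_pad = (w * n - W + 1) // 2
        pool = nn.MaxPool2d(kernel_size=(h, w), stride=(h, w), padding=(h_pad, w_pad))
        outs.append(pool(x).view(N, -1))
    return torch.cat(outs, dim=-1)


# two inputs with different spatial sizes give the same output width:
# C * (1 + 4 + 16) = 3 * 21 = 63 features per sample
a = spp(torch.randn(2, 3, 32, 32))
b = spp(torch.randn(2, 3, 48, 40))
assert a.shape == b.shape == (2, 63)
```

This is exactly why SPP is useful in front of a fully connected layer: the linear layer sees a fixed 63-dimensional vector regardless of the input resolution.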