MTCNN PyTorch Implementation
MTCNN network structure implementation:
P-Net
A fully convolutional network.
Intermediate layers:
Convolution layers: 2D convolution, activation: PReLU
Pooling layers: max pooling
Confidence output: Sigmoid activation
Bounding-box regression and landmark regression outputs: linear
P_net | in_shape | in_channels | out_channels | kernel_size | stride | padding | out_shape |
---|---|---|---|---|---|---|---|
conv1 | [batch,3,12,12] | 3 | 10 | 3 | 1 | 0 | [batch,10,10,10] |
pool | [batch,10,10,10] | 10 | 10 | 2 | 2 | 0 | [batch,10,5,5] |
conv2 | [batch,10,5,5] | 10 | 16 | 3 | 1 | 0 | [batch,16,3,3] |
conv3 | [batch,16,3,3] | 16 | 32 | 3 | 1 | 0 | [batch,32,1,1] |
conv4_1 | [batch,32,1,1] | 32 | 1 | 1 | 1 | 0 | [batch,1,1,1] |
conv4_2 | [batch,32,1,1] | 32 | 4 | 1 | 1 | 0 | [batch,4,1,1] |
conv4_3 | [batch,32,1,1] | 32 | 10 | 1 | 1 | 0 | [batch,10,1,1] |
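As a sanity check on the table above, the P-Net trunk (conv1 through conv3) can be assembled and probed with dummy inputs. Because P-Net is fully convolutional, a 12×12 crop yields a 1×1 output map, while a larger image yields a proportionally larger map (the `trunk` name here is illustrative, not from the original code):

```python
import torch
import torch.nn as nn

# P-Net trunk exactly as in the table: conv1 -> pool1 -> conv2 -> conv3
trunk = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=3),        # conv1: 12x12 -> 10x10
    nn.PReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # pool1: 10x10 -> 5x5
    nn.Conv2d(10, 16, kernel_size=3),       # conv2: 5x5 -> 3x3
    nn.PReLU(),
    nn.Conv2d(16, 32, kernel_size=3),       # conv3: 3x3 -> 1x1
    nn.PReLU(),
)

out = trunk(torch.randn(1, 3, 12, 12))
print(out.shape)  # torch.Size([1, 32, 1, 1])

# Fully convolutional: a larger input simply produces a larger score map,
# each cell covering one 12x12 window of the input.
big = trunk(torch.randn(1, 3, 24, 24))
print(big.shape)  # torch.Size([1, 32, 7, 7])
```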
R-Net
R-Net = convolutional layers + fully connected layers
Intermediate layers:
Convolution layers: 2D convolution, activation: PReLU
Pooling layers: max pooling
Fully connected layer: PReLU activation
Confidence output: Sigmoid activation
Bounding-box regression and landmark regression outputs: linear
conv | in_shape | in_channels | out_channels | kernel_size | stride | padding | out_shape |
---|---|---|---|---|---|---|---|
conv1 | [batch,3,24,24] | 3 | 28 | 3 | 1 | 0 | [batch,28,22,22] |
pool1 | [batch,28,22,22] | 28 | 28 | 3 | 2 | 1 | [batch,28,11,11] |
conv2 | [batch,28,11,11] | 28 | 48 | 3 | 1 | 0 | [batch,48,9,9] |
pool2 | [batch,48,9,9] | 48 | 48 | 3 | 2 | 0 | [batch,48,4,4] |
conv3 | [batch,48,4,4] | 48 | 64 | 2 | 1 | 0 | [batch,64,3,3] |
line | in_unit | out_unit |
---|---|---|
line1 | 64*3*3 | 128 |
line2_1 | 128 | 1 |
line2_2 | 128 | 4 |
line2_3 | 128 | 10 |
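`Conv2d` and `MaxPool2d` (in the default floor mode) share the same output-size formula, out = ⌊(in + 2·padding − kernel)/stride⌋ + 1. Walking the R-Net table with it confirms that conv3 ends at 3×3 with 64 channels, which is exactly line1's in_unit of 64*3*3:

```python
def out_size(n, k, s=1, p=0):
    # out = floor((n + 2p - k) / s) + 1, for both Conv2d and MaxPool2d (floor mode)
    return (n + 2 * p - k) // s + 1

n = 24                           # R-Net input is 24x24
n = out_size(n, k=3)             # conv1 -> 22
n = out_size(n, k=3, s=2, p=1)   # pool1 -> 11
n = out_size(n, k=3)             # conv2 -> 9
n = out_size(n, k=3, s=2)        # pool2 -> 4
n = out_size(n, k=2)             # conv3 -> 3
print(n, 64 * n * n)             # 3 576  -> line1 in_unit = 64*3*3
```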
O-Net
O-Net = convolutional layers + fully connected layers
Intermediate layers:
Convolution layers: 2D convolution, activation: PReLU
Pooling layers: max pooling
Fully connected layer: PReLU activation
Confidence output: Sigmoid activation
Bounding-box regression and landmark regression outputs: linear
conv | in_shape | in_channels | out_channels | kernel_size | stride | padding | out_shape |
---|---|---|---|---|---|---|---|
conv1 | [batch,3,48,48] | 3 | 32 | 3 | 1 | 0 | [batch,32,46,46] |
pool1 | [batch,32,46,46] | 32 | 32 | 2 | 2 | 1 | [batch,32,24,24] |
conv2 | [batch,32,24,24] | 32 | 64 | 3 | 1 | 0 | [batch,64,22,22] |
pool2 | [batch,64,22,22] | 64 | 64 | 3 | 2 | 0 | [batch,64,10,10] |
conv3 | [batch,64,10,10] | 64 | 64 | 3 | 1 | 0 | [batch,64,8,8] |
pool3 | [batch,64,8,8] | 64 | 64 | 2 | 2 | 0 | [batch,64,4,4] |
conv4 | [batch,64,4,4] | 64 | 128 | 2 | 1 | 0 | [batch,128,3,3] |
line | in_unit | out_unit |
---|---|---|
line1 | 128*3*3 | 256 |
line2_1 | 256 | 1 |
line2_2 | 256 | 4 |
line2_3 | 256 | 10 |
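The same kind of shape check works for O-Net: stacking the conv/pool layers from the table and pushing a dummy 48×48 batch through ends at [batch,128,3,3], i.e. 128*3*3 inputs to line1. This is a standalone sketch of the trunk only, not the full O-Net class:

```python
import torch
import torch.nn as nn

onet_trunk = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3),                   # conv1: 48 -> 46
    nn.PReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2, padding=1),  # pool1: 46 -> 24
    nn.Conv2d(32, 64, kernel_size=3),                  # conv2: 24 -> 22
    nn.PReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),             # pool2: 22 -> 10
    nn.Conv2d(64, 64, kernel_size=3),                  # conv3: 10 -> 8
    nn.PReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),             # pool3: 8 -> 4
    nn.Conv2d(64, 128, kernel_size=2),                 # conv4: 4 -> 3
    nn.PReLU(),
)

x = onet_trunk(torch.randn(1, 3, 48, 48))
print(x.shape)  # torch.Size([1, 128, 3, 3])
```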
Code implementation:
import torch
import torch.nn as nn

class PNet(nn.Module):
    def __init__(self):
        super(PNet, self).__init__()
        self.conv_layer = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3, stride=1),   # conv1
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # pool1
            nn.Conv2d(10, 16, kernel_size=3, stride=1),  # conv2
            nn.PReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=1),  # conv3
            nn.PReLU()
        )
        self.conv4_1 = nn.Conv2d(32, 1, kernel_size=1, stride=1)   # confidence
        self.conv4_2 = nn.Conv2d(32, 4, kernel_size=1, stride=1)   # bounding-box offsets
        self.conv4_3 = nn.Conv2d(32, 10, kernel_size=1, stride=1)  # landmark offsets

    def forward(self, x):
        x = self.conv_layer(x)
        cond = torch.sigmoid(self.conv4_1(x))
        box_offset = self.conv4_2(x)
        land_offset = self.conv4_3(x)
        return cond, box_offset, land_offset

class RNet(nn.Module):
    def __init__(self):
        super(RNet, self).__init__()
        self.conv_layer = nn.Sequential(
            nn.Conv2d(3, 28, kernel_size=3, stride=1),         # conv1
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # pool1
            nn.Conv2d(28, 48, kernel_size=3, stride=1),        # conv2
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),             # pool2
            nn.Conv2d(48, 64, kernel_size=2, stride=1),        # conv3
            nn.PReLU()
        )
        self.line1 = nn.Sequential(
            nn.Linear(64 * 3 * 3, 128),
            nn.PReLU()
        )
        self.line2_1 = nn.Linear(128, 1)   # confidence
        self.line2_2 = nn.Linear(128, 4)   # bounding-box offsets
        self.line2_3 = nn.Linear(128, 10)  # landmark offsets

    def forward(self, x):
        x = self.conv_layer(x)
        x = x.view(x.size(0), -1)  # flatten to [batch, 64*3*3]
        x = self.line1(x)
        cond = torch.sigmoid(self.line2_1(x))
        box_offset = self.line2_2(x)
        land_offset = self.line2_3(x)
        return cond, box_offset, land_offset