class PNet, RNet, ONet

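This article introduces three neural network models used for face detection: PNet, RNet, and ONet. The models are built from convolution, pooling, batch normalization, and PReLU activation layers, and are used to detect and localize faces in an image. PNet handles the smallest input, RNet and ONet further refine the detections, and ONet additionally predicts the coordinates of facial landmarks.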
import torch
import torch.nn as nn


class PNet(nn.Module):
    def __init__(self):
        super(PNet, self).__init__()
        self.pre_layer = nn.Sequential(
            # Conv layer 1
            nn.Conv2d(in_channels=3,
                      out_channels=10,
                      kernel_size=3,
                      stride=1,
                      padding=0),
            nn.BatchNorm2d(num_features=10),
            nn.PReLU(num_parameters=10, init=0.25),

            # Max pooling
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Conv layer 2
            nn.Conv2d(in_channels=10,
                      out_channels=16,
                      kernel_size=3,
                      stride=1,
                      padding=0),
            nn.BatchNorm2d(num_features=16),
            nn.PReLU(num_parameters=16, init=0.25),

            # Conv layer 3
            nn.Conv2d(in_channels=16,
                      out_channels=32,
                      kernel_size=3,
                      stride=1,
                      padding=0),
            nn.BatchNorm2d(num_features=32),
            nn.PReLU(num_parameters=32, init=0.25)
        )

        # Output: probability that the input is a face (BCE)
        self.conv4_1 = nn.Conv2d(in_channels=32,
                                 out_channels=1,
                                 kernel_size=1,
                                 stride=1,
                                 padding=0)

        # Output: offsets (errors) of the face bounding box
        self.conv4_2 = nn.Conv2d(in_channels=32,
                                 out_channels=4,
                                 kernel_size=1,
                                 stride=1,
                                 padding=0)

    def forward(self, x):
        x = self.pre_layer(x)
        cls = torch.sigmoid(self.conv4_1(x))
        offset = self.conv4_2(x)
        return cls, offset

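The networks are all very simple to build; we only need to pay attention to what each one outputs. For PNet, the input x has shape [b,3,12,12], and after pre_layer the shape is [b,32,1,1]. The network then splits into two branches: one predicts the class of the image (a value between 0 and 1) and the other predicts the bounding-box regression coefficients of the face, so cls.shape=[b,1,1,1] and offset.shape=[b,4,1,1].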
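The shape change through PNet can be checked by hand with the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below is plain Python and does not run the model; `out_size`, `trace`, and `PNET_LAYERS` are names made up for this illustration, and the layer parameters are copied from the PNet code above. BatchNorm, PReLU, and the two 1x1 heads leave the spatial size unchanged, so they are omitted.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a conv or max-pool layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for each size-changing layer of PNet.pre_layer, in order
PNET_LAYERS = [
    (3, 1, 0),  # conv, 3 -> 10 channels
    (2, 2, 0),  # max pool
    (3, 1, 0),  # conv, 10 -> 16 channels
    (3, 1, 0),  # conv, 16 -> 32 channels
]


def trace(size, layers):
    """Return the side length before the first layer and after every layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


pnet_sizes = trace(12, PNET_LAYERS)
print(pnet_sizes)  # [12, 10, 5, 3, 1]
```

The final 1 is why both heads return 4-D tensors with a 1x1 spatial size rather than flat vectors.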

class RNet(nn.Module):
    def __init__(self):
        super(RNet, self).__init__()
        self.pre_layer = nn.Sequential(

            nn.Conv2d(3, 28, 3, 1),
            nn.BatchNorm2d(28),
            nn.PReLU(28),

            nn.MaxPool2d(3, 2, padding=1),

            nn.Conv2d(28, 48, 3, 1),
            nn.BatchNorm2d(48),
            nn.PReLU(48),

            nn.MaxPool2d(3, 2),

            nn.Conv2d(48, 64, 2, 1),
            nn.BatchNorm2d(64),
            nn.PReLU(64),
        )

        self.linear4 = nn.Sequential(
            nn.Linear(64 * 3 * 3, 128),
            nn.PReLU(128)
        )

        self.linear5_1 = nn.Linear(128, 1)
        self.linear5_2 = nn.Linear(128, 4)

    def forward(self, x):
        x = self.pre_layer(x)
        x = x.view(x.size(0), -1)
        x = self.linear4(x)
        cls = torch.sigmoid(self.linear5_1(x))
        offset = self.linear5_2(x)
        return cls, offset

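RNet is almost the same as PNet, except that the input becomes [b,3,24,24] and the two heads are fully connected layers, so the outputs are 2-D:
cls.shape=[b,1]
offset.shape=[b,4]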
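To see where the `64 * 3 * 3` in `nn.Linear(64 * 3 * 3, 128)` comes from, the sketch below follows both the channel count and the spatial size through RNet's pre_layer for a 24x24 input. It is plain Python arithmetic rather than a run of the model; `out_size`, `RNET_LAYERS`, `rnet_shapes`, and `flat_features` are illustrative names, and the layer parameters are read off the RNet code above. Note the first pooling layer uses padding=1, which gives 11 instead of 10.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a conv or max-pool layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, channels after the layer)
RNET_LAYERS = [
    ("conv1", 3, 1, 0, 28),
    ("pool1", 3, 2, 1, 28),  # padding=1: 22 -> 11
    ("conv2", 3, 1, 0, 48),
    ("pool2", 3, 2, 0, 48),
    ("conv3", 2, 1, 0, 64),
]

size, channels = 24, 3
rnet_shapes = [(channels, size, size)]
for name, kernel, stride, padding, out_channels in RNET_LAYERS:
    size = out_size(size, kernel, stride, padding)
    channels = out_channels
    rnet_shapes.append((channels, size, size))
    print(f"{name}: [b, {channels}, {size}, {size}]")

# x.view(x.size(0), -1) keeps the batch axis and merges the rest
flat_features = channels * size * size
print(flat_features)  # 576, i.e. 64 * 3 * 3
```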

class ONet(nn.Module):
    def __init__(self):
        super(ONet, self).__init__()
        self.pre_layer = nn.Sequential(

            nn.Conv2d(3, 32, 3, 1),  # 46
            nn.BatchNorm2d(32),
            nn.PReLU(32),

            nn.MaxPool2d(3, 2, padding=1),  # 23

            nn.Conv2d(32, 64, 3, 1),  # 21
            nn.BatchNorm2d(64),
            nn.PReLU(64),

            nn.MaxPool2d(3, 2),  # 10

            nn.Conv2d(64, 64, 3, 1),  # 8
            nn.BatchNorm2d(64),
            nn.PReLU(64),

            nn.MaxPool2d(2, 2),  # 4

            nn.Conv2d(64, 128, 2, 1),  # 3
            nn.BatchNorm2d(128),
            nn.PReLU(128)
        )
        self.linear5 = nn.Sequential(
            nn.Linear(128 * 3 * 3, 256),
            nn.PReLU(256)
        )
        self.linear6_1 = nn.Linear(256, 1)
        self.linear6_2 = nn.Linear(256, 4)
        self.linear6_3 = nn.Linear(256, 10)
        
    def forward(self, x):
        x = self.pre_layer(x)
        x = x.view(-1, 128 * 3 * 3)
        x = self.linear5(x)
        cls = torch.sigmoid(self.linear6_1(x))
        offset = self.linear6_2(x)
        point = self.linear6_3(x)
        return cls, offset, point

ONet is built to follow the network structure in the paper, so we only need to walk through how the shapes change:

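  • self.pre_layer is a stack of convolutional layers. The input x is required to have shape [b,3,48,48]. The shape changes are rather irregular: after four convolutions and three max-pooling layers, the shape goes through [b,3,48,48]→[b,32,46,46]→[b,32,23,23]→[b,64,21,21]→[b,64,10,10]→[b,64,8,8]→[b,64,4,4]→[b,128,3,3]
  • The trailing dimensions are then flattened so that the features are stored one row per image: [b,1152]
  • A fully connected layer then resizes the last dimension, which can also be seen as further fusing the features within one image: [b,256]
  • The network then splits into three branches. One reduces the dimension to 1 to decide whether the input image is a face (a sigmoid maps the value into the range 0 to 1 here; is vanishing gradient not a concern?). One reduces the dimension to 4 to localize the face region (I am not yet sure what the four values are; the comment on the matching PNet branch describes them as bounding-box offsets). One reduces the dimension to 10 to predict the x and y coordinates of five facial landmarks.
  • Finally, the three branches are returned, with shapes [b,1], [b,4], and [b,10]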
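On the vanishing-gradient question raised in the list above: the training code is not shown in this post, but if the classification branch is trained with binary cross-entropy (as the BCE comment in PNet hints), the gradient of the loss with respect to the pre-sigmoid value simplifies to p - y, which stays large when the sigmoid saturates on a wrong prediction. The plain-Python sketch below checks this numerically under that assumption; all function names are made up for the illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(z, y):
    """Binary cross-entropy of sigmoid(z) against a label y in {0, 1}."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def numeric_grad(z, y, eps=1e-6):
    """Central-difference estimate of d bce / d z."""
    return (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)


# A confidently wrong prediction: pre-sigmoid value -6 for a true face (y = 1)
z, y = -6.0, 1
p = sigmoid(z)
sigmoid_slope = p * (1 - p)  # derivative of the sigmoid alone, tiny when saturated
analytic_grad = p - y        # derivative of BCE w.r.t. the pre-sigmoid value
print(sigmoid_slope, analytic_grad, numeric_grad(z, y))
```

The sigmoid's own slope is close to zero at this point, yet the loss gradient is close to -1, because the logarithm in the loss cancels the saturation. With a different loss, for example mean squared error on the sigmoid output, this cancellation would not happen.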