1. Inception V1
A global average pooling layer replaces the final fully connected layers.
The fully connected layers account for the bulk of the network's parameters and are prone to overfitting; removing them lets the model train faster and reduces overfitting.
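To see the saving, here is a small sketch comparing a flatten-plus-fully-connected classifier head against global average pooling followed by a small linear layer. The 1024-channel 7 * 7 final feature map and 1000 classes mirror GoogLeNet's last stage, but the exact numbers here are illustrative:

```python
import torch.nn as nn

# Assume a final feature map of 1024 channels at 7 * 7 spatial size, 1000 classes.
fc_head = nn.Linear(1024 * 7 * 7, 1000)  # flatten + fully connected head
gap_head = nn.Sequential(                # global average pooling + small linear head
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1024, 1000))

fc_params = sum(p.numel() for p in fc_head.parameters())
gap_params = sum(p.numel() for p in gap_head.parameters())
print(fc_params, gap_params)  # 50177000 vs 1025000: the FC head is ~49x larger
```

The pooling itself has no learnable parameters, which is where almost all of the reduction comes from.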
In Inception v1, 1 * 1 convolutions are used for dimensionality reduction, cutting both the parameter count and the number of feature-map channels.
Take a 28 * 28 * 192 input. Passing it directly through a 3 * 3 * 256 filter (padding=1, stride=1) gives a 28 * 28 * 256 output with 192 * 3 * 3 * 256 = 442,368 (~44w) parameters. If we instead insert a 1 * 1 * 64 filter before the 3 * 3 filter to reduce dimensionality, then apply the 3 * 3 * 256 filter, the output is still 28 * 28 * 256, but the parameter count is 192 * 1 * 1 * 64 + 64 * 3 * 3 * 256 = 159,744 (~16w), i.e. reduced to roughly a third of the original.
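The arithmetic above (bias terms ignored) can be checked directly:

```python
# Parameter count of a direct 3 * 3 convolution, 192 -> 256 channels
direct = 192 * 3 * 3 * 256
# With a 1 * 1 bottleneck down to 64 channels, then 3 * 3 up to 256
reduced = 192 * 1 * 1 * 64 + 64 * 3 * 3 * 256
print(direct, reduced, reduced / direct)  # 442368 159744 ~0.36
```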
Using convolution kernels of different sizes means different receptive field sizes, which makes the network adaptable to features at multiple scales. Concatenating the branch outputs at the end fuses these multi-scale features.
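A minimal sketch of this multi-branch idea: parallel 1 * 1, 3 * 3, and 5 * 5 convolutions, each padded so the spatial size is preserved, fused by concatenation along the channel axis. The channel counts below are illustrative, borrowed from the Inception 3a module:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)                        # illustrative input
conv1 = nn.Conv2d(192, 64, kernel_size=1)              # small receptive field
conv3 = nn.Conv2d(192, 128, kernel_size=3, padding=1)  # medium receptive field
conv5 = nn.Conv2d(192, 32, kernel_size=5, padding=2)   # large receptive field
# spatial size is unchanged, so the outputs can be stacked on the channel axis
out = torch.cat([conv1(x), conv3(x), conv5(x)], dim=1)
print(out.shape)  # torch.Size([1, 224, 28, 28])
```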
Complete architecture of GoogLeNet (v1):
import torch
import torch.nn as nn

# Inception module: four parallel branches whose outputs are concatenated
# along the channel dimension.
class Block(nn.Module):
    def __init__(self, in_channels, out_channel_1, out_channel_3_reduce, out_channel_3,
                 out_channel_5_reduce, out_channel_5, out_channel_pool):
        super(Block, self).__init__()
        # branch 1: 1 * 1 convolution
        self.block1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channel_1, kernel_size=1)
        # branch 2: 1 * 1 reduction followed by a 3 * 3 convolution
        self.block2_1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channel_3_reduce, kernel_size=1)
        self.block2 = nn.Conv2d(in_channels=out_channel_3_reduce, out_channels=out_channel_3, kernel_size=3, padding=1)
        # branch 3: 1 * 1 reduction followed by a 5 * 5 convolution
        self.block3_1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channel_5_reduce, kernel_size=1)
        self.block3 = nn.Conv2d(in_channels=out_channel_5_reduce, out_channels=out_channel_5, kernel_size=5, padding=2)
        # branch 4: 3 * 3 max pooling followed by a 1 * 1 convolution
        self.block4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.block4 = nn.Conv2d(in_channels=in_channels, out_channels=out_channel_pool, kernel_size=1)

    def forward(self, x):
        out1 = self.block1(x)
        out2 = self.block2(self.block2_1(x))
        out3 = self.block3(self.block3_1(x))
        out4 = self.block4(self.block4_1(x))
        # fuse the multi-scale features by channel-wise concatenation
        return torch.cat([out1, out2, out3, out4], dim=1)