Chapter Summary
1. The previous lecture introduced the mathematics of image convolution and the main parameters of PyTorch's convolution operations. After covering the basics of image convolution, we built a simple network model, which was essentially the LeNet design.
2. The models we designed before were strictly sequential: one layer after another, from input to output. This lecture introduces two designs that break that pattern, Inception and Residual. Because such networks are no longer simple head-to-tail chains, the code needs a small improvement over what we wrote before: we introduce the concept of a block and encapsulate each basic structure as a block.
1 Overview of GoogLeNet
1.1 Inception Module
This structure has four parallel branches. During training, whichever branch works better naturally ends up with larger weight values. Since we do not know in advance which structure or kernel size works best, we let the network choose for itself; that is the idea behind this module.
1. The first branch (left to right) applies average pooling followed by a 1×1 convolution (explained below). Note that the average pooling here uses padding, so it does not change the size of the feature map.
2. A plain 1×1 convolution.
3. A 1×1 convolution followed by a 5×5 convolution.
4. A 1×1 convolution followed by two 3×3 convolutions (matching the code below).
Note that this module is only one part of the backbone feature extractor. The non-linear activation is applied in the forward pass of the full backbone, i.e. after the Concatenate step.
1.2 The Role of the 1×1 Convolution Kernel
Role 1
To be clear, a 1×1 convolution follows exactly the same convolution arithmetic as any other kernel; with a 1×1 kernel there is obviously no padding. The figure illustrates its key role: information fusion across channels. A 1×1 convolution does not change the spatial size of the feature map, but it can change the number of channels: by choosing the number of output channels of the 1×1 kernel we can reduce or increase the dimensionality. This, too, serves information fusion, and fusion is ultimately just elementwise arithmetic (weighted sums across channels); the figure below shows such an addition.
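A small sketch of the point above (assuming `torch` is available): a 1×1 convolution keeps the width and height and only changes the channel count, with each output pixel being a weighted sum over all input channels at that location.

```python
import torch
import torch.nn as nn

# A 1x1 convolution: spatial size is preserved, only the channel
# count changes (here 192 -> 16). Each output pixel is a weighted
# sum over the 192 input channels at the same spatial location,
# which is exactly the "information fusion" described above.
conv1x1 = nn.Conv2d(192, 16, kernel_size=1)

x = torch.randn(1, 192, 28, 28)  # (batch, channels, height, width)
y = conv1x1(x)
print(y.shape)  # torch.Size([1, 16, 28, 28])
```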
Role 2
The second role is to reduce the amount of computation, and by a large margin. From basic computer architecture we know that additions cost almost nothing compared with multiplications and divisions, so we count multiplications. In the figure below, the upper path takes a 192-channel input of size 28×28 and applies a 5×5 convolution with 32 output channels, producing 32 feature maps of size 28×28. That requires 120,422,400 multiplications. The lower path first uses a 1×1 convolution to reduce the channels, then applies the 5×5 convolution, and needs only 12,443,648 multiplications, roughly one tenth of the direct approach. That is a considerable saving.
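The counts above can be checked directly: each output pixel of a k×k convolution costs k·k multiplications per input channel, per output channel (channel sizes 192 → 16 → 32 follow the figure).

```python
# Multiplication counts for the two paths in the figure above
# (28x28 input, 192 channels; padding keeps the spatial size).
H, W = 28, 28

# Direct 5x5 convolution: 192 -> 32 channels
direct = H * W * 5 * 5 * 192 * 32

# Bottleneck: 1x1 conv (192 -> 16), then 5x5 conv (16 -> 32)
reduce_1x1 = H * W * 1 * 1 * 192 * 16
conv_5x5 = H * W * 5 * 5 * 16 * 32
bottleneck = reduce_1x1 + conv_5x5

print(direct)      # 120422400
print(bottleneck)  # 12443648
```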
2 Encapsulating the Inception Module in Code
Module Analysis
The input tensor goes through four branches, each producing a feature map of the same size; here "size" refers only to width and height, while the channel counts may differ. After this module we have four feature maps with different channel counts, which are concatenated along the channel dimension. Pay attention to which dimension the channels occupy: in PyTorch's NCHW layout it is dimension 1.
Encapsulated Code
import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionA(nn.Module):
    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        # Branch 2: a single 1x1 convolution
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        # Branch 3: 1x1 reduction followed by a 5x5 convolution
        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)
        # Branch 4: 1x1 reduction followed by two 3x3 convolutions
        self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)
        # Branch 1: average pooling followed by a 1x1 convolution
        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        '''
        Forward pass: each branch produces its own feature map.
        '''
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3 = self.branch3x3_1(x)
        branch3x3 = self.branch3x3_2(branch3x3)
        branch3x3 = self.branch3x3_3(branch3x3)

        # stride=1 with padding=1 keeps the spatial size unchanged
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch_pool, branch1x1, branch5x5, branch3x3]
        return torch.cat(outputs, dim=1)  # concatenate along the channel dim
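A quick sanity check (a self-contained sketch assuming `torch` is available) using the same branch configurations as the module above: the four branches map a 10-channel input to 24, 16, 24 and 24 channels, so the concatenated output has 24 + 16 + 24 + 24 = 88 channels, with the spatial size unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The four branches of the module above, written out inline.
pool_branch = nn.Conv2d(10, 24, kernel_size=1)
b1x1 = nn.Conv2d(10, 16, kernel_size=1)
b5x5 = nn.Sequential(nn.Conv2d(10, 16, kernel_size=1),
                     nn.Conv2d(16, 24, kernel_size=5, padding=2))
b3x3 = nn.Sequential(nn.Conv2d(10, 16, kernel_size=1),
                     nn.Conv2d(16, 24, kernel_size=3, padding=1),
                     nn.Conv2d(24, 24, kernel_size=3, padding=1))

x = torch.randn(1, 10, 12, 12)
out = torch.cat([pool_branch(F.avg_pool2d(x, 3, stride=1, padding=1)),
                 b1x1(x), b5x5(x), b3x3(x)], dim=1)
print(out.shape)  # torch.Size([1, 88, 12, 12])
```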
An Experiment with torch.cat
import torch

x = torch.tensor([[
    1, 2, 3, 4,
    5, 6, 7, 8,
    9, 10, 11, 12
]]).reshape(1, 1, 3, 4)
y = torch.arange(13, 25).reshape(1, 1, 3, 4)
z = [x, y]
z[0]

tensor([[[[ 1,  2,  3,  4],
          [ 5,  6,  7,  8],
          [ 9, 10, 11, 12]]]])
import torch

x = torch.tensor([[
    1, 2, 3, 4,
    5, 6, 7, 8,
    9, 10, 11, 12
]]).reshape(1, 1, 3, 4)
y = torch.arange(13, 25).reshape(1, 1, 3, 4)
z = [x, y]
z = torch.cat(z, dim=1)
z[0]
After torch.cat, the two tensors are merged into a single whole; now z[0] indexes the batch dimension, i.e. it selects the tensor under that batch, and the tensor retrieved is three-dimensional.
tensor([[[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]],
[[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24]]])
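The choice of dim matters: concatenating the same two (1, 1, 3, 4) tensors along dim=0 stacks them in the batch dimension instead of the channel dimension. A quick comparison (assuming `torch` is available):

```python
import torch

x = torch.arange(1, 13).reshape(1, 1, 3, 4)
y = torch.arange(13, 25).reshape(1, 1, 3, 4)

# dim=1: channels are stacked -> one sample with two channels
print(torch.cat([x, y], dim=1).shape)  # torch.Size([1, 2, 3, 4])

# dim=0: batches are stacked -> two samples with one channel each
print(torch.cat([x, y], dim=0).shape)  # torch.Size([2, 1, 3, 4])
```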
3 Designing a Model with Inception Modules
The instructor did not draw the model diagram for this lecture; this is the diagram I drew from the code. I also believe the features extracted by the Inception module should pass through a non-linear activation rather than feed directly into the next convolution layer, which would immediately perform another linear operation; repeated linear operations are equivalent to a single linear operation. So I added a non-linear activation after each of the two Inception modules.
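The point about repeated linear operations can be seen with plain matrices (a sketch using NumPy): applying B after A is identical to applying the single composed matrix B·A, so without a non-linearity in between, the second layer adds no expressive power.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))  # first "linear layer": 8 -> 5
B = rng.standard_normal((3, 5))  # second "linear layer": 5 -> 3
x = rng.standard_normal(8)

two_layers = B @ (A @ x)  # x through two stacked linear layers
one_layer = (B @ A) @ x   # x through the single equivalent layer

print(np.allclose(two_layers, one_layer))  # True
```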
4 Complete Code and Experiment
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim

batch_size = 64
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

train_dataset = datasets.MNIST(root='./datasets/data',
                               train=True,
                               download=True,
                               transform=transform)
train_loader = DataLoader(train_dataset,
                          shuffle=True,
                          batch_size=batch_size)
test_dataset = datasets.MNIST(root='./datasets/data',
                              train=False,
                              download=True,
                              transform=transform)
test_loader = DataLoader(test_dataset,
                         shuffle=False,
                         batch_size=batch_size)
class InceptionA(nn.Module):
    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)
        self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)
        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        '''
        Forward pass: each branch produces its own feature map.
        '''
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3 = self.branch3x3_1(x)
        branch3x3 = self.branch3x3_2(branch3x3)
        branch3x3 = self.branch3x3_3(branch3x3)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch_pool, branch1x1, branch5x5, branch3x3]
        return torch.cat(outputs, dim=1)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        # 88 = 24 + 16 + 24 + 24 channels coming out of InceptionA
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)
        self.incep1 = InceptionA(in_channels=10)
        self.incep2 = InceptionA(in_channels=20)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(1408, 10)  # 1408 = 88 * 4 * 4 after the last Inception

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = self.incep1(x)
        x = F.relu(x)  # activation added after the Inception module
        x = F.relu(self.mp(self.conv2(x)))
        x = self.incep2(x)
        x = F.relu(x)  # activation added after the Inception module
        x = x.view(in_size, -1)
        x = self.fc(x)
        return x
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = Net()
model.to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d,%5d] loss:%.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0
def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images.to(device)).cpu()
            _, predicted = torch.max(outputs.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set:%d %%' % (100 * correct / total))
if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
Experimental Results
My experimental results and the instructor's experimental results (result screenshots not reproduced here).
Performance Analysis and Parameter Estimation
The accuracy here is not better than the plain convolutional network from the previous lecture; the instructor says the main reason is that the final classification is still done by a fully connected layer. Why might that be? Look at the parameter count of that last linear layer: 1408 × 10 = 14,080 weights (plus 10 biases), all concentrated in a single linear map over the flattened features. However the convolutional parameters are counted, that plain linear classifier is the crudest part of the model, so it dominates the final decision and masks the gains from the improved feature extractor.
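The linear layer's parameter count can be verified directly (a sketch assuming `torch` is available; nn.Linear stores a weight matrix plus a bias vector):

```python
import torch.nn as nn

# The final classifier of the network above: 1408 features -> 10 classes.
fc = nn.Linear(1408, 10)
n_params = sum(p.numel() for p in fc.parameters())
print(n_params)  # 14090 = 1408 * 10 weights + 10 biases
```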
5 Residual Networks
Residual Block
The residual connection here, F(x) + x, is an element-wise addition, not a concatenation. Addition requires F(x) and x to have exactly the same dimensions (channels, height, and width), which is quite different from the Inception concatenation above, where only the width and height need to match.
Residual Block Code Implementation
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.channels = channels
        # padding=1 keeps the spatial size, and in/out channels match,
        # so x and F(x) have identical shapes and can be added
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)  # F(x) + x, then the activation
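A quick check (a self-contained sketch assuming `torch` is available) that the block preserves the input shape, which is what makes the addition legal:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # Same block as above: two 3x3 convs that preserve shape, plus a skip.
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)

x = torch.randn(2, 16, 12, 12)
out = ResidualBlock(16)(x)
print(out.shape)  # torch.Size([2, 16, 12, 12]) -- same as the input
```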
6 Designing the Residual Model
Residual Network Backbone
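A sketch of the backbone as I reconstruct it from the lecture's design: convolution, pooling, and a residual block, repeated twice, then a linear classifier. The layer sizes (e.g. the 512 input features of the final linear layer, from 32 channels × 4 × 4) are my assumptions based on 28×28 MNIST inputs, not confirmed by the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)

class ResNetLike(nn.Module):
    # Assumed backbone: conv -> pool -> residual block, twice, then fc.
    def __init__(self):
        super(ResNetLike, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.rblock1 = ResidualBlock(16)
        self.rblock2 = ResidualBlock(32)
        # 28x28 -> 24 -> 12 -> 8 -> 4, so 32 * 4 * 4 = 512 features
        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = self.mp(F.relu(self.conv1(x)))
        x = self.rblock1(x)
        x = self.mp(F.relu(self.conv2(x)))
        x = self.rblock2(x)
        x = x.view(in_size, -1)
        return self.fc(x)

out = ResNetLike()(torch.randn(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 10])
```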
Experimental Results
The accuracy improves by about one percentage point; in other words, the error rate drops by roughly fifty percent!
7 Directions and Steps for Follow-up Work