An image has three dimensions: its channels (RGB) and its spatial resolution (e.g. 128*128 pixels). Feeding such an image into a fully connected network alone would require a huge number of parameters. Since the human eye can recognize an image from just a few key features, CNNs were developed to exploit exactly this.
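To make the parameter-count argument concrete, here is a rough back-of-the-envelope comparison (the layer sizes below are my own illustrative assumptions, not from the text):

```python
# Weights of one fully connected layer mapping a flattened 128*128 RGB
# image to 1000 hidden units (weights only, no bias):
fc_params = (128 * 128 * 3) * 1000   # about 49 million parameters

# Weights of a conv layer with 64 filters of size 3x3 over the same
# 3-channel input (weights only, no bias):
conv_params = 64 * 3 * 3 * 3         # 1,728 parameters

print(fc_params, conv_params)
```

The convolutional layer's cost is independent of the image resolution, which is where the enormous savings come from.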
Convolutional Layer
A convolutional kernel (filter; these are the model's parameters) is slid over the image (which can be represented as a 3-D matrix).
import torch
import torch.nn as nn
conv_1=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,bias=False,padding=1)
nor=nn.BatchNorm2d(1)#normalizes each channel of the image
relu=nn.ReLU()#activation function (defined here but not applied below)
maxpool=nn.MaxPool2d(2, 2, 0)
img=torch.arange(36,dtype=torch.float32).reshape(1,1,6,6)#6*6 image, 1 channel
print(img)
img_1=conv_1(img)#padding first expands img to 8*8*1; convolving with the 3*3 kernel then gives 6*6*1
print(img_1)#6*6*1
img_2=nor(img_1)#normalization
print(img_2)
img_3=maxpool(img_2)
print(img_3)#3*3*1
Output:
tensor([[[[ 0., 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10., 11.],
[12., 13., 14., 15., 16., 17.],
[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[30., 31., 32., 33., 34., 35.]]]])
tensor([[[[ -0.1362, -0.2922, -0.6574, -1.0226, -1.3878, -3.3012],
[ -0.1658, -2.2708, -2.3799, -2.4889, -2.5979, -6.6001],
[ -0.6084, -2.9251, -3.0341, -3.1431, -3.2522, -9.5992],
[ -1.0510, -3.5793, -3.6883, -3.7974, -3.9064, -12.5983],
[ -1.4936, -4.2335, -4.3426, -4.4516, -4.5606, -15.5974],
[ -0.3296, -2.5686, -2.6115, -2.6544, -2.6974, -11.4593]]]],
grad_fn=<ConvolutionBackward0>)
tensor([[[[ 1.0171, 0.9720, 0.8663, 0.7607, 0.6550, 0.1016],
[ 1.0085, 0.3996, 0.3681, 0.3365, 0.3050, -0.8527],
[ 0.8805, 0.2104, 0.1788, 0.1473, 0.1157, -1.7202],
[ 0.7525, 0.0211, -0.0104, -0.0420, -0.0735, -2.5877],
[ 0.6244, -0.1681, -0.1997, -0.2312, -0.2627, -3.4552],
[ 0.9611, 0.3135, 0.3011, 0.2887, 0.2762, -2.2582]]]],
grad_fn=<NativeBatchNormBackward0>)
tensor([[[[1.0171, 0.8663, 0.6550],
[0.8805, 0.1788, 0.1157],
[0.9611, 0.3011, 0.2762]]]], grad_fn=<MaxPool2DWithIndicesBackward0>)
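The spatial sizes printed above follow the standard output-size formula for convolution and pooling; this small helper is my own illustration, not part of the original code:

```python
def conv_out_size(n, k, s=1, p=0):
    """One spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# conv_1: kernel 3, stride 1, padding 1 keeps 6x6 at 6x6
print(conv_out_size(6, 3, s=1, p=1))
# maxpool: kernel 2, stride 2, padding 0 halves 6x6 to 3x3
print(conv_out_size(6, 2, s=2, p=0))
```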
The number of input channels determines the number of channels in each kernel; the number of output channels is determined by the number of kernels.
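As a quick check of this rule (the channel counts here are my own illustrative choices), the weight tensor of an `nn.Conv2d` layer has shape `(out_channels, in_channels, kernel_h, kernel_w)`:

```python
import torch.nn as nn

# 8 kernels (out_channels), each with 3 channels (in_channels) and a
# 3x3 spatial extent, so the weight tensor is 8 x 3 x 3 x 3.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv.weight.shape)  # torch.Size([8, 3, 3, 3])
```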
Conv2d also takes a groups parameter, which splits the input and output channels into groups. The default is 1, meaning all input and all output channels form a single group, i.e. an ordinary (fully connected) convolution. For example, with a 3x3x2 input (2 channels) convolved by a 3x3x2 kernel, groups=1 is the standard fully connected convolution.
With groups=2, the 2 input channels are split into 2 groups of 1 channel each, and each group is convolved separately.
import torch
import torch.nn as nn
x=torch.FloatTensor([[1,2,3],[4,5,6],[7,8,9],
[1,2,3],[4,5,6],[7,8,9]]).view(1,2,3,3)
#the input x is a 3*3 matrix with 2 channels
conv1 = nn.Conv2d(in_channels=2,
out_channels=2,
kernel_size=3,
stride=1,
padding=0,
groups=1,
bias=False) #conv1 is an ordinary convolution
conv2 = nn.Conv2d(in_channels=2,
out_channels=2,
kernel_size=3,
stride=1,
padding=0,
groups=2,
bias=False) #conv2 is a grouped convolution
print(conv1.weight.data.size())
print(conv2.weight.data.size())
conv1.weight.data = torch.FloatTensor([[[[1,2,3],[4,5,6],[7,8,9]],
[[9,8,7],[6,5,4],[3,2,1]]],
[[[1,2,3],[4,5,6],[7,8,9]],
[[9,8,7],[6,5,4],[3,2,1]]]])
conv2.weight.data = torch.FloatTensor([[[[1,2,3],[4,5,6],[7,8,9]]],
[[[9,8,7],[6,5,4],[3,2,1]]]] )
print(conv1.weight.data)
print(conv2.weight.data)
output=conv1(x)
print(output)
output=conv2(x)
print(output)
Output:
torch.Size([2, 2, 3, 3])
torch.Size([2, 1, 3, 3])
tensor([[[[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]],
[[9., 8., 7.],
[6., 5., 4.],
[3., 2., 1.]]],
[[[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]],
[[9., 8., 7.],
[6., 5., 4.],
[3., 2., 1.]]]])
tensor([[[[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]]],
[[[9., 8., 7.],
[6., 5., 4.],
[3., 2., 1.]]]])
tensor([[[[450.]],
[[450.]]]], grad_fn=<ConvolutionBackward0>)
tensor([[[[285.]],
[[165.]]]], grad_fn=<ConvolutionBackward0>)
We can see that with groups set to 2, each kernel has size 3*3*1 and the convolution is performed per group: each output channel sees only one input channel.
Grouping effectively reduces the number of parameters, and with it the amount of computation.
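A quick way to see the savings, using the same 2-in/2-out, 3*3 configuration as the example above:

```python
import torch.nn as nn

# Ordinary convolution: each of the 2 kernels spans both input
# channels, so 2 * 2 * 3 * 3 = 36 weights.
normal = nn.Conv2d(2, 2, 3, groups=1, bias=False)

# Grouped convolution: each kernel spans only 1 input channel,
# so 2 * 1 * 3 * 3 = 18 weights -- half as many.
grouped = nn.Conv2d(2, 2, 3, groups=2, bias=False)

print(normal.weight.numel(), grouped.weight.numel())  # 36 18
```

In general, groups=g divides the weight count of the layer by g (in_channels and out_channels must both be divisible by g).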