PyTorch Study Notes 8: Using Convolutions to Generalize (slightly more refined image recognition)
Introducing convolution
Convolution is essentially a weighted operation over small patches of the image. There is plenty of good material about it online, so I won't repeat it here.
In short, convolution has three key properties:
- local operations on a neighborhood
- translation invariance
- a large reduction in the number of model parameters
So the code to create a convolution layer looks like this:
conv=nn.Conv2d(3,16,kernel_size=3)
The first argument (3) is the number of input channels and the second (16) is the number of output channels; kernel_size is the size of the convolution kernel, where 3 means 3×3 (it can equally be written as (3, 3)).
nn.Conv2d expects an input tensor of shape B × C × H × W (batch × channels × height × width).
From theory we know that convolution shrinks the image, so a padding operation is needed; specifying padding as 1 is enough, as follows:
conv=nn.Conv2d(3,16,kernel_size=3,padding=1)
PS: for even-sized kernels we would need to pad by different amounts on the left/right and top/bottom (PyTorch has functions to handle this, but it is rarely worth it), so prefer odd-sized kernels; even-sized kernels are a bit awkward.
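As a quick sanity check (assuming a random 32×32 RGB input, the size of a CIFAR-10 image), the output shapes behave as follows:

```python
import torch
import torch.nn as nn

img = torch.randn(1, 3, 32, 32)  # B x C x H x W

conv = nn.Conv2d(3, 16, kernel_size=3)  # no padding: H and W shrink by 2
print(conv(img).shape)  # torch.Size([1, 16, 30, 30])

conv_pad = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # padding=1 keeps H and W
print(conv_pad(img).shape)  # torch.Size([1, 16, 32, 32])
```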
Manipulating convolution parameters
Adjusting convolution parameters by hand is rarely useful in practice; this is just kept here as a reference:
with torch.no_grad():
    conv.bias.zero_()             # zero the bias in place
    conv.weight.fill_(1.0 / 9.0)  # turn the 3x3 kernel into a box blur
    # overwrite the weights with a vertical edge-detection kernel;
    # the 3x3 tensor broadcasts over all input and output channels
    conv.weight[:] = torch.tensor([[-1.0, 0.0, 1.0],
                                   [-1.0, 0.0, 1.0],
                                   [-1.0, 0.0, 1.0]])
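A minimal sketch of what that edge-detection kernel does, using a hypothetical single-channel convolution and a toy image that is dark on the left and bright on the right:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv.weight[:] = torch.tensor([[-1.0, 0.0, 1.0],
                                   [-1.0, 0.0, 1.0],
                                   [-1.0, 0.0, 1.0]])

img = torch.zeros(1, 1, 6, 6)
img[..., 3:] = 1.0  # left half dark (0), right half bright (1)

out = conv(img)
# the response is strong along the dark/bright boundary (and at the
# zero-padded right border), and zero in the flat regions
print(out[0, 0])
```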
Pooling
So far convolution looks close to perfect: it is very good at recognizing 3×3 or 5×5 pixel structures, but it cannot recognize larger structures such as a wing or a fuselage. This is where pooling comes in: we shrink the image to half its size, effectively merging 4 pixels into one. The options are:
- taking the average of the 4 pixels
- taking the maximum of the 4 pixels
- using a strided convolution
The most widely used method today is max pooling, so that is our focus.
Max pooling is provided by the nn.MaxPool2d module; like convolution, it also comes in 1D and 3D versions.
A complete convolutional network
pool=nn.MaxPool2d(2)
output=pool(img.unsqueeze(0))
Here the pooling kernel is set to 2.
With this we can build the following convolutional neural network:
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.Tanh(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 8, kernel_size=3, padding=1),
    nn.Tanh(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 8 * 8, 32),
    nn.Tanh(),
    nn.Linear(32, 2)
)
nn.Flatten() is a flattening operation. Because the first dimension is the batch by default, with no arguments it flattens from the second dimension onward; you can also write Flatten(start_dim=0, end_dim=-1) (flattening from the first dimension) to specify the range to flatten.
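To see why the first Linear layer takes 8 * 8 * 8 = 512 input features (assuming 32×32 CIFAR images), we can trace the shapes through the model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3 x 32 x 32 -> 16 x 32 x 32
    nn.Tanh(),
    nn.MaxPool2d(2),                             # -> 16 x 16 x 16
    nn.Conv2d(16, 8, kernel_size=3, padding=1),  # -> 8 x 16 x 16
    nn.Tanh(),
    nn.MaxPool2d(2),                             # -> 8 x 8 x 8
    nn.Flatten(),                                # -> 8 * 8 * 8 = 512
    nn.Linear(8 * 8 * 8, 32),
    nn.Tanh(),
    nn.Linear(32, 2),
)

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)  # torch.Size([1, 2])
```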
Subclassing nn.Module
When we want to build a model that does something more complex, Sequential is clearly no longer enough; we then subclass nn.Module to construct a model of our own. We need to write a forward() method and an __init__() method: in __init__() we define the modules and parameters we will use (remember to call super().__init__()), and in forward() we call them.
Written as an nn.Module subclass, the network looks like this:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.act1 = nn.Tanh()
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.act2 = nn.Tanh()
        self.pool2 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.act3 = nn.Tanh()
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = self.pool1(self.act1(self.conv1(x)))
        out = self.pool2(self.act2(self.conv2(out)))
        out = out.view(-1, 8 * 8 * 8)
        out = self.act3(self.fc1(out))
        out = self.fc2(out)
        return out
Tracking parameters and submodules
Submodules must be top-level attributes, not hidden inside a list or dict instance; otherwise the optimizer cannot locate the submodules and their parameters. For those cases PyTorch provides nn.ModuleList and nn.ModuleDict.
We can call any nn.Module method, which lets Net access its submodules and parameters:
model=Net()
numel_list=[p.numel() for p in model.parameters()]
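A toy sketch of the pitfall above (BadNet and GoodNet are hypothetical names): a plain Python list hides its layers from parameters(), while nn.ModuleList registers them.

```python
import torch.nn as nn

class BadNet(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: the layers are NOT registered as submodules
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]

class GoodNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer, so the optimizer can find them
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])

print(len(list(BadNet().parameters())))   # 0 -- invisible to the optimizer
print(len(list(GoodNet().parameters())))  # 4 -- two weights and two biases
```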
The functional API
Using the functional API we can define our model more simply and concisely, as follows:
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)
        out = out.view(-1, 8 * 8 * 8)
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out
Training
Now that the model is defined, the next step is training, which is similar to what we did before.
import datetime

def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader)))
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64,
shuffle=True)
model = Net()
optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
training_loop(
n_epochs = 100,
optimizer = optimizer,
model = model,
loss_fn = loss_fn,
train_loader = train_loader,
)
Saving and loading the model
Now that we are happy with our model, let's save it to a file:
torch.save(model.state_dict(), data_path + 'birds_vs_airplanes.pt')
Loading it back works too:
loaded_model = Net()
loaded_model.load_state_dict(torch.load(data_path
+ 'birds_vs_airplanes.pt'))
Training on the GPU
If a GPU is available, moving everything onto it is a good idea. A good pattern is to set the value of device based on torch.cuda.is_available:
device = (torch.device('cuda') if torch.cuda.is_available()
else torch.device('cpu'))
print(f"Training on device {device}.")
Then we can amend the training loop by moving the tensors obtained from the data loader to the GPU with Tensor.to(); only two lines of code change:
import datetime

def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device)      # <----
            labels = labels.to(device=device)  # <----
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader)))
Likewise, the model itself must be moved to the GPU:
model = Net().to(device=device)
Loading the network weights can now get slightly more complicated; when loading, it is cleaner to tell PyTorch to override the device information stored in the file:
loaded_model = Net().to(device=device)
loaded_model.load_state_dict(torch.load(data_path
+ 'birds_vs_airplanes.pt',
map_location=device))
Complete code
from torchvision import datasets, transforms
data_path = 'data/'
cifar10 = datasets.CIFAR10(
data_path, train=True, download=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4915, 0.4823, 0.4468),
(0.2470, 0.2435, 0.2616))
]))
cifar10_val = datasets.CIFAR10(
data_path, train=False, download=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4915, 0.4823, 0.4468),
(0.2470, 0.2435, 0.2616))
]))
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']
cifar2 = [(img, label_map[label])
for img, label in cifar10
if label in [0, 2]]
cifar2_val = [(img, label_map[label])
for img, label in cifar10_val
if label in [0, 2]]
import torch
import torch.nn as nn
import torch.optim as optim
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64,
shuffle=True)
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)
        out = out.view(-1, 8 * 8 * 8)
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out
model=Net()
learning_rate = 1e-2
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
loss_fn = nn.CrossEntropyLoss()
n_epochs = 100
for epoch in range(n_epochs):
    for imgs, labels in train_loader:
        outputs = model(imgs)
        loss = loss_fn(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print("Epoch: %d, Loss: %f" % (epoch, float(loss)))
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64,
shuffle=False)
correct = 0
total = 0
with torch.no_grad():
    for imgs, labels in train_loader:
        outputs = model(imgs)
        _, predicted = torch.max(outputs, dim=1)
        total += labels.shape[0]
        correct += int((predicted == labels).sum())
print("Train Accuracy: %f" % (correct / total))
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64,
shuffle=False)
correct = 0
total = 0
with torch.no_grad():
    for imgs, labels in val_loader:
        outputs = model(imgs)
        _, predicted = torch.max(outputs, dim=1)
        total += labels.shape[0]
        correct += int((predicted == labels).sum())
print("Val Accuracy: %f" % (correct / total))
Model design
Above, we built the simplest possible network. The simplest way to improve a model is to increase its capacity, but doing so also increases the risk of overfitting. The best remedy for overfitting is more training data; beyond that, there are more techniques for fighting it.
Weight penalties
The first way to stabilize generalization is to add a regularization term to the loss that penalizes large weights; this makes the loss smoother and limits how much the model gains from fitting individual samples. The most popular choice is L2 regularization.
PyTorch's SGD optimizer has a weight_decay parameter; it corresponds to 2 * lambda and is exactly equivalent to adding the L2 norm to the loss.
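A minimal sketch of the two equivalent ways to apply the penalty (the model, data, and lambda value here are placeholders):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
loss_fn = nn.MSELoss()
x, y = torch.randn(8, 10), torch.randn(8, 2)

# option 1: let the optimizer apply the decay to every parameter
optimizer = optim.SGD(model.parameters(), lr=1e-2, weight_decay=2e-4)

# option 2: write the equivalent L2 penalty into the loss by hand
l2_lambda = 1e-4  # weight_decay above corresponds to 2 * l2_lambda
loss = loss_fn(model(x), y)
loss = loss + l2_lambda * sum(p.pow(2).sum() for p in model.parameters())
loss.backward()
```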
Dropout
The idea behind dropout is simple: on every training iteration, randomly zero out part of the network's neurons.
In PyTorch this is implemented by adding an nn.Dropout module between the nonlinear activation and the following linear or convolutional module. For convolutions, use the dedicated nn.Dropout2d or nn.Dropout3d, which zero out entire input channels.
class NetDropout(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans1 = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv1_dropout = nn.Dropout2d(p=0.4)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1 // 2, kernel_size=3,
                               padding=1)
        self.conv2_dropout = nn.Dropout2d(p=0.4)
        self.fc1 = nn.Linear(8 * 8 * n_chans1 // 2, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = self.conv1_dropout(out)
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)
        out = self.conv2_dropout(out)
        out = out.view(-1, 8 * 8 * self.n_chans1 // 2)
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out
Dropout is active during training, while in production or evaluation it is bypassed. This is controlled by the module's train attribute; PyTorch lets us switch between the two modes by calling model.train() and model.eval() on our subclass.
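A quick sketch of the two modes, using a standalone nn.Dropout (with p=0.5, surviving values are scaled by 1/(1-p) = 2 during training):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()    # training mode: entries are randomly zeroed, survivors scaled
print(drop(x))  # a mix of 0.0 and 2.0

drop.eval()     # evaluation mode: dropout becomes a no-op
print(drop(x))  # all ones again
```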
Keeping activations in check: batch normalization
Batch normalization shifts and scales an intermediate input using the mean and standard deviation of the mini-batch collected at that point in the network. This helps keep the activations from drifting too deep into the saturated regions of the activation function, where gradients vanish and training slows down; using batch normalization also removes, or at least reduces, the need for dropout.
PyTorch provides nn.BatchNorm1d, nn.BatchNorm2d, and nn.BatchNorm3d for this. Since the goal of batch normalization is to rescale the input of the activation, its place is after the linear (or convolutional) transformation and before the activation function.
class NetBatchNorm(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans1 = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv1_batchnorm = nn.BatchNorm2d(num_features=n_chans1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1 // 2, kernel_size=3,
                               padding=1)
        self.conv2_batchnorm = nn.BatchNorm2d(num_features=n_chans1 // 2)
        self.fc1 = nn.Linear(8 * 8 * n_chans1 // 2, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = self.conv1_batchnorm(self.conv1(x))
        out = F.max_pool2d(torch.tanh(out), 2)
        out = self.conv2_batchnorm(self.conv2(out))
        out = F.max_pool2d(torch.tanh(out), 2)
        out = out.view(-1, 8 * 8 * self.n_chans1 // 2)
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out
As with dropout, batch normalization's behavior is also switched via train/eval mode.
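A small sketch of the difference (random data, a standalone nn.BatchNorm1d): in train mode the batch is normalized with its own statistics, while eval mode uses the accumulated running statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(num_features=3)
x = torch.randn(16, 3) * 5 + 10  # a batch far from zero mean / unit variance

bn.train()
out_train = bn(x)  # normalized with the batch's own mean and std
# as a side effect, bn.running_mean and bn.running_var were updated

bn.eval()
out_eval = bn(x)   # normalized with the running statistics instead
print(out_train.mean().item(), out_eval.mean().item())
```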
Skip connections
Everything discussed so far concerned width; from now on we discuss depth.
Depth always brings extra challenges: increasing a model's depth generally makes training harder to converge, and most importantly, a long chain of multiplications can make a parameter's contribution to the gradient vanish, leaving that layer's training ineffective.
This is why the ResNet architecture was proposed in 2015; its trick is to use skip connections to short-circuit blocks of layers.
A skip connection simply adds a block's input to its output (and the activation function is switched to ReLU). The code with a skip connection looks like this:
class NetRes(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans1 = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1 // 2, kernel_size=3,
                               padding=1)
        self.conv3 = nn.Conv2d(n_chans1 // 2, n_chans1 // 2,
                               kernel_size=3, padding=1)
        self.fc1 = nn.Linear(4 * 4 * n_chans1 // 2, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.relu(self.conv1(x)), 2)
        out = F.max_pool2d(torch.relu(self.conv2(out)), 2)
        out1 = out
        out = F.max_pool2d(torch.relu(self.conv3(out)) + out1, 2)
        out = out.view(-1, 4 * 4 * self.n_chans1 // 2)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out
Building very deep models
Convolutional networks can exceed 100 layers. How do we build such a network without losing our minds? The standard strategy is to define a building block, for example a Conv2d plus ReLU with a skip connection, and then create an nn.Sequential from a list of instances of that residual block; depth then becomes a parameter of the construction.
class ResBlock(nn.Module):
    def __init__(self, n_chans):
        super().__init__()
        self.conv = nn.Conv2d(n_chans, n_chans, kernel_size=3,
                              padding=1, bias=False)  # <1>
        self.batch_norm = nn.BatchNorm2d(num_features=n_chans)
        torch.nn.init.kaiming_normal_(self.conv.weight,
                                      nonlinearity='relu')  # <2>
        torch.nn.init.constant_(self.batch_norm.weight, 0.5)
        torch.nn.init.zeros_(self.batch_norm.bias)

    def forward(self, x):
        out = self.conv(x)
        out = self.batch_norm(out)
        out = torch.relu(out)
        return out + x

class NetResDeep(nn.Module):
    def __init__(self, n_chans1=32, n_blocks=10):
        super().__init__()
        self.n_chans1 = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        # build n_blocks independent ResBlock instances
        self.resblocks = nn.Sequential(
            *(ResBlock(n_chans=n_chans1) for _ in range(n_blocks)))
        self.fc1 = nn.Linear(8 * 8 * n_chans1, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.relu(self.conv1(x)), 2)
        out = self.resblocks(out)
        out = F.max_pool2d(out, 2)
        out = out.view(-1, 8 * 8 * self.n_chans1)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out
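One subtlety is worth checking here: multiplying a one-element list (n_blocks * [block]) repeats the same module object, so all the "blocks" would share one set of weights, whereas building each instance separately yields independent blocks. A toy sketch (Block is a hypothetical stand-in for ResBlock):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x)) + x

n_blocks = 3

# list multiplication: three references to ONE Block instance
shared = nn.Sequential(*(n_blocks * [Block()]))
print(len(list(shared.parameters())))       # 2 (one conv weight + one bias)

# generator expression: three independent Block instances
independent = nn.Sequential(*(Block() for _ in range(n_blocks)))
print(len(list(independent.parameters())))  # 6
```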
Initialization
Weight initialization is a very important technique, and PyTorch's default initialization is not ideal. That said, initialization drags in a whole pile of material once you dig into it, so I won't go further here.
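As a minimal sketch of doing it by hand (the helper name init_weights is made up), Kaiming initialization can be applied to every conv and linear layer of a model via model.apply:

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Kaiming (He) initialization suits layers followed by ReLU
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 2),
)
model.apply(init_weights)  # applies init_weights to every submodule
```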