Paper Reading and Video Learning
Deep Residual Learning for Image Recognition
Deep residual learning addresses the degradation problem that arises as network depth increases.
Main contributions:
- An extremely deep network architecture
- The residual module
- Batch Normalization to accelerate training
A 1×1 convolution kernel can be used to reduce or expand the channel dimension.
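As a quick illustration (a minimal PyTorch sketch; the channel sizes here are arbitrary), a 1×1 convolution changes only the channel dimension and leaves the spatial size untouched:

import torch
import torch.nn as nn

x = torch.randn(1, 256, 56, 56)              # (batch, channels, height, width)
reduce = nn.Conv2d(256, 64, kernel_size=1)   # 1x1 conv: reduce channels 256 -> 64
expand = nn.Conv2d(64, 256, kernel_size=1)   # 1x1 conv: expand channels 64 -> 256

print(reduce(x).shape)          # torch.Size([1, 64, 56, 56])
print(expand(reduce(x)).shape)  # torch.Size([1, 256, 56, 56])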
In the dashed-line residual block, the input and output feature maps differ in shape, so both the main branch and the shortcut use convolutions different from those in the solid-line block; in particular, the shortcut uses a 1×1 convolution with stride 2 to match the output shape.
The goal of Batch Normalization is to make the feature maps of a batch follow a distribution with mean 0 and variance 1.
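A quick empirical check (a minimal sketch with arbitrary tensor sizes): after nn.BatchNorm2d in training mode, each channel's statistics over the batch are approximately mean 0 and variance 1.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)                   # one mean/variance pair per channel
x = torch.randn(8, 16, 32, 32) * 5 + 3    # a batch with mean 3, std 5
y = bn(x)
# per-channel statistics over (batch, height, width)
print(y.mean(dim=(0, 2, 3)))  # all approximately 0
print(y.var(dim=(0, 2, 3)))   # all approximately 1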
Advantages of transfer learning:
- A good result can be trained quickly
- A good result can be obtained even when the dataset is small
Ways to apply transfer learning (approach 2 is sketched in code after this list):
1. Load the pretrained weights, then train all parameters
2. Load the pretrained weights, then train only the last few layers of parameters
3. Load the pretrained weights, add an extra fully connected layer on top of the original network, and train only that final fully connected layer
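A minimal PyTorch sketch of approach 2 (the backbone choice, ResNet-18, and the two-class head are assumptions for illustration): load pretrained weights, freeze them, and train only the replaced final layer.

import torch
import torch.nn as nn
from torchvision import models

# Load a backbone with pretrained weights (ResNet-18 is just an example choice)
model = models.resnet18(pretrained=True)

# Freeze all pretrained parameters
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for a 2-class task;
# the new layer's parameters are trainable by default
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the trainable parameters go to the optimizer
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)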
Aggregated Residual Transformations for Deep Neural Networks
With group convolution, the number of parameters becomes $\frac{1}{g}$ of that of an ordinary convolution.
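This factor follows from a direct count: an ordinary $k \times k$ convolution has $C_{in} \cdot C_{out} \cdot k^2$ weights, while a group convolution with $g$ groups has $g \cdot \frac{C_{in}}{g} \cdot \frac{C_{out}}{g} \cdot k^2 = \frac{C_{in} \cdot C_{out} \cdot k^2}{g}$. A quick check in PyTorch (the channel sizes are arbitrary):

import torch.nn as nn

normal = nn.Conv2d(64, 128, kernel_size=3, bias=False)
grouped = nn.Conv2d(64, 128, kernel_size=3, groups=4, bias=False)

print(normal.weight.numel())    # 64 * 128 * 3 * 3 = 73728
print(grouped.weight.numel())   # 73728 / 4 = 18432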
The following three forms are mathematically equivalent:
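All three forms compute the aggregated transformation of the ResNeXt paper: the sum of $C$ low-dimensional transformations (where $C$ is the cardinality) added to the shortcut,

$$y = x + \sum_{i=1}^{C} \mathcal{T}_i(x)$$

In the paper's Figure 3, form (c) realizes the sum with a single group convolution using $g = C$ groups.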
Code Exercise
First, download the training and test data and unzip them:
Set up GPU usage:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision
from torchvision import models,transforms,datasets
import torch.nn.functional as F
from PIL import Image
import torch.optim as optim
import json, random
import os
# Use the GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Paths to the training and test images
train_path = './train/'
test_path = './test/'
Then the input images are resized to 128×128, with 128 images per batch:
def get_data(file_path):
    # Build a list of (image path, label) pairs; the label comes from the
    # first three characters of the file name: 'cat' -> 0, otherwise 1
    file_lst = os.listdir(file_path)
data_lst = []
for i in range(len(file_lst)):
clas = file_lst[i][:3]
img_path = os.path.join(file_path,file_lst[i])
if clas == 'cat':
data_lst.append((img_path, 0))
else:
data_lst.append((img_path, 1))
return data_lst
class catdog_set(torch.utils.data.Dataset):
def __init__(self, path, transform):
        super().__init__()
self.data_lst = get_data(path)
self.trans = torchvision.transforms.Compose(transform)
def __len__(self):
return len(self.data_lst)
def __getitem__(self,index):
(img,cls) = self.data_lst[index]
image = self.trans(Image.open(img))
label = torch.tensor(cls,dtype=torch.float32)
return image,label
train_loader = torch.utils.data.DataLoader(
catdog_set(train_path, [transforms.Resize((128,128)),transforms.ToTensor()]),
batch_size=128, shuffle=True)
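An optional sanity check on the pipeline (prints one batch's shapes):

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([128, 3, 128, 128])
print(labels.shape)  # torch.Size([128])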
Define the LeNet network structure:
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
        # with 128x128 inputs: conv1 -> 124, pool -> 62, conv2 -> 58, pool -> 29
        self.fc1 = nn.Linear(16 * 29 * 29, 120)
self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 2)  # two output classes: cat and dog
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
# print(x.shape)
x = x.view(-1, 16 * 29 * 29)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
# Move the network to the GPU
net = LeNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
for epoch in range(30):
for i, (inputs, labels) in enumerate(train_loader):
inputs = inputs.to(device)
labels = labels.to(device)
        # Zero the optimizer gradients
optimizer.zero_grad()
        # Forward pass + backward pass + optimize
outputs = net(inputs)
loss = criterion(outputs, labels.long())
loss.backward()
optimizer.step()
print('Epoch: %d loss: %.6f' %(epoch + 1, loss.item()))
print('Finished Training')
AI研习社 requires submitting a CSV file, so finally the results are written out as a CSV file:
net.eval()  # switch to evaluation mode (affects BatchNorm/Dropout layers)
resfile = open('res.csv', 'w')
for i in range(0,2000):
img_PIL = Image.open('./test/'+str(i)+'.jpg')
img_tensor = transforms.Compose([transforms.Resize((128,128)),transforms.ToTensor()])(img_PIL)
img_tensor = img_tensor.reshape(-1, img_tensor.shape[0], img_tensor.shape[1], img_tensor.shape[2])
img_tensor = img_tensor.to(device)
out = net(img_tensor).cpu().detach().numpy()
if out[0, 0] < out[0, 1]:
resfile.write(str(i)+','+str(1)+'\n')
else:
resfile.write(str(i)+','+str(0)+'\n')
resfile.close()
After submission, the score was 68.25:
Then the model is changed to a residual-learning model:
class ResidualBlock(nn.Module):
def __init__(self,in_channels,out_channels,stride=1,kernel_size=3,padding=1,bias=False):
super().__init__()
self.conv1 = nn.Sequential(
nn.Conv2d(in_channels,out_channels,kernel_size,stride,padding,bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True),
)
self.conv2 = nn.Sequential(
nn.Conv2d(out_channels,out_channels,kernel_size,1,padding,bias=False),
nn.BatchNorm2d(out_channels),
)
if stride!=1 or in_channels != out_channels:
self.shortcut = nn.Sequential(
nn.Conv2d(in_channels,out_channels,kernel_size=1,stride = stride,bias=False),
nn.BatchNorm2d(out_channels))
else:
self.shortcut = nn.Sequential()
def forward(self,x):
residual = x
x = self.conv1(x)
x = self.conv2(x)
x += self.shortcut(residual)
        x = F.relu(x)
return x
class ResNet34(nn.Module):
    # Note: this variant puts the stride-2 downsampling block at the end of each
    # stage, unlike the original ResNet-34, which downsamples in the first block
    def __init__(self, n_classes):
super().__init__()
self.block1 = nn.Sequential(
nn.Conv2d(3,64,7,2,3,bias=False),
nn.BatchNorm2d(64),
nn.ReLU(inplace=True)
)
self.block2 = nn.Sequential(
nn.MaxPool2d(3,2),
ResidualBlock(64,64,1),
ResidualBlock(64,64,1),
ResidualBlock(64,64,1)
)
self.block3 = nn.Sequential(
ResidualBlock(64,128,1),
ResidualBlock(128,128,1),
ResidualBlock(128,128,1),
ResidualBlock(128,128,2)
)
self.block4 = nn.Sequential(
ResidualBlock(128,256,1),
ResidualBlock(256,256,1),
ResidualBlock(256,256,1),
ResidualBlock(256,256,1),
ResidualBlock(256,256,1),
ResidualBlock(256,256,2)
)
self.block5 = nn.Sequential(
ResidualBlock(256,512,1),
ResidualBlock(512,512,1),
ResidualBlock(512,512,2)
)
        self.avgpool = nn.AvgPool2d(2)
        # 512 channels * 2 * 2 spatial positions = 2048 (for 128x128 inputs)
        self.fc = nn.Linear(2048, n_classes)
def forward(self,x):
x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
x = self.block4(x)
x = self.block5(x)
x = self.avgpool(x)
# print(x.shape)
x = x.view(x.size(0),-1)
x = self.fc(x)
return x
net = ResNet34(2).to(device)
criterion = nn.CrossEntropyLoss(reduction='mean')
optimizer = optim.Adam(net.parameters(), lr=0.0001)
At first the ResNet did not perform well; after reducing the batch size, the results improved and ended up better than LeNet's.
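For reference, a minimal sketch of the reduced-batch-size loader (the exact batch size used is not recorded; 32 is an assumption). The training loop itself is the same as for LeNet:

train_loader = torch.utils.data.DataLoader(
    catdog_set(train_path, [transforms.Resize((128,128)), transforms.ToTensor()]),
    batch_size=32,  # smaller than the 128 used for LeNet (32 is an assumed value)
    shuffle=True)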
Thinking Questions
1. Residual learning
In the residual network architecture, shortcut connections pass the input x directly to the output as the initial result, so the output is H(x) = F(x) + x; when F(x) = 0, H(x) = x. ResNet thus changes the learning target: instead of learning a complete output, the network learns the difference between the target H(x) and x, i.e., the residual F(x) := H(x) - x. The goal of training then becomes driving the residual toward 0, so that accuracy does not degrade as the network gets deeper.
2. The principle of Batch Normalization
BN standardizes the output of each layer in the network, pulling it toward a standard normal distribution with mean 0 and variance 1. Because each layer's output then follows this distribution, the network converges faster. BN is usually placed between a layer's output and the activation function; after BN, the activations generally fall into the sensitive region of the activation function, where gradients are relatively large. For activation functions that saturate at both ends, this mitigates the vanishing gradient problem.
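For reference, the standard formulation from the original BN paper, for each activation over a mini-batch $\mathcal{B}$:

$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$

where $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}^2$ are the mini-batch mean and variance, $\epsilon$ is a small constant for numerical stability, and $\gamma, \beta$ are learned parameters that let the network restore representation power when needed.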
3. Why can group convolution improve accuracy? And since group convolution improves accuracy while also reducing computation, why not make the number of groups as large as possible?
Group convolution increases the block-diagonal correlation between filters of adjacent layers and reduces the number of trainable parameters, which makes overfitting less likely. I think the number of groups can indeed be made as large as possible, up to the number of channels of the input feature map, at which point the operation becomes a depthwise convolution.
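The extreme case mentioned above, sketched in PyTorch (the channel count is arbitrary): setting groups equal to the number of input channels gives a depthwise convolution with one filter per channel.

import torch.nn as nn

# Depthwise convolution: groups == in_channels, one 3x3 filter per channel
depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=False)
print(depthwise.weight.shape)  # torch.Size([64, 1, 3, 3])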