ResNet
Origin of the network
The paper is "Deep Residual Learning for Image Recognition" [1]. It won the CVPR 2016 Best Paper Award and collected more than 12,000 citations within two years of publication.
6.2 Main contributions of the paper
As we already mentioned when introducing VGG and GoogLeNet, an important research question on the road of deep learning models is how deep a neural network architecture can actually be built.
This question has two sides. The practical side: how to construct deeper networks, how to train them, and how to show that deeper networks really perform better. The theoretical side: how to directly relate network depth (the number of layers), together with network width, to the model's overall generalization performance.
For a long time, researchers held a bold conjecture about network architectures: deeper networks should bring better generalization. But actually achieving this is not easy. What challenges do we run into?
A long-standing challenge is exploding or vanishing gradients during training. To address it, the early years of deep learning research produced a flurry of techniques, such as the rectified linear unit (ReLU), batch normalization, and pre-training.
Another challenge emerged after the innovations of VGG and GoogLeNet: people gradually found that simply adding more layers does not improve performance. Researchers observed that once a model goes beyond roughly 50 layers, its performance not only stops improving but actually degrades, i.e., its accuracy gets worse. It looked as if model performance had hit a bottleneck. Does that mean the depth of deep models has a hard limit?
GoogLeNet's approach showed that a network can be made deeper by innovating on its local structure. Following that line of thought, this paper proposes a new structure called the Residual Network, ResNet for short, which pushes model depth from a few, a dozen, or a few dozen layers all the way to hundreds of layers. This is the paper's biggest contribution.
In terms of results on real datasets, ResNet's error rate is only about half that of VGG and GoogLeNet, and the model's generalization ability keeps improving as more layers are added. This was genuinely exciting for deep learning researchers, because it meant an important problem had been solved and a bottleneck had been broken.
6.3 The core method of the paper
So what is the core idea of this paper? Let's take a look.
Suppose there is some underlying function H of the input x. This function can apply a complex transformation to x, such as a multi-layer neural network. In practice, however, we do not know what H actually looks like, so the traditional approach is to learn a function F that approximates H.
The "residual learning" proposed in this paper does not approximate H with F directly. Instead, F approximates the difference H(x) - x. In machine learning this difference is called the residual, i.e., the gap between the target function and the input. We still cannot know H itself; in practice, F is trained to approximate this residual.
Starting from F(x) = H(x) - x and moving x to the other side gives the final form of residual learning: F(x) + x approximates the unknown H(x).
In this formula, the added x is often called a "shortcut". What does that mean? Researchers observed that in a deep neural network some connections, or some relations between layers, are actually unnecessary. What we care about is that a given input should map to the corresponding output, the so-called "identity mapping".
Unfortunately, without changing the network structure, the model cannot learn such mappings. So we build a shortcut from the input to the output: x can reach H (or y) directly without passing through F(x), and when necessary F(x) can be forced to zero. In other words, an architecture with shortcuts, or residuals, can in theory make the whole network more efficient: we hope the algorithm can figure out which parts can be ignored and which parts need to be kept.
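To make the shortcut concrete, here is a minimal, hypothetical PyTorch sketch (not the paper's actual architecture): F is an arbitrary small sub-network chosen only for illustration, and the block outputs F(x) + x, so if F's output is driven to zero the block reduces to an identity mapping.
import torch
import torch.nn as nn

class ResidualSketch(nn.Module):
    """Minimal residual connection: output = F(x) + x."""
    def __init__(self, dim):
        super().__init__()
        # F: an arbitrary small transform, used here only for illustration
        self.F = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        # the shortcut adds the input back; if F(x) -> 0, the block becomes an identity mapping
        return self.F(x) + x

x = torch.randn(4, 16)
print(ResidualSketch(16)(x).shape)  # torch.Size([4, 16])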
In the actual architecture, the authors insert a shortcut after every two convolution layers and stack such blocks into a 34-layer network. In terms of results, at 34 layers ResNet is indeed still able to lower the training error. The authors then went further, to 50-plus layers, to 110 layers, and all the way to a 1202-layer network, and found that the 110-layer network achieved the best results; such a network has about 1.7 million parameters in total.
To train ResNet, the authors still used batch normalization and a series of initialization tricks. It is worth noting that at this stage the authors dropped Dropout and no longer used it.
6.4 Milestone: ResNet
After VGGNet and Inception appeared, researchers kept deepening convolutional networks in search of better performance. However, as networks grow deeper they become increasingly hard to train: on the one hand gradients vanish; on the other hand, the gradients returned by deeper networks become less and less correlated, approaching white noise, so gradient updates start to behave like random perturbations.
ResNet (Residual Network) largely solved this problem and won first place in the 2015 ImageNet classification task. Since then, classification, detection, segmentation, and other tasks have widely adopted ResNet as the backbone network.
The idea of ResNet is to introduce a deep residual learning framework to deal with the vanishing-gradient problem: let the convolutional layers learn a residual mapping rather than expecting every stack of layers to fit the underlying mapping (the target function) completely. As shown in Figure 3.17, if the desired mapping of the network is H(x), the network on the left has to fit H(x) directly, while the sub-module proposed by ResNet on the right introduces a shortcut branch and changes the mapping to be fitted into the residual F(x) = H(x) - x. The assumption made by ResNet is that optimizing the residual mapping F(x) is easier than optimizing the underlying mapping H(x) directly.
In ResNet, such a residual module is called a Bottleneck. ResNet comes in versions with different depths, such as 18, 34, 50, 101, and 152 layers; here we use the common 50-layer version as the example. The ResNet-50 architecture is shown in Figure 3.18. Its main body consists of four large convolution groups, which contain 3, 4, 6, and 3 Bottleneck modules respectively. A global average pooling layer then reduces the feature map to 1×1, followed by a 1000-dimensional fully connected layer and a Softmax that outputs the classification scores.
Because F(x) + x is an element-wise, per-channel addition, there are two kinds of Bottleneck depending on whether the two terms have the same number of channels. When the channel counts differ, for example in the first Bottleneck of each convolution group, a 1×1 convolution (the Downsample branch) is applied to x to make the channel counts match before the addition. When the channel counts are the same, the two can be added directly.
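For reference, an off-the-shelf ResNet-50 matching the architecture described above ships with torchvision; a quick sanity check (a minimal sketch, assuming torchvision is installed) looks like this:
import torch
from torchvision.models import resnet50

model = resnet50()  # randomly initialized ResNet-50
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 1000])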
Let us now implement a Bottleneck with the Downsample branch in PyTorch ourselves. Create a file named resnet_bottleneck.py with the following code:
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_dim, out_dim, stride=1):
        super(Bottleneck, self).__init__()
        # The stacked layers consist of three convolutions (1×1, 3×3, 1×1) with BN layers in between
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1, bias=False),
            nn.BatchNorm2d(in_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, in_dim, 3, stride, 1, bias=False),
            nn.BatchNorm2d(in_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, out_dim, 1, bias=False),
            nn.BatchNorm2d(out_dim),
        )
        self.relu = nn.ReLU(inplace=True)
        # The Downsample branch is a 1×1 convolution followed by a BN layer
        self.downsample = nn.Sequential(
            nn.Conv2d(in_dim, out_dim, 1, 1),
            nn.BatchNorm2d(out_dim),
        )

    def forward(self, x):
        identity = x
        out = self.bottleneck(x)
        identity = self.downsample(x)
        # Add the identity mapping to the output of the stacked layers, then apply ReLU
        out += identity
        out = self.relu(out)
        return out
In a terminal, change to the directory containing resnet_bottleneck.py, start an interactive session with python3, and call the module with the following code.
>>> import torch
>>> from resnet_bottleneck import Bottleneck
# Instantiate a Bottleneck with 64 input channels and 256 output channels,
# corresponding to the first Bottleneck of the first convolution group
>>> bottleneck_1_1 = Bottleneck(64, 256).cuda()
>>> bottleneck_1_1
# The stacked part of the Bottleneck contains the 1×1, 3×3 and 1×1 convolution layers
Bottleneck(
  (bottleneck): Sequential(
    (0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (relu): ReLU(inplace)
  # The Downsample branch changes the channel count of the identity mapping to match
  # the stacked layers, so that the two can be added
  (downsample): Sequential(
    (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
>>> input = torch.randn(1, 64, 56, 56).cuda()
# Feed the input through the Bottleneck
>>> output = bottleneck_1_1(input)
>>> input.shape
torch.Size([1, 64, 56, 56])
>>> output.shape
# Compared with the input, the spatial resolution of the output is unchanged,
# while the number of channels becomes 4 times larger
torch.Size([1, 256, 56, 56])
PyTorch ResNet-50 code
import torch
import torch.nn as nn

Layers = [3, 4, 6, 3]

class Block(nn.Module):
    # nn.Module: the layers and the forward computation are defined in the class;
    # backpropagation is handled automatically by autograd
    def __init__(self, in_channels, filters, stride=1, is_1x1conv=False):
        # in_channels: number of input channels
        super(Block, self).__init__()
        filter1, filter2, filter3 = filters  # number of kernels in each convolution
        # is_1x1conv: whether the shortcut passes the shallow feature map through a single
        # 1×1 convolution instead of the three stacked convolutions (the skip connection)
        self.is_1x1conv = is_1x1conv
        self.relu = nn.ReLU(inplace=True)  # in-place ReLU
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, filter1, kernel_size=1, stride=stride, bias=False),
            # BN speeds up training and convergence, helps keep gradients stable,
            # and reduces overfitting
            nn.BatchNorm2d(filter1),
            nn.ReLU()
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(filter1, filter2, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(filter2),
            nn.ReLU()
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(filter2, filter3, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(filter3),
            # No ReLU here: the shortcut (if any) must be added first,
            # and only then is ReLU applied to the sum.
        )
        # The shortcut branch: the shallow feature map goes through one convolution
        # and is then added to the output of the three stacked convolutions
        if is_1x1conv:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, filter3, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(filter3)
            )

    def forward(self, x):
        x_shortcut = x
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        if self.is_1x1conv:
            x_shortcut = self.shortcut(x_shortcut)
        x = x + x_shortcut
        x = self.relu(x)
        return x

class Resnet50(nn.Module):
    def __init__(self):
        super(Resnet50, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = self._make_layer(64, (64, 64, 256), Layers[0])
        self.conv3 = self._make_layer(256, (128, 128, 512), Layers[1], 2)
        self.conv4 = self._make_layer(512, (256, 256, 1024), Layers[2], 2)
        self.conv5 = self._make_layer(1024, (512, 512, 2048), Layers[3], 2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # global (adaptive) average pooling
        self.fc = nn.Sequential(
            nn.Linear(2048, 1000)  # fully connected classifier
        )

    def forward(self, input):
        x = self.conv1(input)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # flatten to (batch, 2048)
        x = self.fc(x)
        return x

    def _make_layer(self, in_channels, filters, blocks, stride=1):  # blocks: number of residual blocks
        layers = []  # collect the blocks of this convolution group
        block_1 = Block(in_channels, filters, stride=stride, is_1x1conv=True)
        layers.append(block_1)
        for i in range(1, blocks):
            layers.append(Block(filters[2], filters, stride=1, is_1x1conv=False))
        return nn.Sequential(*layers)  # *layers unpacks the list into positional arguments

net = Resnet50()
x = torch.rand((10, 3, 224, 224))
print(net(x).shape)
print(net)
# for name, layer in net.named_children():
#     if name != "fc":
#         x = layer(x)
#         print(name, 'output shape:', x.shape)
#     else:
#         x = x.view(x.size(0), -1)
#         x = layer(x)
#         print(name, 'output shape:', x.shape)
TensorFlow CIFAR-10 code (TF 1.x-style API)
import tensorflow as tf
import os
import pickle
import numpy as np

CIFAR_DIR = r"C:\Users\Administrator.DESKTOP-T76M1PJ\.keras\datasets\cifar-10-batches-py"
print(os.listdir(CIFAR_DIR))

def load_data(filename):
    """Read data from a CIFAR-10 batch file."""
    with open(filename, 'rb') as f:
        data = pickle.load(f, encoding='bytes')
    return data[b'data'], data[b'labels']

# simple hand-rolled data pipeline (tf.data.Dataset could be used instead)
class CifarData:
    def __init__(self, filenames, need_shuffle):
        all_data = []
        all_labels = []
        for filename in filenames:
            data, labels = load_data(filename)
            all_data.append(data)
            all_labels.append(labels)
        self._data = np.vstack(all_data)
        # normalize to [-1, 1]
        self._data = self._data / 127.5 - 1
        self._labels = np.hstack(all_labels)
        print(self._data.shape)
        print(self._labels.shape)
        # number of examples
        self._num_examples = self._data.shape[0]
        # whether shuffling is needed
        self._need_shuffle = need_shuffle
        # index of the next batch
        self._indicator = 0
        # shuffle if required
        if self._need_shuffle:
            self._shuffle_data()

    def _shuffle_data(self):
        # e.g. [0,1,2,3,4,5] -> [5,3,2,4,0,1]
        p = np.random.permutation(self._num_examples)
        self._data = self._data[p]
        self._labels = self._labels[p]

    def next_batch(self, batch_size):
        """Return batch_size examples as a batch."""
        end_indicator = self._indicator + batch_size
        if end_indicator > self._num_examples:
            if self._need_shuffle:
                self._shuffle_data()
                self._indicator = 0
                end_indicator = batch_size
            else:
                raise Exception("have no more examples")
        if end_indicator > self._num_examples:
            raise Exception("batch size is larger than all examples")
        batch_data = self._data[self._indicator: end_indicator]
        batch_labels = self._labels[self._indicator: end_indicator]
        self._indicator = end_indicator
        return batch_data, batch_labels

train_filenames = [os.path.join(CIFAR_DIR, 'data_batch_%d' % i) for i in range(1, 6)]
test_filenames = [os.path.join(CIFAR_DIR, 'test_batch')]
# shuffle the training set; keep the test set in order
train_data = CifarData(train_filenames, True)
test_data = CifarData(test_filenames, False)
# Models
def residual_block(x, output_channel):
    """residual connection implementation"""
    # number of input channels
    input_channel = x.get_shape().as_list()[-1]
    if input_channel * 2 == output_channel:
        # the output has twice as many channels as the input:
        # the identity branch needs padding and the convolution uses stride 2
        increase_dim = True
        strides = (2, 2)
    elif input_channel == output_channel:
        # same number of channels: no padding, stride 1
        increase_dim = False
        strides = (1, 1)
    else:
        raise Exception("input channel can't match output channel")
    # residual branch: two 3x3 convolutions
    conv1 = tf.layers.conv2d(x,
                             output_channel,
                             (3, 3),
                             strides=strides,
                             padding='same',
                             activation=tf.nn.relu,
                             name='conv1')
    conv2 = tf.layers.conv2d(conv1,
                             output_channel,
                             (3, 3),
                             strides=(1, 1),
                             padding='same',
                             activation=tf.nn.relu,
                             name='conv2')
    # pad the identity branch if the channel count doubles
    if increase_dim:
        # [None, image_width, image_height, channel] -> [,,,channel*2]
        pooled_x = tf.layers.average_pooling2d(x,
                                               (2, 2),
                                               (2, 2),
                                               padding='valid')
        padded_x = tf.pad(pooled_x,
                          [[0, 0],
                           [0, 0],
                           [0, 0],
                           [input_channel // 2, input_channel // 2]])
    else:
        padded_x = x
    # add the two branches to form the output
    output_x = conv2 + padded_x
    return output_x
# network definition
def res_net(x,
            num_residual_blocks,  # list with the number of residual blocks per stage
            num_filter_base,      # number of output channels of the first stage
            class_num):           # number of classes
    """residual network implementation
    Args:
    - x: input tensor
    - num_residual_blocks: e.g. [3, 4, 6, 3]
    - num_filter_base: base number of filters
    - class_num: number of classes
    """
    num_subsampling = len(num_residual_blocks)
    layers = []
    # x: [None, width, height, channel] -> [width, height, channel]
    input_size = x.get_shape().as_list()[1:]
    print("input_size get_shape():", input_size)
    with tf.variable_scope('conv0'):
        conv0 = tf.layers.conv2d(x,
                                 num_filter_base,
                                 (3, 3),
                                 strides=(1, 1),
                                 padding='same',
                                 activation=tf.nn.relu,
                                 name='conv0')
        layers.append(conv0)
    print("layers[-1]:", layers[-1])
    # e.g. num_subsampling = 4, sample_id = [0, 1, 2, 3]
    for sample_id in range(num_subsampling):
        for i in range(num_residual_blocks[sample_id]):
            with tf.variable_scope("conv%d_%d" % (sample_id, i)):
                conv = residual_block(
                    layers[-1],
                    num_filter_base * (2 ** sample_id))
                layers.append(conv)
    multiplier = 2 ** (num_subsampling - 1)
    assert layers[-1].get_shape().as_list()[1:] \
        == [input_size[0] / multiplier,
            input_size[1] / multiplier,
            num_filter_base * multiplier]
    with tf.variable_scope('fc'):
        # layers[-1].shape: [None, width, height, channel]
        # global average pooling over width and height
        global_pool = tf.reduce_mean(layers[-1], [1, 2])
        logits = tf.layers.dense(global_pool, class_num)
        layers.append(logits)
    return layers[-1]
x = tf.placeholder(tf.float32, [None, 3072])
# labels, e.g. [0, 5, 6, 3]
y = tf.placeholder(tf.int64, [None])
x_image = tf.reshape(x, [-1, 3, 32, 32])
# reorder to [None, 32, 32, 3]
x_image = tf.transpose(x_image, perm=[0, 2, 3, 1])

y_ = res_net(x_image, [2, 3, 2], 32, 10)

# y_ -> softmax; y -> one-hot; loss = -sum(y * log(y_))
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)

# predicted class index for each example
predict = tf.argmax(y_, 1)
# e.g. [1, 0, 1, 1, 1, 0, 0, 0]
correct_prediction = tf.equal(predict, y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))

with tf.name_scope('train_op'):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Training
init = tf.global_variables_initializer()
batch_size = 20
train_steps = 10000
test_steps = 100

# after 10k steps: around 73.75% test accuracy
with tf.Session() as sess:
    sess.run(init)
    for i in range(train_steps):
        batch_data, batch_labels = train_data.next_batch(batch_size)
        loss_val, acc_val, _ = sess.run(
            [loss, accuracy, train_op],
            feed_dict={
                x: batch_data,
                y: batch_labels})
        if (i + 1) % 500 == 0:
            print('[Train] Step: %d, loss: %4.5f, acc: %4.5f'
                  % (i + 1, loss_val, acc_val))
        if (i + 1) % 5000 == 0:
            test_data = CifarData(test_filenames, False)
            all_test_acc_val = []
            for j in range(test_steps):
                test_batch_data, test_batch_labels \
                    = test_data.next_batch(batch_size)
                test_acc_val = sess.run(
                    [accuracy],
                    feed_dict={
                        x: test_batch_data,
                        y: test_batch_labels
                    })
                all_test_acc_val.append(test_acc_val)
            test_acc = np.mean(all_test_acc_val)
            print('[Test ] Step: %d, acc: %4.5f' % (i + 1, test_acc))
Simplified PyTorch version
import torch
from torch import nn
def conv3x3(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False)

def conv1x1(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False)

class BasicBlock(nn.Module):
    def __init__(self, in_channel):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Sequential(
            conv3x3(in_channel, in_channel),
            nn.BatchNorm2d(in_channel),
            nn.ReLU(inplace=True)
        )
        self.conv2 = nn.Sequential(
            conv3x3(in_channel, in_channel),
            nn.BatchNorm2d(in_channel),
            nn.ReLU(inplace=True)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, input):
        x = self.conv1(input)
        # print('\tconv1', x.shape)
        x = self.conv2(x)
        # print('\tconv2', x.shape)
        out = self.relu(x + input)
        return out

class Bottleneck(nn.Module):
    exp = 4

    def __init__(self, in_channel):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Sequential(
            conv1x1(in_channel, in_channel),
            nn.BatchNorm2d(in_channel)
        )
        self.conv2 = nn.Sequential(
            conv3x3(in_channel, in_channel),
            nn.BatchNorm2d(in_channel),
        )
        self.conv3 = nn.Sequential(
            conv1x1(in_channel, in_channel * self.exp),
            nn.BatchNorm2d(in_channel * self.exp),
            nn.ReLU(inplace=True)
        )
        self.downsample = nn.Sequential(
            conv1x1(in_channel, in_channel * self.exp),
            nn.BatchNorm2d(in_channel * self.exp),
            nn.ReLU(inplace=True)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, input):
        x = self.conv1(input)
        # print('\tconv1', x.shape)
        x = self.conv2(x)
        # print('\tconv2', x.shape)
        x = self.conv3(x)
        # print('\tconv3', x.shape)
        identity = self.downsample(input)
        out = self.relu(x + identity)
        return out

class ResNet(nn.Module):
    def __init__(self, in_channel, num_classes):
        super(ResNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, 64, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.res_block1 = nn.Sequential(
            BasicBlock(64),
            BasicBlock(64)
        )
        self.res_block2 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            Bottleneck(64),
            Bottleneck(64 * Bottleneck.exp),
            Bottleneck(64 * (Bottleneck.exp ** 2))
        )
        self.output = nn.Sequential(
            nn.Conv2d(64 * (Bottleneck.exp ** 3), 1024, kernel_size=4, bias=False),
            nn.BatchNorm2d(1024),
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(1024, num_classes),
            nn.Dropout(0.4),
            nn.ReLU(inplace=True),
        )

    def forward(self, input):
        x = self.conv1(input)
        # print('res conv1', x.shape)
        x = self.maxpool(x)
        # print('maxpool', x.shape)
        x = self.res_block1(x)
        # print('res_block1', x.shape)
        x = self.res_block2(x)
        # print('res_block2', x.shape)
        x = self.output(x)
        # print('output', x.shape)
        return x

if __name__ == '__main__':
    fake_img = torch.randint(0, 255, [10, 3, 32, 32]).type(torch.FloatTensor)
    resnet = ResNet(3, 10)
    out = resnet(fake_img)
    print(out)
# CIFAR-10 training script using a shortened (two-stage) bottleneck ResNet
import torch
from torch.utils.data import DataLoader
import torch.nn as nn
from torchvision import transforms
from torch import optim
from torchvision import datasets
from visdom import Visdom

batch_size = 32
learning_rate = 0.01
epochs = 2

train_cifar = datasets.CIFAR10('../pytorch/手写数字识别/cifar', train=True, transform=transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor()
]), download=True)
train_cifar = DataLoader(train_cifar, batch_size=batch_size, shuffle=True)
test_cifar = datasets.CIFAR10('../pytorch/手写数字识别/cifar', train=False, transform=transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor()
]), download=True)
test_cifar = DataLoader(test_cifar, batch_size=batch_size, shuffle=False)

sample, y_sample = next(iter(train_cifar))
print(sample.shape, y_sample.shape)

Layers = [3, 4]

class Block(nn.Module):
    def __init__(self, in_channels, filters, stride, is_1x1conv=False):
        super(Block, self).__init__()
        self.is_1x1conv = is_1x1conv
        self.relu = nn.ReLU(True)
        filter1, filter2, filter3 = filters
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, filter1, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(filter1),
            nn.ReLU(True),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(filter1, filter2, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(filter2),
            nn.ReLU(True),
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(filter2, filter3, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(filter3),
        )
        if is_1x1conv:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, filter3, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(filter3),
            )

    def forward(self, x):
        x_shortcut = x
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        if self.is_1x1conv:
            x_shortcut = self.shortcut(x_shortcut)
        x = x + x_shortcut
        x = self.relu(x)
        return x

class ResNet(nn.Module):
    def __init__(self):
        super(ResNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(True)
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = self._make_layers(64, (64, 64, 256), Layers[0])
        self.conv3 = self._make_layers(256, (128, 128, 512), Layers[1], 2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Sequential(
            nn.Linear(512, 10),
        )

    def forward(self, input):
        x = self.conv1(input)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

    def _make_layers(self, in_channels, filters, blocks, stride=1):
        layers = []
        block_1 = Block(in_channels, filters, stride, is_1x1conv=True)
        layers.append(block_1)
        for i in range(1, blocks):
            layers.append(Block(filters[2], filters, stride=1, is_1x1conv=False))
        return nn.Sequential(*layers)

model = ResNet()
print(model(sample).shape)

optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()
global_step = 0
# viz = Visdom()
# viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
# viz.line([[0., 0.]], [0.], win='test', opts=dict(title='test_loss&acc', legend=['loss', 'acc']))

for epoch in range(epochs):
    for idx, (x, y) in enumerate(train_cifar):
        output = model(x)
        loss = criterion(output, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        global_step += 1
        if idx % 10 == 0:
            print("epoch:", epoch, 'idx:', idx, 'loss:', loss)

    test_loss = 0
    correct = 0
    for data, target in test_cifar:
        logits = model(data)
        test_loss += criterion(logits, target).item()
        pred = logits.argmax(dim=1)
        correct += pred.eq(target.data).float().sum().item()
    print('correct :', correct / len(test_cifar.dataset))
Why were residual networks introduced?
The authors observed that as the number of layers grows, the network suffers from degradation: the training loss first decreases and then saturates, and if depth is increased further, the training loss actually goes up. Note that this is not overfitting, because with overfitting the training loss keeps decreasing.
Residual block structure
After winning first place at ILSVRC 2015, Kaiming He and his co-authors further improved the residual network, mainly by moving ReLU to before the convolutions, so that the shortcut no longer passes through ReLU and the input is connected directly to the output. Their follow-up paper experimented with different orderings of ReLU, BN, and the convolution layers, and settled on the improved basic residual module. Because in each unit the activation function is placed before the affine transformation, it is called the pre-activation residual unit.
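As a minimal sketch of such a pre-activation residual unit (the BN-ReLU-conv ordering with an untouched identity shortcut; the two-convolution form and the channel count are assumptions for illustration, not the exact module from the paper):
import torch
import torch.nn as nn

class PreActBasicBlock(nn.Module):
    """Pre-activation residual unit: BN and ReLU come before each convolution,
    and the shortcut connects input to output directly without passing through ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return out + x  # identity shortcut, no ReLU applied to the sum

x = torch.randn(1, 64, 56, 56)
print(PreActBasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])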