- Introduction
- Five network architectures
- Project structure and code implementation
- Summary
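I. Introduction
This post is a short wrap-up of convolutional neural networks. As of 2024, writing out each network by hand one at a time no longer seems necessary; summarizing the classic convolutional networks in one unified way is a quick route into the next stage of study. This post therefore covers five classic convolutional network architectures in one place.
II. Five Network Architectures
This chapter describes five classic convolutional neural network architectures: LeNet, AlexNet, VGG, GoogLeNet, and ResNet. These networks played a key role in the development of deep learning, and their design ideas and innovations have informed much of the research and application work that followed.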
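1. LeNet
LeNet was proposed by Yann LeCun et al. in 1998, mainly for handwritten digit recognition (the MNIST dataset). It is one of the earliest convolutional neural networks; its structure is simple but effective, and it laid the groundwork for later deep learning networks.
Network structure:
- Input layer: 28x28x1 grayscale image
- Convolutional layer 1: 6 kernels of size 5x5, stride 1, padding 2, output size 28x28x6
- Pooling layer 1: 2x2 average pooling, stride 2, output size 14x14x6
- Convolutional layer 2: 16 kernels of size 5x5, stride 1, no padding, output size 10x10x16
- Pooling layer 2: 2x2 average pooling, stride 2, output size 5x5x16
- Fully connected layer 1: 120 neurons
- Fully connected layer 2: 84 neurons
- Output layer: 10 neurons (one per digit, 0-9)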
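The output sizes in the list above follow from the usual convolution/pooling size formula, floor((n + 2p - k) / s) + 1. As a quick check, the sketch below traces the spatial size through the LeNet layers. The helper names `out_size` and `lenet_trace` are made up for this illustration, and a padding of 2 is assumed for the first convolution, since that is what keeps a 28x28 input at 28x28.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1


def lenet_trace(n=28):
    """Return the spatial size after each LeNet layer, starting with the input."""
    sizes = [n]
    n = out_size(n, kernel=5, stride=1, padding=2)  # conv1 (padding 2 assumed)
    sizes.append(n)
    n = out_size(n, kernel=2, stride=2)             # pool1
    sizes.append(n)
    n = out_size(n, kernel=5, stride=1)             # conv2
    sizes.append(n)
    n = out_size(n, kernel=2, stride=2)             # pool2
    sizes.append(n)
    return sizes


sizes = lenet_trace()
flattened = 16 * sizes[-1] * sizes[-1]  # 16 channels after the second convolution
print(sizes, flattened)
```

The last value is the number of inputs seen by the first fully connected layer, which is why that layer takes 16*5*5 features.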
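2. AlexNet
AlexNet was proposed by Alex Krizhevsky et al. in 2012. It achieved a breakthrough result in the ImageNet image classification challenge and marked the start of deep learning's wide adoption in computer vision.
Network structure:
- Input layer: 224x224x3 RGB image
- Convolutional layer 1: 96 kernels of size 11x11, stride 4, padding 2, output size 55x55x96
- Pooling layer 1: 3x3 max pooling, stride 2, output size 27x27x96
- Convolutional layer 2: 256 kernels of size 5x5, stride 1, padding 2, output size 27x27x256
- Pooling layer 2: 3x3 max pooling, stride 2, output size 13x13x256
- Convolutional layer 3: 384 kernels of size 3x3, stride 1, padding 1, output size 13x13x384
- Convolutional layer 4: 384 kernels of size 3x3, stride 1, padding 1, output size 13x13x384
- Convolutional layer 5: 256 kernels of size 3x3, stride 1, padding 1, output size 13x13x256
- Pooling layer 3: 3x3 max pooling, stride 2, output size 6x6x256
- Fully connected layer 1: 4096 neurons
- Fully connected layer 2: 4096 neurons
- Output layer: 1000 neurons (one per class)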
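To get a feel for where AlexNet's capacity sits, the parameter count of each layer in the list above can be worked out by hand. This is only a back-of-the-envelope sketch: it assumes the single-tower layout exactly as listed (no grouped convolutions), counts one bias per output channel or neuron, and the helper names `conv_params` and `fc_params` are invented for the example.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a square convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (kernel size, input channels, output channels) for the five convolutions
conv_layers = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
# (inputs, outputs) for the three fully connected layers; 6x6x256 is the last pooled map
fc_layers = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(*layer) for layer in conv_layers)
fc_total = sum(fc_params(*layer) for layer in fc_layers)
total = conv_total + fc_total

print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
print(f"share of parameters in the fully connected layers: {fc_total / total:.1%}")
```

Under these assumptions the fully connected layers hold the large majority of the parameters, which fits with Dropout being applied there.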
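AlexNet combines a deeper network and larger convolution kernels with the ReLU activation function and Dropout regularization, which significantly improved image classification performance.
3. VGG
VGG was proposed by the Visual Geometry Group at the University of Oxford. Its design idea is to deepen the network by stacking small (3x3) convolution kernels in order to extract finer features. VGG achieved excellent results on the ImageNet image classification task.
Network structure (VGG16):
- Input layer: 224x224x3 RGB image
- Convolutional block 1: two 3x3 convolutional layers with 64 channels, stride 1, output size 224x224x64
- Pooling layer 1: 2x2 max pooling, stride 2, output size 112x112x64
- Convolutional block 2: two 3x3 convolutional layers with 128 channels, stride 1, output size 112x112x128
- Pooling layer 2: 2x2 max pooling, stride 2, output size 56x56x128
- Convolutional block 3: three 3x3 convolutional layers with 256 channels, stride 1, output size 56x56x256
- Pooling layer 3: 2x2 max pooling, stride 2, output size 28x28x256
- Convolutional block 4: three 3x3 convolutional layers with 512 channels, stride 1, output size 28x28x512
- Pooling layer 4: 2x2 max pooling, stride 2, output size 14x14x512
- Convolutional block 5: three 3x3 convolutional layers with 512 channels, stride 1, output size 14x14x512
- Pooling layer 5: 2x2 max pooling, stride 2, output size 7x7x512
- Fully connected layer 1: 4096 neurons
- Fully connected layer 2: 4096 neurons
- Output layer: 1000 neurons (one per class)
VGG's structure is simple and uniform; the combination of a deep network and small kernels gives it strong feature extraction ability.
4. GoogLeNet (Inception)
GoogLeNet was proposed by Google. Its main innovation is the Inception module, which applies convolution kernels of different sizes in parallel to extract multi-scale features.
Network structure:
- Input layer: 224x224x3 RGB image
- Initial convolution and pooling layers: 7x7 convolution, stride 2; 3x3 max pooling, stride 2
- Inception modules: each module contains a 1x1 convolution, a 3x3 convolution, a 5x5 convolution, and a 3x3 max pooling branch; the branches run in parallel and their results are concatenated along the channel dimension
- Global average pooling: replaces the large fully connected layers of earlier networks, reducing the parameter count and the risk of overfitting
- Output layer: 1000 neurons (one per class)
GoogLeNet uses Inception modules to fuse features at different scales, and uses global average pooling to reduce the number of parameters.
5. ResNet
ResNet was proposed by Microsoft Research. Its main innovation is the residual block: an identity mapping (shortcut connection) adds the block's input to its output, which addresses the degradation problem of very deep networks and eases gradient flow, so that much deeper networks can be trained.
Network structure:
- Input layer: 224x224x3 RGB image
- Initial convolution and pooling layers: 7x7 convolution, stride 2; 3x3 max pooling, stride 2
- Residual blocks: built from several BasicBlock or Bottleneck blocks; the identity mapping adds the input directly to the output, addressing the degradation problem of deep networks
- Stage 1: residual blocks with 64 channels
- Stage 2: residual blocks with 128 channels, stride 2
- Stage 3: residual blocks with 256 channels, stride 2
- Stage 4: residual blocks with 512 channels, stride 2
- Global average pooling: reduces each feature map to 1x1
- Output layer: 1000 neurons (one per class)
With residual blocks, ResNet mitigates the degradation problem and eases gradient flow in deep networks, allowing networks to reach much greater depth.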
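The effect of the identity shortcut can be seen in a deliberately tiny toy example. The sketch below is plain Python rather than PyTorch and is not the experiment from the ResNet paper: it simply stacks 50 blocks whose inner layer outputs zeros (a made-up stand-in for a layer that has learned nothing) and compares a plain stack with a residual stack. All function names are hypothetical.

```python
def plain_block(x, layer):
    """A plain block: the output is whatever the inner layer computes."""
    return layer(x)


def residual_block(x, layer):
    """A residual block: the inner layer's output is added to the unchanged input."""
    return [xi + fi for xi, fi in zip(x, layer(x))]


def zero_layer(x):
    """Stand-in for an inner layer whose weights are all zero."""
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
plain, residual = x, x
for _ in range(50):
    plain = plain_block(plain, zero_layer)
    residual = residual_block(residual, zero_layer)

print("plain stack:   ", plain)
print("residual stack:", residual)
```

In the plain stack the signal is lost, while the residual stack falls back to the identity mapping, so extra blocks at least do no harm. That is the intuition behind being able to keep adding depth.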
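Understanding these five classic convolutional network architectures gives a good view of how convolutional neural networks developed and of the design thinking behind them, and provides a solid basis for later deep learning research and applications.
III. Code Implementation
This chapter gives the network definitions of all five models, so that they can be called in a uniform way for model training.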
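As a rough idea of what "calling in a uniform way" could look like, here is a minimal sketch of a model-agnostic training step. It is an assumption about usage, not code from the original project: the function name `train_step`, the SGD optimizer, the learning rate, and the tiny stand-in model are all chosen for illustration. The tuple branch is there because the GoogLeNet defined in this chapter returns auxiliary outputs in training mode; the 0.3 weight follows the weighting described in the GoogLeNet paper.

```python
import torch
from torch import nn


def train_step(model, images, labels, optimizer, aux_weight=0.3):
    """Run one optimisation step on a batch and return the loss as a float."""
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    outputs = model(images)
    if isinstance(outputs, tuple):
        # Models with auxiliary classifiers return (main, aux, ...) in training mode.
        main, *aux = outputs
        loss = criterion(main, labels)
        for aux_logits in aux:
            loss = loss + aux_weight * criterion(aux_logits, labels)
    else:
        loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    return loss.item()


# Tiny stand-in model so the sketch runs on its own; any model from this
# chapter could be passed in instead, with inputs of the matching shape.
model = nn.Sequential(nn.Flatten(), nn.Linear(1 * 28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
images = torch.randn(4, 1, 28, 28)      # random batch in place of real data
labels = torch.randint(0, 10, (4,))
loss = train_step(model, images, labels, optimizer)
print(f"loss: {loss:.4f}")
```

Because the step only relies on `model(images)` returning logits (or a tuple of logits), the same function works for every network in this chapter.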
1. LeNet
import torch
from torch import nn
import torch.nn.functional as F


class Lenet(nn.Module):
    # Note: this variant uses ReLU and max pooling, while the structure listed in
    # chapter II uses average pooling; the layer output sizes are the same.
    def __init__(self, num_classes=10):
        super(Lenet, self).__init__()
        # padding=2 keeps the 28x28 input at 28x28 after the first convolution
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))                     # N x 6 x 28 x 28
        x = F.max_pool2d(x, kernel_size=2, stride=2)  # N x 6 x 14 x 14
        x = F.relu(self.conv2(x))                     # N x 16 x 10 x 10
        x = F.max_pool2d(x, kernel_size=2, stride=2)  # N x 16 x 5 x 5
        x = x.view(-1, 16*5*5)                        # flatten to N x 400
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


if __name__ == '__main__':
    X = torch.randn(1, 1, 28, 28)
    model = Lenet(num_classes=5)
    output = model(X)
    print(output.shape)  # torch.Size([1, 5])
2. AlexNet
import torch
from torch import nn


class Alexnet(nn.Module):
    def __init__(self, num_classes=1000):
        super(Alexnet, self).__init__()
        # Output size formula: floor((input_size - kernel_size + 2 * padding) / stride) + 1
        # First convolution: floor((224 - 11 + 2 * 2) / 4) + 1 = 55
        self.features = nn.Sequential(
            # first conv2d
            # input:  (1, 3, 224, 224)
            # output: (1, 96, 55, 55)
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # second conv2d
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # third conv2d
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # fourth conv2d
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # fifth conv2d
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256*6*6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x


if __name__ == '__main__':
    # 224x224 input, matching the size used in the comments above
    X = torch.randn(1, 3, 224, 224)
    model = Alexnet()
    output = model(X)
    print(output.shape)  # torch.Size([1, 1000])
3. VGG
import torch
from torch import nn

# Dictionary of configuration lists for the different VGG variants.
# Numbers are output channels of a 3x3 convolution; 'M' is a 2x2 max pooling layer.
cfgs = {
    'vgg11' : [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13' : [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


# Build the feature extractor from a configuration list
def make_feature(cfg: list):  # returns the nn.Sequential stored in VGG.features
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000):
        super(VGG, self).__init__()
        self.features = features
        # Note: the hidden layers here have 2048 units, fewer than the 4096 listed
        # in chapter II, which makes the classifier lighter.
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(512*7*7, 2048),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(True),
            nn.Linear(2048, num_classes)
        )

    def forward(self, x):
        x = self.features(x)               # N x 512 x 7 x 7 for a 224x224 input
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x


def vgg(model_name='vgg16', num_classes=10):
    cfg = cfgs[model_name]
    model = VGG(make_feature(cfg), num_classes=num_classes)
    return model


if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = vgg(model_name='vgg16', num_classes=10)
    output = model(X)
    print(output.shape)  # torch.Size([1, 10])
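The configuration lists also explain the model names and the `512*7*7` input of the classifier. The sketch below counts layers and pooling steps directly from such lists; it re-declares the standard VGG configurations so that it runs on its own, and the helper name `describe` is made up for this example.

```python
# Standard VGG configurations: numbers are conv output channels, 'M' is a 2x2 max pool.
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
              512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
              512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def describe(cfg, input_size=224, fc_layers=3):
    """Return (weight layers, final feature map side, flattened feature count)."""
    conv_channels = [v for v in cfg if v != 'M']
    pools = len(cfg) - len(conv_channels)
    side = input_size // (2 ** pools)        # each 2x2 pool with stride 2 halves the side
    flattened = conv_channels[-1] * side * side
    return len(conv_channels) + fc_layers, side, flattened


for name, cfg in cfgs.items():
    depth, side, flattened = describe(cfg)
    print(f"{name}: {depth} weight layers, {side}x{side} final map, {flattened} features")
```

Every variant has five pooling layers, so a 224x224 input always ends at 7x7 with 512 channels; a configuration with fewer pooling layers would not match the first `nn.Linear` of the classifier.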
4. GoogLeNet
import torch.nn as nn
import torch
import torch.nn.functional as F


class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits

        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:    # auxiliary classifier is skipped in eval mode
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:    # auxiliary classifier is skipped in eval mode
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:    # training mode also returns the auxiliary outputs
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()

        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)

        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)   # padding keeps output size equal to input size
        )

        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            # The official implementation actually uses a 3x3 kernel here instead of 5x5;
            # this version keeps the 5x5 kernel and does not change it.
            # Please see https://github.com/pytorch/vision/issues/906 for details.
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)   # padding keeps output size equal to input size
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        # concatenate the four branches along the channel dimension
        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)


class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output[batch, 128, 4, 4]

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = self.averagePool(x)
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 1024
        x = self.fc2(x)
        # N x num_classes
        return x


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x


if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = GoogLeNet(num_classes=10)
    # In training mode with aux_logits=True the model returns a tuple (main, aux2, aux1),
    # so switch to eval mode to get a single output tensor.
    model.eval()
    output = model(X)
    print(output.shape)  # torch.Size([1, 10])
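The channel counts in the forward-pass comments above can be checked with simple arithmetic, since an Inception module's output is just the concatenation of its four branches. The sketch below re-types the nine Inception configurations from the constructor and verifies that each module's input matches the previous module's output; the helper name `inception_out` is invented for the illustration.

```python
def inception_out(ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
    """Output channels of an Inception module.

    Only the four branch outputs are concatenated; the two "red" values are
    the widths of the internal 1x1 reductions and do not appear in the output.
    """
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


# (name, input channels, branch configuration) copied from the GoogLeNet constructor
modules = [
    ("3a", 192, (64, 96, 128, 16, 32, 32)),
    ("3b", 256, (128, 128, 192, 32, 96, 64)),
    ("4a", 480, (192, 96, 208, 16, 48, 64)),
    ("4b", 512, (160, 112, 224, 24, 64, 64)),
    ("4c", 512, (128, 128, 256, 24, 64, 64)),
    ("4d", 512, (112, 144, 288, 32, 64, 64)),
    ("4e", 528, (256, 160, 320, 32, 128, 128)),
    ("5a", 832, (256, 160, 320, 32, 128, 128)),
    ("5b", 832, (384, 192, 384, 48, 128, 128)),
]

channels = 192  # channels entering inception3a, after conv3 and maxpool2
outputs = {}
for name, in_channels, cfg in modules:
    # Max pooling between stages changes the spatial size but not the channel count.
    assert in_channels == channels, f"inception{name} expects {channels} input channels"
    channels = inception_out(*cfg)
    outputs[name] = channels

print(outputs)
```

The final value, 1024, is the input width of the last `nn.Linear`, and the outputs of 4a and 4d (512 and 528) are the input widths of the two auxiliary classifiers.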
5. ResNet
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    expansion = 1  # output channels = out_channels * expansion

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # project the shortcut when the shape changes (stride or channel count)
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity  # residual connection
        out = F.relu(out)

        return out


class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        # a 1x1 convolution matches the shortcut to the new shape when needed
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x


def resnet18(num_classes=1000):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes)


# Example usage:
model = resnet18()
print(model)
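The `[2, 2, 2, 2]` list passed to `ResNet` is what makes this an 18-layer network. As a small sketch, the snippet below counts the weight layers and traces the feature map side through the four stages for a 224x224 input. The helper names are made up, the count follows the usual convention of ignoring the 1x1 shortcut convolutions, and the `[3, 4, 6, 3]` configuration for ResNet-34 comes from the ResNet paper rather than from the code above.

```python
def resnet_depth(blocks_per_stage, convs_per_block=2):
    """Stem convolution + convolutions in the residual blocks + final fully connected layer."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


def out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def stage_sizes(input_size=224):
    """Feature map side after each of the four residual stages."""
    size = out_size(input_size, kernel=7, stride=2, padding=3)  # stem convolution
    size = out_size(size, kernel=3, stride=2, padding=1)        # stem max pooling
    sizes = [size]                                              # stage 1 keeps the size (stride 1)
    for _ in range(3):                                          # stages 2-4 start with a stride-2 conv
        size = out_size(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    return sizes


print("ResNet-18 depth:", resnet_depth([2, 2, 2, 2]))
print("ResNet-34 depth:", resnet_depth([3, 4, 6, 3]))
print("stage sizes:", stage_sizes())
```

The last stage ends at 7x7 with 512 channels, which the adaptive average pooling then reduces to a single 512-dimensional vector for the final linear layer.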