Dive into Deep Learning (《动手学深度学习》): Chapter 3 Notes - CSDN blog

Article link: https://blog.csdn.net/sengyuweiyanga/article/details/130323615

Chapter 3: Linear Neural Networks

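01 What is an affine transformation

An affine transformation is the composition of two simpler transformations: a linear transformation followed by a translation.
Affine transformations include scaling, translation, rotation, reflection (mirroring a figure) and shear mapping (the figure is slanted, somewhat like a skewed shadow). Straight lines remain straight and parallel lines remain parallel after the transformation.
An affine transformation preserves the following properties of a set of points:
(1) Convexity
(2) Collinearity: points that lie on one line before the transformation still lie on one line afterwards
(3) Parallelism: two lines that are parallel before the transformation are still parallel afterwards
(4) Ratios along a line: the ratio of the lengths of two segments on the same line is the same before and after the transformation
Mathematical form of an affine transformation: $f(x) = A x + b, x \in X$
- The weights apply a linear transformation to the features, and the bias term applies a translation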
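
The invariants listed above can be checked numerically. The following is a minimal sketch in plain Python; the matrix A, the offset b and the three sample points are illustrative values chosen for this example, not taken from the book.

```python
# Minimal sketch: apply f(x) = A x + b to 2-D points using plain Python lists.
# A, b and the sample points are illustrative values.

def affine(A, b, x):
    """Return A x + b for a 2x2 matrix A and 2-vectors b and x."""
    return [
        A[0][0] * x[0] + A[0][1] * x[1] + b[0],
        A[1][0] * x[0] + A[1][1] * x[1] + b[1],
    ]

def cross(p, q, r):
    """Twice the signed area of triangle pqr; zero means the points are collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

A = [[2.0, 1.0], [0.0, 3.0]]   # linear part (scaling plus shear)
b = [5.0, -1.0]                # translation

# Three collinear points; q is the midpoint of p and r.
p, q, r = [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]
fp, fq, fr = (affine(A, b, v) for v in (p, q, r))

print(cross(p, q, r))     # 0.0: collinear before the transformation
print(cross(fp, fq, fr))  # 0.0: still collinear afterwards
print(fq == [(fp[0] + fr[0]) / 2, (fp[1] + fr[1]) / 2])  # True: the midpoint stays the midpoint
```

The last check illustrates property (4): the 1:1 ratio of the two segments is unchanged.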

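02 What is a hyperparameter

Parameters that can be tuned but are not updated during the training loop are called hyperparameters. We usually adjust them based on the results of training runs, and those results are evaluated on a separate validation dataset.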

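03 The isinstance() function

isinstance() checks whether an object is an instance of a known type, much like type().

Differences between isinstance() and type():

type() does not treat an instance of a subclass as an instance of the parent class; it ignores inheritance.
isinstance() treats an instance of a subclass as an instance of the parent class; it takes inheritance into account.

The syntax of isinstance() is:

isinstance(object, classinfo)

object – the instance to check.
classinfo – a class (direct or indirect), a built-in type, or a tuple of them.

Example:

>>> a = 2
>>> isinstance(a, int)
True
>>> isinstance(a, str)
False
>>> isinstance(a, (str, int, list))    # True if a is an instance of any type in the tuple
True

type() versus isinstance():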

class A:
    pass
 
class B(A):
    pass
 
isinstance(A(), A)    # returns True
type(A()) == A        # returns True
isinstance(B(), A)    # returns True
type(B()) == A        # returns False

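04 torch.normal()

The function signature is:

normal(mean, std, *, generator=None, out=None)

It returns a tensor of random numbers drawn from separate normal distributions whose means are given by mean and whose standard deviations are given by std. When mean and std are both plain floats, an additional size argument gives the shape of the output.

Usage: draw a 2x2 matrix from the standard normal distribution N(0, 1):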

torch.normal(mean=0.,std=1.,size=(2,2))

We can also let each element follow a different normal distribution, again producing a 2x2 matrix:

torch.normal(mean=torch.arange(4.),std=torch.arange(1.,0.6,-0.1)).reshape(2,2)
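
To make the two call forms easier to compare, here is a small sketch that writes the per-element means and standard deviations out explicitly. The seed and the concrete values are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so that repeated runs give the same draws

# One distribution for every element: N(0, 1), with the shape given by `size`.
a = torch.normal(mean=0.0, std=1.0, size=(2, 2))

# One distribution per element: element i is drawn from a normal
# distribution with mean mean[i] and standard deviation std[i].
mean = torch.tensor([0.0, 1.0, 2.0, 3.0])
std = torch.tensor([1.0, 0.9, 0.8, 0.7])
b = torch.normal(mean=mean, std=std).reshape(2, 2)

print(a.shape, b.shape)  # torch.Size([2, 2]) torch.Size([2, 2])
```

In the second form the output has the same shape as mean, which is why the reshape is needed to get a 2x2 matrix.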

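05 Accessing and modifying model parameters

I. If the task is fairly simple, you can write your own backbone network. We use the early LeNet-5 as an example

1. First build a simplified LeNet-5 network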

import torch
from torch import nn

class Reshape(nn.Module):
    def forward(self, x):
        return x.view(-1, 1, 28, 28)

# simple LeNet-5
model = nn.Sequential(Reshape(),
                      nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(), nn.AvgPool2d(kernel_size=2, stride=2),
                      nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(), nn.AvgPool2d(kernel_size=2, stride=2), nn.Flatten(),
                      nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
                      nn.Linear(120, 84), nn.Sigmoid(),
                      nn.Linear(84, 10))

2. Test the model

x = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
print(model(x))
''' 
tensor([[ 0.2852,  0.0855, -0.5000,  0.9975, -0.3090, -0.1866,  0.2211,  0.0441,
         -0.1407,  0.2835]], grad_fn=<AddmmBackward>)
'''

for layer in model:
	x = layer(x)
	print(layer.__class__.__name__, 'output shape: \t', x.shape)
'''
Reshape output shape: 	 torch.Size([1, 1, 28, 28])
Conv2d output shape: 	 torch.Size([1, 6, 28, 28])
Sigmoid output shape: 	 torch.Size([1, 6, 28, 28])
AvgPool2d output shape: 	 torch.Size([1, 6, 14, 14])
Conv2d output shape: 	 torch.Size([1, 16, 10, 10])
Sigmoid output shape: 	 torch.Size([1, 16, 10, 10])
AvgPool2d output shape: 	 torch.Size([1, 16, 5, 5])
Flatten output shape: 	 torch.Size([1, 400])
Linear output shape: 	 torch.Size([1, 120])
Sigmoid output shape: 	 torch.Size([1, 120])
Linear output shape: 	 torch.Size([1, 84])
Sigmoid output shape: 	 torch.Size([1, 84])
Linear output shape: 	 torch.Size([1, 10])
'''

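3. Access the layers and parameters of the model

First print the model to see its overall structure. The model has 13 layers in total (for easier reading, layers without parameters are also counted as layers here)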

print(model)
'''
Sequential(
  (0): Reshape()
  (1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (2): Sigmoid()
  (3): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (4): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (5): Sigmoid()
  (6): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (7): Flatten()
  (8): Linear(in_features=400, out_features=120, bias=True)
  (9): Sigmoid()
  (10): Linear(in_features=120, out_features=84, bias=True)
  (11): Sigmoid()
  (12): Linear(in_features=84, out_features=10, bias=True)
)
'''

When a model is defined with the Sequential class, any layer can be accessed by index, as if the model were a list
(1) Access layer by layer

model[0]  # Reshape()
model[1]  # Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
model[2]  # Sigmoid()
model[3]  # AvgPool2d(kernel_size=2, stride=2, padding=0)
model[4]  # Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
model[5]  # Sigmoid()
model[6]  # AvgPool2d(kernel_size=2, stride=2, padding=0)
model[7]  # Flatten()
model[8]  # Linear(in_features=400, out_features=120, bias=True)
model[9]  # Sigmoid()
model[10] # Linear(in_features=120, out_features=84, bias=True)
model[11] # Sigmoid()
model[12] # Linear(in_features=84, out_features=10, bias=True)
# or
for layer in model:
	print(layer)

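(2) Access the parameters of the first convolutional layer

model[1].state_dict() # all parameters of the second layer (index 1)
model[1].weight.data  # values of the weight of the second layer
model[1].weight.grad  # gradient of the weight of the second layer (None until backward() has been called)
model[1].bias.data    # values of the bias of the second layer
model[1].bias.grad    # gradient of the bias of the second layer (None until backward() has been called)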

(3) Access all parameters of the model at once

print(*[(name, param.shape) for name, param in model.named_parameters()])
'''
Note: the pooling layers and the sigmoid activations have no learnable parameters
('1.weight', torch.Size([6, 1, 5, 5])) ('1.bias', torch.Size([6])) ('4.weight', torch.Size([16, 6, 5, 5])) ('4.bias', torch.Size([16])) ('8.weight', torch.Size([120, 400])) ('8.bias', torch.Size([120])) ('10.weight', torch.Size([84, 120])) ('10.bias', torch.Size([84])) ('12.weight', torch.Size([10, 84])) ('12.bias', torch.Size([10]))
'''

(4) Change the activation functions of the model from Sigmoid to ReLU

model[2] = nn.ReLU()  # sigmoid → relu
model[5] = nn.ReLU()  # sigmoid → relu
model[9] = nn.ReLU()  # sigmoid → relu
model[11] = nn.ReLU() # sigmoid → relu
print(model)
'''
Sequential(
  (0): Reshape()
  (1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (2): ReLU()
  (3): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (4): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (5): ReLU()
  (6): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (7): Flatten()
  (8): Linear(in_features=400, out_features=120, bias=True)
  (9): ReLU()
  (10): Linear(in_features=120, out_features=84, bias=True)
  (11): ReLU()
  (12): Linear(in_features=84, out_features=10, bias=True)
)
'''

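(5) Replace the output layer and append two more fully connected layers to the end of the model. '13', '14', '15' and '16' are the layer names, and each layer can be looked up by its name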

model[12] = nn.Linear(84, 32)
model.add_module('13', nn.ReLU())
model.add_module('14', nn.Linear(32, 16))
model.add_module('15', nn.ReLU())
model.add_module('16', nn.Linear(16, 10))
print(model)
'''
Sequential(
  (0): Reshape()
  (1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (2): ReLU()
  (3): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (4): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (5): ReLU()
  (6): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (7): Flatten()
  (8): Linear(in_features=400, out_features=120, bias=True)
  (9): ReLU()
  (10): Linear(in_features=120, out_features=84, bias=True)
  (11): ReLU()
  (12): Linear(in_features=84, out_features=32, bias=True)
  (13): ReLU()
  (14): Linear(in_features=32, out_features=16, bias=True)
  (15): ReLU()
  (16): Linear(in_features=16, out_features=10, bias=True)
)

'''

(6) Delete a layer from the model

del model[-1] # delete the last fully connected layer that was just added
print(model)
'''
Sequential(
  (0): Reshape()
  (1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (2): ReLU()
  (3): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (4): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (5): ReLU()
  (6): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (7): Flatten()
  (8): Linear(in_features=400, out_features=120, bias=True)
  (9): ReLU()
  (10): Linear(in_features=120, out_features=84, bias=True)
  (11): ReLU()
  (12): Linear(in_features=84, out_features=32, bias=True)
  (13): ReLU()
  (14): Linear(in_features=32, out_features=16, bias=True)
  (15): ReLU()
)
'''

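II. Sometimes we need a pretrained model (for example, a ResNet50 pretrained on ImageNet) as the backbone network

1. Suppose we have a multi-class task (100 classes, say) and want to use ResNet50 as the backbone of our model. We need to adjust the final fully connected layer of ResNet50 to fit the task. First load ResNet50 from the model zoo in torchvision

import torchvision

# The first run takes a while to download the weights. On Windows the default location is C:\Users\XXX\.cache\torch\hub\checkpoints, where XXX is your user name.
# Note: torchvision 0.13 and later deprecate pretrained=True in favor of the weights argument, e.g. weights=torchvision.models.ResNet50_Weights.DEFAULT
resnet50 = torchvision.models.resnet50(pretrained=True)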
print(resnet50)

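2. For easier access, convert the model into the familiar Sequential class. Note that ResNet.forward also calls torch.flatten between avgpool and fc; children() does not capture that call, so an nn.Flatten() has to be inserted before the final Linear layer if this Sequential model is to be run on inputs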

baseline = nn.Sequential(*list(resnet50.children())) 
print(baseline)
'''
Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (5): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (6): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (7): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (8): AdaptiveAvgPool2d(output_size=(1, 1))
  (9): Linear(in_features=2048, out_features=1000, bias=True)
)
'''

3. We now have ResNet50 as a Sequential model, just like in Part I, so the methods from Part I can be used to access and modify it

(1) This is a 100-class task, so the output size of the final fully connected layer of ResNet50 must be changed to 100

baseline[9] = nn.Linear(2048, 100)

(2) Access the conv2 convolutional layer in Bottleneck 0 of Sequential block 7

baseline[7][0].conv2 # Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)

(3) Change the relu activation in Bottleneck 0 of Sequential block 7 to a sigmoid activation

baseline[7][0].relu = nn.Sigmoid()

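(4) Add a relu activation after each of the three convolutional layers (conv1, conv2, conv3) in Bottleneck 0 of Sequential block 6. Calling add_module('relu', ...) on a Conv2d only registers a child module that Conv2d.forward never calls, so each convolution is wrapped in a Sequential instead

baseline[6][0].conv1 = nn.Sequential(baseline[6][0].conv1, nn.ReLU())
baseline[6][0].conv2 = nn.Sequential(baseline[6][0].conv2, nn.ReLU())
baseline[6][0].conv3 = nn.Sequential(baseline[6][0].conv3, nn.ReLU())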

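06 One-hot encoding

A one-hot encoding is a vector with as many components as there are classes. The component corresponding to the sample's class is set to 1 and all other components are set to 0.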
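
As a quick illustration, the definition can be written as a few lines of plain Python. The helper name one_hot and the three class names are assumptions made for this sketch.

```python
def one_hot(label, num_classes):
    """Return a list of length num_classes with 1 at index `label` and 0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1 if i == label else 0 for i in range(num_classes)]

classes = ["cat", "chicken", "dog"]  # illustrative class names
print(one_hot(classes.index("chicken"), len(classes)))  # [0, 1, 0]
```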

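08 The softmax function

The softmax function turns unnormalized predictions into non-negative values that sum to 1 while keeping the model differentiable. To do this, first exponentiate each unnormalized prediction, which guarantees non-negative outputs. Then, so that the resulting probabilities sum to 1, divide each exponentiated value by the sum of all of them.

$\hat{\mathbf y} = \mathrm{softmax}(\mathbf o), \quad \text{where} \quad \hat y_j = \frac{\exp(o_j)}{\sum_k \exp(o_k)}$

The softmax operation does not change the ordering of the unnormalized predictions $\mathbf o$; it only determines the probability assigned to each class. Although softmax is a nonlinear function, the output of softmax regression is still determined by an affine transformation of the input features, so softmax regression is a linear model.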
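
The two steps (exponentiate, then normalize) can be sketched in plain Python as follows. Subtracting the maximum before exponentiating is a common numerical-stability trick that is assumed here; it does not change the result because the factor cancels in the ratio. The input values are illustrative.

```python
import math

def softmax(o):
    """Exponentiate each logit, then divide by the sum of the exponentials."""
    m = max(o)  # shifting by the max avoids overflow and leaves the ratios unchanged
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def rank(xs):
    """Indices of xs sorted from smallest to largest value."""
    return sorted(range(len(xs)), key=xs.__getitem__)

o = [1.0, -2.0, 3.0]  # illustrative unnormalized predictions
y_hat = softmax(o)

print(y_hat)
print(all(p >= 0 for p in y_hat))      # True: every output is non-negative
print(abs(sum(y_hat) - 1.0) < 1e-12)   # True: the outputs sum to 1
print(rank(o) == rank(y_hat))          # True: the ordering is preserved
```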

09 Cross-entropy loss

[ref] Loss functions: cross-entropy explained - Zhihu (zhihu.com)

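10 The random.shuffle function

random.shuffle shuffles the elements of a list. Note that it does not create a new list; it reorders the original list in place and returns None.

# shuffle() usage example
import random

x = [i for i in range(10)]
print(x)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
random.shuffle(x)
print(x)
# [2, 5, 4, 8, 0, 3, 7, 9, 1, 6]  (one possible order; it changes from run to run)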

11 Using yield and next

[ref] A detailed explanation of yield in Python, the simplest and clearest one (python yield) - 冯爽朗's blog, CSDN
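
As a brief, hedged illustration of how yield and next interact, the sketch below builds a minibatch generator in plain Python. The function name data_iter and the sample data are assumptions for this example; it only mirrors the general shuffle-then-yield pattern, not any particular implementation.

```python
import random

def data_iter(batch_size, samples):
    """Yield the samples in random order, batch_size at a time."""
    indices = list(range(len(samples)))
    random.shuffle(indices)  # shuffles the index list in place
    for i in range(0, len(indices), batch_size):
        yield [samples[j] for j in indices[i:i + batch_size]]

gen = data_iter(4, list(range(10)))  # calling the function only creates the generator
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes after that yield and drains the remaining batches
print(len(first), [len(batch) for batch in rest])  # 4 [4, 2]
```

Each call to next resumes the function body where the previous yield left off, so only one batch is materialized at a time.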

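12 Using argparse.ArgumentParser()

argparse is the command-line argument parsing package in the Python standard library. It makes reading command-line arguments easy. When parameters have to be changed often, argparse separates them from the code, which keeps the code cleaner and more widely applicable.

1. Basic skeleton

The following uses argparse to read a user name from the command line and print 'Hello ' + the user name. Assume the Python file is named print_name.py: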

# file-name:print_name.py
import argparse

def get_parser():
    parser = argparse.ArgumentParser(description="Demo of argparse")
    parser.add_argument('--name', default='Great')
    
    return parser


if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    name = args.name
    print('Hello {}'.format(name))

Run the following command on the command line:

$ python print_name.py --name Wang
Hello Wang

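The code above first imports the argparse package and then creates a parser object with argparse.ArgumentParser. Its description argument is the description of the program that is shown when help is displayed on the command line. Arguments are then added with the add_argument method of the parser. Here we add a single --name argument; its default parameter is the value used when the argument is not supplied. That is, running the command like this:

$ python print_name.py

prints:

Hello Great

Finally, the parse_args method of the parser returns all arguments as args, and args.name gives the value of the --name argument. By default, the attribute name is the string that follows -- in the option.
That is the whole flow. The most commonly used parameters of add_argument are described below; they cover most command-line parsing tasks in research and at work.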

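2. default: the value used when the argument is not set

As in the example above, default is the value the program uses when the argument is not given on the command line.

3. required: whether the argument must be set

With required=True, running without the argument is an error:

...
parser.add_argument('--name', required=True)
...

Running the following command then fails:

$ python print_name.py
usage: print_name.py [-h] --name NAME
print_name.py: error: argument --name is required

4. type: the argument type

By default an argument is parsed as str. If the program needs an integer or a float, set type=int or type=float. Do not use type=bool for boolean flags: bool() of any non-empty string, including 'False', is True, so action='store_true' is the usual choice instead. Below is an example that prints a square: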

#name: square.py
import argparse

def get_parser():
    parser = argparse.ArgumentParser(
        description='Calculate square of a given number')
    parser.add_argument('-number', type=int)

    return parser


if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    res = args.number ** 2
    print('square of {} is {}'.format(args.number, res))

Run:

$ python square.py -number 5
square of 5 is 25
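
One caveat with the type parameter is worth a short demonstration: type=bool does not parse 'False' as False, because bool() of any non-empty string is True. The sketch below contrasts it with action='store_true'; the option names --naive and --verbose are made up for this example.

```python
import argparse

parser = argparse.ArgumentParser(description="bool flag demo")
parser.add_argument("--naive", type=bool, default=False)  # bool("False") is True
parser.add_argument("--verbose", action="store_true")     # False unless the flag is present

# parse_args accepts an explicit list, which is handy for trying things out.
args = parser.parse_args(["--naive", "False", "--verbose"])
print(args.naive)    # True, because any non-empty string is truthy
print(args.verbose)  # True

args_empty = parser.parse_args([])
print(args_empty.naive, args_empty.verbose)  # False False
```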

5. choices: the value must be one of a fixed set of options

For example:

# file-name: choices.py
import argparse

def get_parser():
    parser = argparse.ArgumentParser(
        description='choices demo')
    parser.add_argument('-arch', required=True, choices=['alexnet', 'vgg'])

    return parser

if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    print('the arch of CNN is {}'.format(args.arch))

Running it like this produces an error:

$ python choices.py -arch resnet
usage: choices.py [-h] -arch {alexnet,vgg}
choices.py: error: argument -arch: invalid choice: 'resnet' (choose from 'alexnet', 'vgg')

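The error occurs because resnet, the value given for -arch, is not among the allowed choices.

6. help: the description of an argument

When help is displayed, the help value tells users of the tool what the argument is for. For large projects help is well worth setting; otherwise users will not understand what each argument means, which makes the tool harder to use.
Here is an example: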

# file-name: help.py
import argparse

def get_parser():
    parser = argparse.ArgumentParser(
        description='help demo')
    parser.add_argument('-arch', required=True, choices=['alexnet', 'vgg'],
        help='the architecture of CNN, at this time we only support alexnet and vgg.')

    return parser


if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    print('the arch of CNN is {}'.format(args.arch))

Running the command with -h or --help prints the help message:

$ python help.py -h
usage: help.py [-h] -arch {alexnet,vgg}

help demo

optional arguments:
  -h, --help           show this help message and exit
  -arch {alexnet,vgg}  the architecture of CNN, at this time we only support
                       alexnet and vgg.

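7. dest: the attribute name of the argument in code

By default argparse names the attribute after the string that follows -- or -, but dest=xxx sets the name explicitly, and the value is then read as args.xxx.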
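
A minimal sketch of both cases follows. The option names --learning-rate and --batch and the attribute name batch_size are hypothetical, chosen only for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description="dest demo")
# Without dest, the attribute name comes from the long option; hyphens become underscores.
parser.add_argument("--learning-rate", type=float, default=0.1)
# With dest, the attribute name is set explicitly.
parser.add_argument("-b", "--batch", dest="batch_size", type=int, default=32)

args = parser.parse_args(["--learning-rate", "0.03", "-b", "64"])
print(args.learning_rate)  # 0.03
print(args.batch_size)     # 64
```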

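8. nargs: how many values the argument accepts

Usage:

parser.add_argument('-name', nargs=x)

The possible values of x and their meanings:

Value   Meaning
N       exactly N values (for example 3)
'?'     0 or 1 value
'*'     0 or more values
'+'     1 or more values

For example: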

# file-name: nargs.py
import argparse

def get_parser():
    parser = argparse.ArgumentParser(
        description='nargs demo')
    parser.add_argument('-name', required=True, nargs='+')

    return parser


if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    names = ', '.join(args.name)
    print('Hello to {}'.format(names))

The command and its output:

$ python nargs.py -name A B C
Hello to A, B, C

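13 Variable in PyTorch

Deprecated in current PyTorch versions: since PyTorch 0.4, Variable has been merged into Tensor, and autograd works on tensors directly.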

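14 The chain function of the itertools module

itertools is a Python module with a collection of functions for working with iterators. They make it easy to traverse iterables such as lists and strings. chain() is one of these functions.

What chain() does

It takes a series of iterables and returns a single iterator. It joins all the iterables together and yields their elements one after another. The result is a lazy iterator, so its contents cannot be inspected directly; convert it explicitly (for example with list()) to see them. In the itertools documentation this function is listed under the iterators that terminate on the shortest input sequence.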
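
A short sketch of this behavior, with made-up inputs:

```python
from itertools import chain

letters = "ab"
numbers = [1, 2]

it = chain(letters, numbers)  # lazy: nothing is copied yet
first = next(it)              # consumes one element
remaining = list(it)          # explicit conversion to see the rest
print(first)      # a
print(remaining)  # ['b', 1, 2]

# chain.from_iterable takes one iterable of iterables instead of separate arguments.
flat = list(chain.from_iterable([[1, 2], [3], []]))
print(flat)       # [1, 2, 3]
```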

15 concatenate

numpy.concatenate() is similar to torch.cat().

16 torch.nn.init initialization

[ref] [In depth] torch.nn.init initialization (nn.init.constant) - ViatorSun's blog, CSDN

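17 model.train()

model.train() puts the model in training mode, which enables the training-time behavior of Batch Normalization and Dropout.

If the model contains BN (Batch Normalization) layers or Dropout, call model.train() before training. In training mode a BN layer normalizes with the mean and variance of each batch, and Dropout randomly drops part of the activations while the parameters are updated. Call model.eval() before evaluation to switch both back to inference behavior.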
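
The effect of the mode switch is easiest to see on Dropout alone. The following sketch assumes a toy model with a single nn.Dropout layer and an all-ones input, both chosen only for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Dropout(p=0.5))
x = torch.ones(1000)

net.train()       # training mode: dropout is active
y_train = net(x)  # each element is either 0 or 1 / (1 - p) = 2

net.eval()        # evaluation mode: dropout is the identity
y_eval = net(x)

print(net.training)                         # False
print(torch.equal(y_eval, x))               # True
print(set(y_train.tolist()) <= {0.0, 2.0})  # True
```

Surviving activations are scaled by 1 / (1 - p) during training so that their expected value matches the evaluation-mode output.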

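18 model.apply(fn)

apply is implemented in nn.Module; it recursively walks self.children() to process the submodules and then the module itself. Any network net in PyTorch is a subclass of torch.nn.Module and is therefore a module. model.apply(fn) recursively applies the function fn to every submodule of the parent module, and to the parent module model itself. It is commonly used for weight initialization, as with init_weights here.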

from torch import nn
 
def init_weights(m):
    print(m)
    if type(m) == nn.Linear:
        m.weight.data.fill_(1.0)
        m.bias.data.fill_(0)
 
model = nn.Sequential(
            nn.Linear(2, 2), 
            nn.Linear(2, 2)
        )
model.apply(init_weights)

19 Weight initialization in PyTorch

#-*-coding:utf-8-*-
import math

import torch
from torch import nn

# Initialize the model parameters
# Official forum thread: https://discuss.pytorch.org/t/weight-initilzation/157/3

# Method 1
# Define a separate weights_init function whose argument m is a torch.nn.Module (or a user-defined subclass of nn.Module),
# then initialize the parameters with net.apply()
# m.__class__.__name__ gives the class name of the module
# https://github.com/pytorch/examples/blob/master/dcgan/main.py#L90-L96
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
# m.weight.data is the convolution kernel, m.bias.data is the bias

netG = _netG(ngpu)  # create the model instance; _netG and ngpu come from the DCGAN example linked above
netG.apply(weights_init)  # recursively calls weights_init on netG and on each of its submodules

# function to be applied to each submodule

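# Method 2
# 1. Iterate over the layers of the model with net.modules() and check their types 2. Initialize the weight.data tensor of each matching layer m
# (excerpt: _make_layer and forward are omitted; the code needs import math)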
# Another initialization example from PyTorch Vision resnet implementation.
# https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L112-L118
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # Initialize the weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()


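# Method 3
# If you know the order and types of the parameters in the network, read them out one by one and initialize them with the functions in torch.nn.init
net = AlexNet(2)  # AlexNet is defined at the end of this section
params = list(net.parameters()) # params alternates between Conv2d weights and biases
# or
conv1Params = list(net.conv1.parameters())
# conv1Params[0] is the convolution kernel and conv1Params[1] is the bias
# then initialize them with functions from torch.nn.init
# (the trailing underscore marks the in-place versions; normal and constant without it are deprecated)
torch.nn.init.normal_(conv1Params[0], mean=0, std=1)
torch.nn.init.constant_(conv1Params[1], 0)

# Iterating over net.modules() yields: AlexNet,Sequential,Conv2d,ReLU,MaxPool2d,LRN,AvgPool3d....,Conv2d,...,Conv2d,...,Linear,
# of these, only Conv2d and Linear have parameters
# net.children() yields only the direct submodules: Sequential,Sequential,Sequential,Sequential,Sequential,Sequential,Sequential,Linear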


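# Appendix: definition of AlexNet
# LRN is a user-defined local response normalization module that is not shown in these notes
class AlexNet(nn.Module):
    def __init__(self, num_classes = 2): # two classes by default: cat and dog
#         super().__init__() # python3
        super(AlexNet, self).__init__()
        # Build the AlexNet model: 5 convolutional layers and 3 fully connected layers
        # 5 convolutional layers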
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            LRN(local_size=5, bias=1, alpha=1e-4, beta=0.75, ACROSS_CHANNELS=True)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, groups=2, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            LRN(local_size=5, bias=1, alpha=1e-4, beta=0.75, ACROSS_CHANNELS=True)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)
        )
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)
        )
        self.conv5 = nn.Sequential(
            nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        # 3 fully connected layers
        # In the forward pass the input must first go through view, which flattens the 3D tensor of each sample to 1D
        self.fc6 = nn.Sequential(
            nn.Linear(in_features=6*6*256, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout()
        )
        self.fc7 = nn.Sequential(
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout()
        )
        self.fc8 = nn.Linear(in_features=4096, out_features=num_classes)

    def forward(self, x):
        x = self.conv5(self.conv4(self.conv3(self.conv2(self.conv1(x)))))
        x = x.view(-1, 6*6*256)
        x = self.fc8(self.fc7(self.fc6(x)))
        return x

20 normal_ and fill_ in PyTorch

For a tensor a, a.normal_() fills a with samples from the standard normal distribution. It is an in-place operation:

a = torch.ones([2, 3])
a
>>> tensor([[1., 1., 1.],
            [1., 1., 1.]])

a.normal_()
a
>>> tensor([[ 1.7832,  0.2113,  1.7834],
            [ 1.0034, -0.3221, -0.0002]])

For a tensor b, b.fill_(0) fills b with the value 0. It is also an in-place operation:

b = torch.rand(2, 3)
b
>>> tensor([[0.2874, 0.2361, 0.5070],
            [0.6133, 0.1354, 0.3598]])

b.fill_(0)
b
>>> tensor([[0., 0., 0.],
            [0., 0., 0.]])

These two functions are typically used to initialize the parameters of a neural network, for example:

import torch.nn as nn 
net = nn.Linear(16, 2)

for m in net.modules():
	if isinstance(m, nn.Linear):
		m.weight.data.normal_(mean = 0, std=0.01)
		m.bias.data.fill_(0.0)

21 Python generators and iterators

[ref] Python iterators and generators explained - Zhihu (zhihu.com)

22 torch.full() and torch.full_like()

torch.full(size, fill_value, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

torch.full_like(input, fill_value, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor

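The two functions are nearly identical, so they are covered together. torch.full creates a tensor of the given size in which every element is fill_value; torch.full_like does the same but takes the size (and, by default, the dtype and device) from the input tensor.
Usage: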

import torch
a = torch.full((3, 4), 5)

b = torch.full_like(a, 6)
a, b
>>> (tensor([[5, 5, 5, 5],
			 [5, 5, 5, 5],
			 [5, 5, 5, 5]]),
     tensor([[6, 6, 6, 6],
             [6, 6, 6, 6],
             [6, 6, 6, 6]]))