Parameter Initialization Methods for Neural Networks

AutoFerry

Article link: https://blog.csdn.net/qq_39809262/article/details/117479739

Parameter initialization has a large effect on a model: different initialization schemes can lead to drastically different results.

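How PyTorch initializes parameters is not immediately obvious. If you build a model in the most primitive way, you have to define every parameter yourself. That makes it easy to choose how each variable is initialized, but it does not scale to complex models. Since Sequential and Module are the recommended way to define models, we need to know how to customize initialization for them.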

1. Initializing with NumPy

PyTorch is a very flexible framework that lets us operate on essentially any Tensor, so we can initialize a parameter by defining a new Tensor and assigning it.

import numpy as np
import torch
from torch import nn

# Define a Sequential model
net1 = nn.Sequential(
    nn.Linear(30, 40),
    nn.ReLU(),
    nn.Linear(40, 50),
    nn.ReLU(),
    nn.Linear(50, 10)
)

# Access the first layer's parameters through .weight and .bias
w1 = net1[0].weight
b1 = net1[0].bias

print(type(w1)) # Parameter

<class 'torch.nn.parameter.Parameter'>

w1.data

tensor([[-0.0415, -0.0698, -0.0271,  ...,  0.1236,  0.1427, -0.0937],
        [ 0.1146, -0.1277,  0.0939,  ..., -0.1292,  0.1174,  0.0545],
        [-0.1797,  0.1133,  0.0326,  ...,  0.1709, -0.0763, -0.1533],
        ...,
        [-0.1759, -0.1023, -0.1474,  ..., -0.1568, -0.0180,  0.0122],
        [-0.0129, -0.1814,  0.0708,  ...,  0.0646, -0.1447,  0.0313],
        [ 0.0918, -0.0959, -0.1383,  ..., -0.1123,  0.0753,  0.1391]])

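# Define a new Tensor and replace the weight directly.
# Note: np.random.uniform returns float64, so the weight becomes float64 as well
# (see the dtype in the output below); append .float() to keep it in float32.
net1[0].weight.data = torch.from_numpy(np.random.uniform(3, 5, size=(40, 30)))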

print(net1[0].weight)

Parameter containing:
tensor([[4.7496, 3.6950, 4.2277,  ..., 4.5078, 4.5376, 4.3453],
        [3.9391, 3.9205, 3.2906,  ..., 3.6608, 3.1979, 3.6794],
        [4.7851, 4.5606, 3.0872,  ..., 4.9677, 4.7213, 4.1803],
        ...,
        [4.6278, 3.5423, 4.3622,  ..., 4.9560, 4.7099, 3.6241],
        [4.5235, 4.8335, 3.1930,  ..., 4.3652, 3.9845, 4.2987],
        [3.7621, 4.9450, 3.1637,  ..., 4.4203, 4.2376, 3.7132]],
       dtype=torch.float64, requires_grad=True)

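The parameter values have changed, which means the layer now has the initialization we wanted. If a single layer in the model needs manual adjustment, we can access it directly like this. More often, though, all layers of the same type should be initialized the same way, and it is more efficient to loop over the layers, for example: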

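for layer in net1:
    if isinstance(layer, nn.Linear):  # only touch the linear layers
        param_shape = layer.weight.shape
        # Normal distribution with mean 0 and standard deviation 0.5
        # (the second argument of np.random.normal is the standard deviation, not the variance).
        # The samples are float64; append .float() to keep the weights in float32.
        layer.weight.data = torch.from_numpy(np.random.normal(0, 0.5, size=param_shape))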
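One caveat with the NumPy route is that NumPy samples are float64, so assigning them to `.data` changes the layer's dtype, and a later forward pass with float32 inputs can fail. Below is a minimal sketch of a dtype-preserving alternative. It assumes an in-place copy under `torch.no_grad()` is acceptable, and the helper name `init_linear_from_numpy` and its `seed` argument are made up for illustration.

```python
import numpy as np
import torch
from torch import nn


def init_linear_from_numpy(layer, std=0.5, seed=0):
    """Hypothetical helper: fill a Linear layer's weight with NumPy samples.

    copy_ writes into the existing tensor, so the weight keeps its original
    dtype (float32 by default) even though NumPy produces float64.
    """
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, std, size=tuple(layer.weight.shape))
    with torch.no_grad():
        layer.weight.copy_(torch.from_numpy(values))


net = nn.Sequential(nn.Linear(30, 40), nn.ReLU(), nn.Linear(40, 10))
for layer in net:
    if isinstance(layer, nn.Linear):
        init_linear_from_numpy(layer)

print(net[0].weight.dtype)     # the dtype is unchanged by the copy
out = net(torch.randn(2, 30))  # forward pass with ordinary float32 input
print(out.shape)
```

Copying in place also keeps the same Parameter object, so an optimizer created earlier still refers to the right tensor.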

2. [Xavier initialization](http://proceedings.mlr.press/v9/glorot10a.html)

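The formula for this initialization is:

$w \sim \mathrm{Uniform}\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}, \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]$

where $n_j$ and $n_{j+1}$ are the number of inputs and outputs of the layer. Try implementing this initialization yourself.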
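One possible solution to this exercise is sketched below. It assumes the layer is an `nn.Linear`, whose weight has shape `(out_features, in_features)`. The helper name `xavier_uniform_manual` is hypothetical and not part of PyTorch.

```python
import math
import torch
from torch import nn


def xavier_uniform_manual(layer):
    """Hypothetical helper: apply the Uniform[-a, a] rule above to a Linear layer."""
    n_out, n_in = layer.weight.shape          # nn.Linear stores weight as (out, in)
    bound = math.sqrt(6.0) / math.sqrt(n_in + n_out)
    with torch.no_grad():
        layer.weight.uniform_(-bound, bound)  # in-place, keeps dtype and shape
    return bound


layer = nn.Linear(30, 40)
bound = xavier_uniform_manual(layer)
print(bound)
print(layer.weight.min().item(), layer.weight.max().item())
```

For a 30-to-40 layer the bound is $\sqrt{6}/\sqrt{70}$, so every sampled weight should lie within roughly ±0.29.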

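Initializing the parameters of a Module is also very simple. To initialize one particular layer, you can redefine its Tensor directly, just as with Sequential. The only difference is that looping over the layers requires two methods, children and modules.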

class sim_net(nn.Module):
    def __init__(self):
        super(sim_net, self).__init__()
        self.l1 = nn.Sequential(
            nn.Linear(30, 40),
            nn.ReLU()
        )
        self.l1[0].weight.data = torch.randn(40, 30)  # initialize one specific layer directly
        self.l2 = nn.Sequential(
            nn.Linear(40, 50),
            nn.ReLU()
        )
        self.l3 = nn.Sequential(
            nn.Linear(50, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.l1(x)
        x = self.l2(x)
        x = self.l3(x)
        return x

net2 = sim_net()

# Iterate over children
for i in net2.children():
    print(i)

Sequential(
  (0): Linear(in_features=30, out_features=40, bias=True)
  (1): ReLU()
)
Sequential(
  (0): Linear(in_features=40, out_features=50, bias=True)
  (1): ReLU()
)
Sequential(
  (0): Linear(in_features=50, out_features=10, bias=True)
  (1): ReLU()
)

# Iterate over modules
for i in net2.modules():
    print(i)

sim_net(
  (l1): Sequential(
    (0): Linear(in_features=30, out_features=40, bias=True)
    (1): ReLU()
  )
  (l2): Sequential(
    (0): Linear(in_features=40, out_features=50, bias=True)
    (1): ReLU()
  )
  (l3): Sequential(
    (0): Linear(in_features=50, out_features=10, bias=True)
    (1): ReLU()
  )
)
Sequential(
  (0): Linear(in_features=30, out_features=40, bias=True)
  (1): ReLU()
)
Linear(in_features=30, out_features=40, bias=True)
ReLU()
Sequential(
  (0): Linear(in_features=40, out_features=50, bias=True)
  (1): ReLU()
)
Linear(in_features=40, out_features=50, bias=True)
ReLU()
Sequential(
  (0): Linear(in_features=50, out_features=10, bias=True)
  (1): ReLU()
)
Linear(in_features=50, out_features=10, bias=True)
ReLU()

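children only visits the first level of the model definition. The model above defines three Sequential blocks, so children yields only those three. modules, by contrast, walks the whole structure recursively, starting with the model itself: in the example above it visits each Sequential and also the layers inside it. This is very convenient for initialization, for example: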

for layer in net2.modules():
    if isinstance(layer, nn.Linear):  # isinstance() is the recommended way to check a module's type
        param_shape = layer.weight.shape
        layer.weight.data = torch.from_numpy(np.random.normal(0, 0.5, size=param_shape))

Using a list as an analogy: a = [1, 2, [3, 4]]

children returns:

1, 2, [3, 4]

modules returns:

[1, 2, [3, 4]], 1, 2, [3, 4], 3, 4
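The difference can also be checked by counting what each iterator yields. The sketch below rebuilds the same three-block layout under the hypothetical name `SimNet` so that it runs on its own, and uses `named_modules()` to show the dotted path of each Linear layer.

```python
from torch import nn


class SimNet(nn.Module):  # same layout as sim_net above, renamed for this sketch
    def __init__(self):
        super().__init__()
        self.l1 = nn.Sequential(nn.Linear(30, 40), nn.ReLU())
        self.l2 = nn.Sequential(nn.Linear(40, 50), nn.ReLU())
        self.l3 = nn.Sequential(nn.Linear(50, 10), nn.ReLU())

    def forward(self, x):
        return self.l3(self.l2(self.l1(x)))


net = SimNet()
n_children = len(list(net.children()))
n_modules = len(list(net.modules()))
linear_names = [name for name, m in net.named_modules() if isinstance(m, nn.Linear)]

print(n_children)    # 3: the three Sequential containers
print(n_modules)     # 10: the model itself + 3 Sequential + 3 Linear + 3 ReLU
print(linear_names)  # ['l1.0', 'l2.0', 'l3.0']
```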

3. torch.nn.init

Thanks to PyTorch's flexibility we can initialize parameters by operating on Tensors directly. PyTorch also provides functions for quick initialization in torch.nn.init, which likewise operate on Tensors:

from torch.nn import init

print(net1[0].weight)

Parameter containing:
tensor([[ 0.3859,  0.4535,  0.0696,  ..., -1.0227,  0.9353, -0.4104],
        [-0.4560,  0.1488, -0.1437,  ...,  0.9651,  0.3467,  0.4422],
        [ 0.0016,  0.0452, -0.5707,  ..., -1.0326,  0.1063,  0.2163],
        ...,
        [ 0.5122, -0.2351, -0.2402,  ..., -0.0511,  0.1905,  0.1106],
        [-0.8786, -1.0855,  0.0846,  ..., -0.6484, -0.2868, -0.7436],
        [-0.6684, -0.1849, -0.1377,  ..., -0.0652, -0.0663,  0.1641]],
       dtype=torch.float64, requires_grad=True)

init.xavier_uniform_(net1[0].weight)  # the Xavier initialization described above; PyTorch ships it built in

Parameter containing:
tensor([[ 0.1515, -0.2174,  0.1019,  ...,  0.1673, -0.1855,  0.1930],
        [-0.1304,  0.0215, -0.1496,  ..., -0.2263,  0.0526,  0.1186],
        [-0.2530,  0.2620,  0.1042,  ..., -0.2545, -0.0648, -0.1097],
        ...,
        [ 0.0597,  0.2584,  0.1990,  ..., -0.2716,  0.0210, -0.0741],
        [ 0.1171, -0.1044, -0.2067,  ..., -0.0768,  0.1825, -0.2877],
        [ 0.2399,  0.0216,  0.2085,  ..., -0.1675,  0.2450, -0.2347]],
       dtype=torch.float64, requires_grad=True)

print(net1[0].weight)

Parameter containing:
tensor([[ 0.1515, -0.2174,  0.1019,  ...,  0.1673, -0.1855,  0.1930],
        [-0.1304,  0.0215, -0.1496,  ..., -0.2263,  0.0526,  0.1186],
        [-0.2530,  0.2620,  0.1042,  ..., -0.2545, -0.0648, -0.1097],
        ...,
        [ 0.0597,  0.2584,  0.1990,  ..., -0.2716,  0.0210, -0.0741],
        [ 0.1171, -0.1044, -0.2067,  ..., -0.0768,  0.1825, -0.2877],
        [ 0.2399,  0.0216,  0.2085,  ..., -0.1675,  0.2450, -0.2347]],
       dtype=torch.float64, requires_grad=True)

We can see that the parameters have been modified in place.

torch.nn.init provides many more built-in initialization schemes, so we do not have to re-implement the same operations ourselves.
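As a rough illustration, a few of the other functions in `torch.nn.init` are shown below on a single layer. This is only a sampler, not a recommendation for any particular scheme. Each call overwrites the result of the previous one, and the specific arguments (`std=0.01`, the constant `0.1`) are arbitrary values chosen for the sketch.

```python
import torch
from torch import nn
from torch.nn import init

layer = nn.Linear(30, 40)

# Every function ending in an underscore modifies the tensor in place.
init.kaiming_uniform_(layer.weight, nonlinearity='relu')  # He initialization, uniform variant
init.xavier_normal_(layer.weight)                          # Xavier, normal variant
init.normal_(layer.weight, mean=0.0, std=0.01)             # plain Gaussian
init.constant_(layer.bias, 0.1)                            # fill with a constant
init.zeros_(layer.bias)                                    # fill with zeros

print(layer.weight.std().item())  # close to 0.01, from the last weight initializer
print(layer.bias.abs().sum().item())
```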

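This article covered two ways to initialize parameters (counting the last two sections as one). In essence they are the same: both modify the actual values of a layer's parameters. torch.nn.init simply offers more of the well-established initialization schemes used in deep learning, which makes it very convenient.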
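The loop over modules and the built-in initializers can be combined through `Module.apply`, which calls a function on every submodule recursively. The sketch below assumes Xavier weights and zero biases are wanted for every Linear layer. The function name `init_weights` and the nested layout are made up for illustration.

```python
import torch
from torch import nn
from torch.nn import init


def init_weights(m):
    """Hypothetical helper for Module.apply: Xavier weights, zero biases."""
    if isinstance(m, nn.Linear):
        init.xavier_uniform_(m.weight)
        init.zeros_(m.bias)


net = nn.Sequential(
    nn.Linear(30, 40),
    nn.ReLU(),
    nn.Sequential(nn.Linear(40, 50), nn.ReLU()),  # nested container
    nn.Linear(50, 10),
)
net.apply(init_weights)  # apply visits every submodule, including nested ones

print(net[2][0].bias.abs().sum().item())  # 0.0
```

Because `apply` recurses the same way `modules()` does, the nested Linear layer is initialized along with the top-level ones.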

Reference: Parameter Initialization Methods
