Weight Initialization
1. Vanishing and Exploding Gradients
Improper weight initialization can lead to vanishing or exploding gradients. Taking the second hidden layer of a feedforward network as an example:
$$
\begin{aligned}
\mathrm{H}_{2} &= \mathrm{H}_{1} * \mathrm{W}_{2} \\
\Delta \mathrm{W}_{2} &= \frac{\partial \mathrm{Loss}}{\partial \mathrm{W}_{2}}
= \frac{\partial \mathrm{Loss}}{\partial \mathrm{out}} * \frac{\partial \mathrm{out}}{\partial \mathrm{H}_{2}} * \frac{\partial \mathrm{H}_{2}}{\partial \mathrm{W}_{2}} \\
&= \frac{\partial \mathrm{Loss}}{\partial \mathrm{out}} * \frac{\partial \mathrm{out}}{\partial \mathrm{H}_{2}} * \mathrm{H}_{1}
\end{aligned}
$$
If H1 tends to 0, the gradient of W2 vanishes.
If H1 tends to infinity, the gradient of W2 explodes.

To avoid vanishing and exploding gradients, we therefore need to control the range of each layer's output values: they must be neither too large nor too small.
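Why does the output scale drift in the first place? A standard variance argument (assuming the inputs and weights are independent with zero mean) makes this concrete:

$$
\mathrm{H}_{j}=\sum_{i=1}^{n} x_{i} w_{ij}, \qquad
\mathrm{D}\left(\mathrm{H}_{j}\right)=\sum_{i=1}^{n} \mathrm{D}\left(x_{i}\right) \mathrm{D}\left(w_{ij}\right)=n \cdot \mathrm{D}(x) \cdot \mathrm{D}(w)
$$

so $\operatorname{std}\left(\mathrm{H}_{j}\right)=\sqrt{n} \cdot \operatorname{std}(x) \cdot \operatorname{std}(w)$. With unit-variance inputs and weights, a fully connected layer of width $n$ multiplies the standard deviation by $\sqrt{n}$, layer after layer, which is exactly the explosion demonstrated in the experiment below.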
1.1 Exploding Gradients
1. Mean 0, standard deviation 1: initialize every weight from a standard normal distribution N(0, 1) and observe how the layer outputs evolve.
```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, neural_num, layers):
        super(MLP, self).__init__()
        # 'layers' fully connected layers of equal width, without bias
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])
        self.neural_num = neural_num

    def forward(self, x):
        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            # x = torch.tanh(x)
            # x = torch.relu(x)

            # print each layer's output std and stop once it overflows to nan
            print("layer:{}, std:{}".format(i, x.std()))
            if torch.isnan(x.std()):
                print("output is nan in {} layers".format(i))
                break
        return x

    # parameter initialization: mean 0, standard deviation 1
    def initialize(self):
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight.data)  # defaults to N(0, 1)
```
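A minimal driver for this experiment might look like the following sketch (the width of 256 and depth of 100 are illustrative values, not prescribed above):

```python
# Uses the MLP class and the imports from the block above.
layer_nums = 100   # illustrative depth
neural_nums = 256  # illustrative layer width

net = MLP(neural_nums, layer_nums)
net.initialize()  # N(0, 1) weights

inputs = torch.randn((1, neural_nums))  # standard normal input, std ≈ 1
output = net(inputs)
```

With N(0, 1) weights and no activation, the printed std grows by roughly $\sqrt{256} = 16$ per layer, so it overflows float32 (and the std becomes nan) after roughly 30 layers.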