Neural Network Weights Not Updating, and Initialization
Problem
While training a hand-written neural network, I found that the network weights w were not being updated at all.
Code
The main code is as follows (activations.activator returns a function pointer to the chosen activation function):
import activations
import numpy as np
import pandas as pd

class Neuro():
    def __init__(self, cellnum, learning_rate=0.1, activators=1):
        self.cellnum = cellnum
        self.layer_num = len(cellnum) - 1
        self.learning_rate = learning_rate
        if type(activators) == int:
            self.activator = [activations.activator(activators)] * self.layer_num
        else:
            self.activator = [activations.activator(activator) for activator in activators]
        self.w, self.b = [], []  # weights and biases
        for height, width in zip(cellnum[:-1], cellnum[1:]):
            self.w.append(np.zeros((height, width)))
            self.b.append(np.zeros((1, width)))

    def fit(self, _x, _y, epochs, batch_size):
        credit_learning_rate = self.learning_rate / batch_size
        for epoch in range(epochs):
            index = np.random.permutation(_x.shape[0])
            # activated value, x[layer].shape = (batch_size, cell[layer])
            x = [np.empty((batch_size, width)) for width in self.cellnum]
            # pre-activation value, z[layer].shape = (batch_size, cell[layer+1])
            z = [np.empty((batch_size, width)) for width in self.cellnum[1:]]
            # error[layer].shape = (batch_size, cell[layer+1])
            error = [np.empty((batch_size, width)) for width in self.cellnum[1:]]
            for batch_num in range(_x.shape[0], batch_size - 1, -batch_size):
                x[0] = _x[index[batch_num - batch_size : batch_num], :]
                y = _y[index[batch_num - batch_size : batch_num], :]
                # forward propagation
                for layer in range(self.layer_num):
                    # b[layer] is broadcast to shape (batch_size, width)
                    # (automatically broadcast by numpy)
                    z[layer] = np.dot(x[layer], self.w[layer]) + self.b[layer]
                    x[layer + 1] = self.activator[layer].function()(z[layer])
                error[-1] = (x[-1] - y) * self.activator[-1].derivative()(z[-1])  # Hadamard product
                print("loss = ", np.linalg.norm(x[-1] - y))
                # back propagation
                for layer in range(self.layer_num - 2, -1, -1):
                    error[layer] = self.activator[layer].derivative()(z[layer]) * np.dot(error[layer + 1], self.w[layer + 1].transpose())
                # update
                for layer in range(self.layer_num - 1, -1, -1):
                    self.w[layer] -= np.dot(x[layer].transpose(), error[layer]) * credit_learning_rate
                    self.b[layer] -= np.sum(error[layer], axis=0) * credit_learning_rate
Cause Analysis
1. All weights initialized to zero
Note the formula for the error term in the back propagation part:

\delta^l = f'(z^l) \odot (\delta^{l+1}(w^{l+1})^T)
where \delta^l, \delta^{l+1} are the error terms of layers l and l+1, f is the activation function, and \odot denotes the Hadamard product (element-wise multiplication of matrices).
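For readers less familiar with the notation, here is a quick numpy illustration (values chosen arbitrarily) of the difference between the Hadamard product (numpy's elementwise `*`) and the ordinary matrix product (`@` / `np.dot`):

```python
import numpy as np

# Hadamard product (*) multiplies matching entries; @ is the matrix product.
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

print(a * b)   # [[ 10  40] [ 90 160]]
print(a @ b)   # [[ 70 100] [150 220]]
```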
Or see the corresponding back-propagation loop in the code:

for layer in range(self.layer_num - 2, -1, -1):
    error[layer] = self.activator[layer].derivative()(z[layer]) * np.dot(error[layer + 1], self.w[layer + 1].transpose())
If all weights w are initialized to 0 (zero matrices), then w^{l+1} here is also 0 during the first backward pass, and after the matrix product the error term \delta^l (error[layer] in the code) is also a zero matrix.
Now look at the last few lines, where the error terms are used to compute the updates:
w^l \leftarrow w^l - \eta (x^l)^T \delta^l
b^l \leftarrow b^l - \eta \delta^l
Or the final lines of the code:

for layer in range(self.layer_num - 1, -1, -1):
    self.w[layer] -= np.dot(x[layer].transpose(), error[layer]) * credit_learning_rate
    self.b[layer] -= np.sum(error[layer], axis=0) * credit_learning_rate
The updates of the layer-l weights and biases w^l, b^l both involve the error term \delta^l. Since the hidden-layer error terms computed above are zero matrices, those weights receive a zero gradient and are never updated: the weight matrices w stay at 0 no matter how long you train. (All-zero initialization is also a special case of the "all weights equal" problem described next.)
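The argument can be checked with a standalone numpy sketch (a hypothetical 2-4-1 sigmoid network with made-up data, not the Neuro class itself): with all-zero weights, the hidden-layer error term is a zero matrix, so the hidden weights receive a zero gradient.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def dsigmoid(t):
    return sigmoid(t) * (1.0 - sigmoid(t))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 2))   # one batch of inputs (arbitrary data)
y = rng.normal(size=(8, 1))    # targets

w = [np.zeros((2, 4)), np.zeros((4, 1))]   # all-zero weights, as in the bug
b = [np.zeros((1, 4)), np.zeros((1, 1))]

z0 = x0 @ w[0] + b[0]; x1 = sigmoid(z0)    # forward propagation
z1 = x1 @ w[1] + b[1]; x2 = sigmoid(z1)

err1 = (x2 - y) * dsigmoid(z1)             # output-layer error: generally nonzero
err0 = dsigmoid(z0) * (err1 @ w[1].T)      # hidden error: w[1] is all zeros

print(np.all(err0 == 0))           # True: zero hidden error term
print(np.all(x0.T @ err0 == 0))    # True: zero gradient for w[0]
```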
2. All weights initialized to the same value
So is it enough simply not to initialize all the weights to zero? Being lazy, one might try initializing every weight to the same value (say 0.5). Does that solve the problem?
The answer is no.
Suppose we have a three-layer network: an input layer, a hidden layer, and an output layer, and suppose the hidden layer has neurons a_1, a_2, ..., a_n.
In fact, every step of neural network training is a matrix operation that treats a_1, a_2, ..., a_n identically. And because the weights are initialized to exactly the same value, a_1, a_2, ..., a_n are completely interchangeable. No matter how long you train, the input layer contributes to each a_i with the same weights, each a_i contributes to the output layer with the same weights, the partial derivative of the output with respect to each a_i is the same, the back-propagated error terms are the same, and the updates to the weights associated with each a_i are the same... In other words, the a_i move in lockstep, so the hidden layer is effectively a single neuron a_1!
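The symmetry argument above can also be demonstrated numerically. Below is a standalone numpy sketch (a hypothetical 3-5-2 tanh network with every weight set to 0.5, arbitrary data): even after several gradient steps, all hidden units still carry identical weights.

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = rng.normal(size=(16, 3))   # arbitrary inputs
y = rng.normal(size=(16, 2))    # arbitrary targets
lr = 0.1

w = [np.full((3, 5), 0.5), np.full((5, 2), 0.5)]  # all weights equal
b = [np.zeros((1, 5)), np.zeros((1, 2))]

for _ in range(10):  # a few training steps
    z0 = x0 @ w[0] + b[0]; x1 = np.tanh(z0)       # forward
    z1 = x1 @ w[1] + b[1]; x2 = np.tanh(z1)
    err1 = (x2 - y) * (1 - x2 ** 2)               # backward (tanh' = 1 - tanh^2)
    err0 = (1 - x1 ** 2) * (err1 @ w[1].T)
    w[1] -= lr * x1.T @ err1; b[1] -= lr * err1.sum(0, keepdims=True)
    w[0] -= lr * x0.T @ err0; b[0] -= lr * err0.sum(0, keepdims=True)

# every column of w[0] (one column per hidden unit) is still identical
print(np.allclose(w[0], w[0][:, [0]]))   # True
```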
Solution
Use random numbers for initialization, e.g. np.random.normal, which generates a matrix drawn from a normal distribution with a given mean and standard deviation:
self.w, self.b = [], []  # weights and biases
for height, width in zip(cellnum[:-1], cellnum[1:]):
    self.w.append(np.random.normal(0, 1, (height, width)))
    self.b.append(np.random.normal(0, 1, (1, width)))
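A quick sanity check of the fix, again as a standalone sketch (toy 2-4-1 tanh network, made-up target function): with np.random.normal initialization the hidden-layer error terms are nonzero from the very first pass, and the loss actually decreases with training.

```python
import numpy as np

rng = np.random.default_rng(42)
x0 = rng.normal(size=(32, 2))
y = np.tanh(x0 @ np.array([[1.0], [-1.0]]))  # toy target function
lr = 0.1 / 32                                # learning_rate / batch_size

w = [rng.normal(0, 1, (2, 4)), rng.normal(0, 1, (4, 1))]  # random init
b = [rng.normal(0, 1, (1, 4)), rng.normal(0, 1, (1, 1))]

losses = []
for _ in range(300):
    z0 = x0 @ w[0] + b[0]; x1 = np.tanh(z0)   # forward
    z1 = x1 @ w[1] + b[1]; x2 = np.tanh(z1)
    losses.append(np.linalg.norm(x2 - y))
    err1 = (x2 - y) * (1 - x2 ** 2)           # backward
    err0 = (1 - x1 ** 2) * (err1 @ w[1].T)    # no longer a zero matrix
    w[1] -= lr * x1.T @ err1; b[1] -= lr * err1.sum(0, keepdims=True)
    w[0] -= lr * x0.T @ err0; b[0] -= lr * err0.sum(0, keepdims=True)

print(losses[-1] < losses[0])   # training now makes progress
```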