Gradient Checking
Gradient definition:

\frac{dJ}{d\theta}=\lim_{\epsilon\rightarrow0}\frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}
1-D gradient: by analogy with the limit definition of the derivative, given a known cost function J(theta), the two-sided difference above gives a numerical approximation of dJ/dtheta.
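A minimal sketch of the 1-D check, assuming a toy cost J(theta) = theta * x; the function names and the toy cost are illustrative, not from the original notes:

```python
def forward_propagation(x, theta):
    """Toy 1-D cost: J(theta) = theta * x."""
    return theta * x

def backward_propagation(x, theta):
    """Analytic gradient dJ/dtheta = x."""
    return x

def gradient_check(x, theta, epsilon=1e-7):
    """Relative difference between the analytic and two-sided numerical gradients."""
    grad_approx = (forward_propagation(x, theta + epsilon)
                   - forward_propagation(x, theta - epsilon)) / (2 * epsilon)
    grad = backward_propagation(x, theta)
    # A relative difference below roughly 1e-7 suggests the analytic gradient is correct.
    return abs(grad - grad_approx) / (abs(grad) + abs(grad_approx))

difference = gradient_check(x=2.0, theta=4.0)
```

Because the toy cost is linear in theta, the two-sided difference matches the analytic gradient almost exactly, so `difference` is tiny.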
N-dimensional gradient:
- forward propagation
- backward propagation
The helper functions dictionary_to_vector(), vector_to_dictionary(), and gradients_to_vector() convert the parameter dictionaries into a single concatenated vector (and back):
import numpy as np

def dictionary_to_vector(parameters):
    """
    Roll all our parameters dictionary into a single vector satisfying our specific required shape.
    """
    keys = []
    count = 0
    for key in ["W1", "b1", "W2", "b2", "W3", "b3"]:
        # Flatten each parameter into a column vector.
        new_vector = np.reshape(parameters[key], (-1, 1))
        keys = keys + [key] * new_vector.shape[0]

        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1

    return theta, keys
def vector_to_dictionary(theta):
    """
    Unroll all our parameters dictionary from a single vector satisfying our specific required shape.
    """
    parameters = {}
    parameters["W1"] = theta[:20].reshape((5, 4))
    parameters["b1"] = theta[20:25].reshape((5, 1))
    parameters["W2"] = theta[25:40].reshape((3, 5))
    parameters["b2"] = theta[40:43].reshape((3, 1))
    parameters["W3"] = theta[43:46].reshape((1, 3))
    parameters["b3"] = theta[46:47].reshape((1, 1))

    return parameters
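The slice boundaries above are hard-coded for one particular 3-layer network. A more generic round-trip sketch that records each array's shape instead; the names `roll`/`unroll` and the example shapes are illustrative, not from the original assignment:

```python
import numpy as np

def roll(params, keys):
    """Concatenate the arrays in `params` (in `keys` order) into one column vector."""
    shapes = {k: params[k].shape for k in keys}
    theta = np.concatenate([params[k].reshape(-1, 1) for k in keys], axis=0)
    return theta, shapes

def unroll(theta, keys, shapes):
    """Rebuild the dictionary from the column vector using the recorded shapes."""
    params, offset = {}, 0
    for k in keys:
        size = int(np.prod(shapes[k]))
        params[k] = theta[offset:offset + size].reshape(shapes[k])
        offset += size
    return params

rng = np.random.default_rng(0)
keys = ["W1", "b1", "W2", "b2"]
params = {"W1": rng.standard_normal((5, 4)), "b1": np.zeros((5, 1)),
          "W2": rng.standard_normal((3, 5)), "b2": np.zeros((3, 1))}
theta, shapes = roll(params, keys)
restored = unroll(theta, keys, shapes)
```

Recording shapes at roll time means the unroll step needs no hard-coded indices, so the same pair works for any network architecture.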
def gradients_to_vector(gradients):
    """
    Roll all our gradients dictionary into a single vector satisfying our specific required shape.
    """
    count = 0
    for key in ["dW1", "db1", "dW2", "db2", "dW3", "db3"]:
        # Flatten each gradient into a column vector.
        new_vector = np.reshape(gradients[key], (-1, 1))

        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1

    return theta
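With the parameters rolled into one vector, the N-dimensional check perturbs one component at a time, recomputing the cost by forward propagation only. A self-contained sketch of that loop; the quadratic toy cost and the function name `gradient_check_n` are illustrative assumptions:

```python
import numpy as np

def gradient_check_n(J, theta, grad, epsilon=1e-7):
    """Verify an analytic gradient `grad` of cost `J` at the rolled vector `theta`.

    Perturbs one component at a time and compares the two-sided difference
    with the supplied gradient via a relative norm.
    """
    gradapprox = np.zeros_like(grad)
    for i in range(theta.shape[0]):
        theta_plus = np.copy(theta)
        theta_plus[i] += epsilon
        theta_minus = np.copy(theta)
        theta_minus[i] -= epsilon
        gradapprox[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

# Toy check: J(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([[1.0], [2.0], [-3.0]])
difference = gradient_check_n(lambda t: 0.5 * np.sum(t ** 2), theta, theta)
```

In the assignment setting, `J` would wrap forward propagation on the vector returned by dictionary_to_vector(), and `grad` would come from gradients_to_vector().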
Notes:
- Gradient checking verifies how close the gradients from backpropagation are to a numerical approximation of the gradient (computed using forward propagation).
- Gradient checking is slow, so we do not run it at every training iteration. Typically you run it only to confirm the code is correct, then turn it off and use backprop for the actual learning process.