Below, the parameters are initialized in three different ways: all zeros, randomly over a large range, and randomly over a small range. The Python code is as follows:
# Initialize all parameters to zero
import numpy as np

def initialize_parameters_zeros(layers_dims):
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
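Zero initialization fails because it never breaks symmetry: every hidden unit in a layer computes the same output and receives the same gradient, so all rows of a weight matrix remain identical after every update. A toy sketch of this (with made-up shapes and a dummy upstream gradient, for illustration only):

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(3, 5)   # 3 input features, 5 examples
W = np.zeros((4, 3))        # 4 hidden units, all-zero initialization
b = np.zeros((4, 1))

Z = W @ X + b               # forward pass: every unit outputs the same zeros
dZ = np.ones_like(Z)        # dummy upstream gradient, identical per unit
dW = dZ @ X.T / X.shape[1]  # gradient rows are identical across units

print(np.allclose(dW, dW[0]))  # True: symmetry is never broken
```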
# Random initialization with large values (standard normal scaled by 10)
def initialize_parameters_random(layers_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
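Scaling standard-normal weights by 10 makes the pre-activations very large: for a unit with fan-in n, the pre-activation standard deviation is roughly 10·√n, which pushes a sigmoid output deep into saturation and stalls learning early on. A quick numeric check (illustrative shapes, not the actual network):

```python
import numpy as np

np.random.seed(3)
fan_in = 100
# One unit with large-scale weights, applied to many random inputs.
W = np.random.randn(1, fan_in) * 10
X = np.random.randn(fan_in, 10000)
Z = W @ X
# Empirical pre-activation std is on the order of 10 * sqrt(100) = 100.
print(float(Z.std()))
```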
# Random initialization over a small range (He initialization)
def initialize_parameters_he(layers_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1
    for l in range(1, L + 1):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(2 / layers_dims[l - 1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
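He initialization keeps the weight variance at 2/fan_in, so the weight scale shrinks as the layer gets wider instead of staying fixed. A quick check of the resulting standard deviation (illustrative sizes, not the actual layer dims):

```python
import numpy as np

np.random.seed(0)
fan_in = 1000
# He rule: standard normal scaled by sqrt(2 / fan_in).
W = np.random.randn(500, fan_in) * np.sqrt(2 / fan_in)
# Empirical std should sit near sqrt(2 / 1000), about 0.045.
print(round(float(W.std()), 3))
```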
The model is assembled as follows:
import matplotlib.pyplot as plt

def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    # forward_propagation, compute_loss, backward_propagation and
    # update_parameters are helper functions assumed to be defined elsewhere.
    grads = {}
    costs = []
    m = X.shape[1]
    layers_dims = [X.shape[0], 10, 5, 1]  # build a 3-layer neural network
    # Choose one of the three initialization methods
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)
    # Train the parameters with gradient descent
    for i in range(num_iterations):
        a3, cache = forward_propagation(X, parameters)
        cost = compute_loss(a3, Y)
        grads = backward_propagation(X, Y, cache)
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)
    # Plot the cost curve
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per thousands)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    return parameters
Training on data generated with sklearn.datasets.make_circles, the three methods produce the cost curves shown below. The cost clearly falls fastest when the parameters are randomly initialized over a small range (He initialization).
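The dataset setup is not shown in the text; a minimal sketch of what it might look like, assuming arbitrary sample count and noise for make_circles, transposed so each column is one example as model expects:

```python
import numpy as np
from sklearn.datasets import make_circles

# Hypothetical parameters; the text does not specify them.
X, Y = make_circles(n_samples=300, noise=0.05, random_state=1)
train_X = X.T               # shape (2, 300): features x examples
train_Y = Y.reshape(1, -1)  # shape (1, 300): binary labels
print(train_X.shape, train_Y.shape)
```

With these arrays, training would be launched as, for example, parameters = model(train_X, train_Y, initialization="he").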
Parameters initialized to zero:
Parameters initialized with large random values:
Parameters randomly initialized over a small range: