With the groundwork laid in parts two and three, this post walks through the internal structure and implementation of deep neural networks. Conceptually there is little new to repeat, but on the programming side a multi-layer network requires building quite a few more pieces to carry out the computation.
Why deep representations?
Why use a deep neural network rather than simply a shallow one? Consider the figure below: suppose we want to compute the XOR of $n$ inputs. A shallow network (with a single hidden layer) would need $O(2^n)$ hidden units, whereas a deep network needs only $O(\log n)$ layers and $O(n)$ units in total. The network structure is more complex, but the amount of computation drops dramatically. A minimal sketch of this depth argument follows.
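To make the argument concrete, here is a small illustrative sketch (not from the original post): computing an n-input XOR as a balanced tree of pairwise XORs uses n - 1 two-input gates arranged in about log2(n) layers, while a single-hidden-layer lookup of the same function needs on the order of 2^n units to enumerate the input patterns.

import numpy as np

def xor_tree(bits):
    """Compute the XOR of n bits with a balanced tree of pairwise XORs.
    Depth is ceil(log2(n)) layers; total gates are n - 1, i.e. O(n)."""
    layer = list(bits)
    depth = 0
    while len(layer) > 1:
        # Combine neighbours pairwise; an odd leftover element passes through.
        nxt = [layer[i] ^ layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])
        layer = nxt
        depth += 1
    return layer[0], depth

bits = np.random.randint(0, 2, size=16)
value, depth = xor_tree(bits)
print(value, depth)  # depth == 4 == log2(16), vs 2**16 minterms for one hidden layer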
Deep networks are also better at discovering and composing features. In face recognition, for example, the first layer detects edges in the image, the second layer assembles those edges into different parts of a face, and the third layer composes those parts into whole-face templates.
Getting matrix dimensions right
First, make sure every parameter has the correct dimensions. Unlike a shallow network or plain logistic regression, a deep network involves many interlocking equations, so bugs creep in easily; verifying the dimensions up front keeps the program running correctly. The figure below summarizes the dimensions used in the code: for layer $l$ with $n^{[l]}$ units and $m$ training examples, $W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$, $b^{[l]}$ has shape $(n^{[l]}, 1)$, and $Z^{[l]}$ and $A^{[l]}$ have shape $(n^{[l]}, m)$.
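These shape rules are exactly what initialize_parameters_deep (called later in L_layer_model) has to respect. A minimal sketch of what such an initializer might look like, assuming layers_dims lists the input size followed by each layer's size; the 0.01 weight scaling is one common choice, and other scalings (e.g. 1/sqrt(n of the previous layer)) are also used:

import numpy as np

def initialize_parameters_deep(layer_dims):
    """Initialize W1..WL and b1..bL with the shapes listed above.
    layer_dims -- e.g. [n_x, n_h1, ..., n_y], input size first."""
    parameters = {}
    L = len(layer_dims)  # number of layers, counting the input
    for l in range(1, L):
        # Small random weights break symmetry; biases start at zero.
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        # Catch dimension bugs immediately rather than deep in backprop.
        assert parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
        assert parameters["b" + str(l)].shape == (layer_dims[l], 1)
    return parameters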
Building blocks of deep neural networks
Forward propagation is straightforward, so we won't repeat it; backward propagation is summarized in the figure below. Sketches of both building blocks follow.
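To make the building blocks concrete, here is an illustrative sketch of the linear part of one layer's forward pass and its matching backward pass. The shapes follow the conventions above; L_model_forward and L_model_backward used later are built from blocks like these, though the bodies here are standard textbook versions, not necessarily the post's exact helper code.

import numpy as np

def linear_forward(A_prev, W, b):
    """One layer's linear step: Z = W A_prev + b.
    Caches the inputs, since backprop needs them again."""
    Z = W.dot(A_prev) + b
    cache = (A_prev, W, b)
    return Z, cache

def linear_backward(dZ, cache):
    """Given dL/dZ for this layer, recover the gradients of W and b
    and of the previous layer's activations."""
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ.dot(A_prev.T) / m                    # dL/dW, averaged over m examples
    db = np.sum(dZ, axis=1, keepdims=True) / m   # dL/db
    dA_prev = W.T.dot(dZ)                        # dL/dA_prev, passed to the layer below
    return dA_prev, dW, db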
Practice
- Initialize parameters / Define hyperparameters
- Loop for num_iterations:
    a. Forward propagation
    b. Compute cost function
    c. Backward propagation
    d. Update parameters
- Use trained parameters to predict
import numpy as np
import matplotlib.pyplot as plt

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements an L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1)
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, print the cost every 100 iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []  # keep track of cost

    # Parameters initialization.
    parameters = initialize_parameters_deep(layers_dims)

    # Loop (gradient descent)
    for i in range(num_iterations):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        AL, caches = L_model_forward(X, parameters)
        # Compute cost.
        cost = compute_cost(AL, Y)
        # Backward propagation.
        grads = L_model_backward(AL, Y, caches)
        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)
        # Print and record the cost every 100 iterations.
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)

    # Plot the cost curve (one point per 100 iterations).
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()

    return parameters
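A hypothetical call, assuming train_x and train_y have already been flattened and normalized as in the course's cat/non-cat dataset; the specific layer sizes here are just an example:

layers_dims = [12288, 20, 7, 5, 1]  # 12288 = 64 * 64 * 3 input features, 4-layer model
parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations=2500, print_cost=True)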