Introduction
A deep neural network implemented from scratch using only the NumPy library. For the underlying math, see Andrew Ng's Deep Learning course.
A friendly tip:
To make the overall structure easy to survey, the inputs and outputs of the main helper functions are listed below, giving a quick view of how the functions interact with each other.
With that overall picture in mind, you can then dig into the details of each function's implementation.
parameters = initialize_parameters_deep(layer_dims)
# forward propagation
Z, cache = linear_forward(A, W, b)
A, cache = linear_activation_forward(A_prev, W, b, activation)
AL, caches = L_model_forward(X, parameters)
# cost function
cost = compute_cost(AL, Y)
# backward propagation
dA_prev, dW, db = linear_activation_backward(dA, cache, activation)
grads = L_model_backward(AL, Y, caches)
parameters = update_parameters(parameters, grads, learning_rate)
# sigmoid and ReLU activation functions, and the corresponding dZ for backprop
A, cache = sigmoid(Z)
A, cache = relu(Z)
dZ = relu_backward(dA, cache)
dZ = sigmoid_backward(dA, cache)
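For orientation, here is a sketch (not part of the original text) of how these helpers would typically compose into a full training loop; the wrapper name L_layer_model and the hyperparameter values are illustrative:
def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000):
    parameters = initialize_parameters_deep(layers_dims)
    for i in range(num_iterations):
        AL, caches = L_model_forward(X, parameters)    # forward pass through all L layers
        cost = compute_cost(AL, Y)                     # cross-entropy cost
        grads = L_model_backward(AL, Y, caches)        # gradients for every layer
        parameters = update_parameters(parameters, grads, learning_rate)  # gradient-descent step
    return parameters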
1 - Packages
import numpy as np
2 - Outline of the Assignment
3 - Initialization
3.2 - L-layer Neural Network
$n^{[l]}$ denotes the number of units in layer $l$. For example, if the input $X$ has shape $(12288, 209)$ ($m = 209$ examples), then $W^{[1]}$ has shape $(n^{[1]}, 12288)$ and $b^{[1]}$ has shape $(n^{[1]}, 1)$.
Figure: Initialization of an L-layer neural network
def initialize_parameters_deep(layers_dims):
    """
    input:
        layers_dims -- python list containing the dimensions of each layer.
            e.g. layers_dims = [2, 3, 2]: the input layer has 2 units, a single hidden layer has 3 units, and the output layer has 2 units
    output/return:
        parameters -- python dictionary containing the initialized parameters:
            Wl : parameters['W' + str(l)], shape (layers_dims[l], layers_dims[l-1])
            bl : parameters['b' + str(l)], shape (layers_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {
    } # declare the dict first, then add keys to it inside the for loop
    L = len(layers_dims) # the number of entries in layers_dims is the number of layers (input layer included)
    for l in range(1, L):
        # small random weights break symmetry; layers_dims[l] is the number of units in layer l
        parameters["W" + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
        # sanity-check the shapes of parameters
        assert(parameters["W" + str(l)].shape == (layers_dims[l], layers_dims[l-1]))
        assert(parameters["b" + str(l)].shape == (layers_dims[l], 1))
    return parameters
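As a quick sanity check (a sketch, not from the original assignment), the snippet below initializes the small [2, 3, 2] network from the docstring example and prints each parameter's shape:
parameters = initialize_parameters_deep([2, 3, 2])
for name, value in sorted(parameters.items()):
    print(name, value.shape)
# Prints: W1 (3, 2), W2 (2, 3), b1 (3, 1), b2 (2, 1)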
4 - Forward propagation module
4.1 - Linear Forward
The linear forward function (vectorized over all the examples) computes the following equation:
$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$
where $A^{[0]} = X$.
def linear_forward(A, W, b):
    """
    input:
        A -- activations from the previous layer (or the input data X): shape (size of previous layer, number of examples)
        W -- weight matrix: shape (size of current layer, size of previous layer)
        b -- bias vector: shape (size of current layer, 1)
    output/return:
        Z -- the input of the activation function (the pre-activation parameter)
        cache -- python tuple containing A, W, b; stored for computing the backward pass
    """
    Z = np.dot(W, A) + b # b is broadcast across the m example columns (broadcasting rule)
    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    return Z, cache
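A minimal usage sketch (illustrative, assuming linear_forward above is in scope), with a 3-unit previous layer, a 1-unit current layer, and m = 2 examples:
np.random.seed(1)
A = np.random.randn(3, 2) # activations of the previous layer (3 units, 2 examples)
W = np.random.randn(1, 3) # weights of the current layer (1 unit)
b = np.random.randn(1, 1) # bias of the current layer
Z, cache = linear_forward(A, W, b)
print(Z.shape) # (1, 2): one pre-activation value per example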
4.2 - Linear Activation Forward
Two activation functions are used throughout the network:
- Sigmoid: $\sigma(Z) = \sigma(WA + b) = \frac{1}{1 + e^{-(WA + b)}}$. The predefined sigmoid function returns two values: the activation value "A" and a "cache" that stores the variable "Z" (it serves as input to the corresponding backward function). Use it as follows:
A, activation_cache = sigmoid(Z)
- ReLU: $A = \mathrm{ReLU}(Z) = \max(0, Z)$. The predefined relu function likewise returns two values: the activation value "A" and a "cache" that stores "Z" (for the backward pass). Use it as follows:
A, activation_cache = relu(Z)
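These two helpers are treated as predefined here; a minimal sketch consistent with the signatures listed earlier (each returning the activation plus a cache holding Z) could look like this:
def sigmoid(Z):
    A = 1 / (1 + np.exp(-Z)) # elementwise logistic function
    cache = Z                # keep Z for the backward pass
    return A, cache

def relu(Z):
    A = np.maximum(0, Z)     # elementwise max(0, z)
    cache = Z                # keep Z for the backward pass
    return A, cache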