Building your Deep Neural Network: Step by Step
Welcome to your week 4 assignment (part 1 of 2)! You have previously trained a 2-layer Neural Network (with a single hidden layer). This week, you will build a deep neural network, with as many layers as you want!
- In this notebook, you will implement all the functions required to build a deep neural network.
- In the next assignment, you will use these functions to build a deep neural network for image classification.
After this assignment you will be able to:
- Use non-linear units like ReLU to improve your model
- Build a deeper neural network (with more than 1 hidden layer)
- Implement an easy-to-use neural network class
Notation:
- Superscript $[l]$ denotes a quantity associated with the $l^{th}$ layer.
- Example: $a^{[L]}$ is the $L^{th}$ layer activation. $W^{[L]}$ and $b^{[L]}$ are the $L^{th}$ layer parameters.
- Superscript $(i)$ denotes a quantity associated with the $i^{th}$ example.
- Example: $x^{(i)}$ is the $i^{th}$ training example.
- Lowerscript $i$ denotes the $i^{th}$ entry of a vector.
- Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the $l^{th}$ layer's activations.
This assignment is about building an L-layer deep neural network, and it is split into two parts. This first part covers writing the main functions. It is very close to the earlier single-hidden-layer network, with a few points that need extra care, but nothing too difficult overall. Below I go through it following the steps given in the notebook:
1. Import the necessary packages
import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *
from dnn_utils_v2 import sigmoid, sigmoid_backward, relu, relu_backward
%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
%load_ext autoreload
%autoreload 2
np.random.seed(1)
2. Initialize the parameters
The notebook splits this into two steps, presumably to make it easier to follow. Initialization works the same way as before; the L-layer version just adds a loop. The main thing is to get the dimensions of W and b right (a quick shape check follows the two code blocks below).
2-layer
# GRADED FUNCTION: initialize_parameters
def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(1)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
L-layer
# GRADED FUNCTION: initialize_parameters_deep
def initialize_parameters_deep(layer_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)   # number of layers in the network, including the input layer
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
    return parameters
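As a quick sanity check of the resulting shapes, here is a minimal sketch; the layer sizes [5, 4, 3, 1] are made up for illustration and are not part of the graded code:
# Hypothetical layer sizes: 5 inputs, hidden layers of 4 and 3 units, 1 output unit
params = initialize_parameters_deep([5, 4, 3, 1])
for l in range(1, 4):
    print("W" + str(l), params["W" + str(l)].shape, "b" + str(l), params["b" + str(l)].shape)
# Expected: W1 (4, 5), b1 (4, 1); W2 (3, 4), b2 (3, 1); W3 (1, 3), b3 (1, 1)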
3. Writing the forward propagation module
This is split into three steps: first, the linear part of forward propagation for a single layer; second, the activation step for a single layer; and third, the forward-propagation module for the full L-layer network.
Single-layer linear forward:
# GRADED FUNCTION: linear_forward
def linear_forward(A, W, b):
    Z = np.dot(W, A) + b   # (layer_dims[l], layer_dims[l-1]) x (layer_dims[l-1], m) + (layer_dims[l], 1) = (layer_dims[l], m)
    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    return Z, cache
When matrix operations are involved, I'm in the habit of writing the dimension bookkeeping in a comment next to the line; it reduces the chance of mistakes.
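A minimal usage sketch with made-up shapes (a previous layer of 3 units feeding a layer of 2 units, batch of 4 examples):
A_prev = np.random.randn(3, 4)   # 3 units in the previous layer, 4 examples
W = np.random.randn(2, 3)        # current layer has 2 units
b = np.zeros((2, 1))
Z, cache = linear_forward(A_prev, W, b)
print(Z.shape)                   # (2, 4)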
Single-layer activation forward:
# GRADED FUNCTION: linear_activation_forward
def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache
It's worth paying attention to what the cache holds: linear_cache stores the A_prev, W and b used in the linear step of each forward pass, while activation_cache stores the Z computed for that layer.
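Continuing the shape-check variables above, a small sketch of how the cache unpacks (this assumes, as in dnn_utils_v2, that sigmoid and relu return the pre-activation Z as their cache):
A, cache = linear_activation_forward(A_prev, W, b, activation="relu")
linear_cache, activation_cache = cache
A_prev_saved, W_saved, b_saved = linear_cache   # inputs to the linear step
Z_saved = activation_cache                      # pre-activation Z for this layer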
Forward-propagation module for the L-layer network:
# GRADED FUNCTION: L_model_forward
def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2   # number of layers: parameters holds W1, b1, ..., WL, bL, so L = len(parameters) // 2
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)], parameters["b" + str(l)], activation="relu")
        caches.append(cache)
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)], parameters["b" + str(L)], activation="sigmoid")
    caches.append(cache)
    assert(AL.shape == (1, X.shape[1]))
    return AL, caches
Two things are worth calling out here. First, be clear about what parameters contains: from the initialization function you can see it holds W1, b1, ..., WL, bL, so computing the number of layers as len(parameters) // 2 is unambiguous. Second, note the indexing of the tuples in caches: they are appended starting at index 0 and run up to L-1, one cache per layer.
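To make the indexing concrete, here is a sketch continuing the hypothetical layer sizes from the initialization check above (not part of the graded code):
parameters = initialize_parameters_deep([5, 4, 3, 1])   # 3-layer network
X = np.random.randn(5, 7)                               # 7 examples
AL, caches = L_model_forward(X, parameters)
print(len(caches))   # 3: caches[0] belongs to layer 1, caches[2] to the sigmoid layer L
print(AL.shape)      # (1, 7)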
4. Writing the cost function
This part is routine and works the same as before, so there is not much to say:
# GRADED FUNCTION: compute_cost
def compute_cost(AL, Y):
    m = Y.shape[1]
    cost = -(np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))) / m
    cost = np.squeeze(cost)   # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())
    return cost
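For reference, this is the cross-entropy cost the line above computes:
$J = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log a^{[L](i)} + \left(1-y^{(i)}\right)\log\left(1-a^{[L](i)}\right) \right]$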
5. Writing the backward propagation module
The notebook gives a diagram of the overall computation flow for this part (not reproduced here).
Like forward propagation, this is split into three parts: the linear backward step for a single layer, the backward step through the activation for a single layer, and the backward module for the full L-layer network.
Single-layer linear backward:
# GRADED FUNCTION: linear_backward
def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m                 # (n_l, m) x (m, n_l-1) = (n_l, n_l-1)
    db = np.sum(dZ, axis=1, keepdims=True) / m    # (n_l, 1)
    dA_prev = np.dot(W.T, dZ)                     # (n_l-1, n_l) x (n_l, m) = (n_l-1, m)
    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    return dA_prev, dW, db
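For reference, the three gradient lines above implement the formulas from the assignment:
$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}$
$db^{[l]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[l](i)}$
$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$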
Single-layer activation backward:
# GRADED FUNCTION: linear_activation_backward
def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db
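Under the hood, relu_backward and sigmoid_backward compute the element-wise step
$dZ^{[l]} = dA^{[l]} * g'\left(Z^{[l]}\right)$
where $g$ is the layer's activation and $Z^{[l]}$ is read from activation_cache; linear_backward then takes that $dZ^{[l]}$.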
Backward-propagation module for the L-layer network:
# GRADED FUNCTION: L_model_backward
def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)             # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)     # after this line, Y is the same shape as AL
    # derivative of the cost with respect to AL
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # Lth layer (sigmoid -> linear) gradients
    current_cache = caches[L-1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation="sigmoid")
    for l in reversed(range(L - 1)):   # l runs from L-2 down to 0
        # lth layer (relu -> linear) gradients; grads["dA" + str(l + 2)] was stored by the layer above
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache, activation="relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
    return grads
This is where the earlier indexing analysis pays off when pulling out current_cache: the cache for layer L is caches[L-1], and the loop then works down from index L-2. The everyday 1-based layer count is always off by one from the caches index, which starts at 0. Note also that inside the loop the incoming gradient for layer l+1 is grads["dA" + str(l + 2)], which was stored by the step for the layer above.
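Continuing the hypothetical 3-layer example from the forward pass, a sketch of which keys end up in grads (Y is made up here):
Y = (np.random.rand(1, 7) > 0.5).astype(float)   # made-up labels for the 7 examples
grads = L_model_backward(AL, Y, caches)
print(sorted(grads.keys()))
# ['dA1', 'dA2', 'dA3', 'dW1', 'dW2', 'dW3', 'db1', 'db2', 'db3']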
6. Updating the parameters
Simply apply W := W - learning_rate * dW and b := b - learning_rate * db for each layer:
# GRADED FUNCTION: update_parameters
def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2   # number of layers in the neural network
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]
    return parameters
Just watch the range of l here: it runs from 0 to L-1, so the keys updated are W1 ... WL and b1 ... bL.
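Putting the pieces together, here is a minimal sketch of the kind of training loop the next assignment builds from these functions; the name L_layer_model and the hyperparameter values are my own illustrative choices, not part of this assignment:
def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=2500):
    """Sketch of a full training loop assembled from the functions above."""
    parameters = initialize_parameters_deep(layers_dims)
    for i in range(num_iterations):
        AL, caches = L_model_forward(X, parameters)                 # forward pass
        cost = compute_cost(AL, Y)                                  # cross-entropy cost
        grads = L_model_backward(AL, Y, caches)                     # backward pass
        parameters = update_parameters(parameters, grads, learning_rate)
        if i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return parameters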