Building your Deep Neural Network: Step by Step

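• First, we write a few helper functions that the model will be built from.
• In the next post, the same helper functions are reused to build a deep neural network for image classification.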

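After reading this post, you will be able to:

• Use non-linear units such as ReLU (rectified linear unit) to improve your model
• Build a deep neural network (with more than one hidden layer)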
• Implement an easy-to-use neural network class

Notation:
• Superscript $[l]$ denotes a quantity associated with the $l^{th}$ layer.
  • Example: $a^{[L]}$ is the activation output of the $L^{th}$ layer.
  • Example: $W^{[L]}$ and $b^{[L]}$ are the parameters of the $L^{th}$ layer.
• Superscript $(i)$ denotes a quantity associated with the $i^{th}$ example.
  • Example: $x^{(i)}$ is the $i^{th}$ training example.
• Subscript $i$ denotes the $i^{th}$ entry of a vector.
  • Example: $a^{[l]}_i$ is the $i^{th}$ entry of the activation vector of the $l^{th}$ layer.

Let’s get started!

1 - Packages

Let’s first import all the packages that you will need during this assignment.
- numpy is the main package for scientific computing with Python.
- matplotlib is a library to plot graphs in Python.
- np.random.seed(1) is used to keep all the random function calls consistent. It will help us grade your work. Please don’t change the seed.

import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

np.random.seed(1)

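Helper functions

• Compute the non-linear outputs of sigmoid() and relu(), and the derivatives of these activation functions (sigmoid_backward() and relu_backward()).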
def sigmoid(Z):
    """
    Arguments:
    Z -- numpy array of any shape
    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1/(1+np.exp(-Z))
    cache = Z
    return A, cache

def relu(Z):
    """
    Arguments:
    Z -- output of the linear layer, of any shape
    Returns:
    A -- post-activation value, of the same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = np.maximum(0,Z)
    assert(A.shape == Z.shape)
    cache = Z
    return A, cache

def relu_backward(dA, cache):
    """
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z', stored during the forward pass so that backward propagation is efficient
    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True) # copy dA so that the caller's array is not modified
    # When z <= 0, the gradient is 0 as well.
    dZ[Z <= 0] = 0
    assert (dZ.shape == Z.shape)
    return dZ

def sigmoid_backward(dA, cache):
    """
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z', stored during the forward pass so that backward propagation is efficient
    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    assert (dZ.shape == Z.shape)
    return dZ
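
One way to gain confidence in the two backward helpers is to compare them with a numerical derivative. The sketch below is an illustration and not part of the original assignment: it redefines compact versions of the helpers (taking Z directly instead of a cache) so that it runs on its own, and the helper name `numerical_dZ` and the step size `eps` are assumptions made for this example.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def sigmoid_backward(dA, Z):
    s = sigmoid(Z)
    return dA * s * (1 - s)

def relu_backward(dA, Z):
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def numerical_dZ(activation, dA, Z, eps=1e-6):
    # Hypothetical helper: central difference of the activation, scaled by dA.
    return dA * (activation(Z + eps) - activation(Z - eps)) / (2 * eps)

Z = np.array([[-2.0, -0.5, 0.5, 2.0]])   # values kept away from the ReLU kink at 0
dA = np.array([[1.0, -1.0, 0.5, 2.0]])

sigmoid_gap = np.max(np.abs(sigmoid_backward(dA, Z) - numerical_dZ(sigmoid, dA, Z)))
relu_gap = np.max(np.abs(relu_backward(dA, Z) - numerical_dZ(relu, dA, Z)))
print(sigmoid_gap, relu_gap)  # both gaps should be very close to zero
```

The test points avoid Z = 0 on purpose, because ReLU is not differentiable there and a finite difference would disagree with the convention dZ = 0.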

2 - Outline of the Assignment

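• Initialize the parameters of an $L$-layer deep neural network.
• Implement the forward propagation module.
  • Compute the linear part $WX + b = Z$ (resulting in $Z^{[l]}$).
  • Apply the non-linear ACTIVATION function (relu/sigmoid).
  • Combine the two steps above into one [LINEAR->ACTIVATION] forward function.
  • In the full forward pass, the first (L-1) layers use the [LINEAR->RELU] combination and the last layer uses the [LINEAR->SIGMOID] combination.
• Compute the cost.
• Implement the backward propagation module.
  • Compute the partial derivatives of the linear part.
  • Compute the derivative of the activation function (relu_backward/sigmoid_backward).
  • Use the chain rule to obtain the gradients of the parameters of each layer.
• Finally, update the parameters.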

Note
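Some intermediate results of the forward pass must be saved, because backward propagation needs them to compute the gradients of the parameters. During forward propagation we therefore store these values in a 'cache'. In the code below each cache is a Python tuple, and the caches of all layers are collected in a list.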

3 - Initialization

• First, write a function that initializes a 2-layer network, as a warm-up.
• Then write a function that initializes a deeper, L-layer network.

3.1 - 2-layer Neural Network

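Exercise: Write initialize_parameters(), which initializes the parameters of a two-layer network.
Instructions:
- Model structure: LINEAR -> RELU -> LINEAR -> SIGMOID.
- Initialize the weight matrices randomly with np.random.randn(shape)*0.01.
- Initialize the biases to zero with np.zeros(shape).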

# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    Returns:
    parameters -- python dictionary containing your parameters:
        W1 -- weight matrix of shape (n_h, n_x)
        b1 -- bias vector of shape (n_h, 1)
        W2 -- weight matrix of shape (n_y, n_h)
        b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)
    W1 = np.random.randn(n_h, n_x)*0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h)*0.01
    b2 = np.zeros((n_y, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters

3.2 - L-layer Neural Network

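Initializing the parameters of an $L$-layer network is more involved than for a shallow network, because there are many more weight matrices and bias vectors. There is a pattern, though: once initialize_parameters_deep() below is complete, it can initialize a fully connected network with any number of layers.
$n^{[l]}$ is the number of units in layer $l$. The table below assumes an input $X$ of shape $(12288, 209)$, that is, $m = 209$ examples.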

| | **Shape of W** | **Shape of b** | **Activation** | **Shape of Activation** |
|---|---|---|---|---|
| **Layer 1** | $(n^{[1]}, 12288)$ | $(n^{[1]}, 1)$ | $Z^{[1]} = W^{[1]} X + b^{[1]}$ | $(n^{[1]}, 209)$ |
| **Layer 2** | $(n^{[2]}, n^{[1]})$ | $(n^{[2]}, 1)$ | $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ | $(n^{[2]}, 209)$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| **Layer L-1** | $(n^{[L-1]}, n^{[L-2]})$ | $(n^{[L-1]}, 1)$ | $Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$ | $(n^{[L-1]}, 209)$ |
| **Layer L** | $(n^{[L]}, n^{[L-1]})$ | $(n^{[L]}, 1)$ | $Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$ | $(n^{[L]}, 209)$ |

Remember that when NumPy computes $WX + b$, it broadcasts the column vector $b$ across all columns. For example, if:

$$W = \begin{bmatrix} j & k & l \\ m & n & o \\ p & q & r \end{bmatrix} \;\;\; X = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \;\;\; b = \begin{bmatrix} s \\ t \\ u \end{bmatrix} \tag{2}$$

Then $WX + b$ will be:

$$WX + b = \begin{bmatrix} (ja + kd + lg) + s & (jb + ke + lh) + s & (jc + kf + li) + s \\ (ma + nd + og) + t & (mb + ne + oh) + t & (mc + nf + oi) + t \\ (pa + qd + rg) + u & (pb + qe + rh) + u & (pc + qf + ri) + u \end{bmatrix} \tag{3}$$
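
The broadcasting in equation (3) can be reproduced with concrete numbers. This is a small sketch with made-up values: W, X and b below are stand-ins chosen so the result is easy to read, and `Z_explicit` is a name introduced here to show the same sum with b tiled by hand.

```python
import numpy as np

W = np.arange(1.0, 10.0).reshape(3, 3)   # [[1,2,3],[4,5,6],[7,8,9]]
X = np.eye(3)                            # identity, so np.dot(W, X) equals W
b = np.array([[10.0], [20.0], [30.0]])   # column vector of shape (3, 1)

Z = np.dot(W, X) + b                     # b is broadcast across the 3 columns
Z_explicit = np.dot(W, X) + np.tile(b, (1, X.shape[1]))  # same sum, b copied by hand

print(Z)
# [[11. 12. 13.]
#  [24. 25. 26.]
#  [37. 38. 39.]]
```

Each row of the product receives its own bias entry, which is exactly the pattern of s, t and u in equation (3).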

Exercise: Implement initialization for an L-layer neural network.

Instructions:
- Model structure: [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID. That is, $L-1$ layers use the ReLU activation function, followed by an output layer with a sigmoid activation function.
- Initialize the weight matrices randomly with np.random.randn(shape) * 0.01.
- Initialize the bias vectors to zero with np.zeros(shape).
- The number of units in each layer, $n^{[l]}$, is stored in the variable layer_dims, a Python list. For example, layer_dims = [2,4,1] describes a two-layer network with 2 input units, one hidden layer of 4 units, and 1 output unit.

# GRADED FUNCTION: initialize_parameters_deep

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the number of units in each layer of the network
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
        Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
        bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)            # number of entries in layer_dims (input layer included)
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

    return parameters
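
To connect the function with the shape table, the following sketch prints the parameter shapes for one small network. It repeats a compact copy of initialize_parameters_deep() so that it runs on its own; the layer sizes [5, 4, 3] are an arbitrary choice for illustration.

```python
import numpy as np

def initialize_parameters_deep(layer_dims):
    # Same scheme as in the post: small random weights, zero biases.
    np.random.seed(3)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

layer_dims = [5, 4, 3]  # assumed example: 5 inputs, 4 hidden units, 3 outputs
parameters = initialize_parameters_deep(layer_dims)
shapes = {name: value.shape for name, value in parameters.items()}
print(shapes)
# {'W1': (4, 5), 'b1': (4, 1), 'W2': (3, 4), 'b2': (3, 1)}
```

Every W has shape (units in this layer, units in the previous layer), so consecutive matrix products line up without any transposes.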

4 - Forward propagation module

4.1 - Linear Forward

The linear forward module (vectorized over all examples) computes:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} \tag{4}$$

where $A^{[0]} = X$.

Exercise: Implement linear_forward(), the linear part of forward propagation.

# GRADED FUNCTION: linear_forward

def linear_forward(A, W, b):
    """
    Arguments:
    A -- activations from the previous layer (or the input data): (size of previous layer, number of examples)
    W -- weights matrix: (size of current layer, size of previous layer)
    b -- bias vector: (size of the current layer, 1)
    Returns:
    Z -- the input of the activation function, also called the pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b";
             needed later for backward propagation
    """
    Z = np.dot(W, A)+b
    cache = (A, W, b)
    return Z, cache

4.2 - Linear-Activation Forward

This post uses two activation functions:

• Sigmoid: $\sigma(Z) = \sigma(W A + b) = \frac{1}{1 + e^{-(W A + b)}}$.
The sigmoid function is already implemented in the helper-function section above. It returns two values: the activation value "A" and a "cache" that contains "Z", the input of the function.
A, activation_cache = sigmoid(Z)
• ReLU: $A = RELU(Z) = \max(0, Z)$.
The relu function is also provided in the helper-function section. It likewise returns two values: the activation value "A" and a "cache" that contains "Z".
A, activation_cache = relu(Z)

The two steps (LINEAR followed by ACTIVATION) are combined into one function, linear_activation_forward(), which computes:

$A^{[l]} = g(Z^{[l]}) = g(W^{[l]}A^{[l-1]} + b^{[l]})$, where the activation $g$ is either sigmoid() or relu().

# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Arguments:
    A_prev -- activations from the previous layer (or the input data): (size of previous layer, number of examples)
    W -- weights matrix: (size of current layer, size of previous layer)
    b -- bias vector: (size of the current layer, 1)
    activation -- the activation used in this layer: "sigmoid" or "relu"
    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache"
    """
    # Inputs: "A_prev, W, b". Outputs: "Z, linear_cache".
    Z, linear_cache = linear_forward(A_prev, W, b)

    # Inputs: "Z". Outputs: "A, activation_cache".
    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    cache = (linear_cache, activation_cache)
    return A, cache
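
A tiny worked example makes the LINEAR -> ACTIVATION step concrete. The sketch below inlines the linear step and both activations so that it runs on its own; the input values are made up, and the cache layout ((A_prev, W, b), Z) mirrors the tuples used in this post.

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    Z = np.dot(W, A_prev) + b                      # linear step
    if activation == "sigmoid":
        A = 1 / (1 + np.exp(-Z))
    elif activation == "relu":
        A = np.maximum(0, Z)
    else:
        raise ValueError("unknown activation: " + activation)
    return A, ((A_prev, W, b), Z)                  # (linear_cache, activation_cache)

A_prev = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])                    # 3 units, 2 examples
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]])                    # 2 units in the current layer
b = np.array([[1.0], [0.0]])

A_relu, cache = linear_activation_forward(A_prev, W, b, "relu")
A_sig, _ = linear_activation_forward(A_prev, W, b, "sigmoid")

print(cache[1])   # Z = [[-3.  -3. ], [ 4.5  6. ]]
print(A_relu)     # ReLU zeroes the negative first row: [[0.  0. ], [4.5 6. ]]
```

The sigmoid output `A_sig` has the same shape but maps every entry of Z into the open interval (0, 1).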

4.3 - L-Layer Model

Exercise: Implement L_model_forward(X, parameters), the forward propagation of the whole model.

Instruction: In the code below, the variable AL denotes $A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$.
(This is sometimes also called Yhat, i.e. $\hat{Y}$.)

Tips:
- Call linear_activation_forward with "relu" in a loop, (L-1) times, then once more with "sigmoid".
- Don't forget to append each intermediate result to the "caches" list.

# GRADED FUNCTION: L_model_forward

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
        every cache of linear_activation_forward() with "relu" (there are L-1 of them, indexed from 0 to L-2)
        the cache of linear_activation_forward() with "sigmoid" (there is one, indexed L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        W = parameters['W'+str(l)]
        b = parameters['b'+str(l)]
        A, cache = linear_activation_forward(A_prev, W, b, 'relu')
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W'+str(L)],
                                          parameters['b'+str(L)], 'sigmoid')
    caches.append(cache)

    assert(AL.shape == (1,X.shape[1]))

    return AL, caches
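
The following sketch runs a complete forward pass on random data to show what comes out. It uses a compact stand-in named `forward` (an assumption of this example, not the graded function) with the same layer pattern, and the layer sizes [4, 3, 2, 1] are arbitrary.

```python
import numpy as np

def forward(X, parameters):
    # Compact stand-in for L_model_forward: ReLU for layers 1..L-1, sigmoid for layer L.
    caches = []
    A = X
    L = len(parameters) // 2
    for l in range(1, L + 1):
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = np.dot(W, A) + b
        caches.append(((A, W, b), Z))              # (linear_cache, activation_cache)
        A = np.maximum(0, Z) if l < L else 1 / (1 + np.exp(-Z))
    return A, caches

rng = np.random.RandomState(0)
layer_dims = [4, 3, 2, 1]                          # assumed sizes: 4 inputs, two hidden layers, 1 output
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

X = rng.randn(4, 5)                                # 5 examples
AL, caches = forward(X, parameters)
print(AL.shape, len(caches))                       # (1, 5) 3
```

With weights this small and zero biases, every entry of AL sits very close to 0.5: the untrained network is maximally unsure about each example.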

Great! We now have a complete forward pass that takes the input X and outputs the row vector $A^{[L]}$ of the last layer. Next, we use $A^{[L]}$ to compute the cost.

5 - Cost function

Exercise: Compute the cross-entropy cost $J$, using the following formula:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + (1 - y^{(i)}) \log\left(1 - a^{[L](i)}\right) \right) \tag{7}$$

# GRADED FUNCTION: compute_cost

def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to the label predictions, shape (1, number of examples)
    Y -- true "label" vector, shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """

    m = Y.shape[1]

    # Compute loss from AL and Y.
    cost = -(np.dot(Y, np.log(AL).T) + np.dot(1 - Y, np.log(1 - AL).T)) / m

    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
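
Equation (7) can be checked by hand on three examples. The sketch below repeats a compact compute_cost() so that it runs on its own and compares it with the same sum written element by element; the labels and predictions are made-up numbers, and `cost_sum` is a name introduced for this comparison.

```python
import numpy as np

def compute_cost(AL, Y):
    m = Y.shape[1]
    cost = -(np.dot(Y, np.log(AL).T) + np.dot(1 - Y, np.log(1 - AL).T)) / m
    return np.squeeze(cost)

Y = np.array([[1, 1, 0]])
AL = np.array([[0.8, 0.9, 0.4]])

cost = compute_cost(AL, Y)
# Same quantity as an element-wise sum, term by term as in equation (7):
# -(log 0.8 + log 0.9 + log 0.6) / 3
cost_sum = -np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
print(cost, cost_sum)   # both are approximately 0.28
```

The dot-product form and the element-wise form agree because a (1, m) by (m, 1) product is just the sum of the m element-wise products.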

6 - Backward propagation module

Reminder:

- In a two-layer network, the derivative of the loss $\frac{d \mathcal{L}(a^{[2]},y)}{dz^{[1]}}$ follows from the chain rule:

$$\frac{d \mathcal{L}(a^{[2]},y)}{dz^{[1]}} = \frac{d \mathcal{L}(a^{[2]},y)}{da^{[2]}} \frac{da^{[2]}}{dz^{[2]}} \frac{dz^{[2]}}{da^{[1]}} \frac{da^{[1]}}{dz^{[1]}}$$

The gradients of the parameters need one more chain-rule step:

• $dW^{[1]} = \frac{\partial \mathcal{L}}{\partial W^{[1]}} = dz^{[1]} \times \frac{\partial z^{[1]}}{\partial W^{[1]}}$.
• $db^{[1]} = \frac{\partial \mathcal{L}}{\partial b^{[1]}} = dz^{[1]} \times \frac{\partial z^{[1]}}{\partial b^{[1]}}$.

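Because every gradient is obtained by propagating derivatives backward through the layers, this procedure is called backpropagation.
The backward module is built in 3 steps:
- LINEAR backward
- LINEAR -> ACTIVATION backward, where ACTIVATION computes the derivative of either the ReLU or the sigmoid activation
- [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID backward (whole model)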

6.1 - Linear backward

For layer $l$, the linear part is: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (followed by an activation).

Suppose $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$ has already been computed. The three outputs $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$ are then:

$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T} \tag{8}$$

$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} \tag{9}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]} \tag{10}$$

Exercise: Use the 3 formulas above to implement linear_backward().

# GRADED FUNCTION: linear_backward

def linear_backward(dZ, cache):
    """
    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = np.dot(dZ, A_prev.T)/m
    db = np.sum(dZ, axis=1, keepdims=True)/m
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)

    return dA_prev, dW, db
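
Formulas (8) and (9) can be verified numerically. The sketch below is a check under stated assumptions rather than part of the assignment: it defines a hypothetical scalar `cost` whose per-example gradient with respect to Z is exactly a chosen matrix dZ, then compares linear_backward() with central differences.

```python
import numpy as np

def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

rng = np.random.RandomState(1)
A_prev, W, b = rng.randn(3, 4), rng.randn(2, 3), rng.randn(2, 1)
dZ = rng.randn(2, 4)                      # pretend upstream gradient, one column per example
m = A_prev.shape[1]

def cost(W_, b_):
    # Hypothetical cost: averaging over m examples gives dW and db the 1/m factor.
    return np.sum(dZ * (np.dot(W_, A_prev) + b_)) / m

dA_prev, dW, db = linear_backward(dZ, (A_prev, W, b))

eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        dW_num[i, j] = (cost(W + step, b) - cost(W - step, b)) / (2 * eps)

db_num = np.zeros_like(b)
for i in range(b.shape[0]):
    step = np.zeros_like(b)
    step[i, 0] = eps
    db_num[i, 0] = (cost(W, b + step) - cost(W, b - step)) / (2 * eps)

dW_gap = np.max(np.abs(dW - dW_num))
db_gap = np.max(np.abs(db - db_num))
print(dW_gap, db_gap)                     # both should be very close to zero
```

Note that dA_prev carries no 1/m factor: it is a per-example gradient that is passed on to the previous layer, where the averaging happens again in that layer's dW and db.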

6.2 - Linear-Activation backward

Next, linear_activation_backward() merges the two backward steps: the backward step of the activation, followed by linear_backward().

Two backward functions are provided in the helper-function section:

• sigmoid_backward: Implements the backward propagation for a SIGMOID unit.
dZ = sigmoid_backward(dA, activation_cache)
• relu_backward: Implements the backward propagation for a RELU unit.
dZ = relu_backward(dA, activation_cache)

If $g(.)$ is the activation function, sigmoid_backward and relu_backward compute

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]}) \tag{11}$$

where $*$ is the element-wise product.

Exercise: Implement the backpropagation for the LINEAR->ACTIVATION layer.

# GRADED FUNCTION: linear_activation_backward

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.
    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    return dA_prev, dW, db

6.3 - L-Model Backward

To backpropagate through the whole network, recall that the output is $A^{[L]} = \sigma(Z^{[L]})$. Backpropagation therefore starts from $dA^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}}$, which is computed as:
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL

This gradient dAL is fed into the LINEAR -> SIGMOID backward step, and its result into the LINEAR -> RELU backward steps. Each dA, dW and db is stored in the grads dictionary:

$$grads["dW" + str(l)] = dW^{[l]} \tag{15}$$

For example, for $l=3$ this would store $dW^{[l]}$ in grads["dW3"].
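
The dAL formula has a useful consequence: once it is chained with the sigmoid derivative, the gradient at the output layer simplifies to $dZ^{[L]} = A^{[L]} - Y$. The sketch below confirms this numerically; the values of Y and ZL are made up for illustration.

```python
import numpy as np

Y = np.array([[1.0, 0.0, 1.0]])
ZL = np.array([[0.3, -1.2, 2.0]])                       # assumed pre-activations of the last layer
AL = 1 / (1 + np.exp(-ZL))

dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))    # formula from the post
dZL = dAL * AL * (1 - AL)                               # sigmoid backward: dA * s * (1 - s)

matches = np.allclose(dZL, AL - Y)
print(matches)   # True
```

The post keeps the two steps separate so that the same linear_activation_backward() routine serves every layer, at the price of dividing by AL and 1 - AL, which must stay away from 0.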

# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group
    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
        every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
        the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- a dictionary with the gradients
        grads["dA" + str(l)] -- gradient with respect to the activations of layer l (l = 0...L-1)
        grads["dW" + str(l)] -- gradient with respect to W of layer l (l = 1...L)
        grads["db" + str(l)] -- gradient with respect to b of layer l (l = 1...L)
    """
    grads = {}
    L = len(caches) # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Lth layer: (SIGMOID -> LINEAR) gradients.
    # Inputs: "dAL, caches[L-1]". Outputs: grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)]
    current_cache = caches[L-1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, 'sigmoid')

    for l in reversed(range(L - 1)):
        # (l+1)th layer: (RELU -> LINEAR) gradients.
        # Inputs: grads["dA" + str(l + 1)], caches[l]. Outputs: grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads['dA'+str(l+1)], current_cache, 'relu')
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads
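
A standard way to test a full backward pass is gradient checking: perturb each parameter slightly, measure the change in cost, and compare with the analytic gradient. The sketch below is a self-contained illustration under assumptions, not the graded code: `forward`, `cost` and `backward` are compact stand-ins with the same [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID pattern, and the sizes, seed and `eps` are arbitrary.

```python
import numpy as np

def forward(X, params):
    caches, A, L = [], X, len(params) // 2
    for l in range(1, L + 1):
        W, b = params["W" + str(l)], params["b" + str(l)]
        Z = np.dot(W, A) + b
        caches.append((A, W, Z))
        A = np.maximum(0, Z) if l < L else 1 / (1 + np.exp(-Z))
    return A, caches

def cost(X, Y, params):
    AL, _ = forward(X, params)
    return float(-np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)))

def backward(AL, Y, caches):
    grads, L, m = {}, len(caches), Y.shape[1]
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    for l in reversed(range(L)):
        A_prev, W, Z = caches[l]
        if l == L - 1:
            s = 1 / (1 + np.exp(-Z))
            dZ = dA * s * (1 - s)              # sigmoid backward
        else:
            dZ = dA * (Z > 0)                  # relu backward
        grads["dW" + str(l + 1)] = np.dot(dZ, A_prev.T) / m
        grads["db" + str(l + 1)] = np.sum(dZ, axis=1, keepdims=True) / m
        dA = np.dot(W.T, dZ)                   # gradient passed to the previous layer
    return grads

rng = np.random.RandomState(2)
dims = [3, 4, 1]                               # assumed sizes: 3 inputs, 4 hidden units, 1 output
params = {}
for l in range(1, len(dims)):
    params["W" + str(l)] = rng.randn(dims[l], dims[l - 1]) * 0.5
    params["b" + str(l)] = rng.randn(dims[l], 1) * 0.1
X = rng.randn(3, 6)
Y = (rng.rand(1, 6) > 0.5).astype(float)

AL, caches = forward(X, params)
grads = backward(AL, Y, caches)

eps, max_gap = 1e-6, 0.0
for name in params:
    for idx in np.ndindex(params[name].shape):
        old = params[name][idx]
        params[name][idx] = old + eps
        c_plus = cost(X, Y, params)
        params[name][idx] = old - eps
        c_minus = cost(X, Y, params)
        params[name][idx] = old                # restore the parameter
        numeric = (c_plus - c_minus) / (2 * eps)
        max_gap = max(max_gap, abs(numeric - grads["d" + name][idx]))
print(max_gap)                                 # should be very close to zero
```

Larger weights than the 0.01 scale used for initialization are chosen here on purpose, so that the gradients are not all tiny and a mistake would show up clearly.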

6.4 - Update Parameters

In this section you will update the parameters of the model, using gradient descent:

$$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]} \tag{16}$$

$$b^{[l]} = b^{[l]} - \alpha \, db^{[l]} \tag{17}$$

where $\alpha$ is the learning rate. After computing the updated parameters, store them in the parameters dictionary.

Exercise: Implement update_parameters() to update your parameters using gradient descent.

Instructions:
Update parameters using gradient descent on every $W^{[l]}$ and $b^{[l]}$ for $l = 1, 2, ..., L$.

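# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward
    learning_rate -- the step size alpha of gradient descent

    Returns:
    parameters -- python dictionary containing your updated parameters
        parameters["W" + str(l)] = ...
        parameters["b" + str(l)] = ...
    """

    L = len(parameters) // 2 # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]

    return parameters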
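
One gradient-descent step on a single layer shows the update rule with concrete numbers. The sketch assumes the signature update_parameters(parameters, grads, learning_rate) and repeats a compact version of the function so that it runs on its own; the parameter and gradient values are made up.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    return parameters

parameters = {"W1": np.array([[1.0, -2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[0.5, 0.5]]), "db1": np.array([[-1.0]])}   # made-up gradients

parameters = update_parameters(parameters, grads, learning_rate=0.1)
print(parameters["W1"], parameters["b1"])   # [[ 0.95 -2.05]] [[0.6]]
```

Each parameter moves against its gradient: the weights with positive gradient decrease by 0.05, while the bias with negative gradient increases by 0.1.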

7 - Conclusion

Congrats on implementing all the functions required for building a deep neural network!

We know it was a long assignment but going forward it will only get better. The next part of the assignment is easier.

In the next assignment you will put all these together to build two models:
- A two-layer neural network
- An L-layer neural network

You will in fact use these models to classify cat vs non-cat images!
