1.4 Deep Neural Networks
1.4.1 Deep Neural Networks
Notation:
- Number of layers: $L=4$; the input layer has index 0.
- $n^{[l]}$: the number of units in layer $l$. Here $n^{[1]}=5$, $n^{[2]}=5$, $n^{[3]}=3$, $n^{[4]}=n^{[L]}=1$ (a single output unit), and $n^{[0]}=n_x=3$ (input layer).
- $a^{[l]}$: the activations of layer $l$, i.e. the layer's output after the activation function is applied.
- $w^{[l]}$: the weights used to compute $z^{[l]}$ in layer $l$.
1.4.2 Forward and Backward Propagation
Forward propagation:
Layer 1: $z^{[1]}=w^{[1]}x+b^{[1]}$, $a^{[1]}=g^{[1]}(z^{[1]})$
Layer 2: $z^{[2]}=w^{[2]}a^{[1]}+b^{[2]}$, $a^{[2]}=g^{[2]}(z^{[2]})$
...
Layer 4: $z^{[4]}=w^{[4]}a^{[3]}+b^{[4]}$, $a^{[4]}=g^{[4]}(z^{[4]})$
In general, the same step is repeated for every layer $l$: $z^{[l]}=w^{[l]}a^{[l-1]}+b^{[l]}$, $a^{[l]}=g^{[l]}(z^{[l]})$
Vectorized: $Z^{[l]}=W^{[l]}\cdot A^{[l-1]}+b^{[l]}$, $A^{[l]}=g^{[l]}(Z^{[l]})$
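The layer-by-layer recurrence above can be sketched as a short NumPy loop. This is only an illustration: the source leaves $g^{[l]}$ unspecified, so ReLU for the hidden layers and sigmoid for the output layer are assumptions, and the names `forward`, `params`, and `caches` are hypothetical.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, params, L):
    """Apply Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]) for l = 1..L."""
    A = X                      # A[0] = X
    caches = []
    for l in range(1, L + 1):
        W, b = params["W" + str(l)], params["b" + str(l)]
        Z = W @ A + b          # b is broadcast across the m columns
        caches.append((A, Z))  # keep A[l-1] and Z[l] for the backward pass
        # Assumed activations: sigmoid on the last layer, ReLU elsewhere
        A = sigmoid(Z) if l == L else relu(Z)
    return A, caches

# Layer sizes n[0]..n[4] taken from the notation list
layer_dims = [3, 5, 5, 3, 1]
rng = np.random.default_rng(0)
params = {}
for l in range(1, len(layer_dims)):
    params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

X = rng.standard_normal((3, 4))  # m = 4 examples, one per column
AL, caches = forward(X, params, L=4)
print(AL.shape)
```

Caching $A^{[l-1]}$ and $Z^{[l]}$ in each iteration is a common design choice because the backward pass needs both.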
Variable dimensions:
$w^{[l]} \in \mathbb{R}^{(n^{[l]},\,n^{[l-1]})}$;
$b^{[l]} \in \mathbb{R}^{(n^{[l]},\,1)}$;
$z^{[l]} \in \mathbb{R}^{(n^{[l]},\,1)}$;
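As a quick sanity check of these dimensions, the sketch below derives the expected shapes from the layer sizes in the notation list and pushes a single example through zero-valued parameters. The helper name `param_shapes` is hypothetical, and the zero parameters are placeholders used only to check shapes.

```python
import numpy as np

layer_dims = [3, 5, 5, 3, 1]  # n[0]..n[4]

def param_shapes(layer_dims):
    """Return {l: (shape of w[l], shape of b[l])} for l = 1..L."""
    return {
        l: ((layer_dims[l], layer_dims[l - 1]), (layer_dims[l], 1))
        for l in range(1, len(layer_dims))
    }

shapes = param_shapes(layer_dims)
a = np.ones((layer_dims[0], 1))  # a single example, a[0] = x
for l, (w_shape, b_shape) in shapes.items():
    z = np.zeros(w_shape) @ a + np.zeros(b_shape)
    assert z.shape == (layer_dims[l], 1)  # z[l] has shape (n[l], 1)
    a = z                                 # only the shape matters here
    print(l, w_shape, b_shape, z.shape)
```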
Dimensions after vectorization:
$Z^{[l]}=(z^{[l][1]},z^{[l][2]},z^{[l][3]},\dots,z^{[l][m]})$, $Z^{[l]}\in(n^{[l]},m)$
$A^{[l]}\in(n^{[l]},m)$, $A^{[0]}=X\in(n^{[0]},m)$
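To make the column-stacking view concrete, the following sketch compares computing $z^{[l]}$ one example at a time against the single vectorized product. The sizes and random values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_l, m = 3, 5, 4  # n[l-1], n[l], number of examples
W = rng.standard_normal((n_l, n_prev))
b = rng.standard_normal((n_l, 1))
A_prev = rng.standard_normal((n_prev, m))

# One example at a time: each column is W a[l-1] + b for example i
columns = [W @ A_prev[:, [i]] + b for i in range(m)]
Z_stacked = np.hstack(columns)

# All examples at once; b of shape (n[l], 1) is broadcast over the m columns
Z_vectorized = W @ A_prev + b

print(Z_vectorized.shape)
print(np.allclose(Z_stacked, Z_vectorized))
```

Note that $b^{[l]}$ keeps its $(n^{[l]},1)$ shape in the vectorized form; NumPy broadcasting adds it to every column.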
Backward propagation:
(1) $dz^{[l]}=da^{[l]}*{g^{[l]}}'(z^{[l]})$
(2) $dw^{[l]}=dz^{[l]}\cdot a^{[l-1]T}$
(3) $db^{[l]}=dz^{[l]}$
(4) $da^{[l-1]}=w^{[l]T}\cdot dz^{[l]}$
(5) $dz^{[l]}=w^{[l+1]T}dz^{[l+1]}*{g^{[l]}}'(z^{[l]})$
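Equation (5) follows from substituting equation (4) into equation (1); equations (1) through (4) are enough to implement the backward function.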
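Spelled out, the substitution takes two steps. Here $*$ denotes the element-wise product, and equation (4) is applied one layer up, with $l$ replaced by $l+1$:

$$
\begin{aligned}
da^{[l]} &= w^{[l+1]T}\cdot dz^{[l+1]} && \text{(equation (4) with } l \to l+1\text{)}\\
dz^{[l]} &= da^{[l]} * {g^{[l]}}'(z^{[l]}) && \text{(equation (1))}\\
         &= \left(w^{[l+1]T}\cdot dz^{[l+1]}\right) * {g^{[l]}}'(z^{[l]}) && \text{(equation (5))}
\end{aligned}
$$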
Vectorized:
(6) $dZ^{[l]}=dA^{[l]}*{g^{[l]}}'(Z^{[l]})$
(7) $dW^{[l]}=\frac{1}{m}\,dZ^{[l]}\cdot A^{[l-1]T}$
(8) $db^{[l]}=\frac{1}{m}\,\text{np.sum}(dZ^{[l]},\ \text{axis}=1,\ \text{keepdims}=\text{True})$
(9) $dA^{[l-1]}=W^{[l]T}\cdot dZ^{[l]}$
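Equations (6) through (9) map almost line for line onto NumPy. The sketch below implements one backward step for a single layer. ReLU is assumed for $g^{[l]}$ because the source does not fix an activation, the function name `layer_backward` is hypothetical, and `dA` is a random stand-in for the gradient arriving from the next layer.

```python
import numpy as np

def relu_grad(Z):
    """Derivative of ReLU, taken as 0 at Z == 0 (an assumed convention)."""
    return (Z > 0).astype(float)

def layer_backward(dA, Z, A_prev, W):
    """One vectorized backward step for layer l."""
    m = A_prev.shape[1]
    dZ = dA * relu_grad(Z)                      # (6) element-wise product
    dW = (dZ @ A_prev.T) / m                    # (7)
    db = np.sum(dZ, axis=1, keepdims=True) / m  # (8)
    dA_prev = W.T @ dZ                          # (9)
    return dA_prev, dW, db

rng = np.random.default_rng(2)
n_prev, n_l, m = 3, 5, 4  # n[l-1], n[l], number of examples
W = rng.standard_normal((n_l, n_prev))
b = rng.standard_normal((n_l, 1))
A_prev = rng.standard_normal((n_prev, m))
Z = W @ A_prev + b
dA = rng.standard_normal((n_l, m))  # stand-in upstream gradient

dA_prev, dW, db = layer_backward(dA, Z, A_prev, W)
print(dW.shape, db.shape, dA_prev.shape)
```

Each gradient has the same shape as the quantity it belongs to, which is a cheap check worth running after every backward step; `keepdims=True` is what keeps `db` at $(n^{[l]},1)$ rather than a flat vector.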