Neural Network Representation
Computing a Neural Network’s Output
Neuron $i$ in layer $l$ (single example):
- Parameters: $w_i^{[l]}=\begin{bmatrix}w_1^{[l]} \\ w_2^{[l]} \\ \vdots \\ w_{n^{[l-1]}}^{[l]}\end{bmatrix},\; b_i^{[l]}$
- Input: $a^{[l-1]}$, shape $(n^{[l-1]}, 1)$
- Two-step computation:
  - $z_i^{[l]} = w_i^{[l]T}a^{[l-1]}+b_i^{[l]}$
  - $a_i^{[l]} = \sigma(z_i^{[l]})$
- Output: $a_i^{[l]}$, a scalar
Layer $l$ (single example)
Non-vectorized
$$
\begin{aligned}
z_1^{[l]} &= w_1^{[l]T}a^{[l-1]}+b_1^{[l]}, & a_1^{[l]} &= \sigma(z_1^{[l]})\\
z_2^{[l]} &= w_2^{[l]T}a^{[l-1]}+b_2^{[l]}, & a_2^{[l]} &= \sigma(z_2^{[l]})\\
&\;\;\vdots \\
z_{n^{[l]}}^{[l]} &= w_{n^{[l]}}^{[l]T}a^{[l-1]}+b_{n^{[l]}}^{[l]}, & a_{n^{[l]}}^{[l]} &= \sigma(z_{n^{[l]}}^{[l]})
\end{aligned}
$$
The different subscripts index the neurons of the layer: these formulas perform the same computation for every neuron, with $i$ running from $1$ to $n^{[l]}$, i.e. from the first to the $n^{[l]}$-th neuron of the layer.
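The non-vectorized, neuron-by-neuron loop above can be sketched in NumPy as follows (a minimal sketch; the sizes `n_prev` $= n^{[l-1]} = 3$ and `n` $= n^{[l]} = 4$ and the random values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n = 3, 4                                          # n^{[l-1]}, n^{[l]}
a_prev = rng.standard_normal((n_prev, 1))                 # a^{[l-1]}, shape (n^{[l-1]}, 1)
w = [rng.standard_normal((n_prev, 1)) for _ in range(n)]  # weight vectors w_i^{[l]}
b = [float(rng.standard_normal()) for _ in range(n)]      # biases b_i^{[l]}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The same two-step computation, repeated once per neuron i = 1..n^{[l]}
z = [(w[i].T @ a_prev).item() + b[i] for i in range(n)]   # z_i^{[l]} = w_i^{[l]T} a^{[l-1]} + b_i^{[l]}
a = [sigmoid(zi) for zi in z]                             # a_i^{[l]} = sigma(z_i^{[l]}), each a scalar
```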
Vectorized
The goal of this vectorization step is to have all neurons of a layer compute simultaneously, merging the $n^{[l]}$ formulas above into one: vectorization over the layer.
Method: take the layer-level quantities $w$ and $b$ and stack them row by row (stack by row):
$$
W^{[l]} = \begin{bmatrix} --w_1^{[l]T}--\\ --w_2^{[l]T}--\\ \vdots \\ --w_{n^{[l]}}^{[l]T}-- \end{bmatrix},\quad
b^{[l]} = \begin{bmatrix} b_1^{[l]} \\ b_2^{[l]} \\ \vdots \\ b_{n^{[l]}}^{[l]} \end{bmatrix}
$$
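The row-stacking can be sketched in NumPy (hypothetical sizes $n^{[l-1]}=3$, $n^{[l]}=4$): each transposed $w_i^{[l]}$ becomes a row of $W^{[l]}$, and the scalars $b_i^{[l]}$ become a column vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n = 3, 4                                          # hypothetical n^{[l-1]}, n^{[l]}
w = [rng.standard_normal((n_prev, 1)) for _ in range(n)]  # column vectors w_i^{[l]}
b_i = [float(rng.standard_normal()) for _ in range(n)]    # scalars b_i^{[l]}

# Row i of W^{[l]} is w_i^{[l]T}, so W^{[l]} has shape (n^{[l]}, n^{[l-1]})
W = np.vstack([wi.T for wi in w])
# b^{[l]} stacks the scalars into a column vector of shape (n^{[l]}, 1)
b = np.array(b_i).reshape(n, 1)
```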
Notation
Sample-level quantities: $x$, $z$, $a$ (stack by column)
$$
\begin{aligned}
X &= A^{[0]} = \begin{bmatrix} | & | & \dots & | \\ x^{(1)} & x^{(2)} & \dots & x^{(m)} \\ | & | & \dots & | \end{bmatrix}\\
Z^{[l]} &= \begin{bmatrix} | & | & \dots & | \\ z^{[l](1)} & z^{[l](2)} & \dots & z^{[l](m)}\\ | & | & \dots & | \end{bmatrix}\\
A^{[l]} &= \begin{bmatrix} | & | & & | \\ a^{[l](1)} & a^{[l](2)} & \dots & a^{[l](m)} \\ | & | & & | \end{bmatrix}
\end{aligned}
$$
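A minimal NumPy sketch of the column-stacking (hypothetical sizes: $n^{[0]}=3$ features, $m=5$ samples):

```python
import numpy as np

rng = np.random.default_rng(2)
n0, m = 3, 5                                           # hypothetical n^{[0]}, m
xs = [rng.standard_normal((n0, 1)) for _ in range(m)]  # x^{(1)}, ..., x^{(m)}

# Stack samples column by column: X = A^{[0]} has shape (n^{[0]}, m)
X = np.hstack(xs)
```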
Forward propagation for layer $l$
$$
\begin{aligned}
Z^{[l]} &= W^{[l]}A^{[l-1]}+b^{[l]}\\
&=\begin{bmatrix} --w_1^{[l]T}--\\ --w_2^{[l]T}--\\ \vdots \\ --w_{n^{[l]}}^{[l]T}-- \end{bmatrix}
\begin{bmatrix} | & | & & | \\ a^{[l-1](1)} & a^{[l-1](2)} & \dots & a^{[l-1](m)} \\ | & | & & | \end{bmatrix}
+\begin{bmatrix} b_1^{[l]} \\ b_2^{[l]} \\ \vdots \\ b_{n^{[l]}}^{[l]} \end{bmatrix}\\
&=\begin{bmatrix}
w_1^{[l]T}a^{[l-1](1)} & \dots & w_1^{[l]T}a^{[l-1](m)} \\
w_2^{[l]T}a^{[l-1](1)} & \dots & w_2^{[l]T}a^{[l-1](m)}\\
\vdots & & \vdots\\
w_{n^{[l]}}^{[l]T}a^{[l-1](1)} & \dots & w_{n^{[l]}}^{[l]T}a^{[l-1](m)}
\end{bmatrix}
+\begin{bmatrix}
b_1^{[l]} & \dots & b_1^{[l]}\\
b_2^{[l]} & \dots & b_2^{[l]}\\
\vdots & & \vdots\\
b_{n^{[l]}}^{[l]} & \dots & b_{n^{[l]}}^{[l]}
\end{bmatrix}\\
&=\begin{bmatrix}
w_1^{[l]T}a^{[l-1](1)}+b_1^{[l]} & \dots & w_1^{[l]T}a^{[l-1](m)}+b_1^{[l]}\\
w_2^{[l]T}a^{[l-1](1)}+b_2^{[l]} & \dots & w_2^{[l]T}a^{[l-1](m)}+b_2^{[l]}\\
\vdots & & \vdots\\
w_{n^{[l]}}^{[l]T}a^{[l-1](1)}+b_{n^{[l]}}^{[l]} & \dots & w_{n^{[l]}}^{[l]T}a^{[l-1](m)}+b_{n^{[l]}}^{[l]}
\end{bmatrix}\\
&=\begin{bmatrix}
z_1^{[l](1)} & \dots & z_1^{[l](m)}\\
z_2^{[l](1)} & \dots & z_2^{[l](m)}\\
\vdots & & \vdots\\
z_{n^{[l]}}^{[l](1)} & \dots & z_{n^{[l]}}^{[l](m)}
\end{bmatrix}\\
&=\begin{bmatrix} | & | & \dots & | \\ z^{[l](1)} & z^{[l](2)} & \dots & z^{[l](m)}\\ | & | & \dots & | \end{bmatrix}
\end{aligned}
$$

Note that the column vector $b^{[l]}$, of shape $(n^{[l]}, 1)$, is replicated (broadcast) across all $m$ columns in the second term.
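The derivation above can be checked numerically: `W @ A_prev + b` (with NumPy broadcasting the $(n^{[l]}, 1)$ bias across the $m$ columns) must agree entry by entry with the scalar formula $z_i^{[l](k)} = w_i^{[l]T}a^{[l-1](k)}+b_i^{[l]}$. A minimal sketch with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n_prev, n, m = 3, 4, 5                      # hypothetical n^{[l-1]}, n^{[l]}, m
W = rng.standard_normal((n, n_prev))        # W^{[l]}
b = rng.standard_normal((n, 1))             # b^{[l]}; broadcast across the m columns
A_prev = rng.standard_normal((n_prev, m))   # A^{[l-1]}

Z = W @ A_prev + b                          # Z^{[l]}, shape (n^{[l]}, m)

# Entry (i, k) against the scalar formula z_i^{[l](k)} = w_i^{[l]T} a^{[l-1](k)} + b_i^{[l]}
Z_loop = np.empty((n, m))
for i in range(n):
    for k in range(m):
        Z_loop[i, k] = W[i, :] @ A_prev[:, k] + b[i, 0]
```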
$$
\begin{aligned}
A^{[l]} &= \sigma(Z^{[l]}) \\
&= \sigma\left(\begin{bmatrix} | & | & \dots & | \\ z^{[l](1)} & z^{[l](2)} & \dots & z^{[l](m)}\\ | & | & \dots & | \end{bmatrix}\right) \\
&=\begin{bmatrix} | & | & \dots & | \\ \sigma(z^{[l](1)}) & \sigma(z^{[l](2)}) & \dots & \sigma(z^{[l](m)})\\ | & | & \dots & | \end{bmatrix}\\
&=\begin{bmatrix} | & | & & | \\ a^{[l](1)} & a^{[l](2)} & \dots & a^{[l](m)} \\ | & | & & | \end{bmatrix}
\end{aligned}
$$
Forward propagation for the whole network
$$
A^{[0]} = X = \begin{bmatrix} | & | & \dots & | \\ x^{(1)} & x^{(2)} & \dots & x^{(m)} \\ | & | & \dots & | \end{bmatrix} \\
Z^{[1]} = W^{[1]}A^{[0]}+b^{[1]},\quad A^{[1]} = \sigma(Z^{[1]}) \\
Z^{[2]} = W^{[2]}A^{[1]}+b^{[2]},\quad A^{[2]} = \sigma(Z^{[2]}) \\
\vdots \\
Z^{[l]} = W^{[l]}A^{[l-1]}+b^{[l]},\quad A^{[l]} = \sigma(Z^{[l]})
$$
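The layer-by-layer recurrence above can be sketched as a short NumPy loop (a minimal sketch; the layer sizes $n^{[0]}=3$, $n^{[1]}=4$, $n^{[2]}=1$, the sample count $m=5$, and the random parameters are hypothetical, and $\sigma$ is used for every layer as in the formulas above):

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, params):
    """params: list of (W^{[l]}, b^{[l]}) pairs, one per layer; returns the last A^{[l]}."""
    A = X                      # A^{[0]} = X
    for W, b in params:
        Z = W @ A + b          # Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}
        A = sigmoid(Z)         # A^{[l]} = sigma(Z^{[l]})
    return A

rng = np.random.default_rng(4)
sizes = [3, 4, 1]              # hypothetical n^{[0]}, n^{[1]}, n^{[2]}
params = [(rng.standard_normal((sizes[i + 1], sizes[i])),
           rng.standard_normal((sizes[i + 1], 1)))
          for i in range(len(sizes) - 1)]
X = rng.standard_normal((sizes[0], 5))    # m = 5 samples, stacked by column
A_out = forward(X, params)                # shape (n^{[2]}, m) = (1, 5)
```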