【Course1】3 One hidden layer Neural Network

Neural Network Representation

Computing a Neural Network’s Output

Neuron $i$ of layer $l$ (single example):

  • Parameters: $w_i^{[l]}=\begin{bmatrix}w_1^{[l]} \\ w_2^{[l]} \\ \vdots \\ w_{n^{[l-1]}}^{[l]}\end{bmatrix},\quad b_i^{[l]}$
  • Input: $a^{[l-1]}$, shape $(n^{[l-1]}, 1)$
  • Two computation steps (see the sketch after this list):
    1. $z_i^{[l]} = w_i^{[l]T}a^{[l-1]}+b_i^{[l]}$
    2. $a_i^{[l]} = \sigma(z_i^{[l]})$
  • Output: $a_i^{[l]}$, a scalar
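A minimal NumPy sketch of these two steps for a single neuron; the layer size $n^{[l-1]} = 3$ and all values are illustrative assumptions, not part of the course material:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w_i = rng.standard_normal((3, 1))     # w_i^[l], shape (n^[l-1], 1) with n^[l-1] = 3
b_i = 0.1                             # b_i^[l], a scalar (illustrative value)
a_prev = rng.standard_normal((3, 1))  # a^[l-1], shape (n^[l-1], 1)

z_i = (w_i.T @ a_prev).item() + b_i   # step 1: z_i^[l] = w_i^[l]T a^[l-1] + b_i^[l]
a_i = sigmoid(z_i)                    # step 2: a_i^[l] = sigma(z_i^[l]), a scalar
```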

Layer $l$ (single example)

Non-vectorized

$$
\begin{aligned}
z_1^{[l]} &= w_1^{[l]T}a^{[l-1]}+b_1^{[l]}, & a_1^{[l]} &= \sigma(z_1^{[l]})\\
z_2^{[l]} &= w_2^{[l]T}a^{[l-1]}+b_2^{[l]}, & a_2^{[l]} &= \sigma(z_2^{[l]})\\
&\;\vdots \\
z_{n^{[l]}}^{[l]} &= w_{n^{[l]}}^{[l]T}a^{[l-1]}+b_{n^{[l]}}^{[l]}, & a_{n^{[l]}}^{[l]} &= \sigma(z_{n^{[l]}}^{[l]})
\end{aligned}
$$

The different subscripts index the different neurons of the layer: this set of formulas performs the same computation for every neuron, with the subscript $i$ running from $1$ to $n^{[l]}$, i.e. over neurons $1$ through $n^{[l]}$ of the layer, as in the loop sketch below.

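A toy sketch of the non-vectorized computation, looping over the neurons of layer $l$; the sizes $n^{[l-1]} = 3$, $n^{[l]} = 4$ are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_prev, n_l = 3, 4
w = [rng.standard_normal((n_prev, 1)) for _ in range(n_l)]  # one weight vector per neuron
b = rng.standard_normal(n_l)                                # one scalar bias per neuron
a_prev = rng.standard_normal((n_prev, 1))                   # activations from layer l-1

# The same two steps repeated for each neuron i = 1, ..., n^[l]
z = np.zeros((n_l, 1))
a = np.zeros((n_l, 1))
for i in range(n_l):
    z[i, 0] = np.dot(w[i].ravel(), a_prev.ravel()) + b[i]   # z_i = w_i^T a_prev + b_i
    a[i, 0] = sigmoid(z[i, 0])                              # a_i = sigma(z_i)
```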

Vectorized

The goal of this vectorization step is to make all neurons in a layer compute simultaneously, i.e. to merge the $n^{[l]}$ formulas above into a single one. This is vectorization over the "layer" dimension.

Vectorization method: stack the layer-related quantities $w$ and $b$ row by row (stack by row), as shown below and in the sketch that follows:
$$
W^{[l]} = \begin{bmatrix} --\,w_1^{[l]T}\,--\\ --\,w_2^{[l]T}\,--\\ \vdots \\ --\,w_{n^{[l]}}^{[l]T}\,-- \end{bmatrix},\quad
b^{[l]} = \begin{bmatrix} b_1^{[l]} \\ b_2^{[l]} \\ \vdots \\ b_{n^{[l]}}^{[l]} \end{bmatrix}
$$
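A small sketch of this stacking, continuing the same illustrative sizes as above ($n^{[l-1]} = 3$, $n^{[l]} = 4$):

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_l = 3, 4
w = [rng.standard_normal((n_prev, 1)) for _ in range(n_l)]  # per-neuron weight vectors
b = [rng.standard_normal() for _ in range(n_l)]             # per-neuron scalar biases

# Stack the transposed weight vectors row by row; stack the biases into a column
W = np.vstack([w_i.T for w_i in w])    # shape (n^[l], n^[l-1])
b_vec = np.array(b).reshape(n_l, 1)    # shape (n^[l], 1)
assert W.shape == (n_l, n_prev) and b_vec.shape == (n_l, 1)
```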

Notation

Example-related quantities: $x$, $z$, $a$ (stacked by column, one column per training example):
$$
\begin{aligned}
X &= A^{[0]} = \begin{bmatrix} | & | & \dots & | \\ x^{(1)} & x^{(2)} & \dots & x^{(m)} \\ | & | & \dots & | \end{bmatrix}\\
Z^{[l]} &= \begin{bmatrix} | & | & \dots & | \\ z^{[l](1)} & z^{[l](2)} & \dots & z^{[l](m)}\\ | & | & \dots & | \end{bmatrix}\\
A^{[l]} &= \begin{bmatrix} | & | & \dots & | \\ a^{[l](1)} & a^{[l](2)} & \dots & a^{[l](m)} \\ | & | & \dots & | \end{bmatrix}
\end{aligned}
$$

Forward propagation for layer $l$

$$
\begin{aligned}
Z^{[l]} &= W^{[l]}A^{[l-1]}+b^{[l]}\\
&=\begin{bmatrix} --\,w_1^{[l]T}\,--\\ --\,w_2^{[l]T}\,--\\ \vdots \\ --\,w_{n^{[l]}}^{[l]T}\,-- \end{bmatrix}
\begin{bmatrix} | & | & & | \\ a^{[l-1](1)} & a^{[l-1](2)} & \dots & a^{[l-1](m)} \\ | & | & & | \end{bmatrix}
+\begin{bmatrix} b_1^{[l]} \\ b_2^{[l]} \\ \vdots \\ b_{n^{[l]}}^{[l]} \end{bmatrix}\\
&=\begin{bmatrix} w_1^{[l]T}a^{[l-1](1)} & \dots & w_1^{[l]T}a^{[l-1](m)} \\ w_2^{[l]T}a^{[l-1](1)} & \dots & w_2^{[l]T}a^{[l-1](m)}\\ \vdots & & \vdots\\ w_{n^{[l]}}^{[l]T}a^{[l-1](1)} & \dots & w_{n^{[l]}}^{[l]T}a^{[l-1](m)} \end{bmatrix}
+\begin{bmatrix} b_1^{[l]} & \dots & b_1^{[l]}\\ b_2^{[l]} & \dots & b_2^{[l]}\\ \vdots & & \vdots\\ b_{n^{[l]}}^{[l]} & \dots & b_{n^{[l]}}^{[l]} \end{bmatrix}\\
&=\begin{bmatrix} w_1^{[l]T}a^{[l-1](1)}+b_1^{[l]} & \dots & w_1^{[l]T}a^{[l-1](m)}+b_1^{[l]}\\ w_2^{[l]T}a^{[l-1](1)}+b_2^{[l]} & \dots & w_2^{[l]T}a^{[l-1](m)}+b_2^{[l]}\\ \vdots & & \vdots\\ w_{n^{[l]}}^{[l]T}a^{[l-1](1)}+b_{n^{[l]}}^{[l]} & \dots & w_{n^{[l]}}^{[l]T}a^{[l-1](m)}+b_{n^{[l]}}^{[l]} \end{bmatrix}\\
&=\begin{bmatrix} z_1^{[l](1)} & \dots & z_1^{[l](m)}\\ z_2^{[l](1)} & \dots & z_2^{[l](m)}\\ \vdots & & \vdots\\ z_{n^{[l]}}^{[l](1)} & \dots & z_{n^{[l]}}^{[l](m)} \end{bmatrix}
=\begin{bmatrix} | & | & \dots & | \\ z^{[l](1)} & z^{[l](2)} & \dots & z^{[l](m)}\\ | & | & \dots & | \end{bmatrix}
\end{aligned}
$$

(Adding the column vector $b^{[l]}$ to the matrix repeats it across all $m$ columns, i.e. broadcasting.)

$$
\begin{aligned}
A^{[l]} &= \sigma(Z^{[l]})
= \sigma\!\left(\begin{bmatrix} | & | & \dots & | \\ z^{[l](1)} & z^{[l](2)} & \dots & z^{[l](m)}\\ | & | & \dots & | \end{bmatrix}\right)\\
&=\begin{bmatrix} | & | & \dots & | \\ \sigma(z^{[l](1)}) & \sigma(z^{[l](2)}) & \dots & \sigma(z^{[l](m)})\\ | & | & \dots & | \end{bmatrix}
=\begin{bmatrix} | & | & & | \\ a^{[l](1)} & a^{[l](2)} & \dots & a^{[l](m)} \\ | & | & & | \end{bmatrix}
\end{aligned}
$$
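The vectorized layer computation maps directly onto one matrix product plus a broadcast add; a minimal sketch with illustrative sizes ($n^{[l-1]} = 3$, $n^{[l]} = 4$, $m = 5$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_prev, n_l, m = 3, 4, 5
W = rng.standard_normal((n_l, n_prev))      # W^[l], rows are w_i^[l]T
b = rng.standard_normal((n_l, 1))           # b^[l], column vector
A_prev = rng.standard_normal((n_prev, m))   # A^[l-1], columns are examples

Z = W @ A_prev + b    # broadcasting repeats b across the m columns
A = sigmoid(Z)        # elementwise, so column i is sigma(z^[l](i))
assert Z.shape == (n_l, m) and A.shape == (n_l, m)
```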

Forward propagation for the whole network

$$
\begin{aligned}
A^{[0]} &= X = \begin{bmatrix} | & | & \dots & | \\ x^{(1)} & x^{(2)} & \dots & x^{(m)} \\ | & | & \dots & | \end{bmatrix} \\
Z^{[1]} &= W^{[1]}A^{[0]}+b^{[1]}, & A^{[1]} &= \sigma(Z^{[1]}) \\
Z^{[2]} &= W^{[2]}A^{[1]}+b^{[2]}, & A^{[2]} &= \sigma(Z^{[2]}) \\
&\;\vdots \\
Z^{[l]} &= W^{[l]}A^{[l-1]}+b^{[l]}, & A^{[l]} &= \sigma(Z^{[l]})
\end{aligned}
$$
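Putting the layers together, a full forward pass is just this recurrence in a loop; a self-contained sketch of a toy 2-layer network (3 inputs, 4 hidden units, 1 output, $m = 5$ examples; the `params` structure and sizes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    # params: hypothetical list of (W, b) pairs, one per layer
    A = X                   # A^[0] = X, shape (n_x, m)
    for W, b in params:
        Z = W @ A + b       # Z^[l] = W^[l] A^[l-1] + b^[l]
        A = sigmoid(Z)      # A^[l] = sigma(Z^[l])
    return A

rng = np.random.default_rng(0)
params = [
    (rng.standard_normal((4, 3)) * 0.01, np.zeros((4, 1))),  # layer 1: 3 -> 4
    (rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1))),  # layer 2: 4 -> 1
]
X = rng.standard_normal((3, 5))   # 3 features, m = 5 examples
Y_hat = forward(X, params)        # shape (1, 5): one prediction per column
```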
