Notes on Andrew Ng's Machine Learning - 9 Neural Networks
1 Model
Layer |
---|
Input layer |
Hidden layer |
Output layer |
Symbol | Description |
---|---|
$a^{(j)}_i$ | Activation of unit $i$ in layer $j$ (the value computed and output by that neuron) |
$\Theta^{(j)}$ | Weight matrix controlling the mapping from layer $j$ to layer $j+1$ |
Symbol (example) | Expression |
---|---|
$a^{(2)}_1$ | $g(\Theta^{(1)}_{10}x_0+\Theta^{(1)}_{11}x_1+\Theta^{(1)}_{12}x_2+\Theta^{(1)}_{13}x_3)=g(z^{(2)}_1)$ |
$a^{(2)}_2$ | $g(\Theta^{(1)}_{20}x_0+\Theta^{(1)}_{21}x_1+\Theta^{(1)}_{22}x_2+\Theta^{(1)}_{23}x_3)=g(z^{(2)}_2)$ |
$a^{(2)}_3$ | $g(\Theta^{(1)}_{30}x_0+\Theta^{(1)}_{31}x_1+\Theta^{(1)}_{32}x_2+\Theta^{(1)}_{33}x_3)=g(z^{(2)}_3)$ |
$h_\Theta(x)$ | $a^{(3)}_1=g(\Theta^{(2)}_{10}a^{(2)}_0+\Theta^{(2)}_{11}a^{(2)}_1+\Theta^{(2)}_{12}a^{(2)}_2+\Theta^{(2)}_{13}a^{(2)}_3)=g(z^{(3)}_1)$ |
Parameter (example) | Vector/Matrix |
---|---|
$x=a^{(1)}$ | $\left[\begin{matrix}x_0\\x_1\\x_2\\x_3\end{matrix}\right],\quad x_0=1$ |
$\Theta^{(1)}$ | $\left[\begin{matrix}\Theta^{(1)}_{10}&\Theta^{(1)}_{11}&\Theta^{(1)}_{12}&\Theta^{(1)}_{13}\\\Theta^{(1)}_{20}&\Theta^{(1)}_{21}&\Theta^{(1)}_{22}&\Theta^{(1)}_{23}\\\Theta^{(1)}_{30}&\Theta^{(1)}_{31}&\Theta^{(1)}_{32}&\Theta^{(1)}_{33}\end{matrix}\right]\in\mathbb{R}^{3\times4}$ |
$z^{(2)}$ | $\left[\begin{matrix}z^{(2)}_1\\z^{(2)}_2\\z^{(2)}_3\end{matrix}\right]=\Theta^{(1)}x=\Theta^{(1)}a^{(1)}$ |
$a^{(2)}$ | $g(z^{(2)})\in\mathbb{R}^3$ |
Add $a^{(2)}_0=1$ | $a^{(2)}\in\mathbb{R}^4$ |
$z^{(3)}$ | $\Theta^{(2)}a^{(2)}$ |
$h_\Theta(x)$ | $a^{(3)}=g(z^{(3)})\in\mathbb{R}$ |
This layer-by-layer computation, from the input layer to the output layer, is called forward propagation.
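The vectorized forward pass in the table above can be sketched in NumPy. The layer sizes follow the 3-input, 3-hidden-unit, 1-output network from this section; the weight values here are hypothetical placeholders, since the notes do not fix them:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Forward propagation: x -> a(1) -> z(2) -> a(2) -> z(3) -> h_Theta(x)."""
    a1 = np.concatenate(([1.0], x))             # add bias x0 = 1, so a(1) in R^4
    z2 = Theta1 @ a1                            # z(2) = Theta(1) a(1), in R^3
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # add a0(2) = 1, so a(2) in R^4
    z3 = Theta2 @ a2                            # z(3) = Theta(2) a(2)
    return sigmoid(z3)                          # h_Theta(x) = a(3) = g(z(3))

# hypothetical random weights, with shapes matching the table above
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))  # layer 1 (4 incl. bias) -> layer 2 (3 units)
Theta2 = rng.normal(size=(1, 4))  # layer 2 (4 incl. bias) -> layer 3 (1 unit)
h = forward(np.array([0.5, -1.0, 2.0]), Theta1, Theta2)
print(h)  # a single activation in (0, 1)
```

Note that each `Theta` has one more column than its input layer has units, to account for the bias term.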
2 Examples and Intuition
Example ($x_0=1$, $x_1,x_2\in\{0,1\}$) | $x_1\;\text{AND}\;x_2$ | $x_1\;\text{OR}\;x_2$ | $\text{NOT}\;x_1$ |
---|---|---|---|
$\Theta^{(1)}$ | $\left[\begin{matrix}-30&20&20\end{matrix}\right]$ | $\left[\begin{matrix}-10&20&20\end{matrix}\right]$ | $\left[\begin{matrix}10&-20\end{matrix}\right]$ |
$(x_1,x_2)$ | $(0,0)\;(0,1)\;(1,0)\;(1,1)$ | $(0,0)\;(0,1)\;(1,0)\;(1,1)$ | $x_1=0,\;1$ |
$h_\Theta(x)\approx$ | 0 0 0 1 | 0 1 1 1 | 1 0 |
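Each of these gates is a single sigmoid unit: the large weights (±20, ±30) push $z$ far from zero, so $g(z)$ saturates near 0 or 1. A quick check of all three columns (`gate` is a helper name introduced here for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(theta, inputs):
    # theta[0] is the bias weight, multiplied by x0 = 1
    return [round(sigmoid(theta @ np.concatenate(([1.0], x)))) for x in inputs]

pairs = [np.array(p, dtype=float) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(gate(np.array([-30.0, 20.0, 20.0]), pairs))  # AND -> [0, 0, 0, 1]
print(gate(np.array([-10.0, 20.0, 20.0]), pairs))  # OR  -> [0, 1, 1, 1]
print(gate(np.array([10.0, -20.0]),
           [np.array([0.0]), np.array([1.0])]))    # NOT -> [1, 0]
```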
Example | $x_1\;\text{XNOR}\;x_2$ |
---|---|
$\Theta^{(1)}$ | $\left[\begin{matrix}-30&20&20\\10&-20&-20\end{matrix}\right]$ |
$\Theta^{(2)}$ | $\left[\begin{matrix}-10&20&20\end{matrix}\right]$ |
$(x_1,x_2)$ | $(0,0)\;(0,1)\;(1,0)\;(1,1)$ |
$h_\Theta(x)\approx$ | 1 0 0 1 |

Here the first hidden unit computes $x_1\;\text{AND}\;x_2$, the second computes $(\text{NOT}\;x_1)\;\text{AND}\;(\text{NOT}\;x_2)$, and the output unit ORs them together, which yields XNOR: stacking simple gates lets a two-layer network represent a function that a single unit cannot.
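The two-layer XNOR network above can be run directly with the weights from the table, reusing the forward-propagation steps from section 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Theta1 = np.array([[-30.0, 20.0, 20.0],     # hidden unit 1: x1 AND x2
                   [10.0, -20.0, -20.0]])   # hidden unit 2: (NOT x1) AND (NOT x2)
Theta2 = np.array([-10.0, 20.0, 20.0])      # output unit: OR of the two hidden units

def xnor(x1, x2):
    a1 = np.array([1.0, x1, x2])                        # input layer with bias
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))  # hidden layer with bias
    return round(sigmoid(Theta2 @ a2))                  # h_Theta(x)

print([xnor(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [1, 0, 0, 1]
```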