ML Notes: Week 4 - Neural Networks: Representation

1. Model representation

1.1 Neural network model

[Figure: a biological neuron]
A typical neuron has input wires called dendrites and an output wire called an axon. The nucleus can be viewed as the computational unit. We can simplify this into the following model:
[Figure: a single artificial neuron]
Terms: a neuron (artificial neuron) with a sigmoid (logistic) activation function.

1.2 Some notations in the neural networks

[Figure: a three-layer neural network]
Layer 1: input layer
Layer 2: hidden layer (every layer that is neither the input layer nor the output layer is called a hidden layer)
Layer 3: output layer


  • $x$ is the vector of inputs and $\Theta$ is the matrix of parameters; $\Theta$ is also called the weights.
  • $\Theta^{(j)}$ = matrix of weights mapping from layer $j$ to layer $j+1$. If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$. For example, with $s_1 = 3$ input units and $s_2 = 3$ hidden units, $\Theta^{(1)}$ is a $3 \times 4$ matrix (the $+1$ accounts for the bias unit).
  • $a_i^{(j)}$ = "activation" of unit $i$ in layer $j$.

$$
\begin{aligned}
a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \\
a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \\
a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \\
h_\Theta(x) = a_1^{(3)} &= g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})
\end{aligned}
$$
* $g$ is the sigmoid/logistic activation function, $g(z) = \frac{1}{1+e^{-z}}$.

1.3 Forward propagation in a neural network

Computing the activations layer by layer as in the figure above, from the input layer to the hidden layer to the output layer, is called forward propagation.

Now we will vectorize the model. We define
$$
\begin{aligned}
z_1^{(2)} &= \Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3 \\
z_2^{(2)} &= \Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3 \\
z_3^{(2)} &= \Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3
\end{aligned}
$$
we can rewrite this as $z^{(2)} = [z_1^{(2)} \; z_2^{(2)} \; z_3^{(2)}]^T = \Theta^{(1)} x$. If we treat $x$ as $a^{(1)}$, then $z^{(2)} = \Theta^{(1)} a^{(1)}$.
In general, $z^{(j+1)} = \Theta^{(j)} a^{(j)}$, where $a^{(j)}$ includes the bias unit $a_0^{(j)} = 1$.

And $a_1^{(2)} = g(z_1^{(2)}),\; a_2^{(2)} = g(z_2^{(2)}),\; a_3^{(2)} = g(z_3^{(2)})$ can be written as $a^{(2)} = g(z^{(2)})$, where $g$ is applied element-wise.
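As a minimal sketch (not from the original notes), forward propagation for the network above — 3 input units, 3 hidden units, 1 output unit — could be written in NumPy as follows; the weight values are arbitrary placeholders:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    # x: input vector, shape (3,); Theta1: shape (3, 4); Theta2: shape (1, 4)
    a1 = np.concatenate(([1.0], x))             # add bias unit x_0 = 1 -> a^{(1)}, shape (4,)
    z2 = Theta1 @ a1                            # z^{(2)} = Theta^{(1)} a^{(1)}, shape (3,)
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # a^{(2)} = g(z^{(2)}) plus bias a_0^{(2)} = 1
    z3 = Theta2 @ a2                            # z^{(3)} = Theta^{(2)} a^{(2)}, shape (1,)
    return sigmoid(z3)                          # h_Theta(x) = a^{(3)}

# Example with arbitrary (illustrative) weights:
Theta1 = np.random.randn(3, 4)
Theta2 = np.random.randn(1, 4)
print(forward_propagate(np.array([1.0, 0.5, -2.0]), Theta1, Theta2))
```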

For the neural network above, if we cover up the input layer, what remains looks just like logistic regression.
[Figure: the output layer viewed as logistic regression on the hidden-layer activations]
Logistic regression: $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
The simplified neural network model: $h_\Theta(x) = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})$

1.4 Other network architectures

[Figure: networks with more than one hidden layer]
A network can have several hidden layers; the activations of each layer serve as the inputs to the next layer, and the final layer produces $h_\Theta(x)$.

2. How to compute a complex nonlinear function?

$x_1, x_2 \in \{0, 1\}$

2.1 AND

$y = x_1$ AND $x_2$
[Figure: a single sigmoid unit computing AND]
$\Theta^{(1)} = \begin{bmatrix} -30 & 20 & 20 \end{bmatrix}$
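As a quick numerical check (a NumPy sketch, not part of the original notes), plugging all four input combinations into $g(-30 + 20x_1 + 20x_2)$ reproduces the AND truth table, since $g(z)$ is close to 0 for $z \le -10$ and close to 1 for $z \ge 10$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-30.0, 20.0, 20.0])  # weights of the AND unit

for x1 in (0, 1):
    for x2 in (0, 1):
        h = sigmoid(theta @ np.array([1.0, x1, x2]))  # bias input x_0 = 1
        print(x1, x2, int(h > 0.5))   # prints 0 0 0, 0 1 0, 1 0 0, 1 1 1
```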

2.2 OR

$y = x_1$ OR $x_2$
[Figure: a single sigmoid unit computing OR]
$\Theta^{(1)} = \begin{bmatrix} -10 & 20 & 20 \end{bmatrix}$

2.3 NOT

$y =$ NOT $x_1$
[Figure: a single sigmoid unit computing NOT]
$\Theta^{(1)} = \begin{bmatrix} 10 & -20 \end{bmatrix}$

2.4 (NOT $x_1$) AND (NOT $x_2$)

[Figure: a single sigmoid unit computing (NOT $x_1$) AND (NOT $x_2$)]
$\Theta^{(1)} = \begin{bmatrix} 10 & -20 & -20 \end{bmatrix}$

2.5 XNOR

$y = (x_1$ AND $x_2)$ OR ((NOT $x_1$) AND (NOT $x_2$))
[Figure: a two-layer network computing XNOR from the AND, (NOT $x_1$) AND (NOT $x_2$), and OR units]

* We can put these pieces together to build more complex nonlinear functions, as the sketch below shows.
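A small sketch (assuming NumPy; the weight vectors are the ones listed above) of how the AND, (NOT $x_1$) AND (NOT $x_2$), and OR units compose into XNOR:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta, inputs):
    # one sigmoid unit: g(theta^T [1; inputs]), with bias input 1 prepended
    return sigmoid(theta @ np.concatenate(([1.0], inputs)))

theta_and      = np.array([-30.0,  20.0,  20.0])   # x1 AND x2
theta_not_both = np.array([ 10.0, -20.0, -20.0])   # (NOT x1) AND (NOT x2)
theta_or       = np.array([-10.0,  20.0,  20.0])   # a1 OR a2

def xnor(x1, x2):
    a1 = unit(theta_and,      np.array([x1, x2]))  # hidden unit 1
    a2 = unit(theta_not_both, np.array([x1, x2]))  # hidden unit 2
    return unit(theta_or, np.array([a1, a2]))      # output unit

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, int(xnor(x1, x2) > 0.5))     # 1 when x1 == x2, else 0
```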


3. Multi-class Classification

The output $y_i$ will be one of $\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$, depending on which class the corresponding input $X_i$ belongs to. The output layer therefore has one unit per class, and in this way we can implement multi-class classification.
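For illustration, a minimal sketch (helper names are made up) of the one-hot target encoding and of reading off the predicted class from the network's 4-dimensional output:

```python
import numpy as np

NUM_CLASSES = 4

def one_hot(label):
    # label in {0, 1, 2, 3} -> 4-dimensional target vector y_i
    y = np.zeros(NUM_CLASSES)
    y[label] = 1.0
    return y

def predict_class(h):
    # h: 4-dimensional output of the network, one sigmoid unit per class;
    # the predicted class is the unit with the largest activation
    return int(np.argmax(h))

print(one_hot(2))                                      # [0. 0. 1. 0.]
print(predict_class(np.array([0.1, 0.05, 0.8, 0.2])))  # 2
```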
