Non-linear hypotheses
The goal here is to understand why we need non-linear hypotheses.
Neurons and the brain
Neural Networks are a pretty old algorithm that was originally motivated by the goal of having machines that can mimic the brain.
How might Neural Networks pertain to that goal?
Origins: algorithms that try to mimic the brain.
Recent resurgence: state-of-the-art technique for many applications.
Model representation I
An artificial neuron with a sigmoid (logistic) activation function:
$g(z) = \frac{1}{1 + e^{-z}}$
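As a quick sketch (not from the lecture), this activation function can be written in Python with NumPy; the name `sigmoid` is just a placeholder:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation g(z) = 1 / (1 + e^(-z)); applies elementwise to arrays.
    return 1.0 / (1.0 + np.exp(-z))
```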
Model representation II
How do we actually carry out this computation efficiently? With a vectorized implementation.
Forward propagation: vectorized implementation
$a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)$
$a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)$
$a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)$
$h_{\Theta}(x) = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})$
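To make the indexing concrete, here is a minimal sketch that evaluates the three hidden-unit equations above one unit at a time; the names `Theta1` and `x` and their example values are assumptions for illustration ($\Theta^{(1)}$ is taken as a $3 \times 4$ matrix whose row $i$ holds $\Theta_{i0}^{(1)}, \ldots, \Theta_{i3}^{(1)}$, and $x$ includes the bias term $x_0 = 1$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed example values: x = [x0, x1, x2, x3] with bias x0 = 1;
# Theta1 is 3x4, and row i holds the weights feeding hidden unit i+1.
x = np.array([1.0, 0.5, -1.2, 0.7])
Theta1 = np.random.randn(3, 4)

a2 = np.empty(3)
for i in range(3):
    # a_{i+1}^{(2)} = g(Theta_{i0}^{(1)} x_0 + ... + Theta_{i3}^{(1)} x_3)
    a2[i] = sigmoid(Theta1[i, :] @ x)
```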
$x = \left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix} \right]$
$z^{(2)} = \left[ \begin{matrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{matrix} \right]$
$z^{(2)} = \Theta^{(1)} x$
$a^{(2)} = g(z^{(2)})$
$z^{(2)}$ and $a^{(2)}$ are three-dimensional vectors.
Forward propagation continues: add the bias unit $a_0^{(2)} = 1$, so that $a^{(2)}$ becomes four-dimensional. Then
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$h_{\Theta}(x) = a^{(3)} = g(z^{(3)})$
(a vectorized implementation of this procedure)
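A minimal sketch of this vectorized forward pass, under the same assumed shapes as before (`Theta1` is $3 \times 4$, `Theta2` is $1 \times 4$, and the input vector includes the bias $x_0 = 1$); these names and values are illustrative, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed shapes: Theta1 maps 4 inputs (incl. bias) to 3 hidden units,
# Theta2 maps 4 hidden values (incl. bias) to 1 output unit.
Theta1 = np.random.randn(3, 4)
Theta2 = np.random.randn(1, 4)

x = np.array([1.0, 0.5, -1.2, 0.7])    # [x0 = 1, x1, x2, x3]

z2 = Theta1 @ x                         # z^(2) = Theta^(1) x
a2 = sigmoid(z2)                        # a^(2) = g(z^(2))
a2 = np.concatenate(([1.0], a2))        # add the bias unit a_0^(2) = 1
z3 = Theta2 @ a2                        # z^(3) = Theta^(2) a^(2)
h  = sigmoid(z3)                        # h_Theta(x) = a^(3) = g(z^(3))
```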
We start with the activations of the input units and forward-propagate them to the hidden layer, computing the activations (激活项 is the Chinese term for activations) of the hidden layer; we then propagate further to compute the activations of the output layer. This process of computing the activations layer by layer, from the input layer to the hidden layer to the output layer, is called forward propagation.
This suggests what Neural Networks might be doing and why they might help us learn interesting non-linear hypotheses: the neural network is just like logistic regression, except that rather than using the original features $x_1, x_2, x_3$, it uses the new features $a_1, a_2, a_3$ computed by the hidden layer.