Subscript Notation
| Notation | Size $x_1$ | Number of bedrooms $x_2$ | Number of floors $x_3$ | Years $x_4$ | Price $y$ |
|---|---|---|---|---|---|
| $x^{(1)} = 1^{st}$ training example | 2104 | 5 | 1 | 10 | 460 |
| $x^{(2)} = 2^{nd}$ training example | 1416 | 3 ($x^{(2)}_2$) | 2 | 8 | 232 |
| $x^{(3)} = 3^{rd}$ training example | 1534 | 3 | 2 | 5 | 315 |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
$n$ = number of features = 4
$x^{(i)}$ = input (features) of the $i^{th}$ training example; here each training example is a $4 \times 1$ vector, defined as a column vector
$$x^{(2)} = \begin{pmatrix} 1416 \\ 3 \\ 2 \\ 8 \end{pmatrix}$$
$x^{(i)}_j$ = value of feature $j$ in the $i^{th}$ training example, a scalar
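The indexing above can be sketched in NumPy (the design matrix `X` below is built from the table; rows are training examples, columns are features):

```python
import numpy as np

# Design matrix from the table above: one row per training example,
# columns are (size, bedrooms, floors, years).
X = np.array([
    [2104, 5, 1, 10],
    [1416, 3, 2,  8],
    [1534, 3, 2,  5],
])

x_2 = X[1]       # x^{(2)}: the 2nd training example (0-based row 1)
x_2_2 = X[1, 1]  # x^{(2)}_2: value of feature 2 in the 2nd example

print(x_2)       # the column vector of the 2nd example, as a 1-D array
print(x_2_2)     # a scalar: 3
```

Note the off-by-one: the course notation is 1-based ($x^{(2)}_2$), while NumPy indexing is 0-based (`X[1, 1]`).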
Multivariate Representation
$$h_\theta(x) \;(\text{hypothesis}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$$
Define:
$$x_0 = 1$$
$$x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
$$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}$$
$$h_\theta(x) = \theta^T \cdot x = \begin{pmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{pmatrix} \cdot \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
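A minimal sketch of the hypothesis as an inner product, with $x_0 = 1$ prepended to the feature vector (the parameter values below are purely illustrative):

```python
import numpy as np

theta = np.array([80.0, 0.1, 10.0, 5.0, -2.0])  # illustrative (n+1,) parameters
x_raw = np.array([1416.0, 3.0, 2.0, 8.0])       # features of x^{(2)} from the table
x = np.concatenate(([1.0], x_raw))              # prepend x_0 = 1

h = theta @ x  # h_theta(x) = theta^T · x
print(h)       # ~245.6 for these illustrative parameters
```

The `@` operator computes exactly the row-vector-times-column-vector product written above.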
Gradient Descent
Hypothesis:
$$h_\theta(x) = \theta^T \cdot x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$$
Parameters: $\theta$, which is an $(n+1) \times 1$ vector
Cost function:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
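The cost function can be sketched in vectorized form (the `cost` helper and toy data below are illustrative, not from the notes):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2, vectorized."""
    m = len(y)
    residuals = X @ theta - y        # h_theta(x^(i)) - y^(i) for every i at once
    return residuals @ residuals / (2 * m)

# Toy check: X already includes the x_0 = 1 column. With theta = (0, 1),
# h(x) = x_1, which fits y exactly, so the cost is 0.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(cost(np.array([0.0, 1.0]), X, y))  # 0.0
print(cost(np.array([0.0, 0.0]), X, y))  # (1 + 4 + 9) / (2 * 3)
```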
Gradient Descent:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
} (updating all $\theta_j$ simultaneously)
So,
for $j = 0$:
$$\frac{\partial}{\partial \theta_0} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(h_\theta(x^{(i)}) - y^{(i)}\right)}{\partial \theta_0}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_0}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) \qquad \left(\text{since } x_0^{(i)} = 1\right)$$
for $j = 1$:
$$\frac{\partial}{\partial \theta_1} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(h_\theta(x^{(i)}) - y^{(i)}\right)}{\partial \theta_1}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_1}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_1^{(i)}$$
So:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
}
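The full update loop above can be sketched in vectorized NumPy; the learning rate, iteration count, and toy data are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    Every theta_j is updated simultaneously each iteration.
    """
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iters):
        # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), for all j at once
        grad = X.T @ (X @ theta - y) / m
        theta = theta - alpha * grad
    return theta

# Toy data generated from y = 1 + 2*x; the loop should recover theta ≈ (1, 2).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)
print(theta)  # ≈ [1. 2.]
```

Note that `X.T @ (X @ theta - y)` computes the sum over all $m$ examples for every $j$ in one step, which is why no inner loop over features is needed.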