Formal Expression of Linear Regression and Gradient Descent
Linear regression:
Hypothesis function:
$$
h_\theta(x) = \theta_0 + \theta_1 x \tag{1}
$$
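To make the notation concrete, here is a minimal Python sketch of equation (1); the function name `hypothesis` and the scalar-argument signature are my own choices, not from the original post:

```python
def hypothesis(theta0: float, theta1: float, x: float) -> float:
    """Hypothesis h_theta(x) = theta0 + theta1 * x from eq. (1)."""
    return theta0 + theta1 * x
```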
Cost function:
$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \tag{2}
$$
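Equation (2) can likewise be sketched directly, reusing the `hypothesis` helper above and assuming the training set is given as two equal-length Python lists `xs` and `ys` (the names and the plain-list representation are assumptions for illustration):

```python
def cost(theta0: float, theta1: float, xs: list[float], ys: list[float]) -> float:
    """Cost J(theta) from eq. (2): squared-error sum with the 1/(2m) factor."""
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)
```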
Gradient descent:
$$
\begin{aligned}
&\text{repeat until convergence} \{ \\
&\qquad \theta_j = \theta_j - \alpha \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_j} \qquad (\text{for } j = 0 \text{ and } j = 1) \\
&\}
\end{aligned}
\tag{3}
$$
Carrying Out the Computation
First, compute the partial derivatives of the cost function with respect to the two parameters $\theta_0, \theta_1$:
$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
&= \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2
\end{aligned}
\tag{4}
$$
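The step from (4) to the two results below is an application of the chain rule; written out for general $j$ (this intermediate line is mine, not in the original):

$$
\frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2
= \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) \cdot \frac{\partial}{\partial \theta_j} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)
$$

Taking $j = 0$ makes the last factor $1$, and taking $j = 1$ makes it $x^{(i)}$, which yields equations (5) and (6).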
The partial derivative of the cost function with respect to $\theta_0$:
$$
\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \tag{5}
$$
The partial derivative of the cost function with respect to $\theta_1$:
$$
\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)} \tag{6}
$$
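Equations (5) and (6) amount to one pass over the training set; a small sketch, reusing the `hypothesis` helper above (the name `gradients` is mine):

```python
def gradients(theta0: float, theta1: float,
              xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Analytic partial derivatives of J w.r.t. theta0 and theta1, eqs. (5) and (6)."""
    m = len(xs)
    d_theta0 = sum(hypothesis(theta0, theta1, x) - y for x, y in zip(xs, ys)) / m
    d_theta1 = sum((hypothesis(theta0, theta1, x) - y) * x for x, y in zip(xs, ys)) / m
    return d_theta0, d_theta1
```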
Substituting these back into the gradient descent algorithm:
$$
\begin{aligned}
&\text{repeat until convergence} \{ \\
&\qquad \theta_0 = \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
&\qquad \theta_1 = \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)} \\
&\}
\end{aligned}
\tag{7}
$$
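Putting the pieces together gives a minimal batch gradient descent loop, reusing `cost` and `gradients` from above. This is a sketch, not the post's code: the stopping rule (change in $J$ below a tolerance) and the default learning rate `alpha` are my assumptions, since "repeat until convergence" leaves them unspecified. Note that $\theta_0$ and $\theta_1$ must be updated simultaneously from the same old values:

```python
def gradient_descent(xs: list[float], ys: list[float], alpha: float = 0.01,
                     tol: float = 1e-9, max_iters: int = 100_000) -> tuple[float, float]:
    """Batch gradient descent for univariate linear regression, eq. (7)."""
    theta0, theta1 = 0.0, 0.0
    prev = cost(theta0, theta1, xs, ys)
    for _ in range(max_iters):
        d0, d1 = gradients(theta0, theta1, xs, ys)
        # Simultaneous update: both gradients were computed from the old thetas.
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
        curr = cost(theta0, theta1, xs, ys)
        if abs(prev - curr) < tol:  # treat a tiny change in J as convergence
            break
        prev = curr
    return theta0, theta1
```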
The cost function of linear regression is always this kind of bow-shaped function, formally called a convex function. For such a function, no matter where the initial point is, gradient descent is guaranteed to converge to the same global optimum, because the function has exactly one global optimum and no local optima.
Next, let's see how the algorithm steps toward the global optimum.
We start from $\theta_0 = 900, \theta_1 = -0.1$, so the hypothesis function is $h(x) = 900 - 0.1x$:
After applying one step of gradient descent, the hypothesis function changes slightly.
Then we keep applying gradient descent until we descend to the point of convergence (the center point); the path is shown in the figure on the right:
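Since the original figures are not reproduced here, the descent path can be traced numerically instead, reusing the helpers above. The data below are synthetic stand-ins (the post's actual dataset is not given); only the starting point $\theta_0 = 900, \theta_1 = -0.1$ follows the post:

```python
# Synthetic stand-in data (illustrative only; not the post's dataset).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [300.0, 480.0, 620.0, 830.0, 1020.0]

theta0, theta1 = 900.0, -0.1  # the post's starting point
alpha = 0.01
for step in range(5001):
    if step % 1000 == 0:
        print(f"step {step}: theta0={theta0:.2f}, theta1={theta1:.2f}, "
              f"J={cost(theta0, theta1, xs, ys):.2f}")
    d0, d1 = gradients(theta0, theta1, xs, ys)
    theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
```

Each printed line is one point along the path toward the single global optimum of the convex cost surface.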