Gradient Descent
1. Definition
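Gradient descent is an iterative method for finding a minimum of a function. The idea is simple: place a small ball at the top of a hill and let go. The ball rolls down along the steepest part of the slope until it reaches the bottom of the valley.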
2. Purpose
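The graph of a convex function looks like the valley described above. With a suitable step size, gradient descent moves down the slope step by step and eventually reaches the bottom of the valley, that is, the minimum of the function. For a convex function, any local minimum is also the global minimum.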
Have some function $J(\theta_0, \theta_1)$.

Want $\min\limits_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.

Initialize $\theta_0, \theta_1$, then keep changing these two values to reduce $J(\theta_0, \theta_1)$ until it reaches a global minimum or a local minimum. Which minimum is reached can depend on the initial values.
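The loop just described can be sketched in Python as follows. This is a minimal illustration, not the author's code: the cost $J(\theta_0, \theta_1) = (\theta_0 - 1)^2 + (\theta_1 + 2)^2$, the fixed learning rate `alpha = 0.1`, the starting point $(0, 0)$ and the fixed iteration count are all assumptions chosen so the minimum is known to be $(1, -2)$.

```python
def grad_J(theta0, theta1):
    # Gradient of the hypothetical cost J = (theta0 - 1)^2 + (theta1 + 2)^2
    return 2 * (theta0 - 1), 2 * (theta1 + 2)


def gradient_descent(theta0=0.0, theta1=0.0, alpha=0.1, iters=200):
    for _ in range(iters):
        d0, d1 = grad_J(theta0, theta1)
        # Simultaneous update: both partial derivatives use the old (theta0, theta1)
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
    return theta0, theta1


theta0, theta1 = gradient_descent()
print(theta0, theta1)  # close to (1, -2)
```

Here each step shrinks the distance to the minimum by the factor $1 - 2\alpha = 0.8$, so 200 iterations are far more than enough.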
Basic iteration:

$$x_{k+1} = x_k + t_k p_k$$

where $p_k$ is the search direction and $t_k$ is the step size.
3. Principle
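At each point $x_k$ we look for the direction $p_k$ along which the objective $f$ decreases fastest. For a unit vector $u$, the rate of change of $f$ at $x_k$ along $u$ is the directional derivative $\nabla f(x_k)^T u$, and by the Cauchy–Schwarz inequality $\nabla f(x_k)^T u \ge -\|\nabla f(x_k)\|$, with equality exactly when $u$ points along $-\nabla f(x_k)$. Hence the negative gradient direction

$$p_k = -\nabla f(x_k)$$

is the direction in which $f$ decreases fastest from $x_k$. For this reason, the negative gradient direction $-\nabla f(x_k)$ is called the steepest descent direction of $f$ at $x_k$.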
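The steepest-descent claim can be checked numerically. The sketch below assumes the objective $f(x) = x_1^2 + 25x_2^2$ from the example in Section 5 and the point $x = (2, 2)^T$; it scans unit directions in 0.1-degree steps and keeps the one with the most negative directional derivative.

```python
import math


def grad_f(x1, x2):
    # Gradient of f(x) = x1^2 + 25*x2^2 (the objective from the example)
    return 2 * x1, 50 * x2


g = grad_f(2.0, 2.0)
norm_g = math.hypot(*g)

# The directional derivative of f along a unit vector u is grad_f . u.
# Scan unit directions and keep the one with the most negative slope.
best_u, best_slope = None, float("inf")
for k in range(3600):
    a = math.radians(k / 10)
    u = (math.cos(a), math.sin(a))
    slope = g[0] * u[0] + g[1] * u[1]
    if slope < best_slope:
        best_u, best_slope = u, slope

neg_grad_dir = (-g[0] / norm_g, -g[1] / norm_g)
print(best_u, neg_grad_dir)  # nearly identical directions
print(best_slope, -norm_g)   # both about -100.08
```

The best sampled direction agrees with $-\nabla f / \|\nabla f\|$ up to the grid resolution, and its slope approaches the lower bound $-\|\nabla f\|$.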
4. Steps
1. Choose initial data. Choose an initial point $x_0$ and a termination tolerance $\varepsilon > 0$, and set $k = 0$.
2. Compute the gradient. Compute $\nabla f(x_k)$. If $\|\nabla f(x_k)\| \le \varepsilon$, stop and output $x_k$; otherwise go to step 3.
3. Construct the negative gradient direction. Take $p_k = -\nabla f(x_k)$.
4. Perform a one-dimensional (line) search. Find $t_k$ such that
   $$f(x_k + t_k p_k) = \min\limits_{t \ge 0} f(x_k + t p_k)$$
   Set $x_{k+1} = x_k + t_k p_k$, $k := k + 1$, and go to step 2.
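For a quadratic objective, the line search in step 4 has a closed form. A minimal sketch of steps 1 to 4 for the example objective $f(x) = x_1^2 + 25x_2^2 = \tfrac12 x^T A x$ with $A = \operatorname{diag}(2, 50)$ from Section 5: minimizing $f(x_k - t g)$ over $t$ gives $t_k = \dfrac{g^T g}{g^T A g}$ with $g = \nabla f(x_k)$. This closed-form step is an assumption valid only for this quadratic; a general $f$ needs a numerical line search. The names `steepest_descent` and `max_iter` are hypothetical.

```python
import math

# Example objective f(x) = x1^2 + 25*x2^2 = 0.5 * x^T A x with A = diag(2, 50)
A_DIAG = (2.0, 50.0)


def f(x):
    return x[0] ** 2 + 25 * x[1] ** 2


def grad(x):
    return (A_DIAG[0] * x[0], A_DIAG[1] * x[1])


def steepest_descent(x0, eps=1e-6, max_iter=10_000):
    x, k = x0, 0                                # step 1: initial point, k = 0
    while k < max_iter:
        g = grad(x)                             # step 2: gradient at x_k
        if math.hypot(*g) <= eps:               # ||grad f(x_k)|| <= eps: stop
            break
        p = (-g[0], -g[1])                      # step 3: p_k = -grad f(x_k)
        # step 4: exact line search; for this quadratic,
        # f(x + t p) is minimised at t = (g.g) / (g^T A g)
        gg = g[0] ** 2 + g[1] ** 2
        gAg = A_DIAG[0] * g[0] ** 2 + A_DIAG[1] * g[1] ** 2
        t = gg / gAg
        x = (x[0] + t * p[0], x[1] + t * p[1])  # x_{k+1} = x_k + t_k p_k
        k += 1
    return x, k


x_star, iters = steepest_descent((2.0, 2.0))
print(x_star, f(x_star), iters)
```

The iterates zigzag toward $(0, 0)$ because $f$ is 25 times steeper in $x_2$ than in $x_1$.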
5. Example
Use gradient descent to solve the unconstrained nonlinear programming problem

$$\min f(x) = x_1^2 + 25x_2^2$$

where $x = (x_1, x_2)^T$, starting from the initial point $x_0 = (2, 2)^T$.
Solution: in terms of the components of $x$, the gradient is $\nabla f(x) = (2x_1, 50x_2)$, so at the initial point

$$\nabla f(x_0) = (2x_1, 50x_2)\Big|_{x_1=2,\,x_2=2} = (4, 100)$$
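The analytic gradient $(2x_1, 50x_2)$ can be checked against central finite differences at $x_0 = (2, 2)^T$. A minimal sketch; the helper names and the step `h = 1e-6` are assumptions for illustration.

```python
def f(x1, x2):
    return x1 ** 2 + 25 * x2 ** 2


def analytic_grad(x1, x2):
    return 2 * x1, 50 * x2


def numeric_grad(x1, x2, h=1e-6):
    # Central differences: (f(x + h e_i) - f(x - h e_i)) / (2h)
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return d1, d2


print(analytic_grad(2.0, 2.0))  # (4.0, 100.0)
print(numeric_grad(2.0, 2.0))   # approximately (4.0, 100.0)
```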
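Take a fixed step size (learning rate) $\eta = 0.01$ instead of the exact line search of step 4. The next iterate (denoted $x_1$ here; inside the gradient expressions, $x_1, x_2$ still mean the components) is

$$x_1 = x_0 - \eta\nabla f(x_0) = (2, 2) - 0.01(4, 100) = (1.96, 1)$$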
$$\nabla f(x_1) = (2x_1, 50x_2)\Big|_{x_1=1.96,\,x_2=1} = (3.92, 50)$$
$$x_2 = x_1 - \eta\nabla f(x_1) = (1.96, 1) - 0.01(3.92, 50) = (1.9208, 0.5)$$
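The two hand-computed iterates can be reproduced with a short fixed-step loop. A minimal sketch, assuming the same $\eta = 0.01$ and $x_0 = (2, 2)^T$; the function name `fixed_step_iterates` is hypothetical.

```python
def grad(x):
    # Gradient of f(x) = x1^2 + 25*x2^2
    return (2 * x[0], 50 * x[1])


def fixed_step_iterates(x0, eta=0.01, steps=2):
    xs = [x0]
    x = x0
    for _ in range(steps):
        g = grad(x)
        # x_{k+1} = x_k - eta * grad f(x_k)
        x = (x[0] - eta * g[0], x[1] - eta * g[1])
        xs.append(x)
    return xs


for k, xk in enumerate(fixed_step_iterates((2.0, 2.0))):
    print(k, xk)  # approximately (2, 2), (1.96, 1), (1.9208, 0.5)
```

With this step size each update multiplies the first component by $1 - 2\eta = 0.98$ and the second by $1 - 50\eta = 0.5$, so $x_k = (2 \cdot 0.98^k,\ 2 \cdot 0.5^k)$. The first component therefore converges slowly, and the second converges only if $0 < \eta < 0.04$.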
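MATLAB code

Instead of the exact line search in step 4, this code moves along the unit vector $-\nabla f(x_k)/\|\nabla f(x_k)\|$ and uses a simple backtracking rule: start with $t = 1$ and halve $t$ until $f(x_k + t p_k) \le f(x_k)$.

```matlab
clear all, close all, clc
x = [2;2];                  % initial point x0
[f0, g] = detaf(x);         % f(x0) and the gradient at x0
while norm(g) > 0.000001    % stop when ||grad f(x_k)|| <= 1e-6
    p = -g/norm(g);         % unit vector along the negative gradient
    t = 1.0;                % initial trial step
    f = detaf(x+t*p);
    while f > f0            % halve t until f no longer increases
        t = t/2;
        f = detaf(x+t*p);
    end
    x = x + t*p;            % x_{k+1} = x_k + t_k p_k
    [f0, g] = detaf(x);
end
x, f0
```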
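The `detaf` function (save it as detaf.m):

```matlab
function [f, df] = detaf(x)
% Objective f(x) = x1^2 + 25*x2^2 and its gradient df
f = x(1)^2 + 25*x(2)^2;
df = [2*x(1); 50*x(2)];
end
```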
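Python code

```python
import numpy as np

def detaf(x):
    # Objective f(x) = x1^2 + 25*x2^2 and its gradient
    f = x[0]**2 + 25*x[1]**2
    df = np.array([2*x[0], 50*x[1]])
    return f, df

x = np.array([2.0, 2.0])             # initial point x0
f0, g = detaf(x)
while np.linalg.norm(g) > 0.000001:  # stop when ||grad f(x_k)|| <= 1e-6
    p = -g/np.linalg.norm(g)         # unit vector along the negative gradient
    t = 1.0                          # initial trial step
    f = detaf(x+t*p)[0]
    while f > f0:                    # halve t until f no longer increases
        t = t/2
        f = detaf(x+t*p)[0]
    x = x+t*p                        # x_{k+1} = x_k + t_k p_k
    f0, g = detaf(x)
print(x, f0)
```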