Writing this up took effort; please credit the source when reposting:
http://xlindo.com
https://zhuanlan.zhihu.com/p/52692238
Two gradient methods are commonly used in optimization: the simple gradient method (which our professor called "Einfaches Gradienten-Verfahren") and the conjugate gradient method (Conjugate Gradients).
As I understand it, the simple gradient method is meant to illustrate the idea of gradient descent, and the conjugate gradient method is introduced as an extension of it.
1 Problem statement
2 Gradient methods
2.1 Gradient descent (simple gradient method)
The idea behind gradient descent is simple: at each step the descent direction is the negative gradient, which (for a small enough step size) ensures that the objective value keeps decreasing.
The problem is the "zigzag" phenomenon, which makes it inefficient, so it is rarely used on its own in practice.
x_k = 6  # The algorithm starts at x=6
alpha = 0.01  # step size multiplier
precision = 0.00001  # stopping threshold
previous_step_size = 1
max_iters = 10000  # maximum number of iterations
iters = 0  # iteration counter
df = lambda x: 4 * x**3 - 9 * x**2  # derivative of f(x) = x**4 - 3*x**3

while previous_step_size > precision and iters < max_iters:
    prev_x_k = x_k
    g_k = df(prev_x_k)
    d_k = -g_k  # descent direction: the negative gradient
    x_k += alpha * d_k
    previous_step_size = abs(x_k - prev_x_k)
    iters += 1

print("The local minimum occurs at", x_k)
print("The number of iterations is", iters)
The local minimum occurs at 2.2499646074278457
The number of iterations is 70
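In one dimension the zigzag behavior mentioned above cannot really appear. Here is a minimal sketch (my own example, not from the original article) on an assumed ill-conditioned quadratic f(x, y) = x**2 + 10*y**2: with a fixed step size close to the stability limit, plain gradient descent keeps overshooting the narrow valley, and the y-coordinate flips sign almost every iteration:

import numpy as np

# Assumed example: f(x, y) = x**2 + 10*y**2, an ill-conditioned quadratic
f = lambda p: p[0]**2 + 10 * p[1]**2
grad = lambda p: np.array([2 * p[0], 20 * p[1]])

p = np.array([6.0, 1.0])   # start point
alpha = 0.085              # fixed step size, close to the stability limit 0.1
path = [p.copy()]
for _ in range(30):
    p = p - alpha * grad(p)          # step along the negative gradient
    path.append(p.copy())

# the y-coordinate alternates in sign: the "zigzag" pattern
for q in path[:8]:
    print("x = %8.4f, y = %8.4f" % (q[0], q[1]))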
2.2 Conjugate gradient method
This method builds on gradient descent.
Gradient descent always descends along the negative gradient, which is not necessarily the best direction. By computing conjugate directions, the descent direction from the second step onward mixes the current negative gradient with the previous direction (it stays within the cone spanned by them), which greatly improves the efficiency of the descent.
That is, $d_k = -g_k + \beta_k d_{k-1}$ with $\beta_k = \dfrac{g_k^\top g_k}{g_{k-1}^\top g_{k-1}}$ (the Fletcher-Reeves formula).
x_k = 6  # The algorithm starts at x=6
alpha = 0.01  # step size multiplier
precision = 0.00001  # stopping threshold
previous_step_size = 1
max_iters = 10000  # maximum number of iterations
iters = 0  # iteration counter
beta = 0
df = lambda x: 4 * x**3 - 9 * x**2  # derivative of f(x) = x**4 - 3*x**3
g_k = df(x_k)

while previous_step_size > precision and iters < max_iters:
    prev_x_k = x_k
    prev_g_k = g_k
    if iters == 0:
        d_k = -g_k  # first step: plain steepest descent
    else:
        g_k = df(x_k)
        beta = g_k * g_k / (prev_g_k * prev_g_k)  # Fletcher-Reeves coefficient
        d_k = -g_k + beta * d_k  # conjugate direction
    x_k += alpha * d_k
    previous_step_size = abs(x_k - prev_x_k)
    iters += 1

print("The local minimum occurs at", x_k)
print("The number of iterations is", iters)
The local minimum occurs at 2.2500110335395793
The number of iterations is 22
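As a side note (my own addition, not from the original article): in one dimension the Fletcher-Reeves update cannot show what conjugate directions actually buy. The classical setting is the linear conjugate gradient method for minimizing the quadratic 0.5*x^T A x - b^T x, i.e. solving A x = b with a symmetric positive definite A, where exact arithmetic reaches the solution in at most n steps. A minimal sketch with an assumed 2x2 system:

import numpy as np

# Assumed example: minimize 0.5*x^T A x - b^T x  <=>  solve A x = b,
# with A symmetric positive definite
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)            # start at the origin
r = b - A @ x              # residual = negative gradient
d = r.copy()               # first direction: steepest descent
for k in range(2):         # at most n = 2 steps for a 2x2 SPD system
    Ad = A @ d
    alpha = (r @ r) / (d @ Ad)         # exact line search along d
    x = x + alpha * d
    r_new = r - alpha * Ad
    beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves coefficient
    d = r_new + beta * d               # new direction, A-conjugate to the old one
    r = r_new

print("solution:", x)      # should match np.linalg.solve(A, b)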
2.3 Line search
Line search adds the computation of the step size on top of the previous methods.
The gradient-descent variant known as Backtracking Line Search (BLS) uses the Armijo condition: along the search direction (as in 2.2) it picks the largest step size that does not overshoot the optimum:
1. Define the parameters $\alpha \in (0, 0.5)$ and $\beta \in (0, 1)$.
2. Starting from the step size $t = 1$, iterate $t \leftarrow \beta t$ as long as the Armijo condition is violated, then update $x_{k+1} = x_k - t\,\nabla f(x_k)$, where the Armijo condition is
$$f(x_k - t\,\nabla f(x_k)) \le f(x_k) - \alpha\, t\, \|\nabla f(x_k)\|^2.$$
# -*- coding: utf-8 -*-
# min f(x) = (x-3)**2
import matplotlib.pyplot as plt

def f(x):
    '''The function we want to minimize'''
    return (x - 3)**2

def f_grad(x):
    '''gradient of function f'''
    return (x - 3) * 2
x = 6
y = f(x)
MAX_ITER = 300
curve = [y]
i = 0
step = 0.1
previous_step_size = 1.0
precision = 1e-4
# The method below is what I used before; it looks reasonable but converges slowly
while previous_step_size > precision and i < MAX_ITER:
    prev_x = x
    gradient = f_grad(x)
    x = x - gradient * step
    new_y = f(x)
    if new_y > y:  # if there are signs of divergence, shrink the step size
        step *= 0.8
    curve.append(new_y)
    previous_step_size = abs(x - prev_x)
    i += 1

print("The local minimum occurs at", x)
print("The number of iterations is", i)

plt.figure()
plt.plot(curve, 'r*-')
plt.xlabel('iterations')
plt.ylabel('objective function value')
# Below is backtracking line search, which is much faster
x = 6
y = f(x)
alpha = 0.25
beta = 0.8
curve2 = [y]
i = 0
previous_step_size = 1.0
precision = 1e-4

while previous_step_size > precision and i < MAX_ITER:
    prev_x = x
    gradient = f_grad(x)
    step = 1.0
    # shrink the step until the Armijo condition is satisfied
    while f(x - step * gradient) > f(x) - alpha * step * gradient**2:
        step *= beta
    x = x - step * gradient
    new_y = f(x)
    curve2.append(new_y)
    previous_step_size = abs(x - prev_x)
    i += 1

print("The local minimum occurs at", x)
print("The number of iterations is", i)

plt.plot(curve2, 'bo-')
plt.legend(['gradient descent', 'BLS'])
plt.show()
Running the code produces the comparison plot and the following output:
The local minimum occurs at 3.0003987683987354
The number of iterations is 40
The local minimum occurs at 3.000008885903001
The number of iterations is 10
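The code above applies backtracking to the plain negative gradient; the text at the start of 2.3 mentions searching along the direction from 2.2. As a rough sketch of that combination (my own assumption of how the two pieces fit together, not code from the original article), the backtracking step can just as well be applied along the Fletcher-Reeves direction:

# Sketch: Fletcher-Reeves direction (2.2) + backtracking step size (2.3)
# on the same test function f(x) = (x - 3)**2; assumed combination
f = lambda x: (x - 3)**2
f_grad = lambda x: 2 * (x - 3)

x = 6.0
g = f_grad(x)
d = -g                      # first direction: plain steepest descent
alpha, beta = 0.25, 0.8     # Armijo parameter and backtracking factor
for i in range(50):
    if g * d >= 0:
        d = -g              # safeguard: restart if d is not a descent direction
    t = 1.0
    # backtracking: shrink t until the Armijo condition holds along d
    while f(x + t * d) > f(x) + alpha * t * g * d:
        t *= beta
    x = x + t * d
    g_new = f_grad(x)
    if abs(g_new) < 1e-6:
        break
    d = -g_new + (g_new * g_new) / (g * g) * d   # Fletcher-Reeves update
    g = g_new

print("The local minimum occurs at", x)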
3 Other methods
Other methods include, for example, Riccati and Collocation; I will summarize them separately when I have time.