Line Search and Quasi-Newton Methods


weixin_33709609

Tags: python, artificial intelligence

Gradient Descent

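Parameter estimation for many machine learning models relies on optimization algorithms, and gradient descent is among the simplest and most widely used of them. Gradient descent [3], also known as steepest descent, finds a local minimum of a function. The idea is that a function decreases fastest in the direction opposite to its gradient: as long as we move a sufficiently small distance against the gradient to a new point, the function value is guaranteed not to increase, as shown in Figure 1.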

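Mathematically, the idea can be stated as follows:

$$b = a - \alpha \nabla f(a) \;\Rightarrow\; f(a) \geq f(b) \tag{1}$$

where $a$ is the current point and $\alpha > 0$ is a sufficiently small step size. Repeating this update from an initial point $x_0$ gives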

$$x_{k+1} = x_k - \alpha_k \nabla f(x_k), \quad 0 \leq k < n \tag{2}$$

$$f(x_0) \geq f(x_1) \geq f(x_2) \geq \dots \geq f(x_n) \tag{3}$$
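The iteration in (2) and the monotone sequence in (3) can be illustrated with a minimal sketch. It assumes a fixed step size (so every $\alpha_k$ is the same) and a hypothetical quadratic test objective that does not come from the article; `gradient_descent`, `f`, and `grad` are illustrative names.

```python
def gradient_descent(f, grad, x0, alpha=0.05, n=100):
    """Apply x_{k+1} = x_k - alpha * grad(x_k) for n steps with a fixed step size."""
    x = list(x0)
    values = [f(x)]  # f(x_0), f(x_1), ..., f(x_n)
    for _ in range(n):
        g = grad(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        values.append(f(x))
    return x, values


# Hypothetical test objective (an assumption, not from the article):
# f(x) = x_0^2 + 10 * x_1^2, whose minimizer is the origin.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


x_final, values = gradient_descent(f, grad, [3.0, -2.0])

# The objective values never increase along the iterates, as in (3).
is_monotone = all(b <= a for a, b in zip(values, values[1:]))
```

For this particular quadratic, a fixed step above 0.1 makes the second coordinate diverge, which is why the step has to be "sufficiently small" and why the line search methods discussed later choose it adaptively.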

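A direction $d_k$ is a descent direction when a sufficiently small step $\alpha > 0$ along it reduces the function value:

$$f(x_k + \alpha d_k) < f(x_k) \tag{4}$$

One family of such directions, with $B_k$ a symmetric positive definite matrix, is

$$d_k = -B_k \nabla f(x_k) \tag{5}$$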
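As a small check of (5), the sketch below computes $d = -B g$ for two choices of $B$ and verifies that the slope $g^T d$ is negative in both cases. The objective, the point, and the helper names `direction` and `dot` are assumptions made for illustration only.

```python
def direction(B, g):
    """Return d = -B g for a square matrix B (list of rows) and a gradient g."""
    return [-sum(bij * gj for bij, gj in zip(row, g)) for row in B]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical example: f(x) = x_0^2 + 10 * x_1^2 evaluated at x = (3, -2).
x = [3.0, -2.0]
g = [2.0 * x[0], 20.0 * x[1]]             # gradient (6, -40)

identity = [[1.0, 0.0], [0.0, 1.0]]       # B = I gives the steepest descent direction
inv_hessian = [[0.5, 0.0], [0.0, 0.05]]   # B = inverse Hessian gives the Newton direction

d_sd = direction(identity, g)
d_newton = direction(inv_hessian, g)

# Both slopes g^T d are negative, so both are descent directions.
slope_sd = dot(g, d_sd)
slope_newton = dot(g, d_newton)

# For a quadratic, a full Newton step lands on the minimizer (the origin here).
x_newton = [xi + di for xi, di in zip(x, d_newton)]
```

Quasi-Newton methods sit between these two extremes: they build $B_k$ from gradient information instead of computing and inverting the Hessian.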

Line Search

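Given a search direction $d_k$, line search looks for a step size $\alpha$ along that direction:

$$\alpha = \arg\min_{\alpha \geq 0} h(\alpha) = \arg\min_{\alpha \geq 0} f(x_k + \alpha d_k) \tag{6}$$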
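Line search reduces the multivariate problem to a one-dimensional function of the step size. The sketch below builds $h(\alpha) = f(x + \alpha d)$ and its derivative $h'(\alpha) = \nabla f(x + \alpha d)^T d$ for a hypothetical quadratic; `make_h` and the test objective are assumptions, not part of the article.

```python
def make_h(f, grad, x, d):
    """Return h(alpha) = f(x + alpha*d) and its derivative dh(alpha)."""
    def point(alpha):
        return [xi + alpha * di for xi, di in zip(x, d)]

    def h(alpha):
        return f(point(alpha))

    def dh(alpha):
        # Chain rule: h'(alpha) = grad f(x + alpha*d)^T d
        return sum(gi * di for gi, di in zip(grad(point(alpha)), d))

    return h, dh


# Hypothetical objective f(x) = x_0^2 + 10 * x_1^2 and a steepest descent direction.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


x = [3.0, -2.0]
d = [-gi for gi in grad(x)]
h, dh = make_h(f, grad, x, d)

slope_at_zero = dh(0.0)                            # negative for a descent direction
finite_diff = (h(1e-6) - h(-1e-6)) / 2e-6          # numerical check of dh(0)
```

The bisection routine in the next section searches for a root of exactly this derivative.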

Bisection Search

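Bisection line search [2] finds a root of a function. The idea is simple: repeatedly split the current interval in half and keep the half that must contain a point where the directional derivative $h'(\alpha) = \nabla f(x_k + \alpha d_k)^T d_k$ vanishes.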

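After $n$ halvings of an initial interval of length $\hat{\alpha}$, the interval length is

$$L = \left(\frac{1}{2}\right)^n \hat{\alpha} \tag{7}$$

Requiring $L \leq \epsilon$ gives $n \geq \log_2(\hat{\alpha}/\epsilon)$, so the number of iterations needed to reach the tolerance $\epsilon$ is at most

$$n \leq \left\lceil \log_2 \frac{\hat{\alpha}}{\epsilon} \right\rceil \tag{8}$$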
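The bound in (8) is easy to evaluate. The helper below is an illustrative sketch (the name `bisection_iterations` is an assumption) that returns the number of halvings needed to shrink an interval of length `alpha_hat` below `eps`.

```python
import math


def bisection_iterations(alpha_hat, eps):
    """Smallest n with (1/2)**n * alpha_hat <= eps, following (7) and (8)."""
    return max(0, math.ceil(math.log2(alpha_hat / eps)))


n = bisection_iterations(1.0, 1e-6)   # 20 halvings for a unit interval
```

With a unit interval and a tolerance of 1e-6, 20 halvings suffice, since $(1/2)^{20} \approx 9.5 \times 10^{-7}$ while $(1/2)^{19} \approx 1.9 \times 10^{-6}$.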

```python
import numpy as np


def bisection(dfun, theta, args, d, low, high, maxiter=10000):
    """
    Find a root of the directional derivative h'(alpha) = dfun(theta + alpha*d)^T d
    in the interval [low, high].

    Parameters
    dfun: computes the gradient of the objective f(x)
    theta: parameters of the model
    args: other variables needed to compute the value of dfun
    d: search direction
    [low, high]: the interval that contains the root
    maxiter: the maximum number of iterations
    """
    eps = 1e-6
    val_low = np.sum(dfun(theta + low * d, args) * d.T)
    val_high = np.sum(dfun(theta + high * d, args) * d.T)
    # The derivative must change sign on [low, high], otherwise no root is bracketed.
    if val_low * val_high > 0:
        raise ValueError('Invalid interval!')
    mid = (low + high) / 2
    for _ in range(int(maxiter)):
        mid = (low + high) / 2
        val_mid = np.sum(dfun(theta + mid * d, args) * d.T)
        if abs(val_mid) < eps or abs(high - low) < eps:
            return mid
        elif val_mid * val_low > 0:
            low = mid   # root lies in the upper half
        else:
            high = mid  # root lies in the lower half
    # Return the best estimate instead of None if maxiter is exhausted.
    return mid
```
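Bisection needs an interval on which the directional derivative changes sign. One simple way to obtain it, sketched below under the assumption that $h'(0) < 0$, is to keep enlarging the upper end until the derivative becomes non-negative. The helper `find_bracket` and its parameters are hypothetical and not part of the article.

```python
def find_bracket(dh, high=1.0, grow=2.0, max_expansions=50):
    """Return (0.0, high) such that dh(0) < 0 <= dh(high).

    dh is the one-dimensional derivative h'(alpha) along the search direction.
    """
    if dh(0.0) >= 0:
        raise ValueError("h'(0) must be negative for a descent direction")
    for _ in range(max_expansions):
        if dh(high) >= 0:
            return 0.0, high
        high *= grow  # step further out along the search direction
    raise ValueError("no sign change found; h may be unbounded below along d")


# Hypothetical one-dimensional example: h(alpha) = (alpha - 5)^2
dh = lambda alpha: 2.0 * (alpha - 5.0)
low, high = find_bracket(dh)   # the upper end grows 1 -> 2 -> 4 -> 8
```

The returned pair can then be passed as `low` and `high` to a bisection routine, since the derivative has opposite signs at the two ends.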

Backtracking

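Backtracking line search [1] uses the Armijo condition to find the largest acceptable step size along the search direction. The basic idea is to start with a relatively large step-size estimate along the search direction and then shrink it iteratively until the function value $f(x_k + \alpha d_k)$ shows a sufficient decrease relative to $f(x_k)$: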

$$f(x_k + \alpha d_k) \leq f(x_k) + c_1 \alpha \nabla f(x_k)^T d_k \tag{9}$$

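where $0 < c_1 < 1$. Writing $h(\alpha) = f(x_k + \alpha d_k)$, a descent direction has $h'(0) = \nabla f(x_k)^T d_k < 0$, so

$$h'(0) < c_1 h'(0) < 0 \tag{10}$$

$$h'(0) = \lim_{\alpha \to 0} \frac{h(\alpha) - h(0)}{\alpha} = \lim_{\alpha \to 0} \frac{f(x_k + \alpha d_k) - f(x_k)}{\alpha} < c_1 h'(0) \tag{11}$$

Hence, for all sufficiently small $\alpha > 0$,

$$\frac{f(x_k + \alpha d_k) - f(x_k)}{\alpha} < c_1 \nabla f(x_k)^T d_k \tag{12}$$

which is the sufficient-decrease condition, so shrinking the step size always ends after finitely many reductions.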
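The sufficient-decrease test can be checked numerically. The sketch below evaluates the Armijo inequality for a given step size on a hypothetical quadratic; `armijo_holds` and the test objective are illustrative assumptions.

```python
def armijo_holds(f, grad, x, d, alpha, c1=1e-3):
    """Check f(x + alpha*d) <= f(x) + c1 * alpha * grad(x)^T d."""
    slope = sum(gi * di for gi, di in zip(grad(x), d))
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    return f(x_new) <= f(x) + c1 * alpha * slope


# Hypothetical objective f(x) = x_0^2 + 10 * x_1^2 with a steepest descent direction.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


x = [3.0, -2.0]
d = [-gi for gi in grad(x)]

too_long = armijo_holds(f, grad, x, d, alpha=1.0)     # overshoots, f increases
short_enough = armijo_holds(f, grad, x, d, alpha=0.05)  # sufficient decrease
```

A full step of 1.0 overshoots the valley and fails the test, while a step of 0.05 passes it, which is the behaviour the limit argument predicts for small step sizes.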

```python
import numpy as np


def ArmijoBacktrack(fun, dfun, theta, args, d, stepsize=1, tau=0.5, c1=1e-3):
    """
    Find an acceptable step size by backtracking under the Armijo rule.

    Parameters
    fun: computes the value of the objective function
    dfun: computes the gradient of the objective function
    theta: a vector of parameters of the model
    args: other variables needed by fun and dfun
    d: search direction (must be a descent direction, i.e. slope < 0)
    stepsize: initial step size
    tau: shrink rate of the step size, 0 < tau < 1
    c1: sufficient decrease parameter, 0 < c1 < 1
    """
    # Directional derivative of the objective along d at theta.
    slope = np.sum(dfun(theta, args) * d.T)
    # The original listing called an undefined costFunction; use the fun argument.
    obj_old = fun(theta, args)
    theta_new = theta + stepsize * d
    obj_new = fun(theta_new, args)
    # Shrink the step until the sufficient-decrease condition holds.
    while obj_new > obj_old + c1 * stepsize * slope:
        stepsize *= tau
        theta_new = theta + stepsize * d
        obj_new = fun(theta_new, args)
    return stepsize
```
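To show how backtracking fits into an outer optimization loop, here is a self-contained sketch of gradient descent that picks each step size by Armijo backtracking. It is a pure-Python restatement written for illustration, not the article's routine: the names `backtrack` and `descend`, the shrink limit `max_shrinks`, and the quadratic test objective are all assumptions.

```python
import math


def backtrack(f, x, g, d, stepsize=1.0, tau=0.5, c1=1e-3, max_shrinks=60):
    """Shrink stepsize by tau until the Armijo condition holds; 0.0 if it never does."""
    slope = sum(gi * di for gi, di in zip(g, d))
    f_old = f(x)
    for _ in range(max_shrinks):
        x_new = [xi + stepsize * di for xi, di in zip(x, d)]
        if f(x_new) <= f_old + c1 * stepsize * slope:
            return stepsize
        stepsize *= tau
    return 0.0  # no acceptable step found within max_shrinks reductions


def descend(f, grad, x0, tol=1e-6, max_iter=1000):
    """Steepest descent with a backtracking line search at every iteration."""
    x = list(x0)
    values = [f(x)]
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break  # gradient is small enough
        d = [-gi for gi in g]
        alpha = backtrack(f, x, g, d)
        if alpha == 0.0:
            break  # line search failed, stop rather than take a bad step
        x = [xi + alpha * di for xi, di in zip(x, d)]
        values.append(f(x))
    return x, values


# Hypothetical objective f(x) = x_0^2 + 10 * x_1^2
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


x_final, values = descend(f, grad, [3.0, -2.0])
```

Because every accepted step satisfies the Armijo condition with a negative slope, the recorded objective values cannot increase from one iterate to the next.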

Interpolation

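Backtracking line search under the Armijo condition offers no guarantee on convergence speed, especially when many reductions are needed before the step size falls inside the interval that satisfies the condition. If we instead fit a polynomial to the function using the function values and derivative information already available (interpolation) [12,6,5,9], and take the minimizer of that polynomial as the step-size estimate, choosing a suitable step becomes much more efficient. Suppose we only know $h(0)$, $h'(0)$ and $h(\alpha_0)$. The quadratic interpolant through them is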

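$$h_q(\alpha) = \left(\frac{h(\alpha_0) - h(0) - \alpha_0 h'(0)}{\alpha_0^2}\right)\alpha^2 + h'(0)\alpha + h(0) \tag{13}$$

Setting $h_q'(\alpha) = 0$ gives the next step-size estimate:

$$\alpha_1 = \frac{h'(0)\,\alpha_0^2}{2\left[h(0) + h'(0)\alpha_0 - h(\alpha_0)\right]} \tag{14}$$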
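The quadratic interpolation step can be written in a few lines. The sketch below is an illustration under the assumption that the fitted parabola opens upward; the function name `quadratic_step` and the test function are hypothetical.

```python
def quadratic_step(h0, dh0, alpha0, h_alpha0):
    """Minimizer of the quadratic through h(0), h'(0) and h(alpha0)."""
    curvature_term = h_alpha0 - h0 - dh0 * alpha0
    if curvature_term <= 0:
        # The fitted parabola has no minimum, so the formula does not apply.
        raise ValueError("interpolating quadratic is not convex")
    return dh0 * alpha0 ** 2 / (2.0 * (h0 + dh0 * alpha0 - h_alpha0))


# Hypothetical example: h(alpha) = (alpha - 1)^2, so h(0) = 1 and h'(0) = -2.
h = lambda alpha: (alpha - 1.0) ** 2
alpha1 = quadratic_step(h0=h(0.0), dh0=-2.0, alpha0=3.0, h_alpha0=h(3.0))
```

Because this test function is itself quadratic, a single interpolation step recovers its exact minimizer at 1.0; for a general function the result is only an estimate that still has to pass the sufficient-decrease test.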

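With two trial step sizes $\alpha_{i-1}$ and $\alpha_i$ available, a cubic interpolant can be used:

$$h_c(\alpha) = a\alpha^3 + b\alpha^2 + h'(0)\alpha + h(0) \tag{15}$$

Its coefficients follow from requiring $h_c(\alpha_{i-1}) = h(\alpha_{i-1})$ and $h_c(\alpha_i) = h(\alpha_i)$:

$$\begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{\alpha_{i-1}^2 \alpha_i^2 (\alpha_i - \alpha_{i-1})} \begin{bmatrix} \alpha_{i-1}^2 & -\alpha_i^2 \\ -\alpha_{i-1}^3 & \alpha_i^3 \end{bmatrix} \begin{bmatrix} h(\alpha_i) - h(0) - h'(0)\alpha_i \\ h(\alpha_{i-1}) - h(0) - h'(0)\alpha_{i-1} \end{bmatrix} \tag{16}$$

and its minimizer is

$$\alpha_{i+1} = \frac{-b + \sqrt{b^2 - 3a\,h'(0)}}{3a} \tag{17}$$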
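The cubic step can be sketched the same way: solve the two interpolation conditions for the coefficients $a$ and $b$, then take the root of the derivative that corresponds to a minimum. The function name `cubic_step`, the fallback for a vanishing cubic coefficient, and the test function are assumptions made for illustration.

```python
import math


def cubic_step(h0, dh0, a_prev, h_prev, a_cur, h_cur):
    """Minimizer of the cubic through h(0), h'(0), h(a_prev) and h(a_cur)."""
    r_cur = h_cur - h0 - dh0 * a_cur
    r_prev = h_prev - h0 - dh0 * a_prev
    denom = a_prev ** 2 * a_cur ** 2 * (a_cur - a_prev)
    a = (a_prev ** 2 * r_cur - a_cur ** 2 * r_prev) / denom
    b = (-a_prev ** 3 * r_cur + a_cur ** 3 * r_prev) / denom
    if abs(a) < 1e-12:
        # Assumed fallback: the cubic degenerates to a quadratic b*alpha^2 + ...
        if b <= 0:
            raise ValueError("degenerate interpolant has no minimum")
        return -dh0 / (2.0 * b)
    disc = b * b - 3.0 * a * dh0
    if disc < 0:
        raise ValueError("cubic interpolant has no stationary point")
    return (-b + math.sqrt(disc)) / (3.0 * a)


# Hypothetical example: h(alpha) = alpha^3 - 3*alpha, with h(0) = 0, h'(0) = -3
# and a local minimum at alpha = 1.
h = lambda alpha: alpha ** 3 - 3.0 * alpha
alpha_next = cubic_step(h0=0.0, dh0=-3.0,
                        a_prev=2.0, h_prev=h(2.0),
                        a_cur=0.5, h_cur=h(0.5))
```

Since the test function is itself a cubic, the interpolant reproduces it exactly and the step lands on the local minimizer at 1.0.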

$$H_3(\alpha) = \left[1 + 2\,\frac{\alpha_i - \alpha}{\alpha_i - \alpha_{i-1}}\right] \cdots$$
