Week 13: Newton method and barrier method
1 Newton method (Second order method)
1.1 Motivation
- Gradient methods are first order: they iterate using a linear approximation of $f$.
- Gradient descent is not affine invariant: a linear or affine change of variables changes the convergence rate.
- Idea: pick the change of variables that gives the best convergence rate, which requires the (transformed) Hessian to be the identity. This leads to the second-order method: Newton's method.
1.2 Idea of Newton method
- Use local quadratic approximation
$$g(x)=f(x_0)+\nabla f(x_0)^T(x-x_0)+\frac{1}{2}(x-x_0)^T\nabla^2 f(x_0)(x-x_0)$$
$$x_+=\arg\min_x g(x)=x_0-[\nabla^2 f(x_0)]^{-1}\nabla f(x_0)$$
1.3 Newton method
$$x_+=x_0-\eta[\nabla^2 f(x_0)]^{-1}\nabla f(x_0)$$
- Affine invariant
- Needs the Hessian $\nabla^2 f$
- Converges fast
- Expensive per iteration
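A minimal sketch of the update (the toy objective below is an assumption for illustration, not from the notes):

```python
import numpy as np

def newton_step(grad, hess, x):
    """One pure-Newton step (eta = 1): x+ = x - [hess(x)]^{-1} grad(x)."""
    return x - np.linalg.solve(hess(x), grad(x))

# Toy objective (assumed): f(x) = x1^4 + x2^2, minimized at the origin.
grad = lambda x: np.array([4 * x[0] ** 3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0] ** 2, 0.0], [0.0, 2.0]])

x = np.array([1.0, 1.0])
for _ in range(20):
    x = newton_step(grad, hess, x)
print(x)  # close to the minimizer [0, 0]
```

The quadratic coordinate is solved exactly in one step; the quartic one contracts geometrically toward 0.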
1.4 Step size
- $\eta=1$: pure Newton, may not converge
- Backtracking line search (BTLS): choose $\alpha<1/2$, $\beta<1$, compute $d=[\nabla^2 f(x)]^{-1}\nabla f(x)$, and start from $\eta=1$:
  while $f(x-\eta d)>f(x)-\alpha \eta \nabla f(x)^T d$:
  $\quad \eta=\beta\eta$
Backtracking line search chooses the step size by starting from an optimistic step size ($\eta=1$) and checking whether we were too optimistic; if so, the step size is shrunk by the fraction $\beta$. The parameter $\alpha$ sets the threshold for how much decrease counts as "optimistic enough".
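The loop above can be sketched in Python (the test function $f(x)=\sqrt{1+x^2}$ and the constants are assumptions for illustration, not from the notes):

```python
import numpy as np

def backtracking_newton_step(f, grad, hess, x, alpha=0.25, beta=0.5):
    """One Newton step with backtracking line search (alpha < 1/2, beta < 1)."""
    d = grad(x) / hess(x)        # Newton direction [hess(x)]^{-1} grad(x), 1-D case
    eta = 1.0                    # start optimistic
    while f(x - eta * d) > f(x) - alpha * eta * grad(x) * d:
        eta *= beta              # too optimistic: shrink by beta
    return x - eta * d

# Toy 1-D function (assumed): f(x) = sqrt(1 + x^2), minimized at x* = 0.
# Pure Newton (eta = 1) diverges from x0 = 3; BTLS damps the early steps.
f = lambda x: np.sqrt(1 + x * x)
grad = lambda x: x / np.sqrt(1 + x * x)
hess = lambda x: (1 + x * x) ** (-1.5)

x = 3.0
for _ in range(20):
    x = backtracking_newton_step(f, grad, hess, x)
print(abs(x))  # near the minimizer x* = 0
```

This example also shows the two phases discussed next: a few damped steps with $\eta<1$, then full steps with very fast convergence near the optimum.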
1.5 Convergence with BTLS
- Prerequisite 1: $mI \preceq \nabla^2 f(x) \preceq MI$
- Prerequisite 2: $\nabla^2 f(x)$ is $L$-Lipschitz
- Two-phase convergence:
- Damped phase ($\|\nabla f(x)\|\geq \alpha$): $f(x_t)-f^*\leq f(x_0)-f^*-\gamma t$
- Pure phase ($\|\nabla f(x)\|< \alpha$, BTLS selects $\eta=1$): $f(x_t)-f^*\leq \frac{2m^3}{L^2}\left(\frac{1}{2}\right)^{2^{t-t_0+1}}$, equivalently $\frac{L}{2m^2}\|\nabla f(x_+)\|\leq\left(\frac{L}{2m^2}\|\nabla f(x)\|\right)^2$.
- Steps to reach $\varepsilon$ accuracy: $\frac{f(x_0)-f^*}{\gamma}+\log\log\left(\frac{\varepsilon_0}{\varepsilon}\right)$, i.e. quadratic convergence
- Error shrinks like $\left(\frac{1}{2}\right)^{2^t}$
- Compare gradient descent: $\log\left(\frac{1}{\varepsilon}\right)$ steps, linear convergence.
1.6 Scale free Newton
1.6.1 Definition
- A one-dimensional convex function $f:\mathbb{R}\rightarrow \mathbb{R}$ is self-concordant if $|f'''(x)|\leq 2[f''(x)]^{\frac{3}{2}}$.
- An $n$-dimensional convex function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is self-concordant if every 1-d restriction of it is self-concordant.
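A standard sanity check (example assumed, not from the notes): $f(x)=-\log x$ has $f''(x)=1/x^2$ and $f'''(x)=-2/x^3$, so the defining inequality holds with equality for every $x>0$:

```python
import math

# f(x) = -log(x): f''(x) = 1/x^2, f'''(x) = -2/x^3.
# Self-concordance requires |f'''(x)| <= 2 [f''(x)]^{3/2};
# for -log it holds with equality at every x > 0.
for x in [0.1, 1.0, 7.5, 100.0]:
    f2 = 1.0 / x ** 2           # second derivative
    f3 = -2.0 / x ** 3          # third derivative
    assert math.isclose(abs(f3), 2.0 * f2 ** 1.5)
```

This is why the log barrier used in Section 3 fits so well with Newton's method.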
1.6.2 Convergence
Newton with BTLS $(\alpha,\beta)$ for a self-concordant $f$ reaches $\varepsilon$-optimality in $c(\alpha,\beta)(f(x_0)-f^*)+\log\log\left(\frac{1}{\varepsilon}\right)$ iterations.
2 Quasi Newton methods
2.1 Motivation
- Gradient descent: $x_{t+1}=x_t-\eta I \nabla f(x_t)$
- Newton: $x_{t+1}=x_t-\eta [\nabla^2f(x_t)]^{-1} \nabla f(x_t)$
- Quasi-Newton: $x_{t+1}=x_t-\eta H_t \nabla f(x_t)$, where $H_t$ is a cheap approximation of $[\nabla^2f(x_t)]^{-1}$
2.2 Basic idea
Let $s=x_+-x$ and $y=\nabla f(x_+)-\nabla f(x)$. Solve the secant equation $B_+s=y$ for $B_+$ through a cheap update of $B$, keeping $B_+$ symmetric and positive semidefinite. Then solve $B_+p=-\nabla f(x)$ for $p$, and finally approximate the Newton step $x_+=x-\eta[\nabla^2 f(x)]^{-1}\nabla f(x)$ with $x_+=x+\eta p$.
https://en.wikipedia.org/wiki/Quasi-Newton_method
Use the Sherman–Morrison formula to update the inverse $B^{-1}$ directly, so no linear solve is needed at each iteration.
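One standard instance is the BFGS update (not derived in the notes). Written directly on the inverse approximation $H\approx B^{-1}$, which is what the Sherman–Morrison trick buys, it reads $H_+=(I-\rho s y^T)H(I-\rho y s^T)+\rho s s^T$ with $\rho=1/(y^Ts)$, and by construction it satisfies the inverse secant equation $H_+y=s$:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update of the inverse Hessian approximation H.

    s = x_+ - x, y = grad f(x_+) - grad f(x).
    Satisfies the inverse secant equation H_+ y = s, keeps H_+ symmetric,
    and keeps it positive definite when the curvature condition y^T s > 0 holds.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Quick check of the secant equation H_+ y = s on random data.
rng = np.random.default_rng(0)
s, y = rng.standard_normal(3), rng.standard_normal(3)
if y @ s < 0:                 # enforce y^T s > 0 for the check
    y = -y
H_plus = bfgs_inverse_update(np.eye(3), s, y)
print(np.allclose(H_plus @ y, s))  # True
```

Note that $H_+y=s$ holds for any starting $H$: $(I-\rho y s^T)y = y - y = 0$, so only the $\rho s s^T y = s$ term survives.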
2.3 Convergence
- Superlinear convergence under strong convexity plus extra assumptions.
3 Barrier method (with constraints)
3.1 Basic idea
$$\begin{aligned} \min_x& \quad f(x)\\ \text{s.t.}& \quad h_i(x)\leq 0,\ i=1\dots m\\ & \quad Ax=b \end{aligned}$$
Bring the constraints into the objective function using the indicator function
$$I(u)=\begin{cases}\infty & \text{if } u>0\\ 0 & \text{otherwise.}\end{cases}$$
$$\begin{aligned} \min_x& \quad f(x)+\sum_i I(h_i(x))\\ \text{s.t.}& \quad Ax=b \end{aligned}$$
Then use a smooth function
$$\hat I(u)=-\frac{1}{t}\log(-u)$$
to approximate $I(u)$; as $t\rightarrow \infty$, $\hat I(u)\rightarrow I(u)$.
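A quick numerical look at the approximation (sample points are assumptions for illustration):

```python
import math

def barrier_approx(u, t):
    """Smooth approximation -(1/t) log(-u) of the indicator I(u), valid for u < 0."""
    return -math.log(-u) / t

# At a strictly feasible point (u < 0) the approximation tends to I(u) = 0 as t grows.
for t in [1, 10, 100, 1000]:
    print(t, barrier_approx(-0.5, t))

# As u approaches 0 from below, the penalty grows without bound, mimicking I(u) = infinity.
print(barrier_approx(-1e-9, 1))
```

The first limit ($t\to\infty$ at fixed $u<0$) gives the indicator's zero branch; the second ($u\to 0^-$ at fixed $t$) gives its infinite branch.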
$$\begin{aligned} \min_x& \quad tf(x)+\left[-\sum_{i=1}^m \log (-h_i(x))\right]\\ \text{s.t.}& \quad Ax=b \end{aligned}$$
3.2 Barrier method
- Solve a sequence of problems: $$\begin{aligned} \min_x& \quad t^kf(x)+\left[-\sum_{i=1}^m \log (-h_i(x))\right]\\ \text{s.t.}& \quad Ax=b \end{aligned}$$
- Start from an initial $t^0$.
- At each epoch $k$, find $x^*(t^k)$ using Newton's method started at $x^*(t^{k-1})$; then increase $t^{k+1}=\mu t^k$.
3.2.1 Central path
Solving $$\begin{aligned} \min_x& \quad t^kf(x)+\left[-\sum_{i=1}^m \log (-h_i(x))\right]\\ \text{s.t.}& \quad Ax=b \end{aligned}$$ gives the optimal solution $x^*(t^k)$. The curve traced by these solutions as $t^k$ varies is the central path, and $x^*(t^k)\rightarrow x^*$ as $k\rightarrow\infty$.
3.2.2 Choose t t t
- Initially, use a small $t$ to avoid bad conditioning; gradually increase $t$ as the iterates approach $x^*$.
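The whole scheme can be sketched end to end on a toy problem (the problem, the damping rule, and all constants are assumptions for illustration, not from the notes): minimize $x$ subject to $x\geq 1$, i.e. $h(x)=1-x\leq 0$, whose optimum is $x^*=1$. The centering problem is $\min_x\, t x-\log(x-1)$, with minimizer $x^*(t)=1+1/t$:

```python
def centering_newton(t, x, iters=50):
    """Minimize t*x - log(x - 1) with crudely damped Newton (toy inner solver)."""
    for _ in range(iters):
        g = t - 1.0 / (x - 1.0)        # gradient of the barrier objective
        h = 1.0 / (x - 1.0) ** 2       # Hessian (here a second derivative)
        step = g / h
        eta = 1.0                      # damp the step to stay strictly feasible (x > 1)
        while x - eta * step <= 1.0:
            eta *= 0.5
        x -= eta * step
    return x

# Barrier method for: min x  s.t.  x >= 1.
t, mu, x = 1.0, 10.0, 5.0              # strictly feasible start, t^{k+1} = mu * t^k
while t < 1e8:
    x = centering_newton(t, x)         # warm start from the previous x*(t)
    t *= mu
print(x)  # close to x* = 1; the central path is x*(t) = 1 + 1/t
```

Each outer epoch lands on the central path point $x^*(t^k)=1+1/t^k$, and the warm start keeps the inner Newton solves cheap, exactly as described above.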