Lecture 1: Fundamentals of Unconstrained Optimization


1. Classification

  • Continuous vs Discrete Optimization
  • Constrained vs Unconstrained Optimization
  • Global vs Local Optimization
  • Stochastic vs Deterministic Optimization
  • Convex vs Nonconvex Optimization

2. Claim

Most algorithms are able to find only a local minimizer of \(f\), not a global minimizer


3. First-Order Necessary Conditions

If \(x^*\) is a local minimizer and \(f\) is continuously differentiable in an open neighborhood of \(x^*\), then \(\nabla f(x^*)=0\)


4. Second-Order Necessary Conditions

If \(x^*\) is a local minimizer of \(f\) and \(\nabla^2f\) is continuous in an open neighborhood of \(x^*\), then \(\nabla f(x^*)=0\) and \(\nabla^2f(x^*)\) is positive semidefinite


5. Second-Order Sufficient Conditions

Suppose that \(\nabla^2f\) is continuous in an open neighborhood of \(x^*\), that \(\nabla f(x^*)=0\), and that \(\nabla^2f(x^*)\) is positive definite. Then \(x^*\) is a strict local minimizer of \(f\)
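
As a quick one-dimensional illustration of the gap between these conditions (an example added here, not from the original notes): for \(f(x)=x^3\), \(f'(0)=0\) and \(f''(0)=0\) is positive semidefinite, so \(x^*=0\) satisfies both necessary conditions yet is not a local minimizer; for \(f(x)=x^4\), \(x^*=0\) is a strict local minimizer even though \(f''(0)=0\) is not positive definite, so the sufficient conditions are not necessary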


6. Property of Convex Optimization

When \(f\) is convex, any local minimizer \(x^*\) is a global minimizer of \(f\). If in addition \(f\) is differentiable, then any stationary point \(x^*\) is a global minimizer of \(f\)
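
As a concrete example (added for illustration): \(f(x)=||Ax-b||_2^2\) is convex, so any stationary point, i.e. any solution of the normal equations \(A^TAx^*=A^Tb\) obtained from \(\nabla f(x^*)=2A^T(Ax^*-b)=0\), is a global minimizer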


7. Two Fundamental Strategies for Iteration

  • Line Search:
    Update \(x_{k+1}=x_k+\alpha^*p_k\) by approximately solving the following one-dimensional minimization problem to find a step length \(\alpha^*\) (see the backtracking sketch after this list):
    \[\alpha^*=\arg\min_{\alpha>0}f(x_k+\alpha p_k)\]
  • Trust Region:
    Update \(x_{k+1}=x_k+p^*\) by approximately solving the following subproblem:
    \[p^*=\arg\min_p m_k(x_k+p)\]
    where \(x_k+p\) is required to lie inside the trust region \(||p||\leq \Delta\). Inside the trust region, the model function \(m_k\) is chosen so that its behavior near the current point \(x_k\) is similar to that of \(f\); \(m_k\) is usually the quadratic:
    \[m_k(x_k+p)=f_k+p^T\nabla f_k+\frac{1}{2}p^TB_kp\]
    where \(B_k\) is usually set to the Hessian \(\nabla^2f_k\) or some approximation to it
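
Below is a minimal backtracking (Armijo) line-search sketch in Python/NumPy for approximately solving the one-dimensional problem above; the toy objective, the constants \(c\) and \(\rho\), and the starting point are illustrative choices, not from the lecture:

  import numpy as np

  def backtracking_line_search(f, grad_f, x_k, p_k, alpha=1.0, rho=0.5, c=1e-4):
      """Shrink alpha until the sufficient-decrease (Armijo) condition
      f(x_k + alpha p_k) <= f(x_k) + c * alpha * grad_f(x_k)^T p_k holds.
      p_k must be a descent direction, otherwise the loop may not terminate."""
      fx, gx = f(x_k), grad_f(x_k)
      while f(x_k + alpha * p_k) > fx + c * alpha * (gx @ p_k):
          alpha *= rho
      return alpha

  # Example: one steepest descent step on f(x) = x1^2 + 10*x2^2
  f = lambda x: x[0]**2 + 10.0 * x[1]**2
  grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
  x = np.array([1.0, 1.0])
  p = -grad_f(x)                                 # steepest descent direction p_k = -grad f_k
  a = backtracking_line_search(f, grad_f, x, p)
  x_next = x + a * p                             # update x_{k+1} = x_k + alpha * p_k
  print(a, f(x_next) < f(x))                     # step length found, and f decreased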

8. Search Directions

  • steepest descent direction:
    \[p_k=-\nabla f_k\]
  • any descent direction:
    \[p_k^T\nabla f_k<0,\quad \text{i.e.}\quad \angle(p_k, -\nabla f_k)<\frac{\pi}{2}\]
    In particular:
    • Newton direction:
      \[p_k=-\nabla^2f_k^{-1}\nabla f_k\]
      a descent direction whenever \(\nabla^2f_k\) is positive definite; unlike the steepest descent direction, it comes with a natural step length of 1 (a numerical sketch follows this list)
    • Quasi-Newton direction:
      \[B_k\approx \nabla^2 f_k\]
      note: \(\nabla^2f_{k+1}(x_{k+1}-x_{k})\approx \nabla f_{k+1}-\nabla f_{k}\), so we can require the approximation \(B_{k+1}\) to satisfy the secant equation:
      \[B_{k+1}s_k=y_{k}\]
      where \(s_k=x_{k+1}-x_k\) and \(y_k=\nabla f_{k+1}-\nabla f_{k}\). We also impose additional requirements on \(B_{k+1}\), such as symmetry and that the difference between the successive approximations \(B_k\) and \(B_{k+1}\) has low rank:
      • SR1:
        \[B_{k+1}=B_k+\frac{(y_k-B_ks_k)(y_k-B_ks_k)^T}{(y_k-B_ks_k)^Ts_k}\]
        \(rank(B_{k+1}-B_k)=1\)
      • BFGS:
        \[B_{k+1}=B_k-\frac{B_ks_ks_k^TB_k}{s_k^TB_ks_k}+\frac{y_ky_k^T}{y_k^Ts_k}\]
        \(rank(B_{k+1}-B_k)=2\). If \(B_0\) is positive definite and \(s_k^Ty_k>0\) for all \(k\), then every \(B_k\) is positive definite
    • Nonlinear Conjugate Gradient:
      \[p_k=-\nabla f(x_k)+\beta_kp_{k-1}\]
      where \(\beta_k\) is a scalar that ensures that \(p_k\) and \(p_{k-1}\) are conjugate

Remark: nonlinear conjugate gradient directions are much more effective than steepest descent directions and are almost as simple to compute. They do not attain the fast convergence rate of Newton or quasi-Newton methods, but have the advantage of not requiring storage of matrices
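
A small numerical sketch of the Newton and quasi-Newton ideas above (the toy objective, the starting point, and the initial approximation \(B_k=I\) are illustrative choices, not from the lecture):

  import numpy as np

  # Toy objective: f(x) = x1^4 + 0.5*x1^2 + x2^2, with its gradient and Hessian
  grad = lambda x: np.array([4 * x[0]**3 + x[0], 2 * x[1]])
  hess = lambda x: np.array([[12 * x[0]**2 + 1, 0.0], [0.0, 2.0]])

  x_k = np.array([1.0, -1.0])

  # Newton direction: solve (Hessian) p = -(gradient) rather than forming the inverse
  p_newton = np.linalg.solve(hess(x_k), -grad(x_k))
  print(p_newton @ grad(x_k) < 0)            # True: a descent direction (Hessian is positive definite here)

  # One SR1 update of an approximation B_k, then check the secant equation
  x_next = x_k + p_newton                    # take the natural Newton step (alpha = 1)
  s = x_next - x_k
  y = grad(x_next) - grad(x_k)
  B = np.eye(2)                              # crude initial approximation B_k
  r = y - B @ s
  B_next = B + np.outer(r, r) / (r @ s)      # SR1 formula from the list above
  print(np.allclose(B_next @ s, y))          # secant equation B_{k+1} s_k = y_k holds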


9. Practical Implementation of Quasi-Newton

\[p_k=-B_k^{-1}\nabla f_k\]
Avoid the need to factorize \(B_k\) by updating the inverse of \(B_k\) itself:
\[H_{k+1}=(I-\rho_ks_ky_k^T)H_k(I-\rho_ky_ks_k^T)+\rho_ks_ks_k^T\]
where \(H_k=B_k^{-1}\) and \(\rho_k=\frac{1}{y_k^Ts_k}\)
\[p_k=-H_k\nabla f_k\]
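
A direct NumPy transcription of this inverse update (a minimal sketch; the test vectors below are arbitrary, chosen only so that the curvature condition \(s_k^Ty_k>0\) holds):

  import numpy as np

  def bfgs_inverse_update(H, s, y):
      """H_{k+1} = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, with rho = 1/(y^T s)."""
      rho = 1.0 / (y @ s)
      I = np.eye(len(s))
      return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

  H = np.eye(3)                              # initial H_0
  s = np.array([0.2, -0.1, 0.4])             # s_k = x_{k+1} - x_k
  y = np.array([0.5, -0.3, 0.9])             # y_k = grad f_{k+1} - grad f_k (here s^T y > 0)
  H_next = bfgs_inverse_update(H, s, y)
  print(np.allclose(H_next @ y, s))          # inverse secant equation H_{k+1} y_k = s_k holds
  g = np.array([1.0, 2.0, -1.0])             # a stand-in gradient
  p = -H_next @ g                            # search direction p_k = -H_k grad f_k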


10. Scaling

Some optimization algorithms, such as steepest descent, are sensitive to poor scaling, while others, such as Newton’s method, are unaffected by it
Poor scaling example:
\[f(x)=10^9x_1^2+x_2^2\]
(Figure: contours of a poorly scaled and a well-scaled problem, and the performance of the steepest descent direction on each)
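
A minimal sketch of this effect, assuming steepest descent with exact line search on the quadratic \(f(x)=\frac{1}{2}x^TAx\); a milder scaling factor of 100 (rather than \(10^9\)) is used here so the slow progress is visible after a few iterations:

  import numpy as np

  def steepest_descent_quadratic(A, x0, iters=5):
      """Steepest descent with exact line search on f(x) = 0.5 x^T A x,
      where the exact step length is alpha = (g^T g) / (g^T A g)."""
      x = x0.astype(float)
      for k in range(iters):
          g = A @ x
          if np.linalg.norm(g) < 1e-12:
              break                          # already at the minimizer x* = 0
          alpha = (g @ g) / (g @ (A @ g))
          x_new = x - alpha * g
          print(f"k={k}: ratio = {np.linalg.norm(x_new) / np.linalg.norm(x):.4f}")
          x = x_new

  A_poor = np.diag([100.0, 1.0])             # poorly scaled: condition number 100
  A_well = np.eye(2)                         # well scaled:   condition number 1
  steepest_descent_quadratic(A_poor, np.array([0.01, 1.0]))  # ratio stays near 99/101 ~ 0.98
  steepest_descent_quadratic(A_well, np.array([0.01, 1.0]))  # reaches x* in one step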


11. Rate of Convergence

  • Let \(\{x_k\}\) be a sequence in \(\mathbb{R}^n\) that converges to \(x^*\). We say that the convergence is Q-linear if there is a constant \(r\in(0,1)\) such that:
    \[\frac{||x_{k+1}-x^*||}{||x_k-x^*||}\leq r\]
    for all \(k\) sufficiently large
  • The convergence is said to be Q-superlinear if
    \[\lim_{k\rightarrow \infty}\frac{||x_{k+1}-x^*||}{||x_k-x^*||}=0\]
  • The convergence is said to be Q-quadratic if there is a constant \(M>0\) such that
    \[\frac{||x_{k+1}-x^*||}{||x_k-x^*||^2}\leq M\]
    for all \(k\) sufficiently large
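
For intuition (an added example, not part of the original notes): the sequence \(x_k=1+2^{-k}\) converges to \(x^*=1\) Q-linearly with \(r=\frac{1}{2}\), whereas \(x_k=1+2^{-2^k}\) converges Q-quadratically with \(M=1\), since \(|x_{k+1}-1|=2^{-2^{k+1}}=\left(2^{-2^k}\right)^2=|x_k-1|^2\)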

Remark: Quasi-Newton methods typically converge Q-superlinearly, whereas Newton's method converges Q-quadratically. In contrast, steepest descent algorithms converge only at a Q-linear rate, and when the problem is ill-conditioned the convergence constant \(r\) is close to 1

Reposted from: https://www.cnblogs.com/cihui/p/6402730.html
