Lecture 1: Fundamentals of Unconstrained Optimization


1. Classification

  • Continuous vs Discrete Optimization
  • Constrained vs Unconstrained Optimization
  • Global vs Local Optimization
  • Stochastic vs Deterministic Optimization
  • Convex vs Nonconvex Optimization

2. Claim

Most algorithms are able to find only a local minimizer of \(f\), not a global minimizer


3. First-Order Necessary Conditions

If \(x^*\) is a local minimizer and \(f\) is continuously differentiable in an open neighborhood of \(x^*\), then \(\nabla f(x^*)=0\)


4. Second-Order Necessary Conditions

If \(x^*\) is a local minimizer of \(f\) and \(\nabla^2f\) is continuous in an open neighborhood of \(x^*\), then \(\nabla f(x^*)=0\) and \(\nabla^2f(x^*)\) is positive semidefinite


5. Second-Order Sufficient Conditions

Suppose that \(\nabla^2f\) is continuous in an open neighborhood of \(x^*\), that \(\nabla f(x^*)=0\), and that \(\nabla^2f(x^*)\) is positive definite. Then \(x^*\) is a strict local minimizer of \(f\)
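
As a quick one-dimensional illustration of the gap between these conditions (an example added here, not from the original notes): for \(f(x)=x^3\), \(f'(0)=0\) and \(f''(0)=0\) is positive semidefinite, so \(x^*=0\) satisfies both necessary conditions yet is not a local minimizer; for \(f(x)=x^4\), \(x^*=0\) is a strict local minimizer even though \(f''(0)=0\) is not positive definite, so the sufficient conditions are not necessary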


6. Property of Convex Optimization

When \(f\) is convex, any local minimizer \(x^*\) is a global minimizer of \(f\). If in addition \(f\) is differentiable, then any stationary point \(x^*\) is a global minimizer of \(f\)
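
As a concrete example (added for illustration): \(f(x)=||Ax-b||_2^2\) is convex, so any stationary point, i.e. any solution of the normal equations \(A^TAx^*=A^Tb\) obtained from \(\nabla f(x^*)=2A^T(Ax^*-b)=0\), is a global minimizer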


7. Two Fundamental Strategies for Iteration

  • Line Search:
    Update \(x_{k+1}=x_k+\alpha^*p_k\) by approximately solving the following one-dimensional minimization problem to find a step length \(\alpha^*\) (see the backtracking sketch after this list):
    \[\alpha^*=\arg\min_{\alpha>0}f(x_k+\alpha p_k)\]
  • Trust Region:
    Update \(x_{k+1}=x_k+p^*\) by approximately solving the following subproblem:
    \[p^*=\arg\min_p m_k(x_k+p)\]
    where \(x_k+p\) is required to lie inside the trust region \(||p||\leq \Delta\). Inside the trust region, the model function \(m_k\) is chosen so that its behavior near the current point \(x_k\) is similar to that of \(f\); \(m_k\) is usually the quadratic:
    \[m_k(x_k+p)=f_k+p^T\nabla f_k+\frac{1}{2}p^TB_kp\]
    where \(B_k\) is usually set to the Hessian \(\nabla^2f_k\) or some approximation to it
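
Below is a minimal backtracking (Armijo) line-search sketch in Python/NumPy for approximately solving the one-dimensional problem above; the toy objective, the constants \(c\) and \(\rho\), and the starting point are illustrative choices, not from the lecture:

  import numpy as np

  def backtracking_line_search(f, grad_f, x_k, p_k, alpha=1.0, rho=0.5, c=1e-4):
      """Shrink alpha until the sufficient-decrease (Armijo) condition
      f(x_k + alpha p_k) <= f(x_k) + c * alpha * grad_f(x_k)^T p_k holds.
      p_k must be a descent direction, otherwise the loop may not terminate."""
      fx, gx = f(x_k), grad_f(x_k)
      while f(x_k + alpha * p_k) > fx + c * alpha * (gx @ p_k):
          alpha *= rho
      return alpha

  # Example: one steepest descent step on f(x) = x1^2 + 10*x2^2
  f = lambda x: x[0]**2 + 10.0 * x[1]**2
  grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
  x = np.array([1.0, 1.0])
  p = -grad_f(x)                                 # steepest descent direction p_k = -grad f_k
  a = backtracking_line_search(f, grad_f, x, p)
  x_next = x + a * p                             # update x_{k+1} = x_k + alpha * p_k
  print(a, f(x_next) < f(x))                     # step length found, and f decreased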

8. Search Directions

  • steepest descent direction:
    \[p_k=-\nabla f_k\]
  • any descent direction:
    \[p_k^T\nabla f_k<0,\quad \text{i.e.}\quad \angle(p_k, -\nabla f_k)<\frac{\pi}{2}\]
    In particular:
    • Newton direction:
      \[p_k=-\nabla^2f_k^{-1}\nabla f_k\]
      a descent direction whenever \(\nabla^2f_k\) is positive definite; unlike the steepest descent direction, it comes with a natural step length of 1 (a numerical sketch follows this list)
    • Quasi-Newton direction:
      \[B_k\approx \nabla^2 f_k\]
      note: \(\nabla^2f_{k+1}(x_{k+1}-x_{k})\approx \nabla f_{k+1}-\nabla f_{k}\), so we can require the approximation \(B_{k+1}\) to satisfy the secant equation:
      \[B_{k+1}s_k=y_{k}\]
      where \(s_k=x_{k+1}-x_k\) and \(y_k=\nabla f_{k+1}-\nabla f_{k}\). We also impose additional requirements on \(B_{k+1}\), such as symmetry and that the difference between the successive approximations \(B_k\) and \(B_{k+1}\) has low rank:
      • SR1:
        \[B_{k+1}=B_k+\frac{(y_k-B_ks_k)(y_k-B_ks_k)^T}{(y_k-B_ks_k)^Ts_k}\]
        \(rank(B_{k+1}-B_k)=1\)
      • BFGS:
        \[B_{k+1}=B_k-\frac{B_ks_ks_k^TB_k}{s_k^TB_ks_k}+\frac{y_ky_k^T}{y_k^Ts_k}\]
        \(rank(B_{k+1}-B_k)=2\). If \(B_0\) is positive definite and \(s_k^Ty_k>0\) for all \(k\), then every \(B_k\) is positive definite
    • Nonlinear Conjugate Gradient:
      \[p_k=-\nabla f(x_k)+\beta_kp_{k-1}\]
      where \(\beta_k\) is a scalar that ensures that \(p_k\) and \(p_{k-1}\) are conjugate

Remark: nonlinear conjugate gradient directions are much more effective than steepest descent directions and are almost as simple to compute. They do not attain the fast convergence rate of Newton or quasi-Newton methods, but have the advantage of not requiring storage of matrices
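
A small numerical sketch of the Newton and quasi-Newton ideas above (the toy objective, the starting point, and the initial approximation \(B_k=I\) are illustrative choices, not from the lecture):

  import numpy as np

  # Toy objective: f(x) = x1^4 + 0.5*x1^2 + x2^2, with its gradient and Hessian
  grad = lambda x: np.array([4 * x[0]**3 + x[0], 2 * x[1]])
  hess = lambda x: np.array([[12 * x[0]**2 + 1, 0.0], [0.0, 2.0]])

  x_k = np.array([1.0, -1.0])

  # Newton direction: solve (Hessian) p = -(gradient) rather than forming the inverse
  p_newton = np.linalg.solve(hess(x_k), -grad(x_k))
  print(p_newton @ grad(x_k) < 0)            # True: a descent direction (Hessian is positive definite here)

  # One SR1 update of an approximation B_k, then check the secant equation
  x_next = x_k + p_newton                    # take the natural Newton step (alpha = 1)
  s = x_next - x_k
  y = grad(x_next) - grad(x_k)
  B = np.eye(2)                              # crude initial approximation B_k
  r = y - B @ s
  B_next = B + np.outer(r, r) / (r @ s)      # SR1 formula from the list above
  print(np.allclose(B_next @ s, y))          # secant equation B_{k+1} s_k = y_k holds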


9. Practical Implementation of Quasi-Newton

\[p_k=-B_k^{-1}\nabla f_k\]
Avoid the need to factorize \(B_k\) by updating the inverse of \(B_k\) itself:
\[H_{k+1}=(I-\rho_ks_ky_k^T)H_k(I-\rho_ky_ks_k^T)+\rho_ks_ks_k^T\]
where \(H_k=B_k^{-1}\) and \(\rho_k=\frac{1}{y_k^Ts_k}\)
\[p_k=-H_k\nabla f_k\]
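
A direct NumPy transcription of this inverse update (a minimal sketch; the test vectors below are arbitrary, chosen only so that the curvature condition \(s_k^Ty_k>0\) holds):

  import numpy as np

  def bfgs_inverse_update(H, s, y):
      """H_{k+1} = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, with rho = 1/(y^T s)."""
      rho = 1.0 / (y @ s)
      I = np.eye(len(s))
      return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

  H = np.eye(3)                              # initial H_0
  s = np.array([0.2, -0.1, 0.4])             # s_k = x_{k+1} - x_k
  y = np.array([0.5, -0.3, 0.9])             # y_k = grad f_{k+1} - grad f_k (here s^T y > 0)
  H_next = bfgs_inverse_update(H, s, y)
  print(np.allclose(H_next @ y, s))          # inverse secant equation H_{k+1} y_k = s_k holds
  g = np.array([1.0, 2.0, -1.0])             # a stand-in gradient
  p = -H_next @ g                            # search direction p_k = -H_k grad f_k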


10. Scaling

Some optimization algorithms, such as steepest descent, are sensitive to poor scaling, while others, such as Newton’s method, are unaffected by it
Poor scaling example:
\[f(x)=10^9x_1^2+x_2^2\]
(Figure: contours of a poorly scaled and a well-scaled problem, and the performance of the steepest descent direction on each)
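
A minimal sketch of this effect, assuming steepest descent with exact line search on the quadratic \(f(x)=\frac{1}{2}x^TAx\); a milder scaling factor of 100 (rather than \(10^9\)) is used here so the slow progress is visible after a few iterations:

  import numpy as np

  def steepest_descent_quadratic(A, x0, iters=5):
      """Steepest descent with exact line search on f(x) = 0.5 x^T A x,
      where the exact step length is alpha = (g^T g) / (g^T A g)."""
      x = x0.astype(float)
      for k in range(iters):
          g = A @ x
          if np.linalg.norm(g) < 1e-12:
              break                          # already at the minimizer x* = 0
          alpha = (g @ g) / (g @ (A @ g))
          x_new = x - alpha * g
          print(f"k={k}: ratio = {np.linalg.norm(x_new) / np.linalg.norm(x):.4f}")
          x = x_new

  A_poor = np.diag([100.0, 1.0])             # poorly scaled: condition number 100
  A_well = np.eye(2)                         # well scaled:   condition number 1
  steepest_descent_quadratic(A_poor, np.array([0.01, 1.0]))  # ratio stays near 99/101 ~ 0.98
  steepest_descent_quadratic(A_well, np.array([0.01, 1.0]))  # reaches x* in one step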


11. Rate of Convergence

  • Let \(\{x_k\}\) be a sequence in \(\mathbb{R}^n\) that converges to \(x^*\). We say that the convergence is Q-linear if there is a constant \(r\in(0,1)\) such that:
    \[\frac{||x_{k+1}-x^*||}{||x_k-x^*||}\leq r\]
    for all \(k\) sufficiently large
  • The convergence is said to be Q-superlinear if
    \[\lim_{k\rightarrow \infty}\frac{||x_{k+1}-x^*||}{||x_k-x^*||}=0\]
  • The convergence is said to be Q-quadratic if there is a constant \(M>0\) such that
    \[\frac{||x_{k+1}-x^*||}{||x_k-x^*||^2}\leq M\]
    for all \(k\) sufficiently large
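
For intuition (an added example, not part of the original notes): the sequence \(x_k=1+2^{-k}\) converges to \(x^*=1\) Q-linearly with \(r=\frac{1}{2}\), whereas \(x_k=1+2^{-2^k}\) converges Q-quadratically with \(M=1\), since \(|x_{k+1}-1|=2^{-2^{k+1}}=\left(2^{-2^k}\right)^2=|x_k-1|^2\)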

Remark: Quasi-Newton methods typically converge Q-superlinearly, whereas Newton's method converges Q-quadratically. In contrast, steepest descent algorithms converge only at a Q-linear rate, and when the problem is ill-conditioned the convergence constant \(r\) is close to 1

Reposted from: https://www.cnblogs.com/cihui/p/6402730.html
