Lecture 3: Trust-Region Methods


1. Introduction:

  • Trust-region methods choose the direction and length of the step simultaneously.
  • If a step is not acceptable, they reduce the size of the region and find a new minimizer.
  • The step direction changes whenever the size of the trust region is altered.
  • If the region is too small, the algorithm misses an opportunity to take a substantial step.
  • If too large, the minimizer of the model may be far from the minimizer of the objective function within the region.
  • Increase the trust region if the previous step was good; otherwise reduce its size.


Subproblem:
\[\min_{p}m_k(p)=f_k+\nabla f_k^Tp+\frac{1}{2}p^TB_kp, \quad \text{s.t.}\ ||p||\leq\Delta_k\]
If \(B_k\) is positive definite and \(||B_k^{-1}\nabla f_k||\leq\Delta_k\), then the full step \(p_k=-B_k^{-1}\nabla f_k\) solves the subproblem.
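The quadratic model and the full-step test above translate directly into code. Below is a minimal sketch assuming numpy; the names model_value and full_step_if_inside are illustrative, not from the text.

```python
import numpy as np

def model_value(f_k, g_k, B_k, p):
    """Quadratic model m_k(p) = f_k + g_k^T p + 0.5 p^T B_k p."""
    return f_k + g_k @ p + 0.5 * p @ (B_k @ p)

def full_step_if_inside(g_k, B_k, delta_k):
    """Return the full step -B_k^{-1} g_k when B_k is positive definite
    and the step lies inside the trust region; otherwise None."""
    try:
        np.linalg.cholesky(B_k)   # succeeds only for (numerically) positive definite B_k
    except np.linalg.LinAlgError:
        return None
    p = -np.linalg.solve(B_k, g_k)
    return p if np.linalg.norm(p) <= delta_k else None
```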


2. Outline of the algorithm:

Algorithm:
Given \(\bar{\Delta}>0, \Delta_0\in(0,\bar{\Delta})\), and \(\eta\in[0,\frac{1}{4})\)
for \(k=0,1,2,\cdots\)
  obtain \(p_k\) by (approximately) solving the above subproblem
  evaluate \(\rho_k=\frac{f(x_k)-f(x_k+p_k)}{m_k(0)-m_k(p_k)}\)
  if \(\rho_k<\frac{1}{4}\)
    \(\Delta_{k+1}=\frac{1}{4}||p_k||\)
  else
    if \(\rho_k>\frac{3}{4}\) and \(||p_k||=\Delta_k\)
      \(\Delta_{k+1}=\min(2\Delta_k,\bar{\Delta})\)
    else
      \(\Delta_{k+1}=\Delta_k\)
  if \(\rho_k>\eta\)
    \(x_{k+1}=x_k+p_k\)
  else
    \(x_{k+1}=x_k\)
end(for)
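A minimal sketch of this outer loop, assuming numpy; trust_region is an illustrative name, and solve_subproblem stands for any routine that approximately solves the subproblem (for example, the Cauchy-point or dogleg steps sketched later in these notes).

```python
import numpy as np

def trust_region(f, grad, hess, solve_subproblem, x0,
                 delta_bar=2.0, delta0=1.0, eta=0.15, max_iter=100, tol=1e-8):
    """Outer trust-region loop following the algorithm above
    (eta must lie in [0, 1/4))."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = solve_subproblem(g, B, delta)
        actual = f(x) - f(x + p)                    # actual reduction
        predicted = -(g @ p + 0.5 * p @ (B @ p))    # m_k(0) - m_k(p_k)
        rho = actual / predicted
        if rho < 0.25:
            delta = 0.25 * np.linalg.norm(p)        # poor agreement: shrink
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_bar)     # good step on the boundary: expand
        if rho > eta:
            x = x + p                               # accept the step
    return x
```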

Theorem:
The vector \(p^*\) is a global solution of the trust-region problem
\[\min_{p\in\mathbb{R}^n}m(p)=f+g^Tp+\frac{1}{2}p^TBp, \quad \text{s.t.}\ ||p||\leq \Delta\]
if and only if \(p^*\) is feasible and there is a scalar \(\lambda\geq 0\) such that the following conditions are satisfied:
\[(B+\lambda I)p^*=-g\]
\[\lambda(\Delta-||p^*||)=0\]
\[(B+\lambda I)\ \text{is positive semidefinite}\]
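The theorem suggests a practical solver: search for the \(\lambda\geq 0\) at which \(||p(\lambda)||=\Delta\), where \((B+\lambda I)p(\lambda)=-g\). Below is a minimal sketch of the standard Newton iteration on \(\lambda\) (cf. Algorithm 4.3 in Nocedal and Wright's Numerical Optimization), assuming numpy; the safeguards needed to keep \(B+\lambda I\) positive definite for indefinite \(B\) are omitted.

```python
import numpy as np

def solve_subproblem_exact(g, B, delta, lam=1.0, iters=20):
    """Find lam >= 0 with (B + lam I) p = -g and ||p|| = delta,
    via Newton's method on phi(lam) = 1/||p(lam)|| - 1/delta."""
    I = np.eye(len(g))
    try:
        np.linalg.cholesky(B)                 # is B positive definite?
        p = -np.linalg.solve(B, g)
        if np.linalg.norm(p) <= delta:
            return p                          # interior solution: lam = 0
    except np.linalg.LinAlgError:
        pass
    for _ in range(iters):
        L = np.linalg.cholesky(B + lam * I)   # B + lam I = L L^T
        p = -np.linalg.solve(B + lam * I, g)
        q = np.linalg.solve(L, p)             # q = L^{-1} p
        norm_p, norm_q = np.linalg.norm(p), np.linalg.norm(q)
        # Newton update for phi(lam) = 0.
        lam = max(lam + (norm_p / norm_q) ** 2 * (norm_p - delta) / delta, 0.0)
    return -np.linalg.solve(B + lam * I, g)
```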


3. Algorithms based on the Cauchy point:

Two strategies for finding approximate solutions that achieve at least as much reduction as the Cauchy point:

  • The dogleg method
  • The two-dimensional subspace minimization

Cauchy point calculation:
Find the vector \(p_k^s\) that solves a linear version of the subproblem, that is,
\[p_k^s=\operatorname{arg\,min}_{p\in\mathbb{R}^n}\ f_k+g_k^Tp, \quad \text{s.t.}\ ||p||\leq\Delta_k\]
Then calculate the scalar \(\tau_k>0\) that minimizes \(m_k(\tau_kp_k^s)\) subject to the trust-region bound:
\[\tau_k=\operatorname{arg\,min}_{\tau\geq 0}\ m_k(\tau p_k^s), \quad \text{s.t.}\ ||\tau p_k^s||\leq \Delta_k\]
Set \(p_k^c=\tau_kp_k^s\).
The closed-form solution is:
\[p_k^c=-\tau_k\frac{\Delta_k}{||g_k||}g_k\]
\[\tau_k=\begin{cases} 1, & \text{if}\ g_k^TB_kg_k\leq 0 \\ \min\left(||g_k||^3/(\Delta_kg_k^TB_kg_k),\ 1\right), & \text{otherwise} \end{cases}\]
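These closed-form expressions are easy to implement. A minimal sketch assuming numpy; cauchy_point is an illustrative name.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point p_c = -tau * (delta / ||g||) * g, with tau as above."""
    g_norm = np.linalg.norm(g)
    gBg = g @ (B @ g)
    if gBg <= 0:
        tau = 1.0                                  # model decreases all the way to the boundary
    else:
        tau = min(g_norm**3 / (delta * gBg), 1.0)  # minimizer along -g, capped at the boundary
    return -tau * (delta / g_norm) * g
```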

The Dogleg method:
[Figure: the dogleg path from \(p^U\) to \(p^B\) inside the trust region]

\[\tilde{p}(\tau)=\begin{cases} \tau p^U, & 0\leq \tau \leq 1, \\ p^U+(\tau-1)(p^B-p^U), & 1 \leq \tau \leq 2, \end{cases}\]
where \(p^B=-B^{-1}g\) is the full (Newton) step and \(p^U=-\frac{g^Tg}{g^TBg}g\) is the minimizer of the model along the steepest-descent direction.

\(\tilde{p}(\tau)\) intersects the trust-region boundary \(||p||=\Delta\) at exactly one point if \(||p^B||\geq \Delta\), and nowhere otherwise. Since \(m\) is decreasing along the path, the chosen value of \(p\) is \(p^B\) if \(||p^B||\leq \Delta\); otherwise it is the point of intersection of the dogleg and the trust-region boundary. In the latter case, we compute the appropriate value of \(\tau\) by solving the scalar quadratic equation:
\[||p^U+(\tau-1)(p^B-p^U)||^2=\Delta^2\]

The Newton-dogleg method is most appropriate when the objective function is convex with \(\nabla^2f(x_k)\) positive definite, so that the full step \(p^B\) is well defined.
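A minimal sketch of the dogleg step under these definitions, assuming numpy and a positive definite \(B\); dogleg_step is an illustrative name, and this routine can serve as solve_subproblem in the outer loop sketched in section 2.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Follow the dogleg path (first leg tau*p_U, second leg p_U -> p_B),
    truncated at the trust-region boundary ||p|| = delta."""
    p_b = -np.linalg.solve(B, g)            # full (Newton) step
    if np.linalg.norm(p_b) <= delta:
        return p_b                          # full step lies inside the region
    p_u = -(g @ g) / (g @ (B @ g)) * g      # minimizer along steepest descent
    if np.linalg.norm(p_u) >= delta:
        return delta * p_u / np.linalg.norm(p_u)   # boundary hit on the first leg
    # Second leg: solve ||p_u + s (p_b - p_u)||^2 = delta^2 for s = tau - 1 in [0, 1].
    d = p_b - p_u
    a, b, c = d @ d, 2 * (p_u @ d), p_u @ p_u - delta**2
    s = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)  # positive root of the quadratic
    return p_u + s * d
```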

Two-dimensional subspace minimization:
The subproblem is replaced by:
\[\min_p m(p)=f+g^Tp+\frac{1}{2}p^TBp, \quad \text{s.t.}\ ||p||\leq\Delta, \ p\in \operatorname{span}[g,B^{-1}g]\]
When \(B\) has negative eigenvalues, this method can be modified to handle the case by changing the subspace to:
\[\operatorname{span}[g, (B+\alpha I)^{-1}g], \quad \text{for some}\ \alpha\in(-\lambda_1, -2\lambda_1],\]
where \(\lambda_1\) denotes the most negative eigenvalue of \(B\); this choice of \(\alpha\) ensures that \(B+\alpha I\) is positive definite.
When \(||(B+\alpha I)^{-1}g||\leq \Delta\), we discard the subspace search and instead define the step to be
\[p=-(B+\alpha I)^{-1}g+v\]
where \(v\) is a vector that satisfies \(v^T(B+\alpha I)^{-1}g\leq 0\) which ensures that \(||p||\geq ||(B+\alpha I)^{-1}g||\). When \(B\) has zero eigenvalues but no negative eigenvalues, we define the step
to be the Cauchy point \(p=p^c\).
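A minimal sketch of the subspace step for the positive definite case, assuming numpy; two_d_subspace_step is an illustrative name, and the boundary search is done by brute force, which is enough to show the reduction to two variables.

```python
import numpy as np

def two_d_subspace_step(g, B, delta, n_grid=360):
    """Minimize the model over span[g, B^{-1} g] inside ||p|| <= delta
    by reducing to a 2-variable trust-region problem."""
    S = np.column_stack([g, np.linalg.solve(B, g)])
    Q, _ = np.linalg.qr(S)                  # orthonormal basis for the subspace
    g_r, B_r = Q.T @ g, Q.T @ B @ Q         # reduced gradient and Hessian
    z = np.linalg.solve(B_r, -g_r)          # interior candidate
    if np.linalg.norm(z) > delta:
        # Search the boundary z = delta * (cos t, sin t).
        ts = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
        zs = delta * np.stack([np.cos(ts), np.sin(ts)], axis=1)
        vals = zs @ g_r + 0.5 * np.einsum('ij,jk,ik->i', zs, B_r, zs)
        z = zs[np.argmin(vals)]
    return Q @ z                            # Q is orthonormal, so ||Q z|| = ||z||
```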


4. Global convergence:

It can be proved that the sequence of gradients \(\{g_k\}\) generated by the algorithm in section 2 has an accumulation point at zero (that is, \(\liminf_{k\to\infty}||g_k||=0\)), and in fact converges to zero when \(\eta\) is strictly positive.

Reposted from: https://www.cnblogs.com/cihui/p/6402904.html
