Review:
- Root finding
- Newton’s method
- Minimization
- Regularization/Damped Newton’s method
Lecture 4 Optimization Pt.2
Overview
- Line search (addresses the “overshoot” problem; trust-region methods can also address it)
- Constrained minimization
1. Line Search
Motivation:
- The $\Delta \mathbf{x}$ step from Newton’s method may overshoot the minimum.
- To fix this, check $f\left(\mathbf{x}+\Delta \mathbf{x}\right)$ and “backtrack” until we get a “good” reduction in $f$.
(1) Armijo Rule
There are many strategies for this, but we will focus on the Armijo rule, which is simple and effective.
$\alpha = 1$ (step length)
while $f\left(\mathbf{x}+\alpha \Delta \mathbf{x}\right) > f\left(\mathbf{x}\right) + b \alpha \nabla f\left(\mathbf{x}\right)^T \Delta \mathbf{x}$
$\quad \alpha \leftarrow c \alpha$ ($c \in \left(0,1\right)$)
end
$b \in \left(0,1\right)$ is a tolerance.
$b \alpha \nabla f\left(\mathbf{x}\right)^T \Delta \mathbf{x}$ is the reduction expected from the gradient (the linearization).
(2) Intuition
- Make sure the step agrees with the linearization to within some tolerance $b$.
- Typical values: $b = 10^{-4}$ to $10^{-1}$, $c = 1/2$.
(3) Example
using LinearAlgebra   # for the identity I used in regularization

# Assumes f, ∇f, ∇2f are defined elsewhere.
function backtracking_regularized_newton_step(x0)
    b = 0.1           # Armijo tolerance
    c = 0.5           # backtracking factor
    β = 1.0           # regularization increment
    # Regularize the Hessian until it is positive definite
    H = ∇2f(x0)
    while !isposdef(H)
        H = H + β*I
    end
    Δx = -H\∇f(x0)    # (regularized) Newton step
    # Backtracking line search (Armijo rule)
    α = 1.0
    while f(x0 + α*Δx) > f(x0) + b*α*∇f(x0)'*Δx
        α = c*α
    end
    return x0 + α*Δx
end
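As a quick usage sketch, the step can be iterated to a local minimum. The names f, ∇f, ∇2f match the assumptions of the function above; the Rosenbrock-style objective is an illustrative assumption, not the lecture's example:

    using ForwardDiff, LinearAlgebra

    # Illustrative objective (assumption): Rosenbrock function
    f(x) = (1.0 - x[1])^2 + 100.0*(x[2] - x[1]^2)^2
    ∇f(x)  = ForwardDiff.gradient(f, x)
    ∇2f(x) = ForwardDiff.hessian(f, x)

    x = [-1.0, 1.0]
    for k = 1:50
        global x = backtracking_regularized_newton_step(x)   # `global` needed in script scope
    end
    @show x    # should approach the minimizer [1.0, 1.0]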
(4) Takeaway message
- Newton’s method with simple and cheap modifications (a globalization strategy) is extremely effective at finding local minima.
2. Equality Constraints
$f\left(\mathbf{x}\right): \mathbb{R}^n \rightarrow \mathbb{R}$, $\mathbf{c}\left(\mathbf{x}\right): \mathbb{R}^n \rightarrow \mathbb{R}^m$.
$$\min_{\mathbf{x}} f\left(\mathbf{x}\right) \quad \text{s.t.} \quad \mathbf{c}\left(\mathbf{x}\right) = \mathbf{0}$$
(1) First-order necessary conditions
i. Need $\nabla f\left(\mathbf{x}\right) = \mathbf{0}$ in free directions.
ii. Need $\mathbf{c}\left(\mathbf{x}\right) = \mathbf{0}$.
Another statement: any non-zero component of $\nabla f\left(\mathbf{x}\right)$ must be normal to the constraint surface/manifold.
Explanation:
If $\nabla f\left(\mathbf{x}\right)$ has a component that is not normal to the constraint surface, then we can move along the constraint surface to reduce the value of $f\left(\mathbf{x}\right)$.
(i) Lagrange multiplier
$$\Rightarrow \nabla f\left(\mathbf{x}\right) + \lambda \nabla \mathbf{c}\left(\mathbf{x}\right) = \mathbf{0}$$
for some $\lambda \in \mathbb{R}$ (the Lagrange multiplier / dual variable).
In other words, $\nabla f\left(\mathbf{x}\right)$ and $\nabla \mathbf{c}\left(\mathbf{x}\right)$ are parallel.
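As a small worked example (illustrative, not from the lecture): minimize $f\left(\mathbf{x}\right) = x_1^2 + x_2^2$ subject to $c\left(\mathbf{x}\right) = x_1 + x_2 - 1 = 0$. The condition $\nabla f + \lambda \nabla c = \mathbf{0}$ gives
$$\begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix} + \lambda \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \mathbf{0} \;\Rightarrow\; x_1 = x_2 = -\frac{\lambda}{2},$$
and combining with $c\left(\mathbf{x}\right) = 0$ yields $x_1 = x_2 = \tfrac{1}{2}$, $\lambda = -1$. At the solution, $\nabla f = \left[1,\,1\right]^T$ is indeed parallel to $\nabla c = \left[1,\,1\right]^T$.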
(ii) More general case
In general (in vector form):
$$\frac{\partial f}{\partial \mathbf{x}} + \lambda^\top \frac{\partial \mathbf{c}}{\partial \mathbf{x}} = \mathbf{0},$$
where $\lambda \in \mathbb{R}^m$.
(iii) Lagrangian
Based on this gradient condition, we define the Lagrangian:
$$\mathcal{L}\left(\mathbf{x}, \lambda\right) = f\left(\mathbf{x}\right) + \lambda^\top \mathbf{c}\left(\mathbf{x}\right).$$
It turns the constrained minimization problem into an unconstrained one:
$$\min_{\mathbf{x}} \mathcal{L}\left(\mathbf{x}, \lambda\right).$$
Its gradients are (also called KKT conditions):
$$\begin{align*} \nabla_\mathbf{x} \mathcal{L}\left(\mathbf{x}, \lambda\right) &= \nabla f\left(\mathbf{x}\right) + \left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \lambda = \mathbf{0} \\ \nabla_\lambda \mathcal{L}\left(\mathbf{x}, \lambda\right) &= \mathbf{c}\left(\mathbf{x}\right) = \mathbf{0}. \end{align*}$$
So we can solve this with Newton’s method (it is a root-finding problem).
$$\begin{align*} \nabla_\mathbf{x} \mathcal{L}\left(\mathbf{x}+\Delta \mathbf{x}, \lambda + \Delta \lambda\right) &\approx \nabla_\mathbf{x} \mathcal{L}\left(\mathbf{x}, \lambda\right) + \frac{\partial^2 \mathcal{L}}{\partial \mathbf{x}^2} \Delta \mathbf{x} + \frac{\partial^2 \mathcal{L}}{\partial \mathbf{x} \partial \lambda} \Delta \lambda = \mathbf{0} \\ &= \nabla_\mathbf{x} \mathcal{L}\left(\mathbf{x}, \lambda\right) + \frac{\partial^2 \mathcal{L}}{\partial \mathbf{x}^2} \Delta \mathbf{x} + \left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \Delta \lambda = \mathbf{0} \end{align*}$$
$$\nabla_\lambda \mathcal{L}\left(\mathbf{x}+\Delta \mathbf{x}, \lambda + \Delta \lambda\right) \approx \mathbf{c}\left(\mathbf{x}\right) + \frac{\partial \mathbf{c}}{\partial \mathbf{x}} \Delta \mathbf{x} = \mathbf{0}$$
The Newton step is:
$$\begin{bmatrix} \frac{\partial^2 \mathcal{L}}{\partial \mathbf{x}^2} & \left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \\ \frac{\partial \mathbf{c}}{\partial \mathbf{x}} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \Delta \mathbf{x} \\ \Delta \lambda \end{bmatrix} = \begin{bmatrix} -\nabla_\mathbf{x} \mathcal{L}\left(\mathbf{x}, \lambda\right) \\ -\mathbf{c}\left(\mathbf{x}\right) \end{bmatrix}$$
This equation is called the KKT system; its coefficient matrix is symmetric but indefinite (a symmetric indefinite system).
3. Gauss-Newton Method
(1) Basic idea
$$\begin{align*} \frac{\partial^2 \mathcal{L}}{\partial \mathbf{x}^2} &= \nabla^2 f\left(\mathbf{x}\right) + \frac{\partial}{\partial \mathbf{x}}\left[\left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \lambda\right] \\ &\approx \nabla^2 f\left(\mathbf{x}\right) \end{align*}$$
- The second term requires the second derivatives of $\mathbf{c}$ (a third-order tensor) and is expensive to compute, so we often drop this constraint-curvature term. The result is called the Gauss-Newton method.
- This method converges somewhat more slowly than Newton’s method (more iterations), but each iteration is much cheaper, so it often wins in wall-clock time.
(2) Example: comparison of Newton and Gauss-Newton
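The step functions below assume that f, ∇f, ∇2f, c, and ∂c are already defined. Here is a minimal sketch of such definitions; the particular objective and constraint chosen are illustrative assumptions, not necessarily those used in the lecture:

    using ForwardDiff, LinearAlgebra

    # Illustrative problem (assumption): quadratic objective with one circle constraint.
    f(x) = x[1]^2 + 2.0*x[2]^2
    c(x) = [x[1]^2 + x[2]^2 - 1.0]        # m = 1 constraint, returned as a vector

    ∇f(x)  = ForwardDiff.gradient(f, x)
    ∇2f(x) = ForwardDiff.hessian(f, x)
    ∂c(x)  = ForwardDiff.jacobian(c, x)   # 1×2 constraint Jacobian

With these in place, the step functions can be iterated from an initial guess, e.g. x0 = [-1.0, -1.0] and λ0 = 0.0.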
(i) Newton’s method
function newton_step(x0, λ0)
    # Full Hessian of the Lagrangian (includes the constraint-curvature term)
    H = ∇2f(x0) + ForwardDiff.jacobian(x -> ∂c(x)'*λ0, x0)
    C = ∂c(x0)
    # Solve the KKT system for the step
    Δz = [H C'; C 0] \ [-∇f(x0) - C'*λ0; -c(x0)]
    Δx = Δz[1:2]
    Δλ = Δz[3]
    return x0 + Δx, λ0 + Δλ
end
- Starting from $\left(-1,-1\right)$ (see plot of the iterates).
- Starting from $\left(-3,2\right)$: it does not converge to the minimum (see plot of the iterates).
Check the Hessian of the Lagrangian at the final iterate:
H = ∇2f(xguess[:,end]) + ForwardDiff.jacobian(x -> ∂c(x)'*λguess[end], xguess[:,end])
result:
2×2 Matrix{Float64}:
-1.75818 0.0
0.0 1.0
It has a negative eigenvalue, which produces a bad (non-descent) step direction. This comes from the second (constraint-curvature) term, so we would need to regularize during Newton’s method. The Gauss-Newton method does not have this problem.
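A minimal sketch of one way to handle this, reusing the shift-by-identity loop from the line-search example above (the increment β = 1.0 is an illustrative choice):

    using LinearAlgebra

    # Shift the Lagrangian Hessian block until it is positive definite,
    # then use it in the KKT system as before.
    function regularize(H; β = 1.0)
        while !isposdef(H)
            H = H + β*I
        end
        return H
    end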
(ii) Gauss-Newton method
function gauss_newton_step(x0, λ0)
    H = ∇2f(x0)      # drop the 2nd (constraint-curvature) term
    C = ∂c(x0)
    # Solve the (Gauss-Newton) KKT system for the step
    Δz = [H C'; C 0] \ [-∇f(x0) - C'*λ0; -c(x0)]
    Δx = Δz[1:2]
    Δλ = Δz[3]
    return x0 + Δx, λ0 + Δλ
end
- Starting from $\left(-1,-1\right)$ (see plot of the iterates).
- Starting from $\left(-3,2\right)$: it converges to the minimum in 8 iterations (see plot of the iterates).
(3) Takeaway message
- May still need to regularize the Hessian $\frac{\partial^2 \mathcal{L}}{\partial \mathbf{x}^2}$ even if $\nabla^2 f\left(\mathbf{x}\right) \succ 0$.
- The Gauss-Newton method is often used in practice.
4. Inequality Constraints
$$\min_{\mathbf{x}} f\left(\mathbf{x}\right) \quad \text{s.t.} \quad \mathbf{c}\left(\mathbf{x}\right) \leq \mathbf{0}$$
- We’ll look at just inequality constraints for now.
- Just combine with previous methods to handle both kinds of constraints.
(1) First-order necessary conditions
i. Need $\nabla f\left(\mathbf{x}\right) = \mathbf{0}$ in free directions.
ii. Need $\mathbf{c}\left(\mathbf{x}\right) \leq \mathbf{0}$.
(same as equality constraints)
(i) KKT conditions
$$\begin{align*} \nabla f\left(\mathbf{x}\right) + \left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \lambda = \mathbf{0} & \quad (\text{stationarity}) \\ \mathbf{c}\left(\mathbf{x}\right) \leq \mathbf{0} & \quad (\text{primal feasibility}) \\ \lambda \geq \mathbf{0} & \quad (\text{dual feasibility}) \\ \lambda \odot \mathbf{c}\left(\mathbf{x}\right) = \lambda^\top \mathbf{c}\left(\mathbf{x}\right) = \mathbf{0} & \quad (\text{complementary slackness}) \end{align*}$$
for some $\lambda \in \mathbb{R}^m$.
(ii) Intuition of KKT conditions
- If the constraint is active ($\mathbf{c}\left(\mathbf{x}\right) = \mathbf{0}$) $\Rightarrow$ $\lambda > 0$.
- If the constraint is inactive ($\mathbf{c}\left(\mathbf{x}\right) < \mathbf{0}$) $\Rightarrow$ $\lambda = 0$.
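A small worked example (illustrative, not from the lecture): minimize $f\left(x\right) = x^2$ subject to $c\left(x\right) = 1 - x \leq 0$ (i.e. $x \geq 1$). The KKT conditions are
$$2x - \lambda = 0, \quad 1 - x \leq 0, \quad \lambda \geq 0, \quad \lambda \left(1 - x\right) = 0.$$
The unconstrained minimizer $x = 0$ is infeasible, so the constraint is active: $x = 1$, $\lambda = 2 > 0$ (the switch is on). If instead the constraint were $c\left(x\right) = x - 2 \leq 0$, the unconstrained minimizer $x = 0$ would already satisfy it, and the conditions hold with $\lambda = 0$ (the switch is off).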
(iii) Takeaway message
- The complementary slackness condition acts like a switch: if the constraint is active, the switch is on ($\lambda > 0$); if the constraint is inactive, the switch is off ($\lambda = 0$).
- There is also an edge case when the minimum of the objective happens to lie exactly on the constraint boundary, in which case $\mathbf{c}\left(\mathbf{x}\right) = \mathbf{0}$ and $\lambda = 0$ simultaneously.