【Optimal Control (CMU 16-745)】Lecture 5 Optimization Part 3


啵啵啵啵哲



Article link: https://blog.csdn.net/xuzhengzhe/article/details/132624745


Review:

Line search
Equality and inequality constraints
KKT conditions

Lecture 5 Optimization Pt.3

Overview

Algorithms for constrained minimization
Augmented Lagrangian method
Quadratic programming
More on regularization and line search

1. Inequality-constrained minimization

$\begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned}$

KKT conditions:

$\begin{align*} \nabla f\left(\mathbf{x}\right) + \left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \lambda = \mathbf{0}& \quad (\mathrm{stationarity}) \\ \mathbf{c}\left(\mathbf{x}\right) \leq \mathbf{0}& \quad (\mathrm{primal\; feasibility}) \\ \lambda \geq \mathbf{0}& \quad (\mathrm{dual\; feasibility}) \\ \lambda\odot \mathbf{c}\left(\mathbf{x}\right) = \mathbf{0} \;\Leftrightarrow\; \lambda^\top \mathbf{c}\left(\mathbf{x}\right) = 0& \quad (\mathrm{complementary\; slackness}) \end{align*}$

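$\left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \lambda$ acts like a penalty term. Make sure its sign convention is consistent: with $\mathbf{c}(\mathbf{x}) \leq \mathbf{0}$ and $\lambda \geq \mathbf{0}$, the term $\lambda^\top \mathbf{c}(\mathbf{x})$ makes the cost worse when a constraint is violated ($c_i(\mathbf{x}) > 0$).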
The notation $\odot$ is the Hadamard product (element-wise product)
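To make the four conditions concrete, here is a small Julia sketch (Julia is the language used by the code later in this post) that evaluates each KKT residual at a candidate point. It assumes the QP example used later ($\mathbf{Q} = \mathrm{diag}(0.5, 1)$, $\mathbf{A} = [1\; 1]$, $\mathbf{b} = -1$, so $\mathbf{c}(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$) and the candidate $\mathbf{x} = (-2/3, -1/3)$, $\lambda = 1/3$, obtained by hand from the KKT conditions with the constraint active. `kkt_residuals` is a hypothetical helper, not a library function.

```julia
using LinearAlgebra

# QP example from this post: f(x) = ½xᵀQx, constraint c(x) = Ax - b ≤ 0
Q = [0.5 0.0; 0.0 1.0]
A = [1.0 1.0]
b = [-1.0]
∇f(x) = Q * x
c(x)  = A * x - b
∂c(x) = A

# Hypothetical helper: norms of the four KKT residuals at (x, λ)
function kkt_residuals(x, λ)
    stationarity    = norm(∇f(x) + ∂c(x)' * λ)
    primal          = norm(max.(0, c(x)))   # violation of c(x) ≤ 0
    dual            = norm(min.(0, λ))      # violation of λ ≥ 0
    complementarity = norm(λ .* c(x))       # Hadamard product λ ⊙ c(x)
    return (stationarity = stationarity, primal = primal,
            dual = dual, complementarity = complementarity)
end

r_opt = kkt_residuals([-2/3, -1/3], [1/3])   # candidate optimum: all residuals ≈ 0
r_bad = kkt_residuals([-2/3, -1/3], [0.5])   # wrong multiplier: stationarity fails
println(r_opt)
println(r_bad)
```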

2. Optimization algorithms

(1) Active-set methods

Guess active/inactive constraints
Solve equality constrained problem
Used when you have a good heuristic for the active set
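A deliberately naive active-set sketch in Julia: enumerate every guess of the active set, solve the resulting equality-constrained QP through its KKT system, and accept the first guess whose solution is primal feasible with $\lambda \geq \mathbf{0}$. It assumes the QP example from later in this post; practical active-set solvers update the guess one constraint at a time instead of trying all $2^m$ guesses, and the function names here are illustrative.

```julia
using LinearAlgebra

# QP example from this post: min ½xᵀQx + qᵀx  s.t.  Ax ≤ b
Q = [0.5 0.0; 0.0 1.0]; q = zeros(2)
A = [1.0 1.0];          b = [-1.0]

# Treat the constraints flagged in `active` as equalities and solve the KKT system
function solve_with_active_set(active)
    Aa, ba = A[active, :], b[active]
    n, m = length(q), length(ba)
    K = vcat(hcat(Q, Aa'), hcat(Aa, zeros(m, m)))
    z = K \ [-q; ba]
    λ = zeros(length(b))
    λ[active] = z[n+1:end]            # inactive constraints get λ = 0
    return z[1:n], λ
end

# Naive enumeration of all 2^m guesses (fine for tiny m only)
function enumerate_active_sets()
    for guess in Iterators.product(fill((false, true), length(b))...)
        active = collect(guess)
        x, λ = solve_with_active_set(active)
        feasible = all(A * x .<= b .+ 1e-9)   # inactive constraints must hold
        dual_ok  = all(λ .>= -1e-9)           # active multipliers must be ≥ 0
        if feasible && dual_ok
            return x, λ, active
        end
    end
    return nothing
end

x, λ, active = enumerate_active_sets()
println((x, λ, active))
```

Here the "inactive" guess gives $\mathbf{x} = \mathbf{0}$, which violates $x_1 + x_2 \leq -1$, so the "active" guess is the one accepted.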

(2) Barrier/Interior-point methods

Replace inequalities with barrier function in objective
$\begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned}$

$\Rightarrow \min_{\mathbf{x}} \quad f(\mathbf{x}) - \sum_{i=1}^m \frac{1}{\rho}\log(-c_i(\mathbf{x}))$


Gold standard for convex problems (e.g. MPC).

Example:
$\begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) = x_1^2 + 0.3x_2^2 \\ &\text{s.t.} \quad c_1(\mathbf{x}) = x_1^2 + x_2^2 - 2 \leq 0 \\ &\quad \quad \quad c_2(\mathbf{x}) = x_1 + x_2 - 1 \leq 0 \end{aligned}$

Plot the objective and constraints in the 2D plane:

Plot the objective and constraints in 3D:
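A minimal log-barrier sketch for this example in Julia. It assumes hand-coded derivatives (the barrier Hessian uses $\nabla^2 c_1 = 2I$ and $\nabla^2 c_2 = 0$), a strictly feasible start $\mathbf{x}_0 = (0.5, -1)$, and $\rho$ multiplied by 10 after each outer iteration; the inner loop is Newton's method with a backtracking line search that keeps $\mathbf{c}(\mathbf{x}) < \mathbf{0}$. The unconstrained minimizer $(0, 0)$ is strictly feasible here, so the iterates should approach it as $\rho$ grows.

```julia
using LinearAlgebra

# Example problem from the text:  min x₁² + 0.3x₂²  s.t.  c(x) ≤ 0
f(x)   = x[1]^2 + 0.3 * x[2]^2
∇f(x)  = [2 * x[1], 0.6 * x[2]]
∇2f(x) = [2.0 0.0; 0.0 0.6]
c(x)   = [x[1]^2 + x[2]^2 - 2, x[1] + x[2] - 1]
∂c(x)  = [2 * x[1] 2 * x[2]; 1.0 1.0]     # Jacobian: one row per constraint

# Barrier objective f(x) - (1/ρ) Σ log(-cᵢ(x)); only defined where c(x) < 0
φ(x, ρ) = f(x) - sum(log.(-c(x))) / ρ

function barrier_solve(x; ρ = 1.0, growth = 10.0, outer = 8, inner = 50)
    @assert all(c(x) .< 0) "start must be strictly feasible"
    for _ in 1:outer
        for _ in 1:inner
            cx, J = c(x), ∂c(x)
            g = ∇f(x) - J' * (1 ./ cx) / ρ
            norm(g) < 1e-10 && break
            # ∇²φ = ∇²f + (1/ρ) Σ [∇cᵢ∇cᵢᵀ/cᵢ² - ∇²cᵢ/cᵢ], with ∇²c₁ = 2I, ∇²c₂ = 0
            H = ∇2f(x) + J' * Diagonal(1 ./ cx .^ 2) * J / ρ - (2 / (ρ * cx[1])) * I
            Δx = -(H \ g)
            t = 1.0
            # Backtrack until the step stays strictly feasible and satisfies Armijo
            while t > 1e-12 && (any(c(x + t * Δx) .>= 0) ||
                                φ(x + t * Δx, ρ) > φ(x, ρ) + 1e-4 * t * dot(g, Δx))
                t /= 2
            end
            x = x + t * Δx
        end
        ρ *= growth                      # weaken the barrier's influence
    end
    return x
end

x = barrier_solve([0.5, -1.0])
println(x)
```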

(3) Penalty

Replace constraints with penalty term that penalizes violations
$\begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned}$

$\Rightarrow \min_{\mathbf{x}} \quad f(\mathbf{x}) + \frac{\rho}{2} \left\|\max(\mathbf{0}, \mathbf{c}(\mathbf{x}))\right\|_2^2$

easy to implement
has issues with numerical ill-conditioning
cannot achieve high accuracy
rarely used in practice
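The accuracy and conditioning issues can be seen on the QP example used later in this post ($\mathbf{Q} = \mathrm{diag}(0.5, 1)$, constraint $\mathbf{a}^\top\mathbf{x} \leq b$ with $\mathbf{a} = (1, 1)$, $b = -1$). In this Julia sketch the penalized objective is piecewise quadratic, so it is minimized exactly; `penalty_min` is an illustrative helper. For this problem the remaining violation works out to $1/(1 + 3\rho)$, so it only vanishes as $\rho \to \infty$, while $\mathrm{cond}(\mathbf{Q} + \rho\mathbf{a}\mathbf{a}^\top)$ grows roughly linearly in $\rho$.

```julia
using LinearAlgebra

# QP example from this post: min ½xᵀQx  s.t.  aᵀx ≤ b
Q = [0.5 0.0; 0.0 1.0]
a = [1.0, 1.0]
b = -1.0

# Exact minimizer of ½xᵀQx + (ρ/2)·max(0, aᵀx - b)²
function penalty_min(ρ)
    x = zeros(2)                           # minimizer of ½xᵀQx alone
    dot(a, x) - b <= 0 && return x         # feasible: the penalty is inactive
    return (Q + ρ * a * a') \ (ρ * b * a)  # stationarity: Qx + ρa(aᵀx - b) = 0
end

for ρ in (1.0, 10.0, 100.0, 1000.0)
    x = penalty_min(ρ)
    violation = dot(a, x) - b              # > 0: constraint still violated
    println("ρ = $ρ  violation = $violation  cond(H) = $(cond(Q + ρ * a * a'))")
end
```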

(4) Augmented Lagrangian

Add a Lagrange multiplier estimate to the penalty method:
$\min_{\mathbf{x}} \quad \mathcal{L}_\rho(\mathbf{x}, \tilde{\lambda}) = f(\mathbf{x}) + \tilde{\lambda}^\top \mathbf{c}(\mathbf{x}) + \frac{\rho}{2} \left\|\max(\mathbf{0}, \mathbf{c}(\mathbf{x}))\right\|_2^2$

$\mathcal{L}_\rho(\mathbf{x}, \tilde{\lambda})$ is called the augmented Lagrangian. $\tilde{\lambda}$ is the estimate of the Lagrange multiplier.

i. Update $\tilde{\lambda}$ by offloading the penalty term into $\tilde{\lambda}$ at each iteration:
$\tilde{\lambda}\leftarrow \tilde{\lambda} + \rho \mathbf{c}(\mathbf{x})$ (for active constraints)

Insight:
$\begin{aligned} &\frac{\partial f}{\partial \mathbf{x}} + \tilde{\lambda}^\top \frac{\partial \mathbf{c}}{\partial \mathbf{x}} + \rho \mathbf{c}^\top(\mathbf{x}) \frac{\partial \mathbf{c}}{\partial \mathbf{x}} \\ = &\frac{\partial f}{\partial \mathbf{x}} + \left[\tilde{\lambda} + \rho \mathbf{c}(\mathbf{x})\right]^\top \frac{\partial \mathbf{c}}{\partial \mathbf{x}} \\ = & \mathbf{0} \end{aligned}$ (stationarity of $\mathcal{L}_\rho$, with $\mathbf{c}$ restricted to the active constraints)

The term $\tilde{\lambda} + \rho \mathbf{c}(\mathbf{x})$ plays the role of a Lagrange multiplier: it pushes in the same direction as the true multiplier, against the constraint. This is why the update above moves $\tilde{\lambda}$ to this value.

ii. Repeat until convergence:
(1) $\min_{\mathbf{x}} \quad \mathcal{L}_\rho(\mathbf{x}, \tilde{\lambda})$
(2) $\tilde{\lambda}\leftarrow \max(0, \tilde{\lambda} + \rho \mathbf{c}(\mathbf{x}))$ (clamping to guarantee $\tilde{\lambda} \geq 0$ )
(3) $\rho \leftarrow \alpha \rho$ ( $\alpha$ typically 10)

Fixes the ill-conditioning of the penalty method
Converges with finite $\rho$
Works well on non-convex problems
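A Julia sketch of loop ii., assuming the QP example below ($\mathbf{Q} = \mathrm{diag}(0.5, 1)$, $\mathbf{q} = \mathbf{0}$, $\mathbf{A} = [1\; 1]$, $\mathbf{b} = -1$, no equality constraints), the $\mathcal{L}_\rho$ written above, a start at $\tilde{\lambda} = 0$, $\rho = 1$, and $\alpha = 10$. Because $\mathcal{L}_\rho$ is piecewise quadratic for a QP, step (1) is done by re-solving a linear system until the set of violated constraints stops changing; `minimize_La` and `augmented_lagrangian` are illustrative names. The result should match the hand-computed KKT solution $\mathbf{x}^* = (-2/3, -1/3)$, $\lambda^* = 1/3$.

```julia
using LinearAlgebra

# QP example from this post: min ½xᵀQx + qᵀx  s.t.  Ax ≤ b
Q = [0.5 0.0; 0.0 1.0]; q = zeros(2)
A = [1.0 1.0];          b = [-1.0]

# Step (1): minimize L_ρ(x, λ̃) = f(x) + λ̃ᵀ(Ax - b) + (ρ/2)‖max(0, Ax - b)‖².
# For this small QP, re-solve the stationarity condition until the set of
# violated constraints (the ones the max(0, ·) term sees) stops changing.
function minimize_La(x, λ, ρ)
    for _ in 1:20
        act = A * x .> b
        Aa = A[act, :]
        x = (Q + ρ * Aa' * Aa) \ (-(q + A' * λ) + ρ * Aa' * b[act])
        (A * x .> b) == act && break
    end
    return x
end

function augmented_lagrangian(; ρ = 1.0, α = 10.0, iters = 6)
    x, λ = zeros(2), zeros(length(b))
    for _ in 1:iters
        x = minimize_La(x, λ, ρ)              # (1) inner minimization
        λ = max.(0, λ + ρ * (A * x - b))      # (2) multiplier update with clamping
        ρ *= α                                # (3) increase the penalty weight
    end
    return x, λ
end

x, λ = augmented_lagrangian()
println((x, λ))
```

For this QP the multiplier error shrinks by a factor $1 + 3\rho$ per outer iteration, so it converges even though $\rho$ stays finite.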

Example: Quadratic program

$\begin{aligned} &\min_{\mathbf{x}} \quad \frac{1}{2} \mathbf{x}^\top \mathbf{Q} \mathbf{x} + \mathbf{q}^\top \mathbf{x} \quad (\mathbf{Q} \succ 0) \\ &\text{s.t.} \quad \mathbf{A}\mathbf{x} \leq \mathbf{b}, \mathbf{C}\mathbf{x} = \mathbf{d} \end{aligned}$

Super useful in control
Can be solved very fast (~kHz)

Parameters: $\mathbf{Q} = \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix}$ , $\mathbf{q} = \mathbf{0}$ , $\mathbf{A} = \begin{bmatrix} 1 & 1 \end{bmatrix}$ , $\mathbf{b} = -1$ , $\mathbf{C} = \mathbf{0}$ , $\mathbf{d} = \mathbf{0}$ .

Plot the objective and constraints in the 2D plane:

After 3 iterations:

Without $\rho$ update (slow convergence):

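Example:

Try the penalty method, the full augmented Lagrangian, and just the $\tilde{\lambda}$ update (fixed $\rho$) on this QP, and compare how fast the constraint violation goes to zero.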
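One way to run this comparison, sketched in Julia under the same assumptions as the QP above: the penalty method keeps $\tilde{\lambda} = 0$ and multiplies $\rho$ by 10, the full augmented Lagrangian does both updates, and the $\tilde{\lambda}$-only variant keeps $\rho = 1$. After five outer iterations `al_variant` (an illustrative name) reports the remaining violation $\max(0, \mathbf{A}\mathbf{x} - \mathbf{b})$.

```julia
using LinearAlgebra

# QP example from this post: min ½xᵀQx + qᵀx  s.t.  Ax ≤ b
Q = [0.5 0.0; 0.0 1.0]; q = zeros(2)
A = [1.0 1.0];          b = [-1.0]

# Exact minimizer of f(x) + λ̃ᵀ(Ax - b) + (ρ/2)‖max(0, Ax - b)‖² (piecewise quadratic)
function minimize_La(x, λ, ρ)
    for _ in 1:20
        act = A * x .> b
        Aa = A[act, :]
        x = (Q + ρ * Aa' * Aa) \ (-(q + A' * λ) + ρ * Aa' * b[act])
        (A * x .> b) == act && break
    end
    return x
end

# Run `iters` outer iterations; the flags choose which updates are applied.
# Returns the remaining constraint violation max(0, Ax - b).
function al_variant(; update_λ, update_ρ, iters = 5, ρ = 1.0)
    x, λ = zeros(2), zeros(length(b))
    for _ in 1:iters
        x = minimize_La(x, λ, ρ)
        update_λ && (λ = max.(0, λ + ρ * (A * x - b)))
        update_ρ && (ρ *= 10)
    end
    return maximum(max.(0, A * x - b))
end

v_pen = al_variant(update_λ = false, update_ρ = true)    # penalty method
v_al  = al_variant(update_λ = true,  update_ρ = true)    # full augmented Lagrangian
v_lam = al_variant(update_λ = true,  update_ρ = false)   # λ̃ update only (ρ = 1)
println((v_pen, v_al, v_lam))
```

For this problem the full augmented Lagrangian drives the violation below $10^{-10}$, the penalty method leaves about $1/(1 + 3\cdot 10^4) \approx 3 \cdot 10^{-5}$, and the $\tilde{\lambda}$-only update shrinks the multiplier error by only a factor of 4 per iteration.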

3. Regularization & Duality

(1) Equality constraints

Given
$\begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) = \mathbf{0} \end{aligned}$

We might like to turn this into:
$\min_{\mathbf{x}} \quad f(\mathbf{x}) + p_\infty(\mathbf{c}(\mathbf{x})),\quad p_\infty(\mathbf{x}) = \left\{ \begin{array}{ll} 0 & \mathbf{x} = \mathbf{0} \\ +\infty & \text{otherwise} \end{array} \right.$

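In practice this is terrible ($p_\infty$ is discontinuous and infinite outside the feasible set, so gradient-based methods cannot use it), but we can get the same effect by solving:
$\min_{\mathbf{x}} \max_{\lambda} \quad f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x})$

If the constraints are violated ( $\mathbf{c}(\mathbf{x}) \neq \mathbf{0}$ ), the inner $\max$ over $\lambda$ drives the cost to $+\infty$; if $\mathbf{c}(\mathbf{x}) = \mathbf{0}$, the $\lambda$ term vanishes and the cost is just $f(\mathbf{x})$.
The $\max$ acts like an infinite penalty.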

(2) Inequality constraints

The same trick works for inequality constraints:
$\begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned}$

$\Rightarrow \min_{\mathbf{x}} \quad f(\mathbf{x}) + p_\infty(\mathbf{c}(\mathbf{x})),\quad p_\infty(\mathbf{x}) = \left\{ \begin{array}{ll} 0 & \mathbf{x} \leq \mathbf{0} \\ +\infty & \text{otherwise} \end{array} \right.$

Apply the “min-max” trick:
$\Rightarrow \min_{\mathbf{x}} \max_{\lambda \geq \mathbf{0}} \quad f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x})$

We denote
$\mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x})$

The solution is a saddle point in $\left(\mathbf{x}, \lambda\right)$ space.

Consider the eigenvalues of the KKT matrix: The eigenvalues related to $\mathbf{x}$ are all positive, and the eigenvalues related to $\lambda$ should be all negative.

(3) Interpretation

KKT conditions define a saddle point in $\left(\mathbf{x}, \lambda\right)$ space.
KKT system should have $\dim(\mathbf{x})$ positive eigenvalues and $\dim(\lambda)$ negative eigenvalues at an optimum. It’s called a quasi-definite system.
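Regularize the KKT matrix as follows, where $H = \nabla_{\mathbf{x}}^2 \mathcal{L}$ is the Hessian of the Lagrangian and $C = \partial \mathbf{c}/\partial \mathbf{x}$ is the constraint Jacobian:

$\begin{bmatrix} H+\beta I & C^\top \\ C & -\beta I \end{bmatrix} \begin{bmatrix} \Delta \mathbf{x} \\ \Delta \lambda \end{bmatrix} = \begin{bmatrix} -\nabla_\mathbf{x} \mathcal{L} \\ -\mathbf{c}\left(\mathbf{x}\right) \end{bmatrix}, \quad \beta > 0$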
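A quick Julia check of the inertia argument, using a hypothetical indefinite $H$ and a single constraint (the values of `H`, `C` and `β` are illustrative assumptions). Without regularization the KKT matrix has the wrong inertia; with $\beta$ chosen so that $H + \beta I \succ 0$, the regularized matrix is quasi-definite and has $\dim(\mathbf{x}) = 2$ positive and $\dim(\lambda) = 1$ negative eigenvalues.

```julia
using LinearAlgebra

# Inertia = (number of positive, number of negative) eigenvalues of a symmetric matrix
inertia(K) = (e = eigvals(Symmetric(K)); (count(>(0), e), count(<(0), e)))

H = [-3.0 0.0; 0.0 1.0]   # hypothetical Hessian of the Lagrangian (indefinite)
C = [1.0 1.0]             # hypothetical constraint Jacobian (one constraint)
β = 4.0                   # hypothetical choice that makes H + βI positive definite

K    = [H C'; C 0.0]      # unregularized KKT matrix
Kreg = [H+β*I C'; C -β]   # regularized KKT matrix

println(inertia(K))       # (1, 2): wrong inertia for a minimum
println(inertia(Kreg))    # (2, 1): dim(x) positive, dim(λ) negative
```

The unregularized result follows from the reduced Hessian: on the null space of $C$, spanned by $(1, -1)$, the curvature is $-3 + 1 < 0$.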

(4) Duality

Question: Can we swap the $\min$ and $\max$?
Answer: If the problem is convex (and a mild constraint qualification such as Slater's condition holds), yes! (Strong duality)

For such convex problems, the optimal value of the primal problem ( $\min_{\mathbf{x}} \max_{\lambda}$ ) equals the optimal value of the dual problem ( $\max_{\lambda} \min_{\mathbf{x}}$ ).
In general (e.g. for non-convex problems), only weak duality holds: the optimal value of the dual problem is less than or equal to the optimal value of the primal problem. The difference is called the duality gap.
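For the convex QP example in this post, strong duality can be checked by hand and numerically. Minimizing the Lagrangian over $\mathbf{x}$ gives $\mathbf{x}(\lambda) = -\mathbf{Q}^{-1}(\mathbf{q} + \mathbf{A}^\top\lambda)$, so for these data the dual function is $g(\lambda) = -1.5\lambda^2 + \lambda$, maximized at $\lambda = 1/3$ with value $1/6$, which equals the primal optimum. The Julia sketch evaluates $g$ on a grid, a simplification for illustration rather than a real dual solver.

```julia
using LinearAlgebra

# QP example from this post: min ½xᵀQx + qᵀx  s.t.  Ax ≤ b
Q = [0.5 0.0; 0.0 1.0]; q = zeros(2)
A = [1.0 1.0];          b = [-1.0]
f(x) = 0.5 * dot(x, Q * x) + dot(q, x)

# Dual function g(λ) = min_x f(x) + λᵀ(Ax - b); the minimizer is x(λ) = -Q⁻¹(q + Aᵀλ)
function g(λ)
    x = -(Q \ (q + A' * λ))
    return f(x) + dot(λ, A * x - b)
end

# Primal optimum p* from the KKT system with the constraint active
z = [Q A'; A 0.0] \ [-q; b]
pstar = f(z[1:2])

# Dual optimum d* by brute-force search over λ ≥ 0 (grid for illustration only)
λgrid = 0:0.001:1
dvals = [g([λ]) for λ in λgrid]
dstar = maximum(dvals)

println((pstar, dstar))   # both ≈ 1/6: zero duality gap
```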

(5) Example

Without regularization, Newton's method on the KKT conditions does not converge to the optimal solution:

Check the eigenvalues of the KKT matrix:

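```julia
using LinearAlgebra, ForwardDiff

# ∇2f, ∂c, xguess and λguess are assumed to be defined earlier (not shown in this post)
H = ∇2f(xguess[:,end]) + ForwardDiff.jacobian(xn -> ∂c(xn)'*λguess[end], xguess[:,end])  # Hessian of the Lagrangian
C = ∂c(xguess[:,end])   # constraint Jacobian
K = [H C'; C 0]         # KKT matrix
eigvals(K)
```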

We get:

```
3-element Vector{Float64}:
 -2.7504046059655027
 -0.5914642540616377
  1.6230587398907093
```

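Two eigenvalues are negative and only one is positive. Since $\dim(\mathbf{x}) = 2$ and $\dim(\lambda) = 1$, an optimum requires two positive and one negative eigenvalue, so this KKT matrix does not have the right inertia and the Newton step is not guaranteed to move toward a minimum.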

To fix this, we add regularization to the KKT matrix:

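```julia
using LinearAlgebra, ForwardDiff

# ∇f, ∇2f, c and ∂c are assumed to be defined as in the lecture notebook
function regularized_newton_step(x,λ)
    β = 1.0
    H = ∇2f(x) + ForwardDiff.jacobian(xn -> ∂c(xn)'*λ, x)   # Hessian of the Lagrangian
    C = ∂c(x)                                              # constraint Jacobian
    K = [H C'; C 0]
    e = eigvals(Symmetric(K))
    # Add +β to the x block and -β to the λ block until K is quasi-definite:
    # length(x) positive and length(λ) negative eigenvalues
    while !(sum(e .> 0) == length(x) && sum(e .< 0) == length(λ))
        K = K + Diagonal([β*ones(length(x)); -β*ones(length(λ))])
        e = eigvals(Symmetric(K))
    end
    Δz = K\[-∇f(x)-C'*λ; -c(x)]   # solve the regularized KKT system
    Δx = Δz[1:2]                  # this example has 2 decision variables
    Δλ = Δz[3]                    # and 1 multiplier
    return Δx, Δλ
end
```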

The solution converges to the optimal solution in 6 iterations:

The iterates still overshoot (at the 4th iteration) $\Rightarrow$ use a line search.

4. Merit function

(1) Question: How do we do a line search on a root-finding problem?

$\Rightarrow$ We want $\mathbf{x}^*$ such that $\mathbf{c}(\mathbf{x}^*) = \mathbf{0}$, but there is no objective function whose decrease we can check.

We define a scalar merit function $p\left(\mathbf{x}\right)$ that measures the distance to solution.

Here are some standard choices:

$L_2$ norm: $p\left(\mathbf{x}\right) = \frac{1}{2}\mathbf{c}^\top(\mathbf{x})\mathbf{c}(\mathbf{x}) = \frac{1}{2}\|\mathbf{c}(\mathbf{x})\|_2^2$
$L_1$ norm: $p\left(\mathbf{x}\right) = \|\mathbf{c}(\mathbf{x})\|_1$ (any norm works)

Now just do Armijo line search on $p\left(\mathbf{x}\right)$ :
$\alpha = 1$ (step length)
while $p\left(\mathbf{x} + \alpha \Delta \mathbf{x}\right) > p\left(\mathbf{x}\right) + b \alpha \nabla p\left(\mathbf{x}\right)^\top \Delta \mathbf{x}$ ( $b \in (0,1)$ is the Armijo tolerance, e.g. $10^{-4}$ )
$\quad$ $\alpha \leftarrow \theta \alpha$ ( $\theta \in (0,1)$ )
end
$\mathbf{x} \leftarrow \mathbf{x} + \alpha \Delta \mathbf{x}$
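A Julia sketch of this loop for scalar root-finding, assuming $c(x) = \arctan(x)$ (a standard case where undamped Newton diverges when started far enough from the root), the $L_2$ merit function $p(x) = \frac{1}{2}c(x)^2$, $b = 10^{-4}$ and $\theta = 0.5$. For the Newton step $\nabla p(x)^\top \Delta x = -c(x)^2 < 0$, so the backtracking loop always terminates.

```julia
# Root-finding test problem (an assumption for illustration): c(x) = atan(x), root x* = 0
c(x)  = atan(x)
dc(x) = 1 / (1 + x^2)      # c'(x)
p(x)  = 0.5 * c(x)^2       # L2 merit function
∇p(x) = c(x) * dc(x)

function newton_root(x; linesearch = true, b = 1e-4, θ = 0.5, iters = 20)
    for _ in 1:iters
        Δx = -c(x) / dc(x)             # Newton step for c(x) = 0
        α = 1.0
        if linesearch                   # Armijo backtracking on the merit function
            while p(x + α * Δx) > p(x) + b * α * ∇p(x) * Δx
                α *= θ
            end
        end
        x += α * Δx
    end
    return x
end

println(newton_root(1.5))                                 # converges to ≈ 0
println(newton_root(1.5; linesearch = false, iters = 5))  # undamped Newton blows up
```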

(2) Question: How about constrained minimization?

$\begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0}, \mathbf{d}(\mathbf{x}) = \mathbf{0} \end{aligned}$

$\Rightarrow \mathcal{L}(\mathbf{x}, \lambda, \mu) = f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x}) + \mu^\top \mathbf{d}(\mathbf{x})$ (the Lagrangian, with $\lambda \geq \mathbf{0}$)

Pick a merit function:
$p(\mathbf{x}, \lambda, \mu) = \frac{1}{2}\|\mathbf{r}(\mathbf{x}, \lambda, \mu)\|_2^2, \quad \mathbf{r}(\mathbf{x}, \lambda, \mu) = \begin{bmatrix} \nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \lambda, \mu) \\ \max(\mathbf{0}, \mathbf{c}(\mathbf{x})) \\ \mathbf{d}(\mathbf{x}) \end{bmatrix}$ ( $\mathbf{r}(\mathbf{x}, \lambda, \mu)$ is called the KKT residual; with the convention $\mathbf{c}(\mathbf{x}) \leq \mathbf{0}$, the term $\max(\mathbf{0}, \mathbf{c}(\mathbf{x}))$ measures the constraint violation)

or
$p(\mathbf{x}, \lambda, \mu) = f(\mathbf{x}) + \rho \left\| \begin{bmatrix} \max(\mathbf{0}, \mathbf{c}(\mathbf{x})) \\ \mathbf{d}(\mathbf{x}) \end{bmatrix} \right\|_1$
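A Julia sketch that evaluates both merit functions on a hypothetical toy problem: $\min x_1^2 + x_2^2$ subject to $c(\mathbf{x}) = 1 - x_1 - x_2 \leq 0$ and $d(\mathbf{x}) = x_1 - x_2 = 0$, whose solution is $\mathbf{x}^* = (0.5, 0.5)$ with $\lambda = 1$, $\mu = 0$. With the constraint convention $\mathbf{c}(\mathbf{x}) \leq \mathbf{0}$ used in this post, the inequality violation is measured by $\max(0, c(\mathbf{x}))$; $\rho = 10$ in the $L_1$ merit function is an arbitrary choice.

```julia
using LinearAlgebra

# Hypothetical toy problem: min x₁² + x₂²  s.t.  c(x) = 1 - x₁ - x₂ ≤ 0,  d(x) = x₁ - x₂ = 0
f(x)  = dot(x, x)
∇f(x) = 2 * x
c(x)  = [1 - x[1] - x[2]]
∂c(x) = [-1.0 -1.0]
d(x)  = [x[1] - x[2]]
∂d(x) = [1.0 -1.0]

# KKT residual r = [∇ₓL; max(0, c); d]
function kkt_residual(x, λ, μ)
    ∇L = ∇f(x) + ∂c(x)' * λ + ∂d(x)' * μ
    return [∇L; max.(0, c(x)); d(x)]
end

merit_kkt(x, λ, μ) = 0.5 * sum(abs2, kkt_residual(x, λ, μ))           # ½‖r‖²
merit_l1(x; ρ = 10.0) = f(x) + ρ * norm([max.(0, c(x)); d(x)], 1)     # exact L1 penalty

println(merit_kkt([0.5, 0.5], [1.0], [0.0]))   # 0 at the KKT point
println(merit_kkt([0.0, 0.0], [0.0], [0.0]))   # 0.5: infeasible point
println(merit_l1([0.5, 0.5]), " ", merit_l1([0.0, 0.0]))
```

The KKT-residual merit needs the multipliers and is zero exactly at KKT points, while the $L_1$ merit depends only on $\mathbf{x}$ and trades off cost against violation through $\rho$.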