【Optimal Control (CMU 16-745)】Lecture 5 Optimization Part 3

Review:

  • Line search
  • Equality and inequality constraints
  • KKT conditions

Lecture 5 Optimization Pt.3

Overview

  • Algorithms for constrained minimization
  • Augumented lagrangian method
  • Quadratic programming
  • More on regularization and line search

1. Inequality-constrained minimization

min ⁡ x f ( x ) s.t. c ( x ) ≤ 0 \begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned} xminf(x)s.t.c(x)0

KKT conditions:

∇ f ( x ) + ( ∂ c ∂ x ) ⊤ λ = 0 ( s t a t i o n a r i t y ) c ( x ) ≤ 0 ( p r i m a l    f e a s i b i l i t y ) λ ≥ 0 ( d u a l    f e a s i b i l i t y ) λ ⊙ c ( x ) = λ ⊤ c ( x ) = 0 ( c o m p l e m e n t a r y    s l a c k n e s s ) \begin{align*} \nabla f\left(\mathbf{x}\right) + \left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \lambda = \mathbf{0}& \quad (\mathrm{stationarity}) \\ \mathbf{c}\left(\mathbf{x}\right) \leq \mathbf{0}& \quad (\mathrm{primal\; feasibility}) \\ \lambda \geq \mathbf{0}& \quad (\mathrm{dual\; feasibility}) \\ \lambda\odot \mathbf{c}\left(\mathbf{x}\right) = \lambda^\top \mathbf{c}\left(\mathbf{x}\right) = \mathbf{0}& \quad (\mathrm{complementary\; slackness}) \end{align*} f(x)+(xc)λ=0c(x)0λ0λc(x)=λc(x)=0(stationarity)(primalfeasibility)(dualfeasibility)(complementaryslackness)

  • ( ∂ c ∂ x ) ⊤ λ \left(\frac{\partial \mathbf{c}}{\partial \mathbf{x}}\right)^\top \lambda (xc)λ acts like a penalty term. Make sure its sign is consistent (makes cost worse for c ( x ) \mathbf{c}\left(\mathbf{x}\right) c(x) infeasible)
  • The notation ⊙ \odot is the Hadamard product (element-wise product)

2. Optimization algorithms

(1) Active-set methods
  • Guess active/inactive constraints
  • Solve equality constrained problem
  • Used when you have a good heuristic for the active set
(2) Barrier/Interior-point methods

Replace inequalities with barrier function in objective
min ⁡ x f ( x ) s.t. c ( x ) ≤ 0 \begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned} xminf(x)s.t.c(x)0

⇒ min ⁡ x f ( x ) − ∑ i = 1 m 1 ρ log ⁡ ( − c i ( x ) ) \Rightarrow \min_{\mathbf{x}} \quad f(\mathbf{x}) - \sum_{i=1}^m \frac{1}{\rho}\log(-c_i(\mathbf{x})) xminf(x)i=1mρ1log(ci(x))

在这里插入图片描述

  • Gold standard for convex problems. (e.g. MPC)

Example:
min ⁡ x f ( x ) = x 1 2 + 0.3 x 2 2 s.t. c 1 ( x ) = x 1 2 + x 2 2 − 2 ≤ 0 c 2 ( x ) = x 1 + x 2 − 1 ≤ 0 \begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) = x_1^2 + 0.3x_2^2 \\ &\text{s.t.} \quad c_1(\mathbf{x}) = x_1^2 + x_2^2 - 2 \leq 0 \\ &\quad \quad \quad c_2(\mathbf{x}) = x_1 + x_2 - 1 \leq 0 \end{aligned} xminf(x)=x12+0.3x22s.t.c1(x)=x12+x2220c2(x)=x1+x210

Plot the objective and constraints on 2D plane:
在这里插入图片描述

Plot the objective and constraints in 3D:
在这里插入图片描述

(3) Penalty

Replace constraints with penalty term that penalizes violations
min ⁡ x f ( x ) s.t. c ( x ) ≤ 0 \begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned} xminf(x)s.t.c(x)0

⇒ min ⁡ x f ( x ) + ρ 2 [ max ⁡ ( 0 , c ( x ) ) ] 2 \Rightarrow \min_{\mathbf{x}} \quad f(\mathbf{x}) + \frac{\rho}{2} \left[\max(0, c(\mathbf{x}))\right]^2 xminf(x)+2ρ[max(0,c(x))]2
在这里插入图片描述

  • easy to implement
  • has issues with numerical ill-conditioning
  • cannot achieve high accuracy
  • rarely used in practice
(4) Augmented Lagrangian

Add Lagrange multiplier estimate to penalty method:
min ⁡ x L ρ ( x , λ ~ ) = f ( x ) + λ ~ ⊤ c ( x ) + ρ 2 [ max ⁡ ( 0 , c ( x ) ) ] 2 \min_{\mathbf{x}} \quad \mathcal{L}_\rho(\mathbf{x}, \tilde{\lambda}) = f(\mathbf{x}) + \tilde{\lambda}^\top \mathbf{c}(\mathbf{x}) + \frac{\rho}{2} \left[\max(0, c(\mathbf{x}))\right]^2 \\ xminLρ(x,λ~)=f(x)+λ~c(x)+2ρ[max(0,c(x))]2

L ρ ( x , λ ~ ) \mathcal{L}_\rho(\mathbf{x}, \tilde{\lambda}) Lρ(x,λ~) is called the augmented Lagrangian. λ ~ \tilde{\lambda} λ~ is the estimate of the Lagrange multiplier.

i. Update λ ~ \tilde{\lambda} λ~ by offloading the penalty term into λ ~ \tilde{\lambda} λ~ at each iteration:
λ ~ ← λ ~ + ρ c ( x ) \tilde{\lambda}\leftarrow \tilde{\lambda} + \rho \mathbf{c}(\mathbf{x}) λ~λ~+ρc(x)(for active constraints)

Insight:
∂ f ∂ x + λ ~ ⊤ ∂ c ∂ x + + ρ c ⊤ ( x ) ∂ c ∂ x = ∂ f ∂ x + [ λ ~ + ρ c ( x ) ] ⊤ ∂ c ∂ x = 0 \begin{aligned} &\frac{\partial f}{\partial \mathbf{x}} + \tilde{\lambda}^\top \frac{\partial \mathbf{c}}{\partial \mathbf{x}} + + \rho \mathbf{c}^\top(\mathbf{x}) \frac{\partial \mathbf{c}}{\partial \mathbf{x}} \\ = &\frac{\partial f}{\partial \mathbf{x}} + \left[\tilde{\lambda} + \rho \mathbf{c}(\mathbf{x})\right]^\top \frac{\partial \mathbf{c}}{\partial \mathbf{x}} \\ = & \mathbf{0} \end{aligned} ==xf+λ~xc++ρc(x)xcxf+[λ~+ρc(x)]xc0 The term λ ~ + ρ c ( x ) \tilde{\lambda} + \rho c(\mathbf{x}) λ~+ρc(x) looks like a Lagrange multiplier. It pushes at the same direction as the Lagrange multiplier, and pushes up against the constraint.

ii. Repeat until convergence:
(1) min ⁡ x L ρ ( x , λ ~ ) \min_{\mathbf{x}} \quad \mathcal{L}_\rho(\mathbf{x}, \tilde{\lambda}) minxLρ(x,λ~)
(2) λ ~ ← max ⁡ ( 0 , λ ~ + ρ c ( x ) ) \tilde{\lambda}\leftarrow \max(0, \tilde{\lambda} + \rho \mathbf{c}(\mathbf{x})) λ~max(0,λ~+ρc(x)) (clamping to guarantee λ ~ ≥ 0 \tilde{\lambda} \geq 0 λ~0)
(3) ρ ← α ρ \rho \leftarrow \alpha \rho ραρ ( α \alpha α typically 10)

  • Fixes ill-conditioning of penalty method
  • Converges with finite ρ \rho ρ
  • Works well on non-convex problems
Example: Quadratic program

min ⁡ x 1 2 x ⊤ Q x + q ⊤ x ( Q ≻ 0 ) s.t. A x ≤ b , C x = d \begin{aligned} &\min_{\mathbf{x}} \quad \frac{1}{2} \mathbf{x}^\top \mathbf{Q} \mathbf{x} + \mathbf{q}^\top \mathbf{x} \quad (\mathbf{Q} \succ 0) \\ &\text{s.t.} \quad \mathbf{A}\mathbf{x} \leq \mathbf{b}, \mathbf{C}\mathbf{x} = \mathbf{d} \end{aligned} xmin21xQx+qx(Q0)s.t.Axb,Cx=d

  • Super useful in control
  • Can be solved very fast (~kHz)

Parameters: Q = [ 0.5 0 0 1 ] \mathbf{Q} = \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix} Q=[0.5001], q = 0 \mathbf{q} = \mathbf{0} q=0, A = [ 1 1 ] \mathbf{A} = \begin{bmatrix} 1 & 1 \end{bmatrix} A=[11], b = − 1 \mathbf{b} = -1 b=1, C = 0 \mathbf{C} = \mathbf{0} C=0, d = 0 \mathbf{d} = \mathbf{0} d=0.

Plot the objective and constraints on 2D plane:
在这里插入图片描述

After 3 iterations:
在这里插入图片描述

Without ρ \rho ρ update (slow convergence):
在这里插入图片描述

Example:
  • Try with penalty method, full Augmented Lagrangian, and just λ \lambda λ update

3. Regularization & Duality

(1) Equality constraints

Given
min ⁡ x f ( x ) s.t. c ( x ) = 0 \begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) = \mathbf{0} \end{aligned} xminf(x)s.t.c(x)=0

We might like to turn this into:
min ⁡ x f ( x ) + p ∞ ( c ( x ) ) , p ∞ ( x ) = { 0 x = 0 + ∞ otherwise \min_{\mathbf{x}} \quad f(\mathbf{x}) + p_\infty(\mathbf{c}(\mathbf{x})),\quad p_\infty(\mathbf{x}) = \left\{ \begin{array}{ll} 0 & \mathbf{x} = \mathbf{0} \\ +\infty & \text{otherwise} \end{array} \right. xminf(x)+p(c(x)),p(x)={0+x=0otherwise

Practically it’s terrible, but we can get the same effect solving:
min ⁡ x max ⁡ λ f ( x ) + λ ⊤ c ( x ) \min_{\mathbf{x}} \max_{\lambda} \quad f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x}) xminλmaxf(x)+λc(x)

If the constraints cannot be satisfied ( c ( x ) ≠ 0 \mathbf{c}(\mathbf{x}) \neq \mathbf{0} c(x)=0), then the cost is infinite.
The max ⁡ \max max acts like an infinite penalty.

(2) Inequality constraints

The above trick is also similar for inequality constraints:
min ⁡ x f ( x ) s.t. c ( x ) ≤ 0 \begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0} \end{aligned} xminf(x)s.t.c(x)0

⇒ min ⁡ x f ( x ) + p ∞ ( c ( x ) ) , p ∞ ( x ) = { 0 x ≤ 0 + ∞ otherwise \Rightarrow \min_{\mathbf{x}} \quad f(\mathbf{x}) + p_\infty(\mathbf{c}(\mathbf{x})),\quad p_\infty(\mathbf{x}) = \left\{ \begin{array}{ll} 0 & \mathbf{x} \leq \mathbf{0} \\ +\infty & \text{otherwise} \end{array} \right. xminf(x)+p(c(x)),p(x)={0+x0otherwise

Apply the “min-max” trick:
⇒ min ⁡ x max ⁡ λ ≥ 0 f ( x ) + λ ⊤ c ( x ) \Rightarrow \min_{\mathbf{x}} \max_{\lambda \geq \mathbf{0}} \quad f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x}) xminλ0maxf(x)+λc(x)

We denote
L ( x , λ ) = f ( x ) + λ ⊤ c ( x ) \mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x}) L(x,λ)=f(x)+λc(x)

The solution is a saddle point in ( x , λ ) \left(\mathbf{x}, \lambda\right) (x,λ) space.

Consider the eigenvalues of the KKT matrix: The eigenvalues related to x \mathbf{x} x are all positive, and the eigenvalues related to λ \lambda λ should be all negative.

(3) Interpretation
  • KKT conditions define a saddle point in ( x , λ ) \left(\mathbf{x}, \lambda\right) (x,λ) space.
  • KKT system should have dim ⁡ ( x ) \dim(\mathbf{x}) dim(x) positive eigenvalues and dim ⁡ ( λ ) \dim(\lambda) dim(λ) negative eigenvalues at an optimum. It’s called a quasi-definite system.
  • Regularize the KKT matrix as follows:

[ H + β I c ⊤ c − β I ] [ Δ x Δ λ ] = [ − ∇ x L − c ( x ) ] , β > 0 \begin{bmatrix} H+\beta I & c^\top \\ c & -\beta I \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = \begin{bmatrix} -\nabla_\mathbf{x} \mathcal{L} \\ -\mathbf{c}\left(\mathbf{x}\right) \end{bmatrix}, \quad \beta > 0 [H+βIccβI][ΔxΔλ]=[xLc(x)],β>0

(4) Duality

Question: Can we swap the min ⁡ \min min and max ⁡ \max max?
Answer: If the problem is convex, yes! (Strong duality)

  • For convex problems, the optimal value of the primal problem is equal to the optimal value of the dual problem.
  • For non-convex problems, the optimal value of the primal problem is less than or equal to the optimal value of the dual problem. The difference is called the duality gap.
(5) Example

Without regularization, the solution will not converge to the optimal solution:
在这里插入图片描述

Check the eigenvalues of the KKT matrix:

H = ∇2f(xguess[:,end]) + ForwardDiff.jacobian(xn -> ∂c(xn)'*λguess[end], xguess[:,end])
C = ∂c(xguess[:,end])
K = [H C'; C 0]
eigvals(K)

We get:

3-element Vector{Float64}:
 -2.7504046059655027
 -0.5914642540616377
  1.6230587398907093

where two eigenvalues are negative and one is positive.

To fix this, we add regularization to the KKT matrix:

function regularized_newton_step(x,λ)
    β = 1.0
    H = ∇2f(x) + ForwardDiff.jacobian(xn -> ∂c(xn)'*λ, x)
    C = ∂c(x)
    K = [H C'; C 0]
    e = eigvals(K)
    while !(sum(e .> 0) == length(x) && sum(e .< 0) == length(λ)) # check if the KKT matrix is quasi-definite
        K = K + Diagonal([β*ones(length(x)); -β*ones(length(λ))]) # add regularization
        e = eigvals(K)
    end
    Δz = K$$-∇f(x)-C'*λ; -c(x)]
    Δx = Δz[1:2]
    Δλ = Δz[3]
    return Δx, Δλ
end

The solution converges to the optimal solution in 6 iterations:
在这里插入图片描述

  • still overshoot (the 4th iteration) ⇒ \Rightarrow use line search

4. Merit function

(1) Question: How do we do a line search on a root-finding problem?

⇒ \Rightarrow We find x ∗ \mathbf{x}^* x such that c ( x ∗ ) = 0 \mathbf{c}(\mathbf{x}^*) = \mathbf{0} c(x)=0.

We define a scalar merit function p ( x ) p\left(\mathbf{x}\right) p(x) that measures the distance to solution.

Here are some standard choices:

  • L 2 L_2 L2 norm: p ( x ) = 1 2 c ⊤ ( x ) c ( x ) = 1 2 ∥ c ( x ) ∥ 2 2 p\left(\mathbf{x}\right) = \frac{1}{2}\mathbf{c}^\top(\mathbf{x})\mathbf{c}(\mathbf{x}) = \frac{1}{2}\|\mathbf{c}(\mathbf{x})\|_2^2 p(x)=21c(x)c(x)=21c(x)22
  • L 1 L_1 L1 norm: p ( x ) = ∥ c ( x ) ∥ 1 p\left(\mathbf{x}\right) = \|\mathbf{c}(\mathbf{x})\|_1 p(x)=c(x)1 (any norm works)

Now just do Armijo line search on p ( x ) p\left(\mathbf{x}\right) p(x):
α = 1 \alpha = 1 α=1 (step length)
while p ( x + α Δ x ) > p ( x ) + b α ∇ p ( x ) ⊤ Δ x p\left(\mathbf{x} + \alpha \Delta \mathbf{x}\right) > p\left(\mathbf{x}\right) + b \alpha \nabla p\left(\mathbf{x}\right)^\top \Delta \mathbf{x} p(x+αΔx)>p(x)+bαp(x)Δx
\quad α ← θ α \alpha \leftarrow \theta \alpha αθα ( θ ∈ ( 0 , 1 ) \theta \in (0,1) θ(0,1))
end
x ← x + α Δ x \mathbf{x} \leftarrow \mathbf{x} + \alpha \Delta \mathbf{x} xx+αΔx

(2) Question: How about constrained minimization?

min ⁡ x f ( x ) s.t. c ( x ) ≤ 0 , d ( x ) = 0 \begin{aligned} &\min_{\mathbf{x}} \quad f(\mathbf{x}) \\ &\text{s.t.} \quad \mathbf{c}(\mathbf{x}) \leq \mathbf{0}, \mathbf{d}(\mathbf{x}) = \mathbf{0} \end{aligned} xminf(x)s.t.c(x)0,d(x)=0

⇒ min ⁡ x L ( x , λ , μ ) = f ( x ) + λ ⊤ c ( x ) + μ ⊤ d ( x ) \Rightarrow \min_{\mathbf{x}} \quad \mathcal{L}(\mathbf{x}, \lambda, \mu) = f(\mathbf{x}) + \lambda^\top \mathbf{c}(\mathbf{x}) + \mu^\top \mathbf{d}(\mathbf{x}) xminL(x,λ,μ)=f(x)+λc(x)+μd(x)

Pick a merit function:
p ( x , λ , μ ) = 1 2 ∥ r ( x , λ , μ ) ∥ 2 2 , r ( x , λ , μ ) = [ ∇ x L ( x , λ , μ ) min ⁡ ( 0 , c ( x ) ) d ( x ) ] p(\mathbf{x}, \lambda, \mu) = \frac{1}{2}\|\mathbf{r}(\mathbf{x}, \lambda, \mu)\|_2^2, \quad \mathbf{r}(\mathbf{x}, \lambda, \mu) = \begin{bmatrix} \nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \lambda, \mu) \\ \min(0, \mathbf{c}(\mathbf{x})) \\ \mathbf{d}(\mathbf{x}) \end{bmatrix} p(x,λ,μ)=21r(x,λ,μ)22,r(x,λ,μ)= xL(x,λ,μ)min(0,c(x))d(x) ( r ( x , λ , μ ) \mathbf{r}(\mathbf{x}, \lambda, \mu) r(x,λ,μ) is called the KKT residual)

or
p ( x , λ , μ ) = f ( x ) + ρ ∣ ∣ [ min ⁡ ( 0 , c ( x ) ) d ( x ) ] ∣ ∣ 1 p(\mathbf{x}, \lambda, \mu) = f(\mathbf{x}) + \rho \left|\left| \begin{bmatrix} \min(0, \mathbf{c}(\mathbf{x})) \\ \mathbf{d}(\mathbf{x}) \end{bmatrix} \right|\right|_1 p(x,λ,μ)=f(x)+ρ [min(0,c(x))d(x)] 1

or the Augmented Lagrangian:
p ( x , λ , μ ) = L ρ ( x , λ , μ ) p(\mathbf{x}, \lambda, \mu) = \mathcal{L}_{\rho}(\mathbf{x}, \lambda, \mu) p(x,λ,μ)=Lρ(x,λ,μ)

(3) Example

See the jupyter notebook.

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值