【Optimal Control (CMU 16-745)】Lecture 6 Deterministic Optimal Control Introduction

Review:

  • Constrained optimization
  • Augmented Lagrangian
  • Merit functions/line search


Overview

  • Control history
  • Deterministic optimal control
  • Pontryagin’s minimum principle
  • Linear-quadratic regulator (LQR)

1. Control History

(1) Brachistochrone Problem

Often regarded as the first trajectory optimization problem (posed by Johann Bernoulli in 1696): find the curve along which a bead sliding under gravity travels between two points in minimum time.
$$\min_{y(x)} T = \int_{P_0}^{P_f} \frac{1}{v}\,\mathrm{d}s = \int_{x_0}^{x_f} \frac{\sqrt{1+(\mathrm{d}y/\mathrm{d}x)^2}}{\sqrt{2gy}}\,\mathrm{d}x$$
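
To make the cost functional concrete, here is a minimal numerical sketch (an illustration, not from the lecture) that evaluates the travel time $T$ for two candidate curves between assumed endpoints $P_0 = (0, 0)$ and $P_f = (1, 1)$, with $y$ measured downward. A curve with a steeper initial drop beats the straight line, consistent with the known cycloid solution:

```python
import numpy as np

# Evaluate T = \int sqrt(1 + y'^2) / sqrt(2 g y) dx for a candidate curve
# y(x) using the midpoint rule (midpoints avoid the integrable singularity
# at y = 0). Endpoints and candidate curves are illustrative assumptions.
g = 9.81

def travel_time(y, dydx, x0=0.0, xf=1.0, n=200_000):
    x = np.linspace(x0, xf, n + 1)
    xm = 0.5 * (x[:-1] + x[1:])      # cell midpoints
    dx = np.diff(x)
    integrand = np.sqrt(1.0 + dydx(xm) ** 2) / np.sqrt(2.0 * g * y(xm))
    return float(np.sum(integrand * dx))

# Straight line y = x versus a curve that drops faster initially, y = sqrt(x).
T_line = travel_time(lambda x: x, lambda x: np.ones_like(x))
T_sqrt = travel_time(lambda x: np.sqrt(x), lambda x: 0.5 / np.sqrt(x))
print(f"straight line: {T_line:.4f} s, y = sqrt(x): {T_sqrt:.4f} s")
```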

(2) Calculus of Variations

$$\min_{x(t)} J\left(x(t)\right) = \int_{t_0}^{t_f} L\left(t, x(t), \dot{x}(t)\right)\mathrm{d}t$$
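
Setting the first variation of $J$ to zero yields the Euler-Lagrange equation, the necessary condition every extremal trajectory must satisfy:
$$\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}} = 0$$
Applying it to the brachistochrone integrand above is what produces the cycloid solution.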

Lots of Applications:

  • Statics (Catenary)
  • Finite-Element Methods (Weak Formulation)
  • Optics (Fermat’s Principle)
  • General Relativity (Einstein-Hilbert Action)
  • Classical Mechanics (Hamilton’s Principle)
  • Quantum Mechanics (Feynman Path Integral)

(3) Feedback Systems


  • “Classical Control” (1910-1960)
  • “Modern Control” (Post-1960)
  • Adaptive Control and RL (1950s-Present)
  • Robust Control (1980s-Present)
  • Model Predictive Control (1970s-Present)
  • Robotic Manipulators (1970s-1980s)
  • Legged Robots (1980s-Present)

(4) Challenges for the Future?

  • General theory for dealing with contact
  • Bridging the gap between model-based control and RL
  • Making RL more data efficient by incorporating prior knowledge
  • Safety guarantees for uncertain nonlinear systems
  • Dealing with other (possibly adversarial) agents

2. Deterministic Optimal Control

(1) Continuous-time formulation

$$\begin{aligned} &\min_{\mathbf{x}(t),\, \mathbf{u}(t)} J\left(\mathbf{x}(t), \mathbf{u}(t)\right) = \int_{t_0}^{t_f} \ell \left(\mathbf{x}(t), \mathbf{u}(t)\right)\mathrm{d}t + \ell_F\left(\mathbf{x}(t_f)\right) \\ &\text{s.t.} \quad \dot{\mathbf{x}}(t) = f\left(\mathbf{x}(t), \mathbf{u}(t)\right), \quad \text{(possibly other constraints)} \end{aligned}$$

  • $\mathbf{x}(t)$: state trajectory
  • $\mathbf{u}(t)$: control trajectory
  • $J\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: cost function
  • $\ell\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: stage cost
  • $\ell_F\left(\mathbf{x}(t_f)\right)$: terminal cost
  • $f\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: dynamics

This is an infinite-dimensional optimization problem in the following sense:
$$\mathbf{u}(t) = \lim_{N\rightarrow\infty}\mathbf{u}_{1:N}$$
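
In practice one goes the other way and represents the control with finitely many parameters; one common choice (an illustrative example, not the only option) is a zero-order hold, which keeps each knot value constant over its sampling interval:
$$\mathbf{u}(t) = \mathbf{u}_k \quad \text{for } t \in [t_k, t_{k+1})$$
Refining the grid as $N \rightarrow \infty$ recovers the continuous control, which is the sense of the limit above.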

  • Solutions are open-loop trajectories.
  • Only a handful of problems admit analytic solutions.
  • We will focus on the discrete-time setting, which leads to tractable algorithms.

(2) Discrete-time formulation

$$\begin{aligned} &\min_{\mathbf{x}_{1:N},\, \mathbf{u}_{1:N-1}} J\left(\mathbf{x}_{1:N}, \mathbf{u}_{1:N-1}\right) = \sum_{k=1}^{N-1} \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \ell_F\left(\mathbf{x}_N\right) \\ &\text{s.t.} \quad \mathbf{x}_{k+1} = f\left(\mathbf{x}_k, \mathbf{u}_k\right)\\ &\quad\quad\ \ \mathbf{u}_{\min} \leq \mathbf{u}_k \leq \mathbf{u}_{\max}\\ &\quad\quad\ \ \mathbf{c}(\mathbf{x}_k) \leq 0 \end{aligned}$$

  • $\mathbf{u}_{\min}\leq\mathbf{u}_k\leq\mathbf{u}_{\max}$: torque limits
  • $\mathbf{c}(\mathbf{x}_k)\leq 0$: obstacle/safety constraints
  • This is a finite-dimensional optimization problem.
  • The samples $\mathbf{x}_k$, $\mathbf{u}_k$ are called knot points.
  • Continuous -> discrete time via numerical integration (e.g. Runge-Kutta); see the sketch below.
  • Discrete -> continuous time via interpolation (e.g. cubic splines).
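
For the first direction, here is a minimal sketch (an assumed pendulum example, not code from the lecture) of turning continuous-time dynamics $\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u})$ into a discrete-time update $\mathbf{x}_{k+1} = f_d(\mathbf{x}_k, \mathbf{u}_k)$ with one classical Runge-Kutta (RK4) step:

```python
import numpy as np

# Discretize continuous dynamics with one RK4 step, holding the control
# constant over the step (zero-order hold). The pendulum is an assumed example.

def pendulum(x, u, g=9.81, l=1.0):
    """Continuous dynamics of a simple pendulum, x = [theta, theta_dot]."""
    return np.array([x[1], -(g / l) * np.sin(x[0]) + u])

def rk4_step(f, x, u, h):
    """One classical Runge-Kutta step of size h: x_{k+1} = f_d(x_k, u_k)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x1 = np.array([0.1, 0.0])                  # knot point x_1
x2 = rk4_step(pendulum, x1, 0.0, 0.01)     # knot point x_2 under u_1 = 0
print(x2)
```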

3. Pontryagin’s Minimum Principle

  • It is also called the “maximum principle” when a reward is maximized instead.
  • It gives first-order necessary conditions for deterministic optimal control problems.
  • In discrete time, it is just a special case of the KKT conditions.

Given
$$\begin{aligned} &\min_{\mathbf{x}_{1:N},\, \mathbf{u}_{1:N-1}} \sum_{k=1}^{N-1} \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \ell_F\left(\mathbf{x}_N\right) \\ &\text{s.t.} \quad \mathbf{x}_{k+1} = f\left(\mathbf{x}_k, \mathbf{u}_k\right) \end{aligned}$$

We can form the Lagrangian:
$$\mathcal{L} = \sum_{k=1}^{N-1} \left[ \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \lambda^\top_{k+1}\left(f\left(\mathbf{x}_k, \mathbf{u}_k\right) - \mathbf{x}_{k+1}\right)\right] + \ell_F\left(\mathbf{x}_N\right)$$

This result is usually stated in terms of the Hamiltonian:
$$\mathcal{H}(\mathbf{x}, \mathbf{u}, \lambda) = \ell(\mathbf{x}, \mathbf{u}) + \lambda^\top f(\mathbf{x}, \mathbf{u})$$

Plugging it into $\mathcal{L}$, we have
$$\mathcal{L} = \mathcal{H}(\mathbf{x}_1, \mathbf{u}_1, \lambda_2) + \sum_{k=2}^{N-1}\left[\mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) - \lambda_k^\top \mathbf{x}_{k}\right] + \ell_F(\mathbf{x}_N) - \lambda_N^\top \mathbf{x}_N$$

Taking derivatives with respect to $\mathbf{x}$ and $\lambda$, we have
$$\frac{\partial \mathcal{L}}{\partial \lambda_k} = \frac{\partial \mathcal{H}}{\partial \lambda_k} - \mathbf{x}_k = f(\mathbf{x}_{k-1}, \mathbf{u}_{k-1}) - \mathbf{x}_k = 0$$
(these are just the dynamics constraints)

$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}_k} = \frac{\partial \mathcal{H}}{\partial \mathbf{x}_k} - \lambda_{k}^\top = \frac{\partial \ell}{\partial \mathbf{x}_k} + \lambda_{k+1}^\top \frac{\partial f}{\partial \mathbf{x}_k} - \lambda_{k}^\top = 0$$
(for $k = 2, \cdots, N-1$; there is no condition for $k = 1$ because $\mathbf{x}_1$ is the given initial state)

For the $N$-th state, we have
$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}_N} = \frac{\partial \ell_F}{\partial \mathbf{x}_N} - \lambda_{N}^\top = 0$$

For $\mathbf{u}$, we write the min explicitly to handle torque limits:
$$\begin{aligned} &\mathbf{u}_k = \argmin_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1})\\ &\text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U} \quad (\text{shorthand for “in the feasible set”}) \end{aligned}$$
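
As a concrete special case (an assumed setup, not from the lecture): with quadratic control cost $\ell = \frac{1}{2}\mathbf{u}^\top R\,\mathbf{u} + \cdots$, diagonal $R \succ 0$, control-affine dynamics $f(\mathbf{x}, \mathbf{u}) = a(\mathbf{x}) + B\mathbf{u}$, and box constraints $\mathcal{U} = \{\mathbf{u} : \mathbf{u}_{\min} \leq \mathbf{u} \leq \mathbf{u}_{\max}\}$, the Hamiltonian separates across the components of $\mathbf{u}$, so the constrained minimizer is the unconstrained one clipped to the box:
$$\mathbf{u}_k = \mathrm{clamp}\left(-R^{-1}B^\top \lambda_{k+1},\ \mathbf{u}_{\min},\ \mathbf{u}_{\max}\right)$$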

In summary:
$$\begin{aligned} &\mathbf{x}_{k+1} = \nabla_\lambda \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) = f(\mathbf{x}_{k}, \mathbf{u}_{k})\\ &\lambda_k = \nabla_{\mathbf{x}} \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) = \nabla_{\mathbf{x}} \ell(\mathbf{x}_k, \mathbf{u}_k) + \left(\frac{\partial f}{\partial \mathbf{x}_k}\right)^\top \lambda_{k+1}\\ &\mathbf{u}_k = \argmin_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1}) \quad \text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U}\\ &\lambda_N = \left(\frac{\partial \ell_F}{\partial \mathbf{x}_N}\right)^\top \end{aligned}$$

This is where the shooting method comes from: the dynamics are integrated forward in time using the controls $\mathbf{u}_k$, and the adjoint variables $\lambda_k$ are integrated backward.
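
Here is a minimal single-shooting sketch (an illustrative example, not the lecture's code): plain gradient descent on $\mathbf{u}_{1:N-1}$ for an unconstrained double integrator with quadratic costs $\ell = \frac{1}{2}\mathbf{x}^\top Q \mathbf{x} + \frac{1}{2}\mathbf{u}^\top R \mathbf{u}$, where $\nabla_{\mathbf{u}_k} J = \nabla_{\mathbf{u}} \mathcal{H} = R\mathbf{u}_k + B^\top \lambda_{k+1}$ follows from the forward/backward structure above. All weights and the step size are assumptions:

```python
import numpy as np

# Single shooting on a double integrator (assumed example): roll the dynamics
# forward, integrate the costate backward, and descend the gradient of J
# with respect to the controls. Stage cost: 0.5 x'Qx + 0.5 u'Ru.
dt, N = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])      # x_{k+1} = A x_k + B u_k
B = np.array([[0.5 * dt**2], [dt]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
x1 = np.array([1.0, 0.0])                  # given initial state

def rollout(U):
    """Forward pass: integrate the dynamics from x1 under the controls U."""
    X = [x1]
    for k in range(N - 1):
        X.append(A @ X[k] + B @ U[k])
    return X

def gradient(X, U):
    """Backward pass: lam_N = Qf x_N, lam_k = Q x_k + A^T lam_{k+1},
    and dJ/du_k = R u_k + B^T lam_{k+1} (grad_u of the Hamiltonian)."""
    lam = Qf @ X[-1]
    g = [None] * (N - 1)
    for k in reversed(range(N - 1)):
        g[k] = R @ U[k] + B.T @ lam
        lam = Q @ X[k] + A.T @ lam
    return g

U = [np.zeros(1) for _ in range(N - 1)]
for _ in range(500):                       # plain gradient descent
    g = gradient(rollout(U), U)
    U = [u - 0.2 * gk for u, gk in zip(U, g)]
print("final state:", rollout(U)[-1])      # driven toward the origin
```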

Continuous-time version:
$$\begin{aligned} &\dot{\mathbf{x}}(t) = \nabla_\lambda \mathcal{H}(\mathbf{x}(t), \mathbf{u}(t), \lambda(t)) = f(\mathbf{x}(t), \mathbf{u}(t))\\ &-\dot{\lambda}(t) = \nabla_{\mathbf{x}} \mathcal{H}(\mathbf{x}(t), \mathbf{u}(t), \lambda(t)) = \nabla_{\mathbf{x}} \ell(\mathbf{x}(t), \mathbf{u}(t)) + \left(\frac{\partial f}{\partial \mathbf{x}}\right)^\top \lambda(t)\\ &\mathbf{u}(t) = \argmin_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}(t), \tilde{\mathbf{u}}, \lambda(t)) \quad \text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U}\\ &\lambda(t_f) = \left(\frac{\partial \ell_F}{\partial \mathbf{x}(t_f)}\right)^\top \end{aligned}$$

Some Notes:

  • Historically, many algorithms were based on integrating the continuous ODEs forward/backward to do gradient descent on $\mathbf{u}(t)$.
  • These methods are called indirect methods or shooting methods.
  • In continuous time, $\lambda(t)$ is called the co-state trajectory.
  • These methods have largely fallen out of favor as computers have improved.