Review:
- Constrained optimization
- Augmented Lagrangian
- Merit functions/line search
Lecture 6: Deterministic Optimal Control Introduction
Overview
- Control history
- Deterministic optimal control
- Pontryagin’s minimum principle
- Linear-quadratic regulator (LQR)
1. Control History
(1) Brachistochrone Problem
Often considered the first trajectory optimization problem, posed by Johann Bernoulli in 1696.
$$\min_{y(x)} T = \int_{P_0}^{P_f} \frac{1}{v}\,\mathrm{d}s = \int_{x_0}^{x_f} \frac{\sqrt{1+(\mathrm{d}y/\mathrm{d}x)^2}}{\sqrt{2gy}}\,\mathrm{d}x$$
(2) Calculus of Variations
$$\min_{x(t)} J\left(x(t)\right) = \int_{t_0}^{t_f} L\left(t, x(t), \dot{x}(t)\right)\mathrm{d}t$$
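Setting the first variation of $J$ to zero yields the Euler-Lagrange equation, the classic first-order necessary condition that an extremal $x(t)$ must satisfy:

$$\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}} = 0$$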
Lots of Applications:
- Statics (Catenary)
- Finite-Element Methods (Weak formulation)
- Optics (Fermat's Principle)
- General Relativity (Einstein-Hilbert Action)
- Classical Mechanics (Hamilton's Principle)
- Quantum Mechanics (Feynman Path Integral)
(3) Feedback Systems
- “Classical Control” (1910-1960)
- “Modern Control” (Post-1960)
- Adaptive Control and RL (1950s-Present)
- Robust Control (1980s-Present)
- Model Predictive Control (1970s-Present)
- Robotic Manipulators (1970s-1980s)
- Legged Robots (1980s-Present)
(4) Challenges for the Future?
- General theory for dealing with contact
- Bridging the gap between model-based control and RL
- Making RL more data efficient by incorporating prior knowledge
- Safety guarantees for uncertain nonlinear systems
- Dealing with other (possibly adversarial) agents
2. Deterministic Optimal Control
(1) Continuous-time formulation
$$\begin{aligned} &\min_{\mathbf{x}(t), \mathbf{u}(t)} J\left(\mathbf{x}(t), \mathbf{u}(t)\right) = \int_{t_0}^{t_f} \ell \left(\mathbf{x}(t), \mathbf{u}(t)\right)\mathrm{d}t + \ell_F\left(\mathbf{x}(t_f)\right) \\ &\text{s.t.} \quad \dot{\mathbf{x}}(t) = f\left(\mathbf{x}(t), \mathbf{u}(t)\right), \quad \text{(possibly other constraints)} \end{aligned}$$
- $\mathbf{x}(t)$: state trajectory
- $\mathbf{u}(t)$: control trajectory
- $J\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: cost function
- $\ell\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: stage cost
- $\ell_F\left(\mathbf{x}(t_f)\right)$: terminal cost
- $f\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: dynamics constraint
This is an infinite-dimensional optimization problem in the following sense:
$$\mathbf{u}(t) = \lim_{N\rightarrow\infty}\mathbf{u}_{1:N}$$
- Solutions are open-loop trajectories.
- Only a handful of problems admit analytic solutions.
- We will focus on the discrete-time setting which leads to tractable algorithms.
(2) Discrete-time formulation
$$\begin{aligned} &\min_{\mathbf{x}_{1:N},\, \mathbf{u}_{1:N-1}} J\left(\mathbf{x}_{1:N}, \mathbf{u}_{1:N-1}\right) = \sum_{k=1}^{N-1} \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \ell_F\left(\mathbf{x}_N\right) \\ &\text{s.t.} \quad \mathbf{x}_{k+1} = f\left(\mathbf{x}_k, \mathbf{u}_k\right)\\ &\quad\quad\;\; \mathbf{u}_{\min} \leq \mathbf{u}_k \leq \mathbf{u}_{\max}\\ &\quad\quad\;\; \mathbf{c}(\mathbf{x}_k) \leq 0 \end{aligned}$$
- $\mathbf{u}_{\min}\leq\mathbf{u}_k\leq\mathbf{u}_{\max}$: torque limits
- $\mathbf{c}(\mathbf{x}_k)\leq 0$: obstacle/safety constraints
- This is a finite-dimensional optimization problem.
- The samples $\mathbf{x}_k$, $\mathbf{u}_k$ are called knot points.
- Continuous-time -> discrete-time using integration (e.g. Runge-Kutta)
- Discrete-time -> continuous-time using interpolation (e.g. cubic splines)
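As a concrete sketch of the continuous-to-discrete conversion, here is a standard fourth-order Runge-Kutta (RK4) step with a zero-order hold on the control; the damped-pendulum dynamics below are an illustrative assumption, not an example from the lecture:

```python
import numpy as np

def rk4_step(f, x, u, h):
    """One RK4 step: returns x_{k+1} = f_d(x_k, u_k) given continuous
    dynamics x_dot = f(x, u), step size h, and u held constant over the step."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Example: damped pendulum, state x = [theta, theta_dot], torque input u
def pendulum(x, u):
    g, l, b = 9.81, 1.0, 0.1   # gravity, length, damping (assumed values)
    return np.array([x[1], u - b * x[1] - (g / l) * np.sin(x[0])])

x = np.array([0.1, 0.0])
for k in range(100):           # simulate 1 s with h = 0.01
    x = rk4_step(pendulum, x, 0.0, 0.01)
```

The resulting discrete dynamics $\mathbf{x}_{k+1} = f_d(\mathbf{x}_k, \mathbf{u}_k)$ are exactly what the discrete-time formulation above constrains at each knot point.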
3. Pontryagin’s Minimum Principle
- It is also called “maximum principle” if you maximize a reward.
- First-order necessary conditions for deterministic optimal control problems.
- In discrete time, they are just a special case of the KKT conditions.
Given
$$\begin{aligned} &\min_{\mathbf{x}_{1:N},\, \mathbf{u}_{1:N-1}} \sum_{k=1}^{N-1} \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \ell_F\left(\mathbf{x}_N\right) \\ &\text{s.t.} \quad \mathbf{x}_{k+1} = f\left(\mathbf{x}_k, \mathbf{u}_k\right) \end{aligned}$$
We can form the Lagrangian
$$\mathcal{L} = \sum_{k=1}^{N-1} \left[ \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \lambda^\top_{k+1}\left(f\left(\mathbf{x}_k, \mathbf{u}_k\right) - \mathbf{x}_{k+1}\right)\right] + \ell_F\left(\mathbf{x}_N\right)$$
This result is usually stated in terms of the Hamiltonian:
$$\mathcal{H}(\mathbf{x}, \mathbf{u}, \lambda) = \ell(\mathbf{x}, \mathbf{u}) + \lambda^\top f(\mathbf{x}, \mathbf{u})$$
Substituting this into $\mathcal{L}$, we have
$$\mathcal{L} = \mathcal{H}(\mathbf{x}_1, \mathbf{u}_1, \lambda_2) + \left[\sum_{k=2}^{N-1}\mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) -\lambda_k^\top \mathbf{x}_{k}\right] + \ell_F(\mathbf{x}_N)-\lambda_N^\top \mathbf{x}_N$$
Taking derivatives with respect to $\mathbf{x}$ and $\lambda$, we have
$$\frac{\partial \mathcal{L}}{\partial \lambda_k} = \frac{\partial \mathcal{H}}{\partial \lambda_k} - \mathbf{x}_k = f(\mathbf{x}_{k-1}, \mathbf{u}_{k-1}) - \mathbf{x}_k = 0$$

(namely, the dynamics constraints)
$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}_k} = \frac{\partial \mathcal{H}}{\partial \mathbf{x}_k} - \lambda_{k}^\top = \frac{\partial \ell}{\partial \mathbf{x}_k} + \lambda_{k+1}^\top \frac{\partial f}{\partial \mathbf{x}_k} - \lambda_{k}^\top = 0 \quad (\text{for } k=1,\cdots,N-1)$$
For the $N$-th state, we have

$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}_N} = \frac{\partial \ell_F}{\partial \mathbf{x}_N} - \lambda_{N}^\top = 0$$
For $\mathbf{u}$, we write the minimization explicitly to handle torque limits:

$$\begin{aligned} &\mathbf{u}_k = \arg\min_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1})\\ &\text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U} \quad (\text{shorthand for "in the feasible set"}) \end{aligned}$$
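For one common special case this $\arg\min$ has a closed form: with a quadratic control cost $\frac{1}{2}\mathbf{u}^\top R \mathbf{u}$ (diagonal $R$) and control-affine dynamics $f = a(\mathbf{x}) + B\mathbf{u}$, the Hamiltonian separates per component of $\mathbf{u}$, so the box-constrained minimizer is the unconstrained one clipped to the limits. A minimal sketch; the function name and the quadratic-cost assumption are mine, not from the lecture:

```python
import numpy as np

def hamiltonian_argmin_box(lam, B, R, u_min, u_max):
    """argmin_u 0.5 u^T R u + lam^T (a(x) + B u)  s.t.  u_min <= u <= u_max.
    Valid when R is diagonal, so the problem separates componentwise:
    the answer is the unconstrained minimizer -R^{-1} B^T lam, clipped."""
    u_free = -np.linalg.solve(R, B.T @ lam)
    return np.clip(u_free, u_min, u_max)

# With lam = [3, -4], B = I, R = I the free minimizer is [-3, 4];
# torque limits of +/-1 clip it to [-1, 1].
u = hamiltonian_argmin_box(np.array([3.0, -4.0]), np.eye(2), np.eye(2), -1.0, 1.0)
```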
In summary:
$$\begin{aligned} &\mathbf{x}_{k+1} = \nabla_\lambda \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) = f(\mathbf{x}_{k}, \mathbf{u}_{k})\\ &\lambda_k = \nabla_{\mathbf{x}} \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) = \nabla_{\mathbf{x}} \ell(\mathbf{x}_k, \mathbf{u}_k) + \left(\frac{\partial f}{\partial \mathbf{x}_k}\right)^\top \lambda_{k+1}\\ &\mathbf{u}_k = \arg\min_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1}) \quad \text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U}\\ &\lambda_N = \frac{\partial \ell_F}{\partial \mathbf{x}_N} \end{aligned}$$
This is where the shooting method comes from.
The dynamics are integrated forward in time using the controls $\mathbf{u}_k$, and the adjoint variables $\lambda_k$ are integrated backward.
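To make the forward/backward structure concrete, here is a minimal first-order shooting sketch on an assumed discrete linear-quadratic problem (the matrices, weights, and step size are illustrative choices, not values from the lecture): roll the dynamics forward, propagate $\lambda_k$ backward to get $\nabla_{\mathbf{u}} J$, and do gradient descent on the control sequence.

```python
import numpy as np

# Assumed LQ problem: x_{k+1} = A x_k + B u_k,
# l(x,u) = 0.5 x^T Q x + 0.5 u^T R u,  l_F(x) = 0.5 x^T Qf x
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
N = 50
x1 = np.array([1.0, 0.0])

def rollout(u):
    """Forward pass: integrate the dynamics with the control sequence u."""
    xs = [x1]
    for k in range(N - 1):
        xs.append(A @ xs[-1] + B @ u[k])
    return xs

def cost(u):
    xs = rollout(u)
    J = 0.5 * xs[-1] @ Qf @ xs[-1]
    for k in range(N - 1):
        J += 0.5 * xs[k] @ Q @ xs[k] + 0.5 * u[k] @ R @ u[k]
    return float(J)

def gradient(u):
    """Backward pass: lambda_N = Qf x_N and lambda_k = Q x_k + A^T lambda_{k+1};
    along the way, dJ/du_k = R u_k + B^T lambda_{k+1}."""
    xs = rollout(u)
    lam = Qf @ xs[-1]
    g = np.zeros_like(u)
    for k in reversed(range(N - 1)):
        g[k] = R @ u[k] + B.T @ lam
        lam = Q @ xs[k] + A.T @ lam
    return g

# Gradient descent on the open-loop control sequence (a conservative step size)
u = np.zeros((N - 1, 1))
for _ in range(300):
    u -= 0.002 * gradient(u)
```

Note that the backward recursion for `lam` is exactly the $\lambda_k$ equation in the summary above, so the costate pass delivers the full gradient of $J$ for the price of one forward and one backward sweep.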
Continuous-time version:
$$\begin{aligned} &\dot{\mathbf{x}}(t) = \nabla_\lambda \mathcal{H}(\mathbf{x}(t), \mathbf{u}(t), \lambda(t)) = f(\mathbf{x}(t), \mathbf{u}(t))\\ &-\dot{\lambda}(t) = \nabla_{\mathbf{x}} \mathcal{H}(\mathbf{x}(t), \mathbf{u}(t), \lambda(t)) = \nabla_{\mathbf{x}} \ell(\mathbf{x}(t), \mathbf{u}(t)) + \left(\frac{\partial f}{\partial \mathbf{x}}\right)^\top \lambda(t)\\ &\mathbf{u}(t) = \arg\min_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}(t), \tilde{\mathbf{u}}, \lambda(t)) \quad \text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U}\\ &\lambda(t_f) = \frac{\partial \ell_F}{\partial \mathbf{x}(t_f)} \end{aligned}$$
Some Notes:
- Historically, many algorithms were based on integrating the continuous ODEs forward/backward to do gradient descent on $\mathbf{u}(t)$.
- These methods are called indirect methods or shooting methods.
- In continuous time, $\lambda(t)$ is called the co-state trajectory.
- These methods have largely fallen out of favor as computers have improved.