【Optimal Control (CMU 16-745)】Lecture 6 Deterministic Optimal Control Introduction

Review:

  • Constrained optimization
  • Augmented Lagrangian
  • Merit functions/line search


Overview

  • Control history
  • Deterministic optimal control
  • Pontryagin’s minimum principle
  • Linear-quadratic regulator (LQR)

1. Control History

(1) Brachistochrone Problem

Often regarded as the first trajectory optimization problem (posed by Johann Bernoulli in 1696): find the curve along which a bead sliding under gravity travels between two points in minimum time.
$$\min_{y(x)} T = \int_{P_0}^{P_f} \frac{1}{v}\,\mathrm{d}s = \int_{x_0}^{x_f} \frac{\sqrt{1+(\mathrm{d}y/\mathrm{d}x)^2}}{\sqrt{2gy}}\,\mathrm{d}x$$
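
To make the cost functional concrete, here is a minimal numerical sketch (an illustration, not from the lecture) that evaluates the travel time $T$ for two candidate curves between assumed endpoints $P_0 = (0, 0)$ and $P_f = (1, 1)$, with $y$ measured downward. A curve with a steeper initial drop beats the straight line, consistent with the known cycloid solution:

```python
import numpy as np

# Evaluate T = \int sqrt(1 + y'^2) / sqrt(2 g y) dx for a candidate curve
# y(x) using the midpoint rule (midpoints avoid the integrable singularity
# at y = 0). Endpoints and candidate curves are illustrative assumptions.
g = 9.81

def travel_time(y, dydx, x0=0.0, xf=1.0, n=200_000):
    x = np.linspace(x0, xf, n + 1)
    xm = 0.5 * (x[:-1] + x[1:])      # cell midpoints
    dx = np.diff(x)
    integrand = np.sqrt(1.0 + dydx(xm) ** 2) / np.sqrt(2.0 * g * y(xm))
    return float(np.sum(integrand * dx))

# Straight line y = x versus a curve that drops faster initially, y = sqrt(x).
T_line = travel_time(lambda x: x, lambda x: np.ones_like(x))
T_sqrt = travel_time(lambda x: np.sqrt(x), lambda x: 0.5 / np.sqrt(x))
print(f"straight line: {T_line:.4f} s, y = sqrt(x): {T_sqrt:.4f} s")
```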

(2) Calculus of Variations

$$\min_{x(t)} J\left(x(t)\right) = \int_{t_0}^{t_f} L\left(t, x(t), \dot{x}(t)\right)\mathrm{d}t$$
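
Setting the first variation of $J$ to zero yields the Euler-Lagrange equation, the necessary condition every extremal trajectory must satisfy:
$$\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}} = 0$$
Applying it to the brachistochrone integrand above is what produces the cycloid solution.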

Lots of Applications:

  • Statics (Catenary)
  • Finite-Element Methods (Weak Formulation)
  • Optics (Fermat’s Principle)
  • General Relativity (Einstein-Hilbert Action)
  • Classical Mechanics (Hamilton’s Principle)
  • Quantum Mechanics (Feynman Path Integral)

(3) Feedback Systems


  • “Classical Control” (1910-1960)
  • “Modern Control” (Post-1960)
  • Adaptive Control and RL (1950s-Present)
  • Robust Control (1980s-Present)
  • Model Predictive Control (1970s-Present)
  • Robotic Manipulators (1970s-1980s)
  • Legged Robots (1980s-Present)

(4) Challenges for the Future?

  • General theory for dealing with contact
  • Bridging the gap between model-based control and RL
  • Making RL more data efficient by incorporating prior knowledge
  • Safety guarantees for uncertain nonlinear systems
  • Dealing with other (possibly adversarial) agents

2. Deterministic Optimal Control

(1) Continuous-time formulation

$$\begin{aligned} &\min_{\mathbf{x}(t),\, \mathbf{u}(t)} J\left(\mathbf{x}(t), \mathbf{u}(t)\right) = \int_{t_0}^{t_f} \ell \left(\mathbf{x}(t), \mathbf{u}(t)\right)\mathrm{d}t + \ell_F\left(\mathbf{x}(t_f)\right) \\ &\text{s.t.} \quad \dot{\mathbf{x}}(t) = f\left(\mathbf{x}(t), \mathbf{u}(t)\right), \quad \text{(possibly other constraints)} \end{aligned}$$

  • $\mathbf{x}(t)$: state trajectory
  • $\mathbf{u}(t)$: control trajectory
  • $J\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: cost function
  • $\ell\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: stage cost
  • $\ell_F\left(\mathbf{x}(t_f)\right)$: terminal cost
  • $f\left(\mathbf{x}(t), \mathbf{u}(t)\right)$: dynamics

This is an infinite-dimensional optimization problem in the following sense:
$$\mathbf{u}(t) = \lim_{N\rightarrow\infty}\mathbf{u}_{1:N}$$
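
In practice one goes the other way and represents the control with finitely many parameters; one common choice (an illustrative example, not the only option) is a zero-order hold, which keeps each knot value constant over its sampling interval:
$$\mathbf{u}(t) = \mathbf{u}_k \quad \text{for } t \in [t_k, t_{k+1})$$
Refining the grid as $N \rightarrow \infty$ recovers the continuous control, which is the sense of the limit above.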

  • Solutions are open-loop trajectories.
  • Only a handful of problems admit analytic solutions.
  • We will focus on the discrete-time setting, which leads to tractable algorithms.

(2) Discrete-time formulation

$$\begin{aligned} &\min_{\mathbf{x}_{1:N},\, \mathbf{u}_{1:N-1}} J\left(\mathbf{x}_{1:N}, \mathbf{u}_{1:N-1}\right) = \sum_{k=1}^{N-1} \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \ell_F\left(\mathbf{x}_N\right) \\ &\text{s.t.} \quad \mathbf{x}_{k+1} = f\left(\mathbf{x}_k, \mathbf{u}_k\right)\\ &\quad\quad\ \ \mathbf{u}_{\min} \leq \mathbf{u}_k \leq \mathbf{u}_{\max}\\ &\quad\quad\ \ \mathbf{c}(\mathbf{x}_k) \leq 0 \end{aligned}$$

  • $\mathbf{u}_{\min}\leq\mathbf{u}_k\leq\mathbf{u}_{\max}$: torque limits
  • $\mathbf{c}(\mathbf{x}_k)\leq 0$: obstacle/safety constraints
  • This is a finite-dimensional optimization problem.
  • The samples $\mathbf{x}_k$, $\mathbf{u}_k$ are called knot points.
  • Continuous -> discrete time via numerical integration (e.g. Runge-Kutta); see the sketch below.
  • Discrete -> continuous time via interpolation (e.g. cubic splines).
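
For the first direction, here is a minimal sketch (an assumed pendulum example, not code from the lecture) of turning continuous-time dynamics $\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u})$ into a discrete-time update $\mathbf{x}_{k+1} = f_d(\mathbf{x}_k, \mathbf{u}_k)$ with one classical Runge-Kutta (RK4) step:

```python
import numpy as np

# Discretize continuous dynamics with one RK4 step, holding the control
# constant over the step (zero-order hold). The pendulum is an assumed example.

def pendulum(x, u, g=9.81, l=1.0):
    """Continuous dynamics of a simple pendulum, x = [theta, theta_dot]."""
    return np.array([x[1], -(g / l) * np.sin(x[0]) + u])

def rk4_step(f, x, u, h):
    """One classical Runge-Kutta step of size h: x_{k+1} = f_d(x_k, u_k)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x1 = np.array([0.1, 0.0])                  # knot point x_1
x2 = rk4_step(pendulum, x1, 0.0, 0.01)     # knot point x_2 under u_1 = 0
print(x2)
```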

3. Pontryagin’s Minimum Principle

  • It is also called the “maximum principle” when a reward is maximized instead.
  • It gives first-order necessary conditions for deterministic optimal control problems.
  • In discrete time, it is just a special case of the KKT conditions.

Given
$$\begin{aligned} &\min_{\mathbf{x}_{1:N},\, \mathbf{u}_{1:N-1}} \sum_{k=1}^{N-1} \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \ell_F\left(\mathbf{x}_N\right) \\ &\text{s.t.} \quad \mathbf{x}_{k+1} = f\left(\mathbf{x}_k, \mathbf{u}_k\right) \end{aligned}$$

We can form the Lagrangian:
$$\mathcal{L} = \sum_{k=1}^{N-1} \left[ \ell \left(\mathbf{x}_k, \mathbf{u}_k\right) + \lambda^\top_{k+1}\left(f\left(\mathbf{x}_k, \mathbf{u}_k\right) - \mathbf{x}_{k+1}\right)\right] + \ell_F\left(\mathbf{x}_N\right)$$

This result is usually stated in terms of the Hamiltonian:
$$\mathcal{H}(\mathbf{x}, \mathbf{u}, \lambda) = \ell(\mathbf{x}, \mathbf{u}) + \lambda^\top f(\mathbf{x}, \mathbf{u})$$

Plugging it into $\mathcal{L}$, we have
$$\mathcal{L} = \mathcal{H}(\mathbf{x}_1, \mathbf{u}_1, \lambda_2) + \sum_{k=2}^{N-1}\left[\mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) - \lambda_k^\top \mathbf{x}_{k}\right] + \ell_F(\mathbf{x}_N) - \lambda_N^\top \mathbf{x}_N$$

Taking derivatives with respect to $\mathbf{x}$ and $\lambda$, we have
$$\frac{\partial \mathcal{L}}{\partial \lambda_k} = \frac{\partial \mathcal{H}}{\partial \lambda_k} - \mathbf{x}_k = f(\mathbf{x}_{k-1}, \mathbf{u}_{k-1}) - \mathbf{x}_k = 0$$
(these are just the dynamics constraints)

$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}_k} = \frac{\partial \mathcal{H}}{\partial \mathbf{x}_k} - \lambda_{k}^\top = \frac{\partial \ell}{\partial \mathbf{x}_k} + \lambda_{k+1}^\top \frac{\partial f}{\partial \mathbf{x}_k} - \lambda_{k}^\top = 0$$
(for $k = 2, \cdots, N-1$; there is no condition for $k = 1$ because $\mathbf{x}_1$ is the given initial state)

For the $N$-th state, we have
$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}_N} = \frac{\partial \ell_F}{\partial \mathbf{x}_N} - \lambda_{N}^\top = 0$$

For $\mathbf{u}$, we write the min explicitly to handle torque limits:
$$\begin{aligned} &\mathbf{u}_k = \argmin_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1})\\ &\text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U} \quad (\text{shorthand for “in the feasible set”}) \end{aligned}$$
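
As a concrete special case (an assumed setup, not from the lecture): with quadratic control cost $\ell = \frac{1}{2}\mathbf{u}^\top R\,\mathbf{u} + \cdots$, diagonal $R \succ 0$, control-affine dynamics $f(\mathbf{x}, \mathbf{u}) = a(\mathbf{x}) + B\mathbf{u}$, and box constraints $\mathcal{U} = \{\mathbf{u} : \mathbf{u}_{\min} \leq \mathbf{u} \leq \mathbf{u}_{\max}\}$, the Hamiltonian separates across the components of $\mathbf{u}$, so the constrained minimizer is the unconstrained one clipped to the box:
$$\mathbf{u}_k = \mathrm{clamp}\left(-R^{-1}B^\top \lambda_{k+1},\ \mathbf{u}_{\min},\ \mathbf{u}_{\max}\right)$$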

In summary:
$$\begin{aligned} &\mathbf{x}_{k+1} = \nabla_\lambda \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) = f(\mathbf{x}_{k}, \mathbf{u}_{k})\\ &\lambda_k = \nabla_{\mathbf{x}} \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1}) = \nabla_{\mathbf{x}} \ell(\mathbf{x}_k, \mathbf{u}_k) + \left(\frac{\partial f}{\partial \mathbf{x}_k}\right)^\top \lambda_{k+1}\\ &\mathbf{u}_k = \argmin_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1}) \quad \text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U}\\ &\lambda_N = \left(\frac{\partial \ell_F}{\partial \mathbf{x}_N}\right)^\top \end{aligned}$$

This is where the shooting method comes from: the dynamics are integrated forward in time using the controls $\mathbf{u}_k$, and the adjoint variables $\lambda_k$ are integrated backward.
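
Here is a minimal single-shooting sketch (an illustrative example, not the lecture's code): plain gradient descent on $\mathbf{u}_{1:N-1}$ for an unconstrained double integrator with quadratic costs $\ell = \frac{1}{2}\mathbf{x}^\top Q \mathbf{x} + \frac{1}{2}\mathbf{u}^\top R \mathbf{u}$, where $\nabla_{\mathbf{u}_k} J = \nabla_{\mathbf{u}} \mathcal{H} = R\mathbf{u}_k + B^\top \lambda_{k+1}$ follows from the forward/backward structure above. All weights and the step size are assumptions:

```python
import numpy as np

# Single shooting on a double integrator (assumed example): roll the dynamics
# forward, integrate the costate backward, and descend the gradient of J
# with respect to the controls. Stage cost: 0.5 x'Qx + 0.5 u'Ru.
dt, N = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])      # x_{k+1} = A x_k + B u_k
B = np.array([[0.5 * dt**2], [dt]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
x1 = np.array([1.0, 0.0])                  # given initial state

def rollout(U):
    """Forward pass: integrate the dynamics from x1 under the controls U."""
    X = [x1]
    for k in range(N - 1):
        X.append(A @ X[k] + B @ U[k])
    return X

def gradient(X, U):
    """Backward pass: lam_N = Qf x_N, lam_k = Q x_k + A^T lam_{k+1},
    and dJ/du_k = R u_k + B^T lam_{k+1} (grad_u of the Hamiltonian)."""
    lam = Qf @ X[-1]
    g = [None] * (N - 1)
    for k in reversed(range(N - 1)):
        g[k] = R @ U[k] + B.T @ lam
        lam = Q @ X[k] + A.T @ lam
    return g

U = [np.zeros(1) for _ in range(N - 1)]
for _ in range(500):                       # plain gradient descent
    g = gradient(rollout(U), U)
    U = [u - 0.2 * gk for u, gk in zip(U, g)]
print("final state:", rollout(U)[-1])      # driven toward the origin
```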

Continuous-time version:
$$\begin{aligned} &\dot{\mathbf{x}}(t) = \nabla_\lambda \mathcal{H}(\mathbf{x}(t), \mathbf{u}(t), \lambda(t)) = f(\mathbf{x}(t), \mathbf{u}(t))\\ &-\dot{\lambda}(t) = \nabla_{\mathbf{x}} \mathcal{H}(\mathbf{x}(t), \mathbf{u}(t), \lambda(t)) = \nabla_{\mathbf{x}} \ell(\mathbf{x}(t), \mathbf{u}(t)) + \left(\frac{\partial f}{\partial \mathbf{x}}\right)^\top \lambda(t)\\ &\mathbf{u}(t) = \argmin_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}(t), \tilde{\mathbf{u}}, \lambda(t)) \quad \text{s.t.} \quad \tilde{\mathbf{u}} \in \mathcal{U}\\ &\lambda(t_f) = \left(\frac{\partial \ell_F}{\partial \mathbf{x}(t_f)}\right)^\top \end{aligned}$$

Some Notes:

  • Historically, many algorithms were based on integrating the continuous ODEs forward/backward to do gradient descent on $\mathbf{u}(t)$.
  • These methods are called indirect methods or shooting methods.
  • In continuous time, $\lambda(t)$ is called the co-state trajectory.
  • These methods have largely fallen out of favor as computers have improved.