Optimal Control: Detailed Derivations of LQR and Differential Dynamic Programming (DDP) (Part 2)

1. Introduction of Optimal Control (OC)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.1 System dynamics

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.2 Optimal control problem, cost function and constraints formulation

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.3 Control policy

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.4 Bellman’s Principle of Optimality (BPO) and Dynamic Programming (DP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.5 Hamiltonian-Jacobian-Bellman

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.6 Pontryagin’s Maximum Principle (PMP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

2. Linear Quadratic Regulator (LQR)

The Linear Quadratic Regulator is a type of optimal control method that finds a feedback controller for a linear system that minimizes a quadratic cost function. The cost function is often defined as a sum of the deviations of the state and the control input from their desired values, in quadratic form. The optimal controller (regulator) has the form $\bm{u}=-K\bm{x}$, where $K$ is a matrix that depends on the system dynamics and the cost function parameters.

The history of LQR dates back to the 1950s and 1960s, when researchers such as Richard Bellman and Rudolf Kalman developed the theory of dynamic programming, optimal control, and linear systems. The LQR problem was one of the first and most fundamental problems that could be solved by these methods. The LQR algorithm was also extended to handle stochastic disturbances and imperfect measurements, resulting in the linear-quadratic-Gaussian (LQG) problem.

LQR is widely used in engineering applications because it is easy to implement (given acceptable state identification and state estimation), robust, and efficient. It is also a popular benchmark for other control algorithms, such as reinforcement learning.

The practical implementation of LQR involves solving a matrix equation called the algebraic Riccati equation (ARE), which gives the optimal feedback gain matrix K. The ARE can be derived by applying the Hamilton-Jacobi-Bellman equation to the LQR problem, or by using variational methods. The ARE can be solved numerically by various algorithms, such as Schur-decomposition, Newton iteration or Riccati recursion.
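For instance, the infinite-horizon ARE can be solved with off-the-shelf routines. The sketch below uses SciPy's `solve_discrete_are` on a hypothetical double-integrator system; the matrices are illustrative placeholders, not values from this article:

```python
# A minimal sketch (not from the article): solve the discrete-time ARE with SciPy and
# recover the steady-state LQR gain. The double-integrator A, B, Q, R are illustrative.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)                 # state weight, Q >= 0
R = np.array([[0.1]])         # input weight, R > 0

P = solve_discrete_are(A, B, Q, R)                   # solution of the discrete ARE
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # steady-state gain, u = -K x
print(K)
```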

Firstly, define the OCP of LQR as follows:

$$
\begin{aligned}
\min_{\boldsymbol{u}_{1:N-1}\in\mathcal{U}} J(\boldsymbol{x}_{1},\boldsymbol{u}_{1:N-1})&=\frac{1}{2}\sum_{k=1}^{N-1}\left(\boldsymbol{x}_{k}^{T}Q_{k}\boldsymbol{x}_{k}+\boldsymbol{u}_{k}^{T}R_{k}\boldsymbol{u}_{k}\right)+\frac{1}{2}\boldsymbol{x}_{N}^{T}Q_{N}\boldsymbol{x}_{N} \\
\text{s.t.}\quad \boldsymbol{x}_{k+1}&=A_k\boldsymbol{x}_k+B_k\boldsymbol{u}_k,\quad k=1,\dots,N-1
\end{aligned}\tag{25}
$$

with $Q\succeq0$ positive semi-definite, $R\succ0$ positive definite, and the initial state $\bm{x}_1$ given. The cost function is quadratic in both the state and the control input. The state equation is linear. The control input is unconstrained. The terminal state is penalized.
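For concreteness, here is a hypothetical instance of problem (25), a discretized double integrator with illustrative weights (none of these numbers come from the article), together with a direct evaluation of the cost $J$:

```python
# Hypothetical data for problem (25): a discretized double integrator (illustrative only).
import numpy as np

dt, N = 0.1, 50                            # sample time and horizon (states x_1 ... x_N)
A  = np.array([[1.0, dt],
               [0.0, 1.0]])                # x = [position, velocity]
B  = np.array([[0.5 * dt**2],
               [dt]])
Q  = np.eye(2)                             # stage state weight, Q >= 0
R  = np.array([[0.1]])                     # stage input weight, R > 0
QN = 10.0 * np.eye(2)                      # terminal weight Q_N
x1 = np.array([1.0, 0.0])                  # given initial state

def cost(xs, us):
    """Evaluate J in (25) for states xs = [x_1, ..., x_N] and inputs us = [u_1, ..., u_{N-1}]."""
    J = sum(0.5 * (xs[k] @ Q @ xs[k] + us[k] @ R @ us[k]) for k in range(N - 1))
    return J + 0.5 * xs[-1] @ QN @ xs[-1]
```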

We will derive LQR feedback gain with several approaches below.

2.1 LQR with indirect shooting based on PMP

Define the Hamiltonian as follows:

$$
\begin{aligned}
H(\boldsymbol{x},\boldsymbol{u},\boldsymbol{\lambda})&=\frac{1}{2}\boldsymbol{x}^{T}Q\boldsymbol{x}+\frac{1}{2}\boldsymbol{u}^{T}R\boldsymbol{u}+\boldsymbol{\lambda}^{T}(A\boldsymbol{x}+B\boldsymbol{u})\\
&=\frac{1}{2}\boldsymbol{x}^{T}Q\boldsymbol{x}+\frac{1}{2}\boldsymbol{u}^{T}R\boldsymbol{u}+\boldsymbol{\lambda}^{T}A\boldsymbol{x}+\boldsymbol{\lambda}^{T}B\boldsymbol{u}
\end{aligned}\tag{26}
$$

With PMP, we have
$$\bm{x}_{k+1}=\nabla_{\bm{\lambda}}H=A\boldsymbol{x}_{k}+B\boldsymbol{u}_{k}\tag{27a}$$
$$\bm{\lambda}_k=\nabla_{\boldsymbol{x}}H=Q\bm{x}_k+A^T\boldsymbol{\lambda}_{k+1}\tag{27b}$$
with terminal condition $\bm{\lambda}_N=Q_N\bm{x}_N$. (Please note that we are assuming a linear time-invariant system here, so the time subscripts on $A$, $B$, $Q$, and $R$ are dropped.)

In order to find the optimal control $\bm{u}^*$, we can set the derivative of the Hamiltonian with respect to $\bm{u}$ to zero:
$$\frac{\partial H}{\partial\boldsymbol{u}}=R\boldsymbol{u}_{k}+B^{T}\boldsymbol{\lambda}_{k+1}=0\tag{28}$$
from which, we have:
$$\bm{u}_k^*=-R^{-1}B^T\bm{\lambda}_{k+1}\tag{29}$$

Then the general procedure for solving the LQR problem with indirect shooting is as follows:

  1. Start with an initial guess of $\bm{u}_{1:N-1}$.
  2. Roll out with the initial state $\bm{x}_1$ and the control sequence $\bm{u}_{1:N-1}$ to get $\bm{x}_{1:N}$.
  3. Using the terminal condition $\bm{\lambda}_N=Q_N\bm{x}_N$, solve Eq. 27b backwards to get $\bm{\lambda}_{1:N-1}$, and compute $\Delta\bm{u}$ with Eq. 29.
  4. Roll out with a line search on $\Delta\bm{u}$.
  5. Repeat steps 3 and 4 until convergence.
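The procedure above can be sketched in a few lines of NumPy. This is only an illustrative implementation under the LTI assumption; the double-integrator data and the simple backtracking line search are placeholder choices, not prescribed by the article:

```python
# A sketch of indirect shooting (steps 1-5 above) for the LQR problem, with an
# illustrative double-integrator instance and a simple backtracking line search.
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
x1 = np.array([1.0, 0.0])

def rollout(us):                                   # step 2: forward simulate the dynamics
    xs = [x1]
    for k in range(N - 1):
        xs.append(A @ xs[k] + B @ us[k])
    return xs

def cost(us):
    xs = rollout(us)
    J = sum(0.5 * (xs[k] @ Q @ xs[k] + us[k] @ R @ us[k]) for k in range(N - 1))
    return J + 0.5 * xs[-1] @ QN @ xs[-1]

us = [np.zeros(1) for _ in range(N - 1)]           # step 1: initial guess of u_{1:N-1}
for it in range(100):
    xs = rollout(us)
    lam = QN @ xs[-1]                              # terminal condition lambda_N = Q_N x_N
    us_new = [None] * (N - 1)
    for k in reversed(range(N - 1)):               # step 3: backward costate pass (Eq. 27b)
        us_new[k] = -np.linalg.solve(R, B.T @ lam) # Eq. 29: u_k = -R^{-1} B^T lambda_{k+1}
        lam = Q @ xs[k] + A.T @ lam
    du = [un - u for un, u in zip(us_new, us)]
    alpha, J0 = 1.0, cost(us)                      # step 4: line search on Delta u
    while cost([u + alpha * d for u, d in zip(us, du)]) > J0 and alpha > 1e-8:
        alpha *= 0.5
    us = [u + alpha * d for u, d in zip(us, du)]
    if max(np.linalg.norm(d) for d in du) < 1e-6:  # step 5: converged
        break
```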

2.2 LQR as Quadratic Programming

The LQR problem can also be formulated as a quadratic programming problem, which can then be solved by various algorithms, such as active-set method, interior-point method, and so on.

To start with, we stack the state and control sequences into a single vector:
$$\bm{z}:=\begin{bmatrix}\bm{u}_1&\bm{x}_2&\bm{u}_2&\bm{x}_3&\cdots&\bm{u}_{N-1}&\bm{x}_N\end{bmatrix}^T\tag{30}$$

and correspondingly the cost function weighting matrix as follows:
$$\bm{H}:=\begin{bmatrix}\bm{R}_1\\&\bm{Q}_2\\&&\bm{R}_2\\&&&\bm{Q}_3\\&&&&\ddots\\&&&&&\bm{Q}_N\end{bmatrix}\tag{31}$$

(Note that $\bm{Q}_1$ is omitted because we assume the initial state is fixed and is no longer a decision variable.)

Then the cost function can be written as:
$$J(\bm{z})=\frac{1}{2}\bm{z}^T\bm{H}\bm{z}\tag{32}$$

Plugging the dynamics into the stacked state and control vector, we have:
$$\underbrace{\begin{bmatrix}B&-I\\&A&B&-I\\&&&A&B&-I\\&&&&\ddots&\ddots&\ddots\\&&&&&A&B&-I\end{bmatrix}}_{\bm{C}}\underbrace{\begin{bmatrix}\boldsymbol{u}_1\\\boldsymbol{x}_2\\\boldsymbol{u}_2\\\vdots\\\boldsymbol{x}_N\end{bmatrix}}_{\bm{z}}=\underbrace{\begin{bmatrix}-\boldsymbol{A}\boldsymbol{x}_1\\0\\0\\\vdots\\0\end{bmatrix}}_{\bm{D}}\tag{33}$$

which can serve as the equality constraint for the quadratic programming problem.

To be more specific, the quadratic programming problem for LQR can be formulated as follows:

$$\underset{\bm{z}}{\operatorname{minimize}}\quad\frac{1}{2}\bm{z}^{T}\bm{H}\bm{z}\quad\text{s.t.}\quad\bm{C}\bm{z}=\bm{D}\tag{34}$$
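Given the stacked $\bm{H}$, $\bm{C}$, and $\bm{D}$ (assembled as in the sketch further below), problem (34) can be handed directly to an off-the-shelf QP solver. A hypothetical CVXPY snippet, shown here with tiny placeholder data rather than the real stacked matrices:

```python
# Solve an equality-constrained QP of the form (34) with CVXPY (placeholder data only).
import cvxpy as cp
import numpy as np

H = np.eye(4)                      # stand-in for the stacked cost matrix H
C = np.ones((1, 4))                # stand-in for the stacked dynamics matrix C
D = np.array([1.0])                # stand-in for the right-hand side D

z = cp.Variable(4)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(z, H)), [C @ z == D])
prob.solve()
print(z.value)                     # in the real problem this is the stacked [u_1, x_2, ...]
```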

Then the Lagrangian of the QP will be:
$$L(\bm{z},\bm{\lambda})=\frac{1}{2}\bm{z}^{T}\bm{H}\bm{z}+\bm{\lambda}^{T}(\bm{C}\bm{z}-\bm{D})\tag{35}$$

whose KKT conditions are:
$$
\begin{aligned}
\frac{\partial L}{\partial \bm{z}}&=\bm{H}\bm{z}+\bm{C}^{T}\bm{\lambda}=0\\
\frac{\partial L}{\partial\bm{\lambda}}&=\bm{C}\bm{z}-\bm{D}=0
\end{aligned}\tag{36}
$$

The KKT condition Eq.36 can be organized as a linear system:
$$\begin{bmatrix}\bm{H}&\bm{C}^T\\\bm{C}&0\end{bmatrix}\begin{bmatrix}\bm{z}\\\bm{\lambda}\end{bmatrix}=\begin{bmatrix}0\\\bm{D}\end{bmatrix}\tag{37}$$
which can be solved analytically.
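As a sketch of this dense approach, the snippet below assembles $\bm{H}$, $\bm{C}$, $\bm{D}$ from Eqs. (30)-(33) for a hypothetical double-integrator instance (illustrative values only) and solves the KKT system (37) with a single linear solve:

```python
# Assemble H, C, D of Eqs. (30)-(33) for an illustrative LTI instance and solve the
# KKT system (37) in one shot. Values are placeholders, not from the article.
import numpy as np

dt, N, n, m = 0.1, 20, 2, 1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(n), 0.1 * np.eye(m), 10.0 * np.eye(n)
x1 = np.array([1.0, 0.0])

nz = (N - 1) * (m + n)                 # z = [u_1, x_2, u_2, x_3, ..., u_{N-1}, x_N]
nc = (N - 1) * n                       # one dynamics constraint block per step
H = np.zeros((nz, nz))
C = np.zeros((nc, nz))
D = np.zeros(nc)
for k in range(N - 1):
    iu = k * (m + n)                   # position of u_{k+1} inside z
    ix = iu + m                        # position of x_{k+2} inside z
    ic = k * n                         # constraint row block
    H[iu:iu + m, iu:iu + m] = R
    H[ix:ix + n, ix:ix + n] = QN if k == N - 2 else Q
    C[ic:ic + n, iu:iu + m] = B
    C[ic:ic + n, ix:ix + n] = -np.eye(n)
    if k == 0:
        D[ic:ic + n] = -A @ x1         # first block row: B u_1 - x_2 = -A x_1
    else:
        C[ic:ic + n, iu - n:iu] = A    # A x_{k+1} + B u_{k+1} - x_{k+2} = 0

KKT = np.block([[H, C.T], [C, np.zeros((nc, nc))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(nz), D]))
z, lam = sol[:nz], sol[nz:]            # optimal primal variables and multipliers
```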

Beyond the analytical solution of the KKT system, we can take a closer look at Eq. 36: the KKT matrix is very sparse and has a block-tridiagonal structure. This structure can be exploited to solve the KKT system efficiently.

To get started, we first expose the block-tridiagonal structure of the KKT condition Eq.36 as follows:
$$
\begin{matrix}\#1\\\#2\\\#3\\\#4\\\#5\\\#6\\\#7\\\#8\\\#9\end{matrix}
\left[\begin{array}{cccccc|ccc}\bm{R}&&&&&&\bm{B}^T&&\\&\bm{Q}&&&&&-\bm{I}&\bm{A}^T&\\&&\bm{R}&&&&&\bm{B}^T&\\&&&\bm{Q}&&&&-\bm{I}&\bm{A}^T\\&&&&\bm{R}&&&&\bm{B}^T\\&&&&&\bm{Q}_4&&&-\bm{I}\\\hline\bm{B}&-\bm{I}&&&&&0&\cdots&0\\&\bm{A}&\bm{B}&-\bm{I}&&&\vdots&\ddots&\vdots\\&&&\bm{A}&\bm{B}&-\bm{I}&0&\cdots&0\end{array}\right]
\begin{bmatrix}\bm{u}_1\\\bm{x}_2\\\bm{u}_2\\\bm{x}_3\\\bm{u}_3\\\bm{x}_4\\\hline\bm{\lambda}_2\\\bm{\lambda}_3\\\bm{\lambda}_4\end{bmatrix}
=\begin{bmatrix}0\\0\\0\\0\\0\\0\\\hline-\bm{A}\bm{x}_1\\0\\0\end{bmatrix}\tag{38}
$$
(Here we use only four time steps as an example; the structure is the same for any number of time steps.)

Then we can solve the KKT condition Eq.36 by forward substitution and backward substitution.

Starting from line #6, we have
$$\bm{Q}_4\bm{x}_4-\bm{I}\bm{\lambda}_4=0\implies\bm{\lambda}_4=\bm{Q}_4\bm{x}_4\tag{39}$$

plugging $\bm{\lambda}_4$ back into #5 and working backwards,
$$\bm{R}\bm{u}_3+\bm{B}^T\bm{\lambda}_4=0\implies \bm{R}\bm{u}_3+\bm{B}^T\bm{Q}_4\bm{x}_4=0\tag{40}$$
plugging in the system dynamics $\bm{x}_4=\bm{A}\bm{x}_3+\bm{B}\bm{u}_3$ gives
$$\bm{R}\bm{u}_3+\bm{B}^T\bm{Q}_4(\bm{A}\bm{x}_3+\bm{B}\bm{u}_3)=0\tag{41}$$

$$\implies \bm{u}_3=-\underbrace{(\bm{R}+\bm{B}^T\bm{Q}_4\bm{B})^{-1}\bm{B}^T\bm{Q}_4\bm{A}}_{\bm{K}_3}\bm{x}_3=-\bm{K}_3\bm{x}_3\tag{42}$$
which is exactly the most common form of the LQR feedback gain found in textbooks.

Repeating the process for line #4 and plugging in $\bm{\lambda}_4$,

$$\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{\lambda}_4=0\implies\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{Q}_4\bm{x}_4=0\tag{43}$$
again, plugging in the system dynamics $\bm{x}_4=\bm{A}\bm{x}_3+\bm{B}\bm{u}_3$ leads to
$$\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{Q}_4(\bm{A}\bm{x}_3+\bm{B}\bm{u}_3)=0\tag{44}$$
Since we have already solved $\bm{u}_3=-\bm{K}_3\bm{x}_3$ in the previous step, we can solve for $\bm{\lambda}_3$ as

$$\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{Q}_4(\bm{A}\bm{x}_3-\bm{B}\bm{K}_3\bm{x}_3)=0\implies\bm{\lambda}_3=\underbrace{\left[\bm{Q}+\bm{A}^T\bm{Q}_4(\bm{A}-\bm{B}\bm{K}_3)\right]}_{\bm{P}_3}\bm{x}_3\tag{45}$$

Now we have a recursion for $\bm{K}$ and $\bm{P}$:

  1. $\bm{P}_N=\bm{Q}_N$
  2. $\bm{K}_{k}=(\bm{R}+\bm{B}^{T}\bm{P}_{k+1}\bm{B})^{-1}\bm{B}^{T}\bm{P}_{k+1}\bm{A}$
  3. $\bm{P}_k=\bm{Q}+\bm{A}^T\bm{P}_{k+1}(\bm{A}-\bm{B}\bm{K}_k)$

which is the (backward) Riccati recursion; iterating it to a fixed point yields the solution of the discrete-time algebraic Riccati equation (ARE) mentioned earlier.

We can then solve the LQR QP problem by doing a backward Riccati recursion followed by a forward roll-out to compute $\bm{x}_{1:N}$ and $\bm{u}_{1:N-1}$ from the initial state. This is part of the reason why people love LQR: it gives nice and efficient closed-form solutions.
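A minimal sketch of this backward Riccati recursion plus forward roll-out, again on a hypothetical double-integrator instance (values are illustrative, not from the article):

```python
# Backward Riccati recursion (steps 1-3 above) followed by a forward roll-out.
# The double-integrator data are illustrative placeholders.
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
x1 = np.array([1.0, 0.0])

# Backward pass: P_N = Q_N, then K_k and P_k for k = N-1, ..., 1
P = QN
Ks = [None] * (N - 1)
for k in reversed(range(N - 1)):
    Ks[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # K_k = (R + B'PB)^{-1} B'PA
    P = Q + A.T @ P @ (A - B @ Ks[k])                        # P_k = Q + A'P(A - B K_k)

# Forward roll-out with the time-varying feedback u_k = -K_k x_k
xs, us = [x1], []
for k in range(N - 1):
    us.append(-Ks[k] @ xs[k])
    xs.append(A @ xs[k] + B @ us[k])
```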

The Linear Quadratic Regulator has several extensions, e.g., the infinite-horizon LQR, the stochastic LQR, etc., which are currently out of the scope of this tutorial.

2.3 LQR as Dynamic Programming

Besides indirect shooting and the Riccati recursion, LQR can also be solved with a DP formulation. The DP formulation is more general and can be applied to nonlinear systems. Recalling the general procedure of DP in Section 1.4, we iterate backwards over time and solve for the value function recursively. For LQR, the value function at the terminal time is defined as
$$\bm{V}_{N}(\boldsymbol{x})=\text{terminal cost}=\frac{1}{2}\boldsymbol{x}_{N}^{T}\boldsymbol{Q}_{N}\boldsymbol{x}_{N}=\frac{1}{2}\boldsymbol{x}_{N}^{T}\boldsymbol{P}_{N}\boldsymbol{x}_{N}\tag{46}$$

and working backwards,
$$
\begin{aligned}
\bm{V}_{N-1}(\boldsymbol{x})&=\min_{\boldsymbol{u}_{N-1}}\frac{1}{2}\boldsymbol{x}_{N-1}^{T}\boldsymbol{Q}\boldsymbol{x}_{N-1}+\frac{1}{2}\boldsymbol{u}_{N-1}^{T}\boldsymbol{R}\boldsymbol{u}_{N-1}+\bm{V}_{N}\\
&=\min_{\boldsymbol{u}_{N-1}}\frac{1}{2}\boldsymbol{x}_{N-1}^{T}\boldsymbol{Q}\boldsymbol{x}_{N-1}+\frac{1}{2}\boldsymbol{u}_{N-1}^{T}\boldsymbol{R}\boldsymbol{u}_{N-1}+\frac{1}{2}\boldsymbol{x}_{N}^{T}\boldsymbol{P}_{N}\boldsymbol{x}_{N}\\
&=\min_{\boldsymbol{u}_{N-1}}\frac{1}{2}\boldsymbol{x}_{N-1}^{T}\boldsymbol{Q}\boldsymbol{x}_{N-1}+\frac{1}{2}\boldsymbol{u}_{N-1}^{T}\boldsymbol{R}\boldsymbol{u}_{N-1}+\frac{1}{2}(\boldsymbol{A}\boldsymbol{x}_{N-1}+\boldsymbol{B}\boldsymbol{u}_{N-1})^{T}\boldsymbol{P}_{N}(\boldsymbol{A}\boldsymbol{x}_{N-1}+\boldsymbol{B}\boldsymbol{u}_{N-1})
\end{aligned}\tag{47}
$$

in order to find the optimal $\bm{u}_{N-1}^*$, we set
$$\frac{\partial \bm{V}_{N-1}}{\partial\boldsymbol{u}_{N-1}}=0\tag{48}$$

which leads to

$$\bm{R}\bm{u}_{N-1}+\bm{B}^T\bm{P}_N(\bm{A}\bm{x}_{N-1}+\bm{B}\bm{u}_{N-1})=0\implies \bm{u}_{N-1}^{*}=-\underbrace{(\bm{R}+\bm{B}^{T}\bm{P}_{N}\bm{B})^{-1}\bm{B}^{T}\bm{P}_{N}\bm{A}}_{\bm{K}_{N-1}}\bm{x}_{N-1}\tag{49}$$

which is the feedback law for LQR. To iterate for $\bm{P}$, we plug the optimal $\bm{u}_{N-1}^*$ back into the value function $\bm{V}_{N-1}$, which leads to
$$\bm{V}_{N-1}(\bm{x})=\frac{1}{2}\bm{x}_{N-1}^{T}\underbrace{\left[\bm{Q}+\bm{K}_{N-1}^{T}\bm{R}\bm{K}_{N-1}+(\bm{A}-\bm{B}\bm{K}_{N-1})^{T}\bm{P}_{N}(\bm{A}-\bm{B}\bm{K}_{N-1})\right]}_{\bm{P}_{N-1}}\bm{x}_{N-1}\tag{50}$$
Now, with Eqs. 46, 49, and 50, we have a recursion for solving the LQR problem with the general DP algorithm:

  1. Initialize $\bm{P}_N=\bm{Q}_N$;
  2. For $k=N-1,\cdots,1$:
  3. $\quad\bm{K}_k=(\bm{R}+\bm{B}^{T}\bm{P}_{k+1}\bm{B})^{-1}\bm{B}^{T}\bm{P}_{k+1}\bm{A}$
  4. $\quad\bm{P}_{k}=\bm{Q}+\bm{K}_{k}^{T}\bm{R}\bm{K}_{k}+(\bm{A}-\bm{B}\bm{K}_{k})^{T}\bm{P}_{k+1}(\bm{A}-\bm{B}\bm{K}_{k})$
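Note that the $\bm{P}_k$ update in step 4 is algebraically identical to the update $\bm{P}_k=\bm{Q}+\bm{A}^T\bm{P}_{k+1}(\bm{A}-\bm{B}\bm{K}_k)$ from Section 2.2 whenever $\bm{K}_k$ is the optimal gain. A quick numerical sanity check with illustrative matrices (not from the article):

```python
# Check that the DP-form update P = Q + K'RK + (A-BK)'P_next(A-BK) agrees with the
# Riccati-form update P = Q + A'P_next(A - BK) at the optimal K. Values are illustrative.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])
P_next = 10.0 * np.eye(2)                                   # stand-in for P_{k+1}

K = np.linalg.solve(R + B.T @ P_next @ B, B.T @ P_next @ A)
P_dp      = Q + K.T @ R @ K + (A - B @ K).T @ P_next @ (A - B @ K)
P_riccati = Q + A.T @ P_next @ (A - B @ K)
print(np.allclose(P_dp, P_riccati))                         # True: the two recursions coincide
```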

Compared to the Riccati recursion, a general DP formulation (tabulating the value function over the state space) is far more computationally expensive: its cost explodes as the state dimension grows, which is known as Bellman's curse of dimensionality.

Still, this is only the linear-quadratic case of DP; for the nonlinear case, we need to solve the Hamilton-Jacobi-Bellman equation, which is even more computationally expensive.

3. Differential Dynamic Programming (DDP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.1 Local Approximation

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.2 Backward Pass

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.3 Line Search

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.4 Forward Pass

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).
