Optimal Control: Detailed Derivations of LQR and Differential Dynamic Programming (DDP) (Part 2)

1. Introduction of Optimal Control (OC)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.1 System dynamics

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.2 Optimal control problem, cost function and constraints formulation

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.3 Control policy

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.4 Bellman’s Principle of Optimality (BPO) and Dynamic Programming (DP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.5 Hamiltonian-Jacobian-Bellman

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.6 Pontryagin’s Maximum Principle (PMP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

2. Linear Quadratic Regulator (LQR)

The Linear Quadratic Regulator is a type of optimal control method that finds a feedback controller for a linear system that minimizes a quadratic cost function. The cost function is often defined as a sum of the deviations of the state and the control input from their desired values, in quadratic form. The optimal controller (regulator) has the form $\bm{u}=-K\bm{x}$, where $K$ is a matrix that depends on the system dynamics and the cost function parameters.

The history of LQR dates back to the 1950s and 1960s, when researchers such as Richard Bellman and Rudolf Kalman developed the theory of dynamic programming, optimal control, and linear systems. The LQR problem was one of the first and most fundamental problems that could be solved by these methods. The LQR algorithm was also extended to handle stochastic disturbances and imperfect measurements, resulting in the linear-quadratic-Gaussian (LQG) problem.

LQR is widely used in engineering applications because it is easy to implement (given acceptable state identification and state estimation), robust, and efficient. It is also a popular benchmark for other control algorithms, such as reinforcement learning.

The practical implementation of LQR involves solving a matrix equation called the algebraic Riccati equation (ARE), which gives the optimal feedback gain matrix K. The ARE can be derived by applying the Hamilton-Jacobi-Bellman equation to the LQR problem, or by using variational methods. The ARE can be solved numerically by various algorithms, such as Schur-decomposition, Newton iteration or Riccati recursion.
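For instance, the infinite-horizon ARE can be solved with off-the-shelf routines. The sketch below uses SciPy's `solve_discrete_are` on a hypothetical double-integrator system; the matrices are illustrative placeholders, not values from this article:

```python
# A minimal sketch (not from the article): solve the discrete-time ARE with SciPy and
# recover the steady-state LQR gain. The double-integrator A, B, Q, R are illustrative.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)                 # state weight, Q >= 0
R = np.array([[0.1]])         # input weight, R > 0

P = solve_discrete_are(A, B, Q, R)                   # solution of the discrete ARE
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # steady-state gain, u = -K x
print(K)
```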

Firstly, define the OCP of LQR as follows:

$$
\begin{aligned}
\min_{\boldsymbol{u}_{1:N-1}\in\mathcal{U}} J(\boldsymbol{x}_{1},\boldsymbol{u}_{1:N-1})&=\frac{1}{2}\sum_{k=1}^{N-1}\left(\boldsymbol{x}_{k}^{T}Q_{k}\boldsymbol{x}_{k}+\boldsymbol{u}_{k}^{T}R_{k}\boldsymbol{u}_{k}\right)+\frac{1}{2}\boldsymbol{x}_{N}^{T}Q_{N}\boldsymbol{x}_{N} \\
\text{s.t.}\quad \boldsymbol{x}_{k+1}&=A_k\boldsymbol{x}_k+B_k\boldsymbol{u}_k,\quad k=1,\dots,N-1
\end{aligned}\tag{25}
$$

with $Q\succeq0$ positive semi-definite, $R\succ0$ positive definite, and the initial state $\bm{x}_1$ given. The cost function is quadratic in both the state and the control input. The state equation is linear. The control input is unconstrained. The terminal state is penalized.
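For concreteness, here is a hypothetical instance of problem (25), a discretized double integrator with illustrative weights (none of these numbers come from the article), together with a direct evaluation of the cost $J$:

```python
# Hypothetical data for problem (25): a discretized double integrator (illustrative only).
import numpy as np

dt, N = 0.1, 50                            # sample time and horizon (states x_1 ... x_N)
A  = np.array([[1.0, dt],
               [0.0, 1.0]])                # x = [position, velocity]
B  = np.array([[0.5 * dt**2],
               [dt]])
Q  = np.eye(2)                             # stage state weight, Q >= 0
R  = np.array([[0.1]])                     # stage input weight, R > 0
QN = 10.0 * np.eye(2)                      # terminal weight Q_N
x1 = np.array([1.0, 0.0])                  # given initial state

def cost(xs, us):
    """Evaluate J in (25) for states xs = [x_1, ..., x_N] and inputs us = [u_1, ..., u_{N-1}]."""
    J = sum(0.5 * (xs[k] @ Q @ xs[k] + us[k] @ R @ us[k]) for k in range(N - 1))
    return J + 0.5 * xs[-1] @ QN @ xs[-1]
```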

We will derive LQR feedback gain with several approaches below.

2.1 LQR with indirect shooting based on PMP

Define the Hamiltonian as follows:

$$
\begin{aligned}
H(\boldsymbol{x},\boldsymbol{u},\boldsymbol{\lambda})&=\frac{1}{2}\boldsymbol{x}^{T}Q\boldsymbol{x}+\frac{1}{2}\boldsymbol{u}^{T}R\boldsymbol{u}+\boldsymbol{\lambda}^{T}(A\boldsymbol{x}+B\boldsymbol{u})\\
&=\frac{1}{2}\boldsymbol{x}^{T}Q\boldsymbol{x}+\frac{1}{2}\boldsymbol{u}^{T}R\boldsymbol{u}+\boldsymbol{\lambda}^{T}A\boldsymbol{x}+\boldsymbol{\lambda}^{T}B\boldsymbol{u}
\end{aligned}\tag{26}
$$

With PMP, we have
$$\bm{x}_{k+1}=\nabla_{\bm{\lambda}}H=A\boldsymbol{x}_{k}+B\boldsymbol{u}_{k}\tag{27a}$$
$$\bm{\lambda}_k=\nabla_{\boldsymbol{x}}H=Q\bm{x}_k+A^T\boldsymbol{\lambda}_{k+1}\tag{27b}$$
with terminal condition $\bm{\lambda}_N=Q_N\bm{x}_N$. (Please note that we are assuming a linear time-invariant system here, so the time subscripts on $A$, $B$, $Q$, and $R$ are dropped.)

In order to find the optimal control $\bm{u}^*$, we can set the derivative of the Hamiltonian with respect to $\bm{u}$ to zero:
$$\frac{\partial H}{\partial\boldsymbol{u}}=R\boldsymbol{u}_{k}+B^{T}\boldsymbol{\lambda}_{k+1}=0\tag{28}$$
from which, we have:
$$\bm{u}_k^*=-R^{-1}B^T\bm{\lambda}_{k+1}\tag{29}$$

Then the general procedure for solving the LQR problem with indirect shooting is as follows:

  1. Start with an initial guess of $\bm{u}_{1:N-1}$.
  2. Roll out with the initial state $\bm{x}_1$ and the control sequence $\bm{u}_{1:N-1}$ to get $\bm{x}_{1:N}$.
  3. Using the terminal condition $\bm{\lambda}_N=Q_N\bm{x}_N$, solve Eq. 27b backwards to get $\bm{\lambda}_{1:N-1}$, and compute $\Delta\bm{u}$ with Eq. 29.
  4. Roll out with a line search on $\Delta\bm{u}$.
  5. Repeat steps 3 and 4 until convergence.
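The procedure above can be sketched in a few lines of NumPy. This is only an illustrative implementation under the LTI assumption; the double-integrator data and the simple backtracking line search are placeholder choices, not prescribed by the article:

```python
# A sketch of indirect shooting (steps 1-5 above) for the LQR problem, with an
# illustrative double-integrator instance and a simple backtracking line search.
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
x1 = np.array([1.0, 0.0])

def rollout(us):                                   # step 2: forward simulate the dynamics
    xs = [x1]
    for k in range(N - 1):
        xs.append(A @ xs[k] + B @ us[k])
    return xs

def cost(us):
    xs = rollout(us)
    J = sum(0.5 * (xs[k] @ Q @ xs[k] + us[k] @ R @ us[k]) for k in range(N - 1))
    return J + 0.5 * xs[-1] @ QN @ xs[-1]

us = [np.zeros(1) for _ in range(N - 1)]           # step 1: initial guess of u_{1:N-1}
for it in range(100):
    xs = rollout(us)
    lam = QN @ xs[-1]                              # terminal condition lambda_N = Q_N x_N
    us_new = [None] * (N - 1)
    for k in reversed(range(N - 1)):               # step 3: backward costate pass (Eq. 27b)
        us_new[k] = -np.linalg.solve(R, B.T @ lam) # Eq. 29: u_k = -R^{-1} B^T lambda_{k+1}
        lam = Q @ xs[k] + A.T @ lam
    du = [un - u for un, u in zip(us_new, us)]
    alpha, J0 = 1.0, cost(us)                      # step 4: line search on Delta u
    while cost([u + alpha * d for u, d in zip(us, du)]) > J0 and alpha > 1e-8:
        alpha *= 0.5
    us = [u + alpha * d for u, d in zip(us, du)]
    if max(np.linalg.norm(d) for d in du) < 1e-6:  # step 5: converged
        break
```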

2.2 LQR as Quadratic Programming

The LQR problem can also be formulated as a quadratic programming problem, which can then be solved by various algorithms, such as active-set method, interior-point method, and so on.

To start with, we stack the state and control sequences into a single vector:
$$\bm{z}:=\begin{bmatrix}\bm{u}_1&\bm{x}_2&\bm{u}_2&\bm{x}_3&\cdots&\bm{u}_{N-1}&\bm{x}_N\end{bmatrix}^T\tag{30}$$

and correspondingly the cost function weighting matrix as follows:
$$\bm{H}:=\begin{bmatrix}\bm{R}_1\\&\bm{Q}_2\\&&\bm{R}_2\\&&&\bm{Q}_3\\&&&&\ddots\\&&&&&\bm{Q}_N\end{bmatrix}\tag{31}$$

(Note that $\bm{Q}_1$ is omitted because we assume the initial state is fixed and is no longer a decision variable.)

Then the cost function can be written as:
$$J(\bm{z})=\frac{1}{2}\bm{z}^T\bm{H}\bm{z}\tag{32}$$

Plugging the dynamics into the stacked state and control vector, we have:
$$\underbrace{\begin{bmatrix}B&-I\\&A&B&-I\\&&&A&B&-I\\&&&&\ddots&\ddots&\ddots\\&&&&&A&B&-I\end{bmatrix}}_{\bm{C}}\underbrace{\begin{bmatrix}\boldsymbol{u}_1\\\boldsymbol{x}_2\\\boldsymbol{u}_2\\\vdots\\\boldsymbol{x}_N\end{bmatrix}}_{\bm{z}}=\underbrace{\begin{bmatrix}-\boldsymbol{A}\boldsymbol{x}_1\\0\\0\\\vdots\\0\end{bmatrix}}_{\bm{D}}\tag{33}$$

which can serve as the equality constraint for the quadratic programming problem.

To be more specific, the quadratic programming problem for LQR can be formulated as follows:

$$\underset{\bm{z}}{\operatorname{minimize}}\quad\frac{1}{2}\bm{z}^{T}\bm{H}\bm{z}\quad\text{s.t.}\quad\bm{C}\bm{z}=\bm{D}\tag{34}$$
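Given the stacked $\bm{H}$, $\bm{C}$, and $\bm{D}$ (assembled as in the sketch further below), problem (34) can be handed directly to an off-the-shelf QP solver. A hypothetical CVXPY snippet, shown here with tiny placeholder data rather than the real stacked matrices:

```python
# Solve an equality-constrained QP of the form (34) with CVXPY (placeholder data only).
import cvxpy as cp
import numpy as np

H = np.eye(4)                      # stand-in for the stacked cost matrix H
C = np.ones((1, 4))                # stand-in for the stacked dynamics matrix C
D = np.array([1.0])                # stand-in for the right-hand side D

z = cp.Variable(4)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(z, H)), [C @ z == D])
prob.solve()
print(z.value)                     # in the real problem this is the stacked [u_1, x_2, ...]
```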

Then the Lagrangian of the QP will be:
$$L(\bm{z},\bm{\lambda})=\frac{1}{2}\bm{z}^{T}\bm{H}\bm{z}+\bm{\lambda}^{T}(\bm{C}\bm{z}-\bm{D})\tag{35}$$

whose KKT conditions are:
$$
\begin{aligned}
\frac{\partial L}{\partial \bm{z}}&=\bm{H}\bm{z}+\bm{C}^{T}\bm{\lambda}=0\\
\frac{\partial L}{\partial\bm{\lambda}}&=\bm{C}\bm{z}-\bm{D}=0
\end{aligned}\tag{36}
$$

The KKT condition Eq.36 can be organized as a linear system:
$$\begin{bmatrix}\bm{H}&\bm{C}^T\\\bm{C}&0\end{bmatrix}\begin{bmatrix}\bm{z}\\\bm{\lambda}\end{bmatrix}=\begin{bmatrix}0\\\bm{D}\end{bmatrix}\tag{37}$$
which can be solved analytically.
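As a sketch of this dense approach, the snippet below assembles $\bm{H}$, $\bm{C}$, $\bm{D}$ from Eqs. (30)-(33) for a hypothetical double-integrator instance (illustrative values only) and solves the KKT system (37) with a single linear solve:

```python
# Assemble H, C, D of Eqs. (30)-(33) for an illustrative LTI instance and solve the
# KKT system (37) in one shot. Values are placeholders, not from the article.
import numpy as np

dt, N, n, m = 0.1, 20, 2, 1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(n), 0.1 * np.eye(m), 10.0 * np.eye(n)
x1 = np.array([1.0, 0.0])

nz = (N - 1) * (m + n)                 # z = [u_1, x_2, u_2, x_3, ..., u_{N-1}, x_N]
nc = (N - 1) * n                       # one dynamics constraint block per step
H = np.zeros((nz, nz))
C = np.zeros((nc, nz))
D = np.zeros(nc)
for k in range(N - 1):
    iu = k * (m + n)                   # position of u_{k+1} inside z
    ix = iu + m                        # position of x_{k+2} inside z
    ic = k * n                         # constraint row block
    H[iu:iu + m, iu:iu + m] = R
    H[ix:ix + n, ix:ix + n] = QN if k == N - 2 else Q
    C[ic:ic + n, iu:iu + m] = B
    C[ic:ic + n, ix:ix + n] = -np.eye(n)
    if k == 0:
        D[ic:ic + n] = -A @ x1         # first block row: B u_1 - x_2 = -A x_1
    else:
        C[ic:ic + n, iu - n:iu] = A    # A x_{k+1} + B u_{k+1} - x_{k+2} = 0

KKT = np.block([[H, C.T], [C, np.zeros((nc, nc))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(nz), D]))
z, lam = sol[:nz], sol[nz:]            # optimal primal variables and multipliers
```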

Beyond the analytical solution of the KKT system, we can take a closer look at Eq. 36: the KKT matrix is very sparse and has a block-tridiagonal structure. This structure can be exploited to solve the KKT system efficiently.

To get started, we first expose the block-tridiagonal structure of the KKT condition Eq.36 as follows:
$$
\begin{matrix}\#1\\\#2\\\#3\\\#4\\\#5\\\#6\\\#7\\\#8\\\#9\end{matrix}
\left[\begin{array}{cccccc|ccc}\bm{R}&&&&&&\bm{B}^T&&\\&\bm{Q}&&&&&-\bm{I}&\bm{A}^T&\\&&\bm{R}&&&&&\bm{B}^T&\\&&&\bm{Q}&&&&-\bm{I}&\bm{A}^T\\&&&&\bm{R}&&&&\bm{B}^T\\&&&&&\bm{Q}_4&&&-\bm{I}\\\hline\bm{B}&-\bm{I}&&&&&0&\cdots&0\\&\bm{A}&\bm{B}&-\bm{I}&&&\vdots&\ddots&\vdots\\&&&\bm{A}&\bm{B}&-\bm{I}&0&\cdots&0\end{array}\right]
\begin{bmatrix}\bm{u}_1\\\bm{x}_2\\\bm{u}_2\\\bm{x}_3\\\bm{u}_3\\\bm{x}_4\\\hline\bm{\lambda}_2\\\bm{\lambda}_3\\\bm{\lambda}_4\end{bmatrix}
=\begin{bmatrix}0\\0\\0\\0\\0\\0\\\hline-\bm{A}\bm{x}_1\\0\\0\end{bmatrix}\tag{38}
$$
(Here we use only four time steps as an example; the structure is the same for any number of time steps.)

Then we can solve the KKT condition Eq.36 by forward substitution and backward substitution.

Starting from line #6, we have
$$\bm{Q}_4\bm{x}_4-\bm{I}\bm{\lambda}_4=0\implies\bm{\lambda}_4=\bm{Q}_4\bm{x}_4\tag{39}$$

plugging $\bm{\lambda}_4$ back into #5 and working backwards,
$$\bm{R}\bm{u}_3+\bm{B}^T\bm{\lambda}_4=0\implies \bm{R}\bm{u}_3+\bm{B}^T\bm{Q}_4\bm{x}_4=0\tag{40}$$
plugging in the system dynamics $\bm{x}_4=\bm{A}\bm{x}_3+\bm{B}\bm{u}_3$ gives
$$\bm{R}\bm{u}_3+\bm{B}^T\bm{Q}_4(\bm{A}\bm{x}_3+\bm{B}\bm{u}_3)=0\tag{41}$$

$$\implies \bm{u}_3=-\underbrace{(\bm{R}+\bm{B}^T\bm{Q}_4\bm{B})^{-1}\bm{B}^T\bm{Q}_4\bm{A}}_{\bm{K}_3}\bm{x}_3=-\bm{K}_3\bm{x}_3\tag{42}$$
which is exactly the most common form of the LQR feedback gain found in textbooks.

Repeating the process for line #4 and plugging in $\bm{\lambda}_4$,

$$\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{\lambda}_4=0\implies\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{Q}_4\bm{x}_4=0\tag{43}$$
again, plugging in the system dynamics $\bm{x}_4=\bm{A}\bm{x}_3+\bm{B}\bm{u}_3$ leads to
$$\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{Q}_4(\bm{A}\bm{x}_3+\bm{B}\bm{u}_3)=0\tag{44}$$
Since we have already solved $\bm{u}_3=-\bm{K}_3\bm{x}_3$ in the previous step, we can solve for $\bm{\lambda}_3$ as

$$\bm{Q}\bm{x}_3-\bm{I}\bm{\lambda}_3+\bm{A}^T\bm{Q}_4(\bm{A}\bm{x}_3-\bm{B}\bm{K}_3\bm{x}_3)=0\implies\bm{\lambda}_3=\underbrace{\left[\bm{Q}+\bm{A}^T\bm{Q}_4(\bm{A}-\bm{B}\bm{K}_3)\right]}_{\bm{P}_3}\bm{x}_3\tag{45}$$

Now we have a recursion for $\bm{K}$ and $\bm{P}$:

  1. $\bm{P}_N=\bm{Q}_N$
  2. $\bm{K}_{k}=(\bm{R}+\bm{B}^{T}\bm{P}_{k+1}\bm{B})^{-1}\bm{B}^{T}\bm{P}_{k+1}\bm{A}$
  3. $\bm{P}_k=\bm{Q}+\bm{A}^T\bm{P}_{k+1}(\bm{A}-\bm{B}\bm{K}_k)$

which is the (backward) Riccati recursion; iterating it to a fixed point yields the solution of the discrete-time algebraic Riccati equation (ARE) mentioned earlier.

We can then solve the LQR QP problem by doing a backward Riccati recursion followed by a forward roll-out to compute $\bm{x}_{1:N}$ and $\bm{u}_{1:N-1}$ from the initial state. This is part of the reason why people love LQR: it gives nice and efficient closed-form solutions.
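A minimal sketch of this backward Riccati recursion plus forward roll-out, again on a hypothetical double-integrator instance (values are illustrative, not from the article):

```python
# Backward Riccati recursion (steps 1-3 above) followed by a forward roll-out.
# The double-integrator data are illustrative placeholders.
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
x1 = np.array([1.0, 0.0])

# Backward pass: P_N = Q_N, then K_k and P_k for k = N-1, ..., 1
P = QN
Ks = [None] * (N - 1)
for k in reversed(range(N - 1)):
    Ks[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # K_k = (R + B'PB)^{-1} B'PA
    P = Q + A.T @ P @ (A - B @ Ks[k])                        # P_k = Q + A'P(A - B K_k)

# Forward roll-out with the time-varying feedback u_k = -K_k x_k
xs, us = [x1], []
for k in range(N - 1):
    us.append(-Ks[k] @ xs[k])
    xs.append(A @ xs[k] + B @ us[k])
```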

The Linear Quadratic Regulator has several extensions, e.g., the infinite-horizon LQR, the stochastic LQR, etc., which are currently out of the scope of this tutorial.

2.3 LQR as Dynamic Programming

Besides indirect shooting and the Riccati recursion, LQR can also be solved with a DP formulation. The DP formulation is more general and can be applied to nonlinear systems. Recalling the general procedure of DP in Section 1.4, we iterate backwards over time and solve for the value function recursively. For LQR, the value function at the terminal time is defined as
$$\bm{V}_{N}(\boldsymbol{x})=\text{terminal cost}=\frac{1}{2}\boldsymbol{x}_{N}^{T}\boldsymbol{Q}_{N}\boldsymbol{x}_{N}=\frac{1}{2}\boldsymbol{x}_{N}^{T}\boldsymbol{P}_{N}\boldsymbol{x}_{N}\tag{46}$$

and working backwards,
$$
\begin{aligned}
\bm{V}_{N-1}(\boldsymbol{x})&=\min_{\boldsymbol{u}_{N-1}}\frac{1}{2}\boldsymbol{x}_{N-1}^{T}\boldsymbol{Q}\boldsymbol{x}_{N-1}+\frac{1}{2}\boldsymbol{u}_{N-1}^{T}\boldsymbol{R}\boldsymbol{u}_{N-1}+\bm{V}_{N}\\
&=\min_{\boldsymbol{u}_{N-1}}\frac{1}{2}\boldsymbol{x}_{N-1}^{T}\boldsymbol{Q}\boldsymbol{x}_{N-1}+\frac{1}{2}\boldsymbol{u}_{N-1}^{T}\boldsymbol{R}\boldsymbol{u}_{N-1}+\frac{1}{2}\boldsymbol{x}_{N}^{T}\boldsymbol{P}_{N}\boldsymbol{x}_{N}\\
&=\min_{\boldsymbol{u}_{N-1}}\frac{1}{2}\boldsymbol{x}_{N-1}^{T}\boldsymbol{Q}\boldsymbol{x}_{N-1}+\frac{1}{2}\boldsymbol{u}_{N-1}^{T}\boldsymbol{R}\boldsymbol{u}_{N-1}+\frac{1}{2}(\boldsymbol{A}\boldsymbol{x}_{N-1}+\boldsymbol{B}\boldsymbol{u}_{N-1})^{T}\boldsymbol{P}_{N}(\boldsymbol{A}\boldsymbol{x}_{N-1}+\boldsymbol{B}\boldsymbol{u}_{N-1})
\end{aligned}\tag{47}
$$

in order to find the optimal $\bm{u}_{N-1}^*$, we set
$$\frac{\partial \bm{V}_{N-1}}{\partial\boldsymbol{u}_{N-1}}=0\tag{48}$$

which leads to

$$\bm{R}\bm{u}_{N-1}+\bm{B}^T\bm{P}_N(\bm{A}\bm{x}_{N-1}+\bm{B}\bm{u}_{N-1})=0\implies \bm{u}_{N-1}^{*}=-\underbrace{(\bm{R}+\bm{B}^{T}\bm{P}_{N}\bm{B})^{-1}\bm{B}^{T}\bm{P}_{N}\bm{A}}_{\bm{K}_{N-1}}\bm{x}_{N-1}\tag{49}$$

which is the feedback law for LQR. To iterate for $\bm{P}$, we plug the optimal $\bm{u}_{N-1}^*$ back into the value function $\bm{V}_{N-1}$, which leads to
$$\bm{V}_{N-1}(\bm{x})=\frac{1}{2}\bm{x}_{N-1}^{T}\underbrace{\left[\bm{Q}+\bm{K}_{N-1}^{T}\bm{R}\bm{K}_{N-1}+(\bm{A}-\bm{B}\bm{K}_{N-1})^{T}\bm{P}_{N}(\bm{A}-\bm{B}\bm{K}_{N-1})\right]}_{\bm{P}_{N-1}}\bm{x}_{N-1}\tag{50}$$
Now, with Eqs. 46, 49, and 50, we have a recursion for solving the LQR problem with the general DP algorithm:

  1. Initialize $\bm{P}_N=\bm{Q}_N$;
  2. For $k=N-1,\cdots,1$:
  3. $\quad\bm{K}_k=(\bm{R}+\bm{B}^{T}\bm{P}_{k+1}\bm{B})^{-1}\bm{B}^{T}\bm{P}_{k+1}\bm{A}$
  4. $\quad\bm{P}_{k}=\bm{Q}+\bm{K}_{k}^{T}\bm{R}\bm{K}_{k}+(\bm{A}-\bm{B}\bm{K}_{k})^{T}\bm{P}_{k+1}(\bm{A}-\bm{B}\bm{K}_{k})$
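Note that the $\bm{P}_k$ update in step 4 is algebraically identical to the update $\bm{P}_k=\bm{Q}+\bm{A}^T\bm{P}_{k+1}(\bm{A}-\bm{B}\bm{K}_k)$ from Section 2.2 whenever $\bm{K}_k$ is the optimal gain. A quick numerical sanity check with illustrative matrices (not from the article):

```python
# Check that the DP-form update P = Q + K'RK + (A-BK)'P_next(A-BK) agrees with the
# Riccati-form update P = Q + A'P_next(A - BK) at the optimal K. Values are illustrative.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])
P_next = 10.0 * np.eye(2)                                   # stand-in for P_{k+1}

K = np.linalg.solve(R + B.T @ P_next @ B, B.T @ P_next @ A)
P_dp      = Q + K.T @ R @ K + (A - B @ K).T @ P_next @ (A - B @ K)
P_riccati = Q + A.T @ P_next @ (A - B @ K)
print(np.allclose(P_dp, P_riccati))                         # True: the two recursions coincide
```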

Compared to the Riccati recursion, a general DP formulation (tabulating the value function over the state space) is far more computationally expensive: its cost explodes as the state dimension grows, which is known as Bellman's curse of dimensionality.

Still, this is only the linear-quadratic case of DP; for the nonlinear case, we need to solve the Hamilton-Jacobi-Bellman equation, which is even more computationally expensive.

3. Differential Dynamic Programming (DDP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.1 Local Approximation

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.2 Backward Pass

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.3 Line Search

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).

3.4 Forward Pass

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下).
