最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(下)

1. Introduction of Optimal Control (OC)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.1 System dynamics

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.2 Optimal control problem, cost function and constraints formulation

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.3 Control policy

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.4 Bellman’s Principle of Optimality (BPO) and Dynamic Programming (DP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.5 Hamiltonian-Jacobian-Bellman

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

1.6 Pontryagin’s Maximum Principle (PMP)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(上).

2. Linear Quadratic Regulator (LQR)

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(中).

2.1 LQR with indirect shooting based on PMP

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(中).

2.2 LQR as Quadratic Programming

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(中).

2.3 LQR as Dynamic Programming

Please refer to this 最优控制 LQR 与 Differential Dynamic Programming(DDP) 详细公式推导(中).

3. Differential Dynamic Programming (DDP)

In the last section we discussed the Linear Quadratic Regulator (LQR) problem, which is a special case of the optimal control problem (OCP) where the system dynamics are linear and the cost function is quadratic. How about non-linearities, either in dynamics or cost functions? How can we tackle the value function? In this section, we will discuss a more general framework for solving OCPs, namely Differential Dynamic Programming (DDP) for discrete process. DDP is a trajectory optimization method that is based on the principle of optimality. It is a two-step method that first computes a locally optimal trajectory and then iteratively refines it. The method is iterative and can be used to solve non-linear OCPs with non-linear dynamics and non-linear cost functions.

Recall that we have an OCP defined in the discrete-time domain as follows:
min ⁡ u J ( x 0 , u 0 , . . . , N − 1 ) s u b j e c t   t o x k + 1 = f ( x k , u k ) x ∈ X u ∈ U (51) \begin{aligned}&\operatorname*{min}_{\boldsymbol{u}}&&J\left(\boldsymbol{x}_{0},\boldsymbol{u}_{0,...,N-1}\right)\\&\mathrm{subject~to}&&\bm{x}_{k+1}=\bm{f}(\bm{x}_{k},\bm{u}_{k})\\&&&\bm{x}\in\mathcal{X}\\&&&\bm{u}\in\mathcal{U}\end{aligned}\tag{51} uminsubject toJ(x0,u0,...,N1)xk+1=f(xk,uk)xXuU(51)
where the cost function is defined as:
J ( x 0 , u 0 , . . . , N − 1 ) = ∑ k = 0 N − 1 ℓ ( x k , u k ) ⏟ cost-to-go + ℓ f ( x N ) ⏟ terminal cost (52) \begin{aligned}&J\left(\boldsymbol{x}_{0},\boldsymbol{u}_{0,...,N-1}\right)=\sum_{k=0}^{N-1}\underbrace{\ell(\boldsymbol{x}_{k},\boldsymbol{u}_{k})}_{\text{cost-to-go}}+\underbrace{\ell_{f}(\boldsymbol{x}_{N})}_{\text{terminal cost}}\end{aligned}\tag{52} J(x0,u0,...,N1)=k=0N1cost-to-go (xk,uk)+terminal cost f(xN)(52)

As in DP, we need to define the value function and find the recursion for it. To begin with, the value function is defined as:

V ( x k ) = min ⁡ u k , … , N − 1 J ( x k , u k , … , N − 1 ) = min ⁡ u k , … , N − 1 [ ∑ i = k N − 1 ℓ ( x i , u i ) + ℓ f ( x N ) ] = min ⁡ u k ℓ ( x k , u k ) + min ⁡ u k + 1 , … , N − 1 [ ∑ i = k + 1 N − 1 ℓ ( x i , u i ) + ℓ f ( x N ) ] = min ⁡ u k ℓ ( x k , u k ) + V ( x k + 1 ) (53) \begin{aligned}V(\boldsymbol{x}_{k})&=\min_{\boldsymbol{u}_{k,\ldots,N-1}}J\left(\boldsymbol{x}_{k},\boldsymbol{u}_{k,\ldots,N-1}\right)\\&=\min_{\boldsymbol{u}_{k,\ldots,N-1}}\left[\sum_{i=k}^{N-1}\ell(\boldsymbol{x}_{i},\boldsymbol{u}_{i})+\ell_{f}(\boldsymbol{x}_{N})\right]\\&=\min_{\boldsymbol{u}_{k}}\ell(\boldsymbol{x}_{k},\boldsymbol{u}_{k})+\min_{\boldsymbol{u}_{k+1,\ldots,N-1}}\left[\sum_{i=k+1}^{N-1}\ell(\boldsymbol{x}_{i},\boldsymbol{u}_{i})+\ell_{f}(\boldsymbol{x}_{N})\right]\\&=\min_{\boldsymbol{u}_{k}}\ell(\boldsymbol{x}_{k},\boldsymbol{u}_{k})+V(\boldsymbol{x}_{k+1})\\\end{aligned}\tag{53} V(xk)=uk,,N1minJ(xk,uk,,N1)=uk,,N1min[i=kN1(xi,ui)+f(xN)]=ukmin(xk,uk)+uk+1,,N1min[i=k+1N1(xi,ui)+f(xN)]=ukmin(xk,uk)+V(xk+1)(53)

Here we don’t stick with the assumption on linear dynamics and quadratic cost function any more, so we can’t use the Riccati equation to solve the value function. Instead, we can use differential-based approximation upon both dynamics and cost function.

3.1 Local Approximation

Start with the dynamics, we can use Taylor expansion to approximate the dynamics around the current state and control input, say ( x ˉ , u ˉ ) (\bar{\bm{x}},\bar{\bm{u}}) (xˉ,uˉ), then we can approximate the dynamics around ( x ˉ , u ˉ ) (\bar{\bm{x}},\bar{\bm{u}}) (xˉ,uˉ) as:

d x d t = f ( x , u ) = f ( x ˉ , u ˉ ) + ∇ x f ( x ˉ , u ˉ ) ( x − x ˉ ) ⏟ l i n e a r   t e r m + ∇ x f ( x ˉ , u ˉ ) ( u − u ˉ ) ⏟ l i n e a r   t e r m + 1 2 ∇ x x 2 f ( x ˉ , u ˉ ) ( x − x ˉ ) 2 ⏟ quadratic term + 1 2 ∇ u u 2 f ( x ˉ , u ˉ ) ( u − u ˉ ) 2 ⏟ quadratic term + f x u ( x ˉ , u ˉ ) ( x − x ˉ ) ( u − u ˉ ) ⏟ cross quaduatic lerm + O ( ∥ x − x ˉ ∥ 3 ) + O ( ∥ u − u ˉ ∥ 3 ) (54) \begin{aligned}\frac{d\boldsymbol{x}}{dt}=\boldsymbol{f}(\boldsymbol{x},\boldsymbol{u})=\boldsymbol{f}(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})&+\underbrace{\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})(\boldsymbol{x}-\bar{\boldsymbol{x}})}_{\mathrm{linear~term}}+\underbrace{\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})(\boldsymbol{u}-\bar{\boldsymbol{u}})}_{\mathrm{linear~term}}\\&+\underbrace{\frac12\nabla_{\boldsymbol{x}\boldsymbol{x}}^2f(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})(\bm{x}-\bar{\boldsymbol{x}})^2}_{\text{quadratic term}}+\underbrace{\frac12\nabla_{\boldsymbol{u}\boldsymbol{u}}^2f(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})(\boldsymbol{u}-\bar{\boldsymbol{u}})^2}_{\text{quadratic term}}\\&+\underbrace{\boldsymbol{f}_{\boldsymbol{x}\boldsymbol{u}}(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})(\boldsymbol{x}-\bar{\boldsymbol{x}})(\boldsymbol{u}-\bar{\boldsymbol{u}})}_{\text{cross quaduatic lerm}}\\&+\mathcal{O}\left(\|\bm{x}-\bar{\boldsymbol{x}}\|^3\right)+\mathcal{O}\left(\|\boldsymbol{u}-\bar{\boldsymbol{u}}\|^3\right)\end{aligned}\tag{54} dtdx=f(x,u)=f(xˉ,uˉ)+linear term xf(xˉ,uˉ)(xxˉ)+linear term xf(xˉ,uˉ)(uuˉ)+quadratic term 21xx2f(xˉ,uˉ)(xxˉ)2+quadratic term 21uu2f(xˉ,uˉ)(uuˉ)2+cross quaduatic lerm fxu(xˉ,uˉ)(xxˉ)(uuˉ)+O(xxˉ3)+O(uuˉ3)(54)

or we can put it into a more compact matrix form with notation δ x = x − x ˉ \delta \bm{x}=\bm{x}-\bar{\bm{x}} δx=xxˉ and δ u = u − u ˉ \delta \bm{u}=\bm{u}-\bar{\bm{u}} δu=uuˉ:

d δ x d t = d ( x − x ˉ ) d t = f ( x , u ) − f ( x ˉ , u ˉ ) ≈ [ ∇ x f ( x ˉ , u ˉ ) ∇ x f ( x ˉ , u ˉ ) ] T [ δ x δ u ] + 1 2 [ δ x δ u ] T [ ∇ x x 2 f ( x ˉ , u ˉ ) ∇ x u 2 f ( x ˉ , u ˉ ) ∇ u x 2 f ( x ˉ , u ˉ ) ∇ u u 2 f ( x ˉ , u ˉ ) ] [ δ x δ u ] (55) \begin{aligned}\frac{d\delta\boldsymbol{x}}{dt}&=\frac{d(\boldsymbol{x}-\bar{\boldsymbol{x}})}{dt}=\boldsymbol{f}(\boldsymbol{x},\boldsymbol{u})-\boldsymbol{f}(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})\\&\approx\left[\begin{array}{c}\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})\\\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})\end{array}\right]^T\left[\begin{array}{c}\delta x\\\delta u\end{array}\right]+\frac12\left[\begin{array}{c}\delta x\\\delta u\end{array}\right]^T\left[\begin{array}{c}\nabla_{\boldsymbol{x}\boldsymbol{x}}^2f(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})&\nabla_{\boldsymbol{x}\boldsymbol{u}}^2f(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})\\\nabla_{\boldsymbol{u}\boldsymbol{x}}^2f(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})&\nabla_{\boldsymbol{u}\boldsymbol{u}}^2f(\bar{\boldsymbol{x}},\bar{\boldsymbol{u}})\end{array}\right]\left[\begin{array}{c}\delta x\\\delta u\end{array}\right]\end{aligned}\tag{55} dtdδx=dtd(xxˉ)=f(x,u)f(xˉ,uˉ)[xf(xˉ,uˉ)xf(xˉ,uˉ)]T[δxδu]+21[δxδu]T[xx2f(xˉ,uˉ)ux2f(xˉ,uˉ)xu2f(xˉ,uˉ)uu2f(xˉ,uˉ)][δxδu](55)

Then similarly for the value function, we have
V ( x k , u k ) = min ⁡ u k ℓ ( x k , u k ) + V ( x k + 1 ) = min ⁡ δ u k ℓ ( x ˉ k + δ x k , u ˉ k + δ u k ) + V ( x ˉ k + 1 + δ x k + 1 ) (56) \begin{aligned}V(\boldsymbol{x}_k,\boldsymbol{u}_k)&=\min_{\boldsymbol{u}_k}\ell(\boldsymbol{x}_k,\boldsymbol{u}_k)+V(\boldsymbol{x}_{k+1})\\ &=\min_{\delta\boldsymbol{u}_{k}}\ell(\bar{\boldsymbol{x}}_{k}+\delta\boldsymbol{x}_{k},\bar{\boldsymbol{u}}_{k}+\delta\boldsymbol{u}_{k})+V(\bar{\boldsymbol{x}}_{k+1}+\delta\boldsymbol{x}_{k+1})\end{aligned}\tag{56} V(xk,uk)=ukmin(xk,uk)+V(xk+1)=δukmin(xˉk+δxk,uˉk+δuk)+V(xˉk+1+δxk+1)(56)
as in DP backward iteration, we need to express the value function at next time step k + 1 k + 1 k+1 as a function of the current state x k \bm{x}_k xk and control u k \bm{u}_k uk in order to solve for u k ∗ . \bm{u}_k^*. uk. Hence we can use the Taylor expansior again:

V ( x ˉ k + 1 + δ x k + 1 ) = V ( x ˉ k + 1 ) + ∇ x V ( x ˉ k + 1 ) T δ x k + 1 + 1 2 δ x k + 1 T ∇ x x 2 V ( x ˉ k + 1 ) δ x k + 1 + O ( ∥ δ x k + 1 ∥ 3 ) (57) \begin{aligned}V(\bar{\boldsymbol{x}}_{k+1}+\delta\boldsymbol{x}_{k+1})&=V(\bar{\boldsymbol{x}}_{k+1})+\nabla_{\boldsymbol{x}}V(\bar{\boldsymbol{x}}_{k+1})^{T}\delta\boldsymbol{x}_{k+1}+\frac{1}{2}\delta\boldsymbol{x}_{k+1}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V(\bar{\boldsymbol{x}}_{k+1})\delta\boldsymbol{x}_{k+1}\\ &+\mathcal{O}\left(\|\delta\boldsymbol{x}_{k+1}\|^3\right)\end{aligned}\tag{57} V(xˉk+1+δxk+1)=V(xˉk+1)+xV(xˉk+1)Tδxk+1+21δxk+1Txx2V(xˉk+1)δxk+1+O(δxk+13)(57)

to go further, we need to express δ x k + 1 \delta \bm{x}_{k+1} δxk+1 as a function of δ x k \delta\boldsymbol{x}_k δxk and δ u k \delta\boldsymbol{u}_k δuk, which can be done by the dynamics in Eq.55 as
δ x k + 1 ≈ δ x k + [ ∇ x f ( x ˉ k , u ˉ k ) ∇ x f ( x ˉ k , u ˉ k ) ] T [ δ x k δ u k ] + 1 2 [ δ x k δ u k ] T [ ∇ x x 2 f ( x ˉ k , u ˉ k ) ∇ x u 2 f ( x ˉ k , u ˉ k ) ∇ u x 2 f ( x ˉ k , u ˉ k ) ∇ u u 2 f ( x ˉ k , u ˉ k ) ] [ δ x k δ u k ] (58) \begin{aligned}\delta\boldsymbol{x}_{k+1}&\approx\delta\boldsymbol{x}_{k}+\left[\begin{array}{c}\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\\\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\end{array}\right]^{T}\left[\begin{array}{c}\delta\boldsymbol{x}_{k}\\\delta\boldsymbol{u}_{k}\end{array}\right]\\&+\frac{1}{2}\left[\begin{array}{c}\delta\boldsymbol{x}_{k}\\\delta\boldsymbol{u}_{k}\end{array}\right]^{T}\left[\begin{array}{c}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})&\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\\\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})&\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\end{array}\right]\left[\begin{array}{c}\delta\boldsymbol{x}_{k}\\\delta\boldsymbol{u}_{k}\end{array}\right]\\\end{aligned}\tag{58} δxk+1δxk+[xf(xˉk,uˉk)xf(xˉk,uˉk)]T[δxkδuk]+21[δxkδuk]T[xx2f(xˉk,uˉk)ux2f(xˉk,uˉk)xu2f(xˉk,uˉk)uu2f(xˉk,uˉk)][δxkδuk](58)

Here we trim the 2 n d 2^{nd} 2nd order term for simplicity, and use the non-matrix for for plugging it back into approximating the value function at k + 1 k + 1 k+1 as in Eq.57
δ x k + 1 ≈ δ x k + ∇ x f ( x ˉ k , u ˉ k ) δ x k + ∇ u f ( x ˉ k , u ˉ k ) δ u k = ( I + ∇ x f ( x ˉ k , u ˉ k ) ) ⏟ ∇ x f ~ ( x ˉ k , u ˉ k ) δ x k + ∇ u f ( x ˉ k , u ˉ k ) δ u k (59) \begin{aligned}\delta\boldsymbol{x}_{k+1}&\approx\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{u}_{k}\\&=\underbrace{(\boldsymbol{I}+\nabla_{\boldsymbol{x}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k}))}_{\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{u}_{k}\end{aligned}\tag{59} δxk+1δxk+xf(xˉk,uˉk)δxk+uf(xˉk,uˉk)δuk=xf~(xˉk,uˉk) (I+xf(xˉk,uˉk))δxk+uf(xˉk,uˉk)δuk(59)
which yields
V ( x ˉ k + 1 + δ x k + 1 ) ≈ V ( x ˉ k + 1 ) + ∇ x V ( x ˉ k + 1 ) T [ ∇ x f ~ ( x ˉ k , u ˉ k ) δ x k + ∇ u f ( x ˉ k , u ˉ k ) δ u k ] + 1 2 [ ∇ x f ~ ( x ˉ k , u ˉ k ) δ x k + ∇ u f ( x ˉ k , u ˉ k ) δ u k ] T ∇ x x 2 V ( x ˉ k + 1 ) + [ ∇ x f ~ ( x ˉ k , u ˉ k ) δ x k + ∇ u f ( x ˉ k , u ˉ k ) δ u k ] (60) \begin{aligned}V(\bar{\boldsymbol{x}}_{k+1}+\delta\boldsymbol{x}_{k+1})\approx&V(\bar{\boldsymbol{x}}_{k+1})\\&+\nabla_{\boldsymbol{x}}V(\bar{\boldsymbol{x}}_{k+1})^{T}\left[\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{u}_{k}\right]\\&+\frac{1}{2}\left[\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{u}_{k}\right]^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V(\bar{\boldsymbol{x}}_{k+1})\\&+\left[\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}\boldsymbol{f}(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{u}_{k}\right]\end{aligned}\tag{60} V(xˉk+1+δxk+1)V(xˉk+1)+xV(xˉk+1)T[xf~(xˉk,uˉk)δxk+uf(xˉk,uˉk)δuk]+21[xf~(xˉk,uˉk)δxk+uf(xˉk,uˉk)δuk]Txx2V(xˉk+1)+[xf~(xˉk,uˉk)δxk+uf(xˉk,uˉk)δuk](60)

as shown above, the Taylor expansion equation can be pretty long and will get even longer afterwards, hence we need to adapt some notations to make it more compact.
V ( x ˉ k + 1 ) = V ′ f ( x ˉ k , u ˉ k ) = f (61) V(\bar{\boldsymbol{x}}_{k+1})=V'\\ \bm{f}(\bar{\bm{x}}_k,\bar{\bm{u}}_k)=\bm{f}\tag{61} V(xˉk+1)=Vf(xˉk,uˉk)=f(61)

then
V ( x ˉ k + 1 + δ x k + 1 ) ≈ V ′ + ∇ x V ′ T [ ∇ x f ~ δ x k + ∇ u f δ u k ] + 1 2 [ ∇ x f ~ δ x k + ∇ u f δ u k ] T ∇ x x 2 V ′ [ ∇ x f ~ δ x k + ∇ u f δ u k ] = V ′ + ∇ x V ′ T ∇ x f ~ δ x k + ∇ x V ′ T ∇ u f δ u k + 1 2 δ x k T ∇ x f ~ T ∇ x x 2 V ′ ∇ x f ~ δ x k + 1 2 δ x k T ∇ x f ~ T ∇ x x 2 V ′ ∇ u f δ u k + 1 2 δ u k T ∇ u f T ∇ x x 2 V ′ ∇ x f ~ δ x k + 1 2 δ u k T ∇ u f T ∇ x x 2 V T ∇ x x 2 V ′ ∇ u f δ u k (62) \begin{aligned}V(\bar{\boldsymbol{x}}_{k+1}+\delta\boldsymbol{x}_{k+1})&\approx V^{\prime}+\nabla_{\boldsymbol{x}}V^{\prime T}\left[\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}f\delta\boldsymbol{u}_{k}\right]\\ &\quad\quad+\frac{1}{2}\left[\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}f\delta\boldsymbol{u}_{k}\right]^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\left[\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}f\delta\boldsymbol{u}_{k}\right]\\ &=V^{\prime}+\nabla_{\boldsymbol{x}}V^{\prime T}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{x}}V^{\prime T}\nabla_{\boldsymbol{u}}f\delta\boldsymbol{u}_{k}\\ &\quad\quad+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}\delta\boldsymbol{x}_{k}+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{u}}f\delta\boldsymbol{u}_{k} \\ &\quad\quad+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}\nabla_{\boldsymbol{u}}\bm{f}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^2V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}\delta\boldsymbol{x}_{k}+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}\nabla_{\boldsymbol{u}}\bm{f}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^2V^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{u}}\bm{f}\delta\boldsymbol{u}_{k} \end{aligned}\tag{62} V(xˉk+1+δxk+1)V+xVT[xf~δxk+ufδuk]+21[xf~δxk+ufδuk]Txx2V[xf~δxk+ufδuk]=V+xVTxf~δxk+xVTufδuk+21δxkTxf~Txx2Vxf~δxk+21δxkTxf~Txx2Vufδuk+21δukTufTxx2Vxf~δxk+21δukTufTxx2VTxx2Vufδuk(62)

By now we have approximated the latter part of the RHS of Eq. 56, and we still need the second order expansion of the running cost function to fully approximate the value function, which is pretty straightforward:
ℓ ( x k , u k ) ≈ ℓ ( x ˉ k , u ˉ k ) + ∇ x ℓ ( x ˉ k , u ˉ k ) T δ x k + ∇ u ℓ ( x ˉ k , u ˉ k ) T δ u k + 1 2 δ x k T ∇ x x 2 ℓ ( x ˉ k , u ˉ k ) T δ x k + 1 2 δ u k T ∇ u u 2 ℓ ( x ˉ k , u ˉ k ) δ u k + 1 2 δ x k T ∇ x u 2 ℓ ( x ˉ k , u ˉ k ) δ u k + 1 2 δ u k T ∇ u x 2 ℓ ( x ˉ k , u ˉ k ) δ x k (63) \begin{aligned}\ell(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&\approx\ell(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})+\nabla_{\boldsymbol{x}}\ell(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}\ell(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}\delta\boldsymbol{u}_{k}\\&+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}\ell(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}\delta\boldsymbol{x}_{k}+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}\ell(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{u}_{k}\\&+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}\ell(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{u}_{k}+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}\ell(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})\delta\boldsymbol{x}_{k}\end{aligned}\tag{63} (xk,uk)(xˉk,uˉk)+x(xˉk,uˉk)Tδxk+u(xˉk,uˉk)Tδuk+21δxkTxx2(xˉk,uˉk)Tδxk+21δukTuu2(xˉk,uˉk)δuk+21δxkTxu2(xˉk,uˉk)δuk+21δukTux2(xˉk,uˉk)δxk(63)

Now we can plug Eq 62 and 63 into Eq. 56 and get the final approximation of the value function:
V ( x k , u k ) = min ⁡ u k [ ℓ + V ′ + ( ∇ x f ~ T ∇ x V ′ + ∇ x ℓ ) T δ x k + ( ∇ u f T ∇ x V ′ + ∇ u ℓ ) T δ u k + 1 2 δ x k T ( ∇ x f ~ T ∇ x x 2 V ′ ∇ x f ~ + ∇ x x 2 ℓ ) δ x k + 1 2 δ x k T ( ∇ x f ~ T ∇ x x 2   V ′ ∇ u f + ∇ x u 2 ℓ ) δ u k + 1 2 δ u k T ( ∇ u f T ∇ x x 2 V ′ ∇ x f ~ + ∇ u x 2 ℓ ) δ x k + 1 2 δ u k T ( ∇ u f T ∇ x x 2 V ′ ∇ u f + ∇ u u 2 ℓ ) δ u k ] (64) \begin{aligned} V(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\min_{\boldsymbol{u}_{k}}[\ell+V^{\prime}+(\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}}V^{\prime}+\nabla_{\boldsymbol{x}}\ell)^{T}\delta\boldsymbol{x}_{k}+(\nabla_{\boldsymbol{u}}\boldsymbol{f}^{T}\nabla_{\boldsymbol{x}}V^{\prime}+\nabla_{\boldsymbol{u}}\ell)^{T}\delta\boldsymbol{u}_{k}\\ &+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}(\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}+\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}\ell)\delta\boldsymbol{x}_{k}\\ &+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}(\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}\:V^{\prime}\nabla_{\boldsymbol{u}}\boldsymbol{f}+\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}\ell)\delta\boldsymbol{u}_{k}\\ &+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}(\nabla_{\boldsymbol{u}}\boldsymbol{f}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}+\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}\ell)\delta\boldsymbol{x}_{k}\\ &+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}(\nabla_{\boldsymbol{u}}\boldsymbol{f}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{u}}\boldsymbol{f}+\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}\ell)\delta\boldsymbol{u}_{k}] \end{aligned}\tag{64} V(xk,uk)=ukmin[+V+(xf~TxV+x)Tδxk+(ufTxV+u)Tδuk+21δxkT(xf~Txx2Vxf~+xx2)δxk+21δxkT(xf~Txx2Vuf+xu2)δuk+21δukT(ufTxx2Vxf~+ux2)δxk+21δukT(ufTxx2Vuf+uu2)δuk](64)

theoretically, we can now set ∇ u k V ( x , u ) = 0 \nabla_{\bm{u}_k}V(\bm{x},\bm{u})=0 ukV(x,u)=0 to get the optimal control input u k ∗ u_k^* uk at step k k k. However, please note these terms in brackets are all derivatives of the dynamics, cost function and value function, hence we can bring in yet another handy notation to make the formulas more compact:
S ( x , u ) = ℓ ( x , u ) + V ′ ( f ( x , u ) ) (65) S(\boldsymbol{x},\boldsymbol{u})=\ell(\boldsymbol{x},\boldsymbol{u})+V'(\bm{f}(\boldsymbol{x},\boldsymbol{u}))\tag{65} S(x,u)=(x,u)+V(f(x,u))(65)

which is commonly referred as the state-action value function. Now we can rewrite the ”optimal” value function at stage k k k as:
V ( x k ) = min ⁡ u k S ( x k , u k ) = min ⁡ u k S ( x ˉ k + δ x , u ˉ k + δ u ) = min ⁡ u k [ S ( x ˉ k , u ˉ k ) + ∇ x S ( x ˉ k , u ˉ k ) T δ x k + ∇ u S ( x ˉ k , u ˉ k ) T δ u k + 1 2 δ x k T ∇ x x 2 S ( x k , u k ) δ x k + 1 2 δ x k T ∇ x u 2 S ( x k , u k ) δ u k + 1 2 δ u k T ∇ u x 2 S ( x k , u k ) δ x k + 1 2 δ u k T ∇ u u 2 S ( x k , u k ) δ u k ] (66) \begin{aligned}V(\boldsymbol{x}_{k})&=\min_{\boldsymbol{u}_{k}}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\\ &=\min_{\boldsymbol{u}_{k}}S(\bar{\boldsymbol{x}}_{k}+\delta\boldsymbol{x},\bar{\boldsymbol{u}}_{k}+\delta\boldsymbol{u})\\ &=\min_{\boldsymbol{u}_{k}}[S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})+\nabla_{\boldsymbol{x}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}\delta\boldsymbol{u}_{k}\\ &+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{x}_{k}+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}{\boldsymbol{u}}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{u}_{k}\\ &+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{x}_{k}+\frac{1}{2}\delta\boldsymbol{u}_{k}^{T}\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{u}_{k}]\end{aligned}\tag{66} V(xk)=ukminS(xk,uk)=ukminS(xˉk+δx,uˉk+δu)=ukmin[S(xˉk,uˉk)+xS(xˉk,uˉk)Tδxk+uS(xˉk,uˉk)Tδuk+21δxkTxx2S(xk,uk)δxk+21δxkTxu2S(xk,uk)δuk+21δukTux2S(xk,uk)δxk+21δukTuu2S(xk,uk)δuk](66)

where the derivatives of the state-action value function are defined as in Eq. 64:
∇ x S ( x ˉ k , u ˉ k ) = ∇ x f ~ T ∇ x V ′ + ∇ x ℓ ∇ u S ( x ˉ k , u ˉ k ) = ∇ u f T ∇ x V ′ + ∇ u ℓ ∇ x x 2 S ( x k , u k ) = ∇ x f ~ T ∇ x x 2 V ′ ∇ x f ~ + ∇ x x 2 ℓ ∇ x u 2 S ( x k , u k ) = ∇ x f ~ T ∇ x x 2 V ′ ∇ u f + ∇ x u 2 ℓ ∇ u x 2 S ( x k , u k ) = ∇ u f T ∇ x x 2 V ′ ∇ x f ~ + ∇ u x 2 ℓ ∇ u u 2 S ( x k , u k ) = ∇ u f T ∇ x x 2 V ′ ∇ u f + ∇ u u 2 ℓ (67) \begin{aligned} \nabla_{\boldsymbol{x}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})&=\nabla_{\boldsymbol{x}}\tilde{f}^{T}\nabla_{\boldsymbol{x}}V^{\prime}+\nabla_{\boldsymbol{x}}\ell \\ \nabla_{\boldsymbol{u}}S(\bar{\boldsymbol{x}}_k,\bar{\boldsymbol{u}}_k)&=\nabla_{\boldsymbol{u}}f^T\nabla_{\boldsymbol{x}}V^{\prime}+\nabla_{\boldsymbol{u}}\ell \\ \nabla_{\boldsymbol{xx}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{xx}}^{2}V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}+\nabla_{\boldsymbol{xx}}^{2}\ell \\ \nabla_{\boldsymbol{xu}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{xx}}^{2}V^{\prime}\nabla_{\boldsymbol{u}}f+\nabla_{\boldsymbol{xu}}^{2}\ell \\ \nabla_{\boldsymbol{ux}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{u}}f^{T}\nabla_{\boldsymbol{xx}}^{2}V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}+\nabla_{\boldsymbol{ux}}^{2}\ell \\ \nabla_{\boldsymbol{uu}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{u}}f^{T}\nabla_{\boldsymbol{xx}}^{2}V^{\prime}\nabla_{\boldsymbol{u}}\boldsymbol{f}+\nabla_{\boldsymbol{uu}}^{2}\ell \end{aligned} \tag{67} xS(xˉk,uˉk)uS(xˉk,uˉk)xx2S(xk,uk)xu2S(xk,uk)ux2S(xk,uk)uu2S(xk,uk)=xf~TxV+x=ufTxV+u=xf~Txx2Vxf~+xx2=xf~Txx2Vuf+xu2=ufTxx2Vxf~+ux2=ufTxx2Vuf+uu2(67)

still, we can re-introduce the second order Taylor expansion of the dynamics which we have omitted earlier:
∇ x S ( x ˉ k , u ˉ k ) = ∇ x f ~ T ∇ x V ′ + ∇ x ℓ ∇ u S ( x ˉ k , u ˉ k ) = ∇ u f T ∇ x V ′ + ∇ u ℓ ∇ x x 2 S ( x k , u k ) = ∇ x f ~ T ∇ x x 2 V ′ ∇ x f ~ + ∇ x V ′ T ∇ x x 2 f ~ + ∇ x x 2 ℓ ∇ x u 2 S ( x k , u k ) = ∇ x f ~ T ∇ x x 2 V ′ ∇ u f + ∇ x V ′ T ∇ x u 2 f + ∇ x u 2 ℓ ∇ u x 2 S ( x k , u k ) = ∇ u f T ∇ x x 2 V ′ ∇ x f ~ + ∇ x V ′ T ∇ u x 2 f + ∇ u x 2 ℓ ∇ u u 2 S ( x k , u k ) = ∇ u f T ∇ x x 2 V ′ ∇ u f + ∇ x V ′ T ∇ u u 2 f + ∇ u u 2 ℓ (68) \begin{aligned} \nabla_{\boldsymbol{x}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})&=\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}}V^{\prime}+\nabla_{\boldsymbol{x}}\ell\\ \nabla_{\boldsymbol{u}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})&=\nabla_{\boldsymbol{u}}f^{T}\nabla_{\boldsymbol{x}}V^{\prime}+\nabla_{\boldsymbol{u}}\ell\\ \nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}+\nabla_{\boldsymbol{x}}V^{\prime T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}\tilde{\boldsymbol{f}}+\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}\ell\\ \nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{u}}\boldsymbol{f}+\nabla_{\boldsymbol{x}}V^{\prime T}\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}\boldsymbol{f}+\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}\ell\\ \nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{u}}\boldsymbol{f}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{x}}\tilde{\boldsymbol{f}}+\nabla_{\boldsymbol{x}}V^{\prime T}\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}\boldsymbol{f}+\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}\ell\\ \nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})&=\nabla_{\boldsymbol{u}}\boldsymbol{f}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}V^{\prime}\nabla_{\boldsymbol{u}}{\boldsymbol{f}}+\nabla_{\boldsymbol{x}}V^{\prime T}\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}\boldsymbol{f}+\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}\ell \end{aligned}\tag{68} xS(xˉk,uˉk)uS(xˉk,uˉk)xx2S(xk,uk)xu2S(xk,uk)ux2S(xk,uk)uu2S(xk,uk)=xf~TxV+x=ufTxV+u=xf~Txx2Vxf~+xVTxx2f~+xx2=xf~Txx2Vuf+xVTxu2f+xu2=ufTxx2Vxf~+xVTux2f+ux2=ufTxx2Vuf+xVTuu2f+uu2(68)

these two sets of derivatives make iterative LQR and DDP respectively. The difference between the two methods is that iterative LQR uses the first order Taylor expansion of the dynamics, while DDP uses the second order Taylor expansion of the dynamics.

Now we can solve the optimal control input u k ∗ \bm{u}_k^* uk by letting the gradient of the value function in Eq. 66 over u \bm{u} u be zero:

∇ u S ( x k , u k ) = ∇ u S ( x ˉ k , u ˉ k ) T + δ x k T ∇ x u 2 S ( x k , u k ) + δ u k T ∇ u u 2 S ( x k , u k ) = 0    ⟹    u k ∗ = − ∇ u u 2 S − 1 ∇ u S ( x ˉ k , u ˉ k ) ⏟ feedforward term k − ∇ u u 2 S − 1 ∇ u x 2 S δ x k ⏟ feedback term K (69) \begin{aligned}&\nabla_{\boldsymbol{u}}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})=\nabla_{\boldsymbol{u}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}+\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})+\delta\boldsymbol{u}_{k}^{T}\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})=0\\&\implies\boldsymbol{u}_{k}^{*}=\underbrace{-\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S^{-1}\nabla_{\boldsymbol{u}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})}_{\text{feedforward term k}}\underbrace{-\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S^{-1}\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}S\delta\boldsymbol{x}_{k}}_{\text{feedback term K}}\end{aligned}\tag{69} uS(xk,uk)=uS(xˉk,uˉk)T+δxkTxu2S(xk,uk)+δukTuu2S(xk,uk)=0uk=feedforward term k uu2S1uS(xˉk,uˉk)feedback term K uu2S1ux2Sδxk(69)

3.2 Backward Pass

Given the one step optimal policy u k ∗ \bm{u}_k^* uk, we can start from the terminal step k = N k = N k=N and working backwards as we did in Dynamic Programming with Bellman’s equation.

Unlike normal DP, with the action-value function introduced, we still need the recursion for S S S to propagate, and also we are relying heavily on the derivatives of the value function at the next step (in other words, we should calculate and store that for the next iteration backwards at current step), in order to tackle that, we can plug the optimal control input u k ∗ \bm{u}_k^* uk back into Eq. 66.

V ( x k ) = S ( x k , u k ) = S ( x ˉ k + δ x , u ˉ k + δ u ) = S ( x ˉ k , u ˉ k ) + ∇ x S ( x ˉ k , u ˉ k ) T δ x k + ∇ u S ( x ˉ k , u ˉ k ) T δ u k ∗ + 1 2 δ x k T ∇ x x 2 S ( x k , u k ) δ x k + 1 2 δ x k T ∇ x u 2 S ( x k , u k ) δ u k ∗ + 1 2 δ u k ∗ T ∇ u x 2 S ( x k , u k ) δ x k + 1 2 δ u k ∗ T ∇ u u 2 S ( x k , u k ) δ u k ∗ (70) \begin{aligned}V(\boldsymbol{x}_{k})=&S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\\=&S(\bar{\boldsymbol{x}}_{k}+\delta\boldsymbol{x},\bar{\boldsymbol{u}}_{k}+\delta\boldsymbol{u})\\=&S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})+\nabla_{\boldsymbol{x}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}\delta\boldsymbol{x}_{k}+\nabla_{\boldsymbol{u}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})^{T}\delta\boldsymbol{u}_{k}^{*}\\&+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{x}_{k}+\frac{1}{2}\delta\boldsymbol{x}_{k}^{T}\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{u}_{k}^{*}\\&+\frac{1}{2}\delta\boldsymbol{u}_{k}^{*T}\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{x}_{k}+\frac{1}{2}\delta\boldsymbol{u}_{k}^{*T}\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S(\boldsymbol{x}_{k},\boldsymbol{u}_{k})\delta\boldsymbol{u}_{k}^{*}\end{aligned}\tag{70} V(xk)===S(xk,uk)S(xˉk+δx,uˉk+δu)S(xˉk,uˉk)+xS(xˉk,uˉk)Tδxk+uS(xˉk,uˉk)Tδuk+21δxkTxx2S(xk,uk)δxk+21δxkTxu2S(xk,uk)δuk+21δukTux2S(xk,uk)δxk+21δukTuu2S(xk,uk)δuk(70)

where
u k ∗ = − ∇ u u 2 S − 1 ∇ u S ( x ˉ k , u ˉ k ) − ∇ u u 2 S − 1 ∇ u x 2 S δ x k (71) \boldsymbol{u}_{k}^{*}=-\nabla_{\boldsymbol{uu}}^{2}S^{-1}\nabla_{\boldsymbol{u}}S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})-\nabla_{\boldsymbol{uu}}^{2}S^{-1}\nabla_{\boldsymbol{ux}}^{2}S\delta\boldsymbol{x}_{k}\tag{71} uk=uu2S1uS(xˉk,uˉk)uu2S1ux2Sδxk(71)
reorgnizing the above equation leads to
V ( x k ) = S ( x ˉ k , u ˉ k ) − 1 2 ∇ u S T ∇ u u 2 S − 1 ∇ u S + ( ∇ x S T − ∇ u S T ∇ u u 2 S − 1 ∇ u , x 2 S ) δ x + 1 2 δ x T ( ∇ x x 2 S T − ∇ x u 2 S ∇ u u 2 S − 1 ∇ u x 2 S ) δ x (72) \begin{aligned}V(\boldsymbol{x}_{k})&=S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})-\frac{1}{2}\nabla_{\boldsymbol{u}}S^{T}\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S^{-1}\nabla_{\boldsymbol{u}}S+(\nabla_{\boldsymbol{x}}S^{T}-\nabla_{\boldsymbol{u}}S^{T}\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S^{-1}\nabla_{\boldsymbol{u},\boldsymbol{x}}^{2}S)\delta\boldsymbol{x}\\&+\frac{1}{2}\delta\boldsymbol{x}^{T}(\nabla_{\boldsymbol{x}\boldsymbol{x}}^{2}S^{T}-\nabla_{\boldsymbol{x}\boldsymbol{u}}^{2}S\nabla_{\boldsymbol{u}\boldsymbol{u}}^{2}S^{-1}\nabla_{\boldsymbol{u}\boldsymbol{x}}^{2}S)\delta\boldsymbol{x}\end{aligned}\tag{72} V(xk)=S(xˉk,uˉk)21uSTuu2S1uS+(xSTuSTuu2S1u,x2S)δx+21δxT(xx2STxu2Suu2S1ux2S)δx(72)

which has a clear structure of ordered items from 0 to 2, i.e., we can approximately use
Δ V = S ( x ˉ k , u ˉ k ) − 1 2 ∇ u S T ∇ u u 2 S − 1 ∇ u S ∇ x V = ∇ x S T − ∇ u S T ∇ u u 2 S − 1 ∇ u , x 2 S ∇ x x V = ∇ x x 2 S T − ∇ x u 2 S ∇ u u 2 S − 1 ∇ u x 2 S (73) \begin{aligned} \Delta V&=S(\bar{\boldsymbol{x}}_{k},\bar{\boldsymbol{u}}_{k})-\frac{1}{2}\nabla_{u}S^{T}\nabla_{uu}^{2}S^{-1}\nabla_{u}S\\ \nabla_{\bm{x}}V&=\nabla_{x}S^{T}-\nabla_{u}S^{T}\nabla_{uu}^{2}S^{-1}\nabla_{u,x}^{2}S\\ \nabla_{\bm{xx}}V&=\nabla_{xx}^{2}S^{T}-\nabla_{xu}^{2}S\nabla_{uu}^{2}S^{-1}\nabla_{ux}^{2}S\tag{73} \end{aligned} ΔVxVxxV=S(xˉk,uˉk)21uSTuu2S1uS=xSTuSTuu2S1u,x2S=xx2STxu2Suu2S1ux2S(73)
for step k − 1 k-1 k1.

3.3 Line Search

Please be noted that the second order expansion of dynamics, cost function and value function can only hold locally, it can be possible that the u ∗ \bm{u}* u overshoots and leads to inefficiency or even unstable optimisation process. Hence we can introduce a line-search step for evluation the ferformance of applying scaled feedforward term by α = 0 ∼ 1 \alpha = 0 ∼ 1 α=01 during the iteration.

3.4 Forward Pass

Once we have finished the backward pass from the last timestep to the first one, we can roll-out the system response forward with dynamics.

Algorithm 1 DDP
Require: Initial State x 0 \bm{x}_0 x0, dynamics model x i + 1 = f ( x i , u i ) \bm{x}_{i+1}=f(\bm{x}_i,\bm{u}_i) xi+1=f(xi,ui), Initial reference control sequence U 0 \bm{U}_0 U0
Ensure: Optimal Control Sequence U ∗ \bm{U}^* U, Optimal State Trajectory X ∗ \bm{X}^* X
 1: function D D P ( f , l , l f ) DDP(f,l,l_f) DDP(f,l,lf)
 2:     Initial State x 0 \bm{x}_0 x0, Initial reference control sequence U 0 \bm{U}_0 U0
 3:     while not converged do
 4:          ∇ x V ( x N ) = ∇ x ℓ f \nabla_{\bm{x}}V(\boldsymbol{x}_N)=\nabla_{\boldsymbol{x}}\ell_f xV(xN)=xf
 5:          ∇ x x V ( x N ) = ∇ x x ℓ f \nabla_{\bm{xx}}V(\boldsymbol{x}_N)=\nabla_{\boldsymbol{x}\bm{x}}\ell_f xxV(xN)=xxf
 6:         for i = N − 1 → 0 i= N- 1\to 0 i=N10 do
 7:              k , K ← B a c k w a r d   P a s s \bm{k}, \bm{K}\leftarrow Backward\ Pass k,KBackward Pass
 8:         end for
 9:         for i = 0 → N − 1 i= 0\to N-1 i=0N1 do
10:             Forward Pass with line search
11:         end for
12:         return Optimal Trajectory
13:     end while
14: end function
15: function Forward Pass
16:     Initialization, x ˉ 0 ← x 0 , U = U 0 , α = 1 \bar {\bm{x}}_0\leftarrow \bm{x}_0, \boldsymbol{U}= \boldsymbol{U}_0, \alpha = 1 xˉ0x0,U=U0,α=1
17:     for i = 0 → N − 1 i= 0\to N-1 i=0N1 do
18:          u ‾ i ← u i + α k [ i ] + K [ i ] ( x ‾ i − x i ) \overline{\bm{u}}_i\leftarrow \bm{u}_i+\alpha k[i]+K[i](\overline{\bm{x}}_i-\bm{x}_i) uiui+αk[i]+K[i](xixi)
19:          x ‾ i + 1 = f ( x ‾ i , u ‾ i ) \overline{\bm{x}}_{i+1}=f(\overline{\bm{x}}_i,\overline{\bm{u}}_i) xi+1=f(xi,ui)
20:          X [ i ] ← x ‾ i \boldsymbol{X}[i]\leftarrow\overline{\bm{x}}_i X[i]xi
21:          U [ i ] ← u ‾ i \bm{U}[i]\leftarrow\overline{\bm{u}}_i U[i]ui
22:     end for
23:      α = ρ α \alpha=\rho\alpha α=ρα
24:      X [ N ] ← x ‾ N \bm{X}[N]\leftarrow\overline{\bm{x}}_N X[N]xN
25:      T ← { X , U } \bm{T}\leftarrow\{\bm{X},\bm{U}\} T{X,U}
26:      J = J ( T ) J=J(T) J=J(T)
27:     return j , T j,\bm T j,T
28: end function
29: function Backward Pass
30:     Initialization, V x ← l f , x ( x n ) , V x , x ← l f , x ( x n ) , k ← [ ] , K ← [ ] V_{\boldsymbol{x}}\leftarrow l_{f,\boldsymbol{x}}(\bm{x}_{n}),V_{\boldsymbol{x},\boldsymbol{x}}\leftarrow l_{f,\boldsymbol{x}}(\bm{x}_{n}),k\leftarrow[],K\leftarrow[] Vxlf,x(xn),Vx,xlf,x(xn),k[],K[]
31:     for i = N − 1 → 0 i= N- 1\to 0 i=N10 do
32:          S x ← l x ∣ x i + ( f x T V x ) ∣ x i S_{x}\leftarrow l_{x}|_{x_{i}}+(f_{x}^{T}V_{x})|_{x_{i}} Sxlxxi+(fxTVx)xi
33:          S u ← l u ∣ x i + ( f u T V x ) ∣ u i , x i S_{u}\leftarrow l_{u}|_{{\boldsymbol{x}_i}}+(f_{u}^{T}V_{x})|_{{\boldsymbol{u}_i,\boldsymbol{x}_i}} Suluxi+(fuTVx)ui,xi
34:          S x x ← l x x ∣ x i + ( f x T V x x f x ) ∣ x i S_{\boldsymbol{x}\boldsymbol{x}}\leftarrow l_{\boldsymbol{x}\boldsymbol{x}}|_{\boldsymbol{x}_i}+(f_x^TV_{\boldsymbol{xx}}f_x)|_{\boldsymbol{x}_i} Sxxlxxxi+(fxTVxxfx)xi
35:          S u u ← l u u ∣ u i + ( f u T V x x f u ) ∣ u i , x i , u i S_{\boldsymbol{u}\boldsymbol{u}}\leftarrow l_{\boldsymbol{u}\boldsymbol{u}}|_{\boldsymbol{u}_i}+(f_u^TV_{\boldsymbol{xx}}f_u)|_{\boldsymbol{u}_i,\boldsymbol{x}_i,\boldsymbol{u}_i} Suuluuui+(fuTVxxfu)ui,xi,ui
36:          S u x ← l u x ∣ u i , x i + ( f u T V x x f x ) ∣ u i , x i , x i S_{\boldsymbol{u}\boldsymbol{x}}\leftarrow l_{\boldsymbol{u}\boldsymbol{x}}|_{\boldsymbol{u}_i,\boldsymbol{x}_i}+(f_u^TV_{\boldsymbol{xx}}f_x)|_{\boldsymbol{u}_i,\boldsymbol{x}_i,\boldsymbol{x}_i} Suxluxui,xi+(fuTVxxfx)ui,xi,xi
37:          S ^ u ← l u ∣ x i + ( f u T ( V x + μ I n ) ) ∣ u i , x i \widehat{S}_{\bm{u}}\leftarrow l_{u}|_{{\boldsymbol{x}_i}}+(f_{u}^{T}(V_{x}+\mu \bm{I}_n))|_{{\boldsymbol{u}_i,\boldsymbol{x}_i}} S uluxi+(fuT(Vx+μIn))ui,xi
38:          S ~ u u ← l u u ∣ u i + ( f u T ( V x x + μ I n ) ) f u ) ∣ u i , x i , u i \widetilde{S}_{\boldsymbol{u}\boldsymbol{u}}\leftarrow l_{\boldsymbol{u}\boldsymbol{u}}|_{\boldsymbol{u}_i}+(f_u^T(V_{\boldsymbol{xx}}+\mu \bm{I}_n))f_u)|_{\boldsymbol{u}_i,\boldsymbol{x}_i,\boldsymbol{u}_i} S uuluuui+(fuT(Vxx+μIn))fu)ui,xi,ui
39:          S ~ u x ← l u x ∣ u i , x i + ( f u T ( V x x + μ I n ) f x ) ∣ u i , x i , x i \widetilde{S}_{\boldsymbol{u}\boldsymbol{x}}\leftarrow l_{\boldsymbol{u}\boldsymbol{x}}|_{\boldsymbol{u}_i,\boldsymbol{x}_i}+(f_u^T(V_{\boldsymbol{xx}}+\mu \bm{I}_n)f_x)|_{\boldsymbol{u}_i,\boldsymbol{x}_i,\boldsymbol{x}_i} S uxluxui,xi+(fuT(Vxx+μIn)fx)ui,xi,xi
40:          k [ i ] ← − S ~ u u − 1 S ^ u k[i]\leftarrow-\widetilde{S}_{\boldsymbol{uu}}^{-1}\widehat{S}_{\boldsymbol{u}} k[i]S uu1S u
41:          K [ i ] ← − S ~ u u − 1 S ~ u x K[i]\leftarrow -\widetilde{S}_{\boldsymbol{uu}}^{-1}\widetilde{S}_{\boldsymbol{u}\bm x} K[i]S uu1S ux
42:          V x ← S x − K T S u u k V_{x}\leftarrow S_{x}-K^{T}S_{\boldsymbol{uu}}k VxSxKTSuuk
43:          V x x ← S x x − K T S u u K V_{xx}\leftarrow S_{xx}-K^TS_{uu}K VxxSxxKTSuuK
44:     end for
45:     return k \bm{k} k, K \bm{K} K
46: end function
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值