增广拉格朗日方法在iLQR算法中的应用
背景
在之前的文章《iLQR算法公式推导》中,我们推导了iLQR算法在无约束情况下的迭代公式,但是在实际应用中,我们往往需要考虑约束条件。本文将介绍如何使用增广拉格朗日法Augmented Lagrangian Methods来处理约束条件。
约束优化问题
具有约束的优化问题具有如下形式:
min x 0 : N , u 0 : N − 1 ℓ N ( x N ) + ∑ k = 0 N − 1 ℓ k ( x k , u k ) s . t . x k + 1 = f ( x k , u k ) , k = 0 , ⋯ , N − 1 g k ( x k , u k ) ≤ 0 , h k ( x k , u k ) = 0 , (1) \begin{aligned} \min_{x_{0:N},u_{0:N-1}} \quad & \ell_N(x_N)+\sum_{k=0}^{N-1}\ell_k(x_k,u_k)\\ s.t. \quad & x_{k+1}=f(x_k,u_k),\quad k=0,\cdots,N-1\\ & g_k(x_k,u_k)\leq 0,\\ & h_k(x_k,u_k)=0, \end{aligned} \tag{1} x0:N,u0:N−1mins.t.ℓN(xN)+k=0∑N−1ℓk(xk,uk)xk+1=f(xk,uk),k=0,⋯,N−1gk(xk,uk)≤0,hk(xk,uk)=0,(1)
其中 k k k是时间步, x k ∈ R n x x_k\in\mathbb{R}^{n_x} xk∈Rnx是状态, u k ∈ R n u u_k\in\mathbb{R}^{n_u} uk∈Rnu是控制量, f f f是状态转移函数, ℓ f \ell_f ℓf是终端损失函数, ℓ k \ell_k ℓk是中间损失函数, g k g_k gk是不等式约束, h k h_k hk是等式约束。
增广拉格朗日法
将约束优化问题转化为增广拉格朗日函数的形式:
L ( x 0 : N , u 0 : N − 1 , λ 0 : N , μ 0 : N − 1 ) = ℓ N ( x N ) + ( λ N + 1 2 I μ N c N ( x N ) ) T c N ( x N ) + ∑ k = 0 N − 1 [ ℓ k ( x k , u k ) + ( λ k + 1 2 I μ k c k ( x k , u k ) ) T c k ( x k , u k ) ] = L N ( x N , λ N , μ N ) + ∑ k = 0 N − 1 L k ( x k , u k , λ k , μ k ) , (2) \begin{aligned} \mathcal{L}(x_{0:N},u_{0:N-1},\lambda_{0:N},\mu_{0:N-1})&=\ell_N(x_N)+(\lambda_N + \frac{1}{2}I_{\mu_N}c_N(x_N))^Tc_N(x_N)\\ &+\sum_{k=0}^{N-1}\left[\ell_k(x_k,u_k)+(\lambda_k + \frac{1}{2}I_{\mu_k}c_k(x_k,u_k))^Tc_k(x_k,u_k)\right]\\ &=\mathcal{L}_N(x_N,\lambda_N,\mu_N)+\sum_{k=0}^{N-1}\mathcal{L}_k(x_k,u_k,\lambda_k,\mu_k), \end{aligned} \tag{2} L(x0:N,u0:N−1,λ0:N,μ0:N−1)=ℓN(xN)+(λN+21IμNcN(xN))TcN(xN)+k=0∑N−1[ℓk(xk,uk)+(λk+21Iμkck(xk,uk))Tck(xk,uk)]=LN(xN,λN,μN)+k=0∑N−1Lk(xk,uk,λk,μk),(2)
其中 λ k ∈ R p k \lambda_k\in\mathbb{R}^{p_k} λk∈Rpk是拉格朗日乘子, μ k ∈ R p k \mu_k\in\mathbb{R}^{p_k} μk∈Rpk是惩罚系数, c k = ( g k , h k ) ∈ R p k c_k=(g_k,h_k)\in\mathbb{R}^{p_k} ck=(gk,hk)∈Rpk是不等式约束和等式约束的组合,相应的,不等式约束的序号和等式约束的序号的集合分别是 I k \mathcal{I}_k Ik和 E k \mathcal{E}_k Ek, I μ k I_{\mu_k} Iμk是对角矩阵,它的定义如下:
I μ k , i i = { 0 , if c k i < 0 ∧ λ k i = 0 , i ∈ I k μ k i , otherwise , (3) I_{\mu_k,ii}= \begin{cases} 0, & \text{if $c_{k_i}<0$ $\land$ $\lambda_{k_i} = 0$},i\in\mathcal{I}_k\\ \mu_{k_i}, & \text{otherwise}, \end{cases} \tag{3} Iμk,ii={0,μki,if cki<0 ∧ λki=0,i∈Ikotherwise,(3)
其中 k i k_i ki表示第 k k k个时间步的第 i i i个约束, c k i c_{k_i} cki表示第 k k k个时间步的第 i i i个约束的值, λ k i \lambda_{k_i} λki表示第 k k k个时间步的第 i i i个约束的拉格朗日乘子, μ k i \mu_{k_i} μki表示第 k k k个时间步的第 i i i个约束的惩罚系数。
系统的动力学约束可以通过如下方式显式得处理:使用系统的初始状态 x 0 x_0 x0和控制量 u 0 : N − 1 u_{0:N-1} u0:N−1,通过状态转移函数 f f f得到状态序列 x 1 : N x_{1:N} x1:N。
增广拉格朗日法的迭代公式
根据iLQR算法的反向传播过程,我们可以通过固定拉格朗日乘子和惩罚系数来定义cost-to-go函数:
J ^ N ( x N ) ∣ λ , μ = L N ( x N , λ N , μ N ) = ℓ N ( x N ) + ( λ N + 1 2 I μ N c N ( x N ) ) T c N ( x N ) J ^ k ( x k ) ∣ λ , μ = min u k [ L k ( x k , u k , λ k , μ k ) + J ^ k + 1 ( f ( x k , u k ) ) ∣ λ , μ ] = min u k J ~ k ( x k , u k ) ∣ λ , μ , (4) \begin{aligned} \hat J_N(x_N)|_{\lambda,\mu}&=\mathcal{L}_N(x_N,\lambda_N,\mu_N)\\ &=\ell_N(x_N)+(\lambda_N + \frac{1}{2}I_{\mu_N}c_N(x_N))^Tc_N(x_N)\\ \hat J_k(x_k)|_{\lambda,\mu}&=\min_{u_k}\left[\mathcal{L}_k(x_k,u_k,\lambda_k,\mu_k)+\hat{J}_{k+1}(f(x_k,u_k))|_{\lambda,\mu}\right]\\&=\min_{u_k}\tilde{J}_k(x_k,u_k)|_{\lambda,\mu}, \end{aligned} \tag{4} J^N(xN)∣λ,μJ^k(xk)∣λ,μ=LN(xN,λN,μN)=ℓN(xN)+(λN+21IμNcN(xN))TcN(xN)=ukmin[Lk(xk,uk,λk,μk)+J^k+1(f(xk,uk))∣λ,μ]=ukminJ~k(xk,uk)∣λ,μ,(4)
根据上述新的 J ^ k ( x k ) ∣ λ , μ \hat J_k(x_k)\vert_{\lambda,\mu} J^k(xk)∣λ,μ 和 J ~ k ( x k ) ∣ λ , μ \tilde J_k(x_k)\vert_{\lambda,\mu} J~k(xk)∣λ,μ 的定义对其进行泰勒展开:
δ J ^ k ( x k ) ∣ λ , μ ≈ p k T δ x k + 1 2 δ x k T P k δ x k (5) \begin{aligned} \delta \hat J_k(x_k)\vert_{\lambda,\mu}&\approx p_k^T\delta x_k+\frac{1}{2}\delta x_k^TP_k\delta x_k\\ \end{aligned} \tag{5} δJ^k(xk)∣λ,μ≈pkTδxk+21δxkTPkδxk(5)
根据定义,我们可以得到终末(terminal)的最优cost-to-go函数的二阶展开之后的系数:
p N = ∇ x L N ( x N , λ N , μ N ) = ( ℓ N ) x + ( c N ) x T ( λ + I μ N c N ) P N = ∇ x x L N ( x N , λ N , μ N ) = ( ℓ N ) x x + ( c N ) x T I μ N ( c N ) x (6) \begin{aligned} p_N&=\nabla_x\mathcal{L}_N(x_N,\lambda_N,\mu_N)\\ &=(\ell_N)_x+(c_N)^T_x(\lambda+I_{\mu_N}c_N)\\ P_N&=\nabla_{xx}\mathcal{L}_N(x_N,\lambda_N,\mu_N)\\ &=(\ell_N)_{xx}+(c_N)^T_{x}I_{\mu_N}(c_N)_x\\ \end{aligned} \tag{6} pNPN=∇xLN(xN,λN,μN)=(ℓN)x+(cN)xT(λ+IμNcN)=∇xxLN(xN,λN,μN)=(ℓN)xx+(cN)xTIμN(cN)x(6)
其中:
( ℓ N ) x = ∂ ℓ N ∂ x ( c N ) x = ∂ c N ∂ x ( ℓ N ) x x = ∂ 2 ℓ N ∂ x 2 ( c N ) x x = ∂ 2 c N ∂ x 2 (7) \begin{aligned} (\ell_N)_x&=\frac{\partial \ell_N}{\partial x}\\ (c_N)_x&=\frac{\partial c_N}{\partial x}\\ (\ell_N)_{xx}&=\frac{\partial^2 \ell_N}{\partial x^2}\\ (c_N)_{xx}&=\frac{\partial^2 c_N}{\partial x^2}\\ \end{aligned} \tag{7} (ℓN)x(cN)x(ℓN)xx(cN)xx=∂x∂ℓN=∂x∂cN=∂x2∂2ℓN=∂x2∂2cN(7)
接下来我们看action-value函数的二阶展开之后的系数,根据iLQR中的推导:
δ J ~ i ( x i , u i ) = 1 2 [ δ x i δ u i ] T [ Q x i 2 Q x i u i Q u i x i Q u i 2 ] [ δ x i δ u i ] + [ Q x i Q u i ] T [ δ x i δ u i ] (8) \begin{aligned} \delta \tilde{J}_i(x_i,u_i)&=\frac{1}{2} \begin{bmatrix} \delta x_i \\ \delta u_i \end{bmatrix}^T \begin{bmatrix} Q_{x_i^2} & Q_{x_iu_i}\\ Q_{u_ix_i} & Q_{u_i^2} \end{bmatrix} \begin{bmatrix} \delta x_i \\ \delta u_i \end{bmatrix}+ \begin{bmatrix} Q_{x_i} \\ Q_{u_i} \end{bmatrix}^T \begin{bmatrix} \delta x_i \\ \delta u_i \end{bmatrix}\\ \end{aligned} \tag{8} δJ~i(xi,ui)=21[δxiδui]T[Qxi2QuixiQxiuiQui2][δxiδui]+[QxiQui]T[δxiδui](8)
其中 Q x i Q_{x_i} Qxi, Q u i Q_{u_i} Qui, Q x i 2 Q_{x_i^2} Qxi2, Q x i u i Q_{x_iu_i} Qxiui, Q u i x i Q_{u_ix_i} Quixi, Q u i 2 Q_{u_i^2} Qui2的形式和iLQR中的有所不同,它们的定义如下:
Q x i = ℓ x i + A i T p i + 1 + ( c i ) x i T ( λ i + I μ i c i ) Q u i = ℓ u i + B i T p i + 1 + ( c i ) u i T ( λ i + I μ i c i ) Q x i 2 = ℓ x i x i + A i T P i + 1 A i + ( c i ) x i T I μ i ( c i ) x i Q x i u i = ℓ x i u i + A i T P i + 1 B i + ( c i ) x i T I μ i ( c i ) u i Q u i x i = ℓ u i x i + B i T P i + 1 A i + ( c i ) u i T I μ i ( c i ) x i Q u i 2 = ℓ u i u i + B i T P i + 1 B i + ( c i ) u i T I μ i ( c i ) u i (9) \begin{aligned} Q_{x_i}&=\ell_{x_i}+A_i^Tp_{i+1}+(c_i)^T_{x_i}(\lambda_i+I_{\mu_i}c_i)\\ Q_{u_i}&=\ell_{u_i}+B_i^Tp_{i+1}+(c_i)^T_{u_i}(\lambda_i+I_{\mu_i}c_i)\\ Q_{x_i^2}&=\ell_{x_ix_i}+A_i^TP_{i+1}A_i+(c_i)^T_{x_i}I_{\mu_i}(c_i)_{x_i}\\ Q_{x_iu_i}&=\ell_{x_iu_i}+A_i^TP_{i+1}B_i+(c_i)^T_{x_i}I_{\mu_i}(c_i)_{u_i}\\ Q_{u_ix_i}&=\ell_{u_ix_i}+B_i^TP_{i+1}A_i+(c_i)^T_{u_i}I_{\mu_i}(c_i)_{x_i}\\ Q_{u_i^2}&=\ell_{u_iu_i}+B_i^TP_{i+1}B_i+(c_i)^T_{u_i}I_{\mu_i}(c_i)_{u_i}\\ \end{aligned} \tag{9} QxiQuiQxi2QxiuiQuixiQui2=ℓxi+AiTpi+1+(ci)xiT(λi+Iμici)=ℓui+BiTpi+1+(ci)uiT(λi+Iμici)=ℓxixi+AiTPi+1Ai+(ci)xiTIμi(ci)xi=ℓxiui+AiTPi+1Bi+(ci)xiTIμi(ci)ui=ℓuixi+BiTPi+1Ai+(ci)uiTIμi(ci)xi=ℓuiui+BiTPi+1Bi+(ci)uiTIμi(ci)ui(9)
其中:
A i = ∂ f ∂ x ∣ x i , u i B i = ∂ f ∂ u ∣ x i , u i (10) \begin{aligned} A_i&=\frac{\partial f}{\partial x}|_{x_i,u_i}\\ B_i&=\frac{\partial f}{\partial u}|_{x_i,u_i}\\ \end{aligned} \tag{10} AiBi=∂x∂f∣xi,ui=∂u∂f∣xi,ui(10)
接下来就和iLQR中的过程一样了,其中反向传播的过程如下:
δ u ^ i = − Q u i 2 − 1 ( Q u i + Q u i x i T δ x i ) ≜ K i δ x i + d i K i = − Q u i 2 − 1 Q u i x i T d i = − Q u i 2 − 1 Q u i p i = Q x i + K i T Q u i 2 d i + Q x i u i d i + K i T Q u P i = Q x i 2 + K i T Q u i 2 K i + Q x i u i K i + K i T Q u i x i Δ J ^ i = 1 2 d i T Q u i 2 d i + Q u i T d i (11) \begin{aligned} \delta \hat u_i &= -Q_{u_i^2}^{-1}(Q_{u_i}+Q_{u_ix_i}^T\delta x_i)\\ &\triangleq K_i\delta x_i + d_i\\ K_i &= -Q_{u_i^2}^{-1}Q_{u_ix_i}^T\\ d_i &= -Q_{u_i^2}^{-1}Q_{u_i}\\ p_i&=Q_{x_i}+K_i^TQ_{u_i^2}d_i+Q_{x_iu_i}d_i+K_i^TQ_u\\ P_i&=Q_{x_i^2}+K_i^TQ_{u_i^2}K_i+Q_{x_iu_i}K_i+K_i^TQ_{u_ix_i}\\ \Delta \hat J_i&=\frac{1}{2}d_i^TQ_{u_i^2}d_i+Q_{u_i}^Td_i\\ \end{aligned} \tag{11} δu^iKidipiPiΔJ^i=−Qui2−1(Qui+QuixiTδxi)≜Kiδxi+di=−Qui2−1QuixiT=−Qui2−1Qui=Qxi+KiTQui2di+Qxiuidi+KiTQu=Qxi2+KiTQui2Ki+QxiuiKi+KiTQuixi=21diTQui2di+QuiTdi(11)
当然,如果考虑正则化, δ u ^ i \delta \hat u_i δu^i的计算方式如下:
方法1:
Q ‾ u i 2 = Q u i 2 + ρ I = ℓ u i 2 + B i T P i + 1 B i + ρ I d i = − Q ‾ u i 2 − 1 Q u i K i = − Q ‾ u i 2 − 1 Q u i x i Δ J ^ i = 1 2 d i T Q u i 2 d i + Q u i T d i p i = Q x i + K i T Q u i 2 d i + Q x i u i d i + K i T Q u i P i = Q x i 2 + K i T Q u i 2 K i + Q x i u i K i + K i T Q u i x i (12) \begin{aligned} \overline{Q}_{u_i^2} &= Q_{u_i^2}+\rho I\\ &= \ell_{u_i^2}+B_i^TP_{i+1}B_i+\rho I\\ \\ d_i &= -\overline{Q}_{u_i^2}^{-1}Q_{u_i}\\ K_i &= -\overline{Q}_{u_i^2}^{-1}Q_{u_ix_i}\\ \\ \Delta \hat J_i &= \frac{1}{2}d_i^TQ_{u_i^2}d_i+Q_{u_i}^Td_i\\ p_i &= Q_{x_i}+K_i^TQ_{u_i^2}d_i+Q_{x_iu_i}d_i+K_i^TQ_{u_i}\\ P_i &= Q_{x_i^2}+K_i^TQ_{u_i^2}K_i+Q_{x_iu_i}K_i+K_i^TQ_{u_ix_i}\\ \end{aligned} \tag{12} Qui2diKiΔJ^ipiPi=Qui2+ρI=ℓui2+BiTPi+1Bi+ρI=−Qui2−1Qui=−Qui2−1Quixi=21diTQui2di+QuiTdi=Qxi+KiTQui2di+Qxiuidi+KiTQui=Qxi2+KiTQui2Ki+QxiuiKi+KiTQuixi(12)
方法2:
Q ‾ u i 2 = ℓ u i 2 + B i T ( P i + 1 + ρ I ) B i + ( c i ) u i T I μ i ( c i ) u i Q ‾ u i x i = ℓ u i x i + B i T ( P i + 1 + ρ I ) A i + ( c i ) u i T I μ i ( c i ) x i d i = − Q ‾ u i 2 − 1 Q u i K i = − Q ‾ u i 2 − 1 Q ‾ u i x i Δ J ^ i = 1 2 d i T Q u i 2 d i + Q u i T d i p i = Q x i + K i T Q u i 2 d i + Q x i u i d i + K i T Q u i P i = Q x i 2 + K i T Q u i 2 K i + Q x i u i K i + K i T Q u i x i (13) \begin{aligned} \overline{Q}_{u_i^2} &= \ell_{u_i^2}+B_i^T(P_{i+1}+\rho I)B_i+(c_i)^T_{u_i}I_{\mu_i}(c_i)_{u_i}\\ \overline{Q}_{u_ix_i} &= \ell_{u_ix_i}+B_i^T(P_{i+1}+\rho I)A_i+(c_i)^T_{u_i}I_{\mu_i}(c_i)_{x_i}\\ \\ d_i &= -\overline{Q}_{u_i^2}^{-1}Q_{u_i}\\ K_i &= -\overline{Q}_{u_i^2}^{-1}\overline{Q}_{u_ix_i}\\ \\ \Delta \hat J_i &= \frac{1}{2}d_i^TQ_{u_i^2}d_i+Q_{u_i}^Td_i\\ p_i &= Q_{x_i}+K_i^TQ_{u_i^2}d_i+Q_{x_iu_i}d_i+K_i^TQ_{u_i}\\ P_i &= Q_{x_i^2}+K_i^TQ_{u_i^2}K_i+Q_{x_iu_i}K_i+K_i^TQ_{u_ix_i}\\ \end{aligned} \tag{13} Qui2QuixidiKiΔJ^ipiPi=ℓui2+BiT(Pi+1+ρI)Bi+(ci)uiTIμi(ci)ui=ℓuixi+BiT(Pi+1+ρI)Ai+(ci)uiTIμi(ci)xi=−Qui2−1Qui=−Qui2−1Quixi=21diTQui2di+QuiTdi=Qxi+KiTQui2di+Qxiuidi+KiTQui=Qxi2+KiTQui2Ki+QxiuiKi+KiTQuixi(13)
更新增广拉格朗日乘子和惩罚系数
在保持 λ \lambda λ和 μ \mu μ不变的情况下进行完iLQR操作之后(如上),我们可以得到新的控制量序列 δ u ^ 0 : N − 1 \delta \hat u_{0:N-1} δu^0:N−1以及对应的状态序列 δ x ^ 0 : N \delta \hat x_{0:N} δx^0:N,接下来我们在这个基础上更新 λ \lambda λ和 μ \mu μ:
λ k i + = { λ k i + μ k i c k i ( x ^ k , u ^ k ) i ∈ E k max ( 0 , λ k i + μ k i c k i ( x ^ k , u ^ k ) ) i ∈ I k (14) \begin{aligned} \lambda_{k_i}^+&=\begin{cases} \lambda_{k_i}+\mu_{k_i}c_{k_i}(\hat x_k,\hat u_k) & i \in \mathcal{E}_k\\ \max(0,\lambda_{k_i}+\mu_{k_i}c_{k_i}(\hat x_k,\hat u_k)) & i \in \mathcal{I}_k\\ \end{cases}\\ \end{aligned} \tag{14} λki+={λki+μkicki(x^k,u^k)max(0,λki+μkicki(x^k,u^k))i∈Eki∈Ik(14)
μ k i + = ϕ μ k i (15) \begin{aligned} \mu_{k_i}^+&=\phi\mu_{k_i} \end{aligned} \tag{15} μki+=ϕμki(15)
其中 ϕ \phi ϕ是一个大于1的常数, λ k i + \lambda_{k_i}^+ λki+和 μ k i + \mu_{k_i}^+ μki+表示第 k k k个时间步的第 i i i个约束的更新后的拉格朗日乘子和惩罚系数。
AL-iLQR算法整体流程
- 初始化 λ \lambda λ, μ \mu μ和 ϕ \phi ϕ;
- 根据当前的 λ \lambda λ和 μ \mu μ,使用iLQR算法计算 δ u ^ 0 : N − 1 \delta \hat u_{0:N-1} δu^0:N−1和 δ x ^ 0 : N \delta \hat x_{0:N} δx^0:N;
- 根据 δ u ^ 0 : N − 1 \delta \hat u_{0:N-1} δu^0:N−1和 δ x ^ 0 : N \delta \hat x_{0:N} δx^0:N,更新 u 0 : N − 1 u_{0:N-1} u0:N−1和 x 0 : N x_{0:N} x0:N。
- 根据 δ u ^ 0 : N − 1 \delta \hat u_{0:N-1} δu^0:N−1和 δ x ^ 0 : N \delta \hat x_{0:N} δx^0:N,更新 λ \lambda λ和 μ \mu μ;
- 检查 max ( c ) > t o l \max(c) > tol max(c)>tol如果是真的,转到步骤2,否则结束。
- 输出 u 0 : N − 1 u_{0:N-1} u0:N−1和 x 0 : N x_{0:N} x0:N和 λ \lambda λ。
至此AL-iLQR算法的推导就完成了。
未经允许,禁止转载。