2023-09-30-Derivation of Continuous-System LQR via the Calculus of Variations


Optimal Control Problem

Consider the following optimization problem:

$$
\begin{aligned}
J={}&h(x(t_f),t_f)+\int_{t_0}^{t_f}g(x(t),u(t),t)\,dt\\
\text{subject to}\quad&\dot{x}(t)=a(x(t),u(t),t)\\
&t_0,\ x(t_0) &&\text{fixed}\\
&t_f &&\text{free}\\
&x(t_f) &&\text{free or fixed}
\end{aligned}
\tag{1}
$$
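The following minimal sketch makes problem (1) concrete by evaluating $J$ for a given control history on a scalar example. The dynamics $a(x,u,t)=Ax+Bu$, the running cost $g=\tfrac12(qx^2+ru^2)$, the terminal cost $h=\tfrac12 s\,x(t_f)^2$, and the parameter names are all assumptions chosen to mirror the quadratic structure of LQR. The dynamics are integrated with forward Euler and the integral with a left Riemann sum, so the result is only a first-order approximation of $J$.

```python
from typing import Callable


def evaluate_cost(
    u: Callable[[float], float],  # control history u(t), assumed given
    x0: float,                    # fixed initial state x(t0)
    t0: float,                    # fixed initial time
    tf: float,                    # final time (held fixed for this evaluation)
    A: float = -1.0,              # hypothetical scalar dynamics x' = A x + B u
    B: float = 1.0,
    q: float = 1.0,               # hypothetical running-cost weights
    r: float = 1.0,
    s: float = 1.0,               # hypothetical terminal-cost weight
    n: int = 1000,                # number of Euler steps
) -> float:
    """Approximate J = h(x(tf), tf) + integral of g(x, u, t) over [t0, tf]."""
    dt = (tf - t0) / n
    x, running = x0, 0.0
    for k in range(n):
        t = t0 + k * dt
        uk = u(t)
        running += 0.5 * (q * x * x + r * uk * uk) * dt  # accumulate g(x, u, t) dt
        x += (A * x + B * uk) * dt                         # Euler step of x' = a(x, u, t)
    return 0.5 * s * x * x + running                       # h(x(tf), tf) + integral
```

For example, with `A=0.0` and $u(t)\equiv 0$ the state stays at $x_0$, so `evaluate_cost(lambda t: 0.0, x0=2.0, t0=0.0, tf=3.0, A=0.0)` should return a value very close to $\tfrac12\cdot 4+\tfrac12\cdot 4\cdot 3=8$.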

Here the dynamics $\dot{x}(t)=a(x(t),u(t),t)$ can be treated as a constraint. We therefore introduce a Lagrange multiplier $p(t)$ and augment the original cost function $J$, which gives the augmented cost function:

$$
J_a=h(x(t_f),t_f)+\int_{t_0}^{t_f}\Big[g(x(t),u(t),t)+p^T(t)\big\{a(x(t),u(t),t)-\dot{x}(t)\big\}\Big]dt
\tag{2}
$$
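One way to see the role of the multiplier is to check that the added term $p^T(a-\dot x)$ vanishes along any trajectory that satisfies the dynamics. When that holds, $J_a=J$ for every choice of $p$. The sketch below is a discretized version of (2) under stated assumptions. It uses a hypothetical scalar system $a=Ax+Bu$ with $g=\tfrac12(qx^2+ru^2)$ and $h=\tfrac12 s x^2$, and it approximates $\dot x$ by a forward difference. A forward-Euler trajectory satisfies that discretized constraint exactly.

```python
from typing import List

A, B, q, r, s = -1.0, 1.0, 1.0, 1.0, 1.0  # hypothetical scalar problem data


def a(x: float, u: float) -> float:
    """Dynamics a(x, u) = A x + B u (time-invariant here)."""
    return A * x + B * u


def g(x: float, u: float) -> float:
    """Running cost g(x, u) = (q x^2 + r u^2) / 2."""
    return 0.5 * (q * x * x + r * u * u)


def augmented_cost(xs: List[float], us: List[float], ps: List[float], dt: float) -> float:
    """Discrete version of (2): xs has n+1 samples; us and ps have n samples."""
    total = 0.5 * s * xs[-1] ** 2  # terminal cost h(x(tf), tf)
    for k in range(len(us)):
        xdot = (xs[k + 1] - xs[k]) / dt  # forward-difference estimate of x'
        total += (g(xs[k], us[k]) + ps[k] * (a(xs[k], us[k]) - xdot)) * dt
    return total


def simulate(x0: float, us: List[float], dt: float) -> List[float]:
    """Forward-Euler trajectory; it satisfies the discretized dynamics exactly."""
    xs = [x0]
    for uk in us:
        xs.append(xs[-1] + a(xs[-1], uk) * dt)
    return xs
```

With `ps` set to all zeros, `augmented_cost` reduces to the plain cost $J$. For a trajectory produced by `simulate`, any other `ps` gives the same value up to rounding. A trajectory that violates the dynamics makes the result depend on `ps`.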

Taking the first variation of $J_a$ with respect to $x$, $u$, $p$ and $t_f$ gives:

$$
\begin{aligned}
\delta J_a={}&h_{x_f}\delta x_f+h_{t_f}\delta t_f\\
&+\int_{t_0}^{t_f}\Big[g_x\delta x+g_u\delta u+(a-\dot{x})^T\delta p(t)+p^T(t)\big\{a_x\delta x+a_u\delta u-\delta\dot{x}\big\}\Big]dt\\
&+\big[g+p^T(a-\dot{x})\big](t_f)\,\delta t_f
\end{aligned}
\tag{3}
$$
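Equation (3) can be sanity-checked numerically. Hold $t_f$ fixed so that $\delta t_f=0$. The remaining terms should then equal a central finite difference of $J_a$ along an arbitrary perturbation direction $(\delta x,\delta u,\delta p)$. This sketch assumes the following:

- a hypothetical scalar problem with $a=Ax+Bu$, $g=\tfrac12(qx^2+ru^2)$ and $h=\tfrac12 s x^2$;
- hand-picked smooth functions for $x$, $u$, $p$ and their perturbations;
- the composite trapezoidal rule for all integrals.

The nominal $x$ deliberately does not satisfy the dynamics, because (3) holds for any variation.

```python
import math

A, B, q, r, s = -1.0, 2.0, 1.0, 0.5, 3.0  # hypothetical scalar problem data
T0, TF, N = 0.0, 2.0, 2000                # fixed horizon (delta t_f = 0) and grid size


def trapz(f):
    """Composite trapezoidal rule for f over [T0, TF]."""
    h = (TF - T0) / N
    return h * (0.5 * f(T0) + sum(f(T0 + k * h) for k in range(1, N)) + 0.5 * f(TF))


# Nominal functions and perturbation directions (arbitrary smooth choices).
x, xd = math.sin, math.cos                        # x(t) and its time derivative
dx, dxd = (lambda t: t * t), (lambda t: 2 * t)    # delta x(t) and its time derivative
u, du = math.cos, (lambda t: t)                   # u(t) and delta u(t)
p, dp = (lambda t: 1 + t), (lambda t: math.exp(-t))  # p(t) and delta p(t)


def J_a(eps):
    """Augmented cost (2) evaluated at (x + eps*dx, u + eps*du, p + eps*dp)."""
    def integrand(t):
        X, Xd = x(t) + eps * dx(t), xd(t) + eps * dxd(t)
        U, P = u(t) + eps * du(t), p(t) + eps * dp(t)
        return 0.5 * (q * X * X + r * U * U) + P * (A * X + B * U - Xd)
    XF = x(TF) + eps * dx(TF)
    return 0.5 * s * XF * XF + trapz(integrand)


def delta_J_a():
    """First variation from (3) with delta t_f = 0 (scalar case, so transposes drop)."""
    def integrand(t):
        a_val = A * x(t) + B * u(t)
        return (q * x(t) * dx(t) + r * u(t) * du(t)           # g_x dx + g_u du
                + (a_val - xd(t)) * dp(t)                      # (a - x')^T dp
                + p(t) * (A * dx(t) + B * du(t) - dxd(t)))     # p^T {a_x dx + a_u du - d x'}
    return s * x(TF) * dx(TF) + trapz(integrand)               # h_{x_f} dx_f + integral


if __name__ == "__main__":
    eps = 1e-3
    print((J_a(eps) - J_a(-eps)) / (2 * eps), delta_J_a())
```

In this example $J_a$ is a quadratic polynomial in `eps`, so the central difference reproduces its derivative exactly apart from rounding. The two printed numbers should therefore agree to many digits.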

where:

$$
\begin{aligned}
x_f&=x(t_f)\\
\dot x_f&=\dot x(t_f)\\
u_f&=u(t_f)\\
p_f&=p(t_f)\\
h_{x_f}&=\frac{\partial h(x,t)}{\partial x}(x_f,t_f)\\
h_{t_f}&=\frac{\partial h(x,t)}{\partial t}(x_f,t_f)\\
g_x&=\frac{\partial g(x,u,t)}{\partial x}\\
g_u&=\frac{\partial g(x,u,t)}{\partial u}\\
a_x&=\frac{\partial a(x,u,t)}{\partial x}\\
a_u&=\frac{\partial a(x,u,t)}{\partial u}\\
[g+p^T(a-\dot x)](t_f)&=g(x_f,u_f,t_f)+p_f^T\big(a(x_f,u_f,t_f)-\dot x_f\big)
\end{aligned}
\tag{4}
$$

Next we define the Hamiltonian:

$$
H(x,u,p,t)=g(x(t),u(t),t)+p^T(t)\,a(x(t),u(t),t)
\tag{5}
$$
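For a quadratic cost and linear dynamics, the Hamiltonian (5) and its partial derivatives have simple closed forms. The sketch below checks those closed forms against finite differences. It assumes hypothetical LQR-style data with $g=\tfrac12x^TQx+\tfrac12u^TRu$ and $a=Ax+Bu$, using 2 states, 1 input, and symmetric $Q$ and $R$. Under those assumptions $H_x=g_x+p^Ta_x$ corresponds to $Qx+A^Tp$ and $H_u=g_u+p^Ta_u$ corresponds to $Ru+B^Tp$. Gradients are returned as plain lists, without distinguishing row from column vectors.

```python
from typing import Callable, List

Vec = List[float]
Mat = List[List[float]]

# Hypothetical LQR-style data (assumptions for illustration): 2 states, 1 input.
Q: Mat = [[2.0, 0.0], [0.0, 1.0]]
R: Mat = [[0.5]]
A: Mat = [[0.0, 1.0], [-2.0, -0.3]]
B: Mat = [[0.0], [1.0]]


def matvec(M: Mat, v: Vec) -> Vec:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M: Mat) -> Mat:
    return [list(col) for col in zip(*M)]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def hamiltonian(x: Vec, u: Vec, p: Vec) -> float:
    """H = g + p^T a with g = x^T Q x / 2 + u^T R u / 2 and a = A x + B u."""
    g = 0.5 * dot(x, matvec(Q, x)) + 0.5 * dot(u, matvec(R, u))
    a = [ax + bu for ax, bu in zip(matvec(A, x), matvec(B, u))]
    return g + dot(p, a)


def H_x(x: Vec, u: Vec, p: Vec) -> Vec:
    """Analytic dH/dx = Q x + A^T p (valid because Q is symmetric)."""
    return [qx + atp for qx, atp in zip(matvec(Q, x), matvec(transpose(A), p))]


def H_u(x: Vec, u: Vec, p: Vec) -> Vec:
    """Analytic dH/du = R u + B^T p (valid because R is symmetric)."""
    return [ru + btp for ru, btp in zip(matvec(R, u), matvec(transpose(B), p))]


def numeric_grad(f: Callable[[Vec], float], v: Vec, h: float = 1e-6) -> Vec:
    """Central finite-difference gradient of a scalar function f at v."""
    grad = []
    for i in range(len(v)):
        vp, vm = list(v), list(v)
        vp[i] += h
        vm[i] -= h
        grad.append((f(vp) - f(vm)) / (2 * h))
    return grad
```

Because $H$ is quadratic in $x$ and in $u$ here, the central differences match the analytic gradients up to rounding error.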

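Since $H_x=g_x+p^Ta_x$ and $H_u=g_u+p^Ta_u$, substituting the Hamiltonian into the variation (3) and collecting the $\delta t_f$ terms gives: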

$$
\begin{aligned}
\delta J_a={}&h_{x_f}\delta x_f+\big[h_{t_f}+g+p^T(a-\dot x)\big](t_f)\,\delta t_f\\
&+\int_{t_0}^{t_f}\Big[H_x\delta x+H_u\delta u+(a-\dot x)^T\delta p(t)\underbrace{-\,p^T(t)\,\delta\dot x}_{(6.1)}\Big]dt
\end{aligned}
\tag{6}
$$

Next we need to combine like variation terms. The $\delta\dot x$ in term (6.1) is not independent of $\delta x$, so we use integration by parts to move the time derivative off the variation:

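$$
\begin{aligned}
-\int_{t_0}^{t_f}p^T(t)\,\delta\dot x\,dt&=-\int_{t_0}^{t_f}p^T(t)\,d(\delta x)\\
&=-p^T\delta x\Big|_{t_0}^{t_f}+\int_{t_0}^{t_f}\Big(\frac{dp(t)}{dt}\Big)^T\delta x\,dt\\
&=-p^T(t_f)\,\delta x(t_f)+\int_{t_0}^{t_f}\dot p^T\delta x\,dt\\
&=-p^T(t_f)\big(\delta x_f-\dot x(t_f)\,\delta t_f\big)+\int_{t_0}^{t_f}\dot p^T\delta x\,dt
\end{aligned}
$$

The boundary term at $t_0$ drops out because $x(t_0)$ is fixed, so $\delta x(t_0)=0$. The last step uses the first-order relation $\delta x_f=\delta x(t_f)+\dot x(t_f)\,\delta t_f$. This relation says that the change in the final state comes both from the change of the trajectory at $t_f$ and from the shift of $t_f$ itself.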
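The integration-by-parts step can be checked numerically for a fixed final time. The sketch below picks a hypothetical smooth scalar multiplier $p(t)=e^t$ and a variation $\delta x(t)=\sin 3t$, which satisfies $\delta x(t_0)=0$ as required when $x(t_0)$ is fixed. It then compares $-\int p\,\delta\dot x\,dt$ with $-p(t_f)\,\delta x(t_f)+\int\dot p\,\delta x\,dt$, using composite Simpson quadrature. The functions and horizon are illustrative assumptions only.

```python
import math

T0, TF, N = 0.0, 1.5, 1000  # horizon and an even number of Simpson panels


def simpson(f, a: float = T0, b: float = TF, n: int = N) -> float:
    """Composite Simpson rule on [a, b] with n (even) panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# Hypothetical smooth multiplier and variation; dx(T0) = 0 since x(t0) is fixed.
def p(t: float) -> float:
    return math.exp(t)


def p_dot(t: float) -> float:
    return math.exp(t)


def dx(t: float) -> float:
    return math.sin(3 * t)


def dx_dot(t: float) -> float:
    return 3 * math.cos(3 * t)


# Left side: -integral of p * d(delta x)/dt.
lhs = -simpson(lambda t: p(t) * dx_dot(t))
# Right side: boundary term at tf (the t0 term vanishes) plus integral of p' * delta x.
rhs = -p(TF) * dx(TF) + simpson(lambda t: p_dot(t) * dx(t))
```

With these smooth integrands and 1000 Simpson panels, `lhs` and `rhs` should agree to well below `1e-8`.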
