2023-09-25: Derivation of LQR for Continuous-Time Systems

Derivation of LQR for Continuous-Time Systems

Dynamic Programming (DP) in Continuous Time

First, consider an optimization problem of the following form:

$$
\begin{aligned}
\min\ & J = h(x(t_f), t_f) + \int_{t_0}^{t_f} g(x(t), u(t), t)\,dt \\
\text{subject to}\ & \dot{x} = a(x, u, t) \\
& x(t_0) = x_0 \\
& m(x(t_f), t_f) = 0 \\
& u(t) \in \mathscr{U}
\end{aligned} \tag{1}
$$

Here $t_f$ is the final time, $t_0$ is the initial time, $m(x(t_f), t_f) = 0$ is the terminal condition (there may be more than one such condition, since $m$ is vector-valued), and $\mathscr{U}$ denotes the constraint on $u(t)$.
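To make the ingredients of problem (1) concrete, here is a minimal sketch that evaluates $J$ numerically for one fixed control. The scalar dynamics $a = -x + u$, the running cost $g = x^2 + u^2$, the terminal cost $h = x(t_f)^2$, and the function name `evaluate_cost` are all assumptions chosen for illustration; none of them come from the derivation itself. The terminal constraint $m$ and the set $\mathscr{U}$ are left out to keep the sketch short.

```python
import math

def a(x, u, t):
    """Hypothetical scalar dynamics x' = -x + u (illustrative only)."""
    return -x + u

def g(x, u, t):
    """Hypothetical running cost g(x, u, t) = x^2 + u^2."""
    return x * x + u * u

def h(x_f, t_f):
    """Hypothetical terminal cost h(x(t_f), t_f) = x(t_f)^2."""
    return x_f * x_f

def evaluate_cost(x0, t0, tf, u, steps=10_000):
    """Approximate J in (1) with forward Euler and a left Riemann sum."""
    dt = (tf - t0) / steps
    x, running = x0, 0.0
    for k in range(steps):
        t = t0 + k * dt
        u_t = u(t)
        running += g(x, u_t, t) * dt   # accumulate the integral of g
        x += a(x, u_t, t) * dt         # advance the state by one Euler step
    return h(x, tf) + running

# Cost of applying no control at all on [0, 1], starting from x0 = 1.
J_zero = evaluate_cost(1.0, 0.0, 1.0, lambda t: 0.0)
print(J_zero)
```

For this particular choice and $u \equiv 0$, the continuous-time problem has the closed-form value $J = (1 + e^{-2})/2$, so the printed number should be close to it, up to discretization error.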

The solution of this problem ultimately takes the form of a nonlinear partial differential equation, known as the Hamilton-Jacobi-Bellman (HJB) equation. It is derived below.

Now take an arbitrary time $t$ in the interval $[t_0, t_f]$ and consider the cost function over $[t, t_f]$, with $\tau \in [t, t_f]$ as the integration variable. Then:

$$
J(x(t), t, u(\tau)) = h(x(t_f), t_f) + \int_{t}^{t_f} g(x(\tau), u(\tau), \tau)\,d\tau \tag{2}
$$

We can split the interval $[t, t_f]$ into two sub-intervals, $[t, t+\Delta t]$ and $[t+\Delta t, t_f]$. Writing $\hat{J}(x(t), t)$ for the minimum of this cost over all admissible controls on $[t, t_f]$, we get:

$$
\begin{aligned}
\hat{J}(x(t), t) &= \underset{u(\tau)\in\mathscr{U},\,\tau\in[t, t_f]}{\min} J(x(t), t, u(\tau)) \\
&= \underset{u(\tau)\in\mathscr{U},\,\tau\in[t, t_f]}{\min} \left\{ h(x(t_f), t_f) + \int_{t}^{t_f} g(x(\tau), u(\tau), \tau)\,d\tau \right\} \\
&= \underset{u(\tau)\in\mathscr{U},\,\tau\in[t, t_f]}{\min} \left\{ h(x(t_f), t_f) + \int_{t}^{t+\Delta t} g(x(\tau), u(\tau), \tau)\,d\tau + \int_{t+\Delta t}^{t_f} g(x(\tau), u(\tau), \tau)\,d\tau \right\}
\end{aligned} \tag{4}
$$
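The last line of (4) relies only on the additivity of the integral: for any fixed control, the cost over $[t, t_f]$ equals the running cost over $[t, t+\Delta t]$ plus the cost accumulated from the state reached at $t+\Delta t$. The sketch below checks this numerically. The dynamics, costs, control signal, and the helper name `simulate` are hypothetical choices made for illustration, and $\Delta t = 0.5$ is deliberately not small, since the split holds for any $\Delta t$.

```python
import math

def a(x, u, t):
    """Hypothetical scalar dynamics x' = -x + u (illustrative only)."""
    return -x + u

def g(x, u, t):
    """Hypothetical running cost."""
    return x * x + u * u

def h(x_f, t_f):
    """Hypothetical terminal cost."""
    return x_f * x_f

def simulate(x, t_start, t_end, u, steps):
    """Forward Euler over [t_start, t_end]; returns (running cost, final state)."""
    dt = (t_end - t_start) / steps
    running = 0.0
    for k in range(steps):
        t = t_start + k * dt
        u_t = u(t)
        running += g(x, u_t, t) * dt
        x += a(x, u_t, t) * dt
    return running, x

u = lambda t: -0.5 * math.exp(-t)   # an arbitrary fixed control, not an optimal one
t, delta_t, tf, x_t = 0.0, 0.5, 1.0, 1.0

# Cost over the whole interval [t, tf].
run_full, x_f = simulate(x_t, t, tf, u, 1000)
J_full = h(x_f, tf) + run_full

# Same cost, computed as head [t, t + delta_t] plus tail [t + delta_t, tf].
run_head, x_mid = simulate(x_t, t, t + delta_t, u, 500)
run_tail, x_f_split = simulate(x_mid, t + delta_t, tf, u, 500)
J_split = h(x_f_split, tf) + run_head + run_tail

print(J_full, J_split)
```

Because both computations use the same step length, the two values agree up to floating-point rounding.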

We define the optimal cost function over $[t+\Delta t, t_f]$:

$$
\hat{J}(x(t+\Delta t), t+\Delta t) = \underset{u(\tau)\in\mathscr{U},\,\tau\in[t+\Delta t, t_f]}{\min} \left\{ h(x(t_f), t_f) + \int_{t+\Delta t}^{t_f} g(x(\tau), u(\tau), \tau)\,d\tau \right\} \tag{5}
$$

It follows that:

$$
\hat{J}(x(t), t) = \underset{u(\tau)\in\mathscr{U},\,\tau\in[t, t+\Delta t]}{\min} \left\{ \int_{t}^{t+\Delta t} g(x(\tau), u(\tau), \tau)\,d\tau + \hat{J}(x(t+\Delta t), t+\Delta t) \right\} \tag{6}
$$
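Recursion (6) can be checked on a discretized toy problem. The sketch below assumes a fixed step `DT`, a finite control set `CONTROLS`, forward-Euler dynamics, and the same hypothetical scalar system and costs as a stand-in for $a$, $g$, and $h$; all of these are illustrative assumptions, not part of the derivation. It compares the cost obtained from the backward recursion with the minimum found by enumerating every control sequence.

```python
import itertools

DT = 0.25                      # step length, playing the role of Delta t
N = 4                          # number of steps, so t_f = N * DT = 1
CONTROLS = (-1.0, 0.0, 1.0)    # hypothetical finite admissible set U

def a(x, u, t):
    """Hypothetical scalar dynamics x' = -x + u (illustrative only)."""
    return -x + u

def g(x, u, t):
    """Hypothetical running cost."""
    return x * x + u * u

def h(x_f, t_f):
    """Hypothetical terminal cost."""
    return x_f * x_f

def step(x, u, t):
    """One forward-Euler step of the dynamics."""
    return x + DT * a(x, u, t)

def cost_to_go(x, k):
    """Discretized form of (6): optimal cost from state x at time k * DT."""
    t = k * DT
    if k == N:
        return h(x, t)
    return min(DT * g(x, u, t) + cost_to_go(step(x, u, t), k + 1)
               for u in CONTROLS)

def brute_force(x0):
    """Minimum cost over all len(CONTROLS) ** N control sequences."""
    best = float("inf")
    for seq in itertools.product(CONTROLS, repeat=N):
        x, cost = x0, 0.0
        for k, u in enumerate(seq):
            t = k * DT
            cost += DT * g(x, u, t)
            x = step(x, u, t)
        best = min(best, cost + h(x, N * DT))
    return best

J_dp = cost_to_go(1.0, 0)
J_bf = brute_force(1.0)
print(J_dp, J_bf)
```

The two numbers should coincide up to rounding, which is the principle of optimality at work: minimizing over the first step and then using the optimal cost of the remaining interval gives the same result as minimizing over the whole control sequence at once.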
