Derivation of the Necessary Conditions for Optimal Control (Unbounded Control) - the Lagrange Problem

We first state the optimal control problem:

\max\limits_{u} \int_{t_{0}}^{t_{1}} f(t, x(t), u(t))dt

subject to \dot{x}(t)=g(t, x(t), u(t))

                       x(t_{0})=x_{0} and x(t_{1}) free.

The necessary conditions answer the following question: if u^{*}(t) and x^{*}(t) are optimal, what conditions must they satisfy? We now derive these conditions. First, define the objective functional

J(u)=\int_{t_{0}}^{t_{1}}f(t, x(t), u(t))dt

Assume a piecewise-continuous optimal control u^{*}(t) exists, with corresponding state variable x^{*}(t), and that J(u)\leq J(u^{*})<\infty for every admissible u. Let h(t) be an arbitrary piecewise-continuous function and \epsilon a real constant. Then

u^{\epsilon}(t)=u^{*}(t)+\epsilon h(t)

is a piecewise-continuous control.

Let x^{\epsilon} be the state variable corresponding to the control u^{\epsilon}; then x^{\epsilon} satisfies

\frac{d}{dt}x^{\epsilon}(t)=g(t, x^{\epsilon}(t), u^{\epsilon}(t))

Since all trajectories start from the same position, we take x^{\epsilon}(t_{0})=x_{0}.

From the expression for u^{\epsilon}, we see that u^{\epsilon}(t)\rightarrow u^{*}(t) as \epsilon \rightarrow 0, and

\frac{\partial u^{\epsilon}(t)}{\partial \epsilon}\Big| _{\epsilon=0}=h(t)

In fact, the same holds for x^{\epsilon}:

x^{\epsilon}(t)\rightarrow x^{*}(t) as \epsilon \rightarrow 0, and \frac{\partial x^{\epsilon}(t)}{\partial \epsilon}\Big| _{\epsilon=0} exists.

The objective functional evaluated at the control u^{\epsilon} becomes:

J(u^{\epsilon})=\int_{t_{0}}^{t_{1}} f(t, x^{\epsilon}(t), u^{\epsilon}(t))dt

For a (piecewise-differentiable) adjoint function \lambda(t), we have the following identity:

\int_{t_{0}}^{t_{1}} \frac{d}{dt}[\lambda(t) x^{\epsilon}(t)] dt=\lambda(t_{1}) x^{\epsilon}(t_{1})-\lambda(t_{0}) x^{\epsilon}(t_{0})

or, equivalently,

\int_{t_{0}}^{t_{1}} \frac{d}{dt}[\lambda(t) x^{\epsilon}(t)] dt + \lambda(t_{0}) x^{\epsilon}(t_{0})-\lambda(t_{1}) x^{\epsilon}(t_{1})=0
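This identity is just the fundamental theorem of calculus applied to the product \lambda(t)x^{\epsilon}(t). A quick symbolic sanity check with arbitrarily chosen concrete functions (sin and t^{2}, purely for illustration):

```python
import sympy as sp

t = sp.symbols('t')
lam = sp.sin(t)   # an arbitrary adjoint function, chosen only for illustration
xe = t**2         # an arbitrary state trajectory, likewise illustrative
t0, t1 = 0, 1

# Left-hand side: integral of d/dt [lam(t) * x(t)] over [t0, t1]
lhs = sp.integrate(sp.diff(lam * xe, t), (t, t0, t1))
# Right-hand side: boundary terms lam(t1)x(t1) - lam(t0)x(t0)
rhs = (lam * xe).subs(t, t1) - (lam * xe).subs(t, t0)

print(sp.simplify(lhs - rhs))  # 0
```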

Adding this zero-valued expression to J(u^{\epsilon}) and substituting x^{\epsilon}(t_{0})=x_{0} and \dot{x}^{\epsilon}(t)=g(t, x^{\epsilon}(t), u^{\epsilon}(t)), we obtain

J(u^{\epsilon})=\int_{t_{0}}^{t_{1}}\left[ f(t, x^{\epsilon}(t), u^{\epsilon}(t))+ \frac{d}{dt} (\lambda(t)x^{\epsilon}(t))\right]dt + \lambda(t_{0})x_{0}-\lambda(t_{1}) x^{\epsilon}(t_{1})\\ ~~~~~~~~=\int_{t_{0}}^{t_{1}}\left[ f(t, x^{\epsilon}(t), u^{\epsilon}(t))+ \dot{\lambda}(t)x^{\epsilon}(t) + \lambda(t) g(t, x^{\epsilon}(t), u^{\epsilon}(t)) \right]dt + \lambda(t_{0})x_{0}-\lambda(t_{1}) x^{\epsilon}(t_{1})

Since J attains its maximum at u^{*}, we have

0=\frac{d}{d\epsilon}J(u^{\epsilon})\Big|_{\epsilon=0}=\lim\limits_{\epsilon\rightarrow 0}\frac{J(u^{\epsilon})-J(u^{*})}{\epsilon}

0=\frac{d}{d\epsilon}J(u^{\epsilon}) \Big|_{\epsilon=0}\\ ~~ =\int_{t_{0}}^{t_{1}} \frac{\partial}{\partial \epsilon}\left[ f(t, x^{\epsilon}(t), u^{\epsilon}(t)) + \dot{\lambda}(t)x^{\epsilon}(t)+\lambda(t) g(t, x^{\epsilon}(t), u^{\epsilon}(t)) \right ] \Big|_{\epsilon=0} dt-\frac{\partial}{\partial \epsilon}\left[\lambda(t_{1})x^{\epsilon}(t_{1})\right]\Big|_{\epsilon=0}

Differentiating under the integral sign with the chain rule gives

0=\int_{t_{0}}^{t_{1}}\left[ f_{x} \frac{\partial x^{\epsilon}}{\partial \epsilon}+ f_{u} \frac{\partial u^{\epsilon}}{\partial \epsilon} + \dot{\lambda}(t) \frac{\partial x^{\epsilon}}{\partial \epsilon} + \lambda(t) \left( g_{x}\frac{\partial x^{\epsilon}}{\partial \epsilon}+g_{u} \frac{\partial u^{\epsilon}}{\partial \epsilon} \right )\right ]\Big|_{\epsilon=0} dt-\lambda(t_{1})\frac{\partial x^{\epsilon}}{\partial \epsilon}(t_{1})\Big|_{\epsilon=0}

0=\int_{t_{0}}^{t_{1}}\left[ \left( f_{x}+\lambda(t)g_{x}+\dot{\lambda}(t) \right ) \frac{\partial x^{\epsilon}}{\partial \epsilon}(t) \Big|_{\epsilon=0} + (f_{u}+\lambda(t)g_{u})h(t)\right ] dt-\lambda(t_{1})\frac{\partial x^{\epsilon}}{\partial \epsilon}(t_{1})\Big|_{\epsilon=0}   (1)

We now choose the adjoint function \lambda(t) so as to eliminate from (1) the terms involving the unknown derivative \frac{\partial x^{\epsilon}}{\partial \epsilon}. The adjoint function must therefore satisfy:

\dot{\lambda}(t)=-\left[ f_{x}(t, x^{*}(t), u^{*}(t)) +\lambda(t) g_{x}(t, x^{*}(t), u^{*}(t)) \right ]       (3)

We call this the adjoint equation. In addition, \lambda satisfies the boundary condition:

\lambda(t_{1})=0      (4)

which is called the transversality condition. With (3) and (4), equation (1) reduces to

0=\int_{t_{0}}^{t_{1}} \left(f_{u}(t, x^{*}(t), u^{*}(t))+\lambda(t)g_{u}(t, x^{*}(t), u^{*}(t))\right)h(t) dt  (5)

This must hold for every piecewise-continuous function h(t), so in particular we may take

h(t)=f_{u}(t, x^{*}(t), u^{*}(t))+\lambda(t)g_{u}(t, x^{*}(t), u^{*}(t))  

With this choice, (5) becomes

0=\int_{t_{0}}^{t_{1}} \left(f_{u}(t, x^{*}(t), u^{*}(t))+\lambda(t)g_{u}(t, x^{*}(t), u^{*}(t))\right)^{2}dt

Since the integrand is nonnegative, the integral can vanish only if the integrand is identically zero. Hence the optimal control and its state trajectory must satisfy:

f_{u}(t, x^{*}(t), u^{*}(t))+\lambda(t)g_{u}(t, x^{*}(t), u^{*}(t))=0   (6)

We call (6) the optimality condition. It is equivalent to \frac{\partial H}{\partial u}=0 at u=u^{*}, where H(t, x, u, \lambda)=f(t, x, u)+\lambda g(t, x, u) is the Hamiltonian.
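To see conditions (3), (4), and (6) in action, consider a small worked example (the problem instance is my own illustration, not from the text): maximize \int_{0}^{1}(x-u^{2})dt subject to \dot{x}=u, x(0)=0, x(1) free. Here f_{x}=1 and g_{x}=0, so (3) gives \dot{\lambda}=-1 with \lambda(1)=0 by (4), i.e. \lambda(t)=1-t; and (6) gives -2u^{*}+\lambda=0, i.e. u^{*}(t)=(1-t)/2. A symbolic check with SymPy:

```python
import sympy as sp

t = sp.symbols('t')
lam = sp.Function('lam')
x = sp.Function('x')

# Illustrative example: f = x - u**2, g = u on [0, 1], x(0) = 0, x(1) free.
# Adjoint equation (3): lam' = -(f_x + lam*g_x) = -1, transversality (4): lam(1) = 0.
lam_sol = sp.dsolve(sp.Eq(lam(t).diff(t), -1), lam(t), ics={lam(1): 0}).rhs

# Optimality condition (6): f_u + lam*g_u = -2u + lam = 0  =>  u* = lam/2.
u_star = lam_sol / 2

# State equation: x' = u*, x(0) = 0.
x_sol = sp.dsolve(sp.Eq(x(t).diff(t), u_star), x(t), ics={x(0): 0}).rhs

print(lam_sol)            # lambda(t) = 1 - t
print(u_star)             # u*(t) = (1 - t)/2
print(sp.expand(x_sol))   # x*(t) = t/2 - t**2/4
```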

Tips: different forms of the objective functional give rise to different transversality conditions, and the condition should be derived for each specific problem. It is not true that every optimal control problem has the transversality condition \lambda(t_{1})=0, i.e. a vanishing adjoint at the terminal time.
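For example (a standard result, stated here as a supplement to the remark above): if the objective carries an additional terminal payoff \phi(x(t_{1})),

\max\limits_{u} \int_{t_{0}}^{t_{1}} f(t, x(t), u(t))dt + \phi(x(t_{1}))

then repeating the derivation, the boundary term at t_{1} becomes \frac{\partial}{\partial \epsilon}\left[\phi(x^{\epsilon}(t_{1}))-\lambda(t_{1})x^{\epsilon}(t_{1})\right]\big|_{\epsilon=0}, and eliminating it requires the transversality condition

\lambda(t_{1})=\phi_{x}(x^{*}(t_{1}))

while the adjoint equation (3) and the optimality condition (6) are unchanged.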
