Adaptive Dynamic Programming (5): Value Iteration with a Nonzero Initial Value


Stability Proof for Value Iteration with a Nonzero Initial Value

Theorem 1
Problem statement

Assume the initial value function is an arbitrary positive semidefinite function
$$V_0(x_k)=\Psi(x_k).$$
Define the constants $\underline{\gamma}$, $\overline{\gamma}$, $\underline{\delta}$ and $\overline{\delta}$ as follows:
(This equation did not render in the source; the recoverable LaTeX prefix is `0<\underline{\…`.)
Suppose that for every $x_k$ these constants satisfy
$$\underline{\gamma} U(x_k,u_k)\leq J^*(F(x_k,u_k))\leq\overline{\gamma}U(x_k,u_k),$$
$$\underline{\delta}J^*(x_k)\leq V_0(x_k)\leq\overline{\delta}J^*(x_k).$$
We prove that
$$\left(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{i}}\right)J^*(x_k)\leq V_i(x_k)\leq\left(1+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{i}}\right)J^*(x_k).$$
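The left inequality can be checked numerically on a toy problem. The following is a minimal sketch, not taken from the original post: it assumes a hypothetical scalar linear system $x_{k+1}=ax_k+bu_k$ with utility $U=qx^2+ru^2$, for which $V_i(x)=p_ix^2$ and value iteration reduces to the scalar Riccati recursion $p_{i+1}=q+a^2rp_i/(r+b^2p_i)$. In that setting $J^*(F(x,u))/U(x,u)\le p^*(a^2/q+b^2/r)$ by the Cauchy-Schwarz inequality, which is used as $\overline{\gamma}$. The names `optimal_p` and `value_iteration` are illustrative.

```python
import math

def optimal_p(a, b, q, r):
    """Positive root of the scalar Riccati equation p = q + a^2 r p / (r + b^2 p)."""
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4 * b * b * q * r)) / (2 * b * b)

def value_iteration(a, b, q, r, p0, steps):
    """Return [p_0, ..., p_steps], where V_i(x) = p_i * x^2."""
    ps = [p0]
    for _ in range(steps):
        p = ps[-1]
        ps.append(q + a * a * r * p / (r + b * b * p))
    return ps

a, b, q, r = 1.0, 1.0, 1.0, 1.0              # hypothetical system and cost weights
p_star = optimal_p(a, b, q, r)               # J*(x) = p_star * x^2
gamma_bar = p_star * (a * a / q + b * b / r) # upper bound of J*(F(x,u)) / U(x,u)
delta = 0.5                                  # V_0 = delta * J*, so delta_low = delta

ps = value_iteration(a, b, q, r, delta * p_star, 20)
lower = [(1 + (delta - 1) / (1 + 1 / gamma_bar) ** i) * p_star for i in range(21)]

# The small tolerance covers the i = 0 case, where both sides coincide.
bound_holds = all(lo <= p + 1e-12 for lo, p in zip(lower, ps))
print(bound_holds)
```

The bound is tight at $i=0$ and becomes looser than the actual iterates afterwards, which is what one would expect from a worst-case estimate.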

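Proof

The proof has two parts: the left inequality and the right inequality.

Part 1: the left inequality

We use mathematical induction.

Step 1 (base case)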
$$\begin{aligned} V_1(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_0(x_{k+1})\} \\ &\geq \min_{u_k}\{U(x_k,u_k)+\underline{\delta}J^*(x_{k+1})\} \\ &\geq\min_{u_k}\{U(x_k,u_k)+\underline{\delta}J^*(x_{k+1})+\frac{\underline{\delta}-1}{1+\overline{\gamma}}(\overline{\gamma}U(x_k,u_k)-J^*(x_{k+1}))\} \\ &= \min_{u_k}\{(1+\overline{\gamma}\frac{\underline{\delta}-1}{1+\overline{\gamma}})U(x_k,u_k)+(\underline{\delta}-\frac{\underline{\delta}-1}{1+\overline{\gamma}})J^*(x_{k+1})\} \qquad (1)\\ &= (1+\overline{\gamma}\frac{\underline{\delta}-1}{1+\overline{\gamma}})\min_{u_k}\{U(x_k,u_k)+J^*(x_{k+1})\} \qquad (2) \\ &=(1+\overline{\gamma}\frac{\underline{\delta}-1}{1+\overline{\gamma}})J^*(x_k) \end{aligned}$$
Step (1) follows from
$$\begin{aligned} (1+\overline{\gamma}\frac{\underline{\delta}-1}{1+\overline{\gamma}})U(x_k,u_k)+(\underline{\delta}-\frac{\underline{\delta}-1}{1+\overline{\gamma}})J^*(x_{k+1})&=U(x_k,u_k)+\overline{\gamma}\frac{\underline{\delta}-1}{1+\overline{\gamma}}U(x_k,u_k)+\underline{\delta}J^*(x_{k+1})-\frac{\underline{\delta}-1}{1+\overline{\gamma}}J^*(x_{k+1}) \\ &=U(x_k,u_k)+\underline{\delta}J^*(x_{k+1})+\frac{\underline{\delta}-1}{1+\overline{\gamma}}(\overline{\gamma}U(x_k,u_k)-J^*(x_{k+1})) \\ &\leq U(x_k,u_k)+\underline{\delta}J^*(x_{k+1}) \end{aligned}$$
where the last inequality holds because
$$\frac{\underline{\delta}-1}{1+\overline{\gamma}}<0, \qquad \overline{\gamma}U(x_k,u_k)-J^*(x_{k+1})\geq0.$$
Step (2) follows from
$$\begin{aligned} \underline{\delta}-\frac{\underline{\delta}-1}{1+\overline{\gamma}}&=\frac{\underline{\delta}(1+\overline{\gamma})-\underline{\delta}+1}{1+\overline{\gamma}}=\frac{\underline{\delta}+\underline{\delta}\overline{\gamma}-\underline{\delta}+1}{1+\overline{\gamma}} \\ &=\frac{\underline{\delta}\overline{\gamma}+1+\overline{\gamma}-\overline{\gamma}}{1+\overline{\gamma}} =1+\frac{\underline{\delta}\overline{\gamma}-\overline{\gamma}}{1+\overline{\gamma}}=1+\frac{\underline{\delta}-1}{1+\overline{\gamma}^{-1}} \end{aligned}$$
The coefficient of $U(x_k,u_k)$ in (1) equals the same quantity, $1+\overline{\gamma}\frac{\underline{\delta}-1}{1+\overline{\gamma}}=1+\frac{\underline{\delta}-1}{1+\overline{\gamma}^{-1}}$, so the common factor can be pulled out of the minimum and the claimed bound holds for $i=1$.
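The algebraic identity behind step (2) is easy to get wrong by a sign, so a quick exact check may be useful. This is only a sanity-check sketch using rational arithmetic; the helper names `lhs`, `rhs` and `u_coeff` and the sample values are made up for illustration.

```python
from fractions import Fraction

def lhs(delta, gamma):
    """Coefficient of J* in step (1): delta - (delta - 1) / (1 + gamma)."""
    return delta - (delta - 1) / (1 + gamma)

def rhs(delta, gamma):
    """Claimed closed form: 1 + (delta - 1) / (1 + 1/gamma)."""
    return 1 + (delta - 1) / (1 + 1 / gamma)

def u_coeff(delta, gamma):
    """Coefficient of U in step (1): 1 + gamma * (delta - 1) / (1 + gamma)."""
    return 1 + gamma * (delta - 1) / (1 + gamma)

# Arbitrary sample values (delta, gamma) with 0 <= delta < 1 and gamma > 0.
samples = [
    (Fraction(1, 3), Fraction(2)),
    (Fraction(0), Fraction(5, 7)),
    (Fraction(9, 10), Fraction(11, 4)),
]
identity_ok = all(lhs(d, g) == rhs(d, g) == u_coeff(d, g) for d, g in samples)
print(identity_ok)
```

Because `Fraction` is exact, equality here is a genuine check of the identity at the sampled points rather than a floating-point approximation.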
Step 2 (induction step)

Assume the claim holds for $i=l-1$, $l=1,2,\cdots$; we show that it then holds for $i=l$.

By the induction hypothesis,
$$V_{l-1}(x_k)\geq \left(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l-1}}\right)J^*(x_k)$$

$$\begin{aligned} V_{l}(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_{l-1}(x_{k+1})\} \\ &\geq \min_{u_k}\{U(x_k,u_k)+(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})+\frac{\underline{\delta}-1}{(1+\overline{\gamma})(1+\overline{\gamma}^{-1})^{l-1}}(\overline{\gamma}U(x_k,u_k)-J^*(x_{k+1}))\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l-1}}-\frac{\underline{\delta}-1}{(1+\overline{\gamma})(1+\overline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{(\underline{\delta}-1)(1+\overline{\gamma})-\underline{\delta}+1}{(1+\overline{\gamma})(1+\overline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{(\underline{\delta}-1)\overline{\gamma}}{(1+\overline{\gamma})(1+\overline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}})J^*(x_{k+1})\} \\ &=(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}})\min_{u_k}\{U(x_k,u_k)+J^*(x_{k+1})\} \\ &=(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}})J^*(x_k) \end{aligned}$$

Hence the claim holds for $i=l$, which proves the left inequality.

Part 2: the right inequality

Step 1 (base case)

In the same way,
$$\begin{aligned} V_1(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_0(x_{k+1})\} \\ &\leq \min_{u_k}\{U(x_k,u_k)+\overline{\delta}J^*(x_{k+1})\} \\ &\leq \min_{u_k}\{(1+\underline{\gamma}\frac{\overline{\delta}-1}{1+\underline{\gamma}})U(x_k,u_k)+(\overline{\delta}-\frac{\overline{\delta}-1}{1+\underline{\gamma}})J^*(x_{k+1})\} \\ &= (1+\underline{\gamma}\frac{\overline{\delta}-1}{1+\underline{\gamma}})\min_{u_k}\{U(x_k,u_k)+J^*(x_{k+1})\} \\ &=(1+\underline{\gamma}\frac{\overline{\delta}-1}{1+\underline{\gamma}})J^*(x_k) \end{aligned}$$
The second inequality adds the term $\frac{\overline{\delta}-1}{1+\underline{\gamma}}(\underline{\gamma}U(x_k,u_k)-J^*(x_{k+1}))$, which is nonnegative because $\overline{\delta}-1\leq0$ and $\underline{\gamma}U(x_k,u_k)-J^*(x_{k+1})\leq0$.
Step 2 (induction step)

Assume the claim holds for $i=l-1$, $l=1,2,\cdots$; then for $i=l$
$$\begin{aligned} V_{l}(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_{l-1}(x_{k+1})\} \\ &\leq \min_{u_k}\{U(x_k,u_k)+(1+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})+\frac{1-\overline{\delta}}{(1+\underline{\gamma})(1+\underline{\gamma}^{-1})^{l-1}}(J^*(x_{k+1})-\underline{\gamma}U(x_k,u_k))\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l-1}}-\frac{\overline{\delta}-1}{(1+\underline{\gamma})(1+\underline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{(\overline{\delta}-1)(1+\underline{\gamma})-\overline{\delta}+1}{(1+\underline{\gamma})(1+\underline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{(\overline{\delta}-1)\underline{\gamma}}{(1+\underline{\gamma})(1+\underline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})\} \\ &=\min_{u_k}\{U(x_k,u_k)+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l}}U(x_k,u_k)+(1+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l}})J^*(x_{k+1})\} \\ &=(1+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l}})\min_{u_k}\{U(x_k,u_k)+J^*(x_{k+1})\} \\ &=(1+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{l}})J^*(x_k) \end{aligned}$$
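Hence the claim holds for $i=l$, which completes the proof.

Note that this theorem restricts the initial value to be no larger than the optimal cost function.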

Theorem 2

Redefine the constants so that
$$0\leq\underline{\delta}\leq1\leq\overline{\delta}<\infty.$$
We prove that
$$\left(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{i}}\right)J^*(x_k)\leq V_i(x_k)\leq\left(1+\frac{\overline{\delta}-1}{(1+\overline{\gamma}^{-1})^{i}}\right)J^*(x_k).$$

Proof

The left inequality is proved exactly as above, so only the right inequality needs to be shown.
$$\begin{aligned} V_1(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_0(x_{k+1})\} \\ &\leq\min_{u_k}\{U(x_k,u_k)+\overline{\delta}J^*(x_{k+1})+\frac{\overline{\delta}-1}{1+\overline{\gamma}}(\overline{\gamma}U(x_k,u_k)-J^*(x_{k+1}))\} \\ &=\min_{u_k}\{(1+\frac{\overline{\delta}-1}{1+\overline{\gamma}^{-1}})U(x_k,u_k)+(1+\frac{\overline{\delta}-1}{1+\overline{\gamma}^{-1}})J^*(x_{k+1})\} \\ &=(1+\frac{\overline{\delta}-1}{1+\overline{\gamma}^{-1}})J^*(x_k) \end{aligned}$$
Assume the claim holds for $i=l-1$, $l=1,2,\cdots$; then for $i=l$
$$\begin{aligned} V_{l}(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_{l-1}(x_{k+1})\} \\ &\leq \min_{u_k}\{U(x_k,u_k)+(1+\frac{\overline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l-1}})J^*(x_{k+1})+\frac{\overline{\delta}-1}{(1+\overline{\gamma})(1+\overline{\gamma}^{-1})^{l-1}}(\overline{\gamma}U(x_k,u_k)-J^*(x_{k+1}))\} \\ &=(1+\frac{\overline{\delta}-1}{(1+\overline{\gamma}^{-1})^{l}})J^*(x_k) \end{aligned}$$
Hence the claim holds for $i=l$, which completes the proof.
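As a rough numerical illustration of Theorem 2, the sketch below starts value iteration above the optimum and checks both sides of the sandwich at every step. It again assumes a hypothetical scalar linear-quadratic problem (not from the original post), where $V_i(x)=p_ix^2$, $\overline{\gamma}=p^*(a^2/q+b^2/r)$, and the initial value $V_0=3J^*$ allows the choice $\underline{\delta}=1$, $\overline{\delta}=3$.

```python
import math

a, b, q, r = 1.2, 0.5, 1.0, 2.0              # hypothetical system and cost weights

# Positive root of the scalar Riccati equation: J*(x) = p_star * x^2.
c = r - q * b * b - a * a * r
p_star = (-c + math.sqrt(c * c + 4 * b * b * q * r)) / (2 * b * b)
gamma_bar = p_star * (a * a / q + b * b / r) # upper bound of J*(F(x,u)) / U(x,u)

delta_low, delta_up = 1.0, 3.0               # delta_low <= 1 <= delta_up
p = delta_up * p_star                        # V_0 = 3 * J*
sandwiched = True
for i in range(61):
    shrink = (1 + 1 / gamma_bar) ** i
    lower = (1 + (delta_low - 1) / shrink) * p_star
    upper = (1 + (delta_up - 1) / shrink) * p_star
    # Small tolerance for floating-point rounding near the fixed point.
    sandwiched = sandwiched and (lower - 1e-9 <= p <= upper + 1e-9)
    p = q + a * a * r * p / (r + b * b * p)  # one value-iteration step

print(sandwiched)
```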

Theorem 3

Redefine the constants so that
$$1\leq\underline{\delta}\leq\overline{\delta}<\infty.$$
Then
$$\left(1+\frac{\underline{\delta}-1}{(1+\underline{\gamma}^{-1})^{i}}\right)J^*(x_k)\leq V_i(x_k)\leq\left(1+\frac{\overline{\delta}-1}{(1+\overline{\gamma}^{-1})^{i}}\right)J^*(x_k).$$
The proof follows the same steps as that of Theorem 1.

Theorem 4

Redefine the constants so that
$$0\leq\underline{\delta}\leq\overline{\delta}<\infty.$$
Then
$$\lim_{i\rightarrow\infty}V_i(x_k)=J^*(x_k).$$
Proof:

For the left-hand bounds,
$$\lim_{i\rightarrow\infty}\left\{\left(1+\frac{\underline{\delta}-1}{(1+\overline{\gamma}^{-1})^{i}}\right)J^*(x_k)\right\}=\lim_{i\rightarrow\infty}\left\{\left(1+\frac{\underline{\delta}-1}{(1+\underline{\gamma}^{-1})^{i}}\right)J^*(x_k)\right\}=J^*(x_k)$$
For the right-hand bounds,
$$\lim_{i\rightarrow\infty}\left\{\left(1+\frac{\overline{\delta}-1}{(1+\underline{\gamma}^{-1})^{i}}\right)J^*(x_k)\right\}=\lim_{i\rightarrow\infty}\left\{\left(1+\frac{\overline{\delta}-1}{(1+\overline{\gamma}^{-1})^{i}}\right)J^*(x_k)\right\}=J^*(x_k)$$
By Theorems 1 to 3, $V_i(x_k)$ lies between a lower and an upper bound of this form in every case, so it converges to $J^*(x_k)$ as well. The proofs above also show that convergence does not depend on the initial value, so there is no need to actually compute the constants $\underline{\gamma}$, $\overline{\gamma}$, $\underline{\delta}$ and $\overline{\delta}$.
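The independence from the initial value can be observed directly. The sketch below is only an illustration on an assumed scalar linear-quadratic problem with $a=b=q=r=1$, where $V_i(x)=p_ix^2$ and the optimal coefficient is the golden ratio $(1+\sqrt5)/2$; the function names are illustrative.

```python
def riccati_step(p, a=1.0, b=1.0, q=1.0, r=1.0):
    """One value-iteration step for V_i(x) = p * x^2 (hypothetical scalar LQ problem)."""
    return q + a * a * r * p / (r + b * b * p)

def iterate(p0, steps=60):
    """Apply value iteration `steps` times starting from V_0(x) = p0 * x^2."""
    p = p0
    for _ in range(steps):
        p = riccati_step(p)
    return p

# Initial values below, at and far above the optimum all end at the same limit.
limits = [iterate(p0) for p0 in (0.0, 0.3, 1.0, 5.0, 100.0)]
spread = max(limits) - min(limits)
print(limits[0], spread)
```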

Theorem 5

If
$$V_1(x_k)\leq V_0(x_k),$$
then
$$V_{i+1}(x_k)\leq V_i(x_k).$$

Proof

We use mathematical induction.

For $i=1$, using $V_1\leq V_0$ inside the minimum,
$$\begin{aligned} V_2(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_1(x_{k+1})\} \\ &\leq \min_{u_k}\{U(x_k,u_k)+V_0(x_{k+1})\} \\ &=V_1(x_k) \end{aligned}$$
Assume the claim holds for $i=l-1$, $l=1,2,\cdots$, that is, $V_l\leq V_{l-1}$; then for $i=l$
$$\begin{aligned} V_{l+1}(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_l(x_{k+1})\} \\ &\leq \min_{u_k}\{U(x_k,u_k)+V_{l-1}(x_{k+1})\} \\ &=V_l(x_k) \end{aligned}$$

Theorem 6

If
$$V_1(x_k)\geq V_0(x_k),$$
then
$$V_{i+1}(x_k)\geq V_i(x_k).$$

Proof

We use mathematical induction.

For $i=1$, using $V_1\geq V_0$ inside the minimum,
$$\begin{aligned} V_2(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_1(x_{k+1})\} \\ &\geq \min_{u_k}\{U(x_k,u_k)+V_0(x_{k+1})\} \\ &=V_1(x_k) \end{aligned}$$
Assume the claim holds for $i=l-1$, $l=1,2,\cdots$, that is, $V_l\geq V_{l-1}$; then for $i=l$
$$\begin{aligned} V_{l+1}(x_k)&=\min_{u_k}\{U(x_k,u_k)+V_l(x_{k+1})\} \\ &\geq \min_{u_k}\{U(x_k,u_k)+V_{l-1}(x_{k+1})\} \\ &=V_l(x_k) \end{aligned}$$
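Both monotonicity results rely only on the fact that the Bellman update preserves order, so they can be illustrated on a small tabular problem. The sketch below uses a made-up deterministic problem (five states on a line, actions that move 0, 1 or 2 steps toward state 0, utility $U(x,u)=x+u^2$); none of these details come from the original post.

```python
STATES = range(5)        # hypothetical states 0..4; state 0 is the target
ACTIONS = (0, 1, 2)      # hypothetical actions: number of steps moved toward 0

def step(x, u):
    """System function F(x, u): move u steps toward 0, stopping at 0."""
    return max(x - u, 0)

def utility(x, u):
    """U(x, u) = x + u^2, zero only at x = 0, u = 0."""
    return x + u * u

def bellman(v):
    """One value-iteration step: V_{i+1}(x) = min_u { U(x,u) + V_i(F(x,u)) }."""
    return [min(utility(x, u) + v[step(x, u)] for u in ACTIONS) for x in STATES]

def run(v0, n=20):
    """Return the sequence [V_0, V_1, ..., V_n]."""
    seq = [list(v0)]
    for _ in range(n):
        seq.append(bellman(seq[-1]))
    return seq

def is_monotone(seq, nonincreasing):
    """Check V_{i+1} <= V_i (or >=) componentwise along the whole sequence."""
    for prev, nxt in zip(seq, seq[1:]):
        for old, new in zip(prev, nxt):
            if (new > old) if nonincreasing else (new < old):
                return False
    return True

down = run([100 * x for x in STATES])   # here V_1 <= V_0, the setting of Theorem 5
up = run([0] * len(STATES))             # here V_1 >= V_0, the setting of Theorem 6
print(is_monotone(down, True), is_monotone(up, False), down[-1], up[-1])
```

In this toy problem both sequences end at the same values, one approaching them from above and the other from below.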

Corollary

If $V_1(x_k)\leq V_0(x_k)$, then $V_i(x_k)\geq J^*(x_k)$.

By Theorem 5,
$$V_i(x_k)\geq V_{i+1}(x_k)\geq V_{i+2}(x_k)\geq\cdots$$
so for $l\geq i$
$$V_i(x_k)\geq V_l(x_k).$$
Letting $l\rightarrow\infty$ and using Theorem 4 gives
$$V_i(x_k)\geq\lim_{l\rightarrow\infty}V_l(x_k)=J^*(x_k).$$
In the same way, if $V_1(x_k)\geq V_0(x_k)$, then $V_i(x_k)\leq J^*(x_k)$.

The converse of this corollary does not hold: knowing how large the initial value is does not determine monotonicity.

Under some additional conditions, however, monotonicity can be guaranteed.

Theorem 7

Let the initial value be a positive semidefinite function $\Psi(x_k)$ and let $\overline{v}(x_k)$ be an admissible control law. If $\Psi(x_k)$ satisfies
$$\Psi(x_k)=U(x_k,\overline{v}(x_k))+\Psi(x_{k+1}),$$
then the value function is monotonically nonincreasing.

Proof:
$$\begin{aligned} V_1(x_k)&=U(x_k,v_0(x_k))+V_0(x_{k+1}) \\ &=\min_{u_k}\{U(x_k,u_k)+\Psi(x_{k+1})\} \\ &\leq U(x_k,\overline{v}(x_k))+\Psi(x_{k+1}) \\ &=\Psi(x_k) \end{aligned}$$
By mathematical induction (Theorem 5), the value function is monotonically nonincreasing.

The proof shows what happens when the initial value function is chosen arbitrarily and does not satisfy the iteration equation: the size of $U(x_k,\overline{v}(x_k))+\Psi(x_{k+1})$ is then unknown, and the comparison becomes one between $V_1(x_k)$ and $U(x_k,\overline{v}(x_k))+\Psi(x_{k+1})$. This is the quantity that can actually be compared with $J^*(x_k)$, and it is what decides the monotonicity. The formula shows that monotonicity now depends not only on the initial value but also on the control law of the current control network. In effect, the value iteration problem has turned into a policy iteration problem.

If the initial value function is identically zero, $V_1(x_k)$ is just the utility function, which is positive for $x_k\neq0$ and therefore no smaller than the initial value function.
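The condition in Theorem 7 says that $\Psi$ is the cost of some admissible control law. A minimal sketch of this initialization, assuming a hypothetical unstable scalar plant $x_{k+1}=2x_k+u_k$ with $U=x^2+u^2$ and the stabilizing law $\overline{v}(x)=-1.5x$ (all chosen only for illustration):

```python
a, b, q, r = 2.0, 1.0, 1.0, 1.0   # hypothetical unstable scalar plant and cost weights
k = 1.5                           # hypothetical admissible gain: |a - b*k| = 0.5 < 1
closed = a - b * k

# Cost of the law u = -k*x: Psi(x) = p0 * x^2 with p0 = (q + r*k^2) + closed^2 * p0,
# which is exactly Psi(x_k) = U(x_k, v(x_k)) + Psi(x_{k+1}).
p0 = (q + r * k * k) / (1 - closed * closed)

ps = [p0]
for _ in range(25):
    p = ps[-1]
    ps.append(q + a * a * r * p / (r + b * b * p))   # value iteration on V_i = p_i * x^2

# Tolerance only matters once the sequence has numerically converged.
nonincreasing = all(nxt <= cur + 1e-12 for cur, nxt in zip(ps, ps[1:]))
print(nonincreasing, ps[0], ps[1], ps[-1])
```

Starting from the cost of a stabilizing law gives $V_1\leq V_0$ automatically, so no separate check of the initial value against $J^*$ is needed.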

Theorem 8

If the termination condition is
$$|V_{i+1}(x_k)-V_i(x_k)|\leq \varepsilon,$$
then the system under the control law $v_i(x_k)$ is uniformly ultimately bounded (UUB).

Definition:

[Image not available in the source.]

Proof

First, if the initial value function $\Psi(x_k)$ is positive semidefinite, then $V_i(x_k)$ is positive definite.
$$|V_{i+1}(x_k)-V_i(x_k)|\leq \varepsilon \;\Rightarrow\; |U(x_k,v_i(x_k))+V_i(x_{k+1})-V_i(x_k)|\leq\varepsilon$$
$$-U(x_k,v_i(x_k))-\varepsilon\leq\Delta V_i(x_k)=V_i(x_{k+1})-V_i(x_k)\leq-U(x_k,v_i(x_k))+\varepsilon$$
It is easy to show that when $-U(x_k,v_i(x_k))-\varepsilon\leq\Delta V_i(x_k)<0$, $V_i(x_k)$ is a Lyapunov function and the system is asymptotically stable. So only the case $0\leq\Delta V_i(x_k)\leq-U(x_k,v_i(x_k))+\varepsilon$ needs to be analyzed.

Since $V_i(x_k)$ is positive definite, there exist functions $\alpha(||x_k||)$ and $\beta(||x_k||)$ such that
$$0<\alpha(||x_k||)\leq V_i(x_k)\leq\beta(||x_k||).$$
Define a new set of states
$$\Omega_{x_k}=\{x_k \mid x_k\in R^n~\text{and}~U(x_k,v_i(x_k))\leq\varepsilon\}.$$
Since $U(x_k,v_i(x_k))$ is positive definite, $||x_k||$ is finite on this set, where $||\cdot||$ denotes the Euclidean norm. Define
$$\varrho=\sup_{x_k\in \Omega_{x_k}}\{||x_k||\}.$$
Since $\varepsilon$ is finite, $\varrho$ is finite. For any such $\varrho$ there always exists a finite $\Gamma$ with $||\Gamma||\geq||\varrho||$ such that
$$\alpha(||\Gamma||)\geq\beta(||\varrho||).$$
For any $\epsilon$ with $\epsilon\geq||\Gamma||$ there exists a $\delta(\epsilon)$ with $\delta(\epsilon)\geq||\varrho||$ such that $\beta(\delta)\leq\alpha(\epsilon)$. Hence for states $x_k$ with $||\varrho||\leq||x_k||\leq\delta(\epsilon)$,
$$\alpha(\epsilon)\geq\beta(\delta)\geq V_i(x_k).$$
For $||x_k||\geq||\varrho||$ we obtain
$$V_i(x_{k+1})-V_i(x_k)\leq0.$$
Therefore, for any $x_k$ with $||\varrho||\leq||x_k||\leq\delta(\epsilon)$ there always exists a $T>0$ such that
$$\alpha(\epsilon)\geq\beta(\delta)\geq V_i(x_k)\geq V_i(x_{k+T})\geq\alpha(||x_{k+T}||),$$
which gives $\epsilon>||x_{k+T}||$. So for any $x_k$ with $||x_k||\geq||\varrho||$ there exists a $T=1,2,\cdots$ such that $||x_{k+T}||\leq||\varrho||$. Since $||\Gamma||\geq||\varrho||$, we obtain $||x_{k+T}||\leq||\Gamma||$, and the theorem is proved.

Here the case $0\leq\Delta V_i(x_k)\leq-U(x_k,v_i(x_k))+\varepsilon$ implies $U(x_k,v_i(x_k))\leq\varepsilon$, which restricts the utility function. When this inequality is not satisfied the system is asymptotically stable, so under this condition the state vector remains bounded.
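The termination rule of Theorem 8 can be sketched as a loop that stops once successive value functions differ by at most $\varepsilon$ and then applies the resulting control law. The example below assumes a hypothetical scalar plant with $V_i(x)=p_ix^2$ and evaluates the difference on the region $|x|\leq1$; the function name `iterate_until` and all parameter values are illustrative only.

```python
A, B, Q, R = 2.0, 1.0, 1.0, 1.0   # hypothetical scalar plant and cost weights

def iterate_until(eps, p0=0.0, x_max=1.0):
    """Iterate until max over |x| <= x_max of |V_{i+1}(x) - V_i(x)| <= eps.

    Returns the index i, the coefficient p_i and the gain of v_i(x) = -gain * x.
    """
    p, i = p0, 0
    while True:
        p_next = Q + A * A * R * p / (R + B * B * p)
        if abs(p_next - p) * x_max ** 2 <= eps:
            gain = A * B * p / (R + B * B * p)   # minimizer of U(x,u) + V_i(F(x,u))
            return i, p, gain
        p, i = p_next, i + 1

i_stop, p_stop, gain = iterate_until(eps=1e-3)

# Simulate the closed loop x_{k+1} = (A - B*gain) * x_k from x_0 = 1.
x, trajectory = 1.0, []
for _ in range(30):
    trajectory.append(x)
    x = (A - B * gain) * x

print(i_stop, gain, max(abs(v) for v in trajectory))
```

In this linear example the state actually converges to zero, which is stronger than the boundedness that the theorem guarantees in general.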

Theorem 9

If
$$V_{i+1}(x_k)-V_i(x_k)<U(x_k,v_i(x_k)),$$
then the control law $v_i(x_k)$ is admissible.

Proof

There exists a constant $-\infty<\theta<1$ such that
$$V_{i+1}(x_k)-V_i(x_k)<\theta U(x_k,v_i(x_k))$$
$$U(x_k,v_i(x_k))+V_i(x_{k+1})-V_i(x_k)<\theta U(x_k,v_i(x_k))$$
$$V_i(x_{k+1})-V_i(x_k)<(\theta-1) U(x_k,v_i(x_k))$$
Since $\theta<1$ and $U$ is positive definite, $V_i(x_{k+1})-V_i(x_k)<0$. By the Lyapunov stability criterion the system is stable, so the current control is a stabilizing control.
$$\left \{ \begin{array}{cll} V_i(x_{k+1})-V_i(x_k)&<&(\theta-1)U(x_k,v_i(x_k)) \\ V_i(x_{k+2})-V_i(x_{k+1})&<&(\theta-1)U(x_{k+1},v_i(x_{k+1})) \\ V_i(x_{k+3})-V_i(x_{k+2})&<&(\theta-1)U(x_{k+2},v_i(x_{k+2})) \\ &\vdots& \\ V_i(x_{k+N})-V_i(x_{k+N-1})&<&(\theta-1)U(x_{k+N-1},v_i(x_{k+N-1})) \\ \end{array} \right.$$
Summing these $N$ inequalities gives
$$V_i(x_{k+N})-V_i(x_k)<(\theta-1)\sum_{j=0}^{N-1}U(x_{k+j},v_i(x_{k+j})).$$
Since $v_i(x_k)$ is a stabilizing control, $x_{k+N}\rightarrow0$ and $V_i(x_{k+N})\rightarrow0$ as $N\rightarrow\infty$, so
$$V_i(x_k)\geq(1-\theta)\sum_{j=0}^{\infty}U(x_{k+j},v_i(x_{k+j})).$$
For any finite state $x_k$ the value function $V_i(x_k)$ is finite, so $\sum_{j=0}^{\infty}U(x_{k+j},v_i(x_{k+j}))$ is finite and the control law $v_i(x_k)$ is admissible.
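The admissibility test of Theorem 9 is cheap to evaluate during the iteration. The sketch below assumes a hypothetical unstable scalar plant $x_{k+1}=2x_k+u_k$ with $U=x^2+u^2$ and $V_0=0$, so that $V_i(x)=p_ix^2$, $v_i(x)=-g_ix$ and the test reduces to comparing coefficients of $x^2$. For each $i$ it records whether the condition holds and whether the closed loop is actually stable.

```python
a, b, q, r = 2.0, 1.0, 1.0, 1.0   # hypothetical unstable scalar plant and cost weights
p = 0.0                           # V_0 = 0

report = []
for i in range(6):
    gain = a * b * p / (r + b * b * p)           # v_i(x) = -gain * x
    p_next = q + a * a * r * p / (r + b * b * p) # coefficient of V_{i+1}
    u_coeff = q + r * gain * gain                # U(x, v_i(x)) = u_coeff * x^2
    condition = (p_next - p) < u_coeff           # V_{i+1} - V_i < U(x, v_i(x)) for x != 0
    stable = abs(a - b * gain) < 1               # closed loop x_{k+1} = (a - b*gain) x_k
    report.append((i, condition, stable))
    p = p_next

for row in report:
    print(row)
```

With these particular numbers the condition fails for $i=0$ and $i=1$, where the closed loop is not asymptotically stable, and holds from $i=2$ onward.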

Theorem 10

There exists a finite constant $N>0$ such that
$$V_{N+1}(x_k)-V_N(x_k)<U(x_k,v_N(x_k)).$$
Proof

We argue by contradiction. Suppose that for every $N=0,1,\cdots$ there is a state $\overline{x}_k\in R^n$ with
$$V_{N+1}(\overline{x}_k)-V_N(\overline{x}_k)\geq U(\overline{x}_k,v_N(\overline{x}_k)).$$
Letting $N\rightarrow\infty$, Theorem 4 gives $\lim_{N\rightarrow\infty}(V_{N+1}(\overline{x}_k)-V_N(\overline{x}_k))=0$. Together with the inequality above and $U\geq0$ this yields
$$\lim_{N\rightarrow\infty}(V_{N+1}(\overline{x}_k)-V_N(\overline{x}_k))=\lim_{N\rightarrow\infty}U(\overline{x}_k,v_N(\overline{x}_k))=0,$$
which contradicts the positive definiteness of the utility function $U(x_k,u_k)$. Hence the theorem holds.

Theorem 11
  1. $V_{i+1}(x_k)+V_{i+j}(x_k)\geq V_i(x_k)+V_{i+j+1}(x_k)$
  2. $V_j(x_k)\geq \frac{1}{2}(V_{j+1}(x_k)+V_{j-1}(x_k))$
  3. With $\Delta V_j(x_k)=V_j(x_k)-V_{j-1}(x_k)$, for all $j>i$, $\Delta V_j(x_k)\geq\Delta V_{j+1}(x_k)$

Then $v_j(x_k)$ is an admissible control,

provided the value function satisfies $V_{i+1}(x_k)-V_i(x_k)<U(x_k,v_i(x_k))$.

Proof
$$(V_{i+j+1}(x_k)-V_{i+j}(x_k))-(V_{i+1}(x_k)-V_i(x_k))\leq0$$
which gives
$$(V_{i+j+1}(x_k)-U(x_k,v_{i+j}(x_k)))-V_{i+j}(x_k)\leq V_{i+1}(x_k)-U(x_k,v_i(x_k))-V_i(x_k)+U(x_k,v_i(x_k))$$
$$V_{i+j}(x_{k+1})-V_{i+j}(x_k)\leq V_{i}(x_{k+1})-V_i(x_k)+U(x_k,v_i(x_k))<0$$
By Theorem 9, this control is admissible.

These last few theorems all aim at the conclusion $V_i(x_{k+1})-V_i(x_{k})<0$. Once this condition holds, $V_i$ is a Lyapunov function, so the corresponding control is guaranteed to be stabilizing.
