Dynamic Programming and Optimal Control: Chapter 3 Exercises

3.1 Solve the problem of Example 3.2.1 for the case where the cost function is
$$(x(T))^2+\int_0^T(u(t))^2\,dt.$$
Also, calculate the cost-to-go function $J^*(t,x)$ and verify that it satisfies the HJB equation.
Solution. The scalar system is $\dot x(t)=u(t)$ with the constraint $|u(t)|\leq 1$ for all $t\in[0,T]$.
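The solution above only sets up the problem. As a hedged numerical aside (not the author's derivation), one natural candidate for the cost-to-go in the region where the constraint $|u|\le 1$ is inactive is the unconstrained linear-quadratic form $J(t,x)=x^2/(1+T-t)$; the sketch below checks by finite differences that this candidate makes the HJB residual $\min_{|u|\le 1}\{u^2+\partial_t J+\partial_x J\,u\}$ vanish on a grid with $|x|<1+T-t$, and that $J(T,x)=x^2$. The form of $J$, the horizon, and the grids are assumptions for illustration only.

```python
import numpy as np

T = 1.0       # illustrative horizon
h = 1e-5      # finite-difference step

def J(t, x):
    # Candidate cost-to-go (assumed form, used only where |x| <= 1 + T - t)
    return x**2 / (1.0 + T - t)

us = np.linspace(-1.0, 1.0, 2001)   # control grid for the constraint |u| <= 1

max_residual = 0.0
for t in np.linspace(0.0, 0.9 * T, 10):
    for x in np.linspace(-0.5 * (1 + T - t), 0.5 * (1 + T - t), 11):
        Jt = (J(t + h, x) - J(t - h, x)) / (2 * h)   # partial J / partial t
        Jx = (J(t, x + h) - J(t, x - h)) / (2 * h)   # partial J / partial x
        residual = np.min(us**2 + Jt + Jx * us)      # HJB: min over u should be ~0
        max_residual = max(max_residual, abs(residual))

print("max |HJB residual| on the grid:", max_residual)
print("terminal condition J(T, 0.7) vs 0.7^2:", J(T, 0.7), 0.7**2)
```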

3.2 A young investor has earned in the stock market a large amount of money $S$ and plans to spend it so as to maximize his enjoyment through the rest of his life without working. He estimates that he will live exactly $T$ more years and that his capital $x(t)$ should be reduced to zero at time $T$, i.e., $x(T)=0$. Also, he models the evolution of his capital by the differential equation
$$\frac{dx(t)}{dt}=\alpha x(t)-u(t),$$
where $x(0)=S$ is his initial capital, $\alpha>0$ is a given interest rate, and $u(t)\ge 0$ is his rate of expenditure. The total enjoyment he will obtain is given by
$$\int_0^T e^{-\beta t}\sqrt{u(t)}\,dt.$$
Here $\beta$ is some positive scalar, which serves to discount future enjoyment. Find the optimal $\{u(t)\mid t\in[0,T]\}$.
Solution. We have
$$f(x,u)=\alpha x-u,\qquad g(x,u)=e^{-\beta t}\sqrt{u},$$
giving the Hamiltonian
$$H(x,u,p)=e^{-\beta t}\sqrt{u}+p(\alpha x-u),$$
and the adjoint equation is
$$\dot p(t)=-\alpha p(t),$$
yielding
$$p(t)=C_1e^{-\alpha t}\qquad\text{for some constant }C_1.$$
Notice that here $x(T)=0$ is prescribed, so the usual terminal condition $p(T)=\nabla h(x^*(T))=0$ no longer applies.
The optimal control is obtained by maximizing the Hamiltonian with respect to $u\ge 0$, yielding
$$u^*(t)=\arg\max_{u\ge 0}\left[e^{-\beta t}\sqrt{u}+C_1e^{-\alpha t}(\alpha x^*-u)\right]=\frac{e^{2(\alpha-\beta)t}}{4C_1^2},\qquad(3.2.1)$$
since setting the derivative with respect to $u$ to zero gives $e^{-\beta t}/(2\sqrt{u})=C_1e^{-\alpha t}$. Then, by the differential equation of the system, we get
$$\dot x^*(t)=\alpha x^*(t)-\frac{e^{2(\alpha-\beta)t}}{4C_1^2}.$$
Solving this equation, we obtain
$$x^*(t)=C_2e^{\alpha t}+\frac{e^{2(\alpha-\beta)t}}{4C_1^2(2\beta-\alpha)}\qquad\text{for some constant }C_2,$$
assuming $\alpha\neq 2\beta$. Together with the initial condition $x^*(0)=S$ and the final condition $x^*(T)=0$, we can determine the exact values of $C_1$ and $C_2$. So $u^*(t)$ in (3.2.1) gives the optimal control. $\qquad\Box$
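As a sanity check (not part of the original solution), here is a minimal numerical sketch that determines $C_1$ and $C_2$ from the boundary conditions $x^*(0)=S$ and $x^*(T)=0$ using the formulas above (assuming $\alpha\neq 2\beta$), and then integrates $\dot x=\alpha x-u^*(t)$ forward to confirm that the capital reaches zero at time $T$. The parameter values are purely illustrative.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the exercise statement)
S, T, alpha, beta = 1.0, 10.0, 0.05, 0.10

# Write x*(t) = C2*exp(alpha*t) + D*exp(2*(alpha-beta)*t),
# where D = 1/(4*C1^2*(2*beta - alpha)).  The boundary conditions give:
C2 = S / (1.0 - np.exp((2 * beta - alpha) * T))   # from x*(0)=S and x*(T)=0
D = S - C2
C1 = np.sqrt(1.0 / (4.0 * D * (2 * beta - alpha)))

def u_star(t):
    """Optimal expenditure rate u*(t) from (3.2.1)."""
    return np.exp(2 * (alpha - beta) * t) / (4 * C1**2)

# Forward Euler integration of dx/dt = alpha*x - u*(t)
n = 200_000
dt = T / n
x = S
for k in range(n):
    x += dt * (alpha * x - u_star(k * dt))

print(f"x(T) = {x:.6f}  (should be close to 0)")
```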

3.9 Use the Minimum Principle to solve the linear-quadratic problem of Example 3.2.2.
Solution. The $n$-dimensional linear-quadratic system is given by
$$\dot x(t)=Ax(t)+Bu(t),$$
where $A$ and $B$ are given matrices, and the quadratic cost is
$$x(T)'Q_Tx(T)+\int_0^T\left(x(t)'Qx(t)+u(t)'Ru(t)\right)dt,$$
where the matrices $Q_T$ and $Q$ are symmetric positive semidefinite, and the matrix $R$ is symmetric positive definite.
The Hamiltonian here is
$$H(x,u,p)=x'Qx+u'Ru+p'(Ax+Bu),$$
and the adjoint equation is
$$\dot p(t)=-2Qx^*(t)-A'p(t)\qquad(1)$$
with the terminal condition
$$p(T)=\nabla h(x^*(T))=2Q_Tx^*(T).$$
The optimal control can be obtained by minimizing the Hamiltonian with respect to $u$, yielding
$$u^*(t)=\arg\min_{u}\left\{x^*(t)'Qx^*(t)+u'Ru+p'(Ax^*(t)+Bu)\right\}.$$
Since $\nabla_u\{x^*(t)'Qx^*(t)+u'Ru+p'(Ax^*(t)+Bu)\}=2Ru+B'p$, we get
$$u^*(t)=-\frac{1}{2}R^{-1}B'p(t),\qquad(2)$$
which, together with the system equation, leads to
$$\dot x^*(t)=Ax^*(t)-\frac{1}{2}BR^{-1}B'p(t).\qquad(3)$$
Equations (1) and (3) form a linear two-point boundary value problem in $(x^*,p)$. Postulating $p(t)=2K(t)x^*(t)$ for a symmetric matrix function $K(t)$ and substituting into (1) and (3) yields the Riccati equation
$$\dot K(t)=-K(t)A-A'K(t)+K(t)BR^{-1}B'K(t)-Q,\qquad K(T)=Q_T,$$
so that $u^*(t)=-R^{-1}B'K(t)x^*(t)$, which agrees with Example 3.2.2. $\qquad\Box$
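To make this concrete, here is a minimal sketch (not from the text) that integrates the Riccati equation above backward in time with a simple Euler scheme and forms the time-varying feedback gain $-R^{-1}B'K(t)$. The matrices $A$, $B$, $Q$, $R$, $Q_T$, the horizon, and the step count are illustrative assumptions.

```python
import numpy as np

# Illustrative problem data (assumptions, not taken from Example 3.2.2)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Q_T = np.eye(2)
T, n = 5.0, 5000
dt = T / n
Rinv = np.linalg.inv(R)

# Integrate K_dot = -K A - A' K + K B R^{-1} B' K - Q backward from K(T) = Q_T
K = Q_T.copy()
gains = [None] * (n + 1)
gains[n] = -Rinv @ B.T @ K
for i in range(n, 0, -1):
    K_dot = -K @ A - A.T @ K + K @ B @ Rinv @ B.T @ K - Q
    K = K - dt * K_dot               # one Euler step backward in time
    gains[i - 1] = -Rinv @ B.T @ K

# The optimal control at time t = i*dt is u*(t) = gains[i] @ x*(t)
print("K(0) =\n", K)
print("feedback gain at t = 0:", gains[0])
```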

3.11 Use the discrete-time Minimum Principle to solve Exercise 1.14 of Chapter 1, assuming that each $w_k$ is fixed at a known deterministic value.
Solution. Let $w_k=\overline{w}$ for some fixed number $\overline{w}>0$. The system is characterized by
$$x_{k+1}=f_k(x_k,u_k)=x_k+\overline{w}u_kx_k,$$
and the cost function becomes
$$J(u)=x_N+\sum_{k=0}^{N-1}(1-u_k)x_k.$$
Then the Hamiltonian function can be written as
$$H_k(x_k,u_k,p_{k+1})=(1-u_k)x_k+p_{k+1}(x_k+\overline{w}u_kx_k).$$
By the discrete-time Minimum Principle, for $k=0,1,\cdots,N-1$, we have
$$u_k^*=\arg\max_{u_k\in[0,1]}H_k(x_k^*,u_k,p_{k+1})=\arg\max_{u_k\in[0,1]}\left[(p_{k+1}\overline{w}-1)u_kx_k^*+(p_{k+1}+1)x_k^*\right]=\begin{cases}1,&\text{if }p_{k+1}\overline{w}>1,\\0,&\text{if }p_{k+1}\overline{w}\leq 1.\end{cases}\qquad(3.11.1)$$
On the other hand, for $k=0,1,\cdots,N-1$, the adjoint equation reads
$$p_k=\nabla_{x_k}H_k(x_k^*,u_k^*,p_{k+1})=(p_{k+1}\overline{w}-1)u_k^*+p_{k+1}+1,\qquad(3.11.2)$$
with the terminal condition $p_N=\nabla g_N(x_N^*)=1.$
Combining (3.11.1) with (3.11.2), we obtain the following implications:
$$p_{k+1}\overline{w}>1\;\Rightarrow\;u_k^*=1\;\Rightarrow\;p_k=(\overline{w}+1)p_{k+1},\qquad(3.11.3)$$
$$p_{k+1}\overline{w}\leq 1\;\Rightarrow\;u_k^*=0\;\Rightarrow\;p_k=p_{k+1}+1.\qquad(3.11.4)$$
So by induction, starting from $p_N=1$, we conclude the following optimal control results:
(1) If $\overline{w}>1$, then $u_0^*=\cdots=u_{N-1}^*=1$.
(2) If $0<\overline{w}<1/N$, then $u_0^*=\cdots=u_{N-1}^*=0$.
(3) If $1/N\leq\overline{w}\leq 1$, then $u_0^*=\cdots=u_{N-\bar{k}-1}^*=1$ and $u_{N-\bar{k}}^*=\cdots=u_{N-1}^*=0$, where $\bar{k}$ is such that $1/(\bar{k}+1)<\overline{w}\leq 1/\bar{k}$. $\qquad\Box$
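As a check (not part of the original solution), the following sketch runs the backward recursion (3.11.2)-(3.11.4) for $p_k$, extracts $u_k^*$ via (3.11.1), and compares the result with the closed-form threshold rule in cases (1)-(3) above; the values of $N$ and $\overline{w}$ are illustrative.

```python
import math

def minimum_principle_policy(N, w_bar):
    """Backward recursion for p_k, per (3.11.2)-(3.11.4), and the resulting u_k*."""
    p = [0.0] * (N + 1)
    u = [0] * N
    p[N] = 1.0                       # terminal condition p_N = 1
    for k in range(N - 1, -1, -1):
        if p[k + 1] * w_bar > 1:     # (3.11.3): invest everything
            u[k] = 1
            p[k] = (w_bar + 1) * p[k + 1]
        else:                        # (3.11.4): consume everything
            u[k] = 0
            p[k] = p[k + 1] + 1
    return u

def closed_form_policy(N, w_bar):
    """Cases (1)-(3): u_k* = 1 early on, then u_k* = 0 for the last k_bar stages."""
    if w_bar > 1:
        return [1] * N
    k_bar = min(N, math.floor(1.0 / w_bar))   # largest k with w_bar <= 1/k
    return [1] * (N - k_bar) + [0] * k_bar

# Illustrative values
N, w_bar = 10, 0.3
assert minimum_principle_policy(N, w_bar) == closed_form_policy(N, w_bar)
print(minimum_principle_policy(N, w_bar))
```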

3.12 Use the discrete-time Minimum Principle to solve Exercise 1.15 of Chapter 1, assuming that each $\gamma_k$ and $\delta_k$ is fixed at a known deterministic value.
