Dynamic Programming and Optimal Control, Chapter 4 Exercises

4.3 Consider an inventory problem similar to the problem of Section 4.2 (zero fixed cost). The only difference is that at the beginning of each period $k$ the decision maker, in addition to knowing the current inventory level $x_k$, receives an accurate forecast that the demand $w_k$ will be selected in accordance with one out of two possible probability distributions $P_\ell, P_s$ (large demand, small demand). The a priori probability of a large demand forecast is known (cf. Section 1.4).
(1) Obtain the optimal ordering policy for the case of a single-period problem.
(2) Extend the result to the $N$-period case.
(3) Extend the result to the case of any finite number of possible distributions.

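Solution. (1) Let the a priori probabilities of the forecasts $P_\ell$ and $P_s$ be $p_\ell$ and $p_s=1-p_\ell$, respectively, and let $y_k\in\{\ell,s\}$ denote the forecast received at the start of period $k$.
With $c$, $p$, $h$ the ordering, shortage, and holding costs of Section 4.2, define for each forecast
$$H_y(z)=\mathbf{E}_{w\sim P_y}\left\{p\max(0,w-z)+h\max(0,z-w)\right\},\qquad y\in\{\ell,s\}.$$
Because the forecast is known before the order is placed, it is part of the state, and the DP algorithm for the single-period problem is
$$J_1(x_1,y_1)=0,\qquad J_0(x_0,y_0)=\min_{u_0\ge 0}\left[cu_0+H_{y_0}(x_0+u_0)\right].$$
Averaging over the forecast, the optimal expected cost before the forecast is received is
$$\mathbf{E}_{y_0}\left\{J_0(x_0,y_0)\right\}=p_s\min_{u_0\ge 0}\left[cu_0+H_s(x_0+u_0)\right]+p_\ell\min_{u_0\ge 0}\left[cu_0+H_\ell(x_0+u_0)\right].$$
Each minimization is the single-period problem of Section 4.2 with a known demand distribution. Since $H_y$ is convex, if $p>c$ the function $cz+H_y(z)$ has a minimizer $S_y$, and the optimal policy is a base-stock policy whose level depends on the forecast:
$$\mu_0^*(x_0,y_0)=\begin{cases}S_{y_0}-x_0, & x_0<S_{y_0},\\ 0, & x_0\ge S_{y_0}.\end{cases}$$
For a continuous demand distribution with CDF $F_y$, setting the derivative $c-p+(p+h)F_y(z)$ to zero gives $F_y(S_y)=(p-c)/(p+h)$.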
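For part (1), the forecast-dependent base-stock levels can be computed directly when the demand distributions are discrete. The sketch below is a minimal illustration under assumed data: the costs `c`, `p`, `h` (with $p>c$) and the two integer-valued distributions in `dists` are hypothetical, and the helper names are made up for this example. It minimizes $cz+H_y(z)$ for each forecast $y$ to get $S_y$, orders up to $S_y$, and cross-checks $S_y$ against the smallest $z$ with $F_y(z)\ge (p-c)/(p+h)$, the discrete form of the critical-fractile condition.

```python
# Single-period inventory problem with a demand forecast (Exercise 4.3, part 1).
# Hypothetical data: unit order cost c, shortage cost p, holding cost h (p > c),
# and one integer-valued demand distribution per forecast.

def expected_H(z, dist, p, h):
    """H_y(z) = p*E[max(0, w - z)] + h*E[max(0, z - w)] with w ~ dist."""
    return sum(prob * (p * max(0, w - z) + h * max(0, z - w))
               for w, prob in dist.items())


def base_stock_level(dist, c, p, h):
    """Smallest integer z minimizing c*z + H_y(z); with p > c it lies in [min w, max w]."""
    candidates = range(min(dist), max(dist) + 1)
    return min(candidates, key=lambda z: c * z + expected_H(z, dist, p, h))


def critical_fractile_level(dist, c, p, h):
    """Smallest support point z with F_y(z) >= (p - c) / (p + h)."""
    ratio = (p - c) / (p + h)
    cumulative = 0.0
    for w in sorted(dist):
        cumulative += dist[w]
        if cumulative >= ratio:
            return w
    return max(dist)


def order_quantity(x, forecast, dists, c, p, h):
    """Optimal u_0: order up to S_y when x is below it, otherwise order nothing."""
    S = base_stock_level(dists[forecast], c, p, h)
    return max(0, S - x)


c, p, h = 1.0, 4.0, 1.0
dists = {
    "large": {4: 0.2, 5: 0.3, 6: 0.3, 7: 0.2},
    "small": {1: 0.3, 2: 0.4, 3: 0.3},
}

for y, dist in dists.items():
    print(y, base_stock_level(dist, c, p, h), critical_fractile_level(dist, c, p, h))
# large 6 6
# small 2 2

print(order_quantity(3, "large", dists, c, p, h))  # 3
print(order_quantity(3, "small", dists, c, p, h))  # 0
```

Because `dists` is keyed by forecast, the same single-period computation applies unchanged to any finite number of forecast distributions.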

4.5 Consider the inventory problem of Section 4.2 for the case where the cost has the general form
$$\mathbf{E}\left\{\sum_{k=0}^{N} r_k(x_k)\right\}.$$
The functions $r_k$ are convex and differentiable and satisfy
$$\lim_{x\to-\infty}\frac{dr_k(x)}{dx}=-\infty,\qquad \lim_{x\to\infty}\frac{dr_k(x)}{dx}=\infty,\qquad k=0,\dots,N.$$
(1) Assume that the fixed cost is zero. Write the DP algorithm for this problem and show that the optimal ordering policy has the same form as the one derived in Section 4.2.
(2) Suppose there is a one-period time lag between the order and the delivery of inventory; that is, the system equation is of the form
$$x_{k+1}=x_k+u_{k-1}-w_k,\qquad k=0,1,\dots,N-1,$$
where $u_{-1}$ is given. Reformulate the problem so that it has the form of the problem of part (1).

Solution.
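For part (1), one way to see the structure is to run the DP numerically. Writing $G_k(y)=\mathbf{E}\{J_{k+1}(y-w_k)\}$, the recursion is $J_N(x_N)=r_N(x_N)$ and $J_k(x_k)=r_k(x_k)+\min_{u_k\ge 0}G_k(x_k+u_k)$, and convexity of $G_k$ makes ordering up to its minimizer $S_k$ optimal. The sketch below checks this on assumed data only: the quadratic $r_k$, the horizon `N`, the demand distribution `DEMAND`, the integer state grid, and the search window `Y_LO`, `Y_HI` are all hypothetical choices. Exact `Fraction` arithmetic avoids spurious floating-point ties.

```python
# Finite-horizon DP for Exercise 4.5(1) on an integer grid (illustrative sketch).
# Hypothetical data: r_k(x) = (x - 1)^2 for every k, horizon N = 3, demand
# w in {0, 1, 2} with probabilities 1/4, 1/2, 1/4.
from fractions import Fraction
from functools import lru_cache

N = 3
DEMAND = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y_LO, Y_HI = -10, 10  # search window for post-order levels y = x + u


def r(k, x):
    """Convex, differentiable stage cost r_k(x) (assumed the same for every k)."""
    return (x - 1) ** 2


@lru_cache(maxsize=None)
def J(k, x):
    """J_N(x) = r_N(x); J_k(x) = r_k(x) + min over u >= 0 of G_k(x + u)."""
    if k == N:
        return r(N, x)
    return r(k, x) + G(k, x + policy(k, x))


@lru_cache(maxsize=None)
def G(k, y):
    """G_k(y) = E_w[J_{k+1}(y - w)], the expected cost-to-go after ordering up to y."""
    return sum(prob * J(k + 1, y - w) for w, prob in DEMAND.items())


def policy(k, x):
    """DP-optimal order at stage k: smallest u minimizing G_k(x + u), with y capped at Y_HI."""
    return min(range(max(0, Y_HI - x) + 1), key=lambda u: G(k, x + u))


def base_stock(k):
    """S_k: smallest minimizer of G_k over the search window."""
    return min(range(Y_LO, Y_HI + 1), key=lambda y: G(k, y))


S = [base_stock(k) for k in range(N)]
print(S)  # [2, 2, 2]

# The DP policy coincides with the base-stock rule u = max(0, S_k - x).
for k in range(N):
    assert all(policy(k, x) == max(0, S[k] - x) for x in range(-5, 8))
```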

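For part (2), one natural reformulation (a sketch, not necessarily the intended solution) takes the inventory position $y_k=x_k+u_{k-1}$, stock on hand plus the order in transit, as the new state. Substituting the system equation gives $y_{k+1}=x_{k+1}+u_k=y_k+u_k-w_k$, which has the form of part (1), and since $x_{k+1}=y_k-w_k$, the cost term $r_{k+1}(x_{k+1})=r_{k+1}(y_k-w_k)$ depends only on $y_k$ and $w_k$. The short check below simulates both recursions with arbitrary (hypothetical) random orders and demands and confirms the identity; the function names are made up for the illustration.

```python
import random

# Check that y_k = x_k + u_{k-1} follows y_{k+1} = y_k + u_k - w_k
# whenever x_{k+1} = x_k + u_{k-1} - w_k. Orders and demands are arbitrary.

def simulate_lagged(x0, u_prev, orders, demands):
    """Lagged system: returns on-hand levels x_0..x_N; u_{k-1} arrives at stage k."""
    xs = [x0]
    delivered = [u_prev] + orders[:-1]   # [u_{-1}, u_0, ..., u_{N-2}]
    for k, w in enumerate(demands):
        xs.append(xs[-1] + delivered[k] - w)
    return xs


def simulate_position(x0, u_prev, orders, demands):
    """Reformulated system: y_0 = x_0 + u_{-1}, y_{k+1} = y_k + u_k - w_k."""
    ys = [x0 + u_prev]
    for u, w in zip(orders, demands):
        ys.append(ys[-1] + u - w)
    return ys


rng = random.Random(0)
N = 6
u_prev = 2
orders = [rng.randint(0, 5) for _ in range(N)]    # u_0, ..., u_{N-1}
demands = [rng.randint(0, 5) for _ in range(N)]   # w_0, ..., w_{N-1}
xs = simulate_lagged(3, u_prev, orders, demands)
ys = simulate_position(3, u_prev, orders, demands)

# y_k equals x_k plus the order placed one period earlier, for k = 0..N.
assert all(ys[k] == xs[k] + ([u_prev] + orders)[k] for k in range(N + 1))
```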
4.13 (A Gambling Problem) A gambler enters a game whereby he may at time $k$ stake any amount $u_k\ge 0$ that does not exceed his current fortune $x_k$ (defined to be his initial capital plus his gain or minus his loss thus far). He wins his stake back and as much more with probability $p$, $0<p<1$, and he loses his stake with probability $1-p$. Show that the gambling strategy that maximizes $\mathbf{E}\{\ln x_N\}$, where $x_N$ denotes his fortune after $N$ plays, is to stake at each time $k$ an amount $u_k=(2p-1)x_k$.

Solution. The gambling system can be formulated as
$$x_{k+1}=x_k+w_ku_k,$$
where $w_k$ is a random variable taking the value $1$ with probability $p$ and the value $-1$ with probability $1-p$. Since the objective is to maximize $\mathbf{E}\{\ln x_N\}$, the DP algorithm is
$$J_N(x_N)=\ln x_N,\qquad J_k(x_k)=\max_{0\le u_k\le x_k}\mathbf{E}_{w_k}\left\{J_{k+1}(x_k+w_ku_k)\right\}.$$
For $k=N-1$ we have
$$J_{N-1}(x_{N-1})=\max_{0\le u_{N-1}\le x_{N-1}}\mathbf{E}_{w_{N-1}}\left\{\ln(x_{N-1}+w_{N-1}u_{N-1})\right\}=\max_{0\le u_{N-1}\le x_{N-1}}\left[p\ln(x_{N-1}+u_{N-1})+(1-p)\ln(x_{N-1}-u_{N-1})\right].$$
The expression in brackets is strictly concave in $u_{N-1}$, so its maximizer is found by setting the derivative with respect to $u_{N-1}$ to zero:
$$\frac{p}{x_{N-1}+u_{N-1}}-\frac{1-p}{x_{N-1}-u_{N-1}}=\frac{(2p-1)x_{N-1}-u_{N-1}}{x_{N-1}^2-u_{N-1}^2}=0,$$
yielding $u_{N-1}^*=(2p-1)x_{N-1}$ (a feasible stake, i.e., in $[0,x_{N-1})$, when $p\ge 1/2$), and
$$J_{N-1}(x_{N-1})=p\ln(2p)+(1-p)\ln(2-2p)+\ln x_{N-1}=C+\ln x_{N-1},\qquad C=p\ln(2p)+(1-p)\ln(2-2p).$$
Using a similar argument and induction: if $J_{k+1}(x)=C_{k+1}+\ln x$, then the maximization at stage $k$ differs from the one at stage $N-1$ only by the constant $C_{k+1}$. Hence for every $k=0,1,\dots,N-1$, $u_k^*=(2p-1)x_k$ and $J_k(x_k)=C_k+\ln x_k$ with $C_k=(N-k)C$. $\Box$
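The closed form can be checked numerically. Writing $f=u_k/x_k$ for the fraction of the current fortune that is staked, and using $J_{k+1}(x)=C_{k+1}+\ln x$, each stage maximizes $g(f)=p\ln(1+f)+(1-p)\ln(1-f)$ over $0\le f<1$. The sketch below uses a plain grid search over $f$ (the grid size and the test values of $p$ are arbitrary choices, and the function names are hypothetical) and compares the result with the rule $f=2p-1$; for $p\le 1/2$ the search returns $f=0$, i.e., no bet.

```python
import math


def growth(f, p):
    """Expected one-stage log growth E[ln(x_{k+1}/x_k)] when staking u_k = f * x_k."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)


def best_fraction(p, steps=10_000):
    """Grid search over f = i/steps, i = 0..steps-1 (f = 1 excluded since ln 0 is undefined)."""
    return max((i / steps for i in range(steps)), key=lambda f: growth(f, p))


for p in (0.4, 0.55, 0.6, 0.75):
    f_grid = best_fraction(p)
    f_rule = max(0.0, 2 * p - 1)   # the (2p - 1) rule, clipped at 0 when p <= 1/2
    print(f"p={p}: grid f={f_grid:.4f}, rule f={f_rule:.4f}, "
          f"growth={growth(f_grid, p):.6f} vs {growth(f_rule, p):.6f}")
```

The grid search is used only for readability; since $g$ is strictly concave, the first-order condition in the solution already identifies the maximizer, and $g(2p-1)$ equals the per-stage constant $p\ln(2p)+(1-p)\ln(2-2p)$.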
