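§8 Dynamic Programming
Basic Concepts
1. Stage
Each subproblem corresponds to the decision made in one stage.
2. State and state variables
The state of a dynamic programming problem is the natural condition or objective circumstances at the start of each stage.
A variable that describes the state is called a state variable.
The state must have no aftereffect (the Markov property): given the state at the current stage, the subsequent evolution of the process is not affected by the states of earlier stages.
3. Decision
A decision variable describes the decision made at a stage; it depends on the state at the start of that stage.
$D_k(s_k)$ denotes the set of admissible decisions at stage $k$ starting from state $s_k$, and the decision variable satisfies $u_k(s_k)\in D_k(s_k)$.
4. Policy
A policy is an ordered sequence of decisions; the part of the problem from stage $k$ onward is called the $k$-subprocess.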
$$p_{k,n}(s_k)=\{u_k(s_k),u_{k+1}(s_{k+1}),\dots,u_n(s_n)\}$$
When $k=1$, this is a policy for the whole process, drawn from the admissible policy set:
$$p_{1,n}(s_1)=\{u_1(s_1),u_2(s_2),\dots,u_n(s_n)\}$$
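5. State transition equation
Once the state variable $s_k$ and the decision variable $u_k$ at stage $k$ are fixed, the state variable of the next stage is determined. This relationship is the state transition equation:
$$s_{k+1}=T_k(s_k,u_k)$$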
6. Index function and optimal value function
The index function is the quantitative measure of how good a realized process is:
$$V_{k,n}=V_{k,n}(s_k,u_k,s_{k+1},\dots,s_{n+1}),\quad k=1,2,\dots,n$$
The index function is separable and satisfies the recursion
$$V_{k,n}(s_k,u_k,s_{k+1},\dots,s_{n+1})=\psi_k[s_k,u_k,V_{k+1,n}(s_{k+1},\dots,s_{n+1})]$$
The optimal value of the index function is the optimal value function
$$f_k(s_k)=\max_{u_k,\dots,u_n}V_{k,n}(s_k,u_k,\dots,s_{n+1})$$
or
$$f_k(s_k)=\min_{u_k,\dots,u_n}V_{k,n}(s_k,u_k,\dots,s_{n+1})$$
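As an illustration of separability, two commonly used forms of the index function are the additive and the multiplicative one. The stage return $v_j(s_j,u_j)$ is notation assumed here for the illustration; in both cases $\psi_k$ simply combines the current stage return with the index of the remaining subprocess.

```latex
% Additive form: \psi_k adds the stage return to the index of the remaining subprocess
V_{k,n} = \sum_{j=k}^{n} v_j(s_j,u_j) = v_k(s_k,u_k) + V_{k+1,n}

% Multiplicative form: \psi_k multiplies them instead
V_{k,n} = \prod_{j=k}^{n} v_j(s_j,u_j) = v_k(s_k,u_k) \cdot V_{k+1,n}
```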
Solution Methods
When the initial state is given, use backward recursion; when the terminal state is given, use forward recursion.
Decision process: $n$ stages
State variables: $s_1,s_2,\dots,s_{n+1}$
Decision variables: $x_1,x_2,\dots,x_n$
State transition equation: $s_{k+1}=T_k(s_k,x_k)$
Relation between the total return (index function) and the stage returns: $V_{1,n}=v_1(s_1,x_1)*v_2(s_2,x_2)*\cdots*v_n(s_n,x_n)$, where $*$ stands for the operation that combines stage returns (typically addition or multiplication).
Backward Solution (Backward Induction)
To maximize the total return we solve $\text{opt}\quad V_{1,n}$, that is, $\max\quad V_{1,n}$.
Let $f_k(s_k)$ be the maximum return obtained by making optimal decisions from stage $k$ through stage $n$.
For the last stage:
$$f_n(s_n)=\max_{x_n\in D_n(s_n)}v_n(s_n,x_n)$$
where $D_n(s_n)$ is the set of all admissible decisions in state $s_n$. Denote the optimal solution by $x_n=x_n(s_n)$.
For stage $n-1$:
$$f_{n-1}(s_{n-1})=\max_{x_{n-1}\in D_{n-1}(s_{n-1})}[v_{n-1}(s_{n-1},x_{n-1})*f_n(s_n)]$$
where $s_n=T_{n-1}(s_{n-1},x_{n-1})$.
Solving this one-dimensional extremum problem gives the optimal solution $x_{n-1}=x_{n-1}(s_{n-1})$ and the optimal value $f_{n-1}(s_{n-1})$.
For stage $k$:
$$f_k(s_k)=\max_{x_k\in D_k(s_k)}[v_k(s_k,x_k)*f_{k+1}(s_{k+1})]$$
where $s_{k+1}=T_k(s_k,x_k)$.
Solving this one-dimensional extremum problem gives the optimal solution $x_k=x_k(s_k)$ and the optimal value $f_k(s_k)$.
Continuing in this way down to the first stage gives the optimal solution $x_1=x_1(s_1)$ and the optimal value $f_1(s_1)$.
Since the initial state $s_1$ is known, $x_1$ and $f_1(s_1)$ are determined; the transition equation then yields $s_2$, hence $x_2$, and so on, fixing the later decisions step by step.
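The backward recursion above can be sketched in code for the case of finite state and decision sets. This is a minimal sketch under stated assumptions, not a definitive implementation: the functions `backward_induction` and `recover_plan`, their callback parameters, and the reward tables `g` are hypothetical names and data chosen for illustration; stage returns are combined by addition, and $f_{n+1}\equiv 0$ is used as the terminal value.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def backward_induction(
    n: int,
    states: Callable[[int], Iterable[int]],
    decisions: Callable[[int, int], Iterable[int]],
    transition: Callable[[int, int, int], int],
    reward: Callable[[int, int, int], float],
) -> Tuple[Dict[int, Dict[int, float]], Dict[int, Dict[int, int]]]:
    """Compute f_k(s_k) and an optimal decision x_k(s_k) for k = n, ..., 1."""
    # Terminal values f_{n+1}(s) = 0 for every final state (an assumption).
    f: Dict[int, Dict[int, float]] = {n + 1: {s: 0.0 for s in states(n + 1)}}
    policy: Dict[int, Dict[int, int]] = {}
    for k in range(n, 0, -1):
        f[k], policy[k] = {}, {}
        for s in states(k):
            best_val, best_x = None, None
            for x in decisions(k, s):
                # Stage return plus optimal value of the remaining subprocess.
                val = reward(k, s, x) + f[k + 1][transition(k, s, x)]
                if best_val is None or val > best_val:
                    best_val, best_x = val, x
            f[k][s], policy[k][s] = best_val, best_x
    return f, policy


def recover_plan(policy, transition, s1: int, n: int) -> List[int]:
    """Starting from the known s_1, roll the stored decisions forward."""
    s, plan = s1, []
    for k in range(1, n + 1):
        x = policy[k][s]
        plan.append(x)
        s = transition(k, s, x)
    return plan


# Hypothetical data: split TOTAL units over 3 stages; g[k][x] is the
# return of giving x units to stage k (made-up numbers).
g = {1: [0, 5, 6, 7], 2: [0, 2, 5, 7], 3: [0, 4, 5, 5]}
TOTAL = 3
step = lambda k, s, x: s - x  # s_{k+1} = s_k - x_k

f, policy = backward_induction(
    n=3,
    states=lambda k: range(TOTAL + 1),
    decisions=lambda k, s: range(s + 1),
    transition=step,
    reward=lambda k, s, x: g[k][x],
)
plan = recover_plan(policy, step, TOTAL, 3)
print(f[1][TOTAL], plan)
```

For these made-up tables the sketch yields a total return of 11 with the plan `[1, 1, 1]`. The values $f_k$ are computed from the last stage to the first, while the actual decisions are read off from the first stage to the last, mirroring the two passes described in the text.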
Example 1
$$\begin{aligned} \max\quad & z=x_1{\cdot}x_2^2{\cdot}x_3\\ \text{s.t.}\quad & x_1+x_2+x_3=c\\ & x_i\geq0,\quad i=1,2,3 \end{aligned}$$
State transition equations and decision variables:
$$\begin{aligned} &s_3=x_3,\quad s_3+x_2=s_2,\quad s_2+x_1=s_1=c\\ &x_3=s_3,\quad 0\leq x_2\leq s_2,\quad 0\leq x_1\leq s_1=c \end{aligned}$$
Solution:
$$f_3(s_3)=\max_{x_3=s_3}(x_3)=s_3$$
with optimal solution $x_3^*=s_3$.
$$f_2(s_2)=\max_{0\leq x_2\leq s_2}\left[x_2^2f_3(s_3)\right]=\max_{0\leq x_2\leq s_2}\left[x_2^2(s_2-x_2)\right]$$
Let
$$h_2(s_2,x_2)=x_2^2(s_2-x_2)$$
From the first-order condition $\frac{dh_2}{dx_2}=2x_2s_2-3x_2^2=0$ we get $x_2=\frac{2}{3}s_2$ and $x_2=0$ (discarded).
The second-order condition gives $\frac{d^2h_2}{dx_2^2}=2s_2-6x_2$, which equals $-2s_2<0$ at $x_2=\frac{2}{3}s_2$, so this point is a maximum.
Substituting into $h_2$ gives $f_2(s_2)=\frac{4}{27}s_2^3$, with optimal solution $x_2^*=\frac{2}{3}s_2$.
Similarly,
$$f_1(s_1)=\max_{0\leq x_1\leq s_1}\left[x_1f_2(s_2)\right]=\max_{0\leq x_1\leq s_1}\left[x_1{\cdot}\frac{4}{27}(s_1-x_1)^3\right]$$
Solving gives $x_1^*=\frac{1}{4}s_1$ and $f_1(s_1)=\frac{1}{64}s_1^4$.
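The stage-1 maximization follows the same first-order argument as stage 2. The intermediate steps can be written out as follows, where the auxiliary name $h_1$ is an assumption introduced here by analogy with $h_2$:

```latex
\begin{aligned}
h_1(s_1,x_1) &= \frac{4}{27}\,x_1\,(s_1-x_1)^3 \\
\frac{dh_1}{dx_1} &= \frac{4}{27}\left[(s_1-x_1)^3-3x_1(s_1-x_1)^2\right]
                   = \frac{4}{27}(s_1-x_1)^2(s_1-4x_1)=0 \\
&\Rightarrow\; x_1=\tfrac{1}{4}s_1
  \quad (x_1=s_1 \text{ gives } h_1=0 \text{ and is discarded}) \\
f_1(s_1) &= \frac{4}{27}\cdot\frac{s_1}{4}\cdot\left(\frac{3s_1}{4}\right)^3=\frac{1}{64}s_1^4
\end{aligned}
```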
Since $s_1=c$ is known, substituting back through the stages gives:
$$x_1^*=\frac{1}{4}c,\quad f_1(c)=\frac{1}{64}c^4$$
$s_2=s_1-x_1^*=\frac{3}{4}c$
$x_2^*=\frac{2}{3}s_2=\frac{1}{2}c,\quad f_2(s_2)=\frac{1}{16}c^3$
$s_3=s_2-x_2^*=\frac{1}{4}c$
$x_3^*=\frac{1}{4}c,\quad f_3(s_3)=\frac{1}{4}c$
Optimal solution: $x_1^*=\frac{1}{4}c,\; x_2^*=\frac{1}{2}c,\; x_3^*=\frac{1}{4}c$
Optimal objective value: $\max\quad z=f_1(c)=\frac{1}{64}c^4$
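The closed-form result of this example can be cross-checked numerically. The sketch below discretizes states and decisions on a uniform grid and runs the same three-stage backward recursion; the function name `solve_example`, the grid size `N`, and the test value `c = 4` are assumptions made for illustration only.

```python
def solve_example(c: float = 4.0, N: int = 400):
    """Grid-based backward recursion for max x1 * x2^2 * x3, x1 + x2 + x3 = c."""
    h = c / N  # grid spacing; index i stands for the amount i * h

    # Stage 3: x3 = s3 is forced, so f3(s3) = s3.
    f3 = [i * h for i in range(N + 1)]

    # Stage 2: f2(s2) = max over 0 <= x2 <= s2 of x2^2 * f3(s2 - x2).
    f2 = [0.0] * (N + 1)
    arg2 = [0] * (N + 1)
    for i in range(N + 1):
        for j in range(i + 1):
            val = (j * h) ** 2 * f3[i - j]
            if val > f2[i]:
                f2[i], arg2[i] = val, j

    # Stage 1: only the known initial state s1 = c (index N) is needed.
    best, j1 = 0.0, 0
    for j in range(N + 1):
        val = (j * h) * f2[N - j]
        if val > best:
            best, j1 = val, j

    # Forward pass: recover x2 and x3 from the stored stage-2 decisions.
    j2 = arg2[N - j1]
    j3 = N - j1 - j2
    return best, (j1 * h, j2 * h, j3 * h)


value, (x1, x2, x3) = solve_example(4.0, 400)
print(value, x1, x2, x3)
```

With `c = 4` the analytic answer is $x^*=(1,2,1)$ and $z=c^4/64=4$, and these points lie exactly on the chosen grid, so the numerical result should agree up to floating-point rounding. For a value of `c` whose optimum falls between grid points, only approximate agreement should be expected.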
Inventory Management Problem
P289
State variable: $x_t$ = opening inventory of period $t$
Order quantity (decision): $q_t$
Optimal expected profit: $R_t(x_t)$
State transition equation:
$$x_{t+1}=\begin{cases} x_t+q_t-D_t & \text{if } D_t\leq x_t+q_t\\ 0 & \text{if } D_t>x_t+q_t \end{cases}$$
Profit in period $t$ (with demand $D_t$, unit selling price $p$, unit ordering cost $c$, and unit holding cost $h$):
$$\phi(q_t|D_t)=\begin{cases} pD_t-cq_t-h(x_t+q_t-D_t) & \text{if } D_t\leq x_t+q_t\\ p(x_t+q_t)-cq_t & \text{if } D_t>x_t+q_t \end{cases}$$
That is,
$$\phi(q_t|D_t)=p{\cdot}\min(D_t,x_t+q_t)-cq_t-h{\cdot}\max(x_t+q_t-D_t,0)$$
sup: supremum (least upper bound)
Recursive equation (Bellman equation):
$$\begin{aligned} R_t(x_t)&=\sup_{q_t\geq0}\mathbf{E}\{\phi(q_t|D_t)+R_{t+1}(x_{t+1})\}\\ &=\sup_{q_t\geq0}\{-cq_t+\mathbf{E}[p{\cdot}\min(D_t,x_t+q_t)-h{\cdot}\max(x_t+q_t-D_t,0)+R_{t+1}(x_{t+1})]\} \end{aligned}$$
where $x_{t+1}=\max\{x_t+q_t-D_t,0\}$.
The boundary condition of the model (in the last period) is
$$R_T(x_T)=\sup_{q_T\geq0}\{-cq_T+\mathbf{E}[p{\cdot}\min(D_T,x_T+q_T)]\}$$
Under the optimal policy, the optimal expected total profit over the $T$ periods, starting from zero initial inventory, is $R_1(0)$.
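The Bellman recursion for the inventory model can be sketched numerically when demand takes finitely many integer values. This is a hedged illustration, not the textbook's procedure: the function name `solve_inventory`, the demand distribution `pmf`, the parameter values, and the cap `max_stock` on inventory (which replaces the supremum over all $q_t\geq0$ by a maximum over a finite range) are all assumptions introduced here. Demand is taken to be independent and identically distributed across periods, and the last period carries no holding cost, as in the boundary condition above.

```python
from typing import Dict, List, Tuple


def solve_inventory(
    T: int, p: float, c: float, h: float,
    demand_pmf: Dict[int, float], max_stock: int,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return R[t][x] (optimal expected profit) and Q[t][x] (an optimal order)."""
    R = [[0.0] * (max_stock + 1) for _ in range(T + 2)]
    Q = [[0] * (max_stock + 1) for _ in range(T + 2)]
    for t in range(T, 0, -1):                      # backward over periods
        for x in range(max_stock + 1):             # opening inventory x_t
            best_val, best_q = float("-inf"), 0
            for q in range(max_stock - x + 1):     # order q_t, capped (assumption)
                y = x + q                          # stock available for sale
                val = -c * q                       # ordering cost
                for d, prob in demand_pmf.items():
                    left = max(y - d, 0)           # x_{t+1} = max(x_t + q_t - D_t, 0)
                    stage = p * min(d, y)          # sales revenue
                    if t < T:
                        # Holding cost and continuation value; both are
                        # omitted in the last period (boundary condition).
                        stage += -h * left + R[t + 1][left]
                    val += prob * stage
                if val > best_val:
                    best_val, best_q = val, q
            R[t][x], Q[t][x] = best_val, best_q
    return R, Q


# Hypothetical data for illustration only.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
R, Q = solve_inventory(T=3, p=10.0, c=6.0, h=1.0, demand_pmf=pmf, max_stock=4)
print("R_1(0) =", R[1][0], "first order =", Q[1][0])

# Single-period check: ordering one unit gives -6 + 10 * E[min(D, 1)] = 1.5.
R_single, Q_single = solve_inventory(1, 10.0, 6.0, 1.0, pmf, 4)
```

In the single-period case the sketch can be verified by hand: ordering nothing earns 0, ordering one unit earns 1.5, and ordering two units earns $-12+10\cdot1=-2$, so the best order is one unit.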