Batch manufacturing problem
Exercise 7.8 from *Dynamic Programming and Optimal Control*
- At each time period, a manufacturer receives an order for her product with probability p and no order with probability 1 - p.
- At any period, she can either process all unfilled orders in a batch or process none of them. At most n orders can remain unfilled.
- The cost per unfilled order at each time period is c > 0, and the setup cost to process the unfilled orders is K > 0.
- The manufacturer wants a processing policy that minimizes the total expected cost with discount factor α < 1.
Implement both the value iteration and policy iteration algorithms, with
c = 1, K = 5, n = 10, p = 0.5, α = 0.9
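To make the model concrete before the algorithms, here is a small sketch of the state space, stage cost, and transitions; the function names and structure are my own, not part of the exercise:

```python
# State i = number of unfilled orders, i = 0, ..., n.
# Action 1 = process the batch (setup cost K, orders reset to 0);
# action 0 = wait (holding cost c per unfilled order).
# In either case a new order then arrives with probability p.

def stage_cost(i, action, c=1, K=5):
    """Immediate cost g(i, u)."""
    return K if action == 1 else c * i

def successors(i, action, p=0.5):
    """(next_state, probability) pairs after taking `action` in state i."""
    base = 0 if action == 1 else i  # waiting at i = n is not allowed
    return [(base, 1 - p), (base + 1, p)]
```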
Value iteration
Algorithm overview
$$J_{k+1}(i)=\min_{u \in U(i)}\left[g(i,u)+\alpha\sum_{j=0}^{n} p_{ij}(u)\,J_{k}(j)\right],\quad \forall i$$
for any initial condition, for instance $J_0(i)=0$.
It is guaranteed that $\lim_{k \rightarrow \infty} J_k(i)=J^{*}(i)$.
Implementation
import numpy as np

def value_iteration(c, K, n, p, alpha=0.9):
    # states 0..n = number of unfilled orders
    Jk = np.zeros(n + 1)
    threshold = 1e-10
    for k in range(1000):
        Jkp1 = np.empty(n + 1)
        for i in range(n):
            # process the whole batch: pay K, orders reset to 0,
            # then a new order arrives with probability p
            pro = K + alpha * ((1 - p) * Jk[0] + p * Jk[1])
            # leave the orders unfilled: pay c per unfilled order
            unpro = c * i + alpha * ((1 - p) * Jk[i] + p * Jk[i + 1])
            Jkp1[i] = min(pro, unpro)
        # at state n the batch must be processed
        Jkp1[n] = K + alpha * ((1 - p) * Jk[0] + p * Jk[1])
        # compare the whole vectors, not just the last entry
        converged = np.max(np.fabs(Jkp1 - Jk)) < threshold
        Jk = Jkp1
        if converged:
            break
    print("the result of value_iteration is: ")
    print(Jk)
    return Jk
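The same Bellman update can also be written in vectorized form, which makes a handy cross-check; this is a self-contained sketch under the same state convention (states 0..n, state n forced to process), with names of my own choosing:

```python
import numpy as np

def vi(c=1.0, K=5.0, n=10, p=0.5, alpha=0.9, tol=1e-10, max_iter=10000):
    J = np.zeros(n + 1)
    i = np.arange(n)  # states where both actions are available
    for _ in range(max_iter):
        # cost of processing the batch: the same for every state
        process = K + alpha * ((1 - p) * J[0] + p * J[1])
        # cost of waiting in states 0..n-1
        wait = c * i + alpha * ((1 - p) * J[:-1] + p * J[1:])
        # state n has no choice but to process
        J_new = np.append(np.minimum(process, wait), process)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new
        J = J_new
    return J

J = vi()
```

Since waiting is free in state 0, the fixed point satisfies $J^*(n) = K + J^*(0)$ exactly, which is a cheap correctness check on the converged vector.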
Policy iteration
Algorithm overview
For a stationary policy $\mu$, the system of linear equations
$$J_{\mu}(i)=g(i,\mu(i))+\alpha\sum_{j=0}^{n} p_{ij}(\mu(i))\,J_{\mu}(j),\quad i=0,\ldots,n$$
has a unique solution $J_{\mu}(i),\ i=0,\ldots,n$.
Policy iteration method:
- Policy evaluation: solve the linear equations above for $\mu^{k}$ to obtain $J_{\mu^{k}}(i),\ i=0,\ldots,n$ (solved iteratively below).
- Policy improvement: find an improved policy
$$\mu^{k+1}(i)=\arg\min_{u \in U(i)}\left[g(i,u)+\alpha\sum_{j=0}^{n} p_{ij}(u)\,J_{\mu^{k}}(j)\right],\quad \forall i$$
- Termination condition: $J_{\mu^{k+1}}(i)=J_{\mu^{k}}(i)$ for all $i$.
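The evaluation step does not have to be iterative: for a fixed policy the system is linear, so it can be solved exactly as $(I-\alpha P_\mu)J_\mu = g_\mu$. A sketch, assuming states 0..n with state n forced to process (the function name is mine):

```python
import numpy as np

def evaluate_policy(policy, c=1.0, K=5.0, n=10, p=0.5, alpha=0.9):
    """Exact policy evaluation: solve (I - alpha * P_mu) J = g_mu.

    policy[i] in {0, 1} for states 0..n; 1 = process the batch."""
    m = n + 1
    P = np.zeros((m, m))  # transition matrix under the policy
    g = np.zeros(m)       # stage cost under the policy
    for i in range(m):
        if policy[i] == 1 or i == n:  # process: reset to 0 orders, pay K
            g[i] = K
            P[i, 0] = 1 - p
            P[i, 1] = p
        else:                          # wait: keep i orders, pay c * i
            g[i] = c * i
            P[i, i] = 1 - p
            P[i, i + 1] = p
    return np.linalg.solve(np.eye(m) - alpha * P, g)
```

A quick sanity check: under the always-process policy every state has the same cost, K/(1 - α) = 50 with the given parameters.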
Implementation
def value_function(c, K, policy, n, p, alpha):
    # iterative policy evaluation over states 0..n (state n must process)
    value_table = np.zeros(n + 1)
    threshold = 1e-10
    for k in range(1000):
        old_value_table = value_table.copy()
        for i in range(n + 1):
            if policy[i] or i == n:  # process
                value_table[i] = K + alpha * ((1 - p) * old_value_table[0] + p * old_value_table[1])
            else:  # do not process
                value_table[i] = c * i + alpha * ((1 - p) * old_value_table[i] + p * old_value_table[i + 1])
        if np.max(np.fabs(old_value_table - value_table)) < threshold:
            break
    return value_table

def policy_iteration(c, K, n, p, alpha=0.9):
    # policy[i] = 1 means "process the batch" in state i, 0 means "wait"
    policy = np.ones(n + 1)
    for k in range(100):
        # policy evaluation
        value_table = value_function(c, K, policy, n, p, alpha)
        # policy improvement: compare the cost of each action in state i
        new_policy = np.ones(n + 1)
        for i in range(n):
            pro = K + alpha * ((1 - p) * value_table[0] + p * value_table[1])
            unpro = c * i + alpha * ((1 - p) * value_table[i] + p * value_table[i + 1])
            new_policy[i] = 1 if pro < unpro else 0
        # new_policy[n] stays 1: state n is forced to process
        if np.all(policy == new_policy):
            break
        policy = new_policy
    print("the policy of the discounted problem with alpha = " + str(alpha) + " is: ")
    print(policy)
    return policy
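As an end-to-end sanity check, the whole method fits in one compact self-contained sketch (policy iteration with exact linear-system evaluation; names are mine). For this problem the optimal policy is expected to be a threshold rule: wait while the number of unfilled orders is below some level, and process at or above it:

```python
import numpy as np

def solve(c=1.0, K=5.0, n=10, p=0.5, alpha=0.9):
    m = n + 1
    policy = np.ones(m, dtype=int)  # start from "always process"
    for _ in range(100):
        # exact policy evaluation: J = (I - alpha * P)^(-1) g
        P, g = np.zeros((m, m)), np.zeros(m)
        for i in range(m):
            if policy[i] == 1 or i == n:  # process
                g[i] = K
                P[i, 0], P[i, 1] = 1 - p, p
            else:                          # wait
                g[i] = c * i
                P[i, i], P[i, i + 1] = 1 - p, p
        J = np.linalg.solve(np.eye(m) - alpha * P, g)
        # policy improvement
        new_policy = np.ones(m, dtype=int)
        for i in range(n):
            pro = K + alpha * ((1 - p) * J[0] + p * J[1])
            unpro = c * i + alpha * ((1 - p) * J[i] + p * J[i + 1])
            new_policy[i] = 1 if pro < unpro else 0
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, J

policy, J = solve()
```

Since processing in state 0 can never pay (it adds K for nothing), the threshold is at least 1, so `policy[0]` must be 0 while `policy[n]` is forced to 1.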