Dueling Network: Principles and Implementation

KPer_Yang

Article link: https://blog.csdn.net/KPer_Yang/article/details/132156765

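Algorithm principle:
$Q(s,a;\theta,\alpha,\beta)=V(s;\theta,\beta)+\left(A(s,a;\theta,\alpha)-\max_{a'\in\mathcal{A}}A(s,a';\theta,\alpha)\right).$
Here $V$ is the state-value branch, $A$ is the advantage branch, $\theta$ denotes any parameters shared by both branches, and $\alpha$ and $\beta$ are the parameters of the advantage and value branches respectively. Subtracting the maximum advantage means the best action's Q-value equals $V(s)$.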
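To make the aggregation concrete, here is a small worked example in plain Python. The numbers are made up for illustration and the helper name `dueling_q` is hypothetical; it is only a sketch of the formula above for a single state, not part of the original implementation.

```python
def dueling_q(v, advantages):
    """Combine a state value and per-action advantages into Q-values.

    Implements Q(s, a) = V(s) + (A(s, a) - max_a' A(s, a')) for one state.
    """
    max_adv = max(advantages)
    return [v + (a - max_adv) for a in advantages]


# Illustrative numbers only: one state with three actions.
v = 2.0
advantages = [0.5, 1.5, -1.0]
q_values = dueling_q(v, advantages)
print(q_values)  # [1.0, 2.0, -0.5]
```

The action with the largest advantage gets a Q-value equal to V(s), and every other action sits below it by its advantage gap.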
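Note: the Dueling Network only changes the architecture of the optimal action-value network, so the techniques used to train DQN still apply:

1. Prioritized experience replay;

2. Double DQN;

3. Multi-step TD.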
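As a rough illustration of how two of these techniques shape the training target, the sketch below computes a multi-step TD target with Double DQN action selection in plain Python. The function name `n_step_double_dqn_target` and all numbers are hypothetical, and the Q-values are passed in as plain lists rather than produced by a network. Prioritized experience replay affects how transitions are sampled, not the target, so it is not shown.

```python
def n_step_double_dqn_target(rewards, q_online_next, q_target_next,
                             gamma=0.99, done=False):
    """n-step TD target with Double DQN bootstrapping.

    rewards:        the n rewards r_t, ..., r_{t+n-1}
    q_online_next:  online-network Q-values at s_{t+n}, one per action
    q_target_next:  target-network Q-values at s_{t+n}, one per action
    """
    # Discounted sum of the n observed rewards.
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    if done:
        # Terminal state: nothing to bootstrap from.
        return g
    # Double DQN: the online network selects the action,
    # the target network evaluates it.
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return g + (gamma ** len(rewards)) * q_target_next[best]


# Illustrative numbers only (3-step return, two actions).
target = n_step_double_dqn_target(
    rewards=[1.0, 0.0, 2.0],
    q_online_next=[0.2, 0.9],   # online network prefers action 1
    q_target_next=[8.0, 4.0],   # target network's estimate for action 1 is 4.0
    gamma=0.5,
)
print(target)  # 2.0
```

Note that a plain max over `q_target_next` would have bootstrapped from 8.0 instead of 4.0; decoupling selection from evaluation is what Double DQN changes. With a Dueling Network, both lists would simply be the outputs of the online and target copies of the network.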

Code implementation: simply replace the optimal action-value network of the original DQN with a Dueling Network:

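import torch.nn as nn
import torch.nn.functional as F


class DuelingNetwork(nn.Module):
    """Dueling Q-network.
    Input: state features, shape (batch, dim_state)
    Output: one Q-value per action, shape (batch, num_action)
    """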

    def __init__(self, dim_state, num_action):
        super().__init__()
        # Advantage (A) branch
        self.a_fc1 = nn.Linear(dim_state, 64)
        self.a_fc2 = nn.Linear(64, 32)
        self.a_fc3 = nn.Linear(32, num_action)
        # State-value (V) branch
        self.v_fc1 = nn.Linear(dim_state, 64)
        self.v_fc2 = nn.Linear(64, 32)
        self.v_fc3 = nn.Linear(32, 1)

    def forward(self, state):
        # Compute the advantages A(s, a)
        a_x = F.relu(self.a_fc1(state))
        a_x = F.relu(self.a_fc2(a_x))
        a_x = self.a_fc3(a_x)
        # Compute the state value V(s)
        v_x = F.relu(self.v_fc1(state))
        v_x = F.relu(self.v_fc2(v_x))
        v_x = self.v_fc3(v_x)
        # Combine the branches per sample: Q = V + (A - max over actions of A)
        x = v_x + a_x - a_x.max(dim=-1, keepdim=True).values
        return x
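
A quick way to sanity-check such a network is to run a batch of random states through it and pick greedy actions. The sketch below uses a compact stand-in class, `TinyDueling` (a hypothetical name), with the same layer sizes as the network above and the aggregation Q = V + (A - max A) taken per sample. The state dimension of 4 and the 2 actions are assumed values chosen only for illustration.

```python
import torch
import torch.nn as nn


class TinyDueling(nn.Module):
    """Compact stand-in for a dueling Q-network (hypothetical name)."""

    def __init__(self, dim_state, num_action):
        super().__init__()
        # Advantage branch: one output per action.
        self.adv = nn.Sequential(
            nn.Linear(dim_state, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, num_action),
        )
        # State-value branch: a single scalar per state.
        self.val = nn.Sequential(
            nn.Linear(dim_state, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, state):
        a = self.adv(state)
        v = self.val(state)
        # Row-wise max keeps each sample in the batch independent.
        return v + a - a.max(dim=-1, keepdim=True).values


torch.manual_seed(0)
net = TinyDueling(dim_state=4, num_action=2)  # assumed sizes
states = torch.randn(8, 4)                    # batch of 8 random states

with torch.no_grad():
    q = net(states)                     # shape (8, 2)
    greedy_actions = q.argmax(dim=-1)   # shape (8,)
    v = net.val(states)                 # shape (8, 1)

# Since the max advantage is subtracted per row, max_a Q(s, a) equals V(s).
same = torch.allclose(q.max(dim=-1).values, v.squeeze(-1), atol=1e-6)
print(q.shape, greedy_actions.shape, same)
```

Using `keepdim=True` keeps the max with shape (batch, 1), so it broadcasts against the (batch, num_action) advantages; a global `a.max()` would instead mix values across different states in the batch.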