Pytorch实现强化学习DQN玩迷宫游戏(莫凡强化学习DQN章节pytorch版本)

本文介绍了使用Pytorch实现强化学习DQN算法,并应用于解决迷宫游戏。参照莫凡老师的教程,提供了完整的代码资源,读者可以直接运行DQN_new.py进行体验。对于不熟悉的部分,建议查阅相关资料进行学习。
摘要由CSDN通过智能技术生成

1.详细的资料可以参考莫凡老师的网页

2.用pytorch实现DQN并用于玩maze

# -*- coding: utf-8 -*-


import math
import random
import matplotlib.pyplot as plt
from collections import namedtuple, deque
from itertools import count
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from maze_env import Maze

random.seed(1)
torch.manual_seed(1)
np.random.seed(1)
BATCH_SIZE = 128  # BATCH_SIZE is the number of transitions sampled from the replay buffer
GAMMA = 0.9     # GAMMA is the discount factor as mentioned in the previous section
EPS_START = 0.9   # EPS_START is the starting value of epsilon
EPS_END = 0.05    # EPS_END is the final value of epsilon
EPS_DECAY = 1000  # EPS_DECAY controls the rate of exponential decay of epsilon, higher means a slower decay
TAU = 0.005   # TAU is the update rate of the target network
LR = 1e-4    # LR is the learning rate of the AdamW optimizer
env= Maze()
# Get number of actions from gym action space
n_actions = env.n_actions
# Get the number of state observations
state = env.reset()
n_observations = len(state)
steps_done = 0
episode_durations = []

# if gpu is to be used
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))


class ReplayMemory(object):

    def __init__(self, capacity):
        self.memory = deque([], maxlen=capacity)

    def push(self, *args):
        """Save a transition"""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.memory<
DQN(Deep Q-Network)是一种使用深度神经网络实现强化学习算法,用于解决离散动作空间的问题。在PyTorch实现DQN可以分为以下几个步骤: 1. 定义神经网络:使用PyTorch定义一个包含多个全连接层的神经网络,输入为状态空间的维度,输出为动作空间的维度。 ```python import torch.nn as nn import torch.nn.functional as F class QNet(nn.Module): def __init__(self, state_dim, action_dim): super(QNet, self).__init__() self.fc1 = nn.Linear(state_dim, 64) self.fc2 = nn.Linear(64, 64) self.fc3 = nn.Linear(64, action_dim) def forward(self, x): x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return x ``` 2. 定义经验回放缓存:包含多条经验,每条经验包含一个状态、一个动作、一个奖励和下一个状态。 ```python import random class ReplayBuffer(object): def __init__(self, max_size): self.buffer = [] self.max_size = max_size def push(self, state, action, reward, next_state): if len(self.buffer) < self.max_size: self.buffer.append((state, action, reward, next_state)) else: self.buffer.pop(0) self.buffer.append((state, action, reward, next_state)) def sample(self, batch_size): state, action, reward, next_state = zip(*random.sample(self.buffer, batch_size)) return torch.stack(state), torch.tensor(action), torch.tensor(reward), torch.stack(next_state) ``` 3. 定义DQN算法:使用PyTorch定义DQN算法,包含训练和预测两个方法。 ```python class DQN(object): def __init__(self, state_dim, action_dim, gamma, epsilon, lr): self.qnet = QNet(state_dim, action_dim) self.target_qnet = QNet(state_dim, action_dim) self.gamma = gamma self.epsilon = epsilon self.lr = lr self.optimizer = torch.optim.Adam(self.qnet.parameters(), lr=self.lr) self.buffer = ReplayBuffer(100000) self.loss_fn = nn.MSELoss() def act(self, state): if random.random() < self.epsilon: return random.randint(0, action_dim - 1) else: with torch.no_grad(): q_values = self.qnet(state) return q_values.argmax().item() def train(self, batch_size): state, action, reward, next_state = self.buffer.sample(batch_size) q_values = self.qnet(state).gather(1, action.unsqueeze(1)).squeeze(1) target_q_values = self.target_qnet(next_state).max(1)[0].detach() expected_q_values = reward + self.gamma * target_q_values loss = self.loss_fn(q_values, expected_q_values) self.optimizer.zero_grad() loss.backward() self.optimizer.step() def update_target_qnet(self): self.target_qnet.load_state_dict(self.qnet.state_dict()) ``` 4. 训练模型:使用DQN算法进行训练,并更新目标Q网络。 ```python dqn = DQN(state_dim, action_dim, gamma=0.99, epsilon=1.0, lr=0.001) for episode in range(num_episodes): state = env.reset() total_reward = 0 for step in range(max_steps): action = dqn.act(torch.tensor(state, dtype=torch.float32)) next_state, reward, done, _ = env.step(action) dqn.buffer.push(torch.tensor(state, dtype=torch.float32), action, reward, torch.tensor(next_state, dtype=torch.float32)) state = next_state total_reward += reward if len(dqn.buffer.buffer) > batch_size: dqn.train(batch_size) if step % target_update == 0: dqn.update_target_qnet() if done: break dqn.epsilon = max(0.01, dqn.epsilon * 0.995) ``` 5. 测试模型:使用训练好的模型进行测试。 ```python total_reward = 0 state = env.reset() while True: action = dqn.act(torch.tensor(state, dtype=torch.float32)) next_state, reward, done, _ = env.step(action) state = next_state total_reward += reward if done: break print("Total reward: {}".format(total_reward)) ``` 以上就是在PyTorch实现DQN强化学习的基本步骤。需要注意的是,DQN算法中还有很多细节和超参数需要调整,具体实现过程需要根据具体问题进行调整。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值