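Implementing Reinforcement Learning DQN in PyTorch to Play a Maze Game (PyTorch version of the DQN chapter of Morvan's (莫凡) reinforcement learning tutorial)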

Miller·Chen

Tags: pytorch, game, python

Article link: https://blog.csdn.net/u012864339/article/details/129272810

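This post implements the DQN reinforcement learning algorithm in PyTorch and applies it to a maze game. It follows Morvan's (莫凡) tutorial and provides the complete code, so you can run DQN_new.py directly to try it out. For any parts that are unfamiliar, consult the relevant background material.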

1. For more detail, see Morvan's (莫凡) tutorial web pages.
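
For orientation, here is the standard DQN-with-target-network update that the script's GAMMA and TAU constants feed into. It is stated as general background rather than taken from the original post. $\gamma$ is GAMMA and $\tau$ is TAU. $\mathcal{D}$ is the replay memory. $\theta$ and $\theta^{-}$ are the policy-network and target-network parameters. $\ell$ is a regression loss such as squared error or the Huber loss.

```latex
y =
\begin{cases}
  r, & \text{if } s' \text{ is terminal} \\
  r + \gamma \max_{a'} Q(s', a'; \theta^{-}), & \text{otherwise}
\end{cases}
\qquad
\mathcal{L}(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}}
  \left[ \ell\left( y - Q(s, a; \theta) \right) \right]
\qquad
\theta^{-} \leftarrow \tau\, \theta + (1 - \tau)\, \theta^{-}
```

A small TAU such as 0.005 makes the target network trail the policy network slowly. This keeps the regression target $y$ from shifting on every optimization step.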

2. Implementing DQN in PyTorch and using it to play the maze
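
This is a minimal standalone sketch of how the script's three exploration constants (EPS_START, EPS_END, EPS_DECAY) are commonly combined. It assumes the script uses the exponential epsilon schedule from the official PyTorch DQN tutorial, which its hyperparameter comments appear to be copied from. The functions `epsilon_at` and `epsilon_greedy` and the `q_values` list are hypothetical helpers for illustration, not part of the original code. The actual script picks the greedy action from the policy network's output.

```python
import math
import random

# Exploration constants copied from the script's hyperparameter section.
EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 1000


def epsilon_at(steps_done: int) -> float:
    """Exploration rate after `steps_done` actions (hypothetical helper).

    Assumed schedule: epsilon starts at EPS_START and decays exponentially
    toward EPS_END, with EPS_DECAY acting as the time constant.
    """
    return EPS_END + (EPS_START - EPS_END) * math.exp(-steps_done / EPS_DECAY)


def epsilon_greedy(q_values, steps_done, rng=random):
    """Epsilon-greedy choice over a hypothetical list of Q-value estimates.

    With probability epsilon a uniformly random action index is returned;
    otherwise the index of the largest Q-value is returned.
    """
    if rng.random() < epsilon_at(steps_done):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Show how quickly exploration fades with EPS_DECAY = 1000.
    for step in (0, 1000, 5000):
        print(step, round(epsilon_at(step), 3))
```

Running it prints epsilon values of 0.9, 0.363 and 0.056 for steps 0, 1000 and 5000. With EPS_DECAY = 1000, the agent is therefore almost fully greedy after a few thousand actions. Raising EPS_DECAY stretches the exploration phase.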

# -*- coding: utf-8 -*-


import math
import random
import matplotlib.pyplot as plt
from collections import namedtuple, deque
from itertools import count
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from maze_env import Maze

random.seed(1)
torch.manual_seed(1)
np.random.seed(1)
BATCH_SIZE = 128  # number of transitions sampled from the replay buffer per optimization step
GAMMA = 0.9       # discount factor for future rewards
EPS_START = 0.9   # initial value of epsilon (exploration rate)
EPS_END = 0.05    # final value of epsilon
EPS_DECAY = 1000  # controls the rate of exponential decay of epsilon; higher means slower decay
TAU = 0.005       # soft-update rate of the target network
LR = 1e-4         # learning rate of the AdamW optimizer
env = Maze()
# Number of discrete actions offered by the maze environment
n_actions = env.n_actions
# Length of the state observation vector returned by reset()
state = env.reset()
n_observations = len(state)
steps_done = 0
episode_durations = []

# Use the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))


class ReplayMemory(object):

    def __init__(self, capacity):
        self.memory = deque([], maxlen=capacity)

    def push(self, *args):
        """Save a transition"""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)