Code Implementation: Human-level control through deep reinforcement learning


weixin_45681037


Tags: reinforcement learning, python

Article link: https://blog.csdn.net/weixin_45681037/article/details/117714761


Keywords: deep reinforcement learning, DQN, CartPole environment, neural network, ε-greedy policy


Preface

Reference page for this DQN implementation: https://www.youtube.com/watch?v=NP8pXZdU-5U&ab_channel=brthorbrthor


1. Paper title

Title: Human-level control through deep reinforcement learning
DOI: 10.1038/nature14236

2. Code

The code is as follows (example):

from torch import nn
import torch
import gym
from collections import deque
import itertools
import numpy as np
import random

GAMMA = 0.99  # discount rate used when computing the TD target
BATCH_SIZE = 32  # number of transitions sampled from the replay buffer for each gradient computation
BUFFER_SIZE = 50000  # maximum number of transitions stored in the replay buffer before old ones are overwritten
MIN_REPLAY_SIZE = 1000  # number of transitions the replay buffer must hold before gradient computation and training start
EPSILON_START = 1.0  # starting value of epsilon
EPSILON_END = 0.02  # final value of epsilon
EPSILON_DECAY = 10000  # number of steps over which epsilon is linearly annealed from EPSILON_START to EPSILON_END
TARGET_UPDATE_FREQ = 1000  # number of steps between updates that set the target parameters equal to the online parameters
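
The three epsilon constants above describe a linear annealing schedule. As a minimal sketch of how such a schedule could be computed (the helper name `epsilon_at` is hypothetical and not part of the original code, and the constants are repeated only so the snippet runs on its own):

```python
# Values copied from the hyperparameters above so this sketch is self-contained.
EPSILON_START = 1.0
EPSILON_END = 0.02
EPSILON_DECAY = 10000


def epsilon_at(step):
    """Hypothetical helper: linearly anneal epsilon, then hold it at EPSILON_END."""
    # Fraction of the decay period that has elapsed, clamped to [0, 1].
    fraction = min(max(step / EPSILON_DECAY, 0.0), 1.0)
    return EPSILON_START + fraction * (EPSILON_END - EPSILON_START)


# Print epsilon at the start, midway, at the end, and past the end of the decay period.
for step in (0, 5000, 10000, 20000):
    print(step, epsilon_at(step))
```

With an ε-greedy policy, the agent would presumably take a random action with probability `epsilon_at(step)` and otherwise the action with the highest Q-value, so exploration is heavy early on and settles to 2% after `EPSILON_DECAY` steps.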

# Create the network class by subclassing nn.Module (PyTorch)
class Network(nn.Module):
    # This assumes a discrete action space; a continuous action space needs a different design
    def __init__(self, env):
        super().__init__()

        in_features = int(np.prod(env.observation_space.shape))
        # A small sequential network: two linear layers with one hidden layer of 64 units
        self.net = nn.Sequential(
            nn.Linear(in_features, 64),
            nn.Tanh(),
            nn.Linear(64, env.action_space.n))

    # Forward pass: returns one Q-value per action for the given observation(s)
    def forward(self, x):
        return self.net(x)

    def act(self, obs):
        # Turn obs into a PyTorch tensor
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        # Compute the Q-values for this specific observation
        q_values = self(obs_t.unsqueeze(0))#let unsq