Code implementation: Human-level control through deep reinforcement learning
Preface
This is an implementation of DQN that follows this video: https://www.youtube.com/watch?v=NP8pXZdU-5U&ab_channel=brthorbrthor
1. Paper title
Title: Human-level control through deep reinforcement learning
DOI: 10.1038/nature14236
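The paper's core update trains the online network Q(s, a; θ) toward the TD target y = r + γ max_a' Q(s', a'; θ⁻). For a terminal transition, y = r. Here θ⁻ are the parameters of a separate target network, which is copied from the online network only periodically. The paper also clips the TD error to [-1, 1]. That clipping is commonly implemented as a Huber (smooth L1) loss, and this sketch assumes that choice.

The sketch makes several further assumptions:
- batches are already tensors of shape (B, ...);
- actions are int64 with shape (B, 1);
- done flags are 0/1 floats;
- the networks use 4-dimensional observations and 2 actions.

The helper names `td_targets` and `dqn_loss` are hypothetical and do not appear in the author's script.

```python
import torch
from torch import nn
import torch.nn.functional as F

GAMMA = 0.99  # discount factor, same value as GAMMA in the script


def td_targets(target_net, rews, dones, new_obses, gamma=GAMMA):
    """y = r + gamma * (1 - done) * max_a' Q(s', a'; theta_minus)."""
    with torch.no_grad():
        # Next-state Q-values from the frozen target network: shape (B, n_actions)
        target_q = target_net(new_obses)
        # Value of the greedy next action, kept as shape (B, 1)
        max_target_q = target_q.max(dim=1, keepdim=True)[0]
    # For terminal transitions (done == 1) the target is just the reward
    return rews + gamma * (1 - dones) * max_target_q


def dqn_loss(online_net, target_net, obses, actions, rews, dones, new_obses, gamma=GAMMA):
    """Huber (smooth L1) loss between Q(s, a; theta) and the TD targets."""
    targets = td_targets(target_net, rews, dones, new_obses, gamma)
    q_values = online_net(obses)  # shape (B, n_actions)
    # Select Q(s, a) for the action actually taken; actions: int64, shape (B, 1)
    action_q = torch.gather(q_values, dim=1, index=actions)
    return F.smooth_l1_loss(action_q, targets)


if __name__ == "__main__":
    # Assumed sizes: 4-dim observations, 2 discrete actions, 64 hidden units
    online = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    target = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    # Sync step: during training this would run every TARGET_UPDATE_FREQ steps
    target.load_state_dict(online.state_dict())

    optimizer = torch.optim.Adam(online.parameters(), lr=5e-4)  # lr is an assumed value
    B = 32
    # Random tensors stand in for a batch sampled from the replay buffer
    loss = dqn_loss(online, target,
                    obses=torch.randn(B, 4),
                    actions=torch.randint(0, 2, (B, 1)),
                    rews=torch.ones(B, 1),
                    dones=torch.zeros(B, 1),
                    new_obses=torch.randn(B, 4))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")
```

The targets are computed under `torch.no_grad()`, so gradients flow only into the online network. This is what keeps the target fixed between syncs.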
2. Code
The code is as follows (example):
from torch import nn
import torch
import gym
from collections import deque
import itertools
import numpy as np
import random

GAMMA = 0.99  # discount rate used when computing the TD target
BATCH_SIZE = 32  # number of transitions sampled from the replay buffer for each gradient computation
BUFFER_SIZE = 50000  # maximum number of transitions stored in the replay buffer before old ones are overwritten
MIN_REPLAY_SIZE = 1000  # number of transitions the replay buffer must hold before gradient computation and training start
EPSILON_START = 1.0  # starting value of epsilon
EPSILON_END = 0.02  # final value of epsilon
EPSILON_DECAY = 10000  # number of steps over which epsilon is linearly annealed from EPSILON_START to EPSILON_END
TARGET_UPDATE_FREQ = 1000  # number of steps between copies of the online parameters into the target parameters

# Create the network by defining a class that inherits from PyTorch's nn.Module
class Network(nn.Module):
    # This network assumes a discrete action space (env.action_space.n);
    # a continuous action space requires a different design
    def __init__(self, env):
        super().__init__()
        in_features = int(np.prod(env.observation_space.shape))
        # A standard two-layer sequential linear network with 64 hidden units
        self.net = nn.Sequential(
            nn.Linear(in_features, 64),
            nn.Tanh(),
            nn.Linear(64, env.action_space.n))

    # Forward pass: map a batch of observations to one Q-value per action
    def forward(self, x):
        return self.net(x)

    def act(self, obs):
        # Turn obs into a PyTorch tensor
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        # Compute the Q-values for this specific observation
        q_values = self(obs_t.unsqueeze(0))#let unsq