Code Implementation: Human-level control through deep reinforcement learning


Preface

Implemented with DQN. Web page: https://www.youtube.com/watch?v=NP8pXZdU-5U&ab_channel=brthorbrthor

1. Paper Title

Title: Human-level control through deep reinforcement learning
DOI: 10.1038/nature14236

2. Code

The code is as follows (example):

from torch import nn
import torch
import gym
from collections import deque
import itertools
import numpy as np
import random

GAMMA=0.99 # discount rate used when computing the TD target
BATCH_SIZE=32 # number of transitions sampled from the replay buffer each time gradients are computed
BUFFER_SIZE=50000 # maximum number of transitions stored in the replay buffer before old transitions are overwritten
MIN_REPLAY_SIZE=1000 # number of transitions required in the replay buffer before gradient computation and training start
EPSILON_START=1.0 # starting value of epsilon
EPSILON_END=0.02 # final value of epsilon
EPSILON_DECAY=10000 # number of steps over which epsilon is linearly annealed from EPSILON_START to EPSILON_END
TARGET_UPDATE_FREQ=1000 # number of steps between setting the target parameters equal to the online parameters

# define the network as a class that inherits from nn.Module (PyTorch)
class Network(nn.Module):
    # this network assumes a discrete action space; a continuous action space is handled differently
    def __init__(self,env):
        super().__init__()

        in_features=int(np.prod(env.observation_space.shape))
        # a standard two-layer sequential linear network with 64 hidden units
        self.net=nn.Sequential(
            nn.Linear(in_features,64),
            nn.Tanh(),
            nn.Linear(64,env.action_space.n))
    # forward pass: returns one Q-value per action
    def forward(self,x):
        return self.net(x)

    def act(self,obs):
        # turn obs into a PyTorch tensor
        obs_t=torch.as_tensor(obs, dtype=torch.float32)
        # compute the Q-values for this specific observation
        q_values = self(obs_t.unsqueeze(0))#let unsq
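The listing above stops partway through act, and the training loop is not shown. As a rough illustration of what the hyperparameter comments describe (linear epsilon annealing, epsilon-greedy action selection, the discounted TD target, and a bounded replay buffer), here is a minimal sketch. The helper names epsilon_at, select_action and td_target are hypothetical and do not come from the original code, and plain Python lists stand in for tensors so the sketch runs without PyTorch.

```python
import random
from collections import deque

# Values copied from the listing above.
GAMMA = 0.99
BUFFER_SIZE = 50000
EPSILON_START = 1.0
EPSILON_END = 0.02
EPSILON_DECAY = 10000


def epsilon_at(step):
    """Linearly anneal epsilon over EPSILON_DECAY steps, then hold it at EPSILON_END."""
    fraction = min(max(step / EPSILON_DECAY, 0.0), 1.0)
    return EPSILON_START + fraction * (EPSILON_END - EPSILON_START)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else the argmax of Q."""
    if rng.random() <= epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, done, next_q_values):
    """One-step TD target: reward + GAMMA * max Q(s', a'), with no bootstrap on terminal steps."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)


# A deque with maxlen discards its oldest item once BUFFER_SIZE is reached,
# which matches the "overwrite old transitions" behaviour described above.
replay_buffer = deque(maxlen=BUFFER_SIZE)
```

In a full DQN loop, next_q_values would presumably come from the target network, which is refreshed from the online network every TARGET_UPDATE_FREQ steps; that part is not sketched here.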