Running the AC Algorithm on a Custom Gym Environment


Gym usage analysis

How the gym environment is used in the ACpole (AC CartPole) code:

import gym

env = gym.make('CartPole-v0')
env.seed(1)  # reproducible
env = env.unwrapped

N_F = env.observation_space.shape[0]  # number of state features
N_A = env.action_space.n              # number of discrete actions

s = env.reset()                       # initial observation
a = env.action_space.sample()         # an action (here sampled at random so the snippet runs)
s_, r, done, info = env.step(a)       # next state, reward, terminal flag, debug info
env.render()
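Putting these calls together, a minimal episode loop looks like the following. This loop is an illustration, not part of the original snippet; a random action stands in for the actor's choice.

import gym

env = gym.make('CartPole-v0')
env.seed(1)                         # reproducible
env = env.unwrapped                 # strip the TimeLimit wrapper

s = env.reset()
done = False
while not done:
    env.render()
    a = env.action_space.sample()   # random action in place of the actor
    s_, r, done, info = env.step(a)
    s = s_                          # move on to the next state
env.close()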

Code analysis: https://www.colabug.com/2019/0821/6655535/

Robot-in-a-maze example: https://blog.csdn.net/extremebingo/article/details/80867486

Importing a custom environment into the local library: https://blog.csdn.net/u011254180/article/details/88221426

To add a custom environment, there is no need to modify Gym's source code; creating a Python module is enough. The directory structure is explained below.

[Figure: directory structure of the custom environment module]
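The original figure is not preserved; as a rough stand-in, the layout recommended by the Gym documentation looks like this (gym-foo and foo_env are placeholder names, not this project's):

gym-foo/
    README.md
    setup.py                 # declares the package and its gym dependency
    gym_foo/
        __init__.py          # calls register() for each environment id
        envs/
            __init__.py      # imports the environment classes
            foo_env.py       # the environment implementation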

In the constructor, first define the type and shape of action_space (containing all actions the agent may take in this environment), and similarly define observation_space (containing all the data the agent receives in one observation of this environment), as sketched below.
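As a hedged illustration using the standard gym.spaces API (the class and names here are placeholders, not this project's code):

import numpy as np
import gym
from gym import spaces


class MyMazeEnv(gym.Env):
    """Placeholder environment showing the usual space definitions."""

    def __init__(self):
        # four discrete actions: north / east / south / west
        self.action_space = spaces.Discrete(4)
        # a single scalar observation: the cell index, in [0, 15]
        self.observation_space = spaces.Box(
            low=0, high=15, shape=(1,), dtype=np.float32)

Note that the GridEnv1 code below instead uses a plain dict for action_space and a NumPy array for observation_space, to match the interface the AC implementation expects.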

Implementing the robot-in-a-maze task with the AC algorithm

The directory structure used here drops the README and gym/setup.py shown in the figure above; it mainly follows the blog post https://blog.csdn.net/extremebingo/article/details/80867486

  1. Under the path D:\python\Python3_7\Lib\site-packages\gym\envs, create a new folder user to hold custom reinforcement-learning environments.


  2. Create the new environment file grid_mdp_v1.py

    Create a GridEnv1 class (see the full code below). The main settings are as follows:

    """
        Description:
            A robot is trying to move in a maze, where exists obstruction, 
            gem and fire. The goal is to get the gem avoiding fire.
        
        Observation:
            Type: 
            Num Observation Min Max
            0   state       0   15    
        
        Actions:
            {0:'n',1:'e',2:'s',3:'w'}
    
        Reward:
            Fire    ->  -20
            Gem     ->  20
            Invalid ->  -1
            Valid   ->  0
            No gem until MAX STEPS  -> -50
        
        Staring State:
            random position except 5,11,12,15 (obstruction, Fire, Gem)
    
        Episode Termination:
            Robot touch fire or gem or reach MAX STEPS.
            [11,12,15]
    """
    

    The full code follows. It lightly modifies the original post's version, fixing a few small bugs and adapting the interface for the AC algorithm:

import logging
import random
import gym
import numpy as np


class GridEnv1(gym.Env):
    metadata = {
        'render.modes': ['human', 'rgb_array'],
        'video.frames_per_second': 2
    }

    def __init__(self):

        self.states = range(0, 16)  # state space: 0-15
        self.observation_space = np.array([1])  # shape (1,): the AC code reads observation_space.shape[0]

        # screen coordinates of each cell's center (used for rendering)
        self.x = [150, 250, 350, 450] * 4
        self.y = [450] * 4 + [350] * 4 + [250] * 4 + [150] * 4

        self.terminate_states = dict()  # terminal states, stored as a dict
        self.terminate_states[11] = 1
        self.terminate_states[12] = 1
        self.terminate_states[15] = 1

        self.action_space = {0: 'n', 1: 'e', 2: 's', 3: 'w'}

        self.rewards = dict()  # rewards, stored as a dict keyed by 'state_action'
        self.rewards['8_s'] = -20.0   # steps into fire
        self.rewards['13_w'] = -20.0  # steps into fire
        self.rewards['7_s'] = -20.0   # steps into fire
        self.rewards['10_e'] = -20.0  # steps into fire
        self.rewards['14_e'] = 100.0  # steps onto the gem

        self.t = dict()  # state transitions, stored as a dict keyed by 'state_action'
        self.t['0_s'] = 4
        self.t['0_e'] = 1
        self.t['1_e'] = 2
        self.t['1_w'] = 0
        self.t['2_w'] = 1
        self.t['2_e'] = 3
        self.t['2_s'] = 6
        self.t['3_w'] = 2
        self.t['3_s'] = 7
        self.t['4_n'] = 0
        self.t['4_s'] = 8
        self.t['6_n'] = 2
        self.t['6_s'] = 10
        self.t['6_e'] = 7
        self.t['7_w'] = 6
        self.t['7_n'] = 3
        self.t['7_s'] = 11
        self.t['8_n'] = 4
        self.t['8_e'] = 9
        self.t['8_s'] = 12
        self.t['9_w'] = 8
        self.t['9_e'] = 10
        self.t['9_s'] = 13
        self.t['10_w'] = 9
        self.t['10_n'] = 6
        self.t['10_e'] = 11
        self.t['10_s'] = 14
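The listing above breaks off partway through the constructor; the original post continues with the transitions for states 13 and 14 and with the reset/step/render methods. A minimal completion consistent with the docstring above might look like the following. The value of MAX STEPS, the step-counting logic, and the method bodies are my reconstruction, not the original code.

        # transitions for the bottom row (13 and 14; 12 and 15 are terminal)
        self.t['13_n'] = 9
        self.t['13_w'] = 12   # into fire
        self.t['13_e'] = 14
        self.t['14_n'] = 10
        self.t['14_w'] = 13
        self.t['14_e'] = 15   # onto the gem

        self.state = None

    def reset(self):
        # start from a random cell, excluding 5, 11, 12, 15
        # (obstruction, fire, fire, gem)
        self.state = random.choice(
            [s for s in self.states if s not in (5, 11, 12, 15)])
        self.step_count = 0
        return np.array([self.state])

    def step(self, action):
        self.step_count += 1
        key = '%d_%s' % (self.state, self.action_space[action])
        if key in self.t:                    # a valid move
            next_state = self.t[key]
            reward = self.rewards.get(key, 0.0)
        else:                                # invalid move: stay put
            next_state = self.state
            reward = -1.0
        self.state = next_state
        done = next_state in self.terminate_states
        if self.step_count >= 50 and not done:   # MAX STEPS assumed to be 50
            reward = -50.0
            done = True
        return np.array([next_state]), reward, done, {}

Finally, for gym.make to find the environment, the referenced blog post registers it with Gym; a sketch of the two required edits (the id GridWorld-v1 is a placeholder):

# in D:\python\Python3_7\Lib\site-packages\gym\envs\user\__init__.py
from gym.envs.user.grid_mdp_v1 import GridEnv1

# appended to D:\python\Python3_7\Lib\site-packages\gym\envs\__init__.py
register(
    id='GridWorld-v1',
    entry_point='gym.envs.user:GridEnv1',
)

After that, the AC code can create the maze just as it creates CartPole: env = gym.make('GridWorld-v1').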
  