Notes on Using gym
The gym environment calls used in the Actor-Critic CartPole example:

```python
import gym

env = gym.make('CartPole-v0')
env.seed(1)              # reproducible
env = env.unwrapped      # strip the TimeLimit wrapper

N_F = env.observation_space.shape[0]   # number of state features
N_A = env.action_space.n               # number of discrete actions

s = env.reset()                        # initial observation
s_, r, done, info = env.step(a)        # a: action chosen by the policy (defined elsewhere)
env.render()
```
Code walkthrough: https://www.colabug.com/2019/0821/6655535/
Robot maze-navigation example: https://blog.csdn.net/extremebingo/article/details/80867486
Importing into the local library: https://blog.csdn.net/u011254180/article/details/88221426
To add a custom environment there is no need to modify Gym's source code; creating a Python module is enough. The directory structure is explained below.
In the constructor, first define the type and shape of `action_space` (all actions the agent may take in this environment), and similarly define `observation_space` (all the data the agent receives in a single observation).
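As an illustration, a sketch of how these two attributes are typically declared with `gym.spaces` (the class name `MyEnv`, the action count, and the observation bounds here are invented for this example, not taken from the post):

```python
import gym
import numpy as np
from gym import spaces

class MyEnv(gym.Env):
    def __init__(self):
        # All actions the agent may take: 4 discrete choices, 0..3
        self.action_space = spaces.Discrete(4)
        # A single observation: a 2-dimensional real vector in [-1, 1]
        self.observation_space = spaces.Box(low=-1.0, high=1.0,
                                            shape=(2,), dtype=np.float32)

env = MyEnv()
print(env.action_space.n)           # 4
print(env.observation_space.shape)  # (2,)
```

Standard gym agents query exactly these two attributes (as `N_F` and `N_A` do in the CartPole snippet above) to size their networks.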
Implementing Robot Maze Navigation with the Actor-Critic Algorithm
The directory structure used here differs from the figure above in that it omits README and gym/setup.py; it mainly follows the blog post https://blog.csdn.net/extremebingo/article/details/80867486
- Under the path D:\python\Python3_7\Lib\site-packages\gym\envs, create a new folder `user` to hold custom reinforcement-learning environments
- Create a new environment file grid_mdp_v1.py defining the GridEnv1 class (full code below). Its main settings are:
""" Description: A robot is trying to move in a maze, where exists obstruction, gem and fire. The goal is to get the gem avoiding fire. Observation: Type: Num Observation Min Max 0 state 0 15 Actions: {0:'n',1:'e',2:'s',3:'w'} Reward: Fire -> -20 Gem -> 20 Invalid -> -1 Valid -> 0 No gem until MAX STEPS -> -50 Staring State: random position except 5,11,12,15 (obstruction, Fire, Gem) Episode Termination: Robot touch fire or gem or reach MAX STEPS. [11,12,15] """
The full code follows. It slightly modifies the original post, fixing a few small bugs and adapting the interface to the AC algorithm:
import logging
import random

import gym
import numpy as np


class GridEnv1(gym.Env):
    metadata = {
        'render.modes': ['human', 'rgb_array'],
        'video.frames_per_second': 2
    }

    def __init__(self):
        self.states = range(0, 16)              # state space: 0~15
        self.observation_space = np.array([1])  # 1-dim observation (state index), matching the AC interface
        # screen coordinates of the 16 grid cells, used by render()
        self.x = [150, 250, 350, 450] * 4
        self.y = [450] * 4 + [350] * 4 + [250] * 4 + [150] * 4

        self.terminate_states = dict()          # terminal states, stored as a dict
        self.terminate_states[11] = 1
        self.terminate_states[12] = 1
        self.terminate_states[15] = 1

        self.action_space = {0: 'n', 1: 'e', 2: 's', 3: 'w'}

        self.rewards = dict()                   # rewards, keyed by "<state>_<action>"
        self.rewards['8_s'] = -20.0
        self.rewards['13_w'] = -20.0
        self.rewards['7_s'] = -20.0
        self.rewards['10_e'] = -20.0
        self.rewards['14_e'] = 100.0

        self.t = dict()                         # state transitions, keyed by "<state>_<action>"
        self.t['0_s'] = 4
        self.t['0_e'] = 1
        self.t['1_e'] = 2
        self.t['1_w'] = 0
        self.t['2_w'] = 1
        self.t['2_e'] = 3
        self.t['2_s'] = 6
        self.t['3_w'] = 2
        self.t['3_s'] = 7
        self.t['4_n'] = 0
        self.t['4_s'] = 8
        self.t['6_n'] = 2
        self.t['6_s'] = 10
        self.t['6_e'] = 7
        self.t['7_w'] = 6
        self.t['7_n'] = 3
        self.t['7_s'] = 11
        self.t['8_n'] = 4
        self.t['8_e'] = 9
        self.t['8_s'] = 12
        self.t['9_w'] = 8
        self.t['9_e'] = 10
        self.t['9_s'] = 13
        self.t['10_w'] = 9
        self.t['10_n'] = 6
        self.t['10_e'] = 11
        self.t['10_s'] = 14