We follow Spinning Up to learn deep reinforcement learning (DRL). Links:
https://github.com/openai/gym
https://github.com/openai/baselines
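1. Install Anaconda (for convenient package management)
Reference: https://docs.continuum.io/anaconda/install/linux/
First install the required dependencies: apt-get install libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6
Then run sha256sum /path/filename and check that the output matches the SHA-256 for your system version listed at https://docs.continuum.io/anaconda/install/hashes/ (it normally does).
Then run bash ~/Downloads/Anaconda3-xxxxx-Linux-x86_64.sh
The default installation options are fine.
Then run conda create -n spinningup python=3.6 to create the environment for reinforcement learning.
Run source activate spinningup to activate the environment.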
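The checksum comparison from the Anaconda step above can also be done from Python. This is a minimal sketch, assuming you paste the published hash in by hand; the helper names sha256_of_file and matches_published_hash are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read fixed-size chunks so a large installer is never fully in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare against a published hash, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Call matches_published_hash with the installer path and the hash copied from the hashes page; a False result means the download should not be run.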
2. Installing OpenMPI (a high-performance message-passing library)
sudo apt-get update && sudo apt-get install libopenmpi-dev
3. Installing Spinning Up
Note: Spinning Up defaults to installing everything in Gym except the MuJoCo environments
git clone https://github.com/openai/spinningup.git
cd spinningup
pip install -e .
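You can see that gym-0.15.7 gets installed as a dependency while Spinning Up is being installed
(inside the spinningup environment)
Test:
python -m spinup.run ppo --hid "[32,32]" --env LunarLander-v2 --exp_name installtest --gamma 0.999
After ten minutes of training: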
python -m spinup.run test_policy data/installtest/installtest_s0
python -m spinup.run plot data/installtest/installtest_s0
Install MuJoCo
https://blog.csdn.net/farm_coder/article/details/90295093
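Install mujoco-py
pip install -U 'mujoco-py<2.1,>=2.0' is all that is needed
I ran into ImportError: No module named 'fasteners'
The cause was that I had just run pip3 install -U 'mujoco-py<2.1,>=2.0', and pip3 is the pip of the system's own Python 3.5. So the pip3 in the official instructions has to be changed to pip, which is conda's pip.
It then printed: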
Please add following line to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-384
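After adding it I ran into /tmp/pip-install-amwwpr1j/mujoco-py/mujoco_py/gl/eglshim.c:4:21: fatal error: GL/glew.h: No such file or directory
Solution: https://github.com/openai/mujoco-py/issues/180
After sudo apt-get install libglew-dev the problem was solved, and mujoco_py worked in the Python interactive shell
If you have installed MuJoCo 2.00 and mujoco-py 2.0 following the README at https://github.com/openai/mujoco-py, you can generally run the example code from the GitHub repository.
Install the other game environments
Then run pip install gym[mujoco,robotics]
This reports: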
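The pip3 versus pip mix-up described above comes down to which interpreter a given pip belongs to. A minimal sketch of one way to avoid it is to always call pip through the interpreter you are actually running; pip_command is a hypothetical helper name.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # inside the conda environment these should point into the environment
    # directory, not at the system Python
    print("interpreter:", sys.executable)
    print("prefix:", sys.prefix)
    # `pip --version` prints the site-packages directory this pip installs into
    subprocess.run(pip_command("--version"), check=False)
```

On the command line the equivalent is python -m pip install ..., which installs into whatever python resolves to in the active environment.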
You appear to be missing MuJoCo. We expected to find the file here: /home/asber/.mujoco/mjpro150
But when I call mujoco_py.utils.discover_mujoco() in the Python interactive shell,
it returns ('/home/asber/.mujoco/mujoco200', '/home/asber/.mujoco/mjkey.txt'), so MuJoCo can clearly be found
(spinningup) asber@asber-X550VX:~/Documents/RL/spinningup$ pip show gym
Name: gym
Version: 0.15.7
Summary: The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.
Home-page: https://github.com/openai/gym
Author: OpenAI
Author-email: gym@openai.com
License: UNKNOWN
Location: /home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages
Requires: pyglet, scipy, six, numpy, cloudpickle
Required-by: baselines, spinup
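Solution: pip install gym[all]==0.15.3
How I got there:
Following the hint in https://github.com/openai/mujoco-py/issues/477, I uninstalled the pip-installed mujoco-py, installed it again from git clone https://github.com/openai/mujoco-py.git, then ran export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so, and finally pip install gym[all]==0.15.3 succeeded
Later, however, I found that mujoco-py does not need to be reinstalled; gym 0.15.3 alone is enough. The catch is that this gym version is not actually compatible with baselines, yet versions newer than 0.15.3 raise the error above.
Verify the installation:
python -m spinup.run ppo --hid "[32,32]" --env Walker2d-v2 --exp_name mujocotest
If this reports ERROR: Could not read activation key, set MUJOCO_PY_MJKEY_PATH to the correct path (the path must include the key file name)
Finally, and rather surprisingly: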
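The activation key error above can be narrowed down before launching a training run. This is a minimal sketch that only inspects the MUJOCO_PY_MJKEY_PATH variable mentioned above; check_mjkey_path is a hypothetical helper and does not call mujoco-py itself.

```python
import os


def check_mjkey_path(env=os.environ):
    """Return (ok, message) describing the MUJOCO_PY_MJKEY_PATH setting."""
    path = env.get("MUJOCO_PY_MJKEY_PATH")
    if path is None:
        return False, "MUJOCO_PY_MJKEY_PATH is not set"
    path = os.path.expanduser(path)
    if os.path.isdir(path):
        # the variable must name the key file, not the folder containing it
        return False, "points at a directory; include the key file name"
    if not os.path.isfile(path):
        return False, "no such file: " + path
    return True, "key file found: " + path


if __name__ == "__main__":
    print(check_mjkey_path())
```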
ERROR: spinup 0.2.0 has requirement gym[atari,box2d,classic_control]~=0.15.3, but you’ll have gym 0.17.1 which is incompatible.
ERROR: baselines 0.1.6 has requirement gym<0.16.0,>=0.15.4, but you’ll have gym 0.17.1 which is incompatible.
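After a lot of trouble the environment is finally set up; I will look into how to resolve the last two errors later.
What remains is learning how to read the source code and how to use it.
For now I still have not worked out how to study the Spinning Up code structure and the algorithm implementations inside it, or how to build my own environment and apply the algorithms to it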
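The two requirement errors reported by pip above can be reasoned about with a small sketch. It hard-codes the two pins quoted in those messages (gym~=0.15.3 for spinup, gym<0.16.0,>=0.15.4 for baselines) and uses a deliberately simplified version parser that only handles plain dotted numbers; it is not how pip resolves versions.

```python
def parse(version):
    """Turn a plain dotted version such as '0.15.7' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies_spinup(version):
    # gym~=0.15.3 is a compatible-release pin: >=0.15.3 and still 0.15.*
    return parse("0.15.3") <= parse(version) < parse("0.16.0")


def satisfies_baselines(version):
    # gym<0.16.0,>=0.15.4
    return parse("0.15.4") <= parse(version) < parse("0.16.0")


if __name__ == "__main__":
    for v in ["0.15.3", "0.15.7", "0.17.1"]:
        print(v, "spinup:", satisfies_spinup(v), "baselines:", satisfies_baselines(v))
```

Read this way, 0.17.1 fails both pins and 0.15.3 fails only the baselines pin, which matches the messages quoted above.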
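Setting up the Gym reinforcement learning simulation environment
Useful resources:
- OpenAI Spinning Up
- 强化学习环境-Gym安装到使用入门 (Reinforcement learning environments: Gym from installation to first use)
- 强化学习简书专栏 (reinforcement learning column on Jianshu)
Recommended book: 强化学习精要 (Essentials of Reinforcement Learning)
Recommended video courses: https://blog.csdn.net/kuizhao8951/article/details/102725894
Getting started: practice notes on the RL tutorials from 莫烦Python (Morvan Python)
Material from beginner to the very end: https://zhuanlan.zhihu.com/p/34918639
The content below mainly draws on: 强化学习实战 第一讲 gym学习及二次开发 (Reinforcement Learning in Practice, Lecture 1: learning gym and extending it), the Gym demo in 强化学习精要, and a Zhihu question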
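env has three core methods
1. reset: restart the task and set the environment to the task's initial state s0
2. step(a): takes the parameter a, the action; the environment then produces the next observation, the reward, whether the task has ended, and other information (computation, the physics engine)
3. render: output the current state (visualization, the graphics engine)
An example: writing a Snakes and Ladders (蛇棋) environment yourself
Key point: every environment must inherit from gym.Env and implement at least the two functions reset and step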
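The three methods above fit together in a simple interaction loop. The sketch below uses a made-up toy class, CoinFlipEnv, that mimics the reset/step/render contract without importing gym, so it only illustrates the calling pattern (with the four-value step return used by the gym versions in this post).

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip; an episode lasts 5 steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the initial observation s0

    def step(self, a):
        self.t += 1
        reward = 1 if a == self.rng.randint(0, 1) else 0
        done = self.t >= 5
        return self.t, reward, done, {}  # observation, reward, done, info

    def render(self):
        print("step", self.t)


def run_episode(env, policy):
    """Run one episode and return (total reward, last observation)."""
    obs = env.reset()
    total, done = 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total, obs


if __name__ == "__main__":
    print(run_episode(CoinFlipEnv(), lambda obs: 0))
```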
import numpy as np
import gym
from gym.spaces import Discrete


class SnakeEnv(gym.Env):
    """Snakes and Ladders environment.

    Args:
        ladder_num (int): number of ladders
        dices (list): maximum value of each way of throwing the dice

    Example:
        SnakeEnv(10, [3, 6]) generates 10 ladders at random and offers two
        ways to throw the dice: one yields a number between 1 and 3, the
        other a number between 1 and 6.
    """
    SIZE = 100  # the board has 100 squares in total

    def __init__(self, ladder_num, dices):
        self.ladder_num = ladder_num
        self.dices = dices
        # each random (start, end) pair becomes one ladder: start -> end
        self.ladders = dict(np.random.randint(1, self.SIZE, size=(self.ladder_num, 2)))
        self.observation_space = Discrete(self.SIZE + 1)
        self.action_space = Discrete(len(dices))
        print('ladders info:')
        print(self.ladders)
        print('dice ranges:')
        print(self.dices)
        self.pos = 1

    def reset(self):
        """Reset the environment and return the initial position (int)."""
        self.pos = 1
        return self.pos

    def step(self, a):
        """Apply action a and compute the next state, reward and end-of-game flag.

        Args:
            a (int): index of the dice-throwing strategy to use
        Returns:
            pos (int): the next position
            reward (int): reward for this step
            done (bool): whether the game is over
            info (dict): extra information; always empty here
        Example:
            env.step(0) chooses the first strategy as the action
        """
        step = np.random.randint(1, self.dices[a] + 1)
        self.pos += step
        if self.pos == self.SIZE:
            return self.SIZE, 100, True, {}
        elif self.pos > self.SIZE:
            # overshooting the last square bounces back by the excess
            self.pos = 2 * self.SIZE - self.pos
        if self.pos in self.ladders:
            self.pos = self.ladders[self.pos]
        return self.pos, -1, False, {}

    def reward(self, s):
        if s == self.SIZE:
            return 100
        else:
            return -1

    def render(self, mode='human'):
        pass


if __name__ == '__main__':
    env = SnakeEnv(10, [3, 6])
    env.reset()
    while True:
        state, reward, terminate, _ = env.step(0)
        print(reward, state)
        if terminate:
            break
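The movement rule inside step above (advance, bounce back when overshooting square 100, then follow a ladder if one starts on the landing square) is easier to check when the dice roll is passed in instead of drawn at random. This is a small sketch of that rule as a pure function; next_position is a name invented here.

```python
def next_position(pos, roll, ladders, size=100):
    """Return the square reached from `pos` after rolling `roll`."""
    pos += roll
    if pos == size:
        return pos  # landed exactly on the final square: the game ends
    if pos > size:
        pos = 2 * size - pos  # bounce back by the amount of the overshoot
    return ladders.get(pos, pos)  # take the ladder if one starts here


if __name__ == "__main__":
    print(next_position(95, 5, {}))        # lands exactly on 100
    print(next_position(98, 5, {}))        # 103 bounces back to 97
    print(next_position(10, 3, {13: 40}))  # ladder from 13 to 40
```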
So far the environment is just code in a single file; how do we package it the way the environments in gym are packaged?
https://zhuanlan.zhihu.com/p/26985029
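This article teaches how to write a demo and also provides one; the next step is to put it into the gym folder. Keep in mind, though, that the gym folder is not a copy you git cloned somewhere yourself,
but the directory where pip placed the package when installing it. pip show <package name> shows where gym lives (in my case /home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages)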
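The install location reported by pip show can also be looked up from Python itself, which guarantees the answer belongs to the interpreter you are running. A minimal sketch using the standard library; package_dir is a hypothetical helper name.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin is the path of the package's __init__.py (or the module file)
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # prints None if gym is not installed for this interpreter
    print(package_dir("gym"))
```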
cd /home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages/gym
cp ~/Desktop/reinforcement-learning-code-master/第一讲\ \ gym\ 学习及二次开发/grid_mdp.py envs/classic_control/
cd envs/classic_control
gedit __init__.py
from gym.envs.classic_control.cartpole import CartPoleEnv
from gym.envs.classic_control.mountain_car import MountainCarEnv
from gym.envs.classic_control.continuous_mountain_car import Continuous_MountainCarEnv
from gym.envs.classic_control.pendulum import PendulumEnv
from gym.envs.classic_control.acrobot import AcrobotEnv
from gym.envs.classic_control.grid_mdp import GridEnv
The last line, from gym.envs.classic_control.grid_mdp import GridEnv, is the one that was appended.
cd ..
gedit __init__.py
and append at the end:
register(
    id='GridWorld-v0',
    entry_point='gym.envs.classic_control:GridEnv',
    max_episode_steps=200,
    reward_threshold=100.0,
)
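To see why the entry_point string above has the form module:ClassName, here is a deliberately simplified registry. It is an illustration of the idea only, not gym's actual implementation, and it resolves a standard-library class (collections:Counter) so that it runs without gym installed.

```python
import importlib

_registry = {}  # maps an environment id to (entry_point, extra keyword arguments)


def register(id, entry_point, **kwargs):
    """Remember which 'module:ClassName' string belongs to an id."""
    if id in _registry:
        raise ValueError("id already registered: " + id)
    _registry[id] = (entry_point, kwargs)


def make(id):
    """Import the module named before ':' and instantiate the class named after it."""
    entry_point, _ = _registry[id]
    module_name, class_name = entry_point.split(":")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls()


# stand-in example: any importable 'module:Class' pair works the same way
register(id="Counter-v0", entry_point="collections:Counter", max_episode_steps=200)
obj = make("Counter-v0")
```

The import only happens inside make, which is why a typo in the entry_point string tends to surface when the environment is created rather than when it is registered.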
python
import gym
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages/gym/__init__.py", line 10, in <module>
from gym.spaces import Space
File "/home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages/gym/spaces/__init__.py", line 1, in <module>
from gym.spaces.space import Space
File "/home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages/gym/spaces/space.py", line 1, in <module>
from gym.utils import seeding
File "/home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages/gym/utils/seeding.py", line 2, in <module>
import numpy as np
File "/home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages/numpy/__init__.py", line 228, in <module>
from .testing import Tester
File "/home/asber/anaconda3/envs/spinningup/lib/python3.6/site-packages/numpy/testing/__init__.py", line 10, in <module>
from unittest import TestCase
ImportError: cannot import name 'TestCase'
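This error shows up. What I actually found is that it comes from the final from unittest import TestCase failing: unittest itself can be imported, but from unittest import TestCase cannot.
Then, at asber@asber-X550VX:~/anaconda3/envs/spinningup/lib/python3.6/site-packages/numpy/testing$ I ran gedit __init__.py and commented out the line: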
#from unittest import TestCase
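Then I found that
after uninstalling gym, import numpy raised no error and from unittest import TestCase worked too,
but as soon as gym was installed the error came back.
So it seemed that the package lookup, which used to find the unittest that numpy expects, was now finding gym's unittest, and that caused the error.
I came across this post: https://blog.csdn.net/sinat_42483341/article/details/103706560?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
Then it dawned on me! The current directory now contains a unittest, so the Python interpreter picks that one up first, which is why import numpy kept failing.
Well... Python is convenient, but sometimes it is a bit too casual.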
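The unittest mix-up above can be detected before it bites. The sketch below checks whether a directory contains something that could be imported in place of a standard-library module when Python is started from that directory; shadows_in_directory is a hypothetical helper name.

```python
import os


def shadows_in_directory(module_name, directory):
    """Return files in `directory` that could shadow `module_name` on import."""
    candidates = [
        os.path.join(directory, module_name, "__init__.py"),  # a package
        os.path.join(directory, module_name + ".py"),         # a single-file module
    ]
    return [p for p in candidates if os.path.isfile(p)]


if __name__ == "__main__":
    import unittest
    # where the module that actually got imported lives
    print("unittest loaded from:", unittest.__file__)
    print("candidates in cwd:", shadows_in_directory("unittest", os.getcwd()))
```

If the first path printed is not inside the Python installation's lib directory, starting the interpreter from a different working directory is usually enough to get the standard module back.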
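Under gym 0.15.4, installing mujoco-py produces various "redefined" errors and others
error: command 'gcc' failed with exit status 1
ERROR: Failed building wheel for mujoco-py
So mujoco-py 2.0 appears to conflict with gym 0.15.4. Even so, at this point I ran
python -m spinup.run ppo --hid "[32,32]" --env Walker2d-v2 --exp_name mujocotest
This command exercises MuJoCo and the test passed; the build error above does not affect usage.
Following the method above, we cd into the main directory and run it, and find that this code contains some mistakes and therefore fails. We apply the same method, this time
following: https://zhuanlan.zhihu.com/p/33553076
and 机器人强化学习之使用 OpenAI Gym 教程与笔记 (Robot reinforcement learning: OpenAI Gym tutorial and notes)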
import gym
env = gym.make('Car2D-v0')
env.reset()
array([ 5., -1.])
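This shows that environments are indeed created this way. After that, the first gym environment-building tutorial (that is, the code in this post, modified by removing the leading underscores from _reset and the like) ran without problems as well
Sharing some fun things I came across while searching for material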