Official repositories:
https://github.com/openai/safety-gym
https://github.com/openai/safety-starter-agents
I. Installing dependencies and configuring the environment
Python 3.7 or lower is recommended, because the official safety-rl code (safety-starter-agents) is built on TensorFlow 1.13.1, which only supports Python 3.7 and below. If you do not need the official safety-rl, Python 3.8+ works fine.
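The version constraint above can be checked programmatically before you start installing anything. A minimal sketch (the helper name and check are ours, not part of either repository):

```python
import sys

# TensorFlow 1.13.1 wheels only exist for Python <= 3.7, so warn early
# if the interpreter is newer. (Illustrative helper, not an official API.)
def tf113_compatible(version_info=None):
    """Return True if this interpreter can install TensorFlow 1.13.1 wheels."""
    vi = sys.version_info if version_info is None else version_info
    return (vi[0], vi[1]) <= (3, 7)

if not tf113_compatible():
    print("Python > 3.7 detected: TensorFlow 1.13.1 wheels are unavailable.")
```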
1. Installing MuJoCo (Linux)
https://github.com/deepmind/mujoco
MuJoCo cannot be installed on Mac M1; running the simulator fails with:
[1] 8409 illegal hardware instruction ./simulate
- Download mujoco200:
https://www.roboti.us/download.html
Click "mujoco200 linux" to download a zip archive.
- Download the activation key (MuJoCo has been acquired by DeepMind, so activation is now free):
https://www.roboti.us/license.html
Click "Activation key" to download a txt file.
- Install
In the home directory:
mkdir ~/.mujoco # create the .mujoco directory
cp mujoco200_linux.zip ~/.mujoco
cd ~/.mujoco
unzip mujoco200_linux.zip # extract
mv mujoco200_linux mujoco200 # this rename step is important
cp mjkey.txt ~/.mujoco/mujoco200/bin # put the activation key in the bin directory
- Add environment variables
vim ~/.bashrc
Append the following two lines at the end:
export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
source ~/.bashrc
- Test
cd ~/.mujoco/mujoco200/bin
./simulate ../model/humanoid.xml
If the simulator window opens, the installation succeeded.
2. Installing mujoco-py
https://github.com/openai/mujoco-py
- Install
Each MuJoCo version requires a matching mujoco-py version:
MuJoCo 150:
pip install mujoco-py==1.50.1.68
(Windows only supports this MuJoCo version, but safety-gym depends on mujoco_py>=2.0.2.7, so safety-gym apparently cannot be used on Windows.)
MuJoCo 200:
pip install mujoco-py==2.0.2.8
MuJoCo 210:
pip install mujoco-py==2.1.2.14
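The pairing above can be kept in a small lookup table so the right pin is always used. A hedged sketch (the dict and helper are illustrative only, not part of mujoco-py):

```python
# Illustrative mapping of MuJoCo release -> mujoco-py pin, taken from the
# version table above; not an official mujoco-py API.
MUJOCO_PY_PINS = {
    "mujoco150": "1.50.1.68",
    "mujoco200": "2.0.2.8",
    "mujoco210": "2.1.2.14",
}

def pip_command(mujoco_version):
    """Build the pip command that installs the matching mujoco-py pin."""
    return "pip install mujoco-py==" + MUJOCO_PY_PINS[mujoco_version]

print(pip_command("mujoco200"))  # pip install mujoco-py==2.0.2.8
```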
- Test
import mujoco_py
import os
mj_path, _ = mujoco_py.utils.discover_mujoco()
xml_path = os.path.join(mj_path, 'model', 'humanoid.xml')
model = mujoco_py.load_model_from_path(xml_path)
sim = mujoco_py.MjSim(model)
print(sim.data.qpos)
# [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
sim.step()
print(sim.data.qpos)
# [-2.09531783e-19 2.72130735e-05 6.14480786e-22 -3.45474715e-06
# 7.42993721e-06 -1.40711141e-04 -3.04253586e-04 -2.07559344e-04
# 8.50646247e-05 -3.45474715e-06 7.42993721e-06 -1.40711141e-04
# -3.04253586e-04 -2.07559344e-04 -8.50646247e-05 1.11317030e-04
# -7.03465386e-05 -2.22862221e-05 -1.11317030e-04 7.03465386e-05
# -2.22862221e-05]
Troubleshooting:
Running the test may fail with the following error:
distutils.errors.CompileError: command '/usr/bin/gcc' failed with exit code
This compile error means libosmesa6-dev and patchelf (needed for dynamic linking) are missing.
Step 1: install libosmesa6-dev
sudo apt install libosmesa6-dev
If the test still reports missing packages after installing libosmesa6-dev, proceed to step 2.
Step 2: install patchelf (either one of these two commands is enough)
pip install patchelf
sudo apt-get -y install patchelf
3. Installing safety-gym
https://github.com/openai/safety-gym
- Install
git clone https://github.com/openai/safety-gym.git
cd safety-gym
pip install -e .
- Test
import safety_gym
import gym
env = gym.make('Safexp-PointGoal1-v0')
4. Installing safe-rl
https://github.com/openai/safety-starter-agents
(This step is optional; it is only needed if you want to use the algorithms officially provided by the OpenAI team.)
git clone https://github.com/openai/safety-starter-agents.git
cd safety-starter-agents
pip install -e .
Notes:
- It is recommended to pip-install the dependencies individually, comment them out in setup.py, and then run
pip install -e .
- Installing mpi4py==3.0.2 may also fail; you can drop the version pin or install mpi4py 3.1.4 instead.
- When installing TensorFlow, the pinned tensorflow==1.13.1 may not be found (on Python 3.8 and above). You can drop the version pin (pip install tensorflow), but much of the code will then break, because many methods were changed or removed between versions. Alternatively, download the matching TensorFlow version from:
https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple/tensorflow/
For installing MuJoCo and mujoco-py on Windows, see:
"Win 10 / Win 11 MuJoCo and mujoco-py installation tutorial" (lan 606's CSDN blog)
II. Test examples
1. safety-gym test example
- Using a pre-configured environment
import safety_gym
import gym
from tqdm import tqdm
def main():
    robot = "Point"  # Point | Car | Doggo
    task = "Button"  # Goal | Button | Push
    level = "1"  # 0 | 1 | 2
    # env = gym.make('Safexp-PointGoal1-v0')
    env = gym.make(f'Safexp-{robot}{task}{level}-v0')
    print("Action Space:", env.action_space)
    print("Observation:", env.observation_space)
    env.reset()
    for i in tqdm(range(10000)):
        env.render()
        action = env.action_space.sample()  # take a random action
        next_observation, reward, done, info = env.step(action)
        # print(f"[{i}] reward: {reward}, info: {info}")
        if done:
            env.reset()

if __name__ == "__main__":
    main()
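Safety Gym's distinguishing output is the constraint cost reported in the info dict returned by env.step (the aggregate per-step cost is under info['cost']). A dependency-free sketch of accumulating it over an episode, using stand-in info dicts instead of a live environment:

```python
def episode_cost(infos):
    """Sum the aggregate per-step cost over one episode's info dicts."""
    return sum(info.get("cost", 0.0) for info in infos)

# Stand-in data shaped like Safety Gym's per-step info dicts.
fake_infos = [{"cost": 0.0}, {"cost": 1.0}, {"cost": 1.0}]
print(episode_cost(fake_infos))  # 2.0
```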
An environment in the Safety Gym benchmark suite is formed as a combination of a robot (one of Point, Car, or Doggo), a task (one of Goal, Button, or Push), and a level of difficulty (one of 0, 1, or 2, with higher levels having more challenging constraints). Environments include:
Safexp-{Robot}Goal0-v0: the robot must navigate to a goal.
Safexp-{Robot}Goal1-v0: the robot must navigate to a goal while avoiding hazards. One vase is present in the scene, but the agent is not penalized for hitting it.
Safexp-{Robot}Goal2-v0: the robot must navigate to a goal while avoiding more hazards and vases.
Safexp-{Robot}Button0-v0: the robot must press a goal button.
Safexp-{Robot}Button1-v0: the robot must press a goal button while avoiding hazards and gremlins, and while not pressing any of the wrong buttons.
Safexp-{Robot}Button2-v0: the robot must press a goal button while avoiding more hazards and gremlins, and while not pressing any of the wrong buttons.
Safexp-{Robot}Push0-v0: the robot must push a box to a goal.
Safexp-{Robot}Push1-v0: the robot must push a box to a goal while avoiding hazards. One pillar is present in the scene, but the agent is not penalized for hitting it.
Safexp-{Robot}Push2-v0: the robot must push a box to a goal while avoiding more hazards and pillars.
(To make one of the above, make sure to substitute {Robot} for one of Point, Car, or Doggo.)
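The naming scheme above is regular enough to enumerate; a small sketch generating every pre-configured environment id from the robot/task/level combinations:

```python
# Components of the Safexp-{Robot}{Task}{Level}-v0 naming scheme above.
ROBOTS = ["Point", "Car", "Doggo"]
TASKS = ["Goal", "Button", "Push"]
LEVELS = [0, 1, 2]

def all_env_ids():
    """Enumerate all 27 pre-configured Safety Gym environment ids."""
    return [f"Safexp-{r}{t}{l}-v0"
            for r in ROBOTS for t in TASKS for l in LEVELS]

ids = all_env_ids()
print(len(ids))  # 27
print(ids[0])    # Safexp-PointGoal0-v0
```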
- Creating a custom environment
import safety_gym
import gym
from safety_gym.envs.engine import Engine
from gym.envs.registration import register
config = {
    'robot_base': 'xmls/car.xml',
    'task': 'push',
    'observe_goal_lidar': True,
    'observe_box_lidar': True,
    'observe_hazards': True,
    'observe_vases': True,
    'constrain_hazards': True,
    'lidar_max_dist': 3,
    'lidar_num_bins': 16,
    'hazards_num': 4,
    'vases_num': 4
}
env = Engine(config)
register(id='SafexpTestEnvironment-v0',
entry_point='safety_gym.envs.mujoco:Engine',
kwargs={'config': config})
env.reset()
for i in range(10000):
    # action = env.sample()
    env.render()
    action = env.action_space.sample()  # take a random action
    next_observation, reward, done, info = env.step(action)
    print(f"[{i}] reward: {reward}, info: {info}")
    # print(info)
    # break
    if done:
        env.reset()
env.close()
2. safe-rl test example
- Example Script
from safe_rl import ppo_lagrangian
import gym, safety_gym
ppo_lagrangian(
    env_fn=lambda: gym.make('Safexp-PointGoal1-v0'),
    ac_kwargs=dict(hidden_sizes=(64, 64))
)
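env_fn is a zero-argument callable rather than an environment instance, so each parallel (MPI) worker can construct its own copy. A dependency-free sketch of that factory pattern (the stand-in dict replaces a real gym environment):

```python
def make_env_fn(env_id):
    """Return a zero-argument environment factory, mirroring the lambda above."""
    def env_fn():
        # In real use this would be: return gym.make(env_id)
        return {"id": env_id}  # stand-in for a gym.Env, for illustration only
    return env_fn

fn = make_env_fn('Safexp-PointGoal1-v0')
print(fn()["id"])  # Safexp-PointGoal1-v0
```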
- Reproduce Experiments from Paper
cd /path/to/safety-starter-agents/scripts
python experiment.py --algo ALGO --task TASK --robot ROBOT --seed SEED
--exp_name EXP_NAME --cpu CPU
where:
ALGO is in ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo'].
TASK is in ['goal1', 'goal2', 'button1', 'button2', 'push1', 'push2'].
ROBOT is in ['point', 'car', 'doggo'].
SEED is an integer. In the paper experiments, we used seeds of 0, 10, and 20, but results may not reproduce perfectly deterministically across machines.
CPU is an integer for how many CPUs to parallelize across.
EXP_NAME is an optional argument for the name of the folder where results will be saved. The save folder will be placed in /path/to/safety-starter-agents/data.
For example:
python experiment.py --algo ppo --task goal1 --robot point --seed 1024 --exp_name project --cpu 1
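Since the paper experiments used seeds 0, 10, and 20, sweeping over seeds is a common workflow. A sketch that only builds the command strings (it does not run experiment.py; the exp_name values are made up):

```python
def experiment_cmd(algo, task, robot, seed, exp_name, cpu=1):
    """Build one experiment.py invocation matching the usage shown above."""
    return (f"python experiment.py --algo {algo} --task {task} "
            f"--robot {robot} --seed {seed} --exp_name {exp_name} --cpu {cpu}")

# One command per paper seed (hypothetical experiment names).
cmds = [experiment_cmd("ppo_lagrangian", "goal1", "point", s, f"ppolag_goal1_s{s}")
        for s in (0, 10, 20)]
for c in cmds:
    print(c)
```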
Troubleshooting:
If a protobuf-related error appears when running, downgrading protobuf fixes it (solution from https://blog.csdn.net/qq_42951560/article/details/124997453):
pip uninstall protobuf
pip install protobuf==3.20.1