openai_ros Tutorial (ROS + Gazebo Deep Reinforcement Learning)


1. Environment Setup

Test environment: Ubuntu 16.04, ROS Kinetic

Download openai_ros:

git clone https://bitbucket.org/theconstructcore/openai_ros.git

Dependencies required by openai_ros:

message_runtime rospy gazebo_msgs std_msgs geometry_msgs controller_manager_msgs

If any of these dependencies is missing, launching will fail with an error. For example, to install controller_manager_msgs:

sudo apt-get install ros-kinetic-controller-manager-msgs
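The other message packages can be installed the same way: the apt package name is the ROS package name with underscores replaced by dashes and prefixed with ros-kinetic-, for example ros-kinetic-gazebo-msgs.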


2. Creating a TurtleBot Training Script with openai_ros

Create a my_turtlebot2_training package under catkin_ws/src:
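One straightforward way to do this is with catkin_create_pkg; the dependency list below is only a reasonable guess (rospy plus openai_ros itself), so adjust it to what your code actually needs:

cd ~/catkin_ws/src
catkin_create_pkg my_turtlebot2_training rospy openai_ros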

Inside it, create a Python file named start_training.py and make it executable with chmod +x:

#!/usr/bin/env python

import gym
import numpy
import time
import qlearn  # local qlearn.py module (a minimal sketch is given at the end of this post)
from functools import reduce  # reduce is a builtin in Python 2 but must be imported on Python 3
from gym import wrappers
# ROS packages required
import rospy
import rospkg
from openai_ros.openai_ros_common import StartOpenAI_ROS_Environment


if __name__ == '__main__':

    rospy.init_node('example_turtlebot2_maze_qlearn',
                    anonymous=True, log_level=rospy.WARN)

    # Init OpenAI_ROS ENV
    task_and_robot_environment_name = rospy.get_param(
        '/turtlebot2/task_and_robot_environment_name')
    env = StartOpenAI_ROS_Environment(
        task_and_robot_environment_name)
    # Create the Gym environment
    rospy.loginfo("Gym environment done")
    rospy.loginfo("Starting Learning")

    # Set the logging system
    rospack = rospkg.RosPack()
    pkg_path = rospack.get_path('my_turtlebot2_training')
    outdir = pkg_path + '/training_results'
    env = wrappers.Monitor(env, outdir, force=True)
    rospy.loginfo("Monitor Wrapper started")

    last_time_steps = numpy.ndarray(0)

    # Loads parameters from the ROS param server
    # Parameters are stored in a yaml file inside the config directory
    # They are loaded at runtime by the launch file
    Alpha = rospy.get_param("/turtlebot2/alpha")
    Epsilon = rospy.get_param("/turtlebot2/epsilon")
    Gamma = rospy.get_param("/turtlebot2/gamma")
    epsilon_discount = rospy.get_param("/turtlebot2/epsilon_discount")
    nepisodes = rospy.get_param("/turtlebot2/nepisodes")
    nsteps = rospy.get_param("/turtlebot2/nsteps")

    running_step = rospy.get_param("/turtlebot2/running_step")

    # Initialises the algorithm that we are going to use for learning
    qlearn = qlearn.QLearn(actions=range(env.action_space.n),
                           alpha=Alpha, gamma=Gamma, epsilon=Epsilon)
    initial_epsilon = qlearn.epsilon

    start_time = time.time()
    highest_reward = 0

    # Starts the main training loop: the one about the episodes to do
    for x in range(nepisodes):
        rospy.logdebug("############### WALL START EPISODE=>" + str(x))

        cumulated_reward = 0
        done = False
        if qlearn.epsilon > 0.05:
            qlearn.epsilon *= epsilon_discount

        # Initialize the environment and get first state of the robot
        observation = env.reset()
        state = ''.join(map(str, observation))

        # Show on screen the actual situation of the robot
        # env.render()
        # for each episode, we test the robot for nsteps
        for i in range(nsteps):
            rospy.logwarn("############### Start Step=>" + str(i))
            # Pick an action based on the current state
            action = qlearn.chooseAction(state)
            rospy.logwarn("Next action is:%d", action)
            # Execute the action in the environment and get feedback
            observation, reward, done, info = env.step(action)

            rospy.logwarn(str(observation) + " " + str(reward))
            cumulated_reward += reward
            if highest_reward < cumulated_reward:
                highest_reward = cumulated_reward

            nextState = ''.join(map(str, observation))

            # Make the algorithm learn based on the results
            rospy.logwarn("# state we were=>" + str(state))
            rospy.logwarn("# action that we took=>" + str(action))
            rospy.logwarn("# reward that action gave=>" + str(reward))
            rospy.logwarn("# episode cumulated_reward=>" +
                          str(cumulated_reward))
            rospy.logwarn(
                "# State in which we will start next step=>" + str(nextState))
            qlearn.learn(state, action, reward, nextState)

            if not (done):
                rospy.logwarn("NOT DONE")
                state = nextState
            else:
                rospy.logwarn("DONE")
                last_time_steps = numpy.append(last_time_steps, [int(i + 1)])
                break
            rospy.logwarn("############### END Step=>" + str(i))
            #raw_input("Next Step...PRESS KEY")
            # rospy.sleep(2.0)
        m, s = divmod(int(time.time() - start_time), 60)
        h, m = divmod(m, 60)
        rospy.logerr(("EP: " + str(x + 1) + " - [alpha: " + str(round(qlearn.alpha, 2)) + " - gamma: " + str(
            round(qlearn.gamma, 2)) + " - epsilon: " + str(round(qlearn.epsilon, 2)) + "] - Reward: " + str(
            cumulated_reward) + "     Time: %d:%02d:%02d" % (h, m, s)))

    rospy.loginfo(("\n|" + str(nepisodes) + "|" + str(qlearn.alpha) + "|" + str(qlearn.gamma) + "|" + str(
        initial_epsilon) + "*" + str(epsilon_discount) + "|" + str(highest_reward) + "| PICTURE |"))

    l = last_time_steps.tolist()
    l.sort()

    # print("Parameters: a="+str)
    rospy.loginfo("Overall score: {:0.2f}".format(last_time_steps.mean()))
    rospy.loginfo("Best 100 score: {:0.2f}".format(
        reduce(lambda x, y: x + y, l[-100:]) / len(l[-100:])))

    env.close()

This script builds the reinforcement learning training loop and trains the robot with tabular Q-learning.
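For reference, each call to qlearn.learn(state, action, reward, nextState) is expected to apply the standard Q-learning update Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a)), where alpha is the learning rate, gamma the discount factor, and epsilon (decayed by epsilon_discount at the start of every episode) controls how often chooseAction explores a random action instead of the greedy one.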

 

Each training run has an associated configuration file containing the parameters needed for that task. Inside the package, create a new folder named config, and inside it create a new file named my_turtlebot2_maze_params.yaml:

turtlebot2: # namespace
    task_and_robot_environment_name: 'MyTurtleBot2Wall-v0'
    ros_ws_abspath: "/home/user/simulation_ws"
    running_step: 0.04  # amount of time each control command is executed
    pos_step: 0.016     # increment in position for each command

    # qlearn parameters
    alpha: 0.1
    gamma: 0.7
    epsilon: 0.9
    epsilon_discount: 0.999
    nepisodes: 500
    nsteps: 10000
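Note that ros_ws_abspath must point to an existing catkin workspace on your machine (replace /home/user/simulation_ws with your own path); openai_ros uses it to locate the simulation packages for the chosen task, and recent versions will try to download them into that workspace if they are missing.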

The parameters are divided into two groups:

Environment-related parameters: these depend on the RobotEnvironment and TaskEnvironment you have chosen.

RL algorithm parameters: these depend on the RL algorithm you have chosen.

 

Create a launch folder. Inside it, create a new file named start_training.launch:

<?xml version="1.0" encoding="UTF-8"?>
<launch>
    <!-- This version uses the openai_ros environments -->
    <rosparam command="load" file="$(find my_turtlebot2_training)/config/my_turtlebot2_maze_params.yaml" />
    <!-- Launch the training system -->
    <node pkg="my_turtlebot2_training" name="turtlebot2_maze" type="start_training.py" output="screen"/>
</launch>
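Because the YAML file puts everything under the turtlebot2 namespace, loading it with rosparam makes the values available as /turtlebot2/alpha, /turtlebot2/nepisodes, and so on, which is exactly how start_training.py reads them with rospy.get_param.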

Launch the training from a terminal:

roslaunch my_turtlebot2_training start_training.launch

Problems you may encounter:

The gym module is not installed (ImportError: No module named gym). Install it from source:


git clone https://github.com/openai/gym
cd gym
pip install -e .
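Keep in mind that ROS Kinetic runs on Python 2, so the pip used here must be the Python 2 pip. Recent gym releases have dropped Python 2 support, so if the installation or the import fails you may need to check out an older tag of the gym repository before running pip install -e . again.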

Missing qlearn module: the training script does import qlearn, which refers to a plain Python file qlearn.py (a simple tabular Q-learning implementation) that must sit next to start_training.py inside the package.
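If you do not have that file, the following is a minimal sketch of a qlearn.py that matches the interface used by the training script (QLearn(actions, alpha, gamma, epsilon), chooseAction() and learn()); it is a basic epsilon-greedy tabular Q-learner, not necessarily identical to the one shipped with the official openai_ros examples:

import random


class QLearn:
    def __init__(self, actions, epsilon, alpha, gamma):
        self.q = {}                # Q-table: (state, action) -> value
        self.actions = actions     # list of discrete actions
        self.epsilon = epsilon     # exploration rate
        self.alpha = alpha         # learning rate
        self.gamma = gamma         # discount factor

    def getQ(self, state, action):
        # unknown state-action pairs start with a value of 0.0
        return self.q.get((state, action), 0.0)

    def chooseAction(self, state):
        # epsilon-greedy: explore with probability epsilon, otherwise pick the
        # action with the highest Q-value (ties broken randomly)
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        q_values = [self.getQ(state, a) for a in self.actions]
        max_q = max(q_values)
        best = [i for i, q in enumerate(q_values) if q == max_q]
        return self.actions[random.choice(best)]

    def learn(self, state1, action1, reward, state2):
        # standard Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
        max_q_new = max([self.getQ(state2, a) for a in self.actions])
        old_q = self.getQ(state1, action1)
        self.q[(state1, action1)] = old_q + self.alpha * (
            reward + self.gamma * max_q_new - old_q)

Save it as qlearn.py in the same folder as start_training.py (it only needs to be importable, not executable), and the import qlearn line will resolve.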
