Unit 2: How to apply openai_ros to a new robot
SUMMARY |
---|
Estimated time to completion: 10 minutes
In the previous unit, we introduced you to the openai_ros package and how it is used to train a robot with reinforcement learning.
In the following units, you will learn how to apply the openai_ros package to your own robot.
This unit simply presents the robot we are going to use as an example: the cube robot. However, the definitions are generic, so you can equally apply the method to any other robot.
END OF SUMMARY |
---|
The moving cube robot
This robot was developed by ETH Zurich, and it is a perfect platform for control theory and mechanical physics. You can find more information at this link: Cubli Research
In order to see the video below, select the next cell of the notebook and press the play button.
from IPython.display import YouTubeVideo
# Cubli Robot created By ETHZurich
# Video credit: William Stein.
YouTubeVideo('n_6p-1J551Y')
Note:
You can also watch this video directly on YouTube by searching for 'n_6p-1J551Y'.
It is also available on Bilibili (watch link).
The simulation of the robot
We have created a simulation of the Cubli robot, which you can see in the simulation window. This first simulated version has only one inertia disk, but that is enough for the goal we want to achieve.
The goal we want the robot to learn
We want this robot to be able to walk around in the direction that we want. And it has to achieve this by learning by itself, not by using the mathematical calculations that were used for the original robot.
So, the objective is to make the Cubli robot learn how to move forwards in the WORLD Y-AXIS direction.
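To make this objective concrete, here is a minimal sketch of a reward function that encourages forward motion along the world Y axis. The function name and signature are illustrative only (they are not part of openai_ros); we assume the environment can read the robot's Y position from odometry before and after each step.

```python
# Hypothetical sketch: reward proportional to the distance advanced along
# the world Y axis. Moving backwards yields a negative reward.
# This is NOT the openai_ros API; names here are placeholders.

def compute_forward_reward(prev_y, current_y, forward_weight=1.0):
    """Return a reward for the distance moved along the world Y axis."""
    return forward_weight * (current_y - prev_y)


# Example: moving from y=0.0 to y=0.5 gives a positive reward,
# while moving from y=1.0 back to y=0.5 gives a negative one.
print(compute_forward_reward(0.0, 0.5))   # positive
print(compute_forward_reward(1.0, 0.5))   # negative
```

A shaped reward like this is what the Task Environment will expose to the learning algorithm in later units.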
In the next units of this course, we are going to learn how to build all the software parts needed to achieve that goal.
Structure of the next units:
- In Unit 3, you are going to create a new Robot Environment for the Cubli that allows you to access its sensors and actuators.
- In Unit 4, you are going to create the Task Environment that inherits from the Robot Environment you created. You will use this environment to define the reward and the conditions that detect when the task is done. You will also use it to provide the vector of observations to the training algorithm, and to translate the action chosen by the training algorithm into the actual movement commands sent through the Robot Environment.
- In Unit 5, you will create a Training Script for Q-learning and Deep Q-learning that uses the Task Environment you created.
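The layering described above can be sketched as two Python classes, where the Task Environment inherits from the Robot Environment. This is only an illustrative skeleton with toy dynamics: all class and method names here are placeholders for the code you will write in the next units, not the actual openai_ros API (which uses ROS topics and Gazebo underneath).

```python
# Illustrative sketch of the Robot Environment / Task Environment layering.
# All names are placeholders; the real versions are built in Units 3 and 4.

class CubeRobotEnv:
    """Robot Environment: low-level access to sensors and actuators."""

    def __init__(self):
        self.disk_speed = 0.0   # simulated actuator state
        self.y_position = 0.0   # simulated odometry reading

    def set_disk_speed(self, speed):
        # On the real robot this would publish a ROS command;
        # here we use toy dynamics as a stand-in.
        self.disk_speed = speed
        self.y_position += 0.01 * speed

    def read_odometry_y(self):
        return self.y_position


class CubeTaskEnv(CubeRobotEnv):
    """Task Environment: defines observations, reward, and done condition."""

    ACTIONS = {0: -1.0, 1: 0.0, 2: 1.0}  # discrete action -> disk speed

    def step(self, action):
        prev_y = self.read_odometry_y()
        self.set_disk_speed(self.ACTIONS[action])
        obs = [self.read_odometry_y(), self.disk_speed]
        reward = obs[0] - prev_y          # reward forward motion on Y
        done = abs(obs[0]) > 5.0          # toy episode-end condition
        return obs, reward, done
```

The training script from Unit 5 only ever calls `step()`, so the learning algorithm stays completely decoupled from the robot-specific code in the Robot Environment.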
Switch to the next unit to learn how to create a new Robot Environment for a new robot.