OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
Install
git clone https://github.com/openai/gym
cd gym
sudo pip install -e .
That’s the minimal install. For the full install, which also pulls in the dependencies of every environment, run pip install -e '.[all]' instead.
A Demo
import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
env.close()  # close the rendering window
This is just a demo to verify that your Gym installation works: it runs CartPole-v0 for 1000 timesteps, rendering each step and taking a random action.
Observation
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
The environment’s step function returns exactly what an agent needs in order to learn from its actions. It returns four values:
- observation(object): An environment-specific object representing your observation of the environment, i.e. the state.
- reward(float): The amount of reward achieved by the previous action.
- done(boolean): Whether the episode has terminated. When it is True, call reset() to start a new episode.
- info(dict): Diagnostic information useful for debugging.
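The four return values are enough to drive a simple agent loop. The sketch below is only an illustration and not part of Gym: run_episode and lean_policy are hypothetical helper names, and StubEnv is a made-up stand-in that mimics the reset()/step() interface described above so the example runs without a display. The policy assumes the CartPole observation layout (cart position, cart velocity, pole angle, pole angular velocity) and that action 0 pushes left while action 1 pushes right.

```python
def lean_policy(observation):
    """Push the cart toward the side the pole is leaning to.

    Assumes observation[2] is the pole angle (positive = leaning right)
    and that action 1 pushes right, action 0 pushes left.
    """
    return 1 if observation[2] > 0 else 0


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total_reward, steps_taken)."""
    observation = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps


class StubEnv:
    """Hypothetical stand-in with the same reset()/step() shape as a Gym env.

    It is NOT CartPole: it simply ends the episode after `length` steps
    and pays a reward of 1.0 per step.
    """

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.05, 0.0]

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return [0.0, 0.0, -0.05, 0.0], 1.0, done, {}


if __name__ == "__main__":
    total, steps = run_episode(StubEnv(length=5), lean_policy)
    print(total, steps)
```

With a real environment you would pass gym.make('CartPole-v0') in place of StubEnv, assuming it exposes the four-value step() interface shown above.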
Every environment comes with first-class Space objects that describe the valid actions and observations.
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(4,)
print(env.observation_space.high)
#> array([ 2.4 , inf, 0.20943951, inf])
print(env.observation_space.low)
#> array([-2.4 , -inf, -0.20943951, -inf])
The Discrete space allows a fixed range of non-negative integers, so in this case valid actions are either 0 or 1. The Box space represents an n-dimensional box, so valid observations will be an array of 4 numbers, each within the low and high bounds printed above. Box and Discrete are the most common Spaces. You can sample from a Space or check that something belongs to it:
from gym import spaces
space = spaces.Discrete(8) # Set with 8 elements {0, 1, 2, ..., 7}
x = space.sample()
assert space.contains(x)
assert space.n == 8
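For a Box, membership means that a point has the right number of components and that each one lies within its bounds. The sketch below illustrates that idea in plain Python using the CartPole bounds printed earlier. box_contains is a hypothetical helper written for this example, not Gym's implementation, and the real Box.contains may apply additional checks that are ignored here.

```python
import math

# Bounds copied from the CartPole-v0 output shown earlier.
LOW = [-2.4, -math.inf, -0.20943951, -math.inf]
HIGH = [2.4, math.inf, 0.20943951, math.inf]


def box_contains(x, low=LOW, high=HIGH):
    """Return True if x has the right length and every component is in bounds."""
    if len(x) != len(low):
        return False
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, low, high))


if __name__ == "__main__":
    print(box_contains([0.0, 12.5, 0.1, -3.0]))  # every component in bounds
    print(box_contains([3.0, 0.0, 0.0, 0.0]))    # cart position out of range
```

Note that the two infinite bounds place no limit on the velocity components, so only the cart position and pole angle can push a point outside the box.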
Environment
from gym import envs
print(envs.registry.all())
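This prints the EnvSpec objects for all registered environments. Each EnvSpec has an id, such as CartPole-v0, that you can pass to gym.make().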
Record & Upload
Wrap your environment with a Monitor Wrapper as follows:
import gym
from gym import wrappers

env = gym.make('CartPole-v0')
# Record results and videos into the given directory
env = wrappers.Monitor(env, '/tmp/cartpole-experiment-1')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()  # flush the recorded results to disk
Recording video requires ffmpeg, so you may need to install it first:
sudo apt-get install ffmpeg
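After a recorded run, it can be useful to check what the Monitor actually wrote to the output directory. The following is a small standard-library sketch. summarize_output_dir is a hypothetical helper name, and it assumes nothing about the file names beyond grouping them by extension.

```python
import os
from collections import Counter


def summarize_output_dir(directory):
    """Count the files in `directory` by extension (e.g. '.json', '.mp4')."""
    counts = Counter()
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories
            ext = os.path.splitext(name)[1] or "(no extension)"
            counts[ext] += 1
    return dict(counts)


if __name__ == "__main__":
    out_dir = "/tmp/cartpole-experiment-1"  # directory passed to the Monitor
    if os.path.isdir(out_dir):
        print(summarize_output_dir(out_dir))
    else:
        print("No output directory found at", out_dir)
```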
You can then upload your results to OpenAI Gym:
import gym
gym.upload('/tmp/cartpole-experiment-1', api_key='YOUR_API_KEY')