Source Code Analysis of the FrozenLake Environment (Scenario) in OpenAI Gym (1)

蓝天居士

Last modified: 2023-07-14 14:44:15


Columns: Reinforcement Learning, OpenAI Gym · Tags: OpenAI Gym, reinforcement learning, Q-learning

First published: 2023-07-14 14:41:26

Article link: https://blog.csdn.net/phmatthaus/article/details/131722365


There are plenty of example programs online for the FrozenLake environment in OpenAI Gym. The following one is among the more classic:

import numpy as np
import gym
import random

env = gym.make("FrozenLake-v1")

observation_space = env.observation_space
print("The observation space: {}".format(observation_space))
observation_space_size = env.observation_space.n
print(observation_space_size)

action_space = env.action_space
print("The action space: {}".format(action_space))
action_space_size = env.action_space.n
print(action_space_size)

q_table = np.zeros((observation_space_size, action_space_size))
# q_table = np.zeros([observation_space_size, action_space_size])
print(q_table)

"""
num_episodes = 10000
max_steps_per_episode = 100

learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01
"""

total_episodes = 15000        # Total episodes (number of training episodes)
learning_rate = 0.8           # Learning rate
max_steps = 99                # Max steps (decisions) per episode
gamma = 0.95                  # Discount rate applied to future rewards

# Exploration parameters
epsilon = 1.0                 # Exploration rate: probability of choosing a random action
max_epsilon = 1.0             # Exploration probability at start
min_epsilon = 0.01            # Minimum exploration probability
decay_rate = 0.001            # Exponential decay rate for exploration prob

# List of rewards
rewards = []

# For life or until learning is stopped
for episode in range(total_episodes):
    # Reset the environment. Since gym 0.26, reset() returns (observation, info),
    # so unpack the tuple to get the integer state index used to address the Q-table
    state, info = env.reset()
    step = 0
    done = False
    total_rewards = 0

    for step in range(max_steps):
        # Choose an action a in the current world state (s)
        ## First we randomize a number
        exp_exp_tradeoff = random.uniform(0, 1)
        
        ## If this number is greater than epsilon --> exploitation (take the action with the biggest Q value for this state)
        if exp_exp_tradeoff > epsilon:
            action = np.argmax(q_table[state,:])

        # Else doing a random choice --> exploration
        else:
            action = env.action_space.sample()

        # Take the action (a) and observe the outcome state (s') and reward (r).
        # Since gym 0.26, step() returns five values: (observation, reward, terminated, truncated, info);
        # here `done` holds the terminated flag
        new_state, reward, done, truncated, info = env.step(action)

        # Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
        # qtable[new_state,:] : all the actions we can take from new state
        q_table[state, action] = q_table[state, action] + learning_rate * (reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action])
        
        total_rewards += reward
        
        # The new state becomes the current state
        state = new_state
        
        # Finish the episode if it terminated (fell into a hole or reached the goal)
        # or was truncated by the environment's time limit
        if done or truncated:
            break
        
    # Reduce epsilon: the more familiar the agent becomes with the environment, the less it needs to explore
    epsilon = min_epsilon + (max_epsilon - min_epsilon)*np.exp(-decay_rate*episode) 
    rewards.append(total_rewards)

print ("Score over time: " +  str(sum(rewards)/total_episodes))
print(q_table)
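The two update rules in the loop above can be checked by hand. Below is a minimal sketch that pulls them out into standalone functions, using the same hyperparameter values as the example (learning_rate = 0.8, gamma = 0.95, max_epsilon = 1.0, min_epsilon = 0.01, decay_rate = 0.001). The helper names q_update and epsilon_at, and the sample transition, are hypothetical and exist only for illustration; they are not part of Gym.

```python
import math

import numpy as np


def q_update(q_table, state, action, reward, new_state, lr=0.8, gamma=0.95):
    """One tabular Q-learning step (hypothetical helper), same formula as the loop:
    Q(s,a) := Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * np.max(q_table[new_state, :])
    q_table[state, action] += lr * (td_target - q_table[state, action])
    return q_table[state, action]


def epsilon_at(episode, min_eps=0.01, max_eps=1.0, decay=0.001):
    """Exploration rate used after `episode` episodes (hypothetical helper):
    exponential decay from max_eps toward min_eps."""
    return min_eps + (max_eps - min_eps) * math.exp(-decay * episode)


# Default 4x4 map: 16 states x 4 actions, all Q values start at zero
q = np.zeros((16, 4))

# Hypothetical transition: from state 14, action 2 (RIGHT) lands on the goal state 15 with reward 1
print(q_update(q, state=14, action=2, reward=1.0, new_state=15))  # 0.8 * (1 + 0.95*0 - 0) = 0.8

# How quickly exploration shrinks over training
for ep in (0, 1000, 5000, 15000):
    print(ep, round(epsilon_at(ep), 4))
```

The training loop never performs an update whose current state is a hole or the goal, because the episode ends as soon as one is reached. Those rows of the Q-table therefore stay at zero, so the gamma * max Q(s', a') term contributes nothing on terminal transitions, which is exactly what the worked example shows. The epsilon values also show that with decay_rate = 0.001 exploration already falls below 0.02 by episode 5000 of 15000.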

There are also some very good explanatory videos (narrated in English):

[Wu Changxing Selected Series] OpenAI Gym and Python for Q-learning - Reinforcement Learning Code Project (Bilibili)

[Wu Changxing Selected Series] Train Q-learning Agent with Python - Reinforcement Learning Code Project (Bilibili)

[Wu Changxing Selected Series] Watch Q-learning Agent Play Game with Python - Reinforcement Learning Code (Bilibili)

This series explains very clearly how to write application code on top of the FrozenLake environment in OpenAI Gym.

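However, both the example above and the sample code in the videos only use the FrozenLake library (module). Neither goes into the underlying implementation, that is, how the functionality is actually implemented. This article takes a closer look at that underlying code.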

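To understand the underlying code, we first need to know where it lives. In my earlier article, OpenAI Gym入门与实操（1） (OpenAI Gym: Introduction and Hands-On Practice, Part 1) on my CSDN blog, OpenAI Gym was installed with the command pip install gym, and all of its environments were then installed with pip install gym[all]. With a per-user pip installation like this one, the package ends up under the user's home directory at “~/.local/lib/python3.xx/site-packages/gym”. The actual path and its contents on my machine are as follows: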

$ ls ~/.local/lib/python3.11/site-packages/gym
core.py  error.py     logger.py    py.typed  utils   version.py
envs     __init__.py  __pycache__  spaces    vector  wrappers

$ tree ~/.local/lib/python3.11/site-packages/gym
/home/penghao/.local/lib/python3.11/site-packages/gym
├── core.py
├── envs
│   ├── box2d
│   │   ├── bipedal_walker.py
│   │   ├── car_dynamics.py
│   │   ├── car_racing.py
│   │   ├── __init__.py
│   │   ├── lunar_lander.py
│   │   └── __pycache__
│   │       ├── bipedal_walker.cpython-311.pyc
│   │       ├── car_dynamics.cpython-311.pyc
│   │       ├── car_racing.cpython-311.pyc
│   │       ├── __init__.cpython-311.pyc
│   │       └── lunar_lander.cpython-311.pyc
│   ├── classic_control
│   │   ├── acrobot.py
│   │   ├── assets
│   │   │   └── clockwise.png
│   │   ├── cartpole.py
│   │   ├── continuous_mountain_car.py
│   │   ├── __init__.py
│   │   ├── mountain_car.py
│   │   ├── pendulum.py
│   │   ├── __pycache__
│   │   │   ├── acrobot.cpython-311.pyc
│   │   │   ├── cartpole.cpython-311.pyc
│   │   │   ├── continuous_mountain_car.cpython-311.pyc
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── mountain_car.cpython-311.pyc
│   │   │   ├── pendulum.cpython-311.pyc
│   │   │   └── utils.cpython-311.pyc
│   │   └── utils.py
│   ├── __init__.py
│   ├── mujoco
│   │   ├── ant.py
│   │   ├── ant_v3.py
│   │   ├── ant_v4.py
│   │   ├── assets
│   │   │   ├── ant.xml
│   │   │   ├── half_cheetah.xml
│   │   │   ├── hopper.xml
│   │   │   ├── humanoidstandup.xml
│   │   │   ├── humanoid.xml
│   │   │   ├── inverted_double_pendulum.xml
│   │   │   ├── inverted_pendulum.xml
│   │   │   ├── point.xml
│   │   │   ├── pusher.xml
│   │   │   ├── reacher.xml
│   │   │   ├── swimmer.xml
│   │   │   └── walker2d.xml
│   │   ├── half_cheetah.py
│   │   ├── half_cheetah_v3.py
│   │   ├── half_cheetah_v4.py
│   │   ├── hopper.py
│   │   ├── hopper_v3.py
│   │   ├── hopper_v4.py
│   │   ├── humanoid.py
│   │   ├── humanoidstandup.py
│   │   ├── humanoidstandup_v4.py
│   │   ├── humanoid_v3.py
│   │   ├── humanoid_v4.py
│   │   ├── __init__.py
│   │   ├── inverted_double_pendulum.py
│   │   ├── inverted_double_pendulum_v4.py
│   │   ├── inverted_pendulum.py
│   │   ├── inverted_pendulum_v4.py
│   │   ├── mujoco_env.py
│   │   ├── mujoco_rendering.py
│   │   ├── pusher.py
│   │   ├── pusher_v4.py
│   │   ├── __pycache__
│   │   │   ├── ant.cpython-311.pyc
│   │   │   ├── ant_v3.cpython-311.pyc
│   │   │   ├── ant_v4.cpython-311.pyc
│   │   │   ├── half_cheetah.cpython-311.pyc
│   │   │   ├── half_cheetah_v3.cpython-311.pyc
│   │   │   ├── half_cheetah_v4.cpython-311.pyc
│   │   │   ├── hopper.cpython-311.pyc
│   │   │   ├── hopper_v3.cpython-311.pyc
│   │   │   ├── hopper_v4.cpython-311.pyc
│   │   │   ├── humanoid.cpython-311.pyc
│   │   │   ├── humanoidstandup.cpython-311.pyc
│   │   │   ├── humanoidstandup_v4.cpython-311.pyc
│   │   │   ├── humanoid_v3.cpython-311.pyc
│   │   │   ├── humanoid_v4.cpython-311.pyc
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── inverted_double_pendulum.cpython-311.pyc
│   │   │   ├── inverted_double_pendulum_v4.cpython-311.pyc
│   │   │   ├── inverted_pendulum.cpython-311.pyc
│   │   │   ├── inverted_pendulum_v4.cpython-311.pyc
│   │   │   ├── mujoco_env.cpython-311.pyc
│   │   │   ├── mujoco_rendering.cpython-311.pyc
│   │   │   ├── pusher.cpython-311.pyc
│   │   │   ├── pusher_v4.cpython-311.pyc
│   │   │   ├── reacher.cpython-311.pyc
│   │   │   ├── reacher_v4.cpython-311.pyc
│   │   │   ├── swimmer.cpython-311.pyc
│   │   │   ├── swimmer_v3.cpython-311.pyc
│   │   │   ├── swimmer_v4.cpython-311.pyc
│   │   │   ├── walker2d.cpython-311.pyc
│   │   │   ├── walker2d_v3.cpython-311.pyc
│   │   │   └── walker2d_v4.cpython-311.pyc
│   │   ├── reacher.py
│   │   ├── reacher_v4.py
│   │   ├── swimmer.py
│   │   ├── swimmer_v3.py
│   │   ├── swimmer_v4.py
│   │   ├── walker2d.py
│   │   ├── walker2d_v3.py
│   │   └── walker2d_v4.py
│   ├── __pycache__
│   │   ├── __init__.cpython-311.pyc
│   │   └── registration.cpython-311.pyc
│   ├── registration.py
│   └── toy_text
│       ├── blackjack.py
│       ├── cliffwalking.py
│       ├── font
│       │   └── Minecraft.ttf
│       ├── frozen_lake.py
│       ├── img
│       │   ├── C2.png
│       │   ├── C3.png
│       │   ├── C4.png
│       │   ├── C5.png
│       │   ├── C6.png
│       │   ├── C7.png
│       │   ├── C8.png
│       │   ├── C9.png
│       │   ├── cab_front.png
│       │   ├── cab_left.png
│       │   ├── cab_rear.png
│       │   ├── cab_right.png
│       │   ├── CA.png
│       │   ├── Card.png
│       │   ├── CJ.png
│       │   ├── CK.png
│       │   ├── cookie.png
│       │   ├── CQ.png
│       │   ├── cracked_hole.png
│       │   ├── CT.png
│       │   ├── D2.png
│       │   ├── D3.png
│       │   ├── D4.png
│       │   ├── D5.png
│       │   ├── D6.png
│       │   ├── D7.png
│       │   ├── D8.png
│       │   ├── D9.png
│       │   ├── DA.png
│       │   ├── DJ.png
│       │   ├── DK.png
│       │   ├── DQ.png
│       │   ├── DT.png
│       │   ├── elf_down.png
│       │   ├── elf_left.png
│       │   ├── elf_right.png
│       │   ├── elf_up.png
│       │   ├── goal.png
│       │   ├── gridworld_median_bottom.png
│       │   ├── gridworld_median_horiz.png
│       │   ├── gridworld_median_left.png
│       │   ├── gridworld_median_right.png
│       │   ├── gridworld_median_top.png
│       │   ├── gridworld_median_vert.png
│       │   ├── H2.png
│       │   ├── H3.png
│       │   ├── H4.png
│       │   ├── H5.png
│       │   ├── H6.png
│       │   ├── H7.png
│       │   ├── H8.png
│       │   ├── H9.png
│       │   ├── HA.png
│       │   ├── HJ.png
│       │   ├── HK.png
│       │   ├── hole.png
│       │   ├── hotel.png
│       │   ├── HQ.png
│       │   ├── HT.png
│       │   ├── ice.png
│       │   ├── mountain_bg1.png
│       │   ├── mountain_bg2.png
│       │   ├── mountain_cliff.png
│       │   ├── mountain_near-cliff1.png
│       │   ├── mountain_near-cliff2.png
│       │   ├── passenger.png
│       │   ├── S2.png
│       │   ├── S3.png
│       │   ├── S4.png
│       │   ├── S5.png
│       │   ├── S6.png
│       │   ├── S7.png
│       │   ├── S8.png
│       │   ├── S9.png
│       │   ├── SA.png
│       │   ├── SJ.png
│       │   ├── SK.png
│       │   ├── SQ.png
│       │   ├── stool.png
│       │   ├── ST.png
│       │   └── taxi_background.png
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── blackjack.cpython-311.pyc
│       │   ├── cliffwalking.cpython-311.pyc
│       │   ├── frozen_lake.cpython-311.pyc
│       │   ├── __init__.cpython-311.pyc
│       │   ├── taxi.cpython-311.pyc
│       │   └── utils.cpython-311.pyc
│       ├── taxi.py
│       └── utils.py
├── error.py
├── __init__.py
├── logger.py
├── __pycache__
│   ├── core.cpython-311.pyc
│   ├── error.cpython-311.pyc
│   ├── __init__.cpython-311.pyc
│   ├── logger.cpython-311.pyc
│   └── version.cpython-311.pyc
├── py.typed
├── spaces
│   ├── box.py
│   ├── dict.py
│   ├── discrete.py
│   ├── graph.py
│   ├── __init__.py
│   ├── multi_binary.py
│   ├── multi_discrete.py
│   ├── __pycache__
│   │   ├── box.cpython-311.pyc
│   │   ├── dict.cpython-311.pyc
│   │   ├── discrete.cpython-311.pyc
│   │   ├── graph.cpython-311.pyc
│   │   ├── __init__.cpython-311.pyc
│   │   ├── multi_binary.cpython-311.pyc
│   │   ├── multi_discrete.cpython-311.pyc
│   │   ├── sequence.cpython-311.pyc
│   │   ├── space.cpython-311.pyc
│   │   ├── text.cpython-311.pyc
│   │   ├── tuple.cpython-311.pyc
│   │   └── utils.cpython-311.pyc
│   ├── sequence.py
│   ├── space.py
│   ├── text.py
│   ├── tuple.py
│   └── utils.py
├── utils
│   ├── colorize.py
│   ├── env_checker.py
│   ├── ezpickle.py
│   ├── __init__.py
│   ├── passive_env_checker.py
│   ├── play.py
│   ├── __pycache__
│   │   ├── colorize.cpython-311.pyc
│   │   ├── env_checker.cpython-311.pyc
│   │   ├── ezpickle.cpython-311.pyc
│   │   ├── __init__.cpython-311.pyc
│   │   ├── passive_env_checker.cpython-311.pyc
│   │   ├── play.cpython-311.pyc
│   │   ├── save_video.cpython-311.pyc
│   │   ├── seeding.cpython-311.pyc
│   │   └── step_api_compatibility.cpython-311.pyc
│   ├── save_video.py
│   ├── seeding.py
│   └── step_api_compatibility.py
├── vector
│   ├── async_vector_env.py
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── async_vector_env.cpython-311.pyc
│   │   ├── __init__.cpython-311.pyc
│   │   ├── sync_vector_env.cpython-311.pyc
│   │   └── vector_env.cpython-311.pyc
│   ├── sync_vector_env.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── misc.py
│   │   ├── numpy_utils.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── misc.cpython-311.pyc
│   │   │   ├── numpy_utils.cpython-311.pyc
│   │   │   ├── shared_memory.cpython-311.pyc
│   │   │   └── spaces.cpython-311.pyc
│   │   ├── shared_memory.py
│   │   └── spaces.py
│   └── vector_env.py
├── version.py
└── wrappers
    ├── atari_preprocessing.py
    ├── autoreset.py
    ├── clip_action.py
    ├── compatibility.py
    ├── env_checker.py
    ├── filter_observation.py
    ├── flatten_observation.py
    ├── frame_stack.py
    ├── gray_scale_observation.py
    ├── human_rendering.py
    ├── __init__.py
    ├── monitoring
    │   ├── __init__.py
    │   ├── __pycache__
    │   │   ├── __init__.cpython-311.pyc
    │   │   └── video_recorder.cpython-311.pyc
    │   └── video_recorder.py
    ├── normalize.py
    ├── order_enforcing.py
    ├── pixel_observation.py
    ├── __pycache__
    │   ├── atari_preprocessing.cpython-311.pyc
    │   ├── autoreset.cpython-311.pyc
    │   ├── clip_action.cpython-311.pyc
    │   ├── compatibility.cpython-311.pyc
    │   ├── env_checker.cpython-311.pyc
    │   ├── filter_observation.cpython-311.pyc
    │   ├── flatten_observation.cpython-311.pyc
    │   ├── frame_stack.cpython-311.pyc
    │   ├── gray_scale_observation.cpython-311.pyc
    │   ├── human_rendering.cpython-311.pyc
    │   ├── __init__.cpython-311.pyc
    │   ├── normalize.cpython-311.pyc
    │   ├── order_enforcing.cpython-311.pyc
    │   ├── pixel_observation.cpython-311.pyc
    │   ├── record_episode_statistics.cpython-311.pyc
    │   ├── record_video.cpython-311.pyc
    │   ├── render_collection.cpython-311.pyc
    │   ├── rescale_action.cpython-311.pyc
    │   ├── resize_observation.cpython-311.pyc
    │   ├── step_api_compatibility.cpython-311.pyc
    │   ├── time_aware_observation.cpython-311.pyc
    │   ├── time_limit.cpython-311.pyc
    │   ├── transform_observation.cpython-311.pyc
    │   ├── transform_reward.cpython-311.pyc
    │   └── vector_list_info.cpython-311.pyc
    ├── record_episode_statistics.py
    ├── record_video.py
    ├── render_collection.py
    ├── rescale_action.py
    ├── resize_observation.py
    ├── step_api_compatibility.py
    ├── time_aware_observation.py
    ├── time_limit.py
    ├── transform_observation.py
    ├── transform_reward.py
    └── vector_list_info.py

27 directories, 322 files
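The same location can also be obtained from inside Python rather than guessed, since the site-packages path depends on the Python version and on whether pip installed into the user directory, the system, or a virtual environment. Below is a minimal sketch using only the standard library (importlib.util.find_spec and os.walk). The helper names package_dir and list_sources are hypothetical, and the script assumes gym is importable by the interpreter that runs it.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an importable top-level package or module, or None.
    (Hypothetical helper, not part of gym.)"""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin is the path of its __init__.py
    return os.path.dirname(spec.origin)


def list_sources(root, skip=("__pycache__",)):
    """Return paths (relative to root) of all .py files, skipping bytecode caches.
    (Hypothetical helper, not part of gym.)"""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        for fname in sorted(filenames):
            if fname.endswith(".py"):
                found.append(os.path.relpath(os.path.join(dirpath, fname), root))
    return found


if __name__ == "__main__":
    gym_dir = package_dir("gym")
    if gym_dir is None:
        print("gym is not installed for this interpreter")
    else:
        print(gym_dir)
        # The toy_text directory is where frozen_lake.py lives in the tree above
        for path in list_sources(os.path.join(gym_dir, "envs", "toy_text")):
            print(path)
```

On a setup like the one above, this should print the site-packages/gym directory followed by the .py files of the toy_text branch, including frozen_lake.py, without the __pycache__ noise.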

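Knowing where the code lives is only the first step of a long journey. Follow-up articles will show how to debug this code and walk through all of it step by step while debugging.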