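Continuing from the previous article: Source Code Analysis of the FrozenLake Environment (Scenario) in OpenAI Gym (2)
The previous article ended with starting a debugging session through Python's pdb module: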
$ python -m pdb frozen_lake2.py
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb)
The program stops by itself at line 1 and waits for debugger commands.
For ease of reference, here is the source code of frozen_lake2.py once more:
import numpy as np
import gym
import random
import time
from IPython.display import clear_output

env = gym.make("FrozenLake-v1")
observation_space = env.observation_space
print("The observation space: {}".format(observation_space))
observation_space_size = env.observation_space.n
print(observation_space_size)
action_space = env.action_space
print("The action space: {}".format(action_space))
action_space_size = env.action_space.n
print(action_space_size)
q_table = np.zeros((observation_space_size, action_space_size))
# q_table = np.zeros([observation_space_size, action_space_size])
print(q_table)
"""
num_episodes = 10000
max_steps_per_episode = 100
learning_rate = 0.1
discount_rate = 0.99
exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01
"""
total_episodes = 15000  # Total number of training episodes
learning_rate = 0.8  # Learning rate
max_steps = 99  # Max steps (decisions) per episode
gamma = 0.95  # Discount rate applied to future rewards
# Exploration parameters
epsilon = 1.0  # Exploration rate: the probability of picking a random action
max_epsilon = 1.0  # Exploration probability at start
min_epsilon = 0.01  # Minimum exploration probability
decay_rate = 0.001  # Exponential decay rate for the exploration probability
# List of rewards
rewards = []
# For life or until learning is stopped
for episode in range(total_episodes):
    # Reset the environment
    state = env.reset()
    # This line was not in the original code. reset() returns a 2-tuple here, so only the
    # first element (the observation) is kept; otherwise the Q-table indexing below fails.
    state = state[0]
    step = 0
    done = False
    total_rewards = 0

    for step in range(max_steps):
        # Choose an action a in the current world state (s)
        ## First we randomize a number
        exp_exp_tradeoff = random.uniform(0, 1)

        ## If this number is greater than epsilon --> exploitation (taking the biggest Q value for this state)
        if exp_exp_tradeoff > epsilon:
            action = np.argmax(q_table[state, :])
        # Else doing a random choice --> exploration
        else:
            action = env.action_space.sample()

        # Take the action (a) and observe the outcome state (s') and reward (r)
        # This call also raised an error at first: in newer versions of the library step() returns five values.
        new_state, reward, done, truncated, info = env.step(action)
        # Variant suggested online (append '_' for the extra value). Note that with this ordering
        # `info` would actually receive the truncated flag:
        # new_state, reward, done, info, _ = env.step(action)

        # Update Q(s,a) := Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
        # q_table[new_state, :] : all the actions we can take from the new state
        q_table[state, action] = q_table[state, action] + learning_rate * (reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action])

        total_rewards += reward

        # Our new state is state
        state = new_state

        # If done (if we're dead): finish episode
        if done:
            break
        # if truncated:
        #     break

    # Reduce epsilon: as the agent becomes more familiar with the environment, it needs less and less exploration
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    rewards.append(total_rewards)

print("Score over time: " + str(sum(rewards) / total_episodes))
print(q_table)
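To make the two formulas in the script more concrete, here is a minimal sketch that evaluates the epsilon decay schedule and a single Q-value update with the same hyperparameters as above. The helper names `epsilon_at` and `q_update` are introduced only for this illustration and do not appear in frozen_lake2.py.

```python
import math

# Same values as in frozen_lake2.py
MAX_EPSILON = 1.0
MIN_EPSILON = 0.01
DECAY_RATE = 0.001


def epsilon_at(episode: int) -> float:
    """Exploration rate after `episode` episodes (same formula as the script)."""
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * math.exp(-DECAY_RATE * episode)


def q_update(q_sa: float, reward: float, max_q_next: float,
             learning_rate: float = 0.8, gamma: float = 0.95) -> float:
    """One tabular Q-learning update for a single (state, action) entry."""
    return q_sa + learning_rate * (reward + gamma * max_q_next - q_sa)


# Epsilon starts at 1.0 and decays towards MIN_EPSILON as training proceeds.
for episode in (0, 1000, 5000, 14999):
    print(f"episode {episode:>5}: epsilon = {epsilon_at(episode):.4f}")

# An entry that is still 0 and receives reward 1 from a terminal transition
# (so the best next Q value is 0) moves 80% of the way towards the target.
print(q_update(q_sa=0.0, reward=1.0, max_q_next=0.0))
```

With a decay rate of 0.001 the agent still explores in roughly a third of its decisions after 1000 episodes, which is why the script trains for many thousands of episodes.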
As mentioned in the previous article, the framework consists of 4 key steps:
- env = gym.make("FrozenLake-v1")
- env.reset()
- env.action_space.sample()
- new_state, reward, done, truncated, info = env.step(action)
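The comments in the script note that step() raised an error until its result was unpacked into five values. As a hedged sketch of how a script could tolerate both return shapes, the hypothetical helper `normalize_step` below (not part of Gym) accepts either the older 4-tuple `(obs, reward, done, info)` or the newer 5-tuple `(obs, reward, terminated, truncated, info)`. It assumes that, in the older shape, a time-limit cutoff is reported under the `info` key `"TimeLimit.truncated"`. The demo feeds it plain tuples so that it runs without Gym installed.

```python
from typing import Any, Dict, Tuple

StepResult = Tuple[Any, float, bool, bool, Dict[str, Any]]


def normalize_step(result: tuple) -> StepResult:
    """Return (obs, reward, terminated, truncated, info) for either step() shape.

    Hypothetical helper for illustration only; it is not a Gym function.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated), bool(truncated), info
    if len(result) == 4:
        obs, reward, done, info = result
        # Assumption: the old API flags a time-limit cutoff in the info dict.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = bool(done) and not truncated
        return obs, reward, terminated, truncated, info
    raise ValueError(f"unexpected step() result of length {len(result)}")


# Plain tuples stand in for real env.step(action) results.
new_style = (5, 0.0, False, False, {"prob": 1.0})
old_style = (5, 0.0, True, {"TimeLimit.truncated": True})

print(normalize_step(new_style))
print(normalize_step(old_style))
```

In a training loop this would be used as `new_state, reward, terminated, truncated, info = normalize_step(env.step(action))`, with the episode ending when `terminated or truncated` is true.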
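Below we single-step through them one by one and analyze the code.
Let's start with step 1: env = gym.make("FrozenLake-v1").
This statement is on line 7 of frozen_lake2.py, so we set the breakpoint at line 7 of the file. The command and its output are as follows:
$ python -m pdb frozen_lake2.py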
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) b 7
Breakpoint 1 at /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py:7
(Pdb)
Then enter c to let the program continue running until it reaches this breakpoint, as shown below:
$ python -m pdb frozen_lake2.py
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) b 7
Breakpoint 1 at /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py:7
(Pdb) c
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(7)<module>()
-> env = gym.make("FrozenLake-v1")
(Pdb)
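As you can see, the program has stopped at the breakpoint set earlier.
Enter s to step, which is what is usually called Step In: execution enters the function or method being called. This is shown below: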
-> env = gym.make("FrozenLake-v1")
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/envs/registration.py(502)make()
-> def make(
(Pdb)
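The `--Call--` frame in the transcript above reports the file and line number where the called function is defined. As a debugger-free cross-check, the standard `inspect` module can report the same kind of information. The sketch below uses `json.dumps` from the standard library so that it runs without Gym installed; `where_defined` is a helper name made up for this example.

```python
import inspect
import json


def where_defined(func):
    """Return (source file path, first line number) for a Python-level function."""
    path = inspect.getsourcefile(func)
    _, first_line = inspect.getsourcelines(func)
    return path, first_line


path, line = where_defined(json.dumps)
print(f"json.dumps is defined at {path}:{line}")
```

With Gym installed, calling `where_defined(gym.make)` would be expected to point at the same gym/envs/registration.py file that pdb shows, though the exact path and line number depend on the installed version.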
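As you can see, the program has now entered the make function. Most importantly, the debugger shows where make is defined: in the file gym/envs/registration.py. The code of make is as follows: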
def make(
    id: Union[str, EnvSpec],
    max_episode_steps: Optional[int] = None,
    autoreset: bool = False,
    apply_api_compatibility: Optional[bool] = None,
    disable_env_checker: Optional[bool] = None,
    **kwargs,
) -> Env:
    """Create an environment according to the given ID.

    To find all available environments use `gym.envs.registry.keys()` for all valid ids.

    Args:
        id: Name of the environment. Optionally, a module to import can be included, eg. 'module:Env-v0'
        max_episode_steps: Maximum length of an episode (TimeLimit wrapper).
        autoreset: Whether to automatically reset the environment after each episode (AutoResetWrapper).
        apply_api_compatibility: Whether to wrap the environment with the `StepAPICompatibility` wrapper that
            converts the environment step from a done bool to return termination and truncation bools.
            By default, the argument is None to which the environment specification `apply_api_compatibility` is used
            which defaults to False. Otherwise, the value of `apply_api_compatibility` is used.
            If `True`, the wrapper is applied otherwise, the wrapper is not applied.
        disable_env_checker: If to run the env checker, None will default to the environment specification `disable_env_checker`
            (which is by default False, running the environment checker),
            otherwise will run according to this parameter (`True` = not run, `False` = run)
        kwargs: Additional arguments to pass to the environment constructor.

    Returns:
        An instance of the environment.

    Raises:
        Error: If the ``id`` doesn't exist then an error is raised
    """
    if isinstance(id, EnvSpec):
        spec_ = id
    else:
        module, id = (None, id) if ":" not in id else id.split(":")
        if module is not None:
            try:
                importlib.import_module(module)
            except ModuleNotFoundError as e:
                raise ModuleNotFoundError(
                    f"{e}. Environment registration via importing a module failed. "
                    f"Check whether '{module}' contains env registration and can be imported."
                )
        spec_ = registry.get(id)

        ns, name, version = parse_env_id(id)
        latest_version = find_highest_version(ns, name)
        if (
            version is not None
            and latest_version is not None
            and latest_version > version
        ):
            logger.warn(
                f"The environment {id} is out of date. You should consider "
                f"upgrading to version `v{latest_version}`."
            )
        if version is None and latest_version is not None:
            version = latest_version
            new_env_id = get_env_id(ns, name, version)
            spec_ = registry.get(new_env_id)
            logger.warn(
                f"Using the latest versioned environment `{new_env_id}` "
                f"instead of the unversioned environment `{id}`."
            )

        if spec_ is None:
            _check_version_exists(ns, name, version)
            raise error.Error(f"No registered env with id: {id}")

    _kwargs = spec_.kwargs.copy()
    _kwargs.update(kwargs)

    if spec_.entry_point is None:
        raise error.Error(f"{spec_.id} registered but entry_point is not specified")
    elif callable(spec_.entry_point):
        env_creator = spec_.entry_point
    else:
        # Assume it's a string
        env_creator = load(spec_.entry_point)

    mode = _kwargs.get("render_mode")
    apply_human_rendering = False
    apply_render_collection = False

    # If we have access to metadata we check that "render_mode" is valid and see if the HumanRendering wrapper needs to be applied
    if mode is not None and hasattr(env_creator, "metadata"):
        assert isinstance(
            env_creator.metadata, dict
        ), f"Expect the environment creator ({env_creator}) metadata to be dict, actual type: {type(env_creator.metadata)}"

        if "render_modes" in env_creator.metadata:
            render_modes = env_creator.metadata["render_modes"]
            if not isinstance(render_modes, Sequence):
                logger.warn(
                    f"Expects the environment metadata render_modes to be a Sequence (tuple or list), actual type: {type(render_modes)}"
                )

            # Apply the `HumanRendering` wrapper, if the mode=="human" but "human" not in render_modes
            if (
                mode == "human"
                and "human" not in render_modes
                and ("rgb_array" in render_modes or "rgb_array_list" in render_modes)
            ):
                logger.warn(
                    "You are trying to use 'human' rendering for an environment that doesn't natively support it. "
                    "The HumanRendering wrapper is being applied to your environment."
                )
                apply_human_rendering = True
                if "rgb_array" in render_modes:
                    _kwargs["render_mode"] = "rgb_array"
                else:
                    _kwargs["render_mode"] = "rgb_array_list"
            elif (
                mode not in render_modes
                and mode.endswith("_list")
                and mode[: -len("_list")] in render_modes
            ):
                _kwargs["render_mode"] = mode[: -len("_list")]
                apply_render_collection = True
            elif mode not in render_modes:
                logger.warn(
                    f"The environment is being initialised with mode ({mode}) that is not in the possible render_modes ({render_modes})."
                )
        else:
            logger.warn(
                f"The environment creator metadata doesn't include `render_modes`, contains: {list(env_creator.metadata.keys())}"
            )

    if apply_api_compatibility is True or (
        apply_api_compatibility is None and spec_.apply_api_compatibility is True
    ):
        # If we use the compatibility layer, we treat the render mode explicitly and don't pass it to the env creator
        render_mode = _kwargs.pop("render_mode", None)
    else:
        render_mode = None

    try:
        env = env_creator(**_kwargs)
    except TypeError as e:
        if (
            str(e).find("got an unexpected keyword argument 'render_mode'") >= 0
            and apply_human_rendering
        ):
            raise error.Error(
                f"You passed render_mode='human' although {id} doesn't implement human-rendering natively. "
                "Gym tried to apply the HumanRendering wrapper but it looks like your environment is using the old "
                "rendering API, which is not supported by the HumanRendering wrapper."
            )
        else:
            raise e

    # Copies the environment creation specification and kwargs to add to the environment specification details
    spec_ = copy.deepcopy(spec_)
    spec_.kwargs = _kwargs
    env.unwrapped.spec = spec_

    # Add step API wrapper
    if apply_api_compatibility is True or (
        apply_api_compatibility is None and spec_.apply_api_compatibility is True
    ):
        env = EnvCompatibility(env, render_mode)

    # Run the environment checker as the lowest level wrapper
    if disable_env_checker is False or (
        disable_env_checker is None and spec_.disable_env_checker is False
    ):
        env = PassiveEnvChecker(env)

    # Add the order enforcing wrapper
    if spec_.order_enforce:
        env = OrderEnforcing(env)

    # Add the time limit wrapper
    if max_episode_steps is not None:
        env = TimeLimit(env, max_episode_steps)
    elif spec_.max_episode_steps is not None:
        env = TimeLimit(env, spec_.max_episode_steps)

    # Add the autoreset wrapper
    if autoreset:
        env = AutoResetWrapper(env)

    # Add human rendering wrapper
    if apply_human_rendering:
        env = HumanRendering(env)
    elif apply_render_collection:
        env = RenderCollection(env)

    return env
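The make function is fairly long (about 200 lines). In what follows we only need to focus on the parts we actually use, and we will come back to this code when we reach them.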
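One part of make that is easy to lose in the length is its tail, where the freshly created environment is wrapped layer by layer before being returned. The toy sketch below imitates only that layering order (checker first, then order enforcing, then the time limit) with stand-in classes. Every `Toy*` name and the `toy_make` function are hypothetical and greatly simplified; they are not Gym classes, and the step limit of 100 is chosen purely for illustration.

```python
class ToyEnv:
    """Stand-in for the raw environment returned by env_creator(**_kwargs)."""

    def __repr__(self):
        return "ToyEnv"


class ToyWrapper:
    """Minimal wrapper: holds an inner env and can reach the innermost one."""

    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        inner = self.env
        while isinstance(inner, ToyWrapper):
            inner = inner.env
        return inner

    def __repr__(self):
        return f"{type(self).__name__}<{self.env!r}>"


class ToyChecker(ToyWrapper):
    pass


class ToyOrderEnforcing(ToyWrapper):
    pass


class ToyTimeLimit(ToyWrapper):
    def __init__(self, env, max_episode_steps):
        super().__init__(env)
        self.max_episode_steps = max_episode_steps


def toy_make(max_episode_steps=None, run_checker=True, order_enforce=True):
    """Apply the wrappers in the same order as the tail of make()."""
    env = ToyEnv()
    if run_checker:
        env = ToyChecker(env)            # innermost wrapper
    if order_enforce:
        env = ToyOrderEnforcing(env)
    if max_episode_steps is not None:
        env = ToyTimeLimit(env, max_episode_steps)  # outermost here
    return env


env = toy_make(max_episode_steps=100)
print(env)            # ToyTimeLimit<ToyOrderEnforcing<ToyChecker<ToyEnv>>>
print(env.unwrapped)  # ToyEnv
```

The point of the sketch is that the object returned to the caller is the outermost wrapper, so a call such as step() passes through every layer before it reaches the underlying environment.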
The analysis of the remaining 3 steps continues in the follow-up articles.