Tutorial: Using the Open-Source Project `invalid-action-masking`

Latest recommended article published 2024-08-16 07:44:06

卓禄嘉Ernestine

Article link: https://blog.csdn.net/gitblog_01196/article/details/141236492

invalid-action-masking: Source Code for "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms"
Project URL: https://gitcode.com/gh_mirrors/in/invalid-action-masking

1. Project Directory Structure and Overview

invalid-action-masking/
├── gym_vec_api/
│   ├── __init__.py
│   └── invalid_action_masking.py
├── plots/
│   └── plot_script.py
├── .gitignore
├── .python-version
├── LICENSE
├── README.md
├── build.sh
├── poetry.lock
├── ppo.py
├── pyproject.toml
├── requirements.txt
└── test_pytest.py

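gym_vec_api/: Interfaces for gym environments and the implementation of invalid action masking.
plots/: Scripts for generating plots.
.gitignore: Lists the files and directories that Git should ignore.
.python-version: Specifies the Python version used by the project.
LICENSE: The project's license file; the project is released under the MIT license.
README.md: The project's documentation.
build.sh: Build script.
poetry.lock: Pins the exact versions of the project's dependencies.
ppo.py: Implementation of the PPO algorithm.
pyproject.toml: Project configuration file, covering dependencies and build-system settings.
requirements.txt: List of the Python packages the project depends on.
test_pytest.py: Test script.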
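
Because invalid action masking is the core idea behind the project, a minimal sketch of the technique may help when reading the code. The example below is not taken from the repository: the function name `masked_softmax`, the fill value `-1e8`, and the sample logits are all assumptions made for illustration, and it uses plain Python rather than the project's PyTorch code. It replaces the logits of invalid actions with a large negative number before applying softmax, so those actions end up with zero probability.

```python
import math

def masked_softmax(logits, mask, fill=-1e8):
    """Softmax over logits with invalid actions suppressed.

    `mask[i]` is True when action i is valid. Invalid logits are replaced
    by `fill` (a large negative constant, assumed here) before softmax.
    """
    masked = [l if valid else fill for l, valid in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: four actions, of which actions 1 and 3 are invalid.
logits = [1.0, 2.0, 0.5, 3.0]
mask = [True, False, True, False]
probs = masked_softmax(logits, mask)
print(probs)
```

The probability mass is redistributed over the valid actions only, so an agent sampling from `probs` cannot pick an invalid action.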

2. Entry Point

The project's entry point is ppo.py, which contains the implementation of the PPO (Proximal Policy Optimization) algorithm. To start the project, run:

python ppo.py

ppo.py sets up the environment and trains the agent, applying invalid action masking while optimizing the policy.
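
For readers new to PPO, the clipped surrogate objective at the heart of the algorithm can be sketched in a few lines. This is an illustrative sketch only, not code from ppo.py: the function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the sample numbers are assumptions. In a masked setting, the probability ratios would be computed from the masked policy's action probabilities.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO.

    `ratios[i]` is pi_new(a|s) / pi_old(a|s) for sample i and
    `advantages[i]` is its advantage estimate. PPO maximizes this value.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped terms.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)

# Hypothetical batch: one ratio above the clip range, one below, one inside.
objective = ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0])
print(objective)
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is what keeps PPO training stable.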

3. Configuration File

The project's configuration file is pyproject.toml, which uses poetry to manage dependencies and the build system. An excerpt of pyproject.toml follows:

[tool.poetry]
name = "invalid-action-masking"
version = "0.1.0"
description = "Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms"
authors = ["Your Name <you@example.com>"]
license = "MIT"

[tool.poetry.dependencies]
python = "^3.8"
gym = "^0.18.0"
numpy = "^1.19.2"
torch = "^1.7.0"

[tool.poetry.dev-dependencies]
pytest = "^6.2.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
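
The `^` in the dependency versions above is Poetry's caret constraint: it allows updates that do not change the leftmost non-zero version component. The sketch below shows how such a constraint expands into a version range. The helper `caret_range` is hypothetical, written for this tutorial, and only handles plain numeric versions with up to three components; it is not part of Poetry's API.

```python
def caret_range(constraint):
    """Expand a caret constraint such as "^0.18.0" into (lower, upper).

    The allowed versions satisfy lower <= version < upper.
    """
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    # Bump the leftmost non-zero component; if every given component is
    # zero, bump the last one that was written.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    lower = parts + [0] * (3 - len(parts))
    upper = lower[:bump] + [lower[bump] + 1] + [0] * (2 - bump)
    fmt = lambda v: ".".join(str(x) for x in v)
    return fmt(lower), fmt(upper)

# Constraints taken from the pyproject.toml excerpt above.
for name, spec in [("python", "^3.8"), ("gym", "^0.18.0"),
                   ("numpy", "^1.19.2"), ("torch", "^1.7.0")]:
    low, high = caret_range(spec)
    print(f"{name}: >={low},<{high}")
```

Note that `gym = "^0.18.0"` is much tighter than the others: because the major version is 0, only 0.18.x releases are accepted.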