Getting familiar with the resources
The configs/experiment files
Directory overview
- configs/experiment/base.yaml:7 composes `model/env/callbacks/trainer/logger` through Hydra's defaults, providing a tutorial template for AM + TSP + REINFORCE.
- configs/experiment/base.yaml:18 sets the TSP generator to num_loc=50 and disables check_solution, showing how to override environment parameters at the task level.
- configs/experiment/base.yaml:24 uses WandB project/tags/naming conventions so experiments on the same problem size can be aggregated.
- configs/experiment/base.yaml:34 fixes the data sizes, batch sizes and max_epochs in the model and trainer blocks, and seed: 1234 pins down randomness.
- configs/experiment/routing/am.yaml:3 shows that subdirectories override defaults to point at their own model/env config groups, deriving concrete experiment scripts.
# @package _global_
# Example configuration for experimenting. Trains the Attention Model on
# the TSP environment with 50 locations via REINFORCE with greedy rollout baseline.
# You may find comments on the most common hyperparameters below.
# Override defaults: take configs from relative path
# Compose model/env/callbacks/trainer/logger through Hydra defaults; this is the tutorial template for AM + TSP + REINFORCE.
defaults:
- override /model: am.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
# - override /logger: null # comment this line to enable logging
- override /logger: wandb.yaml
# Environment configuration
# Note that here we load by default the `.npz` files for the TSP environment
# that are automatically generated with seed following Kool et al. (2019).
# Set the TSP generator to num_loc=50 and disable check_solution, showing how to override environment parameters at the task level.
env:
generator_params:
num_loc: 50
check_solution: False # optimization
# Logging: we use Wandb in this case
# Use the WandB project/tags/naming conventions so runs on the same problem size can be grouped together.
logger:
wandb:
project: "rl4co"
tags: ["am", "tsp"]
group: "tsp${env.generator_params.num_loc}"
name: "am-tsp${env.generator_params.num_loc}"
# Model: this contains the environment (which gets automatically passed to the model on
# initialization), the policy network and other hyperparameters.
# This is a `LightningModule` and can be trained with PyTorch Lightning.
# The model and trainer blocks fix the data sizes, batch sizes and max_epochs; seed: 1234 pins down randomness.
model:
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
# Trainer: this is a customized version of the PyTorch Lightning trainer.
trainer:
max_epochs: 100
seed: 1234
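Because base.yaml is just another entry in the experiment group, a new experiment can inherit it and override only what changes, following the same pattern routing/am-a2c.yaml and scheduling/base use further down. A minimal sketch, where the file name configs/experiment/my-tsp100.yaml and the values are illustrative:
# @package _global_
# Hypothetical derived experiment: reuse base.yaml and change only the problem size
defaults:
  - base.yaml
  - _self_

env:
  generator_params:
    num_loc: 100      # overrides the 50 set in base.yaml

model:
  batch_size: 256     # smaller batches for the larger instances

trainer:
  max_epochs: 50
Running python run.py experiment=my-tsp100 then composes this file on top of base.yaml; every key not overridden here keeps its value from the parent.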
EDA & Graph
configs/experiment/eda/am.yaml:3 uses AM + REINFORCE for the mDPP (multi-port decap placement) EDA task, shrinking the batch/data sizes and setting lr=1e-4 with weight_decay=1e-3.
# @package _global_
# Subdirectories override defaults to point at their own model/env config groups, deriving concrete experiment scripts.
# AM + REINFORCE on the mDPP (multi-port decap placement) EDA task, with smaller batch/data sizes, lr=1e-4 and weight_decay=1e-3.
defaults:
- override /model: am.yaml
- override /env: mdpp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
logger:
wandb:
project: "rl4co"
tags: ["am", "${env.name}"]
group: ${env.name}
name: am-${env.name}
model:
batch_size: 64
train_data_size: 500
val_data_size: 100
test_data_size: 100
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-3
trainer:
max_epochs: 10
seed: 1234
configs/experiment/eda/am-a2c.yaml:18 switches the model to rl4co.models.A2C, reusing the AM policy but splitting the actor/critic optimizers while keeping the small-sample setup.
# @package _global_
defaults:
- override /model: am.yaml
- override /env: mdpp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
logger:
wandb:
project: "rl4co"
tags: ["am", "${env.name}"]
group: ${env.name}
name: am-a2c-${env.name}
# Switch the model to rl4co.models.A2C: the AM policy is reused, the actor/critic optimizers are split, and the small-sample setup is kept.
model:
_target_: rl4co.models.A2C
policy:
_target_: rl4co.models.AttentionModelPolicy
env_name: "${env.name}"
actor_optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-3
critic_optimizer_kwargs: null # default to actor_optimizer_kwargs
batch_size: 64
train_data_size: 500
val_data_size: 100
test_data_size: 100
trainer:
max_epochs: 10
seed: 1234
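Note that critic_optimizer_kwargs: null makes the critic fall back to actor_optimizer_kwargs. To give the critic its own settings, a derived config only needs to override that one key; a sketch, with hypothetical file name and illustrative values:
# @package _global_
# Hypothetical configs/experiment/eda/am-a2c-critic-lr.yaml
defaults:
  - eda/am-a2c.yaml
  - _self_

model:
  critic_optimizer_kwargs:   # no longer null: the critic gets its own optimizer settings
    lr: 5e-4
    weight_decay: 1e-3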
configs/experiment/eda/am-ppo.yaml:19 switches to the PPO setup (clip_range/ppo_epochs/mini_batch_size) while keeping the quick 10-epoch experiment.
# @package _global_
defaults:
- override /model: am-ppo.yaml
- override /env: mdpp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
logger:
wandb:
project: "rl4co"
tags: ["am-ppo", "${env.name}"]
group: ${env.name}
name: am-ppo-${env.name}
# Switch to the PPO setup (clip_range/ppo_epochs/mini_batch_size) while keeping the quick 10-epoch run.
model:
batch_size: 64
train_data_size: 1000
val_data_size: 100
test_data_size: 100
clip_range: 0.2
ppo_epochs: 2
mini_batch_size: ${model.batch_size}
vf_lambda: 0.5
entropy_lambda: 0.01
normalize_adv: False
max_grad_norm: 0.5
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-3
trainer:
max_epochs: 10
gradient_clip_val: Null # not supported in manual optimization
precision: "32-true" # NOTE: this seems to be important during manual optimization
seed: 1234
configs/experiment/graph/am.yaml:3 loads the AM policy for the facility location problem (FLP); it mirrors the EDA template but uses the graph-related env group, with batch_size=1000 and 100k training samples.
# @package _global_
# AM policy for the facility location problem (FLP); similar to the EDA template but with the graph-related env group, batch_size=1000 and 100k training samples.
defaults:
- override /model: am.yaml
- override /env: flp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
logger:
wandb:
project: "rl4co"
tags: ["am", "${env.name}"]
group: ${env.name}
name: am-${env.name}
model:
batch_size: 1000
train_data_size: 100_000
val_data_size: 1000
test_data_size: 1000
optimizer_kwargs:
lr: 1e-4
trainer:
max_epochs: 100
seed: 1234
Routing A
configs/experiment/routing/am.yaml:3 is the routing baseline: AM + TSP + REINFORCE, with lr_scheduler=MultiStepLR (milestones 80/95) and batch_size 512.
# @package _global_
# Routing baseline: AM + TSP + REINFORCE, with MultiStepLR (milestones 80/95) and batch_size 512.
defaults:
- override /model: am.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["am", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: am-${env.name}${env.generator_params.num_loc}
model:
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
seed: 1234
configs/experiment/routing/am-xl.yaml:22 scales the model up: num_encoder_layers=6, batch_size=2048 and 500 training epochs, aimed at long, large-batch training.
# @package _global_
defaults:
- override /model: am.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["am", "${env.name}"]
group: "${env.name}${env.generator_params.num_loc}"
name: "am-xl-${env.name}${env.generator_params.num_loc}"
# Scaled-up model: num_encoder_layers=6, batch_size=2048 and 500 epochs for long, large-batch training.
model:
policy_kwargs:
num_encoder_layers: 6
normalization: 'instance'
batch_size: 2048
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [480, 495]
gamma: 0.1
trainer:
max_epochs: 500
seed: 1234
configs/experiment/routing/am-svrp.yaml:20 switches to the SVRP environment and lowers the learning rate to 1e-6 to stabilize training.
# @package _global_
defaults:
- override /model: am.yaml
- override /env: svrp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["am", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: am-${env.name}${env.generator_params.num_loc}
# Switch to the SVRP environment and lower the learning rate to 1e-6 to stabilize training.
model:
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-6
weight_decay: 0
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
seed: 1234
configs/experiment/routing/am-a2c.yaml:14 swaps the model for A2C while keeping the AM policy, reusing the same Hydra composition as REINFORCE for a quick algorithm comparison.
# @package _global_
# Use the following to take the default values from am.yaml
# Replace below only the values that you want to change compared to the default values
defaults:
- routing/am.yaml
- _self_
logger:
wandb:
tags: ["am-a2c", "${env.name}"]
name: am-a2c-${env.name}${env.generator_params.num_loc}
# Swap the model for A2C, keep the AM policy, and reuse the same Hydra composition as REINFORCE to compare algorithms quickly.
model:
_target_: rl4co.models.A2C
policy:
_target_: rl4co.models.AttentionModelPolicy
env_name: "${env.name}"
actor_optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
critic_optimizer_kwargs: null # default to actor_optimizer_kwargs
configs/experiment/routing/am-ppo.yaml:22 configures the PPO variant of AM: clip_range=0.2, ppo_epochs=2, max_grad_norm=0.5, and forces precision="32-true" for manual optimization.
# @package _global_
defaults:
- override /model: am-ppo.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["am-ppo", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: ppo-${env.name}${env.generator_params.num_loc}
# PPO variant of AM: clip_range=0.2, ppo_epochs=2, max_grad_norm=0.5; precision="32-true" is forced for manual optimization.
model:
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
clip_range: 0.2
ppo_epochs: 2
mini_batch_size: 512
vf_lambda: 0.5
entropy_lambda: 0.01
normalize_adv: False
max_grad_norm: 0.5
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
gradient_clip_val: Null # not supported in manual optimization
precision: "32-true" # NOTE: this seems to be important during manual optimization
seed: 1234
Routing B
configs/experiment/routing/pomo.yaml:3 sets up the POMO (multi-start policy gradient) baseline: batch_size 64, 160k training samples, and the same MultiStepLR schedule.
# @package _global_
# POMO (multi-start policy gradient) baseline: batch_size 64, 160k training samples, MultiStepLR schedule.
defaults:
- override /model: pomo.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["pomo", "${env.name}"]
group: "${env.name}${env.generator_params.num_loc}"
name: "pomo-${env.name}${env.generator_params.num_loc}"
model:
batch_size: 64
train_data_size: 160_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
seed: 1234
Using the configuration files above:
- First set up the virtual environment and install the dependencies as described in README.md or PROJECT_MANUAL.md (uv sync --all-extras or pip install -e .[all] is recommended).
- Run the Hydra entry script from the project root, for example:
python run.py experiment=routing/pomo
Here experiment=routing/pomo points to `configs/experiment/routing/pomo.yaml`; Hydra loads that file automatically and, based on its defaults, composes the sub-configs such as model: pomo.yaml and env: tsp.yaml.
How to modify the method
To change the "method" behind configs/experiment/routing/pomo.yaml, you adjust the Python classes/modules it points to. The main entry points are:
- rl4co/models/zoo/pomo/model.py:15 defines `_target_: rl4co.models.POMO`, i.e. the LightningModule of the POMO algorithm; its `__init__`, `shared_step`, etc. control the training/inference flow and are the most direct implementation of the "method".
- rl4co/models/zoo/am/policy.py:10 contains AttentionModelPolicy, the policy network POMO uses by default; to change the encoding/decoding logic, dig further into rl4co/models/zoo/am/encoder.py (attention encoder) and rl4co/models/zoo/am/decoder.py (pointer decoder).
- rl4co/envs/routing/tsp/env.py:24 defines TSPEnv; since pomo.yaml loads configs/env/tsp.yaml by default, state transitions and rewards are changed there.
- Other dependencies:
- the SharedBaseline baseline lives in rl4co/models/rl/baselines/shared.py;
- the StateAugmentation data augmentation is in rl4co/data/transforms/state.py;
- to change the training schedule or the LR scheduler, look at configs/model/pomo.yaml (the Hydra config layer) or rl4co/models/rl/reinforce/reinforce.py (parent-class logic).
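If the goal is only to swap in your own model class rather than edit the library, the Hydra layer already supports it: derive a new experiment and point _target_ at your subclass, exactly as routing/am-a2c.yaml does for A2C. A sketch, where the module path my_project.models.MyPOMO and the file name are hypothetical:
# @package _global_
# Hypothetical configs/experiment/routing/my-pomo.yaml
defaults:
  - routing/pomo.yaml   # inherit the full POMO experiment
  - _self_

logger:
  wandb:
    name: my-pomo-${env.name}${env.generator_params.num_loc}

model:
  _target_: my_project.models.MyPOMO   # your own subclass of rl4co.models.POMO
  # all other model keys (policy, batch_size, lr_scheduler, ...) are merged in from the parent config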
configs/experiment/routing/ar-gnn.yaml:22 plugs a NARGNNNodeEncoder into the POMO setup, i.e. the GNN encoder replaces the AM encoder.
# @package _global_
defaults:
- override /model: pomo.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["pomo", "${env.name}"]
group: "${env.name}${env.generator_params.num_loc}"
name: "pomo-${env.name}${env.generator_params.num_loc}"
# Plug NARGNNNodeEncoder into the POMO setup: the GNN encoder replaces the AM encoder.
model:
policy:
_target_: rl4co.models.zoo.am.policy.AttentionModelPolicy
encoder:
_target_: rl4co.models.zoo.nargnn.encoder.NARGNNNodeEncoder
embed_dim: 128
env_name: "${env.name}"
env_name: "${env.name}"
batch_size: 64
train_data_size: 160_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
seed: 1234
configs/experiment/routing/mdpomo.yaml:24 targets CVRP with a mixed-distribution generator, stretching to a 10k training set and 10,000 epochs to match the paper version of MDPOMO.
# @package _global_
#
defaults:
- override /model: pomo.yaml
- override /env: cvrp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
loc_distribution: "mix_distribution"
logger:
wandb:
project: "rl4co"
tags: ["mdpomo", "${env.name}"]
group: "${env.name}${env.generator_params.num_loc}"
name: "mdpomo-${env.name}${env.generator_params.num_loc}"
# Mixed-distribution generator for CVRP; the training set is 10k instances and training runs for 10,000 epochs to match the paper version of MDPOMO.
model:
batch_size: 512
train_data_size: 10_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [9001]
gamma: 0.1
trainer:
max_epochs: 10000
seed: 1234
configs/experiment/routing/ptrnet.yaml:21 reproduces the Pointer Network baseline, with an extra data block that keeps the DataModule batch sizes consistent with the model.
# @package _global_
defaults:
- override /model: ptrnet.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["ptrnet", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: ptrnet-${env.name}${env.generator_params.num_loc}
# Pointer Network baseline; the extra data block keeps the DataModule batch sizes consistent with the model.
model:
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
data:
batch_size: 512
train_size: 1_280_000
val_size: 10_000
seed: 1234
configs/experiment/routing/polynet.yaml:22 trains PolyNet with multiple solutions, setting k=100 and val_num_solutions, and injects the k value into the log name.
# @package _global_
defaults:
- override /model: polynet.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
check_solution: False # optimization
logger:
wandb:
project: "rl4co"
tags: ["polynet", "${env.name}"]
group: "${env.name}${env.generator_params.num_loc}"
name: "polynet-${env.name}${env.generator_params.num_loc}-${model.k}"
# PolyNet multi-solution training: k=100, val_num_solutions follows k, and the k value is injected into the log name.
model:
k: 100
val_num_solutions: ${model.k}
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
seed: 1234
Routing C
configs/experiment/routing/symnco.yaml:21 is the SymNCO setup: num_augment=10 with num_starts=0, i.e. only symmetric augmentation is used, with no multi-start.
# @package _global_
defaults:
- override /model: symnco.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["symnco", "${env.name}"]
group: "${env.name}${env.generator_params.num_loc}"
name: "symnco-${env.name}${env.generator_params.num_loc}"
# SymNCO: num_augment=10 and num_starts=0, i.e. only symmetric augmentation, no multi-start.
model:
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1_280_000
val_data_size: 10_000
test_data_size: 10_000
num_starts: 0 # 0 for no augmentation for multi-starts
num_augment: 10
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [80, 95]
gamma: 0.1
trainer:
max_epochs: 100
seed: 1234
configs/experiment/routing/deepaco.yaml:21 is the DeepACO setup: a very small dataset, train_with_local_search=True, detailed ant-colony policy_kwargs, and CosineAnnealingLR.
# @package _global_
defaults:
- override /model: deepaco.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["deepaco", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: deepaco-${env.name}${env.generator_params.num_loc}
# DeepACO setup: very small dataset, train_with_local_search=True, detailed ant-colony policy_kwargs, CosineAnnealingLR.
model:
batch_size: 20
val_batch_size: 20
test_batch_size: 20
train_data_size: 400
val_data_size: 20
test_data_size: 100
optimizer: "AdamW"
optimizer_kwargs:
lr: 5e-4
weight_decay: 0
lr_scheduler:
"CosineAnnealingLR"
lr_scheduler_kwargs:
T_max: 50
eta_min: 1e-5
metrics:
test:
- reward_000
- reward_002
- reward_009 # since n_iterations["test"] = 10
train_with_local_search: True
ls_reward_aug_W: 0.99
policy_kwargs:
n_ants:
train: 30
val: 30
test: 100
n_iterations:
train: 1 # unused value
val: 5
test: 10
temperature: 1.0
top_p: 0.0
top_k: 0
start_node: null
multistart: False
k_sparse: 5 # this should be adjusted based on the `num_loc` value
aco_kwargs:
alpha: 1.0
beta: 1.0
decay: 0.95
use_local_search: True
use_nls: True
n_perturbations: 5
local_search_params:
max_iterations: 1000
perturbation_params:
max_iterations: 20
trainer:
max_epochs: 50
gradient_clip_val: 3.0
precision: "bf16-mixed"
devices:
- 0
seed: 1234
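Since the k_sparse comment above says it should track num_loc (the template uses 5 for 50 nodes), a derived config for larger instances can adjust both together; a sketch with hypothetical file name and illustrative values:
# @package _global_
# Hypothetical configs/experiment/routing/deepaco-tsp100.yaml
defaults:
  - routing/deepaco.yaml
  - _self_

env:
  generator_params:
    num_loc: 100

model:
  policy_kwargs:
    k_sparse: 10   # scaled with num_loc; the other policy_kwargs are merged from deepaco.yaml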
configs/experiment/routing/gfacs.yaml:21 extends DeepACO as GFACS, adding adaptive alpha/beta annealing parameters and sharing the local-search configuration.
# @package _global_
defaults:
- override /model: gfacs.yaml
- override /env: tsp.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 50
logger:
wandb:
project: "rl4co"
tags: ["gfacs", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: gfacs-${env.name}${env.generator_params.num_loc}
# GFACS extends DeepACO with adaptive alpha/beta annealing parameters and shares the local-search configuration.
model:
batch_size: 20
val_batch_size: 20
test_batch_size: 20
train_data_size: 400
val_data_size: 20
test_data_size: 100
optimizer: "AdamW"
optimizer_kwargs:
lr: 5e-4
weight_decay: 0
lr_scheduler:
"CosineAnnealingLR"
lr_scheduler_kwargs:
T_max: 50
eta_min: 1e-5
metrics:
test:
- reward_000
- reward_002
- reward_009 # since n_iterations["test"] = 10
train_with_local_search: True
alpha_min: 0.5
alpha_max: 1.0
alpha_flat_epochs: 5
beta_min: 100
beta_max: 500
beta_flat_epochs: 5
policy_kwargs:
n_ants:
train: 30
val: 30
test: 100
n_iterations:
train: 1 # unused value
val: 5
test: 10
temperature: 1.0
top_p: 0.0
top_k: 0
multistart: False
k_sparse: 5 # this should be adjusted based on the `num_loc` value
aco_kwargs:
alpha: 1.0 # This alpha is different from the alpha in the model
beta: 1.0 # This beta is different from the beta in the model
decay: 0.95
use_local_search: True
use_nls: True
n_perturbations: 5
local_search_params:
max_iterations: 1000
perturbation_params:
max_iterations: 20
trainer:
max_epochs: 50
gradient_clip_val: 3.0
precision: "bf16-mixed"
devices:
- 0
seed: 1234
configs/experiment/routing/glop.yaml:22 targets large-scale CVRPMVC (num_loc=1000): small batches, multiple samples (policy_kwargs.n_samples=20) and 50 training epochs.
# @package _global_
defaults:
- override /model: glop.yaml
- override /env: cvrpmvc.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
generator_params:
num_loc: 1000
logger:
wandb:
project: "rl4co"
tags: ["glop", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: glop-${env.name}${env.generator_params.num_loc}
# Large-scale CVRPMVC (num_loc=1000): small batches, multiple samples (policy_kwargs.n_samples=20) and 50 training epochs.
model:
batch_size: 16
val_batch_size: 128
test_batch_size: 128
train_data_size: 3200
val_data_size: 1024
test_data_size: 10_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 0
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [37, 45]
gamma: 0.1
policy_kwargs:
n_samples: 20
trainer:
max_epochs: 50
precision: 32
gradient_clip_val: 1
seed: 1234
configs/experiment/routing/tsp-stepwise-ppo.yaml:27 solves TSP with Stepwise PPO and an L2D-style decoder; embed_dim/num_heads are exposed at the top level for easy tuning.
# @package _global_
defaults:
- override /model: l2d.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
env:
_target_: rl4co.envs.TSPEnv4PPO
generator_params:
num_loc: 20
logger:
wandb:
project: "rl4co"
tags: ["am-stepwise-ppo", "${env.name}"]
group: ${env.name}${env.generator_params.num_loc}
name: ppo-${env.name}${env.generator_params.num_loc}
trainer:
max_epochs: 10
precision: 32-true
embed_dim: 256
num_heads: 8
# Stepwise PPO with an L2D-style decoder for TSP; embed_dim/num_heads are exposed at the top level for easy tuning.
model:
_target_: rl4co.models.StepwisePPO
policy:
_target_: rl4co.models.L2DPolicy4PPO
decoder:
_target_: rl4co.models.zoo.l2d.decoder.L2DDecoder
env_name: ${env.name}
embed_dim: ${embed_dim}
feature_extractor:
_target_: rl4co.models.zoo.am.encoder.AttentionModelEncoder
embed_dim: ${embed_dim}
num_heads: ${num_heads}
num_layers: 4
normalization: "batch"
env_name: "tsp"
actor:
_target_: rl4co.models.zoo.l2d.decoder.AttnActor
embed_dim: ${embed_dim}
num_heads: ${num_heads}
env_name: ${env.name}
embed_dim: ${embed_dim}
env_name: ${env.name}
het_emb: False
batch_size: 512
mini_batch_size: 512
train_data_size: 20000
val_data_size: 1_000
test_data_size: 1_000
reward_scale: scale
optimizer_kwargs:
lr: 1e-4
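Because embed_dim and num_heads are top-level keys that the policy, decoder and actor all pull in via ${embed_dim}/${num_heads} interpolation, resizing the whole stack is a one-line change in a derived config; a sketch with hypothetical file name and illustrative sizes:
# @package _global_
# Hypothetical configs/experiment/routing/tsp-stepwise-ppo-small.yaml
defaults:
  - routing/tsp-stepwise-ppo.yaml
  - _self_

# Overriding these two keys propagates through every ${embed_dim}/${num_heads} interpolation below them
embed_dim: 128
num_heads: 4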
Scheduling A
configs/experiment/scheduling/base.yaml:3 defines what all scheduling experiments share: the WandB naming scheme, 32-bit precision, and a scaling_factor taken from the environment's maximum processing time.
# @package _global_
# Shared by all scheduling experiments: the WandB naming scheme, 32-bit precision, and a scaling_factor taken from the environment's maximum processing time.
defaults:
- override /model: l2d.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
logger:
wandb:
project: "rl4co"
log_model: "all"
group: "${env.name}-${env.generator_params.num_jobs}-${env.generator_params.num_machines}"
tags: ???
name: ???
trainer:
max_epochs: 10
# NOTE for some reason l2d is extremely sensitive to precision
# ONLY USE 32-true for l2d!
precision: 32-true
seed: 12345678
scaling_factor: ${env.generator_params.max_processing_time}
model:
_target_: ???
batch_size: ???
train_data_size: 2_000
val_data_size: 1_000
test_data_size: 100
optimizer_kwargs:
lr: 2e-4
weight_decay: 1e-6
lr_scheduler: "ExponentialLR"
lr_scheduler_kwargs:
gamma: 0.95
reward_scale: scale
max_grad_norm: 1
configs/experiment/scheduling/am-pomo.yaml:11 uses an L2DAttnPolicy with POMO, keeping num_starts=10 multi-start evaluation and reusing the reward scaling from base.
configs/experiment/scheduling/am-ppo.yaml:14 builds the Stepwise PPO + L2D decoder stack, specifying a MatNet encoder with het_emb=True and making the environment emit stepwise rewards.
# @package _global_
defaults:
- scheduling/base
logger:
wandb:
tags: ["am-ppo", "${env.name}"]
name: "am-ppo-${env.name}-${env.generator_params.num_jobs}j-${env.generator_params.num_machines}m"
embed_dim: 256
num_heads: 8
# Stepwise PPO + L2D decoder stack: MatNet encoder, het_emb=True, and the environment emits stepwise rewards.
model:
_target_: rl4co.models.StepwisePPO
policy:
_target_: rl4co.models.L2DPolicy4PPO
decoder:
_target_: rl4co.models.zoo.l2d.decoder.L2DDecoder
env_name: ${env.name}
embed_dim: ${embed_dim}
feature_extractor:
_target_: rl4co.models.zoo.matnet.matnet_w_sa.Encoder
embed_dim: ${embed_dim}
num_heads: ${num_heads}
num_layers: 4
normalization: "batch"
init_embedding:
_target_: rl4co.models.nn.env_embeddings.init.FJSPMatNetInitEmbedding
embed_dim: ${embed_dim}
scaling_factor: ${scaling_factor}
actor:
_target_: rl4co.models.zoo.l2d.decoder.L2DAttnActor
embed_dim: ${embed_dim}
num_heads: ${num_heads}
env_name: ${env.name}
scaling_factor: ${scaling_factor}
stepwise: True
env_name: ${env.name}
embed_dim: ${embed_dim}
scaling_factor: ${scaling_factor}
het_emb: True
batch_size: 128
val_batch_size: 512
test_batch_size: 64
train_data_size: 2000
mini_batch_size: 512
env:
stepwise_reward: True
configs/experiment/scheduling/matnet-pomo.yaml:13 combines POMO with a MatNet encoder, with het_emb=True and FJSPMatNetInitEmbedding for initialization.
# @package _global_
defaults:
- scheduling/base
logger:
wandb:
tags: ["matnet-pomo", "${env.name}"]
name: "matnet-pomo-${env.name}-${env.generator_params.num_jobs}j-${env.generator_params.num_machines}m"
embed_dim: 256
# POMO combined with a MatNet encoder: het_emb=True and FJSPMatNetInitEmbedding for initialization.
model:
_target_: rl4co.models.POMO
policy:
_target_: rl4co.models.L2DPolicy
encoder:
_target_: rl4co.models.zoo.matnet.matnet_w_sa.Encoder
embed_dim: ${embed_dim}
num_heads: 8
num_layers: 4
normalization: "batch"
init_embedding:
_target_: rl4co.models.nn.env_embeddings.init.FJSPMatNetInitEmbedding
embed_dim: ${embed_dim}
scaling_factor: ${scaling_factor}
env_name: ${env.name}
embed_dim: ${embed_dim}
stepwise_encoding: False
het_emb: True
scaling_factor: ${scaling_factor}
batch_size: 64
num_starts: 10
num_augment: 0
baseline: "shared"
metrics:
val: ["reward", "max_reward"]
test: ${model.metrics.val}
configs/experiment/scheduling/matnet-ppo.yaml:13 is also based on Stepwise PPO, but uses the MatNet encoder and keeps the stepwise reward.
# @package _global_
defaults:
- scheduling/base
logger:
wandb:
tags: ["matnet-ppo", "${env.name}"]
name: "matnet-ppo-${env.name}-${env.generator_params.num_jobs}j-${env.generator_params.num_machines}m"
embed_dim: 256
# Also based on Stepwise PPO, but with the MatNet encoder and the stepwise reward kept.
model:
_target_: rl4co.models.StepwisePPO
policy:
_target_: rl4co.models.L2DPolicy4PPO
decoder:
_target_: rl4co.models.zoo.l2d.decoder.L2DDecoder
env_name: ${env.name}
embed_dim: ${embed_dim}
het_emb: True
feature_extractor:
_target_: rl4co.models.zoo.matnet.matnet_w_sa.Encoder
embed_dim: ${embed_dim}
num_heads: 8
num_layers: 4
normalization: "batch"
init_embedding:
_target_: rl4co.models.nn.env_embeddings.init.FJSPMatNetInitEmbedding
embed_dim: ${embed_dim}
scaling_factor: ${scaling_factor}
env_name: ${env.name}
embed_dim: ${embed_dim}
scaling_factor: ${scaling_factor}
het_emb: True
batch_size: 128
val_batch_size: 512
test_batch_size: 64
mini_batch_size: 512
env:
stepwise_reward: True
Scheduling B
configs/experiment/scheduling/ffsp-matnet.yaml:3 applies MatNet to the flexible flow shop problem (FFSP): it overrides env: ffsp, raises train_data_size to 10_000, and trains for max_epochs=50.
# @package _global_
# MatNet for the flexible flow shop problem (FFSP): override env: ffsp, raise train_data_size to 10_000, and train for 50 epochs.
defaults:
- override /model: matnet.yaml
- override /callbacks: default.yaml
- override /trainer: default.yaml
- override /logger: wandb.yaml
- override /env: ffsp.yaml
logger:
wandb:
project: "rl4co"
log_model: "all"
group: "${env.name}-${env.generator_params.num_job}-${env.generator_params.num_machine}"
tags: ["matnet", "${env.name}"]
name: "matnet-${env.name}-${env.generator_params.num_job}j-${env.generator_params.num_machine}m"
env:
generator_params:
num_stage: 3
num_machine: 4
num_job: 20
flatten_stages: False
trainer:
max_epochs: 50
# NOTE for some reason l2d is extremely sensitive to precision
# ONLY USE 32-true for l2d!
precision: 32-true
gradient_clip_val: 10 # orig paper does not use grad clipping
seed: 12345678
model:
batch_size: 50
train_data_size: 10_000
val_data_size: 1_000
test_data_size: 1_000
optimizer_kwargs:
lr: 1e-4
weight_decay: 1e-6
lr_scheduler:
"MultiStepLR"
lr_scheduler_kwargs:
milestones: [35, 45]
gamma: 0.1
configs/experiment/scheduling/gnn-ppo.yaml:12 configures L2DPPOModel with a lighter GNN encoder (num_encoder_layers=3) and keeps the quick 10-epoch iteration.
# @package _global_
defaults:
- scheduling/base
logger:
wandb:
tags: ["gnn-ppo", "${env.name}"]
name: "gnn-ppo-${env.name}-${env.generator_params.num_jobs}j-${env.generator_params.num_machines}m"
# params from Song et al.
# L2DPPOModel with a lighter GNN encoder (num_encoder_layers=3), keeping the quick 10-epoch iteration.
model:
_target_: rl4co.models.L2DPPOModel
policy_kwargs:
embed_dim: 256
num_encoder_layers: 3
scaling_factor: ${scaling_factor}
ppo_epochs: 2
het_emb: False
normalization: instance
test_decode_type: greedy
batch_size: 128
val_batch_size: 512
test_batch_size: 64
mini_batch_size: 512
trainer:
max_epochs: 10
env:
stepwise_reward: True
configs/experiment/scheduling/hgnn-pomo.yaml:11 is the H-GNN version of POMO, enabling het_emb=True and num_starts=10 for richer feature extraction.
# @package _global_
defaults:
- scheduling/base
logger:
wandb:
tags: ["hgnn-pomo", "${env.name}"]
name: "hgnn-pomo-${env.name}-${env.generator_params.num_jobs}j-${env.generator_params.num_machines}m"
# H-GNN version of POMO: het_emb=True and num_starts=10 for richer feature extraction.
model:
_target_: rl4co.models.POMO
policy:
_target_: rl4co.models.L2DPolicy
env_name: ${env.name}
embed_dim: 256
num_encoder_layers: 3
stepwise_encoding: False
scaling_factor: ${scaling_factor}
het_emb: True
normalization: instance
num_starts: 10
batch_size: 64
num_augment: 0
baseline: "shared"
metrics:
val: ["reward", "max_reward"]
test: ${model.metrics.val}
configs/experiment/scheduling/hgnn-ppo.yaml:12 combines the H-GNN with PPO, keeping the stepwise reward and reusing the scale normalization from base.
# @package _global_
defaults:
- scheduling/base
logger:
wandb:
tags: ["hgnn-ppo", "${env.name}"]
name: "hgnn-ppo-${env.name}-${env.generator_params.num_jobs}j-${env.generator_params.num_machines}m"
# params from Song et al.
# H-GNN + PPO: the stepwise reward is kept and the scale normalization from base is reused.
model:
_target_: rl4co.models.L2DPPOModel
policy_kwargs:
embed_dim: 256
num_encoder_layers: 3
scaling_factor: ${scaling_factor}
ppo_epochs: 2
het_emb: True
normalization: instance
batch_size: 128
val_batch_size: 512
test_batch_size: 64
mini_batch_size: 512
env:
stepwise_reward: True
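All of the scheduling variants above inherit scheduling/base, so changing the instance size follows the same derived-config pattern; num_jobs and num_machines are the generator keys that base.yaml already interpolates into the WandB group/name. A sketch with a hypothetical file name:
# @package _global_
# Hypothetical configs/experiment/scheduling/hgnn-ppo-20x10.yaml
defaults:
  - scheduling/hgnn-ppo
  - _self_

env:
  generator_params:
    num_jobs: 20
    num_machines: 10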