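Stable Baselines provides default policy networks (see Policies) for images (CNNPolicies) and for other types of input features (MlpPolicies).
One way to customise the policy network architecture is to pass arguments to the model through policy_kwargs
when creating it: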
import gym
import tensorflow as tf
from stable_baselines import PPO2
# Custom MLP policy of two layers of size 32 each with tanh activation function
policy_kwargs = dict(act_fun=tf.nn.tanh, net_arch=[32, 32])
# Create the agent
model = PPO2("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
# Retrieve the environment
env = model.get_env()
# Train the agent
model.learn(total_timesteps=100000)
# Save the agent
model.save("ppo2-cartpole")
del model
# the policy_kwargs are automatically loaded
model = PPO2.load("ppo2-cartpole")
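Because policy_kwargs is an ordinary dictionary, candidate architectures can be generated programmatically, for example when searching over network sizes. The following is a minimal sketch under stated assumptions: the helper make_policy_kwargs_grid and its parameters are hypothetical names introduced here for illustration and are not part of Stable Baselines. It only builds the dictionaries; the model construction is left as a comment.

```python
from itertools import product


def make_policy_kwargs_grid(layer_sizes, depths):
    """Hypothetical helper: one policy_kwargs dict per (size, depth) pair."""
    grid = []
    for size, depth in product(layer_sizes, depths):
        # net_arch is a list of hidden-layer widths, as in the example above
        grid.append(dict(net_arch=[size] * depth))
    return grid


candidates = make_policy_kwargs_grid(layer_sizes=[32, 64], depths=[1, 2])
for policy_kwargs in candidates:
    print(policy_kwargs)
    # Each candidate could then be passed to the model constructor, e.g.
    # model = PPO2("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs)
```

Keeping the search space as plain dictionaries means each trial's settings are stored with the saved model, since policy_kwargs are reloaded automatically.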
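You can also easily define a custom architecture for the policy (or value) network.
Note: defining a custom policy class is equivalent to passing policy_kwargs. However, it lets you name the policy, which usually makes the code clearer. Prefer policy_kwargs when doing a hyperparameter search.
The following example defines such a class: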
import gym
from stable_baselines.common.policies import FeedForwardPolicy, register_policy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import A2C
# Custom MLP policy of three layers of size 128 each
class CustomPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[128, 128, 128],
                                                          vf=[128, 128, 128])],
                                           feature_extraction="mlp")
# Create and wrap the environment
env = gym.make('LunarLander-v2'