RLlib Models and Preprocessors

快乐地笑

Category: Learning. Tags: ray, rllib

Article link: https://blog.csdn.net/weixin_43255962/article/details/92835675

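This article describes RLlib's models and preprocessors in detail, including how the built-in models and preprocessors are selected and configured. It discusses how to write custom TensorFlow and PyTorch models, in particular custom recurrent models and batch normalization. It also covers custom preprocessors, supervised model losses, and variable-length / parametric action spaces. Examples show how to extend a policy to implement model-based rollouts.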

1. RLlib Models and Preprocessors Overview

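The following diagram gives a conceptual overview of the data flow between the different components in RLlib. We start with an Environment, which produces an observation given an action. The observation is preprocessed by a Preprocessor and a Filter (e.g. for running mean normalization) before being sent to a neural network Model. The model output is in turn interpreted by an ActionDistribution to determine the next action.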
[Figure: data flow between RLlib components (Environment, Preprocessor, Filter, Model, ActionDistribution)]
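The components highlighted in green can be replaced with custom user-defined implementations, as described in the following sections. The purple components are internal to RLlib, which means they can only be modified by changing the algorithm source code.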
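The data flow described above can be sketched in a few lines of plain Python. This is only a toy illustration and not RLlib code: the environment, the fixed linear "model", and every function and class name below are hypothetical stand-ins chosen to mirror the Environment, Preprocessor, Filter, Model and ActionDistribution stages.

```python
import math
import random


def env_step(state, action):
    """Hypothetical environment: a walk over positions 0..4 (0 = left, 1 = right)."""
    move = 1 if action == 1 else -1
    return max(0, min(4, state + move))


def preprocess(state, n=5):
    """Preprocessor stage: one-hot encode the discrete observation."""
    return [1.0 if i == state else 0.0 for i in range(n)]


class RunningMeanFilter:
    """Filter stage: subtract a running mean updated on every observation."""

    def __init__(self, size):
        self.count = 0
        self.mean = [0.0] * size

    def __call__(self, obs):
        self.count += 1
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, obs)]
        return [x - m for x, m in zip(obs, self.mean)]


def model(features):
    """Model stage: a fixed linear layer producing one logit per action."""
    right = sum(0.5 * i * f for i, f in enumerate(features))
    return [-right, right]


def sample_action(logits, rng):
    """ActionDistribution stage: sample from a softmax over the logits."""
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]


def rollout(num_steps, seed=0):
    """Run the loop: observation -> preprocessor -> filter -> model -> action."""
    rng = random.Random(seed)
    obs_filter = RunningMeanFilter(size=5)
    state, actions = 2, []
    for _ in range(num_steps):
        features = obs_filter(preprocess(state))
        action = sample_action(model(features), rng)
        actions.append(action)
        state = env_step(state, action)
    return actions


actions = rollout(10)
```

In RLlib itself, the stages shown in green in the figure are the ones a user would swap for custom implementations.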

Built-in Models and Preprocessors

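RLlib picks its default model with a simple heuristic: a vision network for image observations, and a fully connected network for everything else. These models can be configured through the model config key, which is documented in the model catalog. Note that you will probably have to set conv_filters if your environment observations have a custom size, for example "model": {"dim": 42, "conv_filters": [[16, [4, 4], 2], [32, [4, 4], 2], [512, [11, 11], 1]]} for 42x42 observations.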
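To see why the 42x42 example ends with an 11x11 filter, it helps to trace the spatial size through the stack. The sketch below is a hypothetical helper, not part of RLlib, and it assumes that every layer except the last uses "same" padding while the last uses "valid" padding; check this assumption against the vision network in your RLlib version.

```python
import math


def conv_output_sizes(input_size, conv_filters):
    """Return (spatial_size, out_channels) after each [out_channels, kernel, stride] layer.

    Assumption: all layers but the last use "same" padding, the last uses "valid".
    """
    size = input_size
    sizes = []
    last = len(conv_filters) - 1
    for i, (out_channels, kernel, stride) in enumerate(conv_filters):
        if i < last:
            # "same" padding: output size depends only on the stride
            size = math.ceil(size / stride)
        else:
            # "valid" padding: the kernel must fit inside the input
            size = (size - kernel[0]) // stride + 1
        sizes.append((size, out_channels))
    return sizes


filters_42 = [[16, [4, 4], 2], [32, [4, 4], 2], [512, [11, 11], 1]]
sizes_42 = conv_output_sizes(42, filters_42)  # [(21, 16), (11, 32), (1, 512)]
```

Under that assumption the two stride-2 layers shrink 42 to 21 and then to 11, so the final 11x11 filter collapses the feature map to 1x1 with 512 channels.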

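In addition, if you set "model": {"use_lstm": true}, the model output is further processed by an LSTM cell (an LSTM is a type of recurrent neural network). More generally, RLlib supports recurrent models for its policy gradient algorithms (A3C, PPO, PG, IMPALA), and RNN support is built into its policy evaluation utilities.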
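As a minimal sketch, the LSTM-related keys can be written as a Python dict (note that JSON true becomes Python True). The key names come from the built-in model parameters in this article; the variable names and the chunk_lengths helper are hypothetical. The helper only illustrates the assumption that an episode is cut into training sequences of at most max_seq_len steps, and is not RLlib's own implementation.

```python
# LSTM-related model options, using the keys listed in MODEL_DEFAULTS.
lstm_model_config = {
    "use_lstm": True,        # wrap the model output with an LSTM cell
    "max_seq_len": 20,       # maximum sequence length used for training
    "lstm_cell_size": 256,   # size of the LSTM cell
}

# The model options go under the "model" key of the algorithm config.
trainer_config = {"model": lstm_model_config}


def chunk_lengths(episode_len, max_seq_len):
    """Hypothetical helper: lengths of the sequences an episode is cut into."""
    full, rest = divmod(episode_len, max_seq_len)
    return [max_seq_len] * full + ([rest] if rest else [])


chunks = chunk_lengths(45, lstm_model_config["max_seq_len"])  # [20, 20, 5]
```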

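For preprocessors, RLlib tries to pick one of its built-in preprocessors based on the environment's observation space. Discrete observations are one-hot encoded, Atari observations are downscaled, and Tuple and Dict observations are flattened (they are unflattened again and accessible through the input_dict parameter in custom models). Note that for Atari, RLlib defaults to the DeepMind preprocessors, which are also used by the OpenAI baselines library.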
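The one-hot and flattening behavior can be illustrated with a small pure-Python sketch. These functions are hypothetical and only mimic the idea; they are not RLlib's preprocessor classes. The example treats a Tuple observation made of a Discrete(3) component and a two-element continuous component.

```python
def one_hot(index, n):
    """One-hot encode a Discrete(n) observation."""
    vec = [0.0] * n
    vec[index] = 1.0
    return vec


def flatten_tuple_obs(discrete_value, n_discrete, box_values):
    """Flatten a (Discrete, Box) tuple observation into a single flat vector."""
    return one_hot(discrete_value, n_discrete) + list(box_values)


flat = flatten_tuple_obs(1, 3, [0.5, -0.2])  # [0.0, 1.0, 0.0, 0.5, -0.2]
```

The flat vector is what a fully connected model would consume; a custom model that needs the original structure would read the unflattened observation from input_dict instead.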

Built-in Model Parameters

The following is a list of the built-in model hyperparameters:

MODEL_DEFAULTS = {
    # === Built-in options ===
    # Filter config. List of [out_channels, kernel, stride] for each filter
    "conv_filters": None,
    # Nonlinearity for built-in convnet
    "conv_activation": "relu",
    # Nonlinearity for fully connected net (tanh, relu)
    "fcnet_activation": "tanh",
    # Sizes of the hidden layers of the fully connected net (one entry per layer)
    "fcnet_hiddens": [256, 256],
    # For control envs, documented in ray.rllib.models.Model
    "free_log_std": False,
    # (deprecated) Whether to use sigmoid to squash actions to space range
    "squash_to_range": False,

    # == LSTM ==
    # Whether to wrap the model with an LSTM
    "use_lstm": False,
    # Max sequence length for training the LSTM, defaults to 20
    "max_seq_len": 20,
    # Size of the LSTM cell
    "lstm_cell_size": 256,
    # Whether to feed a_{t-1}, r_{t-1} to the LSTM
    "lstm_use_prev_action_reward": False,

    # == Atari ==
    # Whether to enable framestack for Atari envs
    "framestack": True,
    # Final resized frame dimension
    "dim": 84,
    # (deprecated) Converts ATARI frame to 1 Channel Grayscale image
    "grayscale": False,
    # (deprecated) Changes frame to range from [-1, 1] if true
    "zero_mean": True,

    # === Options for custom models ===
    # Name of a custom preprocessor to use
    "custom_preprocessor": None,
    # Name of a custom model to use
    "custom_model": None,
    # Extra options to pass to the custom classes
    "custom_options": {},
}
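
A user-supplied "model" dict only needs to list the keys that differ from these defaults. The sketch below shows one way such a merge could work; it is an assumption for illustration, not RLlib's own merging or validation code, and MODEL_DEFAULTS_SUBSET and merge_model_config are hypothetical names.

```python
import copy

# Illustrative subset of the defaults listed above.
MODEL_DEFAULTS_SUBSET = {
    "conv_filters": None,
    "fcnet_activation": "tanh",
    "fcnet_hiddens": [256, 256],
    "use_lstm": False,
    "max_seq_len": 20,
    "lstm_cell_size": 256,
    "custom_options": {},
}


def merge_model_config(overrides, defaults=MODEL_DEFAULTS_SUBSET):
    """Return a copy of the defaults updated with overrides; reject unknown keys."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError("Unknown model config keys: {}".format(sorted(unknown)))
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    merged.update(overrides)
    return merged


model_config = merge_model_config(
    {"fcnet_hiddens": [64, 64], "fcnet_activation": "relu"}
)
```

Rejecting unknown keys early catches typos such as "fcnet_hidden", which would otherwise be silently ignored.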