NLP Documentation Treasure Hunt (3): The TrainingArguments Class for Quickly Setting Up Parameters

Latest recommended article published on 2024-08-01 15:18:17

天才小呵呵

Categories: Pytorch, 文档挖宝 (Documentation Treasure Hunt). Tags: python, deep learning

Article link: https://blog.csdn.net/qq_33293040/article/details/117376382

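TrainingArguments is the key class in the transformers library for configuring the training loop. It is defined as a dataclass, and HfArgumentParser turns it into command-line arguments. The class covers configuration for training, evaluation, and prediction, such as the learning rate, batch size, and number of training epochs. It also covers mixed-precision training, logging frequency, and the checkpoint-saving strategy. Individual parameters can be enabled or left out as needed, so the class works as a flexible framework for training parameters.

Summary generated automatically by CSDN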

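The TrainingArguments class is effectively the source of every tunable setting in a training task. It is wrapped with the dataclass decorator, and HfArgumentParser then parses the command line into an instance of it.
The class exposes a large number of parameters, and many of them are useful, so it is worth going through them together.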
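The dataclass-to-command-line idea can be illustrated with the standard library alone. The sketch below is a simplified stand-in, not the code of HfArgumentParser: `ToyTrainingArguments`, `build_parser`, and `parse_into_dataclass` are hypothetical names invented for this example, and only three fields are modelled.

```python
import argparse
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class ToyTrainingArguments:
    """Hypothetical stand-in for TrainingArguments with only three fields."""

    output_dir: str = field(metadata={"help": "Where checkpoints are written."})
    learning_rate: float = field(default=5e-5, metadata={"help": "Initial learning rate."})
    num_train_epochs: float = field(default=3.0, metadata={"help": "Number of training epochs."})


def build_parser(dataclass_type):
    """Create one --option per dataclass field (simplified illustration)."""
    parser = argparse.ArgumentParser()
    for f in fields(dataclass_type):
        # A field without any default becomes a required command-line option.
        required = f.default is MISSING and f.default_factory is MISSING
        kwargs = {"type": f.type, "help": f.metadata.get("help"), "required": required}
        if not required:
            kwargs["default"] = f.default
        parser.add_argument(f"--{f.name}", **kwargs)
    return parser


def parse_into_dataclass(dataclass_type, argv):
    """Parse argv and return a populated dataclass instance."""
    namespace = build_parser(dataclass_type).parse_args(argv)
    return dataclass_type(**vars(namespace))


args = parse_into_dataclass(
    ToyTrainingArguments, ["--output_dir", "out", "--learning_rate", "3e-5"]
)
print(args)
```

Fields that are not mentioned on the command line keep their dataclass defaults, which is why only the options you actually need have to appear in the command.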

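At first glance, the main purpose of the class is to provide a ready-made set of parameter definitions that can be pulled in quickly. Whether a given parameter is actually used is still up to us when we write the training program: if I need one, I add it to the command; if not, I simply leave it out. Overall, it feels as if the authors of transformers have laid out a parameter framework for us.
Let's look at the details: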

Source code docstring:

How to import:

from transformers import TrainingArguments

Docstring content:


@dataclass
class TrainingArguments:
    """
    TrainingArguments is the subset of the arguments we use in our example scripts
    **which relate to the training loop itself**.
    Using :class:`~transformers.HfArgumentParser` we can turn this class
    into argparse arguments to be able to specify them on the command line.
    Parameters:
        output_dir (:obj:`str`):
            The output directory where the model predictions and checkpoints will be written.
        overwrite_output_dir (:obj:`bool`, `optional`, defaults to :obj:`False`):
            If :obj:`True`, overwrite the content of the output directory. Use this to continue training if
            :obj:`output_dir` points to a checkpoint directory.
        do_train (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to run training or not.
        do_eval (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to run evaluation on the dev set or not.
        do_predict (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to run predictions on the test set or not.
        evaluate_during_training (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to run evaluation during training at each logging step or not.
        per_device_train_batch_size (:obj:`int`, `optional`, defaults to 8):
            The batch size per GPU/TPU core/CPU for training.
        per_device_eval_batch_size (:obj:`int`, `optional`, defaults to 8):
            The batch size per GPU/TPU core/CPU for evaluation.
        gradient_accumulation_steps (:obj:`int`, `optional`, defaults to 1):
            Number of updates steps to accumulate the gradients for, before performing a backward/update pass.
        learning_rate (:obj:`float`, `optional`, defaults to 5e-5):
            The initial learning rate for Adam.
        weight_decay (:obj:`float`, `optional`, defaults to 0):
            The weight decay to apply (if not zero).
        adam_epsilon (:obj:`float`, `optional`, defaults to 1e-8):
            Epsilon for the Adam optimizer.
        max_grad_norm (:obj:`float`, `optional`, defaults to 1.0):
            Maximum gradient norm (for gradient clipping).
        num_train_epochs (:obj:`float`, `optional`, defaults to 3.0):
            Total number of training epochs to perform.
        max_steps (:obj:`int`, `optional`, defaults to -1):
            If set to a positive number, the total number of training steps to perform. Overrides
            :obj:`num_train_epochs`.
        warmup_steps (:obj:`int`, `optional`, defaults to 0):
            Number of steps used for a linear warmup from 0 to :obj:`learning_rate`.
        logging_dir (:obj:`str`, `optional`):
            Tensorboard log directory. Will default to `runs/**CURRENT_DATETIME_HOSTNAME**`.
        logging_first_step (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to log and evaluate the first :obj:`global_step` or not.
        logging_steps (:obj:`int`, `optional`, defaults to 500):
            Number of update steps between two logs.
        save_steps (:obj:`int`, `optional`, defaults to 500):
            Number of updates steps before two checkpoint saves.
        save_total_limit (:obj:`int`, `optional`):
            If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in
            :obj:`output_dir`.
        no_cuda (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to not use CUDA even when it is available or not.
        seed (:obj:`int`, `optional`, defaults to 42):
            Random seed for initialization.
        fp16 (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to use 16-bit (mixed) precision training (through NVIDIA apex) instead of 32-bit training.
        fp16_opt_level (:obj:`str`, `optional`, defaults to 'O1'):
            For :obj:`fp16` training, apex AMP optimization level selected in ['O0', 'O1', 'O2', and 'O3']. See details
            on the `apex documentation <https://nvidia.github.io/apex/amp.html>`__.
        local_rank (:obj:`int`, `optional`, defaults to -1):
            During distributed training, the rank of the process.
        tpu_num_cores (:obj:`int`, `optional`):
            When training on TPU, the number of TPU cores (automatically passed by launcher script).
        debug (:obj:`bool`, `optional`, defaults to :obj:`False`):
            When training on TPU, whether to print debug metrics or not.
        dataloader_drop_last (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to drop the last incomplete batch (if the length of the dataset is not divisible by the batch size)
            or not.
        eval_steps (:obj:`int`, `optional`, defaults to 1000):
            Number of update steps between two evaluations.
        past_index (:obj:`int`, `optional`, defaults to -1):
            Some models like :doc:`TransformerXL <../model_doc/transformerxl>` or :doc:`XLNet <../model_doc/xlnet>` can
            make use of the past hidden states for their predictions. If this argument is set to a positive int, the
            ``Trainer`` will use the corresponding output (usually index 2) as the past state and feed it to the model
            at the next training step under the keyword argument ``mems``.
    """
    ...
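Several of the parameters above combine arithmetically. The sketch below works through how per_device_train_batch_size, gradient_accumulation_steps, num_train_epochs, and max_steps might determine the effective batch size and the total number of update steps. The function names, `num_devices`, and `num_examples` are hypothetical, and the rounding choices are assumptions made for this illustration; the real Trainer may round differently.

```python
import math


def effective_batch_size(per_device_train_batch_size, num_devices, gradient_accumulation_steps):
    """Number of examples that contribute to a single optimizer update."""
    return per_device_train_batch_size * num_devices * gradient_accumulation_steps


def total_update_steps(num_examples, per_device_train_batch_size, num_devices,
                       gradient_accumulation_steps, num_train_epochs, max_steps=-1):
    """Total optimizer updates; a positive max_steps overrides num_train_epochs."""
    if max_steps > 0:
        return max_steps
    # Assumption: the last incomplete batch is kept (dataloader_drop_last=False).
    batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_devices))
    # Assumption: leftover batches that do not fill an accumulation window are ignored.
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return math.ceil(updates_per_epoch * num_train_epochs)


batch = effective_batch_size(8, num_devices=2, gradient_accumulation_steps=4)
steps = total_update_steps(10_000, 8, 2, 4, num_train_epochs=3.0)
capped = total_update_steps(10_000, 8, 2, 4, num_train_epochs=3.0, max_steps=1000)
print(batch, steps, capped)  # prints: 64 468 1000
```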

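Docstring notes (with my comments):

    output_dir (:obj:`str`):
        Output directory for model predictions and checkpoints. This field is required.
    overwrite_output_dir (:obj:`bool`, `optional`, defaults to :obj:`False`):
        If True, overwrite the contents of the output directory. Use this to continue training when `output_dir` points to a checkpoint directory.
    do_train (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to run training.
    do_eval (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to run evaluation on the dev (validation) set.
    do_predict (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to run prediction on the test set.
    evaluate_during_training (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to run evaluation during training at each logging step.
    per_device_train_batch_size (:obj:`int`, `optional`, defaults to 8):
        Training batch size per GPU / TPU core / CPU.
    per_device_eval_batch_size (:obj:`int`, `optional`, defaults to 8):
        Evaluation batch size per GPU / TPU core / CPU.
    gradient_accumulation_steps (:obj:`int`, `optional`, defaults to 1):
        Number of update steps over which gradients are accumulated before a backward/update pass is performed.
    learning_rate (:obj:`float`, `optional`, defaults to 5e-5):
        Initial learning rate for Adam. (My note: I was not sure why Adam is singled out here; most likely it is because the Trainer's default optimizer is AdamW.)
    weight_decay (:obj:`float`, `optional`, defaults to 0):
        The weight decay to apply (if not zero).
    adam_epsilon (:obj:`float`, `optional`, defaults to 1e-8):
        Epsilon for the Adam optimizer.
    max_grad_norm (:obj:`float`, `optional`, defaults to 1.0):
        Maximum gradient norm (used for gradient clipping).
    num_train_epochs (:obj:`float`, `optional`, defaults to 3.0):
        Total number of training epochs to run.
    max_steps (:obj:`int`, `optional`, defaults to -1):
        If set to a positive number, the total number of training steps to run. Overrides
        :obj:`num_train_epochs`.
    warmup_steps (:obj:`int`, `optional`, defaults to 0):
        Number of steps used for a linear warmup from 0 to :obj:`learning_rate`.
    logging_dir (:obj:`str`, `optional`):
        Tensorboard log directory. Defaults to `runs/**CURRENT_DATETIME_HOSTNAME**`, that is, a name built from the current date and time plus the hostname.
    logging_first_step (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to log and evaluate the first :obj:`global_step`.
    logging_steps (:obj:`int`, `optional`, defaults to 500):
        Number of update steps between two log entries.
    save_steps (:obj:`int`, `optional`, defaults to 500):
        Number of update steps between two checkpoint saves.
    save_total_limit (:obj:`int`, `optional`):
        If a value is given, limits the total number of checkpoints by deleting the older checkpoints in
        :obj:`output_dir`.
    no_cuda (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to avoid using CUDA even when it is available. (If you have a GPU, just leave this option alone.)
    seed (:obj:`int`, `optional`, defaults to 42):
        Random seed used for initialization.
    fp16 (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to use 16-bit (mixed) precision training (through NVIDIA apex) instead of 32-bit training.
    fp16_opt_level (:obj:`str`, `optional`, defaults to 'O1'):
        For fp16 training, the apex AMP optimization level, chosen from ['O0', 'O1', 'O2', 'O3']. See the
        `apex documentation <https://nvidia.github.io/apex/amp.html>`__ for details.
    local_rank (:obj:`int`, `optional`, defaults to -1):
        The rank of the process during distributed training.
    tpu_num_cores (:obj:`int`, `optional`):
        When training on TPU, the number of TPU cores (passed automatically by the launcher script).
    debug (:obj:`bool`, `optional`, defaults to :obj:`False`):
        When training on TPU, whether to print debug metrics.
    dataloader_drop_last (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether to drop the last incomplete batch (when the dataset length is not divisible by the batch size).
    eval_steps (:obj:`int`, `optional`, defaults to 1000):
        Number of update steps between two evaluations.
    past_index (:obj:`int`, `optional`, defaults to -1):
        Some models, such as TransformerXL <../model_doc/transformerxl> or XLNet <../model_doc/xlnet>, can
        use past hidden states for their predictions. If this argument is set to a positive integer,
        the ``Trainer`` uses the corresponding output (usually index 2) as the past state and feeds it to the model at the next training step under the keyword argument ``mems``.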
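The step-driven parameters in the list above (warmup_steps, logging_first_step, logging_steps, save_steps, eval_steps) can be sketched in a few lines. This is an illustration under stated assumptions, not the Trainer's implementation: both function names are hypothetical, the learning rate is simply held constant after warmup because the docstring only describes the warmup phase, and evaluation is assumed to follow eval_steps when evaluate_during_training is enabled.

```python
def warmup_learning_rate(step, learning_rate=5e-5, warmup_steps=0):
    """Linear warmup from 0 to learning_rate over warmup_steps updates."""
    if warmup_steps > 0 and step < warmup_steps:
        return learning_rate * step / warmup_steps
    # Assumption: hold the rate constant after warmup; the docstring does not
    # say what the real scheduler does from this point on.
    return learning_rate


def events_at_step(global_step, logging_steps=500, save_steps=500, eval_steps=1000,
                   logging_first_step=False, evaluate_during_training=False):
    """Return which periodic actions would fire at a given update step."""
    events = []
    if global_step % logging_steps == 0 or (logging_first_step and global_step == 1):
        events.append("log")
    if evaluate_during_training and global_step % eval_steps == 0:
        events.append("evaluate")
    if global_step % save_steps == 0:
        events.append("save")
    return events


for step in (1, 500, 1000):
    print(step, events_at_step(step, logging_first_step=True, evaluate_during_training=True))
print(warmup_learning_rate(50, learning_rate=5e-5, warmup_steps=100))
```

With the defaults, a checkpoint is written every 500 updates, so save_total_limit is worth setting on long runs to keep the output directory from growing without bound.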

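A small trick

To set a boolean parameter to True, you do not need to pass a value on the command line; naming the flag is enough.
For example, adding --do_train --do_eval to the command sets both do_train and do_eval to True.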
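The flag behaviour described above can be reproduced with plain argparse. The sketch uses `action="store_true"` as a stand-in; exactly how HfArgumentParser wires up boolean fields may differ between transformers versions, so treat this as an illustration of the idea rather than the library's code.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--output_dir", type=str, required=True)
# Boolean switches default to False and become True when the flag is present.
parser.add_argument("--do_train", action="store_true")
parser.add_argument("--do_eval", action="store_true")
parser.add_argument("--do_predict", action="store_true")

# Equivalent to: python train.py --output_dir out --do_train --do_eval
# (train.py is a hypothetical script name)
args = parser.parse_args(["--output_dir", "out", "--do_train", "--do_eval"])
print(args.do_train, args.do_eval, args.do_predict)  # prints: True True False
```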

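Summary

Having read through the full parameter list, my impression is that it mostly covers standard settings; there is still a fair amount of freedom in how additional training parameters are defined. The class simply provides a set of parameters that are already wired up.
Going forward, we can build a parameter dataclass tailored to each task, or one written in our own personal style.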
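A task-specific dataclass of this kind could look like the sketch below. It assumes the transformers package is installed; `TaskArguments` and its fields are hypothetical names made up for this example and are not part of transformers. `look_for_args_file=False` is passed so that the parser only reads the explicit argument list.

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class TaskArguments:
    """Hypothetical task-specific arguments; none of these names come from transformers."""

    train_file: str = field(metadata={"help": "Path to the training data."})
    max_seq_length: int = field(default=128, metadata={"help": "Maximum input length in tokens."})
    lowercase: bool = field(default=False, metadata={"help": "Lowercase the text before tokenizing."})


parser = HfArgumentParser(TaskArguments)
# The list stands in for sys.argv[1:]; one dataclass instance is returned per dataclass type.
(task_args,) = parser.parse_args_into_dataclasses(
    args=["--train_file", "train.jsonl", "--max_seq_length", "256", "--lowercase"],
    look_for_args_file=False,
)
print(task_args)
```

In a real script, the same parser can be given several dataclass types at once, for example the custom class together with TrainingArguments, so that task settings and training settings share one command line.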
