[Study Notes] Important Parameters of Python transformers TrainingArguments

Latest recommended article published on 2025-03-17 16:54:07

Author: 爱学习的小道长

Category: AI. Tags: learning, python, programming languages, AI programming

Article link: https://blog.csdn.net/weixin_40378209/article/details/136710726

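Code:

from transformers import TrainingArguments

# Hypothetical output directory; the original snippet leaves model_dir undefined.
model_dir = "./output"

training_args = TrainingArguments(output_dir=model_dir,            # where checkpoints are written
                                  per_device_train_batch_size=16,  # samples per device per step
                                  num_train_epochs=5,              # passes over the training set
                                  logging_steps=100)               # log every 100 update steps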

The TrainingArguments class is defined in the following file:

/xxx/anaconda/envs/your_env/lib/python3.11/site-packages/transformers/training_args.py

Hugging Face TrainingArguments documentation
GitHub source code

output_dir (str)

The output directory where the model predictions and checkpoints will be written, i.e. where the model is saved during training.

num_train_epochs (float, optional, defaults to 3.0)

Total number of training epochs to perform, i.e. how many times the model passes over the training dataset. If the value is not an integer, the fractional part is performed as that fraction of the last epoch before training stops.
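As a rough illustration of how a fractional num_train_epochs turns into a step count, the sketch below rounds up the product of the epoch count and the number of optimizer steps per epoch. The helper name `total_training_steps` and the sample numbers are assumptions made for illustration; the rounding approximates what Trainer does and is not a quote of its code.

```python
import math


def total_training_steps(num_train_epochs: float, steps_per_epoch: int) -> int:
    """Approximate number of optimizer steps for a possibly fractional epoch count."""
    return math.ceil(num_train_epochs * steps_per_epoch)


# Three full epochs of 200 steps each.
full_epochs = total_training_steps(3.0, 200)

# Two full epochs plus half of a third one, with 63 steps per epoch.
fractional_epochs = total_training_steps(2.5, 63)

print(full_epochs, fractional_epochs)
```

With 63 steps per epoch, 2.5 epochs stop roughly halfway through the third pass over the data.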

per_device_train_batch_size (int, optional, defaults to 8)

The training batch size per device (GPU/XPU/TPU/MPS/NPU core/CPU), i.e. the number of samples each device processes in one training step.

per_device_eval_batch_size (int, optional, defaults to 8)

The evaluation batch size per device (GPU/XPU/TPU/MPS/NPU core/CPU), i.e. the number of samples each device processes in one evaluation step.
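Because both batch sizes are per device, the number of batches per pass depends on how many devices share the work. The following is a minimal sketch under the assumption of plain data parallelism with the last partial batch kept; `batches_per_epoch` and the dataset sizes are hypothetical.

```python
import math


def batches_per_epoch(num_samples: int, per_device_batch_size: int, num_devices: int = 1) -> int:
    """Number of steps needed to see every sample once, keeping the last partial batch."""
    global_batch_size = per_device_batch_size * num_devices
    return math.ceil(num_samples / global_batch_size)


# Training: 10,000 samples, per_device_train_batch_size=16.
train_single_device = batches_per_epoch(10_000, 16, num_devices=1)
train_four_devices = batches_per_epoch(10_000, 16, num_devices=4)

# Evaluation: 1,000 samples, per_device_eval_batch_size=8 on two devices.
eval_two_devices = batches_per_epoch(1_000, 8, num_devices=2)

print(train_single_device, train_four_devices, eval_two_devices)
```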

learning_rate (float, optional, defaults to 5e-5)

The initial learning rate for the AdamW optimizer.

weight_decay (float, optional, defaults to 0)

The weight decay coefficient, used to regularize the model weights. If not zero, it is applied to all layers except bias and LayerNorm weights in the AdamW optimizer.
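The exclusion of bias and LayerNorm weights can be pictured as two optimizer parameter groups. The sketch below uses a name-based heuristic on hypothetical parameter names and stores names instead of tensors, so it only illustrates the grouping idea; it is not the logic Trainer itself runs.

```python
def split_weight_decay_groups(param_names, weight_decay):
    """Split parameter names into a decayed group and a non-decayed group."""
    no_decay_markers = ("bias", "LayerNorm.weight")

    def is_excluded(name):
        return any(marker in name for marker in no_decay_markers)

    decayed = [name for name in param_names if not is_excluded(name)]
    not_decayed = [name for name in param_names if is_excluded(name)]
    return [
        {"params": decayed, "weight_decay": weight_decay},
        {"params": not_decayed, "weight_decay": 0.0},
    ]


# Hypothetical parameter names in the style of a BERT-like model.
names = [
    "encoder.layer.0.attention.query.weight",
    "encoder.layer.0.attention.query.bias",
    "encoder.layer.0.LayerNorm.weight",
    "classifier.weight",
]
groups = split_weight_decay_groups(names, weight_decay=0.01)
for group in groups:
    print(group["weight_decay"], group["params"])
```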

warmup_steps (int, optional, defaults to 0)

Number of steps used for a linear warmup of the learning rate from 0 to learning_rate at the start of training. Overrides any effect of warmup_ratio.
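To make the warmup concrete, here is a minimal sketch of the learning rate at a given step, assuming the default linear scheduler (linear warmup followed by linear decay to 0). The function name `linear_schedule_lr` and the step counts are illustrative assumptions.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: grow linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: shrink linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


base_lr = 5e-5
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, base_lr, warmup_steps=100, total_steps=1000))
```

With warmup_steps=0 the first branch is never taken and training starts directly at learning_rate.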

logging_steps (int or float, optional, defaults to 500)

The interval, in update steps, at which logs are recorded during training when logging_strategy="steps". Should be an integer or a float in the range [0,1). If smaller than 1, it is interpreted as a ratio of the total training steps.
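The ratio form can be resolved to an absolute interval as sketched below. The helper name `resolve_step_interval` is hypothetical, and rounding up is an assumption about how the ratio is converted.

```python
import math


def resolve_step_interval(value, max_steps: int) -> int:
    """Turn an integer interval or a [0, 1) ratio into an absolute number of steps."""
    if value < 1:
        # Ratio of the total number of training steps, rounded up.
        return math.ceil(max_steps * value)
    return int(value)


print(resolve_step_interval(100, max_steps=1000))   # absolute interval
print(resolve_step_interval(0.25, max_steps=1000))  # 25% of all steps
print(resolve_step_interval(0.5, max_steps=157))    # ratio that needs rounding
```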

evaluation_strategy (str or [~trainer_utils.IntervalStrategy], optional, defaults to "no")

The evaluation strategy to adopt during training. Possible values are:

- `"no"`: No evaluation is done during training.
- `"steps"`: Evaluation is done (and logged) every `eval_steps` (not every `logging_steps`).
- `"epoch"`: Evaluation is done at the end of each epoch.
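The three values can be sketched as a function that lists the global steps at which an evaluation would run. This is an illustrative model with hypothetical names and numbers, not Trainer's callback code. Note also that recent transformers releases expose this argument under the name `eval_strategy`, so check the installed version before copying the parameter name.

```python
def evaluation_steps(strategy: str, max_steps: int, steps_per_epoch: int, eval_steps=None):
    """List the global steps after which an evaluation would be triggered."""
    if strategy == "no":
        return []
    if strategy == "steps":
        if not eval_steps:
            raise ValueError("eval_steps must be a positive integer when strategy='steps'")
        return [step for step in range(1, max_steps + 1) if step % eval_steps == 0]
    if strategy == "epoch":
        return [step for step in range(1, max_steps + 1) if step % steps_per_epoch == 0]
    raise ValueError(f"unknown strategy: {strategy!r}")


print(evaluation_steps("no", max_steps=10, steps_per_epoch=5))
print(evaluation_steps("steps", max_steps=10, steps_per_epoch=5, eval_steps=4))
print(evaluation_steps("epoch", max_steps=10, steps_per_epoch=5))
```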

save_strategy (str or [~trainer_utils.IntervalStrategy], optional, defaults to "steps")

The checkpoint save strategy to adopt during training. Possible values are:

- `"no"`: No save is done during training.
- `"epoch"`: Save is done at the end of each epoch.
- `"steps"`: Save is done every `save_steps` (not every `logging_steps`).

save_steps (int or float, optional, defaults to 500)

The interval at which the model is saved: the number of update steps between two checkpoint saves if save_strategy="steps". Should be an integer or a float in the range [0,1). If smaller than 1, it is interpreted as a ratio of the total training steps.
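As a small illustration, the sketch below lists the checkpoint directories an integer save_steps would produce, assuming the `checkpoint-<global_step>` naming that Trainer uses inside output_dir. The helper `checkpoint_dirs` is hypothetical, and depending on the library version a final checkpoint may also be written when training ends.

```python
def checkpoint_dirs(output_dir: str, max_steps: int, save_steps: int):
    """Directories written when a checkpoint is saved every `save_steps` update steps."""
    return [
        f"{output_dir}/checkpoint-{step}"
        for step in range(save_steps, max_steps + 1, save_steps)
    ]


for path in checkpoint_dirs("output", max_steps=1200, save_steps=500):
    print(path)
```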

gradient_accumulation_steps (int, optional, defaults to 1)

Number of update steps to accumulate the gradients for, before performing a backward/update pass. Accumulating the gradients of several small batches simulates training with one larger batch, which helps reach a larger effective batch size when memory is limited.

Note: When using gradient accumulation, one step is counted as one step with backward pass. Therefore, logging, evaluation, and saving will be conducted every `gradient_accumulation_steps * xxx_step` training examples.
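The claim that accumulation simulates a larger batch can be checked numerically. The sketch below uses a toy one-parameter model in pure Python (the data, the model, and all names are made up for illustration) and compares the gradient of one batch of 8 samples with the gradient accumulated over 4 micro-batches of 2 samples, each scaled by 1/4.

```python
def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

# One large batch of 8 samples.
full_batch_grad = mse_gradient(w, xs, ys)

# Four micro-batches of 2 samples; each contribution is divided by the number
# of accumulation steps so that the sum matches the large-batch mean.
micro_batch_size = 2
accumulation_steps = len(xs) // micro_batch_size
accumulated_grad = 0.0
for i in range(accumulation_steps):
    start = i * micro_batch_size
    micro_xs = xs[start:start + micro_batch_size]
    micro_ys = ys[start:start + micro_batch_size]
    accumulated_grad += mse_gradient(w, micro_xs, micro_ys) / accumulation_steps

print(full_batch_grad, accumulated_grad)
```

The two values agree up to floating-point rounding, so the optimizer update after 4 accumulated micro-batches corresponds to one update with an effective batch size of 2 * 4 = 8.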

save_total_limit (int, optional)

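Limit on the total number of saved model checkpoints. If a value is passed, it limits the total number of checkpoints and deletes the older checkpoints in output_dir. When load_best_model_at_end is enabled, the "best" checkpoint according to metric_for_best_model is always retained in addition to the most recent ones. For example, with save_total_limit=5 and load_best_model_at_end, the four last checkpoints are always retained alongside the best model. With save_total_limit=1 and load_best_model_at_end, two checkpoints may be saved: the last one and the best one (if they are different).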
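The retention rule can be sketched as a function that decides which checkpoints to delete. This is a simplified model of the behaviour described above with hypothetical names; it works on step numbers rather than directories and is not Trainer's rotation code.

```python
def checkpoints_to_delete(saved_steps, save_total_limit, best_step=None):
    """Return the checkpoint steps that would be removed, oldest first."""
    if save_total_limit is None or save_total_limit <= 0:
        return []  # no limit configured
    ordered = sorted(saved_steps)
    limit = save_total_limit
    if best_step in ordered:
        latest = ordered[-1]
        # Move the best checkpoint to the end so it is never among the deleted ones.
        ordered.remove(best_step)
        ordered.append(best_step)
        # With a limit of 1, keep both the latest and the best checkpoint if they differ.
        if limit == 1 and best_step != latest:
            limit = 2
    return ordered[: max(0, len(ordered) - limit)]


steps = [100, 200, 300, 400, 500, 600]
print(checkpoints_to_delete(steps, save_total_limit=5, best_step=100))
print(checkpoints_to_delete([100, 200, 300], save_total_limit=1, best_step=200))
```

In the first call only step 200 is removed, leaving the last four checkpoints plus the best one at step 100.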

disable_tqdm (bool, optional)

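If set to True, the progress bars are disabled. More precisely, this controls whether to disable the tqdm progress bars and the table of metrics produced by [~notebook.NotebookTrainingTracker] in Jupyter Notebooks. It defaults to True if the logging level is set to warn or lower (default), and to False otherwise.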

load_best_model_at_end (bool, optional, defaults to False)

Whether or not to load the best model found during training at the end of training. When this option is enabled, the best checkpoint will always be saved. See [`save_total_limit`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.save_total_limit) for more.

When set to `True`, the parameter `save_strategy` needs to be the same as `evaluation_strategy`, and in the case it is "steps", `save_steps` must be a round multiple of `eval_steps`.
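The two constraints can be expressed as a small pre-flight check. This is a minimal sketch with a hypothetical helper name and messages; the library performs its own validation and reports errors in its own wording.

```python
def check_best_model_settings(evaluation_strategy, save_strategy, eval_steps=None, save_steps=None):
    """Return a list of problems that conflict with load_best_model_at_end=True."""
    problems = []
    if evaluation_strategy != save_strategy:
        problems.append(
            f"save_strategy ({save_strategy!r}) must match evaluation_strategy ({evaluation_strategy!r})"
        )
    elif evaluation_strategy == "steps":
        if not eval_steps or not save_steps:
            problems.append("eval_steps and save_steps must both be set")
        elif save_steps % eval_steps != 0:
            problems.append(
                f"save_steps ({save_steps}) must be a round multiple of eval_steps ({eval_steps})"
            )
    return problems


print(check_best_model_settings("epoch", "epoch"))
print(check_best_model_settings("steps", "epoch"))
print(check_best_model_settings("steps", "steps", eval_steps=100, save_steps=500))
print(check_best_model_settings("steps", "steps", eval_steps=200, save_steps=500))
```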

metric_for_best_model (str, optional)

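The name of the metric used to select the best model. Used together with load_best_model_at_end to specify the metric for comparing two different models. It must be the name of a metric returned by the evaluation, with or without the prefix "eval_". If unspecified and load_best_model_at_end=True, it defaults to "loss" (the evaluation loss is used).
If you set this value, greater_is_better will default to True. Don't forget to set it to False if your metric is better when lower.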
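How the metric name and greater_is_better interact can be sketched as follows. The function `pick_best_checkpoint` and the evaluation history are made up for illustration; the defaults follow the description above, with loss treated as lower-is-better and any other metric as higher-is-better.

```python
def pick_best_checkpoint(eval_history, metric_for_best_model="loss", greater_is_better=None):
    """Return the step whose evaluation metric is best."""
    # The metric name is accepted with or without the "eval_" prefix.
    key = metric_for_best_model
    if not key.startswith("eval_"):
        key = f"eval_{key}"
    if greater_is_better is None:
        # Loss is better when lower; other metrics are assumed better when higher.
        greater_is_better = key != "eval_loss"
    choose = max if greater_is_better else min
    return choose(eval_history, key=lambda step: eval_history[step][key])


# Hypothetical evaluation results keyed by global step.
history = {
    100: {"eval_loss": 0.90, "eval_accuracy": 0.70},
    200: {"eval_loss": 0.60, "eval_accuracy": 0.82},
    300: {"eval_loss": 0.70, "eval_accuracy": 0.85},
}

print(pick_best_checkpoint(history))                    # lowest eval_loss
print(pick_best_checkpoint(history, "accuracy"))        # highest eval_accuracy
print(pick_best_checkpoint(history, "eval_accuracy", greater_is_better=False))
```

The last call shows why greater_is_better matters: with the wrong direction, the checkpoint with the worst accuracy is selected.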