Trainer arguments: checkpoint saving

Notes on model saving

1. If you don't want to save checkpoints during training and only want to keep the best model at the end, set save_strategy to "no" and load_best_model_at_end to True.
2. When running SFT with DeepSpeed, each checkpoint contains a very large global_step directory. If you don't need it (i.e. you don't need to resume training), you can skip it by setting --save_only_model (see the sketch below).
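A minimal sketch of the configuration from tip 2, assuming transformers >= 4.36 (where save_only_model was added); the output directory, DeepSpeed config path, and step counts are placeholders:

    from transformers import TrainingArguments

    # All paths and step counts below are hypothetical placeholders.
    args = TrainingArguments(
        output_dir="./sft_output",
        deepspeed="ds_config.json",  # path to your DeepSpeed config file
        save_strategy="steps",
        save_steps=500,
        # Save only the model weights: skips the optimizer, scheduler and RNG
        # states (with DeepSpeed, the large global_step* directory). You
        # cannot resume training from such a checkpoint; load the weights
        # with from_pretrained instead.
        save_only_model=True,
    )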

    save_strategy (`str` or [`~trainer_utils.IntervalStrategy`], *optional*, defaults to `"steps"`):
        The checkpoint save strategy to adopt during training. Possible values are:

            - `"no"`: No save is done during training.
            - `"epoch"`: Save is done at the end of each epoch.
            - `"steps"`: Save is done every `save_steps`.
    save_steps (`int` or `float`, *optional*, defaults to 500):
        Number of update steps before two checkpoint saves if `save_strategy="steps"`. Should be an integer or a
        float in range `[0,1)`. If smaller than 1, will be interpreted as ratio of total training steps.
    save_total_limit (`int`, *optional*):
        If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in
        `output_dir`. When `load_best_model_at_end` is enabled, the "best" checkpoint according to
        `metric_for_best_model` will always be retained in addition to the most recent ones. For example, for
        `save_total_limit=5` and `load_best_model_at_end`, the last four checkpoints will always be retained
        alongside the best model. When `save_total_limit=1` and `load_best_model_at_end`, it is possible that two
        checkpoints are saved: the last one and the best one (if they are different).
    save_safetensors (`bool`, *optional*, defaults to `True`):
        Use [safetensors](https://huggingface.co/docs/safetensors) saving and loading for state dicts instead of
        default `torch.load` and `torch.save`.
    save_on_each_node (`bool`, *optional*, defaults to `False`):
        When doing multi-node distributed training, whether to save models and checkpoints on each node, or only on
        the main one.

        This should not be activated when the different nodes use the same storage as the files will be saved with
        the same names for each node.
    save_only_model (`bool`, *optional*, defaults to `False`):
        When checkpointing, whether to only save the model, or also the optimizer, scheduler & rng state.
        Note that when this is true, you won't be able to resume training from checkpoint.
        This enables you to save storage by not storing the optimizer, scheduler & rng state.
        You can only load the model using `from_pretrained` with this option set to `True`.
    load_best_model_at_end (`bool`, *optional*, defaults to `False`):
        Whether or not to load the best model found during training at the end of training. When this option is
        enabled, the best checkpoint will always be saved. See
        [`save_total_limit`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.save_total_limit)
        for more.
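
Building on tip 1 above: recent transformers versions require save_strategy to match eval_strategy when load_best_model_at_end=True, so a documented way to end a run with essentially just the best model is save_total_limit=1 plus load_best_model_at_end, per the save_total_limit entry above. A minimal sketch with placeholder values (the parameter was named evaluation_strategy before transformers 4.41):

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="./sft_output",     # placeholder path
        eval_strategy="steps",         # `evaluation_strategy` in older versions
        eval_steps=500,
        save_strategy="steps",         # must match eval_strategy when
        save_steps=500,                # load_best_model_at_end=True
        save_total_limit=1,            # keeps at most the best + the last checkpoint
        load_best_model_at_end=True,
        metric_for_best_model="loss",  # tracked as eval_loss; lower is better
        greater_is_better=False,
    )

With this setup, older checkpoints are pruned as training progresses, so the run typically ends with just the best checkpoint (plus the most recent one, if they differ).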

For more details, see https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L208
