GPU out of memory after setting compute_metrics on the Hugging Face Trainer

Latest recommended article published on 2024-06-08 10:35:01

苏炘


Column: 一些离谱的error (Some Outrageous Errors). Tags: artificial intelligence, machine learning, deep learning

Article link: https://blog.csdn.net/weixin_44902962/article/details/135198185


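1. Problem description

I was fine-tuning Llama2 with LoRA using the Hugging Face Trainer. As soon as I set the compute_metrics argument, evaluation failed with an out-of-memory (OOM) error on the GPU.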

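import transformers

# model, train_args, train_data, test_data, data_collator and compute_metrics
# are assumed to be defined elsewhere in the training script.
trainer = transformers.Trainer(
    model=model,
    args=train_args,
    train_dataset=train_data,
    eval_dataset=test_data,
    data_collator=data_collator,
    compute_metrics=compute_metrics,  # setting this is what leads to the OOM during evaluation
)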

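2. Cause

Once compute_metrics is set, the Trainer concatenates the model outputs (for example, the logits) for every sample in the evaluation set into a single tensor. By default this accumulation happens on the GPU, and these huge tensors are moved to the CPU only at the very end. In many cases GPU memory is exhausted long before that transfer takes place.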
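To get a feel for the sizes involved, here is a rough back-of-the-envelope sketch. The vocabulary size of 32,000 is that of the Llama 2 tokenizer; the evaluation set size (1,000 samples), the sequence length (512 tokens), and float32 logits are assumptions picked purely for illustration, not numbers from my run.

```python
def logits_bytes(num_samples, seq_len, vocab_size, bytes_per_value=4):
    """Bytes needed for a logits tensor of shape (num_samples, seq_len, vocab_size)."""
    return num_samples * seq_len * vocab_size * bytes_per_value


VOCAB_SIZE = 32_000   # Llama 2 tokenizer vocabulary size
NUM_SAMPLES = 1_000   # assumed evaluation set size (illustrative)
SEQ_LEN = 512         # assumed tokens per sample (illustrative)

# Full float32 logits accumulated over the whole evaluation set.
full_logits = logits_bytes(NUM_SAMPLES, SEQ_LEN, VOCAB_SIZE)

# Only one predicted token id (int64, 8 bytes) per position instead of a full vocab row.
ids_only = logits_bytes(NUM_SAMPLES, SEQ_LEN, 1, bytes_per_value=8)

print(f"full logits: {full_logits / 1024**3:.1f} GiB")
print(f"token ids only: {ids_only / 1024**2:.1f} MiB")
```

Under these assumptions the accumulated logits alone need roughly 61 GiB, which is more than most single GPUs have, while keeping only the predicted token ids needs a few MiB.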

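3. Solutions

(1) Set eval_accumulation_steps in TrainingArguments. It controls how many prediction steps are accumulated on the device before the tensors are moved to the CPU. The official documentation says: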

eval_accumulation_steps (int, optional) — Number of predictions steps to accumulate the output tensors for, before moving the results to the CPU. If left unset, the whole predictions are accumulated on GPU/NPU/TPU before being moved to the CPU (faster but requires more memory).
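A minimal configuration sketch for option (1). The output directory, the batch size, and the value 16 are hypothetical placeholders, not settings from my run; a smaller eval_accumulation_steps moves tensors to the CPU more often, which lowers peak GPU memory at the cost of some speed.

```python
from transformers import TrainingArguments

train_args = TrainingArguments(
    output_dir="./lora-llama2-out",   # hypothetical path
    per_device_eval_batch_size=4,     # assumed value, tune for your GPU
    eval_accumulation_steps=16,       # assumed value: offload predictions to CPU every 16 eval steps
)
```

Note that the concatenated predictions then live in CPU RAM instead, so a very large evaluation set can still exhaust host memory.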

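(2) Pass a preprocess_logits_for_metrics function to the Trainer. It defines how the logits are processed after each evaluation step, before they are cached. If you do not need the full logits (for example, I only want to know which class was predicted), do that reduction in this function so that far less memory is consumed when the results are concatenated. The official documentation says: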

preprocess_logits_for_metrics (Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional) — A function that preprocess the logits right before caching them at each evaluation step. Must take two tensors, the logits and the labels, and return the logits once processed as desired. The modifications made by this function will be reflected in the predictions received by compute_metrics.
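A minimal sketch of option (2), assuming that the metric only needs the predicted token or class id, so an argmax over the last (vocabulary) dimension is enough. The tuple check is there because, depending on the model and config, the output may carry extra tensors after the logits. The tensor shapes in the demo at the bottom are made up for illustration.

```python
import torch


def preprocess_logits_for_metrics(logits, labels):
    """Reduce logits to predicted ids before the Trainer caches them."""
    # Some models return a tuple such as (logits, past_key_values, ...);
    # in that case keep only the logits, which come first.
    if isinstance(logits, tuple):
        logits = logits[0]
    # Collapse the vocabulary dimension: (batch, seq_len, vocab) -> (batch, seq_len)
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    fake_logits = torch.randn(2, 8, 32_000)  # illustrative shape only
    preds = preprocess_logits_for_metrics(fake_logits, None)
    print(preds.shape, preds.dtype)
```

The function is passed to the Trainer as preprocess_logits_for_metrics=preprocess_logits_for_metrics. After that, compute_metrics receives the predicted ids rather than the raw logits, so any argmax inside compute_metrics should be removed.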

This article draws on https://discuss.huggingface.co/t/cuda-out-of-memory-when-using-trainer-with-compute-metrics/2941
