Lessons learned the hard way, folks
After 3 days of torment, I finally got my model training
If you've run into any of these problems (written for beginners who are baffled by features they never even enabled):
- CUDA setup failed even though a GPU is available (CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:)
- FileExistsError: [WinError 183] Cannot create a file when that file already exists.
- torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB. GPU 0 has a total capacity of 22.00 GiB of which 17.97 GiB is free. Of the allocated memory 2.78 GiB is allocated by PyTorch, and 61.81 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management ...
- module 'torch.library' has no attribute 'register_fake'
- An error saying numpy must be < 2.0.0
- DeepSpeed not detected (for beginners who never enabled it)
- The loss stays at 0
- Error loading "\lib\site-packages\torch\lib\shm.dll" or one of its dependencies
- DeepSpeed installation fails with No module named 'dskernels'
- The model gives nonsensical answers or garbled text before any fine-tuning
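Several of the errors above boil down to version or installation mismatches (a NumPy that is too new, a PyTorch without `torch.library.register_fake`, a missing DeepSpeed, a torch DLL that fails to load). A quick pre-flight check before launching training can save a lot of trial and error. This is only a minimal sketch using the standard library; the helper names (`installed_version`, `major_version`, `preflight`) are hypothetical, and the thresholds it checks are simply the ones named in the error messages above.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import List, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Leading integer of a version string, e.g. '1.26.4' -> 1."""
    return int(version.split(".")[0])


def preflight() -> List[str]:
    """Collect human-readable problems found in the current Python environment."""
    problems: List[str] = []

    numpy_version = installed_version("numpy")
    if numpy_version is not None and major_version(numpy_version) >= 2:
        problems.append(f"numpy {numpy_version} is installed, but the error above asks for numpy<2.0.0")

    # Only relevant if you actually enable DeepSpeed.
    if find_spec("deepspeed") is None:
        problems.append("deepspeed is not installed")

    if find_spec("torch") is None:
        problems.append("torch is not installed")
        return problems
    try:
        import torch
    except (ImportError, OSError) as exc:  # e.g. a DLL such as shm.dll failing to load
        problems.append(f"importing torch failed: {exc}")
        return problems

    if not torch.cuda.is_available():
        problems.append("torch.cuda.is_available() is False")
    library = getattr(torch, "library", None)
    if getattr(library, "register_fake", None) is None:
        problems.append(f"torch {torch.__version__} has no torch.library.register_fake")
    return problems


if __name__ == "__main__":
    for problem in preflight() or ["no problems detected"]:
        print(problem)
```

Run it from the command line with the same Python interpreter you train with; an empty report does not guarantee training will work, it just rules out the obvious mismatches.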
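If you're a beginner and all of this leaves you completely baffled, I recommend training with ms-swift's Python code instead! Don't use the web-ui!
Minimal example: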
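```python
# Experimental environment: RTX 2080 Ti 22G
# 22GB GPU memory
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # must be set before importing torch

import torch
from swift.llm import (
    DatasetName, InferArguments, ModelType, SftArguments,
    infer_main, sft_main, app_ui_main, merge_lora_main
)


def main():
    model_type = ModelType.qwen2_5_coder_7b_instruct  # change to your model's type
    sft_args = SftArguments(
        model_type=model_type,
        model_id_or_path="D:/LLaMA_Factory/Qwen/Qwen25Coder7B",  # change to your local model path
        train_dataset_sample=2000,  # number of training samples to use
        dataset="D:/LLaMA_Factory/data/zh_INFJ_self_awareness.json",  # change to your dataset path
        output_dir='output')
    result = sft_main(sft_args)
    best_model_checkpoint = result['best_model_checkpoint']
    print(f'best_model_checkpoint: {best_model_checkpoint}')
    torch.cuda.empty_cache()

    infer_args = InferArguments(
        ckpt_dir=best_model_checkpoint,
        load_dataset_config=True,  # reuse the dataset settings saved with the checkpoint
        show_dataset_sample=10)  # number of validation samples to run inference on
    # merge_lora_main(infer_args)  # optional: merge the LoRA weights into the base model
    result = infer_main(infer_args)
    torch.cuda.empty_cache()
    app_ui_main(infer_args)  # launch a web UI for chatting with the fine-tuned model


# On Windows, child processes (e.g. for data loading) are started with "spawn"
# and re-import this file, so the entry point must be guarded.
if __name__ == '__main__':
    main()
```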
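It's really not as hard as you might think, right!
You only need to change these 3 parameters (the model type, the local model path, and the dataset path), and even a complete beginner can get the model running!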
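Two of those three parameters are Windows paths, and a typo in either one is an easy way to lose an afternoon. Below is a minimal sketch of a path check to run before training. It assumes the model folder is a Hugging Face/ModelScope-style directory containing `config.json` and that the dataset is a single UTF-8 JSON file, as in the example; `check_paths` is a hypothetical helper, not part of ms-swift, and it does not validate the model type.

```python
import json
from pathlib import Path
from typing import List


def check_paths(model_dir: str, dataset_file: str) -> List[str]:
    """Return problems found with the local model directory and dataset file."""
    problems: List[str] = []

    model_path = Path(model_dir)
    if not model_path.is_dir():
        problems.append(f"model directory not found: {model_dir}")
    elif not (model_path / "config.json").is_file():
        # Assumption: a Hugging Face/ModelScope-style model folder contains config.json.
        problems.append(f"no config.json in model directory: {model_dir}")

    data_path = Path(dataset_file)
    if not data_path.is_file():
        problems.append(f"dataset file not found: {dataset_file}")
    else:
        try:
            # Read as UTF-8 explicitly; Windows may otherwise default to a legacy code page.
            with data_path.open(encoding="utf-8") as f:
                json.load(f)
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            problems.append(f"dataset is not valid UTF-8 JSON: {exc}")
    return problems


if __name__ == "__main__":
    # The two paths from the minimal example; replace them with your own.
    for problem in check_paths(
        "D:/LLaMA_Factory/Qwen/Qwen25Coder7B",
        "D:/LLaMA_Factory/data/zh_INFJ_self_awareness.json",
    ) or ["paths look OK"]:
        print(problem)
```

If your dataset is JSON Lines (one object per line) rather than a single JSON document, this check will report it as invalid, so adjust the parsing step accordingly.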
[Screenshot: run result]
Note! Never run this code inside an IDE such as VS Code; you must run the .py file from the command line! Otherwise it will hang!
python xxx.py
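If the run still fails with the out-of-memory error from the list at the top, that message itself suggests setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. Like `CUDA_VISIBLE_DEVICES` in the example, it is read from the environment, so the safe approach is to set it before `torch` is imported. This is a minimal sketch; `configure_cuda_env` is a hypothetical helper, and the option may not be supported on every platform (watch for a warning about it at startup, especially on Windows).

```python
import os


def configure_cuda_env(device: str = "0", expandable_segments: bool = True) -> None:
    """Set CUDA-related environment variables; call this before importing torch."""
    # Expose only one GPU to the process, as the example script does.
    os.environ["CUDA_VISIBLE_DEVICES"] = device
    if expandable_segments:
        # Allocator option quoted in the OOM message; it may reduce fragmentation.
        # setdefault keeps any value already exported in the shell.
        os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")


if __name__ == "__main__":
    configure_cuda_env()
    print(os.environ["CUDA_VISIBLE_DEVICES"], os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Equivalently, you can set the variable in the terminal before running `python xxx.py`, which keeps the script itself unchanged.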
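That's all for this post.
(This post distills what I learned over 3 days of trying on both a local machine and a cloud server, on Windows 10, Windows 11, and Windows Server 2022, and digging through thousands of websites.)
(Since I'm a beginner too, this post probably has shortcomings, so please point them out!)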