NLP Learning and Pitfalls Log (continuously updated)
This blog records the various problems I ran into while learning NLP, together with the fixes, for everyone's reference. Hopefully no pit has to be stepped in twice!
ImportError: cannot import name 'prepare_model_for_int8_training' from 'peft'
Starting with PEFT v0.10.0, prepare_model_for_int8_training has been removed; use prepare_model_for_kbit_training instead.
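On newer PEFT versions the fix is just to swap the import and the call. A minimal sketch, assuming model is a quantized (8-bit/4-bit) model loaded as in the LoRA example further below:
from peft import prepare_model_for_kbit_training

# Plays the same role as the old prepare_model_for_int8_training: freezes the base
# weights and prepares the quantized model (fp32 norms, gradient-checkpointing hooks)
# before the LoRA adapters are attached.
model = prepare_model_for_kbit_training(model)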
ValueError: Attempting to unscale FP16 gradients.
I hit this error while LoRA fine-tuning llama-7b with the model loaded in fp16. The fix is to load the model in fp32 or int8 instead.
- fp32 (if GPU memory allows):
# llm_model holds the model name/path before this call and the loaded model afterwards
llm_model = LlamaForCausalLM.from_pretrained(llm_model, torch_dtype=torch.float32)
- int8 (recommended):
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import LlamaTokenizer, LlamaForCausalLM, Trainer

path_to_llama = xxx

# Load the base model in 8-bit (requires bitsandbytes)
model = LlamaForCausalLM.from_pretrained(
    path_to_llama,
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = LlamaTokenizer.from_pretrained(path_to_llama)

# LoRA config: adapters on the attention query/value projections
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# On PEFT >= 0.10.0, import and call prepare_model_for_kbit_training instead
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, config)

... # get your dataset etc here

trainer = Trainer(
    model=model,
    ...
)
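To sanity-check the setup, PEFT models expose print_trainable_parameters(); after get_peft_model it should report that only a small fraction of the weights (the LoRA adapters) is trainable:
# Prints something like: trainable params: ... || all params: ... || trainable%: ...
model.print_trainable_parameters()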
OSError: Can't load tokenizer for 'bert-base-uncased'.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", truncation_side=truncation_side)
This error appeared when running the line above. The cause is that, from mainland China, network restrictions prevent downloading models from Hugging Face.
Solution 1: check your network. From China you need a VPN to reach Hugging Face, then rerun the code. If that still fails, download the model to a local directory and rerun the code:
huggingface-cli download --resume-download google-bert/bert-base-uncased --local-dir /home/user/bert-base-uncased
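With the files on disk, point from_pretrained at the local directory instead of the hub name (the path below assumes the download command above):
tokenizer = BertTokenizer.from_pretrained("/home/user/bert-base-uncased", truncation_side=truncation_side)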
Solution 2: use the mirror on ModelScope, which is fast from China, though some models available on Hugging Face may not exist on ModelScope.
# pip install modelscope
from modelscope.hub.snapshot_download import snapshot_download
from transformers import BertTokenizer

# Download the mirrored weights from ModelScope, then load them with transformers
llm = snapshot_download('AI-ModelScope/bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained(llm, truncation_side=truncation_side)
Solution 3: download the model in Colab, move it to Google Drive, and then fetch it from Google Drive (a sketch is given below).
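A minimal Colab sketch of that idea; the Drive path and repo id below are only placeholders:
# Run inside a Colab notebook
from google.colab import drive
from huggingface_hub import snapshot_download

drive.mount('/content/drive')  # authorize access to your Google Drive

# Download the model straight into a Drive folder (placeholder path),
# then fetch it to your own machine from Google Drive.
snapshot_download(
    repo_id="google-bert/bert-base-uncased",
    local_dir="/content/drive/MyDrive/bert-base-uncased",
)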
google.protobuf.message.DecodeError: Error parsing message
The cause is that downloading with a plain git clone does not fetch the actual model weight files; without git-lfs you only get small pointer text files. The fix is to download Hugging Face models with the huggingface-cli tool instead.
# Wrong way to download (without git-lfs this only fetches pointer files)
git clone https://huggingface.co/bert-base-uncased
# Correct way to download
pip install huggingface_hub
huggingface-cli download --resume-download [model_name] --local-dir [local path]
# e.g.: huggingface-cli download --resume-download google-bert/bert-base-cased --local-dir /home/user/
bitsandbytes
After installing bitsandbytes 0.42.0, running the code raised this error:
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=116 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "/xxx/lib/python3.8/runpy.py", line 185, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/xxx/lib/python3.8/runpy.py", line 144, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "/xxxm/lib/python3.8/runpy.py", line 111, in _get_module_details
__import__(pkg_name)
File "/xxx/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "/xxx/lib/python3.8/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
from . import nn
File "/xxx/lib/python3.8/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "/xxx/lib/python3.8/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "/xxx/lib/python3.8/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/xxx/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
The fix is either to build bitsandbytes from source (as the error message suggests) or to install an older version, e.g. pip install bitsandbytes==0.38.1.
DeepSpeed
- If the model cannot be loaded on a single GPU in the training code, DeepSpeed needs the model to be initialized under its init context. Refer to the Hugging Face Trainer docs (the TrainingArguments must be constructed before the model is loaded; see the sketch after this list): https://huggingface.co/docs/transformers/v4.34.1/en/main_classes/deepspeed#constructing-massive-models
- Data parallelism (ZeRO-3 cuts the model horizontally) vs. pipeline parallelism (cuts the model vertically): https://huggingface.co/docs/transformers/v4.35.2/en/perf_train_gpu_many#zero-data-parallelism–pipeline-parallelism–tensor-parallelism
- ZeRO++ optimizes the communication strategy: https://www.deepspeed.ai/tutorials/zeropp/#three-components-of-zero
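A minimal sketch of the "TrainingArguments before model loading" point above; the config file name, model path, and dataset are placeholders:
# Launch with the deepspeed launcher, e.g.: deepspeed train.py
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

# Constructing TrainingArguments with a ZeRO-3 config FIRST registers the DeepSpeed
# config, so from_pretrained below runs under deepspeed.zero.Init() and the weights
# are partitioned across GPUs instead of being fully materialized on one device.
training_args = TrainingArguments(
    output_dir="out",
    deepspeed="ds_zero3_config.json",  # placeholder ZeRO stage-3 config file
)

model = AutoModelForCausalLM.from_pretrained("path/to/your-model")  # placeholder path

trainer = Trainer(
    model=model,
    args=training_args,
    # plus your dataset, tokenizer/collator, etc.
)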