Shandong University Innovation Practicum, Week 7 Report ----- Setting Up the VisualGLM-6B Fine-Tuning Environment

1. Download the code from GitHub

Link: THUDM/VisualGLM-6B: Chinese and English multimodal conversational language model (https://github.com/THUDM/VisualGLM-6B)

2. Create a virtual environment

The server we use runs Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-76-generic x86_64), with an A100 GPU (NVIDIA-SMI 510.54; Driver Version: 510.54; CUDA Version: 11.6). Create a Python 3.8 environment:

conda create -n GLM python=3.8

3. Install the dependencies with pip (inside the activated GLM environment)

pip install -i https://pypi.org/simple -r requirements.txt
# Inside China, use the Aliyun mirror; TUNA and other mirrors have had sync problems recently. The command is:
pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt
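
As a quick sanity check that the dependencies landed correctly (our own minimal check, not a step from the official README), the core packages can be imported from Python; sat is the import name of the SwissArmyTransformer package that finetune_visualglm.py relies on:

# Our own post-install sanity check (not from the VisualGLM-6B repo).
# "sat" is the import name of SwissArmyTransformer, which the fine-tuning script uses.
import torch
import sat

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("GPU visible:", torch.cuda.is_available())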

4. Unzip fewshot-data.zip, then run the following command:

bash finetune/finetune_visualglm.sh

Running it fails with an error. The message indicates that the GPU driver is too old for the installed PyTorch build; we either need to update the GPU driver to match PyTorch, or install a PyTorch version that matches the driver.

Run:

conda list

to check which PyTorch version is installed in the current conda environment.

Consulting the table of matching PyTorch and CUDA versions: since our CUDA version is 11.6, we install the corresponding PyTorch build.

The install command from the PyTorch official site is:

# CUDA 11.6
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
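
Before rerunning the fine-tuning script, the new build can be verified from Python (a small check of our own; the expected values assume the CUDA 11.6 wheels above installed cleanly):

# Our own quick check of the freshly installed PyTorch build.
import torch

print(torch.__version__)          # expected: 1.13.1
print(torch.version.cuda)         # expected: 11.6
print(torch.cuda.is_available())  # should be True on the A100 node
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))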

After the installation succeeded, running the fine-tuning command again produced the following error:

Traceback (most recent call last):
  File "/opt/conda/envs/GLM/bin/deepspeed", line 3, in <module>
    from deepspeed.launcher.runner import main
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/__init__.py", line 25, in <module>
    from . import ops
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/ops/__init__.py", line 6, in <module>
    from . import adam
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/ops/adam/__init__.py", line 6, in <module>
    from .cpu_adam import DeepSpeedCPUAdam
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 8, in <module>
    from deepspeed.utils import logger
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/utils/__init__.py", line 10, in <module>
    from .groups import *
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/utils/groups.py", line 28, in <module>
    from deepspeed import comm as dist
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/comm/__init__.py", line 7, in <module>
    from .comm import *
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 34, in <module>
    from deepspeed.utils import timer, get_caller_func
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/utils/timer.py", line 31, in <module>
    class CudaEventTimer(object):
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/utils/timer.py", line 33, in CudaEventTimer
    def __init__(self, start_event: get_accelerator().Event, end_event: get_accelerator().Event):
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/deepspeed/accelerator/real_accelerator.py", line 145, in get_accelerator
    torch.mps.current_allocated_memory()
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/torch/mps/__init__.py", line 102, in current_allocated_memory
    return torch._C._mps_currentAllocatedMemory()
AttributeError: module 'torch._C' has no attribute '_mps_currentAllocatedMemory'

This error is likely an incompatibility between DeepSpeed and the installed PyTorch version: in this PyTorch build, the torch._C module has no _mps_currentAllocatedMemory attribute, which DeepSpeed touches while probing accelerators at import time. So we downgrade PyTorch:

# CUDA 11.1
pip3 install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
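
Because the AttributeError above is raised while DeepSpeed probes available accelerators at import time, the quickest regression check after the downgrade is simply re-importing deepspeed (our own check):

# Our own check: the earlier AttributeError occurred during "import deepspeed",
# so if that particular error no longer appears, the downgrade has addressed it.
import torch
import deepspeed

print("torch:", torch.__version__)        # expected: 1.8.2+cu111
print("deepspeed:", deepspeed.__version__)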

Running the fine-tuning command again gave the following error:

Traceback (most recent call last):
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/requests/compat.py", line 11, in <module>
    import chardet
ModuleNotFoundError: No module named 'chardet'

This error is caused by a missing module named chardet. chardet is a Python library for detecting character encodings and is commonly used for encoding-related tasks.

The fix is to install the missing chardet module:

pip install chardet
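
For reference, chardet's main entry point is chardet.detect, which takes raw bytes and returns a guessed encoding with a confidence score; a tiny illustration (our own example, unrelated to the fine-tuning flow itself):

# Tiny usage illustration of chardet (our own example).
import chardet

result = chardet.detect("VisualGLM-6B 微调".encode("utf-8"))
print(result)  # e.g. {'encoding': 'utf-8', 'confidence': ..., 'language': ''}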

After installing it, running the fine-tuning again produced this error:

Unzipping /root/.sat_models/visualglm-6b.zip...
[2024-04-14 13:17:20,087] [INFO] [RANK 0] building FineTuneVisualGLMModel model ...
INFO:sat:[RANK 0] building FineTuneVisualGLMModel model ...
Traceback (most recent call last):
  File "finetune_visualglm.py", line 178, in <module>
    model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 215, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 207, in from_pretrained_base
    model = get_model(args, cls, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 412, in get_model
    model = model_cls(args, params_dtype=params_dtype, **kwargs)
  File "finetune_visualglm.py", line 13, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kw_args)
  File "/root/HYJ/VisualGLM-6B-main/model/visualglm.py", line 34, in __init__
    self.add_mixin("eva", ImageMixin(args))
  File "/root/HYJ/VisualGLM-6B-main/model/visualglm.py", line 18, in __init__
    self.model = BLIP2(args.eva_args, args.qformer_args)
  File "/root/HYJ/VisualGLM-6B-main/model/blip2.py", line 56, in __init__
    self.vit = EVAViT(EVAViT.get_args(**eva_args))
  File "/root/HYJ/VisualGLM-6B-main/model/blip2.py", line 21, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/official/vit_model.py", line 111, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 92, in __init__
    self.transformer = BaseTransformer(
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/transformer.py", line 464, in __init__
    self.word_embeddings = torch.nn.Embedding(vocab_size, hidden_size, dtype=params_dtype, device=device)
TypeError: __init__() got an unexpected keyword argument 'dtype'
Traceback (most recent call last):
  File "finetune_visualglm.py", line 178, in <module>
    model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 215, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 207, in from_pretrained_base
    model = get_model(args, cls, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 412, in get_model
    model = model_cls(args, params_dtype=params_dtype, **kwargs)
  File "finetune_visualglm.py", line 13, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kw_args)
  File "/root/HYJ/VisualGLM-6B-main/model/visualglm.py", line 34, in __init__
    self.add_mixin("eva", ImageMixin(args))
  File "/root/HYJ/VisualGLM-6B-main/model/visualglm.py", line 18, in __init__
    self.model = BLIP2(args.eva_args, args.qformer_args)
  File "/root/HYJ/VisualGLM-6B-main/model/blip2.py", line 56, in __init__
    self.vit = EVAViT(EVAViT.get_args(**eva_args))
  File "/root/HYJ/VisualGLM-6B-main/model/blip2.py", line 21, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/official/vit_model.py", line 111, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/base_model.py", line 92, in __init__
    self.transformer = BaseTransformer(
  File "/opt/conda/envs/GLM/lib/python3.8/site-packages/sat/model/transformer.py", line 464, in __init__
    self.word_embeddings = torch.nn.Embedding(vocab_size, hidden_size, dtype=params_dtype, device=device)
TypeError: __init__() got an unexpected keyword argument 'dtype'
[2024-04-14 13:17:27,363] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 40318
[2024-04-14 13:17:27,370] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 40319
[2024-04-14 13:17:27,371] [ERROR] [launch.py:322:sigkill_handler] ['/opt/conda/envs/GLM/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=1', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '300', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--layer_range', '0', '14', '--pre_seq_len', '4', '--train-data', './fewshot-data/dataset.json', '--valid-data', './fewshot-data/dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '8', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '4', '--skip-init', '--fp16', '--use_lora'] exits with return code = 1

The model weights can be downloaded from the host successfully, but initializing the model fails: specifically, the torch.nn.Embedding constructor does not accept a dtype keyword argument in this PyTorch version.

Checking the official PyTorch documentation:

Embedding — PyTorch 1.9.0 documentation

shows that nn.Embedding accepts the dtype keyword argument from PyTorch 1.9.0 onward, so we move up to PyTorch 1.9.0:

# CUDA 11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
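
A minimal way to confirm the interpreter now accepts the factory keyword arguments used in the failing line of sat/model/transformer.py (dtype and device) is the snippet below; on PyTorch 1.8.x it raises exactly the TypeError shown above, while on 1.9.0 and later it succeeds:

# Minimal reproduction of the failing call from the traceback:
# torch.nn.Embedding(vocab_size, hidden_size, dtype=..., device=...).
# Raises TypeError on PyTorch 1.8.x; works on 1.9.0 and later.
import torch

emb = torch.nn.Embedding(10, 4, dtype=torch.float16, device="cpu")
print(emb.weight.dtype)  # torch.float16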

This produced a new error: line 118 of finetune_visualglm.py calls a method named encode, but the FakeTokenizer object has no such method. encode normally turns text into the token IDs the model expects; the likely cause is that the real tokenizer was not initialized correctly (leaving the FakeTokenizer placeholder in place), so the object does not provide encode.

The fix: locate the roughly 14 GB visualglm-6b.zip that was downloaded, unzip it, and edit the model_config.json file inside the extracted directory, replacing args.tokenizer_type='THUDM/chatglm-6b' with the local path to the tokenizer.
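
If you prefer to script the edit, a sketch along the following lines works; note that the config path and the exact key name ("tokenizer_type") are assumptions based on the error message above, so check your own extracted model_config.json first:

# Sketch of the config edit. The path and the "tokenizer_type" key are assumptions;
# verify them against your extracted model_config.json before running.
import json

config_path = "/root/.sat_models/visualglm-6b/model_config.json"  # hypothetical path
with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

config["tokenizer_type"] = "/root/HYJ/chatglm-6b"  # hypothetical local tokenizer path
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)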

At this point the parameters load, but another error appears, apparently because the ChatGLM tokenizer files are not available locally. Download chatglm-6b (THUDM/chatglm-6b at main, huggingface.co) into the folder that contains visualglm-6b:
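
One scriptable way to fetch the weights is huggingface_hub's snapshot_download (our own suggestion; it assumes a reasonably recent huggingface_hub that supports local_dir, and the destination path below is a placeholder):

# Hedged sketch for downloading chatglm-6b with huggingface_hub (our own suggestion).
# local_dir requires a fairly recent huggingface_hub; the destination path is a placeholder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="THUDM/chatglm-6b",
    local_dir="/root/HYJ/chatglm-6b",  # hypothetical destination next to visualglm-6b
)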

Running again produced another error, probably because the installed transformers version does not match what the tokenizer code expects, leaving the tokenizer object with inconsistent attributes. Downgrade transformers:

pip install transformers==4.33.2 -i https://mirrors.aliyun.com/pypi/simple/
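
To confirm that the pinned transformers version can actually load the local ChatGLM tokenizer (a hedged check of our own; the path below is a placeholder for wherever chatglm-6b was downloaded):

# Our own check that the pinned transformers version loads the local ChatGLM tokenizer.
# The path is a placeholder; point it at the chatglm-6b folder downloaded above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/root/HYJ/chatglm-6b", trust_remote_code=True)
print(type(tokenizer).__name__)           # expected: ChatGLMTokenizer
print(tokenizer.encode("描述这张图片。"))  # should print a list of token ids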

Running again produced yet another error. Reinstall DeepSpeed from source:

git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
DS_BUILD_FUSED_ADAM=1 pip3 install .

Before running this rebuild, the CUDA toolkit's nvcc must be on the PATH:

export PATH=/usr/local/cuda/bin:$PATH
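
After the rebuild, it is worth checking that the fused Adam op is importable; DS_BUILD_FUSED_ADAM=1 compiles it during pip install instead of JIT-compiling it on first use (again, our own verification step):

# Our own post-rebuild check: FusedAdam is the op that DS_BUILD_FUSED_ADAM=1 prebuilds.
import deepspeed
from deepspeed.ops.adam import FusedAdam

print("deepspeed:", deepspeed.__version__)
print("FusedAdam imported:", FusedAdam.__name__)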

Run the fine-tuning again and it now completes successfully!
