MiniCPM-V LoRA Fine-tuning

Fine-tune MiniCPM-V with the LoRA fine-tuning script from the MiniCPM-V repository.

When running LoRA fine-tuning on an RTX 4090 with DeepSpeed, using the ds_config_zero3.json configuration file, the following error appears while DeepSpeed JIT-compiles its fused_adam extension:

Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/LM/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_89,code=compute_89 -std=c++17 -c /opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
FAILED: multi_tensor_adam.cuda.o 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/LM/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_89,code=compute_89 -std=c++17 -c /opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_89'
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/LM/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/LM/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /opt/conda/envs/LM/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o 
ninja: build stopped: subcommand failed.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/conda/envs/LM/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
[rank0]:     subprocess.run(
[rank0]:   File "/opt/conda/envs/LM/lib/python3.10/subprocess.py", line 526, in run
[rank0]:     raise CalledProcessError(retcode, process.args,
[rank0]: subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

As a result, /root/.cache/torch_extensions/py310_cu121/fused_adam/fused_adam.so cannot be built: the nvcc at /usr/local/cuda is older than CUDA 11.8 and does not recognize compute_89, the architecture of the RTX 4090 (Ada Lovelace). The workaround is a DeepSpeed optimizer option, torch_adam, which tells DeepSpeed to train with PyTorch's native optimizer instead of its fused CUDA kernel (see the DeepSpeed configuration documentation for torch_adam).
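To confirm the mismatch before touching the config, the GPU architecture can be compared against the CUDA toolkits on the machine; the following is an illustrative sketch, not part of the MiniCPM-V scripts:

# Illustrative diagnostic, not part of the MiniCPM-V fine-tuning code.
import subprocess

import torch

# An RTX 4090 reports (8, 9), i.e. compute_89 / sm_89 (Ada Lovelace).
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: {major}.{minor}")

# CUDA version PyTorch was built against (cu121 in the log above).
print(f"PyTorch CUDA version: {torch.version.cuda}")

# CUDA version of the system toolkit that DeepSpeed's JIT build actually invokes;
# compute_89 requires nvcc from CUDA 11.8 or newer.
print(subprocess.run(["/usr/local/cuda/bin/nvcc", "--version"],
                     capture_output=True, text=True).stdout)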

The modified optimizer section of ds_config_zero3.json is:

 "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto",
            "torch_adam":true
        }
    },
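With "torch_adam": true, DeepSpeed no longer JIT-compiles the fused_adam extension and instead builds the optimizer from torch.optim (torch.optim.AdamW here, matching the AdamW type); the "auto" entries are filled in from the training arguments by the Hugging Face Trainer / DeepSpeed integration that the fine-tuning script relies on. Conceptually the result is equivalent to the following sketch (the hyperparameter values are illustrative, not the resolved "auto" values):

# What "torch_adam": true amounts to: a plain torch.optim.AdamW, no custom CUDA
# kernel, so no nvcc call and no compute_89 failure. Values below are illustrative.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the LoRA-wrapped MiniCPM-V model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)

With this optimizer block in place, the fused_adam build step is skipped entirely and training proceeds with the ZeRO-3 settings unchanged.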