MiniCPM-V fine-tuning bug: ninja: build stopped: subcommand failed. CalledProcessError: Command '['ninja', '-v']'

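The full error output is extremely long. I kept trying to fix the errors near the end and overlooked the underlying ninja failure, which cost me two days. Version problems got me. Why does nobody mention this in the issues???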

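The full error output is at the end of this post. If you hit errors like the ones below, check whether a ninja error appears earlier in your output.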

1. subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

2. RuntimeError: Error building extension 'fused_adam'

3. ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

4. FAILED: multi_tensor_adam.cuda.o 
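
To check whether a ninja failure comes before the later errors, a saved log can be scanned for the four signatures listed above. The following is a minimal sketch, not part of the original fix: the names `SIGNATURES`, `first_hits`, and `ninja_comes_first` are hypothetical, and the regular expressions are assumptions based on the messages shown in this post.

```python
import re

# Signatures taken from the error list above (patterns are assumptions).
SIGNATURES = {
    "ninja": re.compile(r"ninja: build stopped|Command '\['ninja', '-v'\]'"),
    "fused_adam": re.compile(r"Error building extension 'fused_adam'"),
    "libcudart": re.compile(r"libcudart\.so\.[\d.]+: cannot open shared object file"),
    "compile_failed": re.compile(r"^FAILED: \S+"),
}


def first_hits(log_text):
    """Return {signature name: 1-based line number of its first match}."""
    hits = {}
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for name, pattern in SIGNATURES.items():
            if name not in hits and pattern.search(line):
                hits[name] = lineno
    return hits


def ninja_comes_first(log_text):
    """True if a ninja failure appears before the fused_adam / libcudart errors."""
    hits = first_hits(log_text)
    if "ninja" not in hits:
        return False
    later = [hits[k] for k in ("fused_adam", "libcudart") if k in hits]
    return all(hits["ninja"] < n for n in later)


sample = """FAILED: multi_tensor_adam.cuda.o
ninja: build stopped: subcommand failed.
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
RuntimeError: Error building extension 'fused_adam'
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
"""
print(first_hits(sample))
print(ninja_comes_first(sample))
```

For a real run, read the training log into a string first and pass it to `ninja_comes_first`. With several processes writing to the same terminal the lines may interleave, so treat the line numbers as approximate.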

Solution:
In your virtual environment, open /envs/xxx/lib/python3.xx/site-packages/torch/utils/cpp_extension.py and change ['ninja', '-v'] to ['ninja', '--version'].
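
The edit can also be scripted instead of made by hand. This is a rough sketch under stated assumptions: the helper names `patch_ninja_command` and `patch_file` are hypothetical, and it assumes the file contains the literal text `['ninja', '-v']` with exactly that spacing, which may differ between PyTorch versions.

```python
from pathlib import Path

OLD = "['ninja', '-v']"
NEW = "['ninja', '--version']"


def patch_ninja_command(source):
    """Return (patched_source, number_of_replacements) for the given text."""
    count = source.count(OLD)
    return source.replace(OLD, NEW), count


def patch_file(path):
    """Write a .bak copy next to `path`, then apply the replacement in place."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched, count = patch_ninja_command(original)
    if count:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(patched, encoding="utf-8")
    return count


# Demonstration on a one-line stand-in for the real file.
demo = "command = ['ninja', '-v']\n"
print(patch_ninja_command(demo))
```

To find the file for the active environment, `python -c "import torch.utils.cpp_extension as m; print(m.__file__)"` prints its path, which can then be passed to `patch_file`. A backup is kept because the change affects every extension built through this file; `ninja --version` prints ninja's version and exits rather than running the build.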

That solved it for me. References:
https://blog.csdn.net/fq9200/article/details/125362088
https://github.com/OpenBMB/MiniCPM-V/issues/220
https://blog.csdn.net/xjtdw/article/details/102929811
https://github.com/mapillary/inplace_abn/issues/104

Full error output:

FAILED: multi_tensor_adam.cuda.o 
xxx/anaconda3/envs/cpmv/bin/nvcc  -ccbin xxx/anaconda3/envs/cpmv/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -Ixxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -Ixxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/include -isystem xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/include/TH -isystem xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/include/THC -isystem xxx/anaconda3/envs/cpmv/include -isystem xxx/anaconda3/envs/cpmv/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -std=c++17 -c xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
In file included from xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu:13:
xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/cfs/CV/zyt/minicpm/MiniCPM-V/finetune/finetune.py", line 328, in <module>
    train()
  File "/mnt/cfs/CV/zyt/minicpm/MiniCPM-V/finetune/finetune.py", line 318, in train
Loading extension module fused_adam...
    trainer.train()
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
Traceback (most recent call last):
  File "/mnt/cfs/CV/zyt/minicpm/MiniCPM-V/finetune/finetune.py", line 328, in <module>
    train()
  File "/mnt/cfs/CV/zyt/minicpm/MiniCPM-V/finetune/finetune.py", line 318, in train
    trainer.train()
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/transformers/trainer.py", line 2015, in _inner_training_loop
    return inner_training_loop(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/transformers/trainer.py", line 2015, in _inner_training_loop
    model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
    model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
    result = self._prepare_deepspeed(*args)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
    result = self._prepare_deepspeed(*args)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
    engine = DeepSpeedEngine(args=args,
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 306, in __init__
    engine = DeepSpeedEngine(args=args,
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 306, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1223, in _configure_optimizer
    self._configure_optimizer(optimizer, model_parameters)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1223, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1300, in _configure_basic_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1300, in _configure_basic_optimizer
    optimizer = FusedAdam(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
    optimizer = FusedAdam(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 508, in load
    fused_adam_cuda = FusedAdamBuilder().load()
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 508, in load
    return self.jit_load(verbose)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 555, in jit_load
    return self.jit_load(verbose)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 555, in jit_load
    op_module = load(name=self.name,
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
    op_module = load(name=self.name,
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
    return _jit_compile(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
    return _jit_compile(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1736, in _jit_compile
    _write_ninja_file_and_build_library(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
    return _import_module_from_library(name, build_directory, is_python_module)
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2136, in _import_module_from_library
    _run_ninja_build(
  File "xxx/anaconda3/envs/cpmv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directoryopen shared object file: No such file or directory
