Fixing "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing"

1. Problem Description

While deploying DeepSeek-Coder-V2 with the vllm_worker from the FastChat framework, I ran into the following error:

[root@gnho024 ~]$ python3 /ssdwork/FastChat/fastchat/serve/vllm_worker.py --model-path /ssdwork/DeepSeek-Coder-V2-Instruct/ --num-gpus 8
INFO 08-12 22:51:42 config.py:715] Defaulting to use mp for distributed inference
WARNING 08-12 22:51:42 arg_utils.py:762] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
INFO 08-12 22:51:42 config.py:806] Chunked prefill is enabled with max_num_batched_tokens=512.
INFO 08-12 22:51:42 llm_engine.py:176] Initializing an LLM engine (v0.5.3.post1) with config: model='/ssdwork/DeepSeek-Coder-V2-Instruct/', speculative_config=None, tokenizer='/ssdwork/DeepSeek-Coder-V2-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/ssdwork/DeepSeek-Coder-V2-Instruct/, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 08-12 22:51:43 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 08-12 22:51:43 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method, Traceback (most recent call last):
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     output = executor(*args, **kwargs)
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 123, in init_device
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     torch.cuda.set_device(self.device)
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 399, in set_device
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     torch._C._cuda_setDevice(device)
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 279, in _lazy_init
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     raise RuntimeError(
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]
INFO 08-12 22:51:43 multiproc_worker_utils.py:215] Worker ready; awaiting tasks

2. Solution

After consulting various resources (for example, posts titled "RuntimeError: Cannot re-initialize CUDA in forked subprocess."), it seems quite a few people have hit this problem. I tried calling torch.multiprocessing.set_start_method('spawn') in vllm_worker.py (the entry script), but it had no effect. Another commonly suggested fix is to set num_workers=0, but my script has no such variable, so that was not applicable either.
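The reason the top-level call has no effect is worth spelling out: a library that obtains its own multiprocessing context is not affected by the interpreter-wide default that set_start_method changes. A minimal standard-library illustration (independent of vLLM or torch, POSIX only since it uses the 'fork' context):

```python
import multiprocessing

# A library that calls multiprocessing.get_context('fork') pins its own
# start method, independent of the interpreter-wide default:
ctx = multiprocessing.get_context("fork")

# Changing the global default in the entry script does not touch that context:
multiprocessing.set_start_method("spawn", force=True)

print(ctx.get_start_method())              # still 'fork'
print(multiprocessing.get_start_method())  # 'spawn'
```

This is exactly the situation here: the worker utilities build their own context, so the only way in is to change what that context is created from.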
I eventually discovered that the multiprocessing setup actually happens in /ssdwork/.local/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py. Commenting out these lines in that script:

mp_method = envs.VLLM_WORKER_MULTIPROC_METHOD
mp = multiprocessing.get_context(mp_method)

and adding torch.multiprocessing.set_start_method('spawn') at the top of the file solved the problem.
So when you hit the "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing" error, make sure you locate the script that actually performs the torch multiprocessing and set the start method there; otherwise the change has no effect. This matters especially when you are working inside someone else's framework.
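Note that the commented-out line reads the start method from envs.VLLM_WORKER_MULTIPROC_METHOD, i.e. from the VLLM_WORKER_MULTIPROC_METHOD environment variable. An alternative worth trying, which avoids patching installed library code, is to set that variable before vLLM is loaded:

```python
import os

# Must be set before vLLM is imported, so that
# envs.VLLM_WORKER_MULTIPROC_METHOD evaluates to 'spawn'
# instead of defaulting to 'fork'.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
```

Equivalently, export the variable in the shell before launching, e.g. `VLLM_WORKER_MULTIPROC_METHOD=spawn python3 vllm_worker.py ...`. I have not verified this against every vLLM version, but it targets the same code path as the manual patch above.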
