Fixing "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing"

1. Problem Description

While deploying DeepSeek-Coder-V2 with the vllm_worker from the FastChat framework, I ran into the following error:

[root@gnho024 ~]$ python3 /ssdwork/FastChat/fastchat/serve/vllm_worker.py --model-path /ssdwork/DeepSeek-Coder-V2-Instruct/ --num-gpus 8
INFO 08-12 22:51:42 config.py:715] Defaulting to use mp for distributed inference
WARNING 08-12 22:51:42 arg_utils.py:762] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
INFO 08-12 22:51:42 config.py:806] Chunked prefill is enabled with max_num_batched_tokens=512.
INFO 08-12 22:51:42 llm_engine.py:176] Initializing an LLM engine (v0.5.3.post1) with config: model='/ssdwork/DeepSeek-Coder-V2-Instruct/', speculative_config=None, tokenizer='/ssdwork/DeepSeek-Coder-V2-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/ssdwork/DeepSeek-Coder-V2-Instruct/, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 08-12 22:51:43 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 08-12 22:51:43 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method, Traceback (most recent call last):
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     output = executor(*args, **kwargs)
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 123, in init_device
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     torch.cuda.set_device(self.device)
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 399, in set_device
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     torch._C._cuda_setDevice(device)
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]   File "/ssdwork/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 279, in _lazy_init
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]     raise RuntimeError(
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
ERROR 08-12 22:51:43 multiproc_worker_utils.py:226]
INFO 08-12 22:51:43 multiproc_worker_utils.py:215] Worker ready; awaiting tasks

2. Solution

After consulting various resources (for example, posts titled "RuntimeError: Cannot re-initialize CUDA in forked subprocess."), it seems quite a few people have hit this problem. I tried calling torch.multiprocessing.set_start_method('spawn') in vllm_worker.py (the entry script), but it had no effect. Another commonly suggested fix is to set num_workers=0, but my script has no such variable, so that was not applicable either.
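The reason the top-level call has no effect is worth spelling out: a library that obtains its own multiprocessing context is not affected by the interpreter-wide default that set_start_method changes. A minimal standard-library illustration (independent of vLLM or torch, POSIX only since it uses the 'fork' context):

```python
import multiprocessing

# A library that calls multiprocessing.get_context('fork') pins its own
# start method, independent of the interpreter-wide default:
ctx = multiprocessing.get_context("fork")

# Changing the global default in the entry script does not touch that context:
multiprocessing.set_start_method("spawn", force=True)

print(ctx.get_start_method())              # still 'fork'
print(multiprocessing.get_start_method())  # 'spawn'
```

This is exactly the situation here: the worker utilities build their own context, so the only way in is to change what that context is created from.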
I eventually discovered that the multiprocessing setup actually happens in /ssdwork/.local/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py. Commenting out these lines in that script:

mp_method = envs.VLLM_WORKER_MULTIPROC_METHOD
mp = multiprocessing.get_context(mp_method)

and adding torch.multiprocessing.set_start_method('spawn') at the top of the file solved the problem.
So when you hit the "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing" error, make sure you locate the script that actually performs the torch multiprocessing and set the start method there; otherwise the change has no effect. This matters especially when you are working inside someone else's framework.
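Note that the commented-out line reads the start method from envs.VLLM_WORKER_MULTIPROC_METHOD, i.e. from the VLLM_WORKER_MULTIPROC_METHOD environment variable. An alternative worth trying, which avoids patching installed library code, is to set that variable before vLLM is loaded:

```python
import os

# Must be set before vLLM is imported, so that
# envs.VLLM_WORKER_MULTIPROC_METHOD evaluates to 'spawn'
# instead of defaulting to 'fork'.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
```

Equivalently, export the variable in the shell before launching, e.g. `VLLM_WORKER_MULTIPROC_METHOD=spawn python3 vllm_worker.py ...`. I have not verified this against every vLLM version, but it targets the same code path as the manual patch above.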
