Resolving the mp_api.client.core.client.MPRestError error

1. Problem Description

  Code that uses the following import used to run without problems, but after some time re-running the same code fails with the error shown in the title. This is most likely because the locally installed mp_api package is out of date.

from mp_api.client import MPRester
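
  As a minimal sketch of how the error typically shows up (the API key and the specific query below are placeholders for illustration, and the exact search method names can differ between mp_api versions), a simple query may raise MPRestError when the installed client is outdated:

from mp_api.client import MPRester

# Placeholder API key; replace with your own Materials Project key.
with MPRester("YOUR_API_KEY") as mpr:
    # A simple summary query; with an outdated client this request may fail
    # with mp_api.client.core.client.MPRestError instead of returning results.
    docs = mpr.materials.summary.search(material_ids=["mp-149"])
    print(docs[0].formula_pretty)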

2. Solution

  First, uninstall the existing mp_api package:

pip uninstall mp-api
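
  To check which version is currently installed before removing it, pip can report the package metadata:

pip show mp-api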

  Then install the latest available mp_api package:

pip install mp-api==x.xx.x

where x.xx.x stands for the latest mp_api version number.
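
  If you would rather not look up the exact version number, pip can also upgrade the package to the newest release directly:

pip install --upgrade mp-api

  After installing, you can confirm the client works again by re-running the import, for example (assuming the package exposes __version__; otherwise pip show mp-api also reports the installed version):

python -c "import mp_api; print(mp_api.__version__)"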
