xinference.api.restful_api KeyError: 'model.embed_tokens.weight'

Launching qwen2 with xinference and selecting 8-bit quantization fails with: KeyError: [address=127.0.0.1:59995, pid=14340] 'model.embed_tokens.weight'

The full error log is as follows:

Traceback (most recent call last):
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\api\restful_api.py", line 878, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\context.py", line 231, in send
    return self._process_result_message(result)
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\pool.py", line 367, in _run_coro
    return await coro
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar\\core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar\\core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar\\core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar\\core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\core\supervisor.py", line 1027, in launch_builtin_model
    await _launch_model()
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\core\supervisor.py", line 991, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\core\supervisor.py", line 970, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar\\core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar\\core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\core\utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\core\worker.py", line 882, in launch_builtin_model
    await model_ref.load()
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\context.py", line 231, in send
    return self._process_result_message(result)
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\backends\pool.py", line 367, in _run_coro
    return await coro
  File "C:\Users\dell\anaconda3\lib\site-packages\xoscar\api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar\\core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar\\core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar\\core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar\\core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\core\model.py", line 300, in load
    self._model.load()
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\model\llm\pytorch\core.py", line 769, in load
    super().load()
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\model\llm\pytorch\core.py", line 310, in load
    ) = load_compress_model(
  File "C:\Users\dell\anaconda3\lib\site-packages\xinference\model\llm\pytorch\compression.py", line 163, in load_compress_model
    model, name, device, value=compressed_state_dict[name]
KeyError: [address=127.0.0.1:59995, pid=14340] 'model.embed_tokens.weight'

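Cause: when a quantization level is specified, xinference can only load weights from .bin files, whereas the qwen2 weights are stored as safetensors files. The quantized loader (load_compress_model in the traceback above) therefore never finds the 'model.embed_tokens.weight' entry in its state dict and raises the KeyError.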
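To confirm that this is the situation on your machine, you can check which weight file formats are present in the local model directory. The following is a minimal sketch using only the standard library; the function names `weight_formats` and `suggested_quantization` are hypothetical helpers written for this note, not part of xinference, and the model directory path is whatever location your model was downloaded to.

```python
from pathlib import Path


def weight_formats(model_dir):
    """Return the set of weight-file extensions found directly in model_dir."""
    found = set()
    for path in Path(model_dir).iterdir():
        if path.is_file() and path.suffix in {".bin", ".safetensors"}:
            found.add(path.suffix)
    return found


def suggested_quantization(model_dir, requested):
    """Fall back to "none" when quantization is requested but no .bin weights exist.

    This encodes the rule described in this post (quantized loading needs
    .bin files); it is an illustration, not xinference's own logic.
    """
    if requested != "none" and ".bin" not in weight_formats(model_dir):
        return "none"
    return requested
```

If `weight_formats` reports only `.safetensors` for the qwen2 directory, the requested quantization cannot be satisfied by the loader described above.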

Solution: when launching qwen2 with xinference, set the quantization option to none.
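The same choice can be made when launching from Python instead of the web UI. This is a sketch under assumptions: the endpoint `http://127.0.0.1:9997`, the model name `qwen2-instruct`, and the size of 7 billion parameters are placeholders to adjust for your setup, and `build_launch_kwargs` is a hypothetical helper. The client call itself is xinference's `Client.launch_model`.

```python
def build_launch_kwargs(model_name="qwen2-instruct", size_in_billions=7):
    """Collect launch arguments with quantization disabled.

    model_name and size_in_billions are assumed values for illustration.
    """
    return {
        "model_name": model_name,
        "model_format": "pytorch",
        "model_size_in_billions": size_in_billions,
        "quantization": "none",  # avoids the quantized (compressed) loading path
    }


def launch(endpoint="http://127.0.0.1:9997"):
    """Launch the model on a running xinference server (endpoint is assumed)."""
    # Imported here so the helper above works without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    # Newer xinference releases may also require a model_engine argument.
    return client.launch_model(**build_launch_kwargs())
```

Calling `launch()` requires a running xinference server and returns the UID of the launched model.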
