Install Xinference with pip
Highly recommended! ModelScope now offers free CPU and time-limited GPU instances. On one of these I successfully installed the Xinference framework and deployed the Qwen1.5 model at about 7 tokens/s.
# Install Xinference
pip3 install xinference
# Set environment variables: use the HF mirror and pull models from ModelScope
export HF_ENDPOINT=https://hf-mirror.com
export XINFERENCE_MODEL_SRC=modelscope
export XINFERENCE_HOME=/mnt/workspace/xinf-data
# Start the local Xinference server
xinference-local --host 0.0.0.0 --port 9997
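Once the server is up, a quick sanity check is to query its OpenAI-compatible model listing; a minimal sketch, assuming the host and port from the command above:
# Returns a JSON list of currently running models (empty right after startup)
curl http://127.0.0.1:9997/v1/models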
Launch a model:
xinference launch --model-engine transformers --model-name qwen1.5-chat --size-in-billions 0_5 --model-format pytorch --quantization none
# Success
Launch model name: qwen1.5-chat with kwargs: {}
Model uid: qwen1.5-chat
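To confirm the model is actually up, the Xinference CLI can list running models (a quick check, assuming the default local endpoint):
# Shows running models together with their uids
xinference list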
Launching the model with the llama.cpp engine fails, however; the llama.cpp library cannot be used:
xinference launch --model-engine llama.cpp --model-name qwen1.5-chat --size-in-billions 0_5 --model-format ggufv2 --quantization none
Traceback (most recent call last):
  File "/opt/conda/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/deploy/cmdline.py", line 814, in model_launch
    model_uid = client.launch_model(
  File "/opt/conda/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 864, in launch_model
    raise RuntimeError(
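The RuntimeError message is cut off above, so the exact cause is not visible. A plausible explanation (an assumption, not confirmed by the log) is that the llama-cpp-python package, which Xinference's llama.cpp engine depends on, is not installed in this environment:
# Assumption: the llama.cpp engine requires llama-cpp-python
pip3 install llama-cpp-python
Note also that GGUF builds are usually published under a specific quantization tag (e.g. q4_0), so --quantization none may simply not match any available file; that is another possible cause.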
Testing the HTTP API (non-streaming request):
curl -X 'POST' 'http://0.0.0.0:9997/v1/chat/completions' \
-H 'Content-Type: application/json' -d '{
"model": "qwen1.5-chat",
"messages": [
{
"role": "user",
"content": "北京景点?"
}
],
"temperature": 1
}'
A streaming variant of the same request:
curl -X 'POST' 'http://0.0.0.0:9997/v1/chat/completions' \
-H 'Content-Type: application/json' -d '{
"model": "qwen1.5-chat",
"stream": true,
"messages": [
{
"role": "user",
"content": "北京景点?"
}
],
"temperature": 1
}'
Speed test & summary
2024-05-29 07:13:45,715 xinference.model.llm.pytorch.utils 3916 INFO Average generation speed: 7.19 tokens/s.
Caveat: idle ModelScope instances get deleted, and data is not saved! A single session is limited to at most 10 hours!!
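When you are done, the running model can be shut down to free memory; a minimal sketch using the Xinference CLI (the uid comes from the launch output above, assuming the default local endpoint):
# Stop the running model instance
xinference terminate --model-uid qwen1.5-chat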