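Highly recommended: ModelScope now offers free CPU instances and time-limited GPU instances. I installed the xinference framework there and deployed the qwen-1.5 model, reaching about 7 tokens/s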

Installing xinference with pip


# Install xinference
pip3 install xinference

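# Set environment variables: use the Hugging Face mirror, download models from ModelScope,
# and keep xinference data under /mnt/workspace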
export HF_ENDPOINT=https://hf-mirror.com
export XINFERENCE_MODEL_SRC=modelscope
export XINFERENCE_HOME=/mnt/workspace/xinf-data

xinference-local --host 0.0.0.0 --port 9997
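Before launching a model, it can help to confirm that the server is actually listening. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper name (not part of xinference), and the port 9997 is taken from the command above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 9997)` should return `True` once `xinference-local` has finished starting. This only checks that the TCP port is open, not that any model is loaded.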

With the transformers engine, the model launches successfully:

xinference launch --model-engine transformers --model-name qwen1.5-chat --size-in-billions 0_5 --model-format pytorch --quantization none

# Success
Launch model name: qwen1.5-chat with kwargs: {}
Model uid: qwen1.5-chat

Launching the same model with the llama.cpp engine fails with an error, so the llama.cpp library cannot be used here:

xinference launch --model-engine llama.cpp --model-name qwen1.5-chat --size-in-billions 0_5 --model-format ggufv2 --quantization none


Traceback (most recent call last):
  File "/opt/conda/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/deploy/cmdline.py", line 814, in model_launch
    model_uid = client.launch_model(
  File "/opt/conda/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 864, in launch_model
    raise RuntimeError(

Testing the HTTP API

curl -X 'POST' 'http://0.0.0.0:9997/v1/chat/completions' \
-H 'Content-Type: application/json' -d '{
    "model": "qwen1.5-chat",
    "messages": [
        {
            "role": "user",
            "content": "北京景点?"
        }
    ],
    "temperature": 1
}'
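The same request can be sent from Python. This is a sketch using only the standard library, not an official xinference client; `build_chat_request` and `chat` are hypothetical helper names, the address `127.0.0.1` is assumed as the client-side equivalent of `0.0.0.0`, and the response is assumed to follow the OpenAI-style shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumed values mirroring the curl command above; adjust to your deployment.
BASE_URL = "http://127.0.0.1:9997"
MODEL_UID = "qwen1.5-chat"


def build_chat_request(prompt, stream=False, temperature=1.0):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": MODEL_UID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "stream": stream,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send one prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running and the model launched, `print(chat("北京景点?"))` would print the reply to the same prompt used in the curl test.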

curl -X 'POST' 'http://0.0.0.0:9997/v1/chat/completions' \
-H 'Content-Type: application/json' -d '{
    "model": "qwen1.5-chat","stream":true,
    "messages": [
        {
            "role": "user",
            "content": "北京景点?"
        }
    ],
    "temperature": 1
}'
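With `"stream": true`, the reply arrives incrementally rather than as one JSON document. The sketch below shows how such a stream could be reassembled, assuming the OpenAI-style server-sent-events format: lines of the form `data: {...}` carrying `choices[0].delta.content`, optionally ended by `data: [DONE]`. `iter_stream_text` is a hypothetical helper name, and the exact chunk layout emitted by xinference should be checked against real output.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an OpenAI-style server-sent-events stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk may carry only the role, with no content.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

An HTTP response object from `urllib.request.urlopen` can be iterated line by line, so passing it directly as `lines` and printing each fragment as it arrives gives a simple streaming client.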

Speed test and summary

2024-05-29 07:13:45,715 xinference.model.llm.pytorch.utils 3916 INFO Average generation speed: 7.19 tokens/s.
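To put the logged figure in perspective, the speed can be pulled out of the log line and turned into a rough response-time estimate. This is a small sketch; `parse_speed` and `seconds_for` are hypothetical helper names, and the 500-token reply length is only an illustrative assumption.

```python
import re

# Log line copied from the xinference output above.
LOG_LINE = (
    "2024-05-29 07:13:45,715 xinference.model.llm.pytorch.utils 3916 INFO "
    "Average generation speed: 7.19 tokens/s."
)

SPEED_RE = re.compile(r"Average generation speed: ([0-9]+(?:\.[0-9]+)?) tokens/s")


def parse_speed(line):
    """Return the tokens/s value from a log line, or None if it is absent."""
    match = SPEED_RE.search(line)
    return float(match.group(1)) if match else None


def seconds_for(tokens, tokens_per_second):
    """Estimate how long generating `tokens` tokens takes at a given speed."""
    return tokens / tokens_per_second


speed = parse_speed(LOG_LINE)
# At 7.19 tokens/s, a hypothetical 500-token reply takes roughly 70 seconds.
print(f"{speed} tokens/s -> {seconds_for(500, speed):.1f} s for 500 tokens")
```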

Note that an idle instance is deleted and its data is not saved. A single session can run for at most 10 hours.
