Centos上用Xinference部署大模型！

Q794469

已于 2024-05-24 19:13:57 修改

阅读量1k

点赞数 2

分类专栏： Cpu/Gpu部署/分布式部署大模型文章标签： python 服务器 AIGC

于 2024-05-23 10:37:13 首次发布

本文链接：https://blog.csdn.net/Q794469/article/details/139125910

版权

Cpu/Gpu部署/分布式部署大模型专栏收录该内容

1 篇文章 0 订阅

订阅专栏

创建单独环境：

conda create -n xinference python=3.10

安装前需要配置gcc：

centos:

yum install centos-release-scl && yum install devtoolset-11-gcc* && scl enable devtoolset-11 bash && gcc -v

ubuntu:

sudo apt update
sudo apt install build-essential

可能会报错：

Reading package lists... Done Building dependency tree Reading state information... Done build-essential is already the newest version (12.8ubuntu1.1). 0 upgraded, 0 newly installed, 0 to remove and 248 not upgraded.

就是没有更新APT库
更新一下就好了

sudo apt-get update
sudo apt-get upgrade

或将python版本降低至3.8.

Xinference 安装(下载时间很长耐心等待)：

pip install "xinference[all]"

安装成功！

启动服务：

XINFERENCE_HOME=/home/xuhui/tmp/xinference：

通过配置环境变量来更改模型下载目录

XINFERENCE_MODEL_SRC=modelscope：

设置从从 modelscope 下载模型因为默认从 Hugging Face 下载模型服务器没有设置代理的话会失败！

xinference-local：

启动命令

--host 0.0.0.0 --port 9997 ：

设置本地ip及端口号

XINFERENCE_HOME=/home/xuhui/tmp/xinference XINFERENCE_MODEL_SRC=modelscope xinference-local --host 0.0.0.0 --port 9997

服务启动成功！

另开端口执行模型下载启动命令：

xinference launch --model-name chatglm3 --size-in-billions 6 --model-format ggmlv3 --quantization q4_0 --model-engine llama.cpp

开始下载模型！

下载完成！

调用测试成功！

from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
model = client.get_model("chatglm3")
print(model.generate(
    prompt="What is the largest animal?",
    generate_config={
      "max_tokens": 512,
      "temperature": 0.7
    }
))