Introduction to Xinference
https://github.com/xorbitsai/inference
Xorbits Inference (Xinference) is a powerful and versatile distributed inference framework. It can serve all kinds of models: large language models (LLMs), speech recognition models, multimodal models, and more. With Xorbits Inference you can deploy your own model, or any of the built-in state-of-the-art open-source models, with a single command. Whether you are a researcher, a developer, or a data scientist, Xorbits Inference lets you explore what cutting-edge AI models can do.
Here is my docker-compose.yml file. Run docker compose up -d to deploy everything in one step:
services:
  xinference:
    image: xprobe/xinference:latest
    ports:
      - "9997:9997"
    volumes:
      # Replace <xinference_home> with your xinference home path on the host machine
      - "${xinference_home}:/root/.xinference"
      # Replace <huggingface_cache_dir> with your huggingface cache path,
      # default is <home_path>/.cache/huggingface
      - "${huggingface_cache_dir}:/root/.cache/huggingface"
      # If models are downloaded from modelscope, replace <modelscope_cache_dir>
      # with your modelscope cache path, default is <home_path>/.cache/modelscope
      - "${modelscope_cache_dir}:/root/.cache/modelscope"
    environment:
      # Add env vars here. For example, to download models from modelscope:
      - XINFERENCE_MODEL_SRC=modelscope
    command: xinference-local --host 0.0.0.0 --port 9997
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
              count: all
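Once the container is up, you can check that the server is reachable before moving on. Below is a minimal health-check sketch using only the Python standard library; it assumes the default port 9997 from the compose file above, and `xinference_ready` is a helper name I made up (Xinference's REST API does serve GET /v1/models, which is what this probes).

```python
# Minimal health check for a local Xinference server
# (assumes the default port 9997 from the compose file above).
import urllib.error
import urllib.request


def xinference_ready(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if the Xinference REST API answers at /v1/models."""
    try:
        with urllib.request.urlopen(f"{endpoint}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print(xinference_ready("http://localhost:9997"))
```

If this returns False, check docker compose logs xinference before launching any model.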
Common open-source models are available built in. I'm using qwen2.
After launch, find the running model under Running Models.
Click the first button in the Actions column to test it directly.
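Instead of launching the model through the web UI, you can also launch it from the Python client. The sketch below is an assumption-laden example: the model name "qwen2-instruct" and the `launch_model` parameter names reflect a recent Xinference version and may differ on yours, so check the client documentation for your release.

```python
# Launch qwen2 programmatically instead of via the web UI.
# NOTE: the model name and the launch_model parameter names below are
# assumptions for a recent Xinference version -- verify them against
# your installed client before relying on this.


def launch_qwen2(endpoint: str = "http://localhost:9997") -> str:
    # Lazy import so this file can be read/tested without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    # Returns the model UID, which you then pass to client.get_model(...).
    return client.launch_model(
        model_name="qwen2-instruct",
        model_engine="transformers",
        model_size_in_billions=7,
        model_format="pytorch",
    )
```

The returned UID is the same value the web UI shows under Running Models.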
from xinference.client import Client

client = Client("http://localhost:9997")
# Replace MODEL_UID with the UID shown under Running Models
model = client.get_model("MODEL_UID")
Test a chat call:
model.chat(
messages=[{"role": "system", "content": "You are a helpful assistant"}, {"role": "user", "content": "What is the largest animal?"}],
generate_config={"max_tokens": 1024}
)
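Xinference also exposes an OpenAI-compatible REST endpoint, so the same chat request can be made over plain HTTP without the Python client. The sketch below uses only the standard library; "MODEL_UID" is the same placeholder as above, and the payload shape follows the OpenAI chat-completions schema.

```python
# Chat via Xinference's OpenAI-compatible REST API (default port 9997).
import json
import urllib.request


def build_chat_request(model_uid: str, messages: list, max_tokens: int = 1024) -> dict:
    # Payload shape follows the OpenAI chat-completions schema.
    return {"model": model_uid, "messages": messages, "max_tokens": max_tokens}


def chat(endpoint: str, model_uid: str, messages: list) -> dict:
    payload = json.dumps(build_chat_request(model_uid, messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{endpoint}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running server; MODEL_UID is a placeholder):
# chat("http://localhost:9997", "MODEL_UID",
#      [{"role": "user", "content": "What is the largest animal?"}])
```

This also means any OpenAI-compatible SDK can point its base URL at http://localhost:9997/v1 and talk to the same model.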