Introduction to Xinference
https://github.com/xorbitsai/inference
Xorbits Inference (Xinference) is a powerful and versatile distributed inference framework. It can serve all kinds of models: large language models (LLMs), speech recognition models, multimodal models, and more. With Xorbits Inference you can deploy your own model, or any of the built-in state-of-the-art open-source models, with a single command. Whether you are a researcher, a developer, or a data scientist, Xorbits Inference lets you explore what cutting-edge AI models can do.
Here is my docker-compose.yml file. Run docker compose up -d to deploy everything in one step:
services:
  xinference:
    image: xprobe/xinference:latest
    ports:
      - "9997:9997"
    volumes:
      # Replace <xinference_home> with your xinference home path on the host machine
      - "${xinference_home}:/root/.xinference"
      # Replace <huggingface_cache_dir> with your huggingface cache path,
      # default is <home_path>/.cache/huggingface
      - "${huggingface_cache_dir}:/root/.cache/huggingface"
      # If models are downloaded from modelscope, replace <modelscope_cache_dir>
      # with your modelscope cache path, default is <home_path>/.cache/modelscope
      - "${modelscope_cache_dir}:/root/.cache/modelscope"
    environment:
      # Add env vars here. For example, to download models from modelscope:
      - XINFERENCE_MODEL_SRC=modelscope
    command: xinference-local --host 0.0.0.0 --port 9997
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
              count: all
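Once the container is up, you can check that the server is reachable before moving on. Below is a minimal health-check sketch using only the Python standard library; it assumes the default port 9997 from the compose file above, and `xinference_ready` is a helper name I made up (Xinference's REST API does serve GET /v1/models, which is what this probes).

```python
# Minimal health check for a local Xinference server
# (assumes the default port 9997 from the compose file above).
import urllib.error
import urllib.request


def xinference_ready(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if the Xinference REST API answers at /v1/models."""
    try:
        with urllib.request.urlopen(f"{endpoint}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print(xinference_ready("http://localhost:9997"))
```

If this returns False, check docker compose logs xinference before launching any model.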
Common open-source models are available built in. I'm using qwen2.
After launch, find the running model under Running Models.
Click the first button in the Actions column to test it directly.
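Instead of launching the model through the web UI, you can also launch it from the Python client. The sketch below is an assumption-laden example: the model name "qwen2-instruct" and the `launch_model` parameter names reflect a recent Xinference version and may differ on yours, so check the client documentation for your release.

```python
# Launch qwen2 programmatically instead of via the web UI.
# NOTE: the model name and the launch_model parameter names below are
# assumptions for a recent Xinference version -- verify them against
# your installed client before relying on this.


def launch_qwen2(endpoint: str = "http://localhost:9997") -> str:
    # Lazy import so this file can be read/tested without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    # Returns the model UID, which you then pass to client.get_model(...).
    return client.launch_model(
        model_name="qwen2-instruct",
        model_engine="transformers",
        model_size_in_billions=7,
        model_format="pytorch",
    )
```

The returned UID is the same value the web UI shows under Running Models.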
from xinference.client import Client

client = Client("http://localhost:9997")
# Replace MODEL_UID with the UID shown under Running Models
model = client.get_model("MODEL_UID")
Test a chat call:
model.chat(
messages=[{"role": "system", "content": "You are a helpful assistant"}, {"role": "user", "content": "What is the largest animal?"}],
generate_config={"max_tokens": 1024}
)
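Xinference also exposes an OpenAI-compatible REST endpoint, so the same chat request can be made over plain HTTP without the Python client. The sketch below uses only the standard library; "MODEL_UID" is the same placeholder as above, and the payload shape follows the OpenAI chat-completions schema.

```python
# Chat via Xinference's OpenAI-compatible REST API (default port 9997).
import json
import urllib.request


def build_chat_request(model_uid: str, messages: list, max_tokens: int = 1024) -> dict:
    # Payload shape follows the OpenAI chat-completions schema.
    return {"model": model_uid, "messages": messages, "max_tokens": max_tokens}


def chat(endpoint: str, model_uid: str, messages: list) -> dict:
    payload = json.dumps(build_chat_request(model_uid, messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{endpoint}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running server; MODEL_UID is a placeholder):
# chat("http://localhost:9997", "MODEL_UID",
#      [{"role": "user", "content": "What is the largest animal?"}])
```

This also means any OpenAI-compatible SDK can point its base URL at http://localhost:9997/v1 and talk to the same model.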