Deploying and Querying a Local Large Language Model with Xorbits Inference

ppoojjj

Article link: https://blog.csdn.net/ppoojjj/article/details/140249056

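Introduction

This article shows how to deploy a large language model (LLM) locally with Xorbits Inference (Xinference for short). We use the Llama 2 chat model as the example, but the code applies to any LLM chat model that Xinference supports.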

Overview of Steps

1. Install Xinference
2. Launch a local model
3. Index the data and run a query

Install Xinference

First, install Xinference by running the following command in a terminal window:

pip install "xinference[all]"

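After the installation finishes, restart your Jupyter Notebook. Then run the following command in a new terminal window (recent Xinference releases start the local service with xinference-local; older releases used the bare command shown here):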

xinference

You should see output like the following:

INFO:xinference:Xinference successfully started. Endpoint: http://127.0.0.1:9997
INFO:xinference.core.service:Worker 127.0.0.1:21561 has been added successfully
INFO:xinference.deploy.worker:Xinference worker successfully started.

Find the port number in the endpoint description; in the example above it is 9997.
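If you would rather not copy the port by hand, it can be pulled out of the startup log line. The following is a minimal sketch, assuming the log line contains an "Endpoint: http://host:port" field as in the sample output above; the helper name extract_port is hypothetical and not part of Xinference.

```python
import re
from urllib.parse import urlparse


def extract_port(log_line: str) -> int:
    """Return the port from the 'Endpoint: http://host:port' part of a log line."""
    match = re.search(r"Endpoint:\s*(\S+)", log_line)
    if match is None:
        raise ValueError("no 'Endpoint:' field found in log line")
    port = urlparse(match.group(1)).port
    if port is None:
        raise ValueError("endpoint URL has no explicit port")
    return port


# Sample line copied from the startup output shown above.
log_line = "INFO:xinference:Xinference successfully started. Endpoint: http://127.0.0.1:9997"
port = extract_port(log_line)
print(port)
```

The function raises ValueError instead of guessing when the line has no endpoint or the URL has no explicit port.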

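Next, install the LlamaIndex integration package for Xinference and set the port number:

%pip install llama-index-llms-xinference

port = 9997  # replace with your port number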
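Before moving on, it can help to confirm that something is actually listening on that port. The sketch below uses only the Python standard library; is_port_open is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the listener is Xinference.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


port = 9997  # same value as set above
if is_port_open("127.0.0.1", port):
    print(f"Something is listening on port {port}")
else:
    print(f"Nothing is listening on port {port}; is Xinference running?")
```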

Launch a Local Model

In this step, we import the required libraries and launch the model.

!pip install llama-index  # install LlamaIndex if it is missing

from llama_index.core import (
    SummaryIndex,
    TreeIndex,
    VectorStoreIndex,
    KeywordTableIndex,
    KnowledgeGraphIndex,
    SimpleDirectoryReader
)
from llama_index.llms.xinference import Xinference
from xinference.client import RESTfulClient
from IPython.display import Markdown, display

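Define the client and launch the model:

# Define a client for sending commands to Xinference
client = RESTfulClient(f"http://localhost:{port}")

# Download and launch the model; the first run may take a while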
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_size_in_billions=7,
    model_format="ggmlv3",
    quantization="q2_K",
)

# Initialize the Xinference object to use the LLM
llm = Xinference(
    endpoint=f"http://localhost:{port}",
    model_uid=model_uid,
    temperature=0.0,
    max_tokens=512,
)
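Re-running the launch cell starts the model again each time. One way to avoid that is to look for an already running model first. The sketch below is a pure-Python helper under a stated assumption: that client.list_models() returns a dictionary mapping each model UID to a description dictionary containing a "model_name" key. Verify that shape against your installed Xinference version. The name find_running_model and the sample data are hypothetical.

```python
from typing import Any, Dict, Optional


def find_running_model(running: Dict[str, Dict[str, Any]], model_name: str) -> Optional[str]:
    """Return the UID of the first running model whose 'model_name' matches, else None."""
    for uid, spec in running.items():
        if spec.get("model_name") == model_name:
            return uid
    return None


# Hypothetical listing, shaped like the ASSUMED return value of client.list_models().
running = {"abc-123": {"model_name": "llama-2-chat", "model_format": "ggmlv3"}}
print(find_running_model(running, "llama-2-chat"))
print(find_running_model(running, "some-other-model"))
```

With a live client, the result could be used as model_uid when it is not None, falling back to client.launch_model(...) otherwise.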

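Index the Data and Run a Query

In this step, we combine the model with the data to create a query engine.

# Load the documents from the data directory
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

# Build the index with VectorStoreIndex
index = VectorStoreIndex.from_documents(documents=documents)

# Create the query engine
query_engine = index.as_query_engine(llm=llm)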
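To give a feel for what a vector-index query engine does conceptually, here is a toy sketch of retrieve-then-prompt. It is not LlamaIndex's implementation: the bag-of-words "embedding", the sample chunks, and all function names are assumptions made for illustration, whereas a real index uses a learned embedding model.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], top_k: int = 1) -> List[Tuple[float, str]]:
    """Score every chunk against the question and return the top_k best matches."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:top_k]


# Hypothetical document chunks standing in for the indexed data.
chunks = [
    "the author wrote essays about startups",
    "the weather was cold in winter",
    "after leaving the company the author started painting",
]
question = "what did the author do after leaving"
best = retrieve(question, chunks, top_k=1)

# The retrieved text is placed in the prompt that is sent to the LLM.
prompt = f"Context: {best[0][1]}\nQuestion: {question}"
print(prompt)
```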

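Before asking a question, we can set the temperature and the maximum response length.

# Optional: update the temperature and the maximum response length
llm.__dict__.update({"temperature": 0.0})
llm.__dict__.update({"max_tokens": 2048})

# Ask a question and display the answer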
question = "What did the author do after his time at Y Combinator?"

response = query_engine.query(question)
display(Markdown(f"<b>{response}</b>"))
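The temperature value set before the query controls how sharply the model favors its highest-scoring next token, while max_tokens caps the length of the generated answer. The sketch below illustrates the usual temperature-scaled softmax in plain Python. Treating temperature 0 as greedy selection is a common convention assumed here, not a statement about how any particular Xinference backend handles it, and the logits are made-up numbers.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; temperature <= 0 is treated as greedy (argmax)."""
    if temperature <= 0.0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate, which is why 0.0 is a reasonable choice when you want repeatable answers from a query engine.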

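Possible Errors

Xinference fails to install: Check that your network connection is working, or try installing the package from a PyPI mirror in mainland China.
Model fails to launch: Check that the system has enough resources, especially memory and disk space.
Query engine cannot be created: Confirm that the data path is correct and that the data format meets the requirements.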
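Some of these problems can be caught before launching anything. The following is a minimal preflight sketch using only the standard library: it checks that the data directory exists and contains files, and that some disk space is free. The preflight function and the 10 GB default threshold are arbitrary assumptions for illustration, and memory is not checked because the standard library has no portable way to do so.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(data_dir: str, min_free_gb: float = 10.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    path = Path(data_dir)
    if not path.is_dir():
        problems.append(f"data directory not found: {path}")
    elif not any(p.is_file() for p in path.iterdir()):
        problems.append(f"data directory has no files: {path}")
    # Free space on the disk holding the current working directory.
    free_gb = shutil.disk_usage(Path.cwd()).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free disk space (wanted {min_free_gb} GB)")
    return problems


# Same data path as used in the indexing step.
for problem in preflight("../data/paul_graham"):
    print("WARNING:", problem)
```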
