Deploying a Local LLM with Xinference and Querying Data with LlamaIndex

qq_37836323

Article link: https://blog.csdn.net/qq_29929123/article/details/140678108

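In this article, we show how to deploy a local LLM with Xinference and query data through LlamaIndex. We use the Llama 2 chat model as the example, but the code carries over readily to any LLM chat model that Xinference supports.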

Installing Xinference

First, install Xinference. Run the following command in a terminal window:

pip install "xinference[all]"

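Once installation finishes, restart Jupyter Notebook. Then start Xinference from a new terminal window (recent Xinference releases ship this launcher as `xinference-local`):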

xinference

You should see output similar to the following:

INFO:xinference:Xinference successfully started. Endpoint: http://127.0.0.1:9997
INFO:xinference.core.service:Worker 127.0.0.1:21561 has been added successfully
INFO:xinference.deploy.worker:Xinference worker successfully started.

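Note the endpoint's port number; in the example above it is 9997. Then install the LlamaIndex integration package and set the port number with the following code: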

%pip install llama-index-llms-xinference

port = 9997  # Replace with your endpoint's port number
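
Rather than copying the port by hand, it can also be read from the endpoint URL printed in the startup log. The following is a minimal sketch using only the Python standard library; the helper name `port_from_endpoint` is hypothetical and not part of Xinference.

```python
from urllib.parse import urlparse


def port_from_endpoint(endpoint: str) -> int:
    """Return the TCP port of an endpoint URL such as http://127.0.0.1:9997."""
    parsed = urlparse(endpoint)
    if parsed.port is None:
        raise ValueError(f"no explicit port in endpoint: {endpoint!r}")
    return parsed.port


# Endpoint copied from the startup log shown earlier
port = port_from_endpoint("http://127.0.0.1:9997")
print(port)
```

Raising on a missing port keeps a mistyped endpoint from silently falling back to a default.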

Launching the Local Model

Next, import the required libraries and launch the model:

!pip install llama-index

from llama_index.core import SummaryIndex, TreeIndex, VectorStoreIndex, KeywordTableIndex, KnowledgeGraphIndex, SimpleDirectoryReader
from llama_index.llms.xinference import Xinference
from xinference.client import RESTfulClient
from IPython.display import Markdown, display

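# Define a client to send commands to Xinference
client = RESTfulClient(f"http://localhost:{port}")

# Download and launch the model; this may take a while
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_size_in_billions=7,
    model_format="ggmlv3",
    quantization="q2_K",
)

# Create the LlamaIndex LLM wrapper around the running model
llm = Xinference(
    endpoint=f"http://localhost:{port}",
    model_uid=model_uid,
    temperature=0.0,
    max_tokens=512,
)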
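
Re-running the launch cell starts the model again each time. One way to avoid that is to check whether a model with the same name is already running and reuse its UID. The sketch below rests on an assumption: that `client.list_models()` returns a dict mapping each model UID to a description dict containing a `"model_name"` key. The helper `get_or_launch` is hypothetical, so verify the return shape against your installed Xinference version before relying on it.

```python
from typing import Any


def get_or_launch(client: Any, model_name: str, **launch_kwargs: Any) -> str:
    """Return the UID of a running model named `model_name`, launching it if absent.

    Assumption: `client.list_models()` returns a dict that maps model UID to a
    description dict containing a "model_name" key.
    """
    for uid, description in client.list_models().items():
        if description.get("model_name") == model_name:
            return uid  # already running, so reuse it
    # Nothing matched: fall back to a normal launch with the caller's options
    return client.launch_model(model_name=model_name, **launch_kwargs)
```

With the client defined above, it would be called as `get_or_launch(client, "llama-2-chat", model_size_in_billions=7, model_format="ggmlv3", quantization="q2_K")`.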

Creating an Index and Querying

We now combine the model and the data into a query engine. Different index types give different results; here we use VectorStoreIndex:

# Load the documents from the data directory
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

# Build the index with VectorStoreIndex
index = VectorStoreIndex.from_documents(documents=documents)

# Create the query engine
query_engine = index.as_query_engine(llm=llm)
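
For intuition about what a vector index does at query time: each text chunk is stored as an embedding vector, the question is embedded the same way, and the chunks closest to the question are handed to the LLM. The toy sketch below ranks chunks by cosine similarity. It is an illustration only, not LlamaIndex's implementation; the function names and the three-dimensional vectors are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda name: cosine(query, chunks[name]), reverse=True)
    return ranked[:k]


# Made-up embeddings; a real index obtains these from an embedding model
chunks = {
    "yc_years": [0.9, 0.1, 0.0],
    "painting": [0.1, 0.9, 0.1],
    "lisp_essay": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]
print(top_k(query, chunks))
```
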

# Optionally, update the temperature and the maximum answer length (in tokens)
llm.__dict__.update({"temperature": 0.0})
llm.__dict__.update({"max_tokens": 2048})

# Ask a question and display the answer
question = "What did the author do after his time at Y Combinator?"
response = query_engine.query(question)
display(Markdown(f"<b>{response}</b>"))
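
To see which passages the answer was built from, the response's retrieved chunks can be listed alongside their similarity scores. This is a minimal sketch assuming `response.source_nodes` is a list of objects exposing `.score` and `.node.get_content()`; the helper `summarize_sources` is hypothetical.

```python
from typing import Any


def summarize_sources(response: Any, max_chars: int = 80) -> list[str]:
    """Return one line per retrieved chunk: similarity score plus a text preview.

    Assumption: `response.source_nodes` is a list of objects exposing
    `.score` and `.node.get_content()`.
    """
    lines = []
    for rank, item in enumerate(response.source_nodes, start=1):
        # Collapse whitespace so each preview fits on a single line
        preview = " ".join(item.node.get_content().split())[:max_chars]
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score} {preview}")
    return lines
```

After the query above, `print("\n".join(summarize_sources(response)))` would print one line per retrieved chunk.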