RAG 04: Exploring Re-ranking Techniques

Article link: https://blog.csdn.net/DEVELOPERAA/article/details/138791251


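Re-ranking plays a crucial role in the Retrieval Augmented Generation (RAG) pipeline. A naive RAG approach may retrieve a large number of contexts, not all of which are relevant to the question. Re-ranking reorders and filters the retrieved documents, placing the relevant ones first and discarding irrelevant or unimportant ones, which improves the accuracy of the RAG system.

This article introduces re-ranking for RAG systems and demonstrates two mainstream ways of integrating it into a RAG system.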

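01 Introduction to Re-ranking

Figure 1: Re-ranking in RAG. Its task is to evaluate the relevance of the retrieved contexts and prioritize those most likely to help the model give an accurate and relevant response (marked with the red box). Image provided by the original author.

As shown in Figure 1, re-ranking acts like an intelligent filter. When the retriever fetches multiple contexts from the indexed documents or data collection, some of them may be highly relevant to the user's query (the red rectangle in Figure 1), while others may be only weakly relevant or not relevant at all (the green and blue rectangles in Figure 1).

The task of re-ranking is to evaluate the relevance of these contexts and prioritize those most likely to help the model respond accurately and relevantly. The language model can then give priority to the top-ranked contexts when generating its answer, which improves the accuracy and quality of the final response.

Put simply, re-ranking is like having someone help you pick the most relevant references from a pile of study materials during an open-book exam, so that you can answer the questions more efficiently and accurately.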
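The filtering step described above can be sketched in a few lines. This is only a toy illustration: the `rerank` and `word_overlap` helpers and the sample contexts are hypothetical, and the word-overlap score stands in for a real relevance model.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    contexts: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, context) pair, sort by score, keep the top_n."""
    scored = [(ctx, score_fn(query, ctx)) for ctx in contexts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def word_overlap(query: str, context: str) -> float:
    """Toy relevance score: fraction of query words that occur in the context."""
    query_words = set(query.lower().split())
    context_words = set(context.lower().split())
    return len(query_words & context_words) / max(len(query_words), 1)


# Hypothetical retriever output, in the order the retriever returned it.
contexts = [
    "Llama 2 is a family of language models",
    "The weather in Singapore is warm all year",
    "TinyLlama is a compact 1.1B language model",
]

top = rerank("tinyllama language model", contexts, word_overlap, top_n=2)
for text, score in top:
    print(f"{score:.2f}  {text}")
```

The unrelated context is dropped and the most relevant one moves to the front, which is the behaviour a real reranker is expected to provide with a far better scoring function.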

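The re-ranking methods covered in this article fall into two types:

Re-ranking models: these models analyze interaction features between the user's query and each document in order to assess their relevance more accurately.
LLMs: using an LLM to understand the whole document and the user's query in depth makes it possible to capture semantic information more comprehensively.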

02 Using a Re-ranking Model as the Reranker

Unlike an embedding model, a re-ranking model takes the user's query and a context as input and directly outputs a similarity score (how closely the document matches the query) rather than an embedding. Note that re-ranking models are optimized with a cross-entropy loss [1], so the similarity scores are not confined to a fixed range and can even be negative.
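Because the raw scores are unbounded, only their relative order matters for ranking. If a bounded value is more convenient downstream, one option (an assumption here, not something this article relies on) is to pass each raw score through a sigmoid, which maps it into (0, 1) without changing the order. The sketch below uses the three raw scores reported in section 2.4.

```python
import math


def sigmoid(x: float) -> float:
    """Map an unbounded score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Raw reranker scores from the re-ranked output in section 2.4 (all negative).
raw_scores = [-1.584416151046753, -1.7028117179870605, -2.904750347137451]

normalized = [sigmoid(s) for s in raw_scores]
for raw, norm in zip(raw_scores, normalized):
    print(f"raw={raw:.4f}  normalized={norm:.4f}")
```

Since the sigmoid is strictly increasing, the ranking produced by the normalized values is identical to the ranking produced by the raw scores.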

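At present, not many re-ranking models are available. One option is Cohere's [2] hosted model, which is accessed through an API. There are also open-source models such as bge-reranker-base and bge-reranker-large [3].

Figure 2 shows evaluation results measured with Hit Rate (the fraction of queries for which a relevant document appears among the top-k retrieved results) and Mean Reciprocal Rank (MRR; the reciprocal rank, i.e. the reciprocal of the position of the first relevant result, is computed for each query and then averaged over all queries):

Figure 2: Evaluation results using the Hit Rate and Mean Reciprocal Rank (MRR) metrics.

Source: Boosting RAG: Picking Best Embedding & Reranker models (blog.llamaindex.ai/boosting-ra…)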

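These results show that:

Whichever embedding model is used, re-ranking yields a higher hit rate and MRR, which indicates that its impact is significant.
In this evaluation, the best re-ranking model is Cohere's [2], but it is a paid service. The open-source bge-reranker-large model [3] offers comparable capability.
The combination of embedding model and re-ranking model can also affect the performance of a RAG system, so developers may need to try different combinations in practice.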
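For readers who want to reproduce this kind of comparison, the two metrics can be computed as in the following minimal sketch. It assumes one expected document ID per query; the function names and the sample IDs are made up for illustration.

```python
from typing import Sequence


def hit_rate(retrieved: Sequence[Sequence[str]], expected: Sequence[str]) -> float:
    """Fraction of queries whose expected ID appears in the retrieved list."""
    hits = sum(1 for ids, target in zip(retrieved, expected) if target in ids)
    return hits / len(expected)


def mean_reciprocal_rank(
    retrieved: Sequence[Sequence[str]], expected: Sequence[str]
) -> float:
    """Average of 1 / rank of the expected ID (0 when it was not retrieved)."""
    total = 0.0
    for ids, target in zip(retrieved, expected):
        if target in ids:
            total += 1.0 / (list(ids).index(target) + 1)
    return total / len(expected)


# Three hypothetical queries, each with its top-3 retrieved IDs.
retrieved = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
expected = ["a", "e", "z"]  # found at rank 1, found at rank 2, not found

print("hit rate:", hit_rate(retrieved, expected))
print("MRR:", mean_reciprocal_rank(retrieved, expected))
```

A reranker leaves the hit rate unchanged when it only reorders the same top-k list, so in that setting its effect shows up mainly in MRR; the hit rate improves when re-ranking selects a smaller top-n from a larger candidate pool.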

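This article uses the bge-reranker-base model.

2.1 Environment Setup

Import the relevant libraries and set the environment and global variables.

python
import os

# Replace the placeholder with your own key.
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"

# These import paths match the LlamaIndex 0.9.x releases used in this article.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_index.schema import QueryBundle

# Directory that holds the documents to index.
dir_path = "YOUR_DIR_PATH"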

The directory contains a single PDF file, "TinyLlama: An Open-Source Small Language Model" [4].

bash
(py) Florian:~ Florian$ ls /Users/Florian/Downloads/pdf_test/
tinyllama.pdf

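2.2 Building a Simple Retriever with LlamaIndex

python
# Load every file under dir_path and build an in-memory vector index.
documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents)
# Return the 3 most similar nodes for each query.
retriever = index.as_retriever(similarity_top_k=3)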

2.3 Basic Retrieval

python
query = "Can you provide a concise description of the TinyLlama model?"
nodes = retriever.retrieve(query)
for node in nodes:
    print('----------------------------------------------------')
    # display_source_node is defined below.
    display_source_node(node, source_length=500)

The display_source_node function is adapted from the llama_index source code [5]. The original function was designed for Jupyter notebooks, so it is modified as follows:

python
from llama_index.schema import ImageNode, MetadataMode, NodeWithScore
from llama_index.utils import truncate_text

def display_source_node(
    source_node: NodeWithScore,
    source_length: int = 100,
    show_source_metadata: bool = False,
    metadata_mode: MetadataMode = MetadataMode.NONE,
) -> None:
    """Display source node"""
    source_text_fmt = truncate_text(
        source_node.node.get_content(metadata_mode=metadata_mode).strip(), source_length
    )
    text_md = (
        f"Node ID: {source_node.node.node_id} \n"
        f"Score: {source_node.score} \n"
        f"Text: {source_text_fmt} \n"
    )
    if show_source_metadata:
        text_md += f"Metadata: {source_node.node.metadata} \n"
    if isinstance(source_node.node, ImageNode):
        text_md += "Image:"

    # Print plain text instead of rendering Markdown in a notebook.
    print(text_md)
    # display(Markdown(text_md))
    # if isinstance(source_node.node, ImageNode) and source_node.node.image is not None:
    #     display_image(source_node.node.image)

The basic retrieval results are shown below. The output lists the top 3 nodes before re-ranking:

text
----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662 
Score: 0.8706055408845863 
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w... 

----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 
Score: 0.8624531691777889 
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr... 

----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f 
Score: 0.8346160605298356 
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens...

2.4 Re-ranking

Use the bge-reranker-base model to re-rank the nodes above.

python
print('------------------------------------------------------------------------------------------------')
print('Start reranking...')

reranker = FlagEmbeddingReranker(
    top_n=3,
    model="BAAI/bge-reranker-base",
)

query_bundle = QueryBundle(query_str=query)
# Call the public postprocess_nodes method rather than the private _postprocess_nodes.
ranked_nodes = reranker.postprocess_nodes(nodes, query_bundle=query_bundle)
for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length=500)

The results after re-ranking are as follows:

text
------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 
Score: -1.584416151046753 
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr... 

----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f 
Score: -1.7028117179870605 
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens... 

----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662 
Score: -2.904750347137451 
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w...

After re-ranking, the node with ID ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 moves from rank 2 to rank 1, which means that the most relevant context is now placed first.
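When the candidate list is longer than three nodes, comparing the two outputs by eye becomes tedious. A small helper along the following lines can report how each node moved. The `rank_changes` function is hypothetical, and the IDs are the first eight characters of the node IDs printed above.

```python
from typing import List, Tuple


def rank_changes(before: List[str], after: List[str]) -> List[Tuple[str, int, int]]:
    """Return (node_id, old_rank, new_rank) for every node, in the new order."""
    old_rank = {node_id: i for i, node_id in enumerate(before, start=1)}
    return [
        (node_id, old_rank[node_id], new_rank)
        for new_rank, node_id in enumerate(after, start=1)
    ]


# Abbreviated node IDs from the outputs before and after re-ranking.
before = ["438b9d91", "ca4db90f", "e2d97411"]
after = ["ca4db90f", "e2d97411", "438b9d91"]

changes = rank_changes(before, after)
for node_id, old, new in changes:
    print(f"{node_id}: rank {old} -> {new}")
```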

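03 Using an LLM as the Reranker

Existing LLM-based re-ranking methods fall roughly into three categories: fine-tuning an LLM specifically for the re-ranking task, prompting an LLM to re-rank, and using LLM-generated data to augment the training set and thereby improve a re-ranking model.

Prompting an LLM to re-rank has a relatively low cost. The following demonstrates it with RankGPT [6], which has been integrated into LlamaIndex [7].

The idea behind RankGPT is to use an LLM (such as ChatGPT, GPT-4, or another LLM) to re-rank a set of passages zero-shot, that is, without any task-specific training. It uses a permutation generation approach together with a sliding window strategy to re-rank passages efficiently.

As shown in Figure 3, the paper [6] presents three feasible approaches.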

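Figure 3: Three types of instructions for zero-shot passage re-ranking with LLMs. The gray and yellow boxes indicate the model's inputs and outputs. (a) Query generation relies on the LLM's log probability of generating the query given the passage. (b) Relevance generation instructs the LLM to judge the relevance between a passage and the query and to output that judgment. (c) Permutation generation has the LLM produce a ranked list for a group of passages, ordered by relevance to the query.

Source: arxiv.org/pdf/2304.09…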

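The first two approaches are conventional: each passage is given a relevance score, and all passages are then sorted by that score.

The third approach, permutation generation, is the one proposed in the paper [6]. It ranks the passages directly instead of relying on an external score or other auxiliary signal; in other words, it uses the LLM's semantic understanding to rank all candidate passages by relevance.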
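To make permutation generation concrete, the sketch below builds a ranking prompt and parses the model's reply. The prompt wording and both helper functions are hypothetical and are not the code used by RankGPT or LlamaIndex; the sketch only assumes that passages are labelled with numeric identifiers and that the reply is an ordering such as `[2] > [3] > [1]`.

```python
import re
from typing import List


def build_prompt(query: str, passages: List[str]) -> str:
    """Label each passage with [i] and ask for an ordering of the labels."""
    lines = [f"[{i}] {text}" for i, text in enumerate(passages, start=1)]
    return (
        f"Rank the following {len(passages)} passages by relevance to the query.\n"
        + "\n".join(lines)
        + f"\nQuery: {query}\n"
        + "Answer with identifiers only, most relevant first, e.g. [2] > [1]."
    )


def parse_permutation(response: str, num_passages: int) -> List[int]:
    """Turn a reply such as '[2] > [3] > [1]' into 0-based indices."""
    order: List[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        # Ignore identifiers that are out of range or repeated.
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    # Passages the model forgot to mention keep their original relative order.
    missing = [i for i in range(num_passages) if i not in order]
    return order + missing


passages = ["conclusion section", "abstract", "related work"]
print(build_prompt("What is TinyLlama?", passages))
print(parse_permutation("[2] > [3] > [1]", len(passages)))
```

Parsing defensively matters in practice because an LLM may repeat an identifier, omit one, or invent one that does not exist.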

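However, the number of candidate passages is usually large, and an LLM's input length is limited, so it is usually not possible to feed in all the text at once.

Figure 4: Re-ranking 8 passages with a sliding window of size 4 and step 2. The blue boxes mark the first two windows and the yellow box marks the last window. The windows are applied from back to front, so the first 2 passages of the previous window take part in re-ranking the next window.

Source: arxiv.org/pdf/2304.09…

Therefore, as shown in Figure 4, the paper introduces a sliding window method that follows the idea of bubble sort. Only one window of 4 passages is ranked at a time, starting with the last 4 passages; the window then moves 2 passages towards the front and the passages inside it are ranked again. After the window has passed over all passages, the most relevant ones end up at the top of the list.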
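The sliding window pass can be sketched as follows. This is a minimal illustration rather than the RankGPT implementation: `rank_window` stands in for the LLM call that orders one window, and the toy ranker with its `relevance` labels is invented for the example.

```python
from typing import Callable, List


def sliding_window_rerank(
    passages: List[str],
    rank_window: Callable[[List[str]], List[str]],
    window_size: int = 4,
    step: int = 2,
) -> List[str]:
    """Re-rank passages window by window, moving from the back to the front."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    ranked = list(passages)
    end = len(ranked)
    while True:
        start = max(0, end - window_size)
        # Replace the window with its re-ranked version (same passages, new order).
        ranked[start:end] = rank_window(ranked[start:end])
        if start == 0:
            break
        end -= step
    return ranked


# Toy stand-in for the LLM: order a window by hypothetical relevance labels.
relevance = {"p8": 3, "p5": 2}
calls: List[List[str]] = []


def toy_rank_window(window: List[str]) -> List[str]:
    calls.append(list(window))
    return sorted(window, key=lambda p: relevance.get(p, 0), reverse=True)


passages = [f"p{i}" for i in range(1, 9)]
result = sliding_window_rerank(passages, toy_rank_window)
print(result)
print("windows ranked:", len(calls))
```

With 8 passages, a window of 4, and a step of 2, three windows are ranked, as in Figure 4. In this configuration each window hands its top window_size - step = 2 passages on to the next one, so a single pass is enough to bring the two most relevant passages to the front even when they start at the back of the list.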

Note that RankGPT requires a relatively recent version of LlamaIndex. The version I had installed (0.9.29) did not include RankGPT, so I created a new conda environment with LlamaIndex 0.9.45.post1.

The code is simple: building on the code from the previous section, just set RankGPT as the reranker.

python
from llama_index.postprocessor import RankGPTRerank
from llama_index.llms import OpenAI

reranker = RankGPTRerank(
    top_n=3,
    llm=OpenAI(model="gpt-3.5-turbo-16k"),
    # verbose=True,
)

The overall results are as follows:

text
(llamaindex_new) Florian:~ Florian$ python /Users/Florian/Documents/rerank.py 
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb 
Score: 0.8703492815379594 
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w... 

----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd 
Score: 0.8621633467539512 
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr... 

----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59 
Score: 0.8343984516104476 
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens... 

------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd 
Score: 0.8621633467539512 
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr... 

----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59 
Score: 0.8343984516104476 
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens... 

----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb 
Score: 0.8703492815379594 
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w...

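Note that because the LLM only produces an ordering, the relevance scores shown after re-ranking are still the original retrieval scores. This does not matter here.

The results show that, after re-ranking, the top result is the passage that contains the correct answer, which is consistent with the result obtained earlier with the re-ranking model.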

04 Evaluating a RAG System Enhanced with Re-ranking

The evaluation can follow the method described in the previous article of this series (Advanced RAG 03).

The detailed procedure is covered in that article. The modified code is as follows:

python
reranker = FlagEmbeddingReranker(
    top_n=3,
    model="BAAI/bge-reranker-base",
    use_fp16=False,
)

# Or use an LLM as the reranker:
# from llama_index.postprocessor import RankGPTRerank
# from llama_index.llms import OpenAI
# reranker = RankGPTRerank(
#     top_n=3,
#     llm=OpenAI(model="gpt-3.5-turbo-16k"),
#     # verbose=True,
# )

query_engine = index.as_query_engine(  # add the reranker to the query engine
    similarity_top_k=3,
    node_postprocessors=[reranker],
)
# query_engine = index.as_query_engine()  # original query engine

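Interested readers can try it out.

05 Conclusion

This article introduced the principle of re-ranking and two mainstream methods. Using a re-ranking model is lightweight and has low overhead. Using an LLM, on the other hand, performs well on multiple benchmarks [7], but it costs more, and it performs well only with ChatGPT and GPT-4; with open-source models such as FLAN-T5 and Vicuna-13B the performance is unsatisfactory. The choice therefore needs to be made case by case in practical applications.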
