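Reranking plays a crucial role in Retrieval-Augmented Generation (RAG). In naive RAG, a large number of contexts may be retrieved, but not all of them are relevant to the question. Reranking reorders and filters the retrieved documents so that the relevant ones come first, which improves the effectiveness of RAG.
This article introduces reranking techniques for RAG and demonstrates how to add reranking using two methods.
1. Introduction to Reranking
As shown in Figure 1, reranking works like an intelligent filter. When the retriever fetches multiple contexts from the indexed collection, their relevance to the user's query can vary: some contexts may be highly relevant (highlighted with red boxes in Figure 1), while others may be only slightly relevant or even irrelevant (highlighted with green and blue boxes in Figure 1).
The task of reranking is to assess the relevance of these contexts and prioritize the ones most likely to yield an accurate, relevant answer. The LLM then gives priority to these top-ranked contexts when generating its answer, which improves the accuracy and quality of the response.
Put simply, reranking is like having help in an open-book exam to pick the most relevant references from a pile of study material, so that you can answer the questions more efficiently and accurately.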
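The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used later in this article: `overlap_score` is a hypothetical stand-in for a real relevance scorer (a reranking model or an LLM), and the sample contexts are made up.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, context: str) -> float:
    """Toy relevance score: fraction of query words that appear in the context.

    Hypothetical stand-in for a real reranking model or LLM.
    """
    query_words = set(query.lower().split())
    context_words = set(context.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & context_words) / len(query_words)


def rerank(
    query: str,
    contexts: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every retrieved context against the query and keep the top_n best."""
    scored = [(context, score_fn(query, context)) for context in contexts]
    # list.sort is stable, so contexts with equal scores keep their retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up retrieval results, in the order the retriever returned them.
retrieved = [
    "the weather is sunny today",
    "tinyllama is a compact language model",
    "a language model predicts the next token",
]
top_contexts = rerank("what is the tinyllama model", retrieved, overlap_score, top_n=2)
for context, score in top_contexts:
    print(f"{score:.2f}  {context}")
```

Only the scoring function changes when moving from this toy to a real reranker; the sort-and-truncate step stays the same.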
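The reranking methods described in this article fall into two main types:
- Reranking models: these models consider the interaction features between the document and the query to assess their relevance more accurately.
- LLMs: the emergence of LLMs has opened up new possibilities for reranking. By thoroughly understanding the entire document and the query, an LLM can capture semantic information more comprehensively.
2. Using a Reranking Model as the Reranker
Unlike an embedding model, a reranking model takes the query and a context as input and directly outputs a similarity score rather than an embedding. Note that reranking models are optimized with a cross-entropy loss, so the relevance score is not limited to a specific range and can even be negative.
At present, there are not many reranking models available. One option is the online model from Cohere [1], which is accessed through an API. There are also open-source models such as bge-reranker-base and bge-reranker-large.
Figure 2 shows evaluation results using the Hit Rate and Mean Reciprocal Rank (MRR) metrics:
From these results we can see that:
- Regardless of which embedding model is used, reranking yields a higher hit rate and MRR, which shows that reranking has a significant impact;
- Currently the best reranking model is Cohere's [1], but it is a paid service. The open-source bge-reranker-large model has capabilities similar to Cohere's;
- The combination of embedding model and reranking model also matters, so developers may need to try different combinations in practice.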
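For readers unfamiliar with the two metrics, here is a minimal sketch of how hit rate and MRR are commonly computed. It assumes exactly one expected document per query; the document IDs are made up, and the evaluation behind Figure 2 may have used a different setup.

```python
from typing import Sequence


def hit_rate(ranked_ids: Sequence[Sequence[str]], expected_ids: Sequence[str]) -> float:
    """Fraction of queries whose expected document appears in the retrieved list."""
    hits = sum(
        1 for ranked, expected in zip(ranked_ids, expected_ids) if expected in ranked
    )
    return hits / len(expected_ids)


def mean_reciprocal_rank(
    ranked_ids: Sequence[Sequence[str]], expected_ids: Sequence[str]
) -> float:
    """Average of 1/rank of the expected document (0 when it was not retrieved)."""
    total = 0.0
    for ranked, expected in zip(ranked_ids, expected_ids):
        if expected in ranked:
            total += 1.0 / (list(ranked).index(expected) + 1)
    return total / len(expected_ids)


# Hypothetical top-3 results for three queries, best match first.
results = [["d1", "d2", "d3"], ["d5", "d4", "d6"], ["d7", "d8", "d9"]]
# The one document that actually answers each query ("d0" is never retrieved).
expected = ["d1", "d4", "d0"]

print("hit rate:", hit_rate(results, expected))
print("MRR:", mean_reciprocal_rank(results, expected))
```

Hit rate only checks whether the expected document was retrieved at all, while MRR also rewards placing it near the top, which is why reranking tends to show up more clearly in MRR.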
3. Demonstration Using the bge-reranker-base Model
3.1 Environment Setup
Import the required libraries and set the environment and global variables:
import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_index.schema import QueryBundle
dir_path = "YOUR_DIR_PATH"
The directory contains a single PDF file, the paper "TinyLlama: An Open-Source Small Language Model" [2].
(py) Florian:~ Florian$ ls /Users/Florian/Downloads/pdf_test/
tinyllama.pdf
3.2 Building a Simple Retriever with LlamaIndex
documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k = 3)
3.3 Basic Retrieval
query = "Can you provide a concise description of the TinyLlama model?"
nodes = retriever.retrieve(query)
for node in nodes:
    print('----------------------------------------------------')
    display_source_node(node, source_length = 500)
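The display_source_node function is adapted from the llama_index source code [3]. It is modified to print plain text instead of rendering Markdown, as follows: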
from llama_index.schema import ImageNode, MetadataMode, NodeWithScore
from llama_index.utils import truncate_text
def display_source_node(
    source_node: NodeWithScore,
    source_length: int = 100,
    show_source_metadata: bool = False,
    metadata_mode: MetadataMode = MetadataMode.NONE,
) -> None:
    """Display source node"""
    source_text_fmt = truncate_text(
        source_node.node.get_content(metadata_mode=metadata_mode).strip(), source_length
    )
    text_md = (
        f"Node ID: {source_node.node.node_id} \n"
        f"Score: {source_node.score} \n"
        f"Text: {source_text_fmt} \n"
    )
    if show_source_metadata:
        text_md += f"Metadata: {source_node.node.metadata} \n"
    if isinstance(source_node.node, ImageNode):
        text_md += "Image:"
    print(text_md)
    # display(Markdown(text_md))
    # if isinstance(source_node.node, ImageNode) and source_node.node.image is not None:
    #     display_image(source_node.node.image)
Below are the results of basic retrieval, that is, the top 3 nodes before reranking:
----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662
Score: 0.8706055408845863
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w...
----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6
Score: 0.8624531691777889
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr...
----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f
Score: 0.8346160605298356
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens...
3.4 Reranking
To rerank the nodes above, we use the bge-reranker-base model:
print('------------------------------------------------------------------------------------------------')
print('Start reranking...')
reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
)
query_bundle = QueryBundle(query_str=query)
ranked_nodes = reranker._postprocess_nodes(nodes, query_bundle = query_bundle)
for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length = 500)
The results after reranking are as follows:
------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6
Score: -1.584416151046753
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr...
----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f
Score: -1.7028117179870605
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens...
----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662
Score: -2.904750347137451
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w...
Clearly, after reranking, the node with ID ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 has moved from rank 2 to rank 1, which means the most relevant context is now placed first.
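The reranked scores are negative because they are raw, unbounded model outputs, so only their order is meaningful. If a bounded score is more convenient (for example, for applying a threshold), one common option is to pass each raw score through a sigmoid. The sketch below is an illustration only and is not part of the reranking code used in this article; since the sigmoid is monotonic, the ranking does not change.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw, unbounded score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Raw reranker scores copied from the output above, in ranked order.
raw_scores = [-1.584416151046753, -1.7028117179870605, -2.904750347137451]
normalized = [sigmoid(score) for score in raw_scores]

for raw, norm in zip(raw_scores, normalized):
    print(f"raw={raw:.4f}  sigmoid={norm:.4f}")
```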
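4. Using an LLM as the Reranker
Existing LLM-based reranking methods fall roughly into three categories: 1) fine-tuning an LLM on reranking tasks; 2) prompting an LLM to rerank; 3) using an LLM for data augmentation during training.
Prompting an LLM to rerank has a relatively low cost. Below is a demonstration using RankGPT [4], which has been integrated into LlamaIndex [5].
The idea of RankGPT is to use an LLM (such as ChatGPT, GPT-4, or another LLM) to perform zero-shot passage reranking. It applies a permutation generation approach and a sliding window strategy to rerank passages efficiently.
As shown in Figure 3, the paper [6] presents three feasible approaches.
The first two are traditional approaches that assign a score to each document and then sort all passages by that score.
The third approach, proposed in the paper, is permutation generation. Specifically, the model does not rely on an external score but ranks the passages directly, end to end. In other words, it uses the LLM's semantic understanding to rank all candidate passages by relevance. However, the number of candidate documents is usually very large, while the LLM's input length is limited, so it is usually impossible to feed in all the text at once.
Therefore, as shown in Figure 4, a sliding window method is introduced, which follows the idea of bubble sort. Each step ranks only the 4 passages inside the window, then the window moves and the next 4 passages are ranked. After iterating over all passages, we obtain the top-ranked, most relevant passages.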
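The sliding window idea can be sketched as follows. This is a minimal illustration and not the RankGPT or LlamaIndex implementation: `toy_llm_rank` is a hypothetical stand-in for the LLM call, the reply format "[2] > [1] > [3]" and the step of 2 are assumptions, and the back-to-front direction follows the RankGPT paper's description of the strategy.

```python
import re
from typing import Callable, List


def parse_permutation(response: str, window_len: int) -> List[int]:
    """Turn a reply such as "[2] > [1] > [3]" into 0-based positions.

    Duplicates and out-of-range numbers are dropped, and any positions the
    reply left out are appended in their original order.
    """
    order: List[int] = []
    for match in re.findall(r"\d+", response):
        position = int(match) - 1
        if 0 <= position < window_len and position not in order:
            order.append(position)
    order.extend(p for p in range(window_len) if p not in order)
    return order


def sliding_window_rerank(
    query: str,
    passages: List[str],
    rank_window: Callable[[str, List[str]], str],
    window_size: int = 4,
    step: int = 2,
) -> List[str]:
    """Rerank passages by ranking overlapping windows from the back to the front."""
    if not 0 < step <= window_size:
        raise ValueError("step must be between 1 and window_size")
    ranked = list(passages)
    if not ranked:
        return ranked
    end = len(ranked)
    while True:
        start = max(0, end - window_size)
        window = ranked[start:end]
        order = parse_permutation(rank_window(query, window), len(window))
        ranked[start:end] = [window[i] for i in order]
        if start == 0:
            break
        # Overlapping windows let a relevant passage keep moving forward.
        end -= step
    return ranked


def toy_llm_rank(query: str, window: List[str]) -> str:
    """Hypothetical stand-in for the LLM: ranks by number of words shared with the query."""
    query_words = set(query.lower().split())
    overlaps = [len(query_words & set(p.lower().split())) for p in window]
    order = sorted(range(len(window)), key=lambda i: overlaps[i], reverse=True)
    return " > ".join(f"[{i + 1}]" for i in order)


# Seven made-up filler passages followed by the only relevant one.
passages = [f"filler passage {i}" for i in range(7)]
passages.append("tinyllama is a compact language model")
result = sliding_window_rerank("describe the tinyllama model", passages, toy_llm_rank)
print(result[0])
```

Because consecutive windows overlap, the relevant passage that starts in last place is carried forward one window at a time, much like an element bubbling up in bubble sort.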
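Note that using RankGPT requires a newer version of LlamaIndex. The version I had installed previously (0.9.29) does not include the code RankGPT needs, so I created a new conda environment with LlamaIndex 0.9.45.post1.
The code is simple: building on the code from the previous section, just set RankGPT as the reranker.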
from llama_index.postprocessor import RankGPTRerank
from llama_index.llms import OpenAI
reranker = RankGPTRerank(
    top_n = 3,
    llm = OpenAI(model="gpt-3.5-turbo-16k"),
    # verbose=True,
)
The overall results are as follows:
(llamaindex_new) Florian:~ Florian$ python /Users/Florian/Documents/rerank.py
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb
Score: 0.8703492815379594
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd
Score: 0.8621633467539512
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr...
----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59
Score: 0.8343984516104476
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens...
------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd
Score: 0.8621633467539512
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg
guangtao_zeng@mymail.sutd.edu.sg
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr...
----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59
Score: 0.8343984516104476
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens...
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb
Score: 0.8703492815379594
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w...
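Note that because an LLM is used, the scores are not updated after reranking. Of course, this does not matter.
The results show that after reranking, the top-1 result is the correct text containing the answer, which is consistent with the result obtained earlier using the reranking model.
5. Evaluation
For evaluation, we use BAAI's bge-reranker-base model, as shown in the following code: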
reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
    use_fp16 = False
)
# or using LLM as reranker
# from llama_index.postprocessor import RankGPTRerank
# from llama_index.llms import OpenAI
# reranker = RankGPTRerank(
#     top_n = 3,
#     llm = OpenAI(model="gpt-3.5-turbo-16k"),
#     # verbose=True,
# )
query_engine = index.as_query_engine( # add reranker to query_engine
    similarity_top_k = 3,
    node_postprocessors=[reranker]
)
# query_engine = index.as_query_engine() # original query_engine
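Reference: https://ai.plainenglish.io/advanced-rag-03-using-ragas-llamaindex-for-rag-evaluation-84756b82dca7
6. Conclusion
Overall, this article introduced the principles of reranking and two mainstream methods.
Of the two, using a reranking model is lightweight and has lower overhead.
Using an LLM, on the other hand, performs well on multiple benchmarks [7] but is more expensive, and it performs well only with ChatGPT and GPT-4; with other open-source models such as FLAN-T5 and Vicuna-13B, its performance is poor.
Therefore, a specific trade-off needs to be made in real projects.
References: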
[1] https://txt.cohere.com/rerank/
[2] https://arxiv.org/pdf/2401.02385.pdf
[3] https://github.com/run-llama/llama_index/blob/v0.9.29/llama_index/response/notebook_utils.py
[4] https://arxiv.org/pdf/2304.09542.pdf
[5] https://github.com/run-llama/llama_index/blob/v0.9.45.post1/llama_index/postprocessor/rankGPT_rerank.py
[6] https://arxiv.org/pdf/2304.09542.pdf
[7] https://arxiv.org/pdf/2304.09542.pdf