RAG in Practice with LLMs (Part 31) | Exploring Re-ranking in RAG


Re-ranking plays a crucial role in the Retrieval-Augmented Generation (RAG) process. In a naive RAG approach, a large number of contexts may be retrieved, but not all of them are relevant to the question. Re-ranking allows the documents to be reordered and filtered so that the relevant ones come first, thereby improving the effectiveness of RAG.

This article introduces re-ranking techniques for RAG and demonstrates two ways to incorporate re-ranking.

1. Introduction to Re-ranking

As shown in Figure 1, the task of re-ranking is like an intelligent filter. When the retriever fetches multiple contexts from the indexed collection, those contexts may vary in how relevant they are to the user's query: some may be highly relevant (highlighted with red boxes in Figure 1), while others may be only slightly relevant or even irrelevant (highlighted with green and blue boxes in Figure 1).

The task of re-ranking is to assess the relevance of these contexts and prioritize the ones most likely to provide accurate and relevant answers, so that the LLM gives precedence to these top-ranked contexts when generating its answer, improving the accuracy and quality of the response.

Simply put, re-ranking is like helping you pick the most relevant references from a pile of study materials during an open-book exam, so that you can answer the questions more efficiently and accurately.

The re-ranking approaches described in this article fall into two main categories:

  • Re-ranking models: these models take the interaction features between the document and the query into account, allowing them to assess relevance more accurately.
  • LLMs: the emergence of LLMs has opened up new possibilities for re-ranking. By thoroughly understanding the entire document and the query, an LLM can capture semantic information more comprehensively.

2. Using a Re-ranking Model as the Re-ranker

Unlike embedding models, a re-ranking model takes the query and a context as input and directly outputs a similarity score rather than an embedding. Note that re-ranking models are optimized with a cross-entropy loss, so the relevance scores are not confined to a particular range and may even be negative.
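As a minimal illustration of this interface (a sketch rather than the pipeline used later in this article), the FlagEmbedding package behind the bge reranker models can score a query/passage pair directly; the query and passage below are made up for illustration:

# Minimal sketch: scoring a (query, passage) pair with a bge reranker.
# Assumes the FlagEmbedding package is installed (pip install FlagEmbedding).
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-base", use_fp16=False)

query = "Can you provide a concise description of the TinyLlama model?"
passage = "We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens."

# compute_score returns a raw relevance logit; because the model is trained with a
# cross-entropy objective, the score is unbounded and may be negative.
score = reranker.compute_score([query, passage])
print(score)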

Currently, not many re-ranking models are available. One option is Cohere's online model [1], which is accessed via an API. There are also open-source models such as bge-reranker-base and bge-reranker-large.
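For Cohere, one way to plug it in is through LlamaIndex's CohereRerank postprocessor. The sketch below is an assumption: verify the import path against your installed LlamaIndex version, and note that a Cohere API key is required.

# Hypothetical sketch: using Cohere's hosted re-ranker via LlamaIndex.
# Assumes: pip install cohere, and that this LlamaIndex version ships CohereRerank.
import os
from llama_index.postprocessor.cohere_rerank import CohereRerank

cohere_reranker = CohereRerank(
    api_key=os.environ["COHERE_API_KEY"],  # assumed environment variable
    top_n=3,
)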

Evaluation results in terms of hit rate and Mean Reciprocal Rank (MRR) are shown in Figure 2 below:

From these evaluation results we can see that:

  • Regardless of which embedding model is used, re-ranking yields a higher hit rate and MRR, which shows that re-ranking has a significant impact;
  • Currently, the best re-ranking model is Cohere [1], but it is a paid service. The open-source bge-reranker-large model offers capabilities comparable to Cohere's;
  • The combination of embedding model and re-ranking model also matters, so developers may need to try different combinations in practice.

3. Demonstration with the bge-reranker-base Model

3.1 Environment Setup

Import the relevant libraries, and set up the environment and global variables:

import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_index.schema import QueryBundle

dir_path = "YOUR_DIR_PATH"

There is only one PDF file in the directory: the paper "TinyLlama: An Open-Source Small Language Model" [2].

(py) Florian:~ Florian$ ls /Users/Florian/Downloads/pdf_test/tinyllama.pdf

3.2 Building a Simple Retriever with LlamaIndex

documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k = 3)

3.3 Basic Retrieval

query = "Can you provide a concise description of the TinyLlama model?"nodes = retriever.retrieve(query)for node in nodes:    print('----------------------------------------------------')    display_source_node(node, source_length = 500)

The display_source_node function is adapted from the llama_index source code [3], with the following modifications:

from llama_index.schema import ImageNode, MetadataMode, NodeWithScore
from llama_index.utils import truncate_text

def display_source_node(
    source_node: NodeWithScore,
    source_length: int = 100,
    show_source_metadata: bool = False,
    metadata_mode: MetadataMode = MetadataMode.NONE,
) -> None:
    """Display source node"""
    source_text_fmt = truncate_text(
        source_node.node.get_content(metadata_mode=metadata_mode).strip(), source_length
    )
    text_md = (
        f"Node ID: {source_node.node.node_id} \n"
        f"Score: {source_node.score} \n"
        f"Text: {source_text_fmt} \n"
    )
    if show_source_metadata:
        text_md += f"Metadata: {source_node.node.metadata} \n"
    if isinstance(source_node.node, ImageNode):
        text_md += "Image:"
    print(text_md)
    # display(Markdown(text_md))
    # if isinstance(source_node.node, ImageNode) and source_node.node.image is not None:
    #     display_image(source_node.node.image)

Below are the results of basic retrieval, i.e. the top 3 nodes before re-ranking:

----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662
Score: 0.8706055408845863
Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...
----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6
Score: 0.8624531691777889
Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr...
----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f
Score: 0.8346160605298356
Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...

3.4 Re-ranking

To re-rank the nodes above, we use the bge-reranker-base model here:

print('------------------------------------------------------------------------------------------------')
print('Start reranking...')

reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
)

query_bundle = QueryBundle(query_str=query)
ranked_nodes = reranker._postprocess_nodes(nodes, query_bundle = query_bundle)
for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length = 500)

The results after re-ranking are as follows:

------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6
Score: -1.584416151046753
Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr...
----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f
Score: -1.7028117179870605
Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...
----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662
Score: -2.904750347137451
Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...

Clearly, after re-ranking, the node with ID ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 has moved up from rank 2 to rank 1, which means that the most relevant context is now ranked first.

4. Using an LLM as the Re-ranker

Existing LLM-based re-ranking approaches can be roughly divided into three categories: 1) fine-tuning the LLM on a re-ranking task; 2) prompting the LLM to re-rank; and 3) using the LLM for data augmentation during training.

Prompting the LLM to re-rank is the lower-cost approach. Below is a demonstration using RankGPT [4], which has been integrated into LlamaIndex [5].

The idea behind RankGPT is to perform zero-shot passage re-ranking with an LLM (such as ChatGPT, GPT-4, or another LLM). It applies a permutation-generation approach and a sliding-window strategy to re-rank passages efficiently.

As shown in Figure 3, the paper [6] proposes three feasible approaches.

The first two are conventional approaches: each document is assigned a score, and all passages are then sorted by that score.

The paper proposes the third approach, permutation generation. Specifically, instead of relying on an external score, the model ranks the passages directly, end to end. In other words, it uses the LLM's semantic understanding directly to rank all candidate passages by relevance. However, the number of candidate documents is usually very large while the LLM's input is limited, so it is usually impossible to feed in all of the text at once.

Therefore, as shown in Figure 4, a sliding-window method is introduced, which follows the idea of bubble sort. Only a few texts (e.g. 4) are ranked at a time; the window is then moved and the next group of texts is ranked. After iterating over all of the texts, the best-performing texts end up at the top.
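To make the sliding-window idea more concrete, here is a minimal, hypothetical sketch (not RankGPT's actual code): llm_rank_window stands in for a prompt that asks the LLM to return a window's passages as a permutation ordered by relevance, and the window slides from the back of the candidate list to the front so that the most relevant passages bubble toward the top.

from typing import Callable, List

def sliding_window_rerank(
    query: str,
    passages: List[str],
    llm_rank_window: Callable[[str, List[str]], List[int]],
    window_size: int = 4,
    step: int = 2,
) -> List[str]:
    """Re-rank passages by repeatedly asking an LLM to permute a small window.

    The window slides from the end of the candidate list toward the beginning,
    so passages judged most relevant gradually bubble up to the front
    (the same intuition as bubble sort).
    """
    ranked = list(passages)
    end = len(ranked)
    while end > 0:
        start = max(0, end - window_size)
        window = ranked[start:end]
        # llm_rank_window is assumed to return the window's indices
        # ordered from most to least relevant for the given query.
        order = llm_rank_window(query, window)
        ranked[start:end] = [window[i] for i in order]
        end -= step
    return ranked

In RankGPT itself, the window size and step are hyperparameters, and the permutation is parsed from the LLM's text output; the sketch above only illustrates the control flow.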

Note that a newer version of LlamaIndex is required to use RankGPT. The version I had previously installed (0.9.29) does not include the code RankGPT needs, so I created a new conda environment with LlamaIndex version 0.9.45.post1.

The code is straightforward: building on the code from the previous section, simply set RankGPT as the re-ranker.

from llama_index.postprocessor import RankGPTRerank
from llama_index.llms import OpenAI

reranker = RankGPTRerank(
    top_n = 3,
    llm = OpenAI(model="gpt-3.5-turbo-16k"),
    # verbose=True,
)

The overall result is as follows:

(llamaindex_new) Florian:~ Florian$ python /Users/Florian/Documents/rerank.py
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb
Score: 0.8703492815379594
Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd
Score: 0.8621633467539512
Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr...
----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59
Score: 0.8343984516104476
Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...
------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd
Score: 0.8621633467539512
Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr...
----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59
Score: 0.8343984516104476
Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb
Score: 0.8703492815379594
Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...

Note that because an LLM is used for re-ranking, the scores are not updated after re-ranking. Of course, this does not matter here.

From the results we can see that, after re-ranking, the top-1 result is the correct text containing the answer, which is consistent with the result obtained earlier with the re-ranking model.

5. Evaluation

For evaluation, use BAAI's bge-reranker-base model, as shown in the code below:

reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
    use_fp16 = False
)

# or using LLM as reranker
# from llama_index.postprocessor import RankGPTRerank
# from llama_index.llms import OpenAI
# reranker = RankGPTRerank(
#     top_n = 3,
#     llm = OpenAI(model="gpt-3.5-turbo-16k"),
#     # verbose=True,
# )

query_engine = index.as_query_engine(    # add reranker to query_engine
    similarity_top_k = 3,
    node_postprocessors=[reranker]
)
# query_engine = index.as_query_engine()  # original query_engine
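Once the re-ranker is attached as a node postprocessor, the query engine is used in the usual way; a minimal usage sketch, reusing the earlier query:

# Run a query through the engine; retrieved nodes are re-ranked before answer synthesis.
response = query_engine.query("Can you provide a concise description of the TinyLlama model?")
print(response)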

Reference: https://ai.plainenglish.io/advanced-rag-03-using-ragas-llamaindex-for-rag-evaluation-84756b82dca7

6. Conclusion

Overall, this article has introduced the principles of re-ranking and two mainstream approaches.

Of these, the approach based on a re-ranking model is lightweight and has low overhead.

The LLM-based approach, on the other hand, performs well on several benchmarks [7], but it is more expensive, and it only performs well with ChatGPT and GPT-4; its performance drops with other open-source models such as FLAN-T5 and Vicuna-13B.

Specific trade-offs therefore need to be made in real-world projects.

References:

[1] https://txt.cohere.com/rerank/

[2] https://arxiv.org/pdf/2401.02385.pdf

[3] https://github.com/run-llama/llama_index/blob/v0.9.29/llama_index/response/notebook_utils.py

[4] https://arxiv.org/pdf/2304.09542.pdf

[5] https://github.com/run-llama/llama_index/blob/v0.9.45.post1/llama_index/postprocessor/rankGPT_rerank.py

[6] https://arxiv.org/pdf/2304.09542.pdf

[7] https://arxiv.org/pdf/2304.09542.pdf
