FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research



Introduction

RAG

  • A robust solution to mitigate hallucination issues in LLMs


Motivation

  • Many works are not open-source or have fixed settings in their open-source code, making it difficult to adapt to new data or innovative components.
  • The datasets and retrieval corpus used often vary, with resources being scattered.
  • Due to the complexity of RAG systems, involving multiple steps such as indexing, retrieval, and generation, researchers often need to implement many parts of the system themselves. Although there are some existing RAG toolkits like LangChain and LlamaIndex, they are typically large and cumbersome, hindering researchers from implementing customized processes and failing to address the aforementioned issues.


Contributions

FlashRAG: an open-source library designed to enable researchers to easily reproduce existing RAG methods and develop their own RAG algorithms.


FlashRAG

Component Module

The Component Module encompasses five main components: Judger, Retriever, Reranker, Refiner, and Generator.

  • Judger functions as a preliminary component that assesses whether a query necessitates retrieval.
  • Retriever (see the dense-retrieval sketch after this list):
    • Sparse retrieval: BM25
    • Dense retrieval: BERT-based embedding models such as DPR, E5, and BGE
    • Vector database: FAISS for indexing and similarity search
  • Reranker aims to refine the order of results returned by the retriever to enhance retrieval accuracy.
    • Cross-Encoder models, such as bge-reranker and jina-reranker.
  • Refiner refines the input text for generators to reduce token usage and noise from retrieved documents, improving the final RAG responses.
    • The Extractive Refiner employs an embedding model to extract semantic units, such as sentences or phrases, from the retrieved text that have higher semantic similarity with the query.
    • The Abstractive Refiner utilizes a seq2seq model to directly summarize the retrieved text.
    • Perplexity-based refiners: the LLMLingua Refiner and the Selective-Context Refiner.
  • Generator
    • vLLM & FastChat
    • Hugging Face Transformers
    • OpenAI API
    • Encoder-decoder models
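
To make the dense-retrieval path concrete, below is a minimal sketch that encodes a toy corpus with an E5-style embedding model and searches it with a flat FAISS inner-product index. This is an illustrative example, not FlashRAG's actual Retriever implementation; the toy corpus, the `retrieve` helper, and the use of sentence-transformers are assumptions made for the sketch.

```python
# Minimal dense-retrieval sketch: encode a corpus with a BERT-style embedding
# model and search it with a FAISS inner-product index. Illustrative only; it
# does not reproduce FlashRAG's Retriever class.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain on Earth.",
]

# E5-style models expect "passage: " / "query: " prefixes; other models differ.
encoder = SentenceTransformer("intfloat/e5-base-v2")
doc_emb = encoder.encode(["passage: " + d for d in corpus], normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_emb.shape[1])       # inner product == cosine after L2 normalization
index.add(np.asarray(doc_emb, dtype=np.float32))  # build the flat index

def retrieve(query: str, topk: int = 3):
    q_emb = encoder.encode(["query: " + query], normalize_embeddings=True)
    scores, ids = index.search(np.asarray(q_emb, dtype=np.float32), topk)
    return [(corpus[i], float(s)) for i, s in zip(ids[0], scores[0])]

print(retrieve("Which city is the capital of France?", topk=2))
```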

Pipeline Module

  • Sequential Pipeline implements a linear execution path for the query, formally represented as query -> retriever -> post-retrieval (reranker, refiner) -> generator.
  • Branching Pipeline executes multiple paths in parallel for a single query (often one path per retrieved document) and merges the results from all paths to form the ultimate output.
    • REPLUG: The REPLUG pipeline processes each retrieved document in parallel and combines the generation probabilities from all documents to produce the final answer (see the sketch after this list).
    • SuRe: The SuRe pipeline generates a candidate answer from each retrieved document and then ranks all candidate answers.
  • Conditional Pipeline utilizes a judger to direct the query into different execution paths based on the judgement outcome.
  • Loop Pipeline involves complex interactions between retrieval and generation processes, often encompassing multiple cycles of retrieval and generation.
    • Iterative
    • Self-Ask
    • Self-RAG
    • FLARE
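
To make the branching idea concrete, here is a simplified sketch of REPLUG-style ensembling: the next-token distribution is computed once per retrieved document, and the per-document distributions are averaged with weights given by the softmax-normalized retrieval scores. It illustrates the idea only and is not FlashRAG's or REPLUG's actual code; the GPT-2 stand-in model, the example documents, and the retrieval scores are placeholders.

```python
# REPLUG-style branching sketch: score the next token once per retrieved
# document and combine the per-document distributions, weighted by the
# softmax-normalized retrieval scores. Simplified illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for the actual generator (e.g. LLAMA3-8B-instruct)
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

query = "Question: What is the capital of France?\nAnswer:"
docs = ["Paris is the capital of France.", "France is a country in Europe."]
retrieval_scores = torch.tensor([0.9, 0.4])

weights = torch.softmax(retrieval_scores, dim=0)      # normalize retrieval scores
mixed_probs = None
with torch.no_grad():
    for w, doc in zip(weights, docs):
        prompt = f"Context: {doc}\n{query}"           # one branch per document
        ids = tok(prompt, return_tensors="pt").input_ids
        logits = model(ids).logits[0, -1]             # next-token logits for this branch
        probs = torch.softmax(logits, dim=-1)
        mixed_probs = w * probs if mixed_probs is None else mixed_probs + w * probs

print(tok.decode([mixed_probs.argmax().item()]))      # most likely next token under the ensemble
```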

Datasets and Corpus

  • Datasets: 32 pre-processed benchmark datasets
  • Corpus: Wikipedia passages and MS MARCO passages


Evaluation

  • Retrieval-aspect metrics: recall@k, precision@k, F1@k, and mean average precision (MAP)
  • Generation-aspect metrics: token-level F1 score, exact match, accuracy, BLEU, and ROUGE-L (exact match and token-level F1 are sketched after this list)
  • Custom evaluation metrics are also accommodated.
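
Exact match and token-level F1, the two generation-aspect metrics used in the experiments below, can be computed as in the following simplified sketch (lowercasing and punctuation stripping only; full implementations such as the SQuAD script also remove articles).

```python
# Simplified generation-aspect metrics: exact match and token-level F1.
# Real implementations also strip articles ("a", "an", "the"); this sketch
# keeps only lowercasing and punctuation removal for brevity.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", text).strip()

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)   # per-token overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))                      # 1.0
print(round(token_f1("in Paris, France", "Paris"), 2))    # 0.5
```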

Experimental Result and Discussion

Experimental Setup

  • Generator Model: LLAMA3-8B-instruct
  • Retriever Model: E5-base-v2
  • Retrieval Corpus: Wikipedia data from December 2018
  • Max Input Length for Generator Model: 4096
  • Documents Retrieved per Query: 5
  • Default Prompt: "Answer the question based on the given document. Only give me the answer and do not output any other words. The following are given documents:{retrieval documents}" (assembled as sketched after this list)
  • Experimental Environment: 8 NVIDIA A100 GPUs
  • Datasets: Natural Questions (NQ), TriviaQA, HotpotQA, 2WikiMultihopQA, PopQA, WebQuestions
  • Evaluation Metrics: exact match for NQ, TriviaQA, WebQuestions; token-level F1 for HotpotQA, 2WikiMultihopQA, PopQA
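
For reference, the default prompt above can be assembled from the retrieved documents roughly as follows. The instruction text is taken verbatim from the setup; the numbering and joining of documents and the final question line are assumptions made for illustration.

```python
# Assemble the default prompt from the retrieved documents. The instruction
# text comes from the setup above; how documents are numbered/joined and how
# the question is appended are assumptions made for this sketch.
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    doc_block = "\n".join(f"Doc {i + 1}: {doc}" for i, doc in enumerate(retrieved_docs))
    system = (
        "Answer the question based on the given document. "
        "Only give me the answer and do not output any other words. "
        f"The following are given documents:{doc_block}"
    )
    return f"{system}\n\nQuestion: {question}"

print(build_prompt("Who wrote Hamlet?", ["Hamlet is a tragedy by William Shakespeare."]))
```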

Methods and Main results

Methods are categorized based on the RAG component they primarily focus on optimizing.

  • retriever
  • refiner
  • generator and its related decoding methods
  • judger
  • entire RAG flow, including multiple retrievals and generation processes.


Impact of Retrieval on RAG

  • Existing research often employs a fixed retriever and a fixed number of retrieved documents.

  • Main Findings:
    • Figure (left):
      • Overall performance is optimal when the number of retrieved documents is 3 or 5.
      • Both an excessive and an insufficient number of retrieved documents lead to a significant decrease in performance, with drops of up to 40% for both dense and sparse retrieval methods.
      • Additionally, when the number of retrieved documents is large, the results of the three retrievers of differing quality converge. In contrast, for the top-1 results there is a substantial gap between the dense methods (E5, BGE) and BM25, indicating that the fewer documents retrieved, the greater the impact of retriever quality on the final result.
    • Figure (right):
      • On most datasets, using the top-3 or top-5 retrieved results yields the best performance, suggesting that this represents a good balance between the quality of retrieved documents and noise.

Take away

FlashRAG is a Python toolkit for the reproduction and development of Retrieval-Augmented Generation (RAG) research. The toolkit includes 32 pre-processed benchmark RAG datasets and 14 state-of-the-art RAG algorithms.
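
As a starting point, a minimal usage sketch in the style of the toolkit's quickstart is shown below. The module paths, class names, and config keys (Config, get_dataset, SequentialPipeline, data_dir, dataset_name, retrieval_topk) are reproduced from memory and should be verified against the official FlashRAG repository (RUC-NLPIR/FlashRAG); treat them as illustrative rather than guaranteed.

```python
# Minimal usage sketch in the style of FlashRAG's quickstart. Verify the module
# paths, class names, and config keys against the official repository; they are
# illustrative assumptions here, not a guaranteed API.
from flashrag.config import Config
from flashrag.utils import get_dataset
from flashrag.pipeline import SequentialPipeline

config = Config(config_dict={
    "data_dir": "dataset/",       # directory with the pre-processed benchmark datasets
    "dataset_name": "nq",         # e.g. Natural Questions
    "retrieval_topk": 5,          # documents retrieved per query
})

test_split = get_dataset(config)["test"]

pipeline = SequentialPipeline(config)   # query -> retriever -> refiner -> generator
result = pipeline.run(test_split, do_eval=True)
```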

