RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

路人与大师

Article link: https://blog.csdn.net/weixin_41046245/article/details/139766751

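Introduction

Welcome to the RAPTOR tutorial! RAPTOR is a novel approach to retrieval-augmented language models: it builds a recursive tree structure from your documents so that retrieval is efficient and context-aware. This tutorial shows how to install, use, and extend RAPTOR.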
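To make the idea of a recursive tree concrete, here is a minimal sketch of building one bottom-up: text chunks become leaves, groups of nodes are summarized into parents, and the process repeats until one root remains. This is not RAPTOR's implementation. The names `Node`, `toy_summarize`, and `build_tree` are hypothetical, nodes are grouped by position rather than by any similarity measure, and the summary is plain truncation standing in for a language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    text: str
    level: int
    children: List["Node"] = field(default_factory=list)


def toy_summarize(texts: List[str], max_words: int = 12) -> str:
    # Stand-in for a model-generated summary: keep the first few words.
    words = " ".join(texts).split()
    return " ".join(words[:max_words])


def build_tree(
    chunks: List[str],
    group_size: int = 2,
    summarize: Callable[[List[str]], str] = toy_summarize,
) -> Node:
    if not chunks:
        raise ValueError("need at least one chunk")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = 0
    nodes = [Node(text=c, level=level) for c in chunks]
    # Each pass summarizes groups of nodes into parents one level higher.
    while len(nodes) > 1:
        level += 1
        parents = []
        for i in range(0, len(nodes), group_size):
            group = nodes[i:i + group_size]
            parents.append(Node(summarize([n.text for n in group]), level, group))
        nodes = parents
    return nodes[0]
```

Calling `build_tree(["a", "b", "c", "d"])` yields a root at level 2 with two level-1 children, each covering two leaves.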

Installation

Before using RAPTOR, make sure Python 3.8 or later is installed. Then clone the RAPTOR repository and install the required dependencies:

git clone https://github.com/parthsarthi03/raptor.git
cd raptor
pip install -r requirements.txt

Basic Usage

Setting Up RAPTOR

First, set your OpenAI API key and initialize RAPTOR with its default configuration:

import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

from raptor import RetrievalAugmentation

# Initialize RAPTOR with the default configuration
RA = RetrievalAugmentation()
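Hard-coding the key in a script is convenient for a demo, but in shared code it is usually safer to export the variable in your shell and fail early if it is missing. The helper below is a small sketch of that check; `require_api_key` is a hypothetical name and not part of RAPTOR.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "OPENAI_API_KEY",
) -> str:
    # Read the key from the environment instead of embedding it in source code.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting RAPTOR")
    return key
```

Calling `require_api_key()` before creating `RetrievalAugmentation()` turns a missing key into a clear error at startup rather than a failure at the first API call.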

Adding Documents to the Tree

Add your text document to RAPTOR for indexing:

with open('sample.txt', 'r', encoding='utf-8') as file:
    text = file.read()
RA.add_documents(text)
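The example above passes one string to `add_documents`. If your corpus is spread over several files, one option is to join them into a single string first. The helper below is a sketch under that assumption; `load_corpus` is a hypothetical name, and the tutorial does not say whether `add_documents` can be called more than once.

```python
from pathlib import Path
from typing import Iterable


def load_corpus(paths: Iterable[str], separator: str = "\n\n") -> str:
    # Read each file as UTF-8, skip empty ones, and join with a blank line.
    parts = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8").strip()
        if text:
            parts.append(text)
    return separator.join(parts)
```

The result can then be passed along as `RA.add_documents(load_corpus(["a.txt", "b.txt"]))`, where the file names are placeholders.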

Answering Questions

You can now use RAPTOR to answer questions based on the indexed documents:

question = "How did Cinderella reach her happy ending?"
answer = RA.answer_question(question=question)
print("Answer: ", answer)
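Before an answer can be generated, the question has to be matched against the indexed text. The sketch below illustrates that retrieval step in the simplest possible form: every node text is scored against the question by cosine similarity and the best matches are returned as context. It is an illustration only, not RAPTOR's code. The functions `bow`, `cosine`, and `retrieve` are hypothetical, and word counts stand in for a real embedding model.

```python
import math
import re
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    # Bag-of-words vector: lowercase alphanumeric tokens and their counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, node_texts: List[str], top_k: int = 2) -> List[str]:
    # Rank all node texts by similarity to the question and keep the best top_k.
    q = bow(question)
    ranked = sorted(node_texts, key=lambda t: cosine(q, bow(t)), reverse=True)
    return ranked[:top_k]
```

The retrieved texts would then be handed to a QA model as the context for the question.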

Saving and Loading the Tree

Save the constructed tree to a specified path:

SAVE_PATH = "demo/cinderella"
RA.save(SAVE_PATH)

Load the saved tree back into RAPTOR:

RA = RetrievalAugmentation(tree=SAVE_PATH)
answer = RA.answer_question(question=question)
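A save path such as "demo/cinderella" only works if the parent directory exists. The sketch below shows a save and load round trip that creates the directory first. It assumes the object is stored as a single pickled file, which is an assumption made here: the tutorial does not state the on-disk format. `save_object` and `load_object` are hypothetical helpers, not RAPTOR functions.

```python
import pickle
from pathlib import Path
from typing import Any


def save_object(obj: Any, path: str) -> Path:
    # Create missing parent directories, then write the object with pickle.
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wb") as fh:
        pickle.dump(obj, fh)
    return target


def load_object(path: str) -> Any:
    # Only unpickle files you created yourself; pickle can execute code on load.
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

In practice, creating the directory with `Path(SAVE_PATH).parent.mkdir(parents=True, exist_ok=True)` before calling `RA.save(SAVE_PATH)` may be all that is needed.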

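Extending RAPTOR

RAPTOR is designed to be flexible: you can plug in any model for summarization, question answering (QA), and embedding generation. The sections below show how to extend RAPTOR with your own models.

Custom Summarization Model

To use a different language model for summarization, extend the BaseSummarizationModel class and implement the summarize method with your custom summarization logic: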

from raptor import BaseSummarizationModel

class CustomSummarizationModel(BaseSummarizationModel):
    def __init__(self):
        # Initialize your model
        pass

    def summarize(self, context, max_tokens=150):
        # Implement your summarization logic
        summary = "Your summary here"
        return summary
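As a slightly more concrete version of the template above, the sketch below fills in `summarize` with a simple extractive rule: keep leading sentences until a word budget is reached. To stay runnable without RAPTOR installed, the class is written standalone. In real use it would subclass BaseSummarizationModel as shown above. `FirstSentencesSummarizer` is a hypothetical name, and counting whitespace-separated words is only a rough stand-in for model tokens.

```python
import re


class FirstSentencesSummarizer:
    """Extractive stand-in for a summarization model (illustrative only)."""

    def summarize(self, context: str, max_tokens: int = 150) -> str:
        # Split on sentence-ending punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", context.strip())
        picked, used = [], 0
        for sentence in sentences:
            n = len(sentence.split())
            # Always keep the first sentence; stop once the budget is exceeded.
            if picked and used + n > max_tokens:
                break
            picked.append(sentence)
            used += n
        # Hard cap in case the first sentence alone exceeds the budget.
        return " ".join(" ".join(picked).split()[:max_tokens])
```

A model-backed summarizer would keep the same method signature and replace the body with a call to the model.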

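Custom QA Model

For a custom QA model, extend the BaseQAModel class and implement the answer_question method. The method should return the best answer your model finds for the given context and question: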

from raptor import BaseQAModel

class CustomQAModel(BaseQAModel):
    def __init__(self):
        # Initialize your model
        pass

    def answer_question(self, context, question):
        # Implement your QA logic
        answer = "Your answer here"
        return answer
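To show what a filled-in `answer_question` could look like, the sketch below returns the context sentence that shares the most words with the question. It is a toy extractive baseline, written standalone so it runs without RAPTOR. A real implementation would subclass BaseQAModel as shown above and call a language model. `OverlapQAModel` and `_tokens` are hypothetical names.

```python
import re


def _tokens(text: str) -> set:
    # Lowercase alphanumeric tokens, used for a crude word-overlap score.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class OverlapQAModel:
    """Toy QA model: picks the context sentence closest to the question."""

    def answer_question(self, context: str, question: str) -> str:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", context.strip()) if s]
        if not sentences:
            return ""
        q = _tokens(question)
        # max() keeps the first sentence among ties, so the result is deterministic.
        return max(sentences, key=lambda s: len(q & _tokens(s)))
```

The two parameters match the template above: `context` is the retrieved text and `question` is the user's query.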

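Custom Embedding Model

To use a different embedding model, extend the BaseEmbeddingModel class and implement the create_embedding method, which should return a vector representation of the input text: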

from raptor import BaseEmbeddingModel

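class CustomEmbeddingModel(BaseEmbeddingModel):
    def __init__(self, embedding_dim=768):
        # Initialize your model; 768 is only a placeholder, use your model's output size
        self.embedding_dim = embedding_dim

    def create_embedding(self, text):
        # Implement your embedding logic
        embedding = [0.0] * self.embedding_dim  # Replace with the real embedding computation
        return embedding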
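For testing the plumbing without downloading a model, a deterministic embedding can be handy. The sketch below hashes each token into one of a fixed number of buckets and normalizes the result, so equal texts always map to the same fixed-length vector. It is written standalone and is not a RAPTOR class. In real use it would subclass BaseEmbeddingModel as shown above. `HashingEmbedder` is a hypothetical name, and hashed word counts carry far less meaning than learned embeddings.

```python
import hashlib
import math
import re
from typing import List


class HashingEmbedder:
    """Deterministic toy embedding based on hashed token counts."""

    def __init__(self, embedding_dim: int = 64):
        if embedding_dim <= 0:
            raise ValueError("embedding_dim must be positive")
        self.embedding_dim = embedding_dim

    def create_embedding(self, text: str) -> List[float]:
        vec = [0.0] * self.embedding_dim
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # Map each token to a stable bucket index via SHA-256.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.embedding_dim
            vec[index] += 1.0
        # L2-normalize so dot products behave like cosine similarity.
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec
```

Text with no tokens produces an all-zero vector, which callers may want to handle explicitly.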

Integrating Custom Models with RAPTOR

Once your custom models are implemented, integrate them into RAPTOR:

from raptor import RetrievalAugmentation, RetrievalAugmentationConfig

# Initialize the custom models
custom_summarizer = CustomSummarizationModel()
custom_qa = CustomQAModel()
custom_embedding = CustomEmbeddingModel()

# Create a configuration that uses the custom models
custom_config = RetrievalAugmentationConfig(
    summarization_model=custom_summarizer,
    qa_model=custom_qa,
    embedding_model=custom_embedding
)

# Initialize RAPTOR with the custom configuration
RA = RetrievalAugmentation(config=custom_config)
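Before passing custom models into the configuration, it can help to confirm that each one exposes the method named in the sections above. The check below is a sketch of such a pre-flight test. `check_models` and `REQUIRED_METHODS` are hypothetical and not part of RAPTOR, which may perform its own validation.

```python
REQUIRED_METHODS = {
    "summarization_model": "summarize",
    "qa_model": "answer_question",
    "embedding_model": "create_embedding",
}


def check_models(**models):
    # Return a list of problems; an empty list means every role looks usable.
    problems = []
    for role, method in REQUIRED_METHODS.items():
        model = models.get(role)
        if model is None:
            problems.append(f"{role}: missing")
        elif not callable(getattr(model, method, None)):
            problems.append(f"{role}: no callable {method}()")
    return problems
```

The keyword names mirror the configuration arguments above, so the same three objects can be passed to `check_models(...)` first and to `RetrievalAugmentationConfig(...)` afterwards.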

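See demo.ipynb for examples of specifying your own summarization/QA models (such as Llama, Mistral, or Gemma) and embedding models (such as SBERT) for use with RAPTOR.

Note: More examples and configuration options for RAPTOR are coming soon. Advanced usage and additional features will be covered in future updates to the documentation and the repository.

We hope this tutorial helps you get started with RAPTOR and use it for efficient information retrieval and processing. If you have questions or suggestions, please open an issue in the RAPTOR GitHub repository.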
