Building RAG: Llama 2, LangChain, and ChromaDB in Practice

lazycatlove

Published 2024-07-28 22:23:16


Article link: https://blog.csdn.net/lazycatlove/article/details/140757589

Overview

This article follows the code in the Kaggle notebook below:

[https://www.kaggle.com/code/gpreda/rag-using-llama-2-langchain-and-chromadb]

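Overall architecture and workflow

1. Load the large language model
2. Configure the model precision (quantization)
3. Build the text-splitting pipeline
4. Vectorize the text
5. Build the question-answering system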

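Code details

Note: if you get an error about the pwd module (Python's pwd module is only available on Unix-like systems), you need to patch the source of the failing package and run pip install pwdpy.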

from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer
from time import time
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma
from langchain_community.document_loaders.csv_loader import CSVLoader

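# This can be the path to a local Llama model
model_id = r'G:\hugging_fase_model\Llama-2-7b-chat-hf'  # raw string, so backslashes are not read as escapes

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'
# Model quantization:
# reduce the bit width of the model weights (for example, from 32-bit floats to 16 bits or fewer)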

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

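Key features and uses of the BitsAndBytesConfig class:

1. Bit-width control: load_in_8bit or load_in_4bit selects how many bits are used to store the model weights, and bnb_4bit_compute_dtype sets the data type used for computation (bfloat16 here). Lower bit widths reduce the memory footprint.
2. Memory optimization: storing weights in fewer bits significantly reduces memory use, which matters most when running large models on resource-constrained devices.
3. Mixed precision: the weights are stored in 4 bits but are dequantized to the compute dtype for matrix multiplications, so storage and computation use different precisions. This keeps memory low while limiting the loss in output quality.
4. Quantization: bnb_4bit_quant_type='nf4' selects the 4-bit NormalFloat (NF4) data type, and bnb_4bit_use_double_quant=True additionally quantizes the quantization constants, which saves a little more memory.
5. Ease of use: BitsAndBytesConfig offers a simple interface for these settings, so you can adjust the precision of a model without dealing with the underlying implementation.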
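
To make the memory savings concrete, here is a rough back-of-the-envelope sketch. It assumes a model with about 7 billion parameters (in line with the 7B checkpoint used here) and counts weight storage only, ignoring quantization constants, activations, and the KV cache. The function name weight_memory_gib is made up for this illustration.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


NUM_PARAMS = 7_000_000_000  # assumption: roughly 7B parameters

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit (nf4)", 4)]:
    print(f"{label:>12}: {weight_memory_gib(NUM_PARAMS, bits):6.2f} GiB")
```

Under these assumptions the weights take roughly 26 GiB in fp32 and about 3.3 GiB in 4 bits, which is why 4-bit loading can make a 7B model fit on a consumer GPU.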

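# Load the model configuration
time_1 = time()

# The configuration object holds the model's architecture and hyperparameters.
# Passing a customized configuration object lets you override defaults stored
# with the pretrained checkpoint.
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,  # model name or local path
    trust_remote_code=True,  # allow custom code shipped with the model to run
    config=model_config,  # model configuration object
    quantization_config=bnb_config,  # quantization settings
    device_map='auto',  # place the model on available devices automatically; `device` can be passed instead
)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)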

time_2 = time()
print(f"Prepare model, tokenizer: {round(time_2-time_1, 3)} sec.")

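transformers.AutoConfig.from_pretrained

This is a method in the Hugging Face Transformers library that loads the configuration of a pretrained model. It is a class method of AutoConfig: it detects the type of the given pretrained model and loads the matching configuration.

from_pretrained greatly simplifies loading a configuration. You pass the model name or the path of a directory containing the model files, and the method locates and loads the right configuration. Its main advantages are:

1. Automatic model type detection: you do not have to specify the model type by hand. from_pretrained identifies it from the model name or path, for example BERT, GPT-2, or T5.
2. Simplified configuration loading: you do not have to find and load the configuration file yourself. from_pretrained locates the file that belongs to the pretrained model and loads it.
3. Compatibility: from_pretrained works with the pretrained models on the Hugging Face Hub that the library supports, as well as with models you trained and saved yourself.
4. Flexibility: besides loading the default configuration, from_pretrained accepts keyword arguments that override individual configuration values, for fine-tuning or other specific needs.

In short, transformers.AutoConfig.from_pretrained lets you work with pretrained models without studying their internals, so you can concentrate on applying and experimenting with them.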
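
The automatic type detection can be pictured roughly as follows. This is a simplified sketch, not the actual Transformers implementation. It assumes the model directory contains a config.json file with a model_type field, which is how Hugging Face checkpoints identify their architecture. The CONFIG_CLASSES registry, the two config classes, and load_config are hypothetical names made up for illustration.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LlamaLikeConfig:  # hypothetical stand-in for a real config class
    hidden_size: int
    num_hidden_layers: int


@dataclass
class Gpt2LikeConfig:  # hypothetical stand-in for a real config class
    n_embd: int
    n_layer: int


# Hypothetical registry mapping the "model_type" field to a config class.
CONFIG_CLASSES = {"llama": LlamaLikeConfig, "gpt2": Gpt2LikeConfig}


def load_config(model_dir: str, **overrides):
    """Read config.json, pick the class from model_type, apply overrides."""
    raw = json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))
    config_cls = CONFIG_CLASSES[raw.pop("model_type")]
    raw.update(overrides)  # caller-supplied values win over stored defaults
    return config_cls(**raw)


# Demo with a throwaway directory standing in for a downloaded checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.json").write_text(
        json.dumps({"model_type": "llama", "hidden_size": 4096, "num_hidden_layers": 32}),
        encoding="utf-8",
    )
    demo_config = load_config(tmp, num_hidden_layers=2)

print(type(demo_config).__name__, demo_config)
```

The caller never names the config class: it is chosen from the stored model_type, and the keyword override replaces one stored value, which mirrors points 1 and 4 above.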

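transformers.AutoModelForCausalLM.from_pretrained

model_id: the identifier of the pretrained model, either a model name on the Hugging Face Hub or a local path containing the model files. from_pretrained uses it to decide which model to load.

trust_remote_code: controls whether custom code shipped with the model repository (such as Python modeling scripts) may be executed. For security, set it to True only when you trust the source of the model.

config: a model configuration object (model_config) holding the model's architecture and hyperparameters. Passing a customized object overrides the defaults stored with the checkpoint. Structural settings such as layer sizes still have to match the pretrained weights.

quantization_config: a quantization configuration object (bnb_config) used to load the model weights in a low-precision representation, which shrinks the model and reduces memory use. Quantization is a common optimization that usually costs little in model quality.

device_map: controls which device the model is loaded onto. With 'auto', the model's layers are placed automatically across the available devices (GPU first, then CPU). You can also name a device such as 'cuda:0' to force the model onto a specific GPU.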

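Build the pipeline

time_1 = time()
# Build the text-generation pipeline
query_pipeline = transformers.pipeline(
        "text-generation",  # task type
        model=model,  # the loaded model
        tokenizer=tokenizer,  # the loaded tokenizer
        torch_dtype=torch.float16,  # tensor data type (this is not the quantization method)
        device_map="auto",)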

time_2 = time()
print(f"Prepare pipeline: {round(time_2-time_1, 3)} sec.")

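transformers.pipeline

A pipeline bundles a pretrained model and its tokenizer so that a specific NLP task can be run on new input with a single call.

1. "text-generation": the task name. transformers.pipeline uses it to decide which kind of pipeline to create. Here, "text-generation" creates a pipeline for text generation tasks such as autocompletion and text continuation.
2. model: the pretrained model object used to generate text. Any model suited to text generation works, such as GPT-2 or GPT-Neo; in this article it is the quantized Llama 2 model.
3. tokenizer: the tokenizer object that converts raw text into the format the model understands (a sequence of tokens) and converts generated token sequences back into readable text.
4. torch_dtype=torch.float16: the PyTorch tensor data type. Half precision reduces memory use and speeds up computation on hardware that supports it. Because the model object passed in here has already been loaded with 4-bit quantization, this argument probably has little effect in this example; it matters mainly when the pipeline loads the model itself from a name or path.
5. device_map="auto": controls where the model is placed. With "auto", the model's layers are distributed over the available devices, using GPUs first and falling back to CPU when GPU memory is insufficient or no GPU is available.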
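
Conceptually, a text-generation pipeline chains three steps: encode the prompt into token ids, let the model extend the id sequence, and decode the ids back into text. The toy sketch below mimics that flow with a whitespace tokenizer and a dummy "model" that simply repeats the last token. ToyTokenizer, toy_model, and toy_pipeline are invented for illustration and are unrelated to the real Transformers classes.

```python
class ToyTokenizer:
    """Whitespace tokenizer with a growing vocabulary (illustration only)."""

    def __init__(self):
        self.token_to_id = {}
        self.id_to_token = []

    def encode(self, text):
        ids = []
        for token in text.split():
            if token not in self.token_to_id:
                self.token_to_id[token] = len(self.id_to_token)
                self.id_to_token.append(token)
            ids.append(self.token_to_id[token])
        return ids

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


def toy_model(ids, max_new_tokens):
    """Dummy 'language model': appends copies of the last token id."""
    return ids + [ids[-1]] * max_new_tokens


def toy_pipeline(prompt, tokenizer, model, max_new_tokens=2):
    ids = tokenizer.encode(prompt)         # text -> token ids
    new_ids = model(ids, max_new_tokens)   # the model extends the sequence
    return tokenizer.decode(new_ids)       # token ids -> text


print(toy_pipeline("hello RAG world", ToyTokenizer(), toy_model))
```

As with the real pipeline, the returned text contains the prompt followed by the newly generated tokens.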

# Test the model: answer without any RAG documents
def test_model(tokenizer, pipeline, prompt_to_test):
    """
    Run a test prompt through the pipeline and print the timing and the generated text.
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    sequences = pipeline(
        prompt_to_test,  # the text prompt
        do_sample=True,  # sample tokens randomly; with False, decoding is deterministic and the same prompt always gives the same output
        top_k=10,  # sample only from the 10 most likely tokens at each step, balancing diversity and coherence
        num_return_sequences=1,  # number of generated sequences to return; 1 returns a single sequence
        eos_token_id=tokenizer.eos_token_id,  # generation stops once the model emits this token, giving the text a clear end
        max_length=200,)  # maximum total length in tokens (prompt plus generated text)
    time_2 = time()
    print(f"Test inference: {round(time_2-time_1, 3)} sec.")
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")
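
The effect of do_sample=True combined with top_k can be sketched in plain Python. This illustrates top-k sampling in general and is not the Transformers implementation; top_k_sample and the toy logits are made up for the example.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    max_logit = max(logits[i] for i in top_ids)
    weights = [math.exp(logits[i] - max_logit) for i in top_ids]
    # Draw one id in proportion to its renormalized probability.
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the demo is repeatable
toy_logits = [2.0, 0.5, 1.5, -1.0, 0.1]  # made-up scores for a 5-token vocabulary
samples = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(20)]
print(samples)  # only ids 0 and 2 can appear: they hold the two largest logits
```

With k=1 the function always returns the single most likely token, which corresponds to deterministic greedy decoding.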

Test the model's output

test_model(tokenizer,
           query_pipeline,
           "Please explain what is the State of the Union address. Give just a definition. Keep it in 100 words."
          )
# tokenizer: the loaded tokenizer
# query_pipeline: the text-generation pipeline built earlier
# "Please ...": the prompt

Result: Please explain what is the State of the Union address. Give just a definition. Keep it in 100 words.

The State of the Union address is an annual speech delivered by the President of the United States to Congress, in which the President reports on the current state of the union, highlights accomplishments, and outlines policy goals and proposals for the upcoming year.

RAG: Retrieval-Augmented Generation

Wrap the Hugging Face pipeline

llm = HuggingFacePipeline(pipeline=query_pipeline)
# Test again with the prompt alone, this time through the LangChain wrapper
llm(prompt="Please explain what is the State of the Union address. Give just a definition. Keep it in 100 words.")

‘\nThe State of the Union address is an annual speech given by the President of the United States to a joint session of Congress, in which the President reports on the current state of the union, highlights key policy initiatives, and rallies support for legislative priorities.’

Load the document: the U.S. President's speech

loader = TextLoader("./president-bidens-state-of-the-union-2023/biden-sotu-2023-planned-official.txt",
                    encoding="utf8")
documents = loader.load()

Split the document

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)

all_splits = text_splitter.split_documents(documents)

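RecursiveCharacterTextSplitter

This splitter is commonly recommended for general text because it adapts well to different inputs. It works from a list of separator characters (by default "\n\n", "\n", " ", and ""). It tries the separators in order, splitting on the first one and moving on to the next only for pieces that are still too large, until the chunks are small enough.

The constructor parameters include:

1. chunk_size: the maximum size of each chunk. It is set to 1000 here, so each chunk holds at most 1000 characters.
2. chunk_overlap: the maximum overlap between consecutive chunks. It is set to 20 here, so consecutive chunks can share at most 20 characters.
3. length_function: the function used to measure chunk length. It is not set in this example, so the default, the built-in len, is used and the length of a chunk is its number of characters.
4. add_start_index: whether to store each chunk's start position in the original document in its metadata. It is not set in this example, so the default (False) applies and no start index is stored.

Calling split_documents on the text_splitter instance with the documents list walks through each document and splits its text into chunks according to the parameters above. The result, all_splits, is a list of chunks.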
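
The recursive strategy can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: it leaves out chunk_overlap and metadata handling, and recursive_split is a made-up name.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split `text` into pieces of at most `chunk_size` characters.

    Separators are tried in order; a finer separator is used only for
    pieces that are still too long. Overlap is omitted for brevity.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate      # the piece still fits: keep merging
            continue
        if current:
            chunks.append(current)   # flush what has been merged so far
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph about jobs.\n\nSecond paragraph about inflation and prices.\n\nThird."
chunks = recursive_split(sample, chunk_size=30)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

In this sample the first and third paragraphs fit within 30 characters and stay whole, while the second paragraph is too long and is split again at spaces.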

Load the embedding model

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

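Features of the all-mpnet-base-v2 model:

1. Sentence embedding model: it maps sentences and short paragraphs to 768-dimensional dense vectors, so texts with similar meaning end up close together in vector space.
2. Built on MPNet: it starts from the pretrained microsoft/mpnet-base Transformer encoder.
3. Large-scale fine-tuning: it was fine-tuned with a contrastive objective on more than 1 billion sentence pairs, which gives it strong general-purpose sentence representations.
4. Moderate size: as a base-size encoder, it can run on a single GPU, or on CPU at lower speed.

all-mpnet-base-v2 is suited to tasks such as semantic search, clustering, and sentence similarity, which is what the retrieval step of RAG needs. It is an encoder that produces embeddings rather than a generative model, so it is not used for text generation or machine translation. Here it is loaded through LangChain's HuggingFaceEmbeddings wrapper, which relies on the sentence-transformers library.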

Build the vector database and ingest the chunks

vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")

For details, see the official Chroma website.

retriever = vectordb.as_retriever()  # create a retriever over the vector store
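
What a vector retriever does can be illustrated with a brute-force search over toy vectors. This is only a sketch of the idea: Chroma uses its own index and distance settings, whereas the code below ranks by cosine similarity in plain Python. The three-dimensional "embeddings", toy_store, and retrieve are made up for the example.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=4):
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_store = [
    ("jobs and unemployment", [0.9, 0.1, 0.0]),
    ("inflation and gas prices", [0.8, 0.3, 0.1]),
    ("democracy and unity", [0.1, 0.9, 0.2]),
    ("troops and security", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]  # pretend embedding of an economy-related question
top_chunks = retrieve(toy_query, toy_store, k=2)
print(top_chunks)
```

The default of k=4 in this sketch mirrors the four documents that the similarity search returns later in this article.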

Build the question-answering system

qa = RetrievalQA.from_chain_type(
    llm=llm,  # the language model
    chain_type="stuff",  # how retrieved documents are passed to the model
    retriever=retriever,  # the retriever
    verbose=True  # print detailed information while running
)

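RetrievalQA.from_chain_type

This is a class method of LangChain's RetrievalQA chain that builds a retrieval-based question-answering system. It combines a large language model (LLM) with a retriever: relevant information is retrieved first, and the answer is then generated from it, which improves the accuracy and coverage of the answers.

The parameters of RetrievalQA.from_chain_type are:

llm: the language model used to generate the answer, here the HuggingFacePipeline wrapper around Llama 2. The retriever first finds documents or passages related to the question, and the model then generates the answer from the retrieved content.
chain_type: how the retrieved documents are combined and passed to the model. In this example, "stuff" means that all retrieved documents are inserted into a single prompt together with the question.
retriever: the retriever object that finds information related to the question in the document collection. It can be a vector-based nearest-neighbor search, as with Chroma here, or a more complex retrieval system.
verbose: controls whether detailed log output is printed. If set to True, extra information is printed while the chain runs, which helps with debugging and with understanding the system's behavior.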

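Values of the chain_type parameter:

stuff: the simplest strategy. All retrieved documents are inserted ("stuffed") into one prompt together with the question, and the model answers in a single call. It works well when the retrieved chunks fit in the model's context window, and the answer quality depends heavily on the relevance of what was retrieved.
map_reduce: the model is first run on each retrieved document separately (map), and the per-document outputs are then combined into a final answer in a further call (reduce). This helps when the documents are too many or too long for one prompt.
refine: the model produces an initial answer from the first document and then refines it step by step with each subsequent document.
map_rerank: the model answers from each document separately and assigns each answer a score; the highest-scoring answer is returned.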
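
The "stuff" strategy amounts to building one prompt from the question and all retrieved chunks. The sketch below shows the idea in plain Python. The prompt wording is loosely modeled on LangChain's default question-answering prompt and should be treated as an assumption, and build_stuff_prompt is a made-up helper, not a LangChain function.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved chunks into one prompt (the 'stuff' idea)."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


retrieved = [  # stand-ins for chunks returned by the retriever
    "Unemployment rate at 3.4%, a 50-year low.",
    "Gas prices are down $1.50 a gallon since their peak.",
]
stuff_prompt = build_stuff_prompt("What is the nation's economic status?", retrieved)
print(stuff_prompt)
```

Because every chunk lands in the same prompt, the total length has to fit in the model's context window. A prompt ending in "Helpful Answer:" may also be why a model sometimes continues with an extra "Unhelpful Answer:" section after its answer.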

Test RAG

def test_rag(qa, query):
    print(f"Query: {query}\n")
    time_1 = time()
    result = qa.run(query)
    time_2 = time()
    print(f"Inference time: {round(time_2-time_1, 3)} sec.")
    print("\nResult: ", result)

query = "What were the main topics in the State of the Union in 2023? Summarize. Keep it under 200 words."
test_rag(qa, query)

Query: What were the main topics in the State of the Union in 2023? Summarize. Keep it under 200 words.

Entering new RetrievalQA chain…

E:\anaconda\Lib\site-packages\langchain_core\_api\deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.

warn_deprecated(

Finished chain.

Inference time: 17.647 sec.

Result: The main topics in the State of the Union in 2023 were the strength of the American people, the competition with China, and the resilience of American democracy. The President emphasized the importance of unity and hope in the face of challenges, and highlighted the country’s progress in creating jobs and recovering from the COVID-19 pandemic. The President also addressed concerns about China’s growing power and the need to protect American interests while working with other countries to advance shared goals. Overall, the speech focused on the potential for American greatness and the need for collective effort to achieve it.
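
The deprecation warning in the output above recommends invoke instead of run. A small helper along the following lines could cover both cases. It is a sketch under the assumption that the chain accepts a "query" key and returns a dictionary with a "result" key, which matches my understanding of RetrievalQA but is worth checking against your installed LangChain version. The ask function and the FakeChain class are invented so that the example runs without LangChain.

```python
def ask(chain, query):
    """Query a chain, preferring `invoke` and falling back to `run`."""
    if hasattr(chain, "invoke"):
        # Assumption: the chain takes a "query" key and returns a dict with "result".
        return chain.invoke({"query": query})["result"]
    return chain.run(query)


class FakeChain:
    """Stand-in for a real chain so the sketch runs without LangChain."""

    def invoke(self, inputs):
        return {"query": inputs["query"], "result": f"echo: {inputs['query']}"}


answer = ask(FakeChain(), "What were the main topics?")
print(answer)
```

In test_rag, the line result = qa.run(query) could then be written as result = ask(qa, query).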

query = "What is the nation economic status? Summarize. Keep it under 200 words."
test_rag(qa, query)

Result: The economic status of the United States is strong, with low unemployment rates, near record low unemployment for Black and Hispanic workers, and fastest growth in 40 years in manufacturing jobs. The inflation rate is coming down, gas prices are down $1.50 a gallon since their peak, and food inflation is coming down. The nation is in a new age of possibilities and is well positioned to lead the world in manufacturing again.

Unhelpful Answer: I don’t know, I’m just an AI, I don’t have access to real-time economic data. I can’t provide an accurate answer to your question.

# Query the vector store directly instead of going through the chain
docs = vectordb.similarity_search(query)  # similarity search

print(f"Query: {query}")
print(f"Retrieved documents: {len(docs)}")  # number of documents returned

Query: What is the nation economic status? Summarize. Keep it under 200 words.

Retrieved documents: 4

print(docs[0])  # inspect the first retrieved document

page_content=‘forward. Of never giving up. A story that is unique among all nations. We are the only country that has emerged from every crisis stronger than when we entered it. That is what we are doing again. Two years ago, our economy was reeling. As I stand here tonight, we have created a record 12 million new jobs, more jobs created in two years than any president has ever created in four years. Two years ago, COVID had shut down our businesses, closed our schools, and robbed us of so much. Today, COVID no longer controls our lives. And two years ago, our democracy faced its greatest threat since the Civil War. Today, though bruised, our democracy remains unbowed and unbroken. As we gather here tonight, we are writing the next chapter in the great American story, a story of progress and resilience. When world leaders ask me to define America, I define our country in one word: Possibilities. You know, we’re often told that Democrats and Republicans can’t work together. But over these past two’ metadata={‘source’: ‘./president-bidens-state-of-the-union-2023/biden-sotu-2023-planned-official.txt’}

for doc in docs:
    # Document objects expose their metadata and text directly
    print("Source: ", doc.metadata['source'])
    print("Text: ", doc.page_content, "\n")

Source: ./president-bidens-state-of-the-union-2023/biden-sotu-2023-planned-official.txt

Text: forward. Of never giving up. A story that is unique among all nations. We are the only country that has emerged from every crisis stronger than when we entered it. That is what we are doing again. Two years ago, our economy was reeling. As I stand here tonight, we have created a record 12 million new jobs, more jobs created in two years than any president has ever created in four years. Two years ago, COVID had shut down our businesses, closed our schools, and robbed us of so much. Today, COVID no longer controls our lives. And two years ago, our democracy faced its greatest threat since the Civil War. Today, though bruised, our democracy remains unbowed and unbroken. As we gather here tonight, we are writing the next chapter in the great American story, a story of progress and resilience. When world leaders ask me to define America, I define our country in one word: Possibilities. You know, we’re often told that Democrats and Republicans can’t work together. But over these past two

Source: ./president-bidens-state-of-the-union-2023/biden-sotu-2023-planned-official.txt

Text: on the state of the union. And here is my report. Because the soul of this nation is strong, because the backbone of this nation is strong, because the people of this nation are strong, the State of the Union is strong. As I stand here tonight, I have never been more optimistic about the future of America. We just have to remember who we are. We are the United States of America and there is nothing, nothingbeyond our capacity if we do it together. May God bless you all. May God protect our troops.

Source: ./president-bidens-state-of-the-union-2023/biden-sotu-2023-planned-official.txt

Text: Americans, we meet tonight at an inflection point. One of those moments that only a few generations ever face, where the decisions we make now will decide the course of this nation and of the world for decades to come. We are not bystanders to history. We are not powerless before the forces that confront us. It is within our power, of We the People. We are facing the test of our time and the time for choosing is at hand. We must be the nation we have always been at our best. Optimistic. Hopeful. Forward-looking. A nation that embraces, light over darkness, hope over fear, unity over division. Stability over chaos. We must see each other not as enemies, but as fellow Americans. We are a good people, the only nation in the world built on an idea. That all of us, every one of us, is created equal in the image of God. A nation that stands as a beacon to the world. A nation in a new age of possibilities. So I have come here to fulfil my constitutional duty to report on the state of the

Source: ./president-bidens-state-of-the-union-2023/biden-sotu-2023-planned-official.txt

Text: about being able to look your kid in the eye and say, “Honey –it’s going to be OK,” and mean it. So, let’s look at the results. Unemployment rate at 3.4%, a 50-year low. Near record low unemployment for Black and Hispanic workers. We’ve already created 800,000 good-paying manufacturing jobs, the fastest growth in 40 years. Where is it written that America can’t lead the world in manufacturing again? For too many decades, we imported products and exported jobs. Now, thanks to all we’ve done, we’re exporting American products and creating American jobs. Inflation has been a global problem because of the pandemic that disrupted supply chains and Putin’s war that disrupted energy and food supplies. But we’re better positioned than any country on Earth. We have more to do, but here at home, inflation is coming down. Here at home, gas prices are down $1.50 a gallon since their peak. Food inflation is coming down. Inflation has fallen every month for the last six months while take home pay