Rebirth: my account is a lost cause, so let me teach you to hand-roll a budget RAG!

2301_80318532

Last modified: 2024-08-19 00:03:55


Tags: artificial intelligence

First published: 2024-08-18 23:09:01

Article link: https://blog.csdn.net/2301_80318532/article/details/141307012


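Project name: Budget RAG chatbot with image analysis

Report date: August 18, 2024

Project lead: prefers to remain anonymous

Project overview:

As AI technology advances, conversational assistants and image analysis have become core components in many applications, from customer service to social media content analysis, and demand for them keeps growing. This project builds a bot that combines a retrieval-augmented chat assistant with image analysis, with the goal of improving both user interaction and information processing.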

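Technical approach and implementation steps

1. Model selection:

This project uses NV-Embed-QA as the embedding model. LLaMA-3.1-405b-Instruct has a very large parameter count, which gives it strong language understanding and generation, and instruction tuning makes it efficient at carrying out tasks, so it performs well on complex language tasks; that is why it was chosen for the RAG chat assistant. (Note that the code in this post actually calls mistralai/mixtral-8x7b-instruct-v0.1 for the Q&A chain and meta/llama3-70b-instruct for the chart agent.)

Phi-3-Vision-128k-Instruct is a multimodal model focused on vision tasks. It combines natural language processing with computer vision and is mainly used for tasks that involve both images and text. Instruction tuning improves its ability to carry out complex tasks, which makes it suitable for applications that need to combine visual and textual information. It is therefore the basis for the multimodal part of this project.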

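2. Building the data: vectors are stored with FAISS.

FAISS (Facebook AI Similarity Search) is an open-source library for efficient similarity search and clustering, and it is especially well suited to large collections of vectors. The main advantages of storing vectors in FAISS are:
1. Efficient similarity search: FAISS is designed for fast similarity search over very large datasets of high-dimensional vectors. It gets its speed from several index structures (IVF, PQ, HNSW and others).
2. Multiple index types: you can pick the index that fits your needs and trade search speed against memory use. Quantization-based indexes, for example, cut storage significantly.
3. GPU acceleration: FAISS can run on GPUs and use parallel computation to speed up indexing and querying, which matters most for large datasets.
4. Flexible, easy-to-use API: the Python API integrates easily with other Python tools and frameworks.
5. Memory optimization: techniques such as vector compression and index compression reduce memory use, so larger datasets fit in resource-constrained environments.
6. Open source with community support: FAISS is an open-source project with a fairly active community, so it keeps receiving updates and new features.
7. Broad applicability: FAISS is used in image retrieval, recommender systems, natural language processing and many other areas.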
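
To make "similarity search" concrete, here is a minimal pure-Python sketch of the exact, brute-force nearest-neighbour search that a flat index performs conceptually. This is not FAISS code: the names `l2_distance` and `search_flat` and the toy vectors are made up for illustration. FAISS's approximate indexes (IVF, PQ, HNSW) exist precisely to avoid scanning every stored vector like this.

```python
import math
from typing import List, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_flat(index: List[List[float]], query: List[float], k: int) -> List[Tuple[int, float]]:
    """Return the k stored vectors closest to the query as (position, distance) pairs."""
    scored = [(i, l2_distance(vec, query)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
vectors = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
]
hits = search_flat(vectors, query=[1.0, 0.0, 0.0], k=2)
print(hits)  # positions 1 and 2 are the closest to the query
```

The cost of this scan grows linearly with the number of stored vectors, which is fine for a handful of web pages and is the reason to look at the other index types once the collection gets large.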

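Putting the features together:

The chat assistant and the image analysis are combined in a single Gradio interface.

Implementation steps:

Environment setup:

1. Install Miniconda.

2. Once it is installed, open Anaconda Powershell.

3. In the terminal that opens, run the following steps to configure the environment:

Create a Python 3.8 virtual environment: conda create --name ai_endpoint python=3.8
Activate the virtual environment: conda activate ai_endpoint
Install the nvidia_ai_endpoint tools: pip install langchain-nvidia-ai-endpoints
Install Jupyter Lab: pip install jupyterlab
Install langchain_core: pip install langchain_core
Install langchain: pip install langchain
Install matplotlib: pip install matplotlib
Install NumPy: pip install numpy
Install faiss (use the CPU build if you have no GPU): pip install faiss-cpu==1.7.2
Install the OpenAI library: pip install openai
Install the other packages the code below imports: pip install requests beautifulsoup4 pillow gradio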
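
Before moving on, it can save time to confirm that everything is importable from the new environment. The snippet below is a small sketch using only the standard library; `REQUIRED_MODULES` and `check_modules` are names I made up, and the list contains import names (not pip names), including the modules the later code imports (requests, bs4, PIL, gradio).

```python
import importlib.util
from typing import Dict, List

# Import names (not pip names) of the packages this post relies on.
REQUIRED_MODULES = [
    "langchain_nvidia_ai_endpoints",
    "langchain",
    "langchain_core",
    "matplotlib",
    "numpy",
    "faiss",
    "openai",
    "requests",
    "bs4",
    "PIL",
    "gradio",
]


def check_modules(names: List[str]) -> Dict[str, bool]:
    """Map each module name to True if it can be found in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in check_modules(REQUIRED_MODULES).items():
    print(f"{name:32s} {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING can be installed with pip inside the activated ai_endpoint environment.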

Code:

First, import the packages:

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain.chains import ConversationalRetrievalChain, LLMChain, ConversationChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableLambda
from langchain.schema.runnable.passthrough import RunnableAssign
from langchain_core.runnables import RunnableBranch
from langchain_core.runnables import RunnablePassthrough

import base64
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

The code below makes sure a valid NVIDIA API key is set in your environment:

import getpass
import os

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

import re
from typing import List, Union

import requests
from bs4 import BeautifulSoup

def html_document_loader(url: Union[str, bytes]) -> str:
    """
    Loads the HTML content of a document from a given URL and returns its text content.

    Args:
        url: The URL of the document, as a str or as bytes.

    Returns:
        The plain text of the document, or "" if the request or the parsing fails.
    """
    try:
        # A timeout keeps an unresponsive server from hanging the loader
        response = requests.get(url, timeout=30)
        html_content = response.text
    except Exception as e:
        print(f"Failed to load {url} due to exception {e}")
        return ""

    try:
        # Create a Beautiful Soup object to parse the HTML
        soup = BeautifulSoup(html_content, "html.parser")

        # Remove script and style tags
        for script in soup(["script", "style"]):
            script.extract()

        # Get the plain text of the HTML document
        text = soup.get_text()

        # Collapse runs of whitespace and newlines into single spaces
        text = re.sub(r"\s+", " ", text).strip()

        return text
    except Exception as e:
        print(f"Exception {e} while loading document")
        return ""
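
If you want to try the cleaning logic without a network connection or BeautifulSoup, the same idea (drop script and style content, then collapse whitespace) can be sketched with the standard library's `html.parser`. This is an illustrative stand-in, not the loader used in this project; `TextExtractor`, `html_to_text` and the sample HTML are made up.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style element
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


sample = """
<html><head><style>p { color: red; }</style></head>
<body><p>Hello,   RAG</p><script>alert("x")</script>
<p>world</p></body></html>
"""
print(html_to_text(sample))  # Hello, RAG world
```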

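The next piece of code loads content from the given web pages, splits it into suitably sized chunks, then generates and stores the embeddings. The stored embeddings can then be used for a range of natural language processing tasks: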

def create_embeddings(embedding_path: str = "./embed"):
    # embedding_path is the directory where the FAISS index is stored.
    # It defaults to "./embed" when the caller does not pass a value.
    print(f"Storing embeddings to {embedding_path}")

    # List of web pages to index
    urls = [
         "Put the URLs of the data or documents you want to store as vectors here",
    ]

    # Load each page with html_document_loader
    documents = []
    for url in urls:
        document = html_document_loader(url)
        documents.append(document)

    # Split the text into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=0,
        length_function=len,
    )
    texts = text_splitter.create_documents(documents)
    index_docs(url, text_splitter, texts, embedding_path)
    print("Generated embedding successfully")

from langchain_core.documents import Document

def index_docs(url: Union[str, bytes], splitter, documents: List[Document], dest_embed_dir) -> None:
    """
    Split the documents into chunks and create embeddings for them

    Args:
        url: Source url for the document.
        splitter: Splitter used to split the document
        documents: list of Document objects whose embeddings need to be created
        dest_embed_dir: destination directory for embeddings

    Returns:
        None
    """
    # Call the "NV-Embed-QA" embedding model hosted on NIM through NVIDIAEmbeddings
    embeddings = NVIDIAEmbeddings(model="NV-Embed-QA")

    for document in documents:
        texts = splitter.split_text(document.page_content)

        # One metadata entry per chunk, taken from the cleaned source document
        metadatas = [document.metadata] * len(texts)

        # Create the embeddings and store them with FAISS
        if os.path.exists(dest_embed_dir):
            update = FAISS.load_local(folder_path=dest_embed_dir, embeddings=embeddings, allow_dangerous_deserialization=True)
            update.add_texts(texts, metadatas=metadatas)
            update.save_local(folder_path=dest_embed_dir)
        else:
            docsearch = FAISS.from_texts(texts, embedding=embeddings, metadatas=metadatas)
            docsearch.save_local(folder_path=dest_embed_dir)
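
To see what chunk_size and chunk_overlap control, here is a deliberately simplified character-window chunker. It is only a sketch: RecursiveCharacterTextSplitter is smarter than this (it tries to split on separators such as paragraphs and sentences first), and `chunk_text` is a name I made up.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghij"
print(chunk_text(text, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(chunk_text(text, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With chunk_overlap=0, as in the code above, a sentence that straddles a chunk boundary is cut in two; a small overlap keeps some shared context at the cost of storing a few more vectors.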

Next comes the multimodal part:

As before, run the code below and then enter the API key for the model you chose:

import getpass
import os

if not os.environ.get("phi_API_KEY", "").startswith("nvapi-"):
    phi_key = getpass.getpass("Enter your API key: ")
    assert phi_key.startswith("nvapi-"), f"{phi_key[:5]}... is not a valid key"
    # Store the key that was just entered (the original stored nvapi_key by mistake)
    os.environ["phi_API_KEY"] = phi_key

Define a few helpers for the image analysis:

import re

# Save the table produced while the langchain pipeline runs into a global variable
def save_table_to_global(x):
    global table
    if 'TABLE' in x.content:
        table = x.content.split('TABLE', 1)[1].split('END_TABLE')[0]
    return x

# Helper function for debugging
def print_and_return(x):
    print(x)
    return x

# Strip comments and explanatory text from the model's output, keeping only the Python code
def extract_python_code(text):
    pattern = r'```python\s*(.*?)\s*```'
    matches = re.findall(pattern, text, re.DOTALL)
    return [match.strip() for match in matches]

# Execute the code generated by the model
# (this runs untrusted, model-generated code, so only use it in a sandboxed environment)
def execute_and_return(x):
    code = extract_python_code(x.content)[0]
    try:
        # exec() always returns None; the code is run for its side effects
        exec(str(code))
    except Exception as e:
        print(f"The code is not executable ({e}), don't give up, try again!")
    return x

# Encode an image file as base64 so it can be passed to the model
def image2b64(image_file):
    with open(image_file, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
        return image_b64
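
To check these helpers without calling a model, you can run the same parsing logic on a hand-written reply. The reply text below is invented, and `FENCE` is just three backticks built at runtime so the sample can contain a code fence; the regex and the split are the ones used in extract_python_code and save_table_to_global.

```python
import base64
import re

FENCE = "`" * 3  # three backticks

# A made-up model reply that follows the TABLE / END_TABLE and code-fence conventions
reply = (
    "Here is the data.\n"
    "TABLE\n| year | sales |\n| 2023 | 10 |\nEND_TABLE\n"
    "And the plot:\n"
    f"{FENCE}python\nprint('plotting')\n{FENCE}\n"
)

# Same regex as extract_python_code
pattern = FENCE + r"python\s*(.*?)\s*" + FENCE
code_blocks = [m.strip() for m in re.findall(pattern, reply, re.DOTALL)]

# Same split logic as save_table_to_global
table_text = reply.split("TABLE", 1)[1].split("END_TABLE")[0]

print(code_blocks)        # ["print('plotting')"]
print(table_text.strip())

# Base64 round trip, as image2b64 does for the bytes of an image file
raw = b"\x89PNG fake image bytes"
encoded = base64.b64encode(raw).decode()
print(base64.b64decode(encoded) == raw)  # True
```

One thing this makes visible: extract_python_code returns an empty list when the reply has no Python fence, so indexing it with [0] only works because the pipeline checks for the fence first.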

The next function implements the image analysis:

def chart_agent_gr(image_file, user_input, table=None):
    # Gradio passes the path of the uploaded file; encode it as base64 for the model
    image_b64 = image2b64(image_file)
    # Chart reading Runnable
    chart_reading = ChatNVIDIA(model="microsoft/phi-3-vision-128k-instruct")
    chart_reading_prompt = ChatPromptTemplate.from_template(
        'Generate underlying data table of the figure below, : <img src="data:image/png;base64,{image_b64}" />'
    )
    chart_chain = chart_reading_prompt | chart_reading

    instruct_chat = ChatNVIDIA(model="meta/llama3-70b-instruct")  # accepts instructions in Chinese

    instruct_prompt = ChatPromptTemplate.from_template(
        "Do NOT repeat my requirements already stated. Based on this table {table}, {input}" \
        "If has table string, start with 'TABLE', end with 'END_TABLE'." \
        "If has code, start with '```python' and end with '```'." \
        "Do NOT include table inside code, and vice versa."
    )
    instruct_chain = instruct_prompt | instruct_chat

    # Read the chart only when no table has been supplied
    chart_reading_branch = RunnableBranch(
        (lambda x: x.get('table') is None, RunnableAssign({'table': chart_chain })),
        (lambda x: x.get('table') is not None, lambda x: x),
        lambda x: x
    )

    # Update the table when the reply contains one
    update_table = RunnableBranch(
        (lambda x: 'TABLE' in x.content, save_table_to_global),
        lambda x: x
    )

    # Run the generated code when the reply contains some
    # (the original referenced execute_and_return_gr, which is never defined)
    execute_code = RunnableBranch(
        (lambda x: '```python' in x.content, execute_and_return),
        lambda x: x
    )

    # Full pipeline: read the chart, ask the instruct model, then run the plotting code
    chain = (
        chart_reading_branch
        | RunnableLambda(print_and_return)
        | instruct_chain
        | RunnableLambda(print_and_return)
        | update_table
        | execute_code
    )

    # Note: this returns the model's message, not an image. To show a plot in Gradio,
    # the executed code has to save a file and this function has to return its path.
    return chain.invoke({"image_b64": image_b64, "input": user_input, "table": table})
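
The branching used above follows a simple rule: conditions are checked in order, the first one that holds decides which step runs, and the last argument is the fallback. Here is a dependency-free sketch of that rule. `run_branch` and `read_chart` are made-up stand-ins (no model is called), not LangChain APIs.

```python
from typing import Any, Callable, List, Tuple

Condition = Callable[[Any], bool]
Action = Callable[[Any], Any]


def run_branch(branches: List[Tuple[Condition, Action]], default: Action, value: Any) -> Any:
    """Run the action of the first branch whose condition holds, else the default."""
    for condition, action in branches:
        if condition(value):
            return action(value)
    return default(value)


def read_chart(state: dict) -> dict:
    # Stand-in for the vision-model call; it returns a made-up table
    return {**state, "table": "| x | y |"}


def passthrough(state: dict) -> dict:
    return state


branches = [(lambda s: s.get("table") is None, read_chart)]

first = run_branch(branches, passthrough, {"input": "plot it", "table": None})
second = run_branch(branches, passthrough, {"input": "plot it", "table": "| a | b |"})
print(first["table"])   # | x | y |   (no table given, so the chart is "read")
print(second["table"])  # | a | b |   (table given, so the state passes through)
```

This is why passing an existing table into chart_agent_gr skips the comparatively slow vision-model call.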

Finally, we wrap the chat assistant in a function and combine it with the image analysis in a Gradio app:

import gradio as gr
from typing import Union

# Created once, outside the function, so the chat history persists across calls
chat_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

def qa_chat_with_memory(query: str) -> str:
    # Initialize or load the models and tools
    embedding_model = NVIDIAEmbeddings(model="NV-Embed-QA")
    embedding_path = "embed/"
    docsearch = FAISS.load_local(folder_path=embedding_path, embeddings=embedding_model, allow_dangerous_deserialization=True)

    llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
    question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
    chat = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1", temperature=0.1, max_tokens=1000, top_p=1.0)
    doc_chain = load_qa_chain(chat, chain_type="stuff", prompt=QA_PROMPT)

    qa = ConversationalRetrievalChain(
        retriever=docsearch.as_retriever(),
        combine_docs_chain=doc_chain,
        memory=chat_memory,
        question_generator=question_generator,
    )

    result = qa.invoke({"question": query})
    return result.get("answer")

def clear_textboxes():
    return "", ""  # Return empty strings to clear both text input fields


# Combine both features with Gradio Blocks
with gr.Blocks() as demo:
    with gr.Tab("Q&A Chat"):
        gr.Markdown("### Q&A Chat")
        with gr.Row():
            with gr.Column():
                text_input = gr.Textbox(label="Question", placeholder="Enter your query here...")
                with gr.Row():
                    chat_button = gr.Button("Submit")
                    clear_button = gr.Button("Clear")
            with gr.Column():
                text_output = gr.Textbox(label="Answer", interactive=False)
        
        chat_button.click(fn=qa_chat_with_memory, inputs=[text_input], outputs=[text_output])
        clear_button.click(fn=clear_textboxes, outputs=[text_input, text_output])

    with gr.Tab("Multi Modal Chat Agent"):
        gr.Markdown("### Multi Modal Chat Agent")
        with gr.Row():
            with gr.Column():
                image_input = gr.Image(label="Upload Image", type="filepath")
                mm_text_input = gr.Textbox(label="Question")
                with gr.Row():
                    process_button = gr.Button("Process")
                    mm_clear_button = gr.Button("Clear")
            with gr.Column():
                image_output = gr.Image(label="Output Image")
        
        # The UI has no table input, so pass None and let the agent read the chart itself
        process_button.click(fn=lambda img, q: chart_agent_gr(img, q, None), inputs=[image_input, mm_text_input], outputs=[image_output])
        # Image components are cleared with None, text boxes with an empty string
        mm_clear_button.click(fn=lambda: (None, "", None), outputs=[image_input, mm_text_input, image_output])

# Start the server
demo.launch(debug=True, share=False, show_api=False, server_port=5000, server_name="127.0.0.1")

And with that, we have a bot that combines a chat assistant with image analysis.

Result screenshot: