A Beginner's Three Days Hand-Building a RAG Chatbot: Reflections on the NVIDIA Summer Bootcamp

Captain LUNA

Last modified 2024-08-18 23:02:11

Column: Agents. Tags: robots, python

First published 2024-08-18 22:14:00

Article link: https://blog.csdn.net/2401_86663727/article/details/141305972

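AI agents are one of this year's top ten AI trends. In the future, there will be more agents than people.

I want to work in AI-based psychological and emotional companionship, and for that, just knowing how to use agents is not enough. To give users a deeper experience, I need to understand attribution and interpretability in AI, so I set out to learn. That is my motivation, and it is why I joined the NVIDIA bootcamp.

Jensen Huang has said programmers won't be needed in the future, but architecture and coding skills are still better to have than not, because they make the application layer easier to understand.

The three-day bootcamp was tight, and with my weak background, building things by hand was hard. That makes me all the more grateful to the NVIDIA instructors and my fellow participants, who patiently walked me through everything step by step. Along the way I also read some LLM papers and some clean, elegant code. My respect to everyone on the AI path: this work really burns your brain, but it is deeply rewarding.

Below is the project report. It is rough and shared for reference, a trace of the path I took. AI psychological and emotional companionship will only get better: our emotions will no longer have nowhere to go, we will shake off negative emotions quickly, we will see ourselves clearly, and our mental and physical health will improve together.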

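NVIDIA AI-AGENT Summer Bootcamp

Project name: RAG Intelligent Dialogue Chatbot (AI-AGENT Summer Bootcamp)

Report date: August 18, 2024

Project lead: Captain LUNA

Project overview:

Multimodal Agent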
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_core.runnables.passthrough import RunnableAssign
from langchain_core.runnables import RunnableBranch
from langchain_core.runnables import RunnablePassthrough
 
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

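import os
import base64
import getpass
import matplotlib.pyplot as plt
import numpy as np

# Never hard-code an API key in published code; read it from the environment or prompt for it
if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter your NVIDIA API key (nvapi-...): ")

# List the models available in the NVIDIA API catalog
ChatNVIDIA.get_available_models()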
# Encode an image file as base64 so it can be embedded in the prompt
def image2b64(image_file):
    with open(image_file, "rb") as f:
        return base64.b64encode(f.read()).decode()

image_b64 = image2b64("economic-assistance-chart.png")
# image_b64 = image2b64("eco-good-bad-chart.png")
from PIL import Image
from IPython.display import display  # display() is only predefined inside Jupyter
display(Image.open("economic-assistance-chart.png"))

# Ask Phi-3-Vision to recover the data table behind the chart
chart_reading = ChatNVIDIA(model="microsoft/phi-3-vision-128k-instruct")
result = chart_reading.invoke(f'Generate underlying data table of the figure below, : <img src="data:image/png;base64,{image_b64}" />')
print(result.content)

# Text-only LLM call; "ai-llama3-70b" was the bootcamp-era alias,
# the catalog ID is "meta/llama3-70b-instruct"
instruct_chat = ChatNVIDIA(model="meta/llama3-70b-instruct")
# result = instruct_chat.invoke('How to implement Fibonacci in python using dynamic programming')
result = instruct_chat.invoke('How do I implement quicksort in Python?')
print(result.content)
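The Phi-3-Vision call above passes the image inline as a base64 data URL inside an `<img>` tag. Below is a minimal sketch of that encoding step. It assumes the MIME type can be guessed from the file extension (falling back to PNG, as in the bootcamp files). The helpers `image_to_data_url` and `chart_prompt` are hypothetical, not part of the bootcamp code. Only the standard library is used, so the step can be checked without an API key:

```python
import base64
import mimetypes


def image_to_data_url(image_bytes: bytes, filename: str) -> str:
    """Hypothetical helper: wrap raw image bytes as a data URL for an <img> tag."""
    # Guess the MIME type from the extension; fall back to PNG for unknown types
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        mime = "image/png"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def chart_prompt(data_url: str) -> str:
    # Same instruction text as the notebook, with the data URL embedded inline
    return f'Generate underlying data table of the figure below, : <img src="{data_url}" />'


url = image_to_data_url(b"\x89PNG fake bytes", "economic-assistance-chart.png")
print(chart_prompt(url)[:80])
```

With real files you would pass the bytes you read with `open(path, "rb")`. Keeping the MIME type explicit matters if you later upload JPEG photos instead of PNG charts.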
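import re

# Save the table produced while the LangChain chain runs into a global variable
def save_table_to_global(x):
    global table
    if 'TABLE' in x.content:
        table = x.content.split('TABLE', 1)[1].split('END_TABLE')[0]
    return x

# Helper function for debugging
def print_and_return(x):
    print(x)
    return x

# Post-process code generated by the LLM: drop comments and explanatory prose, keep only the Python code
def extract_python_code(text):
    pattern = r'```python\s*(.*?)\s*```'
    matches = re.findall(pattern, text, re.DOTALL)
    return [match.strip() for match in matches]

# Execute code generated by the LLM
# (exec runs arbitrary model output; only do this in an environment you trust)
def execute_and_return(x):
    blocks = extract_python_code(x.content)
    if not blocks:
        return x
    try:
        # exec() always returns None; the generated code draws or saves the chart itself.
        # A copy of globals() keeps plt/np visible without letting the code overwrite them.
        exec(blocks[0], dict(globals()))
    except Exception as e:
        print(f"The code is not executable ({e}), don't give up, try again!")
    return x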

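The helpers above rely on a simple output contract. Tables go between `TABLE` and `END_TABLE`, and code goes inside fenced python blocks. Below is a small offline sketch that checks this contract on a hand-written reply, without calling any model. The sample reply and the name `split_reply` are made up for illustration:

```python
import re

# Same regex as extract_python_code above
CODE_PATTERN = re.compile(r"```python\s*(.*?)\s*```", re.DOTALL)


def split_reply(text):
    """Hypothetical helper: return (table_or_None, list_of_code_blocks) from an LLM reply."""
    table = None
    if "TABLE" in text:
        # Same slicing as save_table_to_global: after the first 'TABLE', before 'END_TABLE'
        table = text.split("TABLE", 1)[1].split("END_TABLE")[0].strip()
    code_blocks = [m.strip() for m in CODE_PATTERN.findall(text)]
    return table, code_blocks


sample_reply = (
    "TABLE\n| Country | Aid |\n| United Kingdom | 5 |\nEND_TABLE\n"
    "```python\nprint('draw chart')\n```"
)
parsed_table, parsed_code = split_reply(sample_reply)
print(parsed_table)
print(parsed_code)
```

Checking the parsing offline like this separates "the model ignored the format" from "my parser is wrong" when the chain misbehaves.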

Define the multimodal agent
def chart_agent(image_b64, user_input, table):
    # Chart reading Runnable: Phi-3-Vision turns the chart image into a data table
    chart_reading = ChatNVIDIA(model="microsoft/phi-3-vision-128k-instruct")
    chart_reading_prompt = ChatPromptTemplate.from_template(
        'Generate underlying data table of the figure below, : <img src="data:image/png;base64,{image_b64}" />'
    )
    # StrOutputParser turns the AIMessage into plain text so {table} is filled with a string
    chart_chain = chart_reading_prompt | chart_reading | StrOutputParser()

    # Instruct LLM Runnable
    # instruct_chat = ChatNVIDIA(model="nv-mistralai/mistral-nemo-12b-instruct")
    # instruct_chat = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")
    #instruct_chat = ChatNVIDIA(model="ai-llama3-70b")
    instruct_chat = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")

    # Adjacent string literals are concatenated, so each one ends with a space
    instruct_prompt = ChatPromptTemplate.from_template(
        "Do NOT repeat my requirements already stated. Based on this table {table}, {input} "
        "If the answer contains a table, start it with 'TABLE' and end it with 'END_TABLE'. "
        "If it contains code, start with '```python' and end with '```'. "
        "Do NOT include table inside code, and vice versa."
    )
    instruct_chain = instruct_prompt | instruct_chat

    # Read the chart only when no table is available yet
    chart_reading_branch = RunnableBranch(
        (lambda x: x.get('table') is None, RunnableAssign({'table': chart_chain})),
        lambda x: x
    )
    # Update the saved table if the reply contains one
    update_table = RunnableBranch(
        (lambda x: 'TABLE' in x.content, save_table_to_global),
        lambda x: x
    )
    # Execute the chart-drawing code if the reply contains any
    execute_code = RunnableBranch(
        (lambda x: '```python' in x.content, execute_and_return),
        lambda x: x
    )

    chain = (
        chart_reading_branch
        #| RunnableLambda(print_and_return)
        | instruct_chain
        #| RunnableLambda(print_and_return)
        | update_table
        | execute_code
    )

    return chain.invoke({"image_b64": image_b64, "input": user_input, "table": table}).content
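The three `RunnableBranch` steps form a small state machine. The agent reads the chart only if no table is cached, then asks the instruct model. It then updates the table, runs code, or both, depending on the reply. Below is a plain-Python sketch of that routing. Stub functions stand in for Phi-3-Vision and Llama 3.1. `run_turn` and the `fake_*` stubs are hypothetical and meant only for testing the logic offline; they are not LangChain APIs:

```python
def run_turn(state, user_input, read_chart, instruct, execute):
    """Hypothetical offline mirror of chart_agent's routing; `state` stands in for the global `table`."""
    # Branch 1: call the (slow, paid) vision model only when no table is cached yet
    if state.get("table") is None:
        state["table"] = read_chart()
    reply = instruct(state["table"], user_input)
    # Branch 2: a reply wrapped in TABLE ... END_TABLE replaces the cached table
    # (stripped here for readability; the notebook version keeps surrounding whitespace)
    if "TABLE" in reply:
        state["table"] = reply.split("TABLE", 1)[1].split("END_TABLE")[0].strip()
    # Branch 3: a reply containing a python fence is handed to the executor
    if "```python" in reply:
        execute(reply)
    return reply


calls = {"vision": 0, "exec": 0}


def fake_read_chart():
    calls["vision"] += 1
    return "UK 5"


def fake_instruct(table, user_input):
    # Pretend the LLM renamed the row and echoed the table back
    return "TABLE " + table.replace("UK", "United Kingdom") + " END_TABLE"


def fake_execute(reply):
    calls["exec"] += 1


state = {"table": None}
run_turn(state, "replace 'UK' with 'United Kingdom'", fake_read_chart, fake_instruct, fake_execute)
run_turn(state, "show the table again", fake_read_chart, fake_instruct, fake_execute)
print(state["table"], calls)
```

The second turn reuses the cached table, so the vision model is called only once. This is the point of the `x.get('table') is None` branch.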


Step 4: Wrap the agent in Gradio
cur_dir = os.getcwd()
# The LLM-generated plotting code is asked to save its chart here
# img_path = '/home/nvidia/2024_summer_bootcamp/day3/'+'image.png'
img_path = os.path.join(cur_dir, 'image.png')
print(img_path)

def execute_and_return_gr(x):
    blocks = extract_python_code(x.content)
    if not blocks:
        return None  # nothing to draw; the gr.Image output stays empty
    try:
        # Run the generated plotting code; it is expected to save the figure to img_path
        exec(blocks[0], dict(globals()))
    except Exception as e:
        print(f"The code is not executable ({e}), don't give up, try again!")
    return img_path

def chart_agent_gr(image_path, user_input, table=None):
    # Gradio passes a file path (gr.Image(type="filepath")), so encode it first.
    # `table` defaults to None because the interface only supplies two inputs.
    image_b64 = image2b64(image_path)
    # Chart reading Runnable
    chart_reading = ChatNVIDIA(model="microsoft/phi-3-vision-128k-instruct")
    chart_reading_prompt = ChatPromptTemplate.from_template(
        'Generate underlying data table of the figure below, : <img src="data:image/png;base64,{image_b64}" />'
    )
    chart_chain = chart_reading_prompt | chart_reading | StrOutputParser()

    # Instruct LLM Runnable
    # instruct_chat = ChatNVIDIA(model="nv-mistralai/mistral-nemo-12b-instruct")
    # instruct_chat = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")
    #instruct_chat = ChatNVIDIA(model="ai-llama3-70b")
    instruct_chat = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")

    # Persona prompt; it must still contain {table} and {input}, otherwise the user's request is dropped
    instruct_prompt = ChatPromptTemplate.from_template(
        "You are a psychological coach. You are positive and shining. "
        "You infer the user's emotions from their photos, based on their face and gestures. "
        "You turn the user's negative words into positive words. "
        "You support the user in not staying in negative emotions and help them train mental strength like a health exercise. "
        "Do NOT judge the user. "
        "Based on this table {table}, {input} "
        "If the answer contains a table, start it with 'TABLE' and end it with 'END_TABLE'. "
        "If it contains code, start with '```python' and end with '```'. "
        "Do NOT include table inside code, and vice versa."
    )
    instruct_chain = instruct_prompt | instruct_chat

    # Read the chart only when no table is available yet
    chart_reading_branch = RunnableBranch(
        (lambda x: x.get('table') is None, RunnableAssign({'table': chart_chain})),
        lambda x: x
    )

    # Update the saved table if the reply contains one
    update_table = RunnableBranch(
        (lambda x: 'TABLE' in x.content, save_table_to_global),
        lambda x: x
    )

    # Execute the chart-drawing code; without code there is no image to return
    execute_code = RunnableBranch(
        (lambda x: '```python' in x.content, execute_and_return_gr),
        lambda x: None
    )

    chain = (
        chart_reading_branch
        | RunnableLambda(print_and_return)
        | instruct_chain
        | RunnableLambda(print_and_return)
        | update_table
        | execute_code
    )

    return chain.invoke({"image_b64": image_b64, "input": user_input, "table": table})
Run the Gradio conversation
user_input = "replace table string's 'UK' with 'United Kingdom', draw this table as stacked bar chart in python, and save the image in path: "+img_path
print(user_input)

import gradio as gr
multi_modal_chart_agent = gr.Interface(fn=chart_agent_gr,
                    inputs=[gr.Image(label="Upload image", type="filepath"), 'text'],
                    outputs=['image'],
                    title="Multi Modal chat agent",
                    description="Multi Modal chat agent",
                    allow_flagging="never")

multi_modal_chart_agent.launch(debug=True, share=False, show_api=False, server_port=5001, server_name="0.0.0.0")

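In today's society, 70% of the population experiences anxiety, and among adolescents the depression screening-positive rate is 24.6%. Yet society lacks an integrated solution that supports mental health. There is a huge shortage of psychological counselors, and no smooth mechanism connects them to hyper-competitive settings such as schools and workplaces. Everyone needs mental-health services, which matter as much as physical-health services, if not more. Yet China has fewer than 1.3 million psychological counselors.

The industry cannot grow because of four barriers: (1) services cannot be standardized and depend too heavily on each counselor's subjective knowledge; (2) there is no continuity between sessions; (3) counselors cannot offer close, everyday companionship; and (4) the 1-on-1 format is limited by time and resources. The mental-health industry needs a better model, and AI offers a chance to get past these barriers.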

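We will train a multimodal RAG dialogue chatbot that does more than understand users' psychology and help them quickly move out of negative emotions through text conversation. It will also visually recognize users' micro-expressions and assess their mental state, giving a fuller, more rounded understanding of each user. The goal is to break through the four traditional barriers of the mental-health industry and make the world a better place.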

Technical Approach and Implementation Steps

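Model selection: the models chosen are GPT-4o, GLM-4, BGE, Phi-3, and NVIDIA's NV-Embed-v1. GPT-4o is expressive with emotions, GLM-4 is used for organizing content, BGE has the most complete Chinese corpus, Phi-3 can recognize images, and NVIDIA's model performs well. FAISS provides fast retrieval, and BM25 is used as the retrieval model.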
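BM25 ranks documents by term overlap. It gives rare terms more weight and normalizes for document length. Below is a minimal pure-Python sketch of Okapi BM25 with a stop-word filter. It assumes whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF. Chinese text would first need a word segmenter. The toy corpus and stop-word list are made up:

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "how", "i", "my"}


def tokenize(text):
    # Lowercase, split on whitespace, drop stop words (a toy stand-in for real preprocessing)
    return [t for t in text.lower().split() if t not in STOP_WORDS]


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n_docs = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Lucene-style IDF, which stays positive even for very common terms
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) and length normalization (b)
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        scores = [(self.score(query, i), i) for i in range(len(self.docs))]
        scores.sort(reverse=True)
        return [i for s, i in scores[:k] if s > 0]


corpus = [
    "breathing exercise to calm anxiety",
    "gratitude journal for positive emotions",
    "how to sleep better when anxiety keeps you awake",
]
bm25 = BM25(corpus)
print(bm25.top_k("calm my anxiety"))
```

BM25 catches exact keyword matches that dense embeddings can miss. That is why the two are often combined with a FAISS vector search.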
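1. Data collection: a. Collected psychology-related exercises from our own knowledge base, keeping only the parts of the various schools of psychology that proved effective in hands-on practice.
2. Data preprocessing: a. Cleaned our accumulated database (removing redundant information, filtering noise, handling special characters) and organized it into PDF format; b. Selected several positive-psychology websites and collected their content with a web crawler, then used a predefined stop-word list to remove words that carry little meaning; vectorization used NumPy; c. Standardized the data and classified it hierarchically by domain, topic, and importance, for use in the later retrieval and generation stages.
3. Index construction: Converted text into vector representations with word or sentence embeddings (BERT), and built an efficient vector index with FAISS to support fast retrieval.
4. Generation model integration: GPT-4o for generation, NV-Embed-v1 for retrieval.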
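Step 3 stores embeddings in a FAISS index. For small collections, the exact search that a flat inner-product index performs can be written out directly. With L2-normalized vectors this is cosine similarity, as in FAISS `IndexFlatIP`. Below is a pure-Python sketch that assumes the embeddings are already computed. The 3-dimensional toy vectors and the topic metadata (standing in for the domain/topic classification in step 2c) are made up. Real BERT or NV-Embed vectors have hundreds to thousands of dimensions:

```python
import math


def normalize(v):
    # L2-normalize so that the inner product equals cosine similarity
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


class FlatIPIndex:
    """Exact (brute-force) inner-product search over normalized vectors."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        self.vectors.append(normalize(vector))
        self.payloads.append(payload)

    def search(self, query, k=2):
        q = normalize(query)
        scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(self.payloads[i], round(s, 4)) for s, i in scored[:k]]


index = FlatIPIndex()
index.add([0.9, 0.1, 0.0], {"topic": "anxiety", "text": "breathing exercise"})
index.add([0.1, 0.9, 0.0], {"topic": "positive psychology", "text": "gratitude journal"})
index.add([0.0, 0.2, 0.9], {"topic": "sleep", "text": "wind-down routine"})
print(index.search([1.0, 0.0, 0.0], k=1))
```

Brute force costs O(N·d) per query. FAISS earns its keep when N grows large, through vectorized exact search or approximate indexes.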
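Feature integration: for speech, Whisper decodes the user's spoken input into text, which is fed into the pipeline for the LLM to process; TTS then encodes the text response back into spoken output.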
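The voice loop is a three-stage pipeline: speech-to-text, then the text pipeline, then text-to-speech. Below is a sketch of that wiring with each stage injected as a function, so real engines can be swapped in. For example, with the `openai-whisper` package the ASR stage could be `whisper.load_model("base").transcribe(path)["text"]`. `VoicePipeline` and the lambda stubs are hypothetical placeholders, not the project's actual code:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoicePipeline:
    """Hypothetical wiring: audio -> text (ASR) -> reply text (LLM/RAG chain) -> audio (TTS)."""
    asr: Callable[[bytes], str]
    respond: Callable[[str], str]
    tts: Callable[[str], bytes]
    transcript: List[str] = field(default_factory=list)

    def turn(self, audio: bytes) -> bytes:
        user_text = self.asr(audio).strip()
        if not user_text:
            # Nothing recognized: skip the LLM call and return no audio
            return b""
        reply = self.respond(user_text)
        # Keep a running transcript so later turns can use conversation history
        self.transcript.extend([f"user: {user_text}", f"bot: {reply}"])
        return self.tts(reply)


pipeline = VoicePipeline(
    asr=lambda audio: audio.decode("utf-8"),               # stub: pretend the bytes are already text
    respond=lambda text: f"I hear that you feel {text}.",  # stub for the RAG chain
    tts=lambda text: text.encode("utf-8"),                 # stub: bytes instead of real audio
)
print(pipeline.turn(b"tired"))
```

Injecting the stages keeps the loop testable. It also lets you swap Whisper or the TTS engine without touching the dialogue logic.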
The agent's workflow is orchestrated with LangChain and presented with Gradio.
The multimodal visual recognition is built on NIM, using Phi-3-Vision to parse image data.

Implementation steps:

1. Environment setup:

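Setting up the environment means installing FAISS, LangChain, langchain_nvidia_ai_endpoints, and NumPy; base64 ships with the Python standard library, so it needs no installation. Their roles:
langchain_nvidia_ai_endpoints: calls NVIDIA NIM compute resources
langchain: builds the dialogue chain, connecting the agent's components together
base64: since this experiment builds a multimodal agent, base64 is needed to encode and decode images
Go to NVIDIA NIM | phi-3-vision-128k-instruct and click the Get API Key button to generate a key.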
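Before running the notebook, you can check which modules import and whether the key is set. Below is a sketch using only the standard library. The module list mirrors the packages named above plus `gradio`, which the demo also uses. Note that the import names differ from the pip package names, for example `faiss-cpu` and `langchain-nvidia-ai-endpoints`. The `nvapi-` prefix check is an assumption based on the key format in the notebook:

```python
import importlib.util
import os

REQUIRED_MODULES = ["faiss", "langchain", "langchain_nvidia_ai_endpoints", "numpy", "gradio", "base64"]


def check_environment(modules=REQUIRED_MODULES, env=None):
    """Report which top-level modules are importable and whether NVIDIA_API_KEY looks set."""
    env = os.environ if env is None else env
    # find_spec returns None for a top-level module that is not installed
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    report["NVIDIA_API_KEY"] = env.get("NVIDIA_API_KEY", "").startswith("nvapi-")
    return report


for name, ok in check_environment().items():
    print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```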
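2. Code implementation:
1. Use Phi-3-Vision for image encoding and decoding
2. Call Phi-3-Vision or Llama3-70b
3. Convert statistical charts into data that Python can analyze
4. Arrive at the agent workflow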
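Step 3 turns a chart into data that Python can analyze. Assuming the vision model answers with a Markdown-style table (a header row, a `---` separator row, and `|`-delimited cells), the sketch below parses that text into a list of dicts. It converts numeric cells to floats. The function name and the sample text are invented for illustration and are not real model output:

```python
def parse_markdown_table(text):
    """Parse a '|'-delimited table (header, separator, rows) into a list of dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def cells(line):
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    rows = []
    for line in lines[1:]:
        values = cells(line)
        # Skip the |---|---| separator row
        if all(set(v) <= set("-: ") for v in values):
            continue
        row = {}
        for key, value in zip(header, values):
            # Convert numeric cells so they can be summed or plotted; keep labels as text
            try:
                row[key] = float(value)
            except ValueError:
                row[key] = value
        rows.append(row)
    return rows


sample = """
| Country   | Economic aid | Military aid |
|-----------|--------------|--------------|
| Country A | 5.1          | 6.6          |
| Country B | 1.4          | 10.0         |
"""
rows = parse_markdown_table(sample)
totals = {r["Country"]: r["Economic aid"] + r["Military aid"] for r in rows}
print(rows)
print(totals)
```

Once the rows are plain dicts, they can go straight into matplotlib (for example, a stacked bar chart) or pandas. This avoids asking the LLM to regenerate the numbers each time.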