第二期书生浦语大模型实战营第六次课程笔记----Lagent & AgentLego 智能体应用搭建

haidizym

已于 2024-05-08 14:48:40 修改

阅读量294

点赞数 2

文章标签：笔记人工智能自然语言处理

于 2024-04-21 20:43:52 首次发布

本文链接：https://blog.csdn.net/haidizym/article/details/138044672

版权

【第六节】: 【Lagent & AgentLego 智能体应用搭建】
1、Agent 理论及 Lagent&AgentLego 开源产品介绍
2、Lagent 调用已有 Arxiv 论文搜索工具实战
3、Lagent 新增自定义工具实战（以查询天气的工具为例）
4、AgentLego 新增 MagicMaker 文生图工具实战

【视频地址】：https://www.bilibili.com/video/BV1Xt4217728/
【课程文档】：https://github.com/InternLM/Tutorial/tree/camp2/agent
【课程作业】：https://github.com/InternLM/Tutorial/blob/camp2/agent/homework.md
【操作平台】：https://studio.intern-ai.org.cn/console/instance/
【lagent文档】： https://github.com/InternLM/Tutorial/blob/camp2/agent/lagent.md
【agentlego文档】：https://github.com/InternLM/Tutorial/blob/camp2/agent/agentlego.md
【lagent自定义工具】： https://lagent.readthedocs.io/zh-cn/latest/tutorials/action.html
【agentlego自定义工具】： https://agentlego.readthedocs.io/zh-cn/latest/modules/tool.html

agent理论

大语言模型的局限性：幻觉、时效性，可靠性
智能体定义：感知、决策、行动
智能体组成：感知、大脑、动作
智能体类型：ReAct（选择工具），ReWoo（计划拆分），AutoGPT（人工干预）
在这里插入图片描述

Lagent&AgentLego开源框架

Lagent一个轻量级开源智能体框架，旨在让用户可以高效地构建基于大语言模型的智能体。支持多种智能体范式（ReAct，ReWoo，AutoGPT），支持多种工具（如谷歌搜索、python解释器等）
AgentLego一个多模态工具包，旨在像乐高积木，可以快速简便地拓展自定义工具，从而组装出自己的智能体。支持多个智能体框架（如lagent，LangChain，Transformers Agents）,提供大量视觉、多模态领域前沿算法
在这里插入图片描述

Lagent & AgentLego 智能体应用案例汇总表

在这里插入图片描述

总环境配置：

mkdir -p /root/agent
studio-conda -t agent -o pytorch-2.1.2
cd /root/agent
conda activate agent
git clone https://gitee.com/internlm/lagent.git
cd lagent && git checkout 581d9fb && pip install -e . && cd ..
git clone https://gitee.com/internlm/agentlego.git
cd agentlego && git checkout 7769e0d && pip install -e . && cd ..
conda activate agent
pip install lmdeploy==0.3.0
cd /root/agent
git clone -b camp2 https://gitee.com/internlm/Tutorial.git

案例1：Lagent+ArxivSearch

问题：
请帮我搜索 InternLM2 Technical Report
资源：

部署：

#vscode terminal：Imdeploy的api server
conda activate agent
lmdeploy serve api_server /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b \
                            --server-name 127.0.0.1 \
                            --model-name internlm2-chat-7b \
                            --cache-max-entry-count 0.1

#new vscode terminal：streamlit web demo
conda activate agent
cd /root/agent/lagent/examples
streamlit run internlm2_agent_web_demo.py --server.address 127.0.0.1 --server.port 7860

#终端映射powershell
ssh -CNg -L 7860:127.0.0.1:7860 -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p 你的 ssh 端口号

使用：
浏览器： http://localhost:7860
Web页面设置：
模型IP：127.0.0.1:23333
插件选择:ArxivSearch
效果：
在这里插入图片描述

案例2：Lagent+WeatherQuery

问题：
请帮我查询上海的天气
资源：

#自定义天气查询工具touch /root/agent/lagent/lagent/actions/weather.py
import json
import os
import requests
from typing import Optional, Type

from lagent.actions.base_action import BaseAction, tool_api
from lagent.actions.parser import BaseParser, JsonParser
from lagent.schema import ActionReturn, ActionStatusCode

class WeatherQuery(BaseAction):
    """Weather plugin for querying weather information."""
    
    def __init__(self,
                 key: Optional[str] = None,
                 description: Optional[dict] = None,
                 parser: Type[BaseParser] = JsonParser,
                 enable: bool = True) -> None:
        super().__init__(description, parser, enable)
        key = os.environ.get('WEATHER_API_KEY', key)
        if key is None:
            raise ValueError(
                'Please set Weather API key either in the environment '
                'as WEATHER_API_KEY or pass it as `key`')
        self.key = key
        self.location_query_url = 'https://geoapi.qweather.com/v2/city/lookup'
        self.weather_query_url = 'https://devapi.qweather.com/v7/weather/now'

    @tool_api
    def run(self, query: str) -> ActionReturn:
        """一个天气查询API。可以根据城市名查询天气信息。
        
        Args:
            query (:class:`str`): The city name to query.
        """
        tool_return = ActionReturn(type=self.name)
        status_code, response = self._search(query)
        if status_code == -1:
            tool_return.errmsg = response
            tool_return.state = ActionStatusCode.HTTP_ERROR
        elif status_code == 200:
            parsed_res = self._parse_results(response)
            tool_return.result = [dict(type='text', content=str(parsed_res))]
            tool_return.state = ActionStatusCode.SUCCESS
        else:
            tool_return.errmsg = str(status_code)
            tool_return.state = ActionStatusCode.API_ERROR
        return tool_return
    
    def _parse_results(self, results: dict) -> str:
        """Parse the weather results from QWeather API.
        
        Args:
            results (dict): The weather content from QWeather API
                in json format.
        
        Returns:
            str: The parsed weather results.
        """
        now = results['now']
        data = [
            f'数据观测时间: {now["obsTime"]}',
            f'温度: {now["temp"]}°C',
            f'体感温度: {now["feelsLike"]}°C',
            f'天气: {now["text"]}',
            f'风向: {now["windDir"]}，角度为 {now["wind360"]}°',
            f'风力等级: {now["windScale"]}，风速为 {now["windSpeed"]} km/h',
            f'相对湿度: {now["humidity"]}',
            f'当前小时累计降水量: {now["precip"]} mm',
            f'大气压强: {now["pressure"]} 百帕',
            f'能见度: {now["vis"]} km',
        ]
        return '\n'.join(data)

    def _search(self, query: str):
        # get city_code
        try:
            city_code_response = requests.get(
                self.location_query_url,
                params={'key': self.key, 'location': query}
            )
        except Exception as e:
            return -1, str(e)
        if city_code_response.status_code != 200:
            return city_code_response.status_code, city_code_response.json()
        city_code_response = city_code_response.json()
        if len(city_code_response['location']) == 0:
            return -1, '未查询到城市'
        city_code = city_code_response['location'][0]['id']
        # get weather
        try:
            weather_response = requests.get(
                self.weather_query_url,
                params={'key': self.key, 'location': city_code}
            )
        except Exception as e:
            return -1, str(e)
        return weather_response.status_code, weather_response.json()

#获取天气查询API key： https://dev.qweather.com/docs/api/

部署：

#vscode terminal：Imdeploy的api serve
conda activate agent
lmdeploy serve api_server /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b \
                            --server-name 127.0.0.1 \
                            --model-name internlm2-chat-7b \
                            --cache-max-entry-count 0.1

#new vscode terminal：启动并部署web demo
export WEATHER_API_KEY=在2.2节获取的API KEY
# 比如 export WEATHER_API_KEY=1234567890abcdef
conda activate agent
cd /root/agent/Tutorial/agent
streamlit run internlm2_weather_web_demo.py --server.address 127.0.0.1 --server.port 7860

#终端映射powershell
ssh -CNg -L 7860:127.0.0.1:7860 -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p 你的 ssh 端口号

使用：
浏览器： http://localhost:7860
Web页面设置：
模型IP：127.0.0.1:23333
插件选择:WeatherQuery
效果：

在这里插入图片描述

案例3：Lagent+agentlego+ObjectDetection

问题：
请检测图中的物体
资源：

cd /root/agent
wget http://download.openmmlab.com/agentlego/road.jpg
conda activate agent
pip install openmim==0.3.9
mim install mmdet==3.3.0

#新建目标检测工具文件：touch /root/agent/direct_use.py
import re

import cv2
from agentlego.apis import load_tool

# load tool
tool = load_tool('ObjectDetection', device='cuda')

# apply tool
visualization = tool('/root/agent/road.jpg')
print(visualization)

# visualize
image = cv2.imread('/root/agent/road.jpg')

preds = visualization.split('\n')
pattern = r'(\w+) \((\d+), (\d+), (\d+), (\d+)\), score (\d+)'

for pred in preds:
    name, x1, y1, x2, y2, score = re.match(pattern, pred).groups()
    x1, y1, x2, y2, score = int(x1), int(y1), int(x2), int(y2), int(score)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 1)
    cv2.putText(image, f'{name} {score}', (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 1)

cv2.imwrite('/root/agent/road_detection_direct.jpg', image)

#修改相关文件：/root/agent/agentlego/webui/modules/agents/lagent_agent.py 文件的第 105行位置，将 internlm2-chat-20b 修改为 internlm2-chat-7b，
def llm_internlm2_lmdeploy(cfg):
    url = cfg['url'].strip()
    llm = LMDeployClient(
-         model_name='internlm2-chat-20b',
+         model_name='internlm2-chat-7b',
        url=url,
        meta_template=INTERNLM2_META,
        top_p=0.8,
        top_k=100,
        temperature=cfg.get('temperature', 0.7),
        repetition_penalty=1.0,
        stop_words=['<|im_end|>'])
    return llm

部署：

#vscode terminal：Imdeploy的api serve
conda activate agent
lmdeploy serve api_server /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b \
                            --server-name 127.0.0.1 \
                            --model-name internlm2-chat-7b \
                            --cache-max-entry-count 0.1

#new vscode terminal：启动 AgentLego WebUI
conda activate agent
cd /root/agent/agentlego/webui
python one_click.py

#终端映射powershell
ssh -CNg -L 7860:127.0.0.1:7860 -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p 你的 ssh 端口号

使用：
浏览器： http://localhost:7860
agent配置、加载，tools配置，chat选择

点击上方 Agent 进入 Agent 配置页面。（如①所示）
点击 Agent 下方框，选择 New Agent。（如②所示）
选择 Agent Class 为 lagent.InternLM2Agent。（如③所示）
输入模型 URL 为 http://127.0.0.1:23333 。（如④所示）
输入 Agent name，自定义即可，图中输入了 internlm2。（如⑤所示）
点击 save to 以保存配置，这样在下次使用时只需在第2步时选择 Agent 为 internlm2 后点击 load 以加载就可以了。（如⑥所示）
点击 load 以加载配置。（如⑦所示）

在这里插入图片描述

点击上方 Tools 页面进入工具配置页面。（如①所示）
点击 Tools 下方框，选择 New Tool 以加载新工具。（如②所示）
选择 Tool Class 为 ObjectDetection。（如③所示）
点击 save 以保存配置。（如④所示）

在这里插入图片描述

等待工具加载完成后，点击上方 Chat 以进入对话页面。在页面下方选择工具部分只选择 ObjectDetection 工具，如下图所示。为了确保调用工具的成功率，请在使用时确保仅有这一个工具启用。

在这里插入图片描述

效果：
在这里插入图片描述

案例4：Lagent+agentlego+ MagicMakerImageGeneration

问题：
请帮我生成一幅山水画
资源：

#新建目标检测工具文件：
#touch /root/agent/agentlego/agentlego/tools/magicmaker_image_generation.py
import json
import requests

import numpy as np

from agentlego.types import Annotated, ImageIO, Info
from agentlego.utils import require
from .base import BaseTool


class MagicMakerImageGeneration(BaseTool):

    default_desc = ('This tool can call the api of magicmaker to '
                    'generate an image according to the given keywords.')

    styles_option = [
        'dongman',  # 动漫
        'guofeng',  # 国风
        'xieshi',   # 写实
        'youhua',   # 油画
        'manghe',   # 盲盒
    ]
    aspect_ratio_options = [
        '16:9', '4:3', '3:2', '1:1',
        '2:3', '3:4', '9:16'
    ]

    @require('opencv-python')
    def __init__(self,
                 style='guofeng',
                 aspect_ratio='4:3'):
        super().__init__()
        if style in self.styles_option:
            self.style = style
        else:
            raise ValueError(f'The style must be one of {self.styles_option}')
        
        if aspect_ratio in self.aspect_ratio_options:
            self.aspect_ratio = aspect_ratio
        else:
            raise ValueError(f'The aspect ratio must be one of {aspect_ratio}')

    def apply(self,
              keywords: Annotated[str,
                                  Info('A series of Chinese keywords separated by comma.')]
        ) -> ImageIO:
        import cv2
        response = requests.post(
            url='https://magicmaker.openxlab.org.cn/gw/edit-anything/api/v1/bff/sd/generate',
            data=json.dumps({
                "official": True,
                "prompt": keywords,
                "style": self.style,
                "poseT": False,
                "aspectRatio": self.aspect_ratio
            }),
            headers={'content-type': 'application/json'}
        )
        image_url = response.json()['data']['imgUrl']
        image_response = requests.get(image_url)
        image = cv2.imdecode(np.frombuffer(image_response.content, np.uint8), cv2.IMREAD_COLOR)
        return ImageIO(image)

#注册新工具，修改 /root/agent/agentlego/agentlego/tools/__init__.py 文件，将我们的工具注册在工具列表中。
from .base import BaseTool
from .calculator import Calculator
from .func import make_tool
from .image_canny import CannyTextToImage, ImageToCanny
from .image_depth import DepthTextToImage, ImageToDepth
from .image_editing import ImageExpansion, ImageStylization, ObjectRemove, ObjectReplace
from .image_pose import HumanBodyPose, HumanFaceLandmark, PoseToImage
from .image_scribble import ImageToScribble, ScribbleTextToImage
from .image_text import ImageDescription, TextToImage
from .imagebind import AudioImageToImage, AudioTextToImage, AudioToImage, ThermalToImage
from .object_detection import ObjectDetection, TextToBbox
from .ocr import OCR
from .scholar import *  # noqa: F401, F403
from .search import BingSearch, GoogleSearch
from .segmentation import SegmentAnything, SegmentObject, SemanticSegmentation
from .speech_text import SpeechToText, TextToSpeech
from .translation import Translation
from .vqa import VQA
+ from .magicmaker_image_generation import MagicMakerImageGeneration

__all__ = [
    'CannyTextToImage', 'ImageToCanny', 'DepthTextToImage', 'ImageToDepth',
    'ImageExpansion', 'ObjectRemove', 'ObjectReplace', 'HumanFaceLandmark',
    'HumanBodyPose', 'PoseToImage', 'ImageToScribble', 'ScribbleTextToImage',
    'ImageDescription', 'TextToImage', 'VQA', 'ObjectDetection', 'TextToBbox', 'OCR',
    'SegmentObject', 'SegmentAnything', 'SemanticSegmentation', 'ImageStylization',
    'AudioToImage', 'ThermalToImage', 'AudioImageToImage', 'AudioTextToImage',
    'SpeechToText', 'TextToSpeech', 'Translation', 'GoogleSearch', 'Calculator',
-     'BaseTool', 'make_tool', 'BingSearch'
+     'BaseTool', 'make_tool', 'BingSearch', 'MagicMakerImageGeneration'
]

部署：

#vscode terminal：Imdeploy的api serve
conda activate agent
lmdeploy serve api_server /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b \
                            --server-name 127.0.0.1 \
                            --model-name internlm2-chat-7b \
                            --cache-max-entry-count 0.1

#new vscode terminal：启动 AgentLego WebUI
conda activate agent
cd /root/agent/agentlego/webui
python one_click.py

#终端映射powershell
ssh -CNg -L 7860:127.0.0.1:7860 -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p 你的 ssh 端口号

使用：
浏览器： http://localhost:7860
agent配置、加载，tools配置，chat选择
效果：
在这里插入图片描述

附加：使用iPythoninterpreter

https://github.com/InternLM/lagent/blob/main/lagent/actions/ipython_interpreter.py
在这里插入图片描述
有人研究过这4个工具定义文件功能有什么不同吗？
ipythoninteraactive
ipythoninterpreter
ipythonmanager（暂时不知）
pythoninterpreter

在这里插入图片描述
这里加插件，然后model部分改成你要的

备注：来自网址https://boiled-ginger-cbc.notion.site/Lagent-80ac842782f54171a6443db3e385e846
看看本次实验使用了哪些包

import copy
import os

import streamlit as st 
from streamlit.logger import get_logger

# 这里应该可以可以总共支持多少的 actions
from lagent.actions import ActionExecutor, GoogleSearch, PythonInterpreter


from lagent.agents.react import ReAct
'''
ReAct 使用到的地方 -- 看上去就是初始化要用的 model 和 action
def initialize_chatbot(self, model, plugin_action):
        """Initialize the chatbot with the given model and plugin actions."""
        return ReAct(
            llm=model, action_executor=ActionExecutor(actions=plugin_action))
'''

# 这就是字面意思 -- GPT的API
from lagent.llms import GPTAPI
'''
看下使用到的地方 -- 这里就是初始化 model 有多少个选择的地方 -- 没有看到输入 api key 的地方
想要增加 model 看这里
def init_model(self, option):
        """Initialize the model based on the selected option."""
        if option not in st.session_state['model_map']:
            if option.startswith('gpt'):
                st.session_state['model_map'][option] = GPTAPI(
                    model_type=option)
            else:
                st.session_state['model_map'][option] = HFTransformerCasualLM(
                    '/root/model/Shanghai_AI_Laboratory/internlm-chat-7b')
        return st.session_state['model_map'][option]
'''
# 加载 hf 上的 model 的方式
from lagent.llms.huggingface import HFTransformerCasualLM

在当前代码的父目录下初始化一个 tmp_dir 的文件夹，然后调用 main 函数

if __name__ == '__main__':
    root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    root_dir = os.path.join(root_dir, 'tmp_dir')
    os.makedirs(root_dir, exist_ok=True)
    main()

我们也能看到它哈，但是是空的，估计有一些要保存什么东西的指令才会保存吧
在这里插入图片描述
ok 进入 main 函数

def main():
		# 初始化日志
    logger = get_logger(__name__)
    # Initialize Streamlit UI and setup sidebar
    # 初始化Streamlit UI
    if 'ui' not in st.session_state:
        session_state = SessionState() # 初始化一个会话状态
        session_state.init_state()
        st.session_state['ui'] = StreamlitUI(session_state) # 初始UI

    else:
		    # 设置 UI 参数：这里可以修改 显示title 和图片 
        st.set_page_config(
            layout='wide',
            page_title='lagent-web',
            page_icon='./docs/imgs/lagent_icon.png')
        # st.header(':robot_face: :blue[Lagent] Web Demo ', divider='rainbow')
    # 获取 UI 中的设置：选择了什么模型、插件、上传文件 
    # -- 这里没有循环调用，重新选择后可能需要刷新一下网页
    model_name, model, plugin_action, uploaded_file = st.session_state[
        'ui'].setup_sidebar()

    # Initialize chatbot if it is not already initialized
    # or if the model has changed
    # 修改模型这里要重新加载
    if 'chatbot' not in st.session_state or model != st.session_state[
            'chatbot']._llm:
        st.session_state['chatbot'] = st.session_state[
            'ui'].initialize_chatbot(model, plugin_action)
		# 获取 user 和 assistant 的对话，并且显示到界面
    for prompt, agent_return in zip(st.session_state['user'],
                                    st.session_state['assistant']):
        st.session_state['ui'].render_user(prompt)
        st.session_state['ui'].render_assistant(agent_return)
        
    # User input form at the bottom (this part will be at the bottom)
    # with st.form(key='my_form', clear_on_submit=True):
		# ":=" 是一个赋值运算符，
		# `st.chat_input('')` 是一个函数调用，它会创建一个文本框，并等待用户在文本框中输入值。
		# 所以这句话是 检测是否用户有输入
    if user_input := st.chat_input(''):
        st.session_state['ui'].render_user(user_input)
        st.session_state['user'].append(user_input)
        # Add file uploader to sidebar
        # 如果有上传的文件：则读取 -- 支持图像、视频、音频
        if uploaded_file:
            file_bytes = uploaded_file.read()
            file_type = uploaded_file.type
            if 'image' in file_type:
                st.image(file_bytes, caption='Uploaded Image')
            elif 'video' in file_type:
                st.video(file_bytes, caption='Uploaded Video')
            elif 'audio' in file_type:
                st.audio(file_bytes, caption='Uploaded Audio')
            # Save the file to a temporary location and get the path
            # 上传文件才保存 -- 所以咱们的 tmpdir 是空
            file_path = os.path.join(root_dir, uploaded_file.name)
            with open(file_path, 'wb') as tmpfile:
                tmpfile.write(file_bytes)
            st.write(f'File saved at: {file_path}')
            # 修改 prompt
            user_input = '我上传了一个图像，路径为: {file_path}. {user_input}'.format(
                file_path=file_path, user_input=user_input)
        
        # 这里就是和 Model 交互了
        agent_return = st.session_state['chatbot'].chat(user_input)
        st.session_state['assistant'].append(copy.deepcopy(agent_return))
        logger.info(agent_return.inner_steps)
        st.session_state['ui'].render_assistant(agent_return)

修改图片的话，在这个路径下放自己的图就可以。
在这里插入图片描述
OK 这基本就是主要的脉络了里面两个重点类的实现 SessionState, StreamlitUI

 if 'ui' not in st.session_state:
      session_state = SessionState() # 初始化一个会话状态
      session_state.init_state()
      st.session_state['ui'] = StreamlitUI(session_state) # 初始UI

可以自己打开看看

我关心的部分都在 import 环节看到了这里暂时不做展开，有空再搞

把简单的先搞了：唔感觉没啥说的 — 就是建立和清除状态

assistant
user
action_list：这里可以增加动作
model_map：这里可以增加模型
plugin_actions：这是干啥的

class SessionState:

    def init_state(self):
        """Initialize session state variables."""
        st.session_state['assistant'] = []
        st.session_state['user'] = []

        #action_list = [PythonInterpreter(), GoogleSearch()]
        action_list = [PythonInterpreter()]
        st.session_state['plugin_map'] = {
            action.name: action
            for action in action_list
        }
        st.session_state['model_map'] = {}
        st.session_state['model_selected'] = None
        st.session_state['plugin_actions'] = set()

    def clear_state(self):
        """Clear the existing session state."""
        st.session_state['assistant'] = []
        st.session_state['user'] = []
        st.session_state['model_selected'] = None
        if 'chatbot' in st.session_state:
            st.session_state['chatbot']._session_history = []

这里做个尝试：

让 gpt api 可以跑起来：

我觉得在环境变量里添加 api key 应该就可以，我手里恰好有个 Key
在这里插入图片描述

嘶看这个意思是从环境变量获得 API key 呀

pytho 代码里看下

>>> import os
>>> os.getenv('OPENAI_API_KEY')
是正确的

那就是 api key 用不了 — 在我自己的平台测试 api key 可以正常使用。

在公司提供的平台不能访问，应该是没有代理的原因吧 — 是的

haidizym

关注

2
点赞
踩
0

收藏

觉得还不错? 一键收藏
1
评论
第二期书生浦语大模型实战营第六次课程笔记----Lagent & AgentLego 智能体应用搭建

第二期书生浦语大模型实战营第六次课程笔记----Lagent & AgentLego 智能体应用搭建
复制链接

扫一扫