万字长文深度解析规划框架:HuggingGPT

HuggingGPT是一个结合了ChatGPT和Hugging Face平台上的各种专家模型,以解决复杂的AI任务,可以认为他是一种结合任务规划和工具调用两种Agent工作流的框架。它的工作流程主要分为以下几个步骤:

  • 任务规划:使用ChatGPT分析用户的请求,理解他们的意图,并将其分解为可能可解决的任务。
  • 模型选择:为了完成规划的任务,ChatGPT根据模型的描述选择托管在Hugging Face上的专家模型。
  • 任务执行:调用并执行每个选定的模型,然后将结果返回给ChatGPT。
  • 响应生成:最后,使用ChatGPT整合所有模型的预测结果,并生成响应。

在这里插入图片描述

光说不练假把式,我们先尝试运行,然后逐步分析各个阶段的Prompt设计和代码设计。

1. 运行

下载Repo git clone https://github.com/microsoft/JARVIS.git

1.1 安装依赖

安装server依赖

bash
 代码解读

cd JARVIS/hugginggpt/server
conda create -n jarvis python=3.8
conda activate jarvis
pip install -r requirements.txt

安装前端页面

bash
 代码解读

cd ../web
npm install

注意,requirment.txt中的的werkzeug要更新为Werkzeug==2.2.2,否则Flask会报不兼容问题。这里没有安装pytorch之类的,因为我们不打算在本地下载模型,所需空间过于巨大,直接访问线上的模型。

1.2 修改配置

既然需要在线使用HuggingGPT的模型,那么我们需要到HuggingGPT上申请Token。修改server/configs/config.lite.yaml,更新huggingface token。另外我们要使用本地的LLM模型,需要修改openai->api_key,必须添加sk开头的字符串,不然报错。必须添加local->endpoint, 就是你本地openai的地址。此外,你可能还要修改是否采用续写use_completion和模型。如果无法访问HuggingGPT你还需要添加proxy。

yaml
 代码解读

openai: 
  api_key: sk-xxxx # added
huggingface:
  token: hf_xxx # updated
dev: true
debug: false
log_file: logs/debug.log
model: gpt-3.5-turbo # updated
use_completion: false # updated
inference_mode: huggingface # local, huggingface or hybrid, prefer hybrid
local_deployment: minimal # minimal, standard or full, prefer full
num_candidate_models: 5
max_description_length: 100
proxy: http://127.0.0.1:7890 # optional: your proxy server "http://ip:port"
local:
  endpoint: http://localhost:11434 # updated
...

此外,还需要修改server/awsome_chat.py,添加API_KEY否则也无法运行本地LLM。

python
 代码解读

if API_TYPE == "local":
     API_ENDPOINT = f"{config['local']['endpoint']}/v1/{api_name}"
+    API_KEY = config['openai']['api_key']

1.2 运行

开始运行

bash
 代码解读

python  --config configs/config.lite.yaml --mode server
npm run dev

然后我们打开浏览器http://localhost:9999/#/, 出现类似下图的窗口。

在这里插入图片描述

输入类似

describe the image /examples/c.jpg.

其中examples位于是hugginggpt/server/public/examples/,所以如果你要测试自己的图片,可以考虑将图片放在这儿。会输出类似如下图的结果。

在这里插入图片描述

2. 分析

在文章开头我们有说过,任务是分为规划任务、选择模型、执行任务和生成响应。那么我们先从任务规划看起。

2.1 任务规划

任务规划需要LLM进行推理分解任务,对于这样一个将HuggingFace当做调用工具的框架,我们要如何设计Prompt?几个原则

  1. 说明任务
  2. 说明任务的输入输出
  3. Few-Shot示例
  4. 上下文
  5. 用户输入

此外,我们还需要将HuggingFace所包含的API给到LLM,进行推理用户问题所需任务步骤。我们当然不可能将所有HuggingFace上的Model都提供给LLM,所以我们提供任务类型,huggingface上大约有19个任务类型, 其中15个NLP任务类型,2个Audio任务类型,3个CV的任务类型。

我觉得这里通过任务类型缩小LLM选择工具选择范围,之后再通过任务类型,然后再让LLM选择具体的模型,相当于一种摘要技术,从大类选择,在缩小到具体选择。你要知道hugginggpt在p0_models.jsonl中缓存了大约673个任务,你不可能将他们所有的描述都发送给LLM,它包含2765000个字符。

说明任务,在输出上对LLM有强烈的要求,除了要求是JSON,而且要求推理各个任务的依赖关系,并且填充类似的JSON输出。然而我要说的这种复杂的输出要求,当前只能用于Demo,否则你会遇到非常多的解析,无法获得想要的JSON格式,或者丢失特定的字段

json
 代码解读

The AI assistant can parse user input to several tasks: [{"task": task, "id": task_id, "dep": dependency_task_id, "args": {"text": text or <GENERATED>-dep_id, "image": image_url or <GENERATED>-dep_id, "audio": audio_url or <GENERATED>-dep_id}}]. 

说明任务的输入和输出,除了上文说的解释任务依赖关系和如何生成tasks之外,这里设定了task必须是HuggingFace支持的这些类别,args必须是text、imag和audio。还是老话,要求越多失败越多,我就遇到args偶尔缺失,task偶尔不对的问题。

json
 代码解读

The special tag "<GENERATED>-dep_id" refer to the one generated text/image/audio in the dependency task (Please consider whether the dependency task generates resources of this type.) and "dep_id" must be in "dep" list. The "dep" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The "args" field must in ["text", "image", "audio"], nothing else. The task MUST be selected from the following options: "token-classification", "text2text-generation", "summarization", "translation", "question-answering", "conversational", "text-generation", "sentence-similarity", "tabular-classification", "object-detection", "image-classification", "image-to-image", "image-to-text", "text-to-image", "text-to-video", "visual-question-answering", "document-question-answering", "image-segmentation", "depth-estimation", "text-to-speech", "automatic-speech-recognition", "audio-to-audio", "audio-classification", "canny-control", "hed-control", "mlsd-control", "normal-control", "openpose-control", "canny-text-to-image", "depth-text-to-image", "hed-text-to-image", "mlsd-text-to-image", "normal-text-to-image", "openpose-text-to-image", "seg-text-to-image". 

让LLM推理规划,这里有个魔法Think step by step

bash
 代码解读

There may be multiple tasks of the same type. Think step by step about all the tasks needed to resolve the user's request. Parse out as few tasks as possible while ensuring that the user request can be resolved. Pay attention to the dependencies and order among tasks. If the user input can't be parsed, you need to reply empty JSON [], otherwise you must return JSON directly.

设定Few shot examples,大约有6个,考虑阅读体验,这里只放一个,更多的Few shot examples位于hugginggpt/server/demos/demo_parse_task.json

json
 代码解读

[
    {
        "role": "user",
        "content": "Give you some pictures e1.jpg, e2.png, e3.jpg, help me count the number of sheep?"
    },
    {
        "role": "assistant",
        "content": "[{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "e1.jpg" }}, {"task": "object-detection", "id": 1, "dep": [-1], "args": {"image": "e1.jpg" }}, {"task": "visual-question-answering", "id": 2, "dep": [1], "args": {"image": "<GENERATED>-1", "text": "How many sheep in the picture"}} }}, {"task": "image-to-text", "id": 3, "dep": [-1], "args": {"image": "e2.png" }}, {"task": "object-detection", "id": 4, "dep": [-1], "args": {"image": "e2.png" }}, {"task": "visual-question-answering", "id": 5, "dep": [4], "args": {"image": "<GENERATED>-4", "text": "How many sheep in the picture"}} }}, {"task": "image-to-text", "id": 6, "dep": [-1], "args": {"image": "e3.jpg" }},  {"task": "object-detection", "id": 7, "dep": [-1], "args": {"image": "e3.jpg" }}, {"task": "visual-question-answering", "id": 8, "dep": [7], "args": {"image": "<GENERATED>-7", "text": "How many sheep in the picture"}}]"
    },
    ...
]

最后上下文和用户输入,没什么好说的

json
 代码解读

The chat log [ {{context}} ] may contain the resources I mentioned. Now I input { {{input}} }. Pay attention to the input and output types of tasks and the dependencies between tasks.

任务规划阶段的Promt就已经结束了,代码实现上这一段较为简单,核心流程是chat->chat_huggingface->parse_task->send_request。其中parse_task负责组装prompt和构造open ai API所需的data参数。这里有一个简易的对话上下文移动窗口,就是计算历史对话文本的tokens,如果超过最大token就尝试pop掉最后一个,这是移除最近的对话记录保留开始的策略,当然你也可以尝试其他策略。

python
 代码解读

    # cut chat logs
    start = 0
    while start <= len(context):
        history = context[start:]
        prompt = replace_slot(parse_task_prompt, {
            "input": input,
            "context": history 
        })
        messages.append({"role": "user", "content": prompt})
        history_text = "<im_end>\nuser<im_start>".join([m["content"] for m in messages])
        num = count_tokens(LLM_encoding, history_text)
        if get_max_context_length(LLM) - num > 800:
            break
        messages.pop()
        start += 2

2.2 模型选择

在上文parse_task完成后,在chat_huggingface中就会对返回的结果进行任务解析,此时由于LLM回复的不确定性,有时候你会遇到失败无法解析或者丢失字段,一个成功的任务会返回类似如下的JSON。

json
 代码解读
[
  {'task': 'object-detection', 'id': 0, 'dep': [-1], 'args': {'image': '/examples/a.jpg'}}, 
  {'task': 'image-to-image', 'id': 1, 'dep': [-1], 'args': {'image': '/examples/a.jpg'}}
]

根据task信息,在run_task中根据task值去hugginggpt/server/data/p0_models.jsonl这个json中去搜索前10的具体模型,最后根据类型查找到可用模型如下。

python
 代码解读

{'local': [], 'huggingface': ['hustvl/yolos-tiny', 'microsoft/table-transformer-structure-recognition', 'facebook/detr-resnet-50', 'TahaDouaji/detr-doc-table-detection', 'hustvl/yolos-small']}

这里在选择模型时候,需要修改一下代码否则你几乎获取不到任何一个可用模型。

进入函数choose_model再次构造Prompt,让LLM帮助决策哪个模型更好, 以下prompt只是示意,因为源代码将其转换为了role content组成的arraylist。

vbnet
 代码解读

System: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability. 
[
User: {{input}},
Assistant: {{task}}..
]
User: Please choose the most suitable model from {{metas}} for the task {{task}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detail reasons for the choice"}, the id in JSON must be the one provided in the model description.

输出如下,已经给出了最佳的模型和原因,接下来进入模型执行。

json
 代码解读

{"id": "facebook/detr-resnet-50", "reason": "The model has the highest number of likes, and the description sounds promising with end-to-end detection using transformers. Also, it has a local inference endpoint for faster access."}

2.3 模型执行

上面模型已经选择,输入也有,接下来就是执行模型,这一步比较简单,就是构造huggingface api的请求。

python
 代码解读

inference_result = model_inference(best_model_id, args, hosted_on, command['task'])

HuggingFace API的请求较为简单,只要你拥有Token,你甚至可以通过curl直接运行。

kotlin
 代码解读

curl --location 'https://api-inference.huggingface.co/models/facebook/detr-resnet-50-panoptic' \
--header 'Authorization: Bearer replacewithyourowntoken' \
--header 'Content-Type: image/jpeg' \
--data '@/Users/xxxx/dc579d59_track_0.jpg'

由于可能包含多个任务,所以任务是通过thread并发执行的,最后通过queue进行收取结果如下所示。

python
 代码解读

{'generated image': '/images/a59b.jpg', 'predicted': [{'score': 0.9699670672416687, 'label': 'potted plant', 'box': {'xmin': 0, 'ymin': 240, 'xmax': 187, 'ymax': 484}}, {'score': 0.9995023012161255, 'label': 'cat', 'box': {'xmin': 165, 'ymin': 59, 'xmax': 645, 'ymax': 522}}]}
DEBUG:__main__:{1: {'task': {'task': 'image-to-image', 'id': 1, 'dep': [-1], 'args': {'image': 'public//examples/a.jpg'}}, 'inference result': {'error': 'Model lambdalabs/sd-image-variations-diffusers is currently loading', 'estimated_time': 248.20472717285156}, 'choose model result': {'id': 'lambdalabs/sd-image-variations-diffusers', 'reason': "The model has the most likes and it's also the only model with the tag 'stable-diffusion', which indicates it's a robust and popular choice for various image tasks."}}, 0: {'task': {'task': 'object-detection', 'id': 0, 'dep': [-1], 'args': {'image': 'public//examples/a.jpg'}}, 'inference result': {'generated image': '/images/a59b.jpg', 'predicted': [{'score': 0.9699670672416687, 'label': 'potted plant', 'box': {'xmin': 0, 'ymin': 240, 'xmax': 187, 'ymax': 484}}, {'score': 0.9995023012161255, 'label': 'cat', 'box': {'xmin': 165, 'ymin': 59, 'xmax': 645, 'ymax': 522}}]}, 'choose model result': {'id': 'facebook/detr-resnet-50', 'reason': 'The model has the highest number of likes, and the description sounds promising with end-to-end detection using transformers. Also, it has a local inference endpoint for faster access.'}}}

2.4 响应生成

在chat_huggingface中调用response_results,给定input和收集的results来构造类似如下所示的Prompt,其中processes就是上文模型执行的结果。该prompt先给定输入的问题和模型执行的结果,然后要求LLM仔细甄别输出的推测结果和问题是否匹配,然后进行作答。最后要求输出的它的工作流,包括使用的模型和推测结果和生成的文件位置等信息。

vbscript
 代码解读
System: #4 Response Generation Stage: With the task execution logs, the AI assistant needs to describe the process and inference results.
User: Look at /examples/a.jpg picture, can you tell me how many objects in the picture? Give me a picture similar to this one.
Assitant: Before give you a response, I want to introduce my workflow for your request, which is shown in the following JSON data: {{processes}}. Do you have any demands regarding my response?
User: Yes. Please first think carefully and directly answer my request based on the inference results. Some of the inferences may not always turn out to be correct and require you to make careful consideration in making decisions. Then please detail your workflow including the used models and inference results for my request in your friendly tone. Please filter out information that is not relevant to my request. Tell me the complete path or urls of files in inference results. If there is nothing in the results, please tell me you can't make it.

最终LLM输入如下响应

json
 代码解读

Sure! Based on the inference results, there are two objects in the picture "/examples/a.jpg". They are a potted plant and a cat.
For your second request, after reviewing the available options, I've selected a similar image using a model with the ID "facebook/detr-resnet-50". This model identified a similar image with the path: "/images/a59b.jpg".
The workflow goes like this:
1. Object Detection: I used a model called "facebook/detr-resnet-50" to detect objects in your image. It's an end-to-end model that uses transformers and has shown great performance. It successfully identified two objects for us.
2. Image Similarity: To find a similar image, I picked another pre-trained model called "lambdalabs/sd-image-variations-diffusers". This model is known to create variations of input images while keeping the main content stable. It's a diffusion-based model, and these types of models are good at generating new content with preserving the essence of the original image. 
However, I need to let you know that getting the similar image didn't work this time. The inference took longer than expected, which is not uncommon for some complex models, and eventually timed out. So, unfortunately, I couldn't generate a similar image within the given constraints.
I hope this helps! Let me know if you'd like more details on any specific steps or have additional questions.

3. 总结

纵观整个流程其实主要还是Prompt的设计,如何设计更好的Prompt生成任务规划、模型选择和响应生成。在这种中间就是各种结果的解析工作,以及调用各种工具完成任务。值得一说的是,我在实验过程中,遇到各种各样的错误,_**尤其是第一阶段的任务规划中的任务输出,它对LLM的要求非常高,如果你只是本地的小模型或许是难以胜任的。第二,当你有很多工具选择,不妨将他们先进行分类,让LLM先从分类中选择,然后筛选出缩小后具体的工具列表,再次给到LLM选择最优最匹配的工具

如何学习AI大模型 ?

“最先掌握AI的人,将会比较晚掌握AI的人有竞争优势”。

这句话,放在计算机、互联网、移动互联网的开局时期,都是一样的道理。

我在一线互联网企业工作十余年里,指导过不少同行后辈。帮助很多人得到了学习和成长。

我意识到有很多经验和知识值得分享给大家,故此将并将重要的AI大模型资料包括AI大模型入门学习思维导图、精品AI大模型学习书籍手册、视频教程、实战学习等录播视频免费分享出来。【保证100%免费】🆓

对于0基础小白入门:

如果你是零基础小白,想快速入门大模型是可以考虑的。

一方面是学习时间相对较短,学习内容更全面更集中。
二方面是可以根据这些资料规划好学习计划和方向。

😝有需要的小伙伴,可以VX扫描下方二维码免费领取🆓

👉1.大模型入门学习思维导图👈

要学习一门新的技术,作为新手一定要先学习成长路线图,方向不对,努力白费。

对于从来没有接触过AI大模型的同学,我们帮你准备了详细的学习成长路线图&学习规划。可以说是最科学最系统的学习路线,大家跟着这个大的方向学习准没问题。(全套教程文末领取哈)
在这里插入图片描述

👉2.AGI大模型配套视频👈

很多朋友都不喜欢晦涩的文字,我也为大家准备了视频教程,每个章节都是当前板块的精华浓缩。

在这里插入图片描述
在这里插入图片描述

👉3.大模型实际应用报告合集👈

这套包含640份报告的合集,涵盖了AI大模型的理论研究、技术实现、行业应用等多个方面。无论您是科研人员、工程师,还是对AI大模型感兴趣的爱好者,这套报告合集都将为您提供宝贵的信息和启示。(全套教程文末领取哈)

在这里插入图片描述

👉4.大模型落地应用案例PPT👈

光学理论是没用的,要学会跟着一起做,要动手实操,才能将自己的所学运用到实际当中去,这时候可以搞点实战案例来学习。(全套教程文末领取哈)

在这里插入图片描述

👉5.大模型经典学习电子书👈

随着人工智能技术的飞速发展,AI大模型已经成为了当今科技领域的一大热点。这些大型预训练模型,如GPT-3、BERT、XLNet等,以其强大的语言理解和生成能力,正在改变我们对人工智能的认识。 那以下这些PDF籍就是非常不错的学习资源。(全套教程文末领取哈)
img

在这里插入图片描述

👉6.大模型面试题&答案👈

截至目前大模型已经超过200个,在大模型纵横的时代,不仅大模型技术越来越卷,就连大模型相关的岗位和面试也开始越来越卷了。为了让大家更容易上车大模型算法赛道,我总结了大模型常考的面试题。(全套教程文末领取哈)

在这里插入图片描述
👉学会后的收获:👈
基于大模型全栈工程实现(前端、后端、产品经理、设计、数据分析等),通过这门课可获得不同能力;

能够利用大模型解决相关实际项目需求: 大数据时代,越来越多的企业和机构需要处理海量数据,利用大模型技术可以更好地处理这些数据,提高数据分析和决策的准确性。因此,掌握大模型应用开发技能,可以让程序员更好地应对实际项目需求;

基于大模型和企业数据AI应用开发,实现大模型理论、掌握GPU算力、硬件、LangChain开发框架和项目实战技能, 学会Fine-tuning垂直训练大模型(数据准备、数据蒸馏、大模型部署)一站式掌握;

能够完成时下热门大模型垂直领域模型训练能力,提高程序员的编码能力: 大模型应用开发需要掌握机器学习算法、深度学习

这份完整版的 AI 大模型学习资料已经上传CSDN,朋友们如果需要可以微信扫描下方CSDN官方认证二维码免费领取【保证100%免费

😝有需要的小伙伴,可以Vx扫描下方二维码免费领取🆓

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值