Hands-On Plugin Development for Large Language Models (LLMs)

Latest recommended article published on 2025-03-21 10:24:39

m0_49380401

Tags: transformer, deep learning, natural language processing

Article link: https://blog.csdn.net/m0_49380401/article/details/132306957

1. Overview

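ChatGPT is a general-purpose large language model. If users want the model to draw on an enterprise's private data during an interaction, this can be achieved by developing a plugin. In addition, the training data of the GPT-3.5 model ends in September 2021, so a plugin can also be used to give the model access to later data.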

The following diagram shows the architecture of the interaction between a plugin and the model:

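The model reads the plugin manifest file to decide how to use the plugin. In this file, the "description_for_model" field reads as follows. It tells the model that when users ask for their personal information, it should call the plugin to search the user's documents for information such as the user's emails. These user documents are the private data made available to the model, and this text can be regarded as a prompt (context information) for the GPT model: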

Plugin for searching through the user's documents (such as files, emails, and more) to find answers to questions and retrieve relevant information. Use it whenever a user asks something that might be found in their personal information.
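
To make the role of "description_for_model" more concrete, here is a minimal sketch that assembles a manifest-like dictionary and serializes it to JSON. Only the description_for_model text comes from this article. The other field names are an assumption about the general shape of a plugin manifest, and every value (plugin names, URL) is a hypothetical placeholder.

```python
import json

# Text quoted from the manifest field discussed above.
DESCRIPTION_FOR_MODEL = (
    "Plugin for searching through the user's documents (such as files, emails, "
    "and more) to find answers to questions and retrieve relevant information. "
    "Use it whenever a user asks something that might be found in their "
    "personal information."
)


def build_manifest(api_url: str) -> dict:
    """Return a manifest-like dict; everything except description_for_model is a placeholder."""
    return {
        "schema_version": "v1",                  # assumed field
        "name_for_model": "retrieval",           # hypothetical name
        "name_for_human": "Document Retrieval",  # hypothetical name
        "description_for_model": DESCRIPTION_FOR_MODEL,
        "description_for_human": "Search through your documents.",  # hypothetical
        "api": {"type": "openapi", "url": api_url},  # assumed layout
    }


manifest = build_manifest("https://example.com/.well-known/openapi.yaml")
print(json.dumps(manifest, indent=2))
```

The point of the sketch is that description_for_model is ordinary text inside the manifest, so changing its wording changes when the model decides to call the plugin.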

2. How the Model Calls the Plugin During a User Interaction

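In the flowchart below, the boxes with a yellow background indicate that the custom plugin runs on a FastAPI server. The datastore holds the user's document information. Documents are stored in the datastore through the /upsert operation and are usually represented as a vector space. The main steps are as follows: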

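-The user sends a query request (/query) to the model through the API

-The model uses the content specified in "description_for_model" to decide whether it needs to call the plugin

-When the user's prompt indicates that the plugin is needed, the model accesses the data in the datastore through the API

-The data that matches the user's prompt is returned to the model through the API

-For the model, the returned data is new context information (new context). The model then processes the user's original prompt together with the new context and returns the result to the user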
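
The last step above, combining the original prompt with the retrieved context, can be sketched as follows. This is only an illustration: the function name build_augmented_prompt and the prompt template are hypothetical, and the article does not show how the model actually merges the two.

```python
from typing import List


def build_augmented_prompt(user_prompt: str, retrieved_chunks: List[str]) -> str:
    """Combine the user's prompt with retrieved text (the "new context")."""
    if not retrieved_chunks:
        # Nothing was retrieved, so the model only sees the original prompt.
        return user_prompt
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return f"Context:\n{context}\n\nQuestion: {user_prompt}"


prompt = build_augmented_prompt(
    "What's Bob's phone number?",
    ["This is Bob's phone number: 123-456-7891"],
)
print(prompt)
```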

3. Building the Datastore

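There are many ways to build a datastore for user document data, both local and remote. Here llama (the llama_index package) is used to build one. First, import the following packages: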

from models.models import DocumentChunk, DocumentChunkMetadata, DocumentChunkWithScore, DocumentMetadataFilter

from llama_index.indices.base import BaseGPTIndex

from llama_index.indices.vector_store.base import GPTVectorStoreIndex

from llama_index.indices.query.schema import QueryBundle

from llama_index.response.schema import Response

from llama_index.data_structs.node_v2 import Node, DocumentRelationship, NodeWithScore

from llama_index.indices.registry import INDEX_STRUCT_TYPE_TO_INDEX_CLASS

from llama_index.data_structs.struct_type import IndexStructType

from llama_index.indices.response.builder import ResponseMode

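Data is stored as follows: the local document data is first turned into document chunks, each chunk is converted into a node, and the nodes are inserted into the datastore and indexed so that they are easy to query.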

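-Run the upsert operation to insert local document data into the llama datastore

-Run the query operation to search for data in the datastore

-Data that is no longer needed can also be deleted from the datastore, for example by id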
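
The upsert, query, and delete operations listed above can be illustrated with a toy in-memory store. This sketch does not use the llama_index API: ToyDatastore, the bag-of-words embed function, and the cosine scoring are all stand-ins chosen so that the example runs with the standard library only. A real datastore would rely on learned embeddings and a proper index.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyDatastore:
    """Hypothetical in-memory store keyed by chunk id."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[str, Counter]] = {}

    def upsert(self, chunk_id: str, text: str) -> None:
        # Insert a new chunk, or overwrite an existing one with the same id.
        self._nodes[chunk_id] = (text, embed(text))

    def query(self, question: str, top_k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored chunk against the question and keep the best top_k.
        q = embed(question)
        scored = [(cid, cosine(q, vec)) for cid, (_, vec) in self._nodes.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

    def delete(self, chunk_id: str) -> bool:
        # Return True if the id existed and was removed.
        return self._nodes.pop(chunk_id, None) is not None


store = ToyDatastore()
store.upsert("789_0", "This is Bob's phone number: 123-456-7891")
store.upsert("111_0", "This is Mike's phone number: 123-456-7892")
results = store.query("What's Bob's phone number?", top_k=2)
print(results)
```

Because the scores come from word overlap, they will not match the scores produced by a real embedding model. Only the ranking idea carries over.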

The following is a sample of the document data stored in the datastore:

Call the upsert method to insert the document information above into the datastore. If no id is given, a random id is generated with uuid by default.
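
The default-id behavior described above can be sketched as follows. The first three records are reconstructed from the query results shown later in this article rather than copied from the original sample file, and the helper names ensure_id and chunk_id are hypothetical. The chunk id format (document id, underscore, chunk index) is inferred from ids such as '789_0' in that output.

```python
import uuid
from typing import Dict, List, Optional

# Records reconstructed from the query output in this article.
documents: List[Dict[str, Optional[str]]] = [
    {"id": "789", "text": "This is Bob's phone number: 123-456-7891",
     "source": "email", "source_id": "890",
     "created_at": "2022-01-02T13:00:00Z", "author": "Alice"},
    {"id": "111", "text": "This is Mike's phone number: 123-456-7892",
     "source": "email", "source_id": "8901",
     "created_at": "2022-01-03T13:00:00Z", "author": "Alice"},
    {"id": "222", "text": "This is another Bob's phone number: 123-456-7893",
     "source": "email", "source_id": "8902",
     "created_at": "2022-01-05T13:00:00Z", "author": "Alice"},
    # Hypothetical record without an id, added only to exercise the default.
    {"id": None, "text": "A note with no explicit id"},
]


def ensure_id(doc: Dict[str, Optional[str]]) -> str:
    """Return the document's id, generating a random UUID string if it is missing."""
    if not doc.get("id"):
        doc["id"] = str(uuid.uuid4())
    return doc["id"]


def chunk_id(document_id: str, index: int) -> str:
    """Build a chunk id in the '<document_id>_<index>' form seen in the results."""
    return f"{document_id}_{index}"


ids = [ensure_id(doc) for doc in documents]
print(chunk_id(ids[0], 0))
print(ids[3])  # a random UUID, different on every run
```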

4. Querying Data from the Datastore

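Once the upsert operation above has written the local JSON file data to the datastore, the following query can be used to simulate a user sending a request to the model: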

What's Bob's phone number?
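
As an illustration, a query like this might be wrapped in a JSON request body before it is sent to the /query endpoint. The body layout (a queries list whose entries have query and top_k fields) is an assumption about the plugin's API and is not shown in this article. The helper build_query_body is hypothetical, and no request is actually sent.

```python
import json


def build_query_body(question: str, top_k: int = 3) -> str:
    """Serialize one query into an assumed JSON request body."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return json.dumps({"queries": [{"query": question, "top_k": top_k}]})


body = build_query_body("What's Bob's phone number?")
print(body)
```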

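The query returns the results below. The first two records, which best match the user's prompt, have the highest scores, approximately 0.944 and 0.928 respectively: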

result is :[DocumentChunkWithScore(id='789_0', text="This is Bob's phone number: 123-456-7891", metadata=DocumentChunkMetadata(source=<Source.email: 'email'>, source_id='890', url=None, created_at='2022-01-02T13:00:00Z', author='Alice', document_id='789'), embedding=None, score=0.9439842903203779),
DocumentChunkWithScore(id='222_0', text="This is another Bob's phone number: 123-456-7893", metadata=DocumentChunkMetadata(source=<Source.email: 'email'>, source_id='8902', url=None, created_at='2022-01-05T13:00:00Z', author='Alice', document_id='222'), embedding=None, score=0.9281880484234409),
DocumentChunkWithScore(id='111_0', text="This is Mike's phone number: 123-456-7892", metadata=DocumentChunkMetadata(source=<Source.email: 'email'>, source_id='8901', url=None, created_at='2022-01-03T13:00:00Z', author='Alice', document_id='111'), embedding=None, score=0.8605654579483485)]
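
To read the scores more easily, the results can be reduced to id and score pairs and sorted. The sketch below copies the ids and scores from the output above into a small stand-in dataclass. ScoredChunk and top_matches are hypothetical names, not the DocumentChunkWithScore model itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredChunk:
    """Stand-in holding only the id and score of a returned chunk."""
    id: str
    score: float


# Ids and scores copied from the query output above.
results = [
    ScoredChunk("789_0", 0.9439842903203779),
    ScoredChunk("222_0", 0.9281880484234409),
    ScoredChunk("111_0", 0.8605654579483485),
]


def top_matches(chunks: List[ScoredChunk], k: int) -> List[ScoredChunk]:
    """Return the k chunks with the highest scores, best first."""
    return sorted(chunks, key=lambda chunk: chunk.score, reverse=True)[:k]


for chunk in top_matches(results, 2):
    print(chunk.id, round(chunk.score, 3))
# prints:
# 789_0 0.944
# 222_0 0.928
```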