How to Return Structured Data from a Model: Precise Data Extraction

Latest recommended article published on 2025-05-30 22:44:54

aehrutktrjk

Article link: https://blog.csdn.net/aehrutktrjk/article/details/144406362

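Introduction

Modern applications increasingly need to extract structured data from text so that it can be inserted into a database or passed to other downstream systems. Structured output makes the data easier to use and makes later analysis more reliable. This article shows how to get an AI model to return output that conforms to a specific schema, with practical code examples.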

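Main Content

1. Using the `with_structured_output()` method

Supported models

Many model providers support this method, including OpenAI, Anthropic, Azure, and Google. It relies on the model's native API capabilities, such as tool/function calling or JSON mode, to structure the output.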

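Defining the output schema

You can define the schema with a TypedDict class, a JSON Schema, or a Pydantic class. This article focuses on Pydantic classes, whose main advantage is validation: the model's output is checked against the schema, and a missing required field or a field of the wrong type raises an error.

Pydantic example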

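from typing import Optional

from langchain_openai import ChatOpenAI
# LangChain releases before 0.3 imported these from langchain_core.pydantic_v1
from pydantic import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")


# Route requests through an API proxy service for more stable access;
# "{AI_URL}" is a placeholder for the proxy endpoint.
llm = ChatOpenAI(base_url="{AI_URL}")

# Wrap the model so that invoke() returns a validated Joke instance
structured_llm = llm.with_structured_output(Joke)
response = structured_llm.invoke("Tell me a joke about cats")
print(response)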
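
The validation behavior mentioned earlier can be checked without calling a model at all. The following is a minimal sketch, assuming `pydantic` is installed; the helper `try_parse` and the sample payloads are hypothetical and only stand in for what a model might return.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")


def try_parse(payload: dict) -> str:
    """Hypothetical helper: return 'ok' or the names of the fields that failed validation."""
    try:
        Joke(**payload)
        return "ok"
    except ValidationError as exc:
        # Each error entry carries the offending field name as the first item of "loc"
        failed = sorted(str(err["loc"][0]) for err in exc.errors())
        return "invalid: " + ", ".join(failed)


# Sample payloads standing in for model output (made up for illustration)
complete = {"setup": "Why did the cat sit on the laptop?", "punchline": "To keep an eye on the mouse."}
missing_field = {"setup": "Why did the cat sit on the laptop?"}
wrong_type = {**complete, "rating": "very funny"}

print(try_parse(complete))
print(try_parse(missing_field))
print(try_parse(wrong_type))
```

The first payload passes, while the other two should be rejected because of the missing `punchline` and the non-integer `rating` respectively. This is the check a Pydantic schema adds on top of the model's raw output.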

2. Using `TypedDict` or JSON Schema

If you do not need Pydantic's validation, you can define the output schema with a TypedDict or a JSON Schema instead.

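from typing import Optional

from typing_extensions import Annotated, TypedDict


class Joke(TypedDict):
    """Joke to tell user."""
    # Annotated[type, default, description]; "..." marks a required field
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


# llm is the ChatOpenAI instance created in the Pydantic example.
# With a TypedDict schema the result is a plain dict and is not validated.
structured_llm = llm.with_structured_output(Joke)
response = structured_llm.invoke("Tell me a joke about cats")
print(response)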
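
The TypedDict example covers only one of the two alternatives named in this section. For the JSON Schema variant, a minimal sketch might look like the following. The schema mirrors the joke structure used above; the helper `tell_joke` is a hypothetical wrapper, and `llm` is assumed to be any chat model that supports `with_structured_output()`.

```python
from typing import Any

# JSON Schema describing the same joke structure as the Joke classes above
joke_schema: dict = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline to the joke"},
        "rating": {
            "type": "integer",
            "description": "How funny the joke is, from 1 to 10",
            "default": None,
        },
    },
    # Only setup and punchline are mandatory; rating stays optional
    "required": ["setup", "punchline"],
}


def tell_joke(llm: Any, topic: str) -> dict:
    """Hypothetical helper: bind the schema to a chat model and invoke it."""
    structured_llm = llm.with_structured_output(joke_schema)
    return structured_llm.invoke(f"Tell me a joke about {topic}")
```

Calling `tell_joke(llm, "cats")` with the `llm` from the earlier example should return a plain dict. As with TypedDict, no Pydantic validation is applied to it.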