How to Get JSON Output from Large Language Models: A Practical Guide

数智笔记

Published 2024-08-19 18:22:37


Article link: https://blog.csdn.net/wjjc1017/article/details/141332000


A tutorial on enforcing JSON output with Llama.cpp or the Gemini API


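Large language models (LLMs) are great at generating text, but getting structured output such as JSON usually takes clever prompting and a hope that the LLM understands. Thankfully, JSON mode is becoming more common in LLM frameworks and services. It lets you define the exact output schema you want.

This post digs into constrained generation using JSON mode. We will use a complex, nested, and realistic JSON schema example to guide LLM frameworks and APIs such as Llama.cpp or the Gemini API to generate structured data, specifically tourist location information. It builds on a previous post about constrained generation using Guidance, but focuses on the more widely adopted JSON mode.

Although JSON mode is more limited than Guidance, its broader support makes it more accessible, especially with cloud-based LLM providers.

During a personal project, I found that while JSON mode was straightforward with Llama.cpp, getting it to work with the Gemini API required some extra steps. This article shares those solutions to help you use JSON mode effectively.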

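Our JSON Schema: Tourist Location Document

Our example schema represents a TouristLocation. It is a non-trivial structure with nested objects, lists, enums, and various data types such as strings and numbers.

Here is a simplified version: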

{  
"name": "string",  
"location_long_lat": ["number", "number"],  
"climate_type": {"type": "string", "enum": ["tropical", "desert", "temperate", "continental", "polar"]},  
"activity_types": ["string"],  
"attraction_list": [  
{  
"name": "string",  
"description": "string"  
}  
],  
"tags": ["string"],  
"description": "string",  
"most_notably_known_for": "string",  
"location_type": {"type": "string", "enum": ["city", "country", "establishment", "landmark", "national park", "island", "region", "continent"]},  
"parents": ["string"]  
}
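The listing above is a shorthand rather than a valid JSON Schema document. As a rough sketch, two of its entries might be spelled out with standard JSON Schema keywords as follows, written here as a Python dict. The `required` lists are an assumption, since the shorthand does not say which fields are mandatory.

```python
import json

# Assumed expansion of two shorthand entries into standard JSON Schema keywords.
# The "required" lists are illustrative; the shorthand does not specify them.
schema_fragment = {
    "type": "object",
    "properties": {
        "climate_type": {
            "type": "string",
            "enum": ["tropical", "desert", "temperate", "continental", "polar"],
        },
        "attraction_list": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "description": {"type": "string"},
                },
                "required": ["name", "description"],
            },
        },
    },
    "required": ["climate_type", "attraction_list"],
}

print(json.dumps(schema_fragment, indent=2))
```

Writing this out by hand gets tedious for larger models, which is one reason to generate the schema from code instead.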

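You can write this type of schema by hand, or you can generate it with the Pydantic library. Here is how to do that on a simplified example:

from pprint import pprint
from typing import List

from pydantic import BaseModel, Field


class TouristLocation(BaseModel):
    """Model for a tourist location"""

    high_season_months: List[int] = Field(
        [], description="List of months (1-12) when the location is most visited"
    )

    tags: List[str] = Field(
        ...,
        description="List of tags describing the location (e.g. accessible, sustainable, sunny, cheap, pricey)",
        min_length=1,
    )
    description: str = Field(..., description="Text description of the location")


# Example usage and schema output
location = TouristLocation(
    high_season_months=[6, 7, 8],
    tags=["beach", "sunny", "family-friendly"],
    description="A beautiful beach with white sand and clear blue water.",
)

schema = location.model_json_schema()
pprint(schema)

This code defines a simplified version of the TouristLocation data class using Pydantic. It has three fields:

high_season_months: a list of integers for the months of the year (1-12) when the location is most visited. It defaults to an empty list.

tags: a list of strings describing the location. The field is required (`...`) and must contain at least one element (`min_length=1`).

description: a required string holding a text description of the location.

The code then creates an instance of the TouristLocation class and uses model_json_schema() to get the JSON schema representation of the model. This schema defines the structure and types of the data expected by the class.

model_json_schema() returns:

{'description': 'Model for a tourist location',
 'properties': {'description': {'description': 'Text description of the '
                                               'location',
                                'title': 'Description',
                                'type': 'string'},
                'high_season_months': {'default': [],
                                       'description': 'List of months (1-12) '
                                                      'when the location is '
                                                      'most visited',
                                       'items': {'type': 'integer'},
                                       'title': 'High Season Months',
                                       'type': 'array'},
                'tags': {'description': 'List of tags describing the location '
                                        '(e.g. accessible, sustainable, sunny, '
                                        'cheap, pricey)',
                         'items': {'type': 'string'},
                         'minItems': 1,
                         'title': 'Tags',
                         'type': 'array'}},
 'required': ['tags', 'description'],
 'title': 'TouristLocation',
 'type': 'object'}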

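Now that we have our schema, let's see how we can enforce it: first in Llama.cpp with its Python wrapper, and then with Gemini's API.

Method 1: The Straightforward Approach with Llama.cpp

Llama.cpp is a C++ library for running Llama models locally. It is beginner-friendly and has an active community. We will use it through its Python wrapper, llama-cpp-python.

Here is how to generate TouristLocation data with it: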

import time

from llama_cpp import Llama

# Model initialization: fetch the quantized GGUF weights from the Hugging Face Hub
checkpoint = "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"

model = Llama.from_pretrained(
    repo_id=checkpoint,
    n_gpu_layers=-1,  # offload all layers to the GPU
    filename="*Q4_K_M.gguf",
    verbose=False,
    n_ctx=12_000,
)

# TouristLocation and location are the Pydantic model and example instance defined earlier
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that outputs in JSON. "
        f"Follow this schema {TouristLocation.model_json_schema()}",
    },
    {"role": "user", "content": "Generate information about Hawaii, US."},
    {"role": "assistant", "content": f"{location.model_dump_json()}"},
    {"role": "user", "content": "Generate information about Casablanca"},
]

# Constrain generation to JSON that matches the schema
response_format = {
    "type": "json_object",
    "schema": TouristLocation.model_json_schema(),
}

start = time.time()

outputs = model.create_chat_completion(
    messages=messages, max_tokens=1200, response_format=response_format
)

print(outputs["choices"][0]["message"]["content"])

print(f"Time: {time.time() - start}")

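This code first imports the necessary libraries and initializes the LLM. It then defines a list of messages for a conversation with the model: a system message instructing the model to output JSON that follows a specific schema, a user request for information about Hawaii, an assistant response that follows the schema and serves as a worked example, and a final user request for information about Casablanca.

Under the hood, Llama.cpp uses context-free grammars to constrain the structure and generate valid JSON output for the new city.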
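To build some intuition for what grammar-constrained generation means, here is a toy sketch that uses only the standard library. It is not how Llama.cpp is implemented: the hypothetical `constrained_decode` function works on single characters and a small set of enum values, and the `score` callbacks merely stand in for a model's preferences. A real implementation applies the same masking idea to the model's token probabilities, with a grammar that covers the whole JSON document.

```python
CLIMATE_TYPES = ["tropical", "desert", "temperate", "continental", "polar"]


def constrained_decode(allowed_values, score):
    """Build a string one character at a time, never leaving the allowed set.

    Assumes no allowed value is a prefix of another allowed value.
    """
    out = ""
    while out not in allowed_values:
        # Characters that keep `out` a prefix of at least one allowed value.
        candidates = sorted(
            {v[len(out)] for v in allowed_values if v.startswith(out)}
        )
        # The stand-in "model" ranks only these candidates; all others are masked.
        out += max(candidates, key=lambda ch: score(out, ch))
    return out


# Two stand-in scoring functions playing the role of the model's preferences.
print(constrained_decode(CLIMATE_TYPES, lambda prefix, ch: ord(ch)))   # tropical
print(constrained_decode(CLIMATE_TYPES, lambda prefix, ch: -ord(ch)))  # continental
```

Whatever the scoring function prefers, the result is always one of the allowed values, which is the property that makes the constrained output parseable.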

For the Casablanca request, the model generated the following output:

{'activity_types': ['shopping', 'food and wine', 'cultural'],  
 'attraction_list': [{'description': 'One of the largest mosques in the world '  
                                     'and a symbol of Moroccan architecture',  
                      'name': 'Hassan II Mosque'},  
                     {'description': 'A historic walled city with narrow '  
                                     'streets and traditional shops',  
                      'name': 'Old Medina'},  
                     {'description': 'A historic square with a beautiful '  
                                     'fountain and surrounding buildings',  
                      'name': 'Mohammed V Square'},  
                     {'description': 'A beautiful Catholic cathedral built in '  
                                     'the early 20th century',  
                      'name': 'Casablanca Cathedral'},  
                     {'description': 'A scenic waterfront promenade with '  
                                     'beautiful views of the city and the sea',  
                      'name': 'Corniche'}],  
 'climate_type': 'temperate',  
 'description': 'A large and bustling city with a rich history and culture',  
 'location_type': 'city',  
 'most_notably_known_for': 'Its historic architecture and cultural '  
                           'significance',  
 'name': 'Casablanca',  
 'parents': ['Morocco', 'Africa'],  
 'tags': ['city', 'cultural', 'historical', 'expensive']}

This can then be parsed into an instance of our Pydantic class.
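As a minimal sketch of that parsing step, assuming Pydantic v2, the JSON string returned by the model can be passed to `model_validate_json`. The `TouristLocationSketch` and `Attraction` classes below are hypothetical, trimmed-down stand-ins for the full model, and the `raw` string is a hand-written sample rather than real model output.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Attraction(BaseModel):
    name: str
    description: str


class TouristLocationSketch(BaseModel):
    """Hypothetical, trimmed-down stand-in for the article's full model."""

    name: str
    tags: List[str] = Field(..., min_length=1)
    attraction_list: List[Attraction] = Field(default_factory=list)


# Hand-written sample standing in for outputs["choices"][0]["message"]["content"].
raw = (
    '{"name": "Casablanca", "tags": ["city", "cultural"], '
    '"attraction_list": [{"name": "Hassan II Mosque", '
    '"description": "One of the largest mosques in the world"}]}'
)

parsed = TouristLocationSketch.model_validate_json(raw)
print(parsed.attraction_list[0].name)

# Invalid data is rejected: `tags` must contain at least one element.
try:
    TouristLocationSketch.model_validate_json('{"name": "Nowhere", "tags": []}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Validating after generation is still worthwhile even with constrained output, because it also enforces checks that the grammar may not express.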

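Method 2: Working Around the Quirks of the Gemini API

The Gemini API, Google's managed LLM service, claims only limited JSON mode support for Gemini Flash 1.5 in its documentation. However, it can be made to work with a few adjustments.

Here are the general instructions to get it working: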

```python
import os

import google.generativeai as genai
from google.generativeai.types import ContentDict

# replace_value_in_dict and delete_keys_recursive are the helpers defined further down
schema = TouristLocation.model_json_schema()
schema = replace_value_in_dict(schema.copy(), schema.copy())  # inline every $ref
del schema["$defs"]
delete_keys_recursive(schema, key_to_delete="title")
delete_keys_recursive(schema, key_to_delete="location_long_lat")
delete_keys_recursive(schema, key_to_delete="default")
delete_keys_recursive(schema, key_to_delete="minItems")

print(schema)

messages = [
    ContentDict(
        role="user",
        parts=[
            "You are a helpful assistant that outputs in JSON. "
            f"Follow this schema {TouristLocation.model_json_schema()}"
        ],
    ),
    ContentDict(role="user", parts=["Generate information about Hawaii, US."]),
    ContentDict(role="model", parts=[f"{location.model_dump_json()}"]),
    ContentDict(role="user", parts=["Generate information about Casablanca"]),
]

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# The documentation says that using `response_mime_type` with `response_schema`
# requires a Gemini 1.5 Pro model; with the adjusted schema it also worked with Flash
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    # Set `response_mime_type` to get JSON output
    # Pass the schema object in the `response_schema` field
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": schema,
    },
)

response = model.generate_content(messages)
print(response.text)
```

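Here is how to overcome Gemini's limitations:

Replace $ref with full definitions: Gemini does not handle schema references ($ref) well. These references are used when the schema contains nested object definitions. Replace them with the complete definitions from your schema.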

def replace_value_in_dict(item, original_schema):  
    # Source: https://github.com/pydantic/pydantic/issues/889
    if isinstance(item, list):  
        return [replace_value_in_dict(i, original_schema) for i in item]  
    elif isinstance(item, dict):  
        if list(item.keys()) == ["$ref"]:  
            definitions = item["$ref"][2:].split("/")  
            res = original_schema.copy()  
            for definition in definitions:  
                res = res[definition]  
            return res  
        else:  
            return {  
                key: replace_value_in_dict(i, original_schema)  
                for key, i in item.items()  
            }  
    else:  
        return item
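To see the effect of inlining on a concrete input, here is a small self-contained sketch. The `inline_refs` function is a hypothetical variant of the helper above that additionally resolves references nested inside a definition, and the schema is a hand-written miniature rather than the real TouristLocation schema. It assumes local references of the form `#/...` and no self-referencing definitions, which would recurse forever.

```python
import copy


def inline_refs(node, root):
    """Return a copy of `node` with every {"$ref": "#/..."} replaced by its target."""
    if isinstance(node, list):
        return [inline_refs(item, root) for item in node]
    if isinstance(node, dict):
        if list(node.keys()) == ["$ref"]:
            target = root
            for part in node["$ref"][2:].split("/"):  # strip the leading "#/"
                target = target[part]
            # Recurse so that references nested inside the target are inlined too.
            return inline_refs(copy.deepcopy(target), root)
        return {key: inline_refs(value, root) for key, value in node.items()}
    return node


schema = {
    "$defs": {
        "Attraction": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "description": {"type": "string"},
            },
            "required": ["name", "description"],
        }
    },
    "type": "object",
    "properties": {
        "attraction_list": {
            "type": "array",
            "items": {"$ref": "#/$defs/Attraction"},
        },
    },
}

flat = inline_refs(schema, schema)
del flat["$defs"]  # no longer needed once every reference is inlined
print(flat["properties"]["attraction_list"]["items"]["type"])  # object
```

The original `schema` dict is left untouched, so it can still be used for validation on the client side.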

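Remove unsupported keys: Gemini does not yet handle keys such as "title", "anyOf", or "minItems". Remove them from your schema. This makes the schema less readable and less restrictive, but there is no other choice if you want to stick with Gemini.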

def delete_keys_recursive(d, key_to_delete):
    if isinstance(d, dict):
        # Delete the key if it is present
        if key_to_delete in d:
            del d[key_to_delete]
        # Recurse into all values of the dictionary
        for k, v in d.items():
            delete_keys_recursive(v, key_to_delete)
    elif isinstance(d, list):
        # Recurse into all items of the list
        for item in d:
            delete_keys_recursive(item, key_to_delete)
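Here is a short usage sketch on a hand-written miniature schema. The helper is repeated so the block runs on its own. One side effect to be aware of: deleting a property such as `location_long_lat` leaves its name behind in the `required` list. The `prune_required` function is a hypothetical addition, not part of the original code, that cleans this up; whether Gemini actually rejects a dangling `required` entry is not something this sketch verifies.

```python
def delete_keys_recursive(d, key_to_delete):
    """Same helper as above, repeated so this sketch runs on its own."""
    if isinstance(d, dict):
        d.pop(key_to_delete, None)
        for value in d.values():
            delete_keys_recursive(value, key_to_delete)
    elif isinstance(d, list):
        for item in d:
            delete_keys_recursive(item, key_to_delete)


def prune_required(d):
    """Hypothetical helper: drop `required` entries whose property no longer exists."""
    if isinstance(d, dict):
        if isinstance(d.get("required"), list) and isinstance(d.get("properties"), dict):
            d["required"] = [n for n in d["required"] if n in d["properties"]]
        for value in d.values():
            prune_required(value)
    elif isinstance(d, list):
        for item in d:
            prune_required(item)


schema = {
    "title": "TouristLocation",
    "type": "object",
    "properties": {
        "tags": {
            "title": "Tags",
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
        },
        "location_long_lat": {
            "title": "Location Long Lat",
            "type": "array",
            "items": {"type": "number"},
        },
    },
    "required": ["tags", "location_long_lat"],
}

for key in ("title", "minItems", "location_long_lat"):
    delete_keys_recursive(schema, key)
prune_required(schema)
print(schema)
```

Because the deletion matches key names at any depth, a property that is literally named `title` would be removed as well, so check your field names before applying it.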