GPT Image and DALL·E Image Generation and Editing: A Practical Guide to the Responses API and Image API

Zbb159

Published 2025-10-12 14:29:30

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/Zbb159/article/details/153117846

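Overview

This article explains how to generate and edit images with GPT Image and the DALL·E model family, and compares the Responses API and the Image API in terms of use cases, capabilities, and best practices. It covers multi-turn iterative editing, streaming generation, mask-based editing (inpainting), high input fidelity, and output customization (size, quality, format, transparent background, and so on).

The focus is on engineering practice, with code examples intended to be straightforward to adapt for production use. All examples use one shared API endpoint, which is visible in each code block.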

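API Capabilities and Selection

Two API Modes

- Image API:
  - Generations: generate a new image directly from a text prompt.
  - Edits: modify an existing image, in part or as a whole.
  - Variations: generate variants of an existing image (supported by DALL·E 2 only).
  - Supported models: gpt-image-1, dall-e-2, dall-e-3.
- Responses API:
  - Exposes image generation as a built-in tool inside a conversation or multi-step flow.
  - Supports multi-turn editing: it accepts images as input in context and returns images as output.
  - Accepts input images as a file ID (Files API) or as a Base64 data URL.
  - The image generation tool currently supports gpt-image-1; the mainline models that can call the tool are listed under "Supported Models" below.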

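When to Choose Which API

- You only need to generate or edit one image from a single prompt: prefer the Image API.
- You want a conversational, iteratively editable image experience, or want to show partial images while generation is in progress: prefer the Responses API.

Both APIs support customizing the output: quality, size, format, compression, and a transparent background.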
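For the single-prompt case, a call to the Image API could be sketched roughly as follows. This assumes the `images.generate` method of the official `openai` Python SDK and that `gpt-image-1` returns each image as Base64 text in `b64_json`. The helper name `generate_images` is made up for illustration, and the client is passed in as an argument so the function can be tried against a stub.

```python
import base64


def generate_images(client, prompt, n=1, size="1024x1024", quality="medium"):
    """Generate n images with the Image API and return them as raw bytes.

    `client` is assumed to be an openai.OpenAI instance, or any object that
    exposes the same `images.generate` method.
    """
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        n=n,
        size=size,
        quality=quality,
    )
    # Each entry in result.data is assumed to carry Base64 text in b64_json.
    return [base64.b64decode(item.b64_json) for item in result.data]
```

With a real client, for example `OpenAI(base_url="https://yunwu.ai")`, the returned byte strings can be written straight to `.png` files.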

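Model Comparison and Recommendations

- gpt-image-1: the newest and most capable natively multimodal image generation model, with stronger instruction following, text rendering, fine-grained editing, and real-world knowledge. Recommended as the default choice.
- DALL·E 3: supports Generations in the Image API; image quality is higher than DALL·E 2 and larger resolutions are supported.
- DALL·E 2: supports Generations, Edits, and Variations; handles more concurrent requests, costs less, and supports mask-based inpainting.

Note: to ensure responsible use, some organizations must complete organization verification in the developer console before using gpt-image-1.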

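Quick Start: Generating an Image

The following example uses the image generation tool through the Responses API to generate a single image from a text prompt. It includes basic logic for extracting and saving the output.

# Use a stable API service endpoint; an enterprise-grade API platform is recommended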
from openai import OpenAI
import base64

# Shared service endpoint
client = OpenAI(base_url="https://yunwu.ai")

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Generate an image of gray tabby cat hugging an otter with an orange scarf",
    tools=[{"type": "image_generation"}],
)

# Extract and save the image
image_data = [out.result for out in response.output if getattr(out, "type", None) == "image_generation_call"]
if image_data:
    image_base64 = image_data[0]
    with open("otter.png", "wb") as f:
        f.write(base64.b64decode(image_base64))

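Tip: with the Image API, you can use the n parameter to generate several images in one request. Output customization (size, quality, format, transparent background) is covered later in this article.

Multi-Turn Image Generation: Iterative Refinement

The Responses API supports multi-turn conversations, so an edit can continue in context by passing previous_response_id. The following example generates an image and then refines it with a follow-up request to make it "more realistic".

# Use a stable API service endpoint; an enterprise-grade API platform is recommended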
from openai import OpenAI
import base64

client = OpenAI(base_url="https://yunwu.ai")

# First generation
resp1 = client.responses.create(
    model="gpt-4.1-mini",
    input="Generate an image of gray tabby cat hugging an otter with an orange scarf",
    tools=[{"type": "image_generation"}],
)
img1 = [o.result for o in resp1.output if getattr(o, "type", None) == "image_generation_call"]
if img1:
    with open("cat_and_otter.png", "wb") as f:
        f.write(base64.b64decode(img1[0]))

# Iterative edit: make the image more realistic
resp2 = client.responses.create(
    model="gpt-4.1-mini",
    previous_response_id=resp1.id,
    input="Now make it look realistic",
    tools=[{"type": "image_generation"}],
)
img2 = [o.result for o in resp2.output if getattr(o, "type", None) == "image_generation_call"]
if img2:
    with open("cat_and_otter_realistic.png", "wb") as f:
        f.write(base64.b64decode(img2[0]))
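The two-step flow above generalizes to any number of turns. The sketch below chains an arbitrary list of prompts through previous_response_id. The helper name `refine_image` is hypothetical, and the client is passed in as an argument (assumed to be an `openai.OpenAI` instance or a stub exposing the same `responses.create` method).

```python
def refine_image(client, prompts, model="gpt-4.1-mini"):
    """Run prompts in order, chaining each turn to the previous response.

    Returns one list of Base64 image strings per turn.
    """
    previous_id = None
    images_per_turn = []
    for prompt in prompts:
        kwargs = {
            "model": model,
            "input": prompt,
            "tools": [{"type": "image_generation"}],
        }
        # Only follow-up turns reference an earlier response.
        if previous_id is not None:
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        previous_id = response.id
        images_per_turn.append([
            out.result
            for out in response.output
            if getattr(out, "type", None) == "image_generation_call"
        ])
    return images_per_turn
```

Keeping every turn's images, rather than only the last one, makes it easier to compare intermediate results or roll back to an earlier version.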

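Streaming Generation: Real-Time Preview with Partial Images

Both the Responses API and the Image API can stream "partial images" while generation is in progress, which improves the interactive experience. The partial_images parameter sets how many to return (0-3); with 0, only the final image is returned.

# Use a stable API service endpoint; an enterprise-grade API platform is recommended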
from openai import OpenAI
import base64

client = OpenAI(base_url="https://yunwu.ai")

stream = client.responses.create(
    model="gpt-4.1",
    input="Draw a gorgeous image of a river made of white owl feathers, snaking its way through a serene winter landscape",
    stream=True,
    tools=[{"type": "image_generation", "partial_images": 2}],
)

for event in stream:
    if getattr(event, "type", None) == "response.image_generation_call.partial_image":
        idx = event.partial_image_index
        image_base64 = event.partial_image_b64
        image_bytes = base64.b64decode(image_base64)
        with open(f"river_partial_{idx}.png", "wb") as f:
            f.write(image_bytes)

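Automatic Prompt Revision: Reading revised_prompt

When the image generation tool is used through the Responses API, the mainline model (for example gpt-4.1) automatically revises your prompt. The revised_prompt field of the returned image_generation_call holds the revised text, which is useful for review and auditing.

# Use a stable API service endpoint; an enterprise-grade API platform is recommended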
from openai import OpenAI

client = OpenAI(base_url="https://yunwu.ai")

resp = client.responses.create(
    model="gpt-4.1-mini",
    input="Generate an image of gray tabby cat hugging an otter with an orange scarf",
    tools=[{"type": "image_generation"}],
)

for out in resp.output:
    if getattr(out, "type", None) == "image_generation_call":
        print("Revised Prompt:", getattr(out, "revised_prompt", "<none>"))

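Editing Images

The edit capability supports:
- Editing an existing image.
- Generating a new image from several reference images (as in the "gift basket" example).
- Mask-based editing (inpainting): upload an image and a mask that marks the region to replace.

Multi-Image Reference Composition Example

The following example uses two input methods, Base64 data URLs and Files API file IDs, to combine several reference images into one gift basket image.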

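# Use a stable API service endpoint; an enterprise-grade API platform is recommended
from openai import OpenAI
import base64
import mimetypes

client = OpenAI(base_url="https://yunwu.ai")

# Helper: encode an image file as a Base64 data URL
def encode_image_to_data_url(path: str) -> str:
    # Derive the MIME type from the file extension (the sample files here are PNGs)
    mime_type = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{b64}"

# Helper: upload a file through the Files API and return its file ID
def create_file_id(path: str) -> str:
    with open(path, "rb") as f:  # close the file handle once the upload finishes
        file_obj = client.files.create(file=f, purpose="vision")
    return file_obj.id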

prompt = (
    "Generate a photorealistic image of a gift basket on a white background labeled 'Relax & Unwind'"
    " with a ribbon and handwriting-like font, containing all the items in the reference pictures."
)

base64_image1 = encode_image_to_data_url("body-lotion.png")
base64_image2 = encode_image_to_data_url("soap.png")
file_id1 = create_file_id("incense-kit.png")
file_id2 = create_file_id("candle.png")

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                {"type": "input_image", "image_url": base64_image1},
                {"type": "input_image", "image_url": base64_image2},
                {"type": "input_image", "file_id": file_id1},
                {"type": "input_image", "file_id": file_id2},
            ],
        }
    ],
    tools=[{"type": "image_generation"}],
)

image_generation_calls = [out for out in response.output if getattr(out, "type", None) == "image_generation_call"]
image_data = [out.result for out in image_generation_calls]
if image_data:
    with open("gift-basket.png", "wb") as f:
        f.write(base64.b64decode(image_data[0]))

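Mask-Based Editing (Inpainting)

Provide a mask to indicate the region to edit. With GPT Image, mask guidance is prompt-driven: the model uses the mask as a hint but does not guarantee pixel-exact coverage. If several input images are provided, the mask applies to the first one.

# Use a stable API service endpoint; an enterprise-grade API platform is recommended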
from openai import OpenAI
import base64

client = OpenAI(base_url="https://yunwu.ai")

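# Upload the source image and the mask, closing each file handle afterwards
with open("sunlit_lounge.png", "rb") as f:
    file_id = client.files.create(file=f, purpose="vision").id
with open("mask.png", "rb") as f:
    mask_id = client.files.create(file=f, purpose="vision").id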

response = client.responses.create(
    model="gpt-4o",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "generate an image of the same sunlit indoor lounge area with a pool but the pool should contain a flamingo"},
                {"type": "input_image", "file_id": file_id},
            ],
        }
    ],
    tools=[
        {
            "type": "image_generation",
            "quality": "high",
            "input_image_mask": {"file_id": mask_id},
        }
    ],
)

image_data = [o.result for o in response.output if getattr(o, "type", None) == "image_generation_call"]
if image_data:
    with open("lounge.png", "wb") as f:
        f.write(base64.b64decode(image_data[0]))

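Mask Requirements

The image to edit and the mask must have the same dimensions and format, and each must be smaller than 50MB.
The mask must contain an alpha channel (a typical black-and-white mask needs one added).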
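These requirements can be checked locally before uploading. The following is a minimal sketch using only the standard library. It assumes both files are PNGs and reads only the IHDR header, so it does not detect palette-based transparency. The names `png_header` and `check_mask` are hypothetical helpers, not part of any SDK.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 50 * 1024 * 1024  # the 50MB limit stated above


def png_header(data: bytes):
    """Return (width, height, color_type) read from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width, height, bit depth, color type.
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type


def check_mask(image: bytes, mask: bytes):
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    iw, ih, _ = png_header(image)
    mw, mh, mask_color = png_header(mask)
    if (iw, ih) != (mw, mh):
        problems.append(f"size mismatch: image {iw}x{ih}, mask {mw}x{mh}")
    if len(image) >= MAX_BYTES or len(mask) >= MAX_BYTES:
        problems.append("file is 50MB or larger")
    # PNG color types 4 (gray + alpha) and 6 (RGBA) carry an alpha channel.
    if mask_color not in (4, 6):
        problems.append("mask has no alpha channel")
    return problems
```

Calling `check_mask(open("sunlit_lounge.png", "rb").read(), open("mask.png", "rb").read())` before the upload surfaces mismatches early instead of after a failed request.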

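High Input Fidelity

gpt-image-1 supports high input fidelity, which preserves more detail from the input images (such as faces and logos). When several input images are provided, the first one keeps the richest texture and detail, so place any face or brand element that must be preserved accurately in the first image. Enable it with the input_fidelity parameter; the default is low.

# Use a stable API service endpoint; an enterprise-grade API platform is recommended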
from openai import OpenAI
import base64

client = OpenAI(base_url="https://yunwu.ai")

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Add the logo to the woman's top, as if stamped into the fabric."},
                {"type": "input_image", "image_url": "https://yunwu.ai/static/images/woman_futuristic.jpg"},
                {"type": "input_image", "image_url": "https://yunwu.ai/static/images/brain_logo.png"},
            ],
        }
    ],
    tools=[{"type": "image_generation", "input_fidelity": "high"}],
)

img = [o.result for o in response.output if getattr(o, "type", None) == "image_generation_call"]
if img:
    with open("woman_with_logo.png", "wb") as f:
        f.write(base64.b64decode(img[0]))

Tip: enabling high input fidelity increases image input token usage and therefore cost; see "Cost and Latency" below.

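Customizing Output Parameters

The following output options are supported:
- Size (size): 1024x1024, 1536x1024, 1024x1536, or auto (default).
- Quality (quality): low, medium, high, or auto (default).
- Format (output_format): png (default), jpeg, webp; jpeg and webp also accept output_compression (0-100).
- Background (background): transparent or opaque; a transparent background is supported only with png/webp and works better at medium/high quality.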
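The constraints in this list can be checked before a request is sent. The sketch below builds an image_generation tool dictionary and rejects combinations the list rules out. `build_image_tool` is a hypothetical helper, and it accepts only the values named above.

```python
SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
QUALITIES = {"low", "medium", "high", "auto"}
FORMATS = {"png", "jpeg", "webp"}


def build_image_tool(size="auto", quality="auto", output_format="png",
                     output_compression=None, background=None):
    """Validate output options and return an image_generation tool dict."""
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if output_format not in FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    tool = {
        "type": "image_generation",
        "size": size,
        "quality": quality,
        "output_format": output_format,
    }
    if output_compression is not None:
        # Compression applies to jpeg and webp only, in the range 0-100.
        if output_format == "png":
            raise ValueError("output_compression applies only to jpeg and webp")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be between 0 and 100")
        tool["output_compression"] = output_compression
    if background is not None:
        if background not in ("transparent", "opaque"):
            raise ValueError(f"unsupported background: {background}")
        # A transparent background needs a format with an alpha channel.
        if background == "transparent" and output_format == "jpeg":
            raise ValueError("transparent background requires png or webp")
        tool["background"] = background
    return tool
```

The returned dictionary can be passed as `tools=[build_image_tool(...)]` to `client.responses.create`.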

Transparent Background Example

# Use a stable API service endpoint; an enterprise-grade API platform is recommended
from openai import OpenAI
import base64

client = OpenAI(base_url="https://yunwu.ai")

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Draw a 2D pixel art style sprite sheet of a tabby gray cat",
    tools=[
        {
            "type": "image_generation",
            "background": "transparent",
            "quality": "high",
            "output_format": "png"
        }
    ],
)

image_data = [out.result for out in response.output if getattr(out, "type", None) == "image_generation_call"]
if image_data:
    with open("sprite.png", "wb") as f:
        f.write(base64.b64decode(image_data[0]))

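Content Safety and Moderation

All prompts and generated results are filtered according to the content policy. For image generation with gpt-image-1, the moderation parameter controls how strict the filtering is:
- auto (default): standard filtering, which limits certain categories of potentially inappropriate content.
- low: less restrictive filtering.

Supported Models

The mainline models that can call the image generation tool in the Responses API include: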
- gpt-4o
- gpt-4o-mini
- gpt-4.1
- gpt-4.1-mini
- gpt-4.1-nano
- o3

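Cost and Latency

The model generates an image by first producing specialized image output tokens. Both cost and latency are proportional to the number of tokens generated: the larger the size and the higher the quality, the more tokens are needed. Input tokens (the text prompt and any input images) count as well, and enabling high input fidelity further increases image input tokens.

Reference figures (output token usage):
- Quality low:
  - 1024x1024: about 272
  - 1024x1536 (portrait): about 408
  - 1536x1024 (landscape): about 400
- Quality medium:
  - 1024x1024: about 1056
  - 1024x1536 (portrait): about 1584
  - 1536x1024 (landscape): about 1568
- Quality high:
  - 1024x1024: about 4160
  - 1024x1536 (portrait): about 6240
  - 1536x1024 (landscape): about 6208

Streaming partial images (partial_images) adds cost: each partial image adds about 100 image output tokens.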
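As a worked example of these figures, the sketch below estimates the image output tokens for one request. The table values and the 100-token surcharge per partial image are the approximate numbers quoted above. `estimate_output_tokens` is a hypothetical helper, and it does not account for input tokens.

```python
# Approximate image output tokens per quality and size, as listed above.
OUTPUT_TOKENS = {
    "low": {"1024x1024": 272, "1024x1536": 408, "1536x1024": 400},
    "medium": {"1024x1024": 1056, "1024x1536": 1584, "1536x1024": 1568},
    "high": {"1024x1024": 4160, "1024x1536": 6240, "1536x1024": 6208},
}
PARTIAL_IMAGE_TOKENS = 100  # approximate surcharge per streamed partial image


def estimate_output_tokens(quality, size, partial_images=0):
    """Estimate image output tokens for one image plus streamed partials."""
    if not 0 <= partial_images <= 3:
        raise ValueError("partial_images must be between 0 and 3")
    return OUTPUT_TOKENS[quality][size] + partial_images * PARTIAL_IMAGE_TOKENS


print(estimate_output_tokens("high", "1024x1024", partial_images=2))  # 4160 + 200 = 4360
```

Under these figures, a high-quality square image costs roughly four times the output tokens of a medium one, which is worth weighing when choosing a default quality.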

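Configuration Example: Shared Service Endpoint

In a real deployment, the service address is usually kept in one configuration file, which simplifies switching environments and supports observability. A simple JSON configuration example follows: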

{
  "service": {
    "base_url": "https://yunwu.ai",
    "api_key_env": "API_KEY"
  },
  "default": {
    "model": "gpt-4.1-mini",
    "tool": "image_generation",
    "quality": "medium",
    "size": "auto"
  }
}
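One way such a configuration might be consumed is sketched below. It assumes the JSON layout shown above and reads the API key from the environment variable named in api_key_env. `load_settings` is a hypothetical helper, and the `environ` parameter exists only so the function can be exercised without touching the real environment.

```python
import json
import os


def load_settings(config_text, environ=os.environ):
    """Parse the JSON config and return (client_kwargs, tool, model)."""
    config = json.loads(config_text)
    service = config["service"]
    defaults = config["default"]

    # The config stores the NAME of the environment variable, not the key itself.
    api_key = environ.get(service["api_key_env"])
    if not api_key:
        raise RuntimeError(
            f"environment variable {service['api_key_env']} is not set"
        )

    client_kwargs = {"base_url": service["base_url"], "api_key": api_key}
    tool = {
        "type": defaults["tool"],
        "quality": defaults["quality"],
        "size": defaults["size"],
    }
    return client_kwargs, tool, defaults["model"]
```

The results plug into the earlier examples as `client = OpenAI(**client_kwargs)` followed by `client.responses.create(model=model, input=..., tools=[tool])`.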