Image Generation and Editing with Google Imagen on Vertex AI

Latest recommended article published on 2025-02-14 11:37:57

hgSdaegva

Tags: Imagen, artificial intelligence, Python

Article link: https://blog.csdn.net/hgSdaegva/article/details/145148724

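AI-generated images are becoming a common feature in application development. Google's Imagen models, available through the Vertex AI platform, give developers image generation and editing capabilities. This article shows how to use Imagen on Vertex AI for text-to-image generation, image editing, image captioning, and visual question answering (VQA), with example code for each task.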

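Technical Background

Google Imagen generates high-quality images from text prompts by combining natural language understanding with image synthesis. On Vertex AI, developers can use it to build AI products that turn a user's ideas into visual assets.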

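Core Principles

Imagen is a text-to-image diffusion model. As described in the original Imagen paper, a frozen large language model (T5) encodes the text prompt, and a cascade of diffusion models conditioned on that encoding first generates a low-resolution image and then upsamples it to the final resolution. The process therefore combines language understanding with visual generation.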
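Imagen, as described in its published paper, is a diffusion model. For intuition, the toy sketch below applies the standard forward (noising) step of a diffusion process to a short list of numbers standing in for pixels. It is not Imagen's implementation: the function name `add_noise`, the sample values, and the chosen `alpha_bar` levels are illustrative assumptions. A diffusion model is trained to reverse this corruption step by step, conditioned on the text prompt.

```python
import math
import random


def add_noise(pixels, alpha_bar, rng):
    """Forward diffusion step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps.

    `alpha_bar` in [0, 1] is the fraction of signal kept; eps is
    standard Gaussian noise drawn from `rng`.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be between 0 and 1")
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    clean = [0.2, 0.5, 0.9]  # hypothetical normalized pixel values
    for alpha_bar in (1.0, 0.5, 0.0):
        # alpha_bar = 1.0 keeps the image intact; 0.0 leaves pure noise
        print(alpha_bar, add_noise(clean, alpha_bar, rng))
```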

Code Walkthrough

1. Image Generation

Generate a new image from a text prompt.

from langchain_core.messages import HumanMessage
from langchain_google_vertexai.vision_models import VertexAIImageGeneratorChat
import base64
import io
from PIL import Image

# Create the image generation model
generator = VertexAIImageGeneratorChat()

# Generate an image from a text prompt
messages = [HumanMessage(content=["a cat at the beach"])]
response = generator.invoke(messages)

# The first content part holds the generated image
generated_image = response.content[0]

# The image is returned as a data URL; keep only the base64 payload after the comma
img_base64 = generated_image["image_url"]["url"].split(",")[-1]

# Decode the base64 payload and open it as a PIL image
img = Image.open(io.BytesIO(base64.b64decode(img_base64)))

# Display the image
img.show()
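The response carries the image as a data URL, so the decoding step can be factored into a small helper that also lets you save the result instead of only displaying it. This is a minimal sketch using only the standard library; the helper names `data_url_to_bytes` and `save_data_url` are illustrative and not part of LangChain.

```python
import base64
from pathlib import Path


def data_url_to_bytes(data_url: str) -> bytes:
    """Return the raw bytes encoded in a base64 data URL.

    Also accepts a bare base64 string with no "data:...;base64," prefix.
    """
    payload = data_url.split(",", 1)[-1]
    return base64.b64decode(payload)


def save_data_url(data_url: str, path: str) -> int:
    """Write the decoded bytes to `path` and return the byte count."""
    return Path(path).write_bytes(data_url_to_bytes(data_url))
```

For example, `save_data_url(generated_image["image_url"]["url"], "cat_at_the_beach.png")` would write the generated image to disk; the filename is arbitrary.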

2. Image Editing

Edit an existing image using a text prompt.

import base64
import io

from langchain_core.messages import HumanMessage
from langchain_google_vertexai.vision_models import (
    VertexAIImageEditorChat,
    VertexAIImageGeneratorChat,
)
from PIL import Image

# Create the image generation model
generator = VertexAIImageGeneratorChat()

# Generate the source image from a text prompt
messages = [HumanMessage(content=["a cat at the beach"])]
response = generator.invoke(messages)
generated_image = response.content[0]

# Create the image editing model
editor = VertexAIImageEditorChat()

# Pass the generated image together with the editing prompt
messages = [HumanMessage(content=[generated_image, "a dog at the beach"])]

# Call the model to edit the image
editor_response = editor.invoke(messages)

# Extract the base64 payload of the edited image from the returned data URL
edited_img_base64 = editor_response.content[0]["image_url"]["url"].split(",")[-1]

# Decode the base64 payload and open it as a PIL image
edited_img = Image.open(io.BytesIO(base64.b64decode(edited_img_base64)))

# Display the edited image
edited_img.show()
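The example above edits an image that was just generated. To edit an image you already have, one option is to wrap its bytes in the same `image_url` structure that the generator returns. The sketch below assumes the editor accepts any base64 data URL in that shape, which is inferred from the example above rather than confirmed; the helper name `image_part_from_bytes` is hypothetical.

```python
import base64


def image_part_from_bytes(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes in an image_url content part with a data URL."""
    if not image_bytes:
        raise ValueError("image_bytes is empty")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

With a local file, the part could be built from the file's bytes (for instance via `pathlib.Path(...).read_bytes()`) and placed before the prompt in the `HumanMessage` content list, in the position `generated_image` occupies above.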

3. Image Captioning

Get a text description of an image.

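from langchain_google_vertexai import VertexAIImageCaptioning

# Initialize the image captioning model
model = VertexAIImageCaptioning()

# Reuse the image from the image generation section.
# Note: despite its name, img_base64 holds the full data URL here, prefix included.
img_base64 = generated_image["image_url"]["url"]
response = model.invoke(img_base64)

# Print the generated caption
print(f"Generated Caption: {response}")

# Strip the data URL prefix, decode the base64 payload, and open it as a PIL image
img = Image.open(io.BytesIO(base64.b64decode(img_base64.split(",")[-1])))

# Display the image
img.show()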

4. Visual Question Answering

Ask questions about an image.

from langchain_google_vertexai import VertexAIVisualQnAChat

# Initialize the visual question answering model
model = VertexAIVisualQnAChat()

# img_base64 is the full data URL set in the image captioning section
question = "What animal is shown in the image?"
response = model.invoke(
    input=[
        HumanMessage(
            content=[
                {"type": "image_url", "image_url": {"url": img_base64}},
                question,
            ]
        )
    ]
)

# Print the question and the answer
print(f"Question: {question}\nAnswer: {response.content}")

# Strip the data URL prefix, decode the base64 payload, and open it as a PIL image
img = Image.open(io.BytesIO(base64.b64decode(img_base64.split(",")[-1])))

# Display the image
img.show()
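To ask several questions about the same image, the message content shown above can be built by a small helper and reused. This is a sketch under the assumption that each question is sent as its own message; the helper name `build_vqa_content`, the second question, and the placeholder data URL are illustrative.

```python
def build_vqa_content(image_url: str, question: str) -> list:
    """Build the content list for one visual question: image part, then text."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return [
        {"type": "image_url", "image_url": {"url": image_url}},
        question,
    ]


questions = [
    "What animal is shown in the image?",
    "Where was the picture taken?",
]
# Placeholder data URL; in the walkthrough this would be img_base64.
contents = [build_vqa_content("data:image/png;base64,AAAA", q) for q in questions]
```

Each entry in `contents` can then be passed to the model as `model.invoke([HumanMessage(content=content)])`, one call per question.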