pix2pix GAN

LIjin_1006

Last modified: 2024-04-05 22:00:24

Tags: artificial intelligence, deep learning

First published: 2024-04-05 21:59:34

Article link: https://blog.csdn.net/LIjin_1006/article/details/137410424

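This article walks through implementing a Pix2Pix GAN with TensorFlow, covering data preprocessing, model construction, and training, and shows how to translate architectural label images into building facade photos. It covers the data augmentation steps (random cropping, horizontal flipping, and normalization) as well as how the generator and discriminator are built and trained.

Summary generated automatically by CSDN.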

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # Set the TensorFlow log level (hide INFO and WARNING messages)
from tensorflow.python.platform import build_info

import tensorflow as tf
# pathlib is an object-oriented library for working with filesystem paths.
# Its Path class represents a path and provides many methods for operating on it.
import pathlib
import time
import datetime
from matplotlib import pyplot as plt
from IPython import display

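# List all physical GPU devices
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # If GPUs are present, configure how they allocate memory
    try:
        # Let GPU memory grow on demand instead of reserving it all up front
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        # Restricting the visible GPU devices is not needed here,
        # because memory growth has already been set on every GPU.
        # tf.config.set_visible_devices(gpus, 'GPU')
        print("GPU available; memory growth enabled.")
    except RuntimeError as e:
        # Memory growth must be set before the GPUs have been initialized
        print(f"Error while configuring the GPU: {e}")
else:
    # No GPU found
    print("No GPU device detected.")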

dataset_name = "facades"
path_to_zip = pathlib.Path('./datasets')
PATH = path_to_zip/dataset_name

list(PATH.iterdir())

sample_image = tf.io.read_file(str(PATH/'train/1.jpg'))  # Sample image, still raw bytes at this point

sample_image = tf.io.decode_jpeg(sample_image)
print(sample_image.shape)  # Height 256, width 512, color image: it holds two sub-images side by side

plt.figure()
plt.imshow(sample_image)

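# The real facade photo has to be separated from the architectural label image;
# each of the two images is 256 x 256.
# Define a function that loads an image file and returns two image tensors.
def load(image_file):
    # Read the image file and decode it into a uint8 tensor
    image = tf.io.read_file(image_file)
    image = tf.io.decode_jpeg(image)
    # Split the image down the middle along the width axis
    w = tf.shape(image)[1]
    w = w // 2
    input_image = image[:, w:, :]  # Right half: label image
    real_image = image[:, :w, :]   # Left half: real photo
    # Convert both images to float32 tensors
    input_image = tf.cast(input_image, tf.float32)
    real_image = tf.cast(real_image, tf.float32)
    return input_image, real_image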
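
To make the slicing in `load` concrete, here is a minimal sketch of the same split done with NumPy instead of TensorFlow. The function name `split_pair` and the synthetic `combined` array are assumptions made for illustration; they are not part of the original code, and the array only stands in for a decoded 256 x 512 JPEG.

```python
import numpy as np

def split_pair(image):
    """Split a side-by-side H x 2W x C array into (input_image, real_image)."""
    w = image.shape[1] // 2
    input_image = image[:, w:, :].astype(np.float32)  # right half: label image
    real_image = image[:, :w, :].astype(np.float32)   # left half: real photo
    return input_image, real_image

# Hypothetical stand-in for a decoded 256 x 512 RGB image:
# the left half is all zeros, the right half is all 255.
combined = np.zeros((256, 512, 3), dtype=np.uint8)
combined[:, 256:, :] = 255

inp_demo, real_demo = split_pair(combined)
print(inp_demo.shape, real_demo.shape)
```

Both halves come out as 256 x 256 x 3 float32 arrays, matching the shapes the TensorFlow version prints for a training image.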

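# Plot a sample of the input image (architectural label image) and the real image (facade photo).
# Call the load function defined above to read and split the image.
inp, re = load(str(PATH / 'train/100.jpg'))
print(inp.shape, re.shape)
plt.figure()
plt.imshow(inp / 255.0)  # Scale to [0, 1] for display
plt.figure()
plt.imshow(re / 255.0)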

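# Define several functions that do the following:
# - Resize each 256 x 256 image to a larger height and width, 286 x 286.
# - Randomly crop it back to 256 x 256.
# - Randomly flip the image horizontally, i.e. left to right (random mirroring).
# - Normalize the image to the [-1, 1] range.

# Shuffle buffer size
BUFFER_SIZE = 400
# Batch size
BATCH_SIZE = 1
# Image width and height
IMG_WIDTH = 256
IMG_HEIGHT = 256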

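# Nearest-neighbor interpolation is a simple method: each output pixel takes the value
# of the closest source pixel. It is fast, but it can produce jagged edges when an image
# is scaled. For smoother results, consider bilinear interpolation
# (tf.image.ResizeMethod.BILINEAR) or bicubic interpolation (tf.image.ResizeMethod.BICUBIC).
# Enlarging an image always means filling in pixels that were not in the original.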

# Resize both images
def resize(input_image, real_image, height, width):
    input_image = tf.image.resize(input_image, [height, width],
                                  method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    real_image = tf.image.resize(real_image, [height, width],
                                 method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return input_image, real_image

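# Define the random crop.
# tf.stack raises an error if input_image and real_image do not have exactly the same shape,
# so make sure the tensors being stacked match in every dimension before calling it.
def random_crop(input_image, real_image):
    # Stack the two images along a new leading axis so they share one crop window
    stacked_image = tf.stack([input_image, real_image], axis=0)
    # Random cropping is a common data augmentation technique: it lets the model see
    # different variants of the input during training, which improves generalization.
    # Because the crop is random, each call to tf.image.random_crop may return a different result.
    cropped_image = tf.image.random_crop(
        stacked_image, size=[2, IMG_HEIGHT, IMG_WIDTH, 3])
    return cropped_image[0], cropped_image[1]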
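
The reason for stacking before cropping is that the label image and the photo must be cut with the same window, otherwise the pair would no longer line up pixel for pixel. The sketch below shows that idea in NumPy rather than TensorFlow; `paired_random_crop`, `rng`, and the synthetic arrays are hypothetical names introduced only for this illustration.

```python
import numpy as np

def paired_random_crop(input_image, real_image, height, width, rng):
    """Crop two same-shaped H x W x C arrays with one shared random window."""
    stacked = np.stack([input_image, real_image], axis=0)  # shape (2, H, W, C)
    # Draw one top-left corner and use it for both images
    top = int(rng.integers(0, stacked.shape[1] - height + 1))
    left = int(rng.integers(0, stacked.shape[2] - width + 1))
    cropped = stacked[:, top:top + height, left:left + width, :]
    return cropped[0], cropped[1]

rng = np.random.default_rng(0)
# Synthetic 286 x 286 "label image"; the "photo" is the same array plus 1,
# so any misalignment between the two crops would be visible immediately.
base = rng.integers(0, 256, size=(286, 286, 3)).astype(np.float32)
crop_a, crop_b = paired_random_crop(base, base + 1.0, 256, 256, rng)
print(crop_a.shape, crop_b.shape)
```

Because both crops come from the same window, `crop_b - crop_a` is exactly 1 everywhere. Cropping the two images independently would generally break that relationship.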

# Normalize the images to [-1, 1]
def normalize(input_image, real_image):
    input_image = (input_image / 127.5) - 1
    real_image = (real_image / 127.5) - 1
    return input_image, real_image
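
As a quick check of the arithmetic, dividing by 127.5 maps the pixel range [0, 255] onto [0, 2], and subtracting 1 shifts it to [-1, 1]. The plain-Python sketch below applies the same formula to single values and adds the inverse mapping; `normalize_value` and `denormalize_value` are illustrative helper names, not functions from the original code.

```python
def normalize_value(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0

def denormalize_value(value):
    """Inverse mapping: take a value in [-1, 1] back to [0, 255]."""
    return (value + 1.0) * 127.5

for pixel in (0, 127.5, 255):
    print(pixel, "->", normalize_value(pixel))
```

The endpoints 0 and 255 land on -1 and 1, and the midpoint 127.5 lands on 0. To display a normalized image with matplotlib, `value * 0.5 + 0.5` maps it to [0, 1].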

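# Convert to a TensorFlow graph function: random_jitter is decorated as a TensorFlow graph function,
# but whether gradients flow through it depends on the operations inside the function.
# If the function contains only differentiable TensorFlow operations, then it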