Fun with the Deep Dream Model


文科升


Article link: https://blog.csdn.net/moyu123456789/article/details/84127189



These are my study notes on Chapter 4 of the book 《21个项目玩转深度学习：基于TensorFlow的实践详解》.

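1. What is Deep Dream?

The book 《21个项目玩转深度学习：基于TensorFlow的实践详解》 explains Deep Dream as follows: Deep Dream is an interesting technique that Google published in 2015. Given a trained convolutional neural network, it can generate an image after only a few parameters are set.

In a convolutional neural network, the input is an image. A stack of convolutional layers extracts feature maps, which are then fed into fully connected layers that perform the final classification. So what have the convolutional layers actually learned, and what do the feature maps look like?

Each channel of a convolutional layer can be seen as representing one kind of learned "information". If we take the mean activation of a single channel as the optimization objective and adjust the input image to maximize it, we can find out what that channel has learned. This is the basic principle of Deep Dream.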
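To make the principle concrete, here is a minimal NumPy sketch of my own, not code from the book. A toy 1-D "channel" (one fixed filter followed by ReLU) stands in for a real convolutional channel, and the input signal is updated by gradient ascent on the mean activation. The kernel, the signal length and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

def channel_score(x, kernel):
    """Mean ReLU activation of a toy 1-D channel (valid cross-correlation)."""
    act = np.maximum(np.correlate(x, kernel, mode="valid"), 0.0)
    return act.mean(), act

def score_gradient(act, kernel):
    """Gradient of the mean activation with respect to the input signal."""
    mask = (act > 0).astype(np.float64)  # ReLU passes gradient only where it is active
    return np.convolve(mask, kernel, mode="full") / act.size

def gradient_ascent(x0, kernel, iter_n=20, step=1.0):
    x = x0.copy()
    scores = []
    for _ in range(iter_n):
        score, act = channel_score(x, kernel)
        g = score_gradient(act, kernel)
        g /= g.std() + 1e-8              # same normalization as in the post's render loop
        x += step * g                    # ascend: move the input toward a higher score
        scores.append(score)
    return x, scores

rng = np.random.default_rng(0)
kernel = np.array([-1.0, 2.0, -1.0])     # hypothetical filter that responds to local peaks
x0 = rng.uniform(size=64)
_, scores = gradient_ascent(x0, kernel)
print(scores[0], scores[-1])
```

The score rises from one iteration to the next, which is the same behaviour the real scripts below show for an Inception channel.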

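2. Deep Dream in practice with TensorFlow

1) Importing the Inception model

The original Deep Dream only needs to optimize the activation of one channel in a convolutional layer of an ImageNet model, so we first import an ImageNet image-recognition model. Inception is used here. TensorFlow can export a trained model to a file with the .pb extension and import it again when it is needed. For Inception, the model file is named tensorflow_inception_graph.pb.

tensorflow_inception_graph.pb can be found and downloaded through Baidu or Google. The code that imports the Inception model is: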

# coding:utf-8
# Import the basic modules.
from __future__ import print_function
import numpy as np
import tensorflow as tf

# Create the graph and the session
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)

# tensorflow_inception_graph.pb stores both the Inception network structure and its weights.
# The statements below load it.
model_fn = 'tensorflow_inception_graph.pb'
# tf.gfile.FastGFile(name, mode) opens a file (here the model file, not an image).
# Mode 'rb' reads raw bytes, which is what the serialized GraphDef requires.
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    # Create an empty GraphDef, then parse the file contents into it
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
# t_input is the placeholder for our input image
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0
# The input image must be preprocessed before it is fed to the network.
# expand_dims adds one dimension: [height, width, channel] becomes [1, height, width, channel]
# t_input - imagenet_mean subtracts the mean
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
# Import the model, wiring t_preprocessed to its 'input' node
tf.import_graph_def(graph_def, {'input': t_preprocessed})

# Find all convolutional layers
layers = [op.name for op in graph.get_operations() if op.type == 'Conv2D' and 'import/' in op.name]

# Print the number of convolutional layers
print('Number of layers', len(layers))
# Print the name of each layer
print('Layers:', layers)
# In particular, print the shape of mixed4d_3x3_bottleneck_pre_relu
name = 'mixed4d_3x3_bottleneck_pre_relu'
print('shape of %s: %s' % (name, str(graph.get_tensor_by_name('import/' + name + ':0').get_shape())))

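A few points to note:

(1) The network needs an input image when it is imported, so a placeholder t_input is created and the image is fed to it. The image must be reshaped to (batch, height, width, channel). The batch dimension is 1 here. It exists because images are fed batch by batch during training, with several images in each batch.

(2) A pixel mean is subtracted from the image. Inception was trained with mean subtraction as preprocessing, so the same preprocessing must be applied to keep the input consistent. The fixed mean used here is 117.

After subtracting the mean and adding the batch dimension, we obtain t_preprocessed, the tensor that is actually fed to the network.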
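The two preprocessing steps can be tried on a plain NumPy array. This is only a sketch of the same arithmetic outside the TensorFlow graph; the function name preprocess and the constant name are mine.

```python
import numpy as np

IMAGENET_MEAN = 117.0  # the fixed mean used in this post

def preprocess(img):
    """Subtract the mean and add a leading batch axis: (H, W, C) -> (1, H, W, C)."""
    img = np.asarray(img, dtype=np.float32)
    return np.expand_dims(img - IMAGENET_MEAN, 0)

img = np.full((224, 224, 3), 117.0, dtype=np.float32)
batch = preprocess(img)
print(batch.shape)  # a single-image batch
```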

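Running the program reports 59 convolutional layers. The call print('Layers:', layers) prints the layer names listed below, which can be compared against the structure of the network.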

Layers: ['import/conv2d0_pre_relu/conv', 'import/conv2d1_pre_relu/conv',
'import/conv2d2_pre_relu/conv','import/mixed3a_1x1_pre_relu/conv',
'import/mixed3a_3x3_bottleneck_pre_relu/conv', 'import/mixed3a_3x3_pre_relu/conv',
'import/mixed3a_5x5_bottleneck_pre_relu/conv', 'import/mixed3a_5x5_pre_relu/conv',
'import/mixed3a_pool_reduce_pre_relu/conv', 'import/mixed3b_1x1_pre_relu/conv',
'import/mixed3b_3x3_bottleneck_pre_relu/conv', 'import/mixed3b_3x3_pre_relu/conv',
'import/mixed3b_5x5_bottleneck_pre_relu/conv', 'import/mixed3b_5x5_pre_relu/conv',
'import/mixed3b_pool_reduce_pre_relu/conv', 'import/mixed4a_1x1_pre_relu/conv',
'import/mixed4a_3x3_bottleneck_pre_relu/conv', 'import/mixed4a_3x3_pre_relu/conv',
'import/mixed4a_5x5_bottleneck_pre_relu/conv', 'import/mixed4a_5x5_pre_relu/conv',
'import/mixed4a_pool_reduce_pre_relu/conv', 'import/mixed4b_1x1_pre_relu/conv',
'import/mixed4b_3x3_bottleneck_pre_relu/conv', 'import/mixed4b_3x3_pre_relu/conv',
'import/mixed4b_5x5_bottleneck_pre_relu/conv', 'import/mixed4b_5x5_pre_relu/conv',
'import/mixed4b_pool_reduce_pre_relu/conv', 'import/mixed4c_1x1_pre_relu/conv',
'import/mixed4c_3x3_bottleneck_pre_relu/conv', 'import/mixed4c_3x3_pre_relu/conv',
'import/mixed4c_5x5_bottleneck_pre_relu/conv', 'import/mixed4c_5x5_pre_relu/conv',
'import/mixed4c_pool_reduce_pre_relu/conv', 'import/mixed4d_1x1_pre_relu/conv',
'import/mixed4d_3x3_bottleneck_pre_relu/conv', 'import/mixed4d_3x3_pre_relu/conv',
'import/mixed4d_5x5_bottleneck_pre_relu/conv', 'import/mixed4d_5x5_pre_relu/conv',
'import/mixed4d_pool_reduce_pre_relu/conv', 'import/mixed4e_1x1_pre_relu/conv',
'import/mixed4e_3x3_bottleneck_pre_relu/conv', 'import/mixed4e_3x3_pre_relu/conv',
'import/mixed4e_5x5_bottleneck_pre_relu/conv', 'import/mixed4e_5x5_pre_relu/conv',
'import/mixed4e_pool_reduce_pre_relu/conv', 'import/mixed5a_1x1_pre_relu/conv',
'import/mixed5a_3x3_bottleneck_pre_relu/conv', 'import/mixed5a_3x3_pre_relu/conv',
'import/mixed5a_5x5_bottleneck_pre_relu/conv', 'import/mixed5a_5x5_pre_relu/conv',
'import/mixed5a_pool_reduce_pre_relu/conv', 'import/mixed5b_1x1_pre_relu/conv',
'import/mixed5b_3x3_bottleneck_pre_relu/conv', 'import/mixed5b_3x3_pre_relu/conv',
'import/mixed5b_5x5_bottleneck_pre_relu/conv', 'import/mixed5b_5x5_pre_relu/conv',
'import/mixed5b_pool_reduce_pre_relu/conv', 'import/head0_bottleneck_pre_relu/conv',
'import/head1_bottleneck_pre_relu/conv']
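The names in this list are operation names, while the scripts below look tensors up as 'import/<name>:0'. The small pure-Python sketch below shows the string handling for that conversion and groups a few names by block. The helper names tensor_name and block_of are hypothetical, and the sample is a hand-picked subset of the list above.

```python
from collections import Counter

def tensor_name(op_name):
    """Strip the trailing '/conv' and append ':0', the form used with get_tensor_by_name."""
    suffix = "/conv"
    base = op_name[:-len(suffix)] if op_name.endswith(suffix) else op_name
    return base + ":0"

def block_of(op_name):
    """Return the block prefix of a layer name, e.g. 'mixed4d' or 'conv2d0'."""
    return op_name.split("/")[1].split("_")[0]

sample = [
    "import/conv2d0_pre_relu/conv",
    "import/mixed4d_1x1_pre_relu/conv",
    "import/mixed4d_3x3_bottleneck_pre_relu/conv",
    "import/mixed4d_3x3_pre_relu/conv",
]
counts = Counter(block_of(n) for n in sample)
print(tensor_name(sample[2]))
print(counts)
```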

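2) Generating the original DeepDream image

Using the mixed4d_3x3_bottleneck_pre_relu layer as an example, we generate an image by maximizing the mean of one of its channels.

First, fetch layer_output, the output tensor of the convolutional layer named mixed4d_3x3_bottleneck_pre_relu. Channel 139 is chosen for maximization, so layer_output[:, :, :, channel] is passed to the rendering function render_naive. The img0 argument of render_naive is a randomly generated initial image, an array of shape (224, 224, 3) that serves as the starting point of the optimization.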

# coding: utf-8
from __future__ import print_function
import os
from io import BytesIO
import numpy as np
from functools import partial
import PIL.Image
import scipy.misc
import tensorflow as tf


graph = tf.Graph()
model_fn = 'tensorflow_inception_graph.pb'
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})


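def savearray(img_array, img_name):
    # Note: scipy.misc.toimage has been removed from recent SciPy releases,
    # so this call needs an older SciPy version.
    scipy.misc.toimage(img_array).save(img_name)
    print('img saved: %s' % img_name)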


def render_naive(t_obj, img0, iter_n=30, step=1.0):
    # t_score is the optimization objective: the mean of t_obj.
    # Given the call site, this is the mean of layer_output[:, :, :, channel]
    t_score = tf.reduce_mean(t_obj)
    # Gradient of t_score with respect to t_input
    t_grad = tf.gradients(t_score, t_input)[0]

    # Work on a copy so that the caller's array is left untouched
    img = img0.copy()
    for i in range(iter_n):
        # Evaluate the gradient and the current score in the session
        g, score = sess.run([t_grad, t_score], {t_input: img})
        # Normalize the gradient and apply it to img. step acts like a "learning rate"
        g /= g.std() + 1e-8
        img += g * step
        print('score(mean)=%f' % (score))
    # Save the image
    savearray(img, 'naive.jpg')

# Choose the convolutional layer and the channel index, and fetch the corresponding tensor
name = 'mixed4d_3x3_bottleneck_pre_relu'
channel = 139
layer_output = graph.get_tensor_by_name("import/%s:0" % name)

# Create the initial noise image
img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
# Render with render_naive
render_naive(layer_output[:, :, :, channel], img_noise, iter_n=20)

The code above is saved as gen_naive.py. Running gen_naive.py prints:

score(mean)=-19.891029
score(mean)=-31.205770
score(mean)=21.456114
score(mean)=108.289459
score(mean)=178.601166
score(mean)=232.210205
score(mean)=290.056335
score(mean)=335.893524
score(mean)=375.240875
score(mean)=423.900024
score(mean)=465.111328
score(mean)=513.792480
score(mean)=535.929504
score(mean)=576.985229
score(mean)=602.876892
score(mean)=638.416992
score(mean)=659.976440
score(mean)=688.588501
score(mean)=711.269775
score(mean)=733.619324
img saved: naive.jpg

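This shows that score (the mean of the chosen channel of the convolutional layer) does increase step by step as expected. After 20 iterations the image is saved as naive.jpg. Flower-like texture patterns are faintly visible in it, which suggests that channel 139 of the mixed4d_3x3_bottleneck_pre_relu layer has learned flower-texture features.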
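The scripts in this post save images with scipy.misc.toimage, which is no longer available in recent SciPy releases. Below is a possible replacement, written as a sketch under one assumption: that toimage's default behaviour was to min-max scale the array to 0..255 before saving. The names to_uint8 and save_array are mine, and Pillow is imported lazily so that the scaling step can be tried without it.

```python
import numpy as np

def to_uint8(img):
    """Min-max scale a float array to the 0..255 range and cast to uint8."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:  # constant image: avoid dividing by zero
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255.0).astype(np.uint8)

def save_array(img, path):
    """Save an (H, W, 3) float array as an image file using Pillow."""
    from PIL import Image  # imported here so to_uint8 works without Pillow installed
    Image.fromarray(to_uint8(img)).save(path)

demo = np.array([[[100.0, 100.5, 101.0]]])
scaled = to_uint8(demo)
print(scaled.dtype, scaled.min(), scaled.max())
```

With this in place, savearray(img, 'naive.jpg') could be replaced by save_array(img, 'naive.jpg').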

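3) Generating larger DeepDream images

The image generated above has shape (224, 224, 3). To generate a larger image, the initial input image has to be larger. This raises a problem: the larger the image, the more memory (or GPU memory) is needed, and beyond a certain size rendering fails because memory runs out. The solution is not to optimize the whole image at once. Instead, the image is split into tiles and the gradient is computed for one tile at a time, so each step uses only a bounded amount of memory.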
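The tiling idea can be checked in isolation with NumPy. In this sketch of mine, a toy element-wise "gradient" (2 * tile, the gradient of the sum of squares) stands in for the sess.run call, so the tiled result must equal the result computed on the whole image. One deliberate simplification: the loops use range(0, h, tile_size) so that every pixel is covered, which is not the same loop bound as in the script below.

```python
import numpy as np

def tiled_apply(img, grad_fn, tile_size=4, rng=None):
    """Apply grad_fn tile by tile on a randomly rolled copy of img, then roll back."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    sx, sy = rng.integers(tile_size, size=2)               # random offsets in [0, tile_size)
    shifted = np.roll(np.roll(img, sx, axis=1), sy, axis=0)
    grad = np.zeros_like(shifted)
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            sub = shifted[y:y + tile_size, x:x + tile_size]
            grad[y:y + tile_size, x:x + tile_size] = grad_fn(sub)
    return np.roll(np.roll(grad, -sx, axis=1), -sy, axis=0)  # undo the shift

def toy_grad(sub):
    """Gradient of sum(sub ** 2); purely element-wise, so tiling cannot change it."""
    return 2.0 * sub

img = np.random.default_rng(0).uniform(size=(10, 14, 3))
g = tiled_apply(img, toy_grad, tile_size=4, rng=np.random.default_rng(1))
print(np.allclose(g, 2.0 * img))
```

For a real network the per-tile gradients are not identical to the full-image gradient, because each tile lacks the context outside its borders. The random shift moves the tile borders on every call so that these differences do not accumulate along fixed seams.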

The full script is as follows:

# coding:utf-8
from __future__ import print_function
import os
from io import BytesIO
import numpy as np
from functools import partial
import PIL.Image
import scipy.misc
import tensorflow as tf


graph = tf.Graph()
model_fn = 'tensorflow_inception_graph.pb'
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})


def savearray(img_array, img_name):
    scipy.misc.toimage(img_array).save(img_name)
    print('img saved: %s' % img_name)


def resize_ratio(img, ratio):
    min = img.min()
    max = img.max()
    img = (img - min) / (max - min) * 255
    img = np.float32(scipy.misc.imresize(img, ratio))
    img = img / 255 * (max - min) + min
    return img


def calc_grad_tiled(img, t_grad, tile_size=512):
    # Compute the gradient on one tile_size x tile_size tile at a time to avoid memory problems
    sz = tile_size
    h, w = img.shape[:2]
    # img_shift: roll the whole image by a random offset along the width (axis 1), then along the height (axis 0),
    # so that tile borders move between calls and no edge artifacts build up
    sx, sy = np.random.randint(sz, size=2)
    img_shift = np.roll(np.roll(img, sx, 1), sy, 0)
    grad = np.zeros_like(img)
    # y, x are the top-left pixel of each tile
    for y in range(0, max(h - sz // 2, sz), sz):
        for x in range(0, max(w - sz // 2, sz), sz):
            # Compute the gradient for sub, a tile of at most tile_size x tile_size pixels
            sub = img_shift[y:y + sz, x:x + sz]
            g = sess.run(t_grad, {t_input: sub})
            grad[y:y + sz, x:x + sz] = g
    # Roll the gradient back with np.roll
    return np.roll(np.roll(grad, -sx, 1), -sy, 0)


def render_multiscale(t_obj, img0, iter_n=20, step=1.0, octave_n=3, octave_scale=1.4):
    # Define the objective and the gradient as before
    t_score = tf.reduce_mean(t_obj)
    t_grad = tf.gradients(t_score, t_input)[0]

    img = img0.copy()
    for octave in range(octave_n):
        if octave > 0:
            # Enlarge the image by a factor of octave_scale each time,
            # octave_n - 1 times in total
            img = resize_ratio(img, octave_scale)
        for i in range(iter_n):
            # calc_grad_tiled computes the gradient for an image of any size
            g = calc_grad_tiled(img, t_grad)
            g /= g.std() + 1e-8
            img += g * step
            print('.', end=' ')
    savearray(img, 'multiscale.jpg')

if __name__ == '__main__':
    name = 'mixed4d_3x3_bottleneck_pre_relu'
    channel = 139
    img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
    layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    render_multiscale(layer_output[:, :, :, channel], img_noise, iter_n=20)

Running the script produces multiscale.jpg. The image is indeed much larger, and the flower texture is clearer.
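How much larger? With the defaults octave_n=3 and octave_scale=1.4 the image is enlarged twice. The short calculation below estimates the side length at each octave, assuming that the resize truncates the scaled size to an integer. That truncation is my assumption about scipy.misc.imresize with a float ratio, not something stated in the book.

```python
def octave_sizes(size, octave_n=3, octave_scale=1.4):
    """Side length of a square image at each octave, truncating after each scaling."""
    sizes = [size]
    for _ in range(octave_n - 1):
        size = int(size * octave_scale)
        sizes.append(size)
    return sizes

sizes = octave_sizes(224)
print(sizes)
```

Under that assumption the 224-pixel noise image ends up roughly 438 pixels on each side.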

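4) Generating higher-quality DeepDream images

So far the focus has been on enlarging the image. Here we look at the quality of the generated image. In image processing, an image is said to have high-frequency and low-frequency components. Roughly speaking, high-frequency components are the places where intensity, color, or transparency changes sharply, such as edges and fine details. Low-frequency components are the places that change little, such as large color patches and the overall style. Generally, an image with more low-frequency content looks "softer".

The approach here is to amplify the low-frequency part of the gradient. The gradient used so far was applied as a whole. If the gradient can be decomposed into a "high-frequency gradient" and a "low-frequency gradient", and the low-frequency part is then amplified, the result is a softer image.

In practice, a Laplacian pyramid is used for the decomposition. This algorithm splits an image into several levels (Figure 4-5 in the book): the bottom levels, level1 and level2, correspond to the high-frequency components of the image, and the upper levels, level3 and level4, correspond to the low-frequency components. The gradient can be decomposed in the same way. After the decomposition, every level of the gradient is normalized, so the low-frequency and high-frequency components end up with comparable magnitude. In the image this increases the low-frequency content and improves the quality of the result. This method is usually called Laplacian Pyramid Gradient Normalization.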
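Before reading the TensorFlow version, it may help to see the same structure in plain NumPy. The sketch below is a simplified stand-in of mine: it uses 2x2 block averaging and pixel repetition instead of the 5x5 smoothing kernel of the real script, and it requires the image sides to be divisible by 2 ** scale_n. The split, normalize and merge steps have the same shape as in the script.

```python
import numpy as np

def down2(img):
    """Halve height and width by averaging 2x2 blocks (a crude low-pass filter)."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def up2(img):
    """Double height and width by repeating every pixel."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def lap_split_n(img, n):
    """Split img into n high-frequency levels plus one low-frequency residual."""
    levels = []
    for _ in range(n):
        lo = down2(img)
        levels.append(img - up2(lo))   # what the low-pass version cannot represent
        img = lo
    levels.append(img)
    return levels[::-1]                # coarsest level first

def lap_merge(levels):
    """Rebuild the image from the pyramid, coarsest level first."""
    img = levels[0]
    for hi in levels[1:]:
        img = up2(img) + hi
    return img

def normalize_rms(x, eps=1e-10):
    """Scale x so that its root-mean-square value is 1."""
    return x / max(np.sqrt(np.mean(np.square(x))), eps)

def lap_normalize(grad, scale_n=3):
    """Normalize every pyramid level separately, then merge the levels again."""
    return lap_merge([normalize_rms(level) for level in lap_split_n(grad, scale_n)])

rng = np.random.default_rng(0)
grad = rng.normal(size=(16, 16, 3))
levels = lap_split_n(grad, 3)
print(np.allclose(lap_merge(levels), grad))  # split followed by merge restores the input
print(lap_normalize(grad).shape)
```

Because every level is scaled to the same RMS value before merging, the coarse levels, which are usually much weaker in a raw gradient, gain weight relative to the fine ones.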

The full script follows. Reading the code carefully is a good way to absorb the idea of the algorithm.

# coding:utf-8
from __future__ import print_function
import os
from io import BytesIO
import numpy as np
from functools import partial
import PIL.Image
import scipy.misc
import tensorflow as tf


graph = tf.Graph()
model_fn = 'tensorflow_inception_graph.pb'
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})


def savearray(img_array, img_name):
    scipy.misc.toimage(img_array).save(img_name)
    print('img saved: %s' % img_name)



def resize_ratio(img, ratio):
    min = img.min()
    max = img.max()
    img = (img - min) / (max - min) * 255
    img = np.float32(scipy.misc.imresize(img, ratio))
    img = img / 255 * (max - min) + min
    return img


def calc_grad_tiled(img, t_grad, tile_size=512):
    sz = tile_size
    h, w = img.shape[:2]
    sx, sy = np.random.randint(sz, size=2)
    img_shift = np.roll(np.roll(img, sx, 1), sy, 0)  # roll along the width (axis 1) first, then along the height (axis 0)
    grad = np.zeros_like(img)
    for y in range(0, max(h - sz // 2, sz), sz):
        for x in range(0, max(w - sz // 2, sz), sz):
            sub = img_shift[y:y + sz, x:x + sz]
            g = sess.run(t_grad, {t_input: sub})
            grad[y:y + sz, x:x + sz] = g
    return np.roll(np.roll(grad, -sx, 1), -sy, 0)

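# 5x5 smoothing kernel built from the binomial weights [1, 4, 6, 4, 1].
# Multiplying by np.eye(3) makes it filter each of the 3 color channels separately.
k = np.float32([1, 4, 6, 4, 1])
k = np.outer(k, k)
k5x5 = k[:, :, None, None] / k.sum() * np.eye(3, dtype=np.float32)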

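# Split an image into a low-frequency and a high-frequency component
def lap_split(img):
    with tf.name_scope('split'):
        # One strided convolution with the smoothing kernel acts as a "blur", so lo is the low-frequency component
        lo = tf.nn.conv2d(img, k5x5, [1, 2, 2, 1], 'SAME')
        # Scale lo back up to the size of img to get lo2, then subtract lo2 from img to get the high-frequency component hi
        lo2 = tf.nn.conv2d_transpose(lo, k5x5 * 4, tf.shape(img), [1, 2, 2, 1])
        hi = img - lo2
    return lo, hi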

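# Split the image img into an n-level Laplacian pyramid
def lap_split_n(img, n):
    levels = []
    for i in range(n):
        # lap_split separates the image into low and high frequencies.
        # The high-frequency part is stored in levels;
        # the low-frequency part is decomposed further
        img, hi = lap_split(img)
        levels.append(hi)
    levels.append(img)
    return levels[::-1]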

# Merge a Laplacian pyramid back into a single image
def lap_merge(levels):
    img = levels[0]
    for hi in levels[1:]:
        with tf.name_scope('merge'):
            img = tf.nn.conv2d_transpose(img, k5x5 * 4, tf.shape(hi), [1, 2, 2, 1]) + hi
    return img


# Normalize img by its root-mean-square value.
def normalize_std(img, eps=1e-10):
    with tf.name_scope('normalize'):
        std = tf.sqrt(tf.reduce_mean(tf.square(img)))
        return img / tf.maximum(std, eps)

# Laplacian pyramid normalization
def lap_normalize(img, scale_n=4):
    img = tf.expand_dims(img, 0)
    tlevels = lap_split_n(img, scale_n)
    # Apply normalize_std to every level
    tlevels = list(map(normalize_std, tlevels))
    out = lap_merge(tlevels)
    return out[0, :, :, :]


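def tffunc(*argtypes):
    # Turn a function that builds TensorFlow ops into a regular Python function:
    # one placeholder is created per argument type, and calling the wrapper
    # evaluates the op with the arguments fed into those placeholders.
    placeholders = list(map(tf.placeholder, argtypes))
    def wrap(f):
        out = f(*placeholders)
        def wrapper(*args, **kw):
            return out.eval(dict(zip(placeholders, args)), session=kw.get('session'))
        return wrapper
    return wrap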


def render_lapnorm(t_obj, img0,
                   iter_n=10, step=1.0, octave_n=3, octave_scale=1.4, lap_n=4):
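    # Define the objective and the gradient as before
    t_score = tf.reduce_mean(t_obj)
    t_grad = tf.gradients(t_score, t_input)[0]
    # Wrap lap_normalize so that it can be called like a regular function on NumPy arrays
    lap_norm_func = tffunc(np.float32)(partial(lap_normalize, scale_n=lap_n))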

    img = img0.copy()
    for octave in range(octave_n):
        if octave > 0:
            img = resize_ratio(img, octave_scale)
        for i in range(iter_n):
            g = calc_grad_tiled(img, t_grad)
            # The only difference: g is normalized with lap_norm_func
            g = lap_norm_func(g)
            img += g * step
            print('.', end=' ')
    savearray(img, 'lapnorm.jpg')

if __name__ == '__main__':
    name = 'mixed4d_3x3_bottleneck_pre_relu'
    channel = 139
    img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
    layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    render_lapnorm(layer_output[:, :, :, channel]+layer_output[:, :, :, 99], img_noise, iter_n=20)

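Running the code above produces lapnorm.jpg, which looks noticeably softer and more pleasant. Note that the call in this script maximizes the sum of channels 139 and 99, so the image shows what both of these channels of the layer have learned.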

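5) Building the final DeepDream model

All the images above start from a randomly generated image (in other words, the background is random noise). We can also start from one of our own images and blend the learned features into it.

The script below reads the background image from test.jpg.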

# coding:utf-8
from __future__ import print_function
import os
from io import BytesIO
import numpy as np
from functools import partial
import PIL.Image
import scipy.misc
import tensorflow as tf


graph = tf.Graph()
model_fn = 'tensorflow_inception_graph.pb'
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input')  # define the input tensor
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})


def savearray(img_array, img_name):
    scipy.misc.toimage(img_array).save(img_name)
    print('img saved: %s' % img_name)


def visstd(a, s=0.1):
    return (a - a.mean()) / max(a.std(), 1e-4) * s + 0.5


def resize_ratio(img, ratio):
    min = img.min()
    max = img.max()
    img = (img - min) / (max - min) * 255
    img = np.float32(scipy.misc.imresize(img, ratio))
    img = img / 255 * (max - min) + min
    return img


def resize(img, hw):
    min = img.min()
    max = img.max()
    img = (img - min) / (max - min) * 255
    img = np.float32(scipy.misc.imresize(img, hw))
    img = img / 255 * (max - min) + min
    return img


def calc_grad_tiled(img, t_grad, tile_size=512):
    sz = tile_size
    h, w = img.shape[:2]
    sx, sy = np.random.randint(sz, size=2)
    img_shift = np.roll(np.roll(img, sx, 1), sy, 0)  # roll along the width (axis 1) first, then along the height (axis 0)
    grad = np.zeros_like(img)
    for y in range(0, max(h - sz // 2, sz), sz):
        for x in range(0, max(w - sz // 2, sz), sz):
            sub = img_shift[y:y + sz, x:x + sz]
            g = sess.run(t_grad, {t_input: sub})
            grad[y:y + sz, x:x + sz] = g
    return np.roll(np.roll(grad, -sx, 1), -sy, 0)


def tffunc(*argtypes):
    placeholders = list(map(tf.placeholder, argtypes))
    def wrap(f):
        out = f(*placeholders)
        def wrapper(*args, **kw):
            return out.eval(dict(zip(placeholders, args)), session=kw.get('session'))
        return wrapper
    return wrap



def render_deepdream(t_obj, img0,
                     iter_n=10, step=1.5, octave_n=4, octave_scale=1.4):
    t_score = tf.reduce_mean(t_obj)
    t_grad = tf.gradients(t_score, t_input)[0]

    img = img0
    # Decompose the image into a pyramid here as well.
    # Extracting the low and high frequencies is simple this time: plain resizing is enough
    octaves = []
    for i in range(octave_n - 1):
        hw = img.shape[:2]
        lo = resize(img, np.int32(np.float32(hw) / octave_scale))
        hi = img - resize(lo, hw)
        img = lo
        octaves.append(hi)

    # Generate the low-frequency image first, then repeatedly enlarge it and add the high-frequency detail back
    for octave in range(octave_n):
        if octave > 0:
            hi = octaves[-octave]
            img = resize(img, hi.shape[:2]) + hi
        for i in range(iter_n):
            g = calc_grad_tiled(img, t_grad)
            img += g * (step / (np.abs(g).mean() + 1e-7))
            print('.', end=' ')

    img = img.clip(0, 255)
    savearray(img, 'deepdream.jpg')


if __name__ == '__main__':
    img0 = PIL.Image.open('test.jpg')
    img0 = np.float32(img0)

    name = 'mixed4d_3x3_bottleneck_pre_relu'
    #name = 'mixed5b_5x5_pre_relu'
    #name = 'mixed4c'
    channel = 139
    layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    render_deepdream(layer_output[:, :, :, channel], img0)

    #name = 'mixed4c'
    #layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    #render_deepdream(tf.square(layer_output), img0)

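The generated deepdream.jpg does contain the flower texture patterns extracted by the convolutional layer.

Next we switch to the mixed4c layer (the commented-out lines at the end of the script). This layer responds to many animal patterns, which are then blended into our test.jpg to produce a new image.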
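The octave handling in render_deepdream can be checked without TensorFlow. The sketch below is my own simplification: it uses a nearest-neighbour resize written in NumPy instead of scipy.misc.imresize, and it leaves out the gradient steps. It shows that the stored high-frequency differences let the original image be rebuilt when nothing is changed, so the final picture is the starting photo plus whatever the gradient steps add at each scale.

```python
import numpy as np

def resize_nn(img, hw):
    """Nearest-neighbour resize of an (H, W, C) array to height hw[0] and width hw[1]."""
    h, w = img.shape[:2]
    rows = np.arange(hw[0]) * h // hw[0]
    cols = np.arange(hw[1]) * w // hw[1]
    return img[rows][:, cols]

def split_octaves(img, octave_n=4, octave_scale=1.4):
    """Shrink the image repeatedly, keeping the detail lost at each step."""
    octaves = []
    for _ in range(octave_n - 1):
        hw = img.shape[:2]
        lo = resize_nn(img, (int(hw[0] / octave_scale), int(hw[1] / octave_scale)))
        octaves.append(img - resize_nn(lo, hw))   # high-frequency detail of this octave
        img = lo
    return img, octaves

def merge_octaves(base, octaves):
    """Enlarge the base image step by step, adding the stored detail back."""
    img = base
    for hi in reversed(octaves):                  # same order as octaves[-octave] in the script
        img = resize_nn(img, hi.shape[:2]) + hi
    return img

photo = np.random.default_rng(0).uniform(0, 255, size=(60, 80, 3))
base, octaves = split_octaves(photo)
restored = merge_octaves(base, octaves)
print(base.shape, len(octaves), np.allclose(restored, photo))
```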

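3. Summary

Key points to take away:

1) The principle of Deep Dream, and how Deep Dream can be used to understand the features extracted by convolutional layers;

2) How to handle large images and how to improve image quality. These ideas are likely to be useful in other places as well.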
