DeepDream Implementation

yangdelong

Category: Artificial Intelligence | Tags: Deep dream, cnn

Source: http://blog.csdn.net/Yan_Joy/article/details/54343806

Environment setup

DeepDream is built on Python and the Caffe deep learning framework, so you will need roughly the following:

Standard Python scientific stack: NumPy, SciPy, PIL, IPython. Those libraries can also be installed as a part of one of the scientific packages for Python, such as Anaconda or Canopy.
Caffe deep learning framework (installation instructions).
Google protobuf library that is used for Caffe model manipulation.

Code

Importing libraries

Once the environment is configured, first check that the libraries can be imported:

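# imports and basic notebook setup
from io import BytesIO  # cStringIO only exists in Python 2; io.BytesIO works in Python 2 and 3
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format

import caffe

# If the GPU supports CUDA and Caffe was compiled with CUDA support, run on the GPU.
# Device IDs start at 0; the original post used device 2, so pick the ID that matches your machine.
caffe.set_mode_gpu()
caffe.set_device(0)
# Without a usable GPU, call caffe.set_mode_cpu() instead of the two lines above.


def showarray(a, fmt='jpeg'):
    """Display an image array inline in the notebook."""
    a = np.uint8(np.clip(a, 0, 255))
    f = BytesIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))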

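Loading the model

The network uses the GoogLeNet model (bvlc_googlenet from the Caffe Model Zoo), which has to be downloaded in advance.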

model_path = '../caffe/models/bvlc_googlenet/'  # replace with the directory of your own model
net_fn   = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'

# Patching model to be able to compute gradients.
# The code below parses deploy.prototxt, sets "force_backward: true" and saves the result
# to a temporary file that is then used to build the net.
# You can also add the line "force_backward: true" to "deploy.prototxt" by hand.
model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
with open('tmp.prototxt', 'w') as f:
    f.write(str(model))

net = caffe.Classifier('tmp.prototxt', param_fn,
                       mean=np.float32([104.0, 116.0, 122.0]),  # ImageNet mean, training set dependent
                       channel_swap=(2, 1, 0))  # the reference model has channels in BGR order instead of RGB

# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    # H x W x 3 RGB image -> 3 x H x W BGR array with the mean subtracted
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']

def deprocess(net, img):
    # 3 x H x W BGR array (mean removed) -> H x W x 3 RGB image
    return np.dstack((img + net.transformer.mean['data'])[::-1])
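To see what preprocess and deprocess do to the array layout, here is a numpy-only sketch. It replaces `net.transformer.mean['data']` with a hard-coded per-channel BGR mean of shape (3, 1, 1); that shape is an assumption about how Caffe broadcasts a per-channel mean, and the names `preprocess_np`, `deprocess_np` and `MEAN_BGR` are hypothetical stand-ins for the real functions.

```python
import numpy as np

# Assumed stand-in for net.transformer.mean['data']: per-channel BGR mean, shape (3, 1, 1)
MEAN_BGR = np.float32([104.0, 116.0, 122.0]).reshape(3, 1, 1)

def preprocess_np(img, mean=MEAN_BGR):
    """H x W x 3 RGB image -> 3 x H x W BGR array with the mean subtracted."""
    chw = np.rollaxis(img, 2)          # move channels first: (3, H, W), still RGB
    bgr = chw[::-1]                    # reverse channel order: RGB -> BGR
    return np.float32(bgr) - mean

def deprocess_np(arr, mean=MEAN_BGR):
    """Inverse of preprocess_np: 3 x H x W BGR (mean removed) -> H x W x 3 RGB."""
    return np.dstack((arr + mean)[::-1])   # add mean back, BGR -> RGB, channels last

rgb = np.random.randint(0, 256, size=(4, 5, 3)).astype(np.float32)
x = preprocess_np(rgb)
print(x.shape)                             # (3, 4, 5)
print(np.allclose(deprocess_np(x), rgb))   # True
```

Channel 0 of the preprocessed array is the blue channel minus 104, which is why `channel_swap=(2, 1, 0)` and the BGR mean go together.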

Dreaming

Making the “dream” images is very simple. Essentially it is just a gradient ascent process that tries to maximize the L2 norm of activations of a particular DNN layer. Here are a few simple tricks that we found useful for getting good images:

offset image by a random jitter
normalize the magnitude of gradient ascent steps
apply ascent across multiple scales (octaves)

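Why is copying the activations into the gradient enough to maximize the L2 norm? For the objective L = ½‖a‖² (maximized together with ‖a‖), the gradient with respect to the activations a is a itself, so writing a into the layer's diff and back-propagating yields ∂L/∂x for the input image. That is what objective_L2 in the next code block does. The sketch below checks the idea on a toy linear "layer" a = Wx, an assumption standing in for the real network: back-propagating a through W gives Wᵀa, which matches a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))     # toy "layer": activations a = W @ x (assumption, not GoogLeNet)
x = rng.standard_normal(4)          # toy "input image"

def objective(x):
    a = W @ x
    return 0.5 * np.sum(a ** 2)     # 0.5 * ||a||^2

# Backprop as in objective_L2: the layer's diff is set to its data, then pushed through W^T
a = W @ x
grad_backprop = W.T @ a

# Central finite differences as an independent check of the gradient
eps = 1e-6
grad_fd = np.array([(objective(x + eps * e) - objective(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(grad_backprop, grad_fd, atol=1e-5))   # True

# One small ascent step along this gradient increases the objective
x_new = x + 0.01 * grad_backprop
print(objective(x_new) > objective(x))                   # True
```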

First, we implement a basic gradient ascent step function that applies the first two tricks:

# Copy the layer's activations (data) into its gradient (diff);
# this is the gradient of 0.5 * ||activations||^2, i.e. the L2 objective
def objective_L2(dst):
    dst.diff[:] = dst.data

# core function
def make_step(net, step_size=1.5, end='inception_4c/output',
              jitter=32, clip=True, objective=objective_L2):
    '''Basic gradient ascent step.'''

    src = net.blobs['data']  # input image is stored in Net's 'data' blob
    dst = net.blobs[end]     # target layer, 'inception_4c/output' by default

    ox, oy = np.random.randint(-jitter, jitter+1, 2)  # random jitter offsets
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2)  # apply jitter shift

    net.forward(end=end)     # forward pass up to the target layer
    objective(dst)           # specify the optimization objective (objective_L2 by default)
    net.backward(start=end)  # backpropagate from the target layer down to the input
    g = src.diff[0]          # gradient with respect to the input image
    # apply normalized ascent step to the input image
    src.data[:] += step_size/np.abs(g).mean() * g

    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2)  # unshift image

    if clip:
        # keep pixel values within [0, 255] once the mean is added back
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)
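Two details of make_step can be checked in isolation with numpy: the normalized step `step_size/np.abs(g).mean() * g` changes each pixel by step_size on average no matter how large the raw gradient is, and the two np.roll calls undo each other exactly. The helper name `normalized_step` and the array shapes are hypothetical, chosen only for the demonstration.

```python
import numpy as np

def normalized_step(g, step_size=1.5):
    """The update make_step adds to the image: the gradient rescaled so its mean |value| is step_size."""
    return step_size / np.abs(g).mean() * g

rng = np.random.default_rng(1)
g = rng.standard_normal((3, 8, 8))
for scale in (1e-3, 1.0, 1e3):
    step = normalized_step(g * scale)
    print(scale, np.abs(step).mean())     # about 1.5 for every scale

# Jitter: shift width (axis -1) by ox and height (axis -2) by oy, then shift back
img = rng.standard_normal((3, 8, 8))
ox, oy = 3, -2
shifted = np.roll(np.roll(img, ox, -1), oy, -2)
restored = np.roll(np.roll(shifted, -ox, -1), -oy, -2)
print(np.array_equal(restored, img))      # True
```

Because the step size is fixed in pixel units, the same step_size works for any target layer, whose raw gradients can differ by orders of magnitude.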

Next we implement an ascent through different scales. We call these scales “octaves”.

iter_n is the number of gradient ascent iterations per octave, octave_n is the number of octaves (scales), and octave_scale is the scaling ratio between neighbouring octaves.
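With the defaults octave_n=4 and octave_scale=1.4, the image is shrunk three times. The sketch below reproduces the resulting octave sizes, assuming that nd.zoom rounds each output dimension as round(size * zoom); the input size of 768 × 1024 (height × width) and the helper `octave_sizes` are made up for illustration.

```python
def octave_sizes(h, w, octave_n=4, octave_scale=1.4):
    """(height, width) of each octave, from the original size down to the smallest."""
    sizes = [(h, w)]
    zoom = 1.0 / octave_scale
    for _ in range(octave_n - 1):
        ph, pw = sizes[-1]
        # each octave is zoomed from the previous one, so the rounding compounds
        sizes.append((int(round(ph * zoom)), int(round(pw * zoom))))
    return sizes

sizes = octave_sizes(768, 1024)
print(sizes)                   # [(768, 1024), (549, 731), (392, 522), (280, 373)]
print(list(reversed(sizes)))   # deepdream processes the octaves in this order, smallest first
print(len(sizes) * 10)         # 40 make_step calls in total with iter_n=10
```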

# Defaults: 10 iterations per octave, 4 octaves, scale ratio 1.4.
# The image is shrunk by a factor of 1.4 three times, giving 4 octaves including the original size;
# the loop then runs 10 iterations on each octave, from the smallest up to the original size.
def deepdream(net, base_img, iter_n=10, octave_n=4, octave_scale=1.4,
              end='inception_4c/output', clip=True, **step_params):
    # prepare base images for all octaves
    octaves = [preprocess(net, base_img)]
    for i in range(octave_n-1):
        octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale, 1.0/octave_scale), order=1))

    src = net.blobs['data']
    # np.zeros_like(a) returns a new all-zero array with the same shape and dtype as a
    detail = np.zeros_like(octaves[-1])  # allocate image for network-produced details

    for octave, octave_base in enumerate(octaves[::-1]):
        h, w = octave_base.shape[-2:]
        if octave > 0:
            # upscale details from the previous octave
            h1, w1 = detail.shape[-2:]
            detail = nd.zoom(detail, (1, 1.0*h/h1, 1.0*w/w1), order=1)

        src.reshape(1, 3, h, w)  # resize the network's input image size
        src.data[0] = octave_base+detail
        for i in range(iter_n):
            make_step(net, end=end, clip=clip, **step_params)

            # visualization
            vis = deprocess(net, src.data[0])
            if not clip:  # adjust image contrast if clipping is disabled
                vis = vis*(255.0/np.percentile(vis, 99.98))
            showarray(vis)
            print(octave, i, end, vis.shape)
            clear_output(wait=True)

        # extract details produced on the current octave
        detail = src.data[0]-octave_base
    # returning the resulting image
    return deprocess(net, src.data[0])
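The octave bookkeeping is the least obvious part of deepdream: details produced at a small scale are upscaled and added to the next larger base image, so the changes accumulate across octaves. The sketch below reproduces only that bookkeeping in numpy. Two substitutions are assumptions: a nearest-neighbour `resize_nn` stands in for `nd.zoom(..., order=1)` (which interpolates linearly), and a caller-supplied `step_fn` stands in for make_step; `octave_loop` is a hypothetical name.

```python
import numpy as np

def resize_nn(img, h, w):
    """Nearest-neighbour resize of a (C, H, W) array; a stand-in for nd.zoom(..., order=1)."""
    _, h0, w0 = img.shape
    rows = np.arange(h) * h0 // h
    cols = np.arange(w) * w0 // w
    return img[:, rows[:, None], cols[None, :]]

def octave_loop(base, step_fn, iter_n=10, octave_n=4, octave_scale=1.4):
    """Same octave/detail structure as deepdream, without Caffe."""
    octaves = [base]
    for _ in range(octave_n - 1):
        _, h, w = octaves[-1].shape
        octaves.append(resize_nn(octaves[-1],
                                 int(round(h * (1.0 / octave_scale))),
                                 int(round(w * (1.0 / octave_scale)))))
    detail = np.zeros_like(octaves[-1])
    for octave_base in octaves[::-1]:          # smallest octave first
        _, h, w = octave_base.shape
        detail = resize_nn(detail, h, w)       # upscale details from the previous octave
        img = octave_base + detail
        for _ in range(iter_n):
            img = step_fn(img)
        detail = img - octave_base             # keep only what the steps added
    return img

out = octave_loop(np.zeros((3, 100, 100)), lambda im: im + 1.0)
print(out.shape, out.mean())                   # (3, 100, 100) 40.0
```

With a step function that adds 1 to every pixel, all 4 × 10 steps survive the transfers between octaves, so the final image is the base plus 40.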

Start dreaming

# open and display the input image
img = np.float32(PIL.Image.open('sky1024px.jpg'))
showarray(img)

(Figure: sky1024px, the input image)

_ = deepdream(net, img)  # run

(Figures 0_9, 1_9, 2_9, 3_9: the image after the last iteration of octaves 0 to 3)
In total the loop displays 4 (octaves) × 10 (iterations) = 40 images.
Changing the end layer changes the result, for example:

_ = deepdream(net, img, end='inception_3b/5x5_reduce')  # change the end layer

(Figure 3_9: result with end='inception_3b/5x5_reduce')
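The available layers are listed in the model definition file (deploy.prototxt); GoogLeNet is fairly complex. The later the end layer, the more recognizable real-world objects emerge from the image.
For example: _ = deepdream(net, img, end='inception_4e/output')
(Figure 3_9: result with end='inception_4e/output')
The cloud on the left already looks like a dog's face.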
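The end argument must be a blob name, i.e. a key of net.blobs; with the net loaded, `list(net.blobs.keys())` lists them in order. Without loading Caffe, a rough alternative is to scan the prototxt for `top:` entries. The regex sketch below is a simplification that assumes each top is written as `top: "name"` on its own line; it is not a full protobuf text-format parser, it will miss the input blob `data` (declared differently), and the sample text is a hypothetical excerpt in the style of deploy.prototxt.

```python
import re

def blob_names(prototxt_text):
    """Return the distinct `top:` names in a prototxt, in order of first appearance."""
    names = []
    for name in re.findall(r'^\s*top:\s*"([^"]+)"', prototxt_text, flags=re.MULTILINE):
        if name not in names:          # in-place layers such as ReLU reuse their input blob as top
            names.append(name)
    return names

# Hypothetical excerpt in the style of deploy.prototxt
sample = '''
layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
}
layer {
  name: "conv1/relu_7x7"
  type: "ReLU"
  bottom: "conv1/7x7_s2"
  top: "conv1/7x7_s2"
}
layer {
  name: "inception_4c/output"
  type: "Concat"
  top: "inception_4c/output"
}
'''
print(blob_names(sample))   # ['conv1/7x7_s2', 'inception_4c/output']
```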

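Warning: disturbing images ahead!

The step above already produces strange results.
What happens if we feed that output back into the network as the input?
The result is rather disturbing:

(Figure 5_3_9)
This is the result of feeding the output back in roughly 5 times.
The official example iterates 100 times, which is just insane.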
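Because deepdream returns an image in the same H × W × 3 RGB layout that it accepts, feeding the output back in is just a loop. A minimal sketch, where `dream_fn` is assumed to be something like `lambda im: deepdream(net, im)`, and both the helper name `dream_feedback` and its `on_frame` callback (for example, to save each round) are hypothetical:

```python
def dream_feedback(dream_fn, img, rounds=5, on_frame=None):
    """Apply dream_fn repeatedly, using each output as the next input."""
    frame = img
    for i in range(rounds):
        frame = dream_fn(frame)
        if on_frame is not None:
            on_frame(i, frame)        # e.g. save or display the intermediate result
    return frame

# With the notebook objects (not run here):
# result = dream_feedback(lambda im: deepdream(net, im), img, rounds=5)

# Stand-in check without Caffe: doubling 5 times
print(dream_feedback(lambda v: v * 2, 1, rounds=5))   # 32
```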

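Controlling the dream

By changing the optimization objective, we can steer the result toward what we want.
For example, to pull the original image toward another image, we can define a guided objective: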

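def objective_guide(dst):
    x = dst.data[0].copy()   # activations of the current image, C x H x W
    y = guide_features       # activations of the guide image (global, extracted beforehand)
    ch = x.shape[0]
    x = x.reshape(ch,-1)     # one column per spatial position
    y = y.reshape(ch,-1)
    A = x.T.dot(y) # compute the matrix of dot-products with guide features
    dst.diff[0].reshape(ch,-1)[:] = y[:,A.argmax(1)] # select ones that match best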
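objective_guide does not simply push activations to be large everywhere: for every spatial position of the current layer it finds the guide-image position whose feature vector has the largest dot product, and uses that guide feature vector as the gradient. The numpy sketch below isolates this matching step as a pure function; the name `guide_diff` and the tiny arrays are hypothetical.

```python
import numpy as np

def guide_diff(x, y):
    """Gradient objective_guide would write, for activations x (C, H, W) and guide features y (C, H2, W2)."""
    ch = x.shape[0]
    x2 = x.reshape(ch, -1)        # (C, N): one column per position of the current image
    y2 = y.reshape(ch, -1)        # (C, M): one column per position of the guide image
    A = x2.T.dot(y2)              # (N, M) dot products between all position pairs
    best = A.argmax(1)            # best-matching guide position for each current position
    return y2[:, best].reshape(x.shape)

x = np.array([[[3.0, 0.0]], [[1.0, 2.0]]])   # 2 channels, 1 x 2 positions: columns (3, 1) and (0, 2)
y = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])   # guide columns (1, 0) and (0, 1)
print(guide_diff(x, y))                      # each column is replaced by its best-matching guide column

# The guide image may have a different spatial size; the output always matches x
print(guide_diff(np.ones((2, 3, 4)), y).shape)   # (2, 3, 4)
```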

guide_features holds the target features, which have to be extracted from the guide image beforehand:

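guide = np.float32(PIL.Image.open('flowers.jpg'))  # guide image; the file name is a placeholder, use your own
end = 'inception_3b/output'
h, w = guide.shape[:2]
src, dst = net.blobs['data'], net.blobs[end]
src.reshape(1,3,h,w)
src.data[0] = preprocess(net, guide)
net.forward(end=end)
guide_features = dst.data[0].copy()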

(Figure: flower, the guide image)
OK, let's run it!

_ = deepdream(net, img, end=end, objective=objective_guide)

(Figure flower_3_9: result with the guided objective)
It really does have a flowery feel to it~

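Summary

Propagating forward gives classification; propagating backward gives generation. Using deep learning to create artwork seems like a very interesting direction, and there are already fairly mature, even commercial, projects such as deepart. If you can stomach some mental pollution, try Nightmare, another project by the creator of YOLO.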

deepdream-github