Andrew Ng Deep Learning: Course 4, Week 4

Latest recommended article published on 2022-02-27 11:45:18

未知丶丶


Category: Deep Learning. Tags: Deep Learning

Article link: https://blog.csdn.net/qq_43310834/article/details/88429530


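Preface
Face recognition
Neural style transfer
Quiz

Preface

NetEase Cloud Classroom (bilingual subtitles, no buffering): https://mooc.study.163.com/smartSpec/detail/1001319001.htmcourseId=1004570029
Coursera (expensive): https://www.coursera.org/specializations/deep-learning
I am a beginner. I watch the lectures on NetEase Cloud Classroom first, then do the assignments on Coursera, and keep this blog as a record. All images in this post are screenshots taken from the course.
Quiz questions reposted from: http://www.cnblogs.com/hezhiyao/p/7810725.html
Libraries needed for the programming assignments: link: https://pan.baidu.com/s/1aS1Oia2fskemBHHEMnSepw password: 66gd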

Face recognition

Tips: face verification is a 1:1 problem. [Image: screenshot from the course]
Tips: face recognition is a 1:K problem.
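The difference between 1:1 verification and 1:K recognition can be sketched with plain NumPy. This is only an illustration: the two-dimensional encodings and the names `alice` and `bob` are made up, and the helpers `verify_np` and `identify_np` are hypothetical. The 0.7 threshold is the one used in the programming assignment later in this post.

```python
import numpy as np

THRESHOLD = 0.7  # same value as in the assignment code; a real system would tune it


def verify_np(encoding, identity, database, threshold=THRESHOLD):
    """1:1 check: is `encoding` close enough to the stored encoding of `identity`?"""
    dist = float(np.linalg.norm(encoding - database[identity]))
    return dist, dist < threshold


def identify_np(encoding, database, threshold=THRESHOLD):
    """1:K search: find the closest of the K stored identities, or None if none is close."""
    best_name, best_dist = None, float("inf")
    for name, stored in database.items():
        dist = float(np.linalg.norm(encoding - stored))
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return best_dist, None
    return best_dist, best_name


# Made-up 2-D encodings, purely for illustration.
database = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
probe = np.array([0.9, 0.1])        # close to "alice"
stranger = np.array([-1.0, -1.0])   # far from everyone

print(verify_np(probe, "alice", database))
print(verify_np(probe, "bob", database))
print(identify_np(probe, database))
print(identify_np(stranger, database))
```

Verification answers a yes/no question about one claimed identity, while recognition has to compare against every entry, so its error rate grows with K unless the per-pair accuracy is very high.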

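One-shot learning

[Image: screenshot from the course]
Tips: put simply, the system sees only one sample of each person and must still output a result.

Similarity function

[Image: screenshot from the course]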

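Siamese network

[Image: screenshot from the course]

Tips: the same CNN is applied to every image to turn it into a vector; think of it as a function that takes an input x and outputs f(x). The resulting encodings can then be used to compute the norm of the difference between two images. Here ||f(x^(i)) - f(x^(j))||² is the sum of the squared entries of the vector obtained by subtracting one encoding from the other, and this is our similarity function d.
[Image: screenshot from the course]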
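As a small illustration of the similarity function d, here is a NumPy sketch that computes the squared norm of the difference between two encodings. The four-dimensional vectors are made up; in the course the encodings come from the shared CNN.

```python
import numpy as np


def similarity(f_i, f_j):
    """d = ||f(x_i) - f(x_j)||^2: sum of squared entries of the difference vector."""
    diff = np.asarray(f_i, dtype=float) - np.asarray(f_j, dtype=float)
    return float(np.sum(diff ** 2))


# Hypothetical encodings, for illustration only.
f_a = np.array([0.1, 0.2, 0.3, 0.4])
f_b = np.array([0.1, 0.2, 0.3, 0.5])  # nearly the same as f_a
f_c = np.array([0.9, 0.8, 0.1, 0.0])  # quite different from f_a

d_ab = similarity(f_a, f_b)
d_ac = similarity(f_a, f_c)
print(d_ab, d_ac)  # a small d means "similar", a large d means "different"
```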

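Triplet loss

[Image: screenshot from the course]
Tips: the goal is to look at three images at once, so that an image is highly similar to another image of the same person (a small value of the similarity function) and much less similar to an image of a different person.

Tips: with that, the loss function can be defined. Here α is a hyperparameter called the margin; it rules out the trivial solution in which f(x) is 0 for every input.
[Image: screenshot from the course]
Tips: put simply, when choosing triplets, pick people who look alike.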
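The loss described above can be sketched in NumPy as max(||f(A)-f(P)||² - ||f(A)-f(N)||² + α, 0), summed over the examples, which mirrors the TensorFlow version in the programming assignment. The function name `triplet_loss_np` and the toy encodings are assumptions made for illustration.

```python
import numpy as np


def triplet_loss_np(anchor, positive, negative, alpha=0.2):
    """Sum over examples of max(d(A, P) - d(A, N) + alpha, 0)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))


# Two toy triplets with 2-D encodings (made up).
anchor = np.array([[0.0, 0.0], [1.0, 0.0]])
positive = np.array([[0.1, 0.0], [1.0, 0.5]])
negative = np.array([[1.0, 0.0], [1.0, 0.6]])  # second negative is a "hard" one
loss = triplet_loss_np(anchor, positive, negative)

# If every encoding were 0, the margin keeps the loss above zero,
# so the trivial solution is not a minimum.
zeros = np.zeros((3, 4))
trivial_loss = triplet_loss_np(zeros, zeros, zeros)
print(loss, trivial_loss)
```

The first triplet is easy (the negative is already far away) and contributes nothing; only the second, where the negative is nearly as close as the positive, produces a gradient. That is the reason for choosing look-alike people when building triplets.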

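Binary classification

[Image: screenshot from the course]
Tips: use the Siamese network to turn each image into a vector. To compare two images, feed the two vectors into a single logistic-regression unit, which outputs the verdict (1 means the two images show the same person, 0 means they do not).

Programming assignment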
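A minimal sketch of that logistic unit follows. It assumes the two encodings are combined through their element-wise absolute difference before the sigmoid, which is one common choice; the weights `w` and bias `b` are hand-picked here rather than trained, and all names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def same_person_probability(f_i, f_j, w, b):
    """Logistic unit on |f(x_i) - f(x_j)|; output near 1 means "same person"."""
    features = np.abs(f_i - f_j)
    return float(sigmoid(np.dot(w, features) + b))


# Hand-picked parameters (an assumption): negative weights so that a larger
# difference between the encodings lowers the probability.
w = np.array([-4.0, -4.0, -4.0])
b = 2.0

f_1 = np.array([0.2, 0.5, 0.1])
f_2 = np.array([0.2, 0.5, 0.1])  # identical encoding
f_3 = np.array([1.2, 1.5, 1.1])  # differs by 1 in every entry

p_same = same_person_probability(f_1, f_2, w, b)
p_diff = same_person_probability(f_1, f_3, w, b)
print(p_same, p_diff)
```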

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
#------------ Optional: used to draw the model details --------------#
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
#------------------------------------------------#

K.set_image_data_format('channels_first')

import time
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
import fr_utils
from inception_blocks_v2 import *
%matplotlib inline
%load_ext autoreload
%autoreload 2

import sys
np.set_printoptions(threshold=sys.maxsize)  # threshold=np.nan is rejected by newer NumPy


# triplet_loss is defined first because FRmodel.compile() below refers to it.
def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implements the triplet loss from formula (4).

    Arguments:
        y_true -- true labels; Keras requires this argument when you define a loss, but it is not used here.
        y_pred -- list containing:
            anchor -- encodings of the "anchor" images, shape (None, 128)
            positive -- encodings of the "positive" images, shape (None, 128)
            negative -- encodings of the "negative" images, shape (None, 128)
        alpha -- hyperparameter, the margin

    Returns:
        loss -- real number, the value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive, you will need to sum over axis=-1
    dis1=tf.reduce_sum(tf.square(tf.subtract(anchor,positive)),axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative, you will need to sum over axis=-1
    dis2=tf.reduce_sum(tf.square(tf.subtract(anchor,negative)),axis=-1)
    # Step 3: subtract the two previous distances and add alpha.
    dis=tf.add(tf.subtract(dis1,dis2),alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss=tf.reduce_sum(tf.maximum(dis,0.0))
    ### END CODE HERE ###

    return loss

# Get the model
FRmodel = faceRecoModel(input_shape=(3,96,96))

# Print the total number of parameters
print("Number of parameters: " + str(FRmodel.count_params()))

# Start time (time.clock() was removed in Python 3.8)
start_time = time.perf_counter()

# Compile the model
FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])

# Load the weights
fr_utils.load_weights_from_FaceNet(FRmodel)

# End time
end_time = time.perf_counter()

# Elapsed time
elapsed = end_time - start_time

print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")

database = {}
database["danielle"] = fr_utils.img_to_encoding("images/danielle.png", FRmodel)
database["younes"] = fr_utils.img_to_encoding("images/younes.jpg", FRmodel)
database["tian"] = fr_utils.img_to_encoding("images/tian.jpg", FRmodel)
database["andrew"] = fr_utils.img_to_encoding("images/andrew.jpg", FRmodel)
database["kian"] = fr_utils.img_to_encoding("images/kian.jpg", FRmodel)
database["dan"] = fr_utils.img_to_encoding("images/dan.jpg", FRmodel)
database["sebastiano"] = fr_utils.img_to_encoding("images/sebastiano.jpg", FRmodel)
database["bertrand"] = fr_utils.img_to_encoding("images/bertrand.jpg", FRmodel)
database["kevin"] = fr_utils.img_to_encoding("images/kevin.jpg", FRmodel)
database["felix"] = fr_utils.img_to_encoding("images/felix.jpg", FRmodel)
database["benoit"] = fr_utils.img_to_encoding("images/benoit.jpg", FRmodel)
database["arnaud"] = fr_utils.img_to_encoding("images/arnaud.jpg", FRmodel)

def verify(image_path, identity, database, model):
    """
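    Verifies that the image at "image_path" shows the person "identity".

    Arguments:
        image_path -- path to the camera image
        identity -- string, name of the person whose identity you want to verify
        database -- dictionary mapping members' names to their encodings
        model -- Keras model instance

    Returns:
        dist -- distance between the encoding of the camera image and the encoding stored in the database
        is_open_door -- boolean, whether the door should open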
    """
 
    ### START CODE HERE ###
     
    # Step 1: Compute the encoding for the image. Use img_to_encoding() see example above. (≈ 1 line)
    encoding=fr_utils.img_to_encoding(image_path,model)
     
    # Step 2: Compute distance with identity's image (≈ 1 line)
    dist=np.linalg.norm(encoding-database[identity])
     
    # Step 3: Open the door if dist < 0.7, else don't open (≈ 3 lines)
    if (dist<0.7):
        is_open_door=True
    else:
        is_open_door=False
    ### END CODE HERE ###
         
    return dist, is_open_door
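
def who_is_it(image_path, database,model):
    """
    Performs face recognition on the given image.

    Arguments:
        image_path -- path to the image
        database -- dictionary mapping names to encodings
        model -- Keras model instance

    Returns:
        min_dist -- smallest distance between the image's encoding and the encodings in the database
        identity -- string, the name whose encoding is at distance min_dist
    """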
    ### START CODE HERE ### 
     
    ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)
    encoding=fr_utils.img_to_encoding(image_path,model)
     
    ## Step 2: Find the closest encoding ##
     
    # Initialize "min_dist" to a large value, say 100 (≈1 line)
    min_dist=100
    identity=None  # stays None if no database entry is closer than the initial min_dist
     
    # Loop over the database dictionary's names and encodings.
    for (name,enc) in database.items():
         
        # Compute L2 distance between the target "encoding" and the current "emb" from the database. (≈ 1 line)
        nowdist=np.linalg.norm(encoding-enc)
 
        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. (≈ 3 lines)
        if (nowdist<min_dist):
            min_dist=nowdist
            identity=name
 
    ### END CODE HERE ###
     
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print ("it's " + str(identity) + ", the distance is " + str(min_dist))
         
    return min_dist, identity

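Neural style transfer

[Image: screenshot from the course]

What deep convolutional networks are doing

[Image: screenshot from the course]
Tips: put simply, the shallow layers handle simple things such as detecting edges and corners and telling colors apart, while the deep layers recognise complex objects.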

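Cost function

[Image: screenshot from the course]

Content cost function (C, G)

[Image: screenshot from the course]
Tips: here a is the output after the activation function. The layer l is neither very deep nor very shallow; it is usually a layer in the middle.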
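As a rough NumPy sketch of the content cost, the function below sums the squared differences between the activations of C and G at one layer and divides by 4·n_H·n_W·n_C, the same normalization used by the TensorFlow code in the programming assignment. The name `content_cost_np` and the toy activations are made up for illustration.

```python
import numpy as np


def content_cost_np(a_C, a_G):
    """Content cost for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return float(np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C))


# Toy activations: every entry differs by exactly 1.
a_C = np.ones((1, 2, 2, 3))
a_G = np.zeros((1, 2, 2, 3))

J_content_toy = content_cost_np(a_C, a_G)   # 12 squared differences / (4 * 12)
J_content_same = content_cost_np(a_C, a_C)  # identical activations give zero cost
print(J_content_toy, J_content_same)
```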

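Style cost function (S, G)

Tips: first, define the style of an image at a layer l as the correlation G between the activations of that layer's different channels.
[Image: screenshot from the course]
Tips: the correlation G between the channels can be computed as shown above ↑

Tips: the style cost of that layer is the sum of the squared differences between the entries of the two G matrices (style image and generated image), up to a normalization constant.

Tips: the total style cost is the weighted sum of the costs of all selected layers.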
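The three tips above can be sketched in NumPy as follows: a Gram matrix of channel correlations, a per-layer cost built from the squared difference of two Gram matrices, and a weighted sum over layers. The normalization follows the TensorFlow code in the programming assignment; the function names, the random activations, and the two-layer example are assumptions for illustration.

```python
import numpy as np


def gram_matrix_np(A):
    """A has shape (n_C, n_H * n_W); the result is the (n_C, n_C) channel correlation matrix."""
    return A @ A.T


def layer_style_cost_np(a_S, a_G):
    """Style cost of one layer for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    S = a_S.reshape(n_H * n_W, n_C).T  # unroll to (n_C, n_H * n_W)
    G = a_G.reshape(n_H * n_W, n_C).T
    diff = gram_matrix_np(S) - gram_matrix_np(G)
    return float(np.sum(diff ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2))


rng = np.random.default_rng(0)
a_S = rng.random((1, 4, 4, 3))  # made-up activations for the style image
a_G = rng.random((1, 4, 4, 3))  # made-up activations for the generated image

gram_S = gram_matrix_np(a_S.reshape(16, 3).T)
cost_same = layer_style_cost_np(a_S, a_S)
cost_diff = layer_style_cost_np(a_S, a_G)

# Weighted sum over layers; here the same toy layer is reused twice with weight 0.5 each.
total_style = sum(coeff * layer_style_cost_np(a_S, a_G) for coeff in (0.5, 0.5))
print(gram_S.shape, cost_same, cost_diff, total_style)
```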

Summary

[Image: screenshot from the course]

Programming assignment

import time
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import nst_utils
import numpy as np
import tensorflow as tf
import imageio
%matplotlib inline

def compute_content_cost(a_C, a_G):
    """
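    Computes the content cost.

    Arguments:
        a_C -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations representing the content of image C
        a_G -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations representing the content of image G

    Returns:
        J_content -- real number, computed with formula 1 above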

    """

    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C=a_G.get_shape().as_list()
     
    # Reshape a_C and a_G (≈2 lines)
    a_C_unrolled = tf.transpose(tf.reshape(a_C, [n_H * n_W, n_C])) # after the transpose, a_C has shape (n_C, n_H * n_W)
    a_G_unrolled = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C])) # after the transpose, a_G has shape (n_C, n_H * n_W)
     
    # compute the cost with tensorflow (≈1 line)
    J_content=tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled,a_G_unrolled)))/(4*n_H*n_W*n_C)
    ### END CODE HERE ###
     
    return J_content
    
def gram_matrix(A):
    """
    Computes the style (Gram) matrix of A.

    Arguments:
        A -- matrix of shape (n_C, n_H * n_W)

    Returns:
        GA -- style matrix of A, shape (n_C, n_C)

    """
    ### START CODE HERE ### (≈1 line)
    GA = tf.matmul(A,tf.transpose(A))
    ### END CODE HERE ###
     
    return GA
    
def compute_layer_style_cost(a_S, a_G):
    """
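    Computes the style cost of a single hidden layer.

    Arguments:
        a_S -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations representing the style of image S
        a_G -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations representing the style of image G

    Returns:
        J_style_layer -- real number, computed with formula 2 above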

    """
    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C=a_G.get_shape().as_list()
     
    # Reshape the images to have them of shape (n_C, n_H*n_W) (≈2 lines)
    a_S=tf.transpose(tf.reshape(a_S,(n_H*n_W,n_C)))
    a_G=tf.transpose(tf.reshape(a_G,(n_H*n_W,n_C)))
    # Computing gram_matrices for both images S and G (≈2 lines)
    S=gram_matrix(a_S)
    G=gram_matrix(a_G)
    # Computing the loss (≈1 line)
    J_style_layer=(tf.reduce_sum(tf.square(tf.subtract(G,S))))/(4*n_C*n_C*(n_H*n_W)*(n_H*n_W))
    
    ### END CODE HERE ###
     
    return J_style_layer
    
def compute_style_cost(model, STYLE_LAYERS):
    """
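    Computes the overall style cost from several selected layers.
    Relies on the global TensorFlow session `sess` created further down.

    Arguments:
        model -- the loaded TensorFlow model
        STYLE_LAYERS -- list of (layer_name, coeff) pairs, containing:
                        - the names of the layers we want to extract style from
                        - the coefficient (coeff) of each layer
    Returns:
        J_style -- tensor representing a real number, computed as defined by formula (2)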

    """
    # initialize the overall style cost
    J_style = 0
 
    for layer_name, coeff in STYLE_LAYERS:
 
        # Select the output tensor of the currently selected layer
        out = model[layer_name]
 
        # Set a_S to be the hidden layer activation from the layer we have selected, by running the session on out
        a_S = sess.run(out)
 
        # Set a_G to be the hidden layer activation from same layer. Here, a_G references model[layer_name] 
        # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
        # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
        a_G = out
         
        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S, a_G)
 
        # Add coeff * J_style_layer of this layer to overall style cost
        J_style += coeff * J_style_layer
 
    return J_style
    
def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
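    Computes the total cost.

    Arguments:
        J_content -- output of the content cost function
        J_style -- output of the style cost function
        alpha -- hyperparameter, weight of the content cost
        beta -- hyperparameter, weight of the style cost

    Returns:
        J -- total cost, alpha * J_content + beta * J_style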

    """
    ### START CODE HERE ### (≈1 line)
    J =  alpha*J_content + beta*J_style
    ### END CODE HERE ###
    return J
    
STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

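# Reset the graph
tf.reset_default_graph()

# Step 1: create an interactive session
sess = tf.InteractiveSession()

# Step 2: load the content image (the Louvre museum) and normalize it
content_image = imageio.imread("images/louvre_small.jpg")
content_image = nst_utils.reshape_and_normalize_image(content_image)

# Step 3: load the style image (impressionist style) and normalize it
style_image = imageio.imread("images/monet.jpg")
style_image = nst_utils.reshape_and_normalize_image(style_image)

# Step 4: initialize the generated image by adding random noise to the content image
generated_image = nst_utils.generate_noise_image(content_image)
imshow(generated_image[0])

# Step 5: load the VGG-19 model (the weights file is imagenet-vgg-verydeep-19.mat)
model = nst_utils.load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")

# Step 6: build the TensorFlow graph

## Assign the content image as the input of the VGG model.
sess.run(model["input"].assign(content_image))

## Select the output of layer conv4_2
out = model["conv4_2"]

## Set a_C to the activations of the hidden layer "conv4_2".
a_C = sess.run(out)

## Set a_G to the activations of the same layer. Here a_G references model["conv4_2"]
## and is not evaluated yet. Later in the code, image G is assigned as the model input,
## so that running the session yields the activations of this layer with G as input.
a_G = out

## Compute the content cost
J_content = compute_content_cost(a_C, a_G)

## Assign the style image as the input of the VGG model
sess.run(model["input"].assign(style_image))

## Compute the style cost
J_style = compute_style_cost(model, STYLE_LAYERS)

## Compute the total cost
J = total_cost(J_content, J_style, alpha = 10, beta = 40)

## Define the optimizer with a learning rate of 2.0
optimizer = tf.train.AdamOptimizer(2.0)

## Define the training objective: minimize the cost
train_step = optimizer.minimize(J)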

def model_nn(sess, input_image, num_iterations = 200, is_print_info = True, 
             is_plot = True, is_save_process_image = True, 
             save_last_image_to = "output/generated_image.jpg"):
    # Note: is_plot is accepted but not used in this version.
    # Initialize global variables (you need to run the session on the initializer)
    ### START CODE HERE ### (1 line)
    sess.run(tf.global_variables_initializer())
    ### END CODE HERE ###

    # Run the noisy input image (initial generated image) through the model. Use assign().
    ### START CODE HERE ### (1 line)
    sess.run(model["input"].assign(input_image))
    ### END CODE HERE ###

    for i in range(num_iterations):

        # Run the session on the train_step to minimize the total cost
        ### START CODE HERE ### (1 line)
        sess.run(train_step)
        ### END CODE HERE ###

        # Compute the generated image by running the session on the current model['input']
        ### START CODE HERE ### (1 line)
        generated_image=sess.run(model['input'])
        ### END CODE HERE ###

        # Every 20 iterations, optionally print the costs and save the intermediate image.
        if i%20 == 0:
            if is_print_info:
                Jt, Jc, Js = sess.run([J, J_content, J_style])
                print("Iteration " + str(i) + " :")
                print("total cost = " + str(Jt))
                print("content cost = " + str(Jc))
                print("style cost = " + str(Js))

            if is_save_process_image:
                # save current generated image in the "/output" directory
                nst_utils.save_image("output/" + str(i) + ".png", generated_image)

    # save last generated image to the path given by the caller
    nst_utils.save_image(save_last_image_to, generated_image)

    return generated_image

# Start time (time.clock() was removed in Python 3.8)
start_time = time.perf_counter()

# Without a GPU this takes roughly 25-30 min
generated_image = model_nn(sess, generated_image)

# End time
end_time = time.perf_counter()

# Elapsed time
elapsed = end_time - start_time

print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")

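Quiz

[Image: screenshot from the course]

Tips: multiple images are needed for each person.

Tips: the deeper the layer, the more complex the features it recognises.

Tips: 32-3+1=30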
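The arithmetic in the last tip matches the usual output-size formula for a convolution, floor((n + 2p - f) / s) + 1, with n = 32, f = 3, no padding, and stride 1. Assuming that is what the quiz item asks about, a small helper (the name `conv_output_size` is hypothetical) reproduces it:

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output width/height of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1


size_valid = conv_output_size(32, 3)             # 32 - 3 + 1
size_same = conv_output_size(32, 3, padding=1)   # padding of 1 keeps the size at 32
print(size_valid, size_same)
```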
