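2020-8-23 Andrew Ng, Improving Deep Neural Networks, Week 3: Hyperparameter Tuning / Batch Normalization / Programming Frameworks (Programming Assignment: TensorFlow Tutorial, Hand Sign Recognition)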

没人不认识我

Categories: Deep Learning, Python, IT. Tags: deep learning

Article link: https://blog.csdn.net/weixin_42555985/article/details/108181694

Link to the original notebook
If it does not open, you can also paste the link into https://nbviewer.jupyter.org and view it there.

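Welcome to this week's programming assignment. So far you have built neural networks (NNs) with numpy. Now we will walk you through a deep learning (DL) framework that makes building NNs much easier. Machine learning frameworks such as TensorFlow, PaddlePaddle, Torch, Caffe, and Keras can speed up your ML development significantly. All of them are well documented, so feel free to read up on them.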

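This post walks through the TensorFlow framework and covers the following:

Initializing variables
Starting your session
Training algorithms
Building an NN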

Programming with a framework not only saves coding time; it can sometimes also optimize your code so that it runs faster.

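1. Importing the TensorFlow library

First import the libraries you need. The code in this post uses the TensorFlow 1.x API (tf.Session, tf.placeholder, tf.contrib).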

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

#%matplotlib inline  # uncomment this line if you are using a Jupyter notebook
np.random.seed(1)

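With the libraries imported, we will walk through several applications.
We start by computing the loss of one training example:
$\mathcal{L}(\hat{y}, y) = (\hat y^{(i)} - y^{(i)})^2 \tag{1}$
The code is as follows: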

y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39

loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss

init = tf.global_variables_initializer()         # When init is run later (session.run(init)),
                                                 # the loss variable will be initialized and ready to be computed
with tf.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # Prints the loss

Output:

9

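Writing and running a program in TensorFlow takes the following steps:

1. Create tensors (constants, variables, placeholders, sparse tensors) that are not yet executed/evaluated.
2. Write operations between those tensors.
3. Initialize your tensors.
4. Create a Session.
5. Run the Session. This runs the operations you wrote above.

So when we created the variable loss for the loss function, we only defined the loss as a function of other quantities; we did not evaluate its value. To evaluate it, we had to run init = tf.global_variables_initializer(), which initialized the loss variable, and only then could we evaluate loss and print its value.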

Now look at a simple example:

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

Output:

Tensor("Mul:0", shape=(), dtype=int32)

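As expected, you do not see 20. You get a tensor with an empty shape (a scalar) of type int32. All you did was add the operations to the computation graph; you have not run the computation yet. To actually multiply the two numbers, you have to create a session and run it.
Add the following code: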

sess = tf.Session()
print(sess.run(c))

Running it gives:

20

To summarize: remember to initialize your variables, create a session, and run the operations inside the session.

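Next, you need to know about placeholders. A placeholder is an object whose value you can specify only later. To specify values for a placeholder, you pass them in with a feed dictionary (the feed_dict argument).

The code below creates a placeholder for x, which lets us pass in a number later when we run the session.

# Change the value of x in the feed_dict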

x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

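Output:

6

When you first defined x, you did not have to give it a value. The data is assigned to it when the session runs.

Here is what is happening: when you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. The graph can contain placeholders whose values you specify only later. Finally, when you run the session, you are telling TensorFlow to execute the computation graph.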
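
The build-first, run-later idea can be imitated in a few lines of plain Python. The sketch below is only a toy model of deferred evaluation: the classes Node, Constant, Placeholder, Multiply and the function run are hypothetical names invented for this illustration, and they are not how TensorFlow is implemented.

```python
# Toy deferred-evaluation graph (illustration only; not TensorFlow's implementation).

class Node:
    """Base class: a node only describes a computation until run() is called."""
    def evaluate(self, feed_dict):
        raise NotImplementedError

class Constant(Node):
    def __init__(self, value):
        self.value = value
    def evaluate(self, feed_dict):
        return self.value

class Placeholder(Node):
    def __init__(self, name):
        self.name = name
    def evaluate(self, feed_dict):
        # The value is looked up only at run time.
        if self not in feed_dict:
            raise KeyError("no value fed for placeholder " + self.name)
        return feed_dict[self]

class Multiply(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def evaluate(self, feed_dict):
        return self.left.evaluate(feed_dict) * self.right.evaluate(feed_dict)

def run(node, feed_dict=None):
    """Plays the role of sess.run: walks the graph and computes a value."""
    return node.evaluate(feed_dict or {})

x = Placeholder("x")
double_x = Multiply(Constant(2), x)      # builds the graph, computes nothing
print(run(double_x, feed_dict={x: 3}))   # 6
print(run(double_x, feed_dict={x: 10}))  # 20
```

The same graph object is reused with different fed values, which is the point of a placeholder.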

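1.1-Linear function

Let's start this programming exercise by computing the following linear function:
$Y = W X + b$ , where $W$ and $X$ are random matrices and b is a random vector.

Exercise: compute $W X + b$ , where $W, X$ , and $b$ are drawn from a random normal distribution.

W has shape (4, 3)
X has shape (3,1)
b has shape (4,1)

As an example, here is how to define a constant X of shape (3,1):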

X = tf.constant(np.random.randn(3,1), name = "X")

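The following functions may be useful:

tf.matmul(…, …), matrix multiplication
tf.add(…, …), addition
np.random.randn(…), random initialization

The linear function is implemented as follows: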

# GRADED FUNCTION: linear_function

def linear_function():
    """
    Implements a linear function: 
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns: 
    result -- runs the session for Y = WX + b 
    """
    
    np.random.seed(1)
    
    ### START CODE HERE ### (4 lines of code)
    X = np.random.randn(3, 1)
    W = np.random.randn(4, 3)
    b = np.random.randn(4, 1)
    Y = tf.add(tf.matmul(W, X), b)
    ### END CODE HERE ### 
    
    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate
    ### START CODE HERE ###
    sess = tf.Session()
    result = sess.run(Y)
    ### END CODE HERE ### 
    
    # close the session 
    sess.close()

    return result

Run it:

print( "result = " + str(linear_function()))

Output:

result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]

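1.2-Computing the sigmoid

Great, you have implemented a linear function. TensorFlow offers a variety of commonly used NN functions such as tf.sigmoid and tf.softmax. For this exercise, let's compute the sigmoid of an input.

You will do this exercise with a placeholder variable x. When running the session, use the feed dictionary to pass in the input z. So you will have to:
(i) create a placeholder x
(ii) define the operation that computes the sigmoid using tf.sigmoid
(iii) run the session

Exercise: implement the sigmoid function. You will use the following:

tf.placeholder(tf.float32, name = "...")
tf.sigmoid(...)
sess.run(..., feed_dict = {x: z})

Note that there are two typical ways to create and use sessions in TensorFlow:

Method 1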

sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session

Method 2

with tf.Session() as sess: 
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)

The sigmoid is implemented as follows:

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Computes the sigmoid of z
    
    Arguments:
    z -- input value, scalar or vector
    
    Returns: 
    results -- the sigmoid of z
    """
    
    ### START CODE HERE ### ( approx. 4 lines of code)
    # Create a placeholder for x. Name it 'x'.
    x = tf.placeholder(tf.float32, name="x")

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)

    # Create a session, and run it. Please use the method 2 explained above. 
    # You should use a feed_dict to pass z's value to x. 
    with tf.Session() as sess: 
        # Run session and call the output "result"
        result = sess.run(sigmoid, feed_dict = {x: z})
    
    ### END CODE HERE ###
    
    return result

Run it:

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

Output:

sigmoid(0) = 0.5
sigmoid(12) = 0.9999938

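To summarize, the code above follows these steps:
1. Create placeholders
2. Specify the computation graph for the operations you want to compute
3. Create the session
4. Run the session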
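
As a sanity check on the two printed values, the same function $\sigma(z) = 1 / (1 + e^{-z})$ can be evaluated with the standard library alone. This is a minimal sketch; sigmoid_py is a helper name made up for this example and has nothing to do with tf.sigmoid internally.

```python
import math

def sigmoid_py(z):
    """Numerically stable logistic function for a Python float."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow in exp(-z) for very negative z
    return e / (1.0 + e)

print(sigmoid_py(0))    # 0.5
print(sigmoid_py(12))   # about 0.9999938, matching the TensorFlow result above
```

Splitting on the sign of z is a common precaution: for a large negative z, exp(-z) would overflow a float, while exp(z) simply underflows to 0.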

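1.3-Computing the cost

You can also use a built-in function to compute the cost of your NN. So instead of writing your own code to compute the following as a function of $a^{[2](i)}$ and $y^{(i)}$ for i=1…m:
$J = - \frac{1}{m} \sum_{i = 1}^m \left( y^{(i)} \log a^{[2](i)} + (1-y^{(i)}) \log (1-a^{[2](i)}) \right) \tag{2}$

you can do it in one line of code in TensorFlow:

tf.nn.sigmoid_cross_entropy_with_logits(logits = ...,  labels = ...)

Your code should take z as input, compute the sigmoid to get a, and then compute the cross-entropy cost $J$ . All of this can be done with a single call to tf.nn.sigmoid_cross_entropy_with_logits, which computes the per-example loss $-(y^{(i)} \log \sigma(z^{[2](i)}) + (1-y^{(i)})\log(1-\sigma(z^{[2](i)})))$ ; it does not average over the examples.

The cost is implemented as follows: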

# GRADED FUNCTION: cost

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy
    
    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 
    
    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 
    
    Returns:
    cost -- runs the session of the cost (formula (2))
    """
    
    ### START CODE HERE ### 
    
    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(tf.float32, name="z")
    y = tf.placeholder(tf.float32, name="y")
    
    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)
    
    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()
    
    # Run the session (approx. 1 line).
    cost = sess.run(cost, feed_dict={z: logits, y: labels})
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    
    return cost

Run it:

logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
cost = cost(logits, np.array([0, 0, 1, 1]))
print ("cost = " + str(cost))

Output:

cost = [1.0053872  1.0366408  0.41385433 0.39956617]
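
The four numbers above can be reproduced by hand. The sketch below uses the algebraically equivalent form max(z, 0) - z*y + log(1 + exp(-|z|)), which the TensorFlow documentation gives as the numerically stable way to evaluate this loss; the function names sigmoid_cross_entropy and sigmoid_py are hypothetical helpers for this example only.

```python
import math

def sigmoid_cross_entropy(z, y):
    """Per-example loss -(y*log(s) + (1-y)*log(1-s)) with s = sigmoid(z),
    rewritten as max(z, 0) - z*y + log(1 + exp(-|z|)) to avoid overflow."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def sigmoid_py(z):
    return 1.0 / (1.0 + math.exp(-z))

# Same inputs as the run above: the "logits" are sigmoid outputs, as in the post.
logits = [sigmoid_py(v) for v in (0.2, 0.4, 0.7, 0.9)]
labels = [0, 0, 1, 1]
losses = [sigmoid_cross_entropy(z, y) for z, y in zip(logits, labels)]
print(losses)
```

Note that the result is one loss per example. To obtain the scalar cost $J$ of formula (2) you would still average these values.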

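1.4-Using one-hot encodings

In deep learning you will often have a vector $y$ whose entries range from 0 to $C - 1$ , where $C$ is the number of classes.
If $C = 4$ , for example, there are 4 classes, and you may need to convert $y$ as follows:
[Figure: a label vector y converted to its one-hot matrix]

This is called a one-hot encoding, because in the converted representation exactly one element of each column is "hot" (meaning set to 1). Doing this conversion in numpy takes a few lines of code. In TensorFlow, you can use one line:

tf.one_hot(labels, depth, axis)

Exercise: implement the function below, which takes a vector of labels and the total number of classes and returns the one-hot encoding. Use tf.one_hot() to do this.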

# GRADED FUNCTION: one_hot_matrix

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j has label i, then entry (i,j) 
                     will be 1. 
                     
    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension
    
    Returns: 
    one_hot -- one hot matrix
    """
    
    ### START CODE HERE ###
    
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name='C')
    
    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(indices=labels, depth=C, axis=0)
    
    # Create the session (approx. 1 line)
    sess = tf.Session()
    
    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    
    return one_hot

Run it:

labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C=4)
print ("one_hot = " + str(one_hot))

Output:

one_hot = [[0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]]
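
To see what axis=0 means for the layout, here is a plain-Python sketch that builds the same matrix: classes run along the rows and examples along the columns. The function name one_hot_columns is a hypothetical helper, not part of TensorFlow or of tf_utils.

```python
def one_hot_columns(labels, num_classes):
    """Returns a num_classes x len(labels) matrix (list of rows):
    entry (i, j) is 1.0 when example j has label i, mirroring axis=0."""
    return [[1.0 if label == i else 0.0 for label in labels]
            for i in range(num_classes)]

for row in one_hot_columns([1, 2, 3, 0, 2, 1], 4):
    print(row)
```

Each column contains exactly one 1.0, in the row given by that example's label.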

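1.5-Initializing with zeros and ones

Now you will learn how to initialize a vector of zeros and ones. The functions to call are tf.zeros(shape) and tf.ones(shape). Each takes a shape and returns an array of that shape filled with zeros or ones, respectively.

The implementation is as follows: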

# GRADED FUNCTION: ones

def ones(shape):
    """
    Creates an array of ones of dimension shape
    
    Arguments:
    shape -- shape of the array you want to create
        
    Returns: 
    ones -- array containing only ones
    """
    
    ### START CODE HERE ###
    
    # Create "ones" tensor using tf.ones(...). (approx. 1 line)
    ones = tf.ones(shape)
    
    # Create the session (approx. 1 line)
    sess = tf.Session()
    
    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    return ones

Run it:

print ("ones = " + str(ones([3])))

Output:

ones = [1. 1. 1.]

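2. Building your first NN in TensorFlow

In this part you will build an NN with TensorFlow. Remember that implementing a TensorFlow model takes two steps:

Create the computation graph
Run the graph

Let's dig into the problem you'd like to solve.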

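2.0-Problem statement: the SIGNS dataset

One afternoon, some friends and we decided to teach our computers to decipher sign language. We spent a few hours taking pictures in front of a white wall and came up with the following dataset. Your job is now to build an algorithm that facilitates communication between a speech-impaired person and someone who does not understand sign language.

Training set: 1080 pictures (64x64 pixels) of signs representing the numbers 0 to 5 (180 pictures per number)
Test set: 120 pictures (64x64 pixels) of signs representing the numbers 0 to 5 (20 pictures per number)

Note that this is a subset of the SIGNS dataset. The complete dataset contains many more signs.

Below are examples for each number, with an explanation of how the labels are represented. These are the original pictures; the dataset we actually use has been downscaled to 64 x 64 pixels.

[Figure: example SIGNS images for the digits 0 to 5 with their labels]

Load the dataset: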

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Change the index below to view other examples from the dataset:

# Example of a picture
index = 10
plt.imshow(X_train_orig[index])
plt.show()
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

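Output:

y = 2

[Figure: the training image at index 10]

As usual, you flatten the image dataset and then normalize it by dividing by 255. On top of that, you convert each label to a one-hot vector.

The implementation is as follows: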

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T

# Normalize image vectors
X_train = X_train_flatten / 255.
X_test = X_test_flatten / 255.

# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print("number of training examples = " + str(X_train.shape[1]))
print("number of test examples = " + str(X_test.shape[1]))
print("X_train shape: " + str(X_train.shape))
print("Y_train shape: " + str(Y_train.shape))
print("X_test shape: " + str(X_test.shape))
print("Y_test shape: " + str(Y_test.shape))

Output:

number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)

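Note that 12288 = 64x64x3: each image is 64x64 pixels with 3 RGB channels.

Your goal is to build an algorithm that recognizes a sign with high accuracy. To do so, you will build a TensorFlow model that is almost the same as the one you built in numpy for cat recognition, except that the output now uses softmax (because this is no longer binary classification). This is a good opportunity to compare your numpy implementation with the TensorFlow one.

The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. The sigmoid output layer has been replaced by softmax, because there are more than two classes.

2.1-Creating placeholders

Your first task is to create placeholders for X and Y. They let you pass in your training data later, when you run the session.

The placeholder creation code is as follows: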

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px * 3 = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)
    
    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"
    
    Tips:
    - You will use None because it lets us be flexible on the number of examples fed to the placeholders.
      In fact, the number of examples during test/train is different.
    """

    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(tf.float32, [n_x, None], name="X")
    Y = tf.placeholder(tf.float32, [n_y, None], name="Y")
    ### END CODE HERE ###
    
    return X, Y

Run it:

X, Y = create_placeholders(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))

Output:

X = Tensor("X:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Y:0", shape=(6, ?), dtype=float32)

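2.2-Initializing the parameters

Your second task is to initialize the parameters in TensorFlow.

Use Xavier initialization for the weights and zero initialization for the biases. As an example, for W1 and b1: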

W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

Use seed = 1 so that your results match the ones shown here.
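
For intuition about the scale of the initial weights, here is a standard-library sketch of Xavier (Glorot) uniform initialization, which draws each weight from U(-limit, limit) with limit = sqrt(6 / (n_in + n_out)). To my understanding this is the scheme tf.contrib.layers.xavier_initializer uses by default, but treat that as an assumption. The helper xavier_uniform is hypothetical, and its random numbers will not match TensorFlow's even with the same seed.

```python
import math
import random

def xavier_uniform(n_out, n_in, seed=1):
    """Samples an n_out x n_in weight matrix from U(-limit, limit),
    with limit = sqrt(6 / (n_in + n_out))  (Glorot uniform)."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, limit

W2, limit2 = xavier_uniform(12, 25)       # same shape as W2 in this post
_, limit1 = xavier_uniform(25, 12288)     # same shape as W1
print(limit2, limit1)
```

The wider the layer, the smaller the limit: the bound for the 25 x 12288 matrix is much smaller than the one for the 12 x 25 matrix, which keeps the variance of the layer outputs roughly comparable.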

The implementation is as follows:

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]
    
    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """
    
    tf.set_random_seed(1)                   # so that your "random" numbers match ours
        
    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.get_variable("W1", [25, 12288], initializer = tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [25, 1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12, 25], initializer = tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [12, 1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6, 12], initializer = tf.contrib.layers.xavier_initializer(seed=1))
    b3 = tf.get_variable("b3", [6, 1], initializer = tf.zeros_initializer())
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return parameters

Test it:

tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

Output:

W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>

As expected, the parameters have not been evaluated yet.

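2.3-Forward propagation in TensorFlow

The forward propagation function takes a dictionary of parameters and uses the following built-in functions:

tf.add(…,…), addition
tf.matmul(…,…), matrix multiplication
tf.nn.relu(…), ReLU activation

We recommend comparing the TensorFlow code with the numpy implementation of the NN. Most importantly, forward propagation stops at Z3: in TensorFlow, the output of the last linear layer is fed directly into the function that computes the loss, so you do not need A3.

The implementation: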

# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
    
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    
    ### START CODE HERE ### (approx. 5 lines)              # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)                      # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                                    # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)                     # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                                    # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)                     # Z3 = np.dot(W3, A2) + b3
    ### END CODE HERE ###
    
    return Z3

Test it:

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

Output:

Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)

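You may have noticed that forward propagation does not output any cache (the values saved for use in backpropagation). You will see why below, when we get to backpropagation.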

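2.4-Computing the cost

As mentioned above, computing the cost is easy:

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))

The "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to have shape (number of examples, number of classes), so we have transposed Z3 and Y for you.
In addition, tf.reduce_mean averages the per-example losses over the examples.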
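
To make the two pieces concrete (a softmax cross-entropy per example, then a mean over examples), here is a plain-Python sketch using the (examples, classes) layout described above. The names softmax_cross_entropy and mean_cost are hypothetical helpers for illustration; this is not TensorFlow's implementation.

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    """Loss for one example: -sum_k y_k * log(softmax(logits)_k),
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(y * (v - log_sum_exp) for v, y in zip(logits, one_hot_label))

def mean_cost(batch_logits, batch_labels):
    """Average of the per-example losses, the role tf.reduce_mean plays above.
    Both arguments are lists with one row per example (examples x classes)."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)

# Two examples, six classes. All-zero logits give a uniform softmax,
# so each loss is log(6).
batch_logits = [[0.0] * 6, [0.0] * 6]
batch_labels = [[0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0]]
print(mean_cost(batch_logits, batch_labels), math.log(6))
```

A network that has learned nothing and predicts all six classes with equal probability has a cost of log(6), roughly 1.79, which is a useful reference point when reading the cost at the start of training.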

The implementation:

# GRADED FUNCTION: compute_cost 

def compute_cost(Z3, Y):
    """
    Computes the cost
    
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3
    
    Returns:
    cost - Tensor of the cost function
    """
    
    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    
    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    ### END CODE HERE ###
    
    return cost

Test it:

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

Output:

cost = Tensor("Mean:0", shape=(), dtype=float32)

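2.5-Backpropagation and parameter updates

This is where you become grateful for programming frameworks. All the backpropagation and parameter updates are taken care of in one line of code, which is easy to incorporate into the model.

After you compute the cost, you create an "optimizer" object. You have to call this object along with the cost when running the tf.Session. When called, it performs an optimization step on the given cost with the chosen method and learning rate.

For instance, for gradient descent:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To perform the optimization, you would do:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

Note: when coding, we often use _ as a "throwaway" variable to store values that we will not need later. Here, _ takes on the evaluated value of optimizer, which we do not need (and c takes the value of the cost variable).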
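
What one optimizer run does can be sketched without any framework: compute the gradient of the cost and move the parameter a small step against it. The example below shows plain gradient descent on a one-dimensional toy cost whose gradient is written by hand; the function gradient_descent is a hypothetical helper, and the model in this post uses Adam rather than this basic update rule.

```python
def gradient_descent(grad, w0, learning_rate, num_steps):
    """Repeats w <- w - learning_rate * grad(w); each repetition corresponds
    to one sess.run(optimizer, ...) call in the text above."""
    w = w0
    for _ in range(num_steps):
        w = w - learning_rate * grad(w)
    return w

# Toy cost J(w) = (w - 3)^2 with gradient dJ/dw = 2 * (w - 3); minimum at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0,
                           learning_rate=0.1, num_steps=100)
print(w_final)
```

In TensorFlow the gradient is derived automatically from the computation graph, which is why forward_propagation did not need to return a cache.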

2.6-Building the model

Now you can bring everything together to implement the whole model.

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    
    tf.set_random_seed(1)                             # to keep consistent results
    seed = 3                                          # to keep consistent results
   
    (n_x, m) = X_train.shape                          # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]                            # n_y : output size
    costs = []                                        # To keep track of the cost
    
    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_x, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###
    
    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###
    
    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###
    
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    ### END CODE HERE ###
    
    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        
        # Run the initialization
        sess.run(init)
        
        # Do the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.                       # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###
                
                # Add this minibatch's share of the epoch cost
                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every 100 epochs; record it every 5 epochs for the plot
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
                
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
        
        return parameters

Now you can train the model.

parameters = model(X_train, Y_train, X_test, Y_test)

Note that training takes a while. The cost after epoch 100 should be 1.016458. If it is not, interrupt the run, fix your code, and run it again.
Output:

Cost after epoch 0: 1.855702
Cost after epoch 100: 1.016458
Cost after epoch 200: 0.733102
Cost after epoch 300: 0.572939
Cost after epoch 400: 0.468774
Cost after epoch 500: 0.381021
Cost after epoch 600: 0.313827
Cost after epoch 700: 0.254280
Cost after epoch 800: 0.203799
Cost after epoch 900: 0.166512
Cost after epoch 1000: 0.140937
Cost after epoch 1100: 0.107750
Cost after epoch 1200: 0.086299
Cost after epoch 1300: 0.060949
Cost after epoch 1400: 0.050934
Parameters have been trained!
Train Accuracy: 0.9990741
Test Accuracy: 0.725

The test accuracy for recognizing the hand signs 0 to 5 is 72.5%.
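
The accuracy printed by model() compares, for every example, the class with the highest score in Z3 against the class marked in the one-hot label Y. A plain-Python sketch of that comparison is shown below, with matrices laid out as classes x examples like in this post. The helpers argmax and accuracy and the small score matrix are made up for illustration.

```python
def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(Z, Y):
    """Z and Y are classes x examples matrices (lists of rows), as in this post.
    Compares the arg-max class of each column of Z with that of Y."""
    num_examples = len(Z[0])
    correct = 0
    for j in range(num_examples):
        z_col = [row[j] for row in Z]
        y_col = [row[j] for row in Y]
        correct += argmax(z_col) == argmax(y_col)
    return correct / num_examples

# Hypothetical scores for 3 classes and 4 examples; the last one is misclassified.
Z = [[2.0, 0.1, 0.3, 0.9],
     [0.5, 1.7, 0.2, 0.4],
     [0.1, 0.2, 2.5, 0.3]]
Y = [[1, 0, 0, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(accuracy(Z, Y))   # 0.75
```

By the same arithmetic, a test accuracy of 0.725 on 120 test images corresponds to 87 correctly classified images.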

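[Figure: cost curve over training]

Notes:

Your model seems big enough to fit the training set well. However, given the gap between training and test accuracy, you could try adding L2 or dropout regularization to reduce overfitting.
Think of the session as a block of code that trains the model. Each time you run the session on a minibatch, it trains the parameters. In total you have run the session a large number of times (1500 epochs) until you obtained well-trained parameters.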
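
To put a number on "a large number of times", the minibatch arithmetic for this dataset can be worked out directly. The sketch below only does the counting for m = 1080 and a minibatch size of 32. Whether random_mini_batches (from tf_utils, not shown in this post) also returns the final, smaller batch is an assumption on my part.

```python
import math

m, minibatch_size = 1080, 32

num_full = m // minibatch_size            # what int(m / minibatch_size) computes
remainder = m % minibatch_size            # examples left for a final, smaller batch
num_batches_with_partial = math.ceil(m / minibatch_size)

print(num_full, remainder, num_batches_with_partial)   # 33 24 34
```

If the helper does keep the 24-example remainder as a 34th batch, each epoch runs the session 34 times while dividing each minibatch cost by 33, so the reported epoch cost would be slightly higher than the true mean. That would not affect training itself, only the printed value.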

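2.7-Testing with your own image (optional / ungraded exercise)

Congratulations on finishing this assignment.
You can now test your model on a picture of your own.

Note: running the code from the original notebook raises the error below. The main cause is that the functions available in scipy changed between versions.

AttributeError: module 'scipy.ndimage' has no attribute 'imread'

Looking for the cause: https://stackoverflow.com/questions/15345790/scipy-misc-module-has-no-attribute-imread
The explanation and fix are as follows:

imread is deprecated in SciPy 1.0.0, and will be removed in 1.2.0. Use imageio.imread instead.

Checking the version I have installed: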

c:\>pip show scipy
Name: scipy
Version: 1.4.1
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: c:\users\aaa\anaconda3\envs\tensorflow\lib\site-packages
Requires: numpy

The installed scipy version is indeed too new.
After that fix, running the code raises another error:

module 'scipy' has no attribute 'misc'

Looking for the cause: https://stackoverflow.com/questions/56204985/how-to-fix-scipy-misc-has-no-attribute-imresize
The explanation and fix are as follows:

imresize is deprecated! imresize is deprecated in SciPy 1.0.0, and will be removed in 1.3.0. Use Pillow instead: numpy.array(Image.fromarray(arr).resize()).

This is also a scipy version issue.

The modified code:

import imageio
import scipy
from PIL import Image
from scipy import ndimage

## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "3.jpg"
## END CODE HERE ##

# We preprocess your image to fit your algorithm.
fname = "images/" + my_image
#image = np.array(ndimage.imread(fname, flatten=False))
image = np.array(imageio.imread(fname))
#my_image = scipy.misc.imresize(image, size=(64, 64)).reshape((1, 64 * 64 * 3)).T
my_image = np.array(Image.fromarray(image).resize((64, 64),Image.ANTIALIAS)) # ANTIALIAS selects a high-quality downsampling filter (removed in Pillow 10; use Image.LANCZOS there)
my_image = my_image.reshape((1, 64 * 64 * 3)).T

my_image_prediction = predict(my_image, parameters)

plt.imshow(image)
plt.show()
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))