[Deep Learning] Andrew Ng Deep Learning - Course 2 Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization - Week 3 Programming Assignment: Hyperparameter Tuning, Batch Normalization and Programming Frameworks

(This article uses TF2.0 for the programming.)

Video link: [Chinese/English subtitles] Andrew Ng Deep Learning Course 2 — Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
References:

  1. [Chinese] [Andrew Ng post-course programming assignment] Course 2 - Improving Deep Neural Networks - Week 3 assignment
  2. TensorFlow Tutorial
  3. Andrew Ng Deep Learning Course 2 Week 3 programming assignment (I used TF2.0)

0. Assignment goals

Welcome to this week's programming assignment. Until now, you have always used numpy to build neural networks. Now we will step you through a deep learning framework that lets you build neural networks more easily. Machine learning frameworks such as TensorFlow, PaddlePaddle, Torch, Caffe, Keras, and many others can speed up your machine learning development significantly. All of these frameworks have documentation that you are free to read. In this assignment, you will learn to do the following in TensorFlow:

  • Initialize variables
  • Start your own session
  • Train algorithms
  • Implement a neural network

Programming frameworks can not only shorten your coding time, but sometimes also perform optimizations that speed up your code.
Before starting, please download the materials needed for this assignment: [click to download], extraction code: dvrc. I took this link from reference 1; if the extraction code is wrong, open reference 1 and check whether the code/address has been updated.

1. Importing the TensorFlow library

To start, you need to import the libraries:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

np.random.seed(1)
tf.compat.v1.disable_eager_execution()  # make sure session.run() works (disables TF2 eager execution)

My environment uses PyCharm and Anaconda, both of which can be downloaded directly from the web. For installing math, numpy, h5py and matplotlib, you can refer to this article: [Deep Learning] Andrew Ng Deep Learning - Course 1 Neural Networks and Deep Learning - Week 2 Neural Network Basics programming assignment. Here I will just briefly describe how to install TensorFlow (note: do not paste the trailing comments in together with the commands):

conda search tensorflow	# list all available tensorflow versions
conda install tensorflow # installs the latest version by default; this is the command I used, so I got tf2.0
conda install tensorflow=x.x	# installs version x.x of tensorflow
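After installing, you can optionally confirm which version you ended up with (this is just a quick check, not required by the assignment):

import tensorflow as tf
print(tf.__version__)   # should print something like 2.0.0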

If the imports above run without errors, the libraries were imported successfully. This article will walk you through different ways of using TF. You will start with the example below, where we compute for you the loss of one training example:

$loss = L(\hat{y}, y) = (\hat{y}^{(i)} - y^{(i)})^2$

# Example1
y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.  
y = tf.constant(39, name='y')                    # Define y. Set to 39  
  
loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss  
  
init = tf.compat.v1.global_variables_initializer()  # When init is run later (session.run(init)),
                                                     # the loss variable will be initialized and ready to be computed
with tf.compat.v1.Session() as session:              # Create a session and print the output
    session.run(init)                                # Initializes the variables
    print(session.run(loss))                         # Prints the loss

Running it gives:

9

Writing and running TensorFlow code involves the following steps:

  1. Create tensors (variables) that are not yet executed/evaluated.
  2. Write operations between those tensors.
  3. Initialize your tensors.
  4. Create a session.
  5. Run the session; this runs the operations you wrote above.

Therefore, when we created a variable for the loss, we simply defined the loss as a function of other quantities, but did not evaluate its value. To evaluate it, we had to run init = tf.compat.v1.global_variables_initializer(). That initialized the loss variable, and in the last line we were finally able to evaluate the value of loss and print it.

Now let's look at an easy example. Run the following code:

# Example2 
a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

The result is as follows:

Tensor("Mul:0", shape=(), dtype=int32)

As expected, you will not see 20! Instead, you got a Tensor-type result with no shape and dtype int32. All you did was put the operation into the computation graph; you have not actually run that graph yet. In order to actually multiply the two numbers, you have to create a session and run it:

Tensor("Mul:0", shape=(), dtype=int32)

This gives the expected result:

20

Great! To summarize, remember to initialize your variables, create a session, and run the operations inside the session.

Next, you also need to know about placeholders. A placeholder is an object whose value you can specify only later. To assign values to placeholders, you pass them in using a "feed dictionary" (the feed_dict argument). Below, we create a placeholder for x. This allows us to pass in a value later, when we run the session.

# Example3  
x = tf.compat.v1.placeholder(tf.int64,name="x")  
sess = tf.compat.v1.Session ()  
print(sess.run(2 * x,feed_dict={x:3}))  
sess.close()

The result:

6

When you first defined x, you did not have to give it a value. A placeholder is simply a variable that you will assign data to only later, when you run the session. We say that you feed data to these placeholders when running the session.

Here is what is happening: when you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. The computation graph can have some placeholders whose values you will specify only later. Finally, when you run the session, you are telling TensorFlow to execute the computation graph.
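As a small illustration of this (building on Example 3 above; the names used here are only for demonstration), the same graph node can be evaluated several times with different values fed into the placeholder:

x = tf.compat.v1.placeholder(tf.int64, name="x_demo")
double_x = 2 * x                        # this only builds the graph; nothing is computed yet
with tf.compat.v1.Session() as sess:
    print(sess.run(double_x, feed_dict={x: 3}))    # prints 6
    print(sess.run(double_x, feed_dict={x: 10}))   # prints 20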

1.1 Linear function

Let's start the programming exercise by computing the following equation: $Y = WX + b$, where $W$ and $X$ are random matrices and $b$ is a random vector.
Exercise: Compute $WX + b$, where $W$, $X$ and $b$ are drawn from a random normal distribution. $W$ has shape (4,3), $X$ has shape (3,1) and $b$ has shape (4,1). As an example, here is how you would define a constant $X$ of shape (3,1):
x = tf.constant(np.random.randn(3,1), name = "x" )
The following functions may also be helpful:

  • tf.matmul(..., ...) for matrix multiplication
  • tf.add(..., ...) for addition
  • np.random.randn(...) for random initialization

Now complete the function below; remember to set the random seed with np.random.seed(1).

def linear_function():
    """
    Implements a linear function:
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """

Once completed, it should look like this:

# Exercise1
def linear_function():
    """
    Implements a linear function:
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """
    np.random.seed(1)

    X = np.random.randn(3, 1)
    W = np.random.randn(4, 3)
    b = np.random.randn(4, 1)
    Y = tf.add(tf.matmul(W, X), b)     # or simply tf.matmul(W, X) + b
    # create a session
    sess = tf.compat.v1.Session()
    result = sess.run(Y)
    # close the session when done
    sess.close()

    return result

Test it with the following code:

# test linear_function()
print( "result = " + str(linear_function()))

The test output is below. (Note: I initialized in the order X, W, b. If your initialization order differs from mine, your numbers will differ too; you can swap your initialization order to see the effect.)

result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]

1.2 Computing the sigmoid

Great! You just implemented a linear function. TensorFlow offers a variety of commonly used neural network functions such as tf.sigmoid and tf.softmax. For this exercise, let's compute the sigmoid of an input.
You will do this exercise using a placeholder variable x. When running the session, you will use the feed dictionary to pass in the value z. In this exercise, you will have to (i) create a placeholder x, (ii) define the operations needed to compute the sigmoid using tf.sigmoid, and then (iii) run the session.
Exercise: Implement the sigmoid function below. You should use the following:

  • tf.placeholder(tf.float32, name = "...")
  • tf.sigmoid(...)
  • sess.run(..., feed_dict = {x: z})

Note that there are two typical ways to create and use sessions in TensorFlow:
Method 1:

sess = tf.compat.v1.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session

Method 2:

with tf.compat.v1.Session() as sess:
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)

Complete the following function:

def sigmoid(z):
    """
    Computes the sigmoid of z
    
    Arguments:
    z -- input value, scalar or vector
    
    Returns: 
    results -- the sigmoid of z
    """

Completed, it looks like this:

# Exercise2
def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns:
    results -- the sigmoid of z
    """
    x = tf.compat.v1.placeholder(tf.float32, name = "x")

    sess = tf.compat.v1.Session()
    result = sess.run(tf.sigmoid(x), feed_dict = {x: z})
    sess.close()

    return result

You can test it with the following code:

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

The expected output:

sigmoid(0) = 0.5
sigmoid(12) = 0.9999939

To summarize, you now know how to:

  1. Create placeholders
  2. Specify the computation graph corresponding to the operations you want to compute
  3. Create a session
  4. Run the session, using a feed dictionary if necessary to pass values to the placeholder variables

1.3 Computing the cost

You can also implement a function that computes the cost of your neural network. Previously, you had to write the cost as a function of $a^{[2](i)}$ and $y^{(i)}$ for $i = 1...m$:

$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{[2](i)} + (1-y^{(i)})\log(1-a^{[2](i)})\right)$

But in TensorFlow, you only need one line of code!
Exercise: Implement the cross-entropy loss. The function you will use is:

  • tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...)

Your code should take z as input, compute the sigmoid (to get a), and then compute the cross-entropy cost J. All of this can be done with one call to tf.nn.sigmoid_cross_entropy_with_logits, which computes:

$-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{[2](i)} + (1-y^{(i)})\log(1-a^{[2](i)})\right)$

Complete the following function:

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy
    
    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 
    
    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 
    
    Returns:
    cost -- runs the session of the cost (formula (2))
    """

Completed:

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- runs the session of the cost (formula (2))
    """
    z = tf.compat.v1.placeholder(tf.float32, name = "z")
    y = tf.compat.v1.placeholder(tf.float32, name = "y")

    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)
    sess = tf.compat.v1.Session()
    cost = sess.run(cost, feed_dict={z: logits, y: labels})
    sess.close()

    return cost

Test your function with the following code:

# test cost(logits, labels)
logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
cost = cost(logits, np.array([0, 0, 1, 1]))
print ("cost = " + str(cost))

The output:

cost = [1.0053873  1.0366408  0.41385436 0.39956617]
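As a quick sanity check (not part of the assignment, and relying on the numpy import and the sigmoid() function defined above), you could compute the same quantity with plain numpy and compare:

def np_sigmoid_cross_entropy(z, y):
    a = 1 / (1 + np.exp(-z))                            # sigmoid of the logits
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))   # element-wise cross-entropy

z = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))             # same "logits" as in the test above
print(np_sigmoid_cross_entropy(z, np.array([0, 0, 1, 1])))
# should print values close to [1.0053873  1.0366408  0.41385436 0.39956617]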

1.4 Using one-hot encoding

Many times in deep learning you will have a y vector whose entries range from 0 to C-1, where C is the number of classes. If C is, say, 4, then you might have the following y vector, which you need to convert as shown:
[Figure: converting a label vector y into its one-hot matrix representation]
This is called "one-hot" encoding, because in the converted representation exactly one element of each column is "hot" (meaning set to 1). To do this conversion in numpy, you might have to write a few lines of code. In TensorFlow, you can use one line of code:

  • tf.one_hot(labels, depth, axis)

Exercise: Implement the function below to take a vector of labels and the total number of classes C, and return the one-hot encoding. Use tf.one_hot() to do this.

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j) 
                     will be 1. 
                     
    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension
    
    Returns: 
    one_hot -- one hot matrix
    """

Completed:

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j)
                     will be 1.

    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension

    Returns:
    one_hot -- one hot matrix
    """
    C = tf.constant(C, name="C")

    one_hot_matrix = tf.one_hot(indices=labels, depth=C, axis=0)

    sess = tf.compat.v1.Session()
    one_hot = sess.run(one_hot_matrix)
    sess.close()

    return one_hot

Test your function with the following code:

# test one_hot_matrix(labels, C)
labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C=4)
print ("one_hot = " + str(one_hot))

The output:

one_hot = [[0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]]

1.5 Initializing with zeros and ones

Now you will learn how to initialize a vector of zeros and ones. The function you will call is tf.ones(); to initialize with zeros you could use tf.zeros() instead. These functions take in a shape and return an array of that shape full of ones or zeros, respectively.
Exercise: Implement the function below to take in a shape and return an array of that shape, filled with ones. You will use:

  • tf.ones(shape)
def ones(shape):
    """
    Creates an array of ones of dimension shape
    
    Arguments:
    shape -- shape of the array you want to create
        
    Returns: 
    ones -- array containing only ones
    """

Completed:

# Exercise5
def ones(shape):
    """
    Creates an array of ones of dimension shape

    Arguments:
    shape -- shape of the array you want to create

    Returns:
    ones -- array containing only ones
    """
    ones = tf.ones(shape)

    sess = tf.compat.v1.Session()
    ones = sess.run(ones)
    sess.close()

    return ones

You can test your function with the following code:

# test ones(shape)
print ("ones = " + str(ones([3])))

The output:

ones = [1. 1. 1.]

2. Building your first neural network in TensorFlow

In this part of the assignment you will build a neural network using TensorFlow. Remember that there are two steps to implementing a TensorFlow model:

  • Create the computation graph
  • Run the graph

Let's dive into the problem you'd like to solve!

2.0 Problem statement: the SIGNS dataset

One afternoon, you and some friends decided to teach your computers to decipher sign language. You spent a few hours taking pictures in front of a white wall and came up with the dataset below. It's now your job to build an algorithm that facilitates communication between speech-impaired people and people who don't understand sign language.

  • Training set: 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number)
  • Test set: 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number)

Note that this is a subset of the SIGNS dataset; the complete dataset contains many more signs.
Here is an example for each number, along with an explanation of how we represent the labels. These are a few of the original pictures.
[Figure 1: the SIGNS dataset]
Run the following code to load the dataset.

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Change the index below and run the code to see some examples from the dataset.

# Example of a picture
index = 0
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))
plt.show()

Running this code gives:

y = 5

[Figure: the training image at index 0, a hand sign for the number 5]
As before, you flatten the image dataset, then normalize it by dividing by 255. On top of that, you convert each label to a one-hot vector as shown in Figure 1. Run the code below to do so.

X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten / 255.
X_test = X_test_flatten / 255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print("number of training examples = " + str(X_train.shape[1]))
print("number of test examples = " + str(X_test.shape[1]))
print("X_train shape: " + str(X_train.shape))
print("Y_train shape: " + str(Y_train.shape))
print("X_test shape: " + str(X_test.shape))
print("Y_test shape: " + str(Y_test.shape))

The output:

number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)

Note: 12288 comes from 64 × 64 × 3. Each image is square, 64 by 64 pixels, and 3 is for the RGB colors. Please make sure all of these shapes make sense to you before continuing.

Goal: Build an algorithm capable of recognizing a sign with high accuracy. To do so, you are going to build a TensorFlow model that is almost the same as the one you previously built in numpy for cat recognition (but now using a softmax output). It is a great occasion to compare your numpy implementation to the TensorFlow one.

Model: The model is LINEAR → RELU → LINEAR → RELU → LINEAR → SOFTMAX. The SIGMOID output layer has been converted to a SOFTMAX; a SOFTMAX layer generalizes SIGMOID to more than two classes.

2.1 Creating placeholders

Your first task is to create placeholders for X and Y. This will allow you to pass in your training data later, when you run the session.

Exercise: Implement the function below to create the placeholders in TensorFlow.

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible on the number of examples you will use for the placeholders.
      In fact, the number of examples during test/train is different.
    """

Completed:

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible on the number of examples you will use for the placeholders.
      In fact, the number of examples during test/train is different.
    """
    X = tf.compat.v1.placeholder(tf.float32, [n_x, None] ,name = "x")
    Y = tf.compat.v1.placeholder(tf.float32, [n_y, None],name = "y")

    return X,Y

Test your function with the following code:

# test create_placeholders(n_x, n_y)
X, Y = create_placeholders(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))

The expected output:

X = Tensor("x:0", shape=(12288, None), dtype=float32)
Y = Tensor("y:0", shape=(6, None), dtype=float32)

2.2 Initializing the parameters

Your second task is to initialize the parameters in TensorFlow.

Exercise: Implement the initialization function below. You are going to use Xavier initialization for the weights and zero initialization for the biases. The shapes are given below. As an example, to help you, for W1 and b1 you could use:

# TF1-style initialization of W1:
W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
# TF2-compatible initialization of W1:
W1 = tf.compat.v1.get_variable("W1", [25,12288], initializer=tf.initializers.GlorotUniform(seed=1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

Please use seed=1 to make sure your results match ours.
Complete the following function:

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """

Completed:

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """
    W1 = tf.compat.v1.get_variable("W1", [25,12288], initializer=tf.initializers.GlorotUniform(seed=1))
    b1 = tf.compat.v1.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.compat.v1.get_variable("W2", [12,25], initializer=tf.initializers.GlorotUniform(seed=1))
    b2 = tf.compat.v1.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.compat.v1.get_variable("W3", [6, 12], initializer=tf.initializers.GlorotUniform(seed=1))
    b3 = tf.compat.v1.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters

You can test it with the following code:

# test initialize_parameters()
ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
with tf.compat.v1.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

The output:

W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32>

As expected, the parameters have not been evaluated yet.

2.3 Forward propagation in TensorFlow

You will now implement the forward propagation module in TensorFlow. The function will take in a dictionary of parameters and complete the forward pass. The functions you may use are:

  • tf.add(..., ...) for addition
  • tf.matmul(..., ...) for matrix multiplication
  • tf.nn.relu(...) for the ReLU activation

Question: Implement the forward pass of the neural network, so that you can compare the TensorFlow implementation with the numpy one you wrote earlier. It is important to remember that the forward propagation stops at z3. The reason is that in TensorFlow the last linear layer output is given directly as input to the function computing the loss. Therefore, you do not need a3!
Complete the following function:

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

Completed, it should look like this:

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)

    return Z3

Run a quick test with the following code:

# test forward_propagation(X, parameters)
ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

The output:

Z3 = Tensor("Add_2:0", shape=(6, None), dtype=float32)

You may have noticed that the forward propagation does not output any cache. You will understand why below, when we get to backpropagation.

2.4 Computing the cost

As seen before, it is very easy to compute the cost, using just this one line:
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))
Question: Implement the cost function below.

  • It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to be of shape (number of examples, number of classes). We have therefore transposed Z3 and Y for you.
  • Besides, tf.reduce_mean takes care of averaging over all the examples.

Complete the following function:

 def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

Completed, it should look like this:

 def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=labels))

    return cost

Run a quick test with the following code:

# test compute_cost(Z3, Y)
ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

The output:

 cost = Tensor("Mean:0", shape=(), dtype=float32)

2.5 Backpropagation and parameter updates

This is where you appreciate what a programming framework does for you: all the backpropagation and the parameter update are taken care of in one line of code, and it is very easy to incorporate this line into the model.
After you compute the cost function, you create an "optimizer" object. You call this object together with the cost when running tf.compat.v1.Session; when called, it performs an optimization on the given cost with the chosen method and learning rate.
For instance, for gradient descent the optimizer would be:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To make the optimization you would do:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

This computes the backpropagation by passing through the TensorFlow graph in reverse order, from cost back to the inputs.
Note: When coding, we often use _ as a "throwaway" variable to store values we won't need later. Here, _ takes the evaluated value of optimizer, which we do not need (and c takes the value of the cost variable).
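As a minimal sketch of how these two lines fit together (written with the tf.compat.v1 names used elsewhere in this article, and assuming X, Y, cost, minibatch_X and minibatch_Y are already defined as above):

# build the optimizer node once, as part of the graph
optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.0001).minimize(cost)

init = tf.compat.v1.global_variables_initializer()
with tf.compat.v1.Session() as sess:
    sess.run(init)
    # one training step: running the optimizer node triggers backprop and the parameter update
    _, c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})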

2.6 Building the model

Now you will bring it all together!
Exercise: Implement the model. You will be calling the functions you implemented previously.

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

Completed, it should look like this:

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True, is_plot = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    is_plot -- True to plot the cost curve after training
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    ops.reset_default_graph()  # lets you rerun the model without overwriting tf variables
    tf.random.set_seed(1)
    # tf.set_random_seed(1)
    seed = 3
    (n_x, m) = X_train.shape  # number of input features and number of training examples
    n_y = Y_train.shape[0]  # number of output classes
    costs = []  # to keep track of the cost

    # create placeholders for X and Y
    X, Y = create_placeholders(n_x, n_y)

    # initialize the parameters
    parameters = initialize_parameters()

    # forward propagation
    Z3 = forward_propagation(X, parameters)

    # compute the cost
    cost = compute_cost(Z3, Y)

    # backpropagation, using the Adam optimizer
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # initialize all the variables
    init = tf.compat.v1.global_variables_initializer()

    # start the session and run the computation
    with tf.compat.v1.Session() as sess:
        # run the initialization
        sess.run(init)

        # the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0  # cost for this epoch
            num_minibatches = int(m / minibatch_size)  # total number of minibatches
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                # select one minibatch
                (minibatch_X, minibatch_Y) = minibatch

                # the data is ready, run the session
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

                # accumulate this minibatch's share of the epoch cost
                epoch_cost = epoch_cost + minibatch_cost / num_minibatches

            # record and print the cost
            ## record the cost
            if epoch % 5 == 0:
                costs.append(epoch_cost)
                # print it if requested
                if print_cost and epoch % 100 == 0:
                    print("epoch = " + str(epoch) + "    epoch_cost = " + str(epoch_cost))

        # plot the cost curve if requested
        if is_plot:
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('iterations (per tens)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()

        # save the learned parameters
        parameters = sess.run(parameters)
        print("Parameters have been saved to the session.")

        # compute the predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # compute the accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Training set accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test set accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters

You can test your function with the following code:

# test the model function
parameters = model(X_train, Y_train, X_test, Y_test)

The output:

epoch = 0    epoch_cost = 1.8665231249549175
epoch = 100    epoch_cost = 0.8127114610238508
epoch = 200    epoch_cost = 0.571791567585685
epoch = 300    epoch_cost = 0.39651927938967046
epoch = 400    epoch_cost = 0.27292838354002347
epoch = 500    epoch_cost = 0.19333269401933206
epoch = 600    epoch_cost = 0.1259097773017305
epoch = 700    epoch_cost = 0.08381712944670158
epoch = 800    epoch_cost = 0.05590562443390039
epoch = 900    epoch_cost = 0.03187121072727622
epoch = 1000    epoch_cost = 0.022090311133951848
epoch = 1100    epoch_cost = 0.011794825239727896
epoch = 1200    epoch_cost = 0.008767742480179579
epoch = 1300    epoch_cost = 0.005665080000956853
epoch = 1400    epoch_cost = 0.003184822825432727

[Figure: the cost plotted against iterations, Learning rate = 0.0001]

Parameters have been saved to the session.
Training set accuracy: 1.0
Test set accuracy: 0.8666667

Impressively, your algorithm can recognize a sign representing a number between 0 and 5 with an accuracy of about 86.7% on the test set.

Observations:

  • Your model seems big enough to fit the training set well. However, given the gap between the training and test accuracy, you could try adding L2 regularization to reduce overfitting (see the sketch after this list).
  • Think of the session as a block of code that trains the model. Each time you run the session on a minibatch, it trains the parameters. In total you ran the session a large number of times (about 1500 epochs) until you obtained well-trained parameters.
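As a minimal sketch of what adding L2 regularization could look like inside model() in this setup (this is not part of the original assignment; lambd is a hypothetical regularization strength, and the weights come from the parameters dictionary returned by initialize_parameters()):

lambd = 0.01  # hypothetical regularization strength
l2_penalty = lambd * (tf.nn.l2_loss(parameters["W1"])
                      + tf.nn.l2_loss(parameters["W2"])
                      + tf.nn.l2_loss(parameters["W3"]))
# add the penalty to the cross-entropy cost before handing it to the optimizer
cost = compute_cost(Z3, Y) + l2_penalty
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

Everything else in model() would stay the same; only the cost handed to the optimizer changes.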

2.7 Testing with your own image (optional)

Use the following code (remember to convert your image to 64 by 64 pixels first):

my_image1 = "Picture1.jpg"                                            #定义图片名称
fileName1 = "H:/DeepLearning_wed/course2/week3/" + my_image1                      #图片地址
image1 = mpimg.imread(fileName1)                               #读取图片
plt.imshow(image1)                                             #显示图片
my_image1 = image1.reshape(1,64 * 64 * 3).T                    #重构图片
my_image_prediction = predict(my_image1, parameters)  #开始预测
print("预测结果: y = " + str(np.squeeze(my_image_prediction)))

The three images I used are shown below:
[Figure: my three test photos of hand signs]
I had already converted them to 64 by 64 pixels (using Format Factory).
The final predictions:

Prediction: y = 3
Prediction: y = 3
Prediction: y = 3

I think this result comes from overfitting. The training images all have a white background; since the model overfit to that data, it performs poorly on images with dark backgrounds. You could take some pictures of your own against a white wall and convert them with Format Factory.
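If you would rather resize the images in Python instead of using Format Factory, here is a small sketch using Pillow (assuming Pillow is installed; the file names are just examples):

from PIL import Image

img = Image.open("my_photo.jpg")            # any photo of a hand sign
img = img.convert("RGB").resize((64, 64))   # force 3 channels and 64x64 pixels
img.save("Picture1.jpg")                    # now usable by the prediction code above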

3. Summary

What you should remember:

  • TensorFlow is a programming framework used in deep learning
  • The two main object classes in TensorFlow are Tensors and Operators
  • When you code in TensorFlow you have to take the following steps:
  1. Create a graph containing Tensors (Variables, Placeholders, ...) and Operations (tf.matmul, tf.add, ...)
  2. Create a session
  3. Initialize the session
  4. Run the session to execute the graph
  • As you have seen in model(), the graph can be executed multiple times
  • The backpropagation and optimization are automatically done when running the session on the "optimizer" object

4. Full code

The exercise part, main.py:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from course2.week3.tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

np.random.seed(1)
tf.compat.v1.disable_eager_execution()  # make sure session.run() works (disables TF2 eager execution)

# # Example1
# y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
# y = tf.constant(39, name='y')                    # Define y. Set to 39
#
# loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss
#
# init = tf.compat.v1.global_variables_initializer()         # When init is run later (session.run(init)),
#                                                  # the loss variable will be initialized and ready to be computed
# with tf.compat.v1.Session () as session:                    # Create a session and print the output
#     session.run(init)                            # Initializes the variables
#     print(session.run(loss))                     # Prints the loss

# # Example2
# a = tf.constant(2)
# b = tf.constant(10)
# c = tf.multiply(a,b)
# print(c)
#
# sess = tf.compat.v1.Session ()
# print(sess.run(c))

# # Example3
# x = tf.compat.v1.placeholder(tf.int64,name="x")
# sess = tf.compat.v1.Session ()
# print(sess.run(2 * x,feed_dict={x:3}))
# sess.close()

# Exercise1
def linear_function():
    """
    Implements a linear function:
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """
    np.random.seed(1)

    X = np.random.randn(3, 1)
    W = np.random.randn(4, 3)
    b = np.random.randn(4, 1)
    Y = tf.add(tf.matmul(W, X), b)     # or simply tf.matmul(W, X) + b
    # create a session
    sess = tf.compat.v1.Session()
    result = sess.run(Y)
    # close the session when done
    sess.close()

    return result

# # test linear_function()
# print( "result = " + str(linear_function()))


# Exercise2
def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns:
    results -- the sigmoid of z
    """
    x = tf.compat.v1.placeholder(tf.float32, name = "x")

    sess = tf.compat.v1.Session()
    result = sess.run(tf.sigmoid(x), feed_dict = {x: z})
    sess.close()

    return result

# # test sigmoid(z)
# print ("sigmoid(0) = " + str(sigmoid(0)))
# print ("sigmoid(12) = " + str(sigmoid(12)))

# Exercise3
def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- runs the session of the cost (formula (2))
    """
    z = tf.compat.v1.placeholder(tf.float32, name = "z")
    y = tf.compat.v1.placeholder(tf.float32, name = "y")

    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)
    sess = tf.compat.v1.Session()
    cost = sess.run(cost, feed_dict={z: logits, y: labels})
    sess.close()

    return cost

# # test cost(logits, labels)
# logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
# cost = cost(logits, np.array([0, 0, 1, 1]))
# print ("cost = " + str(cost))


# Exercise4
def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j)
                     will be 1.

    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension

    Returns:
    one_hot -- one hot matrix
    """
    C = tf.constant(C, name="C")

    one_hot_matrix = tf.one_hot(indices=labels, depth=C, axis=0)

    sess = tf.compat.v1.Session()
    one_hot = sess.run(one_hot_matrix)
    sess.close()

    return one_hot

# # test one_hot_matrix(labels, C)
# labels = np.array([1,2,3,0,2,1])
# one_hot = one_hot_matrix(labels, C=4)
# print ("one_hot = " + str(one_hot))


# Exercise5
def ones(shape):
    """
    Creates an array of ones of dimension shape

    Arguments:
    shape -- shape of the array you want to create

    Returns:
    ones -- array containing only ones
    """
    ones = tf.ones(shape)

    sess = tf.compat.v1.Session()
    ones = sess.run(ones)
    sess.close()

    return ones

# # test ones(shape)
# print ("ones = " + str(ones([3])))

The neural network part, neuralwork.py:
import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import tensorflow as tf
from tensorflow.python.framework import ops
from course2.week3.tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

np.random.seed(1)
tf.compat.v1.disable_eager_execution()  # make sure session.run() works (disables TF2 eager execution)

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# # Example of a picture
# index = 0
# plt.imshow(X_train_orig[index])
# print ("y = " + str(np.squeeze(Y_train_orig[:, index])))
# plt.show()

# Flatten the image dataset
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten / 255.
X_test = X_test_flatten / 255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

# print("number of training examples = " + str(X_train.shape[1]))
# print("number of test examples = " + str(X_test.shape[1]))
# print("X_train shape: " + str(X_train.shape))
# print("Y_train shape: " + str(Y_train.shape))
# print("X_test shape: " + str(X_test.shape))
# print("Y_test shape: " + str(Y_test.shape))


def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible on the number of examples you will use for the placeholders.
      In fact, the number of examples during test/train is different.
    """
    X = tf.compat.v1.placeholder(tf.float32, [n_x, None] ,name = "x")
    Y = tf.compat.v1.placeholder(tf.float32, [n_y, None],name = "y")

    return X,Y

# # test create_placeholders(n_x, n_y)
# X, Y = create_placeholders(12288, 6)
# print("X = " + str(X))
# print("Y = " + str(Y))


def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """
    W1 = tf.compat.v1.get_variable("W1", [25,12288], initializer=tf.initializers.GlorotUniform(seed=1))
    b1 = tf.compat.v1.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.compat.v1.get_variable("W2", [12,25], initializer=tf.initializers.GlorotUniform(seed=1))
    b2 = tf.compat.v1.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.compat.v1.get_variable("W3", [6, 12], initializer=tf.initializers.GlorotUniform(seed=1))
    b3 = tf.compat.v1.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters

# # test initialize_parameters()
# ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
# with tf.compat.v1.Session() as sess:
#     parameters = initialize_parameters()
#     print("W1 = " + str(parameters["W1"]))
#     print("b1 = " + str(parameters["b1"]))
#     print("W2 = " + str(parameters["W2"]))
#     print("b2 = " + str(parameters["b2"]))


def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)

    return Z3

# # test forward_propagation(X, parameters)
# ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
# with tf.compat.v1.Session() as sess:
#     X, Y = create_placeholders(12288, 6)
#     parameters = initialize_parameters()
#     Z3 = forward_propagation(X, parameters)
#     print("Z3 = " + str(Z3))


def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=labels))

    return cost

# # test compute_cost(Z3, Y)
# ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
# with tf.compat.v1.Session() as sess:
#     X, Y = create_placeholders(12288, 6)
#     parameters = initialize_parameters()
#     Z3 = forward_propagation(X, parameters)
#     cost = compute_cost(Z3, Y)
#     print("cost = " + str(cost))

def model(X_train, Y_train, X_test, Y_test,
          learning_rate=0.0001, num_epochs=1500, minibatch_size=32,
          print_cost=True, is_plot=True):
    """
    实现一个三层的TensorFlow神经网络:LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX

    参数:
        X_train - 训练集,维度为(输入大小(输入节点数量) = 12288, 样本数量 = 1080)
        Y_train - 训练集分类数量,维度为(输出大小(输出节点数量) = 6, 样本数量 = 1080)
        X_test - 测试集,维度为(输入大小(输入节点数量) = 12288, 样本数量 = 120)
        Y_test - 测试集分类数量,维度为(输出大小(输出节点数量) = 6, 样本数量 = 120)
        learning_rate - 学习速率
        num_epochs - 整个训练集的遍历次数
        mini_batch_size - 每个小批量数据集的大小
        print_cost - 是否打印成本,每100代打印一次
        is_plot - 是否绘制曲线图

    返回:
        parameters - 学习后的参数
    """
    ops.reset_default_graph()  # 能够重新运行模型而不覆盖tf变量
    tf.random.set_seed(1)
    # tf.set_random_seed(1)
    seed = 3
    (n_x, m) = X_train.shape  # 获取输入节点数量和样本数
    n_y = Y_train.shape[0]  # 获取输出节点数量
    costs = []  # 成本集

    # 给X和Y创建placeholder
    X, Y = create_placeholders(n_x, n_y)

    # 初始化参数
    parameters = initialize_parameters()

    # 前向传播
    Z3 = forward_propagation(X, parameters)

    # 计算成本
    cost = compute_cost(Z3, Y)

    # 反向传播,使用Adam优化
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # 初始化所有的变量
    init = tf.compat.v1.global_variables_initializer()

    # 开始会话并计算
    with tf.compat.v1.Session() as sess:
        # 初始化
        sess.run(init)

        # 正常训练的循环
        for epoch in range(num_epochs):

            epoch_cost = 0  # 每代的成本
            num_minibatches = int(m / minibatch_size)  # minibatch的总数量
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                # 选择一个minibatch
                (minibatch_X, minibatch_Y) = minibatch

                # 数据已经准备好了,开始运行session
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

                # 计算这个minibatch在这一代中所占的误差
                epoch_cost = epoch_cost + minibatch_cost / num_minibatches

            # 记录并打印成本
            ## 记录成本
            if epoch % 5 == 0:
                costs.append(epoch_cost)
                # 是否打印:
                if print_cost and epoch % 100 == 0:
                    print("epoch = " + str(epoch) + "    epoch_cost = " + str(epoch_cost))

        # 是否绘制图谱
        if is_plot:
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('iterations (per tens)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()

        # 保存学习后的参数
        parameters = sess.run(parameters)
        print("参数已经保存到session。")

        # 计算当前的预测结果
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # 计算准确率
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("训练集的准确率:", accuracy.eval({X: X_train, Y: Y_train}))
        print("测试集的准确率:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters

# # test the model function
parameters = model(X_train, Y_train, X_test, Y_test)


# The following tests my own images; the paths are the ones on my computer.
my_image1 = "Picture1.jpg"                                     # image file name
fileName1 = "H:/DeepLearning_wed/course2/week3/" + my_image1   # image path
image1 = mpimg.imread(fileName1)                               # read the image
plt.imshow(image1)                                             # show the image
my_image1 = image1.reshape(1,64 * 64 * 3).T                    # reshape the image
my_image_prediction = predict(my_image1, parameters)           # run the prediction
print("Prediction: y = " + str(np.squeeze(my_image_prediction)))

my_image2 = "Picture2.jpg"                                     # image file name
fileName2 = "H:/DeepLearning_wed/course2/week3/" + my_image2   # image path
image2 = mpimg.imread(fileName2)                               # read the image
plt.imshow(image2)                                             # show the image
my_image2 = image2.reshape(1,64 * 64 * 3).T                    # reshape the image
my_image_prediction = predict(my_image2, parameters)           # run the prediction
print("Prediction: y = " + str(np.squeeze(my_image_prediction)))

my_image3 = "Picture3.jpg"                                     # image file name
fileName3 = "H:/DeepLearning_wed/course2/week3/" + my_image3   # image path
image3 = mpimg.imread(fileName3)                               # read the image
plt.imshow(image3)                                             # show the image
my_image3 = image3.reshape(1,64 * 64 * 3).T                    # reshape the image
my_image_prediction = predict(my_image3, parameters)           # run the prediction
print("Prediction: y = " + str(np.squeeze(my_image_prediction)))
