Andrew Ng Deep Learning Assignments - Course 2: Improving Deep Neural Networks - Week 3: Hyperparameter Tuning, Batch Normalization and Programming Frameworks

1. Quiz

  1. When searching for the best values over a large number of hyperparameters, try random values rather than a grid search, because you don't know in advance which hyperparameters matter more than others.

  2. Hyperparameters differ in importance; for example, as mentioned in the lectures, the learning rate matters more than most other hyperparameters.

  3. Whether you babysit a single model during hyperparameter search (the "panda" strategy) or train many models in parallel (the "caviar" strategy) largely depends on how much computational power you have access to.

  4. If you believe β (the momentum hyperparameter) should lie between 0.9 and 0.99, you should sample it with the following formula (see the sketch after this quiz list):

r = np.random.rand()
beta = 1 - 10 ** (-r - 1)
  5. Even when you work on a single problem, your hyperparameters are not set in stone: small changes to the setup (for example a server upgrade or a change in the data) may mean you have to re-search for good hyperparameter values from scratch.

  6. The normalization in Batch Norm refers to normalizing z[l].

  7. In $z^{(i)}_{norm} = \dfrac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, the epsilon is there to avoid division by zero.

  8. Regarding γ and β in Batch Norm: they can be learned with Adam, gradient descent with momentum, or RMSprop, not only with plain gradient descent, and they set the mean and variance of the linear variable $z^{[l]}$ of a given layer (via $\tilde{z}^{(i)} = \gamma z^{(i)}_{norm} + \beta$).

  9. At test time, when evaluating the network on new examples, perform the required normalization using the exponentially weighted averages of μ and σ² that were estimated across mini-batches during training.

  10. Deep learning programming frameworks: a framework lets you implement deep learning algorithms with far less code than a lower-level language such as Python. And even if a project is open source today, good governance helps ensure that it stays open in the long run instead of being closed off or modified to serve a single company.
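
As a concrete illustration of points 1 and 4, here is a minimal sketch of sampling hyperparameters at random on a log scale (the ranges are only examples, not values taken from the quiz):

import numpy as np

np.random.seed(0)

# Learning rate: sample alpha log-uniformly in [1e-4, 1] rather than uniformly.
r = -4 * np.random.rand()        # r in [-4, 0]
alpha = 10 ** r                  # alpha in [0.0001, 1]

# Momentum: sample beta so that 1 - beta is log-uniform in [0.01, 0.1],
# i.e. beta in [0.9, 0.99], exactly as in the formula of point 4.
r = np.random.rand()             # r in [0, 1]
beta = 1 - 10 ** (-r - 1)        # r = 0 -> beta = 0.9, r = 1 -> beta = 0.99

print(alpha, beta)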

2. Assignment

Writing and running a program in TensorFlow involves the following steps:

  1. Create tensors (variables) that are not yet executed/evaluated.
  2. Write operations between those tensors.
  3. Initialize your tensors.
  4. Create a session.
  5. Run the session. This executes the operations written above.
  • Note: the environment used here is TensorFlow 2.x, while the course uses 1.x functions, so some functions used in the code below differ from the tutorial.
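
For reference, a minimal compatibility setup that this kind of v1-style Session code usually needs under TensorFlow 2.x (this is an assumption about the local environment, not part of the original notebook):

import tensorflow as tf

# TF 2.x executes eagerly by default; the Session/placeholder code below
# needs graph mode, which the v1 compatibility layer can restore.
tf.compat.v1.disable_eager_execution()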

2.1 Exploring the TensorFlow Library

First, import the libraries that will be needed:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

%matplotlib inline
np.random.seed(1)

An example of using TensorFlow: compute the loss for a single training example:
$$loss = \mathcal{L}(\hat{y}, y) = (\hat{y}^{(i)} - y^{(i)})^2 \tag{1}$$

y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39

loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss

init = tf.compat.v1.global_variables_initializer()   # When init is run later (session.run(init)),
                                                      # the loss variable will be initialized and ready to be computed
# In TF 1.x this was: init = tf.global_variables_initializer()
with tf.compat.v1.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # Prints the loss

So, when we created a variable for the loss, we simply defined the loss as a function of other quantities without computing its value. To evaluate it, we had to run init = tf.global_variables_initializer() (the 1.x function; in 2.x, use init = tf.compat.v1.global_variables_initializer()). This initializes the loss variable, and in the last line we are finally able to evaluate the value of loss and print it.
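
For comparison, in a fresh TF 2.x session with eager execution left enabled (the default), the same computation evaluates immediately and no Session is needed; a minimal sketch:

import tensorflow as tf

y_hat = tf.constant(36, name='y_hat')
y = tf.constant(39, name='y')
loss = (y - y_hat) ** 2      # evaluated eagerly, no graph or Session required
print(loss.numpy())          # 9

The rest of this section sticks with the graph/Session style used by the assignment.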

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

Tensor("Mul_1:0", shape=(), dtype=int32)

As expected, you do not see 20! You get back a tensor: the result is a tensor without a value yet, with an empty shape and of type int32. All you have done is put the operation into the "computation graph"; you have not run the computation yet. To actually multiply the two numbers, you have to create a session and run it.

sess = tf.compat.v1.Session()
print(sess.run(c))

20

Remember to initialize your variables, create a session, and run the operations inside the session.

  • Placeholders, an example:
# Change the value of x in the feed_dict

x = tf.compat.v1.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

A placeholder is an object whose value you can only specify later. To pass a value to a placeholder, you use a "feed dictionary" (the feed_dict argument). Above, we created a placeholder for x, which allows us to pass in a number later when we run the session.

When you first defined x you did not have to specify a value for it. A placeholder is simply a variable to which you will assign data later, when you run the session; we say that you feed data to these placeholders when running the session.
Here is what is happening: when you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. The graph can have placeholders whose values you will specify later. Finally, when you run the session, you are telling TensorFlow to execute the computation graph.

2.1.1 Linear Function

Compute $Y = WX + b$.
Defining a constant: X = tf.constant(np.random.randn(3,1), name = "X")
Matrix multiplication: tf.matmul(..., ...)
Addition: tf.add(..., ...)
Random initialization: np.random.randn(...)

# GRADED FUNCTION: linear_function

def linear_function():
    """
    Implements a linear function: 
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns: 
    result -- runs the session for Y = WX + b 
    """
    
    np.random.seed(1)
    
    ### START CODE HERE ### (4 lines of code)
    X = tf.constant(np.random.randn(3,1), name = "X")
    W = tf.constant(np.random.randn(4,3), name = "W")
    b = tf.constant(np.random.randn(4,1), name = "b")
    Y = tf.add(tf.matmul(W, X), b)
    ### END CODE HERE ### 
    
    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate
    
    ### START CODE HERE ###
    sess = tf.compat.v1.Session()
    result = sess.run(Y)
    ### END CODE HERE ### 
    
    # close the session 
    sess.close()

    return result
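
A call such as the following runs the small graph built inside the function and prints the resulting (4, 1) array (the exact numbers depend on the numpy seed):

print("result = " + str(linear_function()))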

2.1.2 Computing the Sigmoid

TensorFlow provides a variety of commonly used neural network functions, such as tf.sigmoid and tf.softmax. In this exercise we compute the sigmoid of an input.

You will use a placeholder variable x for this exercise, and when running the session you should use a feed dictionary to pass in the input z. So you have to (i) create a placeholder x, (ii) define the operations needed to compute the sigmoid using tf.sigmoid, and then (iii) run the session.

You will need:

  • tf.placeholder(tf.float32, name = "...")
  • tf.sigmoid(...)
  • sess.run(..., feed_dict = {x: z})

There are two typical ways to use a session in TensorFlow.

Method 1:

sess = tf.Session()                         # Create a session
result = sess.run(..., feed_dict = {...})   # Run the variable initialization (if needed) and the operations
sess.close()                                # Close the session

Method 2:

with tf.Session() as sess:
    # Run the variable initialization (if needed) and the operations
    result = sess.run(..., feed_dict = {...})

With the second method you do not have to close the session yourself.

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Computes the sigmoid of z
    
    Arguments:
    z -- input value, scalar or vector
    
    Returns: 
    results -- the sigmoid of z
    """
    
    ### START CODE HERE ### ( approx. 4 lines of code)
    # Create a placeholder for x. Name it 'x'.
    x = tf.compat.v1.placeholder(tf.float32, name = "x")
    
    # compute sigmoid(x)
    result = tf.sigmoid(x)
    
    # Create a session, and run it. Please use the method 2 explained above. 
    # You should use a feed_dict to pass z's value to x. 
    with tf.compat.v1.Session() as sess:
        # Run session and call the output "result"
        result = sess.run(result, feed_dict = {x : z})
        
    ### END CODE HERE ###
    
    return result

Call the function:

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

To summarize, you now know how to:

  1. Create placeholders.
  2. Specify the computation graph corresponding to the operations you want to compute.
  3. Create a session.
  4. Run the session, using a feed dictionary if necessary to specify the values of placeholder variables.

2.1.3 Computing the Cost

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right) \tag{2}$$
Use:

tf.nn.sigmoid_cross_entropy_with_logits(logits = …, labels = …)

Note: this code originally raised an error; the most likely cause is that tf.nn.sigmoid_cross_entropy_with_logits was being called with the numpy inputs passed positionally instead of the placeholders passed as keyword arguments. The version below passes logits = z and labels = y.

# GRADED FUNCTION: cost

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy
    
    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 
    
    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 
    
    Returns:
    cost -- runs the session of the cost (formula (2))
    """
    
    ### START CODE HERE ### 
    
    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.compat.v1.placeholder(tf.float32, name = "z")
    y = tf.compat.v1.placeholder(tf.float32, name = "y")
    
    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)
    
    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.compat.v1.Session()
    
    # Run the session (approx. 1 line).
    cost = sess.run(cost, feed_dict = {z:logits, y:labels})
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    
    return cost
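
A quick sanity check, reusing the sigmoid helper defined above (the input values are arbitrary and only for illustration):

logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
print("cost = " + str(cost(logits, np.array([0, 0, 1, 1]))))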

2.1.4 Using One-Hot Encoding

(Figure: converting class labels into a one-hot matrix.)
Row i corresponds to class i and column j to example j.
This is called "one-hot" encoding, because in the converted representation exactly one element of each column is "hot" (i.e. set to 1). Doing this conversion in numpy might take a few lines of code; in TensorFlow it is a single line:

tf.one_hot(labels, depth, axis)

# GRADED FUNCTION: one_hot_matrix

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j) 
                     will be 1. 
                     
    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension
    
    Returns: 
    one_hot -- one hot matrix
    """
    
    ### START CODE HERE ###
    
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name = "C")
    
    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot = tf.one_hot(labels, C, axis = 0)  # note axis = 0
    
    # Create the session (approx. 1 line)
    sess = tf.compat.v1.Session()
    
    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot)
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    
    return one_hot
labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C = 4)
print ("one_hot = " + str(one_hot))

2.1.5 Initializing with Zeros and Ones

tf.ones(shape)
tf.zeros(shape)

# GRADED FUNCTION: ones

def ones(shape):
    """
    Creates an array of ones of dimension shape
    
    Arguments:
    shape -- shape of the array you want to create
        
    Returns: 
    ones -- array containing only ones
    """
    
    ### START CODE HERE ###
    
    # Create "ones" tensor using tf.ones(...). (approx. 1 line)

    ones = tf.ones(shape)
    # Create the session (approx. 1 line)
    sess = tf.compat.v1.Session()
    
    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    return ones
print ("ones = " + str(ones([3])))

Result:

ones = [1. 1. 1.]

2.2 Building Your First Neural Network in TensorFlow

2.2.1 The SIGNS Dataset

Training set: 1080 pictures of the digits 0 to 5 (64×64 pixels).
Test set: 120 pictures of the digits 0 to 5 (64×64 pixels).

Load the dataset:

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Change the index to visualize different examples from the dataset:

# Example of a picture
index = 1079
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

"Flatten" the dataset, then normalize it (divide by 255), and on top of that convert each label to a one-hot vector as shown above. Run the cell below to do all of this.

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten/255.
X_test = X_test_flatten/255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

Result:

number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)

Note that 12288 comes from 64 × 64 × 3: each image is square, 64 by 64 pixels, and 3 is for the RGB channels. As a shape check, X_train_orig has shape (1080, 64, 64, 3); reshape(X_train_orig.shape[0], -1) turns it into (1080, 12288), and the transpose .T gives the (12288, 1080) shown above. Before moving on, make sure all of these shapes make sense to you.

Your goal is to build an algorithm capable of recognizing the signs with high accuracy. To do so, you will build a TensorFlow model that is almost identical to the one you previously built in numpy for cat recognition (but now with a softmax output). It is a great occasion to compare the numpy implementation with the TensorFlow one.

The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. The SIGMOID output layer has been converted to a SOFTMAX layer; a SOFTMAX layer generalizes SIGMOID to more than two classes.
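
Written out, the forward pass the model computes is:

$$Z_1 = W_1 X + b_1,\quad A_1 = \mathrm{ReLU}(Z_1),\quad Z_2 = W_2 A_1 + b_2,\quad A_2 = \mathrm{ReLU}(Z_2),\quad Z_3 = W_3 A_2 + b_3,\quad \hat{Y} = \mathrm{softmax}(Z_3)$$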

2.2.2 Creating Placeholders

Your first task is to create placeholders for X and Y. This will allow you to pass in your training data later, when you run the session.

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)
    
    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"
    
    Tips:
    - You will use None because it lets us be flexible on the number of examples for the placeholders.
      In fact, the number of examples during test/train is different.
    """

    ### START CODE HERE ### (approx. 2 lines)
    X = tf.compat.v1.placeholder(tf.float32, [n_x, None], name = "X")
    Y = tf.compat.v1.placeholder(tf.float32, [n_y, None], name = "Y")
    ### END CODE HERE ###
    
    return X, Y
X, Y = create_placeholders(12288, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))

Result:

X = Tensor("X_6:0", shape=(12288, None), dtype=float32)
Y = Tensor("Y_4:0", shape=(6, None), dtype=float32)

2.2.3 Initializing the Parameters

Implement the function below to initialize the parameters in TensorFlow. You will use Xavier initialization for the weights and zero initialization for the biases. The shapes are given below. As an example, for W1 and b1 you could use:

W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

Please use seed = 1 to make sure your results match ours.

TensorFlow 2.x removed tf.contrib.layers.xavier_initializer(); replace it with the corresponding new initializer, as done below.
https://blog.csdn.net/weixin_44069398/article/details/109490910

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]
    
    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """
    
    tf.compat.v1.set_random_seed(1)         # so that your "random" numbers match ours
        
    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.compat.v1.get_variable("W1", [25, 12288], initializer = tf.compat.v2.initializers.GlorotUniform(seed = 1))
    b1 = tf.compat.v1.get_variable("b1", [25, 1], initializer = tf.zeros_initializer())
    W2 = tf.compat.v1.get_variable("W2", [12, 25], initializer = tf.compat.v2.initializers.GlorotUniform(seed = 1))
    b2 = tf.compat.v1.get_variable("b2", [12, 1], initializer = tf.zeros_initializer())
    W3 = tf.compat.v1.get_variable("W3", [6, 12], initializer = tf.compat.v2.initializers.GlorotUniform(seed = 1))
    b3 = tf.compat.v1.get_variable("b3", [6, 1], initializer = tf.zeros_initializer())
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return parameters
tf.compat.v1.reset_default_graph()
with tf.compat.v1.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

Result:

W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>

As expected, the parameters have not been evaluated yet.

2.2.4 Forward Propagation

You will now implement the forward propagation module in TensorFlow. The function takes a dictionary of parameters and completes the forward pass. The functions you will use are:

tf.add(…,…) to do an addition
tf.matmul(…,…) to do a matrix multiplication
tf.nn.relu(…) to apply the ReLU activation

Implement the forward pass of the neural network. The numpy equivalents are given as comments so that you can compare the TensorFlow implementation with numpy. It is important to note that forward propagation stops at z3. The reason is that in TensorFlow the last linear layer's output is given as input to the function computing the loss, so you do not need a3!

# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
    
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    
    ### START CODE HERE ### (approx. 5 lines)              # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)                     # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                                   # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)                    # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                                   # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)                    # Z3 = np.dot(W3, A2) + b3
    ### END CODE HERE ###
    
    return Z3
tf.compat.v1.reset_default_graph()

with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

Result:

Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)

You may have noticed that the forward propagation does not use any cache. You will understand why when you get to backpropagation below.

2.2.5 Computing the Cost

Compute the cost with tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...)); with the 2.x-compatible API used here, that becomes tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = ..., labels = ...)).
Implement the cost function below.
It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to have shape (number of examples, number of classes), which is why the code transposes Z3 and Y.
Also, tf.reduce_mean averages the loss over the examples.
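
Concretely, for each example the op applies softmax to the logits and then the cross-entropy with the one-hot label, and tf.reduce_mean averages this over the mini-batch:

$$J = \frac{1}{m}\sum_{i=1}^{m}\left(-\sum_{j=1}^{6} y_j^{(i)} \log \hat{y}_j^{(i)}\right), \qquad \hat{y}^{(i)} = \mathrm{softmax}\big(z_3^{(i)}\big)$$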

# GRADED FUNCTION: compute_cost 

def compute_cost(Z3, Y):
    """
    Computes the cost
    
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3
    
    Returns:
    cost - Tensor of the cost function
    """
    
    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    
    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = logits, labels = labels))
    ### END CODE HERE ###
    
    return cost
tf.compat.v1.reset_default_graph()

with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

Result:

cost = Tensor("Mean:0", shape=(), dtype=float32)

2.2.6 Backpropagation & Parameter Updates

This is where you get to be grateful to programming frameworks: all of backpropagation and the parameter updates are taken care of in a single line of code, and it is very easy to incorporate this line into the model.
After you compute the cost function, you create an "optimizer" object. When running the tf.Session you have to call this object together with the cost. When called, it performs an optimization on the given cost with the chosen method and learning rate.
For instance, for gradient descent the optimizer would be:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To make the optimization you would do:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

This computes backpropagation by passing through the TensorFlow graph in reverse order, from the cost back to the inputs.
Note: when coding, we often use _ as a "throwaway" variable to store values we will not need later. Here, _ takes the evaluated value of optimizer, which we do not need (and c takes the value of the cost variable).
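
Putting these pieces together, a single training iteration follows roughly this pattern (a sketch assuming the placeholders X and Y, the cost tensor, num_iterations and the mini-batch arrays minibatch_X, minibatch_Y are already defined, as they will be in the model below):

optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate = 0.01).minimize(cost)
init = tf.compat.v1.global_variables_initializer()

with tf.compat.v1.Session() as sess:
    sess.run(init)
    for i in range(num_iterations):
        # one step: forward pass, backprop and parameter update on the mini-batch
        _, c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})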

2.2.7 Building the Model

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test set, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.compat.v1.set_random_seed(1)                   # to keep consistent results
    seed = 3                                          # to keep consistent results
    (n_x, m) = X_train.shape                          # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]                            # n_y : output size
    costs = []                                        # To keep track of the cost
    
    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_x, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###
    
    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###
    
    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###
    
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    ### END CODE HERE ###
    
    # Initialize all the variables
    init = tf.compat.v1.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.compat.v1.Session() as sess:
        
        # Run the initialization
        sess.run(init)
        
        # Do the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.                       # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost"; the feed_dict should contain a minibatch for (X, Y).
                ### START CODE HERE ### (1 line)
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###
                
                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every epoch
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
                
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
        
        return parameters

Run the cell below to train your model! On our machine it takes about 5 minutes. In the original notebook, the "Cost after epoch 100" should be 1.016458 (the run below, using the TF 2.x replacement initializer, produces somewhat different values). If the cost is clearly wrong, don't waste time: interrupt the training by clicking the square (⬛) in the upper bar of the notebook and try to fix your code. If the cost looks right, take a break and come back in 5 minutes!

parameters = model(X_train, Y_train, X_test, Y_test)

Training output:

Cost after epoch 0: 1.866523
Cost after epoch 100: 0.812711
Cost after epoch 200: 0.571792
Cost after epoch 300: 0.396519
Cost after epoch 400: 0.272928
Cost after epoch 500: 0.193333
Cost after epoch 600: 0.125910
Cost after epoch 700: 0.083817
Cost after epoch 800: 0.055906
Cost after epoch 900: 0.031871
Cost after epoch 1000: 0.022090
Cost after epoch 1100: 0.011795
Cost after epoch 1200: 0.008768
Cost after epoch 1300: 0.005532
Cost after epoch 1400: 0.003180

Parameters have been trained!
Train Accuracy: 1.0
Test Accuracy: 0.8666667
(Figure: cost vs. iterations (per tens), learning rate = 0.0001.)
Your model seems big enough to fit the training set well. However, given the gap between training accuracy and test accuracy, you could try adding L2 or dropout regularization to reduce the overfitting.
Think of the session as a block of code that trains the model: each time you run the session on a mini-batch, it trains the parameters. In total you ran the session many times (1500 epochs) until you obtained well-trained parameters.

Summary

  • TensorFlow is a programming framework used in deep learning.
  • The two main object classes in TensorFlow are Tensors and Operators.
  • When you code in TensorFlow you have to take the following steps:
    • Create a graph containing Tensors (Variables, Placeholders, ...) and Operations (tf.matmul, tf.add, ...).
    • Create a session.
    • Initialize the session.
    • Run the session to execute the graph.
    • You can execute the graph multiple times, as you saw in model(); backpropagation and optimization are done automatically when you run the session on the "optimizer" object.