Andrew Ng Deep Learning Course 2, Week 3: Notes and Programming Assignment

Notes

Hyperparameter Tuning

In machine learning,

- when there are only a few hyperparameters, we tune them by searching over a grid of points;

- when there are many hyperparameters, we abandon the grid and instead tune by sampling points at random.

  Going further, we can use a coarse-to-fine approach: start with a large range, narrow to a smaller one, and then narrow further.

When tuning hyperparameters, we follow these principles (a small sampling sketch follows the list):

- Random sampling

- Adequate search

- Coarse to fine
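
As a rough illustration of these principles, here is a minimal sketch (not from the course materials) of randomly sampling two hyperparameters per trial; train_and_evaluate is a hypothetical training routine, and the log-scale sampling of the learning rate is explained in the next section:

import numpy as np

np.random.seed(0)
num_trials = 25
for trial in range(num_trials):
    learning_rate = 10 ** np.random.uniform(-4, 0)    # sampled on a log scale (see next section)
    hidden_units = np.random.randint(50, 101)         # sampled uniformly on a linear scale
    # train_and_evaluate(learning_rate, hidden_units) # hypothetical training/evaluation call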

Choosing an Appropriate Scale for the Hyperparameters

The random sampling mentioned above does not mean sampling uniformly at random over the raw range of valid values; rather, we should choose an appropriate scale and sample uniformly at random on that scale.

- For the number of hidden layers, or the number of hidden units in each layer, we can use a linear scale directly and sample integers uniformly on it.

- For the learning rate α, suppose its range is [0.0001, 1]. Sampling uniformly at random on a linear scale would put roughly 90% of the search budget in [0.1, 1] and only about 10% in [0.0001, 0.1], which is clearly unreasonable.

To address this, we use a logarithmic scale instead:

On the log scale, the range becomes [-4, 0]; we sample r uniformly at random from this interval and set \alpha=10^r. In code:

r = -4 * np.random.rand()  # r in [-4, 0]
learning_rate = 10 ** r

Using a log scale ensures that the search budget is distributed more evenly across each sub-interval of the range.

- For the exponentially weighted average parameter β, suppose its range is [0.9, 0.999]. Again, searching on a linear scale would be unreasonable.

To address this we also use a log scale, though in a slightly different way than for the learning rate α (see the sketch after these steps). Concretely:

\beta\in [0.9, 0.999]

1-\beta\in[0.001, 0.1]

Apply the log scale to 1-\beta => the range becomes [-3, -1]

Sample r uniformly at random in this interval, then 1-\beta=10^r, i.e. \beta=1-10^r
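Following the same pattern as the learning-rate snippet above, a minimal sketch of sampling β on this log scale:

import numpy as np

r = np.random.uniform(-3, -1)  # r in [-3, -1]
beta = 1 - 10 ** r             # beta in [0.9, 0.999], spread evenly across the log scale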

Two Schools of Hyperparameter Tuning

In practice, the computing resources we have determine how we tune hyperparameters and improve the model.

- Panda: with limited compute we can only train one model, watching it during training and continually adjusting its parameters to end up with a good model.

- Caviar: with plenty of compute we run parallel experiments, training many models with different hyperparameter settings at the same time and picking the best one.

Batch Normalization

We saw earlier that normalizing the input features speeds up training. Similarly, in a multi-layer network we can normalize each layer's activations to speed up training of the network.

In practice it is usually z^{[l]} that is normalized. The procedure is as follows:

Take the intermediate values of one hidden layer of the network as an example:

\mu=\frac{1}{m}\sum_{i=1}^mz^{(i)}

\sigma^2=\frac{1}{m}\sum_{i=1}^m(z^{(i)}-\mu)^2

z^{(i)}_{norm}=\frac{z^{(i)}-\mu}{\sqrt{\sigma^2+\epsilon}}, where ε prevents the denominator from being zero

After these steps, all the z components lie on a distribution with mean 0 and variance 1. But we do not always want the hidden units to follow exactly this distribution; a different distribution might be more meaningful or more effective, so we add one more step:

\tilde{z}^{(i)}=\gamma z_{norm}^{(i)}+\beta

Note

- The β here is not the β from exponentially weighted averages. γ and β are learnable parameters, just like the weights W; together they determine the distribution that \tilde{z}^{(i)} follows.

- While watching the lectures I saw many people debating whether batch norm acts on a hidden layer "horizontally" (across examples) or "vertically" (across units). By analogy with input-feature normalization, which normalizes the same feature across different examples, a hidden layer should be treated the same way, so I believe it acts horizontally; moreover Andrew Ng's formulas use 1/m, and in the earlier courses m denotes the number of examples. (A small numpy sketch of the transform follows.)
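
To make the formulas above concrete, here is a minimal numpy sketch (an illustration, not the course's reference implementation) of the batch-norm transform for one hidden layer, where Z has shape (number of units, m) and the statistics are taken across the m examples:

import numpy as np

def batch_norm_forward(Z, gamma, beta, epsilon=1e-8):
    # Z: (n_units, m); gamma, beta: (n_units, 1) learnable parameters
    mu = np.mean(Z, axis=1, keepdims=True)         # per-unit mean over the mini-batch
    sigma2 = np.var(Z, axis=1, keepdims=True)      # per-unit variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(sigma2 + epsilon)  # zero mean, unit variance
    return gamma * Z_norm + beta                   # z_tilde: learnable scale and shift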

Implementing Batch Norm in a Neural Network

- Forward Propagation: in each layer, apply the normalization steps above to z^{[l]} to obtain \tilde{z}^{[l]}, then feed \tilde{z}^{[l]} into the activation function.

 - Backward Propagation:

· Compute the gradients of the parameters: dW^{[l]}, d\gamma^{[l]}, d\beta^{[l]}

· Update the parameters:

        W^{[l]}:=W^{[l]}-\alpha dW^{[l]}

        \gamma^{[l]}:=\gamma^{[l]}-\alpha d\gamma^{[l]}

        \beta^{[l]}:=\beta^{[l]}-\alpha d\beta^{[l]}

Notice that the bias term b^{[l]} does not appear above. This is because the "subtract the mean" step of normalization cancels it: within one hidden unit, the same bias b^{[l]}_i is added to every example, so the mean shifts by exactly that amount and the subtraction removes it. Whatever value b^{[l]} takes, it has no effect after normalization, so when applying batch norm in a neural network we can simply drop b^{[l]} or set it to zero.
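
A quick numerical check of this argument (a made-up example, not from the assignment): adding a per-unit constant b to Z leaves the normalized values unchanged.

import numpy as np

np.random.seed(1)
Z = np.random.randn(4, 8)    # 4 hidden units, 8 examples
b = np.random.randn(4, 1)    # one bias per unit, broadcast across the examples

def normalize(Z, eps=1e-8):
    mu = np.mean(Z, axis=1, keepdims=True)
    sigma2 = np.var(Z, axis=1, keepdims=True)
    return (Z - mu) / np.sqrt(sigma2 + eps)

print(np.allclose(normalize(Z), normalize(Z + b)))  # True: the bias is cancelled by the mean subtraction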

Batch norm also works with Momentum, RMSprop, and Adam.

Why Batch Norm Works

- As with normalizing the input features, it keeps the data within roughly the same range, which changes the shape of the cost function (e.g. from an elongated ellipse to something closer to a circle), so each gradient-descent step gets closer to the global minimum faster, speeding up training.

- It makes the weights of later (deeper) layers more robust to changes in earlier layers; for example, the weights of layer 10 can better withstand changes in the layers before it. How should we understand this? As data passes through layer after layer of the network, its distribution keeps changing; this phenomenon is called covariate shift, and in a neural network it describes the mismatch between the distributions of training data and test data, which hurts generalization and slows training. With batch norm, even if the values coming out of the previous layer keep changing, their mean and variance are held fixed, so batch norm limits how much parameter updates in earlier layers can shift the distribution of values seen by later layers, making the later layers' values more stable. Put another way, batch norm weakens the coupling between earlier-layer and later-layer parameters, giving each layer a degree of independence from the rest of the network, which helps speed up training.

Batch Norm's Side Effect

Batch norm also has a slight regularization effect.

With mini-batch gradient descent, the mean and variance are computed on each mini-batch rather than on the whole training set, which adds a little noise to them. The \tilde{z}^{[l]} computed from this mean and variance therefore also carries some noise.

Much like dropout, this adds some noise to each hidden layer's activations, so batch norm also has a slight regularization effect.

Note that batch norm only has a regularizing side effect; it should not be used as an actual regularization method.

Batch Norm at Test Time

During training we apply batch norm to each mini-batch, computing the required mean \mu and variance \sigma^2 on that mini-batch. At test time, however, we predict on each test example individually, and computing a mean and variance over a single example is meaningless; yet the trained model still contains this step. How do we solve this?

During training, maintain exponentially weighted averages of the statistics across all mini-batches of the training set. When training finishes, use the resulting averaged mean \theta_{\mu} and variance \theta_{\sigma^2} in the computation for the test examples. Concretely (a runnable sketch follows the pseudocode):

on layer l:

set \theta_{\mu}=0, with the same dimensions as μ

set \theta_{\sigma^2}=0, with the same dimensions as \sigma^2

for t = 1... :

        \theta_{\mu}=\beta \theta_{\mu}+(1-\beta)\mu

        \theta_{\sigma^2}=\beta \theta_{\sigma^2}+(1-\beta)\sigma^2
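
A minimal sketch of this procedure for one layer (the shapes and the stand-in mini-batches below are hypothetical, chosen only for illustration):

import numpy as np

np.random.seed(1)
n_units, m = 5, 32
minibatches = [np.random.randn(n_units, m) for _ in range(100)]  # stand-ins for one layer's z values

beta_avg = 0.9                           # EWA parameter (not the batch-norm beta)
theta_mu = np.zeros((n_units, 1))        # running mean, same shape as mu
theta_sigma2 = np.zeros((n_units, 1))    # running variance, same shape as sigma^2

for Z in minibatches:
    mu = np.mean(Z, axis=1, keepdims=True)
    sigma2 = np.var(Z, axis=1, keepdims=True)
    theta_mu = beta_avg * theta_mu + (1 - beta_avg) * mu
    theta_sigma2 = beta_avg * theta_sigma2 + (1 - beta_avg) * sigma2

# At test time, normalize a single example z_test with the running statistics:
# z_norm = (z_test - theta_mu) / np.sqrt(theta_sigma2 + 1e-8)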

Softmax Regression

Overview

Softmax regression is a generalization of logistic regression, used for multi-class classification problems.

Softmax regression converts the outputs of a multi-class task into a probability for each class, and the class with the largest probability is taken as the predicted class for the input example.

Concretely:

z^{[L]}=W^{[L]}a^{[L-1]}+b^{[L]}

t=e^{z^{[L]}}

a^{[L]}=\frac{e^{z^{[L]}}}{\sum_{j=1}^Ct_j}, i.e. a_i^{[L]}=\frac{t_i}{\sum_{j=1}^Ct_j}    (C is the number of classes)

From the formulas above, \sum_{i=1}^Ca^{[L]}_i=1
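
A minimal numpy sketch of these formulas (subtracting the per-column maximum is a standard numerical-stability trick, not part of the lecture formula):

import numpy as np

def softmax(z):
    # z: (C, m) pre-activations; softmax is taken over the class dimension (axis 0)
    t = np.exp(z - np.max(z, axis=0, keepdims=True))
    return t / np.sum(t, axis=0, keepdims=True)

z = np.array([[5.0], [2.0], [-1.0], [3.0]])   # C = 4 classes, one example
a = softmax(z)
print(a)               # the four class probabilities
print(a.sum(axis=0))   # [1.]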

When there are no hidden layers, i.e. the input features feed directly into the Softmax layer which then produces the output, the Softmax layer separates the classes with linear decision boundaries, regardless of the number of classes.

 Loss Function

L(\hat{y},y)=-\sum_{j=1}^Cy_j\log\hat{y}_j

Clearly only the class with y_j=1 contributes, so the formula simplifies to L(\hat{y},y)=-y_i\log\hat{y}_i=-\log\hat{y}_i (where i is the class whose label is 1). Therefore, to minimize the loss function we want the probability \hat{y}_i to be as large as possible.

The job of the loss function is to find the true class in the training set and push the probability predicted for that class as high as possible; this is in fact a form of maximum likelihood estimation.

Cost Function

J(W^{[1]},b^{[1]}...)=\frac{1}{m}\sum_{i=1}^mL(\hat{y}^{(i)},y^{(i)}) 
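
A minimal numpy sketch of this cost on a tiny hand-made batch (the numbers are made up for illustration; the small constant inside the log only guards against log(0)):

import numpy as np

def softmax_cross_entropy_cost(Y_hat, Y):
    # Y_hat, Y: (C, m); Y is one-hot
    m = Y.shape[1]
    losses = -np.sum(Y * np.log(Y_hat + 1e-12), axis=0)   # L(y_hat, y) for each example
    return np.sum(losses) / m

Y_hat = np.array([[0.7, 0.2],
                  [0.2, 0.5],
                  [0.1, 0.3]])   # C = 3 classes, m = 2 examples
Y = np.array([[1, 0],
              [0, 1],
              [0, 0]])           # one-hot labels
print(softmax_cross_entropy_cost(Y_hat, Y))   # = -(log 0.7 + log 0.5) / 2, approx. 0.525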

Softmax VS Hardmax

Usually the class corresponding to the largest output value is taken as the example's predicted class: the position of the maximum is set to 1 and all other positions to 0. That is the so-called "Hardmax". Softmax, by contrast, keeps the computed probabilities of all classes as they are, without modification.
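
A small sketch contrasting the two on one example:

import numpy as np

z = np.array([5.0, 2.0, -1.0, 3.0])

# Softmax keeps a full probability distribution over the classes
t = np.exp(z - z.max())
softmax_out = t / t.sum()          # approx. [0.84, 0.04, 0.002, 0.11]

# "Hardmax" keeps only the argmax position
hardmax_out = np.zeros_like(z)
hardmax_out[np.argmax(z)] = 1      # [1., 0., 0., 0.]

print(softmax_out, hardmax_out)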

Programming Assignment

Before starting this week's programming assignment, download the required files: Week 3 programming assignment materials (extraction code: w4de).

So far we have always written our neural networks ourselves with numpy. In this assignment we will build a neural network with a deep learning framework instead. Programming with a framework not only saves you coding time, it also lets your code run and optimize faster.

We will learn how TensorFlow handles the following tasks:

  • Initialize variables
  • Start a session
  • Train algorithms
  • Implement a neural network

TensorFlow code is typically structured as follows:

- Create TensorFlow variables (nothing is computed at this point)

- Define the operations between those TensorFlow variables

- Initialize the TensorFlow variables: init = tf.global_variables_initializer()

- Create a session to evaluate the loss and print its value

- Run the session; all the operations defined earlier are executed in this step

In short, we need to initialize our variables and then create a session to run them.

There are two ways to use a session:

 Method 1:

sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session

Method 2:

with tf.Session() as sess: 
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)

To build a neural network with TensorFlow, remember that implementing the model takes two steps:

  1. Create the computation graph
  2. Run the computation graph

The code is as follows:

(Note: my TensorFlow version is 2.9.1, so some of the code differs from other write-ups of the same assignment online.)

import tensorflow as tf
import h5py
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.python.framework import ops
import time
from tr_utils import *

np.random.seed(1)

"""
Writing and running programs in TensorFlow has the following steps:
    - Create Tensors (variables) that are not yet executed/evaluated.
    - Write operations between those Tensors.
    - Initialize your Tensors.
    - Create a Session.
    - Run the Session. This will run the operations you'd written above.
"""
tf.compat.v1.disable_eager_execution()

# loss=(y_hat[i]-y[i])^2
y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39

loss = tf.Variable((y - y_hat)**2, name='loss')

init = tf.compat.v1.global_variables_initializer()         # When init is run later (session.run(init)), the loss
                                                           # variable will be initialized and ready to be computed

with tf.compat.v1.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # Prints the loss

"""
Placeholder:
A placeholder is an object whose value you can specify only later. To specify values for a placeholder, you can pass in 
values by using a "feed dictionary" (feed_dict variable). 
A placeholder is simply a variable that you will assign data to only later, when running the session. We say that you 
feed data to these placeholders when running the session.
"""
x = tf.compat.v1.placeholder(tf.int64, name='x')
with tf.compat.v1.Session() as session:
    print(session.run(2*x, feed_dict={x: 3}))


# define a linear function
def linear_function():
    """
    Implements a linear function:
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """

    np.random.seed(1)

    # START CODE HERE #
    X = np.random.randn(3, 1)
    W = np.random.randn(4, 3)
    b = np.random.randn(4, 1)
    Y = tf.add(tf.matmul(W, X), b)
    # END CODE HERE #

    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate
    # START CODE HERE #
    sess = tf.compat.v1.Session()
    result = sess.run(Y)
    # END CODE HERE #

    # close the session
    sess.close()

    return result


# print("result = " + str(linear_function()))

# implement the sigmoid function
def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns:
    results -- the sigmoid of z
    """

    # START CODE HERE #
    # Create a placeholder for x. Name it 'x'.
    x = tf.compat.v1.placeholder(tf.float32, name='x')

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)

    # Create a session, and run it.
    # You should use a feed_dict to pass z's value to x.
    with tf.compat.v1.Session() as sess:
        # Run session and call the output "result"
        result = sess.run(sigmoid, feed_dict={x: z})
    # END CODE HERE #

    return result


# print("sigmoid(0) = " + str(sigmoid(0.0)))
# print("sigmoid(12) = " + str(sigmoid(12.0)))


# Implement the cross entropy loss
def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- runs the session of the cost
    """
    # START CODE HERE #
    # Create the placeholders for "logits" (z) and "labels" (y)
    z = tf.compat.v1.placeholder(tf.float32, name='z')
    y = tf.compat.v1.placeholder(tf.float32, name='y')

    # Use the loss function
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)

    with tf.compat.v1.Session() as sess:
        cost = sess.run(cost, feed_dict={z: logits, y: labels})
    # END CODE HERE #

    return cost


# logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
# cost = cost(logits, np.array([0., 0., 1., 1.]))
# print("cost = " + str(cost))


# Using one hot encodings
def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j)
                     will be 1.

    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension

    Returns:
    one_hot -- one hot matrix
    """
    # START CODE HERE #
    # Create a tf.constant equal to C (depth), name it 'C'.
    C = tf.constant(C, name="C")

    # Use tf.one_hot, be careful with the axis
    one_hot_matrix = tf.one_hot(indices=labels, depth=C, axis=0)

    with tf.compat.v1.Session() as sess:
        one_hot = sess.run(one_hot_matrix)
    # CODE END HERE #

    return one_hot


# labels = np.array([1, 2, 3, 0, 2, 1])
# one_hot = one_hot_matrix(labels, C=4)
# print("one_hot = " + str(one_hot))


# Initialize with zeros and ones
def ones(shape):
    """
    Creates an array of ones of dimension shape

    Arguments:
    shape -- shape of the array you want to create

    Returns:
    ones -- array containing only ones
    """
    # START CODE HERE #
    # Create "ones" tensor using tf.ones(...).
    Ones = tf.ones(shape)

    with tf.compat.v1.Session() as sess:
        Ones = sess.run(Ones)
    # END CODE HERE #

    return Ones


# print("ones = " + str(ones([3, 2])))


# Building neural network in tensorflow
"""
Training set: 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number).
Test set: 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number).
"""

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

index = 22
plt.imshow(X_train_orig[index])
plt.show()
print("y = " + str(np.squeeze(Y_train_orig[:, index])))
print("X_train_orig.shape is: " + str(X_train_orig.shape))
print("Y_train_orig.shape is: " + str(Y_train_orig.shape))
print("X_test_orig.shape is: " + str(X_test_orig.shape))
print("Y_test_orig.shape is: " + str(Y_test_orig.shape))

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten / 255
X_test = X_test_flatten / 255
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print("number of training examples = " + str(X_train.shape[1]))
print("number of test examples = " + str(X_test.shape[1]))
print("X_train shape: " + str(X_train.shape))
print("Y_train shape: " + str(Y_train.shape))
print("X_test shape: " + str(X_test.shape))
print("Y_test shape: " + str(Y_test.shape))

# The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX.
# Create placeholders
def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible on the number of examples you will use for the placeholders.
      In fact, the number of examples during test/train is different.
    """
    # START CODE HERE #
    X = tf.compat.v1.placeholder(tf.float32, [n_x, None], name="X")
    Y = tf.compat.v1.placeholder(tf.float32, [n_y, None], name="Y")
    # END CODE HERE #

    return X, Y


X, Y = create_placeholders(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))


# Initializing the parameters
# Use Xavier Initialization for weights and Zero Initialization for biases
def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """

    tf.compat.v1.set_random_seed(1)  # so that your "random" numbers match ours

    # START CODE HERE #
    W1 = tf.compat.v1.get_variable("W1", [25, 12288], initializer=tf.compat.v1.keras.initializers.glorot_normal(seed=1))
    b1 = tf.compat.v1.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.compat.v1.get_variable("W2", [12, 25], initializer=tf.compat.v1.keras.initializers.glorot_normal(seed=1))
    b2 = tf.compat.v1.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.compat.v1.get_variable("W3", [6, 12], initializer=tf.compat.v1.keras.initializers.glorot_normal(seed=1))
    b3 = tf.compat.v1.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())
    # END CODE HERE #

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters


tf.compat.v1.reset_default_graph()
with tf.compat.v1.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))


# Forward Propagation
def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    # START CODE HERE #
    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)
    # END CODE HERE #

    return Z3


tf.compat.v1.reset_default_graph()

with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))


# Compute cost
def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    # START CODE HERE #
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    # END CODE HERE #

    return cost


tf.compat.v1.reset_default_graph()

with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))


# Implement the model
def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,
          num_epochs=1500, minibatch_size=32, print_cost=True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.

    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test set, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    ops.reset_default_graph()  # to be able to rerun the model without overwriting tf variables
    tf.compat.v1.set_random_seed(1)  # to keep consistent results
    seed = 3  # to keep consistent results
    (n_x, m) = X_train.shape  # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]  # n_y : output size
    costs = []  # To keep track of the cost

    # Create Placeholders of shape (n_x, n_y)
    X, Y = create_placeholders(n_x, n_y)

    # Initialize parameters
    parameters = initialize_parameters()

    # Forward propagation
    Z3 = forward_propagation(X, parameters)

    # Compute cost
    cost = compute_cost(Z3, Y)

    # Backward propagation: Define the tensorflow optimizer. Use an AdamOptimizer
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Initialize all the variables
    init = tf.compat.v1.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.compat.v1.Session() as sess:
        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.  # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size)  # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch

                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feed_dict should contain a minibatch for (X,Y).
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every epoch
            if print_cost is True and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost is True and epoch % 5 == 0:
                costs.append(epoch_cost)

        # Plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs (per 5)')
        plt.title("learning_rate = " + str(learning_rate))
        plt.show()

        # Save the parameters in a variable
        parameters = sess.run(parameters)

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters

start_time = time.perf_counter()  # time.clock() was removed in Python 3.8; use perf_counter instead
parameters = model(X_train, Y_train, X_test, Y_test)
end_time = time.perf_counter()
print("CPU time = " + str(end_time-start_time) + 's')

From this introduction to TensorFlow, we have learned that:

1. TensorFlow is a programming framework used in deep learning;

2. The two main object classes in TensorFlow are tensors and operators;

3. When programming in TensorFlow, you follow these steps:

        - Create a computation graph containing tensors (including, but not limited to, variables and placeholders) and operations (tf.add, tf.matmul, ...);

        - Create a session;

        - Initialize the session;

        - Run the session to execute the computation graph;

4. The computation graph can be executed multiple times;

5. Backpropagation and optimization are done automatically when running the session on the "optimizer" object.

References

何宽 【中文】【吴恩达课后编程作业】Course 2 - 改善深层神经网络 - 第三周作业 

Kulbear Tensorflow Tutorial 
