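Deep Learning (2): Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Hyperparameter Tuning, Batch Normalization, Programming Frameworks, and Programming Assignment) - Andrew Ng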

小飞猪666

Article link: https://blog.csdn.net/yangshaojun1992/article/details/105584848

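I. Fundamentals

1.1 Tuning process

One of the hardest things about training deep networks is the number of hyperparameters you have to deal with, starting with the learning rate α and the Momentum (gradient descent with momentum) parameter β.

If you use Adam, there are also β1, β2 and ε. You may also have to choose the number of layers and the number of hidden units in each layer, and you may want to use learning rate decay,

in which case you are no longer using a single learning rate α. And of course you may also need to choose the mini-batch size.

In practice some hyperparameters matter more than others. The learning rate α is the most important one to tune. After α, a few others are worth tuning, such as the Momentum parameter β, for which 0.9 is a good default.

I would also tune the mini-batch size so that the optimization algorithm runs efficiently, and I often tune the number of hidden units. These three (β, the mini-batch size and the number of hidden units, circled in orange on the lecture slide) are the ones I consider second in importance.

When using Adam, I practically never tune β1, β2 and ε; I always use 0.9, 0.999 and 10^-8 respectively.

When tuning hyperparameters, first try values sampled across a coarse range. If one region gives better results, zoom in and sample more densely within that region (a coarse-to-fine search).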

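1.2 Using an appropriate scale to pick hyperparameters

This section covers how to choose a suitable scale on which to search for hyperparameters.

Suppose we believe a good learning rate lies somewhere between 0.001 and 1. How should we pick the values to try? If we split the interval from 0.001 to 1 into equal steps (that is, sample uniformly on a linear scale), about 90% of the values land between 0.1 and 1, and the range from 0.001 to 0.01 gets almost no resources. A different approach is needed.

Instead, we mark off 0.001, 0.01, 0.1 and 1 on a logarithmic scale and sample so that each interval between neighboring powers of ten receives roughly the same share of the trials.

In other words, we sample the exponent uniformly: draw r uniformly from [-3, 0] and set α = 10^r.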
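The idea of sampling the exponent uniformly can be sketched as follows. This is a minimal illustration using only Python's standard library; the function name `sample_learning_rate` and its default bounds are assumptions made for this example and are not part of the course material.

```python
import math
import random


def sample_learning_rate(low=1e-3, high=1.0, rng=random):
    """Draw a learning rate uniformly on a log scale between low and high."""
    # Sample the exponent uniformly, e.g. r in [-3, 0] for the range 0.001 to 1.
    r = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** r


if __name__ == "__main__":
    random.seed(0)
    samples = [sample_learning_rate() for _ in range(1000)]
    # Count how many samples fall in each decade; the counts should be similar.
    for lo_exp in (-3, -2, -1):
        count = sum(10 ** lo_exp <= s < 10 ** (lo_exp + 1) for s in samples)
        print(f"[1e{lo_exp}, 1e{lo_exp + 1}): {count}")
```

Each of the three decades should receive roughly a third of the samples, which is exactly what a linear-scale search fails to do.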

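1.3 Hyperparameter tuning in practice

One thing that has gone well in deep learning is that people in different application areas increasingly read papers from other areas and look for inspiration across domains.

There are two broad ways to search for hyperparameters. In the "panda" approach you babysit a single model, watching it and adjusting its hyperparameters as training progresses. In the "caviar" approach you train many models in parallel with different hyperparameter settings and pick the one that works best. The choice between the two is determined by the computational resources you have. If you have enough machines to train many models in parallel, then by all means take the caviar approach: try many different hyperparameters and see what works. But in some application areas, such as online advertising and computer vision, there is so much data and the models are so large that training many models at the same time is very difficult, so the choice really does depend on the application. In those settings I see organizations using the panda approach more often, where you look after one model like a baby, tuning its hyperparameters and trying to get it to work.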

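1.4 Normalizing activations in a network

Batch normalization makes the hyperparameter search problem much easier. It makes the neural network more robust to the choice of hyperparameters, so a much wider range of hyperparameters works well, and training becomes easier.

Earlier we learned to normalize the input features. Here the same idea is applied inside the network: we normalize the values z of a hidden layer so that the next layer's parameters train more efficiently.

The method is as follows: given the values z of one layer over a mini-batch, subtract their mean μ and divide by their standard deviation. For numerical stability, ε is added to the variance inside the square root in the denominator, which guards against the case σ = 0. The normalized values are then rescaled and shifted by two learnable parameters, γ and β, so that the layer is not forced to have mean 0 and variance 1.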
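Written out, the normalization described above reads as follows. The notation is an assumption chosen for this note: m is the mini-batch size, z^{(i)} is the value of z for the i-th example of the mini-batch, and γ and β are the learnable scale and shift.

```latex
\mu = \frac{1}{m}\sum_{i=1}^{m} z^{(i)}, \qquad
\sigma^{2} = \frac{1}{m}\sum_{i=1}^{m}\bigl(z^{(i)} - \mu\bigr)^{2}, \qquad
z^{(i)}_{\text{norm}} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^{2} + \varepsilon}}, \qquad
\tilde{z}^{(i)} = \gamma\, z^{(i)}_{\text{norm}} + \beta
```

With γ = \sqrt{\sigma^{2} + \varepsilon} and β = μ the transformation reduces to the identity, so the network can recover the unnormalized values if that is what training prefers.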

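1.5 Fitting Batch Norm into a neural network

You have seen the equations for applying Batch Norm to a single hidden layer. Next, let's see how it fits into the training of a deep network.

Batch Norm is applied between computing z and computing a.

In practice, each time a layer's z has been computed it is normalized (subtract the mean, divide by the standard deviation, then scale by γ and shift by β) before the activation function is applied and the result is passed to the next layer. This is independent of which activation function you use and of which optimization method updates the parameters.

In a framework this usually takes a single line of code. In TensorFlow, for example, tf.nn.batch_normalization performs the Batch Norm computation.

Batch Norm is usually used together with mini-batches: the mean and variance are computed on each mini-batch.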
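To make the order of operations concrete (z, then Batch Norm, then the activation), here is a minimal sketch for a single hidden unit over one mini-batch. It uses plain Python lists instead of a framework; the function names `batch_norm_forward` and `relu` and the default values are assumptions made for this illustration.

```python
import math


def batch_norm_forward(z_batch, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize the pre-activations of one hidden unit over a mini-batch.

    z_batch holds one value of z per example in the mini-batch.
    gamma and beta are the learnable scale and shift.
    """
    m = len(z_batch)
    mu = sum(z_batch) / m
    var = sum((z - mu) ** 2 for z in z_batch) / m
    # eps sits inside the square root so the division is safe when var is 0.
    z_norm = [(z - mu) / math.sqrt(var + eps) for z in z_batch]
    z_tilde = [gamma * zn + beta for zn in z_norm]
    return z_tilde, mu, var


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


if __name__ == "__main__":
    z = [2.0, 4.0, 6.0, 8.0]          # z for four examples of a mini-batch
    z_tilde, mu, var = batch_norm_forward(z)
    a = relu(z_tilde)                 # the activation is applied after Batch Norm
    print("mean:", mu, "variance:", var)
    print("normalized z:", z_tilde)
    print("activations:", a)
```

The returned mean and variance are the per-mini-batch statistics; they are also what a running average for test time would be built from.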

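1.6 Why does Batch Norm work?

The first reason: just as normalizing the input features to mean 0 and variance 1 brings features with very different ranges onto a similar scale and speeds up learning, Batch Norm does the same for the values in the hidden layers.

The second reason is that it makes the weights of later (deeper) layers more robust to changes in the weights of earlier layers.

For example, suppose we train a network to recognize cats and every cat in the training set is black.

If we then test it on cats of other colors,

the results will probably be poor. The distribution of the input has changed (this is called covariate shift), and we would like the network not to depend too heavily on the distribution it was trained on. The same problem arises inside the network, and this is where Batch Norm helps.

Take the network apart layer by layer: the previous layer's output is the input of the current layer, and that input keeps shifting as the earlier weights are updated. To keep the current layer from depending too heavily on those shifts, we normalize: the previous layer's values are recomputed to have a fixed mean and variance (0 and 1 before γ and β are applied). Each layer can then learn somewhat more "independently" of the others, which speeds up training.

Batch Norm also has a slight regularization effect, similar in spirit to dropout, because each mini-batch's mean and variance are noisy estimates. Dropout and Batch Norm are sometimes used together.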

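1.7 Batch Norm at test time

During training, Batch Norm normalizes each layer using the mean and variance of the current mini-batch. At test time, however, you may need to process one example at a time, and the mean and variance of a single example are meaningless. So we need a separate estimate of these two quantities, μ and σ².

We estimate them with an exponentially weighted average: keep a running average of the μ and σ² computed for each mini-batch during training, and use the final values at test time.

In practice the exact method matters little (you could, for instance, simply take the values from the last mini-batch, although the exponentially weighted average is the more common choice). Any reasonable estimate of μ and σ² will work at test time.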
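The bookkeeping described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation of any particular framework: the function names, the decay value 0.9 and the example statistics are all made up for this sketch.

```python
import math


def update_running_stats(running_mu, running_var, batch_mu, batch_var, decay=0.9):
    """One exponentially weighted average step, called once per training mini-batch."""
    new_mu = decay * running_mu + (1.0 - decay) * batch_mu
    new_var = decay * running_var + (1.0 - decay) * batch_var
    return new_mu, new_var


def normalize_at_test_time(z, running_mu, running_var, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize a single test example with the statistics estimated during training."""
    z_norm = (z - running_mu) / math.sqrt(running_var + eps)
    return gamma * z_norm + beta


if __name__ == "__main__":
    running_mu, running_var = 0.0, 1.0
    # Hypothetical (mean, variance) pairs measured on successive mini-batches.
    batch_stats = [(4.8, 3.9), (5.1, 4.2), (5.0, 4.0), (4.9, 3.8)] * 25
    for batch_mu, batch_var in batch_stats:
        running_mu, running_var = update_running_stats(
            running_mu, running_var, batch_mu, batch_var)
    print("running mean:", running_mu, "running variance:", running_var)
    print("normalized test value:", normalize_at_test_time(7.0, running_mu, running_var))
```

At test time only `normalize_at_test_time` is needed, so a single example can be processed without any mini-batch.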

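1.8 Softmax regression

For multi-class problems, we can no longer output a single probability of "is or is not", as we did in binary classification.

So we introduce softmax, which is simply the activation function of the output layer: exponentiate each value of the output layer's z, then divide by the sum of the exponentials. This gives one probability per class, and the probabilities sum to 1.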
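As a small worked example of the two steps (exponentiate, then normalize), here is a plain-Python sketch. The function name and the input values are chosen for illustration only; subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math


def softmax(z):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(z)                            # avoids overflow for large scores
    exps = [math.exp(v - shift) for v in z]   # step 1: exponentiate
    total = sum(exps)
    return [e / total for e in exps]          # step 2: normalize


if __name__ == "__main__":
    probabilities = softmax([5.0, 2.0, -1.0, 3.0])
    print([round(p, 3) for p in probabilities])
    print("sum:", sum(probabilities))
```

The largest score receives the largest probability, but unlike a hard maximum every class keeps a non-zero share.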

1.9 Deep learning frameworks

Many deep learning frameworks are now available, and they make implementing neural networks much simpler.

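II. Quiz

1. If you are searching among a large number of hyperparameters, you should try values on a grid rather than random values, so that the search is more systematic and does not rely on chance. True or false?

False. Try random values rather than a grid, because you do not know in advance which hyperparameters matter more than others.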
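One way to see why random values are preferred: with the same budget of trials, a grid only ever tests a handful of distinct values of each hyperparameter, while random sampling tests a new value of every hyperparameter on every trial. The sketch below counts this for two hyperparameters scaled to [0, 1]; the function names are assumptions made for this illustration.

```python
import random


def grid_points(n_per_axis):
    """All combinations of n_per_axis evenly spaced values for two hyperparameters."""
    step = 1.0 / (n_per_axis - 1)
    return [(i * step, j * step) for i in range(n_per_axis) for j in range(n_per_axis)]


def random_points(n_total, rng):
    """n_total independent random settings of the two hyperparameters."""
    return [(rng.random(), rng.random()) for _ in range(n_total)]


def distinct_values_per_axis(points):
    """Number of different values tried for each of the two hyperparameters."""
    return len({p[0] for p in points}), len({p[1] for p in points})


if __name__ == "__main__":
    rng = random.Random(0)
    print("grid  :", distinct_values_per_axis(grid_points(5)))
    print("random:", distinct_values_per_axis(random_points(25, rng)))
```

Both searches run 25 trials. If only one of the two hyperparameters actually matters, the grid has explored it at 5 settings, whereas the random search has explored it at 25.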

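2. Every hyperparameter, if set poorly, can have a huge negative impact on training, so all hyperparameters need to be tuned well. True or false?

False. Some hyperparameters matter much more than others; ε, for example, is comparatively unimportant.

3. During hyperparameter search, whether you babysit one model (the "panda" strategy) or train many models in parallel (the "caviar" strategy) is largely determined by:

Whether you use batch or mini-batch optimization
The presence of local minima (and saddle points) in your neural network
The amount of computational power you can access (true)
The number of hyperparameters you have to tune

4. If you think β (the momentum hyperparameter) is between 0.9 and 0.99, which of the following is the recommended way to sample a value for β?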

r = np.random.rand()
beta = 1 - 10 ** ( - r - 1 )
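To check that the formula above really covers the intended range, note that r in [0, 1) gives 1 - β = 10^(-r-1) in (0.01, 0.1], that is, β in [0.9, 0.99), sampled uniformly on a log scale of 1 - β. The sketch below verifies this numerically; it uses the standard library's `random.random()` in place of `np.random.rand()`, and the function name `sample_beta` is an assumption for this example.

```python
import random


def sample_beta(rng=random):
    """Sample beta in [0.9, 0.99) so that 1 - beta is uniform on a log scale."""
    r = rng.random()               # r in [0, 1)
    return 1 - 10 ** (-r - 1)      # 1 - beta lies in (0.01, 0.1]


if __name__ == "__main__":
    random.seed(0)
    betas = [sample_beta() for _ in range(10000)]
    print("min:", min(betas), "max:", max(betas))
    # Half of the samples should fall below 1 - 10**-1.5 (about 0.968).
    print("share below 0.968:", sum(b < 1 - 10 ** -1.5 for b in betas) / len(betas))
```

Sampling β itself uniformly between 0.9 and 0.99 would spend far too few trials near 0.99, where small changes in β have a large effect on how many past gradients are averaged.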

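5. Finding good hyperparameter values is very time-consuming, so you should typically do it once at the start of the project and try to find very good hyperparameters, so that you never have to tune them again. True or false?

False. Even small changes to the model may require you to search for good hyperparameters again from scratch.

6. In batch normalization as presented in the videos, if you apply it to the l-th layer of your neural network, what are you normalizing?

z[l]

7. In the normalization formula, why do we use epsilon (ε)?

To avoid division by zero.

8. Which of the following statements about γ and β in Batch Norm are true?

They can be learned using Adam, gradient descent with momentum, or RMSprop, not just with plain gradient descent. (true)
They set the mean and variance of the linear variable z[l] of a given layer. (true)

9. After training a neural network with Batch Norm, at test time, to evaluate the network on a new example you should:

Perform the needed normalizations using μ and σ² estimated with an exponentially weighted average across the mini-batches seen during training.

10. Which of these statements about deep learning programming frameworks are true?

A programming framework allows you to code up deep learning algorithms with typically fewer lines of code than a lower-level language such as Python. (true)
Even if a project is currently open source, good governance of the project helps ensure that it remains open in the long term, rather than being closed or modified to benefit only one company. (true)
Deep learning programming frameworks require cloud-based machines to run.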

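III. Programming assignment

Until now you have always used numpy to build neural networks. Now we will walk you through a deep learning framework that lets you build neural networks more easily. Machine learning frameworks such as TensorFlow, PaddlePaddle, Torch, Caffe and Keras can speed up your machine learning development significantly. All of these frameworks come with extensive documentation, which you should feel free to read. In this assignment you will learn to do the following in TensorFlow:

Initialize variables
Start your own session
Train algorithms
Implement a neural network

Programming frameworks not only shorten your coding time, they can sometimes also perform optimizations that speed up your code.

Environment setup. TensorFlow download links: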

https://files.pythonhosted.org/packages/d5/1c/3ac472009a5c54ae7ec5a3294520ca36d1908cd1e5cf3e3fd923f9b7b31f/tensorflow-1.13.1-cp37-cp37m-macosx_10_11_x86_64.whl

https://files.pythonhosted.org/packages/35/55/a0dbd642e68e68f3e309d1413abdc0a7aa7e1534c79c0fc2501defb864ac/tensorflow-2.1.0-cp37-cp37m-macosx_10_11_x86_64.whl

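Here we install version 1.13.1, because the code in this post uses the TensorFlow 1.x API (tf.Session, tf.placeholder, tf.contrib):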

pip install tensorflow-1.13.1-cp37-cp37m-macosx_10_11_x86_64.whl

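3.1 Exploring the TensorFlow library

Import the libraries:

import numpy as np
import tensorflow as tf

Now that the libraries are imported, we will walk through several of their uses. Let's start with an example that computes the loss of a single training example.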

y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39
loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss: (39 - 36)^2 = 9

init = tf.global_variables_initializer()        # When init is run later (session.run(init)),
                                                 # the loss variable will be initialized and ready to be computed
with tf.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # Prints the loss

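Writing and running a program in TensorFlow involves the following steps:

Create Tensors (variables) that are not yet executed/evaluated.
Write operations between those Tensors.
Initialize your Tensors.
Create a Session.
Run the Session. This will run the operations you wrote above.

So when we created the variable for the loss, we only defined the loss as a function of other quantities; we did not evaluate its value. To evaluate it, we first had to run the initializer created by init = tf.global_variables_initializer(), that is, call session.run(init). That initialized the loss variable, and in the last line we were finally able to evaluate the loss and print its value.

Now let's look at a simple example. Run the cell below: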

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

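# Output:
# Tensor("Mul:0", shape=(), dtype=int32)

As expected, you will not see 20! What you get is a tensor with an empty shape (a scalar) and dtype "int32". All you have done so far is put the operations into the "computation graph"; you have not run this computation yet. To actually multiply the two numbers, you have to create a session and run it.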

sess = tf.Session()
print(sess.run(c))
# Output:
# 20

To summarize: remember to initialize your variables, create a session, and run the operations inside that session.

# Change the value of x in the feed_dict

x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
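# Output:
# 6

When you define x, you do not have to give it a value. A placeholder is simply a variable to which you assign data only later, when running the session, by passing it through feed_dict.

When you specify the operations needed for a computation, you are telling TensorFlow how to construct the computation graph. The computation graph can contain placeholders whose values you specify later. Finally, when you run the session, you are telling TensorFlow to execute the computation graph.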

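3.2 Linear function

Let's start this programming exercise by computing the following equation: Y = WX + b, where W and X are random matrices and b is a random vector.

Compute WX + b, where W, X and b are drawn from a standard normal distribution. W has shape (4, 3), X has shape (3, 1) and b has shape (4, 1).

As an example, here is how to define a constant X of shape (3, 1):

X = tf.constant(np.random.randn(3,1), name = "X")

You may find the following functions helpful:

tf.matmul(..., ...) for matrix multiplication
tf.add(..., ...) for addition
np.random.randn(...) for random initialization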

def linear_function():
    """
    Implements a linear function:
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """
    np.random.seed(1)  # fix the seed so that the values are the same on every run

    X = tf.constant(np.random.randn(3, 1), name="X")
    """
    [[ 1.62434536]
    [-0.61175641]
    [-0.52817175]]
    """
    W = tf.constant(np.random.randn(4, 3), name="W")
    """
    [[-1.07296862  0.86540763 -2.3015387 ]
    [ 1.74481176 -0.7612069   0.3190391 ]
    [-0.24937038  1.46210794 -2.06014071]
    [-0.3224172  -0.38405435  1.13376944]]
    """
    b = tf.constant(np.random.randn(4, 1), name="b")
    """
    [[-1.09989127]
    [-0.17242821]
    [-0.87785842]
    [ 0.04221375]]
    """
    Y = tf.add(tf.matmul(W, X), b)
    # Create a session with tf.Session() and evaluate the tensor you need with sess.run(...)
    sess = tf.Session()
    result = sess.run(Y)
    # Close the session
    sess.close()
    return result

if __name__ == '__main__':
    print("result = " + str(linear_function()))

Output:

result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]
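The result can be checked by hand with the values of X, W and b that the seeded run printed inside linear_function. The sketch below repeats the computation Y = WX + b in plain Python, without TensorFlow; the helper names `matmul` and `add` are written for this illustration only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Values copied from the seeded run shown in the function above.
X = [[1.62434536], [-0.61175641], [-0.52817175]]
W = [[-1.07296862, 0.86540763, -2.3015387],
     [1.74481176, -0.7612069, 0.3190391],
     [-0.24937038, 1.46210794, -2.06014071],
     [-0.3224172, -0.38405435, 1.13376944]]
b = [[-1.09989127], [-0.17242821], [-0.87785842], [0.04221375]]

Y = add(matmul(W, X), b)   # shape (4, 3) times (3, 1) plus (4, 1) gives (4, 1)
for row in Y:
    print(round(row[0], 6))
```

The four printed numbers agree with the TensorFlow result above up to rounding, which confirms that tf.matmul followed by tf.add computes the same thing as the numpy expression np.dot(W, X) + b.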

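3.3 Computing the sigmoid

You have just implemented a linear function. TensorFlow offers a variety of commonly used neural network functions such as tf.sigmoid and tf.softmax. For this exercise, let's compute the sigmoid of an input.

You will do this exercise with a placeholder variable x. When running the session, you should use the feed dictionary to pass in the input z. In this exercise you will have to

(i) create a placeholder x,

(ii) define the operations needed to compute the sigmoid using tf.sigmoid, and then

(iii) run the session.

Implement the sigmoid function below. You should use the following:

tf.placeholder(tf.float32, name = "...")
tf.sigmoid(...)
sess.run(..., feed_dict = {x: z})

Note that there are two typical ways to create and use sessions in TensorFlow: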

Method 1:

sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session

Method 2:

with tf.Session() as sess: 
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)

def sigmoid(z):
    """
    Computes the sigmoid of z
    
    Arguments:
    z -- input value, scalar or vector
    
    Returns: 
    results -- the sigmoid of z
    """
    
    # Create a placeholder for x. Name it 'x'.
    x = tf.placeholder(tf.float32, name = 'x')

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)

    # Create a session, and run it. Please use the method 2 explained above. 
    # You should use a feed_dict to pass z's value to x. 
    with tf.Session() as sess:
        # Run session and call the output "result"
        result = sess.run(sigmoid, feed_dict = {x: z})
        
    return result

if __name__ == '__main__':

    print ("sigmoid(0) = " + str(sigmoid(0)))
    print ("sigmoid(12) = " + str(sigmoid(12)))

Output:

sigmoid(0) = 0.5
sigmoid(12) = 0.9999938

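3.4 Computing the cost

You can also use a built-in function to compute the cost of your neural network, so you do not have to write the cross-entropy formula yourself.

You can do it in one line of TensorFlow code: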

tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...)

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy
    
    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 
    
    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 
    
    Returns:
    cost -- runs the session of the cost (formula (2))
    """
    
    ### START CODE HERE ### 
    
    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(tf.float32, name='z')
    y = tf.placeholder(tf.float32, name='y')
    
    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)
    
    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()
    
    # Run the session (approx. 1 line).
    cost = sess.run(cost, feed_dict={z: logits, y: labels})
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    ### END CODE HERE ###
    
    return cost

if __name__ == '__main__':
    # Note: the values passed as "logits" here are themselves sigmoid outputs
    logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
    cost_value = cost(logits, np.array([0, 0, 1, 1]))  # do not rebind the name of the cost function
    print("cost = " + str(cost_value))

Output:

cost = [1.0053872  1.0366409  0.41385433 0.39956614]
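The four numbers above can be reproduced without TensorFlow. The sketch below applies the numerically stable formula given in the TensorFlow documentation for sigmoid cross-entropy, max(x, 0) - x * z + log(1 + exp(-|x|)), with x the logit and z the label. As in the call above, the values fed in as "logits" are sigmoid(0.2), sigmoid(0.4), sigmoid(0.7) and sigmoid(0.9). The function names are chosen for this illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid of a scalar."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_cross_entropy_with_logits(x, z):
    """Stable form of -z*log(sigmoid(x)) - (1 - z)*log(1 - sigmoid(x))."""
    return max(x, 0.0) - x * z + math.log(1.0 + math.exp(-abs(x)))


# Same inputs as the TensorFlow call: sigmoid outputs used as logits.
values = [sigmoid(v) for v in [0.2, 0.4, 0.7, 0.9]]
labels = [0, 0, 1, 1]
costs = [sigmoid_cross_entropy_with_logits(x, z) for x, z in zip(values, labels)]
print([round(c, 6) for c in costs])
```

Because the built-in function applies the sigmoid internally, a real model would pass the raw output z of the last linear unit as logits rather than an already activated value.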

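3.5 One-hot encoding

In deep learning you will often have a y vector with numbers ranging from 0 to C-1, where C is the number of classes. If C is 4, for example, a label vector such as y = [1, 2, 3, 0, 2, 1] needs to be converted into a matrix with C rows and one column per example, in which entry (i, j) is 1 if example j has label i and 0 otherwise.

This is called a "one-hot" encoding, because in the converted representation exactly one element of each column is "hot" (meaning set to 1). Doing this conversion in numpy might take a few lines of code. In TensorFlow you can do it in one line:

tf.one_hot(labels, depth, axis)

Implement the function below to take a vector of labels and the total number of classes C, and return the one-hot encoding. Use tf.one_hot() to do this.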


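def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the i-th class and the j-th column
    corresponds to the j-th training example, so that if example j has label i,
    entry (i, j) is 1.

    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one-hot dimension

    Returns:
    one_hot -- one-hot matrix
    """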
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name='C')

    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels, C, axis=0)

    # Create the session (approx. 1 line)
    sess = tf.compat.v1.Session()

    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    return one_hot

if __name__ == '__main__':
    labels = np.array([1, 2, 3, 0, 2, 1])
    one_hot = one_hot_matrix(labels, C=4)
    print("one_hot = " + str(one_hot))

Output:

one_hot = [[0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]]
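For comparison, the same matrix can be built in a few lines of plain Python. This is only a sketch of what the encoding means (rows are classes, columns are examples, matching axis=0 above); the function name `one_hot_rows` is an assumption for this example.

```python
def one_hot_rows(labels, num_classes):
    """Return a num_classes x len(labels) matrix with a single 1.0 per column."""
    return [[1.0 if label == c else 0.0 for label in labels]
            for c in range(num_classes)]


if __name__ == "__main__":
    for row in one_hot_rows([1, 2, 3, 0, 2, 1], 4):
        print(row)
```

Reading the output column by column recovers the original labels: the first column has its 1 in row 1, the second in row 2, and so on.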

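3.6 Initializing with zeros and ones

Now you will learn how to initialize a vector of zeros or ones. The function you will call is tf.ones(). To initialize with zeros you can use tf.zeros() instead. Each of these functions takes a shape and returns an array of that shape filled with ones or zeros, respectively.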

tf.ones(shape)

def ones(shape):
    """
    Creates an array of ones of dimension shape
    
    Arguments:
    shape -- shape of the array you want to create
        
    Returns: 
    ones -- array containing only ones
    """
    
    ### START CODE HERE ###
    
    # Create "ones" tensor using tf.ones(...). (approx. 1 line)
    ones = tf.ones(shape)
    
    # Create the session (approx. 1 line)
    sess = tf.Session()
    
    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)
    
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    
    return ones

if __name__ == '__main__':
    print("ones = " + str(ones([3])))

Output:

ones = [1. 1. 1.]

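3.7 Building your first neural network in TensorFlow

In this part of the assignment you will build a neural network using TensorFlow. Remember that there are two parts to implementing a TensorFlow model:

Create the computation graph
Run the graph

3.7.1 The SIGNS dataset

One afternoon, together with some friends, we decided to teach our computers to decipher sign language. We spent a few hours taking pictures in front of a white wall and came up with the following dataset. It is now your job to build an algorithm that facilitates communication between a speech-impaired person and someone who does not understand sign language.

Training set: 1080 pictures (64 x 64 pixels) of signs representing the numbers 0 to 5 (180 pictures per number).
Test set: 120 pictures (64 x 64 pixels) of signs representing the numbers 0 to 5 (20 pictures per number).

Note that this is a subset of the SIGNS dataset. The complete dataset contains many more signs.

The assignment shows one example picture per number at this point, together with how the labels are represented. Those are the original pictures, taken before the image resolution was lowered to 64 x 64 pixels.

Run the following code to load the dataset.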

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Change the index below and run the cell to visualize some examples from the dataset.

# Example of a picture
index = 0
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

Output:

y = 5

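As usual, you flatten the image dataset and then normalize it by dividing by 255. On top of that, you convert each label to a one-hot vector (Figure 1 of the assignment). Run the cell below to do so.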

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten/255.
X_test = X_test_flatten/255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

Output:

number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)

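Note: 12288 comes from 64 × 64 × 3. Each image is square, 64 by 64 pixels, and the 3 is for the RGB color channels. Please make sure all these shapes make sense to you before continuing.

Your goal is to build an algorithm capable of recognizing a sign with high accuracy. To do so, you will build a TensorFlow model that is almost the same as the one you previously built in numpy for cat recognition (but now with a softmax output). It is a good opportunity to compare your numpy implementation with the TensorFlow one.

The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. The SIGMOID output layer has been replaced by a SOFTMAX layer, which generalizes SIGMOID to more than two classes.

3.7.2 Creating placeholders

Your first task is to create placeholders for X and Y. This will allow you to pass in your training data later, when you run the session.

Exercise: implement the function below to create the placeholders in TensorFlow.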


def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px * 3 = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible about the number of examples fed to the placeholders.
      In fact, the number of examples during test/train is different.
    """

    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(tf.float32, shape=[n_x, None])
    Y = tf.placeholder(tf.float32, shape=[n_y, None])
    ### END CODE HERE ###
    
    return X, Y

X, Y = create_placeholders(12288, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))

X = Tensor("Placeholder_4:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Placeholder_5:0", shape=(6, ?), dtype=float32)

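3.7.3 Initializing the parameters

Your second task is to initialize the parameters in TensorFlow.

Exercise: implement the function below to initialize the parameters in TensorFlow. You will use Xavier initialization for the weights and zero initialization for the biases. The shapes are listed in the docstring below. As a hint, W1 and b1 are created with tf.get_variable, using tf.contrib.layers.xavier_initializer(seed = 1) and tf.zeros_initializer() respectively, exactly as in the first two tf.get_variable lines of the function.

Please use seed = 1 to make sure your results match ours.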

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]
    
    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """
    
    tf.set_random_seed(1)                   # so that your "random" numbers match ours
        
    W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12,25], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b2 = tf.get_variable("b2", [12,1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6,12], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b3 = tf.get_variable("b3", [6,1], initializer = tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return parameters

tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>

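As expected, the parameters have not been evaluated yet.

3.7.4 Forward propagation

You will now implement the forward propagation module in TensorFlow. The function takes in a dictionary of parameters and completes the forward pass. The functions you will use are:

tf.add(..., ...) to do an addition
tf.matmul(..., ...) to do a matrix multiplication
tf.nn.relu(...) to apply the ReLU activation

Question: implement the forward pass of the neural network. The numpy equivalents are given in the comments so that you can compare the TensorFlow implementation with numpy. It is important to note that forward propagation stops at Z3. The reason is that in TensorFlow the output of the last linear layer is given as input to the function that computes the loss. Therefore, you do not need A3!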

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
    
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    
    ### START CODE HERE ### (approx. 5 lines)    # Numpy equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)            # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                          # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)           # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                          # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)           # Z3 = np.dot(W3, A2) + b3
    ### END CODE HERE ###
    
    return Z3

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)

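You may have noticed that forward propagation does not output any cache. You will understand why below, when we get to backpropagation.

3.7.5 Computing the cost

As seen before, it is very easy to compute the cost using:

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))

It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to have shape (number of examples, num_classes). We have therefore transposed Z3 and Y for you. In addition, tf.reduce_mean averages the per-example losses over the examples (it does not sum them).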

def compute_cost(Z3, Y):
    """
    Computes the cost
    
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3
    
    Returns:
    cost - Tensor of the cost function
    """
    
    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    
    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
    ### END CODE HERE ###
    
    return cost

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

Output:

cost = Tensor("Mean:0", shape=(), dtype=float32)
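What the cost tensor computes once it is evaluated can be sketched in plain Python: transpose to one row per example, take the softmax cross-entropy of each example, then average. This is an illustration of the arithmetic only, with helper names made up for this example; it is not how TensorFlow implements it internally.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Cross-entropy of one example; logits and labels are lists over the classes."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    # -sum_k y_k * log(softmax(z)_k), with log(softmax(z)_k) = z_k - log_sum_exp
    return sum(-y * (v - log_sum_exp) for v, y in zip(logits, labels))


def transpose(matrix):
    """Turn a (classes, examples) matrix into a list of per-example rows."""
    return [list(column) for column in zip(*matrix)]


def mean_cost(Z3, Y):
    """Average softmax cross-entropy; Z3 and Y have shape (classes, examples)."""
    per_example = [softmax_cross_entropy(z, y)
                   for z, y in zip(transpose(Z3), transpose(Y))]
    return sum(per_example) / len(per_example)


if __name__ == "__main__":
    # Two examples, six classes, all logits equal: the model is maximally unsure.
    Z3 = [[0.0, 0.0] for _ in range(6)]
    Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
    print(mean_cost(Z3, Y), math.log(6))
```

With six classes and uninformative logits the cost equals ln 6, about 1.79, which is a useful sanity check for the first cost value printed during training.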

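3.7.6 Backward propagation and parameter updates

This is where you become grateful to programming frameworks. All the backpropagation and the parameter updates are taken care of in one line of code, and it is very easy to incorporate this line into the model.

After you compute the cost function, you create an "optimizer" object. You have to call this object together with the cost when running the tf.Session. When called, it performs an optimization step on the given cost with the chosen method and learning rate.

For instance, for gradient descent the optimizer would be:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To run the optimization you would do: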

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

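3.7.7 Building the model

Now you will bring it all together!

Exercise: implement the model. You will call the functions you implemented previously.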

import tensorflow as tf
import numpy as np
import math
from tensorflow.python.framework import ops
import h5py
import matplotlib.pyplot  as plt

# Load the dataset
def load_dataset():
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes


# Convert labels to a one-hot matrix
def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y


# Create the placeholders
def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px * 3 = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible about the number of examples fed to the placeholders.
      In fact, the number of examples during test/train is different.
    """

    X = tf.placeholder(tf.float32, shape=[n_x, None])
    Y = tf.placeholder(tf.float32, shape=[n_y, None])

    return X, Y


# Initialize the parameters
def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """

    tf.set_random_seed(1)  # so that your "random" numbers match ours

    W1 = tf.get_variable("W1", [25, 12288], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12, 25], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6, 12], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b3 = tf.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters


def forward_propagation(X, parameters):
    """
    Forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    Z1 = tf.add(tf.matmul(W1, X), b1)  # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)  # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)  # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)  # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)  # Z3 = np.dot(W3, A2) + b3

    return Z3


def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    return cost


def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """
    Creates a list of random minibatches from (X, Y)

    Arguments:
    X -- input data, of shape (input size, number of examples)
    Y -- true "label" matrix (one-hot encoded here), of shape (number of classes, number of examples)
    mini_batch_size -- size of the mini-batches, integer
    seed -- this is only for the purpose of grading, so that your "random" minibatches are the same as ours.

    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """

    m = X.shape[1]  # number of training examples
    mini_batches = []
    np.random.seed(seed)

    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((Y.shape[0], m))

    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(
        m / mini_batch_size)  # number of mini batches of size mini_batch_size in your partitionning
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[:, k * mini_batch_size: k * mini_batch_size + mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k * mini_batch_size: k * mini_batch_size + mini_batch_size]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    # Handle the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size: m]
        mini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size: m]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    return mini_batches


def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,
          num_epochs=1500, minibatch_size=32, print_cost=True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.

    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()  # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)  # to keep consistent results
    seed = 3  # to keep consistent results
    (n_x, m) = X_train.shape  # (n_x: input size, m: number of examples in the training set)
    n_y = Y_train.shape[0]  # n_y: output size (number of classes)
    costs = []  # to keep track of the cost

    # Create placeholders of shape (n_x, None) and (n_y, None)
    X, Y = create_placeholders(n_x, n_y)

    # Initialize the parameters
    parameters = initialize_parameters()

    # Forward propagation: build the forward pass in the tensorflow graph
    Z3 = forward_propagation(X, parameters)

    # Cost function: add the cost to the tensorflow graph
    cost = compute_cost(Z3, Y)

    # Backpropagation: define the tensorflow optimizer. Use an AdamOptimizer.
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.  # cost accumulated over this epoch
            num_minibatches = int(m / minibatch_size)  # number of full minibatches of size minibatch_size in the training set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch

                # IMPORTANT: run the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost"; the feed_dict contains a minibatch for (X, Y).
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every 100 epochs and record it every 5 epochs
            if print_cost and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost and epoch % 5 == 0:
                costs.append(epoch_cost)

        # Plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Save the trained parameters in a variable
        parameters = sess.run(parameters)
        print("Parameters have been trained!")

        # Compare the predicted class with the true class for each example
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Compute the accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters


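if __name__ == '__main__':
    X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
    # Flatten and normalize the images, then convert the labels to one-hot matrices (see section 3.7.1)
    X_train = X_train_orig.reshape(X_train_orig.shape[0], -1).T / 255.
    X_test = X_test_orig.reshape(X_test_orig.shape[0], -1).T / 255.
    Y_train, Y_test = convert_to_one_hot(Y_train_orig, 6), convert_to_one_hot(Y_test_orig, 6)
    parameters = model(X_train, Y_train, X_test, Y_test)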

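Run the code above to train your model! On our machine it takes about 5 minutes. Your "Cost after epoch 100" should be 1.016458. If it is not, don't waste time: interrupt the training by clicking the square (⬛) in the upper bar of the notebook, and try to correct your code. If the cost is correct, take a break and come back in 5 minutes!

Output: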

Cost after epoch 0: 1.855702
Cost after epoch 100: 1.016458
Cost after epoch 200: 0.733102
Cost after epoch 300: 0.572915
Cost after epoch 400: 0.468685
Cost after epoch 500: 0.381068
Cost after epoch 600: 0.313809
Cost after epoch 700: 0.254146
Cost after epoch 800: 0.203801
Cost after epoch 900: 0.166393
Cost after epoch 1000: 0.141141
Cost after epoch 1100: 0.107718
Cost after epoch 1200: 0.086261
Cost after epoch 1300: 0.060924
Cost after epoch 1400: 0.050927

Parameters have been trained!
Train Accuracy: 0.9990741
Test Accuracy: 0.725

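Amazing, your algorithm can recognize a sign representing a number between 0 and 5 with 72.5% accuracy on the test set, as the output above shows.

Insights:

Your model seems big enough to fit the training set well. However, given the gap between training accuracy (99.9%) and test accuracy (72.5%), you could try adding L2 or dropout regularization to reduce overfitting.
Think of the session as a block of code that trains the model. Each time you run the session on a minibatch, it updates the parameters. In total you have run the session a large number of times (1500 epochs) until you obtained well-trained parameters.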

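3.7.8 Testing with your own image

Congratulations on finishing this assignment. You can now take a picture of your hand and see the output of your model. To do that:

1. Click "File" in the upper bar of the notebook, then click "Open" to go to your Coursera Hub.

2. Add your image to the Jupyter Notebook's directory, in the "images" folder.

3. Write your image's name in the following code.

4. Run the code and check whether the algorithm is right!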

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# predict() is not defined in this post; it is assumed to be the helper from the course files (tf_utils.py)
from tf_utils import predict

## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "thumbs_up.jpg"
# my_image = "example5.jpg"
## END CODE HERE ##

# We preprocess your image to fit your algorithm.
# ndimage.imread and scipy.misc.imresize were removed from recent SciPy releases, so PIL is used instead.
fname = "images/" + my_image
image = Image.open(fname).convert("RGB")
my_image = np.array(image.resize((64, 64))).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)

plt.imshow(image)
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))

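You indeed deserve a "thumbs up", although as you can see the algorithm seems to classify it incorrectly. The reason is that the training set does not contain any "thumbs up" pictures, so the model does not know how to deal with them! We call this a "mismatched data distribution", and it is one of the topics of the next course, "Structuring Machine Learning Projects".

IV. Summary

TensorFlow is a programming framework used in deep learning.
- The two main object classes in TensorFlow are Tensors and Operators.
- When you code in TensorFlow you have to take the following steps:

- Create a graph containing Tensors (Variables, Placeholders, ...) and Operations (tf.matmul, tf.add, ...)
- Create a session
- Initialize the session
- Run the session to execute the graph

- You can execute the graph multiple times, as you saw in model()

- Backpropagation and optimization are carried out automatically when you run the session on the "optimizer" object.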
