deeplearning.24 The TensorFlow Deep Learning Framework

Latest recommended article published on 2024-09-15 22:50:21

疯子的梦想＠

Category: Deep Learning. Tags: tensorflow, deep learning, python

Article link: https://blog.csdn.net/weixin_48681463/article/details/120462098

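Learning to use TensorFlow

Exploring TensorFlow
Building a neural network with TensorFlow
Summary
Complete code

Exploring TensorFlow

Preparation

Using PyCharm, create a new project, add a file named tf.py, and download the materials needed for this exercise into the project folder. (Download link.)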

Import the required libraries

import time

import h5py
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

import tf_utils

# On TensorFlow 2.x, add this to run code written for 1.x (graph mode and sessions)
tf.compat.v1.disable_eager_execution()

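Computing a loss function

The loss for a single example is:
$$\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = (\hat{y}^{(i)} - y^{(i)})^2 \tag{1}$$
Mind the differences between TensorFlow versions: the code below goes through the tf.compat.v1 API so that it runs on 2.x.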

# Compute the loss: define the constants first
tf.compat.v1.disable_eager_execution()  # required so that session.run() works on TF 2.x
y_hat = tf.constant(36, name='y_hat')  # Define the constant y_hat and set it to 36
y = tf.constant(39, name='y')  # Define y and set it to 39

loss = tf.Variable((y - y_hat) ** 2, name='loss')  # Create a variable for the loss
init = tf.compat.v1.global_variables_initializer()  # Executed later with session.run(init)
with tf.compat.v1.Session() as session:  # Create a session and print the output
    session.run(init)  # Initialize the variables
    print(session.run(loss))  # Prints 9

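Writing and running a program in TensorFlow involves the following steps:

Create tensors (variables) that have not been evaluated yet.
Write operations between those tensors.
Initialize the variables.
Create a session.
Run the session, which executes the operations you wrote above.
So when we created a variable for the loss, we only defined the loss as a function of other quantities; we did not evaluate its value. To evaluate it, we have to create init = tf.compat.v1.global_variables_initializer() and run it with session.run(init), which initializes the loss variable. Only in the last line are we finally able to evaluate loss and print its value.

A simple example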

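# A simple example
a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a, b)

print(c)

The value of c is not printed. Instead we get a Tensor with an empty shape (a scalar) and dtype int32. All we have done so far is put these operations into a computation graph; the graph itself has not been run. To actually multiply the two numbers, we need to create a session and run it.
Append the following code:

sess = tf.compat.v1.Session()

print(sess.run(c))

The result is 20.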
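Next, we need to look at placeholders. A placeholder is an object whose value is only specified later; for now it just holds a position in the graph. To supply its value, pass a feed dictionary (the feed_dict argument). Below we create a placeholder for x, which lets us pass in a number when we run the session.

# Use feed_dict to set the value of x
x = tf.compat.v1.placeholder(tf.int64, name="x")
print(sess.run(2 * x, feed_dict={x: 3}))
sess.close()

The output is 6.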
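
To make the "define first, run later" idea concrete, here is a minimal sketch of a deferred computation graph in plain Python. It is not how TensorFlow is implemented; the names Node, constant, placeholder, multiply, and run are hypothetical and exist only for this illustration.

```python
class Node:
    """A graph node: a constant, a placeholder, or an operation."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op
        self.inputs = inputs
        self.value = value
        self.name = name


def constant(value):
    return Node("const", value=value)


def placeholder(name):
    return Node("placeholder", name=name)


def multiply(a, b):
    return Node("mul", inputs=(a, b))


def run(node, feed_dict=None):
    """Evaluate a node, looking up placeholder values in feed_dict."""
    feed_dict = feed_dict or {}
    if node.op == "const":
        return node.value
    if node.op == "placeholder":
        return feed_dict[node]
    if node.op == "mul":
        left, right = (run(n, feed_dict) for n in node.inputs)
        return left * right
    raise ValueError("unknown op: " + node.op)


c = multiply(constant(2), constant(10))  # only builds the graph, nothing is computed
x = placeholder("x")
doubled = multiply(constant(2), x)

print(run(c))                           # evaluates the graph: 20
print(run(doubled, feed_dict={x: 3}))   # 6
```

Building c computes nothing; the multiplication only happens inside run, which plays the role of sess.run in this toy version.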

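Linear function

Compute the linear function $Y = WX + b$, where $W$, $X$ and $b$ are drawn from a random normal distribution. $W$ has shape (4, 3), $X$ has shape (3, 1), and $b$ has shape (4, 1).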

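# Define the linear function
def linear_function():
    """
    Implement a linear function:
        Initialize W as a random array of shape (4, 3)
        Initialize X as a random array of shape (3, 1)
        Initialize b as a random array of shape (4, 1)
    Returns:
        result - the result of running the session for Y = WX + b

    """

    X = np.random.randn(3, 1)
    W = np.random.randn(4, 3)
    b = np.random.randn(4, 1)

    Y = tf.add(tf.matmul(W, X), b)  # tf.matmul is matrix multiplication
    # Y = tf.matmul(W, X) + b  # equivalent way of writing it

    # Create a session and run it
    sess = tf.compat.v1.Session()  # TF 2.x spelling
    result = sess.run(Y)

    # Close the session once we are done with it
    sess.close()

    return result


# Test it
print("result = " + str(linear_function()))

The output is a matrix with four rows and one column. The values differ between runs because the inputs are random.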
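
As a rough cross-check of the shapes involved, the same computation can be sketched without TensorFlow. The helpers matmul, add, and randn below are hypothetical stand-ins written for this example, using nested lists instead of tensors.

```python
import random


def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def add(A, B):
    """Element-wise sum of two matrices with the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def randn(rows, cols):
    """Matrix of samples from a standard normal distribution."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


W, X, b = randn(4, 3), randn(3, 1), randn(4, 1)
Y = add(matmul(W, X), b)
print(len(Y), len(Y[0]))  # 4 1
```

A (4, 3) matrix times a (3, 1) matrix gives a (4, 1) result, which is why b must also have shape (4, 1).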

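Computing sigmoid

TensorFlow provides many commonly used neural network functions, such as tf.nn.softmax and tf.sigmoid. This exercise uses a placeholder variable x; when running the session, pass the input z through the feed dictionary. The steps are:
(i) create a placeholder x;
(ii) define the operation that computes the sigmoid using tf.sigmoid;
(iii) run the session.
These steps use the following calls:

tf.compat.v1.placeholder(tf.float32, name="...")
tf.sigmoid(...)
sess.run(..., feed_dict={x: z})

Note that there are two typical ways to create and use a session in TensorFlow:

# Method 1
sess = tf.compat.v1.Session()
result = sess.run(..., feed_dict={...})
sess.close()

# Method 2
with tf.compat.v1.Session() as sess:
    result = sess.run(..., feed_dict={...})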

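Now let's implement the sigmoid computation.

# Define the sigmoid computation
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
        z - input value, a scalar or a vector

    Returns:
        result - the sigmoid of z

    """

    # Create a placeholder named "x"
    x = tf.compat.v1.placeholder(tf.float32, name="x")

    # Define the sigmoid operation
    sigmoid_op = tf.sigmoid(x)

    # Create a session using method 2 and feed z into x
    with tf.compat.v1.Session() as sess:
        result = sess.run(sigmoid_op, feed_dict={x: z})

    return result


# Test the function
print("sigmoid(0) = " + str(sigmoid(0)))
print("sigmoid(12) = " + str(sigmoid(12)))

The test prints 0.5 for sigmoid(0) and a value very close to 1 for sigmoid(12).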
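
For comparison, here is a small plain-Python sketch of the logistic function itself, which can be used to sanity-check the values above. The name sigmoid_py is hypothetical, and the branch on the sign of z is a common trick to avoid overflow rather than something taken from the original post.

```python
import math


def sigmoid_py(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print("sigmoid_py(0) =", sigmoid_py(0))
print("sigmoid_py(12) =", sigmoid_py(12))
```

TensorFlow computes in float32 here, so its printed digits can differ slightly from the double-precision result of this sketch.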

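Computing the cost

TensorFlow has a built-in function that computes the loss of a neural network in a single line of code. The cost function J is:
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{(i)} + (1 - y^{(i)})\log(1 - a^{(i)})\right) \tag{2}$$
where $a^{(i)} = \sigma(z^{(i)})$.
The function to use is:

tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...)

Note that this function returns the loss of each example separately; it does not apply the $\frac{1}{m}$ average. Let's test a cost computation: the code takes z as input, computes the sigmoid (giving a), and then computes J.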

# Cost function
def cost(logits, labels):
    """
    Compute the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- the per-example cost obtained by running the session (formula (2))
    """

    # Create the placeholders for "logits" (z) and "labels" (y)
    z = tf.compat.v1.placeholder(tf.float32, name="z")
    y = tf.compat.v1.placeholder(tf.float32, name="y")

    # Define the loss operation
    cost_op = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)

    # Create a session (method 2 above); the with block closes it automatically
    with tf.compat.v1.Session() as sess:
        # Run the session
        result = sess.run(cost_op, feed_dict={z: logits, y: labels})

    return result


# Test it
logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
cost_values = cost(logits, np.array([0, 0, 1, 1]))  # separate name so the function cost() is not overwritten
print("cost = " + str(cost_values))

The output is an array of four values, one loss per example.
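
The TensorFlow documentation describes this loss, for a logit x and label z, as max(x, 0) - x * z + log(1 + exp(-|x|)), a numerically stable rewrite of the cross entropy. The sketch below, in plain Python with hypothetical function names, checks that this form agrees with the textbook definition on a few sample values. The sample logits are illustrative and are not the values fed in the test above.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_cross_entropy(logit, label):
    """Textbook form: -(y * log(a) + (1 - y) * log(1 - a)) with a = sigmoid(logit)."""
    a = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(a) + (1 - label) * math.log(1 - a))


sample_logits = [0.2, 0.4, 0.7, 0.9]
sample_labels = [0, 0, 1, 1]
losses = [sigmoid_cross_entropy(x, z) for x, z in zip(sample_logits, sample_labels)]
print(losses)
```

Both forms give the same numbers for moderate logits; the stable form simply avoids evaluating log(0) or exp of a large positive number.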

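One-hot encoding (0/1 encoding)

In deep learning you will often have a y vector whose values range from 0 to C-1, where C is the number of classes. If C is 4, for example, each entry of y is one of 0, 1, 2 or 3, and the vector has to be converted into a matrix with C rows and one column per example.
This is called one-hot encoding, because in the converted representation exactly one element of each column is "hot" (set to 1). Doing this conversion in NumPy takes a few lines of code; in TensorFlow a single line is enough: tf.one_hot(labels, depth, axis).
Define the one-hot encoding: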

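# Define the one-hot encoding
def one_hot_matrix(labels, C):
    """
    Create a matrix where row i corresponds to class i and column j to training example j,
    so that entry (i, j) is 1 if example j has label i

    Arguments:
        labels - vector of labels
        C - number of classes

    Returns:
        one_hot - the one-hot matrix

    """

    # Create a tf.constant equal to C, named "C"
    C = tf.constant(C, name="C")

    # Use tf.one_hot; axis=0 puts the classes along the rows
    one_hot_op = tf.one_hot(indices=labels, depth=C, axis=0)

    # Create a session
    sess = tf.compat.v1.Session()

    # Run the session
    one_hot = sess.run(one_hot_op)

    # Close the session
    sess.close()

    return one_hot


# Test it
labels = np.array([1, 2, 3, 0, 2, 1])
one_hot = one_hot_matrix(labels, C=4)
print(str(one_hot))

The output is a 4x6 matrix in which column j has a single 1 in the row given by labels[j]:

[[0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]]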
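
The layout produced with axis=0 can be sketched in a couple of lines of plain Python. The function name one_hot_columns is hypothetical; it only illustrates the row/column convention, not TensorFlow's implementation.

```python
def one_hot_columns(labels, depth):
    """Return a depth x len(labels) matrix; column j has a 1 in row labels[j]."""
    return [[1.0 if label == row else 0.0 for label in labels]
            for row in range(depth)]


encoded = one_hot_columns([1, 2, 3, 0, 2, 1], depth=4)
for row in encoded:
    print(row)
```

Each row corresponds to one class and each column to one example, so every column sums to 1.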

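Initializing with zeros and ones

This section shows how to initialize a vector of zeros or ones. The function to call is tf.ones(); to initialize with zeros, use tf.zeros() instead. These functions take a shape and return an array of that shape filled with ones or zeros respectively.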

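# Define the initialization with ones
def ones(shape):
    """
    Create an array of the given shape filled with ones

    Arguments:
        shape - shape of the array to create

    Returns:
        ones - array containing only ones
    """

    # Use tf.ones()
    ones_op = tf.ones(shape)

    # Create the session
    sess = tf.compat.v1.Session()
    # Run the session
    result = sess.run(ones_op)

    # Close the session
    sess.close()

    return result


# Test it
print("ones = " + str(ones([3])))

The output is ones = [1. 1. 1.].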

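Building a neural network with TensorFlow

Loading the dataset

In the same project folder, create a new file named shou_zhi.py, import the libraries listed above, and load the dataset.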

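# Build the network: load the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = tf_utils.load_dataset()

Look at the image at index 0 of the dataset.

# Show the image at index 0
index = 0
plt.imshow(X_train_orig[index])
plt.show()
print("y = " + str(np.squeeze(Y_train_orig[:, index])))

This displays the image and prints its label.

As usual, we flatten the dataset and divide by 255 to normalize it. In addition, each label has to be converted into a one-hot vector, as described in the one-hot encoding section above.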

# Data preprocessing
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T  # each column is one example
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize the data
X_train = X_train_flatten / 255
X_test = X_test_flatten / 255
# Convert the labels to one-hot matrices
Y_train = tf_utils.convert_to_one_hot(Y_train_orig, 6)
Y_test = tf_utils.convert_to_one_hot(Y_test_orig, 6)
print("number of training examples = " + str(X_train.shape[1]))
print("number of test examples = " + str(X_test.shape[1]))
print("X_train.shape: " + str(X_train.shape))
print("Y_train.shape: " + str(Y_train.shape))
print("X_test.shape: " + str(X_test.shape))
print("Y_test.shape: " + str(Y_test.shape))

With this dataset the training set has 1080 examples and the test set has 120. X_train has shape (12288, 1080), Y_train (6, 1080), X_test (12288, 120), and Y_test (6, 120).
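
The flatten-then-transpose step is easy to get wrong, so here is a tiny plain-Python sketch of what reshape(m, -1).T followed by division by 255 does. The function flatten_and_normalize and the two 1x2 toy images are made up for this illustration.

```python
def flatten_and_normalize(images):
    """images: list of m images, each a nested list [height][width][channels] with values 0..255.

    Returns a matrix with one column per image and values scaled to [0, 1].
    """
    flat = [[channel / 255 for row in image for pixel in row for channel in pixel]
            for image in images]
    # Transpose so that each column is one example, matching reshape(m, -1).T
    return [list(column) for column in zip(*flat)]


toy_images = [
    [[[0, 255, 0], [255, 255, 255]]],   # a 1x2 RGB image
    [[[51, 102, 153], [0, 0, 0]]],      # another 1x2 RGB image
]
X_toy = flatten_and_normalize(toy_images)
print(len(X_toy), len(X_toy[0]))  # 6 2
```

Two images of 1 * 2 * 3 = 6 values each become a 6x2 matrix, in the same way that 1080 images of 64 * 64 * 3 values become a (12288, 1080) matrix.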

Creating placeholders

Create placeholders for X and Y so that you can pass in the training data later when running the session.

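# Create the placeholders
def create_placeholders(n_x, n_y):
    """
    Create the placeholders for the TensorFlow session
    Arguments:
        n_x - an integer, size of an image vector (64*64*3 = 12288)
        n_y - an integer, number of classes (0 to 5, so n_y = 6)

    Returns:
        X - placeholder for the input data, shape [n_x, None], dtype "float"
        Y - placeholder for the input labels, shape [n_y, None], dtype "float"

    Tip:
        None is used because it lets us be flexible about the number of examples fed to the
        placeholders. In fact, the number of examples differs between training and testing.

    """

    X = tf.compat.v1.placeholder(tf.float32, [n_x, None], name="X")  # mind the TF version: the original post used tf.placeholder (1.x)
    Y = tf.compat.v1.placeholder(tf.float32, [n_y, None], name="Y")

    return X, Y


# Test it
X, Y = create_placeholders(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))

This prints two placeholder tensors of dtype float32, with shapes (12288, None) and (6, None).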

Initializing parameters

Initialize the parameters in TensorFlow, using Xavier initialization for the weights and zero initialization for the biases.
Note: TF 2.x removed contrib, so I use initializer=tf.initializers.GlorotUniform(seed=1) in place of initializer=tf.contrib.layers.xavier_initializer(seed=1). For reference, the TF 1.x form looks like this:

W1 = tf.get_variable("W1", [25, 12288], initializer=tf.contrib.layers.xavier_initializer(seed=1))
b1 = tf.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
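
For intuition, Glorot (Xavier) uniform initialization draws each weight from a uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The sketch below reproduces that rule in plain Python for a weight matrix of shape [6, 12]. The function glorot_uniform is a hypothetical helper, and its seeded values will not match what TensorFlow generates.

```python
import math
import random


def glorot_uniform(fan_out, fan_in, seed=None):
    """Sample a fan_out x fan_in matrix from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, limit


W_sketch, limit_sketch = glorot_uniform(6, 12, seed=1)
b_sketch = [[0.0] for _ in range(6)]  # biases start at zero
print("limit =", limit_sketch)
```

Larger layers get a smaller limit, which keeps the scale of the activations roughly constant from layer to layer.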

Define the parameter initialization function:

# Initialize the parameters
def initialize_parameters():

    """
    Initialize the parameters of the neural network, with the following shapes:
        W1 : [25, 12288]
        b1 : [25, 1]
        W2 : [12, 25]
        b2 : [12, 1]
        W3 : [6, 12]
        b3 : [6, 1]

    Returns:
        parameters - dictionary containing the W and b variables

    """

    W1 = tf.compat.v1.get_variable("W1", [25, 12288], initializer=tf.initializers.GlorotUniform(seed=1))
    b1 = tf.compat.v1.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.compat.v1.get_variable("W2", [12, 25], initializer=tf.initializers.GlorotUniform(seed=1))
    b2 = tf.compat.v1.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.compat.v1.get_variable("W3", [6, 12], initializer=tf.initializers.GlorotUniform(seed=1))
    b3 = tf.compat.v1.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters


# Test it
ops.reset_default_graph()  # clears the default graph stack and resets the global default graph

with tf.compat.v1.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

The parameters are printed as variable objects with their shapes and dtype float32, not as values: they have not been evaluated yet.

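Forward propagation in TensorFlow

We now implement forward propagation in TensorFlow. The function takes a dictionary of parameters and performs the forward pass, using the following calls:

tf.add(..., ...): addition
tf.matmul(..., ...): matrix multiplication
tf.nn.relu(...): ReLU activation
It is important that forward propagation stops at Z3: in TensorFlow the output of the last linear layer is fed directly into the function that computes the loss, so A3 is not needed.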

# Define forward propagation
def forward_propagation(X, parameters):
    """
    Implement the forward propagation of the model LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
        X - placeholder for the input data, shape (number of input nodes, number of examples)
        parameters - dictionary containing the W and b parameters

    Returns:
        Z3 - output of the last LINEAR node

    """

    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
                                        # NumPy equivalents
    Z1 = tf.add(tf.matmul(W1, X), b1)   # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                 # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)  # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                 # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)  # Z3 = np.dot(W3, A2) + b3

    return Z3


# Test it
ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

Z3 is printed as a tensor of shape (6, None) and dtype float32; it has not been evaluated.
You may have noticed that forward propagation does not output any cache. You will understand why below, when we get to backpropagation.
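
To see the LINEAR -> RELU -> LINEAR -> RELU -> LINEAR chain on numbers small enough to follow by hand, here is a sketch in plain Python. The helpers matmul, add_bias, relu, and forward, as well as the 2x2 toy parameters, are hypothetical and chosen only so the arithmetic is easy to verify.

```python
def matmul(A, B):
    """Matrix product of nested lists A (n x k) and B (k x m)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add_bias(Z, b):
    """Add the column vector b (n x 1) to every column of Z (n x m)."""
    return [[z + b_row[0] for z in z_row] for z_row, b_row in zip(Z, b)]


def relu(Z):
    return [[max(0, z) for z in row] for row in Z]


def forward(X, p):
    """Stop at Z3, the output of the last linear layer (no softmax here)."""
    A1 = relu(add_bias(matmul(p["W1"], X), p["b1"]))
    A2 = relu(add_bias(matmul(p["W2"], A1), p["b2"]))
    return add_bias(matmul(p["W3"], A2), p["b3"])


toy_params = {
    "W1": [[1, -1], [0, 2]], "b1": [[0], [-1]],
    "W2": [[1, 1], [1, -1]], "b2": [[0], [0]],
    "W3": [[2, 0], [0, 1]], "b3": [[1], [0]],
}
X_toy = [[1, 2], [3, 0]]  # two features (rows), two examples (columns)
print(forward(X_toy, toy_params))  # [[11, 5], [0, 2]]
```

As in the TensorFlow version, each column of the input is one example, and the bias column is broadcast across all examples.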

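Computing the cost

The cost can be computed with tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...)).

It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits must have the same shape (number of examples, number of classes), which is why Z3 and Y are transposed.
In addition, tf.reduce_mean takes the mean of the losses over all examples.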

# Define the cost computation
def compute_cost(Z3, Y):
    """
    Compute the cost

    Arguments:
        Z3 - result of forward propagation
        Y - labels, a placeholder with the same shape as Z3

    Returns:
        cost - the cost tensor

    """
    logits = tf.transpose(Z3)  # transpose to (examples, classes)
    labels = tf.transpose(Y)  # transpose to (examples, classes)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

    return cost


# Test it
ops.reset_default_graph()

with tf.compat.v1.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

cost is printed as a scalar tensor of dtype float32, not as a number, because the graph has not been run.
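
The following plain-Python sketch shows what the transpose, the softmax cross entropy, and the mean amount to. The function names and the toy logits are hypothetical; the per-example loss uses the usual log-sum-exp form for numerical stability.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Loss for one example: -sum(labels * log(softmax(logits))), via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return sum(y * (log_sum_exp - v) for v, y in zip(logits, labels))


def mean_cost(Z3, Y):
    """Z3 and Y have shape (classes, examples); iterate over columns and average the losses."""
    per_example = [softmax_cross_entropy(z, y) for z, y in zip(zip(*Z3), zip(*Y))]
    return sum(per_example) / len(per_example)


Z3_toy = [[2.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # 3 classes, 2 examples
Y_toy = [[1, 0], [0, 0], [0, 1]]               # one-hot labels: classes 0 and 2
print("cost =", mean_cost(Z3_toy, Y_toy))
```

When all logits are equal, the softmax is uniform and the cost equals log(number of classes), a handy check for the untrained network.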

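Backpropagation and parameter updates

Thanks to the programming framework, all of backpropagation and the parameter updates are handled in one line of code. After computing the cost function, you create an "optimizer" object. When running the session, you have to call this object together with the cost; when called, it performs an optimization step on the given cost using the chosen method and learning rate.
For gradient descent, for example, the optimizer would be:

optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

To perform the optimization, you would run:

_, c = sess.run([optimizer, cost], feed_dict={X: mini_batch_X, Y: mini_batch_Y})

When writing code, we often use _ as a throwaway variable to hold values we will not need later. Here, _ receives the result of evaluating optimizer, which we do not need, and c receives the value of the cost.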
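
To illustrate what one optimizer step does, here is a toy sketch of gradient descent on the one-parameter cost (w - 3)^2 in plain Python. The function minimize_step is hypothetical; returning the cost measured before the update is a choice made for this sketch, not a statement about the order in which TensorFlow evaluates the two fetches.

```python
def minimize_step(w, learning_rate):
    """One gradient-descent step on cost(w) = (w - 3)^2.

    Returns the updated w and the cost measured before the update.
    """
    cost = (w - 3.0) ** 2
    grad = 2.0 * (w - 3.0)
    return w - learning_rate * grad, cost


w = 0.0
for _ in range(100):  # _ is a throwaway loop variable, as in the text above
    w, c = minimize_step(w, learning_rate=0.1)

print("w =", w, "cost =", c)
```

Each step moves w against the gradient, so w approaches the minimizer 3 and the cost approaches 0. The optimizer object does the same for every variable in the graph, with the gradients derived automatically.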

Building the model

Combine the functions built above into a model.

# Define the model
def model(X_train, Y_train, X_test, Y_test,
          learning_rate=0.0001, num_epochs=1500, minibatch_size=32,
          print_cost=True, is_plot=True):
    """
    Implement a three-layer TensorFlow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX

    Arguments:
        X_train - training set, shape (input size = 12288, number of examples = 1080)
        Y_train - training labels, shape (output size = 6, number of examples = 1080)
        X_test - test set, shape (input size = 12288, number of examples = 120)
        Y_test - test labels, shape (output size = 6, number of examples = 120)
        learning_rate - learning rate
        num_epochs - number of passes over the whole training set
        minibatch_size - size of each mini-batch
        print_cost - whether to print the cost (printed every 100 epochs)
        is_plot - whether to plot the cost curve

    Returns:
        parameters - the learned parameters

    """
    ops.reset_default_graph()  # lets the model be rerun without overwriting TF variables
    tf.random.set_seed(1)
    # tf.set_random_seed(1)  # TF 1.x spelling
    seed = 3
    (n_x, m) = X_train.shape  # number of input nodes and number of examples
    n_y = Y_train.shape[0]  # number of output nodes
    costs = []  # recorded costs

    # Create placeholders for X and Y
    X, Y = create_placeholders(n_x, n_y)

    # Initialize the parameters
    parameters = initialize_parameters()

    # Forward propagation
    Z3 = forward_propagation(X, parameters)

    # Compute the cost
    cost = compute_cost(Z3, Y)

    # Backpropagation, using the Adam optimizer
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Initialize all the variables
    init = tf.compat.v1.global_variables_initializer()

    # Start the session and compute
    with tf.compat.v1.Session() as sess:
        # Run the initialization
        sess.run(init)

        # Training loop
        for epoch in range(num_epochs):

            epoch_cost = 0  # cost of this epoch
            num_minibatches = int(m / minibatch_size)  # number of full mini-batches
            seed = seed + 1
            minibatches = tf_utils.random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                # Select a mini-batch
                (minibatch_X, minibatch_Y) = minibatch

                # The data is ready: run the optimizer and the cost in the session
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

                # Add this mini-batch's share of the epoch cost
                epoch_cost = epoch_cost + minibatch_cost / num_minibatches

            # Record the cost every 5 epochs
            if epoch % 5 == 0:
                costs.append(epoch_cost)
                # Print the cost every 100 epochs if requested
                if print_cost and epoch % 100 == 0:
                    print("epoch = " + str(epoch) + "    epoch_cost = " + str(epoch_cost))

        # Plot the cost curve if requested
        if is_plot:
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('epochs (per 5)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()

        # Fetch the learned parameters from the session
        parameters = sess.run(parameters)
        print("The trained parameters have been retrieved from the session.")

        # Compare the predicted class with the true class for each example
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Compute the accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters


# Test it
# Start time
start_time = time.perf_counter()
# Train the model
parameters = model(X_train, Y_train, X_test, Y_test)
# End time
end_time = time.perf_counter()
# Elapsed time
print("Elapsed time = " + str(end_time - start_time) + " seconds")

The run prints the cost every 100 epochs, shows the cost curve, and then prints the training accuracy, the test accuracy, and the elapsed time.
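
The accuracy computation at the end of model() compares, column by column, the index of the largest logit with the index of the 1 in the one-hot label, then averages the matches. A plain-Python sketch of that logic, with hypothetical helper names and toy data, looks like this:

```python
def argmax_columns(M):
    """Index of the largest entry in each column of a (classes x examples) matrix."""
    return [max(range(len(col)), key=col.__getitem__) for col in zip(*M)]


def accuracy(Z3, Y):
    """Fraction of examples whose predicted class equals the labelled class."""
    predicted = argmax_columns(Z3)
    actual = argmax_columns(Y)
    matches = [1.0 if p == a else 0.0 for p, a in zip(predicted, actual)]
    return sum(matches) / len(matches)


Z3_toy = [[2.0, 0.1, 0.3],
          [0.5, 1.5, 0.2],
          [0.1, 0.2, 0.9]]   # predicted classes per column: 0, 1, 2
Y_toy = [[1, 0, 0],
         [0, 1, 1],
         [0, 0, 0]]          # true classes per column: 0, 1, 1
print("accuracy =", accuracy(Z3_toy, Y_toy))
```

No softmax is needed for the prediction, because softmax preserves the ordering of the logits and therefore the argmax.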

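Summary

TensorFlow is a programming framework widely used in deep learning.
The two main classes of objects in TensorFlow are tensors and operators.
When coding in TensorFlow, you have to take the following steps:
- Create a computation graph containing tensors (variables, placeholders, ...) and operations (tf.matmul, tf.add, ...)
- Create a session
- Initialize the session
- Run the session to execute the computation graph
You can execute the computation graph multiple times, as you saw in model().
Backpropagation and optimization are performed automatically when you run the session on the "optimizer" object.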

Complete code

The code was run with a Python 3.6 interpreter and TensorFlow 2.6.
The code in tf.py is as follows:

import time

import h5py
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

import tf_utils

# On TensorFlow 2.x, add this to run code written for 1.x (graph mode and sessions)
tf.compat.v1.disable_eager_execution()




## Compute the loss: define the constants first
# tf.compat.v1.disable_eager_execution()  # required so that session.run() works on TF 2.x
# y_hat = tf.constant(36, name='y_hat')  # Define the constant y_hat and set it to 36
# y = tf.constant(39, name='y')  # Define y and set it to 39
#
# loss = tf.Variable((y - y_hat) ** 2, name='loss')
# init = tf.compat.v1.global_variables_initializer()
# with tf.compat.v1.Session() as session:  # Create a session and print the output
#     session.run(init)  # Initialize the variables
#     print(session.run(loss))

# A simple example
# a = tf.constant(2)
# b = tf.constant(10)
# c = tf.multiply(a, b)
#
# print(c)
# sess = tf.compat.v1.Session()
#
# print(sess.run(c))
#
# # Use feed_dict to set the value of x
# x = tf.compat.v1.placeholder(tf.int64, name="x")
# print(sess.run(2 * x, feed_dict={x: 3}))
# sess.close()

# Define the linear function
def linear_function():
    """
    Implement a linear function:
        Initialize W as a random array of shape (4, 3)
        Initialize X as a random array of shape (3, 1)
        Initialize b as a random array of shape (4, 1)
    Returns:
        result - the result of running the session for Y = WX + b

    """

    X = np.random.randn(3, 1)
    W = np.random.randn(4, 3)
    b = np.random.randn(4, 1)

    Y = tf.add(tf.matmul(W, X), b)  # tf.matmul is matrix multiplication
    # Y = tf.matmul(W, X) + b  # equivalent way of writing it

    # Create a session and run it
    sess = tf.compat.v1.Session()  # TF 2.x spelling
    result = sess.run(Y)

    # Close the session once we are done with it
    sess.close()

    return result
# Test it
# print("result = " +  str(linear_function()))

# Define the sigmoid computation
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
        z - input value, a scalar or a vector

    Returns:
        result - the sigmoid of z

    """

    # Create a placeholder named "x"
    x = tf.compat.v1.placeholder(tf.float32, name="x")

    # Define the sigmoid operation
    sigmoid_op = tf.sigmoid(x)

    # Create a session using method 2 and feed z into x
    with tf.compat.v1.Session() as sess:
        result = sess.run(sigmoid_op, feed_dict={x: z})

    return result
# Test the function
# print ("sigmoid(0) = " + str(sigmoid(0)))
# print ("sigmoid(12) = " + str(sigmoid(12)))

# Cost function
def cost(logits, labels):
    """
    Compute the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- the per-example cost obtained by running the session (formula (2))
    """

    # Create the placeholders for "logits" (z) and "labels" (y)
    z = tf.compat.v1.placeholder(tf.float32, name="z")
    y = tf.compat.v1.placeholder(tf.float32, name="y")

    # Define the loss operation
    cost_op = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)

    # Create a session (method 2 above); the with block closes it automatically
    with tf.compat.v1.Session() as sess:
        # Run the session
        result = sess.run(cost_op, feed_dict={z: logits, y: labels})

    return result
# # Test it
# logits = sigmoid(np.array([0.2,0.4,0.7,0.9]))
# cost_values = cost(logits, np.array([0,0,1,1]))
# print ("cost = " + str(cost_values))

# Define the one-hot encoding
def one_hot_matrix(labels, C):
    """
    Create a matrix where row i corresponds to class i and column j to training example j,
    so that entry (i, j) is 1 if example j has label i

    Arguments:
        labels - vector of labels
        C - number of classes

    Returns:
        one_hot - the one-hot matrix

    """

    # Create a tf.constant equal to C, named "C"
    C = tf.constant(C, name="C")

    # Use tf.one_hot; axis=0 puts the classes along the rows
    one_hot_op = tf.one_hot(indices=labels, depth=C, axis=0)

    # Create a session
    sess = tf.compat.v1.Session()

    # Run the session
    one_hot = sess.run(one_hot_op)

    # Close the session
    sess.close()

    return one_hot
# # Test it
# labels = np.array([1,2,3,0,2,1])
# one_hot = one_hot_matrix(labels,C=4)
# print(str(one_hot))

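# Define the initialization with ones
def ones(shape):
    """
    Create an array of the given shape filled with ones

    Arguments:
        shape - shape of the array to create

    Returns:
        ones - array containing only ones
    """

    # Use tf.ones()
    ones_op = tf.ones(shape)

    # Create the session
    sess = tf.compat.v1.Session()
    # Run the session
    result = sess.run(ones_op)

    # Close the session
    sess.close()

    return result
# # Test it
# print ("ones = " + str(ones([3])))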

The code in shou_zhi.py is as follows:

import time

import h5py
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

import tf_utils

# On TensorFlow 2.x, add this to run code written for 1.x (graph mode and sessions)
tf.compat.v1.disable_eager_execution()

# Build the network: load the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = tf_utils.load_dataset()
# # Show the image at index 0
# index = 0
# plt.imshow(X_train_orig[index])
# plt.show()
# print("y = " + str(np.squeeze(Y_train_orig[:, index])))

# Data preprocessing
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T  # each column is one example
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize the data
X_train = X_train_flatten / 255
X_test = X_test_flatten / 255
# Convert the labels to one-hot matrices
Y_train = tf_utils.convert_to_one_hot(Y_train_orig, 6)
Y_test = tf_utils.convert_to_one_hot(Y_test_orig, 6)
# print("number of training examples = " + str(X_train.shape[1]))
# print("number of test examples = " + str(X_test.shape[1]))
# print("X_train.shape: " + str(X_train.shape))
# print("Y_train.shape: " + str(Y_train.shape))
# print("X_test.shape: " + str(X_test.shape))
# print("Y_test.shape: " + str(Y_test.shape))

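# Create the placeholders
def create_placeholders(n_x, n_y):
    """
    Create the placeholders for the TensorFlow session
    Arguments:
        n_x - an integer, size of an image vector (64*64*3 = 12288)
        n_y - an integer, number of classes (0 to 5, so n_y = 6)

    Returns:
        X - placeholder for the input data, shape [n_x, None], dtype "float"
        Y - placeholder for the input labels, shape [n_y, None], dtype "float"

    Tip:
        None is used because it lets us be flexible about the number of examples fed to the
        placeholders. In fact, the number of examples differs between training and testing.

    """

    X = tf.compat.v1.placeholder(tf.float32, [n_x, None], name="X")  # mind the TF version: the original post used tf.placeholder (1.x)
    Y = tf.compat.v1.placeholder(tf.float32, [n_y, None], name="Y")

    return X, Y
# # Test it
# X, Y = create_placeholders(12288, 6)
# print("X = " + str(X))
# print("Y = " + str(Y))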

# Initialize the parameters
def initialize_parameters():

    """
    Initialize the parameters of the neural network, with the following shapes:
        W1 : [25, 12288]
        b1 : [25, 1]
        W2 : [12, 25]
        b2 : [12, 1]
        W3 : [6, 12]
        b3 : [6, 1]

    Returns:
        parameters - dictionary containing the W and b variables

    """

    W1 = tf.compat.v1.get_variable("W1", [25, 12288], initializer=tf.initializers.GlorotUniform(seed=1))
    b1 = tf.compat.v1.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.compat.v1.get_variable("W2", [12, 25], initializer=tf.initializers.GlorotUniform(seed=1))
    b2 = tf.compat.v1.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.compat.v1.get_variable("W3", [6, 12], initializer=tf.initializers.GlorotUniform(seed=1))
    b3 = tf.compat.v1.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters
# # Test it
# ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
#
# with tf.compat.v1.Session()  as sess:
#     parameters = initialize_parameters()
#     print("W1 = " + str(parameters["W1"]))
#     print("b1 = " + str(parameters["b1"]))
#     print("W2 = " + str(parameters["W2"]))
#     print("b2 = " + str(parameters["b2"]))

# Define forward propagation
def forward_propagation(X, parameters):
    """
    Implement the forward propagation of the model LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
        X - placeholder for the input data, shape (number of input nodes, number of examples)
        parameters - dictionary containing the W and b parameters

    Returns:
        Z3 - output of the last LINEAR node

    """

    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
                                        # NumPy equivalents
    Z1 = tf.add(tf.matmul(W1, X), b1)   # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                 # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)  # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                 # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)  # Z3 = np.dot(W3, A2) + b3

    return Z3
# # Test it
# ops.reset_default_graph()  # clears the default graph stack and resets the global default graph
# with tf.compat.v1.Session ()  as sess:
#     X,Y = create_placeholders(12288,6)
#     parameters = initialize_parameters()
#     Z3 = forward_propagation(X,parameters)
#     print("Z3 = " + str(Z3))

# Define the cost computation
def compute_cost(Z3, Y):
    """
    Compute the cost

    Arguments:
        Z3 - result of forward propagation
        Y - labels, a placeholder with the same shape as Z3

    Returns:
        cost - the cost tensor

    """
    logits = tf.transpose(Z3)  # transpose to (examples, classes)
    labels = tf.transpose(Y)  # transpose to (examples, classes)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

    return cost
# # Test it
# ops.reset_default_graph()
#
# with tf.compat.v1.Session() as sess:
#     X, Y = create_placeholders(12288, 6)
#     parameters = initialize_parameters()
#     Z3 = forward_propagation(X, parameters)
#     cost = compute_cost(Z3, Y)
#     print("cost = " + str(cost))

# Define the model
def model(X_train, Y_train, X_test, Y_test,
          learning_rate=0.0001, num_epochs=1500, minibatch_size=32,
          print_cost=True, is_plot=True):
    """
    Implement a three-layer TensorFlow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX

    Arguments:
        X_train - training set, shape (input size = 12288, number of examples = 1080)
        Y_train - training labels, shape (output size = 6, number of examples = 1080)
        X_test - test set, shape (input size = 12288, number of examples = 120)
        Y_test - test labels, shape (output size = 6, number of examples = 120)
        learning_rate - learning rate
        num_epochs - number of passes over the whole training set
        minibatch_size - size of each mini-batch
        print_cost - whether to print the cost (printed every 100 epochs)
        is_plot - whether to plot the cost curve

    Returns:
        parameters - the learned parameters

    """
    ops.reset_default_graph()  # lets the model be rerun without overwriting TF variables
    tf.random.set_seed(1)
    # tf.set_random_seed(1)  # TF 1.x spelling
    seed = 3
    (n_x, m) = X_train.shape  # number of input nodes and number of examples
    n_y = Y_train.shape[0]  # number of output nodes
    costs = []  # recorded costs

    # Create placeholders for X and Y
    X, Y = create_placeholders(n_x, n_y)

    # Initialize the parameters
    parameters = initialize_parameters()

    # Forward propagation
    Z3 = forward_propagation(X, parameters)

    # Compute the cost
    cost = compute_cost(Z3, Y)

    # Backpropagation, using the Adam optimizer
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Initialize all the variables
    init = tf.compat.v1.global_variables_initializer()

    # Start the session and compute
    with tf.compat.v1.Session() as sess:
        # Run the initialization
        sess.run(init)

        # Training loop
        for epoch in range(num_epochs):

            epoch_cost = 0  # cost of this epoch
            num_minibatches = int(m / minibatch_size)  # number of full mini-batches
            seed = seed + 1
            minibatches = tf_utils.random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                # Select a mini-batch
                (minibatch_X, minibatch_Y) = minibatch

                # The data is ready: run the optimizer and the cost in the session
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

                # Add this mini-batch's share of the epoch cost
                epoch_cost = epoch_cost + minibatch_cost / num_minibatches

            # Record the cost every 5 epochs
            if epoch % 5 == 0:
                costs.append(epoch_cost)
                # Print the cost every 100 epochs if requested
                if print_cost and epoch % 100 == 0:
                    print("epoch = " + str(epoch) + "    epoch_cost = " + str(epoch_cost))

        # Plot the cost curve if requested
        if is_plot:
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('epochs (per 5)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()

        # Fetch the learned parameters from the session
        parameters = sess.run(parameters)
        print("The trained parameters have been retrieved from the session.")

        # Compare the predicted class with the true class for each example
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Compute the accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters
# Test it
# Start time
start_time = time.perf_counter()
# Train the model
parameters = model(X_train, Y_train, X_test, Y_test)
# End time
end_time = time.perf_counter()
# Elapsed time
print("Elapsed time = " + str(end_time - start_time) + " seconds")