TensorFlow Study Notes (4)

段段努力上分

Tags: python, deep learning, tensorflow

Article link: https://blog.csdn.net/weixin_44660348/article/details/113386942


(Review notes written after taking Prof. Cao Jian's course at Peking University.)

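1. Topics covered in this lesson

Preliminaries
Neural network complexity
Exponentially decaying learning rate
Activation functions
Loss functions
Underfitting and overfitting
Reducing overfitting with regularization
Updating network parameters with optimizers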

2. Preliminaries

tf.where()
tf.where(condition, A, B): where the condition is true, returns the element of A; otherwise returns the element of B.

import tensorflow as tf
a = tf.constant([1, 2, 3, 1, 1])
b = tf.constant([0, 1, 3, 4, 5])
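c = tf.where(tf.greater(a, b), a, b)  # where a > b, take the element of a at that position; otherwise take the element of b
print("c:", c)
# Output: 1 > 0, so take 1 from a; 2 > 1, so take 2 from a; 3 > 3 is false, so take 3 from b; ...
>>c: tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)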

np.random.RandomState.rand()
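Returns random numbers drawn uniformly from [0, 1); rand(d0, d1, ...) returns an array of that shape.
np.random.RandomState(seed): a random number generator initialized with the given seed. With a constant seed it produces the same data on every run.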

import numpy as np

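rdm = np.random.RandomState(seed=1)  # with a constant seed, the same numbers are produced on every run
a = rdm.rand(2,3)
b = rdm.rand(2,3)
print("a:", a)
print("b:", b)
# Output:
>>a: [[4.17022005e-01 7.20324493e-01 1.14374817e-04]
      [3.02332573e-01 1.46755891e-01 9.23385948e-02]]
  b: [[0.18626021 0.34556073 0.39676747]
      [0.53881673 0.41919451 0.6852195 ]]
# a and b differ because b continues the same random sequence.
# To make the two draws identical, the generator must be seeded again before the second draw.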

import numpy as np

rdm = np.random.RandomState(seed=1)  # seed the generator with a constant
a = rdm.rand(2,3)
rdm = np.random.RandomState(seed=1)  # re-seed with the same constant, so the sequence restarts
b = rdm.rand(2,3)
print("a:", a)
print("b:", b)
# Output:
>>a: [[4.17022005e-01 7.20324493e-01 1.14374817e-04]
      [3.02332573e-01 1.46755891e-01 9.23385948e-02]]
  b: [[4.17022005e-01 7.20324493e-01 1.14374817e-04]
      [3.02332573e-01 1.46755891e-01 9.23385948e-02]]

np.vstack()
np.vstack((array1, array2)): stacks the arrays vertically (row-wise). Note that the arrays are passed together as one tuple.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.vstack((a, b))
print("c:\n", c)
# Output:
>>c:
 [[1 2 3]
 [4 5 6]]

np.mgrid[ ]
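np.mgrid[start : end : step, start : end : step, …]

Usage: returns a multi-dimensional structure, commonly used for 2-D or 3-D plots. With a real step, each slice covers the half-open interval [start, end), like np.arange. The 1st return value gives how the values of the 1st dimension are laid out in the final structure, the 2nd return value gives the layout of the 2nd dimension, and so on.
(Linked article: usage of np.mgrid)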

x.ravel( )
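Flattens x into a 1-D array ("straightens the variable in front of the dot").
np.c_[ ]
np.c_[array1, array2] pairs array1 and array2 element by element (joins them as columns) and returns the result.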

import numpy as np
import tensorflow as tf

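# Generate evenly spaced points
x, y = np.mgrid[1:3:1, 2:4:0.5]  # two slices are given, so 2-D arrays are produced
# Flatten x and y, then pair them into a 2-D tensor of coordinate points
grid = np.c_[x.ravel(), y.ravel()]
print("x:\n", x)
print("y:\n", y)
print("x.ravel():\n", x.ravel())
print("y.ravel():\n", y.ravel())
print('grid:\n', grid)
# Output:
>>x:
 [[1. 1. 1. 1.]    # x lies on axis 0, so its values are spread along axis=0
  [2. 2. 2. 2.]]
  y:
 [[2.  2.5 3.  3.5]    # y lies on axis 1, so its values are spread along axis=1
  [2.  2.5 3.  3.5]]
  x.ravel():
 [1. 1. 1. 1. 2. 2. 2. 2.]
  y.ravel():
 [2.  2.5 3.  3.5 2.  2.5 3.  3.5]
  grid:
 [[1.  2. ]
  [1.  2.5]
  [1.  3. ]
  [1.  3.5]
  [2.  2. ]
  [2.  2.5]
  [2.  3. ]
  [2.  3.5]]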

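3. Complexity of neural networks

The complexity of a neural network is described by its space complexity and its time complexity. Space complexity depends on the number of layers and the total number of parameters; time complexity depends on the number of multiply-add operations.
(Linked article: how to calculate neural network complexity)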
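As a worked example of both measures, the sketch below counts layers, parameters and multiply-adds for a fully connected network. It assumes the common convention that the input layer is not counted as a layer, that every layer has weights plus biases, and that each weight costs one multiply-add; the function name `dense_complexity` is hypothetical.

```python
def dense_complexity(layer_sizes):
    """Return (num_layers, num_params, num_macs) for a fully connected network.

    layer_sizes includes the input layer, e.g. [3, 4, 2] means
    3 inputs -> 4 hidden neurons -> 2 outputs.
    """
    num_layers = len(layer_sizes) - 1  # input layer is not counted
    num_params = 0
    num_macs = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        num_params += n_in * n_out + n_out  # weight matrix + bias vector
        num_macs += n_in * n_out            # one multiply-add per weight
    return num_layers, num_params, num_macs


print(dense_complexity([3, 4, 2]))   # (2, 26, 20)
print(dense_complexity([2, 11, 1]))  # (2, 45, 33): the 2-11-1 network used later in these notes
```

Space complexity here is 2 layers and 26 parameters (3*4 + 4 + 4*2 + 2); time complexity is 20 multiply-adds (3*4 + 4*2).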

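4. Learning rate

My understanding: during backpropagation, the learning rate works together with the optimization algorithm to decide how far the network's parameters move in each update.
(Linked article: how the learning rate is used in the parameter update)
But this raises a problem: choosing the learning rate is hard. If it is too small, convergence (reaching the optimum) is very slow; if it is too large, training may not converge at all and can keep jumping back and forth around the optimum.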
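The trade-off above can be seen with plain gradient descent w = w - lr * dloss/dw on the toy loss (w + 1)^2, whose optimum is w = -1. This is a minimal pure-Python sketch; the function name and the three learning rates are illustrative choices.

```python
def gradient_descent(lr, w0=5.0, steps=40):
    """Minimize loss = (w + 1)**2 with a fixed learning rate; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w + 1)  # d/dw (w + 1)^2
        w = w - lr * grad   # parameter update: w = w - lr * grad
    return w


for lr in (0.001, 0.2, 0.999):
    print(f"lr={lr}: w after 40 steps = {gradient_descent(lr):.4f}")
```

Each step multiplies (w + 1) by (1 - 2 * lr). With lr = 0.001 the factor is 0.998, so w barely moves; with lr = 0.2 it is 0.6 and w reaches -1 quickly; with lr = 0.999 it is -0.998, so w keeps jumping from one side of -1 to the other with almost no shrinkage.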
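Exponentially decaying learning rate
Start with a relatively large learning rate to reach a good solution quickly, then gradually reduce it so that the model is stable in the later stage of training.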

import tensorflow as tf
# Update the learning rate with exponential decay
w = tf.Variable(tf.constant(5, dtype=tf.float32))
# decayed lr = initial lr * decay rate ^ (current epoch / number of epochs per decay)
epochs = 25
LR_BASE = 0.2  # initial learning rate
LR_DECAY = 0.99  # learning rate decay rate
LR_STEP = 1  # decay once every LR_STEP epochs

for epoch in range(epochs):  # top-level loop over the dataset; here the only "data" is w (initialized to 5), iterated 25 times
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)
    with tf.GradientTape() as tape:  # the with block records the computation needed for the gradient
        loss = tf.square(w + 1)
    grads = tape.gradient(loss, w)  # .gradient(target, source): differentiate loss with respect to w

    w.assign_sub(lr * grads)  # .assign_sub decrements the variable in place: w -= lr * grads, i.e. w = w - lr * grads
    print("After %s epoch, w is %f, loss is %f, lr is %f" % (epoch, w.numpy(), loss.numpy(), lr))

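The learning rate shrinks every epoch, and w converges toward the optimum w = -1, where loss = (w + 1)^2 is minimized.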
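The same formula can be written as a plain Python function to inspect the schedule without training. The `staircase` flag is an addition not used above: with staircase=True the exponent uses integer division, so the rate drops in discrete steps every `step` epochs, mirroring the `staircase` option of tf.keras.optimizers.schedules.ExponentialDecay. The function name is hypothetical.

```python
LR_BASE, LR_DECAY, LR_STEP = 0.2, 0.99, 1  # same values as the example above


def exp_decay_lr(epoch, base=LR_BASE, decay=LR_DECAY, step=LR_STEP, staircase=False):
    """lr = base * decay ** (epoch / step); integer division in the exponent if staircase."""
    exponent = epoch // step if staircase else epoch / step
    return base * decay ** exponent


for epoch in (0, 5, 10, 15):
    smooth = exp_decay_lr(epoch, step=10)
    stepped = exp_decay_lr(epoch, step=10, staircase=True)
    print(f"epoch {epoch:2d}: smooth {smooth:.5f}  staircase {stepped:.5f}")
```

With step=10, the smooth schedule decays a little every epoch, while the staircase schedule stays at 0.2 for epochs 0 to 9 and only then drops to 0.2 * 0.99.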

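5. Activation functions

My personal understanding: the activation function transforms the features learned by the network into the form of output we want. More fundamentally, it introduces nonlinearity: without it, any stack of linear layers collapses into a single linear transformation.
(Linked article: a deeper understanding of activation functions)
Common activation functions include sigmoid, tanh, ReLU and Leaky ReLU.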
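Minimal NumPy definitions of the four functions follow, so their shapes can be checked on a few inputs. The Leaky ReLU slope alpha=0.01 is an assumed value; TensorFlow's own tf.nn.leaky_relu defaults to alpha=0.2. In TensorFlow the built-ins are tf.nn.sigmoid, tf.math.tanh, tf.nn.relu and tf.nn.leaky_relu.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes to (0, 1)


def tanh(x):
    return np.tanh(x)  # squashes to (-1, 1), zero-centered


def relu(x):
    return np.maximum(0.0, x)  # zero for negatives, identity for positives


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope alpha for negatives


x = np.array([-2.0, 0.0, 3.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```

Leaky ReLU keeps a small gradient for negative inputs, which avoids the "dead neuron" problem of plain ReLU, where a neuron whose input is always negative never updates.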

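The instructor's advice for beginners on activation functions:

Use ReLU as the first choice;
Set the learning rate to a small value;
Standardize the input features, i.e. make each feature have mean 0 and standard deviation 1;
Center the initial parameters, i.e. draw the random initial parameters from a normal distribution with mean 0 and standard deviation sqrt(2 / n), where n is the number of input features of the current layer.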
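The last two tips can be sketched in NumPy: standardize each feature column, then draw a layer's weights from N(0, sqrt(2 / n_in)) (often called He initialization; Keras offers a truncated-normal variant as tf.keras.initializers.HeNormal). The raw data below is randomly generated for illustration, and the 4-input, 11-output layer size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative raw input: 1000 samples, 4 features on very different scales
raw = rng.normal(loc=[5.0, -2.0, 100.0, 0.5], scale=[2.0, 0.1, 30.0, 1.0], size=(1000, 4))

# Standardize: per feature, subtract the mean and divide by the standard deviation
x = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Centered initialization for a layer with n_in inputs: mean 0, std sqrt(2 / n_in)
n_in, n_out = 4, 11
w = rng.normal(loc=0.0, scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))

print("feature means:", x.mean(axis=0).round(6))
print("feature stds: ", x.std(axis=0).round(6))
print("weight std (target ~0.707):", w.std())
```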

6. Loss functions (loss)

Mean squared error (MSE): MSE(y_, y) = Σ_{i=1}^{n} (y - y_)^2 / n

'''
Predict daily yogurt sales y; x1 and x2 are factors that affect daily sales.
Pre-collected data: daily x1, x2 and sales y_ (the known answer; ideally production = sales).
Synthetic dataset X, Y_: y_ = x1 + x2, noise in [-0.05, +0.05). Fit a function that predicts sales.
'''
import tensorflow as tf
import numpy as np

SEED = 23455

rdm = np.random.RandomState(seed=SEED)
x = rdm.rand(32, 2)  # 32 samples with 2 features, values in [0, 1)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]  # noise: [0,1)/10 = [0,0.1); [0,0.1) - 0.05 = [-0.05,0.05)
x = tf.cast(x, dtype=tf.float32)  # cast the training inputs to float32

w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))
# input is (32, 2) and output is (32, 1), so w1 has shape (2, 1)
epochs = 15000
lr = 0.002

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        loss_mse = tf.reduce_mean(tf.square(y_ - y))  # mean squared error

    grads = tape.gradient(loss_mse, w1)  # compute the gradient
    w1.assign_sub(lr * grads)  # update the parameter

    if epoch % 500 == 0:
        print("After %d training steps, w1 is " % (epoch))
        print(w1.numpy(), "\n")
print("Final w1 is: ", w1.numpy())

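The example above raises a further question: MSE penalizes over-prediction and under-prediction equally, but in practice the two errors can cost very different amounts. A custom loss can weight them separately:
loss(y_, y) = Σ f(y_, y), where f(y_, y) = PROFIT * (y_ - y) if y < y_, and COST * (y - y_) if y >= y_
To maximize profit, the network's predictions should be biased toward the side that earns more profit.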

'''
Predict yogurt sales; yogurt cost (COST) is 1 yuan and yogurt profit (PROFIT) is 99 yuan.
Under-predicting loses 99 yuan of profit, which is more than the 1 yuan of cost lost by over-predicting.
Under-prediction is costlier, so we want the learned prediction function to lean toward predicting more.
Custom loss function.
'''
import tensorflow as tf
import numpy as np

SEED = 23455
COST = 1
PROFIT = 99

rdm = np.random.RandomState(SEED)
x = rdm.rand(32, 2)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]  # noise: [0,1)/10 = [0,0.1); [0,0.1) - 0.05 = [-0.05,0.05)
x = tf.cast(x, dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))

epochs = 10000
lr = 0.002

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        # over-prediction (y > y_) costs COST per unit, under-prediction costs PROFIT per unit
        loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * COST, (y_ - y) * PROFIT))

    grads = tape.gradient(loss, w1)
    w1.assign_sub(lr * grads)

    if epoch % 500 == 0:
        print("After %d training steps, w1 is " % (epoch))
        print(w1.numpy(), "\n")
print("Final w1 is: ", w1.numpy())

# Custom loss function
# Yogurt cost is 1 yuan, yogurt profit is 99 yuan
# Cost is low and profit is high, so we prefer to over-predict: the learned coefficients end up greater than 1

Cross-entropy loss function
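Cross-entropy measures the distance between two probability distributions: H(y_, y) = -Σ y_ * ln(y), and a smaller value means the prediction y is closer to the label y_. The pure-Python sketch below compares two illustrative predictions for the one-hot label (1, 0); the function name is hypothetical.

```python
import math


def cross_entropy(y_true, y_pred):
    """H(y_, y) = -sum(y_ * ln(y)) for a single sample."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


label = [1, 0]
h1 = cross_entropy(label, [0.6, 0.4])
h2 = cross_entropy(label, [0.8, 0.2])
print(round(h1, 3), round(h2, 3))  # 0.511 0.223
```

Since H(label, (0.8, 0.2)) is smaller, the second prediction is judged closer to the label.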

Combining softmax with the cross-entropy loss
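For a classifier, the raw outputs (logits) are first passed through softmax to become probabilities, and cross-entropy is then computed against the one-hot labels. TensorFlow fuses both steps in tf.nn.softmax_cross_entropy_with_logits(labels, logits). Below is a NumPy sketch of the two-step version next to a fused log-sum-exp version; the label and logit values are illustrative, and for one-hot labels the identity CE = logsumexp(z) - Σ y_i z_i makes the two equal.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def softmax_cross_entropy(labels, logits):
    """Two steps: softmax, then per-sample cross-entropy."""
    return -np.sum(labels * np.log(softmax(logits)), axis=-1)


def softmax_cross_entropy_fused(labels, logits):
    """One step via log-sum-exp; valid because each one-hot label row sums to 1."""
    m = logits.max(axis=-1, keepdims=True)
    log_sum_exp = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))
    return log_sum_exp - np.sum(labels * logits, axis=-1)


labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
logits = np.array([[12, 3, 2], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]], dtype=float)

print(softmax_cross_entropy(labels, logits))
print(softmax_cross_entropy_fused(labels, logits))
```

The fused form never takes the log of a tiny probability, which is why libraries prefer it when logits are large.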

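7. Underfitting and overfitting

Underfitting means the model fails to capture the features of the data: it fits poorly even on the training data, so it cannot generalize either.
Overfitting means the model learns the training data too thoroughly, including the characteristics of the noise. As a result it fails to recognize new data correctly at test time (for example, it misclassifies it); its generalization ability is poor.
(Linked article: underfitting, good fit and overfitting)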

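Ways to address underfitting:
Add more input features
Increase the number of network parameters
Reduce the regularization parameter
Ways to address overfitting:
Clean the data
Enlarge the training set
Use regularization
Increase the regularization parameter

Using regularization to mitigate overfitting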

# Import the required modules
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the data/labels and build x_train, y_train
df = pd.read_csv('dot.csv')
x_data = np.array(df[['x1', 'x2']])
y_data = np.array(df['y_c'])

x_train = np.vstack(x_data).reshape(-1, 2)
y_train = np.vstack(y_data).reshape(-1, 1)

Y_c = [['red' if y else 'blue'] for y in y_train]
# Build a 2-D list that marks label 1 as 'red' and label 0 as 'blue'; used to plot the result below
# Cast the data, otherwise the matrix multiplication below fails because of mismatched dtypes
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)

# from_tensor_slices slices the first dimension of the input tensors, pairing each feature row with its label
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# Network parameters: 2 input neurons, 1 hidden layer with 11 neurons, 1 output neuron
# tf.Variable() makes the parameters trainable
w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
b1 = tf.Variable(tf.constant(0.01, shape=[11]))

w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
b2 = tf.Variable(tf.constant(0.01, shape=[1]))

lr = 0.01  # learning rate
epochs = 400  # number of training epochs

# Training
for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_db):
        with tf.GradientTape() as tape:  # record operations for gradient computation

            h1 = tf.matmul(x_batch, w1) + b1  # hidden layer: multiply-add
            h1 = tf.nn.relu(h1)
            y = tf.matmul(h1, w2) + b2

            # mean squared error loss: mse = mean((y_ - y)^2)
            loss = tf.reduce_mean(tf.square(y_batch - y))

        # gradients of the loss with respect to each parameter
        variables = [w1, b1, w2, b2]
        grads = tape.gradient(loss, variables)

        # gradient update, e.g. w1 = w1 - lr * w1_grad
        # tape.gradient returns grads in the order of [w1, b1, w2, b2], i.e. indices 0, 1, 2, 3
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])

    # print the loss every 20 epochs
    if epoch % 20 == 0:
        print('epoch:', epoch, 'loss:', float(loss))

# Prediction
print("*******predict*******")
# xx and yy both range over [-3, 3) with step 0.1, giving a grid of evenly spaced points
xx, yy = np.mgrid[-3:3:.1, -3:3:.1]  # xx and yy are both (60, 60) matrices
# flatten xx and yy, then pair them into a 2-D tensor of coordinate points
grid = np.c_[xx.ravel(), yy.ravel()]  # paired coordinates
grid = tf.cast(grid, tf.float32)
# feed the grid points into the network; probs holds the predicted values
probs = []
for x_test in grid:
    # predict with the trained parameters
    h1 = tf.matmul([x_test], w1) + b1
    h1 = tf.nn.relu(h1)
    y = tf.matmul(h1, w2) + b2  # y is the prediction
    probs.append(y)

# column 0 -> x1, column 1 -> x2
x1 = x_data[:, 0]
x2 = x_data[:, 1]
# reshape probs to the shape of xx
probs = np.array(probs).reshape(xx.shape)
plt.scatter(x1, x2, color=np.squeeze(Y_c))  # squeeze drops size-1 dimensions: [['red'], ['blue']] becomes ['red', 'blue']
# pass xx, yy and probs to contour and draw the level probs = 0.5; after plt.show() this is the boundary between red and blue points
plt.contour(xx, yy, probs, levels=[.5])
# contour draws level curves, like a contour map; probs must have the same shape as xx and yy
plt.show()

# Reads the red/blue points and draws the decision boundary, without regularization
# If any of the data is unclear, print it out to inspect

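[Figure: red/blue points and the decision boundary, without regularization]
The boundary shows signs of overfitting: the curve is not smooth.
Code with L2 regularization added: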

# Import the required modules
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the data/labels and build x_train, y_train
df = pd.read_csv('dot.csv')
x_data = np.array(df[['x1', 'x2']])
y_data = np.array(df['y_c'])

x_train = x_data
y_train = y_data.reshape(-1, 1)

Y_c = [['red' if y else 'blue'] for y in y_train]

# Cast the data, otherwise the matrix multiplication below fails because of mismatched dtypes
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)

# from_tensor_slices slices the first dimension of the input tensors, pairing each feature row with its label
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# Network parameters: 2 input neurons, 1 hidden layer with 11 neurons, 1 output neuron
# tf.Variable() makes the parameters trainable
w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
b1 = tf.Variable(tf.constant(0.01, shape=[11]))

w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
b2 = tf.Variable(tf.constant(0.01, shape=[1]))

lr = 0.01  # learning rate
epochs = 400  # number of training epochs

# Training
for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_db):
        with tf.GradientTape() as tape:  # record operations for gradient computation

            h1 = tf.matmul(x_batch, w1) + b1  # hidden layer: multiply-add
            h1 = tf.nn.relu(h1)
            y = tf.matmul(h1, w2) + b2

            # mean squared error loss: mse = mean((y_ - y)^2)
            loss_mse = tf.reduce_mean(tf.square(y_batch - y))
            # add L2 regularization
            loss_regularization = []
            # tf.nn.l2_loss(w) = sum(w ** 2) / 2
            loss_regularization.append(tf.nn.l2_loss(w1))  # L2 penalty on the weights w1
            loss_regularization.append(tf.nn.l2_loss(w2))  # L2 penalty on the weights w2
            # sum them up
            # e.g. x = tf.constant(([1, 1, 1], [1, 1, 1]))
            #      tf.reduce_sum(x)
            # >>> 6
            # loss_regularization = tf.reduce_sum(tf.stack(loss_regularization))
            loss_regularization = tf.reduce_sum(loss_regularization)  # sum
            loss = loss_mse + 0.03 * loss_regularization  # REGULARIZER = 0.03
            # total loss = MSE loss + L2 regularization loss of w1 and w2
        # gradients of the loss with respect to each parameter
        variables = [w1, b1, w2, b2]
        grads = tape.gradient(loss, variables)

        # gradient update, e.g. w1 = w1 - lr * w1_grad
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])

    # print the loss every 20 epochs
    if epoch % 20 == 0:
        print('epoch:', epoch, 'loss:', float(loss))

# Prediction
print("*******predict*******")
# xx and yy both range over [-3, 3) with step 0.1, giving a grid of evenly spaced points
xx, yy = np.mgrid[-3:3:.1, -3:3:.1]
# flatten xx and yy, then pair them into a 2-D tensor of coordinate points
grid = np.c_[xx.ravel(), yy.ravel()]
grid = tf.cast(grid, tf.float32)
# feed the grid points into the network; probs holds the outputs
probs = []
for x_predict in grid:
    # predict with the trained parameters
    h1 = tf.matmul([x_predict], w1) + b1
    h1 = tf.nn.relu(h1)
    y = tf.matmul(h1, w2) + b2  # y is the prediction
    probs.append(y)

# column 0 -> x1, column 1 -> x2
x1 = x_data[:, 0]
x2 = x_data[:, 1]
# reshape probs to the shape of xx
probs = np.array(probs).reshape(xx.shape)
plt.scatter(x1, x2, color=np.squeeze(Y_c))
# pass xx, yy and probs to contour and draw the level probs = 0.5; after plt.show() this is the boundary between red and blue points
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

# Reads the red/blue points and draws the decision boundary, with regularization
# If any of the data is unclear, print it out to inspect

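[Figure: red/blue points and the decision boundary, with L2 regularization]
The curve is now much smoother, and the overfitting is reduced.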
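The regularized loss above has the form loss = loss(y, y_) + REGULARIZER * loss(w). A NumPy sketch of the two usual penalty terms: L1 = Σ|w| tends to push many weights to exactly zero (sparse weights), while L2 = Σw² shrinks weights toward zero without zeroing them. Note that tf.nn.l2_loss returns Σw²/2, so the code above with 0.03 * Σ l2_loss equals 0.015 * Σw². The weight values and function names here are illustrative.

```python
import numpy as np


def l1_penalty(weights):
    return sum(np.sum(np.abs(w)) for w in weights)  # sum of absolute values


def l2_penalty(weights):
    return sum(np.sum(w ** 2) for w in weights)  # sum of squares (no 1/2 factor)


def regularized_loss(data_loss, weights, regularizer=0.03, kind="l2"):
    """Total loss = data loss + regularizer * penalty on the weights (biases excluded)."""
    penalty = l2_penalty(weights) if kind == "l2" else l1_penalty(weights)
    return data_loss + regularizer * penalty


weights = [np.array([[1.0, -2.0], [0.0, 3.0]]), np.array([[0.5]])]
print(l1_penalty(weights))             # 6.5
print(l2_penalty(weights))             # 14.25
print(regularized_loss(0.1, weights))  # 0.1 + 0.03 * 14.25
```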

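8. Optimizers

An optimizer uses the gradient of the loss to update the weights, with the aim of converging to the optimum faster.
SGD (without momentum): the commonly used gradient descent method.

In code: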
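A minimal NumPy sketch of the SGD update w_{t+1} = w_t - lr * g_t, where g_t is the gradient. In the first-/second-order momentum terms used below, plain SGD has first-order momentum m_t = g_t and second-order momentum V_t = 1. The toy loss sum((w + 1)^2) and the name `sgd_step` are illustrative assumptions; TensorFlow's built-in is tf.keras.optimizers.SGD.

```python
import numpy as np


def sgd_step(w, grad, lr=0.1):
    """One plain SGD update: w_{t+1} = w_t - lr * g_t."""
    return w - lr * grad


# Toy problem: loss = sum((w + 1)^2), gradient = 2 * (w + 1), optimum at w = -1
w = np.array([5.0, -3.0])
for _ in range(100):
    w = sgd_step(w, 2 * (w + 1))
print(w)  # very close to [-1, -1]
```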

SGDM (SGD with momentum): adds first-order momentum to SGD.
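A sketch of SGDM where the first-order momentum is an exponential moving average of past gradients: m_t = beta * m_{t-1} + (1 - beta) * g_t, then w_{t+1} = w_t - lr * m_t, with V_t = 1. This EMA form and beta = 0.9 are assumptions here; some libraries accumulate the velocity differently (Keras SGD with `momentum` uses v = momentum * v - lr * g, w = w + v). Same toy loss as the SGD sketch.

```python
import numpy as np


def sgdm_step(w, grad, m, lr=0.1, beta=0.9):
    """One SGDM update: m_t = beta * m_{t-1} + (1 - beta) * g_t; w_{t+1} = w_t - lr * m_t."""
    m = beta * m + (1 - beta) * grad
    return w - lr * m, m


w = np.array([5.0, -3.0])
m = np.zeros_like(w)  # first-order momentum starts at 0
for _ in range(500):
    w, m = sgdm_step(w, 2 * (w + 1), m)
print(w)
```

Because m averages recent gradients, the update direction changes smoothly, which damps oscillation across steep directions.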

Adagrad: adds second-order momentum to SGD.

In code:
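A sketch of Adagrad: m_t = g_t, and the second-order momentum V_t is the running sum of squared gradients, so w_{t+1} = w_t - lr * g_t / sqrt(V_t). Each parameter's effective step shrinks as its gradient history grows. The small `eps` in the denominator is an addition to avoid dividing by zero; lr = 1.0 and the toy loss are illustrative. TensorFlow's built-in is tf.keras.optimizers.Adagrad.

```python
import numpy as np


def adagrad_step(w, grad, v, lr=1.0, eps=1e-7):
    """One Adagrad update: V_t = sum of g^2 so far; w_{t+1} = w_t - lr * g_t / sqrt(V_t)."""
    v = v + grad ** 2
    return w - lr * grad / (np.sqrt(v) + eps), v


w = np.array([5.0, -3.0])
v = np.zeros_like(w)  # accumulated squared gradients, one entry per parameter
for _ in range(500):
    w, v = adagrad_step(w, 2 * (w + 1), v)
print(w)
```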

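RMSProp: also adds second-order momentum to SGD, computed as an exponential moving average of squared gradients rather than a running sum.

In code: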
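A sketch of RMSProp: m_t = g_t and V_t = beta * V_{t-1} + (1 - beta) * g_t^2, so w_{t+1} = w_t - lr * g_t / sqrt(V_t). Because V_t forgets old gradients, the step does not shrink forever as in Adagrad; with a fixed lr the parameters end up hovering within a distance on the order of lr around the optimum, which is why the check below is loose. beta = 0.9, lr = 0.01, `eps` and the toy loss are illustrative assumptions; TensorFlow's built-in is tf.keras.optimizers.RMSprop.

```python
import numpy as np


def rmsprop_step(w, grad, v, lr=0.01, beta=0.9, eps=1e-7):
    """One RMSProp update: V_t = beta * V_{t-1} + (1 - beta) * g^2; w_{t+1} = w_t - lr * g / sqrt(V_t)."""
    v = beta * v + (1 - beta) * grad ** 2
    return w - lr * grad / (np.sqrt(v) + eps), v


w = np.array([5.0, -3.0])
v = np.zeros_like(w)  # moving average of squared gradients
for _ in range(3000):
    w, v = rmsprop_step(w, 2 * (w + 1), v)
print(w)  # near [-1, -1], within roughly a few multiples of lr
```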