TensorFlow 2.0 Basics

我的小白兔奶糖

Last modified: 2024-01-06 21:48:39

Category: AI stock prediction. Tags: tensorflow, deep learning, artificial intelligence

First published: 2023-11-13 21:53:21

Article link: https://blog.csdn.net/c07290/article/details/134387590

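Model

Choosing and building a model

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),   # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')  # outputs a probability for binary classification
])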

Configuring the model

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['binary_accuracy'])

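optimizer: a string (the name of an optimizer) or an optimizer instance.
loss: the loss function. If the model has multiple outputs, you can use a different loss for each output by passing a dict or a list of losses; the value the model minimizes is then the sum of the individual losses.
metrics: the list of metrics the model evaluates during training and testing. To specify different metrics for different outputs of a multi-output model, pass a dict.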

Training the model

# The compiled loss is binary cross-entropy, so its validation value is logged as 'val_loss'
callbacks_list = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)]
simple_history = model.fit(x_train, y_train, 
        validation_data=(x_test, y_test),
        epochs=2048,
        batch_size=512,
        verbose=2,
        callbacks=callbacks_list)

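x: training inputs
y: training targets
validation_data: a tuple of inputs and targets (here the test set) on which the loss and any model metrics are evaluated at the end of each epoch; the model is not trained on this data
epochs: number of epochs, that is, full passes over the training data
batch_size: number of samples per gradient update
verbose: how much information is printed (0 = silent, 1 = progress bar, 2 = one line per epoch)
callbacks: list of callbacks invoked during training, for example at the end of each epoch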
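The stopping rule behind an early-stopping callback with a patience value can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the function name `early_stopping_epoch` and the loss values are hypothetical, and options such as `min_delta` or restoring the best weights are ignored.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored value has failed to improve
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)   # never triggered: all epochs run


# Hypothetical validation losses, one per epoch
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
print(early_stopping_epoch(losses, patience=3))  # 6
print(early_stopping_epoch(losses, patience=5))  # 7
```

With a small patience, training stops before the late improvement in epoch 7 is seen, which is one reason a generous patience is often combined with a large `epochs` value.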

Evaluating the model

model.evaluate(x_test, y_test, batch_size=1, verbose=2)

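Predicting with the model

import csv

predictions = model.predict(x_data)
with open('result.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for prediction in predictions:
        # writerow expects a sequence, so wrap the single predicted value in a list
        writer.writerow([prediction[0]])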

Core layers

Dense

A fully connected layer implements the operation
output = activation(dot(input, kernel) + bias)

tf.keras.layers.Dense(
    units,                                 # positive integer, dimensionality of the output space
    activation=None,                       # activation function; if not specified, none is applied
    use_bias=True,                         # boolean, whether the layer uses a bias vector
    kernel_initializer='glorot_uniform',   # initializer for the kernel weights matrix
    bias_initializer='zeros',              # initializer for the bias vector
    kernel_regularizer=None,               # regularizer function applied to the kernel weights matrix
    bias_regularizer=None,                 # regularizer function applied to the bias vector
    activity_regularizer=None,             # regularizer function applied to the output of the layer (its "activation")
    kernel_constraint=None,                # constraint function applied to the kernel weights matrix
    bias_constraint=None, **kwargs         # constraint function applied to the bias vector
)
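The operation a fully connected layer performs can be sketched in NumPy. This is a minimal illustration rather than how Keras implements the layer; the helper names `dense_forward` and `relu` and the weight values are made up for the example.

```python
import numpy as np


def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)


def dense_forward(inputs, kernel, bias, activation=None):
    """Compute activation(dot(inputs, kernel) + bias) for a batch."""
    z = inputs @ kernel + bias
    return z if activation is None else activation(z)


x = np.array([[1.0, -2.0, 3.0]])      # 1 sample with 3 features
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])       # shape (input_dim, units) = (3, 2)
bias = np.array([0.5, -2.0])          # shape (units,)

out = dense_forward(x, kernel, bias, activation=relu)
print(out)  # [[4.5 0. ]]
```

The kernel has shape (input_dim, units), so `units` alone determines the size of the output's last axis.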

Activation

keras.layers.Activation(activation)

An activation can be applied either through a separate Activation layer or through the activation argument when a layer is constructed:

from keras.layers import Activation, Dense

model.add(Dense(64))
model.add(Activation('tanh'))
# Equivalent to:
model.add(Dense(64, activation='tanh'))

Flatten

Flattens the input. Does not affect the batch size.

keras.layers.Flatten(data_format=None)

model = Sequential()
model.add(Conv2D(64, (3, 3),
                 input_shape=(32, 32, 3), padding='same',))
# Now: model.output_shape == (None, 32, 32, 64) with the default channels_last data format

model.add(Flatten())
# Now: model.output_shape == (None, 65536)
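The effect of flattening can be reproduced with a NumPy reshape. This is only a sketch of the shape arithmetic; the batch size of 8 is an arbitrary choice for the example.

```python
import numpy as np

# A batch of 8 feature maps with 32 * 32 * 64 values each
batch = np.zeros((8, 32, 32, 64))

# Keep the batch axis and merge all remaining axes into one
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (8, 65536)
```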

Dropout

Dropout randomly sets a fraction rate of the input units to 0 at each update during training, which helps prevent overfitting.

keras.layers.Dropout(rate, noise_shape=None, seed=None)

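rate: float between 0 and 1. Fraction of the input units to drop.
noise_shape: 1D integer tensor representing the shape of the binary dropout mask that will be multiplied with the input. For example, if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features).
seed: a Python integer to use as the random seed.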
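How rate and noise_shape interact can be sketched in NumPy. The function below is an illustrative stand-in written for this example, not the Keras layer: it applies "inverted" dropout, scaling the kept units by 1 / (1 - rate) so the expected sum stays unchanged, and it is meant to model training-time behaviour only.

```python
import numpy as np


def dropout(x, rate, noise_shape=None, seed=None):
    """Zero out roughly a `rate` fraction of units and rescale the rest."""
    rng = np.random.default_rng(seed)
    shape = x.shape if noise_shape is None else noise_shape
    keep_mask = rng.random(shape) >= rate   # True where a unit is kept
    # The mask broadcasts against x, so a size-1 axis shares one mask
    return x * keep_mask / (1.0 - rate)


x = np.ones((2, 4, 3))                      # (batch_size, timesteps, features)
y = dropout(x, rate=0.5, noise_shape=(2, 1, 3), seed=0)

# Every timestep of a sample sees the same dropped features
print(np.array_equal(y, np.broadcast_to(y[:, :1, :], y.shape)))  # True
```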

Embedding

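An Embedding layer is essentially a mapping as well. Instead of mapping each index to a one-hot encoding, it maps it to a vector of a specified dimension. That vector is a trainable variable whose best values are found during training.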

tf.keras.layers.Embedding(
    input_dim,
    output_dim,
    embeddings_initializer='uniform',
    embeddings_regularizer=None,
    activity_regularizer=None,
    embeddings_constraint=None,
    mask_zero=False,
    input_length=None,
    **kwargs
)
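The mapping an embedding performs can be sketched as a row lookup in a weight table. The NumPy example below uses made-up sizes and a random table in place of trained weights; the range (-0.05, 0.05) is an assumption chosen for illustration.

```python
import numpy as np

input_dim, output_dim = 10, 4               # vocabulary size, vector size
rng = np.random.default_rng(0)
embeddings = rng.uniform(-0.05, 0.05, size=(input_dim, output_dim))

token_ids = np.array([[1, 5, 5],
                      [0, 9, 2]])           # (batch, sequence_length)
vectors = embeddings[token_ids]             # one table row per integer id
print(vectors.shape)  # (2, 3, 4)

# The lookup gives the same result as one-hot encoding followed by a matmul
one_hot = np.eye(input_dim)[token_ids]      # (2, 3, 10)
print(np.allclose(one_hot @ embeddings, vectors))  # True
```

The lookup avoids materializing the one-hot matrix, which matters when input_dim is large.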

GlobalAveragePooling1D

Applies global average pooling to temporal data.
Input shape: 3D tensor of shape (samples, steps, features)
Output shape: 2D tensor of shape (samples, features)

tf.keras.layers.GlobalAveragePooling1D(
    data_format='channels_last', **kwargs
)
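With the channels_last layout, the pooling amounts to a mean over the steps axis. A small NumPy sketch with made-up input values (masking is not considered here):

```python
import numpy as np

# (samples, steps, features) = (2, 3, 4)
x = np.arange(24, dtype=float).reshape(2, 3, 4)

pooled = x.mean(axis=1)       # average over the steps axis
print(pooled.shape)           # (2, 4)
print(pooled)
# [[ 4.  5.  6.  7.]
#  [16. 17. 18. 19.]]
```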

Activation functions

relu

keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0)

With default values, it returns the element-wise max(x, 0).
Otherwise, it follows:
if x >= max_value: f(x) = max_value,
if threshold <= x < max_value: f(x) = x,
otherwise: f(x) = alpha * (x - threshold).
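The piecewise rule above can be written as a scalar Python function. This is a direct transcription of the three cases for illustration, not the Keras source:

```python
def relu(x, alpha=0.0, max_value=None, threshold=0.0):
    """Scalar version of the piecewise rule."""
    if max_value is not None and x >= max_value:
        return max_value                 # clipped at the top
    if x >= threshold:
        return x                         # identity region
    return alpha * (x - threshold)       # leaky region (0 when alpha is 0)


print(relu(2.5))                   # 2.5
print(relu(10.0, max_value=6.0))   # 6.0
print(relu(-2.0, alpha=0.1))       # -0.2
```

Setting max_value=6.0 gives the variant often called ReLU6, and a small positive alpha gives a leaky ReLU.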

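Advantage: compared with sigmoid and tanh, ReLU greatly accelerates the convergence of stochastic gradient descent, and it mitigates the vanishing-gradient problem to some extent.
Advantage: sigmoid and tanh neurons involve expensive operations such as exponentials, whereas ReLU can be computed by simply thresholding a matrix.
Disadvantage: ReLU outputs are not zero-centered.
Disadvantage: ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron can update the weights in such a way that the neuron never activates on any data point again. If this happens, the gradient flowing through the unit is zero from then on. In other words, the ReLU unit dies irreversibly during training, because the update has pushed it off the data manifold. For example, if the learning rate is set too high, you may find that as much as 40% of the neurons in the network are dead (they never activate across the entire training set). With a properly chosen learning rate this happens less often.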

elu

keras.activations.elu(x, alpha=1.0)

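alpha: a scalar, the slope of the negative section. For x > 0 the function returns x; for x <= 0 it returns alpha * (exp(x) - 1).
ELU has the following advantages:
1. It pushes the mean of the activations passed on to the next layer towards 0.
2. The negative part of the activation function can also be used (presumably meaning that with earlier activation functions, ReLU in particular, the negative part carries almost no information).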
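A scalar sketch of the standard ELU definition shows how negative inputs still produce a nonzero output that saturates at -alpha. The function is written for illustration and is not taken from the Keras source.

```python
import math


def elu(x, alpha=1.0):
    """x for x > 0, alpha * (exp(x) - 1) otherwise."""
    # expm1(x) computes exp(x) - 1 accurately for x close to 0
    return x if x > 0 else alpha * math.expm1(x)


for value in (-5.0, -1.0, 0.0, 2.0):
    print(value, elu(value))
```

Unlike ReLU, the output for negative inputs is not flat at 0, so the gradient there is nonzero.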

sigmoid

tf.keras.activations.sigmoid(x)

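Historically, the sigmoid function was very common because it has a nice interpretation as the firing rate of a neuron: from not firing at all (0) to fully saturated firing at an assumed maximum frequency (1). Today the sigmoid has fallen out of favor and is rarely used in practice, because it has three main drawbacks:
(1) The sigmoid saturates, which makes gradients vanish.
(2) The sigmoid's output is not zero-centered.
(3) Computing exp() is relatively expensive.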
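The saturation problem can be seen numerically from the derivative sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). A small sketch with helper names chosen for this example:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid, expressed through its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.6f}  grad={sigmoid_grad(x):.6f}")
```

The gradient peaks at 0.25 for x = 0 and shrinks quickly as |x| grows, so stacked sigmoid layers multiply many small factors during backpropagation.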

softmax

keras.activations.softmax(x, axis=-1)

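axis: integer, the axis along which softmax is applied.

For binary classification the sigmoid activation is commonly used, while multi-class classification calls for softmax. Its output follows two rules:
(1) every element lies in the interval (0, 1);
(2) all elements sum to 1.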
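Both rules can be checked with a short pure-Python softmax. This is a sketch for one vector of logits; subtracting the maximum is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Softmax over a list of numbers."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]   # shift to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))

# With two classes and the second logit fixed at 0, softmax reduces to sigmoid
two_class = softmax([1.5, 0.0])[0]
print(math.isclose(two_class, 1.0 / (1.0 + math.exp(-1.5))))  # True
```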

tanh

keras.activations.tanh(x)

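It squashes a real-valued number into the range [-1, 1]. Like the sigmoid neuron, it saturates, but unlike the sigmoid neuron its output is zero-centered. In practice the tanh nonlinearity is therefore preferred over the sigmoid nonlinearity. Note that the tanh neuron is simply a scaled sigmoid neuron: tanh(x) = 2 * sigmoid(2x) - 1.
Its shortcoming is that tanh still does not solve the vanishing-gradient problem.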
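The relationship between tanh and the sigmoid, and the fact that tanh also saturates, can be checked numerically. A small sketch using only the standard library:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    scaled_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0
    print(x, math.tanh(x), scaled_sigmoid, tanh_grad(x))
```

The second and third columns agree up to rounding, and the gradient in the last column falls towards 0 as |x| grows.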

Loss functions

BinaryCrossentropy

Binary cross-entropy is used for binary classification problems.

tf.keras.losses.BinaryCrossentropy(
    from_logits=False, label_smoothing=0.0, axis=-1,
    reduction=losses_utils.ReductionV2.AUTO, name='binary_crossentropy'
)

# Example 1: (batch_size = 1, number of samples = 4)
y_true = [0, 1, 0, 0]
y_pred = [-18.6, 0.51, 2.94, -12.8]
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce(y_true, y_pred).numpy()
# 0.865
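The value 0.865 can be reproduced by hand. The sketch below computes the mean binary cross-entropy directly from logits in plain Python, using a numerically stable rearrangement of -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(z). The function name is made up for this example.

```python
import math


def binary_crossentropy_from_logits(y_true, logits):
    """Mean binary cross-entropy, with predictions given as logits."""
    losses = []
    for y, z in zip(y_true, logits):
        # Stable form: max(z, 0) - z * y + log(1 + exp(-|z|))
        losses.append(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
    return sum(losses) / len(losses)


y_true = [0, 1, 0, 0]
y_pred = [-18.6, 0.51, 2.94, -12.8]
loss = binary_crossentropy_from_logits(y_true, y_pred)
print(round(loss, 3))  # 0.865
```

Most of the loss comes from the third sample, where a logit of 2.94 confidently predicts class 1 while the label is 0.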

SparseCategoricalCrossentropy

Computes the cross-entropy loss between the labels and the predictions.

tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False, reduction=losses_utils.ReductionV2.AUTO,
    name='sparse_categorical_crossentropy'
)

from_logits: whether y_pred is expected to be a logits tensor. By default, y_pred is assumed to encode a probability distribution.

y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
# Using 'auto'/'sum_over_batch_size' reduction type.
scce = tf.keras.losses.SparseCategoricalCrossentropy()
scce(y_true, y_pred).numpy()
# 1.177
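The value 1.177 follows from taking the negative log of the probability assigned to each true class and averaging. A plain-Python sketch with a function name made up for this example; it assumes y_pred already holds probabilities and omits the clipping a library would apply to avoid log(0).

```python
import math


def sparse_categorical_crossentropy(y_true, y_pred):
    """Mean of -log(probability assigned to the true class)."""
    losses = [-math.log(probs[label]) for label, probs in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


y_true = [1, 2]                                  # integer class indices
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]    # per-class probabilities
loss = sparse_categorical_crossentropy(y_true, y_pred)
print(round(loss, 3))  # 1.177
```

The first sample contributes -log(0.95), about 0.051, and the second contributes -log(0.1), about 2.303, so the mean is about 1.177. "Sparse" refers to the labels being integer indices rather than one-hot vectors.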

Optimizers

Adam

tf.keras.optimizers.Adam(
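    learning_rate=0.001,  # learning rate; default 0.001
    beta_1=0.9,  # exponential decay rate for the first-moment estimates; default 0.9
    beta_2=0.999,  # exponential decay rate for the second-moment estimates; default 0.999
    epsilon=1e-07,  # fuzz factor: a very small number that prevents division by zero; if None, falls back to K.epsilon(); default 1e-07
    amsgrad=False,  # boolean, whether to apply the AMSGrad variant of the algorithm; default False
    name='Adam',  # optional name for the operations created when applying gradients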
    **kwargs)

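The parameter learning_rate corresponds to the learning rate, or step size, α. The parameters beta_1 and beta_2 correspond to β1 and β2, the exponential decay rates of the moving averages of the gradient (first moment) and of the squared gradient (second moment); both moving averages are initialized to zero vectors. The parameter epsilon corresponds to ϵ.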
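One Adam update for a single scalar parameter can be sketched as follows. This follows the textbook form of the algorithm (moving averages, bias correction, scaled step); the exact placement of epsilon in a given library may differ slightly, and the function name and the toy objective f(θ) = θ² are assumptions made for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one Adam update; t is the 1-based step counter."""
    m = beta_1 * m + (1.0 - beta_1) * grad          # first-moment average
    v = beta_2 * v + (1.0 - beta_2) * grad * grad   # second-moment average
    m_hat = m / (1.0 - beta_1 ** t)                 # bias correction: m and v
    v_hat = v / (1.0 - beta_2 ** t)                 # start at 0
    theta = theta - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, m, v


# Minimize f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 1.0, 0.0, 0.0
history = []
for t in range(1, 11):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
    history.append(theta)

print(history[0])   # close to 0.999: the first step has size of about learning_rate
print(history[-1])
```

Because of the bias correction, the first step has a magnitude of roughly learning_rate regardless of how large the gradient is.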

Metrics

Accuracy

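Computes how often the predictions equal the true values.
For example, if y_true is [1, 2, 3, 4] and y_pred is [0, 2, 3, 4], the accuracy is 3/4, or 0.75. If the weights are specified as [1, 1, 0, 0], the accuracy is 1/2, or 0.5; a weight of 0 masks a value out.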
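The numbers in this example can be reproduced with a few lines of plain Python. The function name is chosen for the sketch and is not a Keras API.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of positions where prediction equals label."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    matches = [float(t == p) for t, p in zip(y_true, y_pred)]
    weighted_hits = sum(m * w for m, w in zip(matches, sample_weight))
    return weighted_hits / sum(sample_weight)


y_true = [1, 2, 3, 4]
y_pred = [0, 2, 3, 4]
print(weighted_accuracy(y_true, y_pred))                # 0.75
print(weighted_accuracy(y_true, y_pred, [1, 1, 0, 0]))  # 0.5
```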

Mean

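Computes the (weighted) mean of the given values.
For example, if the values are [1, 3, 5, 7], the mean is 4. If the weights are specified as [1, 1, 0, 0], the mean is 2.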
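The weighted mean in this example is the weighted sum of the values divided by the sum of the weights. A minimal sketch with a made-up function name:

```python
def weighted_mean(values, sample_weight=None):
    """Sum of value * weight divided by the sum of the weights."""
    if sample_weight is None:
        sample_weight = [1.0] * len(values)
    total = sum(x * w for x, w in zip(values, sample_weight))
    return total / sum(sample_weight)


print(weighted_mean([1, 3, 5, 7]))                # 4.0
print(weighted_mean([1, 3, 5, 7], [1, 1, 0, 0]))  # 2.0
```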

Preventing overfitting

Reduce the network capacity
Add weight regularization

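from tensorflow.keras import layers, regularizers

# FEATURES is the number of input features per sample and must be defined beforehand
l2_model = tf.keras.Sequential([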
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001),
                 input_shape=(FEATURES,)),
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1)
])

Add dropout

dropout_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(1)
])

Data augmentation (to be added)
Batch normalization (to be added)
tf.keras.layers.BatchNormalization

See https://tensorflow.google.cn/tutorials/keras/overfit_and_underfit?hl=zh-cn

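Hyperparameter tuning

Number and width of hidden layers

Adding more layers can raise training-set accuracy; each layer may learn a different level of abstraction.
Adding more units per layer can also raise training-set accuracy; this is usually combined with kernel_regularizer and Dropout to prevent overfitting.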

batch_size

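The batch size should be neither too large nor too small, so mini-batches are what is most commonly used in practice, typically with a size of a few dozen to a few hundred.
For second-order optimization algorithms, the convergence speed gained by shrinking the batch is far outweighed by the performance lost to the extra noise, so large batches are usually used with them. Batch sizes of several thousand, or even ten to twenty thousand, are often needed to get the best performance.
GPUs tend to perform better with batch sizes that are powers of 2, so sizes such as 16, 32, 64, 128, ... often run better than multiples of 10 or 100.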

epochs

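epochs is the number of passes over the training data. More epochs generally raise training-set accuracy, but they also increase the risk of overfitting: the model fits the details of the training data too closely, fails to form good abstractions, and test-set accuracy drops.

Automatic tuning (to be added)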

Saving and restoring models

Saving and restoring weights only

# Save the weights
model.save_weights('./checkpoints/my_checkpoint')

# Create a new model instance
model = create_model()

# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')

# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

Saving and restoring the entire model

# Create and train a new model instance.
model = create_model()
model.fit(train_images, train_labels, epochs=5)

# Save the entire model to a HDF5 file.
# The '.h5' extension indicates that the model should be saved to HDF5.
model.save('my_model.h5')

# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('my_model.h5')

# Show the model architecture
new_model.summary()

Saving the model during training (as checkpoints)

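import os

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

batch_size = 32

# In recent TensorFlow versions an integer save_freq is counted in batches,
# so convert 5 epochs into batches (ceiling division gives batches per epoch)
n_batches = -(-len(train_images) // batch_size)

# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, 
    verbose=1, 
    save_weights_only=True,
    save_freq=5*n_batches)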

# Create a new model instance
model = create_model()

# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images, 
          train_labels,
          epochs=50, 
          batch_size=batch_size, 
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)

See https://tensorflow.google.cn/tutorials/keras/save_and_load?hl=zh-cn