Deep Learning: A Collection of Tips for Improving Training Quality

Latest recommended article published on 2024-05-28 10:39:13

weixin_39978444

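Deep learning training often runs into problems that leave the trained model performing poorly. This article covers ways to improve the quality of network training.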

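Index:

Underfitting
Overfitting
How to detect overfitting
How to reduce overfitting
Gradient descent with momentum
Adaptive learning rate
Early stopping
Dropout
Stochastic gradient descent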

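Underfitting

Underfitting means the model's complexity is lower than the true complexity of the problem, so the model cannot express what is actually going on. If, no matter how long you train, training accuracy stays low, test accuracy stays low, and the loss will not come down, the model is probably underfitting. The remedy is a model with more capacity that can express more complex relationships, for example more layers or more nodes per layer.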

1.png

Increasing model capacity, as in the figure below, resolves underfitting. In practice, however, overfitting is the more common problem.

2.png
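
To make "model capacity" concrete, here is a minimal sketch that counts the trainable parameters of a fully connected network from its layer widths. The helper name count_dense_params and the layer sizes are illustrative assumptions, not something from the original article, and parameter count is only a rough proxy for capacity.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first,
    e.g. [784, 10] is a single Dense(10) layer on a 784-dim input.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


# Hypothetical small model versus a deeper, wider one on 28*28 inputs
small = count_dense_params([784, 10])
large = count_dense_params([784, 256, 128, 64, 32, 10])
print("small model parameters:", small)
print("large model parameters:", large)
```

If the small model underfits, moving toward the larger configuration is one way to add capacity. Whether it then overfits depends on how much data is available.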

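Overfitting (generalization performance)

Overfitting means the model's complexity is higher than the true complexity of the problem. The symptom is that training loss and training accuracy both look good, but test accuracy is poor.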

4.png

5.png
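
The two symptoms described so far can be written down as a simple check on training and validation accuracy. This is only a rough sketch: the function names and both thresholds (low_acc, max_gap) are hypothetical values chosen for illustration and would need tuning for a real task.

```python
def generalization_gap(train_acc, val_acc):
    """Difference between training and validation accuracy."""
    return train_acc - val_acc


def diagnose(train_acc, val_acc, low_acc=0.7, max_gap=0.1):
    """Very rough fit diagnosis; the thresholds are assumptions."""
    # Both accuracies low: the model cannot even fit the training data
    if train_acc < low_acc and val_acc < low_acc:
        return "possible underfitting"
    # Training looks good but validation lags far behind
    if generalization_gap(train_acc, val_acc) > max_gap:
        return "possible overfitting"
    return "no obvious problem"


print(diagnose(0.55, 0.53))
print(diagnose(0.99, 0.80))
print(diagnose(0.95, 0.93))
```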

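How to detect overfitting:

Use a validation set: split the dataset into Train, Validation and Test parts. Validation is used to select the model's parameters, and Test is used only for the final performance check.
Use K-fold: split the dataset into K parts, and each time take K-1 parts for training and the remaining part for validation, switching which part is the validation set from epoch to epoch. This way the network cannot simply memorize one fixed split. It gives the network some improvement, though not a large one. Keras provides a convenient shortcut for holding out validation data: network.fit(x, y, epochs=6, validation_split=0.1, validation_freq=2) splits the data 0.1 / 0.9 between validation and training. Note that validation_split is only supported for array or tensor inputs, not for a tf.data.Dataset such as db_train, and that it makes a single fixed split rather than rotating folds.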

import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential


def preprocess(x, y):
    """
    x is a single image, not a batch
    """
    x = tf.cast(x, dtype=tf.float32) / 255.
    x = tf.reshape(x, [28*28])
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)
    return x, y


batchsz = 128
(x, y), (x_test, y_test) = datasets.mnist.load_data()
print('datasets:', x.shape, y.shape, x.min(), x.max())

# Shuffle the 60,000 training indices, then use the first 50,000 for
# training and the last 10,000 for validation
idx = tf.range(60000)
idx = tf.random.shuffle(idx)
x_train, y_train = tf.gather(x, idx[:50000]), tf.gather(y, idx[:50000])
x_val, y_val = tf.gather(x, idx[-10000:]), tf.gather(y, idx[-10000:])
print(x_train.shape, y_train.shape, x_val.shape, y_val.shape)

# train
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.map(preprocess).shuffle(50000).batch(batchsz)

# validation
db_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
db_val = db_val.map(preprocess).shuffle(10000).batch(batchsz)

# test
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.map(preprocess).batch(batchsz)

sample = next(iter(db_train))
print(sample[0].shape, sample[1].shape)

network = Sequential([layers.Dense(256, activation='relu'),
                      layers.Dense(128, activation='relu'),
                      layers.Dense(64, activation='relu'),
                      layers.Dense(32, activation='relu'),
                      layers.Dense(10)])
network.build(input_shape=(None, 28*28))
network.summary()

# learning_rate replaces the deprecated lr argument
network.compile(optimizer=optimizers.Adam(learning_rate=0.01),
                loss=tf.losses.CategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

# Validate on db_val every 2 epochs
network.fit(db_train, epochs=6, validation_data=db_val, validation_freq=2)

# The test set is used only for the final performance check
print('Test performance:')
network.evaluate(db_test)

sample = next(iter(db_test))
x = sample[0]
y = sample[1]  # one-hot
pred = network.predict(x)  # [b, 10]
# convert back to number
y = tf.argmax(y, axis=1)
pred = tf.argmax(pred, axis=1)

print(pred)
print(y)
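
The listing above uses one fixed train/validation split. The K-fold scheme described earlier can be sketched in plain Python as follows. This is a minimal illustration, not the author's code: the helper name k_fold_indices, the seed and the interleaved way folds are formed are all assumptions.

```python
import random


def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists, one pair per fold."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one sample
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Each sample lands in the validation set exactly once across the k folds
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, k=5)):
    print("fold", fold, "train size", len(train_idx), "val", sorted(val_idx))
```

With the MNIST arrays from the listing, the index lists could be passed to tf.gather to build a fresh training and validation dataset for each fold.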

How to reduce overfitting

Principle: if something is not necessary, leave it out and choose the smallest model that does the job.

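Common approaches:

Provide more data
Reduce the model's complexity; dataset size and network size are relative to each other
Dropout
Data augmentation
Early stopping: use the validation set to stop training early
Regularization

Regularization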
6.png
With regularization, the network degenerates toward a lower-order, lower-complexity function, which reduces overfitting. It is a form of weight decay.
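
As a rough sketch of why regularization is described as weight decay, the plain-Python snippet below adds an L2 penalty to a loss and shows its effect on one gradient step. The function names, the coefficient lam and the learning rate are illustrative assumptions. The article does not specify which norm it uses, so L2 is assumed here.

```python
def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam=0.001):
    """Total loss = loss on the data + L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)


def sgd_step_with_l2(weights, grads, lr=0.1, lam=0.001):
    """One gradient step on the regularized loss.

    The gradient of lam * w**2 is 2 * lam * w, so every step also
    shrinks each weight toward zero. That shrinkage is the "decay".
    """
    return [w - lr * (g + 2 * lam * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
print(regularized_loss(0.3, weights, lam=0.5))
# Even with a zero data gradient, the weights move toward zero
print(sgd_step_with_l2(weights, [0.0, 0.0], lr=0.1, lam=0.5))
```

In Keras the same idea is usually expressed per layer, for example by passing kernel_regularizer=tf.keras.regularizers.l2(0.001) to layers.Dense. The coefficient 0.001 is only a placeholder value.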

The example that follows shows clearly how regularization reduces the network's expressive power and so prevents overfitting caused by noise.