- Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function.
- Using Adam optimization and early stopping, try training it on MNIST, but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You will need a softmax output layer with five neurons, and as always make sure to save checkpoints at regular intervals and save the final model so you can reuse it later.
- Tune the hyperparameters using cross-validation and see what precision you can achieve (a sketch of the search is given right after this list).
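The script below covers the first two points. For the third, here is a minimal sketch of one way to run the search, assuming a hypothetical scikit-learn-compatible wrapper class (called DNNClassifier here, not defined in this script) whose fit() builds and trains the same kind of graph with the given hyperparameters; RandomizedSearchCV then takes care of the cross-validation.

from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_neurons": [50, 100, 150, 200],
    "batch_size": [25, 50, 100, 500],
    "learning_rate": reciprocal(1e-4, 1e-2),  # log-uniform over plausible learning rates
}
# DNNClassifier is a hypothetical scikit-learn-compatible wrapper whose fit() builds
# and trains the network below with these hyperparameters.
rnd_search = RandomizedSearchCV(DNNClassifier(), param_distribs, n_iter=20,
                                cv=3, verbose=2, random_state=42)
rnd_search.fit(X_train, y_train)  # X_train/y_train: the digits-0-to-4 subset prepared below
print(rnd_search.best_params_)
print(rnd_search.best_score_)

The training script itself starts with the usual imports and data preparation: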
import tensorflow as tf
import numpy as np
from datetime import datetime
import os
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]
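# Keep only digits 0 to 4; digits 5 to 9 are reserved for the transfer learning exercise.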
X_train = X_train[y_train < 5]
y_train = y_train[y_train < 5]
X_valid = X_valid[y_valid < 5]
y_valid = y_valid[y_valid < 5]
X_test = X_test[y_test < 5]
y_test = y_test[y_test < 5]
he_init = tf.variance_scaling_initializer()
# Build a timestamped log directory name so each run gets its own TensorBoard logs.
def log_dir(prefix=""):
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "tf_logs"
    if prefix:
        prefix += "-"
    name = prefix + "run-" + now
    return "{}/{}/".format(root_logdir, name)
# Yield shuffled mini-batches of (X, y) for one epoch.
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
# Stack of fully connected hidden layers (five layers of 100 neurons, He init, ELU by default).
def dnn(inputs, layers=5, n_neurons=100, init_type=he_init, activation=tf.nn.elu):
    with tf.variable_scope("dnn"):
        for layer_num in range(layers):
            layer_name = "hidden{}".format(layer_num + 1)
            inputs = tf.layers.dense(inputs, n_neurons, kernel_initializer=init_type,
                                     activation=activation, name=layer_name)
    return inputs
logdir = log_dir("mnist_dnn")
n_inputs = 28*28
n_outputs = 5
learning_rate = 0.01
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")
with tf.name_scope("dnn"):
outputs = dnn(X)
logits = tf.layers.dense(outputs, n_outputs, name="logits")
with tf.name_scope("loss"):
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(xentropy, name="loss")
loss_summary = tf.summary.scalar('loss', loss)
with tf.name_scope("train"):
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
with tf.name_scope("eval"):
correct = tf.nn.in_top_k(logits, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
accuracy_summary = tf.summary.scalar('accuracy', accuracy)
n_epochs = 10000
batch_size = 50
m = X_train.shape[0]
n_batches = int(np.ceil(m / batch_size))
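# Early stopping bookkeeping: training will be interrupted once the validation loss
# has not improved for max_checks_without_progress consecutive checks.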
best_loss_val = np.infty
checks_since_last_progress = 0
max_checks_without_progress = 20
checkpoint_path = "/tmp/my_logreg_model.ckpt"
checkpoint_epoch_path = checkpoint_path + ".epoch"
final_model_path = "./my_logreg_model"
init = tf.global_variables_initializer()
saver = tf.train.Saver()
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
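# Execution phase: resume from the last checkpoint if one exists, otherwise start fresh.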
with tf.Session() as sess:
    if os.path.isfile(checkpoint_epoch_path):
        # If the checkpoint file exists, restore the model and load the epoch number.
        with open(checkpoint_epoch_path, "rb") as f:
            start_epoch = int(f.read())
        print("Training was interrupted. Continuing at epoch", start_epoch)
        saver.restore(sess, checkpoint_path)
    else:
        start_epoch = 0
        sess.run(init)

    for epoch in range(start_epoch, n_epochs):
        for batch_index, (X_batch, y_batch) in enumerate(shuffle_batch(X_train, y_train, batch_size)):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            if batch_index % 10 == 0:
                summary_str = loss_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index