I had read about overfitting many times but had never actually run into it in my own experiments. Recently I ran some experiments with TensorFlow on MNIST and finally figured out what it takes.
First I trained the simplest single-hidden-layer MLP with mini-batch SGD (no dropout, batch normalization, regularization, or other tricks). As expected, even after tens of thousands of iterations the model's accuracy on the test set showed no sign of dropping. The natural next step was to add layers (up to 5); even then, training with mini-batch SGD on the full training set still produced no overfitting. I then turned to another likely cause of overfitting, insufficient data: using only 1000 training samples and training with plain GD on them, overfitting finally appeared, as shown in the figure below.
In the figure, the orange curve is the baseline (full training set, mini-batch SGD) and the blue curve is the 1000-sample + GD model. Although the blue model's loss (cross entropy) on the training set keeps falling, its test-set loss rises again after an initial rapid drop. For the indirect metric, classification accuracy, both models climb and then plateau; the overfit model's plateau is lower than the baseline's, but no rise-then-fall pattern shows up there.
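The same train/test divergence can be reproduced in miniature outside TensorFlow. A minimal NumPy sketch (toy 1-D polynomial regression on made-up data, not the MNIST setup): with only 10 training points, training error falls monotonically as model capacity grows, while test error stops tracking it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny training set (10 points) invites overfitting; larger held-out test set.
x_train = rng.uniform(-1, 1, 10)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.1, 10)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(np.pi * x_test) + rng.normal(0, 0.1, 200)

def mse(w, x, y):
    # Mean squared error of the polynomial with coefficients w on (x, y).
    return np.mean((np.polyval(w, x) - y) ** 2)

errors = {}
for degree in (1, 3, 9):
    w = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    errors[degree] = (mse(w, x_train, y_train), mse(w, x_test, y_test))
    print(degree, errors[degree])
```

A degree-9 polynomial can interpolate all 10 noisy training points, driving training error toward zero while the gap to test error widens, which is the same picture the blue curve shows on MNIST.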
# -*- coding: utf-8 -*-
import time
start = time.perf_counter()  # time.clock() was removed in Python 3.8
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_DATA/", one_hot = True)
import tensorflow as tf
tf.reset_default_graph()
import random as rd
sess = tf.InteractiveSession()
log_dir = './logs/myMLP2'
in_units = 784
dataset_size = 1000
batch_size = 100
[h1_units, h2_units, h3_units, h4_units] = [300, 300, 300, 300]
W1 = tf.Variable(tf.truncated_normal(shape = [in_units, h1_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W1))
b1 = tf.Variable(tf.zeros(shape = [h1_units]))
W2 = tf.Variable(tf.truncated_normal(shape = [h1_units, h2_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W2))
b2 = tf.Variable(tf.zeros(shape = [h2_units]))
W3 = tf.Variable(tf.truncated_normal(shape = [h2_units, h3_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W3))
b3 = tf.Variable(tf.zeros(shape = [h3_units]))
W4 = tf.Variable(tf.truncated_normal(shape = [h3_units, h4_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W4))
b4 = tf.Variable(tf.zeros(shape = [h4_units]))
W5 = tf.Variable(tf.truncated_normal(shape = [h4_units, 10], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W5))
b5 = tf.Variable(tf.zeros(shape = [10]))
x = tf.placeholder(dtype = tf.float32, shape = [None, in_units])
keep_prob = tf.placeholder(dtype = tf.float32)
h1 = tf.nn.relu(tf.matmul(x,W1)+b1)
h2 = tf.nn.relu(tf.matmul(h1,W2)+b2)
h3 = tf.nn.relu(tf.matmul(h2,W3)+b3)
h4 = tf.nn.relu(tf.matmul(h3,W4)+b4)
#h1_drop = tf.nn.dropout(h1, keep_prob)
y = tf.nn.softmax(tf.matmul(h4, W5)+b5)
y_ =tf.placeholder(tf.float32, [None,10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)), axis=1))  # clip to avoid log(0) -> NaN
tf.add_to_collection('losses', cross_entropy)
total_loss = tf.add_n(tf.get_collection('losses'))
cross_entropy_train = tf.summary.scalar('cross_entropy_train', cross_entropy)
cross_entropy_test = tf.summary.scalar('cross_entropy_test', cross_entropy)
train_step = tf.train.AdagradOptimizer(0.01).minimize(total_loss)
if_correct = tf.equal(tf.argmax(y,1), tf.argmax(y_, 1))
acc = tf.reduce_mean(tf.cast(if_correct, tf.float32))
acc_test = tf.summary.scalar('acc_test', acc)
train_writer = tf.summary.FileWriter(log_dir, sess.graph)
saver = tf.train.Saver()
xs, ys = mnist.train.next_batch(dataset_size)  # fix a 1000-example subset once; all batches are drawn from it
tf.global_variables_initializer().run()
for i in range(30000):
    # Sample a mini-batch from the fixed 1000-example subset.
    index = rd.sample(range(0, dataset_size), batch_size)
    batch_xs = xs[index]
    batch_ys = ys[index]
    _, sum_cross_entropy_train = sess.run([train_step, cross_entropy_train], feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 1.})
    train_writer.add_summary(sum_cross_entropy_train, i)
    if i % 100 == 1:
        # sess.run returns results in the order the ops are listed.
        sum_acc_test, sum_cross_entropy_test = sess.run([acc_test, cross_entropy_test], feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
        train_writer.add_summary(sum_acc_test, i)
        train_writer.add_summary(sum_cross_entropy_test, i)
        # saver.save(sess, log_dir + '/model.ckpt', i)
train_writer.close()
print(acc.eval({x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
end = time.perf_counter()
print('Running time: %s Seconds'%(end-start))
#
#$ tensorboard --logdir=C:/TensorFlow/MNIST/logs/myMLP2
# then open localhost:6006 in a browser
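The script above keeps its L2-regularization lines commented out. The effect of such a penalty, which is one standard remedy for the overfitting seen here, can be sketched in isolation; the following is a NumPy toy (logistic regression with manual gradient descent on made-up data, not the MNIST model), showing that adding the L2 term to the gradient shrinks the learned weights.

```python
import numpy as np

rng = np.random.default_rng(1)
# 20 samples, 50 features with random labels: far more parameters than data,
# so the unregularized fit is free to grow large weights to memorize the labels.
X = rng.normal(size=(20, 50))
y = (rng.uniform(size=20) < 0.5).astype(float)

def train(l2, steps=500, lr=0.1):
    w = np.zeros(50)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # sigmoid predictions
        grad = X.T @ (p - y) / len(y) + l2 * w  # cross-entropy gradient + L2 penalty term
        w -= lr * grad
    return w

w_plain = train(l2=0.0)
w_reg = train(l2=0.1)
print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))
```

The same idea is what the commented-out `tf.contrib.layers.l2_regularizer(0.001)(W...)` lines would add to `total_loss` via the `'losses'` collection.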