I had read about overfitting many times but had never actually run into it in my own experiments. Recently I ran some experiments with TensorFlow on MNIST and finally figured out what it takes.
First I trained the simplest single-hidden-layer MLP with mini-batch SGD (no dropout, batch normalization, regularization, or other tricks). As expected, even after tens of thousands of iterations the model's accuracy on the test set showed no sign of dropping. The natural next step was to add layers (up to 5); even then, training with mini-batch SGD on the full training set still produced no overfitting. I then turned to another likely cause of overfitting, insufficient data: using only 1000 training samples and training with plain GD on them, overfitting finally appeared, as shown in the figure below.
In the figure, the orange curve is the baseline (full training set, mini-batch SGD) and the blue curve is the 1000-sample + GD model. Although the blue model's loss (cross entropy) on the training set keeps falling, its test-set loss rises again after an initial rapid drop. For the indirect metric, classification accuracy, both models climb and then plateau; the overfit model's plateau is lower than the baseline's, but no rise-then-fall pattern shows up there.
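The same train/test divergence can be reproduced in miniature outside TensorFlow. A minimal NumPy sketch (toy 1-D polynomial regression on made-up data, not the MNIST setup): with only 10 training points, training error falls monotonically as model capacity grows, while test error stops tracking it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny training set (10 points) invites overfitting; larger held-out test set.
x_train = rng.uniform(-1, 1, 10)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.1, 10)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(np.pi * x_test) + rng.normal(0, 0.1, 200)

def mse(w, x, y):
    # Mean squared error of the polynomial with coefficients w on (x, y).
    return np.mean((np.polyval(w, x) - y) ** 2)

errors = {}
for degree in (1, 3, 9):
    w = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    errors[degree] = (mse(w, x_train, y_train), mse(w, x_test, y_test))
    print(degree, errors[degree])
```

A degree-9 polynomial can interpolate all 10 noisy training points, driving training error toward zero while the gap to test error widens, which is the same picture the blue curve shows on MNIST.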
# -*- coding: utf-8 -*-
import time
start = time.perf_counter()  # time.clock() was removed in Python 3.8
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_DATA/", one_hot = True)
import tensorflow as tf
tf.reset_default_graph()
import random as rd
sess = tf.InteractiveSession()
log_dir = './logs/myMLP2'
in_units = 784
dataset_size = 1000
batch_size = 100
[h1_units, h2_units, h3_units, h4_units] = [300, 300, 300, 300]
W1 = tf.Variable(tf.truncated_normal(shape = [in_units, h1_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W1))
b1 = tf.Variable(tf.zeros(shape = [h1_units]))
W2 = tf.Variable(tf.truncated_normal(shape = [h1_units, h2_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W2))
b2 = tf.Variable(tf.zeros(shape = [h2_units]))
W3 = tf.Variable(tf.truncated_normal(shape = [h2_units, h3_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W3))
b3 = tf.Variable(tf.zeros(shape = [h3_units]))
W4 = tf.Variable(tf.truncated_normal(shape = [h3_units, h4_units], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W4))
b4 = tf.Variable(tf.zeros(shape = [h4_units]))
W5 = tf.Variable(tf.truncated_normal(shape = [h4_units, 10], mean = 0, stddev = 0.1))
#tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(0.001)(W5))
b5 = tf.Variable(tf.zeros(shape = [10]))
x = tf.placeholder(dtype = tf.float32, shape = [None, in_units])
keep_prob = tf.placeholder(dtype = tf.float32)
h1 = tf.nn.relu(tf.matmul(x,W1)+b1)
h2 = tf.nn.relu(tf.matmul(h1,W2)+b2)
h3 = tf.nn.relu(tf.matmul(h2,W3)+b3)
h4 = tf.nn.relu(tf.matmul(h3,W4)+b4)
#h1_drop = tf.nn.dropout(h1, keep_prob)
y = tf.nn.softmax(tf.matmul(h4, W5)+b5)
y_ =tf.placeholder(tf.float32, [None,10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)), axis=1))  # clip to avoid log(0) -> NaN
tf.add_to_collection('losses', cross_entropy)
total_loss = tf.add_n(tf.get_collection('losses'))
cross_entropy_train = tf.summary.scalar('cross_entropy_train', cross_entropy)
cross_entropy_test = tf.summary.scalar('cross_entropy_test', cross_entropy)
train_step = tf.train.AdagradOptimizer(0.01).minimize(total_loss)
if_correct = tf.equal(tf.argmax(y,1), tf.argmax(y_, 1))
acc = tf.reduce_mean(tf.cast(if_correct, tf.float32))
acc_test = tf.summary.scalar('acc_test', acc)
train_writer = tf.summary.FileWriter(log_dir, sess.graph)
saver = tf.train.Saver()
xs, ys = mnist.train.next_batch(dataset_size)  # fix a 1000-example subset once; all batches are drawn from it
tf.global_variables_initializer().run()
for i in range(30000):
    # Sample a mini-batch from the fixed 1000-example subset.
    index = rd.sample(range(0, dataset_size), batch_size)
    batch_xs = xs[index]
    batch_ys = ys[index]
    _, sum_cross_entropy_train = sess.run([train_step, cross_entropy_train], feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 1.})
    train_writer.add_summary(sum_cross_entropy_train, i)
    if i % 100 == 1:
        # sess.run returns results in the order the ops are listed.
        sum_acc_test, sum_cross_entropy_test = sess.run([acc_test, cross_entropy_test], feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
        train_writer.add_summary(sum_acc_test, i)
        train_writer.add_summary(sum_cross_entropy_test, i)
        # saver.save(sess, log_dir + '/model.ckpt', i)
train_writer.close()
print(acc.eval({x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
end = time.perf_counter()
print('Running time: %s Seconds'%(end-start))
#
#$ tensorboard --logdir=C:/TensorFlow/MNIST/logs/myMLP2
# then open localhost:6006 in a browser
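The script above keeps its L2-regularization lines commented out. The effect of such a penalty, which is one standard remedy for the overfitting seen here, can be sketched in isolation; the following is a NumPy toy (logistic regression with manual gradient descent on made-up data, not the MNIST model), showing that adding the L2 term to the gradient shrinks the learned weights.

```python
import numpy as np

rng = np.random.default_rng(1)
# 20 samples, 50 features with random labels: far more parameters than data,
# so the unregularized fit is free to grow large weights to memorize the labels.
X = rng.normal(size=(20, 50))
y = (rng.uniform(size=20) < 0.5).astype(float)

def train(l2, steps=500, lr=0.1):
    w = np.zeros(50)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # sigmoid predictions
        grad = X.T @ (p - y) / len(y) + l2 * w  # cross-entropy gradient + L2 penalty term
        w -= lr * grad
    return w

w_plain = train(l2=0.0)
w_reg = train(l2=0.1)
print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))
```

The same idea is what the commented-out `tf.contrib.layers.l2_regularizer(0.001)(W...)` lines would add to `total_loss` via the `'losses'` collection.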