1 dropout
Dropout is used to prevent overfitting. During training it randomly keeps only a fraction of the neurons active at each step, so the network cannot simply memorize the "noise" in the training data.
It is used as follows:
#at the beginning of the graph definition
keep_prob = tf.placeholder(tf.float32)
......
#feed keep_prob when running the session
#keep_prob: 0.8 means keeping 80% of the neurons active
with tf.Session() as sess:
    cost, _ = sess.run([loss, optimizer], feed_dict={x: x_data, y: y_data, keep_prob: 0.8})
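The placeholder itself does nothing until it is wired into the graph. A minimal sketch of that step (the names h1, w1 and b1 are just stand-ins for whatever layer you build): tf.nn.dropout zeroes each neuron with probability 1 - keep_prob and scales the survivors by 1/keep_prob.
# one hidden layer, followed by dropout driven by the keep_prob placeholder
h1 = tf.nn.sigmoid(tf.matmul(x, w1) + b1)
h1_drop = tf.nn.dropout(h1, keep_prob)  # at test time feed keep_prob: 1.0 to disable dropout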
2 exponential decay
It is used to schedule the learning rate. We can start with a relatively high rate and let it decay as training proceeds, following:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
learning_rate = tf.train.exponential_decay(0.07, global_steps, 1000, 0.96, staircase=False)
This call means: the initial learning rate is 0.07, and every 1000 steps it is multiplied by 0.96; with staircase=False the decay is applied smoothly at every step instead of in discrete jumps.
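To make the formula concrete, here is a small sketch in plain Python (values picked to match the call above) that prints the decayed rate at a few global steps:
# decayed rate = 0.07 * 0.96 ** (step / 1000)
initial_rate, decay_rate, decay_steps = 0.07, 0.96, 1000
for step in [0, 1000, 2000, 5000]:
    print(step, initial_rate * decay_rate ** (step / decay_steps))
# 0 -> 0.07, 1000 -> 0.0672, 2000 -> about 0.0645, 5000 -> about 0.0571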
3 different activation functions
Activation functions turn a linear mapping into a non-linear one, which lets stacked layers model more than a straight line.
Different projects may call for different functions; the most commonly used are listed below, with a short sketch after the list.
tf.nn.relu
tf.nn.sigmoid
tf.nn.tanh
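A quick sketch of how they differ (the tensor name z is arbitrary), feeding the same values through each of the three:
import tensorflow as tf
z = tf.constant([-2.0, 0.0, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(z)))     # [0. 0. 2.] - negative inputs are cut to zero
    print(sess.run(tf.nn.sigmoid(z)))  # about [0.12 0.5 0.88] - squashed into (0, 1)
    print(sess.run(tf.nn.tanh(z)))     # about [-0.96 0. 0.96] - squashed into (-1, 1)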
4 optimizers
There are several built-in optimizers in TensorFlow:
tf.train.GradientDescentOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdagradOptimizer
tf.train.AdagradDAOptimizer
tf.train.MomentumOptimizer
tf.train.AdamOptimizer
tf.train.FtrlOptimizer
tf.train.ProximalGradientDescentOptimizer
tf.train.ProximalAdagradOptimizer
tf.train.RMSPropOptimizer
A video I watched claimed that AdamOptimizer is the most generally useful optimizer in machine learning. I have not verified that conclusion, but in my experiments AdamOptimizer does converge faster than GradientDescentOptimizer.
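Switching between them is usually a one-line change, since every optimizer exposes the same minimize interface. A minimal sketch, reusing the loss and learning_rate defined in the code of section 5:
# plain gradient descent
# optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
# Adam, which converged faster in my runs
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)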
5 My code
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# load the MNIST dataset
mnist = input_data.read_data_sets('mnist_data', one_hot=True)
# three placeholders: input images, labels, and the dropout keep probability
x = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='x')
y = tf.placeholder(dtype=tf.float32, shape=[None, 10], name='y')
keep_prob = tf.placeholder(tf.float32)
# size of each batch
batch_size = 1000
# function: define w and b for one layer
def add_layer(input_data, input_num, output_num, activation_function=None):
    w = tf.Variable(initial_value=tf.random_normal(shape=[input_num, output_num]))
    b = tf.Variable(initial_value=tf.random_normal(shape=[1, output_num]))
    output = tf.add(tf.matmul(input_data, w), b)
    if activation_function:
        output_2 = activation_function(output)
    else:
        output_2 = output
    output_3 = tf.nn.dropout(output_2, keep_prob)
    return output_3
# function: build the hidden layer and the output layer
def build_nn(data):
    # one layer 784 -> 50, one layer 50 -> 10
    layer_1 = add_layer(data, 784, 50, tf.nn.sigmoid)
    output = add_layer(layer_1, 50, 10)
    return output
# function: train the network
def train_nn(data):
    output = build_nn(data)
    global_steps = tf.Variable(0, trainable=False)
    # exponentially decayed learning rate
    learning_rate = tf.train.exponential_decay(0.07, global_steps, mnist.train.num_examples / batch_size, 0.96,
                                               staircase=False)
    # cross-entropy loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
    # Adam optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_steps)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # train for 20 epochs
        for i in range(20):
            epoch_const = 0
            for _ in range(int(mnist.train.num_examples / batch_size)):
                x_data, y_data = mnist.train.next_batch(batch_size)
                # feed the batch; keep 80% of the neurons active
                cost, _ = sess.run([loss, optimizer], feed_dict={x: x_data, y: y_data, keep_prob: 0.8})
                epoch_const += cost
            # compute accuracy
            accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(output, 1)), tf.float32))
            # accuracy on the training samples
            acc_1 = sess.run(accuracy, feed_dict={x: mnist.train.images, y: mnist.train.labels, keep_prob: 1})
            # accuracy on the test samples, to check for overfitting
            acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1})
            print('Epoch', i + 1, ':', epoch_const, "accuracy1:", acc_1, "accuracy:", acc)
train_nn(x)
The result is:
Epoch 1 : 88.3080345392 accuracy1: 0.881782 accuracy: 0.8842
…
…
Epoch 20 : 21.7765403092 accuracy1: 0.970291 accuracy: 0.9521