4-2 Using the quadratic cost function and the cross-entropy function to improve accuracy on the MNIST dataset

ZQSZXY

Category: tensorflow. Tags: artificial intelligence, TensorFlow, python

Article link: https://blog.csdn.net/ZHUQIUSHI123/article/details/83655915

The theory in this post is used in the programming exercise at the bottom; you can skip straight to that exercise if you prefer.

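Quadratic cost function (quadratic cost)

Expression:

$$C = \frac{1}{2n} \sum_x \lVert y(x) - a(x) \rVert^2$$

Here C is the cost function, x is a sample, y is the actual (target) value, a is the output value, and n is the total number of samples.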
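
As a rough illustration of the definition above, the quadratic cost can be computed by hand for a couple of samples. This is a minimal sketch in plain Python; the function name `quadratic_cost` and the sample values are made up for the example and are not part of the original post.

```python
def quadratic_cost(targets, outputs):
    """Quadratic cost C = 1/(2n) * sum over samples of ||y - a||^2."""
    n = len(targets)
    total = 0.0
    for y, a in zip(targets, outputs):
        # Squared distance between the target vector and the output vector
        total += sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    return total / (2 * n)

# Two hypothetical samples, each with two output units
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.2], [0.4, 0.6]]
cost = quadratic_cost(targets, outputs)
print(cost)
```

The squared errors are 0.08 and 0.32, so the cost is 0.4 / 4, about 0.1 up to floating-point rounding.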

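For simplicity, take a single sample as an example. The quadratic cost is then:

$$C = \frac{(y - a)^2}{2}$$

Suppose we use gradient descent to adjust the weights. With $a = \sigma(z)$ and $z = wx + b$, the gradients with respect to the weight w and the bias b are:

$$\frac{\partial C}{\partial w} = (a - y)\,\sigma'(z)\,x, \qquad \frac{\partial C}{\partial b} = (a - y)\,\sigma'(z)$$

Here z is the neuron's input and σ is the activation function. The gradients of w and b are proportional to the gradient of the activation function: the larger the activation's gradient, the faster w and b are adjusted, and the faster training converges.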
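
The gradient expressions for a single neuron can be checked numerically. The sketch below assumes a sigmoid activation and a single scalar input with z = w*x + b; the helper names and the values of w, b, x, and y are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost(w, b, x, y):
    """Single-sample quadratic cost C = (y - a)^2 / 2 with a = sigmoid(w*x + b)."""
    return 0.5 * (y - sigmoid(w * x + b)) ** 2

def quadratic_grads(w, b, x, y):
    """Analytic gradients: dC/dw = (a - y) * sigma'(z) * x, dC/db = (a - y) * sigma'(z)."""
    z = w * x + b
    delta = (sigmoid(z) - y) * sigmoid_prime(z)
    return delta * x, delta

# Hypothetical values for illustration only
w, b, x, y = 0.6, 0.9, 1.0, 0.0
grad_w, grad_b = quadratic_grads(w, b, x, y)

# Central finite-difference estimate of dC/dw for comparison
eps = 1e-6
numeric_grad_w = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
print(grad_w, numeric_grad_w)
```

The two printed numbers should agree to several decimal places, which confirms that the factor σ'(z) really does appear in the gradient.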
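Suppose the activation function is the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

[Figure: the sigmoid curve with two marked points, A and B]

Comparing points A and B in the figure, the gradient at A is larger than the gradient at B, so the weights are adjusted faster at A than at B.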
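
To get a feel for how much the sigmoid's slope varies, it can be evaluated at a few inputs. The figure is not reproduced here, so the z values below are assumptions rather than the actual positions of A and B; the sketch only shows that the slope is largest near z = 0 and shrinks quickly in the flat tails.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical sample points: one at the centre of the curve, two further out
for z in (0.0, 2.0, 5.0):
    print(z, sigmoid_prime(z))

slope_near_center = sigmoid_prime(0.0)
slope_in_tail = sigmoid_prime(5.0)
```

A neuron whose input sits in the tail therefore learns far more slowly under the quadratic cost than one near the centre, even if its error is large.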

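Cross-entropy cost function (cross-entropy)

Taking a different approach, we keep the activation function and change the cost function instead, switching to the cross-entropy cost:

$$C = -\frac{1}{n} \sum_x \left[ y \ln a + (1 - y) \ln(1 - a) \right]$$

Here C is the cost function, x is a sample, y is the actual (target) value, a is the output value, and n is the total number of samples.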
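
A small numeric example may help make the cross-entropy cost concrete. This is a sketch in plain Python for scalar outputs in (0, 1); the function name `cross_entropy_cost` and the sample values are illustrative assumptions.

```python
import math

def cross_entropy_cost(targets, outputs):
    """C = -1/n * sum over samples of [y ln a + (1 - y) ln(1 - a)]."""
    n = len(targets)
    total = 0.0
    for y, a in zip(targets, outputs):
        total += y * math.log(a) + (1.0 - y) * math.log(1.0 - a)
    return -total / n

# Hypothetical outputs: first close to the targets, then far from them
good = cross_entropy_cost([1.0, 0.0], [0.9, 0.1])
bad = cross_entropy_cost([1.0, 0.0], [0.1, 0.9])
print(good, bad)
```

The cost is small when the outputs are close to the targets and grows sharply as they move toward the wrong end of the interval.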

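The partial derivatives of the cost with respect to w and b are as follows, where $a = \sigma(z)$:

$$\frac{\partial C}{\partial w_j} = -\frac{1}{n} \sum_x \left( \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right) \sigma'(z)\, x_j$$

$$\frac{\partial C}{\partial b} = -\frac{1}{n} \sum_x \left( \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right) \sigma'(z)$$

Combining terms and rewriting gives:

$$\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j \left( \sigma(z) - y \right), \qquad \frac{\partial C}{\partial b} = \frac{1}{n} \sum_x \left( \sigma(z) - y \right)$$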
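
The merging step relies on the sigmoid identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. As a sketch of the intermediate algebra, assuming a sigmoid output $a = \sigma(z)$:

```latex
\begin{aligned}
\frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)}
  &= \frac{y\,(1 - \sigma(z)) - (1 - y)\,\sigma(z)}{\sigma(z)\,(1 - \sigma(z))}
   = \frac{y - \sigma(z)}{\sigma(z)\,(1 - \sigma(z))} \\[4pt]
\left( \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right) \sigma'(z)
  &= \frac{y - \sigma(z)}{\sigma(z)\,(1 - \sigma(z))} \cdot \sigma(z)\,(1 - \sigma(z))
   = y - \sigma(z) \\[4pt]
\frac{\partial C}{\partial w_j}
  &= -\frac{1}{n} \sum_x \left( y - \sigma(z) \right) x_j
   = \frac{1}{n} \sum_x x_j \left( \sigma(z) - y \right)
\end{aligned}
```

The factor $\sigma'(z)$ cancels exactly, which is why it no longer appears in the final gradient.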

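Conclusions:
1. The adjustment of the weights and biases does not depend on the derivative of the activation function. In addition, the term $\sigma(z) - y$ in the gradient formulas is the error between the output value and the actual value. So the larger the error, the larger the gradient, the faster w and b are adjusted, and the faster training proceeds.
2. If the output neurons are linear, the quadratic cost is a suitable choice. If the output neurons use an S-shaped (sigmoid) function, the cross-entropy cost is the better fit.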
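
The difference between the two costs is easiest to see on a saturated neuron that is badly wrong. The sketch below assumes a single sigmoid neuron with one scalar input; the values z = 5, x = 1, y = 0 are hypothetical and chosen so that the output is close to 1 while the target is 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def quadratic_grad_w(z, x, y):
    """dC/dw for the quadratic cost: (a - y) * sigma'(z) * x."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a) * x

def cross_entropy_grad_w(z, x, y):
    """dC/dw for the cross-entropy cost: (a - y) * x, with no sigma'(z) factor."""
    return (sigmoid(z) - y) * x

# Hypothetical saturated neuron: output near 1, target 0
z, x, y = 5.0, 1.0, 0.0
quad = quadratic_grad_w(z, x, y)
xent = cross_entropy_grad_w(z, x, y)
print(quad, xent)
```

With these values the cross-entropy gradient is more than a hundred times larger than the quadratic one, so the badly wrong neuron is corrected much faster.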

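Log-likelihood cost function (log-likelihood cost)

The log-likelihood function is commonly used as the cost function for softmax regression. If the output-layer neurons use the sigmoid function, the cross-entropy cost can be used. The more common practice in deep learning, however, is to use softmax as the final layer, in which case the usual cost function is the log-likelihood cost.
The combination of the log-likelihood cost with softmax is very similar to the combination of cross-entropy with sigmoid. For binary classification, the log-likelihood cost simplifies to the form of the cross-entropy cost.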
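
The binary-classification claim can be checked numerically. The sketch below assumes a two-class softmax whose second logit is fixed at 0, so the first class probability equals sigmoid(z); all function names and the logit value are illustrative, not from the original post.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_cost(probs, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_class])

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, a):
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

z = 1.3                      # hypothetical logit
probs = softmax([z, 0.0])    # class 0 has logit z, class 1 has logit 0
a = sigmoid(z)

ll = log_likelihood_cost(probs, 0)   # true class is class 0
ce = binary_cross_entropy(1.0, a)    # same sample expressed with y = 1
print(ll, ce)
```

Both costs come out the same, because the softmax probability of class 0 is exactly the sigmoid of z in this setup.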
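In TensorFlow:
tf.nn.sigmoid_cross_entropy_with_logits() is the cross-entropy used together with sigmoid, and tf.nn.softmax_cross_entropy_with_logits() is the cross-entropy used together with softmax. Both take the raw pre-activation values (logits) and apply the sigmoid or softmax internally.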
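
Because these ops expect raw logits, feeding them values that have already been through softmax changes the loss. The sketch below is a hand-written plain-Python stand-in, not the TensorFlow implementation; `softmax_xent_from_logits` is a hypothetical name and the logits are made up.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_xent_from_logits(labels, logits):
    """-sum_k labels[k] * log(softmax(logits)[k]), computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(y * (v - log_sum_exp) for y, v in zip(labels, logits))

labels = [0.0, 1.0, 0.0]   # one-hot label, true class is index 1
logits = [2.0, -1.0, 0.5]  # hypothetical raw scores

loss_from_logits = softmax_xent_from_logits(labels, logits)
# Passing probabilities instead of logits applies softmax twice
loss_from_probs = softmax_xent_from_logits(labels, softmax(logits))
print(loss_from_logits, loss_from_probs)
```

The second value is noticeably smaller: applying softmax twice flattens the distribution and weakens the training signal, so the pre-softmax values should be passed as `logits`.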

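Fitting

Fitting in regression problems falls into three cases: underfitting, a good fit, and overfitting.
The same cases arise when fitting classification problems.
Ways to prevent overfitting:
1. Enlarge the dataset
  A larger dataset makes it easier to train a model that fits the problem.
2. Regularization
  Regularization adds a regularization term to the original cost function. With the common L2 form, where $C_0$ is the original cost, w ranges over the weights, n is the number of samples, and λ sets the strength of the penalty:
  $$C = C_0 + \frac{\lambda}{2n} \sum_w w^2$$
3. Dropout: have the hidden-layer neurons take turns working
  The hidden-layer neurons work in rotation, and each iteration uses only a subset of them, which is also effective at preventing overfitting.
  In TensorFlow this is controlled with tf.nn.dropout().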
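
The last two methods in the list can be sketched in a few lines of plain Python. This is an illustration only: the L2 form with coefficient `lam / (2 * n)` is an assumed convention, and `dropout` below imitates the behaviour documented for tf.nn.dropout with `keep_prob` (kept units are scaled by 1/keep_prob, dropped units become 0) rather than reproducing its implementation.

```python
import random

def l2_regularized_cost(base_cost, weights, lam, n):
    """Original cost plus an L2 penalty: C = C0 + lam / (2n) * sum(w^2)."""
    return base_cost + lam / (2 * n) * sum(w * w for w in weights)

def dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob and scale kept units by 1/keep_prob."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

# Hypothetical numbers for the penalty
penalized = l2_regularized_cost(0.5, [1.0, -2.0], lam=0.1, n=10)

# Apply dropout to a large layer of ones with a seeded generator
rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, 0.7, rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(penalized, kept_fraction, mean_after)
```

Roughly 70% of the units survive, and the rescaling keeps the mean activation close to its original value, which is why no extra scaling is needed when keep_prob is set to 1.0 at evaluation time.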

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the dataset
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Size of each batch
batch_size = 100
# Number of batches per epoch
n_batch = mnist.train.num_examples // batch_size

# Define the placeholders: input images, one-hot labels, and dropout keep probability
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)

# Build a simple neural network: three tanh hidden layers with dropout, then a 10-way output layer
W1 = tf.Variable(tf.truncated_normal([784, 2000], stddev=0.1))
b1 = tf.Variable(tf.zeros([2000]) + 0.1)
L1 = tf.nn.tanh(tf.matmul(x, W1) + b1)
L1_drop = tf.nn.dropout(L1, keep_prob)

W2 = tf.Variable(tf.truncated_normal([2000, 2000], stddev=0.1))
b2 = tf.Variable(tf.zeros([2000]) + 0.1)
L2 = tf.nn.tanh(tf.matmul(L1_drop, W2) + b2)
L2_drop = tf.nn.dropout(L2, keep_prob)

W3 = tf.Variable(tf.truncated_normal([2000, 1000], stddev=0.1))
b3 = tf.Variable(tf.zeros([1000]) + 0.1)
L3 = tf.nn.tanh(tf.matmul(L2_drop, W3) + b3)
L3_drop = tf.nn.dropout(L3, keep_prob)

W4 = tf.Variable(tf.truncated_normal([1000, 10], stddev=0.1))
b4 = tf.Variable(tf.zeros([10]) + 0.1)
# Keep the raw logits separate from the softmax output
logits = tf.matmul(L3_drop, W4) + b4
prediction = tf.nn.softmax(logits)

# Quadratic cost function
# loss = tf.reduce_mean(tf.square(y - prediction))
# Cross-entropy cost: this op applies softmax internally, so pass the raw logits
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
# Use gradient descent
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

# Initialize the variables
init = tf.global_variables_initializer()

# The results are stored in a list of booleans
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))  # argmax returns the index of the largest value along axis 1
# Compute the accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(31):
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.7})

        test_acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0})
        train_acc = sess.run(accuracy, feed_dict={x: mnist.train.images, y: mnist.train.labels, keep_prob: 1.0})
        print("Iter " + str(epoch) + ",Testing Accuracy " + str(test_acc) + ",Training Accuracy " + str(train_acc))

Output:

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
Iter 0,Testing Accuracy 0.9117,Training Accuracy 0.903655
Iter 1,Testing Accuracy 0.9243,Training Accuracy 0.923709
Iter 2,Testing Accuracy 0.936,Training Accuracy 0.933528
Iter 3,Testing Accuracy 0.9389,Training Accuracy 0.939546
Iter 4,Testing Accuracy 0.9419,Training Accuracy 0.942328
Iter 5,Testing Accuracy 0.9466,Training Accuracy 0.948309
Iter 6,Testing Accuracy 0.9487,Training Accuracy 0.951873
Iter 7,Testing Accuracy 0.9514,Training Accuracy 0.953437
Iter 8,Testing Accuracy 0.9525,Training Accuracy 0.956291
Iter 9,Testing Accuracy 0.9533,Training Accuracy 0.957327
Iter 10,Testing Accuracy 0.9558,Training Accuracy 0.958237
Iter 11,Testing Accuracy 0.9571,Training Accuracy 0.961128
Iter 12,Testing Accuracy 0.9576,Training Accuracy 0.963473
Iter 13,Testing Accuracy 0.9595,Training Accuracy 0.964037
Iter 14,Testing Accuracy 0.9604,Training Accuracy 0.965255
Iter 15,Testing Accuracy 0.9619,Training Accuracy 0.966
Iter 16,Testing Accuracy 0.9621,Training Accuracy 0.966928
Iter 17,Testing Accuracy 0.9634,Training Accuracy 0.968091
Iter 18,Testing Accuracy 0.9643,Training Accuracy 0.968691
Iter 19,Testing Accuracy 0.9659,Training Accuracy 0.970237
Iter 20,Testing Accuracy 0.9656,Training Accuracy 0.971128
Iter 21,Testing Accuracy 0.9653,Training Accuracy 0.971218
Iter 22,Testing Accuracy 0.9667,Training Accuracy 0.971873
Iter 23,Testing Accuracy 0.9676,Training Accuracy 0.973655
Iter 24,Testing Accuracy 0.9675,Training Accuracy 0.973546
Iter 25,Testing Accuracy 0.9695,Training Accuracy 0.974637
Iter 26,Testing Accuracy 0.97,Training Accuracy 0.975618
Iter 27,Testing Accuracy 0.9688,Training Accuracy 0.9752
Iter 28,Testing Accuracy 0.969,Training Accuracy 0.976928
Iter 29,Testing Accuracy 0.9695,Training Accuracy 0.976491
Iter 30,Testing Accuracy 0.971,Training Accuracy 0.977219
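
The accuracy values in the log come from comparing the argmax of each one-hot label with the argmax of the corresponding prediction and averaging the matches. A minimal plain-Python sketch of that calculation, with made-up labels and predictions and hypothetical helper names:

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(labels, predictions):
    """Fraction of samples whose predicted class matches the labelled class."""
    correct = [argmax(y) == argmax(p) for y, p in zip(labels, predictions)]
    return sum(correct) / len(correct)

# Four hypothetical samples with three classes each
labels = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
predictions = [
    [0.1, 0.2, 0.7],  # correct
    [0.6, 0.3, 0.1],  # correct
    [0.5, 0.4, 0.1],  # wrong: predicts class 0, label is class 1
    [0.2, 0.2, 0.6],  # correct
]
acc = accuracy(labels, predictions)
print(acc)
```

Three of the four predictions match, so the accuracy is 0.75.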