TensorFlow neural networks

Training and optimizing a neural network model

Using the MNIST handwritten-digit dataset

# Load libraries
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Set hyperparameters

INPUT_NODE = 784  # input nodes
OUTPUT_NODE = 10  # output nodes
LAYER1_NODE = 500  # hidden-layer nodes

BATCH_SIZE = 100
LEARNING_RATE = 0.8  # base learning rate (decayed during training)
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate
REGULARIZATION_RATE = 0.0001  # regularization coefficient
TRAINING_STEPS = 30000
MOVING_AVERAGE_DECAY = 0.99  # moving-average decay rate; controls how fast the averaged model updates, making it more robust on test data
# Build the forward network
def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    """
    input_tensor: input tensor
    avg_class: moving-average class applied to the parameters, or None to skip it
    We build only one hidden layer here; the point is to practice
    learning-rate decay and the other techniques introduced below.
    """
    if avg_class is None:
        # No moving average
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        return tf.matmul(layer1, weights2) + biases2
    else:
        # Apply the moving average to the parameters:
        # fetch each shadow value with the average() method
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)
# Build the training graph
def train(mnist):
    x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
    y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')

    weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
    biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))

    weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
    biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

    # Forward pass without the moving average (avg_class=None)
    y = inference(x, None, weights1, biases1, weights2, biases2)

    # Variable that stores the number of training steps taken
    global_step = tf.Variable(0, trainable=False)


    # Define the moving-average class; given global_step, the effective decay
    # rate changes after each update, which speeds up early training
    variables_average = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    # Apply the moving average to all trainable variables
    variables_average_op = variables_average.apply(tf.trainable_variables())  # tf.trainable_variables() returns all trainable variables

    # Forward pass again, this time through the moving averages
    average_y = inference(x, variables_average, weights1, biases1, weights2, biases2)

    # Compute the cross-entropy
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y)

    cross_entropy_mean = tf.reduce_mean(cross_entropy)

    # L2 regularization
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)  # returns a regularizer function

    # Total loss: cross-entropy plus the L2 penalties on the weights
    loss = cross_entropy_mean + regularizer(weights1) + regularizer(weights2)

    # Exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(LEARNING_RATE, global_step, mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DECAY)

    # Training op
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Group both ops so one run updates the weights and the shadow variables
    train_op = tf.group(train_step, variables_average_op)

    # Compute accuracy
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Create the session
    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)

        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}

        test_feed = {x: mnist.test.images, y_: mnist.test.labels}

        for i in range(TRAINING_STEPS):
            if i % 1000 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print("After %d training steps, validation accuracy using average model is %g" % (i, validate_acc))
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            sess.run(train_op, feed_dict={x:xs, y_:ys})
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d training steps, test accuracy using average model is %g" % (TRAINING_STEPS, test_acc))
mnist = input_data.read_data_sets('../MNIST_data', one_hot=True)
train(mnist)
Extracting ../MNIST_data\train-images-idx3-ubyte.gz
Extracting ../MNIST_data\train-labels-idx1-ubyte.gz
Extracting ../MNIST_data\t10k-images-idx3-ubyte.gz
Extracting ../MNIST_data\t10k-labels-idx1-ubyte.gz
After 0 training steps, validation accuracy using average model is 0.1148
After 1000 training steps, validation accuracy using average model is 0.9768
After 2000 training steps, validation accuracy using average model is 0.98
After 3000 training steps, validation accuracy using average model is 0.9832
After 4000 training steps, validation accuracy using average model is 0.9826
After 5000 training steps, validation accuracy using average model is 0.9836
After 6000 training steps, validation accuracy using average model is 0.9836
After 7000 training steps, validation accuracy using average model is 0.9838
After 8000 training steps, validation accuracy using average model is 0.984
After 9000 training steps, validation accuracy using average model is 0.984
After 10000 training steps, validation accuracy using average model is 0.9848
After 11000 training steps, validation accuracy using average model is 0.9846
After 12000 training steps, validation accuracy using average model is 0.9842
After 13000 training steps, validation accuracy using average model is 0.985
After 14000 training steps, validation accuracy using average model is 0.9852
After 15000 training steps, validation accuracy using average model is 0.984
After 16000 training steps, validation accuracy using average model is 0.986
After 17000 training steps, validation accuracy using average model is 0.9858
After 18000 training steps, validation accuracy using average model is 0.9852
After 19000 training steps, validation accuracy using average model is 0.985
After 20000 training steps, validation accuracy using average model is 0.985
After 21000 training steps, validation accuracy using average model is 0.9858
After 22000 training steps, validation accuracy using average model is 0.986
After 23000 training steps, validation accuracy using average model is 0.9856
After 24000 training steps, validation accuracy using average model is 0.9862
After 25000 training steps, validation accuracy using average model is 0.9866
After 26000 training steps, validation accuracy using average model is 0.9862
After 27000 training steps, validation accuracy using average model is 0.9862
After 28000 training steps, validation accuracy using average model is 0.9864
After 29000 training steps, validation accuracy using average model is 0.9862
After 30000 training steps, test accuracy using average model is 0.9838

The focus here is on learning how to use neural-network optimization techniques: exponential learning-rate decay, regularized loss, and moving averages. The network itself is deliberately simple.
The theory behind each of these operations follows.

1 Setting the learning rate

During gradient descent, a learning rate that is either too large or too small causes problems, and choosing one by hand is difficult. Exponential decay lets the model approach a good solution quickly early in training while keeping late-stage updates small enough to avoid large oscillations, so training settles closer to a (local) optimum.

TensorFlow provides a corresponding API, tf.train.exponential_decay(), which starts from a relatively large learning rate and gradually shrinks it as iterations proceed:

decayed_learning_rate = learning_rate * decay_rate^(global_step / decay_steps)

where decay_rate is the decay rate and decay_steps is the decay period (how many steps one full decay takes).

Usage:

learning_rate = tf.train.exponential_decay(LEARNING_RATE, global_step, decay_steps, decay_rate)
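The formula can be checked numerically. Below is a pure-Python sketch of the same (non-staircase) schedule; the constants mirror this script, and the figure of 55000 for mnist.train.num_examples is an assumption based on the usual MNIST train/validation split:

```python
# Pure-Python sketch of tf.train.exponential_decay (non-staircase form).
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    return learning_rate * decay_rate ** (global_step / decay_steps)

# With this script's settings the decay period is one pass over the
# training data: 55000 / 100 = 550 steps.
print(exponential_decay(0.8, 0, 550, 0.99))    # 0.8 at step 0
print(exponential_decay(0.8, 550, 550, 0.99))  # 0.8 * 0.99 = 0.792 after one period
```

Each pass over the training set multiplies the rate by another factor of 0.99, so the rate falls smoothly rather than in steps.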

2 Regularized loss

Regularization helps mitigate overfitting.

In a complex network, the code that defines the structure and the code that computes the loss may not live in the same function, so summing individual penalty terms by hand makes the loss expression long and unwieldy. TensorFlow collections solve this neatly.

import tensorflow as tf

# Create one layer's weights and add their L2 regularization loss
# to the collection named 'losses'
def get_weight(shape, regularization_rate):
    var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    # tf.add_to_collection registers a value under a named collection;
    # tf.contrib.layers.l2_regularizer builds the penalty term
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularization_rate)(var))
    return var

After the forward pass has produced the data loss (e.g. cross_entropy), add it to the same collection and sum everything to get the total loss:

tf.add_to_collection('losses', cross_entropy)

total_loss = tf.add_n(tf.get_collection('losses'))

tf.get_collection returns every element of a collection.
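As a sanity check on the pattern, here is a minimal pure-Python imitation of the collection mechanism; the dict, helper names, and numbers are all hypothetical stand-ins (TensorFlow's real collections live on the graph):

```python
# Hypothetical stand-in for TensorFlow's named collections.
_collections = {}

def add_to_collection(name, value):
    _collections.setdefault(name, []).append(value)

def get_collection(name):
    return _collections.get(name, [])

def l2_penalty(rate, weights):
    # rate * sum of squared weights, mirroring an L2 regularizer
    return rate * sum(w * w for w in weights)

def get_weight(weights, regularization_rate):
    # Register this layer's L2 loss as a side effect, as get_weight does above
    add_to_collection('losses', l2_penalty(regularization_rate, weights))
    return weights

w1 = get_weight([1.0, 2.0], 0.5)  # adds 0.5 * (1 + 4) = 2.5
w2 = get_weight([3.0], 0.5)       # adds 0.5 * 9 = 4.5
add_to_collection('losses', 0.3)  # stand-in for cross_entropy
total_loss = sum(get_collection('losses'))  # 2.5 + 4.5 + 0.3 = 7.3
print(total_loss)
```

The point of the pattern is that each layer registers its own penalty where it is created, so the code that builds the total loss never needs to know how many layers exist.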

3 Moving-average model

A moving average can make the model more robust on test data.

TensorFlow implements it via tf.train.ExponentialMovingAverage(). At construction time you provide a decay rate (decay), which controls how quickly the averaged model updates.

ExponentialMovingAverage maintains a shadow variable (shadow_variable) for each variable. The shadow variable is initialized to the variable's initial value, and every time the variable is updated the shadow moves as:

shadow_variable = decay * shadow_variable + (1 - decay) * variable

decay determines the update speed: the larger it is, the more stable the average.

In addition, tf.train.ExponentialMovingAverage() takes an optional num_updates argument that adjusts decay dynamically, so the average can track the variables quickly early in training:

decay = min(decay, (1 + num_updates) / (10 + num_updates))
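The two update rules can be traced by hand; below is a pure-Python sketch with illustrative numbers:

```python
# Sketch of ExponentialMovingAverage's update rule, including the
# num_updates adjustment of decay described above.
def effective_decay(decay, num_updates):
    return min(decay, (1 + num_updates) / (10 + num_updates))

def update_shadow(shadow, variable, decay, num_updates):
    d = effective_decay(decay, num_updates)
    return d * shadow + (1 - d) * variable

# Early in training (num_updates = 0) the effective decay is
# min(0.99, 1/10) = 0.1, so the shadow chases the variable quickly:
print(update_shadow(0.0, 10.0, 0.99, 0))      # 0.1*0 + 0.9*10 = 9.0
# Late in training the effective decay is the full 0.99, so the
# shadow barely moves: 0.99*9 + 0.01*10 = 9.01
print(update_shadow(9.0, 10.0, 0.99, 10**6))
```

Passing global_step as num_updates, as the training script above does, is what produces exactly this fast-then-stable behavior.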
