Recurrent Neural Network Example 2: Improving the RNN

For relatively complex problems, this kind of plain RNN shows its weakness, and the cause once again lies in the activation function. In general, such activations limit a network to a depth of roughly six layers, because the backpropagated error shrinks as it passes through each additional layer. In an RNN, the error propagates not only between layers but also across the sample sequence within each layer, so a plain RNN cannot learn features over long sequences.

As a result, many variant versions of the RNN have evolved in the field, enabling models to learn longer sequence features.

Long Short-Term Memory (LSTM) networks

Peephole connections were introduced to remedy a shortcoming of the forget gate: the current cell state cannot influence the outputs of the Input Gate and Forget Gate at the next time step, so the cell loses part of the information about the previous step of the sequence. Following the dashed lines in the figure below, the computation proceeds in this order:

(1) The data output by the cell at the previous time step enters the Input Gate and the Forget Gate together with the current time step's input.

(2) The outputs of the input gate and the forget gate are fed into the cell together.

(3) The data coming out of the cell goes into the current time step's Output Gate, and also into the input gate and forget gate of the next time step.

(4) The data output by the Output Gate, together with the activated cell data, forms the output of the whole Block.
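The standard LSTM cell in TensorFlow 1.x exposes peephole connections through the use_peepholes flag of tf.contrib.rnn.LSTMCell; a minimal sketch (the hidden size is an illustrative assumption):

import tensorflow as tf

n_hidden = 128

# BasicLSTMCell has no peephole support; the full LSTMCell enables the
# dashed-line connections described above via the use_peepholes flag.
lstm_cell = tf.contrib.rnn.LSTMCell(n_hidden, use_peepholes=True,
                                    forget_bias=1.0)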

Bi-RNN: a bidirectional RNN runs two RNNs over the sequence in opposite directions, so every time step can draw on both past and future context.

Connectionist Temporal Classification (CTC), a neural-network-based sequence classification method, is a key technique in speech recognition; it adds an extra symbol representing NULL (the blank) to resolve the problem of repeated characters.

The method shows up mainly in how the loss is computed: blanks (empty labels) are inserted wherever the sequence and the label fail to line up, so that the predicted outputs can be aligned with the given labels along the time axis, and the concrete loss value is then obtained with a cross-entropy-style calculation.

For example, in speech recognition, an utterance has its sequence of values and the corresponding text; the CTC loss function measures the loss between the model's output and the label, and the model is trained by letting the optimizer iteratively drive that loss down.
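As a hedged sketch of this idea in TensorFlow 1.x, tf.nn.ctc_loss takes time-major logits, a SparseTensor of labels, and the per-example sequence lengths (all shapes and class counts below are illustrative assumptions):

import tensorflow as tf

num_classes = 28                 # e.g. 27 real symbols + 1 reserved CTC blank
max_time, batch_size = 100, 8

# Time-major logits from the RNN: [max_time, batch_size, num_classes];
# by convention the last class index is the blank (NULL) symbol.
logits = tf.placeholder(tf.float32, [max_time, batch_size, num_classes])
# Ground-truth transcripts as a SparseTensor of int32 label indices.
labels = tf.sparse_placeholder(tf.int32)
# True length of each input sequence in the batch.
seq_len = tf.placeholder(tf.int32, [batch_size])

# ctc_loss aligns the predictions with the labels by inserting blanks and
# returns the negative log-probability of each example; average it and train.
loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, seq_len))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)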

RNNs in TensorFlow

After the cell objects have been defined, they still need to be wired together to form the RNN network.

1. Building a static RNN: static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)

  • cell: the constructed cell object
  • inputs: the input data; must be a Python list of 2-D tensors, in time-step order, with one element per step of the sequence (see the sketch after this list)
  • initial_state: the initial cell state
  • dtype: the expected type of the outputs and the initial state
  • sequence_length: the length of each input sequence
  • scope: the variable scope (namespace)
  • It returns two values: the outputs and the final cell state; the outputs list has exactly one element per input time step
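A minimal sketch of feeding static_rnn; the shapes match the MNIST example later in this section:

import tensorflow as tf

n_steps, n_input, n_hidden = 28, 28, 128

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
# static_rnn wants a Python list with one 2-D tensor per time step,
# so the 3-D batch is unstacked along the time axis.
x_seq = tf.unstack(x, n_steps, axis=1)   # n_steps tensors of [batch, n_input]

cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)
outputs, state = tf.contrib.rnn.static_rnn(cell, x_seq, dtype=tf.float32)
# len(outputs) == n_steps; outputs[-1] is the last time step's output.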

2. Building a dynamic RNN: dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)

  • cell: the constructed cell object
  • inputs: the input data as a tensor, normally 3-D: [batch_size, max_time, ...]
  • initial_state: the initial cell state
  • dtype: the expected type of the outputs and the initial state
  • sequence_length: the length of each input sequence
  • time_major: False by default, in which case inputs are shaped [batch_size, max_time, ...]; if True, the shape is [max_time, batch_size, ...]
  • scope: the variable scope (namespace)
  • It returns two values: the outputs, shaped [batch_size, max_time, ...], and the final cell state (see the sketch after this list)
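The corresponding sketch for dynamic_rnn, which consumes the 3-D tensor directly (shapes again follow the MNIST example):

import tensorflow as tf

n_steps, n_input, n_hidden = 28, 28, 128

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
cell = tf.contrib.rnn.GRUCell(n_hidden)
# No unstacking needed: inputs stay [batch_size, max_time, ...].
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
# outputs: [batch_size, n_steps, n_hidden]. To pick out the last time step
# the way static_rnn's outputs[-1] does, make the result time-major first:
last = tf.transpose(outputs, [1, 0, 2])[-1]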

3. Building a bidirectional RNN: four functions are available: tf.nn.bidirectional_dynamic_rnn, tf.contrib.rnn.static_bidirectional_rnn, tf.contrib.rnn.stack_bidirectional_rnn, and tf.contrib.rnn.stack_bidirectional_dynamic_rnn; a sketch with the dynamic variant follows below.
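A minimal sketch with the dynamic variant (the cell type and sizes are illustrative assumptions):

import tensorflow as tf

n_steps, n_input, n_hidden = 28, 28, 128

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
# One independent cell per direction.
fw_cell = tf.contrib.rnn.GRUCell(n_hidden)
bw_cell = tf.contrib.rnn.GRUCell(n_hidden)

# outputs is a (forward, backward) pair, each [batch_size, n_steps, n_hidden].
outputs, states = tf.nn.bidirectional_dynamic_rnn(fw_cell, bw_cell, x,
                                                  dtype=tf.float32)
# Concatenate the two directions along the feature axis.
merged = tf.concat(outputs, 2)           # [batch_size, n_steps, 2*n_hidden]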

4. Handling variable-length sequences with a dynamic RNN

A more advanced feature of the dynamic RNN is that it can handle variable-length sequences. The method: while preparing the samples, record each sample's true length as well, and pass those lengths in (via sequence_length) when creating the dynamic RNN, as in the sketch below.
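A small runnable sketch; the toy batch and its lengths are made-up illustration data:

import numpy as np
import tensorflow as tf

n_steps, n_input, n_hidden = 4, 2, 3

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
seq_len = tf.placeholder(tf.int32, [None])    # true length of each sample

cell = tf.contrib.rnn.GRUCell(n_hidden)
# With sequence_length set, the computation stops at each sample's true
# length and the remaining output positions are filled with zeros.
outputs, state = tf.nn.dynamic_rnn(cell, x, sequence_length=seq_len,
                                   dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Two samples: the second has true length 1 and is zero-padded to n_steps.
    batch = np.array([[[1, 1], [2, 2], [3, 3], [4, 4]],
                      [[5, 5], [0, 0], [0, 0], [0, 0]]], dtype=np.float32)
    print(sess.run(outputs, feed_dict={x: batch, seq_len: [4, 1]}))
    # Output rows past each sample's true length come back as all zeros.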

Example: classifying MNIST with an RNN

import tensorflow as tf
# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/data/", one_hot=True)

n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10  # MNIST classes (digits 0-9, 10 classes in total)

tf.reset_default_graph()

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])


x1 = tf.unstack(x, n_steps, 1)

#1 BasicLSTMCell
lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

#2 LSTMCell
#lstm_cell = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

#3 gru
#gru = tf.contrib.rnn.GRUCell(n_hidden)
#outputs, states = tf.contrib.rnn.static_rnn(gru, x1, dtype=tf.float32)

#4 Build a dynamic RNN
#outputs,_  = tf.nn.dynamic_rnn(gru,x,dtype=tf.float32)
#outputs = tf.transpose(outputs, [1, 0, 2])

pred = tf.contrib.layers.fully_connected(outputs[-1], n_classes, activation_fn=None)

learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Launch the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
# Calculate the accuracy on this batch
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print (" Finished!")

# Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

Optimizing the RNN

There are many optimization tricks for RNNs; here we introduce two that are specific to RNNs.

1. Dropout: RNNs come with their own dropout wrapper: lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)

No dropout is applied to the memory carried from time t-1 to time t; it is applied only within a single time step t, where information passes between stacked cells. The RNN dropout wrapper therefore takes two parameters: input_keep_prob (the keep probability on the cell's input) and output_keep_prob (the keep probability on the cell's output), as in the sketch below.
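A minimal sketch of wrapping and stacking cells with dropout (the keep probabilities and layer count are illustrative assumptions):

import tensorflow as tf

n_hidden = 128
keep_prob = tf.placeholder(tf.float32)    # e.g. feed 0.8 to train, 1.0 to test

def make_cell():
    cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)
    # Dropout is applied to the per-step input/output of the cell,
    # never to the recurrent state carried from t-1 to t.
    return tf.nn.rnn_cell.DropoutWrapper(cell,
                                         input_keep_prob=keep_prob,
                                         output_keep_prob=keep_prob)

# In a stack, the output dropout then acts between layers at the same step.
mcell = tf.nn.rnn_cell.MultiRNNCell([make_cell() for _ in range(3)])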

2. LN, layer-based normalization: because of the RNN's special structure, its input differs from that of the fully connected and convolutional networks discussed earlier.

In BN, each layer's input only needs to consider the current batch of samples (or the batch's transformed values).

In an RNN, however, each layer's input consists not only of the current batch's transformed values but also of the outputs from the previous step of the sequence. BN therefore no longer works for RNN normalization, because a mini-batch cannot cover all of the input data; instead, the normalization is computed over the inputs of a single layer, which is Layer Normalization (LN).

import numpy as np
import tensorflow as tf
# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/data/", one_hot=True)

from tensorflow.python.ops.rnn_cell_impl import _RNNCell as RNNCell  # TF 1.2-era path; later 1.x versions expose RNNCell directly
from tensorflow.python.ops.math_ops import sigmoid
from tensorflow.python.ops.math_ops import tanh
from tensorflow.python.ops import variable_scope as vs
from tensorflow.python.ops import array_ops
from tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl import _linear  # TF 1.2-era path; later versions moved _linear into rnn_cell_impl

print(tf.__version__)

tf.reset_default_graph()

def ln(tensor, scope = None, epsilon = 1e-5):
    """ Layer normalizes a 2D tensor along its second axis """
    assert(len(tensor.get_shape()) == 2)
    m, v = tf.nn.moments(tensor, [1], keep_dims=True)
    if not isinstance(scope, str):
        scope = ''
    with tf.variable_scope(scope + 'layer_norm'):
        scale = tf.get_variable('scale',
                                shape=[tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(1))
        shift = tf.get_variable('shift',
                                shape=[tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(0))
    LN_initial = (tensor - m) / tf.sqrt(v + epsilon)

    return LN_initial * scale + shift

class LNGRUCell(RNNCell):
    """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078)."""

    def __init__(self, num_units, input_size=None, activation=tanh):
        if input_size is not None:
            print("%s: The input_size parameter is deprecated." % self)
        self._num_units = num_units
        self._activation = activation

    @property
    def state_size(self):
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def __call__(self, inputs, state):
        """Gated recurrent unit (GRU) with nunits cells."""
        with vs.variable_scope("Gates"):  # Reset gate and update gate.,reuse=True
            # We start with bias of 1.0 to not reset and not update.
            value =_linear([inputs, state], 2 * self._num_units, True, 1.0)
            r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)
            r = ln(r, scope = 'r/')
            u = ln(u, scope = 'u/')
            r, u = sigmoid(r), sigmoid(u)
        with vs.variable_scope("Candidate"):
#            with vs.variable_scope("Layer_Parameters"):
            Cand = _linear([inputs,  r *state], self._num_units, True)
            c_pre = ln(Cand,  scope = 'new_h/')
            c = self._activation(c_pre)
        new_h = u * state + (1 - u) * c
        return new_h, new_h

n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10  # MNIST classes (digits 0-9, 10 classes in total)

tf.reset_default_graph()

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

x1 = tf.unstack(x, n_steps, 1)

#1 BasicLSTMCell
#lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

#2 LSTMCell
#lstm_cell = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

#3 gru
#gru = tf.contrib.rnn.GRUCell(n_hidden)
gru = LNGRUCell(n_hidden)
#outputs, states = tf.contrib.rnn.static_rnn(gru, x1, dtype=tf.float32)

#4 Build a dynamic RNN
outputs,_  = tf.nn.dynamic_rnn(gru,x,dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])

pred = tf.contrib.layers.fully_connected(outputs[-1], n_classes, activation_fn=None)

learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Launch the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
# Calculate the accuracy on this batch
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print (" Finished!")

# Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

 
