TensorFlow in Practice: BiRNN

Article link: https://blog.csdn.net/Alvysinger2018/article/details/81563592

References:

TensorFlow实战 (TensorFlow in Practice)

ValueError: Variable lstm_cell/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel already exists

Topic: implementing a Bidirectional LSTM Classifier in TensorFlow

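The core idea of a Bi-RNN is to split an ordinary unidirectional RNN into two directions: one runs forward in time and the other runs backward. The output at the current time step can then use information from both directions, whereas an ordinary RNN has to wait until later time steps before it sees any future information.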

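In the figure below, the input at each time step is fed to both the forward and the backward RNN. Each produces an output from its own state, and the two outputs are joined at the Bi-RNN output node to form the final output.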
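The wiring described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it uses a plain tanh RNN cell instead of an LSTM, the initial states are zeros, and every name (`rnn_sweep`, `birnn_forward`, `make_params`) is made up for this example rather than taken from TensorFlow.

```python
import numpy as np

def rnn_sweep(xs, W_x, W_h, b, reverse=False):
    """Run a plain tanh RNN over xs (shape [T, n_input]); return states [T, n_hidden]."""
    T = xs.shape[0]
    n_hidden = W_h.shape[0]
    h = np.zeros(n_hidden)                 # initial state, set by hand to zeros
    states = np.zeros((T, n_hidden))
    order = range(T - 1, -1, -1) if reverse else range(T)
    for t in order:
        h = np.tanh(xs[t] @ W_x + h @ W_h + b)
        states[t] = h                      # stored at the original time index
    return states

def birnn_forward(xs, fw_params, bw_params):
    """Join the forward and backward states at every time step."""
    h_fw = rnn_sweep(xs, *fw_params, reverse=False)
    h_bw = rnn_sweep(xs, *bw_params, reverse=True)
    return np.concatenate([h_fw, h_bw], axis=1)   # shape [T, 2 * n_hidden]

def make_params(rng, n_input, n_hidden):
    """Random weights for one direction (hypothetical helper for this sketch)."""
    return (0.5 * rng.normal(size=(n_input, n_hidden)),
            0.5 * rng.normal(size=(n_hidden, n_hidden)),
            np.zeros(n_hidden))

rng = np.random.default_rng(0)
T, n_input, n_hidden = 5, 3, 4
xs = rng.normal(size=(T, n_input))
fw_params = make_params(rng, n_input, n_hidden)
bw_params = make_params(rng, n_input, n_hidden)

out = birnn_forward(xs, fw_params, bw_params)
print(out.shape)  # (5, 8)
```

Changing only the last input `xs[-1]` leaves the forward half of `out[0]` untouched but alters its backward half, which is the sense in which the first time step already "sees" the future.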

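Training a Bi-RNN is very similar to training an ordinary unidirectional RNN, because the two directions hardly interact, so each can be unrolled into an ordinary feed-forward network. When training with BPTT (back-propagation through time), however, the states and the outputs cannot be updated at the same time. The forward state is unknown at t=1 and the backward state is unknown at t=T; that is, each state is unknown at the start of its own direction and has to be set by hand. Likewise, the derivative of the forward state is unknown at t=T and the derivative of the backward state is unknown at t=1; that is, each state derivative is unknown at the end of its direction. These derivatives are usually set to 0, meaning that whatever lies beyond that point does not matter for the parameter update.

Training then proceeds in three steps. First, run the forward pass (inference) on the input: compute the forward RNN states along 1->T, then the backward RNN states along T->1, and obtain the outputs. Second, run the backward pass, which differentiates the objective function: take the derivative with respect to the outputs, then compute the forward RNN state derivatives along T->1, and then the backward RNN state derivatives along 1->T. Third, update the model parameters with the resulting gradients, which completes one training iteration.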
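The three steps can be written out for a toy model. The sketch below assumes a tanh RNN in each direction, a linear output layer `V`, a squared-error loss, and plain gradient descent; the article's script uses LSTM cells, a softmax loss, and Adam instead. All function and parameter names are hypothetical.

```python
import numpy as np

def run_states(xs, Wx, Wh, order):
    """Compute tanh-RNN states, visiting time steps in `order` from a zero initial state."""
    n_hidden = Wh.shape[0]
    hs = np.zeros((len(xs), n_hidden))
    h = np.zeros(n_hidden)                      # start-of-direction state, set by hand
    for t in order:
        h = np.tanh(xs[t] @ Wx + h @ Wh)
        hs[t] = h
    return hs

def backprop_states(xs, hs, dhs, Wx, Wh, order):
    """BPTT for one direction: walk `order` backwards from a zero state derivative."""
    n_hidden = Wh.shape[0]
    dWx, dWh = np.zeros_like(Wx), np.zeros_like(Wh)
    d_carry = np.zeros(n_hidden)                # end-of-direction derivative, set to 0
    for k in range(len(order) - 1, -1, -1):
        t = order[k]
        h_prev = hs[order[k - 1]] if k > 0 else np.zeros(n_hidden)
        da = (dhs[t] + d_carry) * (1.0 - hs[t] ** 2)   # derivative through tanh
        dWx += np.outer(xs[t], da)
        dWh += np.outer(h_prev, da)
        d_carry = da @ Wh.T                     # passed to the previously visited step
    return dWx, dWh

def loss_and_grads(p, xs, targets):
    T = len(xs)
    fw_order = list(range(T))                   # 1 -> T
    bw_order = fw_order[::-1]                   # T -> 1
    # Step 1: forward pass (inference).
    hf = run_states(xs, p["Wx_f"], p["Wh_f"], fw_order)
    hb = run_states(xs, p["Wx_b"], p["Wh_b"], bw_order)
    h_cat = np.concatenate([hf, hb], axis=1)
    ys = h_cat @ p["V"]
    loss = 0.5 * np.sum((ys - targets) ** 2)
    # Step 2: backward pass, outputs first, then each direction's states.
    dys = ys - targets
    dh_cat = dys @ p["V"].T
    n_hidden = hf.shape[1]
    g = {"V": h_cat.T @ dys}
    g["Wx_f"], g["Wh_f"] = backprop_states(
        xs, hf, dh_cat[:, :n_hidden], p["Wx_f"], p["Wh_f"], fw_order)
    g["Wx_b"], g["Wh_b"] = backprop_states(
        xs, hb, dh_cat[:, n_hidden:], p["Wx_b"], p["Wh_b"], bw_order)
    return loss, g

rng = np.random.default_rng(0)
T, n_input, n_hidden, n_out = 4, 3, 3, 2
shapes = {"Wx_f": (n_input, n_hidden), "Wh_f": (n_hidden, n_hidden),
          "Wx_b": (n_input, n_hidden), "Wh_b": (n_hidden, n_hidden),
          "V": (2 * n_hidden, n_out)}
params = {name: 0.3 * rng.normal(size=shape) for name, shape in shapes.items()}
xs = rng.normal(size=(T, n_input))
targets = rng.normal(size=(T, n_out))

loss_before, grads = loss_and_grads(params, xs, targets)
# Step 3: update the parameters with the gradients (plain gradient descent here).
learning_rate = 0.001
for name in params:
    params[name] -= learning_rate * grads[name]
loss_after, _ = loss_and_grads(params, xs, targets)
print(loss_before, loss_after)
```

Because the two directions share no weights, the same `backprop_states` routine serves both; only the visiting order differs. In practice TensorFlow derives these gradients automatically, so a hand-written version like this is mainly useful for checking one's understanding.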

Implementation:

TensorFlow 1.4

Python 3.5

Running the original code without adding with tf.variable_scope('birnn'): raises the following error.

Variable birnn/bidirectional_rnn/fw/basic_lstm_cell/kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
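The message above is a name clash: in TensorFlow 1.x, variables created through tf.get_variable are registered under their full scoped name, and creating the same name a second time without reuse is rejected. A common trigger, as far as I can tell, is building the network twice in the same graph, for example by re-running the script in an interactive session. The toy registry below only imitates that behaviour to show why the scope name matters. `VariableStore` and `build_birnn` are hypothetical and are not TensorFlow code.

```python
from contextlib import contextmanager

class VariableStore:
    """Toy stand-in for a graph's variable registry (not TensorFlow's implementation)."""

    def __init__(self):
        self._vars = {}
        self._scopes = []

    @contextmanager
    def variable_scope(self, name):
        # Names created inside the block are prefixed with `name`.
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()

    def get_variable(self, name, auto_reuse=False):
        full_name = "/".join(self._scopes + [name])
        if full_name in self._vars:
            if not auto_reuse:
                raise ValueError(
                    "Variable %s already exists, disallowed." % full_name)
            return self._vars[full_name]
        self._vars[full_name] = object()   # placeholder for the real tensor
        return self._vars[full_name]

def build_birnn(store, scope_name):
    """Register the two kernels a bidirectional RNN would create."""
    with store.variable_scope(scope_name):
        store.get_variable("fw/basic_lstm_cell/kernel")
        store.get_variable("bw/basic_lstm_cell/kernel")

store = VariableStore()
build_birnn(store, "birnn")            # first build: names are new
try:
    build_birnn(store, "birnn")        # same names again: rejected
    clash = None
except ValueError as err:
    clash = str(err)
build_birnn(store, "birnn_2")          # a different scope avoids the clash
print(clash)
```

In real TensorFlow 1.x code the usual remedies are the ones the error message itself suggests (reuse=True or reuse=tf.AUTO_REUSE on the scope), a distinct scope name, or starting from a fresh graph.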

# -*- coding: utf-8 -*-
"""
Classification with a Bi-LSTM
"""

import tensorflow as tf
import numpy as np


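# Load the MNIST reader utility and the handwritten-digit data; the data directory holds the training and test data downloaded from the web.
# Change "/tmp/data" to store the data wherever you prefer.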
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("/tmp/data", one_hot=True)


# Training parameters
learning_rate = 0.01
max_samples = 400000
batch_size = 128
display_step = 10

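# MNIST images are 28x28: each row of 28 pixels is one input vector (n_input = 28),
# and the 28 rows are the time steps the LSTM is unrolled over (n_steps = 28).
# n_classes is the number of output classes.
n_input = 28
n_steps = 28
n_hidden = 256  # number of hidden units in each LSTM cell
n_classes = 10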

x=tf.placeholder("float",[None,n_steps,n_input])
y=tf.placeholder("float",[None,n_classes])

# Weights and biases of the softmax layer.
# A bidirectional LSTM has a forward and a backward LSTM cell whose outputs are concatenated, so the weight matrix has 2*n_hidden rows.
weights = {
    'out': tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Build the bidirectional LSTM network

def BiRNN(x,weights,biases):
    
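    # Turn [batch, n_steps, n_input] into a list of n_steps tensors of shape
    # [batch, n_input], the input format static_bidirectional_rnn expects.
    x = tf.transpose(x, [1, 0, 2])    # [n_steps, batch, n_input]
    x = tf.reshape(x, [-1, n_input])  # [n_steps * batch, n_input]
    x = tf.split(x, n_steps)          # n_steps tensors of [batch, n_input]

    # Modification: variable scopes added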
    with tf.variable_scope('forward'):
        lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden ,forget_bias=1.0)
    with tf.variable_scope('backward'):
        lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden ,forget_bias=1.0)
    with tf.variable_scope('birnn'):
        outputs,_,_ =tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell,lstm_bw_cell,x,dtype=tf.float32)
    #outputs = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
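    # Linear layer on the concatenated forward/backward output of the last time step; returns logits
    return tf.matmul(outputs[-1], weights['out']) + biases['out']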

pred =BiRNN(x,weights,biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits =pred,labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
correct_pred = tf.equal(tf.argmax(pred,1),tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    step =1 
    while step* batch_size <max_samples :
        batch_x ,batch_y =mnist.train.next_batch(batch_size)
        batch_x =batch_x.reshape((batch_size,n_steps,n_input))
        
        sess.run(optimizer,feed_dict={x:batch_x,y:batch_y})
        if step % display_step ==0:
            acc = sess.run(accuracy , feed_dict={x:batch_x,y:batch_y})
            loss = sess.run(cost,feed_dict={x:batch_x,y:batch_y})
            print("Iter" + str(step*batch_size)+",Minibatch Loss = "+\
                  "{:.6f}".format(loss)+", Training Accuracy = "+ \
                  "{:.5f}".format(acc))
        step+=1
        
    print("Optimization Finished!")
    
    test_len = 10000
    test_data = mnist.test.images[:test_len].reshape((-1,n_steps,n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:",
          sess.run(accuracy,feed_dict={x:test_data,y:test_label}))
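The three reshaping calls at the top of BiRNN are easy to misread, so here is a NumPy stand-in that mirrors them on a small batch. It assumes that np.transpose, reshape, and np.split behave like their tf counterparts for this case; the batch size of 4 is arbitrary.

```python
import numpy as np

batch_size, n_steps, n_input = 4, 28, 28
x = np.arange(batch_size * n_steps * n_input, dtype=np.float32)
x = x.reshape(batch_size, n_steps, n_input)   # same layout as the placeholder x

x_t = np.transpose(x, (1, 0, 2))              # [n_steps, batch, n_input]
x_flat = x_t.reshape(-1, n_input)             # [n_steps * batch, n_input]
steps = np.split(x_flat, n_steps)             # n_steps arrays of [batch, n_input]

print(len(steps), steps[0].shape)  # 28 (4, 28)
```

Each `steps[t]` equals `x[:, t, :]`, i.e. row t of every image in the batch, so the RNN reads each image one pixel row per time step.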

Output from the last few training steps:

Iter396800,Minibatch Loss = 0.057251, Training Accuracy = 0.98438
Iter398080,Minibatch Loss = 0.072591, Training Accuracy = 0.98438
Iter399360,Minibatch Loss = 0.007500, Training Accuracy = 1.00000
Optimization Finished!
Testing Accuracy: 0.9849
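The sample counts in the log lines follow directly from the loop condition and display_step. The small check below replays only the counter logic of the training loop, with the values copied from the script and no TensorFlow involved; `logged` and `total_steps` are names introduced for this example.

```python
max_samples = 400000
batch_size = 128
display_step = 10

step = 1
logged = []                      # sample counts that would be printed
while step * batch_size < max_samples:
    if step % display_step == 0:
        logged.append(step * batch_size)
    step += 1

total_steps = step - 1
print(total_steps)      # 3124
print(logged[-3:])      # [396800, 398080, 399360]
```

So the script performs 3124 parameter updates, and the last three logged counts match the Iter values shown above.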