TensorFlow - tf.nn.dynamic_rnn & tf.nn.static_rnn explained

1. Basics

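We use a small example to show how TensorFlow's dynamic_rnn is used. Suppose the RNN input has shape [2,20,128], where 2 is batch_size, 20 is the maximum text length and 128 is embedding_size. There are two examples; assume the second text is only 13 tokens long and the remaining 7 positions are filled with 0-padding. dynamic_rnn returns two values: outputs and last_states. With an LSTM cell of 128 hidden units, outputs has shape [2,20,128], i.e. the hidden-state output at every time step, and last_states is a tuple (c,h) whose members both have shape [batch,128].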

 

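dynamic_rnn has a parameter, sequence_length, that specifies the valid length of each example. In the example above we set sequence_length to [20,13]: the first example has a valid length of 20 and the second 13. When this parameter is passed, TensorFlow skips the computation on the padding after step 13 of the second example: its last_states is the state from step 13, carried unchanged through to step 20, and the entries of outputs beyond step 13 are set to zero.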
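
The rule described above can be sketched in plain Python without TensorFlow. This is only an illustration of the masking behaviour, not the library's implementation: `run_with_sequence_length` and `toy_step` are hypothetical names, and the "cell" is a simple running sum standing in for an LSTM.

```python
def run_with_sequence_length(batch, lengths, step_fn, init_state=0.0):
    """Mimic the sequence_length rule: past an example's valid length,
    emit a zero output and carry the state through unchanged."""
    all_outputs, last_states = [], []
    for seq, length in zip(batch, lengths):
        state, outputs = init_state, []
        for t, x in enumerate(seq):
            if t < length:
                state = step_fn(state, x)  # real step: update the state
                outputs.append(state)
            else:
                outputs.append(0.0)        # padded step: zero output, state untouched
        all_outputs.append(outputs)
        last_states.append(state)
    return all_outputs, last_states


def toy_step(state, x):
    # Hypothetical stand-in for an RNN cell: a running sum of the inputs.
    return state + x


batch = [[1.0, 2.0, 3.0, 4.0],   # valid length 4
         [1.0, 2.0, 0.0, 0.0]]   # valid length 2, then zero padding
lengths = [4, 2]
outputs, last_states = run_with_sequence_length(batch, lengths, toy_step)
print(outputs)      # [[1.0, 3.0, 6.0, 10.0], [1.0, 3.0, 0.0, 0.0]]
print(last_states)  # [10.0, 3.0]
```

For the second example the last state equals the output at its last valid step (3.0), which is why the final state can stand in for "the output at the true end of the sequence".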

 

Experimental results confirming this can be found at:

https://blog.csdn.net/u010223750/article/details/71079036

 

If you implement the behaviour of dynamic_rnn yourself on top of static_rnn, you have to take the length of each sequence into account.

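1.5 About padding

For equal-length sequences such as MNIST, padding is not a concern. In real tasks, however, variable-length sequences are more common. Taking text classification again: suppose the first sentence has a valid length of 20, the second 13, and max length is set to 20; the second sentence then needs padding. A common approach is to fill in 0 at the left end of the list so that its length stays at max length. (When sequence_length is passed to dynamic_rnn, as in the example above, the padding has to sit at the right end instead, because only the first sequence_length steps are computed.)

More concretely, if 200 distinct words occur in the data, a 'PAD' entry is usually added to the vocabulary.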

vocabulary list:['PAD', 'I', 'love', 'coding', ... ]

 

The vocabulary list, word2index and the embedding layer are related as follows:

PAD  --->  0 ---> embedding vector 1;

I  --->  1 ---> embedding vector 2;

love  --->  2 ---> embedding vector 3;

coding  --->  3 ---> embedding vector 4;

 

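The sentence 'I love coding' is represented by indices and becomes [1,2,3];

After padding, it is filled with 0 so that all sentences have equal length, e.g. [0,0,0,0,0,0,0,1,2,3] (assuming max length is fixed at 10);

This list of indices is then fed to the embedding layer, where each index fetches its embedding vector; each sentence is thus represented by a 2-D matrix, and a batch of several sentences by a 3-D tensor.

This process is described in another blog post: https://blog.csdn.net/Zhou_Dao/article/details/103751162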
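
The index, padding and lookup steps can be sketched in plain Python. The names `encode` and `embed`, the embedding size of 4 and the toy embedding table (row i filled with the value i) are assumptions made for illustration; a real embedding layer holds trained vectors.

```python
vocab = ['PAD', 'I', 'love', 'coding']
word2index = {word: i for i, word in enumerate(vocab)}  # 'PAD' -> 0, 'I' -> 1, ...


def encode(sentence, max_len):
    """Map words to indices, then left-pad with 0 (the PAD index) up to max_len."""
    ids = [word2index[w] for w in sentence.split()][:max_len]
    return [0] * (max_len - len(ids)) + ids


embedding_size = 4  # hypothetical, kept tiny for readability
# Toy embedding table: one row per vocabulary entry.
embedding = [[float(i)] * embedding_size for i in range(len(vocab))]


def embed(ids):
    """Look up one embedding row per index: a sentence becomes a 2-D matrix."""
    return [embedding[i] for i in ids]


batch_ids = [encode('I love coding', 10), encode('I love', 10)]
batch = [embed(ids) for ids in batch_ids]  # 3-D: [batch][time][embedding]
print(batch_ids[0])                                 # [0, 0, 0, 0, 0, 0, 0, 1, 2, 3]
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 10 4
```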

 

2. static_rnn: classifying equal-length sequences

e.g. 1  MNIST classification with an LSTM --> static_rnn usage example

import tensorflow as tf
from tensorflow.contrib import rnn

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("D:/vsCode_tensorflow/PKU_TF/PKU_TF_shuzi/data2", one_hot=True)


# Training Parameters
learning_rate = 0.001
training_steps = 10000
batch_size = 128
display_step = 200
# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_hidden = 128 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))
}


def RNN(x, weights, biases):
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)
    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, timesteps, 1)
    # static_rnn takes a list of 2-D tensors, each of shape [batch, input]

    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
    # outputs[-1] is the output at the last timestep, shape (batch_size, num_hidden)


logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model (with test logits, for dropout to be disabled)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # print('batch_x:',batch_x)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape([batch_size, timesteps, num_input])
        # print('reshaped_batch_x:', batch_x)
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc, prediction2 = sess.run([loss_op, accuracy, prediction], feed_dict={X: batch_x, Y: batch_y})

            # prediction2 and batch_y are NumPy arrays, so take the argmax
            # directly instead of adding new ops to the graph on every display step
            a2 = prediction2.argmax(axis=1)
            b2 = batch_y.argmax(axis=1)
            print('pred , label', a2, b2)

            print("Step " + str(step) + ", Minibatch Loss= " +
                  "{:.4f}".format(loss) + ", Training Accuracy= " +
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:",sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))
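
The reshaping done by `tf.unstack(x, timesteps, 1)` in the listing above can be mimicked with nested lists. This is a plain-Python sketch of the shape change only; `unstack_time_axis` is a hypothetical helper, not a TensorFlow function.

```python
def unstack_time_axis(batch):
    """Turn nested lists indexed [batch][time][feature] into a list with one
    entry per timestep, each entry indexed [batch][feature]."""
    timesteps = len(batch[0])
    return [[example[t] for example in batch] for t in range(timesteps)]


batch = [
    [[1, 2], [3, 4], [5, 6]],      # example 0: 3 timesteps, 2 features
    [[7, 8], [9, 10], [11, 12]],   # example 1
]
steps = unstack_time_axis(batch)
print(len(steps))  # 3
print(steps[0])    # [[1, 2], [7, 8]]
print(steps[-1])   # [[5, 6], [11, 12]]
```

`steps[-1]` plays the role of the input to the last timestep, just as `outputs[-1]` in the listing is the output of the last timestep for the whole batch.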

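3. static_rnn: classifying variable-length sequences

e.g. 2  An LSTM built with static_rnn that decides whether a sequence is linear.

import tensorflow as tf
import random
# Train a classifier that separates regular linear sequences from irregular random ones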
# ====================
#  TOY DATA GENERATOR
# ====================

class ToySequenceData(object):
    """ Generate sequence of data with dynamic length.
    This class generate samples for training:
    - Class 0: linear sequences (i.e. [0, 1, 2, 3,...])
    - Class 1: random sequences (i.e. [1, 3, 10, 7,...])

    NOTICE:
    We have to pad each sequence to reach 'max_seq_len' for TensorFlow
    consistency (we cannot feed a numpy array with inconsistent
    dimensions). The dynamic calculation will then be perform thanks to
    'seqlen' attribute that records every actual sequence length.
    """
    def __init__(self, n_samples=1000, max_seq_len=20, min_seq_len=3,
                 max_value=1000):
        self.data = []
        self.labels = []
        self.seqlen = []
        for i in range(n_samples):
            # Random sequence length
            seq_len = random.randint(min_seq_len, max_seq_len)
            # Monitor sequence length for TensorFlow dynamic calculation
            self.seqlen.append(seq_len)
            # Add a random or linear int sequence (50% prob)
            if random.random() < .5:
                # Generate a linear sequence
                # Random start in [0, max_value - seq_len]
                rand_start = random.randint(0, max_value - seq_len)
                s = [[float(i)/max_value] for i in
                     range(rand_start, rand_start + seq_len)]
                # Pad sequence for dimension consistency
                s += [[0.] for i in range(max_seq_len - seq_len)]
                self.data.append(s)
                self.labels.append([1., 0.])
            else:
                # Generate a random sequence
                s = [[float(random.randint(0, max_value))/max_value]
                     for i in range(seq_len)]
                # Pad sequence for dimension consistency
                s += [[0.] for i in range(max_seq_len - seq_len)]
                self.data.append(s)
                self.labels.append([0., 1.])
        # batch_id is an instance attribute, so it keeps the read position across calls to next()
        self.batch_id = 0

    def next(self, batch_size):
        """ Return a batch of data. When dataset end is reached, start over.
        """
        if self.batch_id == len(self.data):
            self.batch_id = 0
        batch_data = (self.data[self.batch_id:min(self.batch_id +
                                                  batch_size, len(self.data))])
        batch_labels = (self.labels[self.batch_id:min(self.batch_id +
                                                  batch_size, len(self.data))])
        batch_seqlen = (self.seqlen[self.batch_id:min(self.batch_id +
                                                  batch_size, len(self.data))])
        self.batch_id = min(self.batch_id + batch_size, len(self.data))
        return batch_data, batch_labels, batch_seqlen

# ==========
#   MODEL
# ==========

# Parameters
learning_rate = 0.01
training_steps = 10000
batch_size = 128
display_step = 200

# Network Parameters
seq_max_len = 20 # Sequence max length
n_hidden = 64 # hidden layer num of features
n_classes = 2 # linear sequence or not

trainset = ToySequenceData(n_samples=1000, max_seq_len=seq_max_len)
testset = ToySequenceData(n_samples=500, max_seq_len=seq_max_len)

# tf Graph input
x = tf.placeholder("float", [None, seq_max_len, 1])  # 注意这里的input 1
y = tf.placeholder("float", [None, n_classes])
# A placeholder for indicating each sequence length
seqlen = tf.placeholder(tf.int32, [None])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}


def dynamicRNN(x, seqlen, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, seq_max_len, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)

    # Get lstm cell output, providing 'sequence_length' will perform dynamic
    # calculation.
    outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x, dtype=tf.float32, sequence_length=seqlen)

    # When performing dynamic calculation, we must retrieve the last
    # dynamically computed output, i.e., if a sequence length is 10, we need
    # to retrieve the 10th output.
    # However TensorFlow doesn't support advanced indexing yet, so we build
    # a custom op that for each sample in batch size, get its length and
    # get the corresponding relevant output.

    # 'outputs' is a list of output at every timestep, we pack them in a Tensor
    # and change back dimension to [batch_size, n_step, n_hidden]
    outputs = tf.stack(outputs)
    outputs = tf.transpose(outputs, [1, 0, 2])

    # Hack to build the indexing and retrieve the right output.
    batch_size = tf.shape(outputs)[0]
    # Start indices for each sample
    index = tf.range(0, batch_size) * seq_max_len + (seqlen - 1)
    # Indexing
    outputs = tf.gather(tf.reshape(outputs, [-1, n_hidden]), index)

    # Linear activation, using outputs computed above
    return tf.matmul(outputs, weights['out']) + biases['out']


pred = dynamicRNN(x, seqlen, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()


# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y, batch_seqlen = trainset.next(batch_size)
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       seqlen: batch_seqlen})
        if step % display_step == 0 or step == 1:
            # Calculate batch accuracy & loss
            acc, loss = sess.run([accuracy, cost], feed_dict={x: batch_x, y: batch_y,
                                                seqlen: batch_seqlen})
            print("Step " + str(step) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy
    test_data = testset.data
    test_label = testset.labels
    test_seqlen = testset.seqlen
    print("Testing Accuracy:",sess.run(accuracy, feed_dict={x: test_data, y: test_label, seqlen: test_seqlen}))
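
The indexing trick in dynamicRNN (flatten the outputs, then gather row `b * seq_max_len + (seqlen[b] - 1)` for each example b) can be checked on a tiny example in plain Python. `last_relevant` is a hypothetical name and the numbers are made up for illustration.

```python
def last_relevant(outputs, seqlen):
    """For each example, pick the output at its last valid timestep.
    outputs is indexed [batch][time][hidden]; seqlen holds the valid lengths."""
    batch_size = len(outputs)
    max_len = len(outputs[0])
    # Equivalent of reshaping to [batch_size * max_len, n_hidden]
    flat = [row for example in outputs for row in example]
    # Start offset of each example plus the position of its last valid step
    index = [b * max_len + (seqlen[b] - 1) for b in range(batch_size)]
    return [flat[i] for i in index]


outputs = [
    [[0.1], [0.2], [0.3], [0.0]],  # valid length 3, last step zeroed
    [[0.5], [0.6], [0.0], [0.0]],  # valid length 2
]
seqlen = [3, 2]
print(last_relevant(outputs, seqlen))  # [[0.3], [0.6]]
```

Taking `outputs[-1]` as in the MNIST example would return the zeroed rows here, which is why the variable-length case needs the extra indexing step.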

4. dynamic_rnn: classifying variable-length sequences

e.g. 3  Using dynamic_rnn directly: an LSTM that decides whether a sequence is linear.

# coding: utf-8
from __future__ import print_function

import tensorflow as tf
import random
import numpy as np
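#  From chapter 13 of the "tensorflow 21 projects" material

class ToySequenceData(object):
    """ Generate sequence data. Each sequence may have a different length.
    Two classes of data are generated:
    - Class 0: linear sequences (e.g. [0, 1, 2, 3,...])
    - Class 1: completely random sequences (e.g. [1, 3, 10, 7,...])
    Note:
    max_seq_len is the maximum sequence length. Sequences shorter than this are padded with 0.
    When they are fed to the RNN, the sequence_length argument is used to compute only the valid steps.
    """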
    def __init__(self, n_samples=1000, max_seq_len=20, min_seq_len=3,
                 max_value=1000):
        self.data = []
        self.labels = []
        self.seqlen = []
        for i in range(n_samples):
            # The sequence length is random, between min_seq_len and max_seq_len.
            seq_len = random.randint(min_seq_len, max_seq_len)
            # self.seqlen stores the actual length of every sequence, not counting the padding zeros
            self.seqlen.append(seq_len)
            # With 50% probability add either a linear or a random sequence
            if random.random() < .5:
                # Generate a linear sequence
                rand_start = random.randint(0, max_value - seq_len)
                s = [[float(i)/max_value] for i in range(rand_start, rand_start + seq_len)]
                # Sequences shorter than max_seq_len are padded with 0
                s += [[0.] for i in range(max_seq_len - seq_len)]
                self.data.append(s)
                # The label of a linear sequence is [1, 0] (there are only two classes)
                self.labels.append([1., 0.])
            else:
                # Generate a random sequence
                s = [[float(random.randint(0, max_value))/max_value] for i in range(seq_len)]
                # Sequences shorter than max_seq_len are padded with 0
                s += [[0.] for i in range(max_seq_len - seq_len)]
                self.data.append(s)
                self.labels.append([0., 1.])
        # batch_id is an instance attribute, so it keeps the read position across calls to next()
        self.batch_id = 0

    def next(self, batch_size):
        """
        Return a batch of batch_size samples.
        Once all samples have been used, start over from the beginning.
        """
        if self.batch_id == len(self.data):
            self.batch_id = 0
        batch_data = (self.data[self.batch_id:min(self.batch_id + batch_size, len(self.data))])
        batch_labels = (self.labels[self.batch_id:min(self.batch_id + batch_size, len(self.data))])
        batch_seqlen = (self.seqlen[self.batch_id:min(self.batch_id + batch_size, len(self.data))])
        self.batch_id = min(self.batch_id + batch_size, len(self.data))
        return batch_data, batch_labels, batch_seqlen


# This part just tests how to use the ToySequenceData class defined above
tmp = ToySequenceData()

# Generate samples
batch_data, batch_labels, batch_seqlen = tmp.next(32)

# batch_data holds the sequences; it is a nested list of shape (batch_size, max_seq_len, 1)
print(np.array(batch_data).shape)  # (32, 20, 1)

# We called tmp.next(32), so there are 32 sequences in total
# Print the first sequence
print(batch_data[0])  # looks like [[0.084], [0.085], ..., [0.086], [0.087], [0.088], ...]

# batch_labels holds the labels; it is also a nested list, of shape (batch_size, 2)
# The "2" in (batch_size, 2) stands for the two classes
print(np.array(batch_labels).shape)  # (32, 2)

# Print the label of the first sequence
print(batch_labels[0])  # e.g. [1.0, 0.0]

# batch_seqlen is a list of length batch_size holding the actual length of each sequence
print(np.array(batch_seqlen).shape)  # (32,)

# Print the length of the first sequence
print(batch_seqlen[0])


batch_data2, batch_labels2, batch_seqlen2 = tmp.next(32)
print(batch_data2[0])


# Run parameters
learning_rate = 0.01
training_iters = 1000000
batch_size = 128
display_step = 10

# Network parameters
seq_max_len = 20 # maximum sequence length
n_hidden = 64 # hidden layer size
n_classes = 2 # number of classes

trainset = ToySequenceData(n_samples=1000, max_seq_len=seq_max_len)
testset = ToySequenceData(n_samples=500, max_seq_len=seq_max_len)

# x is the input, y is the output
# The None dimension is the batch_size
x = tf.placeholder("float", [None, seq_max_len, 1])
y = tf.placeholder("float", [None, n_classes])
# This placeholder holds the actual length of each sequence in the input x
seqlen = tf.placeholder(tf.int32, [None])

# weights and biases are used for the output layer
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}

def dynamicRNN(x, seqlen, weights, biases):

    # Shape of input x: (batch_size, max_seq_len, n_input)
    # Shape of input seqlen: (batch_size, )


    # Define an lstm_cell whose hidden size is n_hidden (set above)
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)

    # Use tf.nn.dynamic_rnn to unroll along the time dimension
    # sequence_length=seqlen also matters: it tells TensorFlow how many steps to run for each sequence
    outputs, states = tf.nn.dynamic_rnn(lstm_cell, x, dtype=tf.float32, sequence_length=seqlen)

    # outputs has shape (batch_size, max_seq_len, n_hidden)
    # See the previous chapter if in doubt

    # We want the output that matches each sequence's length: for a sequence of length 10, take the 10th output
    # TensorFlow does not support indexing outputs directly like this, so we use the following approach:

    batch_size = tf.shape(outputs)[0]
    # Flat index of the last valid step of each sequence
    index = tf.range(0, batch_size) * seq_max_len + (seqlen - 1)
    outputs = tf.gather(tf.reshape(outputs, [-1, n_hidden]), index)

    # Final output layer
    return tf.matmul(outputs, weights['out']) + biases['out']

# pred here is logits, not probabilities
pred = dynamicRNN(x, seqlen, weights, biases)

# Because pred is logits, the loss is defined with tf.nn.softmax_cross_entropy_with_logits
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Classification accuracy
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialization
init = tf.global_variables_initializer()

# Training
with tf.Session() as sess:
    sess.run(init)
    step = 1
    while step * batch_size < training_iters:
        batch_x, batch_y, batch_seqlen = trainset.next(batch_size)
        # Every run updates the parameters once
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, seqlen: batch_seqlen})
        if step % display_step == 0:
            # Compute the accuracy on this batch
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y, seqlen: batch_seqlen})
            # Compute the loss on this batch
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y,
                                             seqlen: batch_seqlen})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    # Finally, compute the accuracy once on the test set
    test_data = testset.data
    test_label = testset.labels
    test_seqlen = testset.seqlen
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: test_label, seqlen: test_seqlen}))
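
One detail of ToySequenceData.next is easy to miss: the last batch of each pass over the data may be smaller than batch_size, and the following call wraps around to the start. The sketch below reproduces only that batching logic on a plain list; `Batcher` is a hypothetical stand-in for the class in the listing.

```python
class Batcher(object):
    """Minimal copy of the batching logic in ToySequenceData.next."""

    def __init__(self, data):
        self.data = data
        self.batch_id = 0  # read position, kept between calls

    def next(self, batch_size):
        # Wrap around once the end of the data has been reached
        if self.batch_id == len(self.data):
            self.batch_id = 0
        end = min(self.batch_id + batch_size, len(self.data))
        batch = self.data[self.batch_id:end]
        self.batch_id = end
        return batch


batcher = Batcher(list(range(1000)))
sizes = [len(batcher.next(128)) for _ in range(9)]
print(sizes)  # [128, 128, 128, 128, 128, 128, 128, 104, 128]
```

With 1000 samples and a batch size of 128, every eighth batch holds only 104 samples, so the placeholders need their batch dimension left as None, as they are in the listing.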

 
