Modeling an LSTM with TensorFlow: a detailed, step-by-step walkthrough

Understanding LSTM in TensorFlow (MNIST dataset)

Long Short Term Memory (LSTM) networks are among the most common types of Recurrent Neural Networks in use today. They are mostly used with sequential data. An in-depth look at LSTMs can be found in this incredible blog post.

 


Our Aim

As the title suggests, the main aim of this blog post is to make the reader comfortable with the implementation details of a basic LSTM network in TensorFlow.

 

To fulfill this aim, we will use MNIST as our dataset.

 


The MNIST dataset

The MNIST dataset consists of images of handwritten digits and their corresponding labels. We can download and read the data in TensorFlow with the help of the following built-in functionality:

 

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

 

The data is split into three parts:

 

  • Training data (mnist.train): 55,000 images
  • Test data (mnist.test): 10,000 images
  • Validation data (mnist.validation): 5,000 images

Shape of the data

Let us discuss the shapes with respect to the training data of the MNIST dataset; the shapes of all three splits are identical.

 

The training set consists of 55,000 images of 28 x 28 pixels each. These 784 (28x28) pixel values are flattened into a single vector of dimensionality 784. The collection of all 55,000 such pixel vectors (one per image) is stored as a numpy array of shape (55000, 784) and is referred to as mnist.train.images.

 

Each of these 55,000 training images is associated with a label representing the class to which that image belongs. There are 10 such classes (0, 1, 2 … 9). Class labels are represented in one-hot encoded form, so the labels are stored as a numpy array of shape (55000, 10) and referred to as mnist.train.labels.
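As a quick sanity check (assuming the dataset has been loaded as above), these shapes can be inspected directly:

print(mnist.train.images.shape)       # (55000, 784)
print(mnist.train.labels.shape)       # (55000, 10)
print(mnist.test.images.shape)        # (10000, 784)
print(mnist.validation.images.shape)  # (5000, 784)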

 


Why MNIST?

LSTMs are generally used for complex sequence-related problems such as language modelling, which involves NLP concepts like word embeddings and encoders. These topics themselves need a lot of understanding, so it would be nice to set them aside and concentrate on the implementation details of LSTMs in TensorFlow: input formatting, LSTM cells and network design.

 

MNIST gives us such an opportunity. The input data here is just a set of pixel values. We can easily format these values and concentrate on the implementation details.

 


Implementation

Before getting our hands dirty with code, let us first draw an outline of our implementation. This will make the coding part more intuitive.


A vanilla RNN

A Recurrent Neural Network, when unrolled through time, can be visualised as follows:

 

Here,

 

  1. xt refers to the input at time step t.
  2. st refers to the hidden state at time step t. It can be visualised as the "memory" of our network.
  3. ot refers to the output at time step t.
  4. U, V and W are parameters that are shared across all the time steps. The significance of this parameter sharing is that our model performs the same task at each time step, only with different inputs.

 

What we have achieved by unrolling the RNN is that, at each time step, the network can be visualised as a feed-forward network that takes into account the output of the previous time step (signified by the connections between the time steps).
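To make the recurrence concrete, here is a minimal numpy sketch of one unrolled forward pass (the sizes and names are our own toy choices, not from the original post):

import numpy as np

# toy dimensions, chosen only for illustration
input_size, hidden_size, output_size, time_steps = 4, 8, 3, 5

rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_size, input_size))    # input-to-hidden weights
W = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden weights (the recurrence)
V = rng.normal(size=(output_size, hidden_size))   # hidden-to-output weights

s = np.zeros(hidden_size)                         # initial hidden state, the network's "memory"
xs = rng.normal(size=(time_steps, input_size))    # one input vector per time step

for t in range(time_steps):
    s = np.tanh(U @ xs[t] + W @ s)                # st depends on xt and on s(t-1)
    o = V @ s                                     # ot is read off the hidden state

print(o.shape)                                    # (3,)

Note that the same U, V and W are reused at every iteration of the loop, which is exactly the parameter sharing described in point 4 above.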

 


Two caveats

Our implementation will hinge upon two main concepts; getting comfortable with them will make the coding part intuitive:

 

  1. Interpretation of LSTM cells in TensorFlow.
  2. Formatting inputs before feeding them to TensorFlow RNNs.

 


Interpretation of LSTM cells in TensorFlow

A basic LSTM cell is declared in TensorFlow as:

 

tf.contrib.rnn.BasicLSTMCell(num_units)

 

Here, num_units refers to the number of units in the LSTM cell.

 

num_units can be interpreted as an analogue of the hidden layer in a feed-forward neural network: having num_units LSTM units in an LSTM cell at every time step is equivalent to having that many nodes in the hidden layer of a feed-forward network. The following picture should clear any confusion:

 

(figure: num_units.png)

Each of the num_units LSTM units can be seen as a standard LSTM unit:

(figure: lstm_unit.png)

The above diagram is taken from this incredible blog post, which describes the concept of LSTMs effectively.
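As a quick sanity check of this interpretation (a sketch of our own, assuming a TF 1.x environment where tf.contrib is available), both the output size and the state size of the cell equal num_units:

import tensorflow as tf

cell = tf.contrib.rnn.BasicLSTMCell(128)
print(cell.output_size)   # 128, one activation per LSTM unit at each time step
print(cell.state_size)    # LSTMStateTuple(c=128, h=128)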

 


Formatting inputs before feeding them to TensorFlow RNNs

The simplest form of RNN in TensorFlow is static_rnn. It is defined as:

 

tf.nn.static_rnn(cell, inputs)

 

There are other arguments as well, but we'll limit ourselves to these two.

 

The inputs argument accepts a list of tensors of shape [batch_size,input_size]. The length of this list is the number of time steps through which the network is unrolled, i.e. each element of the list corresponds to the input at the respective time step of our unrolled network.

 

For our case of MNIST images, we have images of size 28x28, which can be interpreted as having 28 rows of 28 pixels each. We will unroll our network through 28 time steps so that at every time step we can input one row of 28 pixels (input_size), and thus a full image over 28 time steps. If we supply batch_size images, every time step will be supplied with the respective row of all batch_size images. The following figure should clear any doubts:

(figure: inputs.png)

 

The output generated by static_rnn is a list of tensors of shape [batch_size,num_units]. The length of the list is the number of time steps through which the network is unrolled, i.e. one output tensor per time step. In this implementation we will only be concerned with the output of the final time step, as the prediction is generated once all the rows of an image have been supplied to the RNN, i.e. at the last time step.
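Putting these two points together, here is a minimal shape check (our own sketch, again assuming TF 1.x) of what static_rnn consumes and produces in the MNIST setting described above:

import tensorflow as tf
from tensorflow.contrib import rnn

x = tf.placeholder("float", [None, 28, 28])       # [batch_size, time_steps, n_input]
inputs = tf.unstack(x, 28, 1)                     # list of 28 tensors, one per time step
outputs, _ = rnn.static_rnn(rnn.BasicLSTMCell(128), inputs, dtype="float32")
print(len(inputs), inputs[0].shape)               # 28 (?, 28)
print(len(outputs), outputs[0].shape)             # 28 (?, 128)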

 

Now that we have done all the heavy lifting, we are ready to write the code. The coding part is very straightforward once the above concepts are clear.

 


Code

To start with, let's import the necessary dependencies and the dataset, and declare some constants. We will use batch_size=128 and num_units=128.

import tensorflow as tf
from tensorflow.contrib import rnn

#import mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets("/tmp/data/",one_hot=True)

#define constants
#unrolled through 28 time steps
time_steps=28
#hidden LSTM units
num_units=128
#rows of 28 pixels
n_input=28
#learning rate for adam
learning_rate=0.001
#mnist is meant to be classified in 10 classes(0-9).
n_classes=10
#size of batch
batch_size=128

Let's now declare the placeholders, plus the weight and bias variables that will be used to convert the output of shape [batch_size,num_units] to [batch_size,n_classes] so that the correct class can be predicted.

#weights and biases of appropriate shape to accomplish above task
out_weights=tf.Variable(tf.random_normal([num_units,n_classes]))
out_bias=tf.Variable(tf.random_normal([n_classes]))

#defining placeholders
#input image placeholder
x=tf.placeholder("float",[None,time_steps,n_input])
#input label placeholder
y=tf.placeholder("float",[None,n_classes])

 

Now that we are receiving inputs of shape [batch_size,time_steps,n_input], we need to convert them into a list of time_steps tensors of shape [batch_size,n_input] so that they can be fed to static_rnn.

#processing the input tensor from [batch_size,time_steps,n_input] to a list of "time_steps" tensors of shape [batch_size,n_input]
inputs=tf.unstack(x,time_steps,1)

 

Now we are ready to define our network. We will use one layer of BasicLSTMCell and build our static_rnn network out of it.

#defining the network
lstm_layer=rnn.BasicLSTMCell(num_units,forget_bias=1)
outputs,_=rnn.static_rnn(lstm_layer,inputs,dtype="float32")

As we are concerned only with the output of the last time step, we will generate our prediction from it.

#converting last output of dimension [batch_size,num_units] to [batch_size,n_classes] by out_weight multiplication
prediction=tf.matmul(outputs[-1],out_weights)+out_bias
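
Here outputs[-1] has shape [batch_size,num_units]; multiplying it by out_weights of shape [num_units,n_classes] and adding out_bias yields logits of shape [batch_size,n_classes], one score per class.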

 

Defining the loss, optimizer and accuracy:

#loss_function
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
#optimization
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

#model evaluation
correct_prediction=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
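
Note that tf.nn.softmax_cross_entropy_with_logits applies the softmax internally, which is why prediction is passed in as raw logits rather than probabilities.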

Now that we have defined our graph, we can run it.

#initialize variables
init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    iter=1
    while iter<800:
        batch_x,batch_y=mnist.train.next_batch(batch_size=batch_size)

        batch_x=batch_x.reshape((batch_size,time_steps,n_input))

        sess.run(opt, feed_dict={x: batch_x, y: batch_y})

        if iter %10==0:
            acc=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
            los=sess.run(loss,feed_dict={x:batch_x,y:batch_y})
            print("For iter ",iter)
            print("Accuracy ",acc)
            print("Loss ",los)
            print("__________________")

        iter=iter+1

One crucial thing to note here is that our images were essentially flattened into single vectors of dimensionality 784 to begin with. The function next_batch(batch_size) returns a batch of batch_size such 784-dimensional vectors, which are therefore reshaped to [batch_size,time_steps,n_input] so that they can be accepted by our placeholder.
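To make this reshape concrete, here is a small numpy sketch (the zero array is a hypothetical stand-in for a real batch returned by next_batch):

import numpy as np

batch_x = np.zeros((128, 784))             # stand-in for one batch of flattened images
reshaped = batch_x.reshape((128, 28, 28))  # [batch_size, time_steps, n_input]
print(reshaped.shape)                      # (128, 28, 28)
# reshaped[i, t, :] is row t (28 pixels) of image i, fed to the network at time step t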

 

We can also calculate the test accuracy of our model:

#calculating test accuracy
test_data = mnist.test.images[:128].reshape((-1, time_steps, n_input))
test_label = mnist.test.labels[:128]
print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

On running, the model achieves a test accuracy of 99.21%.
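
Note that the snippet above scores only the first 128 test images. One way to evaluate the full test set (a sketch of our own; it must run inside the same tf.Session() block as the training loop) is to average the accuracy over successive batches:

#sketch: full test-set accuracy, averaged over batches of 128 images
n_batches = len(mnist.test.images) // batch_size
total_acc = 0.0
for i in range(n_batches):
    batch = mnist.test.images[i*batch_size:(i+1)*batch_size].reshape((-1, time_steps, n_input))
    labels = mnist.test.labels[i*batch_size:(i+1)*batch_size]
    total_acc += sess.run(accuracy, feed_dict={x: batch, y: labels})
print("Full test accuracy:", total_acc / n_batches)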

 

This blog post was aimed at making the reader comfortable with the implementation details of RNNs in TensorFlow. We'll build some more complex models to use RNNs effectively. Stay tuned!

 
