Text Classification using Attention Mechanism in Keras

http://androidkt.com/text-classification-using-attention-mechanism-in-keras/

In this tutorial, we build a text classification model in Keras that uses an attention mechanism to provide insight into how classification decisions are being made.

1.Prepare Dataset

We’ll use the IMDB dataset that contains the text of 50,000 movie reviews from the Internet Movie Database. The IMDB dataset comes packaged with Keras. It has already been preprocessed such that the sequences of words have been converted to sequences of integers, where each integer represents a specific word in a dictionary.

 

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Input
from tensorflow.keras.layers import Concatenate
from tensorflow.keras.preprocessing import sequence

vocab_size = 10000

# Special token ids used when loading the dataset.
pad_id = 0
start_id = 1
oov_id = 2
index_offset = 2

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=vocab_size,
                                                                        start_char=start_id,
                                                                        oov_char=oov_id,
                                                                        index_from=index_offset)

# Build an id -> word lookup that matches the offset used by load_data.
word2idx = tf.keras.datasets.imdb.get_word_index()
idx2word = {v + index_offset: k for k, v in word2idx.items()}
idx2word[pad_id] = '<PAD>'
idx2word[start_id] = '<START>'
idx2word[oov_id] = '<OOV>'

max_len = 200
rnn_cell_size = 128

# Pad (or truncate) every review to exactly max_len tokens.
x_train = sequence.pad_sequences(x_train,
                                 maxlen=max_len,
                                 truncating='post',
                                 padding='post',
                                 value=pad_id)
x_test = sequence.pad_sequences(x_test,
                                maxlen=max_len,
                                truncating='post',
                                padding='post',
                                value=pad_id)

Keras provides the pad_sequences function to take care of padding. We only have to give it the max_len argument, which determines the length of the output arrays: sentences shorter than this length are padded, and longer ones are truncated.
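
To sanity-check the preprocessing, we can map a padded review back to words with the idx2word lookup built above (a quick sketch, not part of the original walkthrough):

# Decode the first padded training review back into words.
print(' '.join(idx2word[idx] for idx in x_train[0]))
print('label:', y_train[0])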

2.Create Attention Layer

You can use the final encoded state of a recurrent neural network for prediction, but this can lose useful information encoded in the earlier steps of the sequence. To keep that information, you could instead use an average of the encoded states output by the RNN. However, not all of the encoded states are equally valuable, so we use a weighted sum of these encoded states to make our prediction.
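
Concretely, given encoder states h_1, ..., h_T and a query vector q (here the final hidden state), the Bahdanau-style attention implemented below computes

e_t = v^\top \tanh(W_1 h_t + W_2 q), \qquad \alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}, \qquad c = \sum_{t=1}^{T} \alpha_t h_t

where W_1, W_2 and v correspond to the Dense layers W1, W2 and V in the code, and c is the context vector used for the prediction.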

 

class Attention(tf.keras.Model):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: RNN outputs, shape (batch, time_steps, hidden_size)
        # hidden:   final hidden state, shape (batch, hidden_size)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        # Score each time step and normalize the scores over the time axis.
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
        attention_weights = tf.nn.softmax(self.V(score), axis=1)
        # Weighted sum of the encoded states.
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights

We compute these attention weights simply by building a small fully connected neural network on top of each encoded state. This network will have a single unit final layer which will correspond to the attention weight we will assign.

(Figure: Keras Text Classification using Attention Mechanism)

The attention function is very simple: it is just dense layers followed by a softmax, so essentially a small three-layer neural network.
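
As a quick shape check (a standalone sketch, not part of the model graph below): given encoder outputs of shape (batch, time_steps, hidden_size) and a query of shape (batch, hidden_size), the layer returns a context vector of shape (batch, hidden_size) and attention weights of shape (batch, time_steps, 1).

# Standalone shape check for the Attention layer (illustrative sizes).
attn = Attention(units=10)
features = tf.random.normal([2, 200, 256])   # (batch, time_steps, hidden_size)
hidden = tf.random.normal([2, 256])          # (batch, hidden_size)
context, weights = attn(features, hidden)
print(context.shape)   # (2, 256)
print(weights.shape)   # (2, 200, 1)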

3.Embed Layer

Neural networks are compositions of linear-algebra operators and non-linear activation functions. In order to perform these computations on our input sentences, we must first embed them as vectors of numbers. There are two main approaches to this embedding: using pre-trained embeddings such as Word2Vec or GloVe, or initializing the embedding randomly.

In this tutorial, we will use random initialization. To perform this embedding we use the Embedding layer from the Keras layers package. The parameters of this matrix will then be trained with the rest of the graph.

 

 

sequence_input = Input(shape=(max_len,), dtype='int32')

embedded_sequences = keras.layers.Embedding(vocab_size, 128, input_length=max_len)(sequence_input)
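
At this point embedded_sequences is a 3-D tensor: one 128-dimensional vector per token. A minimal check of the symbolic shape (a sketch, not in the original post):

# Each integer id becomes a 128-dimensional vector.
print(embedded_sequences.shape)  # (None, 200, 128) -> (batch, max_len, embedding_dim)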

4.Bi-directional RNN

We will use a bi-directional RNN. This is simply the concatenation of two RNNs, one which processes the sequence from left to right (the “forward” RNN) and one which processes it from right to left (the “backward” RNN). By using both directions, we get a stronger encoding, as each word can be encoded using the context of its neighbors on both sides rather than just a single side.

 

# The first bidirectional layer only needs to return the full sequence of
# outputs, which feeds the second layer.
lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(rnn_cell_size,
                         dropout=0.3,
                         return_sequences=True,
                         recurrent_initializer='glorot_uniform'),
    name="bi_lstm_0")(embedded_sequences)

# The second layer also returns its final hidden and cell states (for both
# directions) so they can be passed to the attention layer.
lstm, forward_h, forward_c, backward_h, backward_c = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(rnn_cell_size,
                         dropout=0.2,
                         return_sequences=True,
                         return_state=True,
                         recurrent_initializer='glorot_uniform'))(lstm)
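
With return_sequences=True and return_state=True, a Bidirectional LSTM returns five tensors: the per-time-step outputs plus the final hidden and cell states of the forward and backward directions. A small standalone sketch with toy sizes (not part of the model) illustrates this:

# Toy example of the five outputs of a bidirectional LSTM.
demo = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(4, return_sequences=True, return_state=True))
seq, fh, fc, bh, bc = demo(tf.random.normal([2, 6, 8]))  # (batch, time, features)
print(seq.shape)                               # (2, 6, 8): forward + backward outputs concatenated
print(fh.shape, fc.shape, bh.shape, bc.shape)  # each (2, 4)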

Because our model uses a bi-directional RNN, we first concatenate the hidden states from the two directions before computing the attention weights and applying the weighted sum.

 

# Concatenate the final forward and backward states.
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])

# Instantiate the attention layer defined above; the unit count is an
# arbitrary but reasonable choice for the scoring network.
attention = Attention(rnn_cell_size)
context_vector, attention_weights = attention(lstm, state_h)

output = keras.layers.Dense(1, activation='sigmoid')(context_vector)

model = keras.Model(inputs=sequence_input, outputs=output)

# summarize layers
print(model.summary())

The last layer is densely connected with a single output node. Using the sigmoid activation function, this value is a float between 0 and 1, representing a probability, or confidence level.

5.Compile Model

A model needs a loss function and an optimizer for training. Ours is a binary classification problem and the model outputs a probability, so we’ll use the binary_crossentropy loss function.

 

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Stop training once the validation loss stops improving.
early_stopping_callback = keras.callbacks.EarlyStopping(monitor='val_loss',
                                                        min_delta=0,
                                                        patience=1,
                                                        verbose=0, mode='auto')

6.Train Model

Train the model for up to 10 epochs in mini-batches of 200 samples, i.e. up to 10 passes over all samples in the x_train and y_train tensors (the early-stopping callback may end training sooner). While training, monitor the model’s loss and accuracy on the 30% of the training samples held out as a validation split.

 

history = model.fit(x_train,
                    y_train,
                    epochs=10,
                    batch_size=200,
                    validation_split=0.3,
                    verbose=1,
                    callbacks=[early_stopping_callback])
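
To see how training progressed, you can plot the recorded history (a sketch; assumes matplotlib is available, and note the metric key is 'acc' rather than 'accuracy' in older Keras versions):

# Plot training vs. validation accuracy from the History object.
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()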

7.Evaluate Model

Let’s see how the model performs. Two values are returned: loss and accuracy.

 

result = model.evaluate(x_test, y_test)
print(result)
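
Since attention was the point of this exercise, you can also inspect which tokens the model attends to. One way (a sketch, not from the original post; the top-10 selection is just for illustration) is to build a second model that shares the trained layers but also outputs the attention weights:

# Expose the attention weights of the trained graph so we can see which
# tokens each prediction attends to.
attention_model = keras.Model(inputs=sequence_input,
                              outputs=[output, attention_weights])

probs, weights = attention_model.predict(x_test[:1])
weights = weights[0, :, 0]                      # shape (max_len,)
top = weights.argsort()[-10:][::-1]             # positions with the largest weights
print('prediction:', probs[0, 0])
print([(idx2word[x_test[0][i]], float(weights[i])) for i in top])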
