The Road to Learning Natural Language Processing, Part 15: Seq2Seq and the Attention Mechanism

驭风少年君

Category: Natural Language Processing. Tags: natural language processing, deep learning, neural networks

Article link: https://blog.csdn.net/qq_44951759/article/details/120668649

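Diligence is the path up the mountain of books; hard work is the boat across the boundless sea of learning.

Those who do not study hard while their hair is black will regret it when their hair is white.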

1. Sequence-to-Sequence (N to M)

1.1 Introduction

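The model encodes first and then decodes. A START token begins decoding and an END token terminates it.

Encoder input: the whole input sentence, which is encoded into a single intermediate vector.

The decoder behaves differently in two settings: training and testing.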

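Decoder input during training: in addition to the intermediate vector, the decoder receives the ground-truth tokens. For example, if the first token to predict is "yes", the true label "yes" (not the model's own prediction) is fed in as the input to the next step. Feeding the true label from the previous step at every step, a technique commonly called teacher forcing, makes the results more accurate.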

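Decoder input at test time: no labels are available, so each step takes only the output of the previous step as its input.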
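
The difference between the two decoding modes can be sketched in a few lines of plain Python. This is only an illustration, not the model from this article: `toy_step` is a hypothetical stand-in for one decoder step (it just looks up the next token in a fixed table), and the token names `START` and `END` are assumptions.

```python
START, END = "<START>", "<END>"

# Hypothetical stand-in for one decoder step: maps the previous token to the
# next one. A real decoder would also carry a hidden state.
NEXT_TOKEN = {START: "yes", "yes": "it", "it": "is", "is": END}

def toy_step(prev_token):
    return NEXT_TOKEN.get(prev_token, END)

def train_inputs(target):
    """Training: the decoder input at step t is the TRUE token from step t-1."""
    return [START] + target[:-1]

def greedy_decode(max_len=10):
    """Testing: the decoder input at step t is its OWN output from step t-1."""
    token, output = START, []
    for _ in range(max_len):
        token = toy_step(token)
        if token == END:
            break
        output.append(token)
    return output

target = ["yes", "it", "is", END]
print(train_inputs(target))  # ['<START>', 'yes', 'it', 'is']
print(greedy_decode())       # ['yes', 'it', 'is']
```

During training the inputs are known in advance, so the whole sequence can be fed at once; at test time decoding has to proceed one step at a time.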

1.2 Applications

Machine translation (translating the sentence as a whole)
Text summarization

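The raw texts used for summarization may differ in length, but the neural network requires inputs of the same size. The longest text is therefore taken as the standard, and shorter texts are padded to that length.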

A simple way to convert characters into RNN input

Map each character to a unique ID, then pass the IDs through an embedding layer in the network, which maps each ID to an N-dimensional vector.

Alternatively, word2vec can be used.
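
The padding and ID-to-vector steps can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the `<PAD>` symbol and the choice of 0 as its ID are all hypothetical, and the embedding table is just random numbers, whereas a real network would learn it during training.

```python
import random

PAD = "<PAD>"

def build_vocab(texts):
    """Give each distinct character a unique integer ID; 0 is reserved for padding."""
    vocab = {PAD: 0}
    for text in texts:
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab

def encode_and_pad(texts, vocab):
    """Convert texts to ID lists and pad each list to the longest length."""
    max_len = max(len(t) for t in texts)
    return [[vocab[ch] for ch in t] + [vocab[PAD]] * (max_len - len(t))
            for t in texts]

def make_embeddings(vocab_size, dim, seed=0):
    """Randomly initialised table: one dim-dimensional vector per ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(vocab_size)]

texts = ["cat", "at", "a"]
vocab = build_vocab(texts)
ids = encode_and_pad(texts, vocab)
table = make_embeddings(len(vocab), dim=4)
# Embedding lookup: replace every ID by its row in the table.
embedded = [[table[i] for i in seq] for seq in ids]

print(vocab)  # {'<PAD>': 0, 'c': 1, 'a': 2, 't': 3}
print(ids)    # [[1, 2, 3], [2, 3, 0], [2, 0, 0]]
```

After the lookup, `embedded` holds 3 sequences of 3 steps each, with a 4-dimensional vector per step.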

Emotional dialogue

2. seq2seq with TensorFlow

seq2seq network architecture:

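The encoder receives the input [A, B, C]. We do not care about the encoder's outputs; we only need its final hidden state, which is passed to the decoder. The decoder input is [<EOS>, W, X, Y, Z] and the training target is [W, X, Y, Z, <EOS>].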

2.1 Helper functions: building the index

Vocabulary
For text data, the first step is to build a vocabulary table, i.e. a mapping between each word and its ID.

x = [[5, 7, 8], [6, 3], [3], [1]]

import helpers
xt, xlen = helpers.batch(x)

print(xt)

array([[5, 6, 3, 1],
       [7, 3, 0, 0],
       [8, 0, 0, 0]])

xlen

[3, 2, 1, 1]
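
The `helpers` module itself is not shown in this article. The sketch below is an assumption about what `helpers.batch` does, inferred only from the output above: it pads every sequence with 0 up to the longest length and returns the result time-major (one row per time step, one column per sequence), together with the original lengths. To stay dependency-free it returns nested lists rather than a NumPy array.

```python
def batch(sequences, pad_id=0):
    """Pad sequences to a common length and return them time-major.

    Returns (padded, lengths), where padded[t][b] is token t of sequence b.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    # Batch-major: one padded row per sequence.
    batch_major = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # Transpose to time-major: one row per time step.
    time_major = [list(step) for step in zip(*batch_major)]
    return time_major, lengths

xt, xlen = batch([[5, 7, 8], [6, 3], [3], [1]])
print(xt)    # [[5, 6, 3, 1], [7, 3, 0, 0], [8, 0, 0, 0]]
print(xlen)  # [3, 2, 1, 1]
```

The time-major layout matches the `time_major=True` argument passed to `tf.nn.dynamic_rnn` later in the article.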

2.2 Building a model

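# NOTE: this code uses the TensorFlow 1.x graph API
# (tf.placeholder, tf.contrib, tf.InteractiveSession).
import numpy as np
import tensorflow as tf
import helpers  # separate helper module, not part of TensorFlow

tf.reset_default_graph()
sess = tf.InteractiveSession()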

PAD = 0  # ID used to pad short sequences
EOS = 1  # end-of-sequence ID; also fed as the first decoder input

vocab_size = 10
input_embedding_size = 20

encoder_hidden_units = 20
decoder_hidden_units = encoder_hidden_units

encoder_inputs int32 tensor is shaped [encoder_max_time, batch_size]
decoder_targets int32 tensor is shaped [decoder_max_time, batch_size]

encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='encoder_inputs')
decoder_targets = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_targets')

decoder_inputs int32 tensor is shaped [decoder_max_time, batch_size]

decoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_inputs')

Embeddings

embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)

encoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, encoder_inputs)
decoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, decoder_inputs)

Encoder

encoder_cell = tf.contrib.rnn.LSTMCell(encoder_hidden_units)

encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_inputs_embedded,
    dtype=tf.float32, time_major=True,
)

del encoder_outputs

encoder_final_state

LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_2:0' shape=(?, 20) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_3:0' shape=(?, 20) dtype=float32>)

Decoder

decoder_cell = tf.contrib.rnn.LSTMCell(decoder_hidden_units)

decoder_outputs, decoder_final_state = tf.nn.dynamic_rnn(
    decoder_cell, decoder_inputs_embedded,
    # start decoding from the encoder's final state (the intermediate vector)
    initial_state=encoder_final_state,
    dtype=tf.float32, time_major=True, scope="plain_decoder",
)

decoder_logits = tf.contrib.layers.linear(decoder_outputs, vocab_size)

decoder_prediction = tf.argmax(decoder_logits, 2)

Optimizer

decoder_logits

<tf.Tensor 'fully_connected/BiasAdd:0' shape=(?, ?, 10) dtype=float32>

stepwise_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
    labels=tf.one_hot(decoder_targets, depth=vocab_size, dtype=tf.float32),
    logits=decoder_logits,
)

loss = tf.reduce_mean(stepwise_cross_entropy)
train_op = tf.train.AdamOptimizer().minimize(loss)

sess.run(tf.global_variables_initializer())
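
To make the loss concrete, here is a plain-Python sketch of the quantity that softmax cross-entropy computes for a single time step, followed by the mean over steps. The function names are hypothetical and the sketch handles one sequence at a time, whereas the TensorFlow code above works on whole [time, batch] tensors. Note that, as written above, `tf.reduce_mean` averages over every position, which appears to include the padded ones.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a one-hot label at index `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

def mean_loss(step_logits, step_labels):
    """Average the per-step losses over all positions."""
    losses = [softmax_cross_entropy(lg, y)
              for lg, y in zip(step_logits, step_labels)]
    return sum(losses) / len(losses)

# Uniform logits over a 10-token vocabulary give a loss of ln(10).
uniform = [0.0] * 10
print(round(softmax_cross_entropy(uniform, 3), 4))  # 2.3026
```

A model that has learned nothing yet spreads its probability roughly evenly over the 10 vocabulary entries, so a loss near ln(10) ≈ 2.30 is what one would expect at the very start of training.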

Training

batch_size = 100

batches = helpers.random_sequences(length_from=3, length_to=8,
                                   vocab_lower=2, vocab_upper=10,
                                   batch_size=batch_size)

print('head of the batch:')
for seq in next(batches)[:10]:
    print(seq)

head of the batch:
[5, 5, 7, 8, 8]
[7, 3, 3]
[3, 3, 2, 3]
[3, 4, 7, 5, 9, 2]
[4, 9, 7, 8, 6, 5, 6]
[7, 6, 2]
[2, 9, 2, 7, 9, 5]
[5, 7, 2, 8, 6, 2, 9]
[9, 4, 9, 4, 4, 7]
[5, 2, 7]
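
`helpers.random_sequences` is likewise not shown in this article. The sketch below is an assumption about its behaviour, inferred from the call and the sample output above: an endless generator of batches, where each sequence has a random length between `length_from` and `length_to` (inclusive) and token IDs from `vocab_lower` up to but not including `vocab_upper`. The `seed` parameter is an addition for reproducibility and is not part of the original call.

```python
import random

def random_sequences(length_from, length_to, vocab_lower, vocab_upper,
                     batch_size, seed=None):
    """Yield an endless stream of batches of random integer sequences.

    With vocab_lower=2, the IDs 0 (PAD) and 1 (EOS) never appear in the data.
    """
    rng = random.Random(seed)
    while True:
        yield [
            [rng.randrange(vocab_lower, vocab_upper)
             for _ in range(rng.randint(length_from, length_to))]
            for _ in range(batch_size)
        ]

batches = random_sequences(length_from=3, length_to=8,
                           vocab_lower=2, vocab_upper=10, batch_size=100)
first = next(batches)
print(len(first), first[0])
```

Because the sequences are random, the task the network learns here is simply to copy its input to its output.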

def next_feed():
    batch = next(batches)
    encoder_inputs_, _ = helpers.batch(batch)
    decoder_targets_, _ = helpers.batch(
        [(sequence) + [EOS] for sequence in batch]
    )
    decoder_inputs_, _ = helpers.batch(
        [[EOS] + (sequence) for sequence in batch]
    )
    return {
        encoder_inputs: encoder_inputs_,
        decoder_inputs: decoder_inputs_,
        decoder_targets: decoder_targets_,
    }

For an input of [5, 6, 7], decoder_targets is [5, 6, 7, 1], where 1 stands for EOS, and decoder_inputs is [1, 5, 6, 7].

loss_track = []

max_batches = 3001
batches_in_epoch = 1000

try:
    for batch in range(max_batches):
        fd = next_feed()
        _, l = sess.run([train_op, loss], fd)
        loss_track.append(l)

        if batch % batches_in_epoch == 0:  # also true for batch 0
            print('batch {}'.format(batch))
            print('  minibatch loss: {}'.format(sess.run(loss, fd)))
            predict_ = sess.run(decoder_prediction, fd)
            for i, (inp, pred) in enumerate(zip(fd[encoder_inputs].T, predict_.T)):
                print('  sample {}:'.format(i + 1))
                print('    input     > {}'.format(inp))
                print('    predicted > {}'.format(pred))
                if i >= 2:
                    break
            print()
except KeyboardInterrupt:
    print('training interrupted')

batch 0
  minibatch loss: 2.3390607833862305
  sample 1:
    input     > [3 6 9 5 6 4 2 6]
    predicted > [6 5 8 9 9 8 9 9 8]
  sample 2:
    input     > [4 2 6 6 5 0 0 0]
    predicted > [6 7 7 7 7 8 9 9 9]
  sample 3:
    input     > [3 8 9 6 9 4 8 5]
    predicted > [6 3 3 9 3 9 3 3 9]

batch 1000
  minibatch loss: 0.30939382314682007
  sample 1:
    input     > [9 5 9 9 7 2 0 0]
    predicted > [9 9 9 9 7 2 1 0 0]
  sample 2:
    input     > [9 3 7 7 4 9 0 0]
    predicted > [9 3 7 7 4 9 1 0 0]
  sample 3:
    input     > [7 7 2 4 2 7 5 0]
    predicted > [7 7 2 4 2 7 5 1 0]

batch 2000
  minibatch loss: 0.15077874064445496
  sample 1:
    input     > [5 7 2 4 3 8 0 0]
    predicted > [5 7 2 4 3 8 1 0 0]
  sample 2:
    input     > [4 8 4 6 3 6 7 7]
    predicted > [4 8 4 6 7 7 7 7 1]
  sample 3:
    input     > [4 9 4 0 0 0 0 0]
    predicted > [4 9 4 1 0 0 0 0 0]

batch 3000
  minibatch loss: 0.09187103807926178
  sample 1:
    input     > [3 9 3 0 0 0 0 0]
    predicted > [3 9 3 1 0 0 0 0 0]
  sample 2:
    input     > [9 5 2 3 2 0 0 0]
    predicted > [9 5 2 3 2 1 0 0 0]
  sample 3:
    input     > [7 8 7 0 0 0 0 0]
    predicted > [7 8 7 1 0 0 0 0 0]

%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(loss_track)
print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], len(loss_track)*batch_size, batch_size))