NLP, Language Model, LSTM, Attention Model

·清尘·

Article link: https://blog.csdn.net/u012969412/article/details/77767541

Reference code: https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/02-intermediate/language_model

A brief introduction to LSTM: https://zhuanlan.zhihu.com/p/24720659

I. RNN and LSTM Structure

1) RNN:

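$\large h^{'}=\sigma(W^h h+W^i x)$
$\large y = \sigma(W^o h^{'})$
Here $\large x$ is the input at the current time step, $\large h$ is the previous hidden state, and $\large h^{'}$ is the new hidden state.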
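A minimal sketch of one step of this recurrence in PyTorch, with `x` as the input at the current step and `h` as the previous hidden state. The sizes `m` and `k`, the random weights, the output dimension (taken to be `m`), and the name `rnn_step` are all assumptions made only for illustration.

```python
import torch

def rnn_step(x, h, W_h, W_i, W_o):
    """One vanilla RNN step: h' = sigma(W_h h + W_i x), y = sigma(W_o h')."""
    h_next = torch.sigmoid(W_h @ h + W_i @ x)
    y = torch.sigmoid(W_o @ h_next)
    return h_next, y

m, k = 4, 3                      # hypothetical input and hidden sizes
torch.manual_seed(0)
W_h = torch.randn(k, k)          # hidden-to-hidden weights
W_i = torch.randn(k, m)          # input-to-hidden weights
W_o = torch.randn(m, k)          # hidden-to-output weights (output size assumed to be m)
x = torch.randn(m, 1)            # input at the current time step
h = torch.zeros(k, 1)            # previous hidden state
h_next, y = rnn_step(x, h, W_h, W_i, W_o)
print(h_next.shape, y.shape)
```

Calling `rnn_step` repeatedly, feeding `h_next` back in as `h`, unrolls the network over a sequence.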

2) LSTM:

Let $\large input\_dim=m$, $\large hidden\_dim=k$, $\large x \in R^{m\times 1}$, $\large h \in R^{k\times 1}$, and let $\large \odot$ denote the Hadamard (element-wise) product.
$\large C^{t}=Z^{f}\odot C^{t-1} + Z^{i}\odot Z \qquad C\in R^{k \times 1}$
$\large h^{t} = Z^o \odot \tanh(C^t) \qquad h\in R^{k \times 1}$
$\large y^t = \sigma(W^{'}h^t) \qquad y\in R^{m \times 1} \qquad W^{'}\in R^{m \times k}$
where $\large (x^t,h^{t-1}) \in R^{(m+k)\times 1}$ is the concatenation of the current input and the previous hidden state, and:
$\large Z = \tanh(W (x^t,h^{t-1})) \qquad W\in R^{k \times (m+k)}$
$\large Z^i = \sigma(W^i (x^t,h^{t-1})) \qquad W^i \in R^{k \times (m+k)}$
$\large Z^f = \sigma(W^f (x^t,h^{t-1})) \qquad W^f \in R^{k \times (m+k)}$
$\large Z^o = \sigma(W^o (x^t,h^{t-1})) \qquad W^o\in R^{k \times (m+k)}$
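The LSTM equations above can be sketched as a single step function in PyTorch. This is only an illustration under assumed sizes (`m = 5`, `k = 3`) and random weights; biases are omitted because the equations omit them, and `lstm_step` is a hypothetical name, not part of `torch.nn`.

```python
import torch

def lstm_step(x, h_prev, c_prev, W, W_i, W_f, W_o):
    """One LSTM step following the equations in the text (no bias terms)."""
    xh = torch.cat([x, h_prev], dim=0)   # (x^t, h^{t-1}), shape (m + k, 1)
    z = torch.tanh(W @ xh)               # candidate values Z
    z_i = torch.sigmoid(W_i @ xh)        # input gate Z^i
    z_f = torch.sigmoid(W_f @ xh)        # forget gate Z^f
    z_o = torch.sigmoid(W_o @ xh)        # output gate Z^o
    c = z_f * c_prev + z_i * z           # new cell state C^t
    h = z_o * torch.tanh(c)              # new hidden state h^t
    return h, c

m, k = 5, 3                              # hypothetical input and hidden sizes
torch.manual_seed(0)
W, W_i, W_f, W_o = (torch.randn(k, m + k) for _ in range(4))
x = torch.randn(m, 1)
h, c = lstm_step(x, torch.zeros(k, 1), torch.zeros(k, 1), W, W_i, W_f, W_o)
print(h.shape, c.shape)
```

The element-wise `*` plays the role of the Hadamard product, and each weight matrix has shape `(k, m + k)` as stated in the definitions.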

II. Using the PyTorch Embedding and LSTM Interfaces

Official documentation: http://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-nn/#recurrent-layers

1. Initializing nn.Embedding

class Embedding(Module):
    def __init__(
        self,
        num_embeddings,  # vocabulary size, i.e. len(word2idx.keys())
        embedding_dim,  # dimension of the vector generated for each word
        padding_idx=None,
        max_norm=None,
        norm_type=2,
        scale_grad_by_freq=False,
        sparse=False):
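As a short usage sketch of the constructor above: the toy vocabulary `word2idx`, the embedding dimension of 8, and the choice of index 0 for padding are assumptions for illustration only.

```python
import torch
import torch.nn as nn

word2idx = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}   # hypothetical vocabulary
embed = nn.Embedding(num_embeddings=len(word2idx), embedding_dim=8, padding_idx=0)

ids = torch.tensor([[1, 2, 3, 0]])   # one sentence of 4 token ids, padded with 0
vectors = embed(ids)                 # shape: (batch=1, seq_len=4, embedding_dim=8)
print(vectors.shape)
```

The row selected by `padding_idx` is initialized to zeros and receives no gradient updates, so padded positions contribute a zero vector.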

2. Initializing nn.LSTM

class RNNBase(Module):
    def __init__(
        self,
        mode,
        input_size,  # dimension of each input word vector
        hidden_size,  # dimension of the hidden state
        num_layers=1,  # number of stacked recurrent layers
        bias=True,
        batch_first=False,
        dropout=0,
        bidirectional=False):

Example:

import torch
import torch.nn as nn

# The LSTM's parameters do not depend on the sentence length
lstm = nn.LSTM(input_size=50, hidden_size=1024, num_layers=2, batch_first=True)
inputs = torch.randn(1000, 40, 50)  # 1000 sentences, max length 40, word vector dimension 50
# Optional initial states, shape (num_layers, batch, hidden_size):
# h0 = torch.randn(2, 1000, 1024)
# c0 = torch.randn(2, 1000, 1024)
output, (hn, cn) = lstm(inputs)
print(output.size(), hn.size(), cn.size())
# output.size() = (1000, 40, 1024), hn.size() = (2, 1000, 1024), cn.size() = (2, 1000, 1024)

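The example above shows:

A batch contains 1000 sentences, each word is represented by a 50-dimensional vector, and the maximum sentence length is 40. Sentences shorter than 40 are padded with <EOS>, and longer ones are truncated.

The final output has size 1000 sentences × maximum sentence length 40 × 1024 (the hidden vector for each word). hn has size 2 layers × 1000 sentences (the batch size) × 1024 (the hidden dimension); it holds the final hidden state of each layer.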

The structure of this LSTM is as follows:

The input word dimension (input_size) is 50, there are 2 hidden layers, and the hidden state dimension is 1024.

Handling variable-length sequences in PyTorch (LSTM)
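One common way to handle variable-length sequences is to pack the padded batch before passing it to the LSTM, so that padded positions are skipped. The sketch below assumes a small batch of 3 sentences with made-up lengths and a reduced `hidden_size` of 16 to keep it light; it uses `pack_padded_sequence` and `pad_packed_sequence` from `torch.nn.utils.rnn`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=50, hidden_size=16, num_layers=2, batch_first=True)
batch = torch.randn(3, 40, 50)         # 3 padded sentences, max length 40
lengths = torch.tensor([40, 25, 7])    # true lengths (assumed), sorted in descending order

# Pack so the LSTM only processes the real (unpadded) time steps
packed = pack_padded_sequence(batch, lengths, batch_first=True)
packed_out, (hn, cn) = lstm(packed)

# Unpack back to a padded tensor of shape (batch, max_len, hidden_size)
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True, total_length=40)
print(output.shape, hn.shape)
```

With packing, `hn` holds the hidden state at each sentence's last real token rather than at the padded position 40, and the padded positions of `output` are filled with zeros.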

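III. Language Model

A language model estimates how likely a sentence is to be normal, natural language; that is, it assigns a probability to a sequence of words.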
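One standard way to write this down, given here as background rather than as something stated in the reference code, is the chain-rule factorization of a sentence $\large w_1, w_2, ..., w_T$ into next-word predictions:

$\large P(w_1, w_2, ..., w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, ..., w_{t-1})$

An LSTM language model approximates each factor $\large P(w_t \mid w_1, ..., w_{t-1})$ with a softmax over the vocabulary computed from the hidden state at step $\large t-1$, which is why the training target is the input sequence shifted by one word.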

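Note: in this model the short sequences do not overlap.

batch_size: the number of samples in each training batch

seq_len: the sequence length (chosen by hand, typically 30), i.e. the default sentence length

corpus: the dictionary built from the text, i.e. the corpus.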

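Step 1: tokenize all of the text and add the words to the corpus

Number every word in the text collection. Each word corresponds to exactly one id.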
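A minimal pure-Python sketch of this numbering step. The class name `Dictionary`, the helper `tokenize`, the `<eos>` sentence separator, and the two sample sentences are illustrative assumptions and are not necessarily the names used in the reference code.

```python
class Dictionary:
    """Maps each distinct word to a unique integer id (and back)."""

    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        # Assign the next free id the first time a word is seen
        if word not in self.word2idx:
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


def tokenize(lines, dictionary):
    """Split each line on whitespace, append <eos>, and return the list of ids."""
    ids = []
    for line in lines:
        for word in line.split() + ["<eos>"]:
            ids.append(dictionary.add_word(word))
    return ids


corpus = Dictionary()
ids = tokenize(["the cat sat", "the dog sat"], corpus)
print(ids)   # [0, 1, 2, 3, 0, 4, 2, 3]
```

Repeated words such as "the" and "sat" reuse their existing ids, so the vocabulary size here is 5 (four words plus `<eos>`).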

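Step 2: build the training data matrix from the text corpus without overlap

Each row of the data matrix represents one sentence (fixed length 30), including sentence separators. Each value is the id (usually written idx) of the corresponding word in the corpus.

Each row of the target matrix is the sentence in the same row of the data matrix shifted by one position (also of length 30), so each target is the next word after the corresponding input word.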

# ids is a LongTensor of word ids; i is the start column of the current window
datas = ids[:, i:i+seq_length]
targets = ids[:, (i+1):(i+1)+seq_length].contiguous()

Note: ids has shape [batch_size × (seq_length × t)] and should be processed as t groups of data (mini-batches).
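The slicing above can be sketched end to end as follows. The sizes (`batch_size = 2`, `seq_length = 3`) and the stand-in token stream `torch.arange(20)` are assumptions chosen to make the output easy to read.

```python
import torch

batch_size, seq_length = 2, 3          # hypothetical small sizes
token_ids = torch.arange(20)           # stand-in for the tokenized corpus

# Keep only as many tokens as fill whole rows, then reshape to (batch_size, -1)
num_cols = token_ids.size(0) // batch_size
ids = token_ids[: num_cols * batch_size].view(batch_size, -1)

# Walk over the columns in non-overlapping windows of seq_length
batches = []
for i in range(0, ids.size(1) - seq_length, seq_length):
    inputs = ids[:, i:i + seq_length]
    targets = ids[:, (i + 1):(i + 1) + seq_length]   # inputs shifted by one position
    batches.append((inputs, targets))
    print(i, inputs.tolist(), targets.tolist())
```

Stopping the loop at `ids.size(1) - seq_length` guarantees that every target window still has `seq_length` columns.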

Step 3: build the model

The model's input is the data matrix above, and its output is the original sentence shifted by one position.
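A minimal sketch of such a model, assuming an Embedding layer, an LSTM, and a Linear layer that maps each hidden state to scores over the vocabulary. The class name `RNNLM`, the layer sizes, and the random inputs are illustrative assumptions, not taken from the text.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """Embedding -> LSTM -> Linear over the vocabulary."""

    def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        emb = self.embed(x)                   # (batch, seq_len, embed_size)
        out, state = self.lstm(emb, state)    # (batch, seq_len, hidden_size)
        logits = self.linear(out)             # (batch, seq_len, vocab_size)
        return logits, state

vocab_size = 100                              # hypothetical vocabulary size
model = RNNLM(vocab_size, embed_size=16, hidden_size=32, num_layers=1)
inputs = torch.randint(0, vocab_size, (4, 30))    # 4 sentences of length 30
targets = torch.randint(0, vocab_size, (4, 30))   # stand-in for the shifted sentences
logits, _ = model(inputs)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(logits.shape, loss.item())
```

Flattening the logits and targets lets `CrossEntropyLoss` score every position in the batch as an independent next-word prediction.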

IV. Image Caption

Task: given an image as input, output a text description of that image.

The data processing pipeline is shown in the figure below:

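V. Seq2Seq (Attention Model)

Reference: https://zhuanlan.zhihu.com/p/22081325 (be sure to read the 4 formulas there)

PyTorch seq2seq translation tutorial: http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

My Seq2Seq code.

The overall structure of the attention-based translation model is as follows:

The matrix computations of the Attention module in the figure are summarized as follows: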

My summary: let $\large eh \in R^{h\times 1}$ denote the output state of the LSTM at the current time step, and let $\large out \in R^{m\times h}$ denote the Encoder's output sequence (the hidden vectors output at each time step). Define $\large eh\_copy=(eh,eh,...,eh)^{'} \in R^{m\times h}$, where $\large m$ is the input sequence length and $\large h$ is the hidden output dimension. Then:

$\large energy=W^e\cdot \left( eh\_copy, out \right)^{'} \qquad W^{e} \in R^{h \times 2h} \qquad energy \in R^{h\times m}$

$\large atte = Softmax \left( energy^{'} \cdot v \right) \qquad v \in R^{h\times 1} \qquad atte \in R^{m\times 1}$
$\large context = out^{'} \cdot atte \qquad context \in R^{h\times 1}$
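The attention computation can be checked shape by shape with a small PyTorch sketch. The sizes (`m = 6`, `h = 4`) and the random tensors are assumptions; in a real model `W_e` and `v` would be learned parameters, and `eh` and `out` would come from the decoder and encoder.

```python
import torch

m, h = 6, 4                               # hypothetical sequence length and hidden size
torch.manual_seed(0)
eh = torch.randn(h, 1)                    # LSTM output state at the current time step
out = torch.randn(m, h)                   # encoder outputs, one row per time step
W_e = torch.randn(h, 2 * h)               # stands in for the learned matrix W^e
v = torch.randn(h, 1)                     # stands in for the learned vector v

eh_copy = eh.t().expand(m, h)                           # eh repeated m times, (m, h)
energy = W_e @ torch.cat([eh_copy, out], dim=1).t()     # (h, 2h) @ (2h, m) -> (h, m)
atte = torch.softmax(energy.t() @ v, dim=0)             # (m, 1), weights sum to 1
context = out.t() @ atte                                # (h, m) @ (m, 1) -> (h, 1)
print(energy.shape, atte.shape, context.shape)
```

The context vector is a weighted average of the encoder outputs, with one attention weight per input position.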