When training an RNN, backpropagation through time keeps multiplying gradients across time steps, so gradients easily vanish or explode.
The usual remedies:
- For exploding gradients: gradient clipping, i.e. scale the gradient down whenever its norm exceeds a threshold (see the sketch below).
- For vanishing gradients: new architectures were proposed, such as the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU).
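A minimal sketch of gradient clipping with torch.nn.utils.clip_grad_norm_; the toy model, data, and the threshold GRAD_CLIP = 1.0 are hypothetical, just to show where clipping fits between backward() and the optimizer step.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # toy model, for illustration only
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

GRAD_CLIP = 1.0  # assumed maximum allowed gradient norm
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# rescale all gradients in place so that their global norm is at most GRAD_CLIP
nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
optimizer.step()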
We use torchtext to build the vocabulary, then read the data in batches.
import torchtext
from torchtext.vocab import Vectors
import torch
import numpy as np
import random
USE_CUDA = torch.cuda.is_available()
# To make results reproducible, we usually fix every random seed to a specific value
random.seed(53113)
np.random.seed(53113)
torch.manual_seed(53113)
if USE_CUDA:
torch.cuda.manual_seed(53113)
BATCH_SIZE = 32 # number of sequences in one batch
EMBEDDING_SIZE = 650 # dimensionality of the word embeddings
MAX_VOCAB_SIZE = 50000
- A key concept in TorchText is the Field, which defines how your data is processed. We use the TEXT field to handle the text data. Our TEXT field is built with lower=True, so all words are lowercased.
- torchtext provides the LanguageModelingDataset class to handle language-modeling datasets for us.
- build_vocab builds a vocabulary of the most frequent words from the training set we provide; max_size caps the total vocabulary size.
- BPTTIterator yields contiguous, coherent chunks of text; BPTT stands for back propagation through time.
TEXT = torchtext.data.Field(lower=True)
train, val, test = torchtext.datasets.LanguageModelingDataset.splits(path=".",
train="text8.train.txt", #训练集
validation="text8.dev.txt", #验证集
test="text8.test.txt", #测试集
text_field=TEXT)
TEXT.build_vocab(train, max_size=MAX_VOCAB_SIZE)
print("vocabulary size: {}".format(len(TEXT.vocab)))
device = torch.device("cuda" if USE_CUDA else "cpu")
VOCAB_SIZE = len(TEXT.vocab)
train_iter, val_iter, test_iter = torchtext.data.BPTTIterator.splits(
(train, val, test),
batch_size=BATCH_SIZE,
device=device,
bptt_len=32,
repeat=False, # do one pass over the data per epoch instead of repeating endlessly
shuffle=True
)
- Why does our vocabulary contain 50002 words rather than 50000? Because TorchText adds two special tokens for us: <unk> for unknown words and <pad> for padding (see the quick check below).
- The model's input is a sequence of words and its output is the same sequence shifted by one position, because a language model's goal is to predict the next word given the previous ones.
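A quick sanity check of those two special tokens (a minimal sketch, assuming TEXT.build_vocab has already run as above; in this legacy torchtext API the special tokens sit at the front of the index-to-string table):
print(len(TEXT.vocab)) # 50002
print(TEXT.vocab.itos[:2]) # ['<unk>', '<pad>']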
Let's inspect one batch:
it = iter(train_iter)
batch = next(it)
print(" ".join([TEXT.vocab.itos[i] for i in batch.text[:,1].data]))
print(" ".join([TEXT.vocab.itos[i] for i in batch.target[:,1].data]))
Output (note that the target is the input shifted one word to the left):
had dropped to just three zero zero zero k it was then cool enough to allow the nuclei to capture electrons this process is called recombination during which the first neutral atoms
dropped to just three zero zero zero k it was then cool enough to allow the nuclei to capture electrons this process is called recombination during which the first neutral atoms took
Defining the model
- Subclass nn.Module
- Write the initialization function
- Write the forward function
- Define any other helper functions the model needs
import torch
import torch.nn as nn
class RNNModel(nn.Module):
    """ A simple recurrent language model.
    @ntoken: vocabulary size
    @ninp: embedding size
    """
    def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5):
        ''' The model consists of the following layers:
        - a word embedding layer
        - a recurrent layer (RNN, LSTM or GRU)
        - a linear layer mapping the hidden state to the output vocabulary
        - a dropout layer for regularization
        '''
        super(RNNModel, self).__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        if rnn_type in ['LSTM', 'GRU']:
            self.rnn = getattr(nn, rnn_type)(ninp, nhid, nlayers, dropout=dropout)
        else:
            try:
                nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[rnn_type]
            except KeyError:
                raise ValueError("""An invalid option for `--model` was supplied,
                                 options are ['LSTM', 'GRU', 'RNN_TANH' or 'RNN_RELU']""")
            self.rnn = nn.RNN(ninp, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)
        self.init_weights()
        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, input, hidden):
        ''' Forward pass:
        - embed the input words
        - run them through the recurrent layer
        - map each hidden state to the output vocabulary with the linear layer
        '''
        emb = self.drop(self.encoder(input))    # (seq_len, batch, ninp)
        output, hidden = self.rnn(emb, hidden)  # output: (seq_len, batch, nhid)
        output = self.drop(output)
        # flatten to (seq_len * batch, nhid) so the linear layer applies to every position
        decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))
        return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden

    def init_hidden(self, bsz, requires_grad=True):
        # borrow any parameter to create new tensors with the same dtype and device
        weight = next(self.parameters())
        if self.rnn_type == 'LSTM':
            # an LSTM needs both a hidden state and a cell state
            return (weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad),
                    weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad))
        else:
            return weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad)
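A minimal usage sketch tying the model to the iterators above (nhid=1000 and nlayers=2 are assumed values; the single forward step and cross-entropy loss are just for illustration):
model = RNNModel("LSTM", VOCAB_SIZE, EMBEDDING_SIZE, nhid=1000, nlayers=2, dropout=0.5)
model = model.to(device)
loss_fn = nn.CrossEntropyLoss()

hidden = model.init_hidden(BATCH_SIZE)
batch = next(iter(train_iter))
output, hidden = model(batch.text, hidden)  # output: (bptt_len, batch_size, vocab_size)
loss = loss_fn(output.view(-1, VOCAB_SIZE), batch.target.view(-1))
print(loss.item())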