Notes on Chinese Text Classification in NLP (Part 2)

This post looks at using LSTM and Attention for Chinese text classification, covering the evolution from RNN to LSTM, the structure of the GRU, and how self-attention works. It builds a text classification model combining LSTM and self-attention, and describes the measures used to prevent overfitting. Experiments show that TextCNN and LSTM + self-attention each have advantages on different datasets.


Continuing from the previous note:

Deng Wentao: Notes on Chinese Text Classification in NLP (Part 1) (zhuanlan.zhihu.com)

This post covers the exploration of LSTM and Attention for text classification:

LSTM

Before getting to LSTM, let's first review the evolution from RNN to LSTM. The RNN update rule is:

$h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$

where $h_{t-1}$ is the hidden state at time step $t-1$.
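As a concrete illustration, here is a minimal sketch of a single RNN step in PyTorch that matches the formula above; the dimensions and tensors are made up for the example and are not part of the models below.

import torch
import torch.nn as nn

# toy dimensions, for illustration only
input_dim, hidden_dim = 4, 3
W_x = nn.Linear(input_dim, hidden_dim, bias=True)    # W_x x_t + b
W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_h h_{t-1}

x_t = torch.randn(1, input_dim)       # input at time step t
h_prev = torch.zeros(1, hidden_dim)   # hidden state at time step t-1

# h_t = tanh(W_h h_{t-1} + W_x x_t + b)
h_t = torch.tanh(W_h(h_prev) + W_x(x_t))
print(h_t.shape)  # torch.Size([1, 3])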

Because RNNs cannot cope well with exploding and vanishing gradients, the LSTM (long short-term memory) structure was introduced. It can be expressed with the following equations:

$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$
$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$
$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$
$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
$h_t = o_t \odot \tanh(c_t)$

The forget gate selectively keeps information from the previous cell state $c_{t-1}$, while the input gate and output gate control how the incoming and outgoing information is integrated.
import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_dim, output_dim, batch_size, num_direction):
        super().__init__()
        self.hidden_size = output_dim
        self.num_direction = num_direction
        bidirectional = (num_direction == 2)
        self.hc_state = self.init_hidden(batch_size)
        self.lstm = nn.LSTM(input_dim, output_dim, batch_first=True, bidirectional=bidirectional)

    def init_hidden(self, batch_size):
        # (h_0, c_0), each of shape (num_layers * num_direction, batch_size, hidden_size)
        return (torch.zeros(1 * self.num_direction, batch_size, self.hidden_size),
                torch.zeros(1 * self.num_direction, batch_size, self.hidden_size))

    def forward(self, embeddings):
        # re-initialize the hidden/cell states to match the actual batch size
        self.hc_state = self.init_hidden(embeddings.size(0))
        outputs, (ht, ct) = self.lstm(embeddings, self.hc_state)
        return outputs, (ht, ct)

Compared with LSTM, another structure, the GRU (gated recurrent unit), was proposed in 2014. It achieves results similar to LSTM while being cheaper to compute. A GRU mainly consists of a reset gate and an update gate:

$r_t = \sigma(W_r [h_{t-1}, x_t])$
$z_t = \sigma(W_z [h_{t-1}, x_t])$
$\tilde{h}_t = \tanh(W [r_t \odot h_{t-1}, x_t])$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$

reset gate: decides how much of the previous hidden state $h_{t-1}$ to reset;

update gate: combines the roles of the forget gate and the input gate.
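In PyTorch, nn.GRU can be used as an almost drop-in replacement for nn.LSTM; the main practical difference is that it carries only a hidden state h and no separate cell state c. Below is a minimal sketch of a GRU wrapper mirroring the LSTM class above; it is only an illustration, not code used in the experiments here.

import torch
import torch.nn as nn

class GRU(nn.Module):
    def __init__(self, input_dim, output_dim, num_direction=1):
        super().__init__()
        self.hidden_size = output_dim
        self.num_direction = num_direction
        self.gru = nn.GRU(input_dim, output_dim, batch_first=True,
                          bidirectional=(num_direction == 2))

    def forward(self, embeddings):
        # GRU keeps only a hidden state h_0, no cell state
        h0 = torch.zeros(1 * self.num_direction, embeddings.size(0), self.hidden_size)
        outputs, ht = self.gru(embeddings, h0)
        return outputs, ht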

Attention (mainly self-attention)

Self-attention: given $Q$ (query), $K$ (key) and $V$ (value), self-attention can be defined as

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$

which is the same self-attention formula used in Transformers. Note that in the code below I use the LSTM outputs as all three of Q, K and V, and learn the attention weights from them.

import torch
import torch.nn as nn
import numpy as np

class ScaledDotProductAttention(nn.Module):

    def __init__(self, d_model, attn_dropout=0.1):
        super(ScaledDotProductAttention, self).__init__()
        self.temper = np.power(d_model, 0.5)   # scaling factor sqrt(d_model)
        self.dropout = nn.Dropout(attn_dropout)
        self.softmax = nn.Softmax(dim=2)       # normalize over the key dimension

    def forward(self, q, k, v, attn_mask=None):
        # attn: (batch, len_q, len_k), scaled dot product of queries and keys
        attn = torch.bmm(q, k.transpose(1, 2)) / self.temper
        if attn_mask is not None:
            assert attn_mask.size() == attn.size(), \
                'Attention mask shape {} mismatch ' \
                'with Attention logit tensor shape ' \
                '{}.'.format(attn_mask.size(), attn.size())
            # masked positions get -inf so they receive zero attention weight
            attn = attn.masked_fill(attn_mask, -float('inf'))

        attn = self.softmax(attn)
        attn = self.dropout(attn)
        output = torch.bmm(attn, v)
        return output, attn
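For reference, here is a quick usage sketch of the module with the LSTM outputs playing all three roles of q, k and v; the batch size, sequence length and hidden size are made up for illustration.

# toy shapes: batch=2, seq_len=5, hidden=8 (illustration only)
outputs = torch.randn(2, 5, 8)                  # e.g. LSTM outputs
attention = ScaledDotProductAttention(d_model=8, attn_dropout=0.1)
context, attn_weights = attention(outputs, outputs, outputs)
print(context.shape)        # torch.Size([2, 5, 8])
print(attn_weights.shape)   # torch.Size([2, 5, 5])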

Combining LSTM and self-attention

Based on LSTM and self-attention, we can build a text classification model as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiClassCLS(nn.Module):
    '''
    The class is an implementation of the paper A Structured Self-Attentive Sentence Embedding including regularization
    and without pruning. Slight modifications have been done for speedup
    '''
    def __init__(self, args):
        '''
        :param batch_size: {int} batch size used for training
        :param hidden_size: {int} hidden dimension of the LSTM
        :param d_a: {int} hidden dimension of the dense layer (only for the structured attention variant, commented out below)
        :param r: {int} attention hops / attention heads (only for the structured attention variant)
        :param max_len: {int} number of LSTM time steps
        :param emb_dim: {int} embedding dimension for each time step
        :param num_labels: {int} number of classes
        :param type: {0, 1, 2} 0 --> binary classification, 1 --> multiclass classification, 2 --> multilabel classification
        '''
        super(MultiClassCLS, self).__init__()
        self.num_labels = args['num_labels']
        self.batch_size = args['batch_size']
        self.hidden_size = args['hidden_size']
        self.max_len = args['max_len']

        self.type =args['type']
        self.vocab_size = args["vocab_size"]
        self.emb_dim = args["emb_dim"]
        self.att_dropout = args["att_dropout"]

        self.bidirection = args["bidirection"]
        if self.bidirection:
            self.num_direction = 2
        else:
            self.num_direction = 1

        self.embeddings = nn.Embedding(self.vocab_size, self.emb_dim)
        # optionally initialize from pre-trained word vectors passed in args (frozen during training)
        vectors = args.get("vectors")
        if vectors is not None:
            self.embeddings = nn.Embedding.from_pretrained(torch.tensor(vectors).type(torch.float32), freeze=True)

        self.lstm = LSTM(self.emb_dim, self.hidden_size, self.batch_size, self.num_direction)
        self.bn = LayerNorm(self.num_direction*self.hidden_size)
        # self.attention = Attention(self.num_direction*self.hidden_size, self.d_a, self.r)
        self.attention = ScaledDotProductAttention(self.num_direction*self.hidden_size, attn_dropout=self.att_dropout)

        self.label_layer = nn.Linear(self.num_direction*self.hidden_size, self.num_labels)

    def forward(self, sentence, reduction='mean'):
        sentence_emb = self.embeddings(sentence)

        outputs, _ = self.lstm(sentence_emb)
        outputs = self.bn(outputs)   # LayerNorm over the LSTM outputs

        # self-attention with Q = K = V = LSTM outputs
        attention_output, _ = self.attention(outputs, outputs, outputs)

        if reduction == 'mean':
            # mean pooling over the time dimension
            fc_output = torch.sum(attention_output, 1) / self.max_len
        else:
            # max pooling over the time dimension
            fc_output = torch.max(attention_output, 1)[0]

        label_logits = self.label_layer(fc_output)

        if self.type in [0, 2]:
            # binary / multilabel: independent sigmoid per label
            return torch.sigmoid(label_logits)
        if self.type == 1:
            # multiclass: log-probabilities over labels
            return F.log_softmax(label_logits, dim=1)
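For completeness, here is a minimal sketch of how the model might be instantiated and trained for one step; the hyperparameter values in args are made up for illustration and are not the settings used in the experiments below.

# hypothetical hyperparameters, for illustration only
args = {
    'num_labels': 35, 'batch_size': 32, 'hidden_size': 128, 'max_len': 50,
    'type': 1, 'vocab_size': 20000, 'emb_dim': 300, 'att_dropout': 0.5,
    'bidirection': True,
}
model = MultiClassCLS(args)

# a fake batch of token-id sequences, shape (batch_size, max_len)
sentence = torch.randint(0, args['vocab_size'], (args['batch_size'], args['max_len']))
log_probs = model(sentence)    # type == 1 -> log_softmax over labels
print(log_probs.shape)         # torch.Size([32, 35])

# with log_softmax outputs, NLLLoss is the matching training loss
labels = torch.randint(0, args['num_labels'], (args['batch_size'],))
loss = F.nll_loss(log_probs, labels)
loss.backward()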

To guard against overfitting, two measures are applied:

  • Apply LayerNorm to the LSTM outputs:
self.bn = LayerNorm(self.num_direction*self.hidden_size)
outputs = self.bn(outputs)

class LayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(hidden_size))   # learnable gain
        self.bias = nn.Parameter(torch.zeros(hidden_size))    # learnable bias

    def forward(self, input):
        # normalize over the last (feature) dimension
        mu = torch.mean(input, dim=-1, keepdim=True)
        sigma = torch.std(input, dim=-1, keepdim=True).clamp(min=self.eps)
        output = (input - mu) / sigma
        return output * self.weight.expand_as(output) + self.bias.expand_as(output)
  • Add dropout inside the self-attention:
self.dropout = nn.Dropout(attn_dropout)

I usually set attn_dropout to 0.5.

Some conclusions from comparing TextCNN and LSTM + self-attention:

Dataset 1: label size = 35

             accuracy   time (10 epochs)   cpu   memory
self-attn    0.776      37s                6-7   340M
text-cnn     0.761      93s                4-5   320M

Dataset 2: label size = 83

             accuracy   time (10 epochs)   cpu   memory
self-attn    0.926      50s                6-7   370M
text-cnn     0.938      128s               4-5   320M

On these two datasets the two approaches each have their strengths: TextCNN is slower to train but needs fewer resources, while LSTM + self-attention trains roughly 2-3x faster but uses more CPU and memory.
