BiLSTM-CRF POS Tagging: An Overview of How the BiLSTM+CRF Sequence Labeling Model Works


License: CC 4.0 BY-SA


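This article introduces sequence labeling problems such as word segmentation and part-of-speech tagging, and focuses on the workflow of the BiLSTM+CRF model. The BiLSTM captures contextual information in the sequence, and the CRF computes the probability of a label sequence. The article explains how the BiLSTM runs and how the CRF loss is computed, and provides a code implementation.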


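1. Introduction to sequence labeling models

Sequence labeling problems in natural language processing include word segmentation, part-of-speech tagging, named entity recognition, keyword extraction, semantic role labeling, and so on.

Take named entity recognition (NER) as an example: given an input sequence of length N, the task assigns a label (such as person name or location) to every element, producing a label sequence that also has length N.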
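The one-label-per-element relationship can be illustrated with a toy example. The sentence and the BIO tag names (`B-PER`, `I-LOC`, and so on) are assumptions chosen for illustration and do not come from the original article.

```python
# Toy NER example: every input character receives exactly one label.
# The BIO tagging scheme used here is an assumption for illustration.
tokens = ["张", "三", "在", "北", "京", "工", "作"]
labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]

# Input sequence and label sequence share the same length N.
assert len(tokens) == len(labels)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```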

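2. BiLSTM+CRF model workflow

2.1 Why use a BiLSTM+CRF model

CRF is a classic sequence labeling model. With the rise of deep learning, models that combine a neural network with a CRF have become widely used, and BiLSTM+CRF is a representative example. A bidirectional LSTM captures the context on both sides of each position in the sequence, which improves labeling accuracy.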

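2.2 A typical structure

2.2.1 Data preprocessing

Strings are converted to numbers before they enter the model. Typically each character is replaced by its index in a dictionary, which turns the string into an array of integers.

For example, looking up each of the 7 characters of "中华人民共和国" in the dictionary gives 7 integers, so the numeric form of the string might be [1,2,3,4,5,6,7].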
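The lookup described above can be sketched in a few lines. The helper names `build_vocab` and `encode`, and the convention of reserving id 0 for a `<PAD>` token, are assumptions made for this sketch rather than part of the original article.

```python
def build_vocab(texts, pad_token="<PAD>"):
    """Assign each distinct character an integer id; id 0 is reserved for padding."""
    vocab = {pad_token: 0}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Replace every character by its id (raises KeyError for unseen characters)."""
    return [vocab[ch] for ch in text]


vocab = build_vocab(["中华人民共和国"])
ids = encode("中华人民共和国", vocab)
print(ids)  # [1, 2, 3, 4, 5, 6, 7]
```

A real pipeline would usually also reserve an id for unknown characters; that is left out here to keep the sketch short.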

2.2.2 Typical structure

Additional layers such as dropout can also be added to the structure below; they are omitted here.

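3. Some key questions

3.1 How the BiLSTM runs

Consider an ordinary LSTM whose hidden layer has 100 units. Feeding the embeddings of “中国人” into it in order takes 3 time steps and yields 3 vectors, written [L0,L1,L2], which correspond to “中”, “国”, and “人” respectively.

Feeding the embeddings of “中国人” into an LSTM in reverse order also yields 3 vectors over 3 time steps, written [R0,R1,R2]. Here R0 corresponds to “人”, the first character seen in the reversed pass.

Concatenating the forward and backward vectors that belong to the same character gives the output of the BiLSTM layer: “中” maps to [L0,R2], “国” maps to [L1,R1], and “人” maps to [L2,R0]. With 100 hidden units per direction, each concatenated vector has 200 dimensions.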
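The index alignment between the two directions is easy to get wrong, so here is a minimal sketch of it. The function `run_rnn` is a hypothetical stand-in for an LSTM: instead of a hidden vector it returns the string of characters seen so far, which makes it visible which context each output has access to.

```python
def run_rnn(inputs):
    """Stand-in for an LSTM: the 'state' at each step is everything seen so far."""
    state, outputs = [], []
    for x in inputs:
        state = state + [x]
        outputs.append("".join(state))
    return outputs


chars = list("中国人")
forward = run_rnn(chars)          # [L0, L1, L2]
backward = run_rnn(chars[::-1])   # [R0, R1, R2], where R0 belongs to "人"

# Step t of the forward pass pairs with step (T - 1 - t) of the backward pass.
T = len(chars)
bilstm_out = [(forward[t], backward[T - 1 - t]) for t in range(T)]

for ch, (left, right) in zip(chars, bilstm_out):
    print(ch, "left context:", left, "right context:", right)
```

Each character ends up with one vector summarizing everything to its left (inclusive) and one summarizing everything to its right (inclusive), which is the point of running the LSTM in both directions.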

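3.2 How the CRF loss is computed

3.2.1 The score of a label sequence

For an input sequence and a given label sequence, the score of that label sequence is defined as:

S = EmissionScore + TransitionScore

In short, EmissionScore is the score the BiLSTM assigns to this labeling: the sum, over every position in the sequence, of the score of the tag chosen at that position.

In short, TransitionScore is the sum of the corresponding entries of the state transition matrix along the sequence: if position i has tag A and position i+1 has tag B, that step contributes one transition score. The transition score plays a role similar to a transition probability, although it is an unnormalized score.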
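The definition above can be written out as a small function. This is a minimal sketch in plain Python: the name `path_score`, the toy numbers, and the convention that `trans[a][b]` scores the step from tag `a` to tag `b` are assumptions for illustration (the convention matches how the article's code indexes `self.transition`).

```python
def path_score(emit, trans, tags):
    """Score of one label sequence.

    emit[i][k]:  score of tag k at position i (the BiLSTM output)
    trans[a][b]: score of moving from tag a to tag b
    tags:        the label sequence, one tag id per position
    """
    emission = sum(emit[i][t] for i, t in enumerate(tags))
    transition = sum(trans[a][b] for a, b in zip(tags, tags[1:]))
    return emission + transition


# Toy numbers: 3 positions, K = 2 tags.
emit = [[1.0, 0.5], [0.2, 2.0], [0.3, 0.1]]
trans = [[0.5, -1.0], [0.3, 0.8]]
score = path_score(emit, trans, [0, 1, 1])
print(score)
```

For the path [0, 1, 1] the emission part is 1.0 + 2.0 + 0.1 and the transition part is trans[0][1] + trans[1][1], so a sequence of length N contributes N emission terms and N - 1 transition terms.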

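3.2.2 The CRF loss formula

For a given input sequence there are many possible label sequences. Training aims to make the true sequence take the largest possible share of the total over all candidates, that is, to maximize P_RealPath / (P_1 + P_2 + ... + P_N), where each P_i = exp(score_i). The loss is the negative logarithm of this ratio.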
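Written as a formula, with $S_i$ denoting the score of the $i$-th candidate label sequence and $P_i = e^{S_i}$ (the symbols $S_i$ and $\mathcal{L}$ are notation introduced here, not taken from the original article), the objective can be sketched as:

$$
\mathcal{L} = -\log \frac{P_{\text{RealPath}}}{P_1 + P_2 + \dots + P_N}
            = \log \sum_{i=1}^{N} e^{S_i} - S_{\text{RealPath}}
$$

The second form shows why the computation can be done in log space: only the score of the true path and the log of the sum over all paths are needed.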

(1) Computing P_RealPath directly. Given the BiLSTM output emit_score, the CRF transition matrix, and a label sequence, the score of that sequence follows from the method in 3.2.1, which gives P_RealPath = exp(score).

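(2) Computing P_1+P_2+...+P_N by dynamic programming. The difficulty is that there are too many possible paths: with K tags and N positions there are K^N candidate label sequences, so enumerating them is impractical. Dynamic programming solves this. For the detailed derivation, see The total score of all the paths.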
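To make the dynamic programming step concrete, the following sketch computes log(P_1+P_2+...+P_N) twice: once by enumerating every path, and once with a forward recursion that only keeps one running value per tag. It is plain Python on toy numbers, and all function names are hypothetical; the article's own implementation is the batched PyTorch version in the next section.

```python
import itertools
import math


def path_score(emit, trans, tags):
    """Emission plus transition score of one label sequence."""
    emission = sum(emit[i][t] for i, t in enumerate(tags))
    transition = sum(trans[a][b] for a, b in zip(tags, tags[1:]))
    return emission + transition


def logsumexp(values):
    """log(sum(exp(v))) computed with the max subtracted for stability."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_total_brute_force(emit, trans):
    """Enumerate all K^N label sequences (only feasible for tiny inputs)."""
    n, k = len(emit), len(emit[0])
    return logsumexp([path_score(emit, trans, tags)
                      for tags in itertools.product(range(k), repeat=n)])


def log_total_forward(emit, trans):
    """Forward recursion: d[b] is the log-sum over all partial paths ending in tag b."""
    k = len(emit[0])
    d = list(emit[0])
    for i in range(1, len(emit)):
        d = [logsumexp([d[a] + trans[a][b] for a in range(k)]) + emit[i][b]
             for b in range(k)]
    return logsumexp(d)


def crf_loss(emit, trans, gold_tags):
    """Negative log-likelihood of the gold path."""
    return log_total_forward(emit, trans) - path_score(emit, trans, gold_tags)


emit = [[1.0, 0.5], [0.2, 2.0], [0.3, 0.1]]
trans = [[0.5, -1.0], [0.3, 0.8]]
brute = log_total_brute_force(emit, trans)
forward = log_total_forward(emit, trans)
loss = crf_loss(emit, trans, [0, 1, 1])
print(brute, forward, loss)
```

The forward recursion costs on the order of N·K² operations instead of K^N, and the two results agree up to floating-point error.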

3.2.3 Code implementation

    import torch

    # Method of the model class; self.transition is the (K, K) CRF transition matrix.
    # The loop below slices the first n_unfinished rows, so it relies on the batch
    # being sorted by decreasing sentence length.
    def cal_loss(self, tags, mask, emit_score):
        """Calculate the CRF loss.

        Args:
            tags (tensor): a batch of tags, shape (b, len)
            mask (tensor): mask for the tags, shape (b, len); values at PAD positions are 0
            emit_score (tensor): emission matrix, shape (b, len, K)
        Returns:
            loss (tensor): loss of the batch, shape (b,)
        """
        batch_size, sent_len = tags.shape

        # Score of the gold tags: emission score at each position ...
        score = torch.gather(emit_score, dim=2, index=tags.unsqueeze(dim=2)).squeeze(dim=2)  # shape: (b, len)
        # ... plus the transition score between consecutive gold tags.
        score[:, 1:] += self.transition[tags[:, :-1], tags[:, 1:]]
        # total_score is the score of the real path, i.e. log(P_RealPath).
        total_score = (score * mask.type(torch.float)).sum(dim=1)  # shape: (b,)

        # Calculate the scaling factor (log of the sum over all paths).
        d = torch.unsqueeze(emit_score[:, 0], dim=1)  # shape: (b, 1, K)
        for i in range(1, sent_len):
            n_unfinished = mask[:, i].sum()  # sentences that still have a token at step i
            d_uf = d[: n_unfinished]  # shape: (uf, 1, K)
            emit_and_transition = emit_score[: n_unfinished, i].unsqueeze(dim=1) + self.transition  # shape: (uf, K, K)
            log_sum = d_uf.transpose(1, 2) + emit_and_transition  # shape: (uf, K, K)
            max_v = log_sum.max(dim=1)[0].unsqueeze(dim=1)  # shape: (uf, 1, K)
            log_sum = log_sum - max_v  # shape: (uf, K, K)
            d_uf = max_v + torch.logsumexp(log_sum, dim=1).unsqueeze(dim=1)  # shape: (uf, 1, K)
            d = torch.cat((d_uf, d[n_unfinished:]), dim=0)
        d = d.squeeze(dim=1)  # shape: (b, K)
        max_d = d.max(dim=-1)[0]  # shape: (b,)
        # Dynamic programming result: d becomes log(P_1 + P_2 + ... + P_N).
        d = max_d + torch.logsumexp(d - max_d.unsqueeze(dim=1), dim=1)  # shape: (b,)

        llk = total_score - d  # log-likelihood, shape: (b,)
        loss = -llk  # shape: (b,)
        return loss

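Note that the code above subtracts the maximum value before computing logsumexp in two places. This guards against numerical overflow in the exponentials and can be ignored when following the underlying principle. (torch.logsumexp is itself documented as numerically stabilized, so the explicit subtraction acts as an extra safeguard.)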
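The effect of subtracting the maximum can be seen on a tiny example. This is a sketch in plain Python with hypothetical function names; it shows the general log-sum-exp trick, not the article's PyTorch code.

```python
import math


def logsumexp_naive(values):
    """Direct formula: overflows as soon as exp(v) exceeds the float range."""
    return math.log(sum(math.exp(v) for v in values))


def logsumexp_stable(values):
    """Same quantity, using log(sum(exp(v))) = m + log(sum(exp(v - m)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


scores = [1000.0, 1000.0]

try:
    logsumexp_naive(scores)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable = logsumexp_stable(scores)
print(naive_overflowed, stable)
```

After the shift, the largest exponent is exp(0) = 1, so the sum cannot overflow, and the result equals 1000 + log(2) for this input.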

3.2.4 Once the loss is computed, the model parameters can be updated

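3.3 Inference on new samples

3.3.1 Approach

Once the model is trained, a new input still needs to be labeled. This again uses dynamic programming, in the form of the Viterbi algorithm. For the detailed procedure, see Infer the labels for a new sentence.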
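A minimal, unbatched sketch of Viterbi decoding is shown below, assuming the same score definition as in 3.2.1. The function name `viterbi` and the toy numbers are hypothetical; the article's batched PyTorch implementation follows in the next section.

```python
def viterbi(emit, trans):
    """Return the highest-scoring tag sequence and its score.

    emit[i][k]:  score of tag k at position i
    trans[a][b]: score of moving from tag a to tag b
    """
    k = len(emit[0])
    d = list(emit[0])        # d[b]: best score of any partial path ending in tag b
    backpointers = []        # backpointers[i - 1][b]: best previous tag for tag b at step i
    for i in range(1, len(emit)):
        new_d, ptr = [], []
        for b in range(k):
            best_a = max(range(k), key=lambda a: d[a] + trans[a][b])
            ptr.append(best_a)
            new_d.append(d[best_a] + trans[best_a][b] + emit[i][b])
        d = new_d
        backpointers.append(ptr)

    # Follow the backpointers from the best final tag to recover the path.
    best_last = max(range(k), key=lambda b: d[b])
    path = [best_last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, d[best_last]


emit = [[1.0, 0.5], [0.2, 2.0], [0.3, 0.1]]
trans = [[0.5, -1.0], [0.3, 0.8]]
best_path, best_score = viterbi(emit, trans)
print(best_path, best_score)
```

The recursion has the same shape as the forward recursion used for the loss, with the log-sum over previous tags replaced by a max, plus backpointers so the best path can be read off at the end.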

3.3.2 Code implementation

    def predict(self, sentences, sen_lengths):
        """
        Args:
            sentences (tensor): sentences, shape (b, len). Lengths are in decreasing order,
                len is the length of the longest sentence
            sen_lengths (list): sentence lengths
        Returns:
            tags (list[list[str]]): predicted tags for the batch
        """
        batch_size = sentences.shape[0]
        mask = (sentences != self.sent_vocab[self.sent_vocab.PAD])  # shape: (b, len)
        sentences = sentences.transpose(0, 1)  # shape: (len, b)
        sentences = self.embedding(sentences)  # shape: (len, b, e)
        emit_score = self.encode(sentences, sen_lengths)  # shape: (b, len, K)
        tags = [[[i] for i in range(len(self.tag_vocab))]] * batch_size  # list, shape: (b, K, 1)
        d = torch.unsqueeze(emit_score[:, 0], dim=1)  # shape: (b, 1, K)
        for i in range(1, sen_lengths[0]):
            n_unfinished = mask[:, i].sum()
            d_uf = d[: n_unfinished]  # shape: (uf, 1, K)
            emit_and_transition = self.transition + emit_score[: n_unfinished, i].unsqueeze(dim=1)  # shape: (uf, K, K)
            new_d_uf = d_uf.transpose(1, 2) + emit_and_transition  # shape: (uf, K, K)