Reading the JointBERT Source Code


hanghangnie


Article link: https://blog.csdn.net/qq_51556906/article/details/124844693


paper:https://arxiv.org/pdf/1902.10909v1.pdf

code:https://github.com/monologg/JointBERT

1. Code outside the model:

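The model returns a single joint loss for intent and slots, both tasks are trained together, and the loss reported during training is the combined one as well. In the training loop this is no different from training any single model: all inputs are fed to the model, and the returned loss is backpropagated to tune the parameters.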
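To illustrate why the joint loss makes training look like ordinary single-task training, here is a minimal sketch. It assumes a hypothetical ToyJointModel (an embedding layer stands in for BERT, and the first token stands in for [CLS]) that follows the same contract as the source: forward() returns the summed intent and slot loss first. A single backward() call on that sum sends gradients into the shared encoder and both heads.

```python
import torch
import torch.nn as nn


class ToyJointModel(nn.Module):
    """Hypothetical toy model: returns (joint_loss, (intent_logits, slot_logits))."""

    def __init__(self, vocab_size=20, hidden=8, num_intents=3, num_slots=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)     # stand-in for the BERT encoder
        self.intent_head = nn.Linear(hidden, num_intents)
        self.slot_head = nn.Linear(hidden, num_slots)

    def forward(self, input_ids, intent_label_ids, slot_labels_ids):
        h = self.embed(input_ids)                         # [batch, seq_len, hidden]
        intent_logits = self.intent_head(h[:, 0])         # first token plays the role of [CLS]
        slot_logits = self.slot_head(h)                   # [batch, seq_len, num_slots]
        ce = nn.CrossEntropyLoss()
        intent_loss = ce(intent_logits, intent_label_ids)
        slot_loss = ce(slot_logits.view(-1, slot_logits.size(-1)), slot_labels_ids.view(-1))
        total_loss = intent_loss + slot_loss              # one scalar for both tasks
        return total_loss, (intent_logits, slot_logits)


torch.manual_seed(0)
model = ToyJointModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

input_ids = torch.randint(0, 20, (2, 5))
intent_labels = torch.tensor([0, 2])
slot_labels = torch.randint(0, 4, (2, 5))

# One ordinary training step on the joint loss
model.train()
loss, _ = model(input_ids, intent_labels, slot_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```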

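Evaluation works the same way: given an input, the model directly outputs the predictions and logits for both intent and slots, which are then used as-is. The metric function calls three sub-functions: one computes intent metrics on their own, one computes slot metrics on their own, and one computes a joint metric. The joint metric, sementic_frame_acc, is obtained by determining for each sentence whether the intent is correct and whether all of its slots are correct, multiplying these two per-sentence results element-wise with np.multiply(), and averaging.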
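A minimal sketch of that sentence-level (semantic frame) accuracy, assuming slot predictions and labels are given as one tag sequence per sentence; the function name and the sample data are hypothetical. A sentence counts as correct only if its intent and every one of its slot tags are right.

```python
import numpy as np


def sentence_frame_acc(intent_preds, intent_labels, slot_preds, slot_labels):
    """Hypothetical helper: fraction of sentences with intent AND all slots correct."""
    # Per-sentence intent correctness: boolean array of shape [N]
    intent_result = np.asarray(intent_preds) == np.asarray(intent_labels)
    # Per-sentence slot correctness: True only if every tag in the sentence matches
    slot_result = np.array([list(p) == list(l) for p, l in zip(slot_preds, slot_labels)])
    # Element-wise product is 1 only where both are correct; the mean gives the accuracy
    return np.multiply(intent_result, slot_result).mean()


acc = sentence_frame_acc(
    intent_preds=[0, 1, 2],
    intent_labels=[0, 1, 1],
    slot_preds=[["B-loc", "O"], ["O", "B-time"], ["O"]],
    slot_labels=[["B-loc", "O"], ["O", "O"], ["O"]],
)
```

Here only the first sentence is fully correct (the second has a wrong slot, the third a wrong intent), so the accuracy is one third.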

2. Model code

Three backbone models are available: BERT, DistilBERT and ALBERT. Let's start with BERT, the most commonly used one and the one used in the original paper.

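The training code uses the loss returned by the model directly. The model is built on a pretrained BERT from the transformers library, with the task-specific heads added on top. BERT's outputs are taken like this: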

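outputs = self.bert(input_ids, attention_mask=attention_mask,
                    token_type_ids=token_type_ids)
# sequence_output, pooled_output, (hidden_states), (attentions)
sequence_output = outputs[0]  # one vector per token: [batch, seq_len, hidden], used for slots
pooled_output = outputs[1]  # [CLS], one vector per sentence: [batch, hidden], used for intent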
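In Hugging Face's BertModel, the pooled output is not the raw [CLS] hidden state: it is that state passed through a dense layer and a tanh. The sketch below reproduces this computation with random weights and random hidden states (hidden_size=16 instead of 768 to keep it small); the real pooler weights come from the pretrained checkpoint.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size = 16
# Stand-in for BERT's pooler layer (random weights here)
pooler_dense = nn.Linear(hidden_size, hidden_size)

sequence_output = torch.randn(2, 6, hidden_size)      # like outputs[0]: [batch, seq_len, hidden]
cls_hidden = sequence_output[:, 0]                    # hidden state at position 0 ([CLS])
pooled_output = torch.tanh(pooler_dense(cls_hidden))  # like outputs[1]: [batch, hidden]
```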

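With these outputs in hand, intent detection comes first. It is a simple classification task: dropout followed by a linear layer, applied to the pooled [CLS] output: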

# Intent head: dropout, then a linear layer over the pooled [CLS] vector
self.dropout = nn.Dropout(dropout_rate)
self.linear = nn.Linear(input_dim, num_intent_labels)
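For completeness, here is a sketch of the whole intent head as a module, written to mirror the SlotClassifier that appears later in the source. The class name IntentClassifier and the label count of 7 are assumptions for illustration; eval() disables dropout so the output is deterministic.

```python
import torch
import torch.nn as nn


class IntentClassifier(nn.Module):
    def __init__(self, input_dim, num_intent_labels, dropout_rate=0.):
        super().__init__()
        self.dropout = nn.Dropout(dropout_rate)
        self.linear = nn.Linear(input_dim, num_intent_labels)

    def forward(self, x):
        # x: pooled [CLS] representation, [batch, input_dim]
        x = self.dropout(x)
        return self.linear(x)  # [batch, num_intent_labels]


clf = IntentClassifier(input_dim=768, num_intent_labels=7, dropout_rate=0.1)
clf.eval()
pooled_output = torch.randn(4, 768)
intent_logits = clf(pooled_output)
```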

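After classification, the logits can either be passed through softmax to obtain class predictions or, as this code does, be fed directly into the loss function: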

# total_loss is initialized to 0 earlier in forward(); the intent loss is added first
if intent_label_ids is not None:
    if self.num_intent_labels == 1:
        # A single output is treated as regression
        intent_loss_fct = nn.MSELoss()
        intent_loss = intent_loss_fct(intent_logits.view(-1), intent_label_ids.view(-1))
    else:
        intent_loss_fct = nn.CrossEntropyLoss()
        intent_loss = intent_loss_fct(intent_logits.view(-1, self.num_intent_labels),
                                      intent_label_ids.view(-1))
    total_loss += intent_loss
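Why the logits can go straight into the loss: nn.CrossEntropyLoss applies log-softmax internally, and softmax does not change the argmax, so it is not needed for predictions either. A sketch with random logits (a batch of 4 and 5 hypothetical intent classes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
intent_logits = torch.randn(4, 5)              # [batch, num_intent_labels]
intent_label_ids = torch.tensor([1, 0, 4, 2])

# CrossEntropyLoss on raw logits equals NLL loss on log-softmax outputs
ce = nn.CrossEntropyLoss()(intent_logits, intent_label_ids)
manual = F.nll_loss(F.log_softmax(intent_logits, dim=-1), intent_label_ids)

# Softmax is monotonic, so argmax over probabilities equals argmax over logits
pred_from_probs = F.softmax(intent_logits, dim=-1).argmax(dim=-1)
pred_from_logits = intent_logits.argmax(dim=-1)
```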

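Slot filling is essentially an NER (sequence labeling) task. With the slot labels available, the code follows the same pattern: a classifier maps each token's hidden state to logits: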

class SlotClassifier(nn.Module):
    def __init__(self, input_dim, num_slot_labels, dropout_rate=0.):
        super(SlotClassifier, self).__init__()
        self.dropout = nn.Dropout(dropout_rate)
        self.linear = nn.Linear(input_dim, num_slot_labels)

    def forward(self, x):
        # x: [batch, seq_len, input_dim] -> logits: [batch, seq_len, num_slot_labels]
        x = self.dropout(x)
        return self.linear(x)
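Why the same kind of classifier works per token: nn.Linear acts on the last dimension, so applying it to sequence_output of shape [batch, seq_len, hidden] yields one logit vector per token, exactly as if each token were classified separately. A sketch with random stand-in tensors (all sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, hidden_size, num_slot_labels = 2, 6, 16, 5
sequence_output = torch.randn(batch_size, seq_len, hidden_size)

# The same weights are applied to every token position
linear = nn.Linear(hidden_size, num_slot_labels)
slot_logits = linear(sequence_output)          # [batch, seq_len, num_slot_labels]

# Equivalent to classifying each position on its own
token_by_token = torch.stack([linear(sequence_output[:, t]) for t in range(seq_len)], dim=1)

slot_preds = slot_logits.argmax(dim=-1)        # [batch, seq_len] predicted slot ids
```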

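From here the code branches on whether a CRF is used. With a CRF, the CRF layer directly returns the log-likelihood, so it must be multiplied by -1 to obtain the loss. Without a CRF, a loss function is chosen and, as above, the logits and labels are flattened and fed into it. The attention mask is also used here to keep the real tokens and drop the padding: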

if slot_labels_ids is not None:
    if self.args.use_crf:
        # The CRF returns the log-likelihood; negate it to get a loss
        slot_loss = self.crf(slot_logits, slot_labels_ids, mask=attention_mask.byte(), reduction='mean')
        slot_loss = -1 * slot_loss  # negative log-likelihood
    else:
        slot_loss_fct = nn.CrossEntropyLoss(ignore_index=self.args.ignore_index)
        # Only keep active parts of the loss
        if attention_mask is not None:
            active_loss = attention_mask.view(-1) == 1
            active_logits = slot_logits.view(-1, self.num_slot_labels)[active_loss]
            active_labels = slot_labels_ids.view(-1)[active_loss]
            slot_loss = slot_loss_fct(active_logits, active_labels)
        else:
            slot_loss = slot_loss_fct(slot_logits.view(-1, self.num_slot_labels),
                                      slot_labels_ids.view(-1))
    total_loss += self.args.slot_loss_coef * slot_loss
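The attention-mask filtering in the non-CRF branch can be checked against an equivalent approach: replacing the labels at padded positions with ignore_index. A sketch with hand-made tensors; it uses PyTorch's default ignore_index of -100, whereas the source takes the value from args.ignore_index. Both versions average the loss over the same 7 real tokens.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_slot_labels, ignore_index = 4, -100
slot_logits = torch.randn(2, 5, num_slot_labels)
slot_labels_ids = torch.tensor([[1, 2, 0, 3, 0],
                                [2, 1, 1, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 0],
                               [1, 1, 1, 0, 0]])

loss_fct = nn.CrossEntropyLoss(ignore_index=ignore_index)

# As in the source: keep only positions where attention_mask == 1
active_loss = attention_mask.view(-1) == 1
active_logits = slot_logits.view(-1, num_slot_labels)[active_loss]
active_labels = slot_labels_ids.view(-1)[active_loss]
masked_loss = loss_fct(active_logits, active_labels)

# Alternative: mark padded positions with ignore_index and skip the filtering
labels_ignored = slot_labels_ids.masked_fill(attention_mask == 0, ignore_index)
ignored_loss = loss_fct(slot_logits.view(-1, num_slot_labels), labels_ignored.view(-1))
```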

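The slot loss is multiplied by a coefficient (slot_loss_coef) before being added to the total, and the loss is then returned together with the logits: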

outputs = ((intent_logits, slot_logits),) + outputs[2:]  # add hidden states and attention if they are here

outputs = (total_loss,) + outputs

return outputs  # (loss), logits, (hidden_states), (attentions) # Logits is a tuple of intent and slot logits
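On the consumer side, the training loop only needs outputs[0], while evaluation unpacks the loss and the pair of logits. A sketch with stand-in tensors of plausible shapes; the optional hidden_states and attentions entries are assumed absent here.

```python
import torch

torch.manual_seed(0)
# Stand-ins shaped like the tuple returned above
total_loss = torch.tensor(1.23)
intent_logits = torch.randn(2, 7)      # [batch, num_intent_labels]
slot_logits = torch.randn(2, 6, 5)     # [batch, seq_len, num_slot_labels]
outputs = (total_loss, (intent_logits, slot_logits))

# Training: only the joint loss is needed
loss = outputs[0]

# Evaluation: unpack the loss and both sets of logits, then take argmax predictions
eval_loss, (intent_out, slot_out) = outputs[:2]
intent_preds = intent_out.argmax(dim=-1)   # [batch]
slot_preds = slot_out.argmax(dim=-1)       # [batch, seq_len]
```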

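Next comes DistilBERT, the distilled model. The task heads are attached the same way as for BERT, but DistilBERT has no pooled [CLS] output: it returns only the sequence of hidden states, so the [CLS] vector has to be taken manually: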

outputs = self.distilbert(input_ids, attention_mask=attention_mask)  # last-layer hidden-state, (hidden_states), (attentions)
sequence_output = outputs[0]
pooled_output = sequence_output[:, 0]  # [CLS]
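A quick check of what sequence_output[:, 0] selects, using a random stand-in tensor: it picks the hidden state at position 0 ([CLS]) for every sequence in the batch. Unlike BERT's pooled output, no extra dense layer or tanh is applied; the raw [CLS] hidden state goes straight to the intent head.

```python
import torch

torch.manual_seed(0)
sequence_output = torch.randn(2, 6, 16)   # stand-in for DistilBERT outputs[0]: [batch, seq_len, hidden]

# [:, 0] keeps the batch dimension and takes position 0 of the sequence dimension
pooled_output = sequence_output[:, 0]     # [batch, hidden]
```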

Everything else is almost identical.

ALBERT, somewhat surprisingly, is used exactly the same way as BERT, and its output format is the same.
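Since BERT and ALBERT expose a pooled output while DistilBERT does not, the difference between the three backbones can be summarized in one place. The sketch below uses a hypothetical helper, get_pooled_output, with stand-in tensors; the source itself handles this inside each model class's own forward() rather than through such a helper.

```python
import torch


def get_pooled_output(model_type, outputs):
    """Hypothetical helper: pick the sentence-level vector for the intent head."""
    sequence_output = outputs[0]
    if model_type in ("bert", "albert"):
        return outputs[1]              # pooled output ([CLS] after dense + tanh)
    if model_type == "distilbert":
        return sequence_output[:, 0]   # raw [CLS] hidden state, no pooler
    raise ValueError(f"unsupported model_type: {model_type}")


torch.manual_seed(0)
sequence_output = torch.randn(2, 6, 16)
pooler_output = torch.randn(2, 16)
bert_vec = get_pooled_output("bert", (sequence_output, pooler_output))
distil_vec = get_pooled_output("distilbert", (sequence_output,))
```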

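That's all for this reading of the source code. Future posts will cover the distillation models related to DistilBERT and the models related to ALBERT.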