Note
This post is the second part of an overall summary of the competition. The first half is here.
Noteworthy ideas in 1st place solution
Idea
First step:
Use transformers to extract token level start and end probabilities.
Second step:
Feed these probabilities to a character-level model. This step gave the team a huge improvement in the final score because it handled the “noise” in the data properly.
Last step:
Ensemble.
Second-level model architectures
The following three Char-NN architectures use character-level probabilities as input. The first-level models output token-level probabilities, and the following code converts them to character-level probabilities. The idea is to assign each character the probability of its corresponding token.
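import numpy as np

def token_level_to_char_level(text, offsets, preds):
    # One probability per character of the original text
    probas_char = np.zeros(len(text))
    for i, offset in enumerate(offsets):
        # offset is the (start, end) character span of token i;
        # skip padding and sentiment tokens, whose span is (0, 0)
        if offset[0] or offset[1]:
            probas_char[offset[0]:offset[1]] = preds[i]

    return probas_char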
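
To make the conversion concrete, here is a small worked example. The text, offsets and probabilities are made up for illustration; in practice the offsets would come from the transformer's tokenizer and the probabilities from the first-level model.

```python
import numpy as np


def token_level_to_char_level(text, offsets, preds):
    # Same logic as the function above, repeated so the example runs on its own
    probas_char = np.zeros(len(text))
    for i, (start, end) in enumerate(offsets):
        if start or end:  # skip special tokens with a (0, 0) span
            probas_char[start:end] = preds[i]
    return probas_char


# Hypothetical inputs: two real tokens surrounded by two special tokens
text = "so happy"
offsets = [(0, 0), (0, 2), (3, 8), (0, 0)]
preds = [0.0, 0.2, 0.7, 0.0]

char_probas = token_level_to_char_level(text, offsets, preds)
print(char_probas)
```

Characters that no token covers (here the space at index 2) keep a probability of zero.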
 
Things you need to know for nn.Embedding
The following architectures all train their embeddings from scratch, so here is a brief look at how nn.Embedding works.
nn.Embedding holds a Tensor of dimension (vocab_size, vector_size), i.e., of (the size of the vocabulary, the dimension of each vector embedding), and a method that does the lookup. When you create an embedding layer, the Tensor is initialised randomly.
You can also add pretrained weights with the command nn.Embedding.from_pretrained(weight).
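
A minimal sketch of both points, with arbitrary sizes chosen only for illustration: the layer stores a (vocab_size, vector_size) weight matrix, a lookup is just row indexing, and from_pretrained wraps an existing tensor (frozen by default).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, vector_size = 10, 4  # illustrative sizes, not from the solution
embedding = nn.Embedding(vocab_size, vector_size)

# A batch of 2 sequences with 3 integer ids each
ids = torch.tensor([[1, 2, 3], [4, 5, 6]])
vectors = embedding(ids)  # shape: (2, 3, vector_size)

# The lookup returns the corresponding rows of the weight matrix
row_matches = torch.equal(vectors[0, 0], embedding.weight[1])

# Loading existing weights; freeze=True is the default, so they are not trained
pretrained = torch.arange(vocab_size * vector_size, dtype=torch.float32)
pretrained = pretrained.view(vocab_size, vector_size)
frozen = nn.Embedding.from_pretrained(pretrained)

print(embedding.weight.shape, vectors.shape, row_matches, frozen.weight.requires_grad)
```

Pass freeze=False to from_pretrained if the loaded vectors should keep training.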
Architecture 1: RNN
 
In the following, the parameter len_voc is calculated by
tokenizer.fit_on_texts(df_train['text'].values)
len_voc = len(tokenizer.word_index) + 1
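
The excerpt does not show how tokenizer is built. Assuming it is a character-level tokenizer whose word_index starts at 1 (as the Keras Tokenizer does), the "+ 1" reserves index 0, typically for padding. The pure-Python stand-in below only illustrates that bookkeeping; build_char_index is a hypothetical helper, not part of the original solution.

```python
from collections import Counter


def build_char_index(texts):
    # Hypothetical stand-in for a character-level tokenizer:
    # more frequent characters get smaller indices, starting at 1
    counts = Counter(ch for text in texts for ch in text.lower())
    return {ch: i + 1 for i, (ch, _) in enumerate(counts.most_common())}


texts = ["aab", "b c"]  # toy corpus
char_index = build_char_index(texts)

# Index 0 is never assigned to a character, hence the "+ 1"
len_voc = len(char_index) + 1

encoded = [[char_index[ch] for ch in text.lower()] for text in texts]
print(len_voc, encoded)
```

With len_voc computed this way, every encoded id is a valid row of nn.Embedding(len_voc, char_embed_dim).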
 
Compare the following code with the figure above.
import torch
import torch.nn as nn

class TweetCharModel(nn.Module):
    # check the config in the original code post
    def __init__(self, len_voc, use_msd=True,
                 embed_dim=64, lstm_dim=64, char_embed_dim=32, sent_embed_dim=32, ft_lstm_dim=32, n_models=1):
        super().__init__()
        self.use_msd = use_msd
        
        self.char_embeddings = nn.Embedding(len_voc, char_embed_dim)
        self.sentiment_embeddings = nn.Embedding(3, sent_embed_dim) # 3 sentiments
        
        self.proba_lstm = nn.LSTM(n_models * 2, ft_lstm_dim, batch_first=True, bidirectional=True)
        
        self.lstm = nn.LSTM(char_embed_dim + ft_lstm_dim * 2 + sent_embed_dim, lstm_dim, batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(lstm_dim * 2, lstm_dim, batch_first=True, bidirectional=True)
        self.logits = nn.Sequential(
            nn.Linear(lstm_dim *  4, lstm_dim),
            nn.ReLU(),
            nn.Linear(lstm_dim, 2))
        
        self.high_dropout = nn.Dropout(p=0.5)
    
    def forward(self, tokens, sentiment, start_probas, end_probas):
        bs, T = tokens.size()
        
        probas = torch.cat([start_probas, end_probas], -1)
        probas_fts, _ = self.proba_lstm(probas)
        char_fts = self.char_embeddings(tokens)
        
        sentiment_fts = self.sentiment_embeddings(sentiment).view(bs, 1, -1)
        sentiment_fts = sentiment_fts.repeat((1, T, 1))
        
        features = torch.cat([char_fts, sentiment_fts, probas_fts], -1)
        features, _ = self.lstm(features)
        features2, _ = self.lstm2(features)
        
        features = torch.cat([features, features2], -1)
        
        # Multi-sample dropout (MSD)
        if self.use_msd and self.training:
            logits = torch.mean(
                torch.stack(
                    [self.logits(self.high_dropout(features)) for _ in range(5)],
                    dim=0),
                dim=0)
        else:
            logits = self.logits(features)
        start_logits, end_logits = logits[:, :, 0], logits[:, :, 1]
        return start_logits, end_logits
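
The multi-sample dropout step in forward can be isolated as follows. This is a simplified sketch rather than the team's code: MultiSampleDropoutHead is a hypothetical name, and a single linear layer stands in for the two-layer logits head.

```python
import torch
import torch.nn as nn


class MultiSampleDropoutHead(nn.Module):
    """Averages the head's output over several independent dropout masks."""

    def __init__(self, in_dim, out_dim, n_samples=5, p=0.5):
        super().__init__()
        self.n_samples = n_samples
        self.dropout = nn.Dropout(p=p)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, features):
        if self.training:
            # Each pass draws a fresh dropout mask; the logits are averaged
            samples = [self.linear(self.dropout(features)) for _ in range(self.n_samples)]
            return torch.stack(samples, dim=0).mean(dim=0)
        # At inference time dropout is disabled, so one pass is enough
        return self.linear(features)


torch.manual_seed(0)
head = MultiSampleDropoutHead(in_dim=8, out_dim=2)
features = torch.randn(4, 10, 8)  # (batch, characters, feature dim)

head.train()
train_logits = head(features)

head.eval()
eval_logits = head(features)
print(train_logits.shape, eval_logits.shape)
```

Averaging over several masks only costs extra passes through the small head, not through the LSTMs, which is why it is applied only to the final layers.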
 
Architecture 2: CNN
 
class ConvBlock(nn.Module):
    # check the config in the original code post
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=1, padding="same", use_bn=True):
        super().__init__()
        if padding == "same":
            padding = kernel_size // 2 * dilation
        
        if use_bn:
            self.conv = nn.Sequential(
                nn.Conv1d(in_channels, out_channels, kernel_size, padding=padding, stride=stride, dilation=dilation),
                nn.BatchNorm1d(out_channels),
                nn.ReLU())
        else:
            self.conv 
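
The ConvBlock listing above is cut off in the source after self.conv. The sketch below shows one way such a block could be completed. It assumes that the branch without batch norm is the same convolution followed by ReLU, and that forward simply applies self.conv; neither is confirmed by the excerpt, so the class is named ConvBlockSketch to keep it distinct from the original.

```python
import torch
import torch.nn as nn


class ConvBlockSketch(nn.Module):
    # Assumed completion of the truncated ConvBlock, not the original code
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1,
                 dilation=1, padding="same", use_bn=True):
        super().__init__()
        if padding == "same":
            # Keeps the sequence length unchanged for odd kernels and stride 1
            padding = kernel_size // 2 * dilation

        layers = [nn.Conv1d(in_channels, out_channels, kernel_size,
                            padding=padding, stride=stride, dilation=dilation)]
        if use_bn:
            layers.append(nn.BatchNorm1d(out_channels))
        layers.append(nn.ReLU())
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        # x has shape (batch, channels, length)
        return self.conv(x)


torch.manual_seed(0)
x = torch.randn(2, 16, 50)
out_bn = ConvBlockSketch(16, 32, kernel_size=3, dilation=2)(x)
out_plain = ConvBlockSketch(16, 32, kernel_size=5, use_bn=False)(x)
print(out_bn.shape, out_plain.shape)
```

Preserving the length matters here because the model has to emit one start logit and one end logit per input character.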
                
                  
                  
                  
                  
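This post is the second part of a summary of the Kaggle competition 'Tweet Sentiment Extraction', focusing on the gold-medal strategies. The first-place team first uses transformers to extract token-level start and end probabilities, then handles the noise in the data with a character-level model, and finally ensembles. The second-place team blends models trained with different seeds and uses a reranking model to improve accuracy. The third-place team uses a beam-search-like decoder and a GRU head, along with a character-level model. The fourth-place team uses multi-head prediction and a re-ranking strategy.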
          