This example uses a GRU for text generation, briefly introduces how it works, and highlights the advantages of gating for both long- and short-range memory and for gradient control.
Problem to solve: generate new names from input names, i.e. a text-generation task.
Data dimension changes
The difference between conditioned and unconditioned generation
A hallmark of text generation is that the predictions can be "steered". In this example, if the embedded nationality is fed into the GRU as its initial hidden state, generation quality improves markedly. Seen this way, the difference between conditioned and unconditioned generation comes down to whether extra information is injected: information injection.
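As a minimal sketch of that injection (the class and parameter names below are my own assumptions, not the book's exact code), the conditioned variant embeds the nationality directly into the hidden-state space and uses it as h_0:

import torch
import torch.nn as nn

class ConditionedSurnameGenerator(nn.Module):
    """Toy conditioned generator: the nationality embedding becomes h_0."""
    def __init__(self, char_vocab_size, char_emb_size,
                 num_nationalities, rnn_hidden_size):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, char_emb_size)
        # embed the nationality directly into the hidden-state space
        self.nation_emb = nn.Embedding(num_nationalities, rnn_hidden_size)
        self.rnn = nn.GRU(char_emb_size, rnn_hidden_size, batch_first=True)
        self.fc = nn.Linear(rnn_hidden_size, char_vocab_size)

    def forward(self, x_in, nationality_index):
        x_embedded = self.char_emb(x_in)                  # (batch, seq, char_emb)
        # (batch, hidden) -> (1, batch, hidden): GRU expects (layers, batch, hidden)
        h_0 = self.nation_emb(nationality_index).unsqueeze(0)
        y_out, _ = self.rnn(x_embedded, h_0)              # information injection
        return self.fc(y_out)                             # (batch, seq, vocab)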
Sampling function: sample_from_model
This uses the trained model to generate names; the tensor-dimension manipulations along the way are quite clever and worth borrowing.
After the softmax, torch.multinomial() draws one index per sample, i.e. per row, according to the probability distribution (not simply the position of the maximum probability), and the drawn index is then handed to the decoder.
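For intuition, here is a tiny standalone demo (the logits are made up) of how temperature reshapes the distribution before torch.multinomial() draws:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])               # one row = one sample
for t in (0.5, 1.0, 2.0):
    probs = F.softmax(logits / t, dim=1)               # t < 1: peakier; t > 1: flatter
    pick = torch.multinomial(probs, num_samples=1)     # stochastic draw, not argmax
    print(t, [round(p, 2) for p in probs[0].tolist()], pick.item())

The full sampling function: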
import torch
import torch.nn.functional as F

def sample_from_model(model, vectorizer, num_samples=1, sample_size=20,
                      temperature=1.0):
    """Sample a sequence of indices from the model

    Args:
        model (SurnameGenerationModel): the trained model
        vectorizer (SurnameVectorizer): the corresponding vectorizer
        num_samples (int): the number of samples = number of names we want to predict
        sample_size (int): the max length of the samples
        temperature (float): accentuates or flattens the distribution.
            0.0 < temperature < 1.0 will make it peakier.
            temperature > 1.0 will make it more uniform.
    Returns:
        indices (torch.Tensor): the matrix of indices;
            shape = (num_samples, sample_size + 1), including the begin-of-sequence token
    """
    begin_seq_index = [vectorizer.char_vocab.begin_seq_index
                       for _ in range(num_samples)]
    begin_seq_index = torch.tensor(begin_seq_index,
                                   dtype=torch.int64).unsqueeze(dim=1)
    indices = [begin_seq_index]
    h_t = None
    for time_step in range(sample_size):
        x_t = indices[time_step]  # the previous prediction is the input of this step
        x_emb_t = model.char_emb(x_t)
        # rnn_out_t is not the same as h_t; in particular their shapes differ
        rnn_out_t, h_t = model.rnn(x_emb_t, h_t)
        prediction_vector = model.fc(rnn_out_t.squeeze(dim=1))
        probability_vector = F.softmax(prediction_vector / temperature, dim=1)
        # draw one index per row, weighted by probability_vector
        indices.append(torch.multinomial(probability_vector, num_samples=1))
    indices = torch.stack(indices)    # --> (sample_size + 1, num_samples, 1)
    indices = indices.squeeze(dim=2)  # --> (sample_size + 1, num_samples); dim=2 stays safe when num_samples=1
    indices = indices.permute(1, 0)   # --> (num_samples, sample_size + 1)
    return indices
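A usage sketch for decoding the returned indices back into strings; it assumes the vocabulary exposes lookup_index(), begin_seq_index, and end_seq_index as in the book's Vocabulary class (verify against your own code):

sampled = sample_from_model(model, vectorizer, num_samples=5)
for row in sampled:
    chars = []
    for idx in row.tolist():
        if idx == vectorizer.char_vocab.end_seq_index:
            break  # stop at the end-of-sequence token
        if idx != vectorizer.char_vocab.begin_seq_index:
            chars.append(vectorizer.char_vocab.lookup_index(idx))
    print(''.join(chars))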
Code notes
1. The code runs, but the accuracy is absurdly high (99%), which left me not knowing whether to laugh or cry.
Answer: accuracy that high means the inputs and targets match too well. Debugging showed that from_vector and to_vector were identical in every batch, so the problem is once again in vectorize(), a real trouble hotspot. Five lines from the end, a copy-paste error: the correct code should be to_indices = indices[1:] (shown fixed below).
# method of SurnameVectorizer; requires numpy imported as np
def vectorize(self, surname, vector_length=-1):
    """Vectorize a surname into a vector of observations and targets

    The output is the vectorized surname split into two vectors:
        surname[:-1] and surname[1:]
    At each timestep, the first vector is the observation and the
    second vector is the target.

    Args:
        surname (str): the surname to be vectorized
        vector_length (int): an argument for forcing the length of the index vector
    Returns:
        a tuple: (from_vector, to_vector)
        from_vector (numpy.ndarray): the observation vector
        to_vector (numpy.ndarray): the target prediction vector
    """
    indices = [self.surname_vocab.begin_seq_index]
    indices.extend(self.surname_vocab.lookup_token(token)
                   for token in surname)
    indices.append(self.surname_vocab.end_seq_index)

    if vector_length < 0:
        vector_length = len(indices) - 1

    from_vector = np.empty(vector_length, dtype=np.int64)
    from_indices = indices[:-1]  # drop the end_seq_index
    from_vector[:len(from_indices)] = from_indices
    from_vector[len(from_indices):] = self.surname_vocab.mask_index

    to_vector = np.empty(vector_length, dtype=np.int64)
    to_indices = indices[1:]  # drop the begin_seq_index
    to_vector[:len(to_indices)] = to_indices
    to_vector[len(to_indices):] = self.surname_vocab.mask_index

    return from_vector, to_vector
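A worked example makes the off-by-one nature of the fix clear (the concrete index values are hypothetical):

# surname "Wu":  indices    = [<BEGIN>, W, u, <END>]  e.g. [1, 24, 38, 2]
# from_vector = indices[:-1] -> [1, 24, 38]   observations
# to_vector   = indices[1:]  -> [24, 38, 2]   targets, shifted one step left
# With the buggy indices[:-1] on both sides, every target equalled its
# observation, which is why accuracy shot up to ~99%.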
2. Redefined the compute_accuracy function
import torch

def normalize_sizes(y_pred, y_true):
    """Normalize tensor sizes

    Args:
        y_pred (torch.Tensor): the output of the model;
            if a 3-dimensional tensor, reshapes to a matrix
        y_true (torch.Tensor): the target predictions;
            if a matrix, reshapes to be a vector
    """
    if len(y_pred.size()) == 3:
        y_pred = y_pred.contiguous().view(-1, y_pred.size(2))
    if len(y_true.size()) == 2:
        y_true = y_true.contiguous().view(-1)
    return y_pred, y_true

def compute_accuracy(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)
    _, y_pred_indices = y_pred.max(dim=1)  # index of the max value in each row
    # float() converts the Bool tensors to float
    correct_indices = torch.eq(y_pred_indices, y_true).float()  # positions where pred matches truth
    valid_indices = torch.ne(y_true, mask_index).float()        # positions that are not the mask
    # the element-wise product is essential: a position only counts
    # if it is both correct and valid (non-mask) at the same time
    n_correct = (correct_indices * valid_indices).sum().item()
    n_valid = valid_indices.sum().item()
    return n_correct / n_valid * 100
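A toy sanity check of the masking logic (mask_index=0 here is an assumption; use your vocabulary's actual mask index):

import torch

y_pred = torch.tensor([[[0.1, 0.9], [0.8, 0.2]]])   # (batch=1, seq=2, vocab=2)
y_true = torch.tensor([[1, 0]])                     # position 2 holds the mask index
print(compute_accuracy(y_pred, y_true, mask_index=0))  # 100.0: only position 1 is counted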
3. Redefined the loss computation function
Since y_pred is a 3-D tensor while y_true is a 2-D matrix, normalize_sizes() (the same helper defined in note 2 above) must first drop one dimension from each before F.cross_entropy() can compute the loss.
def sequence_loss(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)
    return F.cross_entropy(y_pred, y_true, ignore_index=mask_index)
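A quick shape sanity check (the numbers are arbitrary):

import torch

y_pred = torch.randn(4, 10, 30)          # (batch, seq_len, vocab_size)
y_true = torch.randint(0, 30, (4, 10))   # (batch, seq_len)
print(sequence_loss(y_pred, y_true, mask_index=0).item())  # positions equal to 0 are ignored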