Ch7. Example 7-3 Surname Generation

This example uses a GRU for text generation and briefly explains how it works, emphasizing the advantages of gating for both long- and short-range memory and for controlling gradients.

The problem to solve: generate new surnames from input surnames, a text-generation task.

Data dimension changes

(Figure: tensor shapes at each stage of the model; the original image was not recovered.)

The difference between conditioned and unconditioned generation

A defining feature of text generation is that the predictions can be "steered": in this example, feeding the embedded nationality into the GRU as its initial hidden state substantially improves the quality of the generated names. Seen this way, the difference between conditioned and unconditioned generation comes down to whether information is injected, i.e., Information Injection, as sketched below.
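
A minimal sketch of this information injection, assuming a single-layer GRU; the class name ConditionedGenerationModel and the attribute nation_emb are my own placeholders, not necessarily the book's names. The embedded nationality becomes the GRU's initial hidden state h_0:

import torch
import torch.nn as nn

class ConditionedGenerationModel(nn.Module):
    """Sketch: condition a GRU character generator on nationality."""
    def __init__(self, char_vocab_size, num_nationalities,
                 emb_size=32, hidden_size=64):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, emb_size)
        # The nationality embedding must match the GRU hidden size,
        # because it is injected as the initial hidden state h_0.
        self.nation_emb = nn.Embedding(num_nationalities, hidden_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, char_vocab_size)

    def forward(self, x_in, nationality_index):
        x_emb = self.char_emb(x_in)              # (batch, seq_len, emb_size)
        h_0 = self.nation_emb(nationality_index) # (batch, hidden_size)
        h_0 = h_0.unsqueeze(0)                   # (num_layers=1, batch, hidden_size)
        rnn_out, _ = self.rnn(x_emb, h_0)        # (batch, seq_len, hidden_size)
        return self.fc(rnn_out)                  # (batch, seq_len, char_vocab_size)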

The sampling function sample_from_model

It uses the already-trained model to generate names; the shape manipulations it performs are quite clever and worth borrowing.

After the softmax, torch.multinomial() samples one index per row (one per sample) according to the probability distribution, rather than simply taking the position of the maximum probability; the sampled indices are later decoded back into characters.

import torch
import torch.nn.functional as F

def sample_from_model(model, vectorizer, num_samples=1, sample_size=20, 
                      temperature=1.0):
    """Sample a sequence of indices from the model
    
    Args:
        model (SurnameGenerationModel): the trained model
        vectorizer (SurnameVectorizer): the corresponding vectorizer
        num_samples (int): the number of samples = number of names to generate
        sample_size (int): the max length of the samples
        temperature (float): accentuates or flattens 
            the distribution. 
            0.0 < temperature < 1.0 will make it peakier. 
            temperature > 1.0 will make it more uniform
    Returns:
        indices (torch.Tensor): the matrix of indices; 
        shape = (num_samples, sample_size + 1), BEGIN token included
    """
    # Every sequence starts from the BEGIN-of-sequence token.
    begin_seq_index = [vectorizer.char_vocab.begin_seq_index 
                       for _ in range(num_samples)]
    begin_seq_index = torch.tensor(begin_seq_index, 
                                   dtype=torch.int64).unsqueeze(dim=1)
    indices = [begin_seq_index]
    h_t = None
    
    for time_step in range(sample_size):
        x_t = indices[time_step]  # the previous prediction is this step's input
        x_emb_t = model.char_emb(x_t)
        # rnn_out_t is not the same as h_t, especially in shape:
        # rnn_out_t is (num_samples, 1, hidden_size),
        # h_t is (num_layers, num_samples, hidden_size)
        rnn_out_t, h_t = model.rnn(x_emb_t, h_t)
        prediction_vector = model.fc(rnn_out_t.squeeze(dim=1))
        probability_vector = F.softmax(prediction_vector / temperature, dim=1)
        # Draw one index per row according to the probabilities (not the argmax).
        indices.append(torch.multinomial(probability_vector, num_samples=1))
    indices = torch.stack(indices)    # -> (sample_size + 1, num_samples, 1)
    indices = indices.squeeze(dim=2)  # -> (sample_size + 1, num_samples); dim=2 stays safe when num_samples == 1
    indices = indices.permute(1, 0)   # -> (num_samples, sample_size + 1)
    return indices
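
The returned matrix still contains the BEGIN token (and possibly an END token followed by padding for short names), so a small helper is needed to turn each row back into a string. A sketch, assuming char_vocab exposes lookup_index() alongside begin_seq_index and end_seq_index; the name decode_samples is my own:

def decode_samples(sampled_indices, vectorizer):
    """Turn a (num_samples, sample_size + 1) index matrix into strings."""
    decoded = []
    vocab = vectorizer.char_vocab
    for sample_index in range(sampled_indices.shape[0]):
        name = ""
        for time_step in range(sampled_indices.shape[1]):
            index = sampled_indices[sample_index, time_step].item()
            if index == vocab.begin_seq_index:
                continue  # skip the BEGIN token
            if index == vocab.end_seq_index:
                break     # stop at the END token
            name += vocab.lookup_index(index)
        decoded.append(name)
    return decoded

# Usage sketch:
# samples = sample_from_model(model, vectorizer, num_samples=5)
# print(decode_samples(samples, vectorizer))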

Code notes

1. The code runs, but the accuracy is absurdly high (99%), which makes me want to laugh and cry at the same time.

Answer: accuracy this high means the data and targets match almost perfectly. Debugging showed that from_vector and to_vector had identical values in every batch, so the problem is once again in vectorize(), a recurring trouble spot. Five lines from the end of the listing, a copy-and-paste mistake: the correct code is to_indices = indices[1:]

    def vectorize(self, surname, vector_length=-1):
        """Vectorize a surname into a vector of observations and targets
           The outputs are the vectorized surname split into two vectors:
              surname[:-1] and surname[1:]
           At each timestep, the first vector is the observation and the second vector is the target. 
        Args:
            surname (str): the surname to be vectorized
            vector_length (int): an argument for forcing the length of index vector
        Returns:
            a tuple: (from_vector, to_vector)
            from_vector (numpy.ndarray): the observation vector 
            to_vector (numpy.ndarray): the target prediction vector
        """
        indices = [self.surname_vocab.begin_seq_index]
        indices.extend(self.surname_vocab.lookup_token(token) 
                        for token in surname)
        indices.append(self.surname_vocab.end_seq_index)
        if vector_length < 0:
            vector_length = len(indices) - 1

        from_vector = np.empty(vector_length, dtype=np.int64)
        from_indices = indices[:-1] # Drop the end_seq_index
        from_vector[:len(from_indices)] = from_indices
        from_vector[len(from_indices):] = self.surname_vocab.mask_index

        to_vector = np.empty(vector_length, dtype=np.int64)
        to_indices = indices[1:] # Drop the begin_seq_index
        to_vector[:len(to_indices)] = to_indices
        to_vector[len(to_indices):] = self.surname_vocab.mask_index

        return from_vector, to_vector
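
A quick sanity check of the fix, with hypothetical index values: after the correction, from_vector and to_vector are shifted by one position instead of being identical.

from_vector, to_vector = vectorizer.vectorize("lee")
# With hypothetical indices BEGIN=1, END=2, l=7, e=5,
# indices = [1, 7, 5, 5, 2], so:
# from_vector -> [1, 7, 5, 5]   (observation: keeps BEGIN, drops END)
# to_vector   -> [7, 5, 5, 2]   (target: drops BEGIN, keeps END)
# Before the fix both were [1, 7, 5, 5]: the model was asked to
# predict its own input, which explains the 99% accuracy.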

2. Redefined the compute_accuracy function

def normalize_sizes(y_pred, y_true):
    """Normalize tensor sizes
    Args:
        y_pred (torch.Tensor): the output of the model
            If a 3-dimensional tensor, reshapes to a matrix
        y_true (torch.Tensor): the target predictions
            If a matrix, reshapes to be a vector
    """
    if len(y_pred.size()) == 3:
        y_pred = y_pred.contiguous().view(-1, y_pred.size(2))
    if len(y_true.size()) == 2:
        y_true = y_true.contiguous().view(-1)
    return y_pred, y_true

def compute_accuracy(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)

    _, y_pred_indices = y_pred.max(dim=1)  # index of the max value in each row
    # float() converts the boolean masks to 0.0/1.0
    correct_indices = torch.eq(y_pred_indices, y_true).float()  # 1 where prediction matches target
    valid_indices = torch.ne(y_true, mask_index).float()        # 1 where the target is not the mask token
    # The element-wise product keeps only the positions that are both
    # correct and valid: matching predictions at unmasked positions.
    n_correct = (correct_indices * valid_indices).sum().item()
    n_valid = valid_indices.sum().item()

    return n_correct / n_valid * 100
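
A toy check with hand-picked values shows why the mask matters: positions where y_true equals mask_index count toward neither the correct total nor the valid total.

import torch

# 2 sequences x 3 timesteps, vocab of size 4, mask_index = 0
y_pred = torch.tensor([[[0.1, 0.9, 0.0, 0.0],    # argmax 1, target 1 -> correct
                        [0.8, 0.1, 0.1, 0.0],    # argmax 0, target 2 -> wrong
                        [0.7, 0.1, 0.1, 0.1]],   # target is mask -> ignored
                       [[0.1, 0.1, 0.7, 0.1],    # argmax 2, target 2 -> correct
                        [0.6, 0.2, 0.1, 0.1],    # target is mask -> ignored
                        [0.5, 0.2, 0.2, 0.1]]])  # target is mask -> ignored
y_true = torch.tensor([[1, 2, 0],
                       [2, 0, 0]])
print(compute_accuracy(y_pred, y_true, mask_index=0))  # 2 correct / 3 valid ~= 66.67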

3. Redefined the loss-computation function

Because y_pred is a 3-D tensor and y_true is a 2-D matrix, normalize_sizes() (defined above) must first reduce each of them by one dimension before F.cross_entropy() can compute the loss.


def sequence_loss(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)
    return F.cross_entropy(y_pred, y_true, ignore_index=mask_index)
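
A shape sketch with illustrative dimensions: a batch of 32 sequences of length 15 over an 80-character vocabulary is flattened to (480, 80) logits against 480 targets, and ignore_index makes cross_entropy skip the masked positions.

import torch

batch_size, seq_len, vocab_size, mask_index = 32, 15, 80, 0
y_pred = torch.randn(batch_size, seq_len, vocab_size, requires_grad=True)  # 3-D logits
y_true = torch.randint(1, vocab_size, (batch_size, seq_len))               # 2-D targets
loss = sequence_loss(y_pred, y_true, mask_index)  # internally (480, 80) vs (480,)
loss.backward()  # cross_entropy returns a scalar, so backward() works directly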

 
