This example uses a GRU for text generation, briefly introduces how it works, and highlights the advantages of gating for both long- and short-range memory and for gradient control.
Problem to solve: generate new names from input names, i.e. a text-generation task.
Data dimension changes
The difference between conditioned and unconditioned generation
A hallmark of text generation is that the predictions can be "steered". In this example, if the embedded nationality is fed into the GRU as its initial hidden state, generation quality improves markedly. Seen this way, the difference between conditioned and unconditioned generation comes down to whether extra information is injected: information injection.
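As a minimal sketch of that injection (the class and parameter names below are my own assumptions, not the book's exact code), the conditioned variant embeds the nationality directly into the hidden-state space and uses it as h_0:

import torch
import torch.nn as nn

class ConditionedSurnameGenerator(nn.Module):
    """Toy conditioned generator: the nationality embedding becomes h_0."""
    def __init__(self, char_vocab_size, char_emb_size,
                 num_nationalities, rnn_hidden_size):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, char_emb_size)
        # embed the nationality directly into the hidden-state space
        self.nation_emb = nn.Embedding(num_nationalities, rnn_hidden_size)
        self.rnn = nn.GRU(char_emb_size, rnn_hidden_size, batch_first=True)
        self.fc = nn.Linear(rnn_hidden_size, char_vocab_size)

    def forward(self, x_in, nationality_index):
        x_embedded = self.char_emb(x_in)                  # (batch, seq, char_emb)
        # (batch, hidden) -> (1, batch, hidden): GRU expects (layers, batch, hidden)
        h_0 = self.nation_emb(nationality_index).unsqueeze(0)
        y_out, _ = self.rnn(x_embedded, h_0)              # information injection
        return self.fc(y_out)                             # (batch, seq, vocab)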
Sampling function: sample_from_model
This uses the trained model to generate names; the tensor-dimension manipulations along the way are quite clever and worth borrowing.
After the softmax, torch.multinomial() draws one index per sample, i.e. per row, according to the probability distribution (not simply the position of the maximum probability), and the drawn index is then handed to the decoder.
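For intuition, here is a tiny standalone demo (the logits are made up) of how temperature reshapes the distribution before torch.multinomial() draws:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])               # one row = one sample
for t in (0.5, 1.0, 2.0):
    probs = F.softmax(logits / t, dim=1)               # t < 1: peakier; t > 1: flatter
    pick = torch.multinomial(probs, num_samples=1)     # stochastic draw, not argmax
    print(t, [round(p, 2) for p in probs[0].tolist()], pick.item())

The full sampling function: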
import torch
import torch.nn.functional as F

def sample_from_model(model, vectorizer, num_samples=1, sample_size=20,
                      temperature=1.0):
    """Sample a sequence of indices from the model

    Args:
        model (SurnameGenerationModel): the trained model
        vectorizer (SurnameVectorizer): the corresponding vectorizer
        num_samples (int): the number of samples = number of names we want to predict
        sample_size (int): the max length of the samples
        temperature (float): accentuates or flattens the distribution.
            0.0 < temperature < 1.0 will make it peakier.
            temperature > 1.0 will make it more uniform.
    Returns:
        indices (torch.Tensor): the matrix of indices;
            shape = (num_samples, sample_size + 1), including the begin-of-sequence token
    """
    begin_seq_index = [vectorizer.char_vocab.begin_seq_index
                       for _ in range(num_samples)]
    begin_seq_index = torch.tensor(begin_seq_index,
                                   dtype=torch.int64).unsqueeze(dim=1)
    indices = [begin_seq_index]
    h_t = None
    for time_step in range(sample_size):
        x_t = indices[time_step]  # the previous prediction is the input of this step
        x_emb_t = model.char_emb(x_t)
        # rnn_out_t is not the same as h_t; in particular their shapes differ
        rnn_out_t, h_t = model.rnn(x_emb_t, h_t)
        prediction_vector = model.fc(rnn_out_t.squeeze(dim=1))
        probability_vector = F.softmax(prediction_vector / temperature, dim=1)
        # draw one index per row, weighted by probability_vector
        indices.append(torch.multinomial(probability_vector, num_samples=1))
    indices = torch.stack(indices)    # --> (sample_size + 1, num_samples, 1)
    indices = indices.squeeze(dim=2)  # --> (sample_size + 1, num_samples); dim=2 stays safe when num_samples=1
    indices = indices.permute(1, 0)   # --> (num_samples, sample_size + 1)
    return indices
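A usage sketch for decoding the returned indices back into strings; it assumes the vocabulary exposes lookup_index(), begin_seq_index, and end_seq_index as in the book's Vocabulary class (verify against your own code):

sampled = sample_from_model(model, vectorizer, num_samples=5)
for row in sampled:
    chars = []
    for idx in row.tolist():
        if idx == vectorizer.char_vocab.end_seq_index:
            break  # stop at the end-of-sequence token
        if idx != vectorizer.char_vocab.begin_seq_index:
            chars.append(vectorizer.char_vocab.lookup_index(idx))
    print(''.join(chars))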
Code notes
1. The code runs, but the accuracy is absurdly high (99%), which left me not knowing whether to laugh or cry.
Answer: accuracy that high means the inputs and targets match too well. Debugging showed that from_vector and to_vector were identical in every batch, so the problem is once again in vectorize(), a real trouble hotspot. Five lines from the end, a copy-paste error: the correct code should be to_indices = indices[1:] (shown fixed below).
# method of SurnameVectorizer; requires numpy imported as np
def vectorize(self, surname, vector_length=-1):
    """Vectorize a surname into a vector of observations and targets

    The output is the vectorized surname split into two vectors:
        surname[:-1] and surname[1:]
    At each timestep, the first vector is the observation and the
    second vector is the target.

    Args:
        surname (str): the surname to be vectorized
        vector_length (int): an argument for forcing the length of the index vector
    Returns:
        a tuple: (from_vector, to_vector)
        from_vector (numpy.ndarray): the observation vector
        to_vector (numpy.ndarray): the target prediction vector
    """
    indices = [self.surname_vocab.begin_seq_index]
    indices.extend(self.surname_vocab.lookup_token(token)
                   for token in surname)
    indices.append(self.surname_vocab.end_seq_index)

    if vector_length < 0:
        vector_length = len(indices) - 1

    from_vector = np.empty(vector_length, dtype=np.int64)
    from_indices = indices[:-1]  # drop the end_seq_index
    from_vector[:len(from_indices)] = from_indices
    from_vector[len(from_indices):] = self.surname_vocab.mask_index

    to_vector = np.empty(vector_length, dtype=np.int64)
    to_indices = indices[1:]  # drop the begin_seq_index
    to_vector[:len(to_indices)] = to_indices
    to_vector[len(to_indices):] = self.surname_vocab.mask_index

    return from_vector, to_vector
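A worked example makes the off-by-one nature of the fix clear (the concrete index values are hypothetical):

# surname "Wu":  indices    = [<BEGIN>, W, u, <END>]  e.g. [1, 24, 38, 2]
# from_vector = indices[:-1] -> [1, 24, 38]   observations
# to_vector   = indices[1:]  -> [24, 38, 2]   targets, shifted one step left
# With the buggy indices[:-1] on both sides, every target equalled its
# observation, which is why accuracy shot up to ~99%.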
2. Redefined the compute_accuracy function
import torch

def normalize_sizes(y_pred, y_true):
    """Normalize tensor sizes

    Args:
        y_pred (torch.Tensor): the output of the model;
            if a 3-dimensional tensor, reshapes to a matrix
        y_true (torch.Tensor): the target predictions;
            if a matrix, reshapes to be a vector
    """
    if len(y_pred.size()) == 3:
        y_pred = y_pred.contiguous().view(-1, y_pred.size(2))
    if len(y_true.size()) == 2:
        y_true = y_true.contiguous().view(-1)
    return y_pred, y_true

def compute_accuracy(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)
    _, y_pred_indices = y_pred.max(dim=1)  # index of the max value in each row
    # float() converts the Bool tensors to float
    correct_indices = torch.eq(y_pred_indices, y_true).float()  # positions where pred matches truth
    valid_indices = torch.ne(y_true, mask_index).float()        # positions that are not the mask
    # the element-wise product is essential: a position only counts
    # if it is both correct and valid (non-mask) at the same time
    n_correct = (correct_indices * valid_indices).sum().item()
    n_valid = valid_indices.sum().item()
    return n_correct / n_valid * 100
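A toy sanity check of the masking logic (mask_index=0 here is an assumption; use your vocabulary's actual mask index):

import torch

y_pred = torch.tensor([[[0.1, 0.9], [0.8, 0.2]]])   # (batch=1, seq=2, vocab=2)
y_true = torch.tensor([[1, 0]])                     # position 2 holds the mask index
print(compute_accuracy(y_pred, y_true, mask_index=0))  # 100.0: only position 1 is counted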
3. Redefined the loss computation function
Since y_pred is a 3-D tensor while y_true is a 2-D matrix, normalize_sizes() (the same helper defined in note 2 above) must first drop one dimension from each before F.cross_entropy() can compute the loss.
def sequence_loss(y_pred, y_true, mask_index):
    y_pred, y_true = normalize_sizes(y_pred, y_true)
    return F.cross_entropy(y_pred, y_true, ignore_index=mask_index)
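A quick shape sanity check (the numbers are arbitrary):

import torch

y_pred = torch.randn(4, 10, 30)          # (batch, seq_len, vocab_size)
y_true = torch.randint(0, 30, (4, 10))   # (batch, seq_len)
print(sequence_loss(y_pred, y_true, mask_index=0).item())  # positions equal to 0 are ignored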