3. Decoding
3.1 Greedy Search
Greedy search is a classic algorithm from computer science. Once the model has produced the distribution over the first word, it picks the single most likely word under the conditional language model, then picks the most likely second word given that choice, then the most likely third word, and so on. In other words, at every step the algorithm takes the highest-probability token, feeds it back in as the next input, and records it as that step's result. The code lives in ./model/components/greedy_decoder_cell.py.
def step(self, time, state, embedding, finished):
    # next step of attention cell
    logits, new_state = self._attention_cell.step(embedding, state)
    # get ids of words predicted and get embedding
    # take the argmax over the vocabulary; logits.shape = [N, 301]
    new_ids = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
    new_embedding = tf.nn.embedding_lookup(self._embeddings, new_ids)
    # create new state of decoder
    new_output = DecoderOutput(logits, new_ids)
    new_finished = tf.logical_or(finished, tf.equal(new_ids, self._end_token))
    return (new_output, new_state, new_embedding, new_finished)
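The argmax-and-feed-back loop described above can be sketched outside TensorFlow. In this sketch, `step_fn` is a hypothetical stand-in for the attention cell: given the previous token id, it returns the logits for the next token.

```python
def greedy_decode(step_fn, start_id, end_id, max_len):
    """Greedy decoding sketch: at every step, feed the previous
    prediction back in and take the argmax of the returned logits.
    `step_fn(prev_id)` is a hypothetical callback returning a list
    of logits over the vocabulary."""
    ids = [start_id]
    for _ in range(max_len):
        logits = step_fn(ids[-1])
        # argmax over the vocabulary, like tf.argmax(logits, axis=-1)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        if next_id == end_id:  # stop once the end token is produced
            break
    return ids[1:]
```

Because only the single best token survives each step, greedy search can miss a globally better sequence whose first token is not the local argmax; that is the gap beam search (below) addresses.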
Of course, this only produces the output for a single time step. In ./model/components/dynamic_decode.py, the dynamic_decode() function feeds each step's output back in as the next step's input across the whole sequence; the loop itself is implemented with tf.while_loop.
def dynamic_decode(decoder_cell, maximum_iterations):
    """Similar to dynamic_rnn but to decode

    Args:
        decoder_cell: (instance of DecoderCell) with step method
        maximum_iterations: (int)

    """
    try:
        maximum_iterations = tf.convert_to_tensor(maximum_iterations, dtype=tf.int32)
    except ValueError:
        pass

    # create TA for outputs by mimicking the structure of decodercell output
    def create_tensor_array(d):
        return tf.TensorArray(dtype=d, size=0, dynamic_size=True)

    initial_time = tf.constant(0, dtype=tf.int32)
    initial_outputs_ta = nest.map_structure(create_tensor_array, decoder_cell.output_dtype)
    initial_state, initial_inputs, initial_finished = decoder_cell.initialize()

    def condition(time, unused_outputs_ta, unused_state, unused_inputs,
                  finished):
        return tf.logical_not(tf.reduce_all(finished))

    def body(time, outputs_ta, state, inputs, finished):
        new_output, new_state, new_inputs, new_finished = decoder_cell.step(
            time, state, inputs, finished)
        outputs_ta = nest.map_structure(lambda ta, out: ta.write(time, out),
                                        outputs_ta, new_output)
        new_finished = tf.logical_or(
            tf.greater_equal(time, maximum_iterations),
            new_finished)
        return (time + 1, outputs_ta, new_state, new_inputs, new_finished)

    with tf.variable_scope("rnn"):
        res = tf.while_loop(
            condition,
            body,
            loop_vars=[initial_time, initial_outputs_ta, initial_state,
                       initial_inputs, initial_finished],
            back_prop=False)

    # get final outputs and states
    final_outputs_ta, final_state = res[1], res[2]

    # unfold and stack the structure from the nested tas
    final_outputs = nest.map_structure(lambda ta: ta.stack(), final_outputs_ta)

    # finalize the computation from the decoder cell
    final_outputs = decoder_cell.finalize(final_outputs, final_state)

    # transpose the final output
    final_outputs = nest.map_structure(transpose_batch_time, final_outputs)

    return final_outputs, final_state
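The control flow of this tf.while_loop can be hard to read in graph form. As a plain-Python sketch (with a list playing the role of the TensorArray, and `decoder_step`/`initialize` standing in for the decoder cell's methods), the loop above amounts to:

```python
def dynamic_decode_sketch(decoder_step, initialize, maximum_iterations):
    """Plain-Python rendering of the tf.while_loop logic: iterate until
    every sequence in the batch is finished or the iteration cap is
    reached, accumulating each step's output as we go."""
    time = 0
    outputs = []  # plays the role of the TensorArray
    state, inputs, finished = initialize()
    while not all(finished):  # condition(): not tf.reduce_all(finished)
        output, state, inputs, finished = decoder_step(time, state, inputs, finished)
        outputs.append(output)  # outputs_ta.write(time, ...)
        # mirrors tf.logical_or(tf.greater_equal(time, maximum_iterations), ...)
        finished = [f or time >= maximum_iterations for f in finished]
        time += 1
    return outputs, state
```

This makes the role of `finished` explicit: each sequence keeps stepping until it emits the end token or the cap is hit, and `back_prop=False` in the real graph reflects that this loop is only used at inference time.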
3.2 Beam Search
Beam search keeps several candidates at once, controlled by a parameter B called the beam width. It can be viewed as a greedy restriction of the Viterbi algorithm: exact Viterbi decoding via dynamic programming becomes inefficient when the vocabulary is large, whereas beam search uses the beam size to cap the number of hypotheses kept at each step. Beam search is a test-time strategy for better accuracy; it is not needed during training.
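Before the TensorFlow implementation, the idea can be sketched in plain Python. Here `log_probs_fn` is a hypothetical callback returning the model's log-probabilities over the vocabulary for a given prefix:

```python
import math

def beam_search_step(beams, log_probs_fn, beam_size):
    """One beam-search expansion step (sketch). `beams` is a list of
    (token_ids, score) hypotheses. Every hypothesis is extended by
    every vocabulary token, then only the beam_size best extensions
    survive, ranked by accumulated log-probability."""
    candidates = []
    for ids, score in beams:
        for tok, lp in enumerate(log_probs_fn(ids)):
            candidates.append((ids + [tok], score + lp))
    # keep the top-B candidates (the role tf.nn.top_k plays below)
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]
```

With beam_size = 1 this degenerates to greedy search; with beam_size equal to the number of all possible prefixes it becomes exhaustive search, so B trades accuracy against cost.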
def step(self, time, state, embedding, finished):
    """
    Args:
        time: tensor or int
        embedding: shape [batch_size, beam_size, d]
        state: structure of shape [batch_size, beam_size, ...]
        finished: structure of shape [batch_size, beam_size, ...]

    """
    # merge batch and beam dimension before calling step of cell
    cell_state = nest.map_structure(merge_batch_beam, state.cell_state)
    embedding = merge_batch_beam(embedding)

    # compute new logits
    logits, new_cell_state = self._cell.step(embedding, cell_state)

    # split batch and beam dimension before beam search logic
    new_logits = split_batch_beam(logits, self._beam_size)
    new_cell_state = nest.map_structure(
        lambda t: split_batch_beam(t, self._beam_size), new_cell_state)

    # compute log probs of the step
    # shape = [batch_size, beam_size, vocab_size]
    step_log_probs = tf.nn.log_softmax(new_logits)
    # shape = [batch_size, beam_size, vocab_size]
    step_log_probs = mask_probs(step_log_probs, self._end_token, finished)
    # accumulate with the running beam scores
    # shape = [batch_size, beam_size, vocab_size]
    log_probs = tf.expand_dims(state.log_probs, axis=-1) + step_log_probs
    log_probs = add_div_penalty(log_probs, self._div_gamma, self._div_prob,
                                self._batch_size, self._beam_size, self._vocab_size)

    # compute the best beams
    # shape = (batch_size, beam_size * vocab_size)
    log_probs_flat = tf.reshape(log_probs,
                                [self._batch_size, self._beam_size * self._vocab_size])
    # if time = 0, consider only one beam, otherwise beams are equal
    log_probs_flat = tf.cond(time > 0, lambda: log_probs_flat,
                             lambda: log_probs[:, 0])
    new_probs, indices = tf.nn.top_k(log_probs_flat, self._beam_size)

    # of shape [batch_size, beam_size]
    new_ids = indices % self._vocab_size
    new_parents = indices // self._vocab_size

    # get ids of words predicted and get embedding
    new_embedding = tf.nn.embedding_lookup(self._embeddings, new_ids)

    # compute end of beam
    finished = gather_helper(finished, new_parents,
                             self._batch_size, self._beam_size)
    new_finished = tf.logical_or(finished,
                                 tf.equal(new_ids, self._end_token))

    new_cell_state = nest.map_structure(
        lambda t: gather_helper(t, new_parents, self._batch_size,
                                self._beam_size), new_cell_state)

    # create new state of decoder
    new_state = BeamSearchDecoderCellState(cell_state=new_cell_state,
                                           log_probs=new_probs)
    new_output = BeamSearchDecoderOutput(logits=new_logits, ids=new_ids,
                                         parents=new_parents)

    return (new_output, new_state, new_embedding, new_finished)
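Note that each step only records `ids` and `parents`; the actual sequences must be reconstructed afterwards by following the parent pointers backwards from the final step. A minimal sketch of that backtracking (which the decoder's finalize stage has to perform), assuming `ids` and `parents` are per-step lists of length beam_size for a single batch element:

```python
def backtrack(ids, parents, beam):
    """Reconstruct one hypothesis from per-step `ids` and `parents`.
    Walk backwards from the final beam index: at each step, read the
    token at the current beam slot, then jump to the parent beam slot
    that produced it in the previous step."""
    seq = []
    for step_ids, step_parents in zip(reversed(ids), reversed(parents)):
        seq.append(step_ids[beam])
        beam = step_parents[beam]
    return list(reversed(seq))
```

Storing parent pointers instead of whole prefixes is what keeps the per-step state a fixed [batch_size, beam_size] shape inside the tf.while_loop.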