I love this section, because it explains the decoding principles behind the most popular NLP language models. We walk through text decoding using MindNLP.
Auto-regressive language model:
generates text one token at a time, factorizing the sequence probability as P(w_1, ..., w_T) = P(w_1) * P(w_2 | w_1) * ... * P(w_T | w_1, ..., w_{T-1}).
Decoding methods supplied by MindNLP:
Greedy search: at each step, output the single token with the highest conditional probability.
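A quick greedy-decoding sketch for comparison (assuming the mindnlp.transformers import path and the same 'iiBcai/gpt2' checkpoint and prompt used in the beam-search code below):

# Greedy search: with do_sample=False (the default) and num_beams=1,
# generate() simply picks the most probable token at every step.
from mindnlp.transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('iiBcai/gpt2', mirror='modelscope')
model = GPT2LMHeadModel.from_pretrained('iiBcai/gpt2', pad_token_id=tokenizer.eos_token_id, mirror='modelscope')
input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='ms')

greedy_output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))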
Beam search:
keep the several most probable partial sequences (beams) at each step instead of committing to only one (e.g. num_beams = 2 keeps the two best hypotheses).
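Before the MindNLP code, here is a tiny hand-made illustration of the idea (a hypothetical 3-word vocabulary and invented probabilities, not the library's internals): greedy search would commit to 'dog' at the first step, while beam search with num_beams = 2 also keeps 'nice' alive and ends up finding the higher-probability sequence.

# Toy beam search with num_beams = 2 over an invented vocabulary.
import math

vocab = ['dog', 'nice', 'car']
step_probs = [
    {(): [0.5, 0.4, 0.1]},                         # step 1: P(word | prompt)
    {('dog',):  [0.3, 0.3, 0.4],                   # step 2: P(word | prompt, 'dog')
     ('nice',): [0.05, 0.05, 0.9]},                #         P(word | prompt, 'nice')
]

beams = [((), 0.0)]                                # (sequence, log-probability)
for table in step_probs:
    candidates = []
    for seq, score in beams:
        for word, p in zip(vocab, table[seq]):
            candidates.append((seq + (word,), score + math.log(p)))
    beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:2]   # keep the best 2

for seq, score in beams:
    print(seq, round(math.exp(score), 3))
# ('nice', 'car') 0.36 beats greedy's choice ('dog', 'car') 0.2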
from mindnlp.transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('iiBcai/gpt2', mirror='modelscope')
model = GPT2LMHeadModel.from_pretrained('iiBcai/gpt2', pad_token_id=tokenizer.eos_token_id, mirror='modelscope')
input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='ms')
# Beam search: keep the 5 most probable sequences at each step
beam_output = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    early_stopping=True
)

print('Output:\n' + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
print(100 * '-')
# Beam search with an n-gram penalty: no bigram may appear twice
beam_output = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

print('Beam search with ngram, Output:\n' + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
print(100 * '-')
# Return the 5 highest-scoring beams instead of only the best one
beam_outputs = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    no_repeat_ngram_size=2,
    num_return_sequences=5,
    early_stopping=True
)

print('return_num_sequences, Output:\n' + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))
print(100 * '-')
Why the n-gram penalty improves the output:
n-gram: an n-gram model predicts the occurrence of a word based on its previous n - 1 words, making it a type of Markov model.
no_repeat_ngram_size = n: any token that would recreate an n-gram already present in the generated text gets probability 0, so no n-gram is repeated and the output loops less.
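A minimal sketch (plain Python, hypothetical token ids, not MindNLP's internal code) of how a no_repeat_ngram_size = 2 constraint can be enforced: collect every bigram generated so far and ban any next token that would repeat one.

# Hypothetical illustration of no_repeat_ngram_size = 2.
generated = [7, 3, 9, 3]          # token ids generated so far
seen_bigrams = {}
for a, b in zip(generated, generated[1:]):
    seen_bigrams.setdefault(a, set()).add(b)

last_token = generated[-1]        # the bigram (last_token, next_token) must be new
banned_next = seen_bigrams.get(last_token, set())
print(banned_next)                # {9}: generating 9 again would repeat the bigram (3, 9)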
Sampling:
randomly draw the next word from the model's current conditional probability distribution, instead of always taking the most probable one.
temperature: with a high temperature the candidate words look more alike in probability, so the choice becomes more random; with a low temperature their probabilities differ more strongly, so the most likely words dominate (temperature approaching 0 behaves like greedy search).
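A minimal numpy sketch of how temperature rescales the logits before softmax (the logit values are made up for illustration):

# Temperature scaling: softmax(logits / T).
# T > 1 flattens the distribution, T < 1 sharpens it.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([3.0, 2.0, 0.5])              # hypothetical next-token logits
print(softmax_with_temperature(logits, 1.0))    # baseline distribution
print(softmax_with_temperature(logits, 0.7))    # sharper: the top word dominates more
print(softmax_with_temperature(logits, 2.0))    # flatter: sampling becomes more random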
import mindspore

mindspore.set_seed(1234)  # fix the seed so the sampled output is reproducible

# Pure sampling: top_k=0 disables top-k filtering; temperature=0.7 sharpens the distribution
sample_output = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=0,
    temperature=0.7
)
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
Top-K sampling: keep only the K most probable words, renormalize their probabilities, and sample from that reduced set.
Top-P (nucleus) sampling:
keep the smallest set of words whose cumulative probability exceeds p,
renormalize their probabilities, and sample from that set.
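A minimal numpy sketch of both filters on a hand-made probability vector (illustration only, not MindNLP's implementation):

# Top-k and top-p filtering on a hypothetical next-token distribution.
import numpy as np

probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])

def top_k_filter(p, k):
    keep = np.argsort(p)[::-1][:k]            # indices of the k most probable tokens
    filtered = np.zeros_like(p)
    filtered[keep] = p[keep]
    return filtered / filtered.sum()          # renormalize before sampling

def top_p_filter(p, top_p):
    order = np.argsort(p)[::-1]
    cumulative = np.cumsum(p[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1   # smallest prefix exceeding top_p
    keep = order[:cutoff]
    filtered = np.zeros_like(p)
    filtered[keep] = p[keep]
    return filtered / filtered.sum()

print(top_k_filter(probs, 3))     # only the 3 most probable tokens survive
print(top_p_filter(probs, 0.85))  # the smallest set covering more than 85% of the mass survives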
Combining Top-K and Top-P: apply both filters before sampling; num_return_sequences draws several independent samples.
sample_outputs = model.generate(
    input_ids,
    do_sample=True,            # sample instead of taking the arg-max
    max_length=50,
    top_k=5,                   # keep only the 5 most probable words ...
    top_p=0.95,                # ... and only words inside the 95% probability mass
    num_return_sequences=3     # draw 3 independent samples
)
print('Output:\n' + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))