BERT Model Inference with ONNX Runtime

初岘

Last modified 2023-08-04 13:09:35

Tags: bert, artificial intelligence, deep learning

First published 2023-08-04 08:39:35

Article link: https://blog.csdn.net/weixin_67051070/article/details/132076806

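This article shows how to preprocess text with BertTokenizer to produce the inputs a BERT model requires, demonstrates inference with ONNX Runtime and TensorRT, and finally compares the running time of the two.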

Summary generated by CSDN using AI.

Assume the input text is "My name is John".

1. Tokenize and encode the text with BertTokenizer:

python
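from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

text = "My name is John"
# padding='max_length' is needed for the sequence to be padded out to max_length
encoded_input = tokenizer.encode_plus(text, max_length=10, padding='max_length',
                                      truncation=True, return_tensors='pt')

print(encoded_input)
# Output:
# {'input_ids': tensor([[101, 2026, 2171, 2003, 2198, 102, 0, 0, 0, 0]]),
#  'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
#  'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]])}

input_ids: [CLS] 101, my 2026, name 2171, is 2003, john 2198, [SEP] 102, [PAD] 0

101 - the special [CLS] token, placed at the start of the sequence

2026 - the id of the token "my" (bert-base-uncased lowercases the text before tokenizing)

2171 - the id of the token "name"

2003 - the id of the token "is"

2198 - the id of the token "john"

102 - the special [SEP] token, marking the end of the sentence

0 - the [PAD] token, used to fill the sequence when it is shorter than max_length

So input_ids is the result of processing the original text "My name is John" as follows:

1. Tokenize the text with BertTokenizer and map each token to its id in the vocabulary

2. Add the special [CLS] token at the start and [SEP] at the end

3. Pad with 0 up to the maximum length of 10

The resulting input is: [101, 2026, 2171, 2003, 2198, 102, 0, 0, 0, 0]

In other words, input_ids is the numeric representation of the text sequence after the preprocessing that BERT requires.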
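
The three preprocessing steps above can be sketched without downloading the real tokenizer. This is only a minimal illustration: `TOY_VOCAB` and `encode` are hypothetical names, the small vocabulary table is an assumption standing in for the real WordPiece vocabulary, and splitting on whitespace is a simplification of what BertTokenizer actually does.

```python
# Hypothetical toy vocabulary; an illustrative stand-in for the real vocabulary file.
TOY_VOCAB = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
             "my": 2026, "name": 2171, "is": 2003, "john": 2198}


def encode(text, max_length):
    """Tokenize, add [CLS]/[SEP], then pad with [PAD] up to max_length."""
    tokens = text.lower().split()                  # step 1: lowercase and split (simplified)
    ids = [TOY_VOCAB[t] for t in tokens]           # step 1: map each token to its id
    ids = ids[: max_length - 2]                    # truncate, leaving room for the two specials
    ids = [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]   # step 2: add special tokens
    ids += [TOY_VOCAB["[PAD]"]] * (max_length - len(ids))     # step 3: pad to max_length
    return ids


input_ids = encode("My name is John", max_length=10)
print(input_ids)  # [101, 2026, 2171, 2003, 2198, 102, 0, 0, 0, 0]
```

With a smaller max_length the same function truncates instead of padding, which is the role truncation=True plays in the tokenizer call.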

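token_type_ids: all 0, meaning every token belongs to the same (single) sentence

attention_mask: 1 for real tokens and 0 for padding positions, so that attention ignores the padding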
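
How the two auxiliary inputs relate to input_ids can be shown with plain lists. This is a sketch under assumptions: the helper names `attention_mask_for` and `token_type_ids_for` are made up for illustration, and the word ids in the example are only placeholders.

```python
PAD_ID = 0  # id of [PAD] in bert-base-uncased


def attention_mask_for(input_ids, pad_id=PAD_ID):
    """1 for every real token, 0 for every padding position."""
    return [0 if token_id == pad_id else 1 for token_id in input_ids]


def token_type_ids_for(len_a, len_b, max_length):
    """Segment ids: 0 for '[CLS] A [SEP]', 1 for 'B [SEP]', 0 for padding."""
    ids = [0] * len_a + [1] * len_b
    return ids + [0] * (max_length - len(ids))


ids = [101, 2026, 2171, 2003, 2198, 102, 0, 0, 0, 0]    # placeholder single sentence
print(attention_mask_for(ids))         # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(token_type_ids_for(6, 0, 10))    # single sentence: all zeros
print(token_type_ids_for(4, 3, 10))    # sentence pair: [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

For a single sentence the segment ids carry no information, which is why they are all 0 here; they only become non-trivial for sentence-pair inputs.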

2. Save the encoded inputs so they can be fed to ONNX Runtime:

python
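import numpy as np

# Convert the PyTorch tensors to NumPy arrays (dtype int64)
input_ids = encoded_input['input_ids'].numpy()
token_type_ids = encoded_input['token_type_ids'].numpy()
attention_mask = encoded_input['attention_mask'].numpy()

# input_ids:      [[101, 2026, 2171, 2003, 2198, 102, 0, 0, 0, 0]]
# token_type_ids: [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
# attention_mask: [[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]]

np.savez('input.npz',
         input_ids=input_ids,
         token_type_ids=token_type_ids,
         attention_mask=attention_mask)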
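
Because ONNX models are strict about input dtype and shape, it can be worth confirming that an .npz round trip preserves both. The sketch below uses an in-memory buffer instead of a file; the array values are placeholders, and int64 is an assumption about what a typical BERT export expects.

```python
import io

import numpy as np

# Placeholder arrays with the same shape and dtype as the encoded inputs.
arrays = {
    "input_ids": np.array([[101, 2026, 2171, 2003, 2198, 102, 0, 0, 0, 0]], dtype=np.int64),
    "token_type_ids": np.zeros((1, 10), dtype=np.int64),
}

buffer = io.BytesIO()
np.savez(buffer, **arrays)   # same call as above, writing to memory instead of input.npz
buffer.seek(0)

loaded = np.load(buffer)
for name in loaded.files:
    print(name, loaded[name].dtype, loaded[name].shape)
```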

3. Run inference with ONNX Runtime:

python
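import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("bert.onnx")

data = np.load('input.npz')
input_ids = data['input_ids']
token_type_ids = data['token_type_ids']
# attention_mask must be supplied as well if the model declares it;
# rebuild it from the padding (id 0) when it is not in the file
if 'attention_mask' in data.files:
    attention_mask = data['attention_mask']
else:
    attention_mask = (input_ids != 0).astype(np.int64)

inputs = {
    'input_ids': input_ids,
    'token_type_ids': token_type_ids,
    'attention_mask': attention_mask,
}
# Feed only the inputs that this particular export declares
expected = {i.name for i in sess.get_inputs()}
outputs = sess.run(None, {k: v for k, v in inputs.items() if k in expected})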
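
Different BERT exports declare different input names, so a mismatch between the feed dictionary and the model is a common source of errors at this step. The helper below is a hypothetical sketch (`build_feed` is not part of ONNX Runtime) that fails early with a readable message; with a real session, the expected names would come from `[i.name for i in sess.get_inputs()]`.

```python
def build_feed(expected_names, arrays):
    """Return only the arrays the model declares, in the declared order."""
    missing = [name for name in expected_names if name not in arrays]
    if missing:
        raise KeyError(f"model expects inputs that were not provided: {missing}")
    return {name: arrays[name] for name in expected_names}


# Hypothetical input names; a real export may declare a different set.
expected = ["input_ids", "attention_mask", "token_type_ids"]
arrays = {
    "input_ids": [[101, 2026, 2171, 2003, 2198, 102, 0, 0, 0, 0]],
    "token_type_ids": [[0] * 10],
    "attention_mask": [[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]],
    "unused_extra": [[0]],   # silently dropped because the model does not declare it
}

feed = build_feed(expected, arrays)
print(list(feed))
```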

4. Process the output logits to get the predicted words at the mask position:

python 
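import torch

# logits shape: (1, 10, 30522) = (batch, sequence length, vocabulary size)
logits = torch.tensor(outputs[0])

# This step only makes sense if the input contains a [MASK] token,
# e.g. "My name [MASK] John" instead of "My name is John"
is_mask = torch.tensor(input_ids[0]) == tokenizer.mask_token_id
mask_positions = is_mask.nonzero(as_tuple=True)[0].tolist()

for pos in mask_positions:
    # Softmax over the vocabulary at the mask position, then take the 5 best ids
    probs = torch.softmax(logits[0, pos], dim=-1)
    values, indices = torch.topk(probs, 5)
    print([tokenizer.decode([i]) for i in indices.tolist()])
# Predictions reported for the mask position: 'is' 'was' 'were' 'has' 'had'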
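
What the softmax and top-k step does to a single position's logits can be seen on a tiny example in plain Python. Everything here is made up for illustration: the five-word vocabulary, the logit values, and the helper names `softmax` and `top_k`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Return the k (index, probability) pairs with the highest probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


toy_vocab = ["is", "was", "john", "name", "my"]   # hypothetical 5-word vocabulary
mask_logits = [4.0, 2.5, -1.0, 0.5, 0.0]          # made-up logits for one [MASK] position

probs = softmax(mask_logits)
for index, prob in top_k(probs, 3):
    print(toy_vocab[index], round(prob, 3))
```

Softmax does not change the ranking of the logits, so it is only needed when the probabilities themselves are of interest; top-k on the raw logits selects the same tokens.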