Using the BERT Model from Transformers and Interpreting Its Parameters
Loading the model
First, install the transformers library:
pip install transformers
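After installation, a quick sanity check is to import the library and print its version (optional, just to confirm the install worked):
import transformers
print(transformers.__version__)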
-
Remote loading
# With network access to Hugging Face (e.g. via a proxy), the three core files can be downloaded from the model hub,
# but they are only kept in a cache rather than saved permanently to disk.
# Cache location (Windows): C:\Users\[username]\.cache\torch\transformers\
from transformers import BertConfig, BertTokenizer, BertModel

model_name = 'hfl/chinese-roberta-wwm-ext'
config = BertConfig.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)
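If you would rather keep the cached files in a directory of your own choosing, from_pretrained accepts a cache_dir argument; a minimal sketch (the path below is just an example):
cache_dir = r"E:\hf_cache"  # example location, pick any folder you like
tokenizer = BertTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
model = BertModel.from_pretrained(model_name, cache_dir=cache_dir)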
-
Remote download
# With network access to Hugging Face, this downloads every file on the model page.
# Apart from the three main files, the rest are small and not strictly required.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="hfl/chinese-roberta-wwm-ext",
    local_dir=r"E:\mymodel\chatglm3-6b",
)
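snapshot_download also returns the folder it downloaded into, so the return value can be passed straight to from_pretrained and later loads work fully offline; a sketch:
local_path = snapshot_download(repo_id="hfl/chinese-roberta-wwm-ext",
                               local_dir=r"E:\mymodel\chatglm3-6b")
tokenizer = BertTokenizer.from_pretrained(local_path)
model = BertModel.from_pretrained(local_path)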
-
Manual download
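You can also grab the files by hand from the model page (at minimum the three core files: config.json, vocab.txt and the model weights), put them into one folder, and point from_pretrained at that folder. A minimal sketch, assuming the files were saved to an example directory:
local_model_dir = r"E:\mymodel\chinese-roberta-wwm-ext"  # example folder containing config.json, vocab.txt and the weights
tokenizer = BertTokenizer.from_pretrained(local_model_dir)
model = BertModel.from_pretrained(local_model_dir)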
A brief look at the structure of the BERT model
Inputs to the BERT model:
The first row is the character tokens, the second row is the sentence/segment tokens, and the third row is the position tokens.
Inside BERT there is first an embedding layer that turns these ids into word vectors (it rarely gets much attention), followed by 12 encoder layers; each encoder consists of multi-head attention + residual connections + an MLP.
The overall flow: define the sentences and pass them through the tokenizer's encode step to obtain input_ids, attention_mask and token_type_ids; use BERT's own pretrained embeddings to vectorize these ids;
then pass them through the 12 encoders, which refine the representations without changing the shape of the input (see the small inspection sketch below).
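These structural claims are easy to verify on the loaded model itself; a small inspection sketch, reusing the BertModel loaded earlier:
print(model.embeddings)                 # the embedding layer: word + position + token-type embeddings
print(model.config.num_hidden_layers,   # 12 encoder layers
      model.config.hidden_size)         # hidden size 768, unchanged from layer to layer
print(model.encoder.layer[0])           # one encoder block: multi-head self-attention + residual/LayerNorm + feed-forward MLP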
A quick look at the parameters
A simple example
# config here is a plain dict holding the paths, e.g. {"tokenizer_path": ..., "model_path": ...}
from transformers import AutoTokenizer, AutoModel

s_a, s_b = "昕洋哥可爱咩", "公大第一突破手"
tokenizer = AutoTokenizer.from_pretrained(config["tokenizer_path"])
model = AutoModel.from_pretrained(config["model_path"])
max_len = 32
input_token = tokenizer.encode_plus(text=s_a,
                                    text_pair=s_b,
                                    add_special_tokens=True,     # if True, the [CLS] and [SEP] ids are added automatically
                                    max_length=max_len,          # maximum sequence length; with truncation=True anything longer is cut off
                                    padding="max_length",        # how to pad; 'max_length' is recommended so short sequences are padded up to max_length
                                    truncation=True,             # truncate sequences longer than max_length
                                    return_attention_mask=True,  # also return the attention mask
                                    return_tensors='pt')         # return PyTorch tensors
outputs = model(**input_token)
last_hidden_state, pooled_output = outputs.last_hidden_state, outputs.pooler_output  # shapes: [1, 32, 768] and [1, 768]
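For plain inference it is common to wrap the forward pass in torch.no_grad() so that no gradients are tracked; a small variation of the call above:
import torch

with torch.no_grad():
    outputs = model(**input_token)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 32, 768])
print(outputs.pooler_output.shape)      # torch.Size([1, 768])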
A quick look at the tokenizer output
s_a, s_b = "昕洋哥可爱咩", "公大第一突破手"
max_len = 32
input_token = tokenizer.encode_plus(text=s_a,
                                    text_pair=s_b,
                                    add_special_tokens=True,     # if True, the [CLS] and [SEP] ids are added automatically
                                    max_length=max_len,          # maximum sequence length; with truncation=True anything longer is cut off
                                    padding="max_length",        # pad short sequences up to max_length
                                    truncation=True,             # truncate sequences longer than max_length
                                    return_attention_mask=True,  # also return the attention mask
                                    return_tensors='pt')         # return PyTorch tensors
print(input_token)
Out:
{'input_ids': tensor([[ 101, 3213, 3817, 1520, 1377, 4263, 1487, 102, 1062, 1920, 5018, 671, 4960, 4788, 2797, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0]])}
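To see how these ids line up with [CLS], [SEP] and the padding, they can be mapped back to tokens with the tokenizer's standard conversion method; a sketch:
tokens = tokenizer.convert_ids_to_tokens(input_token["input_ids"][0].tolist())
print(tokens)  # [CLS], the characters of the first sentence, [SEP], the second sentence, [SEP], then [PAD] tokens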
A quick look at BERT's output
# BERT's outputs are:
# last_hidden_state, pooler_output, (hidden_states), (attentions)
# the last two are only returned when they are enabled in the config:
# config.output_attentions = True
# config.output_hidden_states = True
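One way to switch these on is to pass the flags directly to from_pretrained (they end up on the config); a sketch:
model = AutoModel.from_pretrained(config["model_path"],
                                  output_hidden_states=True,
                                  output_attentions=True)
outputs = model(**input_token)
print(len(outputs.hidden_states))  # 13: the embedding output plus the 12 encoder layers
print(len(outputs.attentions))     # 12: one attention tensor per encoder layer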
last_hidden_state: this is simply the output of BERT's last layer, i.e. a vector for every token; here its shape is [1 (batch), 32, 768].
last_hidden_state=model(**input_token).last_hidden_state
print(last_hidden_state)
print(last_hidden_state.shape)
OUT:
tensor([[[-0.1674, 0.6474, -0.0225, ..., 1.1659, -0.3393, -0.5934],
[-0.3419, 0.1861, 0.7744, ..., 0.4441, -0.0193, -0.6533],
[-0.1497, -0.3524, 1.6394, ..., -0.3114, 0.1749, -0.4402],
...,
[ 0.0915, 0.0391, -0.1304, ..., 0.2157, -0.4148, -0.6504],
[ 0.1022, 0.0319, -0.1545, ..., -0.0655, -0.1317, -0.4559],
[ 0.1168, -0.2943, 0.4040, ..., 0.3377, -0.2404, -0.8135]]],
grad_fn=<NativeLayerNormBackward0>)
torch.Size([1, 32, 768])
pooler_output: this is obtained by passing the hidden state of the [CLS] token through a linear layer (also called a dense or fully connected layer) followed by a Tanh activation, so it is not the raw [CLS] vector; its shape is [1 (batch), 768]. (A sketch reproducing this computation by hand appears at the end of this section.)
pooler_output=model(**input_token).pooler_output
print(pooler_output)
print(pooler_output.shape)
OUT:
tensor([[ 0.9942, 0.9999, 0.9575, 0.9609, 0.9995, 0.6800, -0.9083, -0.9273,
0.9981, -0.9958, 1.0000, 0.9771, -0.4249, -0.9875, 1.0000, -0.9998,
-0.9706, 0.9362, 0.9946, 0.4023, 0.9999, -0.9999, -0.9942, 0.2553,
0.0574, 0.9939, 0.9799, -0.9651, -1.0000, 0.9950, 0.9817, 0.9997,
0.8999, -0.9999, -0.9975, 0.9511, 0.0456, 0.9763, -0.2017, -0.3505,
-0.5329, -0.9909, 0.1099, -0.9599, -0.9715, 0.4536, -1.0000, -0.9996,
-0.8363, 0.9999, -0.5633, -0.9994, 0.6703, -0.7068, -0.9977, 0.9837,
-0.9982, 0.8297, 1.0000, 0.7346, 0.9996, -0.9939, 0.4900, -0.9997,
1.0000, -0.9998, -0.9696, 0.3419, 1.0000, 1.0000, -0.7171, 0.9997,
1.0000, 0.9060, 0.9973, 0.9802, -0.9913, -0.1233, -1.0000, 0.8044,
1.0000, 0.9895, -0.9849, 0.9620, -0.9562, -1.0000, -0.9961, 0.9962,
-0.2306, 0.9996, 0.9909, -0.9998, -1.0000, 0.9980, -0.9996, -0.9978,
-0.9026, 0.9870, 0.3930, -0.4557, -0.5619, 0.8647, -0.9691, -0.8780,
0.8481, 0.9977, -0.3529, -0.9954, 0.9972, 0.5296, -1.0000, -0.8681,
-0.9894, -0.9992, -0.9418, 0.9999, 0.7843, -0.7159, 0.9996, -0.9362,
0.6999, -0.9981, -0.9911, 0.9768, 0.9813, 0.9999, 0.9919, -0.9959,
0.9779, 1.0000, 0.9935, 0.9724, -0.8999, 0.9570, 0.9617, -0.9599,
-0.7990, -0.5751, 1.0000, 0.9161, 0.7660, -0.9576, 0.9998, -0.9947,
0.9999, -0.9999, 0.9978, -1.0000, -0.9934, 0.9999, 0.7726, 1.0000,
-0.9636, 1.0000, -0.9987, -0.9953, 0.9576, -0.2487, 0.9894, -1.0000,
0.9541, -0.9862, 0.1822, -0.6420, -1.0000, 0.9999, -0.8949, 1.0000,
0.9725, -0.9816, -0.9965, -0.9975, 0.5108, -0.9929, -0.9039, 0.9983,
-0.6245, 0.9975, 0.6785, -0.9735, 0.9990, -0.5156, -0.9998, 0.9580,
-0.5811, 0.9931, 0.7621, 0.4462, 0.9662, 0.9627, -0.7024, 0.9999,
-0.3832, 0.9919, 0.9878, -0.3245, -0.7575, -0.9652, -0.9999, -0.8281,
...
-0.9970, 0.9827, -0.9896, 0.9781, -0.9990, 0.9795, 0.9081, 0.9805,
-0.9965, 1.0000, 0.9856, -0.9929, -0.9965, -0.9971, -0.9906, 0.8819]],
grad_fn=<TanhBackward0>)
torch.Size([1, 768])
# To get the raw [CLS] token vector instead, you can use:
cls_vector = model(**input_token).last_hidden_state[:, 0, :]
print(cls_vector)
print(cls_vector.shape)
tensor([[-1.6740e-01, 6.4736e-01, -2.2484e-02, 4.1642e-01, 1.3123e+00,
-1.1376e+00, -7.5436e-02, 3.1302e-02, -1.0443e-01, 5.5693e-01,
-3.4284e-01, -3.5247e-01, 5.6960e-01, -7.9111e-02, 2.1315e+00,
-7.5177e-01, 8.0393e-01, -1.3490e+00, -3.7192e-01, 1.3349e+00,
-3.6452e-01, 7.2413e-01, 9.0923e-02, 5.9850e-02, 9.6748e-01,
2.9095e-01, -2.9775e-01, -1.0763e+00, -4.8896e-01, 1.4444e+00,
-4.4402e-01, -7.6757e-02, -1.7471e+00, 3.4565e-01, 1.2701e+00,
-9.7642e-02, 4.9089e-01, 2.8789e-01, -2.9180e-02, 6.1670e-01,
-2.6741e-01, -5.2184e-01, -1.1344e+00, 1.4203e+00, 5.0695e-01,
4.0790e-01, -1.0032e-01, -7.3822e-02, -3.8627e-01, -4.9777e-01,
-5.3078e-01, 1.0311e+01, 9.0615e-01, -1.1979e-01, 3.2994e-02,
7.3485e-01, 7.6661e-01, 2.3520e-01, 5.5208e-01, -8.0590e-01,
-5.8051e-01, -1.4073e+00, 1.0662e-01, 5.6928e-01, 4.0605e-01,
-2.6799e-01, -7.7853e-02, 3.4208e-02, -2.4601e+00, -6.6159e-01,
-6.0222e-01, -1.3710e-01, 1.1202e+00, -4.0902e-01, -2.4807e-01,
1.0551e+00, -1.2966e-01, 1.3745e+00, -8.6654e-01, 1.6123e+00,
4.5216e-01, 2.2718e-01, -7.2256e-01, 3.6220e-01, -5.0524e-02,
-3.8297e-02, 3.0783e-01, -2.1411e+00, 3.8136e-01, 2.4180e-01,
1.3467e-01, 2.3749e-02, -3.4857e-01, 1.1531e+00, 5.7212e-01,
-9.6699e-02, 4.6560e-01, -1.2970e-02, -5.8718e-01, -2.1731e+00,
3.4598e-01, 6.4314e-01, 1.9074e-01, -9.0572e-01, -1.5097e+00,
5.7951e-02, 1.0554e-01, 5.6217e-02, 4.5577e-01, -2.8059e-01,
-8.1277e-01, -2.8510e-01, 3.8287e-01, -3.3996e-01, -2.3603e-01,
-4.0855e-01, -1.1174e-01, 8.7581e-01, -2.1896e+00, 7.4248e-02,
1.0939e+00, -4.7369e-01, -6.2837e-01, -1.0369e+00, 7.4548e-01,
...
-5.1899e-03, -7.8996e-01, -2.4110e-01, 2.6795e-01, -9.5601e-01,
-2.4670e-01, -1.1371e+00, 5.2228e-01, -4.2710e-01, -3.9417e-01,
1.1659e+00, -3.3935e-01, -5.9336e-01]], grad_fn=<SliceBackward0>)
torch.Size([1, 768])
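As described above, pooler_output is just this raw [CLS] vector pushed through the model's pooler (a linear layer followed by Tanh). This can be reproduced by hand, assuming the standard BertModel layout where the pooler's linear layer is exposed as model.pooler.dense; a minimal sketch:
import torch

with torch.no_grad():
    out = model(**input_token)
cls_hidden = out.last_hidden_state[:, 0, :]                  # raw [CLS] hidden state
manual_pooled = torch.tanh(model.pooler.dense(cls_hidden))   # linear layer + Tanh, exactly what the pooler does
print(torch.allclose(manual_pooled, out.pooler_output))      # True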
Wrap-up
This article is only a brief walkthrough of how to use a pretrained model with transformers. Although only BERT was used here, most pretrained models are used in almost exactly the same way. Before using any model for a task, make sure you understand its structure; this speeds up your work and keeps you from wasting time and compute.