Using the BERT Model from Transformers and Interpreting Its Parameters

Loading the model

First, install the transformers library:

pip install transformers
  • Remote loading

    # With a proxy enabled, the three core files can be fetched from the Hugging Face model hub,
    # but they are only kept in the cache, not saved permanently to disk.
    # Cache location (Windows): C:\Users\<username>\.cache\torch\transformers\
    from transformers import BertConfig, BertTokenizer, BertModel

    model_name = 'hfl/chinese-roberta-wwm-ext'
    config = BertConfig.from_pretrained(model_name)
    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    
  • Remote download

    # With a proxy enabled, this downloads every file on the model page from the Hugging Face hub;
    # apart from the three main files the rest are small and not strictly required.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="hfl/chinese-roberta-wwm-ext",
        local_dir=r"E:\mymodel\chatglm3-6b",
    )
    
  • Manual download

Manual download
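
You can also download the files on the model page by hand (config.json, vocab.txt, pytorch_model.bin, and so on) and then load them from the local folder. A minimal sketch; the local directory below is only a placeholder, not a path from the original post:

    from transformers import BertTokenizer, BertModel

    # hypothetical folder that holds the manually downloaded files
    local_path = r"E:\mymodel\chinese-roberta-wwm-ext"
    tokenizer = BertTokenizer.from_pretrained(local_path)
    model = BertModel.from_pretrained(local_path)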

A brief look at the structure of the BERT model

BERT's input:
the first row is the token (character) ids, the second row is the sentence-type (segment) ids, and the third row is the position ids.
Inside BERT there is also a word-embedding layer (which usually goes unnoticed) followed by 12 encoder layers; each encoder consists of multi-head attention + residual connections + an MLP.


The overall flow: define the sentences and pass them through the tokenizer's encode step to get input_ids, attention_mask and token_type_ids; use BERT's own pre-trained embedding layer to turn those ids into vectors;

then run them through the 12 encoder layers; the encoders do not change the shape of the input, as the sketch below illustrates.
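
A small sketch of that flow (reusing the tokenizer and model loaded above; the sentence is arbitrary): the embedding output and the final encoder output have the same shape.

import torch

# Tokenize one sentence: returns input_ids, token_type_ids, attention_mask
enc = tokenizer("昕洋哥可爱咩", return_tensors="pt")

with torch.no_grad():
    # BERT's own embedding layer: word + segment + position embeddings
    emb = model.embeddings(input_ids=enc["input_ids"],
                           token_type_ids=enc["token_type_ids"])
    # Full forward pass: the embeddings followed by the 12 encoder layers
    out = model(**enc).last_hidden_state

print(emb.shape, out.shape)  # both torch.Size([1, seq_len, 768]); the encoders keep the shape unchanged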

A quick introduction to the parameters

A simple example

from transformers import AutoTokenizer, AutoModel

s_a, s_b = "昕洋哥可爱咩", "公大第一突破手"
tokenizer = AutoTokenizer.from_pretrained(config["tokenizer_path"])
model = AutoModel.from_pretrained(config["model_path"])
max_len = 32
input_token = tokenizer.encode_plus(text=s_a,
                                    text_pair=s_b,
                                    add_special_tokens=True,    # True: automatically adds the [CLS] / [SEP] token ids
                                    max_length=max_len,         # maximum sequence length; with truncation=True anything longer is cut off
                                    padding="max_length",       # how to pad; 'max_length' pads short sequences up to max_length
                                    truncation=True,            # truncate sequences that exceed max_length
                                    return_attention_mask=True, # also return the attention mask
                                    return_tensors='pt')        # return PyTorch tensors

outputs = model(**input_token)
last_hidden_state, pooled_output = outputs.last_hidden_state, outputs.pooler_output  # shapes: [1, 32, 768] and [1, 768]

A quick look at the tokenizer output

s_a, s_b = "昕洋哥可爱咩", "公大第一突破手"
max_len = 32
input_token = tokenizer.encode_plus(text=s_a,
                                    text_pair=s_b,
                                    add_special_tokens=True,    # automatically adds [CLS] / [SEP]
                                    max_length=max_len,         # maximum sequence length
                                    padding="max_length",       # pad short sequences up to max_length
                                    truncation=True,            # truncate longer sequences
                                    return_attention_mask=True,
                                    return_tensors='pt')
print(input_token)
Out:
{'input_ids': tensor([[ 101, 3213, 3817, 1520, 1377, 4263, 1487,  102, 1062, 1920, 5018,  671, 4960, 4788, 2797, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0]]), 
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0]])}
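
Note how token_type_ids is 0 over [CLS] + sentence A + its [SEP] and 1 over sentence B + its [SEP], while attention_mask is 0 over the padding. As a quick check (using the input_token built above), the ids can be mapped back to tokens:

# Expected layout: [CLS] <chars of s_a> [SEP] <chars of s_b> [SEP] [PAD] ... [PAD]
tokens = tokenizer.convert_ids_to_tokens(input_token["input_ids"][0].tolist())
print(tokens)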

A quick look at BERT's output

# BERT's outputs are: last_hidden_state, pooler_output, (hidden_states), (attentions)
# The last two are only returned if they are enabled in the config:
# config.output_hidden_states = True
# config.output_attentions = True
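
The same two flags can also be passed at call time rather than being set on the config. A small sketch (reusing model and input_token from above; the model here is the 12-layer, 12-head base size):

outputs = model(**input_token,
                output_hidden_states=True,
                output_attentions=True)

print(len(outputs.hidden_states))       # 13: the embedding output plus one tensor per encoder layer
print(len(outputs.attentions))          # 12: one attention map per encoder layer
print(outputs.hidden_states[-1].shape)  # torch.Size([1, 32, 768]); identical to last_hidden_state
print(outputs.attentions[0].shape)      # torch.Size([1, 12, 32, 32]): [batch, heads, seq_len, seq_len]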

last_hidden_state: this is simply the output of BERT's last encoder layer, i.e. the contextual vector of every token; here its shape is [1 (batch), 32, 768]

last_hidden_state=model(**input_token).last_hidden_state
print(last_hidden_state)
print(last_hidden_state.shape)
OUT:
tensor([[[-0.1674,  0.6474, -0.0225,  ...,  1.1659, -0.3393, -0.5934],
         [-0.3419,  0.1861,  0.7744,  ...,  0.4441, -0.0193, -0.6533],
         [-0.1497, -0.3524,  1.6394,  ..., -0.3114,  0.1749, -0.4402],
         ...,
         [ 0.0915,  0.0391, -0.1304,  ...,  0.2157, -0.4148, -0.6504],
         [ 0.1022,  0.0319, -0.1545,  ..., -0.0655, -0.1317, -0.4559],
         [ 0.1168, -0.2943,  0.4040,  ...,  0.3377, -0.2404, -0.8135]]],
       grad_fn=<NativeLayerNormBackward0>)
torch.Size([1, 32, 768])

pooler_output: this is obtained by passing the hidden state of the [CLS] token through a linear layer (also called a dense or fully connected layer) followed by a Tanh activation, so it is not the raw [CLS] vector; its shape is [1 (batch), 768]

pooler_output=model(**input_token).pooler_output
print(pooler_output)
print(pooler_output.shape)
OUT:
tensor([[ 0.9942,  0.9999,  0.9575,  0.9609,  0.9995,  0.6800, -0.9083, -0.9273,
          0.9981, -0.9958,  1.0000,  0.9771, -0.4249, -0.9875,  1.0000, -0.9998,
         -0.9706,  0.9362,  0.9946,  0.4023,  0.9999, -0.9999, -0.9942,  0.2553,
          0.0574,  0.9939,  0.9799, -0.9651, -1.0000,  0.9950,  0.9817,  0.9997,
          0.8999, -0.9999, -0.9975,  0.9511,  0.0456,  0.9763, -0.2017, -0.3505,
         -0.5329, -0.9909,  0.1099, -0.9599, -0.9715,  0.4536, -1.0000, -0.9996,
         -0.8363,  0.9999, -0.5633, -0.9994,  0.6703, -0.7068, -0.9977,  0.9837,
         -0.9982,  0.8297,  1.0000,  0.7346,  0.9996, -0.9939,  0.4900, -0.9997,
          1.0000, -0.9998, -0.9696,  0.3419,  1.0000,  1.0000, -0.7171,  0.9997,
          1.0000,  0.9060,  0.9973,  0.9802, -0.9913, -0.1233, -1.0000,  0.8044,
          1.0000,  0.9895, -0.9849,  0.9620, -0.9562, -1.0000, -0.9961,  0.9962,
         -0.2306,  0.9996,  0.9909, -0.9998, -1.0000,  0.9980, -0.9996, -0.9978,
         -0.9026,  0.9870,  0.3930, -0.4557, -0.5619,  0.8647, -0.9691, -0.8780,
          0.8481,  0.9977, -0.3529, -0.9954,  0.9972,  0.5296, -1.0000, -0.8681,
         -0.9894, -0.9992, -0.9418,  0.9999,  0.7843, -0.7159,  0.9996, -0.9362,
          0.6999, -0.9981, -0.9911,  0.9768,  0.9813,  0.9999,  0.9919, -0.9959,
          0.9779,  1.0000,  0.9935,  0.9724, -0.8999,  0.9570,  0.9617, -0.9599,
         -0.7990, -0.5751,  1.0000,  0.9161,  0.7660, -0.9576,  0.9998, -0.9947,
          0.9999, -0.9999,  0.9978, -1.0000, -0.9934,  0.9999,  0.7726,  1.0000,
         -0.9636,  1.0000, -0.9987, -0.9953,  0.9576, -0.2487,  0.9894, -1.0000,
          0.9541, -0.9862,  0.1822, -0.6420, -1.0000,  0.9999, -0.8949,  1.0000,
          0.9725, -0.9816, -0.9965, -0.9975,  0.5108, -0.9929, -0.9039,  0.9983,
         -0.6245,  0.9975,  0.6785, -0.9735,  0.9990, -0.5156, -0.9998,  0.9580,
         -0.5811,  0.9931,  0.7621,  0.4462,  0.9662,  0.9627, -0.7024,  0.9999,
         -0.3832,  0.9919,  0.9878, -0.3245, -0.7575, -0.9652, -0.9999, -0.8281,
...
         -0.9970,  0.9827, -0.9896,  0.9781, -0.9990,  0.9795,  0.9081,  0.9805,
         -0.9965,  1.0000,  0.9856, -0.9929, -0.9965, -0.9971, -0.9906,  0.8819]],
       grad_fn=<TanhBackward0>)
torch.Size([1, 768])
# To get the raw [CLS] word vector instead, use:
cls_output = model(**input_token).last_hidden_state[:, 0, :]
print(cls_output)
print(cls_output.shape)
tensor([[-1.6740e-01,  6.4736e-01, -2.2484e-02,  4.1642e-01,  1.3123e+00,
         -1.1376e+00, -7.5436e-02,  3.1302e-02, -1.0443e-01,  5.5693e-01,
         -3.4284e-01, -3.5247e-01,  5.6960e-01, -7.9111e-02,  2.1315e+00,
         -7.5177e-01,  8.0393e-01, -1.3490e+00, -3.7192e-01,  1.3349e+00,
         -3.6452e-01,  7.2413e-01,  9.0923e-02,  5.9850e-02,  9.6748e-01,
          2.9095e-01, -2.9775e-01, -1.0763e+00, -4.8896e-01,  1.4444e+00,
         -4.4402e-01, -7.6757e-02, -1.7471e+00,  3.4565e-01,  1.2701e+00,
         -9.7642e-02,  4.9089e-01,  2.8789e-01, -2.9180e-02,  6.1670e-01,
         -2.6741e-01, -5.2184e-01, -1.1344e+00,  1.4203e+00,  5.0695e-01,
          4.0790e-01, -1.0032e-01, -7.3822e-02, -3.8627e-01, -4.9777e-01,
         -5.3078e-01,  1.0311e+01,  9.0615e-01, -1.1979e-01,  3.2994e-02,
          7.3485e-01,  7.6661e-01,  2.3520e-01,  5.5208e-01, -8.0590e-01,
         -5.8051e-01, -1.4073e+00,  1.0662e-01,  5.6928e-01,  4.0605e-01,
         -2.6799e-01, -7.7853e-02,  3.4208e-02, -2.4601e+00, -6.6159e-01,
         -6.0222e-01, -1.3710e-01,  1.1202e+00, -4.0902e-01, -2.4807e-01,
          1.0551e+00, -1.2966e-01,  1.3745e+00, -8.6654e-01,  1.6123e+00,
          4.5216e-01,  2.2718e-01, -7.2256e-01,  3.6220e-01, -5.0524e-02,
         -3.8297e-02,  3.0783e-01, -2.1411e+00,  3.8136e-01,  2.4180e-01,
          1.3467e-01,  2.3749e-02, -3.4857e-01,  1.1531e+00,  5.7212e-01,
         -9.6699e-02,  4.6560e-01, -1.2970e-02, -5.8718e-01, -2.1731e+00,
          3.4598e-01,  6.4314e-01,  1.9074e-01, -9.0572e-01, -1.5097e+00,
          5.7951e-02,  1.0554e-01,  5.6217e-02,  4.5577e-01, -2.8059e-01,
         -8.1277e-01, -2.8510e-01,  3.8287e-01, -3.3996e-01, -2.3603e-01,
         -4.0855e-01, -1.1174e-01,  8.7581e-01, -2.1896e+00,  7.4248e-02,
          1.0939e+00, -4.7369e-01, -6.2837e-01, -1.0369e+00,  7.4548e-01,
...
         -5.1899e-03, -7.8996e-01, -2.4110e-01,  2.6795e-01, -9.5601e-01,
         -2.4670e-01, -1.1371e+00,  5.2228e-01, -4.2710e-01, -3.9417e-01,
          1.1659e+00, -3.3935e-01, -5.9336e-01]], grad_fn=<SliceBackward0>)
torch.Size([1, 768])
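
To tie the two together, here is a small sketch (assuming the standard BertPooler head of BertModel, reusing model and input_token from above) that reproduces pooler_output from the raw [CLS] vector:

import torch

with torch.no_grad():
    outputs = model(**input_token)
    cls_vec = outputs.last_hidden_state[:, 0, :]       # raw [CLS] hidden state
    manual = torch.tanh(model.pooler.dense(cls_vec))   # BertPooler = Linear(768, 768) + Tanh
    print(torch.allclose(manual, outputs.pooler_output, atol=1e-5))  # expect: True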

Conclusion

This post is only a brief walkthrough of how to use a pre-trained model with the transformers library. Although only BERT was used here, the usage pattern is almost identical for most pre-trained models. Before using any model for a task, make sure you understand its structure; that speeds up your work and avoids wasting time and compute.
