An Attempt at Training BERT for Text Semantic Matching


1. Using the BERT Model for Text Semantic Matching Training

1.1 Downloading the Code and Corpus

Download the open-source BERT code: https://github.com/google-research/bert

From the model list in that repository's README, download the checkpoint you need; I chose BERT-Base, Uncased (click the blue link to download it).

[Screenshot: pre-trained model download links in the BERT README]

Download the MRPC corpus from the GLUE dataset (choose whichever corpus fits your needs). Downloading from the official GLUE site is not recommended; use the share link below instead.

Link: https://pan.baidu.com/s/1i8GnPZoSKeOJMFhZo03aMg?pwd=hmhh
Extraction code: hmhh
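If you would rather script the model download, here is a minimal Python sketch; the URL is the BERT-Base, Uncased link from the official README, and the script name is my own:

# download_bert.py -- fetch and unpack BERT-Base, Uncased
import urllib.request
import zipfile

URL = "https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip"

urllib.request.urlretrieve(URL, "uncased_L-12_H-768_A-12.zip")
with zipfile.ZipFile("uncased_L-12_H-768_A-12.zip") as zf:
    # Produces uncased_L-12_H-768_A-12/ containing vocab.txt,
    # bert_config.json and the bert_model.ckpt.* files.
    zf.extractall(".")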

1.2 Downloading and Installing TensorFlow 1.x

For details, see my separate post: https://blog.csdn.net/qq_43732303/article/details/126073586?ops_request_misc=&request_id=&biz_id=102&spm=1018.2226.3001.4187

BERT was built against TensorFlow 1.11.0, so it is best to install that version (or at least a matching 1.x release) before running the code.
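A quick sanity check before launching training is to print the TensorFlow version from the interpreter PyCharm will use; a minimal sketch:

# check_tf.py -- confirm the interpreter has a TensorFlow 1.x build
import tensorflow as tf

print(tf.__version__)  # ideally "1.11.0"
assert tf.__version__.startswith("1."), "run_classifier.py targets TensorFlow 1.x"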

1.3 Run Configuration

The official README gives the following example configuration for MRPC (note that this particular snippet is the prediction step, run against an already fine-tuned classifier; training uses --do_train and --do_eval, as in my parameters below):

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier

python run_classifier.py \
  --task_name=MRPC \
  --do_predict=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$TRAINED_CLASSIFIER \
  --max_seq_length=128 \
  --output_dir=/mrpc_output/

In PyCharm, open Run → Edit Configurations, copy the flags above into the Parameters field, and adjust them to your setup (drop the export lines and the trailing shell backslashes; PyCharm passes the flags directly).

My configuration is as follows:

Script path:

D:\PycharmProject\bert-master\run_classifier.py

Parameters:

--task_name=MRPC
--do_train=true
--do_eval=true
--data_dir=D:\PycharmProject\bert-master\GLUE\MRPC
--vocab_file=D:\PycharmProject\bert-master\uncased_L-12_H-768_A-12\vocab.txt
--bert_config_file=D:\PycharmProject\bert-master\uncased_L-12_H-768_A-12\bert_config.json
--init_checkpoint=D:\PycharmProject\bert-master\uncased_L-12_H-768_A-12\bert_model.ckpt
--max_seq_length=128
--train_batch_size=4
--learning_rate=2e-5
--num_train_epochs=3.0
--output_dir=D:\PycharmProject\bert-master\output\mrpc_output\

The output_dir must be created in advance, otherwise the program will report an error.
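To rule that error out, the directory can be created up front; a two-line sketch (the path is the one from my configuration, substitute your own):

import os

# exist_ok=True makes this a no-op if the directory already exists
os.makedirs(r"D:\PycharmProject\bert-master\output\mrpc_output", exist_ok=True)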

This runs the BERT repo's run_classifier.py; change the file paths to match your own setup. If you train on the CPU, reduce train_batch_size and num_train_epochs; any of 1, 2, or 3 epochs is enough.

If the run reports errors, fix them according to the error messages.

1.4 Results

The evaluation results are written to eval_results.txt in the output directory.

eval_accuracy = 0.8480392
eval_loss = 1.0256581
global_step = 11004
loss = 1.0256581
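
Since eval_results.txt is plain key = value text, it is easy to read back programmatically; a small sketch (the file path is from my configuration):

# read_eval.py -- parse the key = value lines run_classifier.py writes
def read_eval_results(path):
    results = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.partition("=")
            results[key.strip()] = float(value)
    return results

metrics = read_eval_results(
    r"D:\PycharmProject\bert-master\output\mrpc_output\eval_results.txt")
print(metrics["eval_accuracy"])  # 0.8480392 for the run above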

The files produced by the run are as follows:
[Screenshot: contents of the output directory]

An excerpt of the training log (this excerpt comes from a CPU run with train_batch_size reduced to 1):

INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: train-5
INFO:tensorflow:tokens: [CLS] the stock rose $ 2 . 11 , or about 11 percent , to close friday at $ 21 . 51 on the new york stock exchange . [SEP] pg & e corp . shares jumped $ 1 . 63 or 8 percent to $ 21 . 03 on the new york stock exchange on friday . [SEP]
INFO:tensorflow:input_ids: 101 1996 4518 3123 1002 1016 1012 2340 1010 2030 2055 2340 3867 1010 2000 2485 5958 2012 1002 2538 1012 4868 2006 1996 2047 2259 4518 3863 1012 102 18720 1004 1041 13058 1012 6661 5598 1002 1015 1012 6191 2030 1022 3867 2000 1002 2538 1012 6021 2006 1996 2047 2259 4518 3863 2006 5958 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:label: 1 (id = 1)
INFO:tensorflow:***** Running training *****
INFO:tensorflow:  Num examples = 3668
INFO:tensorflow:  Batch size = 1
INFO:tensorflow:  Num steps = 11004
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow:  name = input_ids, shape = (1, 128)
INFO:tensorflow:  name = input_mask, shape = (1, 128)
INFO:tensorflow:  name = is_real_example, shape = (1,)
INFO:tensorflow:  name = label_ids, shape = (1,)
INFO:tensorflow:  name = segment_ids, shape = (1, 128)
INFO:tensorflow:**** Trainable Variables ****
INFO:tensorflow:  name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_1/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*

1. Each input_id is the token's row index in vocab.txt. label is the gold label for whether the two sentences are semantically equivalent: in MRPC, label 1 means the pair is a paraphrase (a semantic match) and label 0 means it is not. The sketch after this list shows how the input_ids above are produced.

2. batch_size is the number of examples processed per training step (not the number of words). The number of training steps is

$$\mathrm{Num\ steps} = \frac{\mathrm{Num\ examples}}{\mathrm{Batch\ size}} \times \mathrm{num\_train\_epochs}$$

With Num examples = 3668, Batch size = 1, and num_train_epochs = 3, this gives 3668 × 3 = 11004, matching the Num steps and global_step values in the logs above.
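
To see where the tokens, [CLS]/[SEP] layout, and input_ids in the log come from, here is a minimal sketch using the tokenization.py module that ships in bert-master (run it from that directory; the vocab path is the one from my configuration):

# tokenize_pair.py -- reproduce the [CLS] ... [SEP] ... [SEP] encoding from the log
import tokenization  # tokenization.py from the bert-master repo

tokenizer = tokenization.FullTokenizer(
    vocab_file=r"D:\PycharmProject\bert-master\uncased_L-12_H-768_A-12\vocab.txt",
    do_lower_case=True)

sent_a = "The stock rose $2.11, or about 11 percent, to close Friday at $21.51 on the New York Stock Exchange."
sent_b = "PG&E Corp. shares jumped $1.63 or 8 percent to $21.03 on the New York Stock Exchange on Friday."

tokens = ["[CLS]"] + tokenizer.tokenize(sent_a) + ["[SEP]"] \
       + tokenizer.tokenize(sent_b) + ["[SEP]"]
input_ids = tokenizer.convert_tokens_to_ids(tokens)  # each id is the token's row in vocab.txt

print(tokens)      # matches the "tokens" line in the log
print(input_ids)   # matches the non-zero prefix of input_ids (the rest is padding)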

Note

I wrote this summary when I was first getting into BERT, and it draws on methods from several blog posts whose exact links I no longer remember. If a reader notices the similarity, please leave a comment so I can add the attribution. Thank you.
