An Attempt at Text Semantic Matching Training with the BERT Model
1. Text semantic matching training with the BERT model
1.1 Downloading the code and corpora
Download the open-source BERT code: https://github.com/google-research/bert
From the model links in that repository's README, download the model you need; I chose BERT-Base, Uncased (click the blue link to download it).
Download the MRPC corpus from the GLUE dataset (choose whichever corpus you need). Downloading from the official site is not recommended; you can use the link shared below instead.
Link: https://pan.baidu.com/s/1i8GnPZoSKeOJMFhZo03aMg?pwd=hmhh
Extraction code: hmhh
1.2 Downloading and installing TensorFlow 1.x
For details, see my post at https://blog.csdn.net/qq_43732303/article/details/126073586?ops_request_misc=&request_id=&biz_id=102&spm=1018.2226.3001.4187
Because BERT was written against TensorFlow 1.11.0, it is best to install that version (or a close 1.x release).
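Since run_classifier.py depends on TF 1.x APIs (such as tf.flags and tf.contrib) that were removed in TF 2.x, it is worth checking the installed version before launching a run. A minimal sketch of such a check (the helper name is my own):

```python
# google-research/bert targets TensorFlow 1.11.0; TF 2.x removed the
# tf.flags / tf.contrib APIs that run_classifier.py depends on.

def tf1_compatible(version: str) -> bool:
    """Return True for 1.x version strings such as '1.11.0'."""
    return version.split(".")[0] == "1"

# In practice you would pass tf.__version__ from your environment:
print(tf1_compatible("1.11.0"))   # True
print(tf1_compatible("2.10.0"))   # False
```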
1.3 Run configuration
The run parameters the official README provides for MRPC (shown here for prediction with an already fine-tuned classifier) are:
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier
python run_classifier.py \
--task_name=MRPC \
--do_predict=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$TRAINED_CLASSIFIER \
--max_seq_length=128 \
--output_dir=/mrpc_output/
In PyCharm, open Run → Edit Configurations, copy the content above into the Parameters field, and modify it to fit your setup.
My configuration is as follows:
Script path:
D:\PycharmProject\bert-master\run_classifier.py
Parameters:
--task_name=MRPC
--do_train=true
--do_eval=true
--data_dir=D:\PycharmProject\bert-master\GLUE\MRPC
--vocab_file=D:\PycharmProject\bert-master\uncased_L-12_H-768_A-12\vocab.txt
--bert_config_file=D:\PycharmProject\bert-master\uncased_L-12_H-768_A-12\bert_config.json
--init_checkpoint=D:\PycharmProject\bert-master\uncased_L-12_H-768_A-12\bert_model.ckpt
--max_seq_length=128
--train_batch_size=4
--learning_rate=2e-5
--num_train_epochs=3.0
--output_dir=D:\PycharmProject\bert-master\output\mrpc_output\
The output_dir must be created in advance; otherwise the program will raise an error.
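The directory can also be created from Python before launching training; a small sketch (the helper name is my own, and the Windows path is the one used above):

```python
import os

def ensure_output_dir(path):
    """Create the --output_dir for run_classifier.py if it does not exist,
    avoiding the missing-directory error mentioned above."""
    os.makedirs(path, exist_ok=True)  # no-op if the directory already exists
    return path

# On this post's setup the path would be
# r"D:\PycharmProject\bert-master\output\mrpc_output";
# a relative path works the same way:
ensure_output_dir("output/mrpc_output")
```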
This runs BERT's run_classifier.py; adjust the file paths to match your own setup. If you train on a CPU, reduce train_batch_size and num_train_epochs; setting num_train_epochs to 1, 2, or 3 is enough.
If errors occur during the run, fix them according to the reported error messages.
1.4 Results
The evaluation results are written to the eval_results.txt file:
eval_accuracy = 0.8480392
eval_loss = 1.0256581
global_step = 11004
loss = 1.0256581
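eval_results.txt is a plain "key = value" file, so the metrics can be read back into a dict with a few lines of Python; a sketch assuming the format shown above (the parser name is my own):

```python
def parse_eval_results(text):
    """Parse run_classifier.py's eval_results.txt ('key = value' lines)
    into a dict mapping metric names to floats."""
    results = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        results[key.strip()] = float(value)
    return results

# The evaluation block shown above:
sample = """eval_accuracy = 0.8480392
eval_loss = 1.0256581
global_step = 11004
loss = 1.0256581"""
print(parse_eval_results(sample)["eval_accuracy"])  # 0.8480392
```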
The training log looks like this:
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: train-5
INFO:tensorflow:tokens: [CLS] the stock rose $ 2 . 11 , or about 11 percent , to close friday at $ 21 . 51 on the new york stock exchange . [SEP] pg & e corp . shares jumped $ 1 . 63 or 8 percent to $ 21 . 03 on the new york stock exchange on friday . [SEP]
INFO:tensorflow:input_ids: 101 1996 4518 3123 1002 1016 1012 2340 1010 2030 2055 2340 3867 1010 2000 2485 5958 2012 1002 2538 1012 4868 2006 1996 2047 2259 4518 3863 1012 102 18720 1004 1041 13058 1012 6661 5598 1002 1015 1012 6191 2030 1022 3867 2000 1002 2538 1012 6021 2006 1996 2047 2259 4518 3863 2006 5958 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:label: 1 (id = 1)
INFO:tensorflow:***** Running training *****
INFO:tensorflow: Num examples = 3668
INFO:tensorflow: Batch size = 1
INFO:tensorflow: Num steps = 11004
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow: name = input_ids, shape = (1, 128)
INFO:tensorflow: name = input_mask, shape = (1, 128)
INFO:tensorflow: name = is_real_example, shape = (1,)
INFO:tensorflow: name = label_ids, shape = (1,)
INFO:tensorflow: name = segment_ids, shape = (1, 128)
INFO:tensorflow:**** Trainable Variables ****
INFO:tensorflow: name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_0/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_1/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
1. The input_ids are indices into the vocab.txt file. The label indicates whether the two sentences are judged semantically equivalent: in MRPC, label 1 means the pair is a paraphrase (a match) and label 0 means it is not.
2. batch_size is the number of examples processed in one training step.
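The three feature arrays in the log follow a simple rule: input_ids holds the vocabulary indices of [CLS] sentence A [SEP] sentence B [SEP]; input_mask is 1 for real tokens and 0 for padding; segment_ids is 0 over [CLS], sentence A, and its [SEP], and 1 over sentence B and the final [SEP]. A minimal sketch of this packing step (my own illustration, assuming both sentences are already tokenized to ids and fit within max_seq_length; run_classifier.py additionally truncates long pairs):

```python
def build_features(ids_a, ids_b, cls_id, sep_id, max_seq_length=128):
    """Pack two tokenized sentences into BERT-style input_ids /
    input_mask / segment_ids, zero-padded to max_seq_length."""
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # segment 0 covers [CLS] + sentence A + first [SEP]; segment 1 the rest
    segment_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    input_mask = [1] * len(input_ids)  # 1 = real token, 0 = padding
    pad = max_seq_length - len(input_ids)
    return (input_ids + [0] * pad,
            input_mask + [0] * pad,
            segment_ids + [0] * pad)
```

With the ids from the log (e.g. 101 for [CLS], 102 for [SEP], 1996 for "the"), this reproduces the padding pattern shown above.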
\mathrm{Num\ steps}=\left(\frac{\mathrm{Num\ examples}}{\mathrm{Batch\ size}}\right)\times \mathrm{num\_train\_epochs}
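Plugging in the values from the training log above (3668 examples, batch size 1, 3.0 epochs) reproduces the 11004 steps reported by TensorFlow; a quick sketch:

```python
def num_train_steps(num_examples, batch_size, num_train_epochs):
    """Num steps = (Num examples / Batch size) x num_train_epochs."""
    return int(num_examples / batch_size * num_train_epochs)

# Values from the training log above:
print(num_train_steps(3668, 1, 3.0))  # 11004
```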
Note
This write-up is a summary I made when first getting started with BERT. It draws on methods from several blog posts, but enough time has passed that I no longer remember the exact links. If any reader notices the similarity, please leave a comment so I can credit the source. Thank you.