[Pitfalls] Fine-tuning BERT on a single GPU (SQuAD 1.1)

atporter

Category: Natural Language Processing. Tags: Natural Language Processing

Article link: https://blog.csdn.net/atporter/article/details/114436548

Official code: https://github.com/google-research/bert

I ran into quite a few pitfalls while fine-tuning on the SQuAD 1.1 dataset, so I am sharing them here.

I. Environment

1. Error: AttributeError: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'

The cause is a wrong TensorFlow version: the official code requires a 1.x release. I used TensorFlow 1.15.0.

Note: Python 3.8 does not support TensorFlow 1.15.0, so create a Python 3.7 environment:

conda create -n py37 python=3.7
conda activate py37
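
Before launching a long run, it can help to confirm that the active environment matches the version constraints above. The following is a minimal sketch, not part of the official code; the helper names `check_versions` and `check_current_environment` are hypothetical, and the only rules encoded are the two from this post (Python no newer than 3.7, TensorFlow 1.x).

```python
import sys


def check_versions(python_version, tf_version):
    """Return a list of problems for a Python (major, minor) tuple and a TF version string."""
    problems = []
    major_minor = tuple(python_version[:2])
    # This post's constraint: TensorFlow 1.15.0 is not available for Python 3.8.
    if major_minor > (3, 7):
        problems.append("Python %d.%d is too new for TensorFlow 1.15" % major_minor)
    # The official BERT code relies on tf.train.Optimizer, which is a 1.x API.
    if tf_version.split(".")[0] != "1":
        problems.append("TensorFlow %s is not a 1.x release" % tf_version)
    return problems


def check_current_environment():
    """Check the interpreter and TensorFlow that are actually installed."""
    import tensorflow as tf  # imported lazily so the pure check above works without TF
    return check_versions(sys.version_info, tf.__version__)
```

Calling `check_current_environment()` inside the `py37` environment should return an empty list when both constraints are met.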

2. ※※ Major pitfall ※※

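When installing TensorFlow 1.15.0 next, do not use pip install tensorflow-gpu==1.15.0. With the pip package, training still ran on the CPU for me, which reportedly would take about 120 h. A likely cause is that the pip wheel expects CUDA and cuDNN to already be installed on the system, whereas conda installs matching cudatoolkit and cudnn packages alongside TensorFlow.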

Use conda instead:

conda install tensorflow-gpu==1.15
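
To catch the CPU fallback described above before starting training, one option is to ask TensorFlow which devices it can see. This is a minimal sketch under the assumption that TensorFlow 1.15 is installed; `device_lib.list_local_devices()` is TensorFlow's own helper, while `gpu_device_names` and `list_tensorflow_devices` are hypothetical names introduced here.

```python
def gpu_device_names(devices):
    """Pick the names of GPU devices out of (name, device_type) pairs."""
    return [name for name, device_type in devices if device_type == "GPU"]


def list_tensorflow_devices():
    """Return (name, device_type) pairs for every device TensorFlow can see."""
    # Imported lazily so that gpu_device_names can be used without TensorFlow.
    from tensorflow.python.client import device_lib
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]
```

If `gpu_device_names(list_tensorflow_devices())` returns an empty list, TensorFlow will train on the CPU, and the installation is worth revisiting before running run_squad.py.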

II. Parameters

The parameters are mostly the same as the official ones, and the model is BERT-Base.

Error: Resource exhausted: OOM when allocating tensor with shape[4608,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

The cause is insufficient GPU memory. I used a GeForce RTX 2080 Ti with 11 GB of memory.

Reducing train_batch_size from 12 to 8 resolved it.
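
A smaller batch also means more optimizer steps for the same number of epochs. The sketch below mirrors how run_squad.py derives the step count from the example count, batch size, and epochs; the figure of 87,599 training examples for SQuAD 1.1 is an assumption stated here, not a number taken from this post.

```python
# Assumed number of training questions in train-v1.1.json.
SQUAD_TRAIN_EXAMPLES = 87599


def num_train_steps(num_examples, train_batch_size, num_train_epochs):
    """Total optimizer steps: examples per batch, times epochs, truncated to an int."""
    return int(num_examples / train_batch_size * num_train_epochs)


steps_batch_12 = num_train_steps(SQUAD_TRAIN_EXAMPLES, 12, 2.0)
steps_batch_8 = num_train_steps(SQUAD_TRAIN_EXAMPLES, 8, 2.0)
print(steps_batch_12, steps_batch_8)
```

Under that assumption, going from batch size 12 to 8 raises the step count by roughly half, so each epoch takes more (but lighter) steps.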

Final parameters:

python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v1.1.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v1.1.json \
  --train_batch_size=8 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=squad_base/
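
With --do_predict=True, the official script writes its answers for the dev set to a predictions.json file in the output directory, mapping each question id to the predicted answer text. The following is a minimal sketch for inspecting that file; the function name `load_predictions` is hypothetical, and the directory is the `squad_base/` used in the command above.

```python
import json
import os


def load_predictions(output_dir):
    """Load the question-id -> answer-text mapping written by the prediction step."""
    path = os.path.join(output_dir, "predictions.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def preview(predictions, limit=5):
    """Return the first few (question id, answer) pairs for a quick sanity check."""
    return list(predictions.items())[:limit]
```

For example, `preview(load_predictions("squad_base/"))` shows a handful of answers. For actual scores, the official README runs the SQuAD evaluate-v1.1.py script on dev-v1.1.json together with this predictions file.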