Downloading the fine-tuning script
Fine-tuning scripts: HuggingFace provides fine-tuning scripts for the task types in the GLUE benchmark. At their core, these scripts fine-tune a final fully connected (classification) layer on top of a pre-trained model; command-line arguments select which GLUE task to run and which pre-trained model to fine-tune.
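Structurally, such a sequence-classification model is the pre-trained encoder plus one fully connected layer mapping the pooled sentence vector to the task's label space. A minimal pure-PyTorch sketch of that shape (the class and the use of a single linear layer as the "encoder" are illustrative stand-ins, not the library's internals):

```python
import torch
import torch.nn as nn

class SketchClassifier(nn.Module):
    """Illustrative stand-in: pre-trained encoder + task-specific head."""
    def __init__(self, hidden=768, num_labels=2):
        super().__init__()
        self.encoder = nn.Linear(hidden, hidden)         # stands in for BERT
        self.classifier = nn.Linear(hidden, num_labels)  # the fine-tuned layer

    def forward(self, pooled):
        return self.classifier(torch.tanh(self.encoder(pooled)))

model = SketchClassifier()
logits = model(torch.zeros(1, 768))  # one pooled [CLS] vector
print(logits.shape)  # torch.Size([1, 2]) -> one logit per label
```

For MRPC (a binary paraphrase task) `num_labels` is 2, which is why the final output tensor later in this walkthrough has shape `[1, 2]`.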
Download HuggingFace's transformers repository:
git clone https://github.com/huggingface/transformers.git
The transformers directory contains:
[root@bhs transformers]# ls
CONTRIBUTING.md docs LICENSE model_cards setup.cfg templates utils
deploy_multi_version_doc.sh examples Makefile notebooks setup.py tests valohai.yaml
docker hubconf.py MANIFEST.in README.md src transformers-cli
cd transformers
Install the transformers Python package from the cloned source:
pip install .
Configure the arguments of the fine-tuning script examples/run_glue.py and run it:
export DATA_DIR='/root/data/glue_data'
export SAVE_DIR='./bert_finetuning_test'  # this directory should live under transformers
python3 examples/run_glue.py \
--model_type BERT \
--model_name_or_path bert-base-uncased \
--task_name MRPC \
--do_train \
--do_eval \
--data_dir $DATA_DIR/MRPC/ \
--max_seq_length 128 \
--learning_rate 2e-5 \
--num_train_epochs 1.0 \
--output_dir $SAVE_DIR \
--overwrite_output_dir
Sample output:
04/29/2020 13:05:08 - INFO - __main__ - ***** Eval results mrpc *****
04/29/2020 13:05:08 - INFO - __main__ - acc = 0.8161764705882353
04/29/2020 13:05:08 - INFO - __main__ - f1 = 0.878048780487805
04/29/2020 13:05:08 - INFO - __main__ - acc_and_f1 = 0.8471126255380201
04/29/2020 13:05:08 - INFO - __main__ - loss = 0.46413972374855306
# add --do_predict to run prediction on test.tsv
# to reuse the script with your own dataset, substitute the data and pass the matching task name (e.g. --task_name MyData)
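For reference, the logged acc_and_f1 is simply the unweighted mean of the accuracy and F1 lines above, which you can verify directly:

```python
# Reproduce the combined metric from the two logged values
acc = 0.8161764705882353
f1 = 0.878048780487805

acc_and_f1 = (acc + f1) / 2
print(acc_and_f1)  # matches the logged 0.8471126255380201
```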
Using the fine-tuned model
Log in to your HuggingFace account:
[root@bhs transformers]# transformers-cli login
_| _| _| _| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _|_|_|_| _|_| _|_|_| _|_|_|_|
_| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_|_|_|_| _| _| _| _|_| _| _|_| _| _| _| _| _| _|_| _|_|_| _|_|_|_| _| _|_|_|
_| _| _| _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_| _| _|_| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _| _| _| _|_|_| _|_|_|_|
Username: baihaisheng
Password:
Login successful
Your token: NtENPTXMQTdXMIgwJWGPkWZJwzcXfezxVHvLrzWSqCwwJnLQQOhOTERgxqbWjbKluQaJRJNcKDtNbSYvirinJYfigKrVqhyAEDwMPHylwLPzrOuInATTVzbfxMXayXBk
Your token has been saved to /root/.huggingface/token
Upload the fine-tuned model:
transformers-cli upload ./bert_finetuning_test/
100%|██████████|
Your file now lives at: https://s3.amazonaws.com/models.huggingface.co/bert/baihaisheng/bert_finetuning_test/pytorch_model.bin
Your file now lives at: https://s3.amazonaws.com/models.huggingface.co/bert/baihaisheng/bert_finetuning_test/special_tokens_map.json
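Once uploaded, the model is addressed remotely as <username>/<local directory name>; that combined identifier (here baihaisheng/bert_finetuning_test) is what the loading code in the next step passes to torch.hub. A quick sketch of how it is formed:

```python
import os

username = "baihaisheng"
save_dir = "./bert_finetuning_test/"

# remote model id = <username>/<basename of the uploaded directory>
model_name = f"{username}/{os.path.basename(save_dir.rstrip('/'))}"
print(model_name)  # baihaisheng/bert_finetuning_test
```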
Loading the model through torch.hub
import torch

source = 'huggingface/pytorch-transformers'
model_name = 'baihaisheng/bert_finetuning_test'

# Load the tokenizer that was uploaded with the fine-tuned model
tokenizer = torch.hub.load(source, 'tokenizer', model_name)

# Encode a sentence pair; the [CLS]/[SEP] special tokens are added automatically
text_index = tokenizer.encode("who are you", "I am baihaisheng")

# 102 is the id of [SEP] in BERT's uncased vocabulary: tokens up to and
# including the first [SEP] belong to segment 0, the rest to segment 1
sep_id = 102
index = text_index.index(sep_id)
segments_ind = [0] * (index + 1) + [1] * (len(text_index) - index - 1)

token_text = torch.tensor([text_index])
segment_tensor = torch.tensor([segments_ind])

# Load the fine-tuned sequence-classification model and run inference
model = torch.hub.load(source, 'modelForSequenceClassification', model_name)
model.eval()
with torch.no_grad():
    model_output = model(token_text, token_type_ids=segment_tensor)
print(model_output[0])
Output:
tensor([[ 0.1700, -0.1157]])
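The segment-id construction above is plain list arithmetic and can be checked without loading any model. A toy example (101 and 102 are BERT's real [CLS]/[SEP] ids; the other ids are placeholders for word-piece ids):

```python
# [CLS] w w w [SEP] w w [SEP]
text_index = [101, 11, 12, 13, 102, 21, 22, 102]

sep_id = 102
index = text_index.index(sep_id)  # first [SEP] at position 4
segments_ind = [0] * (index + 1) + [1] * (len(text_index) - index - 1)
print(segments_ind)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

Sentence A (including its trailing [SEP]) gets segment 0, sentence B gets segment 1, which is exactly what BERT's token_type_ids expect.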