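This post records how I got the BERT model running, from setting up the environment to running the project. I wrote it after the code was already working, so it has some hindsight, and it also covers the problems I ran into and how I solved them.
The machine is a CPU-only Windows PC, and the IDE is PyCharm.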
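Source files and resources
BERT code
For the BERT model I used the official repository, which you can find by searching for bert on GitHub. You can also copy this URL directly: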
https://github.com/google-research/bert
Datasets
The README in that repository gives the commands to run. Running them requires two downloads: the GLUE dataset and the uncased_L-12_H-768_A-12 pretrained model.
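Before running anything, it is worth checking that the pretrained model unzipped correctly. The following is a minimal sketch, not part of the BERT repository: the function name `missing_model_files` and the example folder path are hypothetical, and the file list is my assumption of what the uncased_L-12_H-768_A-12 archive contains.

```python
import os

# Assumption: file names expected inside the unzipped uncased_L-12_H-768_A-12 folder.
EXPECTED_FILES = [
    "bert_config.json",
    "vocab.txt",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "bert_model.ckpt.data-00000-of-00001",
]


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    # Hypothetical location; point this at wherever the archive was unzipped.
    model_dir = "uncased_L-12_H-768_A-12"
    missing = missing_model_files(model_dir)
    if missing:
        print("Missing files:", missing)
    else:
        print("All expected model files were found.")
```

If the list comes back non-empty, the usual cause is an interrupted download or unzipping into a nested folder.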
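GLUE dataset
Clicking the "GLUE data" link did not let me download anything, so I clicked "this script" instead and downloaded the data with code.
Run the following .py file in the project; it comes from "this script". The code did not run as downloaded, so I added two lines. They are shown first, followed by the full file: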
import io
URLLIB = urllib.request
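Why these two lines help, as a minimal sketch: my assumption is that parts of the script not shown in this excerpt refer to the names `URLLIB` and `io`, so without them Python raises a NameError. The helper `fetch` below is hypothetical and only illustrates how the alias is used.

```python
import io
import urllib.request

# Module-level alias: any later call written as URLLIB.urlretrieve(...)
# now resolves to urllib.request.urlretrieve.
URLLIB = urllib.request


def fetch(url, filename):
    """Hypothetical helper: download url to filename through the alias."""
    return URLLIB.urlretrieve(url, filename)


# With `import io` in place, io.open(path, encoding="utf-8") is also available;
# in Python 3 it is the same function as the built-in open.
```

Note that the alias has to come after `import urllib.request`, which is how the full file below orders it.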
import os
import sys
import shutil
import argparse
import tempfile
import urllib.request
import zipfile
import io
URLLIB = urllib.request
TASKS = ["CoLA", "SST", "MRPC", "QQP", "STS", "MNLI", "QNLI", "RTE", "WNLI", "diagnostic"]
TASK2PATH = {
    "CoLA": 'https://dl.fbaipublicfiles.com/glue/data/CoLA.zip',
    "SST": 'https://dl.fbaipublicfiles.com/glue/data/SST-2.zip',
    "QQP": 'https://dl.fbaipublicfiles.com/glue/data/QQP-clean.zip',
    "STS": 'https://dl.fbaipublicfiles.com/glue/data/STS-B.zip',
    "MNLI": 'https://dl.fbaipublicfiles.com/glue/data/MNLI.zip',
    "QNLI": 'https://dl.fbaipublicfiles.com/glue/data/QNLIv2.zip',
    "RTE": 'https://dl.fbaipublicfiles.com/glue/data/RTE.zip',
    "WNLI": 'https://dl.fbaipublicfiles.com/glue/data/WNLI.zip',
    "diagnostic": 'https://dl.fbaipublicfiles.com/glue/data/AX.tsv',
}
MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'
def download_and_extract(task, data_dir):
    print("Downloading and extracting %s..." % task)
    if task == "MNLI":
        print("\tNote (12/10/20): This script no longer downloads SNLI. You will need to manually download and format the data to use SNLI.")
    data_file = "%s.zip" % task
    urllib.request.urlretrieve(TASK2PATH[task], data_file)
    with zipfile.ZipFile(data_file) as zip_ref:
        zip_ref.extractall(data_dir)
    os.remove(data_file)
    print(