Caching models with the transformers package
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir="D:/xx/transformermodel")  # the model files are downloaded into this folder
model = TFAutoModel.from_pretrained("bert-base-uncased", cache_dir="D:/xx/transformermodel")

inputs = tokenizer("Hello world!", return_tensors="tf")
outputs = model(**inputs)
Renaming the cached files lets you use the model without an internet connection.
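Besides passing cache_dir on every call, the download location can also be set once through the TRANSFORMERS_CACHE environment variable, which the library reads at import time. A minimal sketch; the path is a placeholder:

```python
import os

# Point the transformers cache at a custom folder (placeholder path).
# This must be set BEFORE `import transformers`, because the library
# reads the variable when it is imported.
os.environ["TRANSFORMERS_CACHE"] = "D:/xx/transformermodel"

# from transformers import AutoTokenizer  # imported after setting the variable
# tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # cached under the folder above
```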
Downloading a model
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased", cache_dir="./transformermodel/BertTokenizer")

sequence = "A Titan RTX has 24GB of VRAM"
tokenized_sequence = tokenizer.tokenize(sequence)  # tokenize into subwords
print(tokenized_sequence)

# Encode: the returned dict contains input_ids, token_type_ids and attention_mask
inputs = tokenizer(sequence)
encoded_sequence = inputs["input_ids"]
print(encoded_sequence)

# Decode back to a string
decoded_sequence = tokenizer.decode(encoded_sequence)
print(decoded_sequence)
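The tokenize → encode → decode round trip above can be illustrated with a toy vocabulary; the tokens and ids below are made up for illustration and are not BERT's real WordPiece vocabulary:

```python
# Toy vocabulary mapping tokens to ids (hypothetical, for illustration only)
vocab = {"[CLS]": 101, "[SEP]": 102, "a": 1037, "titan": 2319, "rtx": 2428}
inv_vocab = {i: t for t, i in vocab.items()}

def encode(tokens):
    # Real tokenizers add special tokens automatically; here we add
    # [CLS]/[SEP] by hand to mimic that behavior.
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens] + [vocab["[SEP]"]]

def decode(ids):
    return " ".join(inv_vocab[i] for i in ids)

ids = encode(["a", "titan", "rtx"])
print(ids)          # [101, 1037, 2319, 2428, 102]
print(decode(ids))  # [CLS] a titan rtx [SEP]
```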
Loading from the cache
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("./transformermodel/BertTokenizer")

sequence = "A Titan RTX has 24GB of VRAM"
tokenized_sequence = tokenizer.tokenize(sequence)  # tokenize into subwords
print(tokenized_sequence)

# Encode: the returned dict contains input_ids, token_type_ids and attention_mask
inputs = tokenizer(sequence)
encoded_sequence = inputs["input_ids"]
print(encoded_sequence)

# Decode back to a string
decoded_sequence = tokenizer.decode(encoded_sequence)
print(decoded_sequence)
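When several sequences of different lengths are batched, the tokenizer pads input_ids to a common length and marks real versus padded positions in attention_mask. A pure-Python sketch of that logic (pad id 0 mirrors BERT's [PAD] token; the id values are illustrative):

```python
def pad_batch(batch, pad_id=0):
    """Pad id sequences to equal length and build matching attention masks."""
    max_len = max(len(seq) for seq in batch)
    # 1 marks a real token, 0 marks padding the model should ignore.
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask

ids, mask = pad_batch([[101, 2003, 102], [101, 2003, 2054, 2154, 102]])
print(ids)   # [[101, 2003, 102, 0, 0], [101, 2003, 2054, 2154, 102]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```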
Fine-tuning models
The run_xx.py scripts under the examples directory are the fine-tuning scripts.
Sequence classification
Fine-tuning scripts: run_glue.py, run_tf_glue.py, run_tf_text_classification.py, or run_xnli.py.
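A typical invocation of run_glue.py might look like the following; the task name, hyperparameters, and output path are placeholder choices, not the only valid ones:

```shell
python run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name mrpc \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./output/mrpc
```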