Downloading Transformers Models Without Garbled Filenames (Hugging Face models)
Overview
Purpose: to inspect and move pretrained model files without garbled (hash-named) cache entries, and without re-downloading models all the time. Four approaches:
- a. (can avoid garbled names) snapshot_download from huggingface_hub (recommended);
- b. (no garbled names) manual download with wget;
- c. git lfs;
- d. reuse files already downloaded locally.
1. snapshot_download from huggingface_hub (can avoid garbled names)
Set local_dir_use_symlinks=False and the downloaded files keep their real names instead of becoming hash-named cache blobs:
from huggingface_hub import snapshot_download

# repo_id = "ziqingyang/chinese-alpaca-lora-7b"
repo_id = "nghuyong/ernie-3.0-micro-zh"
local_dir = repo_id.replace("/", "_")
cache_dir = local_dir + "/cache"
snapshot_download(cache_dir=cache_dir,
                  local_dir=local_dir,
                  repo_id=repo_id,
                  # False saves real files under their own names instead of
                  # symlinks into the hash-named cache; the default "auto"
                  # duplicates small files (<5MB) in local_dir and symlinks
                  # bigger ones.
                  local_dir_use_symlinks=False,
                  resume_download=True,
                  allow_patterns=["*.model", "*.json", "*.bin",
                                  "*.py", "*.md", "*.txt"],
                  ignore_patterns=["*.safetensors", "*.msgpack",
                                   "*.h5", "*.ot"],
                  )
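After the snapshot finishes, the files in local_dir can be loaded straight from disk, with no further hub access. A minimal sketch (AutoTokenizer/AutoModel are the generic transformers loaders; the local_dir value matches the code above):

from transformers import AutoModel, AutoTokenizer

# Load from the plain-named local copy produced by snapshot_download
local_dir = "nghuyong_ernie-3.0-micro-zh"
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModel.from_pretrained(local_dir)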
2. Manual download with wget (no garbled names)
The drawback: large-model weights are huge these days and usually split into many shard files, so downloading is also somewhat slow.
The model page lives at: https://huggingface.co/{{repo_id}}
Each file downloads from: https://huggingface.co/{{repo_id}}/resolve/main/{{filename}}
Taking repo_id == "THUDM/chatglm-6b" as an example:
Model page: https://huggingface.co/THUDM/chatglm-6b
On Linux, for example, you can use wget directly (a scripted alternative follows these commands):
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/README.md
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/config.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/configuration_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/tokenizer_config.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/ice_text.model
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/quantization.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/tokenization_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/modeling_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00001-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00002-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00003-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00004-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00005-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00006-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00007-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00008-of-00008.bin
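Typing out every shard by hand gets tedious; the resolve/main URL pattern above can also be scripted. A minimal Python sketch using only the standard library (the shard names mirror the wget list above; add the config/tokenizer files as needed):

import urllib.request

repo_id = "THUDM/chatglm-6b"
base = "https://huggingface.co/" + repo_id + "/resolve/main/"
# Shards follow the pytorch_model-XXXXX-of-00008.bin pattern above
files = ["pytorch_model.bin.index.json"] + [
    "pytorch_model-%05d-of-00008.bin" % i for i in range(1, 9)
]
for name in files:
    print("downloading", name)
    urllib.request.urlretrieve(base + name, name)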
3. Using git lfs
Set up git lfs: git lfs install
Download the model: git clone https://huggingface.co/THUDM/chatglm-6b
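A plain clone downloads every LFS-tracked weight at once. If you only want part of the repo, git lfs can skip the initial download and then pull files selectively; a sketch (GIT_LFS_SKIP_SMUDGE and git lfs pull --include are standard git-lfs features; the shard path is illustrative):

# Clone only pointer files, skipping the big LFS downloads
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b
cd chatglm-6b
# Fetch just the files you actually need
git lfs pull --include="pytorch_model-00001-of-00008.bin"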
4. Reusing already-downloaded files
Models already downloaded locally can be reused, and the model directory can also be moved elsewhere.
Default Windows location: C:\Users\{{username}}\.cache\huggingface\hub
Default Linux location: ~/.cache/huggingface/hub (i.e. /home/{{username}}/.cache/huggingface/hub)
from transformers import BertTokenizer, BertModel

repo_id = "nghuyong/ernie-3.0-micro-zh"
# Point cache_dir at the existing cache (or wherever you moved it)
cache_dir = "{{actual cache path}}"
tokenizer = BertTokenizer.from_pretrained(repo_id, cache_dir=cache_dir)
model = BertModel.from_pretrained(repo_id, cache_dir=cache_dir)
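Instead of passing cache_dir on every call, the whole cache location can be redirected with an environment variable; a sketch using HF_HOME (a standard Hugging Face variable; it must be set before transformers is imported, and {{new location}} is a placeholder path):

import os

# Must be set before importing transformers / huggingface_hub;
# the hub cache then lives under {{new location}}/hub
os.environ["HF_HOME"] = "{{new location}}"

from transformers import BertModel
model = BertModel.from_pretrained("nghuyong/ernie-3.0-micro-zh")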
References
- https://github.com/huggingface/transformers
- https://github.com/huggingface/huggingface_hub
- https://huggingface.co/docs/huggingface_hub/v0.13.3/guides/download
- 如何优雅的下载huggingface-transformers模型: https://zhuanlan.zhihu.com/p/475260268
- https://git-lfs.com/
Hope this helps!