How to quickly and reliably download and load models from Hugging Face

xiaomu_347

Last modified 2024-07-11 15:39:26

Tags: llm, artificial intelligence

First published 2024-06-03 10:10:55

Article link: https://blog.csdn.net/xiaomu_347/article/details/131509971

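Anyone familiar with transformers will know the site http://huggingface.co. But if you try to download pretrained model weights from it, especially large ones, you will often find that the download is interrupted or that the site cannot be opened at all. The approaches commonly suggested online are the following: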

I. Downloading directly from Hugging Face

Method 1: use huggingface-cli, the official command-line tool provided by Hugging Face.

(1) Install the dependency

pip install -U huggingface_hub

(2) Basic command example:

export HF_ENDPOINT=https://hf-mirror.com

huggingface-cli download --resume-download --local-dir-use-symlinks False bigscience/bloom-560m --local-dir bloom-560m

(3) Downloading models that require login (gated models)
Add the --token hf_*** parameter, where hf_*** is your access token, which you can obtain on the Hugging Face website. Example:

huggingface-cli download --token hf_*** --resume-download --local-dir-use-symlinks False meta-llama/Llama-2-7b-hf --local-dir Llama-2-7b-hf

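Method 2: when downloading directly by URL, replace huggingface.co in the URL with the mirror domain hf-mirror.com. A browser or command-line tools such as wget -c, curl -L, or aria2c will then work.
For models that require login, pass the token as an HTTP header on the command line, for example --header "Authorization: Bearer hf_***" with wget. See above for how to obtain the token.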
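
Method 2 boils down to a host swap plus, for gated models, an Authorization header. Here is a minimal sketch in Python using only the standard library. The helper names to_mirror_url and auth_headers are made up for illustration, and the resolve/main URL is assumed to be the usual direct-download form on huggingface.co:

```python
from urllib.parse import urlsplit, urlunsplit

MIRROR_HOST = "hf-mirror.com"  # mirror domain mentioned in the text


def to_mirror_url(url):
    """Swap the huggingface.co host for the mirror host, keeping the rest of the URL."""
    parts = urlsplit(url)
    if parts.netloc != "huggingface.co":
        return url  # not a Hugging Face URL, leave it untouched
    return urlunsplit(parts._replace(netloc=MIRROR_HOST))


def auth_headers(token=None):
    """Build the HTTP header that carries an access token (only needed for gated models)."""
    return {"Authorization": f"Bearer {token}"} if token else {}


# Example URL (assumed layout: /<repo>/resolve/<revision>/<filename>)
url = "https://huggingface.co/bigscience/bloom-560m/resolve/main/config.json"
print(to_mirror_url(url))
```

The rewritten URL can then be handed to wget, curl, or aria2c, together with the header when a token is required.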
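Method 3 (non-invasive, covers most cases): the packages provided by Hugging Face read environment variables, so the problem can be solved by setting a variable.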

HF_ENDPOINT=https://hf-mirror.com python your_script.py

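However, some datasets ship with their own download scripts, in which case you have to edit the address inside the script by hand.

Method 4: modify the script

Add the following code at the top of the script, before the Hugging Face libraries are imported and the model is downloaded: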

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
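
The same idea works when the download itself is done from Python rather than from the command line. The sketch below assumes huggingface_hub is installed and uses its snapshot_download function. The wrapper name download_via_mirror is hypothetical. The import sits inside the function on purpose, so that HF_ENDPOINT is already set when the library is first loaded. This only helps if huggingface_hub has not been imported earlier in the same process.

```python
import os

# Point the Hugging Face libraries at the mirror before they are imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"


def download_via_mirror(repo_id, token=None):
    """Download every file of a model repo and return the local snapshot path."""
    # Imported here so the endpoint above is in place when the library loads.
    from huggingface_hub import snapshot_download

    # token is only needed for gated models; None means anonymous access.
    return snapshot_download(repo_id=repo_id, token=token)


# Usage (needs network access, so it is left commented out):
# path = download_via_mirror("bigscience/bloom-560m")
# print(path)
```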

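Unless a --local-dir is given, models downloaded with the methods above are stored under ~/.cache/huggingface/hub by default. If none of the methods works, you can download the files manually and place them under that path.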
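
To check what is already in the cache, or to find the folder to pass to from_pretrained, the layout can be resolved in a few lines. This sketch assumes the models--<org>--<name>/snapshots/<revision> layout that is visible in the cached Qwen path used later in this article. The function name find_snapshots is made up for illustration.

```python
from pathlib import Path

DEFAULT_HUB_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def find_snapshots(repo_id, cache_root=DEFAULT_HUB_CACHE):
    """Return the snapshot directories cached for repo_id (empty list if none)."""
    # "Qwen/Qwen2-1.5B-Instruct" is stored as "models--Qwen--Qwen2-1.5B-Instruct"
    repo_dir = Path(cache_root) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


# Print whatever snapshots exist locally (prints nothing if the model is not cached)
for path in find_snapshots("Qwen/Qwen2-1.5B-Instruct"):
    print(path)
```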

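II. Downloading from ModelScope

If the first approach still does not work, you can switch to the ModelScope platform. For models you are interested in and want to download locally, ModelScope provides several download methods.

》Downloading models with the Library

If the model has been integrated into the ModelScope Library, a few lines of code are enough to load it. You can click the "Quick Start" (快速使用) button on the model card to see how to download the model with the Library. The prerequisite is that the ModelScope Library is installed. Given the model id and the desired model version (master by default), a single line of code takes care of locating, downloading, and loading the model: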

from modelscope.models import Model
model = Model.from_pretrained('damo/nlp_xlmr_named-entity-recognition_viet-ecommerce-title', revision='v1.0.1')
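# revision is optional; if omitted, the model's default version is used, which is the last version published before the release of the ModelScope library
# How to get the release time: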
# import modelscope
# print(modelscope.version.__release_datetime__)
#model = Model.from_pretrained('damo/nlp_structbert_word-segmentation_chinese-base')

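》Downloading models with the Library Hub

With the modelscope modelhub you can create, delete, update, and retrieve information from repos. You can also download files from repos or integrate them into your own library, and you can specify where the model is downloaded to. Note that using this approach against huggingface itself requires a proxy.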

from modelscope.hub.snapshot_download import snapshot_download

model_dir = snapshot_download('damo/nlp_xlmr_named-entity-recognition_viet-ecommerce-title', cache_dir='path/to/local/dir', revision='v1.0.1')

You can also use the modelscope modelhub to download a single file from a repo.

from modelscope.hub.file_download import model_file_download

# model_file_download returns the local path of the downloaded file
model_file = model_file_download(model_id='AI-ModelScope/rwkv-4-world', file_path='RWKV-4-World-CHNtuned-7B-v1-20230709-ctx4096.pth', revision='v1.0.0')

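By default, models are downloaded to ~/.cache/modelscope/hub. To change the download directory, set the MODELSCOPE_CACHE environment variable; modelscope will then download models and datasets into the directory it specifies.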
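
The variable can also be set from inside a script. Here is a minimal sketch, assuming the modelscope package is installed. The directory /data/modelscope_cache and the wrapper name download_from_modelscope are placeholders, not part of ModelScope. Setting the variable before modelscope is imported is the cautious choice, since it does not depend on when the library reads it.

```python
import os

# Hypothetical target directory; replace it with a location that has enough space.
os.environ["MODELSCOPE_CACHE"] = "/data/modelscope_cache"


def download_from_modelscope(model_id):
    """Download a ModelScope model and return the local directory it was saved to."""
    # Imported after the variable is set so the library sees it from the start.
    from modelscope.hub.snapshot_download import snapshot_download

    return snapshot_download(model_id)


# Usage (needs network access, so it is left commented out):
# print(download_from_modelscope("damo/nlp_structbert_word-segmentation_chinese-base"))
```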

》Downloading models with Git

# Download a public model
git lfs install
git clone https://www.modelscope.cn/<namespace>/<model-name>.git
# For example: git clone https://www.modelscope.cn/damo/ofa_image-caption_coco_large_en.git

# Download a private model (requires that you have access to it), option 1
git lfs install
git clone http://oauth2:your_git_token@www.modelscope.cn/<namespace>/<model-name>.git
# Option 2
git clone http://your_user_name@www.modelscope.cn/<namespace>/<model-name>.git
# Password for 'http://your_user_name@modelscope.cn':
# enter your git token

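III. Offline loading

Models that were downloaded for offline use can be loaded with the transformers library and combined with langchain for inference and further development. Taking Qwen/Qwen2-1.5B-Instruct as an example: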

from langchain_community.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Load the model and tokenizer from the local snapshot directory
model_id = "/home/amax/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")


### Case 1: wrap a transformers pipeline and run it through LLMChain
# Build the transformers pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100)
# Wrap it as a LangChain LLM; it is named llm so that model keeps pointing at the transformers model
llm = HuggingFacePipeline(pipeline=pipe)

# Create the prompt
template = """Question: {question}
Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
# Run the model
question = "What is electroencephalography?"
response = llm_chain.run(question)
print(response)


### Case 2: let HuggingFacePipeline load the model from the local path itself
llm = HuggingFacePipeline.from_model_id(
    model_id=model_id,
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 100},
    device=-1,  # -1 runs on the CPU
)

# Print the result
print(llm.invoke('What is electroencephalography?'))

### Case 3: compose prompt and model with LCEL
# Increase max_new_tokens to generate longer text
max_new_tokens = 200  # adjust as needed
# Build the pipeline from the transformers model and tokenizer loaded above
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=max_new_tokens)
hf = HuggingFacePipeline(pipeline=pipe)
# Build the prompt template
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
# LCEL
chain = prompt | hf
# Ask the question
question = "What is electroencephalography?"
# Print the output
print(chain.invoke({"question": question}))
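
When the weights are already on disk, it can help to make sure nothing tries to reach the network. Here is a minimal sketch, assuming transformers is installed. HF_HUB_OFFLINE and local_files_only are existing switches of the Hugging Face libraries, while the helper names is_model_dir and load_offline are hypothetical:

```python
import os
from pathlib import Path

# Tell the Hugging Face libraries not to contact the hub at all.
os.environ["HF_HUB_OFFLINE"] = "1"


def is_model_dir(path):
    """Cheap sanity check: a usable model folder contains a config.json."""
    return (Path(path) / "config.json").is_file()


def load_offline(path):
    """Load tokenizer and model strictly from a local folder."""
    if not is_model_dir(path):
        raise FileNotFoundError(f"no config.json found in {path}")
    # Imported lazily so the offline switch is set before the library loads.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(path, local_files_only=True)
    return tokenizer, model
```

The check on config.json catches the common mistake of pointing at the models--... folder instead of the snapshot folder inside it.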

As a supplement, a model deployed locally with ollama can be loaded as follows:

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.llms import Ollama

prompt_template = "What is a good name for a company that makes {product}?"

# The model name must match a model already pulled into the local ollama instance
ollama_llm = Ollama(model="llama3:8b")
llm_chain = LLMChain(
    llm=ollama_llm,
    prompt=PromptTemplate.from_template(prompt_template),
)
# Fill the {product} placeholder and print the generated text
print(llm_chain.invoke({"product": "colorful socks"})["text"])
