How to Gracefully Download a Hugging Face Model's "Files and versions" to Local Disk (Python requests, tqdm)

CSU迦叶

Article link: https://blog.csdn.net/weixin_44997802/article/details/132590495

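Preface

When experimenting with large models on Hugging Face, loading a model remotely can be very slow, or fail outright, because of network problems. Downloading the model, tokenizer, and related files to local disk and then loading them from there becomes an unavoidable step.

On the page for a specific model version, the Files and versions tab lists all the files that need to be downloaded (WizardCoder is the example here).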

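The first step is to get the download URL of each file we want.

To the right of each file size there is a downward arrow that stands for download. Hover over the arrow, right-click, and choose "Copy link address" to get the download URL.

We store these URLs in a list.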
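When a repository has many files, copying each link by hand gets tedious. The sketch below builds the list from file names instead, assuming every link follows the same `/resolve/<revision>/<filename>` pattern as the copied URL used in this article. The helper name `build_resolve_url` and the second file name `config.json` are hypothetical, for illustration only; the real names should be read from the Files and versions page.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"


def build_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a download URL that mirrors the pattern of a copied link (hypothetical helper)."""
    # quote() percent-encodes unsafe characters; "/" is kept in repo_id and
    # filename so that "owner/name" and nested paths stay intact, while a
    # revision containing "/" is encoded so it remains a single path segment.
    return (
        f"{HF_BASE}/{quote(repo_id)}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


# File names are assumptions here; copy the real ones from the repository page.
filenames = ["pytorch_model.bin", "config.json"]
urls = [build_resolve_url("WizardLM/WizardCoder-15B-V1.0", name) for name in filenames]
```

The first entry of `urls` is then the same address as the one copied by right-clicking the download arrow.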

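The second step is the Python code. Besides the requests library used to send the request, I also use tqdm, which can likewise be installed with pip install. It shows the download speed and progress in the terminal while a large file is downloading.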

import requests
import os
from tqdm import tqdm

urls = [
    "https://huggingface.co/WizardLM/WizardCoder-15B-V1.0/resolve/main/pytorch_model.bin"
]

filepath = "WizardCoder/WizardCoder-15B-V1.0"


def download_file(url):
    filename = url.split("/")[-1]
    os.makedirs(filepath, exist_ok=True)  # create the target directory if it does not exist yet
    download_path = os.path.join(filepath, filename)

    # stream=True avoids loading the whole file into memory; certificate verification stays enabled
    response = requests.get(url, stream=True)
    response.raise_for_status()

    file_size = int(response.headers.get("Content-Length", 0))  # size of the file in bytes (0 if the header is missing)
    chunk_size = 8192  # read the response in 8 KiB chunks

    with open(download_path, "wb") as file, tqdm(
        total=file_size, unit="B", unit_scale=True, unit_divisor=1024, desc=filename
    ) as progress_bar:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                file.write(chunk)
                progress_bar.update(len(chunk))  # advance by bytes written, since total is in bytes


for url in urls:
    download_file(url)
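
Since network failures are the reason for downloading locally in the first place, a multi-gigabyte download that breaks halfway is worth resuming rather than restarting. The following is a minimal sketch of one way to do that with an HTTP `Range` header. It is not part of the original script: the names `range_header` and `resume_download` are hypothetical, and it assumes the server answers range requests with status 206, falling back to a full download when it does not.

```python
import os

import requests
from tqdm import tqdm


def range_header(existing_bytes: int) -> dict:
    """Return a Range header asking for everything after existing_bytes (hypothetical helper)."""
    return {"Range": f"bytes={existing_bytes}-"} if existing_bytes > 0 else {}


def resume_download(url: str, download_path: str, chunk_size: int = 8192) -> None:
    """Continue a partial download if the server allows it, otherwise start over."""
    # Bytes already on disk from an earlier, interrupted attempt (0 if none)
    existing = os.path.getsize(download_path) if os.path.exists(download_path) else 0

    response = requests.get(
        url, stream=True, headers=range_header(existing), timeout=30
    )
    if response.status_code == 416:
        # The requested range starts at or past the end of the file,
        # so the local copy is presumably already complete.
        response.close()
        return
    response.raise_for_status()

    # 206 means the server honoured the Range header; any other success
    # status means it is sending the whole file again from byte 0.
    resumed = response.status_code == 206
    if not resumed:
        existing = 0
    remaining = int(response.headers.get("Content-Length", 0))
    total = existing + remaining if remaining else None  # None: size unknown

    with open(download_path, "ab" if resumed else "wb") as file, tqdm(
        total=total, initial=existing,
        unit="B", unit_scale=True, unit_divisor=1024,
        desc=os.path.basename(download_path),
    ) as progress_bar:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                file.write(chunk)
                progress_bar.update(len(chunk))
```

Calling `resume_download(url, download_path)` again after an interruption appends to the partial file, and the progress bar starts from the bytes already on disk.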