[Huggingface.co] Fixing an unreachable huggingface.co and errors when downloading/running large models: We couldn't connect to 'https://huggingface.co' to...

Itfuture03

Last modified 2024-03-12 17:11:34

First published 2024-01-05 10:53:16

Article link: https://blog.csdn.net/weixin_43431218/article/details/135403324

I. Problem

While training a model, I ran into the following errors:

1. Data download failure when using Datasets.
Error:
Couldn't reach https://huggingface.co/datasets/codeparrot/self-instruct-starcoder/resolve/fdfa8ceb317670e982aa246d8e799c52338a74a7/data/curated-00000-of-00001-c12cc48b3c68688f.parquet (ConnectionError(ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))))

2. Error when using Transformers
Error: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like baichuan-inc/Baichuan2-7B-Base is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

II. Cause

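The machine cannot connect to https://huggingface.co, and the error message points us to offline mode: https://huggingface.co/docs/transformers/installation#offline-mode

It is the familiar problem: connections from mainland China to Hugging Face are intermittent, and at the time of writing huggingface.co cannot be reached at all.

Yet transformers depends heavily on Hugging Face, and downloading models and datasets from it is unavoidable.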

III. Solutions

Option 1: Use a proxy
Not covered here.

Option 2: Use a mirror site
1. https://hf-mirror.com/

Method 0: Open the mirror site, find the model there, and download it from the web page.

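Method 1: Use huggingface-cli, the official Hugging Face command-line tool.
huggingface-cli is the command-line tool provided by Hugging Face, and it has full-featured download support built in.
(1) Install the dependency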

pip install -U huggingface_hub

(2) Basic usage
Set the environment variable:

# Linux
export HF_ENDPOINT=https://hf-mirror.com

# Windows Powershell
$env:HF_ENDPOINT = "https://hf-mirror.com"

It is recommended to add the Linux export line above to ~/.bashrc so that it applies to every new shell.

(3) Download a model:

huggingface-cli download --resume-download gpt2 --local-dir gpt2

(4) Download a dataset:

huggingface-cli download --repo-type dataset --resume-download wikitext --local-dir wikitext

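The flag --local-dir-use-symlinks False is optional. By default the Hugging Face tooling stores downloaded files via symbolic links, so the directory given by --local-dir contains only "link files" while the real model files live under ~/.cache/huggingface. If you don't want this, pass --local-dir-use-symlinks False to turn it off.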
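To check this symlink layout on your own machine, the following minimal sketch walks a download directory and reports which entries are symbolic links and where they actually point. The function name find_symlinked_files is hypothetical (it is not part of huggingface_hub) and uses only the Python standard library. Newer huggingface_hub releases may no longer create symlinks inside --local-dir, in which case the result is simply empty.

```python
from pathlib import Path


def find_symlinked_files(local_dir):
    """Return {relative_path: resolved_target} for every symlink under local_dir.

    Hypothetical helper: shows whether a --local-dir download holds real files
    or links pointing into the shared Hugging Face cache.
    """
    root = Path(local_dir)
    links = {}
    for path in root.rglob("*"):
        if path.is_symlink():
            # resolve() follows the link to the file it really points at
            links[str(path.relative_to(root))] = str(path.resolve())
    return links
```

For example, running find_symlinked_files("gpt2") after the download in step (3) lists each linked file together with its target; real files such as a plain README are left out.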

  Note: to download a model that requires login (a gated model), also add
  --token hf_***, where hf_*** is your access token, which you can create at https://huggingface.co/settings/tokens. Example:
  ```
  huggingface-cli download --token hf_*** --resume-download --local-dir-use-symlinks False meta-llama/Llama-2-7b-hf --local-dir Llama-2-7b-hf
  ```
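The flags from steps (3) and (4) and from the gated-model example can also be combined from a script. Below is a minimal sketch, assuming you want to drive huggingface-cli from Python: build_download_cmd is a hypothetical helper that only assembles the argv list from the flags shown above, and run_download passes HF_ENDPOINT to the child process via subprocess. In recent huggingface_hub releases --resume-download and --local-dir-use-symlinks are deprecated and may only print a warning, so treat them as optional.

```python
import os
import shlex
import subprocess


def build_download_cmd(repo_id, local_dir, repo_type="model",
                       token=None, use_symlinks=True):
    """Assemble a `huggingface-cli download` call as an argv list (hypothetical helper)."""
    cmd = ["huggingface-cli", "download"]
    if repo_type != "model":
        # Datasets must be requested explicitly with --repo-type.
        cmd += ["--repo-type", repo_type]
    if token:
        # Gated models need an access token of the form hf_***.
        cmd += ["--token", token]
    cmd.append("--resume-download")
    if not use_symlinks:
        # Store real files in local_dir instead of links into ~/.cache/huggingface.
        cmd += ["--local-dir-use-symlinks", "False"]
    cmd += [repo_id, "--local-dir", local_dir]
    return cmd


def run_download(cmd, endpoint="https://hf-mirror.com"):
    """Run the command with HF_ENDPOINT pointing at the mirror for this process only."""
    env = dict(os.environ, HF_ENDPOINT=endpoint)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, env=env, check=True)
```

For instance, run_download(build_download_cmd("wikitext", "wikitext", repo_type="dataset")) reproduces step (4) without touching your shell configuration.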

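Method 2: When downloading by URL, replace huggingface.co with the mirror domain hf-mirror.com. A browser works, as do command-line tools such as wget -c, curl -L, or aria2c.
For gated models, pass the access token as an HTTP header on the command line, e.g. --header "Authorization: Bearer hf_***" (see the token instructions above).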
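Because the URL substitution is mechanical, it is easy to script. A minimal sketch using only the standard library: to_mirror swaps the host while keeping the path and query, and mirror_request builds a urllib Request that carries the token as an Authorization: Bearer header for gated files. Both function names are hypothetical, and the request is only prepared, not sent, so you can inspect it first.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def to_mirror(url, mirror_host="hf-mirror.com"):
    """Replace the huggingface.co host with the mirror host; path and query are kept."""
    parts = urlsplit(url)
    if parts.hostname != "huggingface.co":
        raise ValueError(f"not a huggingface.co URL: {url}")
    return urlunsplit(parts._replace(netloc=mirror_host))


def mirror_request(url, token=None):
    """Prepare (but do not send) a request for the mirrored URL."""
    headers = {}
    if token:
        # The same header wget/curl send via --header for gated models.
        headers["Authorization"] = f"Bearer {token}"
    return Request(to_mirror(url), headers=headers)
```

Passing the result to urllib.request.urlopen, or handing the printed URL to wget -c, performs the actual download. Applied to the parquet URL from error 1, to_mirror keeps the full dataset path and only changes the host.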
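Method 3: Use hfd
hfd is a download tool for Hugging Face developed by hf-mirror.com. It is built on the mature tools git and aria2, which keeps downloads stable without dropping the connection.
(1) Download hfd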
```
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh
```
(2) Set the environment variable
```
# Linux
export HF_ENDPOINT=https://hf-mirror.com
```
```
# Windows Powershell
$env:HF_ENDPOINT = "https://hf-mirror.com"
```
It is recommended to add the Linux export line above to ~/.bashrc so that it applies to every new shell.
(3) Download a model
```
./hfd.sh gpt2 --tool aria2c -x 4
```
(4) Download a dataset
```
./hfd.sh wikitext --dataset --tool aria2c -x 4
```
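If you launch hfd from a Python job runner, the steps above reduce to an argv list plus HF_ENDPOINT in the child environment. This is a minimal sketch; build_hfd_cmd and run_hfd are hypothetical helpers, and they use only the options shown in steps (3) and (4) (--dataset, --tool aria2c, -x).

```python
import os
import subprocess


def build_hfd_cmd(repo_id, dataset=False, tool="aria2c", threads=4, script="./hfd.sh"):
    """argv for hfd.sh using only the options from steps (3) and (4)."""
    cmd = [script, repo_id]
    if dataset:
        # Tell hfd the repo is a dataset rather than a model.
        cmd.append("--dataset")
    cmd += ["--tool", tool, "-x", str(threads)]
    return cmd


def run_hfd(cmd, endpoint="https://hf-mirror.com"):
    """Run hfd.sh with HF_ENDPOINT exported only to the child process."""
    env = dict(os.environ, HF_ENDPOINT=endpoint)
    return subprocess.run(cmd, env=env, check=True)
```

Keeping the endpoint in the child environment avoids editing ~/.bashrc on shared servers.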
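Method 4 (non-intrusive, solves most cases): the Hugging Face packages read environment variables, so you can fix the problem just by setting one. Put the variable on the same line as the command so that it is passed to that process: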
```
HF_ENDPOINT=https://hf-mirror.com python your_script.py
```
However, some datasets ship their own download scripts; for those you have to edit the URLs inside the script by hand.
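The same idea works from inside Python, with one caveat: huggingface_hub reads HF_ENDPOINT when it is first imported, so the variable must be set before importing transformers, datasets, or huggingface_hub. A minimal sketch follows; run_with_mirror is a hypothetical helper that launches a script with the mirror endpoint in its environment, the Python equivalent of the shell command above.

```python
import os
import subprocess
import sys

MIRROR = "https://hf-mirror.com"


def run_with_mirror(script, *args, endpoint=MIRROR, capture=False):
    """Run `python script args...` with HF_ENDPOINT set for that process only."""
    env = dict(os.environ, HF_ENDPOINT=endpoint)
    return subprocess.run([sys.executable, script, *args], env=env,
                          check=True, capture_output=capture, text=True)


# In-process alternative: set the variable first, then import the libraries.
# os.environ["HF_ENDPOINT"] = MIRROR
# from transformers import AutoModel  # must come after the line above
```

Leaving capture at False lets the child's download progress print live to the terminal.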

2. https://aliendao.cn/
On this Hugging Face mirror site, first download the model_download.py script shown on its page into the model directory on your server.
Download it with:

wget https://aliendao.cn/model_download.py

Then run:

pip install huggingface_hub
python model_download.py --repo_id <model_id>

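If you don't know the model ID, search for the model name in the site's search bar, for example baichuan2-7B-Chat.

The site then shows the matching download command: python model_download.py --repo_id baichuan-inc/Baichuan2-7B-Chat

This downloads Hugging Face models directly on the server and shows a progress bar. The speed is roughly 2 MB/s.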

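Tip
Model downloads usually take a long time, so remember to run them inside a tmux session; otherwise your computer going to sleep can drop the connection and interrupt the download.
If you forgot to open tmux, press Ctrl-Z to suspend the job, start tmux, and rerun python model_download.py --repo_id <model_id>; the download continues where it stopped.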

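Option 3: Work offline

See the offline-mode guide linked in the error message: https://huggingface.co/docs/transformers/installation#offline-mode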
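In offline mode the libraries stop contacting the Hub and load only local files, so once the files are on disk (for example, downloaded through a mirror as above) the connection error disappears. A minimal sketch under that assumption: enable_offline sets the documented offline switches HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE and HF_DATASETS_OFFLINE, and check_local_model_dir is a hypothetical pre-flight check for the config.json that error 2 complains about.

```python
import os
from pathlib import Path

OFFLINE_VARS = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1", "HF_DATASETS_OFFLINE": "1"}


def enable_offline(env=None):
    """Set the offline switches; call before importing transformers/datasets."""
    target = os.environ if env is None else env
    target.update(OFFLINE_VARS)
    return target


def check_local_model_dir(path):
    """Fail early if path is not a downloaded model directory (no config.json)."""
    p = Path(path)
    if not (p / "config.json").is_file():
        raise FileNotFoundError(f"{p} does not contain config.json")
    return p
```

You would then pass the directory itself to the library instead of a repo ID, e.g. AutoModel.from_pretrained(str(check_local_model_dir("gpt2"))).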

IV. What if a model requires login?

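Q1: How do I download a model on HF that requires login?
Because of licensing requirements set by the model publisher, some models cannot be downloaded publicly; you must request access on Hugging Face and be approved first. These are called gated models. The basic steps are:

1. Request access
2. Obtain an access token (used by the command line and by Python calls)
3. Download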

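Request access
This step must be done on the official Hugging Face site after registering and logging in; for security reasons, mirror sites generally do not support it.

After applying, approval takes anywhere from a few minutes to a few days (usually just minutes), and you will be notified of the result by email.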

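Obtain an access token
Once approved, the model files appear under Files and versions on the model page, and in a browser you can simply click to download them. To download with a tool such as huggingface-cli, however, you need an access token.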

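Access tokens are created at: https://huggingface.co/settings/tokens
On the token management page of your Hugging Face settings, create a new token (New token); Read permission is sufficient. Once created, the token can be used when calling the tools.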