Converting HuggingFace Models to GGUF/GGML

小禾家的

Last modified 2025-03-07 09:08:35


First published 2025-03-06 20:43:55

Article link: https://blog.csdn.net/u011234288/article/details/146079334


For environment setup, see the CSDN blog post ".safetensors转换成.GGUF" (Converting .safetensors to .GGUF).

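Llama.cpp is a good way to run LLMs efficiently on CPU and GPU. The drawback is that you first need to convert the model to a format that Llama.cpp supports, which is currently the GGUF file format. In this post you will learn how to convert a HuggingFace model (Vicuna 13b v1.5) to a GGUF model.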

Llama.cpp supports the following models:

LLaMA
LLaMA 2
Falcon
Alpaca
GPT4All
Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2
Vigogne (French)
Vicuna
Koala
OpenBuddy (Multilingual)
Pygmalion 7B / Metharme 7B
WizardLM
Baichuan-7B and its derivations (such as baichuan-7b-sft)
Aquila-7B / AquilaChat-7B

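Steps:

Download the model

See the CSDN blog post "内网环境下如何快速下载大模型" (How to quickly download large models in an intranet environment).

Convert the model

Clone the llama.cpp repository: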

git clone https://github.com/ggerganov/llama.cpp.git

Install the required Python libraries:

pip install -r llama.cpp/requirements.txt
On Windows this command may fail; if it does, use:
pip3 install -r llama.cpp/requirements.txt

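Verify that the script exists and review its options (convert.py is the older script name; recent llama.cpp checkouts may no longer ship it at the repository root, in which case run convert_hf_to_gguf.py with the same -h flag):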

python llama.cpp/convert.py -h
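
Because this post uses two script names (convert.py and convert_hf_to_gguf.py), it can help to check which one your clone actually contains before running anything. The following is a minimal sketch, assuming the clone lives in a directory called llama.cpp and that the scripts sit at the repository root. The helper name find_convert_script and the candidate order are illustrative, not part of llama.cpp.

```python
from pathlib import Path

# Candidate script names, newer name first. Both names appear in this post;
# which one exists depends on the llama.cpp checkout (verify on your clone).
CANDIDATES = ["convert_hf_to_gguf.py", "convert.py"]


def find_convert_script(repo_dir):
    """Return the first conversion script found in repo_dir, or None."""
    repo = Path(repo_dir)
    for name in CANDIDATES:
        script = repo / name
        if script.is_file():
            return script
    return None


if __name__ == "__main__":
    script = find_convert_script("llama.cpp")
    print(script if script else "no conversion script found; check the clone")
```

If the helper returns a path, pass that path to python with -h to list the options the script supports.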

Convert the HF model to a GGUF model:

python llama.cpp/convert.py vicuna-hf \
  --outfile vicuna-13b-v1.5.gguf \
  --outtype q8_0

# Without quantization (preserves the model's quality)
python llama.cpp/convert_hf_to_gguf.py ./qwen2_0.5b_instruct --outtype f16 --verbose --outfile qwen2_0.5b_instruct_f16.gguf
# With quantization (faster, at some cost in quality), run this command instead
python llama.cpp/convert_hf_to_gguf.py ./qwen2_0.5b_instruct --outtype q8_0 --verbose --outfile qwen2_0.5b_instruct_q8_0.gguf

Tested and working.
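
When converting the same model to several output types, the convert_hf_to_gguf.py commands above can be generated from a small Python wrapper. This is only a sketch: the function names build_convert_command and convert are hypothetical, the script path assumes llama.cpp was cloned into the current directory, and only the --outtype and --outfile flags already used above are passed.

```python
import subprocess
import sys


def build_convert_command(model_dir, outfile, outtype="f16",
                          script="llama.cpp/convert_hf_to_gguf.py"):
    """Build the argument list for the conversion script used above."""
    return [sys.executable, script, model_dir,
            "--outtype", outtype, "--outfile", outfile]


def convert(model_dir, outfile, outtype="f16"):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_convert_command(model_dir, outfile, outtype), check=True)


if __name__ == "__main__":
    # Only print the commands here; call convert() to actually run them.
    for outtype in ("f16", "q8_0"):
        cmd = build_convert_command("./qwen2_0.5b_instruct",
                                    f"qwen2_0.5b_instruct_{outtype}.gguf",
                                    outtype)
        print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the model path contains spaces.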

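Check the model architecture

Some models cannot be converted because of their architecture or weight format.

To check the model architecture:

On the Hugging Face model page

Models on Hugging Face usually describe the architecture they use in the "Model Card".

The model's README.md or Model Card usually contains information about the architecture, for example: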

base_model: llama-2
architecture: transformer
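This tells you whether the model uses a structure such as GPT, Llama, Mistral, ViT, or BERT.

In the `config.json` file

If the model author does not state the architecture explicitly, check the model's config.json:

On the Hugging Face model page, click the "Files" tab.
Find the config.json file and click to open it.

Look for the architectures field, for example: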

{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 8960,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 12,
  "num_hidden_layers": 28,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Here the architectures field is Qwen2ForCausalLM, which indicates that the model uses a Transformer decoder architecture similar to LLaMA.
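
If the model has already been downloaded, the same lookup can be done locally instead of in the browser. The following is a minimal sketch that reads the architectures field from the text of a config.json file. The function name read_architectures is hypothetical, and the sample string is a shortened version of the config shown above.

```python
import json


def read_architectures(config_text):
    """Return the list stored under "architectures" in a config.json string."""
    config = json.loads(config_text)
    # Fall back to an empty list when the field is absent.
    return config.get("architectures", [])


if __name__ == "__main__":
    sample = '{"architectures": ["Qwen2ForCausalLM"], "model_type": "qwen2"}'
    print(read_architectures(sample))  # prints ['Qwen2ForCausalLM']
```

For a downloaded model, read the file first, for example with open("./qwen2_0.5b_instruct/config.json", encoding="utf-8"), and pass its contents to the function.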

For example:

https://huggingface.co/AIDC-AI/Ovis2-1B cannot be converted.
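
One rough way to guess in advance whether a model can be converted is to search the conversion script for the architecture name taken from config.json. The sketch below is a heuristic only. It assumes the script mentions each supported architecture as a double-quoted string literal, which you should verify against your own checkout. The function name is_architecture_listed is hypothetical.

```python
from pathlib import Path


def is_architecture_listed(script_text, architecture):
    """Heuristic: True if the quoted architecture name occurs in the script source."""
    return f'"{architecture}"' in script_text


if __name__ == "__main__":
    # Assumes llama.cpp was cloned into the current directory.
    script = Path("llama.cpp/convert_hf_to_gguf.py")
    if script.is_file():
        text = script.read_text(encoding="utf-8")
        print(is_architecture_listed(text, "Qwen2ForCausalLM"))
    else:
        print("script not found; clone llama.cpp first")
```

A False result only means the name was not found in the script text; the conversion attempt itself remains the definitive test.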