LLM Quantization with the llama.cpp Project

HY.T

Tags: llama

Article link: https://blog.csdn.net/weixin_45486294/article/details/140751967

1. Clone or download the llama.cpp project

Project URL: https://github.com/ggerganov/llama.cpp

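Alternatively, clone it with git, then install the Python dependencies of the conversion script:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements/requirements-convert-hf-to-gguf.txt
(In newer checkouts, where the script is named convert_hf_to_gguf.py, this file is named requirements/requirements-convert_hf_to_gguf.txt.)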

2. Download and install CMake

Download page: https://cmake.org/download

Build the llama.cpp project with CMake, running these commands from the llama.cpp directory:

cmake -B build
cmake --build build --config Release
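
After the build, the tools used in the later steps sit under the build directory. The following is a minimal sketch for locating one of them from a script. It assumes the build\bin\Release layout shown in this article for Windows and a plain build/bin layout for single-config generators; the helper name find_llama_tool is hypothetical and not part of llama.cpp.

```python
from pathlib import Path
from typing import Optional


def find_llama_tool(repo_root: str, tool: str = "llama-quantize") -> Optional[Path]:
    """Return the first existing path to a built llama.cpp tool, or None."""
    bin_dir = Path(repo_root) / "build" / "bin"
    candidates = [
        bin_dir / "Release" / f"{tool}.exe",  # multi-config layout (Visual Studio)
        bin_dir / f"{tool}.exe",              # single-config build on Windows (assumed)
        bin_dir / tool,                       # single-config build on Linux/macOS (assumed)
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling find_llama_tool(r"D:\AI\llama.cpp") returns the path to llama-quantize.exe if the build succeeded, and None otherwise.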

3. Convert the Hugging Face format to GGUF

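1. Find the conversion script in the project directory and run it with Python:
>>>python convert_hf_to_gguf.py <model directory> --outtype f16 --outfile <output path>\<custom model name>.gguf
Example output file: D:\AI\Qwen2-0.5B-output\my_qwen0.5B.gguf
Example command, run from the llama.cpp directory:
D:\AI\llama.cpp>python convert_hf_to_gguf.py D:\AI\qwen7B --outtype f16 --outfile D:\AI\qwen-7b-output\myqwen7b.gguf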
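
If the conversion needs to be scripted, the same command can be assembled in Python. This is a minimal sketch that only uses the arguments shown above (the model directory, --outtype, and --outfile); the function names build_convert_command and convert are hypothetical helpers, not part of llama.cpp.

```python
import subprocess
import sys
from typing import List


def build_convert_command(model_dir: str, outfile: str, outtype: str = "f16",
                          script: str = "convert_hf_to_gguf.py") -> List[str]:
    """Assemble the argument list for the conversion script."""
    return [sys.executable, script, model_dir,
            "--outtype", outtype, "--outfile", outfile]


def convert(model_dir: str, outfile: str, repo_dir: str = ".") -> None:
    """Run the conversion from the llama.cpp checkout; raises if it fails."""
    cmd = build_convert_command(model_dir, outfile)
    subprocess.run(cmd, cwd=repo_dir, check=True)
```

For the example above, the call would be convert(r"D:\AI\qwen7B", r"D:\AI\qwen-7b-output\myqwen7b.gguf", repo_dir=r"D:\AI\llama.cpp"). Passing the arguments as a list avoids quoting problems when a path contains spaces.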

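2. Go to the directory that contains the built binaries and quantize the GGUF model (here to q4_0):
>>>llama.cpp\build\bin\Release
>>>D:\AI\llama.cpp\build\bin\Release\llama-quantize.exe <path to the model to quantize>\my_qwen1.8B.gguf <output path>\quantize_model.gguf q4_0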
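
The last argument, q4_0, selects the quantization type. As a rough illustration of what this type does, the sketch below quantizes one block of 32 weights to 4-bit codes with a single per-block scale and then restores them. It is a simplified pure-Python model, assuming the block size and rounding of ggml's reference Q4_0 routine; the function names are hypothetical, and the on-disk byte layout is not reproduced.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # Q4_0 groups weights into blocks of 32 (assumed from ggml)


def quantize_block_q4_0(values: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block to a scale plus 4-bit codes in the range 0..15."""
    if len(values) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(values)}")
    # The value with the largest magnitude (sign kept) defines the scale,
    # so that it maps exactly to code 0, i.e. to -8 after the offset.
    signed_max = max(values, key=abs)
    scale = signed_max / -8.0
    inv_scale = 1.0 / scale if scale != 0.0 else 0.0
    # Shift by 8 to get unsigned codes, round to nearest, clamp to 4 bits.
    codes = [min(15, int(v * inv_scale + 8.5)) for v in values]
    return scale, codes


def dequantize_block_q4_0(scale: float, codes: List[int]) -> List[float]:
    """Restore approximate weights from the scale and the 4-bit codes."""
    return [(c - 8) * scale for c in codes]
```

Each restored weight lies within about one scale step of its original value. That loss of precision is what is traded for the smaller file.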