Building llama.cpp with GPU support
For more details, see https://github.com/abetlen/llama-cpp-python; the official page is updated as new versions are released.
Download llama.cpp and enter its directory
Repository: https://github.com/ggerganov/llama.cpp
You can also download it to your local machine first and then upload it to the server.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Build from source (make)
This produces binaries such as ./main and ./quantize. For details, see: https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md
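Once the build finishes, the two binaries mentioned above are typically used together: ./quantize shrinks a full-precision GGUF model, and ./main runs inference on the result. A usage sketch (the model paths are hypothetical placeholders, and q4_0 is just one of several quantization types llama.cpp supports):

```shell
# Quantize an F16 GGUF model down to 4-bit (paths are examples only),
# then run a short generation with the quantized model.
./quantize models/model-f16.gguf models/model-q4_0.gguf q4_0
./main -m models/model-q4_0.gguf -p "Hello" -n 64
```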
CPU build
make
GPU build
make GGML_CUDA=1
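Whether GGML_CUDA=1 will work depends on the CUDA toolkit being installed and nvcc being on PATH. A small sketch that checks for nvcc and prints the appropriate make invocation (it only prints the command rather than running it; run the printed command yourself from the llama.cpp directory):

```shell
# Print the suggested build command depending on whether the CUDA
# compiler (nvcc) is available on this machine.
if command -v nvcc >/dev/null 2>&1; then
  echo "make GGML_CUDA=1 -j$(nproc)"   # GPU build, parallel jobs
else
  echo "make -j$(nproc)"               # CPU-only fallback
fi
```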
Possible errors and how to fix them
I ccache not found. Consider installing it for faster compilation.
sudo apt-get install ccache
Makefile:1002: *** I ERROR: For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via environment variable CUDA_DOCKER_ARCH, e.g. by running "export CUDA_DOCKER_ARCH=compute_XX" on Unix-like systems, where XX is the minimum compute capability that the code needs to run on. A list with compute capabilities can be found here: https://developer.nvidia.com/cuda-gpus . Stop.
This means your CUDA version is too old. If you did not install CUDA yourself, note that the version shown by nvcc -V may not match the version actually in use; see the article on the "nvcc -V shows a CUDA version different from the actual one" problem for how to switch versions.
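As the error message itself suggests, on CUDA toolkits older than 11.7 you can also work around it by exporting CUDA_DOCKER_ARCH before building. A sketch (compute_75, for Turing GPUs, is only a placeholder; look up your own GPU's compute capability at https://developer.nvidia.com/cuda-gpus):

```shell
# Tell the Makefile which minimum compute capability to target
# (compute_75 is an example value, not a recommendation).
export CUDA_DOCKER_ARCH=compute_75
make GGML_CUDA=1
```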
NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.
This is a deprecation notice, not an error: recent llama.cpp releases renamed the binaries with a llama- prefix (for example, ./server became ./llama-server), so the build itself has succeeded.