git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Edit the Makefile to enable the FMA, F16C, and AVX compiler flags:
MK_CFLAGS += -mfma -mf16c -mavx
MK_CXXFLAGS += -mfma -mf16c -mavx
Then build with the Makefile-based build:
make -j$(nproc)
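Those flags assume the host CPU actually supports FMA, F16C, and AVX; a binary built with them will crash with an illegal-instruction error on a CPU that lacks them. A minimal Linux-only sketch for checking this via /proc/cpuinfo (the helper name is illustrative, not part of llama.cpp):

```python
# Check whether the CPU advertises the instruction sets required by the
# -mfma / -mf16c / -mavx compiler flags. Linux-only: parses /proc/cpuinfo.
import os

def missing_cpu_flags(cpuinfo_text, required=("fma", "f16c", "avx")):
    """Return the required flags that do NOT appear in a cpuinfo 'flags' line."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return [f for f in required if f not in flags]

if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        missing = missing_cpu_flags(fh.read())
    print("CPU OK" if not missing else "CPU missing: %s" % missing)
```

If any flag is reported missing, drop the corresponding -m option from the Makefile instead of forcing it.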
Install the Python 3 dependencies:
cat ./requirements/requirements-convert_legacy_llama.txt
numpy~=1.26.4
sentencepiece~=0.2.0
transformers>=4.40.1,<5.0.0
gguf>=0.1.0
protobuf>=4.21.0,<5.0.0
Install each with pip3: pip3 install numpy, pip3 install sentencepiece, pip3 install transformers, pip3 install gguf, pip3 install protobuf — or install them all at once with pip3 install -r ./requirements/requirements-convert_legacy_llama.txt
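Before running the converter it is worth confirming the packages are actually importable; a small sketch using the standard library's importlib.metadata (the helper name is illustrative, the package names are the ones listed above):

```python
# Report the installed version of each converter dependency, or None if the
# distribution is not installed at all.
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Map each distribution name to its installed version string, or None."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = version(pkg)
        except PackageNotFoundError:
            out[pkg] = None
    return out

for pkg, ver in installed_versions(
    ["numpy", "sentencepiece", "transformers", "gguf", "protobuf"]
).items():
    print("%s: %s" % (pkg, ver or "NOT INSTALLED"))
```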
Download https://huggingface.co/4bit/Llama-2-7b-chat-hf into ./models/Llama-2-7b-chat-hf
Convert it to llama-2-7b-chat.gguf:
python3 convert_hf_to_gguf.py ./models/Llama-2-7b-chat-hf --outfile llama-2-7b-chat.gguf
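A quick way to sanity-check the conversion result: a GGUF file begins with the 4-byte magic b"GGUF" followed by a little-endian uint32 format version. A minimal checker sketch (the function names are illustrative):

```python
# Verify a file is GGUF by inspecting its header: 4-byte magic b"GGUF",
# then a little-endian uint32 version number.
import struct

def read_gguf_header(path):
    """Return (magic, version) from the first 8 bytes of the file."""
    with open(path, "rb") as fh:
        magic, version = struct.unpack("<4sI", fh.read(8))
    return magic, version

def is_gguf(path):
    try:
        magic, _ = read_gguf_header(path)
    except (OSError, struct.error):
        return False
    return magic == b"GGUF"
```

Running is_gguf("llama-2-7b-chat.gguf") should return True if the conversion succeeded.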
Run it (-co colorizes the output, -cnv starts conversation mode, -p sets the system prompt):
./llama-cli -m ./llama-2-7b-chat.gguf -co -cnv -p "You are a helpful assistant."