llama.cpp: LLaMA model inference in C on FreeBSD

LLaMA (nicknamed "alpaca" in Chinese communities) is a family of large language models from Meta AI. It performs strongly across many natural language processing tasks and is an excellent text generation model.

llama.cpp is a package that runs LLaMA inference in C/C++, and it supports FreeBSD, Linux, and many other platforms.

GitHub - ggerganov/llama.cpp: LLM inference in C/C++

Building and installing from source

Download the source

git clone https://github.com/ggerganov/llama.cpp

Build

mkdir build
cd build
cmake ..
cmake --build . --config Release

The build only takes about 10-20 minutes. Quite fast!

FreeBSD does not ship with sudo by default, and copying the freshly built binaries into /usr/bin as root risks interfering with the base system, so instead we handle installation by adding an environment variable.

Create a file named env.sh with the following content:

export PATH=/home/skywalk/github/llama.cpp/build/bin:$PATH

Run source env.sh before each session.

The reason for not putting this in .cshrc or .bashrc is to avoid affecting the whole system, since other models may be installed later.
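A slightly fuller env.sh sketch that also warns if the build directory is missing (the path is the clone location assumed above; adjust to yours):

```shell
# env.sh - prepend the llama.cpp build output to PATH for this shell only
LLAMA_BIN="$HOME/github/llama.cpp/build/bin"   # adjust to your clone location
export PATH="$LLAMA_BIN:$PATH"

# quick sanity check: warn if the directory does not exist yet
if [ ! -d "$LLAMA_BIN" ]; then
    echo "warning: $LLAMA_BIN does not exist" >&2
fi
```

Because the change lives only in the current shell, forgetting to source the file simply means the llama binaries are not found, nothing system-wide is touched.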

Download model files

Chinese LLaMA models

Official repo: GitHub - ymcui/Chinese-LLaMA-Alpaca-2 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long-context models)

The model files can be downloaded from Baidu Netdisk or Google Drive.

Download links:

The following are full models that can be used directly after download, with no merging steps required. Recommended for users with plenty of bandwidth.

Model name            | Type              | Size    | Download                | GGUF
Chinese-LLaMA-2-13B   | base model        | 24.7 GB | [Baidu] [Google] [🤗HF] | [🤗HF]
Chinese-LLaMA-2-7B    | base model        | 12.9 GB | [Baidu] [Google] [🤗HF] | [🤗HF]
Chinese-LLaMA-2-1.3B  | base model        | 2.4 GB  | [Baidu] [Google] [🤗HF] | [🤗HF]
Chinese-Alpaca-2-13B  | instruction model | 24.7 GB | [Baidu] [Google] [🤗HF] | [🤗HF]
Chinese-Alpaca-2-7B   | instruction model | 12.9 GB | [Baidu] [Google] [🤗HF] | [🤗HF]
Chinese-Alpaca-2-1.3B | instruction model | 2.4 GB  | [Baidu] [Google] [🤗HF] | [🤗HF]

P.S. Hugging Face models can also be fetched from a mirror site: HF-Mirror, a Hugging Face mirror.
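A sketch of pulling a model through the mirror, assuming huggingface-cli is installed; the repo id hfl/chinese-alpaca-2-1.3b is an assumption, so verify the exact id on the project page. HF-Mirror itself documents the HF_ENDPOINT variable:

```shell
# Point Hugging Face tooling at the mirror (variable documented by hf-mirror.com)
export HF_ENDPOINT=https://hf-mirror.com

# Repo id below is assumed; check the project page for the real one
if command -v huggingface-cli >/dev/null 2>&1; then
    huggingface-cli download hfl/chinese-alpaca-2-1.3b --local-dir "$HOME/work/model/chinesellama"
fi
```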

Installing with pkg on FreeBSD

Install with the command pkg install llama-cpp:

pkg install llama-cpp 
Updating FreeBSD repository catalogue...
pkg: No SRV record found for the repo 'FreeBSD'
Fetching meta.conf:   0%
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
	llama-cpp: 3285

Number of packages to be installed: 1

The process will require 22 MiB more space.
3 MiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching llama-cpp-3285.pkg: 100%    3 MiB   3.1MB/s    00:01    
Checking integrity... done (0 conflicting)
[1/1] Installing llama-cpp-3285...

After pkg finishes, the executables are placed in /usr/local/bin:

root@fbhost:/usr/local/bin #  ls llama*
llama-baby-llama		llama-llava-cli
llama-batched			llama-lookahead
llama-batched-bench		llama-lookup
llama-bench			llama-lookup-create
llama-bench-matmult		llama-lookup-merge
llama-cli			llama-lookup-stats
llama-convert-llama2c-to-ggml	llama-parallel
llama-cvector-generator		llama-passkey
llama-embedding			llama-perplexity
llama-eval-callback		llama-quantize
llama-export-lora		llama-quantize-stats
llama-finetune			llama-retrieval
llama-gbnf-validator		llama-save-load-state
llama-gguf			llama-server
llama-gguf-split		llama-simple
llama-gritlm			llama-speculative
llama-imatrix			llama-tokenize
llama-infill			llama-train-text-from-scratch

 

Testing the Chinese-Alpaca-2-1.3B model

This model is on the small side, which makes downloading it from Baidu Netdisk more convenient.

After downloading, the local files look like this:

ls -l ~/work/model/chinesellama/
total 4935424
-rw-r--r--  1 skywalk  skywalk      339595  3月 24 20:30 chinesellama.tar.gz
-rw-r--r--  1 skywalk  skywalk         671  3月 24 20:06 config.json
-rw-r--r--  1 skywalk  skywalk         170  3月 24 20:06 generation_config.json
-rw-r--r--  1 skywalk  skywalk  2525058738  3月 24 21:01 pytorch_model.bin
-rw-r--r--  1 skywalk  skywalk         435  3月 24 20:08 special_tokens_map.json
-rw-r--r--  1 skywalk  skywalk         766  3月 24 20:08 tokenizer_config.json
-rw-r--r--  1 skywalk  skywalk      844403  3月 24 20:08 tokenizer.model

Convert the model

python convert.py ~/work/model/chinesellama/

The conversion reports the output file: Wrote /home/skywalk/work/model/chinesellama/ggml-model-f16.gguf

Run inference (in the pkg build shown above, the equivalent binary is named llama-cli rather than main):

main -m ~/work/model/chinesellama/ggml-model-f16.gguf  -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e

My laptop with 8 GB of RAM promptly crashed, as expected.
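One way to make an f16 model fit in 8 GB is to quantize it first. A sketch, assuming the llama-quantize tool from the build above (older source trees name it quantize) and the standard q4_0 quantization type:

```shell
# Hypothetical quantization step: shrink the f16 GGUF to 4-bit weights
F16="$HOME/work/model/chinesellama/ggml-model-f16.gguf"
Q4="$HOME/work/model/chinesellama/ggml-model-q4_0.gguf"

if command -v llama-quantize >/dev/null 2>&1; then
    llama-quantize "$F16" "$Q4" q4_0
fi
```

A q4_0 file is roughly a quarter the size of the f16 original, which should leave plenty of headroom for a 1.3B model on an 8 GB machine.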

Still, the whole pipeline works end to end!
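The pkg file listing above also includes llama-server. A sketch of serving the converted model over HTTP (llama-server listens on 127.0.0.1:8080 by default; the -c context-size value here is just an example):

```shell
# Sketch: serve the GGUF model via llama.cpp's built-in HTTP server
MODEL="$HOME/work/model/chinesellama/ggml-model-f16.gguf"

if command -v llama-server >/dev/null 2>&1; then
    llama-server -m "$MODEL" -c 2048   # -c sets the context size
fi
```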

With llama.cpp, you can run Chinese LLaMA models on FreeBSD. Fantastic!

Bonus: llama2.c

llama2.c runs LLaMA 2 model inference in about 700 lines of pure C! Crucially, it is cross-platform, and it works just as well on FreeBSD.

Project page: https://github.com/karpathy/llama2.c

Download the code:

git clone https://github.com/karpathy/llama2.c

Enter the project directory and download a model:

cd llama2.c
wget https://karpathy.ai/llama2c/model.bin -P out

This is the stories15M model.

Compile and run:

gcc -O3 -o run run.c -lm
./run out/model.bin

Sample English story output:

Once upon a time, there was a big bookcase in a little girl's room. The little girl, named Lucy, loved to read. She would sit on the chair and read all day. One day, Lucy saw a scary monster in her room. The monster had big teeth and big eyes. Lucy was scared, but she wanted to find out who was scary.
Lucy thought and thought. Then, she had an idea. She would change her clothes and draw a face on the monster with a big crayon. The monster thought it was a funny picture. Lucy went back to her room and started to draw on the bookcase.
As Lucy drew, the monster from the book came to life! It was a funny looking monster that looked at Lucy's drawings. Lucy was not scared anymore. She laughed and played with her new friend. The monster and Lucy were happy friends forever.
achieved tok/s: 44.499106

You can download larger models, such as stories110M.bin:

https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin

The available models:

model | dim | n_layers | n_heads | n_kv_heads | max context length | parameters | val loss | download
260K  | 64  | 5        | 8       | 4          | 512                | 260K       | 1.297    | stories260K
OG    | 288 | 6        | 6       | 6          | 256                | 15M        | 1.072    | stories15M.bin
42M   | 512 | 8        | 8       | 8          | 1024               | 42M        | 0.847    | stories42M.bin
110M  | 768 | 12       | 12      | 12         | 1024               | 110M       | 0.760    | stories110M.bin

In theory it can run inference on any LLaMA model, but the author notes that since inference is done in float32, models larger than 7B are not recommended: at 4 bytes per parameter, the weights of a 7B model alone already take about 28 GB of RAM.

Summary

llama.cpp and llama2.c make AI inference possible on FreeBSD. Wonderful!

Troubleshooting

Running make directly in the repository root fails:

make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 627: Unknown modifier " For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via CUDA_DOCKER_ARCH"
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 627: Invalid line type
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 628: Invalid line type
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 629: Invalid line type
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 630: Invalid line type
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 631: Invalid line type
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 632: Invalid line type
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 833: Invalid line type
make: "/usr/home/skywalk/github/llama.cpp/Makefile" line 836: Invalid line type
make: Fatal errors encountered -- cannot continue

Switch to cmake instead (these errors come from FreeBSD's base BSD make trying to parse the llama.cpp Makefile, which uses GNU make syntax; installing gmake would also work):

mkdir build
cd build
cmake ..
cmake --build . --config Release
