git clone https://github.com/karpathy/llama2.c.git
cd llama2.c
python export.py llama2_7b.bin --meta-llama path/to/llama/model/7B
python export.py llama2_7b_q80.bin --version 2 --meta-llama path/to/llama/model/7B
make runomp
OMP_NUM_THREADS=64 ./run llama2_7b.bin -n 40
OMP_NUM_THREADS=64 ./runq llama2_7b_q80.bin -n 40
Tested on an Intel® Xeon® Platinum 8269CY CPU @ 2.50GHz:
/mnt/workspace/llama2.c> OMP_NUM_THREADS=64 ./run llama2_7b.bin -n 40
I am xxxxx, an undergraduate student from the Department of Electronic Engineering of Fudan University. I have always been fascinated by the superconductor technology
achieved tok/s: 2.766742
/mnt/workspace/llama2.c> OMP_NUM_THREADS=64 ./runq llama2_7b_q80.bin -n 40
As of late a lot of the network security solutions we have implemented require an expensive Security Information and Event Management (SIEM) to accomplish a lot of their monitoring and analytics. I am not saying
achieved tok/s: 5.959658
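The two throughput figures above can be compared directly. The short sketch below only does the arithmetic on the reported numbers; the helper name `speedup` is made up for illustration and is not part of llama2.c.

```python
# Throughput reported by the two runs above (tokens per second).
FP32_TOK_S = 2.766742   # ./run  llama2_7b.bin
Q80_TOK_S = 5.959658    # ./runq llama2_7b_q80.bin


def speedup(baseline_tok_s: float, candidate_tok_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return candidate_tok_s / baseline_tok_s


ratio = speedup(FP32_TOK_S, Q80_TOK_S)
print(f"int8 (Q8_0) run is about {ratio:.2f}x faster than the fp32 run")
```

On this machine and for this single 40-token sample, the quantized run comes out a little over twice as fast. A single short run is noisy, so the ratio is best read as a rough indication.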
The two model files (llama2_7b.bin and llama2_7b_q80.bin) are 25.1G and 6.7G, respectively.
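As a rough sanity check on the two sizes above, the sketch below estimates them from the parameter count. Everything in it is an assumption rather than something stated in these notes: the parameter count of about 6.74 billion for the 7B model, a Q8_0 layout of one int8 per weight plus one fp32 scale per group of 64 weights, and "G" meaning GiB (2^30 bytes). It also ignores the file header and any tensors kept in fp32.

```python
# Assumptions (not taken from the notes above):
N_PARAMS = 6_738_415_616   # assumed parameter count of the 7B model
GROUP_SIZE = 64            # assumed Q8_0 group size (one fp32 scale per group)
GIB = 1024 ** 3            # assumes "G" means GiB

# fp32 export: 4 bytes per weight.
fp32_gib = N_PARAMS * 4 / GIB

# Q8_0 export: 1 byte per weight plus a 4-byte scale shared by each group.
q80_bytes_per_weight = 1 + 4 / GROUP_SIZE
q80_gib = N_PARAMS * q80_bytes_per_weight / GIB

print(f"fp32 estimate: {fp32_gib:.1f} GiB")
print(f"Q8_0 estimate: {q80_gib:.1f} GiB")
print(f"size ratio:    {fp32_gib / q80_gib:.2f}x")
```

Under these assumptions the estimates land close to the reported 25.1G and 6.7G, which suggests the size difference is mostly explained by the weights moving from 4 bytes each to slightly more than 1 byte each.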