1 Server Planning
- Four 910B physical servers, each fitted with 8 NPU cards; confirm that the NPU NIC indicator lights are on.
- Make sure each server has at least 1.5 TB of storage and runs openEuler 22.03 LTS; disable the firewall and SELinux.
- ds-3 is the master node; the remaining nodes are slave nodes.
Hostname | Host IP plan | NPU card IPs |
---|---|---|
ds-3 | 10.82.27.3/24 | 10.82.29.17~24 |
ds-4 | 10.82.27.4/24 | 10.82.29.25~32 |
ds-5 | 10.82.27.5/24 | 10.82.29.33~40 |
ds-6 | 10.82.27.6/24 | 10.82.29.41~48 |
[root@ds-6 ~]# cat /etc/os-release
NAME="openEuler"
VERSION="22.03 LTS"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 LTS"
ANSI_COLOR="0;31"
[root@ds-6 ~]# uname -a
Linux ds-6 5.10.0-60.18.0.50.oe2203.aarch64 #1 SMP Wed Mar 30 02:43:08 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
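The firewall and SELinux step from the planning notes above can be scripted as follows (a minimal sketch; run as root on every node):

```shell
# Disable the firewall now and on subsequent boots.
systemctl disable --now firewalld
# Switch SELinux off for the running system...
setenforce 0
# ...and disable it persistently across reboots.
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```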
2 Configure the NPU Cards
2.1 Install the NPU Driver
1. Install the driver on all nodes; the NPU driver package is available from the Huawei Ascend community.
# Install dependencies; some packages may fail to install, which is fine
yum -y install dkms gcc linux-header kernel-dev kernel-headers
# Install the driver
rpm -ivh Ascend-hdk-910b-npu-driver-24.1.0-1.aarch64.rpm
# Update the firmware
./Ascend-hdk-910b-npu-firmware_7.5.0.3.220.run --check
# A host reboot may be prompted
./Ascend-hdk-910b-npu-firmware_7.5.0.3.220.run --full
# Check the driver status (optional)
/usr/local/Ascend/driver/tools/upgrade-tool --device_index -1 --component -1 --version
2. Run npu-smi to confirm the driver installed successfully. (The output below was captured with a model already running on the NPUs.)
[root@ds-6 ~]# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0 Version: 24.1.0 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B3 | OK | 86.9 28 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 62887/ 65536 |
+===========================+===============+====================================================+
| 1 910B3 | OK | 87.4 29 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 62888/ 65536 |
+===========================+===============+====================================================+
| 2 910B3 | OK | 86.2 27 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 62866/ 65536 |
+===========================+===============+====================================================+
| 3 910B3 | OK | 88.5 26 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 62888/ 65536 |
+===========================+===============+====================================================+
| 4 910B3 | OK | 90.2 32 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 62887/ 65536 |
+===========================+===============+====================================================+
| 5 910B3 | OK | 91.3 32 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 62887/ 65536 |
+===========================+===============+====================================================+
| 6 910B3 | OK | 93.3 32 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 62887/ 65536 |
+===========================+===============+====================================================+
| 7 910B3 | OK | 83.8 32 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 62887/ 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| 0 0 | 245342 | mindie_llm_back | 59529 |
+===========================+===============+====================================================+
| 1 0 | 245344 | mindie_llm_back | 59529 |
+===========================+===============+====================================================+
| 2 0 | 245346 | mindie_llm_back | 59509 |
+===========================+===============+====================================================+
| 3 0 | 245353 | mindie_llm_back | 59529 |
+===========================+===============+====================================================+
| 4 0 | 245360 | mindie_llm_back | 59529 |
+===========================+===============+====================================================+
| 5 0 | 245367 | mindie_llm_back | 59529 |
+===========================+===============+====================================================+
| 6 0 | 245375 | mindie_llm_back | 59529 |
+===========================+===============+====================================================+
| 7 0 | 245386 | mindie_llm_back | 59529 |
+===========================+===============+====================================================+
2.2 Assign IP Addresses to the NPU Cards
1. Run on all nodes. The example below configures the NPU card IPs and gateway on ds-3; adjust the addresses on the other nodes according to the IP plan. The rank_table.json file and the config.json inside the container both depend on this IP plan.
hccn_tool -i 0 -ip -s address 10.82.29.17 netmask 255.255.255.0
hccn_tool -i 1 -ip -s address 10.82.29.18 netmask 255.255.255.0
hccn_tool -i 2 -ip -s address 10.82.29.19 netmask 255.255.255.0
hccn_tool -i 3 -ip -s address 10.82.29.20 netmask 255.255.255.0
hccn_tool -i 4 -ip -s address 10.82.29.21 netmask 255.255.255.0
hccn_tool -i 5 -ip -s address 10.82.29.22 netmask 255.255.255.0
hccn_tool -i 6 -ip -s address 10.82.29.23 netmask 255.255.255.0
hccn_tool -i 7 -ip -s address 10.82.29.24 netmask 255.255.255.0
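The eight address commands above differ only in the final octet, so they can be collapsed into a loop; BASE is this node's first NPU IP octet from the plan (a sketch, assuming the contiguous numbering used in this deployment):

```shell
# Per the IP plan: ds-3=17, ds-4=25, ds-5=33, ds-6=41.
BASE=17
for i in {0..7}; do
  hccn_tool -i "$i" -ip -s address "10.82.29.$((BASE + i))" netmask 255.255.255.0
done
```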
# Configure the gateway for the NPU cards
for i in {0..7}; do hccn_tool -i $i -gateway -s gateway 10.82.29.254; done
for i in {0..7}; do hccn_tool -i $i -netdetect -s address 10.82.29.254; done
2. Verify that the IP configuration took effect
# Check that the NPU card IPs are up and reachable via ping
for i in {17..24}; do hccn_tool -i 0 -ping -g address 10.82.29.$i pkt 3; done
# Show each card's IP
for i in {0..7};do hccn_tool -i $i -ip -g; done
3. Disable TLS verification
# Check that the low-level NPU TLS setting is consistent across cards; all values should be 0. Leaving it enabled makes model loading time out.
for i in {0..7}; do hccn_tool -i $i -tls -g ; done | grep switch
# Set the low-level NPU TLS switch to 0
for i in {0..7}; do hccn_tool -i $i -tls -s enable 0; done
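To fail fast instead of waiting for a model-load timeout, the switch values can be checked in one pass. This is a sketch: the parsing assumes `hccn_tool -i N -tls -g` prints a line containing `switch` followed by the value, as the grep above implies; adjust if your tool's output format differs.

```shell
# Report any card whose TLS switch is not 0.
for i in {0..7}; do
  sw=$(hccn_tool -i "$i" -tls -g | grep -oE 'switch[^0-9]*[0-9]+' | grep -oE '[0-9]+$')
  if [ "$sw" != "0" ]; then
    echo "NPU $i: tls switch=$sw (expected 0)"
  fi
done
```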
3 Software and Related Configuration
3.1 Install Packages with yum
1. docker is required; the rest are optional utilities
yum -y install docker lrzsz zip unzip tcpdump screen
3.2 Write the rank_table File
1. Create this configuration file on every node; its permission must be 640. The file path is later passed to the container as an environment variable.
The file declares the NPU cluster, recording each node's IP address and the IPs of its NPU cards. Key points:
- "server_count": "4" — the node count, since this deployment uses 4 servers
- The remaining fields are IP addresses
- ds-3 is the master node, so its entry must be placed first in server_list
[root@ds-3 ~]# ll /data/rank_table_full.json
-rw-r----- 1 root root 3.3K Feb 20 21:31 /data/rank_table_full.json
The file contents are as follows:
{
"version": "1.0",
"server_count": "4",
"server_list": [
{
"server_id": "10.82.27.3",
"container_ip": "10.82.27.3",
"device": [
{ "device_id": "0", "device_ip": "10.82.29.17", "rank_id": "0" },
{ "device_id": "1", "device_ip": "10.82.29.18", "rank_id": "1" },
{ "device_id": "2", "device_ip": "10.82.29.19", "rank_id": "2" },
{ "device_id": "3", "device_ip": "10.82.29.20", "rank_id": "3" },
{ "device_id": "4", "device_ip": "10.82.29.21", "rank_id": "4" },
{ "device_id": "5", "device_ip": "10.82.29.22", "rank_id": "5" },
{ "device_id": "6", "device_ip": "10.82.29.23", "rank_id": "6" },
{ "device_id": "7", "device_ip": "10.82.29.24", "rank_id": "7" }
]
},
{
"server_id": "10.82.27.4",
"container_ip": "10.82.27.4",
"device": [
{ "device_id": "0", "device_ip": "10.82.29.25", "rank_id": "8" },
{ "device_id": "1", "device_ip": "10.82.29.26", "rank_id": "9" },
{ "device_id": "2", "device_ip": "10.82.29.27", "rank_id": "10" },
{ "device_id": "3", "device_ip": "10.82.29.28", "rank_id": "11" },
{ "device_id": "4", "device_ip": "10.82.29.29", "rank_id": "12" },
{ "device_id": "5", "device_ip": "10.82.29.30", "rank_id": "13" },
{ "device_id": "6", "device_ip": "10.82.29.31", "rank_id": "14" },
{ "device_id": "7", "device_ip": "10.82.29.32", "rank_id": "15" }
]
},
{
"server_id": "10.82.27.5",
"container_ip": "10.82.27.5",
"device": [
{ "device_id": "0", "device_ip": "10.82.29.33", "rank_id": "16" },
{ "device_id": "1", "device_ip": "10.82.29.34", "rank_id": "17" },
{ "device_id": "2", "device_ip": "10.82.29.35", "rank_id": "18" },
{ "device_id": "3", "device_ip": "10.82.29.36", "rank_id": "19" },
{ "device_id": "4", "device_ip": "10.82.29.37", "rank_id": "20" },
{ "device_id": "5", "device_ip": "10.82.29.38", "rank_id": "21" },
{ "device_id": "6", "device_ip": "10.82.29.39", "rank_id": "22" },
{ "device_id": "7", "device_ip": "10.82.29.40", "rank_id": "23" }
]
},
{
"server_id": "10.82.27.6",
"container_ip": "10.82.27.6",
"device": [
{ "device_id": "0", "device_ip": "10.82.29.41", "rank_id": "24" },
{ "device_id": "1", "device_ip": "10.82.29.42", "rank_id": "25" },
{ "device_id": "2", "device_ip": "10.82.29.43", "rank_id": "26" },
{ "device_id": "3", "device_ip": "10.82.29.44", "rank_id": "27" },
{ "device_id": "4", "device_ip": "10.82.29.45", "rank_id": "28" },
{ "device_id": "5", "device_ip": "10.82.29.46", "rank_id": "29" },
{ "device_id": "6", "device_ip": "10.82.29.47", "rank_id": "30" },
{ "device_id": "7", "device_ip": "10.82.29.48", "rank_id": "31" }
]
}
],
"status": "completed"
}
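Hand-editing 32 device entries is error-prone, so the file above can instead be generated from the IP plan with a short script. This is a sketch assuming exactly the addressing of this deployment: host IPs 10.82.27.3–6 and NPU IPs 10.82.29.17–48 assigned in order.

```shell
#!/bin/bash
# Generate rank_table_full.json from the IP plan in section 1.
OUT=rank_table_full.json
{
  printf '{\n  "version": "1.0",\n  "server_count": "4",\n  "server_list": [\n'
  rank=0
  for s in 0 1 2 3; do
    host="10.82.27.$((3 + s))"
    printf '    {\n      "server_id": "%s",\n      "container_ip": "%s",\n      "device": [\n' "$host" "$host"
    for d in 0 1 2 3 4 5 6 7; do
      ip="10.82.29.$((17 + s * 8 + d))"
      sep=$([ $d -lt 7 ] && echo "," || echo "")
      printf '        { "device_id": "%s", "device_ip": "%s", "rank_id": "%s" }%s\n' "$d" "$ip" "$rank" "$sep"
      rank=$((rank + 1))
    done
    sep=$([ $s -lt 3 ] && echo "," || echo "")
    printf '      ]\n    }%s\n' "$sep"
  done
  printf '  ],\n  "status": "completed"\n}\n'
} > "$OUT"
# The file permission must be 640 (see above).
chmod 640 "$OUT"
```

Copy the result to /data/rank_table_full.json on every node.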
4 Prepare the Model Files
The model files (also called weights) can be downloaded from, among others:
- International: https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main
- China mirror: https://modelers.cn/spaces/State_Cloud/DeepSeek-R1/tree/main
1. Download the model files
mkdir -p /data; cd /data
# First make sure git-lfs is installed on the host (https://git-lfs.com, https://github.com/git-lfs/git-lfs)
git lfs install
# The files total 1.3 TB, so the download takes a long time
# (clone into DeepSeek-R1-origin to match the paths used below)
git clone https://modelers.cn/State_Cloud/DeepSeek-R1.git DeepSeek-R1-origin
# Inspect the downloaded model
ll /data/DeepSeek-R1-origin
2. Convert the original weights to bf16 (NPU-side weight conversion). The current NPU conversion script does not copy the tokenizer, config, and other files from the source directory, so copy them into the bf16 weight directory manually.
git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/MindIE/LLM/DeepSeek/DeepSeek-V2/NPU_inference
# Conversion time depends on the server's CPU; roughly 1 hour
python fp8_cast_bf16.py --input-fp8-hf-path /data/DeepSeek-R1-origin --output-bf16-hf-path /data/DeepSeek-R1-bf16
3. On every node, check the converted weight directory on the host; it is about 1.3 TB. Set the weight files' permissions to 750, otherwise the model will fail to start later.
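The manual copy step described above can be sketched as follows. The exact file names (config.json, tokenizer.json, tokenizer_config.json) are assumptions based on the usual Hugging Face layout; verify against the contents of the source directory before copying.

```shell
SRC=/data/DeepSeek-R1-origin
DST=/data/DeepSeek-R1-bf16
# Carry over the tokenizer/config files that the conversion script leaves behind.
cp "$SRC"/config.json "$SRC"/tokenizer.json "$SRC"/tokenizer_config.json "$DST"/
# Set the permissions MindIE expects on the whole weight tree.
chmod -R 750 "$DST"
```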
5 Run the Docker Container
5.1 Create the Container
1. The official service and model tests can be skipped. The Docker image is hosted in the Huawei community and requires access permission to download; it is about 14 GB.
Download page: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f
# Log in as root on the machine running the container engine, obtain the login credential, and run it on each node
docker login -u XXXXXXX swr.cn-south-1.myhuaweicloud.com
# Enter the password
XXXXXXXXXXXXXX
# Pull the image
docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts
2. Run the container on all nodes (writing a start script is convenient). The model and configuration files for this deployment are all under /data, so /data is mounted into the container at /data.
Environment variable reference: https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/MindIE/LLM/DeepSeek/DeepSeek-R1
#!/bin/bash
docker run -itd \
--privileged \
--name=DeepSeek-R1-full \
--net=host \
--shm-size 500g \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
-v /etc/localtime:/etc/localtime \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /usr/local/sbin:/usr/local/sbin \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /data:/data \
-e ATB_LLM_HCCL_ENABLE=1 \
-e ATB_LLM_COMM_BACKEND="hccl" \
-e HCCL_CONNECT_TIMEOUT=7200 \
-e WORLD_SIZE=32 \
-e HCCL_EXEC_TIMEOUT=0 \
-e PYTORCH_NPU_ALLOC_CONF=expandable_segments:True \
-e RANKTABLEFILE=/data/rank_table_full.json \
-e MIES_CONTAINER_IP=`hostname -I |awk '{print $1}'` \
-e OMP_NUM_THREADS=1 \
-e NPU_MEMORY_FRACTION=0.95 \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts \
bash
5.2 Modify the MindIE Configuration File
1. On all nodes, enter the container first, then edit the MindIE Server configuration file:
docker exec -it DeepSeek-R1-full bash
vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
Because ds-3 is the master node, "ipAddress" in every node's configuration file is "10.82.27.3".
Set the file on all nodes to:
{
"Version" : "1.0.0",
"LogConfig" :
{
"logLevel" : "Info",
"logFileSize" : 20,
"logFileNum" : 20,
"logPath" : "logs/mindie-server.log"
},
"ServerConfig" :
{
"ipAddress" : "10.82.27.3",
"managementIpAddress" : "10.82.27.3",
"port" : 1025,
"managementPort" : 1026,
"metricsPort" : 1027,
"allowAllZeroIpListening" : false,
"maxLinkNum" : 300,
"httpsEnabled" : false,
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : false,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm"
},
"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : true,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : false,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : 24576,
"maxInputTokenLen" : 16384,
"truncation" : true,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "deepseekr1",
"modelWeightPath" : "/data/DeepSeek-R1-bf16",
"worldSize" : 8,
"cpuMemSize" : 5,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false
}
]
},
"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,
"maxPrefillBatchSize" : 8,
"maxPrefillTokens" : 16384,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 8,
"maxIterTimes" : 8192,
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
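Since the only per-node edits are the two IP fields, the master IP can be stamped into the file non-interactively with a sed pass. This is a sketch: the patterns assume the exact `"key" : "value"` spacing shown above, and CFG should be adjusted to wherever your config.json actually lives in the container.

```shell
MASTER_IP=10.82.27.3
CFG=/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
# Rewrite both IP fields to the master node's address.
sed -i "s/\"ipAddress\" : \"[^\"]*\"/\"ipAddress\" : \"${MASTER_IP}\"/" "$CFG"
sed -i "s/\"managementIpAddress\" : \"[^\"]*\"/\"managementIpAddress\" : \"${MASTER_IP}\"/" "$CFG"
```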
6 Start the Model
Note: this step is the most error-prone; troubleshoot carefully.
1. On all nodes, enter the container: docker exec -it DeepSeek-R1-full bash
Run the following to start the model; startup usually takes under 1 hour. If it has not succeeded after 2 hours, it times out and exits automatically.
cd /usr/local/Ascend/mindie/latest/mindie-service/
./bin/mindieservice_daemon
2. Watch the startup log output (sample logs from the master and worker nodes are omitted here). The model is running once the final line on every node reads "Daemon start success!".
3. On every node you can watch HBM usage climb steadily. If usage has not reached 95% of total memory after about 40 minutes, or stops changing for a long time, something has most likely gone wrong.
7 Verification
1. On the master node ds-3, check that port 1025 is listening:
[root@ds-3 ~]# ss -antp | grep 1025
LISTEN 0 5 10.82.27.3:1025 0.0.0.0:* users:(("mindieservice_d",pid=185939,fd=41))
ESTAB 0 0 10.82.27.3:1025 10.82.0.31:38682 users:(("mindieservice_d",pid=185939,fd=46))
ESTAB 0 0 10.82.27.3:1025 10.82.0.31:33536 users:(("mindieservice_d",pid=185939,fd=3))
2. Test the inference API and observe the model's reply. (The sample prompt below is a Chinese arithmetic word problem: Xiao Ming, Xiao Hong, and Xiao Gang share 24 candies; Xiao Ming has 2 more than Xiao Hong, and Xiao Hong has 4 fewer than Xiao Gang; how many candies does each child have?)
[root@ds-3 ~]# curl 10.82.27.3:1025/generate -X POST -d '{"inputs":"题目:糖果的数量,小明、小红和小刚一共有24颗糖果。已知:小明比小红多2颗糖果;小红比小刚少4颗糖果。问题:小明、小红和小刚分别有多少颗糖果?","parameters":{"max_new_tokens":500},"temperature":0.3, "top_p":0.3, "top_k":5, "do_sample":true, "repetition_penalty":1.05, "seed":128}'
{"generated_text":"答案:小明有8颗,小红有6颗,小刚有10颗。解析:设小红有x颗糖果,则小明有x+2颗,小刚有x+4颗。根据总数为24颗,建立方程x + (x+2) + (x+4) = 24,解得x=6。因此,小明8颗,小红6颗,小刚10颗。<|end▁of▁sentence|>"}[root@ds-3 ~]#
8 Common Issues
1 Permission problems
- Make sure the weight path is usable; fix the permissions with chmod 750 -R /data/DeepSeek-R1-bf16 (note that this must cover the entire parent directory tree)
- rank_table.json: the NPU cluster configuration file's permission must be 640
2 Misconfigured IP addresses
- In /usr/local/Ascend/mindie/2.0.T3/mindie-service/conf/config.json inside the container, the master IP on every node must be the ds-3 node's IP
- The NPU card IPs in rank_table.json do not match the device order, i.e. the IPs actually configured on the cards disagree with the file
3 Model loading timeout
The model start reports no errors, but progress never advances until it times out and exits. The NPU TLS verification is most likely still enabled.
4 Dropped cards
Some nodes' NPU cards may go offline (card indicator light off); check with ping:
# Adjust to the node's NPU card IP range
for i in {17..24}; do hccn_tool -i 0 -ping -g address 10.82.29.$i pkt 3; done
9 Performance Testing
9.1 Preparation
1. Run one container on any node
#!/bin/bash
docker run -itd \
--privileged \
--name=mindie_benchmark \
--net=host \
--shm-size 500g \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
-v /etc/localtime:/etc/localtime \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /usr/local/sbin:/usr/local/sbin \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /data:/data \
-e ATB_LLM_HCCL_ENABLE=1 \
-e ATB_LLM_COMM_BACKEND="hccl" \
-e HCCL_CONNECT_TIMEOUT=7200 \
-e WORLD_SIZE=32 \
-e HCCL_EXEC_TIMEOUT=0 \
-e PYTORCH_NPU_ALLOC_CONF=expandable_segments:True \
-e RANKTABLEFILE=/data/rank_table_full.json \
-e MIES_CONTAINER_IP=`hostname -I |awk '{print $1}'` \
-e OMP_NUM_THREADS=1 \
-e NPU_MEMORY_FRACTION=0.95 \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts \
bash
2. Enter the container, edit the files, and set their permissions to 640
chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/config.json
chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
chmod 640 /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json
9.2 Run the Tests
The three benchmark invocations below vary two key parameters:
- Concurrency: the number of simulated concurrent clients
- MaxOutputLen: the maximum number of output tokens
benchmark \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/data/DeepSeek-R1-bf16/" \
--TestType vllm_client \
--Http http://10.82.27.3:1025 \
--ManagementHttp http://10.82.27.3:1026 \
--Concurrency 128 \
--MaxOutputLen 20 \
--TaskKind stream \
--Tokenizer True \
--SyntheticConfigPath /data/Benchmark/synthetic_config.json
benchmark \
--DatasetPath "/data/Benchmark/synthetic_config.json" \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/data/DeepSeek-R1-bf16/" \
--TaskKind stream \
--Concurrency 1 \
--MaxOutputLen 100 \
--TestType openai \
--Http http://10.82.27.3:1025
benchmark \
--DatasetPath "/data/Benchmark/synthetic_config.json" \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/data/DeepSeek-R1-bf16/" \
--TaskKind stream \
--Concurrency 100 \
--MaxOutputLen 100 \
--TestType openai \
--Http http://10.82.27.3:1025
2. View the results
[root@ds-3 ~]# cd /root/mindie/log/debug
[root@ds-3 debug]# ll
total 136
drwxr-x--- 3 root root 4096 Feb 24 15:57 instance
-r--r----- 1 root root 11883 Feb 21 16:09 mindie-benchmark_1059_20250221160904.log
-r--r----- 1 root root 11879 Feb 21 16:16 mindie-benchmark_1266_20250221161610.log
-r--r----- 1 root root 11880 Feb 21 16:21 mindie-benchmark_1475_20250221162121.log
-r--r----- 1 root root 11847 Feb 24 15:14 mindie-benchmark_1745_20250224151403.log
-r--r----- 1 root root 11865 Feb 24 15:39 mindie-benchmark_1951_20250224153930.log
-r--r----- 1 root root 11987 Feb 24 15:57 mindie-benchmark_2218_20250224155759.log
-r--r----- 1 root root 1601 Feb 21 15:12 mindie-benchmark_473_20250221151213.log
-r--r----- 1 root root 11936 Feb 21 15:38 mindie-benchmark_496_20250221153833.log
-r--r----- 1 root root 11864 Feb 21 15:54 mindie-benchmark_786_20250221155448.log
-r--r----- 1 root root 668 Feb 21 16:07 mindie-client_1059_20250221160904.log
-r--r----- 1 root root 668 Feb 21 16:14 mindie-client_1266_20250221161610.log
-r--r----- 1 root root 668 Feb 21 16:19 mindie-client_1475_20250221162121.log
-r--r----- 1 root root 668 Feb 24 15:12 mindie-client_1745_20250224151403.log
-r--r----- 1 root root 668 Feb 24 15:37 mindie-client_1951_20250224153930.log
-r--r----- 1 root root 668 Feb 24 15:47 mindie-client_2218_20250224155759.log
-r--r----- 1 root root 665 Feb 21 15:27 mindie-client_496_20250221153833.log
-r--r----- 1 root root 665 Feb 21 15:52 mindie-client_786_20250221155448.log
[2025-02-24 15:57:54.612+08:00] [2218] [281473615305280] [benchmark] [INFO] [output.py:115]
The BenchMark test performance metric result is:
+---------------------+-----------------+-----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----+
| Metric | average | max | min | P75 | P90 | SLO_P90 | P99 | N |
+---------------------+-----------------+-----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----+
| FirstTokenTime | 135.4161 ms | 191.7768 ms | 83.1695 ms | 163.2805 ms | 173.1221 ms | 173.1221 ms | 190.0502 ms | 100 |
| DecodeTime | 67.9848 ms | 6735.3363 ms | 59.0086 ms | 69.3274 ms | 72.6631 ms | 70.7305 ms | 80.2135 ms | 100 |
| LastDecodeTime | 68.9629 ms | 77.6713 ms | 62.8839 ms | 71.9116 ms | 74.4654 ms | 74.4654 ms | 77.4178 ms | 100 |
| MaxDecodeTime | 169.5477 ms | 6735.3363 ms | 71.0449 ms | 81.7754 ms | 85.9031 ms | 85.9031 ms | 728.0497 ms | 100 |
| GenerateTime | 6480.2226 ms | 12720.5217 ms | 4565.1791 ms | 6823.2514 ms | 7043.4134 ms | 7043.4134 ms | 7397.2595 ms | 100 |
| InputTokens | 100.06 | 195 | 4 | 148.75 | 176.3 | 176.3 | 194.01 | 100 |
| GeneratedTokens | 94.31 | 100 | 68 | 100.0 | 100.0 | 100.0 | 100.0 | 100 |
| GeneratedTokenSpeed | 14.6433 token/s | 15.5779 token/s | 7.0752 token/s | 15.0536 token/s | 15.2666 token/s | 15.2666 token/s | 15.4018 token/s | 100 |
| GeneratedCharacters | 375.64 | 483 | 207 | 421.5 | 451.0 | 451.0 | 469.14 | 100 |
| Tokenizer | 1.0166 ms | 23.54 ms | 0.2642 ms | 0.8698 ms | 1.0637 ms | 1.0637 ms | 6.1662 ms | 100 |
| Detokenizer | 1.0548 ms | 1.3294 ms | 0.7761 ms | 1.1096 ms | 1.121 ms | 1.121 ms | 1.3171 ms | 100 |
| CharactersPerToken | 3.983 | / | / | / | / | / | / | 100 |
| PostProcessingTime | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 100 |
| ForwardTime | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 100 |
+---------------------+-----------------+-----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----+
[2025-02-24 15:57:54.613+08:00] [2218] [281473615305280] [benchmark] [INFO] [output.py:121]
The BenchMark test common metric result is:
+------------------------+---------------------------------------+
| Common Metric | Value |
+------------------------+---------------------------------------+
| CurrentTime | 2025-02-24 15:57:54 |
| TimeElapsed | 648.1494 s |
| DataSource | /data/Benchmark/synthetic_config.json |
| Failed | 0( 0.0% ) |
| Returned | 100( 100.0% ) |
| Total | 100[ 100.0% ] |
| Concurrency | 1 |
| ModelName | deepseekr1 |
| lpct | 1.3533 ms |
| Throughput | 0.1543 req/s |
| GenerateSpeed | 14.5507 token/s |
| GenerateSpeedPerClient | 14.5507 token/s |
| accuracy | / |
+------------------------+---------------------------------------+