Deploying the Full DeepSeek-R1 671B with MindIE

1 Server Planning

  • 4 Huawei 910B physical servers, each with 8 NPU cards installed; confirm that the NPU NIC indicator lights are on.
  • Make sure each server has at least 1.5 TB of storage and is running openEuler 22.03 LTS; disable the firewall and SELinux (see the commands at the end of this section).
  • ds-3 is the master node; the remaining nodes are slave nodes.
Hostname    Host IP (plan)    NPU card IPs
ds-3        10.82.27.3/24     10.82.29.17~24
ds-4        10.82.27.4/24     10.82.29.25~32
ds-5        10.82.27.5/24     10.82.29.33~40
ds-6        10.82.27.6/24     10.82.29.41~48
[root@ds-6 ~]# cat /etc/os-release 
NAME="openEuler"
VERSION="22.03 LTS"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 LTS"
ANSI_COLOR="0;31"

[root@ds-6 ~]# uname -a
Linux ds-6 5.10.0-60.18.0.50.oe2203.aarch64 #1 SMP Wed Mar 30 02:43:08 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
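
A minimal sketch of disabling the firewall and SELinux on each openEuler node (assuming firewalld is the active firewall service; adjust if your image ships a different one):

# Stop the firewall now and keep it off after reboot
systemctl disable --now firewalld

# Switch SELinux to permissive immediately and disable it persistently
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config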

2 Configuring the NPU Cards

2.1 Installing the NPU Driver

1. Install the driver on all nodes. The NPU driver can be downloaded from the Huawei Ascend community.

# Install dependency packages; some of them may fail to install, which does not matter
yum -y install dkms gcc linux-header kernel-dev kernel-headers

# Install the driver
rpm -ivh Ascend-hdk-910b-npu-driver-24.1.0-1.aarch64.rpm

# Check whether the firmware package can be applied
./Ascend-hdk-910b-npu-firmware_7.5.0.3.220.run --check

# Install the firmware; you may be prompted to reboot the host
./Ascend-hdk-910b-npu-firmware_7.5.0.3.220.run --full

# Check the driver status (optional)
/usr/local/Ascend/driver/tools/upgrade-tool --device_index -1 --component -1 --version

2. Run npu-smi to confirm that the driver was installed successfully. (The output below was captured while the model was already running, hence the high NPU usage.)

[root@ds-6 ~]# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0                   Version: 24.1.0                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 86.9        28                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          62887/ 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 87.4        29                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          62888/ 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 86.2        27                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          62866/ 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 88.5        26                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          62888/ 65536         |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 90.2        32                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          62887/ 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 91.3        32                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          62887/ 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 93.3        32                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          62887/ 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 83.8        32                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          62887/ 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| 0       0                 | 245342        | mindie_llm_back          | 59529                   |
+===========================+===============+====================================================+
| 1       0                 | 245344        | mindie_llm_back          | 59529                   |
+===========================+===============+====================================================+
| 2       0                 | 245346        | mindie_llm_back          | 59509                   |
+===========================+===============+====================================================+
| 3       0                 | 245353        | mindie_llm_back          | 59529                   |
+===========================+===============+====================================================+
| 4       0                 | 245360        | mindie_llm_back          | 59529                   |
+===========================+===============+====================================================+
| 5       0                 | 245367        | mindie_llm_back          | 59529                   |
+===========================+===============+====================================================+
| 6       0                 | 245375        | mindie_llm_back          | 59529                   |
+===========================+===============+====================================================+
| 7       0                 | 245386        | mindie_llm_back          | 59529                   |
+===========================+===============+====================================================+

2.2 Configuring IP Addresses on the NPU Cards

1. Perform this on all nodes. The example below configures the NPU card IP addresses and gateway on node ds-3; adjust the other nodes according to the IP plan. The rank_table.json file and the config.json inside the container, configured later, both depend on this IP plan.

hccn_tool -i 0 -ip -s address 10.82.29.17 netmask 255.255.255.0
hccn_tool -i 1 -ip -s address 10.82.29.18 netmask 255.255.255.0
hccn_tool -i 2 -ip -s address 10.82.29.19 netmask 255.255.255.0
hccn_tool -i 3 -ip -s address 10.82.29.20 netmask 255.255.255.0
hccn_tool -i 4 -ip -s address 10.82.29.21 netmask 255.255.255.0
hccn_tool -i 5 -ip -s address 10.82.29.22 netmask 255.255.255.0
hccn_tool -i 6 -ip -s address 10.82.29.23 netmask 255.255.255.0
hccn_tool -i 7 -ip -s address 10.82.29.24 netmask 255.255.255.0

# Configure the gateway on each NPU card
for i in {0..7}; do hccn_tool -i $i -gateway -s gateway 10.82.29.254; done
# Configure the network-detect address (used by the card's reachability check)
for i in {0..7}; do hccn_tool -i $i -netdetect -s address 10.82.29.254; done

2. Verify that the IP configuration took effect

# Check that the NPU card IPs took effect and are reachable (ping them from card 0)
for i in {17..24}; do hccn_tool -i 0 -ping -g address 10.82.29.$i pkt 3; done

# Show each card's IP address
for i in {0..7}; do hccn_tool -i $i -ip -g; done

3. Disable TLS verification

# Check that the low-level NPU TLS setting is consistent across cards; all values should be 0.
# If this is left unconfigured, model loading will time out.
for i in {0..7}; do hccn_tool -i $i -tls -g ; done | grep switch

# Set the low-level NPU TLS switch to 0
for i in {0..7}; do hccn_tool -i $i -tls -s enable 0; done

3 Software and Related Configuration

3.1 Installing Packages with yum

1. docker is mandatory; the rest are optional tools

yum -y install docker lrzsz zip unzip tcpdump screen

3.2 Writing the rank_table File

1. Create this configuration file on all nodes. Note that the file permission must be 640 (see the command below). The file path is later passed to the container as an environment variable.

This configuration file declares the NPU cluster; it records each node's IP address and the IP addresses of that node's NPU cards (reference link).

  • "server_count": "4",节点数量4,因为本次使用的服务器数量为4
  • 其余配置项为IP地址
  • ds-3节点设置为了master节点,必须把它配置文件放到最上面
[root@ds-3 ~]# ll /data/rank_table_full.json
-rw-r----- 1 root root 3.3K Feb 20 21:31 /data/rank_table_full.json
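
If the file was created with different permissions, they can be corrected directly (nothing assumed beyond the 640 requirement stated above):

chmod 640 /data/rank_table_full.json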

The file content is as follows:

{
    "version": "1.0",
    "server_count": "4",
    "server_list": [
        {
            "server_id": "10.82.27.3",
            "container_ip": "10.82.27.3",
            "device": [
                { "device_id": "0", "device_ip": "10.82.29.17", "rank_id": "0" },
                { "device_id": "1", "device_ip": "10.82.29.18", "rank_id": "1" },
                { "device_id": "2", "device_ip": "10.82.29.19", "rank_id": "2" },
                { "device_id": "3", "device_ip": "10.82.29.20", "rank_id": "3" },
                { "device_id": "4", "device_ip": "10.82.29.21", "rank_id": "4" },
                { "device_id": "5", "device_ip": "10.82.29.22", "rank_id": "5" },
                { "device_id": "6", "device_ip": "10.82.29.23", "rank_id": "6" },
                { "device_id": "7", "device_ip": "10.82.29.24", "rank_id": "7" }
            ]
        },
        {
            "server_id": "10.82.27.4",
            "container_ip": "10.82.27.4",
            "device": [
                { "device_id": "0", "device_ip": "10.82.29.25", "rank_id": "8" },
                { "device_id": "1", "device_ip": "10.82.29.26", "rank_id": "9" },
                { "device_id": "2", "device_ip": "10.82.29.27", "rank_id": "10" },
                { "device_id": "3", "device_ip": "10.82.29.28", "rank_id": "11" },
                { "device_id": "4", "device_ip": "10.82.29.29", "rank_id": "12" },
                { "device_id": "5", "device_ip": "10.82.29.30", "rank_id": "13" },
                { "device_id": "6", "device_ip": "10.82.29.31", "rank_id": "14" },
                { "device_id": "7", "device_ip": "10.82.29.32", "rank_id": "15" }
            ]
        },
        {
            "server_id": "10.82.27.5",
            "container_ip": "10.82.27.5",
            "device": [
                { "device_id": "0", "device_ip": "10.82.29.33", "rank_id": "16" },
                { "device_id": "1", "device_ip": "10.82.29.34", "rank_id": "17" },
                { "device_id": "2", "device_ip": "10.82.29.35", "rank_id": "18" },
                { "device_id": "3", "device_ip": "10.82.29.36", "rank_id": "19" },
                { "device_id": "4", "device_ip": "10.82.29.37", "rank_id": "20" },
                { "device_id": "5", "device_ip": "10.82.29.38", "rank_id": "21" },
                { "device_id": "6", "device_ip": "10.82.29.39", "rank_id": "22" },
                { "device_id": "7", "device_ip": "10.82.29.40", "rank_id": "23" }
            ]
        },
        {
            "server_id": "10.82.27.6",
            "container_ip": "10.82.27.6",
            "device": [
                { "device_id": "0", "device_ip": "10.82.29.41", "rank_id": "24" },
                { "device_id": "1", "device_ip": "10.82.29.42", "rank_id": "25" },
                { "device_id": "2", "device_ip": "10.82.29.43", "rank_id": "26" },
                { "device_id": "3", "device_ip": "10.82.29.44", "rank_id": "27" },
                { "device_id": "4", "device_ip": "10.82.29.45", "rank_id": "28" },
                { "device_id": "5", "device_ip": "10.82.29.46", "rank_id": "29" },
                { "device_id": "6", "device_ip": "10.82.29.47", "rank_id": "30" },
                { "device_id": "7", "device_ip": "10.82.29.48", "rank_id": "31" }
            ]
        }
    ],
    "status": "completed"
}
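
Before starting the containers it is worth making sure the file is syntactically valid JSON; a simple check using Python's built-in json module (assuming python3 is available on the host):

python3 -m json.tool /data/rank_table_full.json > /dev/null && echo "rank_table_full.json is valid JSON"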

4 Preparing the Model Files

The model files are also called weights. Two example download locations:

  • International download: https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main

  • Download within China: https://modelers.cn/spaces/State_Cloud/DeepSeek-R1/tree/main

1. Download the model files

mkdir -p /data; cd /data

# First make sure git-lfs is installed on the host (https://git-lfs.com, https://github.com/git-lfs/git-lfs)
git lfs install

# The files total about 1.3 TB, so the download takes a long time
git clone https://modelers.cn/State_Cloud/DeepSeek-R1.git

# Rename the cloned directory so it matches the paths used by the conversion step below
mv DeepSeek-R1 DeepSeek-R1-origin

# Inspect the downloaded model
ll /data/DeepSeek-R1-origin

2. Convert the original model files to BF16 (NPU-side weight conversion). The current NPU conversion script does not copy the tokenizer, config, and other auxiliary files from the source path, so they must be copied manually into the bf16 weight directory (see the sketch after the conversion commands).

git clone https://gitee.com/ascend/ModelZoo-PyTorch.git

cd ModelZoo-PyTorch/MindIE/LLM/DeepSeek/DeepSeek-V2/NPU_inference

# Conversion time depends on the server's CPU; roughly 1 hour
python fp8_cast_bf16.py --input-fp8-hf-path /data/DeepSeek-R1-origin --output-bf16-hf-path /data/DeepSeek-R1-bf16
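
A sketch of copying the auxiliary files mentioned above into the BF16 directory. The file names are assumed from the upstream DeepSeek-R1 repository; do not overwrite model.safetensors.index.json, which the conversion script generates itself:

# Copy tokenizer and config files that the conversion script does not carry over
cd /data/DeepSeek-R1-origin
cp config.json configuration_deepseek.py generation_config.json \
   modeling_deepseek.py tokenizer.json tokenizer_config.json \
   /data/DeepSeek-R1-bf16/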

3. On every node, check the converted weight directory on the host; it totals about 1.3 TB. Note that the model files must be set to permission 750, otherwise starting the model will fail later (see the snippet below).

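A minimal example of setting the required permissions and checking the size (paths as used above):

# The whole weight directory must have permission 750
chmod -R 750 /data/DeepSeek-R1-bf16
# The converted weights should total roughly 1.3 TB
du -sh /data/DeepSeek-R1-bf16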

5 Running the Docker Container

5.1 Creating the Container

1. The official service tests and model tests can be skipped. The Docker image is hosted in the Huawei Ascend community; downloading it requires access permission, and it is about 14 GB.

Download page: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f

# Log in as root on the host running the container engine, obtain the login command, and run it on every node
docker login -u XXXXXXX swr.cn-south-1.myhuaweicloud.com

# Enter the password
XXXXXXXXXXXXXX

# Pull the image
docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts

2. Run the container on all nodes (a startup script can also be used). The model files and configuration files for this deployment are all under /data, so /data on the host is mounted to /data in the container.

Environment variable reference: https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/MindIE/LLM/DeepSeek/DeepSeek-R1

#!/bin/bash
docker run -itd \
  --privileged \
  --name=DeepSeek-R1-full \
  --net=host \
  --shm-size 500g \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --device=/dev/davinci4 \
  --device=/dev/davinci5 \
  --device=/dev/davinci6 \
  --device=/dev/davinci7 \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /usr/local/sbin:/usr/local/sbin \
  -v /etc/hccn.conf:/etc/hccn.conf \
  -v /data:/data \
  -e ATB_LLM_HCCL_ENABLE=1 \
  -e ATB_LLM_COMM_BACKEND="hccl" \
  -e HCCL_CONNECT_TIMEOUT=7200 \
  -e WORLD_SIZE=32 \
  -e HCCL_EXEC_TIMEOUT=0 \
  -e PYTORCH_NPU_ALLOC_CONF=expandable_segments:True \
  -e RANKTABLEFILE=/data/rank_table_full.json \
  -e MIES_CONTAINER_IP=`hostname -I |awk '{print $1}'` \
  -e OMP_NUM_THREADS=1 \
  -e NPU_MEMORY_FRACTION=0.95 \
  swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts \
  bash
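
Once the containers are up, a quick sanity check is to confirm that the NPUs are visible from inside the container (container name as defined in the script above):

docker ps --filter name=DeepSeek-R1-full
docker exec DeepSeek-R1-full npu-smi info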

5.2 Modifying the MindIE Configuration File

Reference link 1, reference link 2

1. On all nodes, enter the container first, then edit the MindIE Service configuration file (this is the service config referenced again in section 8, not the benchmark config):

docker exec -it DeepSeek-R1-full bash

vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json

Because ds-3 is the master node, "ipAddress" in the configuration file on every node is set to "10.82.27.3".

Modify the file on all nodes as follows:

{
    "Version" : "1.0.0",
    "LogConfig" :
    {
        "logLevel" : "Info",
        "logFileSize" : 20,
        "logFileNum" : 20,
        "logPath" : "logs/mindie-server.log"
    },

    "ServerConfig" :
    {
        "ipAddress" : "10.82.27.3",
        "managementIpAddress" : "10.82.27.3",
        "port" : 1025,
        "managementPort" : 1026,
        "metricsPort" : 1027,
        "allowAllZeroIpListening" : false,
        "maxLinkNum" : 300,
        "httpsEnabled" : false,
        "fullTextEnabled" : false,
        "tlsCaPath" : "security/ca/",
        "tlsCaFile" : ["ca.pem"],
        "tlsCert" : "security/certs/server.pem",
        "tlsPk" : "security/keys/server.key.pem",
        "tlsPkPwd" : "security/pass/key_pwd.txt",
        "tlsCrlPath" : "security/certs/",
        "tlsCrlFiles" : ["server_crl.pem"],
        "managementTlsCaFile" : ["management_ca.pem"],
        "managementTlsCert" : "security/certs/management/server.pem",
        "managementTlsPk" : "security/keys/management/server.key.pem",
        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath" : "security/management/certs/",
        "managementTlsCrlFiles" : ["server_crl.pem"],
        "kmcKsfMaster" : "tools/pmt/master/ksfa",
        "kmcKsfStandby" : "tools/pmt/standby/ksfb",
        "inferMode" : "standard",
        "interCommTLSEnabled" : false,
        "interCommPort" : 1121,
        "interCommTlsCaPath" : "security/grpc/ca/",
        "interCommTlsCaFiles" : ["ca.pem"],
        "interCommTlsCert" : "security/grpc/certs/server.pem",
        "interCommPk" : "security/grpc/keys/server.key.pem",
        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath" : "security/grpc/certs/",
        "interCommTlsCrlFiles" : ["server_crl.pem"],
        "openAiSupport" : "vllm"
    },

    "BackendConfig" : {
        "backendName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
        "tokenizerProcessNumber" : 8,
        "multiNodesInferEnabled" : true,
        "multiNodesInferPort" : 1120,
        "interNodeTLSEnabled" : false,
        "interNodeTlsCaPath" : "security/grpc/ca/",
        "interNodeTlsCaFiles" : ["ca.pem"],
        "interNodeTlsCert" : "security/grpc/certs/server.pem",
        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath" : "security/grpc/certs/",
        "interNodeTlsCrlFiles" : ["server_crl.pem"],
        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
        "ModelDeployConfig" :
        {
            "maxSeqLen" : 24576,
            "maxInputTokenLen" : 16384,
            "truncation" : true,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "deepseekr1",
                    "modelWeightPath" : "/data/DeepSeek-R1-bf16",
                    "worldSize" : 8,
                    "cpuMemSize" : 5,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false
                }
            ]
        },

        "ScheduleConfig" :
        {
            "templateType" : "Standard",
            "templateName" : "Standard_LLM",
            "cacheBlockSize" : 128,

            "maxPrefillBatchSize" : 8,
            "maxPrefillTokens" : 16384,
            "prefillTimeMsPerReq" : 150,
            "prefillPolicyType" : 0,

            "decodeTimeMsPerReq" : 50,
            "decodePolicyType" : 0,

            "maxBatchSize" : 8,
            "maxIterTimes" : 8192,
            "maxPreemptCount" : 0,
            "supportSelectBatch" : false,
            "maxQueueDelayMicroseconds" : 5000
        }
    }
}
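
MindIE is strict about file permissions in general (see section 8); if the daemon complains about the configuration file's mode, tightening it to 640, like the rank table, may help. A hedged example, not an official step:

chmod 640 /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json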

6 Starting the Model

Note: this step is where errors are most likely; troubleshoot carefully if anything goes wrong.

1. Enter the container on all nodes: docker exec -it DeepSeek-R1-full bash

Run the following commands to start the model. Startup usually completes within 1 hour; if it has not succeeded within 2 hours, it times out and exits automatically.

cd /usr/local/Ascend/mindie/latest/mindie-service/
./bin/mindieservice_daemon
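
The daemon can also be started in the background so the shell can be detached; a small variation of the commands above (the log path here is an arbitrary choice):

cd /usr/local/Ascend/mindie/latest/mindie-service/
nohup ./bin/mindieservice_daemon > /tmp/mindie_daemon.log 2>&1 &
# Watch the log until it prints "Daemon start success!"
tail -f /tmp/mindie_daemon.log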

2. Watch the log output during startup

Partial startup log output from the master node (screenshot in the original post).

When the last line of output on every node is "Daemon start success!", the model is running successfully.

Partial startup log output from the other nodes (screenshot in the original post).

3. On all nodes the HBM usage climbs gradually. If it has not reached about 95% of total HBM after 40 minutes, or stops updating for a long time, something has most likely gone wrong.

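One simple way to watch the HBM usage climb on each host (plain watch plus npu-smi, nothing deployment-specific):

# Refresh the NPU status every 10 seconds and watch the HBM-Usage column
watch -n 10 npu-smi info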

7 Verification

1. On the master node ds-3, check that port 1025 is listening:

[root@ds-3 ~]# ss -antp | grep 1025
LISTEN 0      5               10.82.27.3:1025             0.0.0.0:*     users:(("mindieservice_d",pid=185939,fd=41))
ESTAB  0      0               10.82.27.3:1025          10.82.0.31:38682 users:(("mindieservice_d",pid=185939,fd=46))
ESTAB  0      0               10.82.27.3:1025          10.82.0.31:33536 users:(("mindieservice_d",pid=185939,fd=3))

2. Run an API test; the model's answer to the question is returned

[root@ds-3 ~]# curl 10.82.27.3:1025/generate -X POST -d '{"inputs":"题目:糖果的数量,小明、小红和小刚一共有24颗糖果。已知:小明比小红多2颗糖果;小红比小刚少4颗糖果。问题:小明、小红和小刚分别有多少颗糖果?","parameters":{"max_new_tokens":500},"temperature":0.3, "top_p":0.3, "top_k":5, "do_sample":true, "repetition_penalty":1.05, "seed":128}'
{"generated_text":"答案:小明有8颗,小红有6颗,小刚有10颗。解析:设小红有x颗糖果,则小明有x+2颗,小刚有x+4颗。根据总数为24颗,建立方程x + (x+2) + (x+4) = 24,解得x=6。因此,小明8颗,小红6颗,小刚10颗。<|end▁of▁sentence|>"}[root@ds-3 ~]# 


8 Common Problems

1 Permission problems

  • Make sure the weight path is usable; fix the permissions with chmod 750 -R /data/DeepSeek-R1-bf16 (note that the entire parent directory tree needs this)
  • rank_table.json: the NPU cluster configuration file must have permission 640

2 Incorrect IP address configuration

  • In /usr/local/Ascend/mindie/2.0.T3/mindie-service/conf/config.json inside the container, the master IP must be set to the ds-3 node's IP on every node

  • The NPU card IP addresses and their order in rank_table.json do not match, i.e. the IPs actually configured on the NPU cards do not correspond to the configuration file

3 Model loading timeout

The model start reports no error, but the progress never advances until it times out and exits. A likely cause is that the NPU TLS verification was not disabled.

4 Dropped cards

An NPU card on a node may drop offline (its indicator light goes dark); use ping to detect this:

# Change the range to match the node's NPU card IP addresses
for i in {17..24}; do hccn_tool -i 0 -ping -g address 10.82.29.$i pkt 3; done
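
Besides ping, each card's link state can be queried with hccn_tool; a convenience loop (the -link -g query is assumed to be available in your hccn_tool version):

# Print the link state (UP/DOWN) of every NPU NIC on this node
for i in {0..7}; do echo -n "NPU $i: "; hccn_tool -i $i -link -g; done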

9 Performance Testing

9.1 Preparation Before Testing

1. Run one container on any single node

#!/bin/bash
docker run -itd \
  --privileged \
  --name=mindie_benchmark \
  --net=host \
  --shm-size 500g \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --device=/dev/davinci4 \
  --device=/dev/davinci5 \
  --device=/dev/davinci6 \
  --device=/dev/davinci7 \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /usr/local/sbin:/usr/local/sbin \
  -v /etc/hccn.conf:/etc/hccn.conf \
  -v /data:/data \
  -e ATB_LLM_HCCL_ENABLE=1 \
  -e ATB_LLM_COMM_BACKEND="hccl" \
  -e HCCL_CONNECT_TIMEOUT=7200 \
  -e WORLD_SIZE=32 \
  -e HCCL_EXEC_TIMEOUT=0 \
  -e PYTORCH_NPU_ALLOC_CONF=expandable_segments:True \
  -e RANKTABLEFILE=/data/rank_table_full.json \
  -e MIES_CONTAINER_IP=`hostname -I |awk '{print $1}'` \
  -e OMP_NUM_THREADS=1 \
  -e NPU_MEMORY_FRACTION=0.95 \
  swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts \
  bash

2. Enter the container, modify the files below, and set their permission to 640

chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/config.json
chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
chmod 640 /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json

9.2 Running the Tests

Three benchmark invocations are provided below. The key parameters are:

  • Concurrency: the number of simulated concurrent clients
  • MaxOutputLen: the maximum number of output tokens per request
benchmark \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/data/DeepSeek-R1-bf16/" \
--TestType vllm_client \
--Http http://10.82.27.3:1025 \
--ManagementHttp http://10.82.27.3:1026 \
--Concurrency 128 \
--MaxOutputLen 20 \
--TaskKind stream \
--Tokenizer True \
--SyntheticConfigPath /data/Benchmark/synthetic_config.json

benchmark \
--DatasetPath "/data/Benchmark/synthetic_config.json" \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/data/DeepSeek-R1-bf16/" \
--TaskKind stream \
--Concurrency 1 \
--MaxOutputLen 100 \
--TestType openai \
--Http http://10.82.27.3:1025 

benchmark \
--DatasetPath "/data/Benchmark/synthetic_config.json" \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/data/DeepSeek-R1-bf16/" \
--TaskKind stream \
--Concurrency 100 \
--MaxOutputLen 100 \
--TestType openai \
--Http http://10.82.27.3:1025 

2. View the test results

[root@ds-3 ~]# cd /root/mindie/log/debug
[root@ds-3 debug]# ll
total 136
drwxr-x--- 3 root root  4096 Feb 24 15:57 instance
-r--r----- 1 root root 11883 Feb 21 16:09 mindie-benchmark_1059_20250221160904.log
-r--r----- 1 root root 11879 Feb 21 16:16 mindie-benchmark_1266_20250221161610.log
-r--r----- 1 root root 11880 Feb 21 16:21 mindie-benchmark_1475_20250221162121.log
-r--r----- 1 root root 11847 Feb 24 15:14 mindie-benchmark_1745_20250224151403.log
-r--r----- 1 root root 11865 Feb 24 15:39 mindie-benchmark_1951_20250224153930.log
-r--r----- 1 root root 11987 Feb 24 15:57 mindie-benchmark_2218_20250224155759.log
-r--r----- 1 root root  1601 Feb 21 15:12 mindie-benchmark_473_20250221151213.log
-r--r----- 1 root root 11936 Feb 21 15:38 mindie-benchmark_496_20250221153833.log
-r--r----- 1 root root 11864 Feb 21 15:54 mindie-benchmark_786_20250221155448.log
-r--r----- 1 root root   668 Feb 21 16:07 mindie-client_1059_20250221160904.log
-r--r----- 1 root root   668 Feb 21 16:14 mindie-client_1266_20250221161610.log
-r--r----- 1 root root   668 Feb 21 16:19 mindie-client_1475_20250221162121.log
-r--r----- 1 root root   668 Feb 24 15:12 mindie-client_1745_20250224151403.log
-r--r----- 1 root root   668 Feb 24 15:37 mindie-client_1951_20250224153930.log
-r--r----- 1 root root   668 Feb 24 15:47 mindie-client_2218_20250224155759.log
-r--r----- 1 root root   665 Feb 21 15:27 mindie-client_496_20250221153833.log
-r--r----- 1 root root   665 Feb 21 15:52 mindie-client_786_20250221155448.l
[2025-02-24 15:57:54.612+08:00] [2218] [281473615305280] [benchmark] [INFO] [output.py:115] 
The BenchMark test performance metric result is:
+---------------------+-----------------+-----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----+
|              Metric |         average |             max |            min |             P75 |             P90 |         SLO_P90 |             P99 |   N |
+---------------------+-----------------+-----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----+
|      FirstTokenTime |     135.4161 ms |     191.7768 ms |     83.1695 ms |     163.2805 ms |     173.1221 ms |     173.1221 ms |     190.0502 ms | 100 |
|          DecodeTime |      67.9848 ms |    6735.3363 ms |     59.0086 ms |      69.3274 ms |      72.6631 ms |      70.7305 ms |      80.2135 ms | 100 |
|      LastDecodeTime |      68.9629 ms |      77.6713 ms |     62.8839 ms |      71.9116 ms |      74.4654 ms |      74.4654 ms |      77.4178 ms | 100 |
|       MaxDecodeTime |     169.5477 ms |    6735.3363 ms |     71.0449 ms |      81.7754 ms |      85.9031 ms |      85.9031 ms |     728.0497 ms | 100 |
|        GenerateTime |    6480.2226 ms |   12720.5217 ms |   4565.1791 ms |    6823.2514 ms |    7043.4134 ms |    7043.4134 ms |    7397.2595 ms | 100 |
|         InputTokens |          100.06 |             195 |              4 |          148.75 |           176.3 |           176.3 |          194.01 | 100 |
|     GeneratedTokens |           94.31 |             100 |             68 |           100.0 |           100.0 |           100.0 |           100.0 | 100 |
| GeneratedTokenSpeed | 14.6433 token/s | 15.5779 token/s | 7.0752 token/s | 15.0536 token/s | 15.2666 token/s | 15.2666 token/s | 15.4018 token/s | 100 |
| GeneratedCharacters |          375.64 |             483 |            207 |           421.5 |           451.0 |           451.0 |          469.14 | 100 |
|           Tokenizer |       1.0166 ms |        23.54 ms |      0.2642 ms |       0.8698 ms |       1.0637 ms |       1.0637 ms |       6.1662 ms | 100 |
|         Detokenizer |       1.0548 ms |       1.3294 ms |      0.7761 ms |       1.1096 ms |        1.121 ms |        1.121 ms |       1.3171 ms | 100 |
|  CharactersPerToken |           3.983 |               / |              / |               / |               / |               / |               / | 100 |
|  PostProcessingTime |            0 ms |            0 ms |           0 ms |            0 ms |            0 ms |            0 ms |            0 ms | 100 |
|         ForwardTime |            0 ms |            0 ms |           0 ms |            0 ms |            0 ms |            0 ms |            0 ms | 100 |
+---------------------+-----------------+-----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----+
[2025-02-24 15:57:54.613+08:00] [2218] [281473615305280] [benchmark] [INFO] [output.py:121] 
The BenchMark test common metric result is:
+------------------------+---------------------------------------+
|          Common Metric |                                 Value |
+------------------------+---------------------------------------+
|            CurrentTime |                   2025-02-24 15:57:54 |
|            TimeElapsed |                            648.1494 s |
|             DataSource | /data/Benchmark/synthetic_config.json |
|                 Failed |                             0( 0.0% ) |
|               Returned |                         100( 100.0% ) |
|                  Total |                         100[ 100.0% ] |
|            Concurrency |                                     1 |
|              ModelName |                            deepseekr1 |
|                   lpct |                             1.3533 ms |
|             Throughput |                          0.1543 req/s |
|          GenerateSpeed |                       14.5507 token/s |
| GenerateSpeedPerClient |                       14.5507 token/s |
|               accuracy |                                     / |
+------------------------+---------------------------------------+



