Running an Inference Test Inside a Container in EP (Endpoint) Mode
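The software and hardware environment used in this article:
Machine: one x86 desktop PC
OS: Ubuntu 20.04 LTS (kernel 5.4.0-26-generic)
Inference card: DLAP200-HP-2 (an ADLINK inference card built from two Atlas 200 modules)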
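1. Installing the inference card firmware and driver
The ADLINK card has the same architecture and implementation as Huawei's A300-3010 inference card, so follow the A300-3010 firmware and driver installation procedure on Huawei's official website. After a successful installation, npu-smi info shows the following: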
HwHiAiUser@ChengMing-3900:~$ npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2                                Version: 23.0.rc2                                      |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===============================+=================+======================================================+
| 0       310                   | OK              | 12.8         48                0    / 969            |
| 0       0                     | 0000:03:00.0    | 0            587  / 7759                             |
+===============================+=================+======================================================+
| 1       310                   | OK              | 12.8         47                0    / 969            |
| 0       1                     | 0000:04:00.0    | 0            573  / 7759                             |
+===============================+=================+======================================================+
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
+===============================+=================+======================================================+
| No running processes found in NPU 0                                                                    |
+===============================+=================+======================================================+
| No running processes found in NPU 1                                                                    |
+===============================+=================+======================================================+
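Before moving on, it can help to confirm that the driver also created the device nodes the container will need. Below is a minimal POSIX-shell sketch, assuming the node names passed with --device in the run script of step 4 (davinci<N>, davinci_manager, devmm_svm, hisi_hdc); list_chip_nodes and check_npu_devices are hypothetical helpers, not tools shipped with the driver.

```shell
#!/bin/sh
# Sketch: check that the NPU device nodes used by run_docker.sh exist.
# Assumption: node names as passed with --device in that script; the
# directory is a parameter (normally /dev) so the check can be tested.

# Print the chip nodes (davinci0, davinci1, ...) under DIR, one per line.
list_chip_nodes() {
    dir=$1
    for node in "$dir"/davinci[0-9]*; do
        # If the glob matches nothing it stays literal, so test existence.
        [ -e "$node" ] && basename "$node"
    done
}

# Succeed only if the control nodes and at least one chip node exist.
check_npu_devices() {
    dir=$1
    missing=0
    for name in davinci_manager devmm_svm hisi_hdc; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    if [ -z "$(list_chip_nodes "$dir")" ]; then
        echo "no davinci<N> chip nodes under $dir" >&2
        missing=1
    fi
    return "$missing"
}
```

On the host, check_npu_devices /dev should succeed without messages, and list_chip_nodes /dev should presumably print one davinci<N> per chip listed by npu-smi info (davinci0 and davinci1 here).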
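2. Installing Docker
Run the following commands to install Docker and add HwHiAiUser to the docker group. The docker.io package usually creates the docker group already, in which case groupadd only reports that it exists. The new group membership takes effect the next time HwHiAiUser logs in.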
HwHiAiUser@ChengMing-3900:~$ sudo apt install docker.io
HwHiAiUser@ChengMing-3900:~$ sudo groupadd docker
HwHiAiUser@ChengMing-3900:~$ sudo usermod -aG docker HwHiAiUser
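Since usermod only changes the group database, a shell that is already logged in does not see the new docker group. A minimal sketch that checks both places (list_has_group and docker_group_status are hypothetical helper names):

```shell
#!/bin/sh
# Sketch: check docker group membership for a user.
# usermod -aG changes the group database at once, but a shell that was
# already logged in keeps its old group list until the next login.

# Succeed if GROUP appears in the space-separated LIST of group names.
list_has_group() {
    for g in $1; do          # unquoted on purpose: split LIST into words
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Report membership in the group database and in the current shell.
docker_group_status() {
    user=$1
    if list_has_group "$(id -nG "$user")" docker; then
        echo "database: $user is in the docker group"
    else
        echo "database: $user is not in the docker group"
    fi
    if list_has_group "$(id -nG)" docker; then
        echo "session: docker group is active"
    else
        echo "session: not active yet (log in again or run newgrp docker)"
    fi
}
```

For example, run docker_group_status HwHiAiUser. Until the session line reports the group as active, docker commands from that shell still need sudo.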
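3. Getting Huawei's inference image
Log in to Huawei's official image repository at https://www.hiascend.com/developer/ascendhub.
Under the inference images, select infer-modelzoo.
Choose the image version that matches the version reported by npu-smi info. My driver version is 23.0.RC2, so I chose 23.0.RC2-mxvision and clicked Download Now, which asks for your Huawei account and password.
Then follow the download steps shown in the pop-up. The login command there contains a user name generated for your own account, so use yours rather than the one below.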
HwHiAiUser@ChengMing-3900:~$ sudo docker login -u cn-south-1@H2W7IKXWB30I9YP30X8A swr.cn-south-1.myhuaweicloud.com
Enter the password when prompted.
HwHiAiUser@ChengMing-3900:~$ sudo docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/infer-modelzoo:23.0.RC2-mxvision-x86
When the download finishes, run the following command to list the images:
HwHiAiUser@ChengMing-3900:~$ sudo docker images
[sudo] password for HwHiAiUser:
REPOSITORY                                                  TAG                     IMAGE ID       CREATED         SIZE
swr.cn-south-1.myhuaweicloud.com/ascendhub/infer-modelzoo   23.0.RC2-mxvision-x86   6a41f21ad7cc   10 months ago   6.28GB
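The version matching in this step can also be scripted. The sketch below reads the Version: field from npu-smi info and builds the image reference. It assumes the tag pattern used in this article (the version in upper case followed by -mxvision-x86); other driver versions or a non-x86 host may use different tags, so check the ascendhub page first. npu_version and image_ref are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: build the infer-modelzoo image reference from npu-smi info.
# Assumption: tags follow the pattern used in this article,
# <upper-case version>-mxvision-x86 (e.g. 23.0.RC2-mxvision-x86).
REGISTRY=swr.cn-south-1.myhuaweicloud.com/ascendhub/infer-modelzoo

# Read npu-smi info output on stdin; print the value after "Version:".
npu_version() {
    sed -n 's/.*Version:[[:space:]]*\([0-9A-Za-z.]*\).*/\1/p' | head -n 1
}

# Turn a version such as 23.0.rc2 into a full image reference.
image_ref() {
    tag=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf '%s:%s-mxvision-x86\n' "$REGISTRY" "$tag"
}
```

With the helpers defined, the pull command becomes: sudo docker pull "$(image_ref "$(npu-smi info | npu_version)")".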
4. Running the container
Based on the container start command in the image description, write a launch script.
HwHiAiUser@ChengMing-3900:~$ vim run_docker.sh
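Enter the following content:
#!/bin/bash
# Start an interactive container with the NPU device nodes and host driver files mapped in.
docker run -it \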
-u root \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /var/log/npu:/var/log/npu \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/slog:/usr/slog \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/tools/:/usr/local/Ascend/driver/tools/ \
-v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
-v /data:/data \
swr.cn-south-1.myhuaweicloud.com/ascendhub/infer-modelzoo:23.0.RC2-mxvision-x86 \
/bin/bash
Make the script executable:
HwHiAiUser@ChengMing-3900:~$ chmod +x ./run_docker.sh
Start the container:
HwHiAiUser@ChengMing-3900:~$ ./run_docker.sh
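run_docker.sh hard-codes davinci0 and davinci1, one per chip on this card. As a sketch, assuming the same control nodes as in that script, the --device options could instead be generated from the chip nodes that actually exist; device_args is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: generate the --device options for docker run from the chip
# nodes present, instead of hard-coding davinci0 and davinci1.
# The control nodes are the ones listed in run_docker.sh; they are always
# emitted so that docker run itself reports any that are missing.
device_args() {
    dev_dir=${1:-/dev}
    for node in "$dev_dir"/davinci[0-9]*; do
        [ -e "$node" ] && printf '%s ' "--device=$node"
    done
    for name in davinci_manager devmm_svm hisi_hdc; do
        printf '%s ' "--device=$dev_dir/$name"
    done
    echo
}
```

Defined at the top of run_docker.sh, it would replace the five --device lines with $(device_args /dev), left unquoted so the shell splits it into separate options.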
As root, the container can access the hardware:
root@4a7c46a13bdf:/home/HwHiAiUser# npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2                                Version: 23.0.rc2                                      |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===============================+=================+======================================================+
| 0       310                   | OK              | 12.8         48                0    / 969            |
| 0       0                     | 0000:03:00.0    | 0            587  / 7759                             |
+===============================+=================+======================================================+
| 1       310                   | OK              | 12.8         48                0    / 969            |
| 0       1                     | 0000:04:00.0    | 0            573  / 7759                             |
+===============================+=================+======================================================+
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
+===============================+=================+======================================================+
| No running processes found in NPU 0                                                                    |
+===============================+=================+======================================================+
| No running processes found in NPU 1                                                                    |
+===============================+=================+======================================================+
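Switch to the HwHiAiUser user and access the device again. If the error below appears, it is probably because HwHiAiUser inside the container has a different user ID and group ID from HwHiAiUser on the host. File permissions are checked by numeric ID, so the container user's UID and GID must be changed to match the host.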
root@4a7c46a13bdf:/home/HwHiAiUser# su HwHiAiUser
HwHiAiUser@4a7c46a13bdf:~$ npu-smi info
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
dcmi module initialize failed. ret is -8005
Exit the container and check the UID and GID of HwHiAiUser on the host:
HwHiAiUser@ChengMing-3900:~$ id HwHiAiUser
uid=998(HwHiAiUser) gid=1001(HwHiAiUser) groups=1001(HwHiAiUser),4(adm),27(sudo),134(docker)
Start a container again:
HwHiAiUser@ChengMing-3900:~$ ./run_docker.sh
Change the UID and GID of HwHiAiUser in the container to match the host:
root@ca7c521c074d:/home/HwHiAiUser# id HwHiAiUser  # check the UID and GID of HwHiAiUser in the container
uid=1000(HwHiAiUser) gid=1000(HwHiAiUser) groups=1000(HwHiAiUser)
root@ca7c521c074d:/home/HwHiAiUser# usermod -u 998 HwHiAiUser  # change the UID
root@ca7c521c074d:/home/HwHiAiUser# groupmod -g 1001 HwHiAiUser  # change the GID
Switch to HwHiAiUser again:
root@ca7c521c074d:/home/HwHiAiUser# su HwHiAiUser
Run npu-smi info in the container again to confirm that the hardware is now accessible:
HwHiAiUser@ca7c521c074d:~$ npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2                                Version: 23.0.rc2                                      |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===============================+=================+======================================================+
| 0       310                   | OK              | 12.8         48                0    / 969            |
| 0       0                     | 0000:03:00.0    | 0            587  / 7759                             |
+===============================+=================+======================================================+
| 1       310                   | OK              | 12.8         48                0    / 969            |
| 0       1                     | 0000:04:00.0    | 0            573  / 7759                             |
+===============================+=================+======================================================+
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
+===============================+=================+======================================================+
| No running processes found in NPU 0                                                                    |
+===============================+=================+======================================================+
| No running processes found in NPU 1                                                                    |
+===============================+=================+======================================================+
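The ID alignment done in this step can be scripted as well. The sketch below only prints the usermod/groupmod commands that are still needed; it assumes the host IDs are passed in by hand (998 and 1001 in this article), and id_sync_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the commands that give USER in the container the same
# UID/GID as on the host. The host IDs come from running id -u and id -g
# for that user on the host (998 and 1001 in this article).
id_sync_cmds() {
    user=$1
    host_uid=$2
    host_gid=$3
    if [ "$(id -u "$user")" != "$host_uid" ]; then
        echo "usermod -u $host_uid $user"
    fi
    # groupmod needs the name of the user's primary group.
    if [ "$(id -g "$user")" != "$host_gid" ]; then
        echo "groupmod -g $host_gid $(id -gn "$user")"
    fi
}
```

Inside the container as root, id_sync_cmds HwHiAiUser 998 1001 | sh applies them. The change lives only in that container: each ./run_docker.sh creates a fresh container from the image, so either reuse the container with docker start -ai <container ID> or save the modified container as a new image with docker commit.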
5. Running the inference test program
Switch to the HwHiAiUser user:
root@ca7c521c074d:/home/HwHiAiUser# su HwHiAiUser
Run the test program:
HwHiAiUser@ca7c521c074d:~$ bash test_model.sh
Begin to initialize Log.
The output directory of logs file doesn't exist.
Create directory to save logs information.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20240618 02:51:41.334579 59 FileUtils.cpp:330] The input file is empty
I20240618 02:51:41.334590 59 FileUtils.cpp:472] Check Other group permission: Current permission is 4, but required no greater than 0.
Save logs information to specified directory.
sdk run time: 6814
process img0: image_0051.jpg, infer result: {"MxpiClass":[{"classId":504,"className":" 504: 'coffee mug',","confidence":6.26953125},{"classId":968,"className":" 968: 'cup',","confidence":5.8203125},{"classId":901,"className":" 901: 'whiskey jug',","confidence":4.9453125},{"classId":725,"className":" 725: 'pitcher, ewer',","confidence":4.31640625},{"classId":505,"className":" 505: 'coffeepot',","confidence":4.16796875}]}
sdk run time: 6066
process img1: image_0019.jpg, infer result: {"MxpiClass":[{"classId":504,"className":" 504: 'coffee mug',","confidence":6.42578125},{"classId":968,"className":" 968: 'cup',","confidence":5.453125},{"classId":901,"className":" 901: 'whiskey jug',","confidence":4.79296875},{"classId":505,"className":" 505: 'coffeepot',","confidence":4.38671875},{"classId":550,"className":" 550: 'espresso maker',","confidence":4.015625}]}
sdk run time: 7644
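The log contains several sdk run time lines but does not state their unit. A short awk sketch to summarize them, assuming the output was saved with bash test_model.sh 2>&1 | tee infer.log (infer.log is a hypothetical file name; 2>&1 is there because the glog messages above go to STDERR). sdk_time_stats is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: summarize the "sdk run time: N" lines printed by test_model.sh.
# The log does not state a unit, so values are reported as printed.
sdk_time_stats() {
    awk -F': ' '
        /^sdk run time:/ {
            t = $2 + 0
            n++
            sum += t
            if (n == 1 || t > max) max = t
        }
        END {
            if (n > 0) printf "runs=%d avg=%.1f max=%d\n", n, sum / n, max
        }'
}
```

Running sdk_time_stats < infer.log prints the number of runs, their mean and their maximum.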
If the terminal prints inference results in the following format, inference ran successfully:
infer result: {"MxpiClass":[{"classId":504,"className":" 504: 'coffee mug',","confidence":6.26953125},{"classId":968,"className":" 968: 'cup',","confidence":5.8203125},{"classId":901,"className":" 901: 'whiskey jug',","confidence":4.9453125},{"classId":725,"className":" 725: 'pitcher, ewer',","confidence":4.31640625},{"classId":505,"className":" 505: 'coffeepot',","confidence":4.16796875}]}
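To apply this success check to a whole run instead of reading the log by eye, a sed sketch can pull out each image name together with the first class in its result (the highest-confidence one in the output above). It pattern-matches the single-line format shown here rather than parsing JSON; top1_results is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: for every "process imgN: FILE, infer result: {...}" line, print
# the image name, classId and confidence of the first listed class.
# This matches the single-line format shown above with sed instead of a
# JSON parser, so it relies on className containing no double quotes.
top1_results() {
    sed -n 's/^process img[0-9]*: \([^,]*\), infer result: {"MxpiClass":\[{"classId":\([0-9]*\),"className":"[^"]*","confidence":\([0-9.]*\)}.*/\1 \2 \3/p'
}
```

Run it on a log saved with tee (for example a hypothetical infer.log) as top1_results < infer.log; empty output means no line matched the expected format.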