1. Install toolbox
Install the Ascend-cann-toolbox_x.x.x_linux-aarch64.run package that matches your device driver version. My driver version is 21.0.1, so I installed the following toolbox version:
HwHiAiUser@davinci-mini:~$ sudo ./Ascend-cann-toolbox_5.0.1_linux-aarch64.run --install
Verifying archive integrity... All good.
Uncompressing ascend-cann-toolbox 100%
[Toolbox] [20210806-07:22:37] [INFO] create /var/log/ascend_seclog/ascend_toolbox_install.log success
[Toolbox] [20210806-07:22:37] [INFO] LogFile /var/log/ascend_seclog/ascend_toolbox_install.log
[Toolbox] [20210806-07:22:37] [INFO] install start
[Toolbox] [20210806-07:22:37] [INFO] The install path is /usr/local/Ascend/toolbox/5.0.1 !
[Toolbox] [20210806-07:22:38] [WARNING] environment is neither inference nor training
[Toolbox] [20210806-07:22:38] [INFO] add latest symbol link:ln -s 5.0.1 /usr/local/Ascend/toolbox/latest
[Toolbox] [20210806-07:22:38] [INFO] install Ascend-SDK-Manager.. /usr/local/Ascend/toolbox/5.0.1
[Toolbox] [20210806-07:22:38] [INFO] install Ascend-SDK-Manager /usr/local/Ascend/toolbox/5.0.1 success
[Toolbox] [20210806-07:22:45] [INFO] install Ascend-msInstaller.. /usr/local/Ascend/toolbox/5.0.1
[Toolbox] [20210806-07:22:45] [INFO] msInstaller tar file=:MsInstaller/msInstaller-1.0.0-arm64.tar.gz
[Toolbox] [20210806-07:22:47] [INFO] install Ascend-msInstaller.. /usr/local/Ascend/toolbox/5.0.1 success
install Ascend-DMI.. /usr/local/Ascend/toolbox/5.0.1
[Toolbox] [20210806-07:23:05] [INFO] install ascend-dmi cmd=AXESMI/Ascend-dmi-5.0.1-centos7.6-arm64.run --install --install-path=/usr/local/Ascend/toolbox/5.0.1 --install-username=HwHiAiUser --install-usergroup=HwHiAiUser
Verifying archive integrity... All good.
Uncompressing Ascend-DMI-Package 100%
--install-path=/usr/local/Ascend/toolbox/5.0.1
--install-username=HwHiAiUser
--install-usergroup=HwHiAiUser
./bin/ascend-dmi
/usr/local/Ascend/toolbox/5.0.1
INFO: your install path is /usr/local/Ascend/toolbox/5.0.1
INFO: install is success
[Toolbox] [20210806-07:23:06] [INFO] install docker plugin cmd=ascend-docker-plugin/Ascend-docker-runtime-5.0.1-aarch64.run
Verifying archive integrity... All good.
Uncompressing ascend-docker-runtime 100%
installing ascend docker runtime
install executable files success
/etc/docker/daemon.json
/etc/docker/daemon.json.29949
add
create damom.json success
please reboot docker daemon to take effect
[Toolbox] [20210806-07:23:08] [INFO] Please make sure that:
PATH includes :
/usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/bin:
LD_LIBRARY_PATH includes :
/usr/local/dcmi:
/usr/local/Ascend/driver/lib64:
/usr/local/Ascend/driver/lib64/driver:
/usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/lib64:
/usr/local/Ascend/nnae/latest/fwkacllib/lib64:
/usr/local/Ascend/nnrt/latest/acllib/lib64:
ASCEND_AICPU_PATH includes:
train:
/usr/local/Ascend/nnae/latest
infer:
/usr/local/Ascend/nnrt/latest
ASCEND_OPP_PATH includes:
train:
/usr/local/Ascend/nnae/latest/opp
infer:
/usr/local/Ascend/nnrt/latest/opp
[Toolbox] [20210806-07:23:08] [INFO] If your service is started using the shell script, you can call the /usr/local/Ascend/toolbox/set_env.sh script to configure environment variables. Note that this script can not be executed mannually.
[Toolbox] [20210806-07:23:08] [INFO] Ascend-cann-toolbox_5.0.1_linux-aarch64.run install success,The install path is /usr/local/Ascend !
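As a quick sanity check after the installer finishes, a small script can confirm that the paths reported in the log actually exist. The following is a minimal sketch, not part of the toolbox: the function name check_toolbox is hypothetical, and the layout it checks (a "latest" symlink pointing at the version directory, with Ascend-DMI/bin inside it) is taken from the install log above.

```shell
#!/bin/sh
# check_toolbox BASE VERSION
# Hypothetical helper: verifies the layout reported by the installer log.
check_toolbox() {
    base="$1"
    version="$2"
    # The log reports "ln -s <version> <base>/latest".
    if [ ! -L "$base/latest" ]; then
        echo "missing symlink: $base/latest" >&2
        return 1
    fi
    if [ "$(readlink "$base/latest")" != "$version" ]; then
        echo "latest does not point to $version" >&2
        return 1
    fi
    # The log asks for <base>/<version>/Ascend-DMI/bin to be on PATH.
    if [ ! -d "$base/$version/Ascend-DMI/bin" ]; then
        echo "missing directory: $base/$version/Ascend-DMI/bin" >&2
        return 1
    fi
    echo "toolbox $version looks installed under $base"
}
```

On the device from this walkthrough it would be called as: check_toolbox /usr/local/Ascend/toolbox 5.0.1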
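2. Configure environment variables (optional)
Configure the environment variables as prompted at the end of the installation. I recommend putting them in .bashrc in the root user's home directory.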
root@davinci-mini:/home/HwHiAiUser# vim /root/.bashrc
Append the following to the end of the file:
export PATH=/usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64:/usr/local/dcmi:/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/lib64:/usr/local/Ascend/nnae/latest/fwkacllib/lib64:/usr/local/Ascend/nnrt/latest/acllib/lib64:$LD_LIBRARY_PATH
export ASCEND_AICPU_PATH=/usr/local/Ascend/nnrt/latest
export ASCEND_OPP_PATH=/usr/local/Ascend/nnrt/latest/opp
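Because .bashrc is read by every new interactive shell, nested shells keep prepending the same entries to the variables above. One way to avoid the duplicates is a small helper that adds a directory only when it is not already present. This is a sketch of a generic shell technique; the name prepend_once is hypothetical and nothing here comes from the Ascend tooling.

```shell
#!/bin/sh
# prepend_once VALUE DIR
# Prints VALUE with DIR prepended, unless DIR is already one of its
# colon-separated components.
prepend_once() {
    case ":$1:" in
        *":$2:"*)
            # DIR already present: return VALUE unchanged.
            printf '%s\n' "$1"
            ;;
        *)
            if [ -n "$1" ]; then
                printf '%s\n' "$2:$1"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

# Example with one of the directories listed by the installer.
LD_LIBRARY_PATH=$(prepend_once "${LD_LIBRARY_PATH:-}" /usr/local/Ascend/driver/lib64)
export LD_LIBRARY_PATH
```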
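3. Build the image
- Upload the software packages listed in the following table to the same directory on the DLAP221 (for example, "/home/HwHiAiUser").
| Package | Description | How to obtain |
| --- | --- | --- |
| Ascend-cann-nnrt_{version}_linux-aarch64.run | Offline inference engine package. {version} is the package version. | Download link |
| Dockerfile | File required to build the image. | Prepared by the user according to their own workload. |
| ascend_install.info | Software package installation log file. | Copy "/etc/ascend_install.info" from the host; use the actual path on your system. |
| version.info | Driver version information file. | Copy "/var/davinci/driver/version.info" from the host; use the actual path on your system. |
| SDK inference program archive | Collection of inference programs; tar and tgz formats are supported. The archive format must be one that the compression tools shipped in the container can handle. | After installing the SDK development kit, convert the model files as described in the sample guide, place them in the models folder, then use the tar command to create the "SDK inference program archive". |
- Perform the following steps to prepare the Dockerfile.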
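a. Log in to the DLAP221 as the root user and run id HwHiAiUser to look up and record the UID and GID of the HwHiAiUser user on the host.
root@davinci-mini:/home/HwHiAiUser# id HwHiAiUser
uid=1000(HwHiAiUser) gid=1000(HwHiAiUser) groups=1000(HwHiAiUser)
b. Go to the directory where the packages were uploaded in step 1 and run the following command to create the Dockerfile (the example file name is "Dockerfile").
root@davinci-mini:/home/HwHiAiUser# vim Dockerfile
c. Write the following content, then run :wq to save it. (The content below is only an example; adapt it to your own setup.)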
# Operating system and version; change as needed
FROM ubuntu:18.04

# Build argument for the offline inference engine package
ARG NNRT_PKG

# Environment variables
ARG ASCEND_BASE=/usr/local/Ascend
ENV LD_LIBRARY_PATH=\
$LD_LIBRARY_PATH:\
$ASCEND_BASE/nnrt/latest/acllib/lib64:\
/usr/lib64
ENV ASCEND_AICPU_PATH=$ASCEND_BASE/nnrt/latest

# Install the third-party tools needed to run the SDK
RUN apt update && \
    apt install -y python3.7 python3.7-dev curl g++ pkg-config vim libblas3 \
    liblapack3 liblapack-dev libblas-dev gfortran libhdf5-dev libffi-dev

# Working directory when entering the started container
WORKDIR /root

# Copy the offline inference engine package
COPY $NNRT_PKG .
COPY ascend_install.info /etc/
RUN mkdir -p /var/davinci/driver
COPY version.info /var/davinci/driver

# Install the offline inference engine package
RUN umask 0022 && \
    groupadd -g 1000 HwHiAiUser && useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser \
    && usermod -u 1000 HwHiAiUser &&\
    chmod +x ${NNRT_PKG} &&\
    ./${NNRT_PKG} --quiet --install &&\
    rm ${NNRT_PKG} &&\
    rm -rf /var/davinci/driver

# Copy the inference program archive, the install script, and the run script
ARG DIST_PKG
COPY $DIST_PKG .
COPY install.sh .
RUN true
COPY run.sh /usr/local/bin
RUN true

# Run the install script
RUN chmod +x /usr/local/bin/run.sh &&\
    chmod +x install.sh &&\
    sh install.sh &&\
    rm -f $DIST_PKG &&\
    rm -f install.sh

# Program executed by default when the container starts
CMD run.sh
In the Dockerfile, the line
groupadd -g gid HwHiAiUser && useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser && usermod -u uid HwHiAiUser
creates the HwHiAiUser user inside the container. gid and uid are the GID and UID of the HwHiAiUser user on the host; replace them with the actual values from your host.
d. Change the permissions of the Dockerfile.
root@davinci-mini:/home/HwHiAiUser# chmod 600 Dockerfile
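The example Dockerfile hardcodes 1000 for both the UID and the GID, so it is worth confirming that the host really uses those values before building. The following is a minimal sketch; the function name check_ids is hypothetical, and it relies only on the standard id utility.

```shell
#!/bin/sh
# check_ids USER EXPECTED_UID EXPECTED_GID
# Hypothetical helper: compares the host IDs of USER with the values
# written into the Dockerfile (groupadd -g / usermod -u).
check_ids() {
    user="$1"
    host_uid=$(id -u "$user") || return 2
    host_gid=$(id -g "$user") || return 2
    if [ "$host_uid" = "$2" ] && [ "$host_gid" = "$3" ]; then
        echo "ok: $user is $host_uid:$host_gid"
    else
        echo "mismatch: host has $host_uid:$host_gid, Dockerfile uses $2:$3" >&2
        return 1
    fi
}
```

For the values used in this walkthrough the call would be: check_ids HwHiAiUser 1000 1000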
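- Go to the directory containing the software packages and run the following command to build the container image.
docker build -t image-name --build-arg NNRT_PKG=nnrt-name .
Do not omit the "." at the end of the command. The command parameters are described in the following table.
Table 2 Command parameters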
| Parameter | Description |
| --- | --- |
| image-name | Image name and tag; choose any value you like. |
| --build-arg | Sets an argument declared in the Dockerfile. |
| NNRT_PKG | nnrt-name is the file name of the offline inference engine package; do not omit the file extension. |

Removing intermediate container 1558925ff7fa
 ---> 5c66102a702f
Step 13/18 : ARG DIST_PKG
 ---> Running in 07e50b329c75
Removing intermediate container 07e50b329c75
 ---> 9bf1ab88c62c
Step 14/18 : COPY $DIST_PKG .
 ---> 3e477ddb0aa8
Step 15/18 : COPY install.sh .
 ---> 2f29fb971fd3
Step 16/18 : COPY run.sh .
 ---> c53c0995189c
Step 17/18 : RUN chmod +x run.sh &&chmod +x install.sh &&sh install.sh &&rm -f $DIST_PKG &&rm -f install.sh
 ---> Running in ac62589a2c0b
Removing intermediate container ac62589a2c0b
 ---> d39395c47bd2
Step 18/18 : CMD run.sh
 ---> Running in 00b56fb65bf1
Removing intermediate container 00b56fb65bf1
 ---> b09a017a2544
Successfully built b09a017a2544
Successfully tagged davinci-mini:latest
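When "Successfully built xxx" appears, the image has been built successfully.
- After the build completes, run the following command to view the image information.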
docker images
root@davinci-mini:/home/HwHiAiUser# docker images
REPOSITORY     TAG      IMAGE ID       CREATED        SIZE
davinci-mini   latest   b09a017a2544   13 hours ago   996MB
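A docker build fails partway through if any file named in a COPY instruction is missing, so it can help to check the build directory first. The sketch below tests for the files that the example Dockerfile copies; the function name check_build_context is hypothetical. Passing DIST_PKG with a second --build-arg is an assumption based on the ARG DIST_PKG line in the Dockerfile, since the build command shown in this section passes only NNRT_PKG.

```shell
#!/bin/sh
# check_build_context DIR NNRT_PKG DIST_PKG
# Hypothetical helper: reports every file the example Dockerfile copies
# that is missing from DIR. Returns 0 only when all of them are present.
check_build_context() {
    dir="$1"
    missing=0
    for f in Dockerfile ascend_install.info version.info install.sh run.sh "$2" "$3"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use (nnrt-name and dist.tar are placeholders):
#   check_build_context . nnrt-name dist.tar && \
#       docker build -t image-name \
#           --build-arg NNRT_PKG=nnrt-name \
#           --build-arg DIST_PKG=dist.tar .
```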
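Example inference scripts
- install.sh example
root@davinci-mini:~# cat install.sh
#!/bin/bash
# Enter the container working directory
cd /root
# Extract the inference program archive; adapt this to the archive format
tar xf dist.tar
- run.sh example
root@davinci-mini:~# cat run.sh
#!/bin/bash
# Set up the symlinks for the AI CPU libraries
sh /usr/local/Ascend/nnrt/latest/arm64-linux/run_aicpu_toolkit.sh
# Start the slogd daemon
mkdir -p /usr/slog
/var/slogd &
# Start the DMP (device management) daemon
mkdir -p /run/driver
/var/dmp_daemon -I -U 8087 &
# Enter the directory containing the inference executable
cd /root/dist
# Run the executable
./main -i test.jpg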
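The install.sh example assumes a plain tar archive named dist.tar, while the package table also allows tgz. A variant of the extraction step that picks the tar flags from the file extension could look like the sketch below; the function name extract_dist is hypothetical, and it assumes the container's tar has gzip support, which is the case for the ubuntu:18.04 base image.

```shell
#!/bin/sh
# extract_dist ARCHIVE DEST
# Hypothetical helper: extracts a tar or tgz archive into DEST and
# rejects any other extension.
extract_dist() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    case "$archive" in
        *.tar)
            tar -xf "$archive" -C "$dest"
            ;;
        *.tgz|*.tar.gz)
            tar -xzf "$archive" -C "$dest"
            ;;
        *)
            echo "unsupported archive format: $archive" >&2
            return 1
            ;;
    esac
}
```

Inside install.sh the call corresponding to the original tar xf dist.tar would be: extract_dist dist.tar /root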
4. Deploy the inference container
- Log in to the DLAP221 as the root user.
- Run the following command to start the container image (modify it to suit your setup).
docker run -it \
--device=/dev/davinci0 \
--device=/dev/davinci_manager \
--device=/dev/svm0 \
--device=/dev/log_drv \
--device=/dev/event_sched \
--device=/dev/upgrade \
--device=/dev/hi_dvpp \
--device=/dev/memory_bandwidth \
--device=/dev/ts_aisle \
-v /var:/var \
-v /usr/lib64:/usr/lib64 \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /etc/hdcBasic.cfg:/etc/hdcBasic.cfg \
-v /etc/rc.local:/etc/rc.local \
-v /sys:/sys \
-v /usr/bin/sudo:/usr/bin/sudo \
-v /usr/lib/sudo/:/usr/lib/sudo/ \
-v /etc/sudoers:/etc/sudoers/ \
davinci-mini:latest
This example command runs the inference program by default. To enter the container directly instead, append /bin/bash to the end of the command; once inside, run the commands from run.sh as needed.
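The docker run command passes nine device nodes, and it fails if any of them is absent on the host. As a sketch, the --device options can be generated from a list of paths so that a missing node produces a clear warning instead. The function name device_args is hypothetical; whether the container still works without a given node depends on the workload and is not something this helper can decide.

```shell
#!/bin/sh
# device_args PATH...
# Hypothetical helper: prints one --device=PATH option per path that
# exists on the host and warns on stderr about the ones that do not.
device_args() {
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            printf '%s%s\n' '--device=' "$dev"
        else
            echo "warning: $dev not found, skipping" >&2
        fi
    done
}

# Possible use (the command substitution is left unquoted on purpose so
# that each option becomes its own word; device paths contain no spaces):
#   docker run -it $(device_args /dev/davinci0 /dev/davinci_manager /dev/svm0) \
#       -v /var:/var davinci-mini:latest
```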
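Note
This version allows multiple containers to mount the same chip:
- No more than 16 containers are recommended.
- Containers compete for the chip's compute power; memory isolation and compute partitioning are not supported.
- Device sharing is disabled by default:
Run the following command on the host to enable device sharing:
npu-smi set -t device-share -i 0 -c 0 -d 1
Run the following command to query the device sharing status:
npu-smi info -t device-share -i 0 -c 0
Multi-container sharing is turned off again after a reboot or upgrade.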
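Since sharing is reset by a reboot or upgrade, the two npu-smi commands above can be wrapped in a function and called from a host startup script. This is only a sketch: the function name enable_device_share and the NPU_SMI override variable are hypothetical, and the npu-smi arguments are exactly the ones shown in the note.

```shell
#!/bin/sh
# enable_device_share [NPU_ID] [CHIP_ID]
# Hypothetical helper: enables device sharing, then queries its status.
# NPU_SMI may name an alternative binary (assumption, used for testing).
enable_device_share() {
    npu_id="${1:-0}"
    chip_id="${2:-0}"
    smi="${NPU_SMI:-npu-smi}"
    if ! command -v "$smi" >/dev/null 2>&1; then
        echo "$smi not found" >&2
        return 1
    fi
    "$smi" set -t device-share -i "$npu_id" -c "$chip_id" -d 1 || return 1
    "$smi" info -t device-share -i "$npu_id" -c "$chip_id"
}
```

Calling enable_device_share 0 0 on the host runs the same two commands as the note, in the same order.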
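5. Download the prebuilt image
Following this document, I built a Docker image for version 21.0.1. You can pull it onto your DLAP221 with the command below, provided your system version is 21.0.1.
$ docker pull shinerchen/dlap221-infer:21.0.1
Run the Docker image to perform inference directly: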
root@davinci-mini:~# docker run -it --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/svm0 --device=/dev/log_drv --device=/dev/event_sched --device=/dev/upgrade --device=/dev/hi_dvpp --device=/dev/memory_bandwidth --device=/dev/ts_aisle -v /var:/var -v /usr/lib64:/usr/lib64 -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /etc/hdcBasic.cfg:/etc/hdcBasic.cfg -v /etc/rc.local:/etc/rc.local -v /sys:/sys -v /usr/bin/sudo:/usr/bin/sudo -v /usr/lib/sudo/:/usr/lib/sudo/ -v /etc/sudoers:/etc/sudoers/ shinerchen/dlap221-infer:21.0.1
[Info ][2021-08-07 13:54:02:674011][ResourceManager.cpp InitResource:75] Initialized acl successfully.
aicpu_kernels_device/
aicpu_kernels_device/libDvpp_jpeg_decoder.so
aicpu_kernels_device/libOMX_hisi_video_encoder.so
aicpu_kernels_device/libDvpp_api.so
aicpu_kernels_device/libdvpp_kernels.so
aicpu_kernels_device/libpt_kernels.so
aicpu_kernels_device/libtf_kernels.so
aicpu_kernels_device/libDvpp_png_decoder.so
aicpu_kernels_device/version.info
aicpu_kernels_device/libtensorflow.so
aicpu_kernels_device/libaicpu_kernels.so
aicpu_kernels_device/libcpu_kernels.so
aicpu_kernels_device/libtorch_cpu.so
aicpu_kernels_device/libDvpp_vpc.so
aicpu_kernels_device/libc10.so
aicpu_kernels_device/libDvpp_jpeg_encoder.so
aicpu_kernels_device/libOMX_hisi_video_decoder.so
[Info ][2021-08-07 13:54:08:419532][ResourceManager.cpp InitResource:84] Open device 0 successfully.
[Info ][2021-08-07 13:54:08:421054][ResourceManager.cpp InitResource:91] Created context for device 0 successfully
[Info ][2021-08-07 13:54:08:423448][ResourceManager.cpp InitResource:102] Init resource successfully.
[Info ][2021-08-07 13:54:08:423643][AclProcess.cpp InitResource:118] Created the acl context successfully.
[Info ][2021-08-07 13:54:08:424377][AclProcess.cpp InitResource:124] Created the acl stream successfully.
[Info ][2021-08-07 13:54:08:439061][ModelProcess.cpp Init:240] ModelProcess:Begin to init instance.
[Info ][2021-08-07 13:54:08:725656][AclProcess.cpp InitModule:87] Initialized the model process module successfully.
[Info ][2021-08-07 13:54:08:733168][AclProcess.cpp InitModule:92] Initialized the cast operator successfully.
[Info ][2021-08-07 13:54:08:733284][AclProcess.cpp InitModule:97] Initialized the argMax operator successfully.
[Info ][2021-08-07 13:54:08:734489][AclProcess.cpp InitModule:103] Loaded label successfully.
[Info ][2021-08-07 13:54:08:743732][AclProcess.cpp WriteResult:278] inference output index: 384
[Info ][2021-08-07 13:54:08:743916][AclProcess.cpp WriteResult:280] classname: 384: 'indri, indris, Indri indri, Indri brevicaudatus',
[Info ][2021-08-07 13:54:08:744382][AclProcess.cpp Process:465] [Process Delay] cost: 9.40694ms.
[Info ][2021-08-07 13:54:08:746582][ModelProcess.cpp DeInit:150] Model[resnet50][0] deinit begin
[Info ][2021-08-07 13:54:08:755659][ModelProcess.cpp DeInit:189] Model[resnet50][0] deinit success
[Info ][2021-08-07 13:54:08:763378][ResourceManager.cpp Release:44] Finalized acl successfully.
root@davinci-mini:~#
6. Q&A
Problem 1:
After starting the Docker image, running npu-smi info directly fails with the following errors, and no NPU device is found.
root@f75549a927c5:~# npu-smi info
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.405.381 [dm_udp.c:84][dmp] [__dm_send_msg 84] sendmsg fail:No such file or directory.
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.195 [dm_udp.c:129][dmp] [__dm_udp_send 129] __dm_send_msg: sendto fail.errno=2
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.258 [dm_msg_intf.c:734][dmp] [dm_send_req 734] failed call intf->send_msg, ret = 2
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.323 [dsmi_common.c:566][dmp] [dsmi_send_msg_rec_res 566] call dev_mon_send_request error:27.
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.386 [dsmi_dmp_command.c:644][dmp] [dsmi_cmd_get_board_id 644] dev(0) dsmi_send_msg_rec_res failed, ret = 27.
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.441 [dsmi_common_interface.c:1357][dmp] [dsmi_get_board_id 1357] devid 0 dsmi_cmd_get_board_id failed 27
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.494 [dsmi_common_interface.c:1410][dmp] [dsmi_get_board_info 1410] devid 0 dsmi_board_id call error ret = 27!
+------------------------------------------------------------------------------+
| npu-smi 21.0.1 Version: |
+-------------------+-----------------+----------------------------------------+
| NPU Name | Health | Power(W) Temp(C) |
| Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) |
+===================+=================+========================================+
Solution:
Run the following commands, or run "run.sh" first to perform one inference; the run.sh script calls these same commands.
root@a0784e7df16a:~# mkdir -p /usr/slog
root@a0784e7df16a:~# /var/slogd &
root@a0784e7df16a:~# mkdir -p /run/driver
root@a0784e7df16a:~# /var/dmp_daemon -I -U 8087 &
root@a0784e7df16a:~# npu-smi info
+------------------------------------------------------------------------------+
| npu-smi 21.0.1 Version: UNKNOWN |
+-------------------+-----------------+----------------------------------------+
| NPU Name | Health | Power(W) Temp(C) |
| Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) |
+===================+=================+========================================+
| 0 310 | OK | 12.8 61 |
| 0 0 | 0000:00:00.0 | 0 3440 / 8192 |
+===================+=================+========================================+
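Problem 2:
In our tests, the docker service occasionally failed to start at boot; after logging in, it could be started manually.
Solution:
My analysis is that some services docker depends on may not be fully up yet. I therefore changed how docker starts: in /lib/systemd/system/docker.service, change Type=notify to Type=idle.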
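Editing the unit file under /lib/systemd/system can be undone by a package upgrade. The same Type=idle setting can instead be placed in a systemd drop-in file, which is a standard systemd mechanism and not something the original fix used. A minimal sketch follows; the function name write_docker_dropin and the file name override.conf are arbitrary choices.

```shell
#!/bin/sh
# write_docker_dropin [DIR]
# Hypothetical helper: writes a drop-in that overrides the service Type.
# DIR defaults to the standard drop-in directory for docker.service.
write_docker_dropin() {
    dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dir" || return 1
    printf '[Service]\nType=idle\n' > "$dir/override.conf"
}

# After writing the file as root, reload systemd and restart docker:
#   write_docker_dropin
#   systemctl daemon-reload
#   systemctl restart docker
```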