Deploying a Container for Inference on the DLAP221

1. Install toolbox

Install the Ascend-cann-toolbox_x.x.x_linux-aarch64.run package that matches the device driver version. My driver version is 21.0.1, so I installed the following toolbox version:

HwHiAiUser@davinci-mini:~$ sudo ./Ascend-cann-toolbox_5.0.1_linux-aarch64.run --install
Verifying archive integrity... All good.
Uncompressing ascend-cann-toolbox  100%
[Toolbox] [20210806-07:22:37] [INFO] create /var/log/ascend_seclog/ascend_toolbox_install.log success
[Toolbox] [20210806-07:22:37] [INFO] LogFile /var/log/ascend_seclog/ascend_toolbox_install.log
[Toolbox] [20210806-07:22:37] [INFO] install start
[Toolbox] [20210806-07:22:37] [INFO] The install path is /usr/local/Ascend/toolbox/5.0.1 !
[Toolbox] [20210806-07:22:38] [WARNING] environment is neither inference nor training
[Toolbox] [20210806-07:22:38] [INFO] add latest symbol link:ln -s 5.0.1 /usr/local/Ascend/toolbox/latest
[Toolbox] [20210806-07:22:38] [INFO] install Ascend-SDK-Manager.. /usr/local/Ascend/toolbox/5.0.1
[Toolbox] [20210806-07:22:38] [INFO] install Ascend-SDK-Manager /usr/local/Ascend/toolbox/5.0.1 success
[Toolbox] [20210806-07:22:45] [INFO] install Ascend-msInstaller.. /usr/local/Ascend/toolbox/5.0.1
[Toolbox] [20210806-07:22:45] [INFO] msInstaller tar file=:MsInstaller/msInstaller-1.0.0-arm64.tar.gz
[Toolbox] [20210806-07:22:47] [INFO] install Ascend-msInstaller.. /usr/local/Ascend/toolbox/5.0.1 success
install Ascend-DMI.. /usr/local/Ascend/toolbox/5.0.1
[Toolbox] [20210806-07:23:05] [INFO] install ascend-dmi cmd=AXESMI/Ascend-dmi-5.0.1-centos7.6-arm64.run --install --install-path=/usr/local/Ascend/toolbox/5.0.1 --install-username=HwHiAiUser --install-usergroup=HwHiAiUser
Verifying archive integrity... All good.
Uncompressing Ascend-DMI-Package  100%
--install-path=/usr/local/Ascend/toolbox/5.0.1
--install-username=HwHiAiUser
--install-usergroup=HwHiAiUser
./bin/ascend-dmi
/usr/local/Ascend/toolbox/5.0.1
INFO: your install path is /usr/local/Ascend/toolbox/5.0.1
INFO: install is success
[Toolbox] [20210806-07:23:06] [INFO] install docker plugin cmd=ascend-docker-plugin/Ascend-docker-runtime-5.0.1-aarch64.run
Verifying archive integrity... All good.
Uncompressing ascend-docker-runtime  100%
installing ascend docker runtime
install executable files success
/etc/docker/daemon.json
/etc/docker/daemon.json.29949
add
create damom.json success
please reboot docker daemon to take effect
[Toolbox] [20210806-07:23:08] [INFO] Please make sure that:
PATH includes :
        /usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/bin:
LD_LIBRARY_PATH includes :
        /usr/local/dcmi:
        /usr/local/Ascend/driver/lib64:
        /usr/local/Ascend/driver/lib64/driver:
        /usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/lib64:
        /usr/local/Ascend/nnae/latest/fwkacllib/lib64:
        /usr/local/Ascend/nnrt/latest/acllib/lib64:
ASCEND_AICPU_PATH includes:
        train:
        /usr/local/Ascend/nnae/latest
        infer:
        /usr/local/Ascend/nnrt/latest
ASCEND_OPP_PATH includes:
        train:
        /usr/local/Ascend/nnae/latest/opp
        infer:
        /usr/local/Ascend/nnrt/latest/opp
[Toolbox] [20210806-07:23:08] [INFO] If your service is started using the shell script, you can call the /usr/local/Ascend/toolbox/set_env.sh script to configure environment variables. Note that this script can not be executed mannually.
[Toolbox] [20210806-07:23:08] [INFO] Ascend-cann-toolbox_5.0.1_linux-aarch64.run install success,The install path is /usr/local/Ascend !
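
Before picking a toolbox package, the driver version can be read from the driver's version.info file instead of being looked up by hand. The sketch below assumes the file uses key=value lines and contains a line of the form Version=21.0.1; that format is an assumption and should be checked against the actual file. The function name driver_version is hypothetical.

```shell
#!/bin/bash
# Print the value of the "Version=" line in a driver version.info file.
# Assumption: the file uses key=value lines and includes "Version=<x.y.z>".
driver_version() {
    local info_file="${1:-/var/davinci/driver/version.info}"
    # Keep only the text after "Version=" and stop at the first match.
    sed -n 's/^Version=//p' "$info_file" | head -n 1
}
```

Calling driver_version with no argument reads the default path used elsewhere in this document; passing a path reads a copy of the file instead.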

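2. Configure environment variables (optional)

Configure the environment variables as prompted at the end of the installation. It is recommended to put them in the .bashrc file in the root user's home directory.

root@davinci-mini:/home/HwHiAiUser# vim /root/.bashrc
Append the following to the end of the file: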
export PATH=/usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64:/usr/local/dcmi:/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/toolbox/5.0.1/Ascend-DMI/lib64:/usr/local/Ascend/nnae/latest/fwkacllib/lib64:/usr/local/Ascend/nnrt/latest/acllib/lib64:$LD_LIBRARY_PATH
export ASCEND_AICPU_PATH=/usr/local/Ascend/nnrt/latest
export ASCEND_OPP_PATH=/usr/local/Ascend/nnrt/latest/opp
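
Several of the directories listed by the installer (for example the nnae paths on an inference-only device) may not exist on a given system. The following sketch prints the entries of a colon-separated path list that are not existing directories, which helps spot typos after editing .bashrc. The function name missing_dirs is hypothetical.

```shell
#!/bin/bash
# Print every entry of a colon-separated path list that is not an existing directory.
missing_dirs() {
    local list="$1" entry
    local old_ifs="$IFS"
    IFS=':'
    # With IFS set to ':', the unquoted expansion splits the list into entries.
    for entry in $list; do
        if [ -n "$entry" ] && [ ! -d "$entry" ]; then
            echo "$entry"
        fi
    done
    IFS="$old_ifs"
    return 0
}
```

A typical call after opening a new shell would be missing_dirs "$LD_LIBRARY_PATH"; an empty result means every entry exists.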

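3. Build the image

  1. Upload the software packages listed in the table below to the same directory on the DLAP221 (for example, "/home/HwHiAiUser").

    Package | Description | How to obtain
    Ascend-cann-nnrt_{version}_linux-aarch64.run | Offline inference engine package. {version} is the package version. | Download link
    Dockerfile | File required to build the image. | Prepared by the user according to the service.
    ascend_install.info | Software package installation log file. | Copy "/etc/ascend_install.info" from the host; use the actual path on your system.
    version.info | Driver version information file. | Copy "/var/davinci/driver/version.info" from the host; use the actual path on your system.
    SDK service inference program archive | Collection of service inference programs, in tar or tgz format. The archive format must be one that the tools available inside the container can extract. | After installing the SDK development kit, convert the model as described in the sample, place it in the models folder, and then pack everything with tar.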
  2. Prepare the Dockerfile as follows.
    a. Log in to the DLAP221 as root and run id HwHiAiUser to look up and record the UID and GID of the HwHiAiUser user on the host.

    root@davinci-mini:/home/HwHiAiUser# id HwHiAiUser
    uid=1000(HwHiAiUser) gid=1000(HwHiAiUser) groups=1000(HwHiAiUser)
    

    b. Go to the directory where the packages were uploaded in step 1 and run the following command to create the Dockerfile (example file name: "Dockerfile").

    root@davinci-mini:/home/HwHiAiUser# vim Dockerfile
    

    c. Enter the following content, then save it with :wq. (The content below is only an example; adapt it to your own environment and requirements.)

    	# Base operating system and version; change as needed
    	FROM ubuntu:18.04
    
    	# Build argument: offline inference engine package
    	ARG NNRT_PKG
    
    	# Environment variables
    	ARG ASCEND_BASE=/usr/local/Ascend
    
    	ENV LD_LIBRARY_PATH=\
    	$LD_LIBRARY_PATH:\
    	$ASCEND_BASE/nnrt/latest/acllib/lib64:\
    	/usr/lib64
    	ENV ASCEND_AICPU_PATH=$ASCEND_BASE/nnrt/latest
    	# Install the third-party tools required to run the SDK
    	RUN apt update && \
    		apt install -y python3.7 python3.7-dev curl g++ pkg-config vim libblas3 \
    			liblapack3 liblapack-dev libblas-dev gfortran libhdf5-dev libffi-dev
    
    	# Working directory inside the started container
    	WORKDIR /root
    	# Copy the offline inference engine package
    	COPY $NNRT_PKG .
    	COPY ascend_install.info /etc/
    	RUN mkdir -p /var/davinci/driver
    	COPY version.info /var/davinci/driver
    	# Install the offline inference engine package
    	RUN umask 0022 && \
    	groupadd -g 1000 HwHiAiUser && useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser \
    	&& usermod -u 1000 HwHiAiUser &&\
    	chmod +x ${NNRT_PKG} &&\
    	./${NNRT_PKG} --quiet --install &&\
    	rm ${NNRT_PKG} &&\
    	rm -rf /var/davinci/driver
    	# Copy the service inference program archive, the install script, and the run script
    	ARG DIST_PKG
    	COPY $DIST_PKG .
    	COPY install.sh .
    	RUN true
    	COPY run.sh /usr/local/bin
    	RUN true
    
    	# Run the install script
    	RUN chmod +x /usr/local/bin/run.sh &&\
    	chmod +x install.sh &&\
    	sh install.sh &&\
    	rm -f $DIST_PKG &&\
    	rm -f install.sh
    	# Program executed by default when the container starts
    	CMD run.sh
    
    
    

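    In the Dockerfile, the line

    groupadd -g gid HwHiAiUser && useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser && usermod -u uid HwHiAiUser
    
    

    creates the HwHiAiUser user inside the container. Here gid and uid stand for the GID and UID of the HwHiAiUser user on the host; replace them with the actual values recorded in step a.
    d. Change the permissions of the Dockerfile.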

    root@davinci-mini:/home/HwHiAiUser# chmod 600 Dockerfile
    
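  3. Go to the directory that contains the software packages and run the following command to build the container image.

    docker build -t image-name --build-arg NNRT_PKG=nnrt-name .

    Do not omit the "." at the end of the command. The parameters are described in the table below.

    Table 2 Command parameters

    Parameter | Description
    image-name | Image name and tag; set as needed.
    --build-arg | Sets an argument declared in the Dockerfile.
    NNRT_PKG | nnrt-name is the file name of the offline inference engine package; do not omit the file extension.

    The example Dockerfile above also declares a DIST_PKG argument for the service inference program archive; pass it with an additional --build-arg in the same way.

    The end of the build output looks like this:
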
    Removing intermediate container 1558925ff7fa
     ---> 5c66102a702f
    Step 13/18 : ARG DIST_PKG
     ---> Running in 07e50b329c75
    Removing intermediate container 07e50b329c75
     ---> 9bf1ab88c62c
    Step 14/18 : COPY $DIST_PKG .
     ---> 3e477ddb0aa8
    Step 15/18 : COPY install.sh .
     ---> 2f29fb971fd3
    Step 16/18 : COPY run.sh .
     ---> c53c0995189c
    Step 17/18 : RUN chmod +x run.sh &&chmod +x install.sh &&sh install.sh &&rm -f $DIST_PKG &&rm -f install.sh
     ---> Running in ac62589a2c0b
    Removing intermediate container ac62589a2c0b
     ---> d39395c47bd2
    Step 18/18 : CMD run.sh
     ---> Running in 00b56fb65bf1
    Removing intermediate container 00b56fb65bf1
     ---> b09a017a2544
    Successfully built b09a017a2544
    Successfully tagged davinci-mini:latest
    
    

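    The message "Successfully built xxx" indicates that the image was built successfully.

  4. After the build finishes, run the following command to view the image information.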

    docker images

    root@davinci-mini:/home/HwHiAiUser# docker images
    REPOSITORY                                               TAG       IMAGE ID       CREATED        SIZE
    davinci-mini                                             latest    b09a017a2544   13 hours ago   996MB
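
A build that fails halfway through because one of the files from step 1 is missing wastes time on a small device. The following sketch lists the build inputs that the example Dockerfile copies and prints the ones that are absent from the build directory. The function name missing_build_inputs is hypothetical, and the file list simply mirrors the COPY instructions in the example Dockerfile above.

```shell
#!/bin/bash
# Print the name of each required build input that is missing from a directory.
# Arguments: build directory, NNRT package file name, inference archive file name.
missing_build_inputs() {
    local dir="$1" nnrt_pkg="$2" dist_pkg="$3" name
    for name in "$nnrt_pkg" "$dist_pkg" Dockerfile ascend_install.info \
                version.info install.sh run.sh; do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
        fi
    done
}
```

Running it against the upload directory before docker build and getting no output means every file the Dockerfile expects is in place.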
    

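Example service inference scripts

  1. install.sh example
    root@davinci-mini:~# cat install.sh
    #!/bin/bash
    # Enter the container working directory
    cd /root
    # Extract the service inference program archive; adapt to the archive format
    tar xf dist.tar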
    
    
  2. run.sh example
    root@davinci-mini:~# cat run.sh
    #!/bin/bash
    # Set up the symbolic links for the AI CPU libraries
    sh /usr/local/Ascend/nnrt/latest/arm64-linux/run_aicpu_toolkit.sh
    # Start the slogd daemon
    mkdir -p /usr/slog
    /var/slogd &
    # Start the DMP (device management) daemon
    mkdir -p /run/driver
    /var/dmp_daemon -I -U 8087 &
    # Enter the directory that contains the inference executable
    cd /root/dist
    # Run the executable
    ./main -i test.jpg
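
The install.sh example hard-codes a .tar archive, while the package table allows both tar and tgz. One possible way to adapt the extraction step to the archive format is sketched below; the function name extract_pkg is hypothetical, and it assumes the container image provides tar with gzip support.

```shell
#!/bin/bash
# Extract an archive into a target directory, choosing tar flags by file extension.
extract_pkg() {
    local pkg="$1" dest="$2"
    mkdir -p "$dest" || return 1
    case "$pkg" in
        *.tar.gz|*.tgz) tar -xzf "$pkg" -C "$dest" ;;
        *.tar)          tar -xf "$pkg" -C "$dest" ;;
        *) echo "unsupported archive format: $pkg" >&2; return 1 ;;
    esac
}
```

Inside install.sh, the tar xf dist.tar line could then become a call such as extract_pkg dist.tgz /root.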
    
    

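4. Deploy the inference container

  1. Log in to the DLAP221 as root.
  2. Run the following command to start a container from the image (adjust it to your environment).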
    docker run -it \
    	--device=/dev/davinci0 \
    	--device=/dev/davinci_manager \
    	--device=/dev/svm0 \
    	--device=/dev/log_drv \
    	--device=/dev/event_sched \
    	--device=/dev/upgrade \
    	--device=/dev/hi_dvpp \
    	--device=/dev/memory_bandwidth \
    	--device=/dev/ts_aisle \
    	-v /var:/var \
    	-v /usr/lib64:/usr/lib64 \
    	-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    	-v /etc/hdcBasic.cfg:/etc/hdcBasic.cfg \
    	-v /etc/rc.local:/etc/rc.local \
    	-v /sys:/sys \
    	-v /usr/bin/sudo:/usr/bin/sudo \
    	-v /usr/lib/sudo/:/usr/lib/sudo/ \
    	-v /etc/sudoers:/etc/sudoers/  \
    	davinci-mini:latest  
    
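    The command above runs the service program by default. To enter the container directly instead, append /bin/bash to the end of the command. Once inside the container, run the commands in run.sh as needed.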
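
The list of --device options is long, and docker run stops with an error if one of the device nodes does not exist on the host. The sketch below builds the --device arguments from a list of paths and reports the ones that are missing instead of passing them on. The function name device_args is hypothetical; whether inference still works without a skipped device depends on the workload, so the warning should not be ignored.

```shell
#!/bin/bash
# Print one --device=<path> argument per line for each listed path that exists.
# Missing paths are reported on stderr and left out of the output.
device_args() {
    local dev
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            echo "--device=$dev"
        else
            echo "skipping missing device node: $dev" >&2
        fi
    done
}
```

The output can be spliced into the command line, for example docker run -it $(device_args /dev/davinci0 /dev/davinci_manager /dev/svm0) followed by the remaining options and the image name.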

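Note
This version allows multiple containers to mount the same chip:
- No more than 16 containers are recommended.
- Containers compete for the chip's compute capacity; memory isolation and compute partitioning are not supported.
- Device sharing mode is disabled by default.
To enable device sharing, run the following command on the host:
npu-smi set -t device-share -i 0 -c 0 -d 1
To query the device sharing status, run:
npu-smi info -t device-share -i 0 -c 0
Multi-container sharing is turned off again after a reboot or an upgrade.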

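5. Download a prebuilt image

Following this document, I built a Docker image for version 21.0.1. You can pull it onto your DLAP221 with the command below, provided that your system version is also 21.0.1.

$ docker pull shinerchen/dlap221-infer:21.0.1

Run the Docker image to perform inference directly: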

root@davinci-mini:~# docker run -it --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/svm0 --device=/dev/log_drv --device=/dev/event_sched --device=/dev/upgrade --device=/dev/hi_dvpp --device=/dev/memory_bandwidth --device=/dev/ts_aisle -v /var:/var -v /usr/lib64:/usr/lib64 -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /etc/hdcBasic.cfg:/etc/hdcBasic.cfg -v /etc/rc.local:/etc/rc.local -v /sys:/sys -v /usr/bin/sudo:/usr/bin/sudo -v /usr/lib/sudo/:/usr/lib/sudo/ -v /etc/sudoers:/etc/sudoers/  shinerchen/dlap221-infer:21.0.1
[Info ][2021-08-07 13:54:02:674011][ResourceManager.cpp InitResource:75] Initialized acl successfully.
aicpu_kernels_device/
aicpu_kernels_device/libDvpp_jpeg_decoder.so
aicpu_kernels_device/libOMX_hisi_video_encoder.so
aicpu_kernels_device/libDvpp_api.so
aicpu_kernels_device/libdvpp_kernels.so
aicpu_kernels_device/libpt_kernels.so
aicpu_kernels_device/libtf_kernels.so
aicpu_kernels_device/libDvpp_png_decoder.so
aicpu_kernels_device/version.info
aicpu_kernels_device/libtensorflow.so
aicpu_kernels_device/libaicpu_kernels.so
aicpu_kernels_device/libcpu_kernels.so
aicpu_kernels_device/libtorch_cpu.so
aicpu_kernels_device/libDvpp_vpc.so
aicpu_kernels_device/libc10.so
aicpu_kernels_device/libDvpp_jpeg_encoder.so
aicpu_kernels_device/libOMX_hisi_video_decoder.so
[Info ][2021-08-07 13:54:08:419532][ResourceManager.cpp InitResource:84] Open device 0 successfully.
[Info ][2021-08-07 13:54:08:421054][ResourceManager.cpp InitResource:91] Created context for device 0 successfully
[Info ][2021-08-07 13:54:08:423448][ResourceManager.cpp InitResource:102] Init resource successfully.
[Info ][2021-08-07 13:54:08:423643][AclProcess.cpp InitResource:118] Created the acl context successfully.
[Info ][2021-08-07 13:54:08:424377][AclProcess.cpp InitResource:124] Created the acl stream successfully.
[Info ][2021-08-07 13:54:08:439061][ModelProcess.cpp Init:240] ModelProcess:Begin to init instance.
[Info ][2021-08-07 13:54:08:725656][AclProcess.cpp InitModule:87] Initialized the model process module successfully.
[Info ][2021-08-07 13:54:08:733168][AclProcess.cpp InitModule:92] Initialized the cast operator successfully.
[Info ][2021-08-07 13:54:08:733284][AclProcess.cpp InitModule:97] Initialized the argMax operator successfully.
[Info ][2021-08-07 13:54:08:734489][AclProcess.cpp InitModule:103] Loaded label successfully.
[Info ][2021-08-07 13:54:08:743732][AclProcess.cpp WriteResult:278] inference output index: 384
[Info ][2021-08-07 13:54:08:743916][AclProcess.cpp WriteResult:280] classname:  384: 'indri, indris, Indri indri, Indri brevicaudatus',
[Info ][2021-08-07 13:54:08:744382][AclProcess.cpp Process:465] [Process Delay] cost: 9.40694ms.
[Info ][2021-08-07 13:54:08:746582][ModelProcess.cpp DeInit:150] Model[resnet50][0] deinit begin
[Info ][2021-08-07 13:54:08:755659][ModelProcess.cpp DeInit:189] Model[resnet50][0] deinit success
[Info ][2021-08-07 13:54:08:763378][ResourceManager.cpp Release:44] Finalized acl successfully.
root@davinci-mini:~#
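
When the container is run repeatedly, it can be handy to pull the latency figure out of the log rather than reading it by eye. The sketch below extracts the value from the "[Process Delay] cost" line shown in the output above; the function name process_delay_ms is hypothetical, and the pattern assumes the log line keeps exactly this wording.

```shell
#!/bin/bash
# Read inference log lines on stdin and print the "[Process Delay] cost" value in ms.
process_delay_ms() {
    sed -n 's/.*\[Process Delay\] cost: \([0-9.]*\)ms.*/\1/p'
}
```

The container output can be piped straight into it by appending | process_delay_ms to a docker run command that is started without -it.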

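6. Q&A

Problem 1:

After starting a container from the image, running npu-smi info directly inside it fails with the following errors, and no NPU device is found: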

root@f75549a927c5:~# npu-smi info
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.405.381 [dm_udp.c:84][dmp] [__dm_send_msg 84] sendmsg fail:No such file or directory.
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.195 [dm_udp.c:129][dmp] [__dm_udp_send 129] __dm_send_msg: sendto fail.errno=2
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.258 [dm_msg_intf.c:734][dmp] [dm_send_req 734] failed call intf->send_msg, ret = 2
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.323 [dsmi_common.c:566][dmp] [dsmi_send_msg_rec_res 566] call dev_mon_send_request error:27.
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.386 [dsmi_dmp_command.c:644][dmp] [dsmi_cmd_get_board_id 644] dev(0) dsmi_send_msg_rec_res failed, ret = 27.
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.441 [dsmi_common_interface.c:1357][dmp] [dsmi_get_board_id 1357] devid 0 dsmi_cmd_get_board_id failed 27
[ERROR] DRV(12,npu-smi):2021-08-07-05:33:53.406.494 [dsmi_common_interface.c:1410][dmp] [dsmi_get_board_info 1410] devid 0 dsmi_board_id call error ret = 27!
+------------------------------------------------------------------------------+
| npu-smi 21.0.1                       Version:                                |
+-------------------+-----------------+----------------------------------------+
| NPU     Name      | Health          | Power(W)          Temp(C)              |
| Chip    Device    | Bus-Id          | AICore(%)         Memory-Usage(MB)     |
+===================+=================+========================================+

Solution:

Run the following commands, or run "run.sh" first to perform one inference; the run.sh script already calls these commands.

root@a0784e7df16a:~# mkdir -p /usr/slog
root@a0784e7df16a:~# /var/slogd &
root@a0784e7df16a:~# mkdir -p /run/driver
root@a0784e7df16a:~# /var/dmp_daemon -I -U 8087 &
root@a0784e7df16a:~# npu-smi info
+------------------------------------------------------------------------------+
| npu-smi 21.0.1                       Version: UNKNOWN                        |
+-------------------+-----------------+----------------------------------------+
| NPU     Name      | Health          | Power(W)          Temp(C)              |
| Chip    Device    | Bus-Id          | AICore(%)         Memory-Usage(MB)     |
+===================+=================+========================================+
| 0       310       | OK              | 12.8              61                   |
| 0       0         | 0000:00:00.0    | 0                 3440 / 8192          |
+===================+=================+========================================+
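
The two outputs above differ in one visible way: the failing one contains only the table header, while the working one has rows that begin with a numeric NPU or chip ID. A script can use that to tell whether the daemons are up. The sketch below counts such rows; the function name npu_row_count is hypothetical, and the pattern is based only on the sample output shown in this document.

```shell
#!/bin/bash
# Count table rows in `npu-smi info` output (read from stdin) that begin with a numeric ID.
# grep -c exits non-zero when the count is 0, so the status is forced to success.
npu_row_count() {
    grep -c '^| [0-9]' || true
}
```

A result of 0 from npu-smi info 2>/dev/null | npu_row_count suggests that slogd and dmp_daemon still need to be started in the container.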

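Problem 2:

In our tests, the docker service occasionally failed to start, although it could be started manually after logging in.

Solution:

The likely cause is that some services docker depends on are not fully up yet. I therefore changed how docker is started: in /lib/systemd/system/docker.service, change Type=notify to Type=idle.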
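
Files under /lib/systemd/system can be overwritten when the docker package is upgraded. As an alternative to editing the unit file in place, the same setting can be placed in a systemd drop-in file. The sketch below writes such a drop-in; the function name write_docker_dropin and the file name override.conf are illustrative choices, and this variant is not what the original setup used.

```shell
#!/bin/bash
# Write a systemd drop-in that overrides the service type of docker.service.
# The first argument is the drop-in directory; the default is the standard location.
write_docker_dropin() {
    local dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dir" || return 1
    printf '[Service]\nType=idle\n' > "$dir/override.conf"
}
```

After writing the file as root, systemctl daemon-reload followed by a reboot or systemctl restart docker makes systemd pick up the override.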
