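Complete Guide to Installing MindSpore on Ubuntu 18.04 (aarch64) with an Atlas 300T Training Card
Updated: 2024/2/6
I noticed that people are still reading this article, so a note is in order: the installation flow described here still applies, but the details are out of date. The MindSpore website now differs considerably from what is shown in this article, and updating the article would not be worthwhile. Please refer to the installation guide on the MindSpore website instead.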
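I. Pre-installation Preparation
1. Confirm the installation information
Before installing, confirm the installation method and the software package versions. Go to the MindSpore website (https://www.mindspore.cn/) and open the matching guide in the Install section (the dependencies listed under its system environment information are installed step by step later in this tutorial). Then use the version compatibility table to determine which version of each package you need.
For example, to install MindSpore 1.6.1 the matching versions are: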
- driver:21.0.4
- firmware:1.0.13
- cann:5.0.4 (commercial edition)
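Since every package in this guide is the aarch64 build, it may be worth confirming the machine architecture before downloading anything. The following is a minimal sketch; `check_arch` is a hypothetical helper name and is not part of any Ascend or MindSpore tooling.

```shell
# Hypothetical helper: succeed only if the machine architecture matches.
# $1 = expected architecture (this guide targets aarch64)
# $2 = architecture to test; defaults to the output of `uname -m`
check_arch() {
    expected_arch="$1"
    actual_arch="${2:-$(uname -m)}"
    [ "$actual_arch" = "$expected_arch" ]
}

# Example (commented out so that sourcing this file has no side effects):
# check_arch aarch64 || echo "This guide targets aarch64, found $(uname -m)"
```

The second argument exists only so the check can be tried out on a machine with a different architecture.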
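2. Download the commercial firmware, driver, and CANN packages from the Ascend community (application required)
Download the packages that match your OS and architecture. For CANN, download the cann-toolkit package.
NPU firmware and driver download: https://www.hiascend.com/hardware/firmware-drivers?tag=commercial
CANN download: https://www.hiascend.com/software/cann/commercial
After the download finishes, run chmod +x xxxxxx.run to make each package executable.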
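With several .run packages to prepare, the chmod step can be applied to a whole download directory at once. This is only a sketch: `make_runfiles_executable` is a hypothetical helper, and it assumes all downloaded packages sit in one directory.

```shell
# Hypothetical helper: add execute permission to every .run package in a directory.
# $1 = directory holding the downloaded packages (defaults to the current directory)
make_runfiles_executable() {
    pkg_dir="${1:-.}"
    for pkg in "$pkg_dir"/*.run; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$pkg" ] || continue
        chmod +x "$pkg"
    done
}

# Example (commented out so that sourcing this file changes nothing):
# make_runfiles_executable ~/downloads
```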
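II. Install the CANN Development Environment
The CANN development environment must be installed before MindSpore.
1. Create the installation and runtime users
Create the runtime user HwHiAiUser (the name must not be changed, otherwise the NPU driver cannot be installed). The remaining steps are performed as root by default (recommended).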
groupadd HwHiAiUser
useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
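If these steps are re-run on a machine where the account already exists, groupadd and useradd will report errors. One way to guard against that is sketched below; `user_exists` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the named account already exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Example (commented out; creating accounts requires root):
# user_exists HwHiAiUser || {
#     groupadd HwHiAiUser
#     useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
# }
```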
2. Install the driver and firmware
Install the driver:
./A300t-9000-npu-driver_21.0.4_linux-aarch64.run --full
Install the firmware:
./A300t-9000-npu-firmware_1.80.22.2.220.run --full
3. Verify that the driver and firmware were installed successfully
reboot
Wait for the server to reboot, log in again, and run the following command:
npu-smi info
If the command prints the NPU device information, the driver and firmware were installed successfully.
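When this check is scripted, it helps to distinguish "npu-smi is missing" from "npu-smi ran". The sketch below does that; `have_cmd` and `show_npu_info` are hypothetical helper names, and only the `npu-smi info` call itself comes from this guide.

```shell
# Hypothetical helper: succeed if a command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical wrapper: run `npu-smi info` only when the tool is installed.
show_npu_info() {
    if have_cmd npu-smi; then
        npu-smi info
    else
        echo "npu-smi not found; the driver may not be installed" >&2
        return 1
    fi
}
```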
4. Change the apt source
Switch to a mirror of your choice. This guide uses the Huawei mirror: https://mirrors.huaweicloud.com/home
# Back up the original configuration file
cp /etc/apt/sources.list /etc/apt/sources.list.bak
# Download the new sources.list
wget -O /etc/apt/sources.list https://repo.huaweicloud.com/repository/conf/Ubuntu-Ports-bionic.list
# Refresh the package index
apt-get update
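One caveat with the backup step: if it is run a second time, the plain cp replaces the backup with the already modified file. A minimal sketch of a backup that refuses to overwrite an existing copy follows; `backup_once` is a hypothetical helper name.

```shell
# Hypothetical helper: copy a file to <file>.bak unless that backup already exists.
# $1 = file to back up (for this guide, /etc/apt/sources.list)
backup_once() {
    src_file="$1"
    if [ -e "$src_file.bak" ]; then
        echo "backup already exists: $src_file.bak" >&2
        return 0
    fi
    # -p preserves mode and timestamps of the original file.
    cp -p "$src_file" "$src_file.bak"
}

# Example (commented out; modifying /etc requires root):
# backup_once /etc/apt/sources.list
```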
5. Install OS dependencies
sudo apt-get install -y gcc g++ make cmake zlib1g zlib1g-dev openssl libsqlite3-dev libssl-dev libffi-dev unzip pciutils net-tools libblas-dev gfortran libblas3 libopenblas-dev
6. Install Python and its dependencies
Install Python 3.7.5:
wget https://www.python.org/ftp/python/3.7.5/Python-3.7.5.tgz
tar zxvf Python-3.7.5.tgz
cd Python-3.7.5
# The install prefix can be changed as needed
./configure --prefix=/usr/local/python3.7.5 --enable-loadable-sqlite-extensions --enable-shared
make -j8
sudo make install
Configure the Python environment variables:
vim ~/.bashrc
# Add the following lines and save (adjust to your install path)
export LD_LIBRARY_PATH=/usr/local/python3.7.5/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/python3.7.5/bin:$PATH
Source the file so that the environment variables take effect:
source ~/.bashrc
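Editing ~/.bashrc in vim works, but when the setup is scripted or repeated, the same lines can end up in the file more than once. The sketch below appends a line only if it is not already present; `append_once` is a hypothetical helper name.

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
# $1 = line to add, $2 = target file (for this guide, ~/.bashrc)
append_once() {
    new_line="$1"
    target_file="$2"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -e "$new_line" "$target_file" 2>/dev/null; then
        printf '%s\n' "$new_line" >> "$target_file"
    fi
}

# Example (commented out so that sourcing this file changes nothing):
# append_once 'export LD_LIBRARY_PATH=/usr/local/python3.7.5/lib:$LD_LIBRARY_PATH' ~/.bashrc
# append_once 'export PATH=/usr/local/python3.7.5/bin:$PATH' ~/.bashrc
```

The single quotes in the example keep `$PATH` and `$LD_LIBRARY_PATH` unexpanded, so they are evaluated each time ~/.bashrc is sourced rather than once when the line is written.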
Add symbolic links (optional):
sudo ln -s /usr/local/python3.7.5/bin/python3 /usr/local/python3.7.5/bin/python3.7.5
sudo ln -s /usr/local/python3.7.5/bin/pip3 /usr/local/python3.7.5/bin/pip3.7.5
Switch pip to a domestic mirror. This guide uses the Douban mirror; substitute another one if you prefer.
cd
mkdir .pip
vim .pip/pip.conf
# Add the following content and save
# Douban mirror
[global]
index-url = https://pypi.douban.com/simple
[install]
trusted-host = pypi.douban.com
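The same pip.conf can be written without opening an editor. The sketch below assumes the two settings used in this guide; `write_pip_conf` is a hypothetical helper name. Note that pip's trusted-host option expects a host name (optionally with a port) rather than a full URL, so the third argument carries no scheme.

```shell
# Hypothetical helper: write a pip.conf with one index URL and one trusted host.
# $1 = directory for pip.conf (for this guide, ~/.pip)
# $2 = index URL
# $3 = host name of that index, without "https://"
write_pip_conf() {
    conf_dir="$1"
    mkdir -p "$conf_dir"
    cat > "$conf_dir/pip.conf" <<EOF
[global]
index-url = $2
[install]
trusted-host = $3
EOF
}

# Example (commented out; it would overwrite an existing pip.conf):
# write_pip_conf ~/.pip https://pypi.douban.com/simple pypi.douban.com
```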
Upgrade pip:
pip3 install --upgrade pip
Install the Python dependencies (when installing as a non-root user, append --user to the end of the command):
pip3.7 install attrs numpy==1.17.2 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests
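The rule about --user can be encoded so that one command works for both root and non-root users. This is a sketch only; `pip_user_flag` is a hypothetical helper name.

```shell
# Hypothetical helper: print "--user" for non-root users, nothing for root.
# $1 = numeric user ID; defaults to the current user's ID
pip_user_flag() {
    user_id="${1:-$(id -u)}"
    if [ "$user_id" -ne 0 ]; then
        echo "--user"
    fi
}

# Example (commented out; it would install packages):
# pip3.7 install attrs numpy==1.17.2 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests $(pip_user_flag)
```

The command substitution in the example is left unquoted on purpose, so that it expands to no argument at all when the user is root.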
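7. Install the CANN package
Install the development environment (cann-toolkit). Installing as root is recommended. The default install path is /usr/local/Ascend, which can be changed with --install_path. Use the file name of the package you actually downloaded (the version table above lists CANN 5.0.4).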
./Ascend-cann-toolkit_5.0.2_linux-aarch64.run --install
Configure the environment variables:
vim ~/.bashrc
# Add the following line and save
# Change this to the actual path of the file
source /usr/local/Ascend/ascend-toolkit/set_env.sh
Source the file so that the environment variables take effect:
source ~/.bashrc
Verify the installation by converting a sample model with atc:
mkdir resnet50
cd resnet50
wget https://modelzoo-train-atc.obs.cn-north-4.myhuaweicloud.com/003_Atc_Models/AE/ATC%20Model/resnet50/resnet50.caffemodel
wget https://modelzoo-train-atc.obs.cn-north-4.myhuaweicloud.com/003_Atc_Models/AE/ATC%20Model/resnet50/resnet50.prototxt
wget https://c7xcode.obs.cn-north-4.myhuaweicloud.com/models/resnet50/insert_op.cfg
atc --model=./resnet50.prototxt --weight=./resnet50.caffemodel --framework=0 --output=./resnet50_aipp --soc_version=Ascend910 --insert_op_conf=./insert_op.cfg
If the following output appears and resnet50_aipp.om is generated, the installation succeeded:
root@ubuntu:/home/resnet50# atc --model=./resnet50.prototxt --weight=./resnet50.caffemodel --framework=0 --output=./resnet50_aipp --soc_version=Ascend910 --insert_op_conf=./insert_op.cfg
ATC start working now, please wait for a moment.
ATC run success, welcome to the next use.
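Both success conditions named above (the success message and the generated .om file) can be checked together in a script. The sketch below assumes the atc output has been captured as text; `atc_succeeded` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the captured atc output reports success
# and the converted model file exists and is not empty.
# $1 = captured atc output, $2 = path to the .om file
atc_succeeded() {
    atc_log="$1"
    om_path="$2"
    printf '%s\n' "$atc_log" | grep -q 'ATC run success' && [ -s "$om_path" ]
}

# Example (commented out; it needs the CANN toolkit and the downloaded model files):
# log="$(atc --model=./resnet50.prototxt --weight=./resnet50.caffemodel --framework=0 --output=./resnet50_aipp --soc_version=Ascend910 --insert_op_conf=./insert_op.cfg 2>&1)"
# atc_succeeded "$log" ./resnet50_aipp.om && echo "CANN toolkit works"
```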
III. Install MindSpore
1. Install GMP 6.1.2
apt-get install m4
wget ftp://ftp.gnu.org/gnu/gmp/gmp-6.1.2.tar.xz
xz -d gmp-6.1.2.tar.xz
tar -xvf gmp-6.1.2.tar
cd gmp-6.1.2
./configure --enable-cxx
make -j8
make install
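The builds in this guide use make -j8, which suits an 8-core machine. On other hardware the job count could be derived from the CPU count instead, as in the sketch below; `job_count` is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of online CPUs, or 1 if it cannot be determined.
job_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

# Example (commented out; it would start a build):
# make -j"$(job_count)"
```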
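2. Install OpenMPI 4.0.3 (optional)
OpenMPI is needed only for single-node multi-card or multi-node multi-card training. This guide installs version 4.1.2 rather than 4.0.3. The build takes a long time (about 25 minutes).
# Download link for 4.0.3: https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz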
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.2.tar.gz
tar -zxvf openmpi-4.1.2.tar.gz
cd openmpi-4.1.2
./configure --prefix=/usr/local
#<...lots of output...>
make all install
#<...lots of output...>
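Because the version installed here (4.1.2) differs from the one named in the heading (4.0.3), a script may want to confirm that the installed version is at least the required one. The following is a sketch under two assumptions: `version_ge` is a hypothetical helper, and it relies on `sort -V` from GNU coreutils, which Ubuntu ships.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    # Version-sort the two values; the first line is the lower one.
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
}

# Example (commented out; it assumes the first line of `mpirun --version`
# ends with the version number):
# installed="$(mpirun --version | head -n 1 | awk '{print $NF}')"
# version_ge "$installed" 4.0.3 || echo "OpenMPI is older than 4.0.3"
```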
3. Get the install command and run it
Make sure to use the pip that belongs to the Python 3.7.5 installed earlier.
pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.6.1/MindSpore/ascend/aarch64/mindspore_ascend-1.6.1-cp37-cp37m-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
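The wheel URL above encodes the MindSpore version and the CPU architecture. The sketch below rebuilds it from those two values; `ms_wheel_url` is a hypothetical helper, and the assumption that other versions follow the same URL pattern is untested here, so check the install command on the MindSpore website before relying on it.

```shell
# Hypothetical helper: print the Ascend wheel URL for a MindSpore version and architecture.
# The pattern is taken from the 1.6.1 aarch64 URL used in this guide; other
# versions following the same layout is an assumption.
# $1 = MindSpore version, $2 = architecture
ms_wheel_url() {
    ms_version="$1"
    ms_arch="$2"
    printf 'https://ms-release.obs.cn-north-4.myhuaweicloud.com/%s/MindSpore/ascend/%s/mindspore_ascend-%s-cp37-cp37m-linux_%s.whl\n' \
        "$ms_version" "$ms_arch" "$ms_version" "$ms_arch"
}

# Example (commented out; it would install MindSpore):
# pip3 install "$(ms_wheel_url 1.6.1 aarch64)" --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
```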
Configure the environment variables:
vim ~/.bashrc
# Add the following lines and save
# control log level. 0-DEBUG, 1-INFO, 2-WARNING, 3-ERROR, 4-CRITICAL, default level is WARNING.
export GLOG_v=2
# Conda environmental options
LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package
# lib libraries that the run package depends on
export LD_LIBRARY_PATH=${LOCAL_ASCEND}/ascend-toolkit/latest/fwkacllib/lib64:${LOCAL_ASCEND}/driver/lib64:${LOCAL_ASCEND}/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/op_tiling:${LD_LIBRARY_PATH}
# Environment variables that must be configured
export TBE_IMPL_PATH=${LOCAL_ASCEND}/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe # TBE operator implementation tool path
export ASCEND_OPP_PATH=${LOCAL_ASCEND}/ascend-toolkit/latest/opp # OPP path
export PATH=${LOCAL_ASCEND}/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin/:${PATH} # TBE operator compilation tool path
export PYTHONPATH=${TBE_IMPL_PATH}:${PYTHONPATH} # Python library that TBE implementation depends on
Source the file so that the environment variables take effect:
source ~/.bashrc
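These variables point at directories inside the CANN install tree, so a mistyped or version-specific path tends to surface only later as an import error. A quick sanity check is sketched below; `missing_dirs` is a hypothetical helper name.

```shell
# Hypothetical helper: print each entry of a colon-separated path list
# that is not an existing directory.
# $1 = path list, for example the value of LD_LIBRARY_PATH
missing_dirs() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        # Skip empty entries (for example from a trailing colon).
        [ -n "$entry" ] || continue
        [ -d "$entry" ] || printf '%s\n' "$entry"
    done
}

# Example (commented out; the output depends on the local installation):
# missing_dirs "$LD_LIBRARY_PATH"
# missing_dirs "$TBE_IMPL_PATH:$ASCEND_OPP_PATH"
```

An empty result means every listed directory exists. It does not prove that the directories contain the expected libraries.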
4. Verify the installation
Run the verification command:
python3 -c "import mindspore;mindspore.run_check()"
If the following output appears, the installation succeeded:
MindSpore version: 1.6.1
The result of multiplication calculation is correct, MindSpore has been installed successfully!
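In a scripted install, the reported version can be pulled out of this output and compared with the version that was meant to be installed. The sketch below assumes the output format shown above; `ms_version_from_output` is a hypothetical helper name.

```shell
# Hypothetical helper: read run_check() output on stdin and print the version number.
# Assumes a line of the form "MindSpore version: <version>" as shown in this guide.
ms_version_from_output() {
    sed -n 's/^MindSpore version: *//p' | head -n 1
}

# Example (commented out; it needs a working MindSpore installation):
# python3 -c "import mindspore;mindspore.run_check()" | ms_version_from_output
```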