- The final Demo build step on Ubuntu still fails; that bug is unresolved as of this post.
Installing mmdeploy (v0.6.0) on Linux
This post mainly follows the official guide 《Linux-x86_64 下构建方式》 (Build from Source on Linux-x86_64), supplemented by the 《get_started》 guide. The latter mentions that official prebuilt packages are available and more convenient, but I only discovered that after spending a fair amount of time on the source build, so this post does not use them.
Install the build and compilation toolchain
- Check the cmake version:
cmake --version
- Check the Ubuntu version:
uname -a
or cat /etc/*release
cmake did not seem to be installed, and the gcc version was too old, so I installed/upgraded both per the tutorial. (Tip: install cmake and the libraries downloaded later, such as TensorRT, in a dedicated location to keep things tidy.)
wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.tar.gz
tar -xzvf cmake-3.20.0-linux-x86_64.tar.gz
sudo ln -sf $(pwd)/cmake-3.20.0-linux-x86_64/bin/* /usr/bin/
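Symlinking every cmake binary into /usr/bin works but is invasive. A less intrusive sketch (CMAKE_HOME is a hypothetical location, not part of the original steps) is to prepend the extracted directory to PATH instead:

```shell
# Hypothetical install location; point this at wherever the archive was extracted.
CMAKE_HOME="$HOME/tools/cmake-3.20.0-linux-x86_64"
# Prepend so this cmake shadows any system-wide one.
export PATH="$CMAKE_HOME/bin:$PATH"
```

Putting these two lines in ~/.bashrc makes the choice persistent without touching /usr/bin.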
sudo apt-get update
sudo apt-get install gcc-7
sudo apt-get install g++-7
Install dependency packages
conda create -n mmdeploy python=3.7 -y
conda activate mmdeploy
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
export cu_version=cu111 # cuda 11.1
export torch_version=torch1.8
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/${cu_version}/${torch_version}/index.html
Install MMDeploy SDK dependencies
OpenCV
Install:
sudo apt-get install libopencv-dev
This failed with:
E: Unable to correct problems, you have held broken packages.
To fix it, a blog post suggested running sudo apt update first, which itself reported errors:
(mmdeploy) root@2bc997559bdb:/home/wangjy/research/mmdeploy# sudo apt update
Hit:1 http://mirrors.aliyun.com/ubuntu bionic InRelease
Ign:3 https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu2004/x86_64 InRelease
Get:2 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 InRelease [1581 B]
Hit:4 https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu2004/x86_64 Release
Err:2 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
Reading package lists... Done
W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Note the E: line (third from the end). I resolved it via the posts 《Notice: CUDA Linux Repository Key Rotation》 and 《apt-update in Azure Nvidia gives publickey error》; the commands (from the second, a Stack Overflow post) are:
apt-key del 7fa2af80
rm /etc/apt/sources.list.d/cuda.list
rm /etc/apt/sources.list.d/nvidia-ml.list
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
This alone did not fix the original problem, though; the held broken packages error persisted. After a full day of searching, several blog posts pointed to the real cause: the apt sources must match the actual Ubuntu release (mine is Ubuntu 20.04, codename focal).
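Since the fix hinges on matching the sources to the release codename, the codename can be read programmatically rather than guessed. A small helper (my sketch, not from the original post) that parses an os-release style file:

```shell
# Print the VERSION_CODENAME field (e.g. "focal" on Ubuntu 20.04)
# from an os-release style file such as /etc/os-release.
codename_from_os_release() {
  sed -n 's/^VERSION_CODENAME=//p' "$1"
}
```

For example, `codename_from_os_release /etc/os-release` should print `focal` on Ubuntu 20.04.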
pplcv
git clone https://github.com/openppl-public/ppl.cv.git
cd ppl.cv
export PPLCV_DIR=$(pwd)
git checkout tags/v0.6.2 -b v0.6.2
./build.sh cuda
The last step failed at first, but after being set aside for a while it succeeded, probably because the libopencv-dev and pip install pycuda issues had been resolved in the meantime.
Install the inference engines
ONNXRuntime
pip install onnxruntime==1.8.1
# note the install directory; keep it separate from the project code
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
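Before building against it, it is worth a quick sanity check that ONNXRUNTIME_DIR actually contains the shared library; a minimal sketch (the helper name is mine):

```shell
# Succeed iff the directory looks like an extracted onnxruntime-linux-x64
# release, i.e. it contains lib/libonnxruntime.so.
has_ort_lib() {
  [ -f "$1/lib/libonnxruntime.so" ]
}
```

Usage: `has_ort_lib "$ONNXRUNTIME_DIR" || echo "check ONNXRUNTIME_DIR"`.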
TensorRT(+ cuDNN)
The version happens to match the tutorial's, but you must first download the corresponding archive from the NVIDIA website:
tar -zxvf TensorRT-8.2.3.0.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
pip install TensorRT-8.2.3.0/python/tensorrt-8.2.3.0-cp37-none-linux_x86_64.whl
export TENSORRT_DIR=$(pwd)/TensorRT-8.2.3.0
export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH
pip install pycuda
cd /the/path/of/cudnn/tgz/file
tar -zxvf cudnn-11.3-linux-x64-v8.2.1.32.tgz
export CUDNN_DIR=$(pwd)/cuda
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH
The pip install pycuda above fails with:
ERROR: Could not build wheels for pycuda, which is required to install pyproject.toml-based projects
Some blog posts say the cause is not pinning a pycuda version that matches the installed CUDA, but the wheel download index does not even list a build for CUDA 11.1. With no solution found yet, the TensorRT setup is incomplete pending pycuda; parking this here.
I then noticed the earlier gcc install had not taken effect: gcc --version still reported gcc-6, not gcc-7. After some digging, which gcc showed the symlink was hard-wired to gcc-6, so I changed it:
(mmdeploy) root@2bc997559bdb:/usr/local/bin# ll
total 12
drwxr-xr-x 1 root root 4096 Jan 17 2021 ./
drwxr-xr-x 1 root root 4096 Jan 19 2021 ../
lrwxrwxrwx 1 root root 14 Jan 17 2021 g++ -> /usr/bin/g++-6*
lrwxrwxrwx 1 root root 14 Jan 17 2021 gcc -> /usr/bin/gcc-6*
(mmdeploy) root@2bc997559bdb:/usr/local/bin# ln -fs /usr/bin/gcc-7* gcc
(mmdeploy) root@2bc997559bdb:/usr/local/bin# ll
total 12
drwxr-xr-x 1 root root 4096 Jul 21 13:03 ./
drwxr-xr-x 1 root root 4096 Jan 19 2021 ../
lrwxrwxrwx 1 root root 14 Jan 17 2021 g++ -> /usr/bin/g++-6*
lrwxrwxrwx 1 root root 14 Jul 21 13:03 gcc -> /usr/bin/gcc-7*
(mmdeploy) root@2bc997559bdb:/usr/local/bin# ln -fs /usr/bin/g++-7* g++
(mmdeploy) root@2bc997559bdb:/usr/local/bin#
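The manual ll inspection above can be scripted; this helper (the name is mine) prints what a command on PATH ultimately resolves to, following symlinks:

```shell
# Print the fully resolved target of a command found on PATH;
# before the fix above, `resolve_cmd gcc` would print /usr/bin/gcc-6.
resolve_cmd() {
  readlink -f "$(command -v "$1")"
}
```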
This did not fix the pycuda install either. It eventually succeeded after further work: the CUDA install was missing nvcc, so I reinstalled the CUDA Toolkit; I also installed libevent-dev, edited ~/.bashrc, and re-sourced it with source ~/.bashrc. Which step actually did it is unclear; it may even have been the earlier switch of the Ubuntu apt sources.
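Given that the missing nvcc was a plausible culprit (pip builds pycuda from source against the CUDA toolkit), a preflight check like this hypothetical helper would have flagged it earlier:

```shell
# Warn when nvcc is not on PATH; 'pip install pycuda' compiles CUDA
# bindings and needs the CUDA toolkit to be reachable.
check_nvcc() {
  command -v nvcc >/dev/null 2>&1 ||
    echo "nvcc not found: install/expose the CUDA toolkit before 'pip install pycuda'"
}
```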
Follow-up
The temporary environment variables set above need to be added to ~/.bashrc; taking onnxruntime as an example:
echo '# set env for onnxruntime' >> ~/.bashrc
echo "export ONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}" >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH' >> ~/.bashrc  # single quotes: expand at source time, not now
source ~/.bashrc
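Running these echo lines more than once duplicates the entries in ~/.bashrc; an idempotent variant (the helper name is mine) appends a line only when it is not already present verbatim:

```shell
# Append a literal line to a file unless the file already contains it
# as an exact whole line (so re-runs are harmless).
append_once() {
  line=$1 file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage: `append_once "export ONNXRUNTIME_DIR=$ONNXRUNTIME_DIR" ~/.bashrc`.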
Build mmdeploy
cd /the/root/path/of/MMDeploy
export MMDEPLOY_DIR=$(pwd)
ONNXRuntime custom ops
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} ..
make -j$(nproc) && make install
This produced errors such as:
CMake Error at csrc/mmdeploy/core/CMakeLists.txt:15 (add_subdirectory):
The source directory
/home/wangjy/research/mmdeploy/third_party/spdlog
does not contain a CMakeLists.txt file.
Searching around turned up git submodule update --init --recursive, and issue mmdeploy/issues/260 confirmed this command is required.
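The root cause is that the third_party submodules were never fetched; a quick guard (hypothetical, not part of mmdeploy) run before cmake fails fast with a clearer message:

```shell
# Fail fast when a third_party submodule directory is missing its
# CMakeLists.txt, i.e. the repo was cloned without --recursive.
check_submodule() {
  if [ ! -f "$1/CMakeLists.txt" ]; then
    echo "submodule $1 not initialized: run 'git submodule update --init --recursive'" >&2
    return 1
  fi
}
```

Usage: `check_submodule "$MMDEPLOY_DIR/third_party/spdlog" || exit 1`.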
Install the Model Converter
cd ${MMDEPLOY_DIR}
pip install -e .
Build the SDK (requires the MMDeploy SDK dependencies)
- cpu + ONNXRuntime
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake .. \
-DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES=cpu \
-DMMDEPLOY_TARGET_BACKENDS=ort \
-DMMDEPLOY_CODEBASES=all \
-DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}
make -j$(nproc) && make install
- cuda + TensorRT
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake .. \
-DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES="cuda;cpu" \
-DMMDEPLOY_TARGET_BACKENDS=trt \
-DMMDEPLOY_CODEBASES=all \
-Dpplcv_DIR=${PPLCV_DIR}/cuda-build/install/lib/cmake/ppl \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DCUDNN_DIR=${CUDNN_DIR}
make -j$(nproc) && make install
Build the Demo
cd ${MMDEPLOY_DIR}/build/install/example
mkdir -p build && cd build
cmake .. -DMMDeploy_DIR=${MMDEPLOY_DIR}/build/install/lib/cmake/MMDeploy
make -j$(nproc)
The last command failed:
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++fs.a(ops.o): undefined reference to symbol '_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6resizeEmc'
/usr/bin/ld: /home/wangjy/research/mmdeploy/build/install/lib/libmmdeploy_core.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Some posts suggest switching the CXX compiler entry in CMakeCache.txt to g++; that did not help. Others suggest demangling the offending symbol with c++filt, which identifies it but does not solve anything:
(mmdeploy) root@2bc997559bdb:/# c++filt _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6resizeEmc
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::resize(unsigned long, char)
Build mmdeploy with g++-9
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-9 -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} ..
make -j$(nproc) && make install
cmake -DCMAKE_CXX_COMPILER=g++-9 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install
cd ${MMDEPLOY_DIR}
pip install -e .
cd build
cmake .. \
-DCMAKE_CXX_COMPILER=g++-9 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES=cpu \
-DMMDEPLOY_TARGET_BACKENDS=ort \
-DMMDEPLOY_CODEBASES=all \
-DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}
make -j$(nproc) && make install
cmake .. \
-DCMAKE_CXX_COMPILER=g++-9 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES="cuda;cpu" \
-DMMDEPLOY_TARGET_BACKENDS=trt \
-DMMDEPLOY_CODEBASES=all \
-Dpplcv_DIR=${PPLCV_DIR}/cuda-build/install/lib/cmake/ppl \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DCUDNN_DIR=${CUDNN_DIR}
make -j$(nproc) && make install
Installing mmdeploy (v0.6.0) on Windows
Oddly, the Windows install document never says to install Python at the start, yet when installing TensorRT later it turns out Python 3.7.0 works best, and setting up python==3.7.0 at that point is a detour.
conda create -n mmdeploy python=3.7 -y
conda activate mmdeploy
PS: environment variables set with $env: below only persist in the current shell session; the variables to set are consolidated here.
$env:OpenCV_DIR="E:\ProgramData\OpenCV\opencv\build"
$env:path = "$env:OpenCV_DIR\x64\vc15\bin;" + $env:path
$env:PPLCV_DIR = "E:\ProgramData\ppl.cv"
$env:ONNXRUNTIME_DIR = "E:\ProgramData\onnxruntime-win-x64-1.8.1"
$env:path = "$env:ONNXRUNTIME_DIR\lib;" + $env:path
$env:TENSORRT_DIR = "E:\ProgramData\TensorRT-8.2.3.0"
$env:path = "$env:TENSORRT_DIR\lib;" + $env:path
$env:CUDNN_DIR="E:\ProgramData\cuda"
$env:path = "$env:CUDNN_DIR\bin;" + $env:path
$env:MMDEPLOY_DIR="E:\project\mmdeploy"
$env:path = "$env:MMDEPLOY_DIR/build/install/bin;" + $env:path
Install the build and compilation toolchain
Install dependency packages
Install MMDeploy Converter dependencies
- conda: omitted
- PyTorch + mmcv-full
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/index.html
安装 MMDeploy SDK 依赖
- OpenCV
After installing from an .exe found at the linked page, add the directory containing OpenCVConfig.cmake to the PATH environment variable (per a blog post).
# add under Computer -> Properties -> Environment Variables -> System variables -> PATH
E:\ProgramData\OpenCV\opencv\build
E:\ProgramData\OpenCV\opencv\build\x64\vc15\bin
$env:path = "E:\ProgramData\OpenCV\opencv\build;" + $env:path
$env:path = "E:\ProgramData\OpenCV\opencv\build\x64\vc15\bin;" + $env:path
$env:OpenCV_DIR="E:\ProgramData\OpenCV\opencv\build"
- pplcv
Building v0.6.2 or v0.6.3 fails with:
-- Configuring done
CMake Error at CMakeLists.txt:48 (add_library):
No SOURCES given to target: pplcv_static
CMake Generate step failed. Build files cannot be regenerated correctly.
whereas v0.7.0 builds without error. Why build the former two at all? Because the final TensorRT build produced errors tied to those versions…
PS: list current environment variables with ls env:
Install the inference engines
Omitted.
Build MMDeploy
Build and install the Model Converter
- Build the custom ops
- Install the Model Converter
PS: at the appropriate point you may still need to do the same as on Ubuntu:
cd ${MMDEPLOY_DIR}
git checkout tags/v0.6.0
git submodule update --init --recursive
Build the SDK
OpenCV-related error:
-- CMAKE_INSTALL_PREFIX: E:/project/mmdeploy/build/install
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.18363.
-- Build ONNXRUNTIME custom ops.
CMake Error at cmake/opencv.cmake:2 (find_package):
By not providing "FindOpenCV.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "OpenCV", but
CMake did not find one.
Could not find a package configuration file provided by "OpenCV" with any
of the following names:
OpenCVConfig.cmake
opencv-config.cmake
Add the installation prefix of "OpenCV" to CMAKE_PREFIX_PATH or set
"OpenCV_DIR" to a directory containing one of the above files. If "OpenCV"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
csrc/mmdeploy/CMakeLists.txt:7 (include)
-- Configuring incomplete, errors occurred!
See also "E:/project/mmdeploy/build/CMakeFiles/CMakeOutput.log".
This was apparently fixed by the environment variable set earlier: $env:OpenCV_DIR="E:\ProgramData\OpenCV\opencv\build".
When building the SDK with cuda + TensorRT, a pplcv-related error appeared:
By not providing "Findpplcv.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "pplcv", but CMake did not find one.
Could not find a package configuration file provided by "pplcv" with any of the following names:
pplcvConfig.cmake
pplcv-config.cmake
Reinstall pplcv, paying attention to the cmake configure command:
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `
-DMMDEPLOY_BUILD_SDK=ON `
-DMMDEPLOY_TARGET_DEVICES="cuda" `
-DMMDEPLOY_TARGET_BACKENDS="trt" `
-DMMDEPLOY_CODEBASES="all" `
-Dpplcv_DIR="$env:PPLCV_DIR/pplcv-build/install/lib/cmake/ppl" `
-DTENSORRT_DIR="$env:TENSORRT_DIR" `
-DCUDNN_DIR="$env:CUDNN_DIR"
That is, the installed ppl.cv tree must contain pplcv-build/install/lib/cmake/ppl; without it you get the error above (or other pplcv-related errors), meaning the install is incomplete. Even after that, another pplcv error remained:
E:\project\mmdeploy\csrc\mmdeploy\preprocess\cuda\pad_impl.cpp(24): error C2039: "BORDER_TYPE_CONSTANT" is not a member of "ppl::cv" [E:\project\mmdeploy\build\csrc\mmdeploy\preprocess\cuda\mmdeploy_cuda_transform_impl.vcxproj]
E:\ProgramData\ppl.cv\pplcv-build\install\lib\cmake\ppl\..\..\..\include\ppl/cv/cuda/copymakeborder.h(26): note: see declaration of "ppl::cv"
E:\project\mmdeploy\csrc\mmdeploy\preprocess\cuda\pad_impl.cpp(24): error C2065: "BORDER_TYPE_CONSTANT": undeclared identifier [E:\project\mmdeploy\build\csrc\mmdeploy\preprocess\cuda\mmdeploy_cuda_transform_impl.vcxproj]
Per issues/824, the fix is to edit the corresponding entries under E:\ProgramData\ppl.cv\pplcv-build\install\lib\cmake\ppl to:
set(PPLCV_VERSION_MAJOR 0)
set(PPLCV_VERSION_MINOR 6)
set(PPLCV_VERSION_PATCH 2)
However, building the demo later produced an error:
[2022-07-31 11:40:37.623] [mmdeploy] [error] [net_module.cpp:37] Net backend not found: tensorrt
[2022-07-31 11:40:37.623] [mmdeploy] [error] [task.cpp:67] error parsing config: {
"context": {
"device": "<any>",
"model": "<any>",
"stream": "<any>"
},
"input": [
"prep_output"
],
"input_map": {
"img": "input"
},
"module": "Net",
"name": "maskrcnn",
"output": [
"infer_output"
],
"type": "Task"
}
The SDK was then reconfigured with both backends enabled:
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `
-DMMDEPLOY_BUILD_SDK=ON `
-DMMDEPLOY_TARGET_DEVICES="cuda" `
-DMMDEPLOY_TARGET_BACKENDS="ort;trt" `
-DMMDEPLOY_CODEBASES="all" `
-Dpplcv_DIR="$env:PPLCV_DIR/pplcv-build/install/lib/cmake/ppl" `
-DTENSORRT_DIR="$env:TENSORRT_DIR" `
-DCUDNN_DIR="$env:CUDNN_DIR"
Build the Demo
Omitted.