[Memo] Fixing "nvcc fatal : Unsupported gpu architecture 'compute_35'" when compiling FastDeploy

LateLinux

Last modified 2024-04-30 21:09:38


Tags: paddle, artificial intelligence

First published 2024-04-30 21:05:26

Article link: https://blog.csdn.net/LateLinux/article/details/138356044

Background

I was using Paddle's FastDeploy. Installing it requires compiling the C++ SDK, and the build failed with the error in the title. I later found a fix on GitHub.

Environment

GPU: RTX 3060 Ti
Ubuntu 22.04
CUDA 12.1.1
TensorRT-8.6.1.6
OpenCV 4.7
FastDeploy develop branch, commit id = cd0ee79c91d4ed1103abdc65ff12ccadd23d0827
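
Before building, it can help to confirm what the toolchain and driver actually report. The following is a minimal sketch, not part of the original setup: the helper name `cuda_major` is hypothetical, and the `compute_cap` query field of nvidia-smi is assumed to be available, which is only the case with reasonably recent drivers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# CUDA major version taken from the "release X.Y" line.
cuda_major() {
  sed -n 's/.*release \([0-9][0-9]*\)\.[0-9][0-9]*.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  echo "CUDA major version: $(nvcc --version | cuda_major)"
else
  echo "nvcc not found on PATH"
fi

if command -v nvidia-smi >/dev/null 2>&1; then
  # Assumption: the driver is new enough to know the compute_cap field.
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader || true
fi
```

On the machine described above, an RTX 3060 Ti should report compute capability 8.6, which corresponds to sm_86.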

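Steps to reproduce

Install cuda-12.1.1 (download steps and link are on the official site).
Install OpenCV: git clone it from GitHub and build it manually. There are plenty of guides on CSDN, so I won't repeat them here.
Install TensorRT. Per the requirements on the Paddle website, CUDA Toolkit 12.0 goes with cuDNN v8.9.1, and PaddleTensorRT inference additionally needs TensorRT 8.6.1.6. (The official site provides the link. It is a tar package, so extract it and set the paths. Downloading requires logging in with an NVIDIA Developer account, which is free to register.)
Install FastDeploy. I followed this tutorial; a few of the cmake options below have problems and need to be changed by hand.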

git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
mkdir build && cd build
# TRT_DIRECTORY: change this to wherever you extracted the TensorRT tar package.
# OPENCV_DIRECTORY: no change needed if you built from source and ran make install.
# (Comments are kept out of the command: text after a trailing "\" breaks the line continuation.)
cmake .. -DENABLE_ORT_BACKEND=ON \
         -DENABLE_PADDLE_BACKEND=ON \
         -DENABLE_OPENVINO_BACKEND=ON \
         -DENABLE_TRT_BACKEND=ON \
         -DWITH_GPU=ON \
         -DTRT_DIRECTORY=/Paddle/TensorRT-8.4.1.5 \
         -DCUDA_DIRECTORY=/usr/local/cuda \
         -DCMAKE_INSTALL_PREFIX=${PWD}/compiled_fastdeploy_sdk \
         -DENABLE_VISION=ON \
         -DOPENCV_DIRECTORY=/usr/lib/x86_64-linux-gnu/cmake/opencv4 \
         -DENABLE_TEXT=ON
make -j12
make install

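Note 1: the cmake options need adjusting; see the comments in the command above.
Note 2: I believe this one is a FastDeploy problem. The failure is shown below; note the run of nvcc fatal lines. compute_35 is the old sm_35 compute architecture, while my card is sm_86, so this error should not come up. CUDA 12 no longer supports sm_35, yet FastDeploy's architecture list still includes it regardless of which GPU is installed.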

make -j16
[  3%] Built target extern_onnxruntime
[  6%] Built target extern_paddle_inference
[  8%] Built target extern_fast_tokenizer
[ 10%] Built target extern_paddle2onnx
[ 21%] Built target yaml-cpp
[ 21%] Built target yaml-cpp-parse
[ 22%] Built target yaml-cpp-read
[ 23%] Built target yaml-cpp-sandbox
Consolidate compiler generated dependencies of target fastdeploy
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o
[ 24%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o
[ 24%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o
[ 25%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/preprocessor.cc.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/resnet.cc.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/yolov5cls.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/model.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/postprocessor.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/preprocessor.cc.o
[ 27%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_postprocessor.cc.o
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec.cc.o
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/postprocessor.cc.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_preprocessor.cc.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:496: CMakeFiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:510: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:706: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:734: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:720: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:692: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:310: CMakeFiles/fastdeploy.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
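
To confirm that the installed nvcc itself rejects compute_35, independent of FastDeploy, one option is to look at the architectures nvcc reports. This is a minimal sketch: the helper name `arch_in_list` is hypothetical, and `nvcc --list-gpu-arch` is assumed to be available, which requires a reasonably recent CUDA toolkit (an older nvcc without that flag will fall into the "NOT supported" branch).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the architecture given as $1 appears as a
# whole line in the list read from stdin (one architecture per line).
arch_in_list() {
  local wanted="$1"
  grep -qx "$wanted"
}

if command -v nvcc >/dev/null 2>&1; then
  if nvcc --list-gpu-arch | arch_in_list compute_35; then
    echo "compute_35 is supported by this nvcc"
  else
    echo "compute_35 is NOT supported by this nvcc"
  fi
else
  echo "nvcc not found on PATH"
fi
```

If compute_35 is missing from the list, any build that passes that architecture to nvcc will fail the same way, whichever GPU is installed.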

Solution

The fix is simple: edit FastDeploy/cmake/cuda.cmake.

if(NOT WITH_GPU)
  return()
endif()

# This is to eliminate the CMP0104 warnings from cmake 3.18+.
# Instead of setting CUDA_ARCHITECTURES, we will set CMAKE_CUDA_FLAGS.
set(CMAKE_CUDA_ARCHITECTURES OFF)

if(BUILD_ON_JETSON)
  set(fd_known_gpu_archs "53 62 72")
  set(fd_known_gpu_archs10 "53 62 72")
else()
  message("Using New Release Strategy - All Arches Packge")
#  set(fd_known_gpu_archs "35 50 52 60 61 70 75 80 86")  # original
#  set(fd_known_gpu_archs10 "35 50 52 60 61 70 75")      # original

  set(fd_known_gpu_archs "50 52 60 61 70 75 80 86")  # modified
  set(fd_known_gpu_archs10 "50 52 60 61 70 75")      # modified

  set(fd_known_gpu_archs11 "50 60 61 70 75 80")
endif()

######################################################################################
# A function for automatic detection of GPUs installed  (if autodetection is enabled)
# Usage:
#   detect_installed_gpus(out_variable)

Near the top of the file, "fd_known_gpu_archs" and "fd_known_gpu_archs10" are the two places to change. Once 35 is removed from both, make succeeds.

[100%] Linking CUDA device code CMakeFiles/fastdeploy.dir/cmake_device_link.o
[100%] Linking CXX shared library libfastdeploy.so
[100%] Built target fastdeploy
[100%] Built target patchelf_paddle_inference
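
The manual edit to cuda.cmake can also be scripted. The following is a sketch only, assuming the two lines look exactly as in the commit listed under Environment; the function name `strip_arch35` is hypothetical, and the script is meant to be run from the FastDeploy source root. It keeps a `.bak` copy of the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the leading "35 " from the quoted lists on the
# fd_known_gpu_archs and fd_known_gpu_archs10 lines of the given file.
# Other lines (Jetson lists, fd_known_gpu_archs11) do not match and are left alone.
strip_arch35() {
  local file="$1"
  sed -i.bak -E 's/^([[:space:]]*set\(fd_known_gpu_archs(10)? ")35 /\1/' "$file"
}

# Only touch the file if it exists in the current directory tree.
if [ -f cmake/cuda.cmake ]; then
  strip_arch35 cmake/cuda.cmake
  grep -n 'fd_known_gpu_archs' cmake/cuda.cmake
fi
```

After running it, re-run cmake in the build directory so the new architecture list is picked up, then run make again.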