Ubuntu 18.04, GPU: NVIDIA A40, driver: 525.147.05, CUDA 11.3
Environment setup
conda create -n isbnet python=3.7
conda activate isbnet
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip3 install spconv-cu113==2.1.25
pip3 install torch-scatter==2.0.9 -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
pip3 install -r requirements.txt
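Before building the extensions, it can help to confirm that the packages installed so far are visible from the isbnet environment. The following is only a minimal sketch: the helper name check_modules is made up for illustration, and the module names assume the usual import names of the packages installed above.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} depending on whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for the packages installed above
    status = check_modules(["torch", "torchvision", "spconv", "torch_scatter"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    if status["torch"]:
        import torch
        # The CUDA version PyTorch was built with, and whether a GPU is visible
        print("torch", torch.__version__, "| CUDA build:", torch.version.cuda,
              "| GPU visible:", torch.cuda.is_available())
```

If torch reports a CUDA build other than 11.3, or no visible GPU, later build steps are likely to fail as well.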
git clone https://github.com/Karbo123/segmentator.git
If Ubuntu 18.04 cannot reach GitHub, see: http://t.csdnimg.cn/xEDV9
cd segmentator/csrc
mkdir build && cd build
cmake .. \
-DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \
-DPYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
-DPYTHON_LIBRARY=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))") \
-DCMAKE_INSTALL_PREFIX=`python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'`
make && make install
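This step runs into several errors; fix them before continuing.
Problem: CMake Error at /usr/local/share/cmake-3.30/Modules/Internal/CMakeCUDAArchitecturesValidate.cmake:7 (message): CMAKE_CUDA_ARCHITECTURES must be non-empty if set.
Cause: when CUDA is enabled, CMake needs to know the target CUDA architecture, but the CMAKE_CUDA_ARCHITECTURES variable has no value. CMake raises this error whenever it detects CUDA and no architecture has been specified.
The CMakeLists.txt to edit is in segmentator/csrc.
Solution:
1. Use the CUDA deviceQuery sample to look up the GPU's compute capability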
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
2. Edit CMakeLists.txt and add set(CMAKE_CUDA_ARCHITECTURES 86) (the A40 has compute capability 8.6). The full file then looks like this:
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_BUILD_TYPE Release)
project(libsegmentator)
# Set the CUDA architecture for the NVIDIA A40 GPU
set(CMAKE_CUDA_ARCHITECTURES 86)
find_package(Torch REQUIRED)
find_package(PythonLibs REQUIRED)
find_library(TORCH_PYTHON_LIBRARY torch_python PATHS "${TORCH_INSTALL_PREFIX}/lib")
set(SOURCES_LIB segmentator.cpp)
add_library(libsegmentator SHARED ${SOURCES_LIB})
set_target_properties(libsegmentator PROPERTIES POSITION_INDEPENDENT_CODE ON)
set_target_properties(libsegmentator PROPERTIES PREFIX "")
target_link_libraries(libsegmentator ${TORCH_PYTHON_LIBRARY})
target_compile_definitions(libsegmentator PUBLIC TORCH_EXTENSION_NAME=libsegmentator)
target_compile_definitions(libsegmentator PUBLIC TORCH_API_INCLUDE_EXTENSION_H)
target_compile_definitions(libsegmentator PUBLIC ${TORCH_CXX_FLAGS})
target_include_directories(libsegmentator PUBLIC ${TORCH_INCLUDE_DIRS})
target_include_directories(libsegmentator PUBLIC ${PYTHON_INCLUDE_DIRS})
# Install step
set(BARE_PROJECT_NAME segmentator)
install(CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
${PROJECT_SOURCE_DIR}/../../${BARE_PROJECT_NAME} \
${CMAKE_INSTALL_PREFIX}/${BARE_PROJECT_NAME} \
)"
)
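As an alternative to building deviceQuery in step 1, the compute capability can also be read through PyTorch, which is already installed in this environment. This is a minimal sketch: arch_from_capability is a hypothetical helper name, and the script assumes GPU index 0 is the A40.

```python
def arch_from_capability(major, minor):
    """Turn a compute capability such as (8, 6) into the CMake value '86'."""
    return f"{major}{minor}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # Returns a (major, minor) tuple for the given device index
        major, minor = torch.cuda.get_device_capability(0)
        print(f"set(CMAKE_CUDA_ARCHITECTURES {arch_from_capability(major, minor)})")
    else:
        print("No CUDA device visible from PyTorch; use deviceQuery instead.")
```

On an A40 this should print the same set(...) line that was added to CMakeLists.txt above.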
Problem: CMake Error at /usr/local/share/cmake-3.30/Modules/CMakeDetermineCompilerId.cmake:400 (file): file failed to open for writing (Permission denied):
Cause: insufficient permissions.
Solution: sudo cmake…
Problem: `-- The CUDA compiler identification is unknown
CMake Error at /home/admin815/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:47 (enable_language):
The CMAKE_CUDA_COMPILER:
/usr/local/cuda/bin/nvcc-DCMAKE_CUDA_ARCHITECTURES=86
is not a full path to an existing compiler tool.
Tell CMake where to find the compiler by setting either the environment
variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
path to the compiler, or to the compiler name if it is in the PATH.`
Cause: CMake cannot find the CUDA compiler nvcc, either because the path passed to it is wrong or because nvcc is not on the system PATH. In this log the path ends in nvcc-DCMAKE_CUDA_ARCHITECTURES=86, which suggests that two -D options were run together without a space between them.
Solution: pass CMAKE_CUDA_COMPILER explicitly when configuring with CMake, keeping a space between it and every other option:
cmake -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.3/bin/nvcc
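One way to avoid flags running together is to assemble the cmake call as a list of separate arguments. The sketch below is illustrative only: find_nvcc and build_cmake_args are hypothetical helper names, the candidate nvcc locations are assumptions based on the paths seen in this post, and the other -D options from the original command are left out for brevity.

```python
import os
import shutil


def find_nvcc(candidates):
    """Return the first existing nvcc from candidates, else the one on PATH (or None)."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("nvcc")


def build_cmake_args(nvcc_path, arch="86", source_dir=".."):
    """Build the cmake call as a list so that no two flags can be fused together."""
    return [
        "cmake", source_dir,
        f"-DCMAKE_CUDA_COMPILER={nvcc_path}",
        f"-DCMAKE_CUDA_ARCHITECTURES={arch}",
    ]


if __name__ == "__main__":
    # Assumed install locations; adjust to the local CUDA setup
    nvcc = find_nvcc(["/usr/local/cuda-11.3/bin/nvcc", "/usr/local/cuda/bin/nvcc"])
    if nvcc is None:
        print("nvcc not found; check the CUDA installation")
    else:
        print(" ".join(build_cmake_args(nvcc)))
```

The printed line can be pasted into the build directory, or the list can be handed to subprocess.run directly.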
All of the CMake errors are now resolved; continue with the remaining commands.
sudo apt-get install libsparsehash-dev
cd ISBNet/isbnet/pointnet2
python3 setup.py bdist_wheel
cd ./dist
pip3 install <.whl>   (<.whl> is the wheel file inside the dist folder)
cd ISBNet
python3 setup.py build_ext develop
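Problem: OSError: /home/admin815/anaconda3/envs/isbnet/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
Cause: the library search path is misconfigured, so a matching libcublasLt.so.11 cannot be found.
Solution: edit the shell startup file that sets the environment variables so that the search path includes the directory containing libcublasLt.so.11: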
sudo gedit ~/.bashrc
export LD_LIBRARY_PATH={directory containing libcublasLt.so.11}:$LD_LIBRARY_PATH
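To find the directory to put into that export line, the environment can be searched for the library file. This is a rough sketch: find_library_dirs is a hypothetical helper, and the root to search (for example the conda environment folder) has to be supplied by the reader.

```python
import os


def find_library_dirs(root, lib_name="libcublasLt.so.11"):
    """Walk root and return every directory that contains lib_name, sorted."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if lib_name in filenames:
            hits.append(dirpath)
    return sorted(hits)
```

For example, find_library_dirs("/home/admin815/anaconda3/envs/isbnet") would list the candidate directories. If more than one copy turns up, the copy that sits next to the libcublas.so.11 named in the error message is the most likely match.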
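Environment setup is complete!
Next comes data preparation.
This walkthrough trains on the S3DIS dataset, so obtain the Stanford3dDataset_v1.2_Aligned_Version dataset yourself.
The dataset contains a number of errors that must be fixed, otherwise later steps will fail; see: http://t.csdnimg.cn/BPPZY
Prepare the superpoints: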
https://datasets.d2.mpi-inf.mpg.de/box2mask/segment_labels.tar.gz
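The format is as follows:
Note!!! Always use absolute paths throughout the project. Many of the later "file not found" errors are caused by nothing more than relative paths!!!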
cd ISBNet/dataset/s3dis
bash prepare_data.sh
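Since so many failures come down to paths, a quick check of the dataset folder with absolute paths can save time before training. This is a sketch only: missing_subdirs is a hypothetical helper, and the folder names preprocess and superpoints are taken from the paths that isbnet/data/s3dis.py builds from data_root.

```python
import os


def missing_subdirs(data_root, names=("preprocess", "superpoints")):
    """Return the absolute paths of expected sub-folders that do not exist."""
    root = os.path.abspath(os.path.expanduser(data_root))
    expected = [os.path.join(root, name) for name in names]
    return [path for path in expected if not os.path.isdir(path)]
```

An empty list from missing_subdirs("/home/admin815/PycharmProjects/ISBNet-master/dataset/s3dis") means both folders are where the loader will look for them.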
Next is training
# Pretrain step
python3 tools/train.py configs/s3dis/isbnet_backbone_s3dis_area5.yaml --only_backbone --exp_name default
# Train entire model
python3 tools/train.py configs/s3dis/isbnet_s3dis_area5.yaml --trainall --exp_name default
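Problem: File "/home/admin815/PycharmProjects/ISBNet-master/isbnet/data/s3dis.py", line 35, in get_filenames assert len(filenames) > 0, f"Empty {p}" AssertionError: Empty Area_1
Cause: this line in isbnet/data/s3dis.py does not match any files:
filenames = glob(osp.join(self.data_root, "preprocess", p + "*" + self.suffix))
The "*" is meant as a wildcard that picks up every matching file in the folder, but for reasons I could not determine it was not expanded (it seemed to be treated as a literal character), so the contents of the preprocess folder were never returned. I therefore rewrote the code.
Solution: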
`import os
# Folder that holds the preprocessed files
directory = '/home/admin815/PycharmProjects/ISBNet-master/dataset/s3dis/preprocess/'
# List every entry in the folder
names = os.listdir(directory)
# Keep regular files only (skip sub-folders) and build their full paths
filenames = [os.path.join(directory, n) for n in names if os.path.isfile(os.path.join(directory, n))]
# Sort the file paths
filenames = sorted(filenames)`
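Note that listing the whole folder drops the per-area filter that the original glob pattern expressed with p + "*", so every area ends up in the same list. A possible variant that keeps the filter is sketched below; list_scene_files is a hypothetical helper, not code from the ISBNet repository.

```python
import os


def list_scene_files(directory, prefix, suffix=""):
    """Return sorted full paths of regular files named <prefix>...<suffix>."""
    paths = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        # Keep only files whose name matches the requested area and suffix
        if os.path.isfile(full) and name.startswith(prefix) and name.endswith(suffix):
            paths.append(full)
    return sorted(paths)
```

Inside get_filenames this could be called once per prefix p, for example list_scene_files(directory, "Area_1", "_inst_nostuff"), so that training and evaluation areas stay separate.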
Problem: roughly, the superpoint file cannot be found. The error is again raised in isbnet/data/s3dis.py,
inside the function def load(self, filename):
spp_filename = osp.join(self.data_root, "superpoints", scan_id + ".pth")
Cause: scan_id = osp.basename(filename).replace(self.suffix, "") did not strip the suffix as intended.
Solution: edit configs/s3dis/isbnet_backbone_s3dis_area5.yaml and
change suffix: '_inst_nostuff.pth' to suffix: '_inst_nostuff'
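The effect of the suffix setting can be reproduced in isolation. The sketch below mirrors the two expressions quoted from load(); the function names and the example scene file name are made up for illustration, and it assumes the preprocessed files carry no .pth extension, as the fix above implies.

```python
import os.path as osp


def scan_id_from(filename, suffix):
    """Mirror the expression from load(): basename with the suffix removed."""
    return osp.basename(filename).replace(suffix, "")


def superpoint_path(data_root, filename, suffix):
    """Mirror how the superpoint file path is built from the scan id."""
    return osp.join(data_root, "superpoints", scan_id_from(filename, suffix) + ".pth")


if __name__ == "__main__":
    f = "/data/s3dis/preprocess/Area_1_office_1_inst_nostuff"  # hypothetical file
    # With the old suffix nothing is replaced, so the scan id keeps its tail
    print(scan_id_from(f, "_inst_nostuff.pth"))
    # With the corrected suffix the scan id is the bare scene name
    print(scan_id_from(f, "_inst_nostuff"))
```

When the suffix does not occur in the file name, replace() returns the name unchanged, so the loader ends up looking for a superpoint file that does not exist.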
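Problem: RuntimeError: /io/build/temp.linux-x86_64-cpython-37/spconv/build/core_cc/src/csrc/sparse/all/SpconvOps/SpconvOps_get_indice_pairs.cc(65)not implemented for CPU ONLY build.
Cause: the installed spconv is a CPU-only build, but the code calls an operation that requires the GPU build. spconv is a sparse convolution library and normally runs on the GPU.
Solution: pip install spconv-cu113
After fixing this, training finally ran!! That was not easy.
Finally, test and visualize:
Testing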
python3 tools/test.py configs/s3dis/isbnet_s3dis_area5.yaml /home/admin815/PycharmProjects/ISBNet-master/work_dirs/s3dis/isbnet_backbone_s3dis_area5/default/latest.pth --out results/isbnet_s3dis_val
Problem: one place in visualization/vis_s3dis.py needs to be changed:
xyz, rgb, semantic_label, instance_label = torch.load(
f"{args.data_root}/{args.split}/{args.scene_name}_inst_nostuff.pth"
)
Change it to:
xyz, rgb, semantic_label, instance_label = torch.load(
f"{args.data_root}/{args.split}/{args.scene_name}_inst_nostuff"
)
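Instead of hard-coding one spelling of the file name, the script could try both. This is only a sketch of that idea: resolve_scene_file is a hypothetical helper, not part of vis_s3dis.py.

```python
import os


def resolve_scene_file(data_root, split, scene_name, stem="_inst_nostuff"):
    """Return the first existing candidate path, trying with and without '.pth'."""
    base = os.path.join(data_root, split, scene_name + stem)
    for candidate in (base + ".pth", base):
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"Neither {base}.pth nor {base} exists")
```

The call in vis_s3dis.py would then become torch.load(resolve_scene_file(args.data_root, args.split, args.scene_name)), which works whether or not the preprocessed files carry the .pth extension.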
Visualization
python3 visualization/vis_s3dis.py --data_root /home/admin815/PycharmProjects/ISBNet-master/dataset/s3dis/ --scene_name Area_1_conferenceRoom_2 --prediction_path /home/admin815/PycharmProjects/ISBNet-master/results/isbnet_s3dis_val --task inst_pred
cd /home/admin815/PycharmProjects/ISBNet-master/visualization/pyviz3d
python -m http.server 6008
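The instance segmentation is not really visible here, probably because I did not train for enough epochs. Training 30 epochs already takes a full day, so I will not train any longer and will move on to trying my own dataset instead.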