Preparation
PointPillars open-source code repository
Error summary
1. ModuleNotFoundError: No module named 'second'
Error:
Go into the ./second directory and run the following command:
python create_data.py create_kitti_info_file --data_path=/data/sets/kitti_second/
It fails with:
ModuleNotFoundError: No module named 'second'
There are also many warnings; ignore them for now.
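Cause
When a module is imported with import xxx, the Python interpreter by default searches the directory of the script being run, the built-in modules, and the installed third-party packages.
None of these locations contains the 'second' package: the directory that holds it is one level up, so that path has to be added before import second can succeed.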
Solution
In create_data.py, after the line 'from skimage import io as imgio', add:
import sys
sys.path.append("..")
Run the command again and this error goes away.
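Note that sys.path.append("..") is resolved against the current working directory, so it only helps when create_data.py is launched from inside ./second. A minimal sketch that derives the parent directory from the script's own location instead; the helper name add_parent_to_path is hypothetical and not part of the PointPillars code:

```python
import os
import sys


def add_parent_to_path(script_file: str) -> str:
    """Put the parent of the script's directory at the front of sys.path.

    For <repo>/second/create_data.py this adds <repo>, the directory that
    contains the `second` package. Returns the path that was added.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    parent_dir = os.path.dirname(script_dir)
    if parent_dir not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.insert(0, parent_dir)
    return parent_dir
```

It would be called as add_parent_to_path(__file__) near the top of create_data.py, before any import from second. Exporting the repository root in PYTHONPATH before running the script achieves the same without touching the code.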
2. /usr/lib/x86_64-linux-gnu/libcuda.so: file too short
Error:
Go into the ./second directory and run the following command:
python create_data.py create_kitti_info_file --data_path=/data/sets/kitti_second/
It fails with:
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
Possible CUDA driver libraries are found but error occurred during load:
/usr/lib/x86_64-linux-gnu/libcuda.so: file too short
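Cause
The Docker image provided in the source documentation comes with the environment preinstalled, but a plain Docker container cannot use the host GPU, so nvidia-docker2 has to be installed.
nvidia-docker2 lets Docker containers use the host GPU. The original nvidia-docker (1.0) was a wrapper around docker that went through nvidia-docker-plugin; nvidia-docker2 instead registers the NVIDIA container runtime with Docker.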
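The loader's "file too short" message means the file is too small to even hold an ELF header, for example an empty placeholder libcuda.so inside a container without GPU passthrough. A rough diagnostic sketch, assuming only the path from the error message; the function names are hypothetical, and checking the 4-byte ELF magic is a heuristic, not a full validation of the library:

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def classify_library_bytes(data: bytes) -> str:
    """Classify the first bytes of a would-be shared library."""
    if len(data) < len(ELF_MAGIC):
        return f"too short ({len(data)} bytes)"
    if not data.startswith(ELF_MAGIC):
        return "not an ELF file"
    return "starts with ELF magic"


def diagnose_shared_library(path: str) -> str:
    """Rough check of whether `path` could be a loadable shared library."""
    if not os.path.exists(path):  # also False for a dangling symlink
        return "missing"
    with open(path, "rb") as f:  # open() follows symlinks to the real file
        head = f.read(64)  # 64 bytes is the size of an ELF64 header
    return classify_library_bytes(head)
```

Inside the container, print(diagnose_shared_library("/usr/lib/x86_64-linux-gnu/libcuda.so")) should report "too short" in the failing setup and "starts with ELF magic" once the GPU runtime mounts the real driver library.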
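Solution
Install nvidia-docker2. This assumes the NVIDIA GPU driver is already installed on the host; driver installation is not covered here.
1. Installation
The procedure follows the official guide: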
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
Steps:
- Set up the stable repository and the GPG key:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
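The shell pipeline above builds the distribution string from /etc/os-release ($ID followed by $VERSION_ID, e.g. ubuntu18.04), downloads the matching .list file, and uses sed to add a signed-by option to every deb line. A Python sketch of the same string handling, for illustration only; the function names are hypothetical and the parser handles only simple KEY=VALUE lines:

```python
KEYRING = "/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg"
LIST_URL = "https://nvidia.github.io/libnvidia-container/{dist}/libnvidia-container.list"


def parse_os_release(text: str) -> dict:
    """Parse simple KEY=VALUE lines of /etc/os-release, dropping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def distribution_id(text: str) -> str:
    """Equivalent of `echo $ID$VERSION_ID` after sourcing /etc/os-release."""
    fields = parse_os_release(text)
    return fields.get("ID", "") + fields.get("VERSION_ID", "")


def add_signed_by(list_text: str) -> str:
    """Same substitution as the sed command: pin each deb line to the keyring."""
    return list_text.replace("deb https://", f"deb [signed-by={KEYRING}] https://")
```

For example, LIST_URL.format(dist=distribution_id(open("/etc/os-release").read())) gives the URL that curl fetches on the current machine.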
- After updating the package list, install the nvidia-docker2 package and its dependencies:
sudo apt-get update
sudo apt-get install -y nvidia-docker2
Get:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 libnvidia-container1 1.10.0-1 [926 kB]
Get:2 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 libnvidia-container-tools 1.10.0-1 [24.1 kB]
Get:3 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 nvidia-container-toolkit 1.10.0-1 [1,961 kB]
Get:4 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 nvidia-docker2 2.11.0-1 [5,544 B]
Fetched 2,917 kB in 1min 36s (30.3 kB/s)
- Restart the Docker daemon:
sudo systemctl restart docker
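The restart matters because, as far as I know, the nvidia-docker2 package registers an nvidia runtime in /etc/docker/daemon.json, and dockerd reads that file only at startup. A small sketch for checking that entry; the expected layout (runtimes, then nvidia, then path) is an assumption about what the package writes, so compare it with your own file. The function name is hypothetical:

```python
import json


def nvidia_runtime_path(daemon_json_text: str):
    """Return the executable registered for the `nvidia` runtime, or None."""
    config = json.loads(daemon_json_text)
    runtime = config.get("runtimes", {}).get("nvidia")
    return runtime.get("path") if isinstance(runtime, dict) else None
```

Usage: with open("/etc/docker/daemon.json") as f: print(nvidia_runtime_path(f.read())). A result of None suggests the runtime was not registered.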
- You can now test a working setup by running a base CUDA container:
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
- Console output like the following indicates that nvidia-docker2 was installed successfully:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| N/A 54C P8 9W / N/A | 260MiB / 6070MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
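The "CUDA Version" in the nvidia-smi banner is the highest CUDA version the host driver supports, not a CUDA toolkit installed in the container; that is why the cuda:11.0.3 image runs on a driver that reports 11.4. A sketch that reads the banner and compares it with the CUDA version an image needs; the function names are hypothetical:

```python
import re

BANNER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_banner(output: str) -> dict:
    """Extract nvidia-smi, driver and max supported CUDA versions."""
    match = BANNER.search(output)
    if match is None:
        raise ValueError("no nvidia-smi banner found")
    return match.groupdict()


def driver_supports(banner: dict, required: str) -> bool:
    """True if the driver's max CUDA version is >= `required` (e.g. "11.0.3")."""
    have = tuple(int(p) for p in banner["cuda"].split("."))
    need = tuple(int(p) for p in required.split("."))
    return have >= need  # tuple comparison: (11, 4) >= (11, 0, 3)
```

Passing the full nvidia-smi output to parse_smi_banner works as well, since the pattern is searched anywhere in the text.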
2. Usage
After installation, enter a new container with the command below. Do not add the --rm flag, otherwise the container is deleted when you exit:
sudo docker run -it --gpus all smallmunich/suke_pointpillars:v1 /bin/bash
Inside the container, run:
nvidia-smi
Console output like the following shows that the GPU is usable inside the container:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| N/A 54C P8 9W / N/A | 260MiB / 6070MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
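Re-clone the repository with git, prepare the dataset, and rerun create_data.py. The error is gone, and output like the following indicates a successful run: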
warnings.warn(errors.NumbaDeprecationWarning(msg, self.func_ir.loc))
Kitti info train file is saved to /data/sets/kitti_second/kitti_infos_train.pkl
Kitti info val file is saved to /data/sets/kitti_second/kitti_infos_val.pkl
Kitti info trainval file is saved to /data/sets/kitti_second/kitti_infos_trainval.pkl
Kitti info test file is saved to /data/sets/kitti_second/kitti_infos_test.pkl
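Before moving on to training, it can help to confirm that all four info files listed above were actually written. A minimal sketch, assuming the /data/sets/kitti_second/ root used throughout this post; the function name is hypothetical:

```python
import os

EXPECTED_SPLITS = ("train", "val", "trainval", "test")


def missing_info_files(root: str) -> list:
    """Return the kitti_infos_<split>.pkl names that are not present in `root`."""
    names = [f"kitti_infos_{split}.pkl" for split in EXPECTED_SPLITS]
    return [name for name in names if not os.path.isfile(os.path.join(root, name))]
```

Usage: missing_info_files("/data/sets/kitti_second") should return an empty list after a successful run.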
3. FileNotFoundError: [Errno 2] No such file or directory: '/media/holo/B834B57734B538E8/kitti/data/kitti_dbinfos_train.pkl'
Error
Switch to the /second directory and run:
python ./pytorch/train.py train --config_path=./configs/pointpillars/car/xyres_16.proto --model_dir=/path/to/model_dir
It fails with:
FileNotFoundError: [Errno 2] No such file or directory: '/media/holo/B834B57734B538E8/kitti/data/kitti_dbinfos_train.pkl'
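Cause
The "Modify config file" step from the documentation was skipped because it was not clear which file had to be changed, so the config still points to the original author's dataset paths.
Solution
Open the file
vim second/configs/pointpillars/car/xyres_16.proto
and change the relevant paths as follows (assuming the dataset and the generated info files are all placed in /data/sets/kitti_second/, as in the earlier steps):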
train_input_reader: {
...
database_sampler {
database_info_path: "/data/sets/kitti_second/kitti_dbinfos_train.pkl"
...
}
kitti_info_path: "/data/sets/kitti_second/kitti_infos_train.pkl"
kitti_root_path: "/data/sets/kitti_second"
}
...
eval_input_reader: {
...
kitti_info_path: "/data/sets/kitti_second/kitti_infos_val.pkl"
kitti_root_path: "/data/sets/kitti_second"
}
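The other xyres_x.proto files need no changes; their paths are already correct. I suspect the author changed xyres_16.proto on purpose so that users notice this path must be adapted to their own dataset location.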
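If several configs need the same change, the edit can be scripted. A minimal sketch that works on the config text with a regular expression, not a real protobuf parser: it points kitti_root_path at the new root and keeps only the file name of database_info_path and kitti_info_path. The function name retarget_kitti_paths is hypothetical:

```python
import posixpath
import re

# Matches e.g.  kitti_info_path: "/old/root/kitti_infos_train.pkl"
PATH_FIELD = re.compile(
    r'(?P<key>database_info_path|kitti_info_path|kitti_root_path)'
    r'(?P<sep>\s*:\s*)"(?P<value>[^"]*)"'
)


def retarget_kitti_paths(proto_text: str, new_root: str) -> str:
    """Rewrite the dataset paths in a config's text to live under `new_root`."""
    def repl(m):
        key = m.group("key")
        if key == "kitti_root_path":
            value = new_root
        else:  # keep the .pkl file name, replace its directory
            value = posixpath.join(new_root, posixpath.basename(m.group("value")))
        return f'{key}{m.group("sep")}"{value}"'

    return PATH_FIELD.sub(repl, proto_text)
```

Usage: read xyres_16.proto, pass its text together with "/data/sets/kitti_second", and write the result back, keeping a backup of the original file.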
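After this change, the following warning may be printed during training. It is a PyTorch deprecation warning and, like the earlier warnings, can be ignored for now:
opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.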
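4. Visualization: ModuleNotFoundError: No module named 'OpenGL'
Error
While visualizing with kittiviewer, run:
python viewer.py
It fails with:
ModuleNotFoundError: No module named 'OpenGL'
Cause
PyOpenGL, the package that provides the OpenGL module, is not installed. Another error that may appear here:
AttributeError: 'NoneType' object has no attribute 'glGetError'
Solution
- Switch the apt mirror: check the Ubuntu version with
cat /etc/issue
and then switch to a suitable mirror following an online guide.
- Install OpenGL:
apt-get install python3-opengl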
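A quick way to confirm the fix is to ask the interpreter that runs viewer.py whether it can find the modules. This matters if the container uses a conda Python, since a package installed through apt for the system python3 may not be visible to it. A minimal sketch; the function name is hypothetical and it only handles top-level module names:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec returns None for an unknown top-level module (no import is run)
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Usage: run print(missing_modules(["OpenGL", "second"])) with the same python used for viewer.py; an empty list means both imports should resolve.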
5. No GUI display from Docker
The procedure follows this reference:
https://www.modb.pro/db/337891