Running BMask R-CNN on Ubuntu 18.04
Environment setup
cuda 11.1
cudnn
cudatoolkit 11.1.1
python 3.8 (I started with 3.6, but it conflicted when installing torch 1.9, which needed Python 3.7 or later in my setup, so I went straight to 3.8)
torch 1.9.0+cu111
torchaudio 0.9.0
torchvision 0.10.0+cu111
(I chose torch 1.9 because torch 1.8 conflicts with detectron2, the framework BMask R-CNN is built on, and fails with RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal)
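Since most of the trouble in this post comes from mismatched versions, it can help to compare what is actually installed against the pins listed above once the install steps that follow are done. The snippet is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); `EXPECTED`, `strip_local` and `check_pins` are names made up for this illustration, not part of any library.

```python
from importlib import metadata  # in the standard library since Python 3.8

# Pins copied from the list above; the local "+cu111" tag is dropped before comparing.
EXPECTED = {"torch": "1.9.0", "torchvision": "0.10.0", "torchaudio": "0.9.0"}


def strip_local(version):
    """Drop a local build tag such as '+cu111' from a version string."""
    return version.split("+", 1)[0]


def check_pins(expected, get_version=metadata.version):
    """Return {package: (wanted, found)} for every package that does not match."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = strip_local(get_version(name))
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means the three packages match the pins; anything printed is worth fixing before moving on.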
// Create the environment
conda create --name detectron2 python=3.8
// Install cudatoolkit. I had already installed CUDA and cuDNN in the base environment, so it also works without installing it here
conda activate detectron2
conda install -c anaconda cudatoolkit=11.1.1
// Install torch and the related packages
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
// Install detectron2
pip install fvcore
pip install cython
pip install pycocotools
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
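// This one-liner is the easiest way to install detectron2; the wheel matches the versions above and installed successfully on the first try
// Start training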
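Before starting a long training run, a quick check that torch sees the GPU and that detectron2 imports cleanly can save time. This is a minimal sketch rather than an official check; `describe_environment` is a hypothetical helper name, and it reports `None` or `0` instead of failing when a package is missing.

```python
def describe_environment():
    """Collect the facts that matter for this setup; missing packages stay None."""
    info = {"torch": None, "cuda_build": None, "gpu_count": 0, "detectron2": None}
    try:
        import torch
    except ImportError:
        return info  # torch is not installed, nothing more to report
    info["torch"] = torch.__version__        # for example "1.9.0+cu111"
    info["cuda_build"] = torch.version.cuda  # CUDA version torch was built against
    if torch.cuda.is_available():
        info["gpu_count"] = torch.cuda.device_count()
    try:
        import detectron2
        info["detectron2"] = detectron2.__version__
    except ImportError:
        pass  # detectron2 is not installed
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

The `gpu_count` value is the number to pass to `--num-gpus` in the training command.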
cd projects/BMaskR-CNN
python train_net.py --config-file configs/bmask_rcnn_R_50_FPN_1x.yaml --num-gpus 1
// Set --num-gpus to the number of GPUs you have
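At this point you will most likely hit a few small errors, all caused by package versions that are too new and do not match.
1. AttributeError: module 'numpy' has no attribute 'bool'.
Downgrade numpy: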
pip uninstall numpy
pip install numpy==1.19.2
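This error appears because the `np.bool` alias was deprecated in NumPy 1.20 and removed in 1.24. If downgrading is not an option, another route (not the one I took) is to find the offending spots and replace them with the builtin `bool`. The sketch below scans source text for the removed aliases; `REMOVED_ALIAS` and `find_removed_aliases` are illustrative names, and it assumes NumPy is imported as `np`.

```python
import re

# Aliases deprecated in NumPy 1.20 and removed in 1.24.
# np.bool_ and sized types such as np.int32 are unaffected and are not matched.
REMOVED_ALIAS = re.compile(r"\bnp\.(bool|int|float|complex|object|str)\b")


def find_removed_aliases(source):
    """Return (line_number, alias) pairs for each removed alias used in the source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in REMOVED_ALIAS.finditer(line):
            hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "mask = x.astype(np.bool)\nsafe = x.astype(np.bool_)\n"
    for number, alias in find_removed_aliases(sample):
        print(f"line {number}: {alias}")
```

Reading a file with `open(path).read()` and passing the text to `find_removed_aliases` gives the line numbers to patch.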
2. Install opencv-python
pip install opencv-python
3. AttributeError: module 'PIL.Image' has no attribute 'LINEAR'. Downgrade pillow:
pip uninstall pillow
pip install pillow==8.4.0
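`Image.LINEAR` was an alias of `Image.BILINEAR` that newer Pillow releases removed, which is why an older Pillow works. As an alternative to downgrading (again, not what I did), the alias can be restored at runtime. This is a minimal sketch; `restore_alias` is a hypothetical helper, and it is assumed to run before detectron2 is imported, since the missing attribute is read at import time.

```python
def restore_alias(module, old_name, new_name):
    """Add module.old_name as an alias of module.new_name if the old name is missing.

    Returns True when an alias was added, False when nothing needed to change.
    """
    if hasattr(module, old_name):
        return False
    setattr(module, old_name, getattr(module, new_name))
    return True


if __name__ == "__main__":
    try:
        from PIL import Image
    except ImportError:
        Image = None  # Pillow is not installed, so there is nothing to patch
    if Image is not None:
        # Make Image.LINEAR point at Image.BILINEAR again if it has been removed.
        restore_alias(Image, "LINEAR", "BILINEAR")
```

Pinning pillow as shown above is simpler and keeps the environment reproducible; the shim is only worth considering when another package needs a newer Pillow.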
4. AttributeError: module 'distutils' has no attribute 'version'.
I edited the code below, commenting out lines 4, 6, 7 and 10 of the file, and that fixed it:
import tensorboard
from setuptools import distutils

# LooseVersion = distutils.version.LooseVersion

# if not hasattr(tensorboard, '__version__') or LooseVersion(tensorboard.__version__) < LooseVersion('1.15'):
#     raise ImportError('TensorBoard logging requires TensorBoard version 1.15 or above')

del distutils
# del LooseVersion
del tensorboard

from .writer import FileWriter, SummaryWriter  # noqa: F401
from tensorboard.summary.writer.record_writer import RecordWriter  # noqa: F401
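Commenting those lines out also removes the guard that rejects TensorBoard releases older than 1.15. If you want to keep an equivalent check without touching `distutils`, a plain tuple comparison is enough. The sketch below is a standard-library stand-in, not the code torch ships; `release_tuple` and `tensorboard_is_supported` are illustrative names.

```python
def release_tuple(version):
    """Turn a version string such as '2.11.0' into a tuple of ints: (2, 11, 0)."""
    numbers = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric piece, such as 'rc1' or 'dev0'
        numbers.append(int(piece))
    return tuple(numbers)


def tensorboard_is_supported(version, minimum=(1, 15)):
    """Mirror the commented-out check: True when version is at least `minimum`."""
    return release_tuple(version) >= minimum


if __name__ == "__main__":
    for candidate in ("1.14.0", "1.15.0", "2.11.0"):
        print(candidate, tensorboard_is_supported(candidate))
```

With TensorBoard installed, the value to pass in is `tensorboard.__version__`.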
5. FloatingPointError: Predicted boxes or scores contain Inf/Nan. Training has diverged.
I also ran out of GPU memory. I dealt with both by editing the training config:
_BASE_: Base-BMask-R-CNN-FPN.yaml
MODEL:
  WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
  MASK_ON: true
  RESNETS:
    DEPTH: 50
INPUT:
  MIN_SIZE_TRAIN: (800,)
TEST:
  EVAL_PERIOD: 10000
DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
SOLVER:
  # IMS_PER_BATCH: 16
  IMS_PER_BATCH: 4
  # BASE_LR: 0.02
  BASE_LR: 0.001
  STEPS: (60000, 80000)
  MAX_ITER: 90000
OUTPUT_DIR: "output/bmask_rcnn_r50_1x"
The main changes are IMS_PER_BATCH: 4 and BASE_LR: 0.001 (the original values, 16 and 0.02, are left commented out).
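For reference, a common heuristic when shrinking the batch size is the linear scaling rule: scale the learning rate by the same factor as the batch, and stretch the schedule so the model sees the same number of images. The sketch below applies that rule to the original values; it is not how I picked my numbers (0.001 is more conservative than what the rule suggests, and I left STEPS and MAX_ITER unchanged), and `scale_schedule` is a hypothetical helper name.

```python
def scale_schedule(base_lr, base_batch, new_batch, max_iter, steps):
    """Rescale a schedule tuned for base_batch so it suits new_batch (linear scaling rule)."""
    factor = new_batch / base_batch
    return {
        "BASE_LR": base_lr * factor,                       # smaller batch, smaller learning rate
        "MAX_ITER": round(max_iter / factor),              # keep the total number of images seen
        "STEPS": tuple(round(s / factor) for s in steps),  # decay points move with the schedule
    }


if __name__ == "__main__":
    # Original values from the config above: 16 images per batch, BASE_LR 0.02.
    print(scale_schedule(0.02, 16, 4, 90000, (60000, 80000)))
```

If training still diverges at the scaled rate, lowering BASE_LR further, as I did, is a reasonable next step.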
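That's it, the environment is now configured. Run the training command again and it should work. In general, when something goes wrong it is because the environment is not set up correctly, so check it carefully.
Below is a screenshot of a successful run.
COCO dataset download link (very fast!):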
https://aistudio.baidu.com/datasetdetail/7122
After downloading, the dataset is stored in the following layout:
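The config uses detectron2's built-in `coco_2017_train` and `coco_2017_val` datasets, which look for a `coco` folder under the directory named by the `DETECTRON2_DATASETS` environment variable, or `./datasets` relative to where training is launched when that variable is unset. A minimal sketch to confirm the files are where detectron2 expects them; `REQUIRED` and `missing_coco_paths` are names made up for this illustration.

```python
import os

# Paths read by the built-in "coco_2017_train" / "coco_2017_val" datasets,
# relative to the dataset root.
REQUIRED = (
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
)


def missing_coco_paths(root=None):
    """Return the required paths that do not exist under the dataset root."""
    if root is None:
        # detectron2 reads DETECTRON2_DATASETS and falls back to ./datasets
        root = os.environ.get("DETECTRON2_DATASETS", "datasets")
    return [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_coco_paths()
    print("dataset layout looks fine" if not missing else f"missing: {missing}")
```

Run it from the same directory as the training command so the relative `datasets` path resolves the same way.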