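I. Background:
An earlier post covered training Mask R-CNN on Windows 10 with a CPU in the maskrcnn_tf1.15.0 environment, but CPU training is extremely slow. TF 1.x reportedly supports the GTX 1060 (I have not tested this), but it does not support the newer RTX 3090. After going through a lot of material, my understanding is that TF 1.x and TF 2.x differ too much, so an upgrade to TF 2.x is required before the RTX 3090 can be used normally.
Below is the maskrcnn_tf1.15.0 development example, which I have personally tested and confirmed to work.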
II. Upgrading maskrcnn_tf1 to maskrcnn_tf2:
1. Install the required packages
Installing from a mirror is faster, for example: pip install -i https://pypi.douban.com/simple/ tensorflow
Key entries from pip list:
tensorflow 2.6.0
keras 2.6.0
matplotlib 3.2.2
h5py 3.1.0
numpy 1.19.5
scikit-image 0.16.2
tensorflow-gpu 2.6.2
opencv-python 4.5.4.60
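The key versions above can be compared against the active environment with a short script. This is a minimal sketch, not part of the original project: the names PINNED, installed_version and find_mismatches are hypothetical, and the fallback import assumes the importlib-metadata backport is available on Python 3.6.

```python
try:
    # Standard library on Python 3.8+
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Backport package for Python 3.6/3.7 (assumed to be installed)
    from importlib_metadata import version, PackageNotFoundError

# Key versions copied from the list above.
PINNED = {
    "tensorflow": "2.6.0",
    "keras": "2.6.0",
    "matplotlib": "3.2.2",
    "h5py": "3.1.0",
    "numpy": "1.19.5",
    "scikit-image": "0.16.2",
    "tensorflow-gpu": "2.6.2",
    "opencv-python": "4.5.4.60",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

A mismatch is not necessarily a problem; the script only reports where the environment differs from the one used in this post.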
The full pip list output:
(py36_maskrcnn_env_bak) C:\Users\dell>pip list
Package Version
------------------------ -------------------
absl-py 0.15.0
astor 0.8.1
astunparse 1.6.3
backcall 0.2.0
bleach 1.5.0
cached-property 1.5.2
cachetools 4.2.4
certifi 2020.6.20
charset-normalizer 2.0.9
clang 5.0
colorama 0.4.4
cycler 0.11.0
Cython 0.29.28
dataclasses 0.8
decorator 4.4.2
flatbuffers 1.12
gast 0.4.0
google-auth 1.35.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.44.0
h5py 3.1.0
html5lib 0.9999999
idna 3.3
imageio 2.13.5
imgaug 0.4.0
imgviz 1.4.1
importlib-metadata 4.8.3
ipython 7.16.2
ipython-genutils 0.2.0
jedi 0.17.2
keras 2.6.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
kiwisolver 1.3.1
labelme 3.16.2
libclang 11.1.0
Markdown 3.3.6
matplotlib 3.2.2
mock 4.0.3
networkx 2.5.1
numpy 1.19.5
nvidia-pyindex 1.0.9
oauthlib 3.1.1
opencv-python 4.5.4.60
opt-einsum 3.3.0
packaging 21.3
parso 0.7.1
pickleshare 0.7.5
Pillow 8.3.2
pip 21.2.2
prompt-toolkit 3.0.24
protobuf 3.17.3
pyasn1 0.4.8
pyasn1-modules 0.2.8
Pygments 2.10.0
pyparsing 3.0.6
PyQt5 5.15.6
PyQt5-Qt5 5.15.2
PyQt5-sip 12.9.0
python-dateutil 2.8.2
PyWavelets 1.1.1
PyYAML 6.0
QtPy 1.9.0
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.8
scikit-image 0.16.2
scipy 1.4.1
setuptools 58.0.4
Shapely 1.8.1.post1
six 1.15.0
tb-nightly 2.6.0a20210806
tensorboard 2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.6.0
tensorflow-estimator 2.6.0
tensorflow-gpu 2.6.2
tensorflow-gpu-estimator 2.2.0
termcolor 1.1.0
tf-estimator-nightly 2.7.0.dev2021092408
tifffile 2020.9.3
traitlets 4.3.3
typing-extensions 3.7.4.3
urllib3 1.26.7
wcwidth 0.2.5
Werkzeug 2.0.2
wheel 0.37.0
wincertstore 0.2
wrapt 1.12.1
zipp 3.6.0
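Once the packages are installed, it is worth confirming that TensorFlow actually sees the GPU before starting a long training run. The following is a minimal sketch using the public tf.config API from TF 2.x; the function name gpu_report is hypothetical, and enabling memory growth is an optional extra that the original setup does not mention.

```python
def gpu_report():
    """Return TensorFlow's version and visible GPUs, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            # Optional: allocate GPU memory on demand instead of all at once.
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError:
            # Raised if the GPU was already initialized; safe to ignore here.
            pass

    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    report = gpu_report()
    if report is None:
        print("TensorFlow is not installed in this environment.")
    elif not report["gpus"]:
        print("TensorFlow", report["tf_version"], "found no GPU.")
    else:
        print("TensorFlow", report["tf_version"], "sees:", report["gpus"])
```

If the GPU list comes back empty, the usual suspects are the driver, CUDA and cuDNN versions rather than the Mask R-CNN code.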
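2. Replace all code files under the original mrcnn path with the new mrcnn
Swap the new mrcnn folder in for the official mrcnn folder and it works out of the box.
The new mrcnn code is available at:
https://github.com/junlintianxiatjm/MaskRCNN_TF2
Note: reduce the batch size, that is, lower IMAGES_PER_GPU. This project was tested with a value of 8; a value that is too large triggers an out-of-memory message.
The message looks like this: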
2022-03-11 11:02:03.235518: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8200
2022-03-11 11:02:09.947906: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.40GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Setting "IMAGES_PER_GPU = 8" (or a smaller value) in train.py resolves this.
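To show how this setting affects the batch size, here is a self-contained sketch. The Config class is a stand-in written for illustration, on the assumption that the fork keeps the Matterport-style convention where the batch size is IMAGES_PER_GPU multiplied by GPU_COUNT; in a real train.py the base class would come from the mrcnn package. TrainConfig and halve_images_per_gpu are hypothetical names.

```python
class Config:
    """Stand-in for the mrcnn base config (assumed Matterport-style attributes)."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    def __init__(self):
        # The effective batch size is derived from the two settings above.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class TrainConfig(Config):
    # Hypothetical config name; 8 is the value this post reports for the RTX 3090.
    IMAGES_PER_GPU = 8


def halve_images_per_gpu(config_cls):
    """Return a subclass of config_cls with IMAGES_PER_GPU halved (minimum 1)."""
    smaller = max(1, config_cls.IMAGES_PER_GPU // 2)
    return type(config_cls.__name__, (config_cls,), {"IMAGES_PER_GPU": smaller})


if __name__ == "__main__":
    config = TrainConfig()
    print("batch size:", config.BATCH_SIZE)

    # If memory still runs out, step down and try again.
    SmallerConfig = halve_images_per_gpu(TrainConfig)
    print("reduced batch size:", SmallerConfig().BATCH_SIZE)
```

Halving is only one reasonable way to step down; any smaller value that fits in GPU memory works.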
Training results on the RTX 3090 GPU: