The Most Complete Guide to COCO-Based Mask RCNN (Pitfalls to Avoid)

Latest recommended article published 2025-03-13 11:18:43

LN烟雨缥缈

Category: Object Detection. Tags: object detection, artificial intelligence, computer vision, deep learning, neural networks

Article link: https://blog.csdn.net/wind82465/article/details/119667485

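Over the past two days I had some ideas for modifying the Mask RCNN network. After organizing them, I planned to run Mask RCNN first: the base model should at least run end to end before any modification experiments. As it turned out, setting up the demo environment alone took two days (⊙﹏⊙)b. To commemorate those two glorious days, I am writing this post, which I hope also serves as a reference for others.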

1. Experimental environment

2. Network model

3. Pitfalls encountered

1. AttributeError: module 'tensorflow' has no attribute 'log'

2. AttributeError: module 'tensorflow._api.v2.sets' has no attribute 'set_intersection'

3. ValueError: Tried to convert 'shape' to a tensor and failed. Error: None values not supported.

4. AttributeError: module 'keras.engine.saving' has no attribute 'load_weights_from_hdf5_group_by_name'

5. Unable to open file (truncated file: eof = 7340032, sblock->base_addr = 0, stored_eof = 126651688)

6. The model hangs for a long time when run in Jupyter Notebook

7. The demo runs normally but prints *** No instances to display ***

4. Can it run on TensorFlow 2.X?

5. Finally, the results of the COCO-based demo

6. Summary

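1. Experimental environment

Getting a model to run also means getting to know it better. A base implementation that reproduces the model faithfully helps you understand the details (reading the paper may make you feel you have mastered the model, but the details are in the code), and what you want is a base model that also makes your own experiments or training on your own dataset convenient. My environment:

GPU: RTX 3090
RAM: 64 GB
CPU: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
OS: Ubuntu 18.04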

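2. Network model

I searched through the code available online. This implementation reflects a solid understanding of the model and also has the most stars, so the crowd evidently knows what it is looking at: model link. Its requirements are as follows: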

numpy
scipy
Pillow
cython
matplotlib
scikit-image
tensorflow>=1.3.0
keras>=2.0.8
opencv-python
h5py
imgaug
IPython[all]

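With the background and environment covered, here are the pitfalls I ran into; I hope they help. This post is not a detailed walkthrough of environment setup. If you need one, see my earlier article, "Deep Learning Environment Setup", which was written for a 2070 graphics card. For deploying the model itself, the Readme behind the model link is enough: just download and install step by step.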

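3. Pitfalls encountered

Back to the point. Once the environment was set up and I started testing, all kinds of problems appeared. All of the following fixes are made in model.py:

1. AttributeError: module 'tensorflow' has no attribute 'log'

Solution:

# Change the log2_graph function as follows: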
def log2_graph(x):
    """Implementation of Log2. TF doesn't have a native implementation."""
    return tf.math.log(x) / tf.math.log(2.0)
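
The fix relies on the change-of-base identity log2(x) = ln(x) / ln(2). As a quick sanity check that needs no TensorFlow install, the same formula can be sketched in plain Python. The function name log2_change_of_base is made up for this illustration and is not part of the Mask RCNN code.

```python
import math


def log2_change_of_base(x):
    """Compute log2(x) as ln(x) / ln(2), the same formula log2_graph uses."""
    return math.log(x) / math.log(2.0)


if __name__ == "__main__":
    for value in (1.0, 8.0, 224.0):
        # math.log2 from the standard library serves as the reference value
        print(value, log2_change_of_base(value), math.log2(value))
```

The two printed columns should agree up to floating-point rounding, which is why swapping tf.log for tf.math.log leaves the function's result unchanged.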

2. AttributeError: module 'tensorflow._api.v2.sets' has no attribute 'set_intersection'

Solution:

# Switch the TensorFlow import to the v1 compatibility API
import tensorflow as tf
# becomes:
import tensorflow.compat.v1 as tf

3. ValueError: Tried to convert 'shape' to a tensor and failed. Error: None values not supported.

Solution:

# Change the following code:
mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
# to:
if s[1] is None:
    mrcnn_bbox = KL.Reshape((-1, num_classes, 4), name="mrcnn_bbox")(x)
else:
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
# Change the following code:
indices = tf.stack([tf.range(probs.shape[0]), class_ids], axis=1)
# to:
indices = tf.stack([tf.range(tf.shape(probs)[0]), class_ids], axis=1)
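
Both changes follow the same idea: when a dimension is unknown at graph-construction time (it is reported as None), it must not be passed on as if it were a concrete number. The shape-selection logic of the first change can be sketched without TensorFlow as below. bbox_reshape_target is a hypothetical helper name, and 81 is used only as an example class count; in the real model the resulting tuple would be what gets passed to KL.Reshape.

```python
def bbox_reshape_target(static_rois, num_classes):
    """Return the target shape for the mrcnn_bbox reshape.

    static_rois is the statically known number of ROIs, or None when it is
    only known at run time. -1 asks the reshape to infer that dimension.
    """
    rois_dim = -1 if static_rois is None else static_rois
    return (rois_dim, num_classes, 4)


print(bbox_reshape_target(None, 81))   # ROI count unknown until run time
print(bbox_reshape_target(1000, 81))   # ROI count known statically
```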

4. AttributeError: module 'keras.engine.saving' has no attribute 'load_weights_from_hdf5_group_by_name'

Solution:

# Change the following code:
if by_name:
    saving.load_weights_from_hdf5_group_by_name(f, layers)
else:
    saving.load_weights_from_hdf5_group(f, layers)
# to:
keras_model.load_weights(filepath, by_name=by_name)

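5. Unable to open file (truncated file: eof = 7340032, sblock->base_addr = 0, stored_eof = 126651688)

Solution:

Check the local copy of the COCO pre-trained weights, mask_rcnn_coco.h5, and compare its size with the size listed at the original download link. If they differ, the download was incomplete; download the file again.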
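
The size comparison can also be scripted. The sketch below is a minimal check under two assumptions: that the expected size is the stored_eof value from the error message (126651688 bytes), which should be confirmed against the size shown on the download page, and that the HDF5 signature sits at offset 0, as sblock->base_addr = 0 suggests. check_weights_file is a hypothetical helper, not part of the Mask RCNN repository.

```python
import os

# First 8 bytes of an HDF5 file (the format signature), assumed at offset 0.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def check_weights_file(path, expected_size):
    """Return a list of problems found with the weights file (empty if none)."""
    if not os.path.isfile(path):
        return ["file not found: " + path]
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(
            "size mismatch: %d bytes on disk, %d expected" % (actual_size, expected_size)
        )
    with open(path, "rb") as f:
        if f.read(8) != HDF5_SIGNATURE:
            problems.append("missing HDF5 signature")
    return problems


if __name__ == "__main__":
    # 126651688 is taken from stored_eof in the error message (an assumption).
    for problem in check_weights_file("mask_rcnn_coco.h5", 126651688):
        print(problem)
```

An empty result does not prove the file is intact, but a size mismatch such as 7340032 versus 126651688 bytes is a clear sign of an interrupted download.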

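6. The model hangs for a long time when run in Jupyter Notebook

Problem description: importing the local Mask RCNN library and running detection inference both stalled for a long time; I timed it at roughly 10 minutes. Take the following local-library import code as an example: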

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("../")
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
print(1)
import tensorflow as tf
print(2)
tf.debugging.set_log_device_placement(True)
print(3)
gpus = tf.config.list_physical_devices('GPU')
print(4)
tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
print(5)
# Select which GPU to use. Note: this normally has to be set before
# TensorFlow initializes the GPU in order to take effect.
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
config = tf.compat.v1.ConfigProto()
print(6)
# Optionally cap GPU memory use at 80% of the card's memory
# config.gpu_options.per_process_gpu_memory_fraction=0.8
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
print(7)
sess = tf.compat.v1.Session(config=config)
print(8)

tf.compat.v1.disable_eager_execution()  # so that sess.run() works
hello = tf.constant('hello,tensorflow')
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))  # TF 2.x API
print(sess.run(hello))

    
# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

%matplotlib inline 

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

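The code above is the part of the base model's demo that imports the local library and sets up the model paths; I only added control over how GPU memory grows. This code alone stalled for 10 minutes, before the model was even loaded (the COCO pre-trained weights had already been downloaded locally). Where was the problem? As the code shows, I put a print(NUM) after every step, and it turned out that the hang occurred while importing the basic libraries, almost always at steps involving the TensorFlow library.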
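
Numbered print calls show where execution stops, but not how long each step takes. A small variation, sketched here with only the standard library, times each import individually. timed_import is a hypothetical helper, and the module names in the loop are placeholders to be replaced with the imports from the notebook cell (for example tensorflow or mrcnn.model).

```python
import importlib
import time


def timed_import(module_name):
    """Import a module by name and return (module, seconds taken)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


if __name__ == "__main__":
    # Standard-library names are used so the sketch runs anywhere;
    # swap in the slow imports you want to measure.
    for name in ("os", "json", "random"):
        _, seconds = timed_import(name)
        print("%-10s %.3f s" % (name, seconds))
```

A module that was already imported returns almost instantly, so the measurement is only meaningful in a freshly started kernel.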

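Solution:

The cause is that the NVIDIA driver, TensorFlow, Keras, CUDA, and cuDNN versions must match one another. My NVIDIA driver is 460.X, paired with CUDA 11.2 and cuDNN 8.X. TensorFlow is 2.6.0, paired with Keras 2.6.0. I will not recount the process, which was truly painful; these are the versions I ended up with, and with them the program no longer stalls.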
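
The Python side of this check can be automated. The sketch below compares installed package versions against the pair reported in this post, using importlib.metadata (Python 3.8+). The EXPECTED table and both helper names are illustrative assumptions; the package may be installed under a different distribution name (for example tensorflow-gpu), and the driver, CUDA, and cuDNN versions are not pip packages, so they have to be checked separately.

```python
from importlib import metadata

# Versions reported in this post for driver 460.X / CUDA 11.2 / cuDNN 8.X.
EXPECTED = {"tensorflow": "2.6.0", "keras": "2.6.0"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(installed, expected):
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    # An empty dict means the installed versions match the expected ones.
    print(find_mismatches(installed_versions(EXPECTED), EXPECTED))
```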

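7. The demo runs normally but prints *** No instances to display ***

Solution:

This problem troubled me for a long time. Many posts online suggest changing this or that piece of code. I tried them all; in the end the real issue is still that the versions of the components in the deployed environment must match. If your setup matches mine, my configuration should resolve it: NVIDIA driver 460.X with CUDA 11.2 and cuDNN 8.X, TensorFlow 2.6.0 with Keras 2.6.0. For older versions, the version compatibility tables available online will generally resolve it as well.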