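I. Main project framework code (TensorFlow version of Mask RCNN)
For a walkthrough of the Mask RCNN paper and code, see my other blog post: https://blog.csdn.net/qq_32172681/article/details/99761084
Keras implementation of Mask RCNN (matterport's GitHub repository): https://github.com/matterport/Mask_RCNN
Environment: tensorflow_gpu 1.14.0, CUDA 10.0, cuDNN 7.4.1, Python 3.6; implemented with TensorFlow + Keras.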
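II. Training process
1. Data preprocessing / label preprocessing
Data preprocessing:
Mean subtraction
Data augmentation with imgaug
More than 12,000 images in total: 10,000 are used for training and the rest for validation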
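As a rough illustration of the mean-subtraction step, here is a minimal sketch. The per-channel mean below is the default MEAN_PIXEL value of the matterport Config class; whether this project kept that default is an assumption, and subtract_mean is a hypothetical helper name, not a function from the repository.

```python
import numpy as np

# Assumed per-channel RGB mean (the matterport Config default); the project may use other values.
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])


def subtract_mean(image, mean_pixel=MEAN_PIXEL):
    """Hypothetical helper: return the image as floats with the channel mean removed."""
    return image.astype(np.float32) - mean_pixel


# Toy 2x2 RGB image filled with the value 128
toy_image = np.full((2, 2, 3), 128, dtype=np.uint8)
centered = subtract_mean(toy_image)
```

The mean is broadcast over the height and width axes, so every pixel has the same three values subtracted.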
Label preprocessing:
(1) Label format
The training labels are RLE (run-length) encoded. For details on RLE encoding, see my other blog post: https://blog.csdn.net/qq_32172681/article/details/100537042
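To make the encoding concrete, here is a minimal encoder sketch. It assumes the convention that the decoding code in this post relies on: space-separated "start length" pairs, 1-indexed starts, and pixels counted column by column. rle_encode is an illustrative helper and not part of the project code.

```python
import numpy as np


def rle_encode(mask):
    """Encode a binary (height, width) mask as 'start length' pairs (1-indexed, column-major)."""
    pixels = mask.T.flatten()  # transpose first so pixels are read column by column
    padded = np.concatenate([[0], pixels, [0]])
    # Positions where the value flips mark the start and the end of each run
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    changes[1::2] -= changes[::2]  # turn run ends into run lengths
    return " ".join(str(v) for v in changes)


# Toy 3x4 mask: two pixels at the top of column 1, one pixel at the bottom of column 2
toy_mask = np.zeros((3, 4), dtype=np.uint8)
toy_mask[0:2, 1] = 1
toy_mask[2, 2] = 1
encoded = rle_encode(toy_mask)
```

Read column by column, the first run starts at pixel 4 and has length 2, and the second starts at pixel 9 and has length 1.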
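As shown in the figure below, ImageId_ClassId is image_name + "_" + class_id, and EncodedPixels is the RLE encoding of the image's mask for that class. There are 4 defect classes, so every 4 rows form the label of one image.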
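The layout described above can be read roughly as follows. This is only a sketch: the file names and RLE strings are made up, and the use of an empty EncodedPixels field for a class with no defect is an assumption about the CSV.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the described layout: 4 rows (one per defect class) per image
rows = [
    "ImageId_ClassId,EncodedPixels",
    "a.jpg_1,4 2 9 1",
    "a.jpg_2,",
    "a.jpg_3,",
    "a.jpg_4,",
    "b.jpg_1,",
    "b.jpg_2,",
    "b.jpg_3,10 5",
    "b.jpg_4,",
]
toy_csv = "\n".join(rows)

# labels[image_name][class_id] -> RLE string, or None when the field is empty
labels = defaultdict(dict)
for row in csv.DictReader(io.StringIO(toy_csv)):
    # Split on the last underscore only, since image names may contain underscores
    image_name, class_id = row["ImageId_ClassId"].rsplit("_", 1)
    labels[image_name][int(class_id)] = row["EncodedPixels"] or None
```

Grouping by image name this way does not depend on the rows of one image being adjacent in the file.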
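(2) Converting EncodedPixels to masks and bboxes
- Each EncodedPixels string yields one mask with the size of the image, [256, 1600], where 1 marks the mask and 0 the background
- Each mask can yield multiple bboxes: (x1, y1, x2, y2)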
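One way a single mask can produce several boxes is to take one box per connected foreground region. The sketch below does this with a plain flood fill; it assumes 4-connectivity and inclusive pixel coordinates, bboxes_from_mask is a hypothetical helper, and the Kaggle kernel used in this post may do it differently.

```python
from collections import deque

import numpy as np


def bboxes_from_mask(mask):
    """Return one inclusive (x1, y1, x2, y2) box per 4-connected foreground region."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] == 0 or seen[y, x]:
                continue
            # Flood-fill the region that starts at (y, x), tracking its extent
            queue = deque([(y, x)])
            seen[y, x] = True
            x1, y1, x2, y2 = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                x1, x2 = min(x1, cx), max(x2, cx)
                y1, y2 = min(y1, cy), max(y2, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((x1, y1, x2, y2))
    return boxes


# Toy mask with two separate defect regions
toy = np.zeros((6, 10), dtype=np.uint8)
toy[1:3, 1:4] = 1
toy[4:6, 6:9] = 1
boxes = bboxes_from_mask(toy)
```

A pure-Python loop like this is slow on full 256 x 1600 masks; it is meant only to show the idea on a toy array.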
The code below is adapted from a Kaggle public kernel: https://www.kaggle.com/applefish/get-bboxes-from-segmentation-labels
import numpy as np


# Convert an RLE string into a mask array (size [256, 1600]; 1 = mask, 0 = background)
def rle_decode(mask_rle, shape=(256, 1600)):
    """
    mask_rle: run-length as string formatted (start length)
    shape: (height, width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    """
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1  # RLE start positions are 1-indexed
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape((shape[1], shape[0])).T  # Needed to align to RLE direction
# Merge a list of RLE strings into one mask array of shape (height, width, 1)
def masks_as_image(in_mask_list, all_masks=None, shape=(256, 1600)):
    # Take the individual masks and create a single mask array
    if all_masks is None:
        all_masks = np.zeros(shape, dtype=np.int16)
    # if isinstance(in_mask_list, list):
    for mask in in_mask_list:
        # Non-string entries (such as missing labels) are skipped
        if isinstance(mask, str):
            all_masks += rle_decode(mask, shape)
    return np.expand_dims(all_masks, -1)
# Get all bboxes from one RLE-encoded mask
def get_bboxes_from_rle(encoded_pixels, return_mask=False):
    """get all bboxes from a whole mask label"""
    # Convert the RLE string into a mask (shape [256, 1600, 1]; 1 = mask, 0 = background)
    mask = masks_as_image([encoded_pixels])