data_generator (the most critical part)
Of the code above, only the data-processing part matters here, namely the data_generator function. I only cover how train_generator is produced.
def data_generator(dataset, config, shuffle=True, augment=False, augmentation=None,
random_rois=0, batch_size=1, detection_targets=False,
no_augmentation_sources=None):
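Input parameters:
dataset: the dataset class whose parameters were initialized and updated in the main function
config: the configuration from config.py, including the changes made in the main function
shuffle: True
augment: False
augmentation: an augmenter from imgaug
random_rois: 0
batch_size: 1
detection_targets: False
no_augmentation_sources: None
After parameter initialization, image_ids=[0,1,2,...,122216,122217] and no_augmentation_sources=[].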
backbone_shapes = compute_backbone_shapes(config, config.IMAGE_SHAPE)
This calls compute_backbone_shapes() in model.py:
def compute_backbone_shapes(config, image_shape):
    """Computes the width and height of each stage of the backbone network.

    Returns:
        [N, (height, width)]. Where N is the number of stages
    """
    if callable(config.BACKBONE):
        return config.COMPUTE_BACKBONE_SHAPE(image_shape)

    # Currently supports ResNet only
    assert config.BACKBONE in ["resnet50", "resnet101"]
    return np.array(
        [[int(math.ceil(image_shape[0] / stride)),
          int(math.ceil(image_shape[1] / stride))]
         for stride in config.BACKBONE_STRIDES])
These are the feature-map sizes used on entry to the RPN: P2 (256), P3 (128), P4 (64), P5 (32), and P6 (16). The actual value of backbone_shapes is:
[[256 256]
[128 128]
[ 64 64]
[ 32 32]
[ 16 16]]
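As a quick check of the numbers above, here is a minimal sketch that repeats the ceil-division without a config object. The 1024*1024 input size and the strides [4, 8, 16, 32, 64] are taken from later in this post; the variable names are only for illustration.

```python
import math

# Assumed values: a 1024x1024 input and the backbone strides quoted in this post.
image_shape = (1024, 1024)
backbone_strides = [4, 8, 16, 32, 64]

# One (height, width) pair per pyramid level, rounded up as in the quoted code.
backbone_shapes = [
    [math.ceil(image_shape[0] / stride), math.ceil(image_shape[1] / stride)]
    for stride in backbone_strides
]
print(backbone_shapes)
```

Each level halves the previous one, which is why the five shapes run from 256 down to 16.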
generate_pyramid_anchors
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, config.RPN_ANCHOR_RATIOS, backbone_shapes, config.BACKBONE_STRIDES, config.RPN_ANCHOR_STRIDE)
This uses the generate_pyramid_anchors function in utils.py:
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)
The input parameters are:
scales: config.RPN_ANCHOR_SCALES=(32,64,128,256,512)
ratios: config.RPN_ANCHOR_RATIOS=(0.5,1,2)
feature_shapes: backbone_shapes=[[256 256] [128 128] [ 64 64] [ 32 32] [ 16 16]]
feature_strides: [4,8,16,32,64]. These correspond one-to-one with the feature_shapes above, because the input image size is fixed at 1024*1024.
anchor_stride: 1
The loop over anchors runs 5 times, so let's look at a single call to generate_anchors, which is also in utils.py. The five calls receive the following arguments:
scales          32         64         128      256      512
ratios          [0.5, 1, 2] (the same for every level)
feature_shapes  (256,256)  (128,128)  (64,64)  (32,32)  (16,16)
feature_stride  4          8          16       32       64
anchor_stride   1          1          1        1        1
Concatenating the generated arrays (256*256*3, 4), (128*128*3, 4), (64*64*3, 4), (32*32*3, 4), (16*16*3, 4) gives the preset anchor coordinates for all feature maps, with shape (261888,4).
generate_anchors
def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
Input parameters (for the first iteration of the loop):
scales: scales[0]=32
ratios: [0.5,1,2]
shape: feature_shapes[0]=[256,256]
feature_stride: feature_strides[0]=4
anchor_stride: anchor_stride=1
scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
scales = scales.flatten()
ratios = ratios.flatten()
If np.meshgrid is unfamiliar, see https://blog.csdn.net/u013066730/article/details/101776821. After these lines:
scales: [32,32,32]
ratios: [0.5,1. ,2. ]
heights = scales / np.sqrt(ratios)
widths = scales * np.sqrt(ratios)
This computes the height and width of each preset anchor. The result is:
heights: [45.254834,32., 22.627417]
widths: [22.627417,32.,45.254834]
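The values above can be reproduced by hand. The sketch below uses plain Python instead of NumPy and also checks a property the formula implies: dividing the height and multiplying the width by the same sqrt(ratio) keeps every anchor's area at scale * scale.

```python
import math

scale = 32
ratios = [0.5, 1.0, 2.0]

# Same formulas as the quoted code, written for a single scale.
heights = [scale / math.sqrt(r) for r in ratios]
widths = [scale * math.sqrt(r) for r in ratios]

# The area is the same for every ratio; only the aspect ratio changes.
areas = [h * w for h, w in zip(heights, widths)]
print(heights, widths, areas)
```

So ratio here means width / height: ratio 0.5 gives a tall box and ratio 2 gives a wide one.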
shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
shifts_y and shifts_x are positions on the corresponding feature map (256*256 here). Before the multiplication by feature_stride they are coordinates on the 256*256 grid; after it they are coordinates in the 1024*1024 input image. np.meshgrid then lays them out as a grid:
shifts_x:
[[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]
...
[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]]
shifts_y:
[[ 0 0 0 ... 0 0 0]
[ 4 4 4 ... 4 4 4]
[ 8 8 8 ... 8 8 8]
...
[1012 1012 1012 ... 1012 1012 1012]
[1016 1016 1016 ... 1016 1016 1016]
[1020 1020 1020 ... 1020 1020 1020]]
box_widths, box_centers_x = np.meshgrid(widths, shifts_x)  # shape is (65536,3)
box_heights, box_centers_y = np.meshgrid(heights, shifts_y)  # shape is (65536,3)
These are simply the center coordinates and the width and height of every preset anchor box at each position.
box_widths:
[[22.627417 32. 45.254834]
[22.627417 32. 45.254834]
[22.627417 32. 45.254834]
...
[22.627417 32. 45.254834]
[22.627417 32. 45.254834]
[22.627417 32. 45.254834]]
box_centers_x:
[[ 0 0 0]
[ 4 4 4]
[ 8 8 8]
...
[1012 1012 1012]
[1016 1016 1016]
[1020 1020 1020]]
# Reshape to get a list of (y, x) and a list of (h, w)
box_centers = np.stack(
    [box_centers_y, box_centers_x], axis=2).reshape([-1, 2])  # shape is (196608, 2)
box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])  # shape is (196608, 2)

# Convert to corner coordinates (y1, x1, y2, x2)
boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                        box_centers + 0.5 * box_sizes], axis=1)
This puts box_centers and box_sizes into a form that is easy to read: one (y, x) pair and one (h, w) pair per anchor, each with shape (196608,2), where 196608 = 65536*3.
np.stack stacks along axis 2 (counting from 0). box_centers_y and box_centers_x only have axes 0 and 1, so np.stack creates the new axis 2 and places the two arrays along it. The stacked shape is therefore (65536,3,2), and reshape turns it into (196608,2).
boxes converts each center point and size into top-left and bottom-right corner coordinates, with shape (196608,4).
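To make the shapes easier to follow, here is a minimal sketch that runs the same steps on a hypothetical 2*2 feature map instead of 256*256. The tiny sizes are assumptions chosen for readability; the operations mirror the lines quoted above.

```python
import numpy as np

# Hypothetical tiny case: one scale, three ratios, a 2x2 feature map, stride 4.
scale = 32
ratios = np.array([0.5, 1.0, 2.0])
shape, feature_stride, anchor_stride = (2, 2), 4, 1

heights = scale / np.sqrt(ratios)
widths = scale * np.sqrt(ratios)

# Anchor centers in input-image coordinates.
shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

# Pair every center with every (height, width): 4 positions x 3 ratios.
box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

stacked = np.stack([box_centers_y, box_centers_x], axis=2)  # new last axis of size 2
box_centers = stacked.reshape([-1, 2])
box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])

# Corner coordinates (y1, x1, y2, x2).
boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                        box_centers + 0.5 * box_sizes], axis=1)
print(stacked.shape, boxes.shape)
```

With 4 positions and 3 ratios there are 12 anchors. The first three rows of boxes share the center (0, 0) and differ only in aspect ratio, which is the ordering the reshape produces.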
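Repeating this 5 times produces arrays of shape (256*256*3, 4), (128*128*3, 4), (64*64*3, 4), (32*32*3, 4), and (16*16*3, 4). np.concatenate(anchors, axis=0) then joins them into the full set of preset anchors for all feature maps, with shape (261888,4).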
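The total of 261888 can be verified with a few lines of arithmetic. This sketch assumes anchor_stride=1 (one anchor center per feature-map cell) and three ratios per center, as described in this post.

```python
feature_shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
num_ratios = 3  # ratios (0.5, 1, 2)

# Anchors per pyramid level: one center per cell, one box per ratio.
per_level = [h * w * num_ratios for h, w in feature_shapes]
total = sum(per_level)
print(per_level, total)
```

The first level alone contributes 196608 of the anchors, so most anchors are the small ones on the highest-resolution feature map.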
Now we return from the helper functions to the main routine discussed in this section:
image_index = (image_index + 1) % len(image_ids)
if shuffle and image_index == 0:
np.random.shuffle(image_ids)
# Get GT bounding boxes and masks for image.
image_id = image_ids[image_index]
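Here image_index is the index of the image being selected; the modulo keeps it from running past the end of image_ids once the second epoch begins. At the start of every epoch (when image_index wraps to 0) the IDs are shuffled. Keep in mind that image_ids starts as [0,1,2,3,4...122217]; the actual image_id is then looked up from it.
The point of this block is simply to pick images in a sufficiently random order.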
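The wrap-around and reshuffle can be simulated on a hypothetical five-image dataset. This sketch uses the standard library's random.shuffle in place of np.random.shuffle and assumes image_index starts at -1, so that the first increment lands on index 0.

```python
import random

image_ids = list(range(5))  # hypothetical tiny dataset
image_index = -1            # assumed starting value
shuffle = True
random.seed(0)

seen = []
for _ in range(10):  # ten draws = two passes over five images
    image_index = (image_index + 1) % len(image_ids)
    if shuffle and image_index == 0:
        random.shuffle(image_ids)  # reshuffle at the start of each pass
    seen.append(image_ids[image_index])

first_epoch, second_epoch = seen[:5], seen[5:]
print(first_epoch, second_epoch)
```

Because the shuffle only happens when the index wraps to 0, every pass still visits each image exactly once, just in a different order.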
# If the image source is not to be augmented pass None as augmentation
if dataset.image_info[image_id]['source'] in no_augmentation_sources:
image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
load_image_gt(dataset, config, image_id, augment=augment,
augmentation=None,
use_mini_mask=config.USE_MINI_MASK)
else:
image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
load_image_gt(dataset, config, image_id, augment=augment,
augmentation=augmentation,
use_mini_mask=config.USE_MINI_MASK)
The block above decides whether augmentation is applied; this post uses the augmentation branch. Next we step into load_image_gt, the key data-loading function.
load_image_gt
def load_image_gt(dataset, config, image_id, augment=False, augmentation=None, use_mini_mask=False):
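Input parameters:
dataset: the class representing the COCO dataset; it holds all the information about the data;
config: the configuration; look through it yourself, since some parameters may be modified along the way;
image_id: an integer image ID, for example 53347;
augment: the author's old augmentation method, now deprecated;
augmentation: the author's new augmentation method, passed in as an augmenter object from imgaug, a powerful public image-augmentation library;
use_mini_mask: True, meaning small segmentation masks are used.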
image = dataset.load_image(image_id)
load_image is defined in the Dataset class in mrcnn\utils.py:
def load_image(self, image_id):
    """Load the specified image and return a [H,W,3] Numpy array.
    """
    # Load image
    image = skimage.io.imread(self.image_info[image_id]['path'])
    # If grayscale. Convert to RGB for consistency.
    if image.ndim != 3:
        image = skimage.color.gray2rgb(image)
    # If has an alpha channel, remove it for consistency
    if image.shape[-1] == 4:
        image = image[..., :3]
    return image
It looks up the image path for the given image ID, reads the image, and returns it.
mask, class_ids = dataset.load_mask(image_id)
Now step into load_mask. It is overridden in the CocoDataset class in samples\coco\coco.py:
def load_mask(self, image_id):
    """Load instance masks for the given image.

    Different datasets use different ways to store masks. This
    function converts the different mask format to one format
    in the form of a bitmap [height, width, instances].

    Returns:
    masks: A bool array of shape [height, width, instance count] with
        one mask per instance.
    class_ids: a 1D array of class IDs of the instance masks.
    """
    # If not a COCO image, delegate to parent class.
    image_info = self.image_info[image_id]
    if image_info["source"] != "coco":
        return super(CocoDataset, self).load_mask(image_id)

    instance_masks = []
    class_ids = []
    annotations = self.image_info[image_id]["annotations"]
    # Build mask of shape [height, width, instance_count] and list
    # of class IDs that correspond to each channel of the mask.
    for annotation in annotations:
        class_id = self.map_source_class_id(
            "coco.{}".format(annotation['category_id']))
        if class_id:
            m = self.annToMask(annotation, image_info["height"],
                               image_info["width"])
            # Some objects are so small that they're less than 1 pixel area
            # and end up rounded out. Skip those objects.
            if m.max() < 1:
                continue
            # Is it a crowd? If so, use a negative class ID.
            if annotation['iscrowd']:
                # Use negative class ID for crowds
                class_id *= -1
                # For crowd masks, annToMask() sometimes returns a mask
                # smaller than the given dimensions. If so, resize it.
                if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
                    m = np.ones([image_info["height"], image_info["width"]], dtype=bool)
            instance_masks.append(m)
            class_ids.append(class_id)

    # Pack instance masks into an array
    if class_ids:
        mask = np.stack(instance_masks, axis=2).astype(np.bool)
        class_ids = np.array(class_ids, dtype=np.int32)
        return mask, class_ids
    else:
        # Call super class to return an empty mask
        return super(CocoDataset, self).load_mask(image_id)
For a given image_id, the corresponding annotations look like this:
[{'image_id': 1234,
  'segmentation': [[x1, y1, x2, y2, ...]], a list of polygons, each a flat list of alternating x and y coordinates,
  'iscrowd': usually 0, meaning a single object rather than a crowd region,
  'bbox': [26.62, 293.17, 230.53, 153.48], the bounding box as [x, y, width, height],
  'category_id': the COCO category,
  'id': the unique ID of this annotation,
  'area': the area},
 {... the same fields for the second object ...},
 {... the same fields for the third object ...}]
As the example shows, image_id always refers to the same image, and this image contains 3 objects.
Next, the annotation of one of these objects is used for the walkthrough:
class_id = self.map_source_class_id( "coco.{}".format(annotation['category_id']))
This calls the function
def map_source_class_id(self, source_class_id):
    """Takes a source class ID and returns the int class ID assigned to it.

    For example:
    dataset.map_source_class_id("coco.12") -> 23
    """
    return self.class_from_source_map[source_class_id]
whose lookup table is built as
self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id
                              for info, id in zip(self.class_info, self.class_ids)}
As the code shows, COCO has its own category IDs: it uses 80 classes, yet IDs as large as 90 appear, so COCO's IDs have to be renumbered into a contiguous range. The result is a mapping such as {'coco.1': 1, ..., 'coco.90': 80}, with 0 reserved for the background class.
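A shortened, hypothetical class list shows how the renumbering works. The class_info entries below are illustrative assumptions (background plus three COCO categories with non-contiguous source IDs); the dictionary comprehension is the same as the quoted one, with the loop variable renamed to internal_id.

```python
# Hypothetical, shortened class list: background plus three COCO categories
# whose source IDs (1, 2, 90) are not contiguous.
class_info = [
    {"source": "", "id": 0, "name": "BG"},
    {"source": "coco", "id": 1, "name": "person"},
    {"source": "coco", "id": 2, "name": "bicycle"},
    {"source": "coco", "id": 90, "name": "toothbrush"},
]
class_ids = range(len(class_info))  # contiguous internal IDs 0..3

class_from_source_map = {
    "{}.{}".format(info["source"], info["id"]): internal_id
    for info, internal_id in zip(class_info, class_ids)
}
print(class_from_source_map)
```

The internal ID is just the position in class_info, so gaps in the source IDs disappear. With the full list of 80 COCO categories, the last source ID would map to 80 instead of 3.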
m = self.annToMask(annotation, image_info["height"], image_info["width"])
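I will not step into annToMask: it involves parsing COCO data and I am too lazy for that, so here is just the result. m is the mask of a single object: 0 where the object is absent and 1 where it is present (the same holds for every other object; a mask only ever contains the values 0 and 1).
if m.max() < 1:
    continue
If the maximum is below 1, the mask contains no object pixels; it is discarded and the loop moves on to the next annotation.
if annotation['iscrowd']:
    # Use negative class ID for crowds
    class_id *= -1
    # For crowd masks, annToMask() sometimes returns a mask
    # smaller than the given dimensions. If so, resize it.
    if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
        m = np.ones([image_info["height"], image_info["width"]], dtype=bool)
For a crowd annotation, the class ID is negated (for example 5 becomes -5, not a fixed -1), which helps later when selecting training samples.
instance_masks.append(m)
class_ids.append(class_id)
The mask m and its class label are appended to instance_masks and class_ids. Remember that this happens once per object instance, not once per image; do not mix the two up.
mask = np.stack(instance_masks, axis=2).astype(np.bool)
class_ids = np.array(class_ids, dtype=np.int32)
return mask, class_ids
This combines the results: the final mask has shape (image height, image width, number of instances in the image), and class_ids has shape (number of instances in the image).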
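The per-instance loop can be sketched on a hypothetical 4*6 image with three made-up annotations: a normal object, a crowd region, and an empty mask. The tuples stand in for COCO annotations and the prebuilt arrays stand in for the output of annToMask; everything else follows the steps described above.

```python
import numpy as np

height, width = 4, 6  # hypothetical tiny image

normal = np.zeros((height, width), dtype=np.uint8)
normal[0:2, 0:3] = 1
crowd = np.zeros((height, width), dtype=np.uint8)
crowd[2:4, 3:6] = 1
empty = np.zeros((height, width), dtype=np.uint8)

# (mask, class_id, iscrowd): stand-ins for real annotations.
annotations = [(normal, 5, 0), (crowd, 7, 1), (empty, 3, 0)]

instance_masks, class_ids = [], []
for m, class_id, iscrowd in annotations:
    if m.max() < 1:
        continue          # no object pixels: skip this instance
    if iscrowd:
        class_id *= -1    # crowds keep their class but with a negative sign
    instance_masks.append(m)
    class_ids.append(class_id)

mask = np.stack(instance_masks, axis=2).astype(bool)
class_ids = np.array(class_ids, dtype=np.int32)
print(mask.shape, class_ids)
```

The empty mask is dropped, so two instances remain, and the crowd instance ends up with class ID -7.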
Sigh, we are still inside load_image_gt, which shows just how important this function is.
original_shape = image.shape
image, window, scale, padding, crop = utils.resize_image(
    image,
    min_dim=config.IMAGE_MIN_DIM,
    min_scale=config.IMAGE_MIN_SCALE,
    max_dim=config.IMAGE_MAX_DIM,
    mode=config.IMAGE_RESIZE_MODE)
First, let's look at the resize_image function:
def resize_image(image, min_dim=None, max_dim=None, min_scale=None, mode="square"):
Input parameters:
image: the input RGB image
min_dim: 800
min_scale: 0
max_dim: 1024
mode: 'square'
# Keep track of image dtype and return results in the same dtype
image_dtype = image.dtype
# Default window (y1, x1, y2, x2) and default scale == 1.
h, w = image.shape[:2]
window = (0, 0, h, w)
scale = 1
padding = [(0, 0), (0, 0), (0, 0)]
crop = None
These are just declarations: image_dtype is normally uint8, h and w are the image height and width, and window is only being initialized here. The rest needs no comment.
# Scale?
if min_dim:
    # Scale up but not down
    scale = max(1, min_dim / min(h, w))
if min_scale and scale < min_scale:
    scale = min_scale

# Does it exceed max dim?
if max_dim and mode == "square":
    image_max = max(h, w)
    if round(image_max * scale) > max_dim:
        scale = max_dim / image_max

# Resize image using bilinear interpolation
if scale != 1:
    image = resize(image, (round(h * scale), round(w * scale)),
                   preserve_range=True)
min_dim is 800, so the first branch is taken. Suppose h=494 and w=640; then scale=800/494=1.6194. Since min_scale is 0, that check is skipped. In the if max_dim and mode == "square" check, the larger of the original h and w is taken: 640*scale=1036.4, which exceeds max_dim, so scale is recomputed as 1024/640=1.6 and the longer side after scaling equals max_dim.
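The worked example can be checked with a small helper that repeats only the scale selection. compute_scale is a hypothetical name introduced for this sketch; its defaults are the parameter values listed above.

```python
def compute_scale(h, w, min_dim=800, max_dim=1024, min_scale=0, mode="square"):
    """Repeat the scale selection quoted above for one image size."""
    scale = 1
    if min_dim:
        # Scale up but not down
        scale = max(1, min_dim / min(h, w))
    if min_scale and scale < min_scale:
        scale = min_scale
    if max_dim and mode == "square":
        image_max = max(h, w)
        if round(image_max * scale) > max_dim:
            # The long side would overflow, so fit it to max_dim instead.
            scale = max_dim / image_max
    return scale


scale = compute_scale(494, 640)
new_h, new_w = round(494 * scale), round(640 * scale)
print(scale, new_h, new_w)
```

For the 494*640 image the short side ends up at 790 rather than 800, because the long-side limit wins over min_dim.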
The image is then rescaled; resize here is a wrapper, but all it does is rescale.
if mode == "square":
    # Get new height and width
    h, w = image.shape[:2]
    top_pad = (max_dim - h) // 2
    bottom_pad = max_dim - h - top_pad
    left_pad = (max_dim - w) // 2
    right_pad = max_dim - w - left_pad
    padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
    image = np.pad(image, padding, mode='constant', constant_values=0)
    window = (top_pad, left_pad, h + top_pad, w + left_pad)
Since the mode is 'square', only that mode is covered here. After rescaling, the network still expects a 1024*1024 input, i.e. max_dim*max_dim, so the image is zero-padded. The position of the rescaled image inside the padded image is then recorded as window.
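Continuing the 494*640 example, which becomes 790*1024 after scaling by 1.6, the padding and window follow from a few integer operations. square_padding is a hypothetical helper name for this sketch; the arithmetic is the same as in the quoted code.

```python
def square_padding(h, w, max_dim=1024):
    """Return (padding, window) for centering an h x w image in a square."""
    top_pad = (max_dim - h) // 2
    bottom_pad = max_dim - h - top_pad
    left_pad = (max_dim - w) // 2
    right_pad = max_dim - w - left_pad
    padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
    # (y1, x1, y2, x2) of the real image inside the padded square.
    window = (top_pad, left_pad, h + top_pad, w + left_pad)
    return padding, window


padding, window = square_padding(790, 1024)
print(padding, window)
```

Here the image is padded only at the top and bottom, and window marks rows 117 to 907 as the part that contains real pixels.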
return image.astype(image_dtype), window, scale, padding, crop
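Finally, the function returns:
image: the rescaled and padded image, with shape (1024,1024,3)
window: the position of the rescaled image inside the padded image, a tuple of 4 coordinates (y1,x1,y2,x2) giving its top-left and bottom-right corners
scale: the scale factor that was applied to the image
padding: the amount of padding, in the order shown in the code above
crop: None; cropping is not used here
Yes, we are still inside load_image_gt.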
mask = utils.resize_mask(mask, scale, padding, crop)
The mask is rescaled and padded in the same way as the image.
resize_mask is defined in mrcnn/utils.py:
def resize_mask(mask, scale, padding, crop=None):
    """Resizes a mask using the given scale and padding.
    Typically, you get the scale and padding from resize_image() to
    ensure both, the image and the mask, are resized consistently.

    scale: mask scaling factor
    padding: Padding to add to the mask in the form
            [(top, bottom), (left, right), (0, 0)]
    """
    # Suppress warning from scipy 0.13.0, the output shape of zoom() is
    # calculated with round() instead of int()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
    if crop is not None:
        y, x, h, w = crop
        mask = mask[y:y + h, x:x + w]
    else:
        mask = np.pad(mask, padding, mode='constant', constant_values=0)
    return mask
Still inside load_image_gt; be patient.
if augmentation:
    import imgaug

    # Augmenters that are safe to apply to masks
    # Some, such as Affine, have settings that make them unsafe, so always
    # test your augmentation on masks
    MASK_AUGMENTERS = ["Sequential", "SomeOf", "OneOf", "Sometimes",
                       "Fliplr", "Flipud", "CropAndPad",
                       "Affine", "PiecewiseAffine"]

    def hook(images, augmenter, parents, default):
        """Determines which augmenters to apply to masks."""
        return augmenter.__class__.__name__ in MASK_AUGMENTERS

    # Store shapes before augmentation to compare
    image_shape = image.shape
    mask_shape = mask.shape
    # Make augmenters deterministic to apply similarly to images and masks
    det = augmentation.to_deterministic()
    image = det.augment_image(image)
    # Change mask to np.uint8 because imgaug doesn't support np.bool
    mask = det.augment_image(mask.astype(np.uint8),
                             hooks=imgaug.HooksImages(activator=hook))
    # Verify that shapes didn't change
    assert image.shape == image_shape, "Augmentation shouldn't change image size"
    assert mask.shape == mask_shape, "Augmentation shouldn't change mask size"
    # Change mask back to bool
    mask = mask.astype(np.bool)
This applies imgaug data augmentation (which I personally find very handy).
Still inside load_image_gt. Give up, there is no getting out.
_idx = np.sum(mask, axis=(0, 1)) > 0
mask = mask[:, :, _idx]
class_ids = class_ids[_idx]
# Bounding boxes. Note that some boxes might be all zeros
# if the corresponding mask got cropped out.
# bbox: [num_instances, (y1, x1, y2, x2)]
bbox = utils.extract_bboxes(mask)
Note that mask has shape (1024,1024, number of instances in the image), say 26 instances. np.sum sums over axes 0 and 1, which checks whether each instance mask contains any 1s. A mask with at least one 1 is valid; one without is discarded.
utils.extract_bboxes derives each label's bbox as the smallest axis-aligned rectangle enclosing its mask. The code is:
def extract_bboxes(mask):
    """Compute bounding boxes from masks.
    mask: [height, width, num_instances]. Mask pixels are either 1 or 0.

    Returns: bbox array [num_instances, (y1, x1, y2, x2)].
    """
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        # Bounding box.
        horizontal_indicies = np.where(np.any(m, axis=0))[0]
        vertical_indicies = np.where(np.any(m, axis=1))[0]
        if horizontal_indicies.shape[0]:
            x1, x2 = horizontal_indicies[[0, -1]]
            y1, y2 = vertical_indicies[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
        else:
            # No mask for this instance. Might happen due to
            # resizing or cropping. Set bbox to zeros
            x1, x2, y1, y2 = 0, 0, 0, 0
        boxes[i] = np.array([y1, x1, y2, x2])
    return boxes.astype(np.int32)
So the returned boxes has shape (26,4).
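For a single mask the box extraction looks like this. The 6*8 mask is a made-up example; the steps are the same ones the loop body performs for each instance.

```python
import numpy as np

m = np.zeros((6, 8), dtype=bool)
m[2:5, 1:4] = True  # object occupies rows 2..4 and columns 1..3

# Columns and rows that contain at least one mask pixel.
horizontal = np.where(np.any(m, axis=0))[0]
vertical = np.where(np.any(m, axis=1))[0]

x1, x2 = horizontal[[0, -1]]
y1, y2 = vertical[[0, -1]]
# Make x2 and y2 exclusive so that m[y1:y2, x1:x2] covers the object.
x2 += 1
y2 += 1

box = [int(y1), int(x1), int(y2), int(x2)]
print(box)
```

Because the upper bounds are exclusive, the box can be used directly as a slice, which is what minimize_mask relies on later.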
Yes, still inside load_image_gt.
# Active classes
# Different datasets have different classes, so track the
# classes supported in the dataset of this image.
active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32)
source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]]
active_class_ids[source_class_ids] = 1
This determines which classes are actually in use; only those classes take part in training. Here that is all 81 classes, i.e. 0-80.
active_class_ids is therefore an array of 81 ones: [1 1 1 ... 1 1 1]
Yes, still inside load_image_gt.
if use_mini_mask:
    mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)
minimize_mask is defined in mrcnn/utils.py:
def minimize_mask(bbox, mask, mini_shape):
Input parameters:
bbox: for the 26 instances in the example above, its shape is (26,4)
mask: shape (1024,1024,26), with values restricted to 0 and 1
mini_shape: (56,56)
mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
for i in range(mask.shape[-1]):
    # Pick slice and cast to bool in case load_mask() returned wrong dtype
    m = mask[:, :, i].astype(bool)
    y1, x1, y2, x2 = bbox[i][:4]
    m = m[y1:y2, x1:x2]
    if m.size == 0:
        raise Exception("Invalid bounding box with area of zero")
    # Resize with bilinear interpolation
    m = resize(m, mini_shape)
    mini_mask[:, :, i] = np.around(m).astype(np.bool)
return mini_mask
The resizing works like this: suppose mask has shape (1024,1024,26). Take one instance mask of shape (1024,1024), use its bbox to crop the region that contains the 1s, and resize that crop to mini_shape. Repeating this 26 times yields 26 new (56,56) instance masks, so mini_mask has shape (56,56,26).
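The crop-then-resize idea can be sketched without the library's resize helper. Note the substitution: this sketch uses nearest-neighbour index sampling as a stand-in for the bilinear resize in the quoted code, and minimize_one, the 64*64 masks, and the (8, 8) mini shape are all hypothetical choices for illustration.

```python
import numpy as np


def minimize_one(m, box, mini_shape):
    """Crop one full-size mask to its box and resample it to mini_shape.

    Nearest-neighbour sampling is a stand-in for the bilinear resize
    used in the quoted code.
    """
    y1, x1, y2, x2 = box
    crop = m[y1:y2, x1:x2]
    if crop.size == 0:
        raise ValueError("Invalid bounding box with area of zero")
    # Source row/column to read for every output row/column.
    rows = (np.arange(mini_shape[0]) * crop.shape[0]) // mini_shape[0]
    cols = (np.arange(mini_shape[1]) * crop.shape[1]) // mini_shape[1]
    return crop[np.ix_(rows, cols)]


mask = np.zeros((64, 64, 2), dtype=bool)
mask[10:30, 20:50, 0] = True   # a large instance
mask[40:44, 5:9, 1] = True     # a small instance
bbox = np.array([[10, 20, 30, 50], [40, 5, 44, 9]])
mini_shape = (8, 8)  # hypothetical; this post uses (56, 56)

mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
for i in range(mask.shape[-1]):
    mini_mask[:, :, i] = minimize_one(mask[:, :, i], bbox[i], mini_shape)
print(mini_mask.shape)
```

Both instances end up at the same (8, 8) size regardless of how large they were in the image, which is the point of the mini mask: storage no longer depends on the image resolution.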
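Still inside load_image_gt.
# Image meta data
image_meta = compose_image_meta(image_id, original_shape, image.shape,
                                window, scale, active_class_ids)
This step records what was done to the original image, which is crucial at test time for mapping results back onto the original image. image_meta is a 1-D array whose parts have lengths (1,3,3,4,1,81), i.e. 93 values in total.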
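A rough sketch of how such a meta vector could be laid out, assuming compose_image_meta simply concatenates its arguments in order, as the part lengths (1,3,3,4,1,81) suggest. The concrete values reuse the 494*640 example from earlier in this post and are illustrative only.

```python
image_id = 53347
original_shape = (494, 640, 3)
image_shape = (1024, 1024, 3)
window = (117, 0, 907, 1024)   # (y1, x1, y2, x2) of the image inside the padding
scale = 1.6
active_class_ids = [1] * 81

# Assumed layout: the arguments flattened and concatenated in order.
meta = ([image_id] + list(original_shape) + list(image_shape)
        + list(window) + [scale] + list(active_class_ids))
print(len(meta))
```

With this layout, the window occupies positions 7 to 10 and the scale sits at position 11, which is all that is needed to undo the resize and padding on a prediction.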
All right, load_image_gt is finally over.
return image, image_meta, class_ids, bbox, mask
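The returned variables are:
image: the image about to be fed to the network, with shape (1024,1024,3);
image_meta: records the key steps by which the original input image was transformed to (1024,1024,3);
class_ids: which class each of the 26 instances belongs to, with shape (26);
bbox: where each instance is located, with shape (26,4);
mask: since mini masks are used, its shape is (56,56,26).
To be continued; good grief, there is a lot of this.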