[MaskRCNN] Source Code Series 1: train Data Processing, Part 2

mjiansun

Categories: Keras, Paper Notes

Article link: https://blog.csdn.net/u013066730/article/details/102501128

data_generator (the most important part)

generate_pyramid_anchor

generate_anchors

load_image_gt

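data_generator (the most important part)

Of the code above, we only care about the data-processing part, i.e. the data_generator function. Here I only describe how train_generator is produced.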

def data_generator(dataset, config, shuffle=True, augment=False, augmentation=None,
                   random_rois=0, batch_size=1, detection_targets=False,
                   no_augmentation_sources=None):

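Input parameters:

dataset: the dataset object whose parameters were initialized and updated in the main function

config: the configuration from config.py, as modified in the main function

shuffle: True

augment: False

augmentation: an imgaug augmenter

random_rois: 0

batch_size: 1

detection_targets: False

no_augmentation_sources: None

Parameter initialization: here image_ids = [0, 1, 2, ..., 122216, 122217] and no_augmentation_sources = [].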

backbone_shapes = compute_backbone_shapes(config, config.IMAGE_SHAPE)

This calls compute_backbone_shapes() in model.py:

def compute_backbone_shapes(config, image_shape):
    """Computes the width and height of each stage of the backbone network.

    Returns:
        [N, (height, width)]. Where N is the number of stages
    """
    if callable(config.BACKBONE):
        return config.COMPUTE_BACKBONE_SHAPE(image_shape)

    # Currently supports ResNet only
    assert config.BACKBONE in ["resnet50", "resnet101"]
    return np.array(
        [[int(math.ceil(image_shape[0] / stride)),
            int(math.ceil(image_shape[1] / stride))]
            for stride in config.BACKBONE_STRIDES])

These are the feature-map sizes used on the way into the RPN: P2 (256), P3 (128), P4 (64), P5 (32) and P6 (16). The actual value of backbone_shapes is:

[[256 256]
[128 128]
[ 64 64]
[ 32 32]
[ 16 16]]
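As a quick check of the numbers above, here is a minimal sketch that repeats the same ceil-division outside the model. The 1024x1024 input size and the strides [4, 8, 16, 32, 64] are the values quoted in this walkthrough; the variable names are only for illustration.

```python
import math
import numpy as np

# Values taken from this walkthrough: a 1024x1024 input image and the
# five pyramid strides listed for feature_strides.
image_shape = (1024, 1024, 3)
backbone_strides = [4, 8, 16, 32, 64]

# Same computation as the return statement of compute_backbone_shapes.
backbone_shapes = np.array(
    [[int(math.ceil(image_shape[0] / stride)),
      int(math.ceil(image_shape[1] / stride))]
     for stride in backbone_strides])

print(backbone_shapes)
```

Each row is simply the input size divided by the stride of that pyramid level, rounded up.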

generate_pyramid_anchor
    anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                             config.RPN_ANCHOR_RATIOS,
                                             backbone_shapes,
                                             config.BACKBONE_STRIDES,
                                             config.RPN_ANCHOR_STRIDE)
This uses the generate_pyramid_anchors function in utils.py:
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)
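The input parameters are:

scales: config.RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

ratios: config.RPN_ANCHOR_RATIOS = (0.5, 1, 2)

feature_shapes: backbone_shapes = [[256 256] [128 128] [ 64 64] [ 32 32] [ 16 16]]

feature_strides: [4, 8, 16, 32, 64]. These correspond one-to-one with the feature_shapes above, because the input image size is fixed at 1024*1024 (for example, 1024 / 4 = 256).

anchor_stride: 1

The loop that builds anchors runs 5 times, so let's look at a single call to generate_anchors, which also lives in utils.py. The arguments for each iteration are: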

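| parameter | i=0 | i=1 | i=2 | i=3 | i=4 |
|---|---|---|---|---|---|
| scales[i] | 32 | 64 | 128 | 256 | 512 |
| ratios | [0.5, 1, 2] | [0.5, 1, 2] | [0.5, 1, 2] | [0.5, 1, 2] | [0.5, 1, 2] |
| feature_shapes[i] | (256, 256) | (128, 128) | (64, 64) | (32, 32) | (16, 16) |
| feature_strides[i] | 4 | 8 | 16 | 32 | 64 |
| anchor_stride | 1 | 1 | 1 | 1 | 1 |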
Concatenating the generated arrays of shape (256*256*3, 4), (128*128*3, 4), (64*64*3, 4), (32*32*3, 4) and (16*16*3, 4) gives the preset anchor coordinates for all feature maps, with shape (261888, 4).
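The total of 261888 can be checked with plain arithmetic. This is only a sketch of the counting, assuming one anchor per ratio at every feature-map cell (anchor_stride = 1), as in the settings above.

```python
# Side length of each square feature map, from backbone_shapes.
feature_sizes = [256, 128, 64, 32, 16]
anchors_per_cell = 3  # one anchor per ratio in (0.5, 1, 2)

per_level = [size * size * anchors_per_cell for size in feature_sizes]
total = sum(per_level)

print(per_level)  # [196608, 49152, 12288, 3072, 768]
print(total)      # 261888
```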

generate_anchors
def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
Input parameters (for the first iteration):

scales: scales[0] = 32

ratios: [0.5, 1, 2]

shape: feature_shapes[0] = [256, 256]

feature_stride: feature_strides[0] = 4

anchor_stride: anchor_stride = 1
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales = scales.flatten()
    ratios = ratios.flatten()
If np.meshgrid is unfamiliar, see https://blog.csdn.net/u013066730/article/details/101776821. After these three lines we have:

scales: [32, 32, 32]

ratios: [0.5, 1., 2.]
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)
This computes the heights and widths of the preset anchors. The result is:

heights: [45.254834, 32., 22.627417]

widths: [22.627417, 32., 45.254834]
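To see where these numbers come from, the snippet below reruns the same lines for scale 32 on their own. It is a small sketch rather than anything from the repository; the two checks at the end are my own additions and show that every anchor keeps an area of scale*scale while width/height equals the ratio.

```python
import numpy as np

# Same first steps as generate_anchors, for the first pyramid level.
scales, ratios = np.meshgrid(np.array(32), np.array([0.5, 1, 2]))
scales = scales.flatten()   # [32, 32, 32]
ratios = ratios.flatten()   # [0.5, 1.0, 2.0]

heights = scales / np.sqrt(ratios)
widths = scales * np.sqrt(ratios)

print(np.round(heights, 6))
print(np.round(widths, 6))

# Sanity checks (not part of the original function):
# the area stays 32*32 and the aspect ratio is width / height.
print(np.allclose(heights * widths, 32 * 32))
print(np.allclose(widths / heights, ratios))
```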
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
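shifts_y and shifts_x are positions on the corresponding feature map (256*256 here). Before the multiplication by feature_stride they are coordinates on the 256*256 grid; after it they are coordinates in the 1024*1024 input image. np.meshgrid then turns them into a full grid. Let's look at the result.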

shifts_x：

[[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]
...
[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]
[ 0 4 8 ... 1012 1016 1020]]

shifts_y：

[[ 0 0 0 ... 0 0 0]
[ 4 4 4 ... 4 4 4]
[ 8 8 8 ... 8 8 8]
...
[1012 1012 1012 ... 1012 1012 1012]
[1016 1016 1016 ... 1016 1016 1016]
[1020 1020 1020 ... 1020 1020 1020]]
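The 256x256 grids are too large to read, so here is the same construction on a toy feature map. The 2x3 shape is an assumption chosen only to keep the output small; the stride values match the first pyramid level.

```python
import numpy as np

shape = (2, 3)        # hypothetical tiny feature map: 2 rows, 3 columns
feature_stride = 4
anchor_stride = 1

shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride  # [0, 4]
shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride  # [0, 4, 8]
grid_x, grid_y = np.meshgrid(shifts_x, shifts_y)

print(grid_x)  # every row repeats the x positions
print(grid_y)  # every row holds a single y position
```

Reading grid_x and grid_y at the same index gives the image-space position of one feature-map cell.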
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x) # shape is (65536,3)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y) # shape is (65536,3)
These are simply the center coordinates and the width and height of every preset anchor box at each position.

box_widths：[[22.627417 32. 45.254834]
[22.627417 32. 45.254834]
[22.627417 32. 45.254834]
...
[22.627417 32. 45.254834]
[22.627417 32. 45.254834]
[22.627417 32. 45.254834]]

box_centers_x：[[ 0 0 0]
[ 4 4 4]
[ 8 8 8]
...
[1012 1012 1012]
[1016 1016 1016]
[1020 1020 1020]]
    # Reshape to get a list of (y, x) and a list of (h, w)
    box_centers = np.stack(
        [box_centers_y, box_centers_x], axis=2).reshape([-1, 2]) # shape is (196608, 2)
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2]) # shape is (196608, 2)
    # Convert to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)
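box_centers and box_sizes are reshaped into a form that is easy to read: one (y, x) pair and one (h, w) pair per anchor, each with shape (196608, 2), where 196608 = 65536*3.

np.stack stacks along axis 2 (counting from 0). box_centers_y and box_centers_x only have two dimensions, axis 0 and axis 1, and no axis 2. np.stack therefore creates a new axis 2 for each array (of length 1) and joins the arrays along it, so the new axis has length 2. The stacked shape is (65536, 3, 2), and reshape turns it into (196608, 2).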
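The shape bookkeeping is easier to follow on small arrays. In this sketch the (4, 3) arrays are stand-ins for box_centers_y and box_centers_x (4 positions, 3 ratios); the values 0 and 1 are arbitrary markers so the pairing is visible.

```python
import numpy as np

centers_y = np.zeros((4, 3))  # stand-in for box_centers_y
centers_x = np.ones((4, 3))   # stand-in for box_centers_x

# A new trailing axis of length 2 is created: index 0 is y, index 1 is x.
stacked = np.stack([centers_y, centers_x], axis=2)
print(stacked.shape)  # (4, 3, 2)

# Flatten positions and ratios into one list of (y, x) pairs.
pairs = stacked.reshape([-1, 2])
print(pairs.shape)    # (12, 2)
print(pairs[0])       # [0. 1.]
```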

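boxes converts the center coordinates into top-left and bottom-right corner coordinates, with shape (196608, 4).

Repeating this 5 times produces arrays of shape (256*256*3, 4), (128*128*3, 4), (64*64*3, 4), (32*32*3, 4) and (16*16*3, 4). After np.concatenate(anchors, axis=0) we get the preset anchors for all feature maps, with shape (261888, 4).

Now back from the helper functions to the main routine under discussion: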

            image_index = (image_index + 1) % len(image_ids)
            if shuffle and image_index == 0:
                np.random.shuffle(image_ids)

            # Get GT bounding boxes and masks for image.
            image_id = image_ids[image_index]

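In the code above, image_index is the index of the image being picked. The modulo keeps it from running out of range when the second epoch starts, and at the start of every epoch the image IDs are reshuffled. Keep in mind that image_ids is [0, 1, 2, 3, 4, ..., 122217]. The last line then picks the actual image_id.

In short, this block exists to make image selection sufficiently random.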
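The wrap-around and reshuffle can be watched on a tiny ID list. This is a sketch under two assumptions that the excerpt does not show: image_index starts at -1, and shuffle is True. Five IDs stand in for the 122218 real ones.

```python
import numpy as np

image_ids = np.arange(5)  # toy stand-in for the real image_ids
image_index = -1          # assumed starting value, so the first step lands on 0
shuffle = True

visited = []
for _ in range(10):       # two passes over the 5 IDs
    image_index = (image_index + 1) % len(image_ids)
    if shuffle and image_index == 0:
        np.random.shuffle(image_ids)  # new order at the start of each pass
    visited.append(int(image_ids[image_index]))

print(visited[:5])  # first pass: every ID exactly once, in random order
print(visited[5:])  # second pass: every ID again, in a new random order
```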

            # If the image source is not to be augmented pass None as augmentation
            if dataset.image_info[image_id]['source'] in no_augmentation_sources:
                image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
                load_image_gt(dataset, config, image_id, augment=augment,
                              augmentation=None,
                              use_mini_mask=config.USE_MINI_MASK)
            else:
                image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
                    load_image_gt(dataset, config, image_id, augment=augment,
                                augmentation=augmentation,
                                use_mini_mask=config.USE_MINI_MASK)

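The code above decides whether augmentation is applied: images whose source is listed in no_augmentation_sources are loaded with augmentation=None, all others with the given augmentation. In this post the augmented branch is the one taken. Next we step into load_image_gt, the key data-loading function.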

load_image_gt
def load_image_gt(dataset, config, image_id, augment=False, augmentation=None,
                  use_mini_mask=False):
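Input parameters:

dataset: the class representing the COCO dataset; it holds all information about the data;

config: the configuration; check it yourself, and note that some parameters may have been modified along the way;

image_id: an integer image ID, e.g. 53347;

augment: the author's old augmentation method, now deprecated;

augmentation: the author's new augmentation method, built on a public and very capable image-augmentation library (imgaug); it is passed in directly as an augmenter object;

use_mini_mask: True, meaning small segmentation masks are used.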

image = dataset.load_image(image_id)

load_image is defined in the Dataset class in mrcnn\utils.py:

    def load_image(self, image_id):
        """Load the specified image and return a [H,W,3] Numpy array.
        """
        # Load image
        image = skimage.io.imread(self.image_info[image_id]['path'])
        # If grayscale. Convert to RGB for consistency.
        if image.ndim != 3:
            image = skimage.color.gray2rgb(image)
        # If has an alpha channel, remove it for consistency
        if image.shape[-1] == 4:
            image = image[..., :3]
        return image

It looks up the image path from the image ID, reads the image, and returns it.

mask, class_ids = dataset.load_mask(image_id)
Stepping into load_mask: this method is overridden in the CocoDataset class in samples\coco\coco.py.
    def load_mask(self, image_id):
        """Load instance masks for the given image.

        Different datasets use different ways to store masks. This
        function converts the different mask format to one format
        in the form of a bitmap [height, width, instances].

        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a COCO image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "coco":
            return super(CocoDataset, self).load_mask(image_id)

        instance_masks = []
        class_ids = []
        annotations = self.image_info[image_id]["annotations"]
        # Build mask of shape [height, width, instance_count] and list
        # of class IDs that correspond to each channel of the mask.
        for annotation in annotations:
            class_id = self.map_source_class_id(
                "coco.{}".format(annotation['category_id']))
            if class_id:
                m = self.annToMask(annotation, image_info["height"],
                                   image_info["width"])
                # Some objects are so small that they're less than 1 pixel area
                # and end up rounded out. Skip those objects.
                if m.max() < 1:
                    continue
                # Is it a crowd? If so, use a negative class ID.
                if annotation['iscrowd']:
                    # Use negative class ID for crowds
                    class_id *= -1
                    # For crowd masks, annToMask() sometimes returns a mask
                    # smaller than the given dimensions. If so, resize it.
                    if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
                        m = np.ones([image_info["height"], image_info["width"]], dtype=bool)
                instance_masks.append(m)
                class_ids.append(class_id)

        # Pack instance masks into an array
        if class_ids:
            mask = np.stack(instance_masks, axis=2).astype(np.bool)
            class_ids = np.array(class_ids, dtype=np.int32)
            return mask, class_ids
        else:
            # Call super class to return an empty mask
            return super(CocoDataset, self).load_mask(image_id)
Given an image_id, the corresponding annotations look like this:

[{'image_id': 1234, 'segmentation': [[x1, y1, x2, y2, ...]] (polygon vertex coordinates), 'iscrowd': usually 0, meaning a single, non-crowd object that can be used directly, 'bbox': [26.62, 293.17, 230.53, 153.48] (the bounding rectangle, as [x, y, width, height]), 'category_id': the object's category, 'id': the ID of this annotation, 'area': the area},
 {a second dict with the same keys and 'image_id': 1234},
 {a third dict with the same keys and 'image_id': 1234}]

As the example shows, image_id refers to the same image throughout, and this image contains 3 objects.

Next we follow the annotation of one of these objects:
class_id = self.map_source_class_id(
                "coco.{}".format(annotation['category_id']))
This calls the function
    def map_source_class_id(self, source_class_id):
        """Takes a source class ID and returns the int class ID assigned to it.

        For example:
        dataset.map_source_class_id("coco.12") -> 23
        """
        return self.class_from_source_map[source_class_id]
self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id
                                      for info, id in zip(self.class_info, self.class_ids)}
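As the code shows, COCO has its own category numbering. Only 80 categories are used, yet category IDs as high as 90 appear, so COCO's IDs have to be renumbered; that is what this class remapping does. For example, the map contains entries such as {'coco.1': 1, ..., 'coco.90': 80}, with internal ID 0 left for the background class.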
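The remapping is easy to reproduce with a shortened class list. In this sketch class_info holds only a background entry and three COCO categories (a hypothetical subset picked to show the gaps in the source IDs), and class_ids is assumed to be the consecutive range 0..N-1.

```python
# Hypothetical, shortened class list: background plus three categories
# whose source IDs are not consecutive, as in COCO.
class_info = [
    {"source": "", "id": 0, "name": "BG"},
    {"source": "coco", "id": 1, "name": "person"},
    {"source": "coco", "id": 3, "name": "car"},
    {"source": "coco", "id": 90, "name": "toothbrush"},
]
class_ids = range(len(class_info))  # assumed: consecutive internal IDs

# Same dict comprehension as in the excerpt above.
class_from_source_map = {
    "{}.{}".format(info["source"], info["id"]): internal_id
    for info, internal_id in zip(class_info, class_ids)
}

print(class_from_source_map)
# {'.0': 0, 'coco.1': 1, 'coco.3': 2, 'coco.90': 3}
```

With the full list of 80 categories the last entry would land on 80 instead of 3, but the mechanism is the same.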
                m = self.annToMask(annotation, image_info["height"],
                                   image_info["width"])
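I won't step into annToMask, since it involves parsing the COCO data and I'm too lazy for that. Taking the result directly: m is the mask of one object, 0 where the object is absent and 1 where it is present (the same holds for the other objects; a mask only ever contains 0 or 1).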
                if m.max() < 1:
                    continue
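If the maximum is less than 1, the mask contains no object pixels, so the annotation is dropped and the loop moves on to the next one.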
                if annotation['iscrowd']:
                    # Use negative class ID for crowds
                    class_id *= -1
                    # For crowd masks, annToMask() sometimes returns a mask
                    # smaller than the given dimensions. If so, resize it.
                    if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
                        m = np.ones([image_info["height"], image_info["width"]], dtype=bool)
If the annotation is marked as a crowd, its class ID is negated (multiplied by -1), which helps later when training samples are selected.
                instance_masks.append(m)
                class_ids.append(class_id)
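The mask m is appended to instance_masks and its class label to class_ids. Remember that this happens once per object instance, not once per image; don't mix the two up.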
            mask = np.stack(instance_masks, axis=2).astype(np.bool)
            class_ids = np.array(class_ids, dtype=np.int32)
            return mask, class_ids
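The code above combines the results: mask ends up with shape (image height, image width, number of instances in the image) and class_ids with shape (number of instances in the image,).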
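A toy version of this stacking step, assuming two hand-made 4x5 instance masks in place of real annToMask output; the class IDs are arbitrary.

```python
import numpy as np

height, width = 4, 5  # toy image size

# Two hypothetical instance masks with values 0/1, one per object.
m1 = np.zeros((height, width), dtype=np.uint8)
m1[0:2, 0:2] = 1
m2 = np.zeros((height, width), dtype=np.uint8)
m2[2:4, 3:5] = 1

instance_masks = [m1, m2]
class_ids = [1, 3]

# One channel per instance along the last axis.
mask = np.stack(instance_masks, axis=2).astype(bool)
class_ids = np.array(class_ids, dtype=np.int32)

print(mask.shape)       # (4, 5, 2)
print(class_ids.shape)  # (2,)
```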

Sigh, we are still inside load_image_gt, which shows just how central this function is.
    original_shape = image.shape
    image, window, scale, padding, crop = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        min_scale=config.IMAGE_MIN_SCALE,
        max_dim=config.IMAGE_MAX_DIM,
        mode=config.IMAGE_RESIZE_MODE)
First let's look at the resize_image function:
def resize_image(image, min_dim=None, max_dim=None, min_scale=None, mode="square"):
Input parameters:

image: the input RGB image

min_dim: 800

min_scale: 0

max_dim: 1024

mode: 'square'
    # Keep track of image dtype and return results in the same dtype
    image_dtype = image.dtype
    # Default window (y1, x1, y2, x2) and default scale == 1.
    h, w = image.shape[:2]
    window = (0, 0, h, w)
    scale = 1
    padding = [(0, 0), (0, 0), (0, 0)]
    crop = None
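This block just declares variables. image_dtype is usually uint8, h and w are the image height and width, window is only being initialized here, and the rest needs no comment.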
    # Scale?
    if min_dim:
        # Scale up but not down
        scale = max(1, min_dim / min(h, w))
    if min_scale and scale < min_scale:
        scale = min_scale

    # Does it exceed max dim?
    if max_dim and mode == "square":
        image_max = max(h, w)
        if round(image_max * scale) > max_dim:
            scale = max_dim / image_max

    # Resize image using bilinear interpolation
    if scale != 1:
        image = resize(image, (round(h * scale), round(w * scale)),
                       preserve_range=True)
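min_dim is 800, so the first branch is taken. Suppose h = 494 and w = 640. Then scale = 800/494 = 1.6194. Since min_scale is 0, the next check is skipped. In the check if max_dim and mode == "square", the larger of the original h and w is taken, and 640*scale = 1036.4 exceeds max_dim, so scale is recomputed as 1024/640 = 1.6, which makes the longer side after scaling exactly max_dim.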
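The worked example can be replayed without any image at all. The sketch below copies just the scale logic for the 494x640 case with the parameter values listed above; new_h and new_w are names I introduce for the resized dimensions.

```python
h, w = 494, 640
min_dim, max_dim, min_scale, mode = 800, 1024, 0, "square"

scale = 1
if min_dim:
    # Scale up but not down: 800 / 494 is about 1.6194
    scale = max(1, min_dim / min(h, w))
if min_scale and scale < min_scale:
    scale = min_scale

if max_dim and mode == "square":
    image_max = max(h, w)
    # round(640 * 1.6194) = 1036 > 1024, so the scale is capped
    if round(image_max * scale) > max_dim:
        scale = max_dim / image_max

new_h, new_w = round(h * scale), round(w * scale)
print(scale)         # 1.6
print(new_h, new_w)  # 790 1024
```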

The image is then rescaled. resize here is a wrapper defined in utils.py, but all it does is rescale.
    if mode == "square":
        # Get new height and width
        h, w = image.shape[:2]
        top_pad = (max_dim - h) // 2
        bottom_pad = max_dim - h - top_pad
        left_pad = (max_dim - w) // 2
        right_pad = max_dim - w - left_pad
        padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
        image = np.pad(image, padding, mode='constant', constant_values=0)
        window = (top_pad, left_pad, h + top_pad, w + left_pad)
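Since the mode is 'square', that is the only mode covered here. After scaling, the network still expects a 1024*1024 input, i.e. max_dim*max_dim, so the image has to be padded. Once it is padded, the position of the scaled image inside the padded image is recorded; that is window.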
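Here is a small sketch of the padding arithmetic, assuming the scaled image is 790x1024 (the 494x640 example scaled by 1.6 and rounded). An all-ones dummy image is used so that the window can be checked against the padded result.

```python
import numpy as np

max_dim = 1024
image = np.ones((790, 1024, 3), dtype=np.uint8)  # dummy scaled image

h, w = image.shape[:2]
top_pad = (max_dim - h) // 2          # 117
bottom_pad = max_dim - h - top_pad    # 117
left_pad = (max_dim - w) // 2         # 0
right_pad = max_dim - w - left_pad    # 0
padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]

padded = np.pad(image, padding, mode='constant', constant_values=0)
window = (top_pad, left_pad, h + top_pad, w + left_pad)

print(padded.shape)  # (1024, 1024, 3)
print(window)        # (117, 0, 907, 1024)

# The window is exactly the region that still holds image content.
y1, x1, y2, x2 = window
print(bool(padded[y1:y2, x1:x2].all()), int(padded.sum()) == h * w * 3)
```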
    return image.astype(image_dtype), window, scale, padding, crop
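Finally it returns:

image: the scaled and padded image, with shape (1024, 1024, 3)

window: the position of the scaled image inside the padded image, a tuple of 4 coordinates (y1, x1, y2, x2) giving its top-left and bottom-right corners

scale: the scale factor that was applied to the image

padding: the amount of padding, in the order shown in the code above

crop: None, since cropping is not used

Yes, still inside load_image_gt.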

    mask = utils.resize_mask(mask, scale, padding, crop)

The mask is scaled and padded in the same way as the image.

The resize_mask function itself is in mrcnn/utils.py:

def resize_mask(mask, scale, padding, crop=None):
    """Resizes a mask using the given scale and padding.
    Typically, you get the scale and padding from resize_image() to
    ensure both, the image and the mask, are resized consistently.

    scale: mask scaling factor
    padding: Padding to add to the mask in the form
            [(top, bottom), (left, right), (0, 0)]
    """
    # Suppress warning from scipy 0.13.0, the output shape of zoom() is
    # calculated with round() instead of int()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
    if crop is not None:
        y, x, h, w = crop
        mask = mask[y:y + h, x:x + w]
    else:
        mask = np.pad(mask, padding, mode='constant', constant_values=0)
    return mask

Still in load_image_gt, be patient.

    if augmentation:
        import imgaug

        # Augmenters that are safe to apply to masks
        # Some, such as Affine, have settings that make them unsafe, so always
        # test your augmentation on masks
        MASK_AUGMENTERS = ["Sequential", "SomeOf", "OneOf", "Sometimes",
                           "Fliplr", "Flipud", "CropAndPad",
                           "Affine", "PiecewiseAffine"]

        def hook(images, augmenter, parents, default):
            """Determines which augmenters to apply to masks."""
            return augmenter.__class__.__name__ in MASK_AUGMENTERS

        # Store shapes before augmentation to compare
        image_shape = image.shape
        mask_shape = mask.shape
        # Make augmenters deterministic to apply similarly to images and masks
        det = augmentation.to_deterministic()
        image = det.augment_image(image)
        # Change mask to np.uint8 because imgaug doesn't support np.bool
        mask = det.augment_image(mask.astype(np.uint8),
                                 hooks=imgaug.HooksImages(activator=hook))
        # Verify that shapes didn't change
        assert image.shape == image_shape, "Augmentation shouldn't change image size"
        assert mask.shape == mask_shape, "Augmentation shouldn't change mask size"
        # Change mask back to bool
        mask = mask.astype(np.bool)

This uses the imgaug augmentation library (which I personally find really handy).

Still in load_image_gt. Just give up, there is no way out.

    _idx = np.sum(mask, axis=(0, 1)) > 0
    mask = mask[:, :, _idx]
    class_ids = class_ids[_idx]
    # Bounding boxes. Note that some boxes might be all zeros
    # if the corresponding mask got cropped out.
    # bbox: [num_instances, (y1, x1, y2, x2)]
    bbox = utils.extract_bboxes(mask)

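Note that mask has shape (1024, 1024, number of instances in the image), say 26 instances. np.sum adds up over axes 0 and 1, which amounts to checking whether each instance mask contains any 1s. If it does, the mask is valid; otherwise it is discarded.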

utils.extract_bboxes derives the ground-truth bbox of each instance by taking the smallest axis-aligned rectangle that encloses its mask. The code is:

def extract_bboxes(mask):
    """Compute bounding boxes from masks.
    mask: [height, width, num_instances]. Mask pixels are either 1 or 0.

    Returns: bbox array [num_instances, (y1, x1, y2, x2)].
    """
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        # Bounding box.
        horizontal_indicies = np.where(np.any(m, axis=0))[0]
        vertical_indicies = np.where(np.any(m, axis=1))[0]
        if horizontal_indicies.shape[0]:
            x1, x2 = horizontal_indicies[[0, -1]]
            y1, y2 = vertical_indicies[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
        else:
            # No mask for this instance. Might happen due to
            # resizing or cropping. Set bbox to zeros
            x1, x2, y1, y2 = 0, 0, 0, 0
        boxes[i] = np.array([y1, x1, y2, x2])
    return boxes.astype(np.int32)

So the boxes array that is finally returned has shape (26, 4).
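For a single instance the box extraction looks like this. The 6x8 mask with a filled rectangle is a made-up example; the steps follow the loop body of extract_bboxes.

```python
import numpy as np

m = np.zeros((6, 8), dtype=bool)
m[2:5, 3:7] = True  # object covers rows 2..4 and columns 3..6

# Columns and rows that contain at least one mask pixel.
horizontal_indices = np.where(np.any(m, axis=0))[0]
vertical_indices = np.where(np.any(m, axis=1))[0]

x1, x2 = horizontal_indices[[0, -1]]
y1, y2 = vertical_indices[[0, -1]]
# x2 and y2 are exclusive, so move them one past the last mask pixel.
x2 += 1
y2 += 1

box = np.array([y1, x1, y2, x2])
print(box)  # [2 3 5 7]
```

Because the far edges are exclusive, m[y1:y2, x1:x2] selects exactly the object.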

Yes, still in load_image_gt.
    # Active classes
    # Different datasets have different classes, so track the
    # classes supported in the dataset of this image.
    active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32)
    source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]]
    active_class_ids[source_class_ids] = 1
这里其实就是看看有哪些类被具体使用了，只有这些类才会被参与训练。这里就是81类，也就是0-80。

active_class_ids的具体值为[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
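With COCO alone every entry is 1, which hides what the vector is for. The sketch below assumes a hypothetical mixed setup with 6 classes in total, of which the current image's source only defines classes 0, 1, 2 and 4.

```python
import numpy as np

num_classes = 6                  # hypothetical total over all sources
source_class_ids = [0, 1, 2, 4]  # hypothetical classes of this image's source

active_class_ids = np.zeros([num_classes], dtype=np.int32)
active_class_ids[source_class_ids] = 1  # mark the classes in use

print(active_class_ids)  # [1 1 1 0 1 0]
```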

Yes, still in load_image_gt.
    if use_mini_mask:
        mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)
minimize_mask is in mrcnn/utils.py:
def minimize_mask(bbox, mask, mini_shape):
Input parameters:

bbox: with the 26 instances from above, its shape is (26, 4)

mask: shape (1024, 1024, 26), with values restricted to 0 and 1

mini_shape: (56, 56)
    mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
    for i in range(mask.shape[-1]):
        # Pick slice and cast to bool in case load_mask() returned wrong dtype
        m = mask[:, :, i].astype(bool)
        y1, x1, y2, x2 = bbox[i][:4]
        m = m[y1:y2, x1:x2]
        if m.size == 0:
            raise Exception("Invalid bounding box with area of zero")
        # Resize with bilinear interpolation
        m = resize(m, mini_shape)
        mini_mask[:, :, i] = np.around(m).astype(np.bool)
    return mini_mask
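The shrinking works like this. Suppose mask has shape (1024, 1024, 26). Take one instance mask, with shape (1024, 1024), use its bbox to locate the region where the mask is 1, and rescale just that region. Doing this 26 times gives 26 new instance masks of shape (56, 56), so mini_mask has shape (56, 56, 26).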
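To make the crop-then-shrink idea concrete without depending on skimage, here is a rough NumPy-only sketch. Note the substitution: nearest-neighbour sampling stands in for the bilinear resize() followed by rounding, so edge pixels can differ from the real minimize_mask. shrink_instance is a helper name invented for this example, and the two boxes are made up.

```python
import numpy as np

def shrink_instance(m, box, mini_shape):
    """Crop m to box (y1, x1, y2, x2) and resample it to mini_shape.

    Nearest-neighbour sampling replaces the bilinear resize used by the
    real function; this is an illustrative stand-in only.
    """
    y1, x1, y2, x2 = box
    crop = m[y1:y2, x1:x2]
    if crop.size == 0:
        raise ValueError("Invalid bounding box with area of zero")
    # Source row/column picked for every output row/column.
    rows = np.arange(mini_shape[0]) * crop.shape[0] // mini_shape[0]
    cols = np.arange(mini_shape[1]) * crop.shape[1] // mini_shape[1]
    return crop[rows][:, cols]

# Two hypothetical instances that completely fill their boxes.
mask = np.zeros((1024, 1024, 2), dtype=bool)
mask[100:300, 200:260, 0] = True
mask[500:520, 10:500, 1] = True
bbox = np.array([[100, 200, 300, 260], [500, 10, 520, 500]])

mini_shape = (56, 56)
mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
for i in range(mask.shape[-1]):
    mini_mask[:, :, i] = shrink_instance(mask[:, :, i], bbox[i], mini_shape)

print(mini_mask.shape)  # (56, 56, 2)
```

Whatever the size or aspect ratio of the box, each instance ends up as a 56x56 patch, which is why the bbox has to be kept alongside the mini mask.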

Still in load_image_gt.
    # Image meta data
    image_meta = compose_image_meta(image_id, original_shape, image.shape,
                                    window, scale, active_class_ids)
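This step records what was done to the original image, which is essential at test time for mapping results back onto the original image. image_meta is made up of fields of lengths (1, 3, 3, 4, 1, 81).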
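The body of compose_image_meta is not shown in this post, so the following is only a guess at its layout: it assumes the arguments are concatenated in call order into one flat array, which is what the field lengths (1, 3, 3, 4, 1, 81) suggest. The concrete values are illustrative, taken from the 494x640 example.

```python
import numpy as np

image_id = 53347                 # example ID mentioned earlier
original_shape = (494, 640, 3)   # illustrative original image shape
image_shape = (1024, 1024, 3)
window = (117, 0, 907, 1024)     # illustrative window for that image
scale = 1.6
active_class_ids = np.ones([81], dtype=np.int32)

# Assumed layout: all fields concatenated in call order.
meta = np.array(
    [image_id]                   # length 1
    + list(original_shape)       # length 3
    + list(image_shape)          # length 3
    + list(window)               # length 4
    + [scale]                    # length 1
    + list(active_class_ids)     # length 81
)

print(meta.shape)  # (93,)
```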

All right, load_image_gt is finally over.
    return image, image_meta, class_ids, bbox, mask
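The returned variables are:

image: the image about to be fed to the network, with shape (1024, 1024, 3);

image_meta: records the key steps by which the original input image was transformed into the (1024, 1024, 3) image;

class_ids: the class of each of the 26 instances, with shape (26,);

bbox: where each instance is located, with shape (26, 4);

mask: since mini masks are used, its shape is (56, 56, 26).

To be continued. Good grief, there is a lot of this.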
