[MaskRCNN] Source Code Series 1: Training Data Processing, Part 2

Contents

data_generator (the most critical part)

generate_pyramid_anchors

generate_anchors

load_image_gt


data_generator (the most critical part)

This post looks only at the data-processing part of the training script, namely the data_generator function; here I only cover how train_generator is produced.

def data_generator(dataset, config, shuffle=True, augment=False, augmentation=None,
                   random_rois=0, batch_size=1, detection_targets=False,
                   no_augmentation_sources=None):

Input parameters:

dataset: the dataset class that was initialized and updated in the main function

config: the configuration from config.py, as modified in the main function

shuffle: True

augment: False

augmentation: an augmenter from the imgaug library

random_rois: 0

batch_size: 1

detection_targets: False

no_augmentation_sources: None

 

 

Parameter initialization: here image_ids = [0, 1, 2, ..., 122216, 122217] and no_augmentation_sources = [].

backbone_shapes = compute_backbone_shapes(config, config.IMAGE_SHAPE)

This calls compute_backbone_shapes() in model.py:

def compute_backbone_shapes(config, image_shape):
    """Computes the width and height of each stage of the backbone network.

    Returns:
        [N, (height, width)]. Where N is the number of stages
    """
    if callable(config.BACKBONE):
        return config.COMPUTE_BACKBONE_SHAPE(image_shape)

    # Currently supports ResNet only
    assert config.BACKBONE in ["resnet50", "resnet101"]
    return np.array(
        [[int(math.ceil(image_shape[0] / stride)),
            int(math.ceil(image_shape[1] / stride))]
            for stride in config.BACKBONE_STRIDES])

These shapes are what is used on the way into the RPN; they cover P2 (256), P3 (128), P4 (64), P5 (32), and P6 (16). The concrete value of backbone_shapes is:

[[256 256]
 [128 128]
 [ 64  64]
 [ 32  32]
 [ 16  16]]
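As a quick check, here is a standalone sketch that reproduces these values for a 1024*1024 input, assuming the default BACKBONE_STRIDES = [4, 8, 16, 32, 64] from config.py:

    import math
    import numpy as np

    # Reproduce backbone_shapes for IMAGE_SHAPE = (1024, 1024, 3), with the
    # default strides of the ResNet+FPN backbone (assumed from config.py).
    image_shape = (1024, 1024, 3)
    backbone_strides = [4, 8, 16, 32, 64]
    backbone_shapes = np.array(
        [[int(math.ceil(image_shape[0] / stride)),
          int(math.ceil(image_shape[1] / stride))]
         for stride in backbone_strides])
    print(backbone_shapes)  # [[256 256] [128 128] [64 64] [32 32] [16 16]]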

generate_pyramid_anchors

    anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                             config.RPN_ANCHOR_RATIOS,
                                             backbone_shapes,
                                             config.BACKBONE_STRIDES,
                                             config.RPN_ANCHOR_STRIDE)

This uses the generate_pyramid_anchors function from utils.py:

def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)

Its input parameters are:

scales:config.RPN_ANCHOR_SCALES=(32,64,128,256,512)

ratios:config.RPN_ANCHOR_RATIOS=(0.5,1,2)

feature_shapes:backbone_shapes=[[256 256] [128 128] [ 64  64] [ 32  32] [ 16  16]]

feature_strides: [4, 8, 16, 32, 64], in one-to-one correspondence with feature_shapes above, since the specified input image size is 1024*1024

anchor_stride:1

The loop appending to anchors runs 5 times. Let's look at a single call to generate_anchors, which lives in utils.py. Across the five calls the arguments are:

    scales          32          64          128         256         512
    ratios          [0.5, 1, 2] for every level
    feature_shapes  (256, 256)  (128, 128)  (64, 64)    (32, 32)    (16, 16)
    feature_stride  4           8           16          32          64
    anchor_stride   1           1           1           1           1
Combining the resulting (256*256*3, 4), (128*128*3, 4), (64*64*3, 4), (32*32*3, 4), and (16*16*3, 4) arrays yields the preset anchor coordinates for all feature maps, with shape (261888, 4).
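A one-line sanity check of that total (3 anchors per feature-map cell, summed over the five levels):

    # 3 anchors (one per ratio) at every cell of each of the 5 feature maps.
    total = sum(side * side * 3 for side in [256, 128, 64, 32, 16])
    print(total)  # 261888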

generate_anchors

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):

Input parameters (for the first of the five calls):

scales:scales[0]=32

ratios:[0.5,1,2]

shape:feature_shape[0]=[256,256]

feature_stride:feature_strides[0]=4

anchor_stride:anchor_stride=1

    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales = scales.flatten()
    ratios = ratios.flatten()

If np.meshgrid is unfamiliar, see https://blog.csdn.net/u013066730/article/details/101776821. The result:

scales:[32,32,32]

ratios:[0.5,1. ,2. ]
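A minimal sketch of what np.meshgrid does here with one scale and three ratios:

    import numpy as np

    # Broadcast a single scale against the three ratios, as in generate_anchors.
    scales, ratios = np.meshgrid(np.array(32), np.array([0.5, 1, 2]))
    print(scales.flatten())  # [32 32 32]
    print(ratios.flatten())  # [0.5 1.  2. ]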

    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)

This computes the widths and heights of the preset anchors:

heights: [45.254834, 32., 22.627417]

widths: [22.627417, 32., 45.254834]
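Why this formula: with h = s/sqrt(r) and w = s*sqrt(r), every anchor keeps the same area s^2 while its width/height ratio equals r. A quick check:

    import numpy as np

    s, r = 32, 0.5
    h, w = s / np.sqrt(r), s * np.sqrt(r)
    print(h * w)  # ~1024.0, i.e. s**2: the area is preserved
    print(w / h)  # ~0.5: the aspect ratio equals r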

    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

Before multiplying by feature_stride, shifts_y and shifts_x are coordinates on the corresponding feature map (here 256*256); after multiplying by feature_stride they become coordinates on the 1024*1024 input image. A grid is then built. Take a look:

shifts_x

[[   0    4    8 ... 1012 1016 1020]
 [   0    4    8 ... 1012 1016 1020]
 [   0    4    8 ... 1012 1016 1020]
 ...
 [   0    4    8 ... 1012 1016 1020]
 [   0    4    8 ... 1012 1016 1020]
 [   0    4    8 ... 1012 1016 1020]]

shifts_y

[[   0    0    0 ...    0    0    0]
 [   4    4    4 ...    4    4    4]
 [   8    8    8 ...    8    8    8]
 ...
 [1012 1012 1012 ... 1012 1012 1012]
 [1016 1016 1016 ... 1016 1016 1016]
 [1020 1020 1020 ... 1020 1020 1020]]

    box_widths, box_centers_x = np.meshgrid(widths, shifts_x) # shape is (65536,3)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y) # shape is (65536,3)

This is really just each preset anchor box's center coordinates and width/height at every position.

box_widths:

    [[22.627417 32.       45.254834]
     [22.627417 32.       45.254834]
     [22.627417 32.       45.254834]
     ...
     [22.627417 32.       45.254834]
     [22.627417 32.       45.254834]
     [22.627417 32.       45.254834]]

box_centers_x:

    [[   0    0    0]
     [   4    4    4]
     [   8    8    8]
     ...
     [1012 1012 1012]
     [1016 1016 1016]
     [1020 1020 1020]]

    # Reshape to get a list of (y, x) and a list of (h, w)
    box_centers = np.stack(
        [box_centers_y, box_centers_x], axis=2).reshape([-1, 2]) # shape is (196608, 2)
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2]) # shape is (196608, 2)
    # Convert to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)

box_centers and box_sizes reshape the data into an easier-to-read form: a list of (y, x) centers and a list of (h, w) sizes, each of shape (196608, 2), where 196608 = 65536 * 3.

Here np.stack stacks along axis 2 (counting from axis 0). box_centers_y and box_centers_x have only two dimensions (axes 0 and 1) and no axis 2, so stacking adds a new third axis: each input contributes one slice along it, making its length 2. The stacked result therefore has shape (65536, 3, 2), which reshape then turns into (196608, 2).

boxes converts the center coordinates into top-left and bottom-right corner coordinates, giving shape (196608, 4).
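A tiny sketch of the corner conversion for one anchor centered at the origin (note that anchors near the border can extend outside the image; negative coordinates are allowed at this stage):

    import numpy as np

    center = np.array([0.0, 0.0])             # (y, x)
    size = np.array([45.254834, 22.627417])   # (h, w)
    box = np.concatenate([center - 0.5 * size, center + 0.5 * size])
    print(box)  # [-22.627417  -11.3137085  22.627417   11.3137085]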

Looping like this 5 times produces the (256*256*3, 4), (128*128*3, 4), (64*64*3, 4), (32*32*3, 4), and (16*16*3, 4) arrays; the np.concatenate(anchors, axis=0) line then yields all preset anchors over all feature maps, with shape (261888, 4).

Now, back from the helper functions to the main program this post is walking through:

            image_index = (image_index + 1) % len(image_ids)
            if shuffle and image_index == 0:
                np.random.shuffle(image_ids)

            # Get GT bounding boxes and masks for image.
            image_id = image_ids[image_index]

In this snippet, image_index is the index of the selected image; the modulo keeps it from running past the end of image_ids once an epoch wraps around. At the start of every epoch the image-id indices are reshuffled. Keep in mind that image_ids is concretely [0, 1, 2, 3, 4, ..., 122217]; the actual image_id is then picked from it.

In short, this just makes the image selection sufficiently random.
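A toy sketch of this index-cycling pattern (with shuffle=True):

    import numpy as np

    image_ids = np.arange(10)  # stand-in for the 122218 real ids
    image_index = -1
    for _ in range(25):        # the real generator loops forever
        image_index = (image_index + 1) % len(image_ids)
        if image_index == 0:   # a new epoch begins: reshuffle
            np.random.shuffle(image_ids)
        image_id = image_ids[image_index]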

            # If the image source is not to be augmented pass None as augmentation
            if dataset.image_info[image_id]['source'] in no_augmentation_sources:
                image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
                load_image_gt(dataset, config, image_id, augment=augment,
                              augmentation=None,
                              use_mini_mask=config.USE_MINI_MASK)
            else:
                image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
                    load_image_gt(dataset, config, image_id, augment=augment,
                                augmentation=augmentation,
                                use_mini_mask=config.USE_MINI_MASK)

This branch decides whether augmentation is applied to this image; in this walkthrough the augmentation branch is taken. Next we step into load_image_gt, the key data-reading function.

load_image_gt

def load_image_gt(dataset, config, image_id, augment=False, augmentation=None,
                  use_mini_mask=False):

输入参数:

dataset:就是coco数据集表示的类,数据的所有信息都包含在里面;

config:config就是配置文件,自行去查看,有些参数在中途可能被修改;

image_id:一个整数值,表示图像的id值,例如53347;

augment:作者写的老的增强方法,已经被作者弃用了;

augmentation:作者采用的新的增强方法,这是一个公开的图像增强库,很强大,这里直接就是一个类;

use_mini_mask:True,表明需要使用小的分割mask。

 

image = dataset.load_image(image_id)

The load_image function is defined in the Dataset class in mrcnn/utils.py:

    def load_image(self, image_id):
        """Load the specified image and return a [H,W,3] Numpy array.
        """
        # Load image
        image = skimage.io.imread(self.image_info[image_id]['path'])
        # If grayscale. Convert to RGB for consistency.
        if image.ndim != 3:
            image = skimage.color.gray2rgb(image)
        # If has an alpha channel, remove it for consistency
        if image.shape[-1] == 4:
            image = image[..., :3]
        return image

It looks up the image path from the image id, reads the image, and returns it.

 

mask, class_ids = dataset.load_mask(image_id)

Stepping into load_mask: this method is overridden in the CocoDataset class in samples/coco/coco.py.

    def load_mask(self, image_id):
        """Load instance masks for the given image.

        Different datasets use different ways to store masks. This
        function converts the different mask format to one format
        in the form of a bitmap [height, width, instances].

        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a COCO image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "coco":
            return super(CocoDataset, self).load_mask(image_id)

        instance_masks = []
        class_ids = []
        annotations = self.image_info[image_id]["annotations"]
        # Build mask of shape [height, width, instance_count] and list
        # of class IDs that correspond to each channel of the mask.
        for annotation in annotations:
            class_id = self.map_source_class_id(
                "coco.{}".format(annotation['category_id']))
            if class_id:
                m = self.annToMask(annotation, image_info["height"],
                                   image_info["width"])
                # Some objects are so small that they're less than 1 pixel area
                # and end up rounded out. Skip those objects.
                if m.max() < 1:
                    continue
                # Is it a crowd? If so, use a negative class ID.
                if annotation['iscrowd']:
                    # Use negative class ID for crowds
                    class_id *= -1
                    # For crowd masks, annToMask() sometimes returns a mask
                    # smaller than the given dimensions. If so, resize it.
                    if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
                        m = np.ones([image_info["height"], image_info["width"]], dtype=bool)
                instance_masks.append(m)
                class_ids.append(class_id)

        # Pack instance masks into an array
        if class_ids:
            mask = np.stack(instance_masks, axis=2).astype(np.bool)
            class_ids = np.array(class_ids, dtype=np.int32)
            return mask, class_ids
        else:
            # Call super class to return an empty mask
            return super(CocoDataset, self).load_mask(image_id)

Given an image_id, the corresponding annotations look like this:

    [{'image_id': 1234,
      'segmentation': [[polygon coordinates]],
      'iscrowd': usually 0, meaning the object is usable and not a crowd,
      'bbox': [26.62, 293.17, 230.53, 153.48],  # enclosing rectangle
      'category_id': the concrete class,
      'id': the annotation's own id,
      'area': the object's area},
     {...},   # second object, same fields
     {...}]   # third object, same fields

As this example shows, the image_id refers to a single image throughout, and this image contains 3 objects.

Next, take one object's annotation and walk through it:

class_id = self.map_source_class_id(
                "coco.{}".format(annotation['category_id']))

This calls:

    def map_source_class_id(self, source_class_id):
        """Takes a source class ID and returns the int class ID assigned to it.

        For example:
        dataset.map_source_class_id("coco.12") -> 23
        """
        return self.class_from_source_map[source_class_id]
    # class_from_source_map itself is built earlier, in Dataset.prepare():
    self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id
                                  for info, id in zip(self.class_info, self.class_ids)}

As the code shows, COCO's own labeling uses its own category numbering: COCO has 80 classes, but its category ids run up to 90, so COCO's sparse ids must be remapped to our contiguous internal class ids. Hence the mapping, with entries along the lines of {'coco.1': 1, ..., 'coco.90': 80}.
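A toy sketch of how that map gets built (the class_info entries and ids below are illustrative, not the real 81-entry list):

    # Toy class_info / class_ids in the form Dataset.prepare() would build them.
    class_info = [{'source': '', 'id': 0},       # background
                  {'source': 'coco', 'id': 1},   # first COCO category
                  {'source': 'coco', 'id': 90}]  # last COCO category id
    class_ids = [0, 1, 2]
    class_from_source_map = {"{}.{}".format(info['source'], info['id']): cid
                             for info, cid in zip(class_info, class_ids)}
    print(class_from_source_map)  # {'.0': 0, 'coco.1': 1, 'coco.90': 2}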

                m = self.annToMask(annotation, image_info["height"],
                                   image_info["width"])

I won't step into the annToMask function above, since it involves parsing COCO's annotation format and, frankly, I'm too lazy; taking the result directly: m is the binary mask of one object, where pixels not belonging to the object are 0 and pixels of the object are 1 (and so on for the other objects; each such mask only ever contains 0s and 1s).

                if m.max() < 1:
                    continue

If the maximum is below 1, the mask contains no object pixels (the object rounded away to under one pixel), so it is discarded and the loop moves on.

                if annotation['iscrowd']:
                    # Use negative class ID for crowds
                    class_id *= -1
                    # For crowd masks, annToMask() sometimes returns a mask
                    # smaller than the given dimensions. If so, resize it.
                    if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
                        m = np.ones([image_info["height"], image_info["width"]], dtype=bool)

Overly crowded annotations get a new, negated class id; this helps later when selecting training samples.

                instance_masks.append(m)
                class_ids.append(class_id)

The mask m and its class label are appended to instance_masks and class_ids. Keep firmly in mind that this happens once per object instance, not once per image; don't mix the two up.

            mask = np.stack(instance_masks, axis=2).astype(np.bool)
            class_ids = np.array(class_ids, dtype=np.int32)
            return mask, class_ids

This code combines everything: the final mask has shape (image height, image width, number of instances in the image), and class_ids has shape (number of instances,).

 

Sigh, we are still inside the load_image_gt function; you can tell this function really is rather central.

    original_shape = image.shape
    image, window, scale, padding, crop = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        min_scale=config.IMAGE_MIN_SCALE,
        max_dim=config.IMAGE_MAX_DIM,
        mode=config.IMAGE_RESIZE_MODE)

First, let's look at the resize_image function:

def resize_image(image, min_dim=None, max_dim=None, min_scale=None, mode="square"):

Input parameters:

image: the input RGB image

min_dim:800

min_scale:0

max_dim:1024

mode:'square'

    # Keep track of image dtype and return results in the same dtype
    image_dtype = image.dtype
    # Default window (y1, x1, y2, x2) and default scale == 1.
    h, w = image.shape[:2]
    window = (0, 0, h, w)
    scale = 1
    padding = [(0, 0), (0, 0), (0, 0)]
    crop = None

This block is just initialization: image_dtype is usually uint8; h and w are the image's height and width; window starts out as the full image; the rest is self-explanatory.

    # Scale?
    if min_dim:
        # Scale up but not down
        scale = max(1, min_dim / min(h, w))
    if min_scale and scale < min_scale:
        scale = min_scale

    # Does it exceed max dim?
    if max_dim and mode == "square":
        image_max = max(h, w)
        if round(image_max * scale) > max_dim:
            scale = max_dim / image_max

    # Resize image using bilinear interpolation
    if scale != 1:
        image = resize(image, (round(h * scale), round(w * scale)),
                       preserve_range=True)

min_dim is 800, so the first branch is entered. Suppose h = 494 and w = 640; then scale = 800/494 = 1.6194. Since min_scale is 0, that check is skipped, and we move to the next statement. In the `if max_dim and mode == "square"` check, the larger of h and w is taken: 640 * scale = 1036.4 exceeds max_dim, so scale is recomputed so that the scaled long side stays within max_dim.
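Working through those numbers as a standalone snippet:

    h, w = 494, 640
    min_dim, max_dim = 800, 1024

    scale = max(1, min_dim / min(h, w))        # 800 / 494 = 1.6194...
    if round(max(h, w) * scale) > max_dim:     # 640 * 1.6194 = 1036.4 > 1024
        scale = max_dim / max(h, w)            # 1024 / 640 = 1.6
    print(round(h * scale), round(w * scale))  # 790 1024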

The image is then resized; this resize is a redefined wrapper, but it effectively just scales the image.

    if mode == "square":
        # Get new height and width
        h, w = image.shape[:2]
        top_pad = (max_dim - h) // 2
        bottom_pad = max_dim - h - top_pad
        left_pad = (max_dim - w) // 2
        right_pad = max_dim - w - left_pad
        padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
        image = np.pad(image, padding, mode='constant', constant_values=0)
        window = (top_pad, left_pad, h + top_pad, w + left_pad)

Since the mode here is 'square', only that branch is covered; the other modes are left out for now. After resizing, the network expects a 1024*1024 (i.e., max_dim*max_dim) input, so the image is padded; window then records where the resized image sits inside the padded one.
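Continuing the worked example: the resized 790*1024 image is padded to 1024*1024 as follows (a standalone re-run of the branch above):

    max_dim, h, w = 1024, 790, 1024

    top_pad = (max_dim - h) // 2         # 117
    bottom_pad = max_dim - h - top_pad   # 117
    left_pad = (max_dim - w) // 2        # 0
    right_pad = max_dim - w - left_pad   # 0
    window = (top_pad, left_pad, h + top_pad, w + left_pad)
    print(window)  # (117, 0, 907, 1024)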

    return image.astype(image_dtype), window, scale, padding, crop

Finally it returns:

image: the resized and padded image, with shape (1024, 1024, 3)

window: the location of the resized image inside the padded image, as four coordinates (y1, x1, y2, x2) of the top-left and bottom-right corners

scale: the scale factor that was applied

padding: the padding amounts, in the order shown in the code above

crop: None; cropping is not used here

Right, still in load_image_gt:

    mask = utils.resize_mask(mask, scale, padding, crop)

The mask is resized and padded in the same way as the image.

The resize_mask function is in mrcnn/utils.py:

def resize_mask(mask, scale, padding, crop=None):
    """Resizes a mask using the given scale and padding.
    Typically, you get the scale and padding from resize_image() to
    ensure both, the image and the mask, are resized consistently.

    scale: mask scaling factor
    padding: Padding to add to the mask in the form
            [(top, bottom), (left, right), (0, 0)]
    """
    # Suppress warning from scipy 0.13.0, the output shape of zoom() is
    # calculated with round() instead of int()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
    if crop is not None:
        y, x, h, w = crop
        mask = mask[y:y + h, x:x + w]
    else:
        mask = np.pad(mask, padding, mode='constant', constant_values=0)
    return mask

 

 

Still in load_image_gt; bear with me:

    if augmentation:
        import imgaug

        # Augmenters that are safe to apply to masks
        # Some, such as Affine, have settings that make them unsafe, so always
        # test your augmentation on masks
        MASK_AUGMENTERS = ["Sequential", "SomeOf", "OneOf", "Sometimes",
                           "Fliplr", "Flipud", "CropAndPad",
                           "Affine", "PiecewiseAffine"]

        def hook(images, augmenter, parents, default):
            """Determines which augmenters to apply to masks."""
            return augmenter.__class__.__name__ in MASK_AUGMENTERS

        # Store shapes before augmentation to compare
        image_shape = image.shape
        mask_shape = mask.shape
        # Make augmenters deterministic to apply similarly to images and masks
        det = augmentation.to_deterministic()
        image = det.augment_image(image)
        # Change mask to np.uint8 because imgaug doesn't support np.bool
        mask = det.augment_image(mask.astype(np.uint8),
                                 hooks=imgaug.HooksImages(activator=hook))
        # Verify that shapes didn't change
        assert image.shape == image_shape, "Augmentation shouldn't change image size"
        assert mask.shape == mask_shape, "Augmentation shouldn't change mask size"
        # Change mask back to bool
        mask = mask.astype(np.bool)

This applies the imgaug augmentation method (which I personally find excellent, too).
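For reference, a minimal augmenter of my own (not from the repo) that could be passed in as `augmentation`, using only augmenters from the mask-safe MASK_AUGMENTERS whitelist above:

    from imgaug import augmenters as iaa

    # Flip or slightly rotate half of the images; Fliplr, Affine, OneOf and
    # Sometimes are all on the MASK_AUGMENTERS whitelist.
    augmentation = iaa.Sometimes(0.5, iaa.OneOf([
        iaa.Fliplr(1.0),               # horizontal flip
        iaa.Affine(rotate=(-10, 10)),  # small random rotation
    ]))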

 

Still in load_image_gt; just give in, we're not getting out any time soon.

    _idx = np.sum(mask, axis=(0, 1)) > 0
    mask = mask[:, :, _idx]
    class_ids = class_ids[_idx]
    # Bounding boxes. Note that some boxes might be all zeros
    # if the corresponding mask got cropped out.
    # bbox: [num_instances, (y1, x1, y2, x2)]
    bbox = utils.extract_bboxes(mask)

Remember that mask here has shape (1024, 1024, number of instances in the image (say 26)). np.sum over axes 0 and 1 counts the 1-pixels in each instance mask: masks containing at least one 1 are valid and kept, the rest are thrown away.

utils.extract_bboxes recovers the label bboxes by taking the minimum enclosing rectangle of each mask. The code is:

def extract_bboxes(mask):
    """Compute bounding boxes from masks.
    mask: [height, width, num_instances]. Mask pixels are either 1 or 0.

    Returns: bbox array [num_instances, (y1, x1, y2, x2)].
    """
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        # Bounding box.
        horizontal_indicies = np.where(np.any(m, axis=0))[0]
        vertical_indicies = np.where(np.any(m, axis=1))[0]
        if horizontal_indicies.shape[0]:
            x1, x2 = horizontal_indicies[[0, -1]]
            y1, y2 = vertical_indicies[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
        else:
            # No mask for this instance. Might happen due to
            # resizing or cropping. Set bbox to zeros
            x1, x2, y1, y2 = 0, 0, 0, 0
        boxes[i] = np.array([y1, x1, y2, x2])
    return boxes.astype(np.int32)

So the returned boxes here have shape (26, 4).
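On toy data, extract_bboxes behaves like this (a tiny check of my own, assuming extract_bboxes is in scope):

    import numpy as np

    m = np.zeros((5, 5, 1), dtype=bool)
    m[1:3, 2:4, 0] = True     # object covers rows 1-2, cols 2-3
    print(extract_bboxes(m))  # [[1 2 3 4]] -> (y1, x1, y2, x2), ends exclusive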

 

Right, still in load_image_gt:

    # Active classes
    # Different datasets have different classes, so track the
    # classes supported in the dataset of this image.
    active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32)
    source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]]
    active_class_ids[source_class_ids] = 1

This tracks which classes are actually used by this image's source dataset; only these classes participate in training. Here that is 81 classes, ids 0 through 80.

The concrete value of active_class_ids is a length-81 vector of all ones: [1 1 1 ... 1 1 1].

 

Right, still in load_image_gt:

    if use_mini_mask:
        mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)

minimize_mask is in mrcnn/utils.py:

def minimize_mask(bbox, mask, mini_shape):

Input parameters:

bbox: with the 26 instances from our running example, shape (26, 4)

mask: shape (1024, 1024, 26), containing only 0s and 1s

mini_shape: (56, 56)

    mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
    for i in range(mask.shape[-1]):
        # Pick slice and cast to bool in case load_mask() returned wrong dtype
        m = mask[:, :, i].astype(bool)
        y1, x1, y2, x2 = bbox[i][:4]
        m = m[y1:y2, x1:x2]
        if m.size == 0:
            raise Exception("Invalid bounding box with area of zero")
        # Resize with bilinear interpolation
        m = resize(m, mini_shape)
        mini_mask[:, :, i] = np.around(m).astype(np.bool)
    return mini_mask

The shrinking works as follows: with mask of shape (1024, 1024, 26), take one instance mask of shape (1024, 1024), use its bbox to locate the region marked 1, and resize that crop to the mini shape. Doing this 26 times gives 26 new (56, 56) instance masks, so mini_mask has shape (56, 56, 26).

 

Still in load_image_gt:

    # Image meta data
    image_meta = compose_image_meta(image_id, original_shape, image.shape,
                                    window, scale, active_class_ids)

This step records which operations were applied to the original image, which is crucial at test time for mapping results back onto the original image. Concretely, image_meta is a 1-D vector whose segments have lengths (1, 3, 3, 4, 1, 81), 93 values in total.
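For reference, compose_image_meta (from model.py) essentially just concatenates those pieces into one vector:

    import numpy as np

    def compose_image_meta(image_id, original_image_shape, image_shape,
                           window, scale, active_class_ids):
        """Pack image attributes into one 1-D array."""
        meta = np.array(
            [image_id] +                  # 1 value
            list(original_image_shape) +  # 3 values
            list(image_shape) +           # 3 values
            list(window) +                # 4 values: (y1, x1, y2, x2)
            [scale] +                     # 1 value
            list(active_class_ids)        # 81 values in our case
        )
        return meta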

 

Well then, load_image_gt is finally finished:

    return image, image_meta, class_ids, bbox, mask

The returned variables are:

image: the image about to be fed into the network, shape (1024, 1024, 3);

image_meta: a record of the key operations that transformed the original input image into the (1024, 1024, 3) input;

class_ids: which class each of the 26 instances belongs to, shape (26,);

bbox: where each instance is located, shape (26, 4);

mask: since mini masks are used, shape (56, 56, 26).

 

More to follow later; my goodness, there is just so much of this.

 
