FCOS Code (3): The Complete Demo Pipeline

Author: 匿名的魔术师

Last modified: 2022-05-30 10:26:07

Tags: deep learning, pytorch, artificial intelligence

First published: 2022-05-25 20:55:26

Article link: https://blog.csdn.net/allrubots/article/details/124944956

FCOS Code (1) (demo process): Backbone network structure in detail, mask-rcnn ResNet+fpn

FCOS Code (2) (demo process): RPN network structure

We start with fcos_demo.py. The key step there is the following:

coco_demo = COCODemo(
    cfg,
    confidence_thresholds_for_classes=thresholds_for_classes,
    min_image_size=args.min_image_size
)

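This object wraps the entire demo process. It also represents the inference procedure of FCOS, that is, how the method is actually carried out. The surrounding code in fcos_demo.py looks like this: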

    demo_im_names = os.listdir(args.images_dir)  # directory that holds the test images

    # prepare object that handles inference plus adds predictions on top of image
    coco_demo = COCODemo(
        cfg,
        confidence_thresholds_for_classes=thresholds_for_classes,
        min_image_size=args.min_image_size
    )  # cfg holds the full imported configuration
    i = 0
    for im_name in demo_im_names:
        img = cv2.imread(os.path.join(args.images_dir, im_name))  # read the image
        if img is None:
            continue
        start_time = time.time()
        composite = coco_demo.run_on_opencv_image(img)  # returns the final annotated image
        print("{}\tinference time: {:.2f}s".format(im_name, time.time() - start_time))

As shown, this calls composite = coco_demo.run_on_opencv_image(img), which takes us into predictor.py, where the method is defined as follows:

1) run_on_opencv_image

    def run_on_opencv_image(self, image):
        """
        Arguments:
            image (np.ndarray): an image as returned by OpenCV

        Returns:
            result (np.ndarray): a copy of the input image with the predicted
                boxes, class names and scores drawn on it
        """
        predictions = self.compute_prediction(image)  # predictions, already mapped back to the original image size
        top_predictions = self.select_top_predictions(predictions)  # keep only predictions whose class score exceeds the threshold

        result = image.copy()  # copy of the original image
        if self.show_mask_heatmaps:  # False in this demo; mask heatmap mode
            return self.create_mask_montage(result, top_predictions)
        result = self.overlay_boxes(result, top_predictions)  # draw the predicted boxes on the original image
        if self.cfg.MODEL.MASK_ON:  # False in this demo; draw masks
            result = self.overlay_mask(result, top_predictions)
        if self.cfg.MODEL.KEYPOINT_ON:  # False in this demo; draw keypoints
            result = self.overlay_keypoints(result, top_predictions)
        result = self.overlay_class_names(result, top_predictions)  # write class name and score on each box

        return result

The line predictions = self.compute_prediction(image) calls the function defined below. The rest of this part goes through it in detail.

(I) compute_prediction

    def compute_prediction(self, original_image):
        """
        Arguments:
            original_image (np.ndarray): an image as returned by OpenCV

        Returns:
            prediction (BoxList): the detected objects. Additional information
                of the detection properties can be found in the fields of
                the BoxList via `prediction.fields()`
        """
        # apply pre-processing to image
        image = self.transforms(original_image)  # resize the short side to min_image_size (800 here), then subtract the per-channel pixel mean
        # convert to an ImageList, padded so that it is divisible by
        # cfg.DATALOADER.SIZE_DIVISIBILITY
        image_list = to_image_list(image, self.cfg.DATALOADER.SIZE_DIVISIBILITY)  # SIZE_DIVISIBILITY is 32; returns an ImageList holding image_sizes and tensors
        image_list = image_list.to(self.device)  # move to the inference device (CUDA here)
        # compute predictions
        with torch.no_grad():
            predictions = self.model(image_list)  # one BoxList per image, e.g. a BoxList with 100 boxes
        predictions = [o.to(self.cpu_device) for o in predictions]  # move back to CPU

        # always single image is passed at a time
        prediction = predictions[0]

        # reshape prediction (a BoxList) into the original image size
        height, width = original_image.shape[:-1]  # height and width of the original image (shape is H, W, C)
        prediction = prediction.resize((width, height))  # rescale the predicted box coordinates back onto the original image

        if prediction.has_field("mask"):  # False in this demo
            # if we have masks, paste the masks in the right position
            # in the image, as defined by the bounding boxes
            masks = prediction.get_field("mask")
            # always single image is passed at a time
            masks = self.masker([masks], [prediction])[0]
            prediction.add_field("mask", masks)
        return prediction

i. image = self.transforms(original_image)

self.transforms is built by build_transform, which is defined as follows:

    def build_transform(self):
        """
        Creates a basic transformation that was used to train the models
        """
        cfg = self.cfg

        # we are loading images with OpenCV, so we don't need to convert them
        # to BGR, they are already! So all we need to do is to normalize
        # by 255 if we want to convert to BGR255 format, or flip the channels
        # if we want it to be in RGB in [0-1] range.
        if cfg.INPUT.TO_BGR255:  # True in this config
            to_bgr_transform = T.Lambda(lambda x: x * 255)  # multiply every element by 255; used in the Compose below
        else:
            to_bgr_transform = T.Lambda(lambda x: x[[2, 1, 0]])  # otherwise reverse the channel order

        normalize_transform = T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD
        )  # per-channel (x - mean) / std; here PIXEL_MEAN = [102.9801, 115.9465, 122.7717]
        #    and PIXEL_STD = [1.0, 1.0, 1.0], so this only subtracts the mean

        transform = T.Compose(
            [
                T.ToPILImage(),  # convert to a PIL image so that Resize can be applied
                T.Resize(self.min_image_size),  # resize the short side to min_image_size (800 here)
                T.ToTensor(),  # convert to a (C, H, W) tensor and divide pixel values by 255, giving [0, 1]
                to_bgr_transform,  # multiply by 255 again, back to the [0, 255] range
                normalize_transform,  # subtract the per-channel mean
            ]
        )
        return transform

------ to_bgr_transform = T.Lambda(lambda x: x * 255)

T is the imported transforms module:

from torchvision import transforms as T

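T.Lambda (see "pytorch transforms.Lambda的使用") wraps a user-defined function as a transform so that it can be chained with the others (the chaining happens in transform = T.Compose below). Here the function multiplies every element of x by 255.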

------T.Normalize(mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD)

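T.Normalize (see "PyTorch数据归一化处理：transforms.Normalize及计算图像数据集的均值和方差") standardizes a tensor channel by channel using (x - mean) / std. The mean and standard deviation are computed in advance (according to the reference, apparently by sampling the dataset). With a dataset's real mean and standard deviation, this shifts each channel toward zero mean and unit standard deviation, which can speed up convergence; it does not turn the data into a normal distribution. In this config PIXEL_STD is [1.0, 1.0, 1.0], so the step only subtracts the per-channel mean from pixel values in the [0, 255] range.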
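
To make the arithmetic concrete, here is a minimal pure-Python sketch of what happens to a single BGR pixel as it passes through ToTensor, the x * 255 Lambda and Normalize. The helper name preprocess_pixel and the sample pixel are made up for illustration; only the mean and std values come from the config quoted above.

```python
# Per-channel statistics quoted in the text (BGR order, [0, 255] scale).
PIXEL_MEAN = [102.9801, 115.9465, 122.7717]
PIXEL_STD = [1.0, 1.0, 1.0]


def preprocess_pixel(bgr_pixel):
    """Mimic ToTensor -> x * 255 -> Normalize for one pixel (hypothetical helper)."""
    out = []
    for value, mean, std in zip(bgr_pixel, PIXEL_MEAN, PIXEL_STD):
        scaled = value / 255.0               # ToTensor: uint8 -> float in [0, 1]
        restored = scaled * 255.0            # Lambda: back to the [0, 255] range
        out.append((restored - mean) / std)  # Normalize: (x - mean) / std
    return out


pixel = [103, 116, 123]  # illustrative pixel close to the channel means
normalized = preprocess_pixel(pixel)
print([round(v, 4) for v in normalized])
```

Because the std is 1.0, the result is simply the pixel value minus the channel mean, so values stay on the [0, 255] scale rather than being squeezed to unit variance.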

For the difference between standardization and normalization, see "什么是归一化，它与标准化的区别是什么？", which makes this easier to understand.

------T.ToPILImage() and T.ToTensor()

See "torchvision.transforms.ToTensor()与torchvision.transforms.ToPILImage()详解".

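According to the reference, ToPILImage() behaves differently depending on its input, but the output is always a PIL image. When the input is a NumPy array (for example one read by cv2.imread), the layout is already (h, w, c), so no axes need to be reordered, and uint8 values in [0, 255] are used as they are. ToPILImage does not swap the color channels, so an image read by OpenCV keeps its BGR order even though PIL labels it as RGB (see also "【PyTorch】torchvision.transforms.ToPILImage 与图像分辨率"). When the input is a Tensor, the layout is (c, h, w) and the values are floating point; if ToTensor was applied beforehand and divided the pixel values by 255 into [0, 1], ToPILImage changes the layout to (h, w, c) and multiplies the values by 255.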
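
The layout change can be illustrated without torchvision. The following is a pure-Python imitation of what ToTensor does to the data, written with nested lists; to_tensor_like is a hypothetical helper and not part of any library.

```python
def to_tensor_like(image_hwc):
    """Imitate ToTensor on nested lists: (H, W, C) uint8 -> (C, H, W) floats in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 image (one row, two pixels) with three channels per pixel.
image = [[[0, 51, 255], [255, 102, 0]]]
chw = to_tensor_like(image)
print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
print(chw[1])  # the second channel as a 1 x 2 plane
```

The channel order itself is untouched: channel 0 of the output is channel 0 of the input, which is why a BGR image stays BGR through this step.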

(See also "Pytorch之浅入torchvision.transforms.ToTensor与ToPILImage".)

------ T.Resize

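T.Resize resizes a PIL image (recent torchvision versions also accept tensors); it cannot be applied directly to the NumPy array returned by io.imread or cv2.imread (see "Pytorch transforms.Resize()的简单用法"). That is why ToPILImage() is applied first. Given a single integer, as here, it resizes the shorter side to that value and keeps the aspect ratio.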
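
As a rough sketch of the size computation, the helper below returns the output size when the shorter side is resized to a target value. The name resized_hw is made up, and truncating the long side with int() is an assumption about how the library rounds; treat the exact pixel count as approximate.

```python
def resized_hw(height, width, size):
    """Output (height, width) when the shorter side is resized to `size`.

    The longer side is scaled by the same factor and truncated to an integer
    (assumed rounding rule, see the note above).
    """
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size


# A 640 x 457 (width x height) image, the example size used later in this post.
print(resized_hw(457, 640, 800))
```

For the 640 x 457 example image this gives a height of 800 and a width of 1120, which matches the (1120, 800) size that appears in the resize step further down.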

ii. image_list = to_image_list(image, self.cfg.DATALOADER.SIZE_DIVISIBILITY)

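The code is shown below. The line max_size = tuple(max(s) for s in zip(*[img.shape for img in tensors])) changes nothing when a single image is passed in. When a batch is passed in (the training stage), it takes the element-wise maximum over the image shapes so that all images can share one shape: the channel count is always 3, and since the short side of each image has typically been resized to 800 already, in practice only the long side is raised to the largest one in the batch. max_size is then rounded up so that the height and width are integer multiples of the stride (math.ceil rounds up; see "Math.ceil()").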

def to_image_list(tensors, size_divisible=0):  # called with size_divisible=32 in the demo
    """
    tensors can be an ImageList, a torch.Tensor or
    an iterable of Tensors. It can't be a numpy array.
    When tensors is an iterable of Tensors, it pads
    the Tensors with zeros so that they have the same
    shape
    """
    if isinstance(tensors, torch.Tensor) and size_divisible > 0:  # True in the demo
        tensors = [tensors]  # wrap the single tensor in a list

    if isinstance(tensors, ImageList):
        return tensors
    elif isinstance(tensors, torch.Tensor):
        # single tensor shape can be inferred
        if tensors.dim() == 3:
            tensors = tensors[None]
        assert tensors.dim() == 4
        image_sizes = [tensor.shape[-2:] for tensor in tensors]
        return ImageList(tensors, image_sizes)
    elif isinstance(tensors, (tuple, list)):  # True here, because the tensor was wrapped in a list above
        # element-wise maximum over all image shapes in the batch; with a
        # single demo image this is just that image's shape
        max_size = tuple(max(s) for s in zip(*[img.shape for img in tensors]))

        # TODO Ideally, just remove this and let the model handle arbitrary
        # input sizes
        if size_divisible > 0:
            import math

            stride = size_divisible  # 32
            max_size = list(max_size)  # e.g. [3, 800, 1120]
            # divide by the stride, round up, multiply back: height and width
            # become integer multiples of the stride
            max_size[1] = int(math.ceil(max_size[1] / stride) * stride)
            max_size[2] = int(math.ceil(max_size[2] / stride) * stride)
            max_size = tuple(max_size)  # back to a tuple, e.g. (3, 800, 1120)

        batch_shape = (len(tensors),) + max_size  # prepend the batch size, e.g. (1, 3, 800, 1120)
        batched_imgs = tensors[0].new(*batch_shape).zero_()  # all-zero tensor of the batch shape, e.g. (1, 3, 800, 1120)
        for img, pad_img in zip(tensors, batched_imgs):
            pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)  # copy each image into the top-left corner

        image_sizes = [im.shape[-2:] for im in tensors]  # unpadded sizes, e.g. [torch.Size([800, 1120])]

        return ImageList(batched_imgs, image_sizes)  # return an ImageList
    else:
        raise TypeError("Unsupported type for to_image_list: {}".format(type(tensors)))
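
The shape arithmetic in to_image_list can be tried in isolation. The sketch below reproduces only the max-and-round-up logic on plain tuples, with no tensors involved; padded_batch_shape is a hypothetical helper and the second set of shapes is invented for illustration.

```python
import math


def padded_batch_shape(shapes, stride=32):
    """Shape of the zero-padded batch for a list of (C, H, W) image shapes."""
    # Element-wise maximum over all shapes, like zip(*[img.shape ...]) above.
    max_size = [max(dims) for dims in zip(*shapes)]
    if stride > 0:
        # Round height and width up to the next multiple of the stride.
        max_size[1] = int(math.ceil(max_size[1] / stride) * stride)
        max_size[2] = int(math.ceil(max_size[2] / stride) * stride)
    return (len(shapes),) + tuple(max_size)


single = padded_batch_shape([(3, 800, 1120)])
batch = padded_batch_shape([(3, 800, 1066), (3, 800, 1201)])
print(single)
print(batch)
```

In the single-image case both 800 and 1120 are already multiples of 32, so nothing is padded. In the two-image case the widest image (1201) sets the width, which is then rounded up to 1216.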

iii. with torch.no_grad():
            predictions = self.model(image_list)

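The image is fed into the network and the predictions are obtained. For the details, see FCOS Code (1) (demo process): Backbone network structure in detail, mask-rcnn ResNet+fpn, and FCOS Code (2) (demo process): RPN network structure.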

iv. height, width = original_image.shape[:-1]

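This gets the height and width of the original image. All the preceding steps worked on the transformed input, whose short side was resized to 800 with the long side scaled accordingly.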

v. prediction = prediction.resize((width, height))

prediction is the BoxList instance returned earlier. Its resize method, shown below, maps the predicted bbox coordinates back onto the original image.

    def resize(self, size, *args, **kwargs):
        """
        Returns a resized copy of this bounding box

        :param size: The requested size in pixels, as a 2-tuple:
            (width, height).
        """

        # size of the original image / size of the transformed image, per axis,
        # e.g. size = (640, 457), self.size = (1120, 800), ratios about (0.5714, 0.5713)
        ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(size, self.size))
        if ratios[0] == ratios[1]:  # both ratios are equal
            ratio = ratios[0]  # either one will do
            scaled_box = self.bbox * ratio  # scale the predicted coordinates
            bbox = BoxList(scaled_box, size, mode=self.mode)  # BoxList mapped back to the original image
            # bbox._copy_extra_fields(self)
            for k, v in self.extra_fields.items():  # labels and class scores
                if not isinstance(v, torch.Tensor):
                    v = v.resize(size, *args, **kwargs)
                bbox.add_field(k, v)  # add to the new instance's field dict
            return bbox

        ratio_width, ratio_height = ratios  # width ratio and height ratio
        xmin, ymin, xmax, ymax = self._split_into_xyxy()  # box coordinates
        scaled_xmin = xmin * ratio_width  # x coordinates use the width ratio
        scaled_xmax = xmax * ratio_width
        scaled_ymin = ymin * ratio_height  # y coordinates use the height ratio
        scaled_ymax = ymax * ratio_height
        scaled_box = torch.cat(
            (scaled_xmin, scaled_ymin, scaled_xmax, scaled_ymax), dim=-1
        )  # concatenate back into one (N, 4) tensor
        bbox = BoxList(scaled_box, size, mode="xyxy")  # new instance in xyxy mode
        # bbox._copy_extra_fields(self)
        for k, v in self.extra_fields.items():
            if not isinstance(v, torch.Tensor):
                v = v.resize(size, *args, **kwargs)
            bbox.add_field(k, v)  # copy labels and class scores

        return bbox.convert(self.mode)  # convert back to the original box mode and return
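
The coordinate mapping itself is just two multiplications per box. Here is a small pure-Python sketch using the example sizes from the comment above; scale_boxes is a hypothetical helper working on plain tuples, not the BoxList API, and the sample box is invented.

```python
def scale_boxes(boxes, from_size, to_size):
    """Rescale xyxy boxes from an image of from_size to one of to_size.

    Sizes are (width, height) tuples, matching the BoxList convention above.
    """
    ratio_w = to_size[0] / from_size[0]
    ratio_h = to_size[1] / from_size[1]
    return [
        (xmin * ratio_w, ymin * ratio_h, xmax * ratio_w, ymax * ratio_h)
        for xmin, ymin, xmax, ymax in boxes
    ]


# One box covering the bottom-right quarter of the 1120 x 800 network input.
boxes = [(560.0, 400.0, 1120.0, 800.0)]
scaled = scale_boxes(boxes, from_size=(1120, 800), to_size=(640, 457))
print([round(v, 2) for v in scaled[0]])
```

The box still covers the bottom-right quarter after scaling, now expressed in the 640 x 457 coordinates of the original image.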

This completes compute_prediction.

(II) top_predictions = self.select_top_predictions(predictions)

The code is as follows:

    def select_top_predictions(self, predictions):
        """
        Select only predictions which have a `score` > self.confidence_threshold,
        and returns the predictions in descending order of score

        Arguments:
            predictions (BoxList): the result of the computation by the model.
                It should contain the field `scores`.

        Returns:
            prediction (BoxList): the detected objects. Additional information
                of the detection properties can be found in the fields of
                the BoxList via `prediction.fields()`
        """
        scores = predictions.get_field("scores")  # class scores
        labels = predictions.get_field("labels")  # corresponding labels
        thresholds = self.confidence_thresholds_for_classes[(labels - 1).long()]  # per-box threshold looked up by label, e.g. Tensor of shape (100,)
        keep = torch.nonzero(scores > thresholds).squeeze(1)  # indices of boxes whose score exceeds their class threshold, e.g. tensor([2, 20, 38, 99])
        predictions = predictions[keep]  # surviving predictions, e.g. a BoxList with 4 boxes
        scores = predictions.get_field("scores")  # scores of the filtered predictions
        _, idx = scores.sort(0, descending=True)  # sort scores from high to low; idx holds the original positions
        return predictions[idx]  # final predictions as a BoxList sorted by score, e.g. 4 boxes

i. thresholds = self.confidence_thresholds_for_classes[(labels - 1).long()]

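Calling .long() casts the tensor to torch.int64, the integer type used for index tensors (see "pytorch torch.uint8与torch.long/ torch. float"). This step looks up, for each prediction in order, the threshold of its class label; labels start at 1, hence the labels - 1. self.confidence_thresholds_for_classes is defined in fcos_demo.py.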
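
The filter-then-sort logic of select_top_predictions can be sketched with plain lists. The function name select_top and all the numbers below are invented for illustration; the real code operates on tensors inside a BoxList.

```python
def select_top(scores, labels, class_thresholds):
    """Keep detections above their class threshold, sorted by score (descending).

    Labels are 1-based, so label k uses class_thresholds[k - 1].
    """
    kept = [
        (score, label)
        for score, label in zip(scores, labels)
        if score > class_thresholds[label - 1]
    ]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Made-up thresholds for a toy problem with three classes.
class_thresholds = [0.5, 0.6, 0.4]
scores = [0.55, 0.58, 0.90, 0.30]
labels = [1, 2, 3, 3]
top = select_top(scores, labels, class_thresholds)
print(top)
```

Note that the second detection is dropped even though its score (0.58) is higher than the first one's (0.55), because each class is compared against its own threshold.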

(III) result = self.overlay_boxes(result, top_predictions)

The code is as follows. It draws the predicted bounding boxes on the original image.

    def overlay_boxes(self, image, predictions):  # receives the original image and the predictions
        """
        Adds the predicted boxes on top of the image

        Arguments:
            image (np.ndarray): an image as returned by OpenCV
            predictions (BoxList): the result of the computation by the model.
                It should contain the field `labels`.
        """
        labels = predictions.get_field("labels")  # labels
        boxes = predictions.bbox  # bounding boxes

        colors = self.compute_colors_for_labels(labels).tolist()  # one 3-value color per label, as a list, e.g. [[1,127,31],[33,111,3],[35,110,65],[40,235,220]]

        for box, color in zip(boxes, colors):
            box = box.to(torch.int64)  # cast the predicted box to integers
            top_left, bottom_right = box[:2].tolist(), box[2:].tolist()  # top-left and bottom-right corners
            image = cv2.rectangle(
                image, tuple(top_left), tuple(bottom_right), tuple(color), 2
            )  # draw the rectangle on the image

        return image

i. colors = self.compute_colors_for_labels(labels).tolist()

The code is defined as follows. For the conversion to numpy.uint8, see "np.astype uint8之后发生了什么": the fractional part is dropped and the integer part is kept.

    def compute_colors_for_labels(self, labels):
        """
        Simple function that adds fixed colors depending on the class
        """
        colors = labels[:, None] * self.palette  # e.g. Tensor of shape (4, 3): each label times self.palette, a Tensor of shape (3,) = [33554431, 32767, 2097151]
        colors = (colors % 255).numpy().astype("uint8")  # take the remainder modulo 255, convert to a NumPy uint8 array
        return colors  # one color per label, each value in the range 0 to 254
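
The color formula is easy to check by hand. The sketch below applies the same multiply-and-modulo rule to plain integers, using the palette values quoted in the comment; colors_for_labels is a hypothetical stand-in for the tensor version, and the labels are chosen so that the result can be compared with the example color list shown in overlay_boxes.

```python
PALETTE = [2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1]  # [33554431, 32767, 2097151]


def colors_for_labels(labels):
    """One 3-value color per label: (label * palette) % 255."""
    return [[(label * p) % 255 for p in PALETTE] for label in labels]


colors = colors_for_labels([1, 33, 35, 40])
print(colors)
```

Because the color depends only on the label, every box of the same class is drawn in the same color across images.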

(IV) result = self.overlay_class_names(result, top_predictions)

The code is as follows. It labels each box with its class name and score.

    def overlay_class_names(self, image, predictions):
        """
        Adds detected class names and scores in the positions defined by the
        top-left corner of the predicted bounding box

        Arguments:
            image (np.ndarray): an image as returned by OpenCV
            predictions (BoxList): the result of the computation by the model.
                It should contain the field `scores` and `labels`.
        """
        scores = predictions.get_field("scores").tolist()  # list of class scores
        labels = predictions.get_field("labels").tolist()  # list of labels
        labels = [self.CATEGORIES[i] for i in labels]  # class names for those labels
        boxes = predictions.bbox  # predicted boxes

        template = "{}: {:.2f}"
        for box, score, label in zip(boxes, scores, labels):
            x, y = box[:2]  # top-left corner
            s = template.format(label, score)  # "class name: score"
            cv2.putText(
                image, s, (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, .5, (255, 255, 255), 1
            )  # write the text

        return image
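
For reference, this is what the caption text looks like. The snippet uses the same template string; the CATEGORIES list here is a shortened, illustrative stand-in for the full class list, and caption is a hypothetical helper.

```python
CATEGORIES = ["__background", "person", "bicycle", "car"]  # shortened, illustrative list
TEMPLATE = "{}: {:.2f}"


def caption(label, score):
    """Text written at the top-left corner of a box."""
    return TEMPLATE.format(CATEGORIES[label], score)


print(caption(1, 0.8731))  # person: 0.87
print(caption(3, 0.5))     # car: 0.50
```

Index 0 is reserved for the background entry in this sketch, which is why a label of 1 maps to the first real class.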

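This completes run_on_opencv_image. It returns the final result: a copy of the original image with the boxes drawn and the class names and scores written on it.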

That completes the whole demo process!
