FCOS Code (Part 3): The Complete Demo Pipeline

FCOS Code (Part 1) (demo pipeline): Backbone Network Structure in Detail, mask-rcnn ResNet + FPN

FCOS Code (Part 2) (demo pipeline): RPN Network Structure

We start with fcos_demo.py. The key step there is the following:

coco_demo = COCODemo(
    cfg,
    confidence_thresholds_for_classes=thresholds_for_classes,
    min_image_size=args.min_image_size
)

It encapsulates the entire demo pipeline and therefore represents the inference process of FCOS, i.e. how the method is actually run at test time.

    demo_im_names = os.listdir(args.images_dir)  # directory holding the test images

    # prepare object that handles inference plus adds predictions on top of image
    coco_demo = COCODemo(
        cfg,
        confidence_thresholds_for_classes=thresholds_for_classes,
        min_image_size=args.min_image_size
    )  # all cfg from imported
    i = 0
    for im_name in demo_im_names:
        img = cv2.imread(os.path.join(args.images_dir, im_name))  # read the image
        if img is None:
            continue
        start_time = time.time()
        composite = coco_demo.run_on_opencv_image(img)  # returns the final annotated result
        print("{}\tinference time: {:.2f}s".format(im_name, time.time() - start_time))

Notice the call `composite = coco_demo.run_on_opencv_image(img)`, which takes us into predictor.py. Its definition is shown below:

1) run_on_opencv_image

    def run_on_opencv_image(self, image):
        """
        Arguments:
            image (np.ndarray): an image as returned by OpenCV

        Returns:
            prediction (BoxList): the detected objects. Additional information
                of the detection properties can be found in the fields of
                the BoxList via `prediction.fields()`
        """
        predictions = self.compute_prediction(image)  # run the model; boxes are already mapped back to the original image size
        top_predictions = self.select_top_predictions(predictions)  # keep only predictions whose classification score exceeds the per-class threshold

        result = image.copy()  # the original image
        if self.show_mask_heatmaps:  # False; mask heatmap visualization
            return self.create_mask_montage(result, top_predictions)
        result = self.overlay_boxes(result, top_predictions)  # draw the predicted boxes on the original image
        if self.cfg.MODEL.MASK_ON:  # False; draw masks
            result = self.overlay_mask(result, top_predictions)
        if self.cfg.MODEL.KEYPOINT_ON:  # False; keypoints
            result = self.overlay_keypoints(result, top_predictions)
        result = self.overlay_class_names(result, top_predictions)

        return result

The line `predictions = self.compute_prediction(image)` calls the function defined below; let's walk through it in detail.

(I) compute_prediction

    def compute_prediction(self, original_image):
        """
        Arguments:
            original_image (np.ndarray): an image as returned by OpenCV

        Returns:
            prediction (BoxList): the detected objects. Additional information
                of the detection properties can be found in the fields of
                the BoxList via `prediction.fields()`
        """
        # apply pre-processing to image
        image = self.transforms(original_image)  # resize the shorter side of the input image to 800 and normalize the pixel values
        # convert to an ImageList, padded so that it is divisible by
        # cfg.DATALOADER.SIZE_DIVISIBILITY
        image_list = to_image_list(image, self.cfg.DATALOADER.SIZE_DIVISIBILITY)  # SIZE_DIVISIBILITY --> 32; returns an ImageList holding image_sizes and tensors
        image_list = image_list.to(self.device)  # to cuda
        # compute predictions
        with torch.no_grad():
            predictions = self.model(image_list)  # e.g. BoxList of length 100
        predictions = [o.to(self.cpu_device) for o in predictions]  # to cpu

        # always single image is passed at a time
        prediction = predictions[0]

        # reshape prediction (a BoxList) into the original image size
        height, width = original_image.shape[:-1]  # height and width of the original image
        prediction = prediction.resize((width, height))  # map the predicted box coordinates back onto the original image

        if prediction.has_field("mask"):  # False
            # if we have masks, paste the masks in the right position
            # in the image, as defined by the bounding boxes
            masks = prediction.get_field("mask")
            # always single image is passed at a time
            masks = self.masker([masks], [prediction])[0]
            prediction.add_field("mask", masks)
        return prediction

i.  image = self.transforms(original_image)

Its definition is shown below:

    def build_transform(self):
        """
        Creates a basic transformation that was used to train the models
        """
        cfg = self.cfg

        # we are loading images with OpenCV, so we don't need to convert them
        # to BGR, they are already! So all we need to do is to normalize
        # by 255 if we want to convert to BGR255 format, or flip the channels
        # if we want it to be in RGB in [0-1] range.
        if cfg.INPUT.TO_BGR255:  # True
            to_bgr_transform = T.Lambda(lambda x: x * 255)  # multiply every element of x by 255; used in the Compose below
        else:
            to_bgr_transform = T.Lambda(lambda x: x[[2, 1, 0]])  

        normalize_transform = T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD
        )  # channel-wise (x - mean) / std; cfg.INPUT.PIXEL_MEAN = [102.9801, 115.9465, 122.7717], cfg.INPUT.PIXEL_STD = [1.0, 1.0, 1.0]

        transform = T.Compose(
            [
                T.ToPILImage(),  # convert the image to PIL format, needed by the next step
                T.Resize(self.min_image_size),  # resize the shorter side to 800
                T.ToTensor(),  # convert to a tensor and scale pixel values from [0, 255] to [0, 1]
                to_bgr_transform,  # multiply the pixels back by 255 (BGR255 format)
                normalize_transform,  # apply the mean/std normalization
            ]
        )
        return transform

------ to_bgr_transform = T.Lambda(lambda x: x * 255)

Here `T` is the imported torchvision transforms module:

from torchvision import transforms as T

`T.Lambda` (see the reference on using transforms.Lambda in PyTorch) lets you define your own operation and wrap it so it can be composed with the other transforms (the wrapping happens in the `transform = T.Compose` step below). Here it simply multiplies every element of `x` by 255.
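
A minimal sketch of what this wrapper does (the tensor values below are made up for illustration):

    import torch
    from torchvision import transforms as T

    to_bgr_transform = T.Lambda(lambda x: x * 255)   # scale [0, 1] values back to [0, 255]

    x = torch.tensor([[0.0, 0.5, 1.0]])              # a fake pixel row in [0, 1]
    print(to_bgr_transform(x))                       # tensor([[  0.0000, 127.5000, 255.0000]])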

------T.Normalize(mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD)

`T.Normalize` (see the reference on transforms.Normalize and computing a dataset's mean and standard deviation in PyTorch) normalizes the image channel-wise with the formula (x - mean) / std. The mean and std are precomputed (according to the reference, apparently estimated from samples of the dataset); applying the formula shifts each channel toward zero mean and unit standard deviation, which speeds up convergence. Note that with PIXEL_STD = [1.0, 1.0, 1.0] this amounts to mean subtraction only.

For a better intuition of the difference between normalization and standardization, see the reference "What is normalization and how does it differ from standardization?".
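
A quick numeric check of the formula with this config (the pixel values below are made up):

    import torch
    from torchvision import transforms as T

    pixel_mean = [102.9801, 115.9465, 122.7717]   # cfg.INPUT.PIXEL_MEAN
    pixel_std = [1.0, 1.0, 1.0]                   # cfg.INPUT.PIXEL_STD
    normalize = T.Normalize(mean=pixel_mean, std=pixel_std)

    # a fake 3x1x1 "image" already in the BGR255 range
    img = torch.tensor([110.0, 120.0, 130.0]).view(3, 1, 1)
    print(normalize(img).view(-1))   # tensor([7.0199, 4.0535, 7.2283]) -- just mean subtraction here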

------ T.ToPILImage() and T.ToTensor()

See the reference on torchvision.transforms.ToTensor() and torchvision.transforms.ToPILImage().

According to that reference, the behavior of ToPILImage() depends on the input format, but the output is always a PIL image. If the input is a numpy array (e.g. read with cv2.imread), the layout is already (h, w, c), so no axis reordering is needed; if the pixel values are in [0, 255], only the channel order is changed (BGR -> RGB; the original author notes this point is not entirely certain, see the reference on torchvision.transforms.ToPILImage and image resolution). If the input is a tensor, its layout is (c, h, w) with floating-point values; if ToTensor was applied beforehand (dividing by 255 into [0, 1]), ToPILImage converts it back to (h, w, c) and multiplies the pixel values by 255.

(See also the reference on torchvision.transforms.ToTensor and ToPILImage in PyTorch.)
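
A small sanity check of these two transforms (shapes and values are illustrative):

    import numpy as np
    from torchvision import transforms as T

    img = np.full((2, 2, 3), 255, dtype=np.uint8)   # fake (H, W, C) image, as cv2.imread would return
    t = T.ToTensor()(img)
    print(t.shape, t.max())    # torch.Size([3, 2, 2]) tensor(1.) -- (C, H, W), scaled to [0, 1]

    back = T.ToPILImage()(t)   # back to an (H, W, C) PIL image with values in [0, 255]
    print(back.size)           # (2, 2)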

------ T.Resize

Resizes a PIL image; it cannot operate directly on images read with io.imread or cv2.imread (see the reference on the basic usage of transforms.Resize()), which is why ToPILImage() is applied first. Given a single integer, Resize scales the shorter side to that value while keeping the aspect ratio.
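
Putting the whole build_transform pipeline together, a sketch run on a synthetic image (sizes chosen to match the 640x457 example used later):

    import numpy as np
    from torchvision import transforms as T

    min_image_size = 800
    pixel_mean = [102.9801, 115.9465, 122.7717]
    pixel_std = [1.0, 1.0, 1.0]

    transform = T.Compose([
        T.ToPILImage(),                    # numpy (H, W, C) -> PIL
        T.Resize(min_image_size),          # shorter side -> 800
        T.ToTensor(),                      # PIL -> (C, H, W) float in [0, 1]
        T.Lambda(lambda x: x * 255),       # back to the [0, 255] (BGR255) range
        T.Normalize(mean=pixel_mean, std=pixel_std),
    ])

    img = np.random.randint(0, 256, (457, 640, 3), dtype=np.uint8)  # fake 457x640 BGR image
    out = transform(img)
    print(out.shape)   # torch.Size([3, 800, 1120]) -- shorter side resized to 800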

ii. image_list = to_image_list(image, self.cfg.DATALOADER.SIZE_DIVISIBILITY)

Its code is shown below. The line `max_size = tuple(max(s) for s in zip(*[img.shape for img in tensors]))` has no effect when only one image is passed; during training, when a whole batch is passed in, it unifies the image shapes: the channel count is always 3 and the shorter side has already been resized to 800, so essentially only the longer side is unified to the largest one in the batch. `max_size` is then processed so that every image dimension becomes a multiple of the stride (math.ceil rounds up; see the Math.ceil() reference). A small numeric example of this padding follows the code.

def to_image_list(tensors, size_divisible=0):  # 32
    """
    tensors can be an ImageList, a torch.Tensor or
    an iterable of Tensors. It can't be a numpy array.
    When tensors is an iterable of Tensors, it pads
    the Tensors with zeros so that they have the same
    shape
    """
    if isinstance(tensors, torch.Tensor) and size_divisible > 0:  # True
        tensors = [tensors]  # wrap the single tensor in a list

    if isinstance(tensors, ImageList):
        return tensors
    elif isinstance(tensors, torch.Tensor):
        # single tensor shape can be inferred
        if tensors.dim() == 3:
            tensors = tensors[None]
        assert tensors.dim() == 4
        image_sizes = [tensor.shape[-2:] for tensor in tensors]
        return ImageList(tensors, image_sizes)
    elif isinstance(tensors, (tuple, list)):  # True, because the previous step put the tensor into a list
        max_size = tuple(max(s) for s in zip(*[img.shape for img in tensors]))  # per-dimension maximum over the batch: the shorter side is already 800, so this mainly unifies the longer side; in the demo only one image is passed, so nothing changes

        # TODO Ideally, just remove this and let the model handle arbitrary
        # input sizes
        if size_divisible > 0:
            import math

            stride = size_divisible  # 32
            max_size = list(max_size)  # e.g. [3, 800, 1120]
            max_size[1] = int(math.ceil(max_size[1] / stride) * stride)  # divide by the stride, round up, multiply back: makes the size a multiple of the stride (same below)
            max_size[2] = int(math.ceil(max_size[2] / stride) * stride)
            max_size = tuple(max_size)  # back to a tuple, e.g. (3, 800, 1120)

        batch_shape = (len(tensors),) + max_size  # prepend the batch size, e.g. (1, 3, 800, 1120)
        batched_imgs = tensors[0].new(*batch_shape).zero_()  # an all-zero tensor of the batch shape, e.g. Tensor (1, 3, 800, 1120)
        for img, pad_img in zip(tensors, batched_imgs):
            pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)

        image_sizes = [im.shape[-2:] for im in tensors]  # e.g. [torch.Size([800, 1120])]

        return ImageList(batched_imgs, image_sizes)  # return an ImageList instance
    else:
        raise TypeError("Unsupported type for to_image_list: {}".format(type(tensors)))
iii. with torch.no_grad(): predictions = self.model(image_list)

Feeding the image list into the network produces the predicted outputs. For how the model itself works, see FCOS Code (Part 1) (demo pipeline): backbone network structure in detail (mask-rcnn ResNet + FPN) and FCOS Code (Part 2) (demo pipeline): RPN network structure.

iv. height, width = original_image.shape[:-1]

This retrieves the height and width of the original image. The earlier steps transformed the input (the shorter side was resized to 800 and the other side scaled accordingly), so the original size is needed to map the predictions back.

v.  prediction = prediction.resize((width, height))

`prediction` is the BoxList instance returned earlier; its resize method, shown below, maps the predicted box coordinates back onto the original image. A small numeric example of the scaling follows the code.

    def resize(self, size, *args, **kwargs):
        """
        Returns a resized copy of this bounding box

        :param size: The requested size in pixels, as a 2-tuple:
            (width, height).
        """

        ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(size, self.size))  # requested (original) size / current (transformed) size, e.g. ratios = (0.5714, 0.5713) with size = (640, 457) and self.size = (1120, 800)
        if ratios[0] == ratios[1]:  # if both ratios are equal
            ratio = ratios[0]  # either one will do
            scaled_box = self.bbox * ratio  # scale the predicted coordinates by the ratio
            bbox = BoxList(scaled_box, size, mode=self.mode)  # BoxList mapped back to the original image size
            # bbox._copy_extra_fields(self)
            for k, v in self.extra_fields.items():  # labels and classification scores
                if not isinstance(v, torch.Tensor):
                    v = v.resize(size, *args, **kwargs)
                bbox.add_field(k, v)  # add to the new BoxList's fields
            return bbox

        ratio_width, ratio_height = ratios  # width and height ratios
        xmin, ymin, xmax, ymax = self._split_into_xyxy()  # coordinates
        scaled_xmin = xmin * ratio_width  # scale by the ratio (same below)
        scaled_xmax = xmax * ratio_width
        scaled_ymin = ymin * ratio_height
        scaled_ymax = ymax * ratio_height
        scaled_box = torch.cat(
            (scaled_xmin, scaled_ymin, scaled_xmax, scaled_ymax), dim=-1
        )  # concatenate back together
        bbox = BoxList(scaled_box, size, mode="xyxy")  # new BoxList instance
        # bbox._copy_extra_fields(self)
        for k, v in self.extra_fields.items():
            if not isinstance(v, torch.Tensor):
                v = v.resize(size, *args, **kwargs)
            bbox.add_field(k, v)  # add labels and scores to the fields

        return bbox.convert(self.mode)  # convert back to the original mode and return
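
A quick numeric check of this mapping, using the example sizes from the comments above (the box coordinates are made up):

    # map a predicted box from the 1120x800 transformed image back to the 640x457 original
    transformed_size = (1120, 800)   # (width, height) after the transform, i.e. self.size
    original_size = (640, 457)       # (width, height) of the original image, i.e. size

    ratio_w = original_size[0] / transformed_size[0]   # ~0.5714
    ratio_h = original_size[1] / transformed_size[1]   # ~0.5713

    box = [224.0, 160.0, 560.0, 400.0]                 # fake (xmin, ymin, xmax, ymax) prediction
    scaled = [box[0] * ratio_w, box[1] * ratio_h,
              box[2] * ratio_w, box[3] * ratio_h]
    print(scaled)   # approximately [128.0, 91.4, 320.0, 228.5]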

That concludes compute_prediction.

(II) top_predictions = self.select_top_predictions(predictions)

The code is shown below:

    def select_top_predictions(self, predictions):
        """
        Select only predictions which have a `score` > self.confidence_threshold,
        and returns the predictions in descending order of score

        Arguments:
            predictions (BoxList): the result of the computation by the model.
                It should contain the field `scores`.

        Returns:
            prediction (BoxList): the detected objects. Additional information
                of the detection properties can be found in the fields of
                the BoxList via `prediction.fields()`
        """
        scores = predictions.get_field("scores")  # 拿出分类得分
        labels = predictions.get_field("labels")  # 拿出对应的标签
        thresholds = self.confidence_thresholds_for_classes[(labels - 1).long()]  # 拿出各个标签的阈值,举例 Tensor:(100, )
        keep = torch.nonzero(scores > thresholds).squeeze(1)  # 得到满足大于标签类别的顺序索引,举例 tensor:(4,) =([2,20,38,99])
        predictions = predictions[keep]  # 最终的预测结果 举例 BoxList:4
        scores = predictions.get_field("scores")  # 分类得分,这个是更新了的predictions
        _, idx = scores.sort(0, descending=True)  # 返回由大到小的得分排列顺序,以及之前的对应索引顺序
        return predictions[idx]  # 返回最终的预测结果,BoxList类实例,举例 BoxList:4

i. thresholds = self.confidence_thresholds_for_classes[(labels - 1).long()]

`.long()` casts the tensor values to an integer type so they can be used as indices (see the reference on torch.uint8 vs torch.long / torch.float). This step gathers, for each prediction in turn, the confidence threshold of its class; `self.confidence_thresholds_for_classes` is defined in fcos_demo.py.
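
A minimal sketch of this gather-then-filter-then-sort logic (the thresholds, labels and scores are made up):

    import torch

    confidence_thresholds_for_classes = torch.tensor([0.50, 0.45, 0.60])   # fake per-class thresholds
    labels = torch.tensor([1, 3, 2, 1])                                    # predicted labels (1-based)
    scores = torch.tensor([0.55, 0.40, 0.70, 0.48])                        # predicted scores

    thresholds = confidence_thresholds_for_classes[(labels - 1).long()]    # [0.50, 0.60, 0.45, 0.50]
    keep = torch.nonzero(scores > thresholds).squeeze(1)                   # tensor([0, 2])
    kept_scores = scores[keep]
    _, idx = kept_scores.sort(0, descending=True)                          # highest score first
    print(keep[idx])   # tensor([2, 0])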

(III) result = self.overlay_boxes(result, top_predictions)

Its code is shown below; it draws the predicted bounding boxes on the original image.

    def overlay_boxes(self, image, predictions):  # receives the original image and the predictions
        """
        Adds the predicted boxes on top of the image

        Arguments:
            image (np.ndarray): an image as returned by OpenCV
            predictions (BoxList): the result of the computation by the model.
                It should contain the field `labels`.
        """
        labels = predictions.get_field("labels")  # 标签
        boxes = predictions.bbox  # 边界框

        colors = self.compute_colors_for_labels(labels).tolist()  # 返回不同的color数值转列表,每一个为3通道的rgb数值,举例 {list:4}=[[1,127,31],[33,111,3],[35,110,65],[40,235,220]]

        for box, color in zip(boxes, colors):
            box = box.to(torch.int64)  # cast the predicted box to integers
            top_left, bottom_right = box[:2].tolist(), box[2:].tolist()  # top-left and bottom-right corners
            image = cv2.rectangle(
                image, tuple(top_left), tuple(bottom_right), tuple(color), 2
            )  # draw the rectangle on the original image

        return image

i.  colors = self.compute_colors_for_labels(labels).tolist()

Its definition is shown below. For the conversion to numpy uint8, see the reference on what happens after np.astype(uint8): the fractional part is dropped and the integer part kept. A small example of the color computation follows the code.

    def compute_colors_for_labels(self, labels):
        """
        Simple function that adds fixed colors depending on the class
        """
        colors = labels[:, None] * self.palette  # e.g. a Tensor of shape (4, 3): each class label multiplied by self.palette, a Tensor of shape (3,) = [33554431, 32767, 2097151]
        colors = (colors % 255).numpy().astype("uint8")  # take modulo 255, convert to numpy uint8
        return colors  # per-label color values in the range 0~255
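
A quick illustration of how this yields a fixed color per class (the labels are made up; label 1 reproduces the first color in the example above):

    import torch

    palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])   # [33554431, 32767, 2097151]
    labels = torch.tensor([1, 3])                                      # fake class labels

    colors = (labels[:, None] * palette) % 255
    print(colors.numpy().astype("uint8"))   # [[  1 127  31]
                                            #  [  3 126  93]]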

(IV) result = self.overlay_class_names(result, top_predictions)

The code is shown below; it annotates each box with its score and class name.

    def overlay_class_names(self, image, predictions):
        """
        Adds detected class names and scores in the positions defined by the
        top-left corner of the predicted bounding box

        Arguments:
            image (np.ndarray): an image as returned by OpenCV
            predictions (BoxList): the result of the computation by the model.
                It should contain the field `scores` and `labels`.
        """
        scores = predictions.get_field("scores").tolist()  # 分类得分列表
        labels = predictions.get_field("labels").tolist()  # 标签列表
        labels = [self.CATEGORIES[i] for i in labels]  # 标签所代表的种类名称列表
        boxes = predictions.bbox  # 预测的bbox

        template = "{}: {:.2f}"
        for box, score, label in zip(boxes, scores, labels):
            x, y = box[:2]  # top-left corner
            s = template.format(label, score)  # class name and score
            cv2.putText(
                image, s, (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, .5, (255, 255, 255), 1
            )  # draw the text on the image

        return image

That concludes run_on_opencv_image: it returns the final result, i.e. the original image with the predicted boxes drawn and each box annotated with its class and score.

And with that, the whole demo pipeline is finished!
