[PaddleSeg源码阅读] 关于PaddleSeg模型返回的都是list这件小事

最新推荐文章于 2024-01-13 20:53:16 发布

氵文大师

最新推荐文章于 2024-01-13 20:53:16 发布

阅读量472

点赞数

分类专栏： paddlepaddle历险记每日一氵文章标签： numpy python 开发语言

本文链接：https://blog.csdn.net/HaoZiHuang/article/details/125772966

版权

每日一氵同时被 2 个专栏收录

158 篇文章 11 订阅

订阅专栏

paddlepaddle历险记

46 篇文章 11 订阅

订阅专栏

1. 倒着往回推

在 paddleseg/core/infer.py 中，inference 和 slide_inference函数` 中：

(同时，aug_inference 也会调用 inference函数， inference函数内部也会调用 slide_inference函数)

def slide_inference(model, im, crop_size, stride):
    """
    Infer by sliding window.

    Args:
        model (paddle.nn.Layer): model to get logits of image.
        im (Tensor): the input image.
        crop_size (tuple|list). The size of sliding window, (w, h).
        stride (tuple|list). The size of stride, (w, h).

    Return:
        Tensor: The logit of input image.
    """
    h_im, w_im = im.shape[-2:]
    w_crop, h_crop = crop_size
    w_stride, h_stride = stride
    # calculate the crop nums
    rows = np.int(np.ceil(1.0 * (h_im - h_crop) / h_stride)) + 1
    cols = np.int(np.ceil(1.0 * (w_im - w_crop) / w_stride)) + 1
    # prevent negative sliding rounds when imgs after scaling << crop_size
    rows = 1 if h_im <= h_crop else rows
    cols = 1 if w_im <= w_crop else cols
    # TODO 'Tensor' object does not support item assignment. If support, use tensor to calculation.
    final_logit = None
    count = np.zeros([1, 1, h_im, w_im])
    for r in range(rows):
        for c in range(cols):
            h1 = r * h_stride
            w1 = c * w_stride
            h2 = min(h1 + h_crop, h_im)
            w2 = min(w1 + w_crop, w_im)
            h1 = max(h2 - h_crop, 0)
            w1 = max(w2 - w_crop, 0)
            im_crop = im[:, :, h1:h2, w1:w2]
            logits = model(im_crop)       # <------------------------- 看这里，以下几行
            if not isinstance(logits, collections.abc.Sequence):
                raise TypeError(
                    "The type of logits must be one of collections.abc.Sequence, e.g. list, tuple. But received {}"
                    .format(type(logits)))
            logit = logits[0].numpy()
            if final_logit is None:
                final_logit = np.zeros([1, logit.shape[1], h_im, w_im])
            final_logit[:, :, h1:h2, w1:w2] += logit[:, :, :h2 - h1, :w2 - w1]
            count[:, :, h1:h2, w1:w2] += 1
    if np.sum(count == 0) != 0:
        raise RuntimeError(
            'There are pixel not predicted. It is possible that stride is greater than crop_size'
        )
    final_logit = final_logit / count
    final_logit = paddle.to_tensor(final_logit)
    return final_logit

def inference(model,
              im,
              ori_shape=None,
              transforms=None,
              is_slide=False,
              stride=None,
              crop_size=None):
    """
    Inference for image.

    Args:
        model (paddle.nn.Layer): model to get logits of image.
        im (Tensor): the input image.
        ori_shape (list): Origin shape of image.
        transforms (list): Transforms for image.
        is_slide (bool): Whether to infer by sliding window. Default: False.
        crop_size (tuple|list). The size of sliding window, (w, h). It should be probided if is_slide is True.
        stride (tuple|list). The size of stride, (w, h). It should be probided if is_slide is True.

    Returns:
        Tensor: If ori_shape is not None, a prediction with shape (1, 1, h, w) is returned.
            If ori_shape is None, a logit with shape (1, num_classes, h, w) is returned.
    """
    if hasattr(model, 'data_format') and model.data_format == 'NHWC':
        im = im.transpose((0, 2, 3, 1))
    if not is_slide:
        logits = model(im)                 # <------------------------- 看这里，以下几行
        if not isinstance(logits, collections.abc.Sequence):
            raise TypeError(
                "The type of logits must be one of collections.abc.Sequence, e.g. list, tuple. But received {}"
                .format(type(logits)))
        logit = logits[0]
    else:
        logit = slide_inference(model, im, crop_size=crop_size, stride=stride)
    if hasattr(model, 'data_format') and model.data_format == 'NHWC':
        logit = logit.transpose((0, 3, 1, 2))
    if ori_shape is not None:
        logit = reverse_transform(logit, ori_shape, transforms, mode='bilinear')
        pred = paddle.argmax(logit, axis=1, keepdim=True, dtype='int32')
        return pred, logit
    else:
        return logit

如果 is_slide 为 False，则走这个分支(只走 inference 函数)：

logits = model(im)                 # <------------------------- 看这里，以下几行
if not isinstance(logits, collections.abc.Sequence):
    raise TypeError(
        "The type of logits must be one of collections.abc.Sequence, e.g. list, tuple. But received {}"
        .format(type(logits)))
logit = logits[0]

如果 is_slide 为 True，则走这个分支(inference 中调用 slide_inference 函数)：

logits = model(im_crop)       # <------------------------- 看这里，以下几行
if not isinstance(logits, collections.abc.Sequence):
    raise TypeError(
        "The type of logits must be one of collections.abc.Sequence, e.g. list, tuple. But received {}"
        .format(type(logits)))
logit = logits[0].numpy()

1是 logit 是 logits 的第0个元素
2是 logits 应当是 collections.abc.Sequence 类，诸如 list, tuple 之类的类型

总之，model(input) 返回的应当是列表，大概率只有一个元素

2. 例子

光通过这样的推断，不是很精确，咱找几个 model 看看他 forward 的返回值是 list 还是 Tensor 就好了

来看看最近比较火的 pp_liteseg 模型

位置在 paddleseg/models/pp_liteseg.py

我们只需要找到，被装饰器@manager.MODELS.add_component 装饰的类即可
也就是
在这里插入图片描述
直接查看其 forward 函数的返回值：

看到两个分支返回的都是 list，无需知道list里到底是什么，只需要知道返回的是list即可

另外，在非 training 的时候，list中只有一个元素

再来看看之前比较火的 pphumanseg_lite 模型
paddleseg/models/pphumanseg_lite.py

找最主要的类，一是看被装饰器@manager.MODELS.add_component 装饰的类
二是看__all__中的元素是啥：

__all__ = ['PPHumanSegLite']

来看其 forward 函数：

    def forward(self, x):
        # Encoder
        input_shape = paddle.shape(x)[2:]

        x = self.conv_bn0(x)  # 1/2
        shortcut = self.conv_bn1(x)  # shortcut
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)  # 1/4
        x = self.block1(x)  # 1/8
        x = self.block2(x)  # 1/16

        # Decoder
        x = self.depthwise_separable0(x)
        shortcut_shape = paddle.shape(shortcut)[2:]
        x = F.interpolate(
            x,
            shortcut_shape,
            mode='bilinear',
            align_corners=self.align_corners)
        x = paddle.concat(x=[shortcut, x], axis=1)
        x = self.depthwise_separable1(x)

        logit = self.depthwise_separable2(x)
        logit = F.interpolate(
            logit,
            input_shape,
            mode='bilinear',
            align_corners=self.align_corners)

        return [logit]   # <---------- logit 显然是个 Tensor, 这里给他套上框框，变成列表