torchvision.transforms.RandomResizedCrop

1. Reading the PyTorch Source Code

# Excerpt from torchvision/transforms/transforms.py; imports trimmed to what this class uses.
# (_setup_size is a small helper defined in the same file.)
import math
import warnings
from collections.abc import Sequence
from typing import List, Tuple

import torch
from torch import Tensor

from torchvision.utils import _log_api_usage_once
from torchvision.transforms import functional as F
from torchvision.transforms.functional import InterpolationMode, _interpolation_modes_from_int


class RandomResizedCrop(torch.nn.Module):
    """
    Crop a random portion of image and resize it to a given size.

    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions.

    A crop of the original image is made: the crop has a random area (H * W)
    and a random aspect ratio. This crop is finally resized to the given
    size. This is popularly used to train the Inception networks.
    Args:
        size (int or sequence): expected output size of the crop, for each edge. If size is an
            int instead of sequence like (h, w), a square output size ``(size, size)`` is
            made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

            .. note::
                In torchscript mode size as single int is not supported, use a sequence of length 1: ``[size, ]``.
        scale (tuple of float): Specifies the lower and upper bounds for the random area of the crop,
            before resizing. The scale is defined with respect to the area of the original image.
        ratio (tuple of float): lower and upper bounds for the random aspect ratio of the crop, before
            resizing.
        interpolation (InterpolationMode): Desired interpolation enum defined by
            :class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
            If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` and
            ``InterpolationMode.BICUBIC`` are supported.
            For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
    """
    
    def __init__(self, size, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0), interpolation=InterpolationMode.BILINEAR):
        super().__init__()
        _log_api_usage_once(self)
        self.size = _setup_size(size, error_msg="Please provide only two dimensions (h, w) for size.")

        if not isinstance(scale, Sequence):
            raise TypeError("Scale should be a sequence")
        if not isinstance(ratio, Sequence):
            raise TypeError("Ratio should be a sequence")
        if (scale[0] > scale[1]) or (ratio[0] > ratio[1]):
            warnings.warn("Scale and ratio should be of kind (min, max)")

        # Backward compatibility with integer value
        if isinstance(interpolation, int):
            warnings.warn(
                "Argument interpolation should be of type InterpolationMode instead of int. "
                "Please, use InterpolationMode enum."
            )
            interpolation = _interpolation_modes_from_int(interpolation)

        self.interpolation = interpolation
        self.scale = scale
        self.ratio = ratio
        
    @staticmethod
    def get_params(img: Tensor, scale: List[float], ratio: List[float]) -> Tuple[int, int, int, int]:
        """Get parameters for ``crop`` for a random sized crop.

        Args:
            img (PIL Image or Tensor): Input image.
            scale (list): range of scale of the origin size cropped
            ratio (list): range of aspect ratio of the origin aspect ratio cropped

        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for a random
            sized crop.
        """
        width, height = F.get_image_size(img)
        area = height * width

        log_ratio = torch.log(torch.tensor(ratio))
        for _ in range(10):
            target_area = area * torch.empty(1).uniform_(scale[0], scale[1]).item()
            aspect_ratio = torch.exp(torch.empty(1).uniform_(log_ratio[0], log_ratio[1])).item()

            w = int(round(math.sqrt(target_area * aspect_ratio)))
            h = int(round(math.sqrt(target_area / aspect_ratio)))

            if 0 < w <= width and 0 < h <= height:
                i = torch.randint(0, height - h + 1, size=(1,)).item()
                j = torch.randint(0, width - w + 1, size=(1,)).item()
                return i, j, h, w

        # Fallback to central crop
        in_ratio = float(width) / float(height)
        if in_ratio < min(ratio):
            w = width
            h = int(round(w / min(ratio)))
        elif in_ratio > max(ratio):
            h = height
            w = int(round(h * max(ratio)))
        else:  # whole image
            w = width
            h = height
        i = (height - h) // 2
        j = (width - w) // 2
        return i, j, h, w

    def forward(self, img):
        """
        Args:
            img (PIL Image or Tensor): Image to be cropped and resized.

        Returns:
            PIL Image or Tensor: Randomly cropped and resized image.
        """
        i, j, h, w = self.get_params(img, self.scale, self.ratio)
        return F.resized_crop(img, i, j, h, w, self.size, self.interpolation)


    def __repr__(self) -> str:
        interpolate_str = self.interpolation.value
        format_string = self.__class__.__name__ + f"(size={self.size}"
        format_string += f", scale={tuple(round(s, 4) for s in self.scale)}"
        format_string += f", ratio={tuple(round(r, 4) for r in self.ratio)}"
        format_string += f", interpolation={interpolate_str})"
        return format_string
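
For context, here is how this transform typically appears in a training pipeline (a minimal sketch; the mean/std values are the usual ImageNet statistics, an assumption on my part rather than anything fixed by this class):

from torchvision import transforms as T

# A typical ImageNet-style training pipeline using the class above
train_transform = T.Compose([
    T.RandomResizedCrop(224),   # random area/aspect-ratio crop, resized to 224x224
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])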

2. Understanding the Source

def get_params(img: Tensor, scale: List[float], ratio: List[float]) -> Tuple[int, int, int, int]:

Example parameters: size=(224, 224), scale=(0.08, 1.0), ratio=(3.0/4.0, 4.0/3.0)

**size**:

This is the expected output size of the crop, not the size of the input image. It can be a single int ``size`` (interpreted as a square ``(size, size)``), a one-element sequence ``(size,)``, or a pair ``(h, w)``. The image passed to the transform must be a torch.Tensor or a PIL Image, and the transform is typically used inside transforms.Compose.
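
A quick shape check (the random tensor is just a dummy C x H x W image, assuming a reasonably recent torchvision):

import torch
from torchvision import transforms as T

img = torch.rand(3, 500, 400)  # dummy C x H x W image

# An int and an (h, w) pair give the same square output
print(T.RandomResizedCrop(224)(img).shape)         # torch.Size([3, 224, 224])
print(T.RandomResizedCrop((224, 224))(img).shape)  # torch.Size([3, 224, 224])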

**scale**:

import torch

h = w = 224
area = h * w
scale = [0.5, 0.5]
# Draw a value uniformly from [scale[0], scale[1]] and multiply it by the original area
target_area = area * torch.empty(1).uniform_(scale[0], scale[1]).item()
print(area, target_area)

## Results
scale=[0.5, 0.5]  ----> (50176, 25088.0)
# below uses the default parameters
scale=[0.08, 1.0] ----> (50176, 39760.93994140625)

**ratio**:

import math
import torch

target_area = 39760.94  # carried over from the scale example above
ratio = (3.0 / 4.0, 4.0 / 3.0)

# Sample the aspect ratio log-uniformly, so a ratio r and its inverse 1/r are equally likely
log_ratio = torch.log(torch.tensor(ratio))
aspect_ratio = torch.exp(torch.empty(1).uniform_(log_ratio[0], log_ratio[1])).item()

# round() rounds to the nearest integer
w = int(round(math.sqrt(target_area * aspect_ratio)))
h = int(round(math.sqrt(target_area / aspect_ratio)))
print('log_ratio:', log_ratio)
print('aspect_ratio:', aspect_ratio)
print('h,w:', h, w)

# If the crop fits inside the image, sample its top-left corner uniformly
if 0 < w <= 224 and 0 < h <= 224:
    i = torch.randint(0, 224 - h + 1, size=(1,)).item()
    j = torch.randint(0, 224 - w + 1, size=(1,)).item()
    print('i,j:', i, j)

## Results
log_ratio: tensor([-0.2877,  0.2877])
aspect_ratio: 1.1371617317199707
h,w: 187 213
i,j: 3 21
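
When all 10 sampling attempts fail, get_params falls back to the central crop clamped to the ratio bounds (the ``# Fallback to central crop`` branch in the source above). A small sketch that forces this path: with scale=(0.9, 1.0) on a 100 x 400 image, every sampled crop is taller than 100 px, so the result is the deterministic centered crop:

import torch
from torchvision import transforms as T

# 100 x 400 image: aspect ratio 4.0, far outside ratio=(3/4, 4/3).
# With scale=(0.9, 1.0) the target area is so large that h always exceeds 100,
# so all 10 attempts fail and the central-crop fallback runs.
wide = torch.rand(3, 100, 400)
i, j, h, w = T.RandomResizedCrop.get_params(wide, scale=[0.9, 1.0], ratio=[3 / 4, 4 / 3])
print(i, j, h, w)  # (0, 133, 100, 133): h clamped to 100, w = round(100 * 4/3), centered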

**interpolation**:

If the input is a Tensor, only three interpolation modes are allowed: 1. InterpolationMode.NEAREST (nearest-neighbor) 2. InterpolationMode.BILINEAR (bilinear) 3. InterpolationMode.BICUBIC (bicubic). For backward compatibility (API compatibility, not backpropagation), plain integer values such as PIL.Image.NEAREST are still accepted, but they trigger a warning and are converted to the enum.
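For example (the commented-out line shows the deprecated integer form; 2 is PIL.Image.BILINEAR):

from torchvision import transforms as T
from torchvision.transforms import InterpolationMode

# Preferred: pass the enum explicitly
t = T.RandomResizedCrop(224, interpolation=InterpolationMode.BICUBIC)

# Deprecated but still accepted: an integer constant emits the warning
# shown in __init__ and is converted via _interpolation_modes_from_int
# t = T.RandomResizedCrop(224, interpolation=2)  # 2 == PIL.Image.BILINEAR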

**Resize**:

Internally, forward resizes the sampled crop to the target size by calling F.resized_crop(img, i, j, h, w, self.size, self.interpolation).
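
Putting the pieces together, one forward pass of the transform is equivalent to this manual sequence (a sketch using the public functional API):

import torch
from torchvision import transforms as T
from torchvision.transforms import functional as F

img = torch.rand(3, 500, 400)

# What RandomResizedCrop(224) does internally, spelled out step by step
i, j, h, w = T.RandomResizedCrop.get_params(img, scale=[0.08, 1.0], ratio=[3 / 4, 4 / 3])
out = F.resized_crop(img, i, j, h, w, size=[224, 224])
print(out.shape)  # torch.Size([3, 224, 224])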
