PyTorch provides a variety of ways to crop images. For example, torchvision.transforms provides several functions for cropping PIL images, and an answer on the PyTorch Forum shows how to crop an image in a way that is differentiable with respect to the image. However, sometimes we need the cropping to be differentiable with respect to the crop coordinates themselves as well. How shall we implement that?
Before reaching the answer, we first need to learn about the image coordinate system in PyTorch. It is a left-handed Cartesian system with its origin at the center of the image. The coordinates are normalized to the range $[-1, 1]$, where $(-1, -1)$ indicates the top-left corner and $(1, 1)$ indicates the bottom-right corner, as pointed out by the doc.
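To make the normalized coordinate system concrete, here is a small sketch that converts pixel indices along one axis into coordinates in the range -1 to 1. The helper name `pixel_to_normalized` is hypothetical (it is not part of PyTorch); the two formulas reflect my understanding of how the `align_corners` option is described in the `grid_sample` documentation.

```python
import torch


def pixel_to_normalized(idx: torch.Tensor, size: int,
                        align_corners: bool = False) -> torch.Tensor:
    """Map pixel indices 0..size-1 along one axis to normalized coordinates.

    Hypothetical helper for illustration only.
    """
    idx = idx.to(torch.float32)
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels
        return 2.0 * idx / (size - 1) - 1.0
    # -1 and 1 refer to the outer edges of the first and last pixels,
    # so pixel centers sit half a pixel inside the range
    return (2.0 * idx + 1.0) / size - 1.0


cols = torch.arange(4)
print(pixel_to_normalized(cols, 4))                      # pixel centers
print(pixel_to_normalized(cols, 4, align_corners=True))  # ends at -1 and 1
```

For a 4-pixel axis with `align_corners=False`, the pixel centers land at -0.75, -0.25, 0.25 and 0.75, which is why the same `align_corners` setting has to be used consistently.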
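Let $(x, y)$ be the top-left corner of the cropped image with respect to the coordinate system of the original image; likewise, we denote $(x', y')$ as the bottom-right corner of the cropped image. It's clear that $(x, y)$ corresponds to $(-1, -1)$ with respect to the cropped image coordinate system, and $(x', y')$ corresponds to $(1, 1)$. We'd like a function $f$ that maps from the cropped image system to the original image system for every point in the cropped image. Since only scaling and translation are involved, the function can be parameterized by an affine transformation matrix $\Theta$ such that

$$\Theta = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{pmatrix}$$

where $\theta_{12} = \theta_{21} = 0$ since skewing is not involved. Denote $\bar{\mathbf{u}}$ as the homogeneous coordinate of $\mathbf{u} = (u, v)^\top$ such that $\bar{\mathbf{u}} = (u, v, 1)^\top$; $\Theta$ maps $\mathbf{u}$ with respect to the cropped image system to $\mathbf{p}$ with respect to the original image system, i.e. $\mathbf{p} = \Theta \bar{\mathbf{u}}$. Thus,

$$\begin{aligned} -\theta_{11} + \theta_{13} &= x \\ -\theta_{22} + \theta_{23} &= y \\ \theta_{11} + \theta_{13} &= x' \\ \theta_{22} + \theta_{23} &= y' \end{aligned}$$

Solving the equations,

$$\theta_{11} = \frac{x' - x}{2}, \quad \theta_{13} = \frac{x' + x}{2}, \quad \theta_{22} = \frac{y' - y}{2}, \quad \theta_{23} = \frac{y' + y}{2}$$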
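As a quick sanity check on the solution, the sketch below builds the affine matrix with scale terms (x' - x)/2, (y' - y)/2 and translation terms (x' + x)/2, (y' + y)/2, then confirms that it sends the corners (-1, -1) and (1, 1) of the cropped system to (x, y) and (x', y'). The numeric values are arbitrary examples, and `x_`, `y_` stand in for x', y'.

```python
import torch

# Example crop corners in the original image's normalized coordinates
x, y = -0.5, -0.3    # top-left corner
x_, y_ = 0.7, 0.8    # bottom-right corner (x', y')

# Affine matrix: pure scaling plus translation, no skew
theta = torch.tensor([
    [(x_ - x) / 2, 0.0, (x_ + x) / 2],
    [0.0, (y_ - y) / 2, (y_ + y) / 2],
])

# Corners of the cropped system in homogeneous coordinates (u, v, 1)
corners = torch.tensor([
    [-1.0, -1.0, 1.0],   # top-left of the crop
    [1.0, 1.0, 1.0],     # bottom-right of the crop
])

# Each row of `mapped` is theta @ (u, v, 1)^T
mapped = corners @ theta.T
print(mapped)  # rows should match (x, y) and (x_, y_) up to float rounding
```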
We’ll need two functions:
torch.nn.functional.affine_grid to convert the affine parameterization to a flow field (a grid of sampling coordinates)
torch.nn.functional.grid_sample to sample the original image at the coordinate that the flow field assigns to each cropped image coordinate
```python
import torch
import torch.nn.functional as F

B, C, H, W = 16, 3, 224, 224  # batch size, input channels,
                              # original image height and width

# Let `I` be our original image
I = torch.rand(B, C, H, W)

# Set the (x,y) and (x',y') to define the rectangular region to crop
x, y = -0.5, -0.3    # some example coordinates;
x_, y_ = 0.7, 0.8    # in practice, (x,y,x_,y_) might be predicted
                     # as a tensor in the computation graph

# Set the affine parameters
theta = torch.tensor([
    [(x_-x)/2, 0, (x_+x)/2],
    [0, (y_-y)/2, (y_+y)/2],
]).unsqueeze_(0).expand(B, -1, -1)

# compute the flow field;
# where size is the output size (scaling involved)
# `align_corners` option must be the same throughout the code
f = F.affine_grid(theta, size=(B, C, H//2, W//2), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)
```
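When the crop coordinates are themselves tensors in the computation graph, passing them through `torch.tensor([...])` would copy their values and cut the gradient. One possible way around this, sketched below, is to assemble the affine matrix with `torch.stack`. The function name `crop` and the `(B, 4)` box layout are assumptions made for this illustration, not something the code above prescribes.

```python
import torch
import torch.nn.functional as F


def crop(image: torch.Tensor, box: torch.Tensor, out_size) -> torch.Tensor:
    """Crop `image` (B, C, H, W) with per-sample boxes.

    `box` has shape (B, 4) holding (x, y, x_, y_) in normalized coordinates;
    `out_size` is the (height, width) of the output. Hypothetical helper.
    """
    x, y, x_, y_ = box.unbind(dim=1)
    zeros = torch.zeros_like(x)
    # Build theta with torch.stack so gradients can flow back into `box`
    theta = torch.stack([
        torch.stack([(x_ - x) / 2, zeros, (x_ + x) / 2], dim=1),
        torch.stack([zeros, (y_ - y) / 2, (y_ + y) / 2], dim=1),
    ], dim=1)  # shape (B, 2, 3)
    B, C = image.shape[:2]
    grid = F.affine_grid(theta, size=(B, C, *out_size), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)


torch.manual_seed(0)
image = torch.rand(2, 3, 32, 32)
box = torch.tensor([[-0.5, -0.3, 0.7, 0.8]]).repeat(2, 1).requires_grad_()

out = crop(image, box, (16, 16))
out.sum().backward()     # backpropagate through the crop
print(out.shape)         # torch.Size([2, 3, 16, 16])
print(box.grad.shape)    # torch.Size([2, 4])
```

In a real model, `box` would typically be the output of a network head rather than a leaf tensor, and the loss would replace `out.sum()`.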