I encountered a problem. My network is trained on tensors of size BxCx128x128, but I need to evaluate its image reconstruction performance on images of size 1024x1024. To make the reconstruction smooth, I need to split my BxCx1024x1024 input into overlapping BxCx128x128 tensors, which are then fed to the network for reconstruction. The reconstructed BxCx128x128 tensors should then be used to recover the BxCx1024x1024 tensor by averaging the overlapping elements. Note that (Size(in)-128)/stride may not be an integer. How can I use a padding strategy to ensure that every element is covered by at least one patch? And how can I implement the recovery of the BxCx1024x1024 tensor from the overlapping BxCx128x128 tensors? Could anyone give me some suggestions? Thanks in advance for your consideration.
fold
should work in your use case.
Here is a small example creating the expected input shape step by step:
import torch
import torch.nn.functional as F

B, C, H, W = 2, 3, 1024, 1024
x = torch.randn(B, C, H, W)
kernel_size = 128
stride = 64
# unfold dim 2 (H) first, then dim 3 (W), so the last two dims are (kh, kw)
patches = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
print(patches.shape) # [B, C, nb_patches_h, nb_patches_w, kernel_size, kernel_size]
# perform the operations on each patch
# ...
# reshape output to match F.fold input
patches = patches.contiguous().view(B, C, -1, kernel_size*kernel_size)
print(patches.shape) # [B, C, nb_patches_all, kernel_size*kernel_size]
patches = patches.permute(0, 1, 3, 2)
print(patches.shape) # [B, C, kernel_size*kernel_size, nb_patches_all]
patches = patches.contiguous().view(B, C*kernel_size*kernel_size, -1)
print(patches.shape) # [B, C*prod(kernel_size), L] as expected by Fold
# https://pytorch.org/docs/stable/nn.html#torch.nn.Fold
output = F.fold(
patches, output_size=(H, W), kernel_size=kernel_size, stride=stride)
print(output.shape) # [B, C, H, W]
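One detail worth spelling out: F.fold sums overlapping elements rather than averaging them. A common trick (a sketch, not claimed by the answer above) is to also fold an all-ones tensor of the same shape, which yields the per-pixel overlap count, and divide by it. Smaller sizes are used here to keep the example light:

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 3, 256, 256
kernel_size, stride = 128, 64
x = torch.randn(B, C, H, W)

# unfold into overlapping patches, flattened to the shape Fold expects
patches = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
patches = patches.contiguous().view(B, C, -1, kernel_size * kernel_size)
patches = patches.permute(0, 1, 3, 2).contiguous().view(B, C * kernel_size * kernel_size, -1)

# fold *sums* overlapping elements ...
summed = F.fold(patches, output_size=(H, W), kernel_size=kernel_size, stride=stride)

# ... so divide by the per-pixel overlap count (a fold over all-ones patches)
counts = F.fold(torch.ones_like(patches), output_size=(H, W),
                kernel_size=kernel_size, stride=stride)
recovered = summed / counts

print(torch.allclose(recovered, x))  # True: averaging identical copies recovers x
```

After each patch is processed by the network, the same divide-by-counts step blends the overlapping regions smoothly instead of letting them accumulate.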
Thanks for your reply. In fact, I had noticed these two functions in other related posts (e.g., Patch Making Does Pytorch have Anything to Offer?), but I still have several questions.
- In the example, the stride is set to 64, which makes (Size(in)-128)/64 an integer. But how do I handle arbitrary strides that may not satisfy this (e.g. stride=40), with padding or other strategies?
- When we get patches of size [B,C,nb_patches_h,nb_patches_w,kernel_size,kernel_size], we can surely perform the operation on each patch by looping over the patch dimensions. Is there a way to perform the operation on all the patches at once?
- Is the spatial neighborhood information preserved in the unfold/fold process? The reconstruction of the large tensor should average the overlapping elements of the patches, but I did not see any sign of this in the code.
Many thanks for your help.
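Regarding the first question, one possible strategy (a sketch, not from the thread) is to pad the right/bottom edges so that (padded_size - kernel_size) becomes a multiple of the stride, unfold/fold on the padded tensor, and crop back to the original size at the end:

```python
import torch
import torch.nn.functional as F

def pad_to_cover(x, kernel_size, stride):
    """Pad right/bottom so (size - kernel_size) is a multiple of stride,
    guaranteeing every pixel falls inside at least one patch."""
    B, C, H, W = x.shape
    pad_h = (stride - (H - kernel_size) % stride) % stride
    pad_w = (stride - (W - kernel_size) % stride) % stride
    # reflect padding avoids introducing hard zero borders at the image edge
    return F.pad(x, (0, pad_w, 0, pad_h), mode='reflect'), (H, W)

x = torch.randn(2, 3, 1024, 1024)
kernel_size, stride = 128, 40  # (1024 - 128) / 40 is not an integer
padded, (H, W) = pad_to_cover(x, kernel_size, stride)
print(padded.shape)  # the last patch now ends exactly at the padded border
# ... unfold, run the network on the patches, fold with averaging, then crop:
# recovered = recovered_padded[..., :H, :W]
```

With stride=40 the pad works out to 24 pixels on each of the bottom and right edges, so (1048 - 128) / 40 = 23 patch steps fit exactly.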
Some other related questions:
https://github.com/pytorch/pytorch/pull/1523#issue-227526015
import torch

def max_pool2d(input, kernel_size, stride):
kh, kw = kernel_size
dh, dw = stride
# get all image windows of size (kh, kw) and stride (dh, dw)
input_windows = input.unfold(2, kh, dh).unfold(3, kw, dw)
# view the windows as (kh * kw)
input_windows = input_windows.contiguous().view(*input_windows.size()[:-2], -1)
max_val, max_idx = input_windows.max(4)
return max_val, max_idx
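As a quick sanity check (the function from the PR is restated so the snippet runs on its own), the unfold-based pooling produces the same values as the built-in F.max_pool2d:

```python
import torch
import torch.nn.functional as F

def max_pool2d(input, kernel_size, stride):
    kh, kw = kernel_size
    dh, dw = stride
    # get all image windows of size (kh, kw) with stride (dh, dw)
    input_windows = input.unfold(2, kh, dh).unfold(3, kw, dw)
    # flatten each window to (kh * kw)
    input_windows = input_windows.contiguous().view(*input_windows.size()[:-2], -1)
    max_val, max_idx = input_windows.max(4)
    return max_val, max_idx

x = torch.randn(1, 2, 8, 8)
max_val, _ = max_pool2d(x, kernel_size=(2, 2), stride=(2, 2))
print(torch.equal(max_val, F.max_pool2d(x, kernel_size=2, stride=2)))  # True
```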
https://forums.fast.ai/t/split-large-satellite-image-into-tiles-patches/32039