I encountered a problem. My network is trained on tensors of size BxCx128x128, but I need to evaluate its image reconstruction performance on images of size 1024x1024. To make the reconstruction smooth, I need to split my BxCx1024x1024 input into overlapping BxCx128x128 tensors, which are then fed to the network for reconstruction. The reconstructed BxCx128x128 tensors should then be used to recover the BxCx1024x1024 tensor by averaging the overlapping elements. Note that (Size(in)-128)/stride may not be an integer. How can I use a padding strategy to ensure that every element is covered by at least one patch? And how can I implement the recovery of the BxCx1024x1024 tensor from the overlapping BxCx128x128 tensors? Could anyone give me some suggestions? Thanks in advance for your consideration.
fold
should work in your use case.
Here is a small example creating the expected input shape step by step:
import torch
import torch.nn.functional as F

B, C, H, W = 2, 3, 1024, 1024
x = torch.randn(B, C, H, W)
kernel_size = 128
stride = 64
# unfold dim 2 (H) first, then dim 3 (W), so the last two dims are (kh, kw)
patches = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
print(patches.shape) # [B, C, nb_patches_h, nb_patches_w, kernel_size, kernel_size]
# perform the operations on each patch
# ...
# reshape output to match F.fold input
patches = patches.contiguous().view(B, C, -1, kernel_size*kernel_size)
print(patches.shape) # [B, C, nb_patches_all, kernel_size*kernel_size]
patches = patches.permute(0, 1, 3, 2)
print(patches.shape) # [B, C, kernel_size*kernel_size, nb_patches_all]
patches = patches.contiguous().view(B, C*kernel_size*kernel_size, -1)
print(patches.shape) # [B, C*prod(kernel_size), L] as expected by Fold
# https://pytorch.org/docs/stable/nn.html#torch.nn.Fold
output = F.fold(
patches, output_size=(H, W), kernel_size=kernel_size, stride=stride)
print(output.shape) # [B, C, H, W]
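One detail worth spelling out: F.fold sums overlapping elements rather than averaging them. A common trick (a sketch, not claimed by the answer above) is to also fold an all-ones tensor of the same shape, which yields the per-pixel overlap count, and divide by it. Smaller sizes are used here to keep the example light:

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 3, 256, 256
kernel_size, stride = 128, 64
x = torch.randn(B, C, H, W)

# unfold into overlapping patches, flattened to the shape Fold expects
patches = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
patches = patches.contiguous().view(B, C, -1, kernel_size * kernel_size)
patches = patches.permute(0, 1, 3, 2).contiguous().view(B, C * kernel_size * kernel_size, -1)

# fold *sums* overlapping elements ...
summed = F.fold(patches, output_size=(H, W), kernel_size=kernel_size, stride=stride)

# ... so divide by the per-pixel overlap count (a fold over all-ones patches)
counts = F.fold(torch.ones_like(patches), output_size=(H, W),
                kernel_size=kernel_size, stride=stride)
recovered = summed / counts

print(torch.allclose(recovered, x))  # True: averaging identical copies recovers x
```

After each patch is processed by the network, the same divide-by-counts step blends the overlapping regions smoothly instead of letting them accumulate.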
Thanks for your reply. In fact, I had noticed these two functions in other related posts (e.g., Patch Making Does Pytorch have Anything to Offer?), but I still have several questions.
- In the example, the stride is set to 64, which makes (Size(in)-128)/64 an integer. But how do I handle arbitrary strides that may not satisfy this (e.g. stride=40), with padding or other strategies?
- When we get patches of size [B,C,nb_patches_h,nb_patches_w,kernel_size,kernel_size], we can surely perform the operation on each patch by looping over the patch dimensions. Is there a way to perform the operation on all the patches at once?
- Is the spatial neighborhood information preserved in the unfold/fold process? The reconstruction of the large tensor should average the overlapping elements of the patches, but I did not see any sign of this in the code.
Many thanks for your help.
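Regarding the first question, one possible strategy (a sketch, not from the thread) is to pad the right/bottom edges so that (padded_size - kernel_size) becomes a multiple of the stride, unfold/fold on the padded tensor, and crop back to the original size at the end:

```python
import torch
import torch.nn.functional as F

def pad_to_cover(x, kernel_size, stride):
    """Pad right/bottom so (size - kernel_size) is a multiple of stride,
    guaranteeing every pixel falls inside at least one patch."""
    B, C, H, W = x.shape
    pad_h = (stride - (H - kernel_size) % stride) % stride
    pad_w = (stride - (W - kernel_size) % stride) % stride
    # reflect padding avoids introducing hard zero borders at the image edge
    return F.pad(x, (0, pad_w, 0, pad_h), mode='reflect'), (H, W)

x = torch.randn(2, 3, 1024, 1024)
kernel_size, stride = 128, 40  # (1024 - 128) / 40 is not an integer
padded, (H, W) = pad_to_cover(x, kernel_size, stride)
print(padded.shape)  # the last patch now ends exactly at the padded border
# ... unfold, run the network on the patches, fold with averaging, then crop:
# recovered = recovered_padded[..., :H, :W]
```

With stride=40 the pad works out to 24 pixels on each of the bottom and right edges, so (1048 - 128) / 40 = 23 patch steps fit exactly.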
Some other related questions:
https://github.com/pytorch/pytorch/pull/1523#issue-227526015
import torch

def max_pool2d(input, kernel_size, stride):
kh, kw = kernel_size
dh, dw = stride
# get all image windows of size (kh, kw) and stride (dh, dw)
input_windows = input.unfold(2, kh, dh).unfold(3, kw, dw)
# view the windows as (kh * kw)
input_windows = input_windows.contiguous().view(*input_windows.size()[:-2], -1)
max_val, max_idx = input_windows.max(4)
return max_val, max_idx
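As a quick sanity check (the function from the PR is restated so the snippet runs on its own), the unfold-based pooling produces the same values as the built-in F.max_pool2d:

```python
import torch
import torch.nn.functional as F

def max_pool2d(input, kernel_size, stride):
    kh, kw = kernel_size
    dh, dw = stride
    # get all image windows of size (kh, kw) with stride (dh, dw)
    input_windows = input.unfold(2, kh, dh).unfold(3, kw, dw)
    # flatten each window to (kh * kw)
    input_windows = input_windows.contiguous().view(*input_windows.size()[:-2], -1)
    max_val, max_idx = input_windows.max(4)
    return max_val, max_idx

x = torch.randn(1, 2, 8, 8)
max_val, _ = max_pool2d(x, kernel_size=(2, 2), stride=(2, 2))
print(torch.equal(max_val, F.max_pool2d(x, kernel_size=2, stride=2)))  # True
```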
https://forums.fast.ai/t/split-large-satellite-image-into-tiles-patches/32039