PyTorch: splitting a large tensor into small tensors and reconstructing it

 

https://discuss.pytorch.org/t/how-to-split-tensors-with-overlap-and-then-reconstruct-the-original-tensor/70261/2

I encountered a problem. My network is trained with tensors of size BxCx128x128, but I need to verify its image reconstruction performance on images of size 1024x1024. To make the reconstruction smooth, I need to split my input of size BxCx1024x1024 into overlapping BxCx128x128 tensors, which are then fed to the network for reconstruction. The reconstructed BxCx128x128 tensors should then be used to rebuild the BxCx1024x1024 tensor by averaging the overlapping elements. Note that (Size(in)-128)/stride may not be an integer. How can I use a padding strategy to ensure that every element is covered by at least one crop? And how can I implement the process of recovering the BxCx1024x1024 tensor from the overlapping BxCx128x128 tensors? Could anyone give me some suggestions? Thanks in advance for your consideration.

 

fold should work in your use case.
Here is a small example creating the expected input shape step by step:

import torch
import torch.nn.functional as F

B, C, W, H = 2, 3, 1024, 1024
x = torch.randn(B, C, H, W)

kernel_size = 128
stride = 64
patches = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
print(patches.shape) # [B, C, nb_patches_h, nb_patches_w, kernel_size, kernel_size]

# perform the operations on each patch
# ...

# reshape output to match F.fold input
patches = patches.contiguous().view(B, C, -1, kernel_size*kernel_size)
print(patches.shape) # [B, C, nb_patches_all, kernel_size*kernel_size]
patches = patches.permute(0, 1, 3, 2) 
print(patches.shape) # [B, C, kernel_size*kernel_size, nb_patches_all]
patches = patches.contiguous().view(B, C*kernel_size*kernel_size, -1)
print(patches.shape) # [B, C*prod(kernel_size), L] as expected by Fold
# https://pytorch.org/docs/stable/nn.html#torch.nn.Fold

output = F.fold(
    patches, output_size=(H, W), kernel_size=kernel_size, stride=stride)
print(output.shape) # [B, C, H, W]
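As a quick sanity check on the snippet above (a minimal sketch using the functional unfold/fold, not part of the original reply): with non-overlapping patches the unfold/fold round trip is exact, but with overlap fold sums the contributions:

import torch
import torch.nn.functional as F

x = torch.arange(16.).view(1, 1, 4, 4)

# non-overlapping (stride == kernel_size): the round trip recovers x exactly
cols = F.unfold(x, kernel_size=2, stride=2)
print(torch.equal(F.fold(cols, output_size=(4, 4), kernel_size=2, stride=2), x))  # True

# overlapping (stride < kernel_size): positions covered by several patches are summed
cols = F.unfold(x, kernel_size=2, stride=1)
y = F.fold(cols, output_size=(4, 4), kernel_size=2, stride=1)
print(torch.equal(y, x))  # False: overlapped elements are counted multiple times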

Thanks for your reply. In fact, I had noticed these two functions in other related posts (e.g., Patch Making Does Pytorch have Anything to Offer?), but I still have several questions.

  1. In the example, the stride is set to 64, which makes (Size(in)-128)/64 an integer. But how should arbitrary strides that do not satisfy this (e.g. stride=40) be handled, with padding or some other strategy?
  2. When we get patches of size [B,C,nb_patches_h,nb_patches_w,kernel_size,kernel_size], we can certainly perform the operation on each patch by looping over the patch-grid dimensions 2 and 3. Is there a way to perform the operation on all the patches at once?
  3. Is the spatial neighborhood information preserved in the unfold and fold process? The reconstructed large tensor should average the overlapping elements of the patches, but I did not see any sign of that averaging step. (A sketch addressing these three points is given after this message.)

Many thanks for your help.
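For reference, here is one hedged sketch that addresses all three questions: pad so that every element is covered (question 1), batch all patches through the network in one forward pass (question 2), and divide the folded sum by a folded count tensor to average the overlaps (question 3). The `net(...)` call is a placeholder for the trained network, and reflection padding is just one possible choice, not something prescribed by the thread:

import torch
import torch.nn.functional as F

B, C, H, W = 2, 3, 1024, 1024
x = torch.randn(B, C, H, W)
kernel_size, stride = 128, 40  # a stride for which (1024 - 128) / stride is not an integer

# 1. pad on the right/bottom so that (size - kernel_size) becomes divisible
#    by stride, which guarantees every element falls into at least one patch
pad_h = (stride - (H - kernel_size) % stride) % stride
pad_w = (stride - (W - kernel_size) % stride) % stride
x_pad = F.pad(x, (0, pad_w, 0, pad_h), mode='reflect')
Hp, Wp = H + pad_h, W + pad_w

patches = x_pad.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
nb_h, nb_w = patches.shape[2], patches.shape[3]

# 2. run all patches through the network in a single batched forward pass
patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous()
patches = patches.view(-1, C, kernel_size, kernel_size)  # [B*nb_h*nb_w, C, 128, 128]
# patches = net(patches)  # placeholder: the trained reconstruction network

# 3. fold() sums overlapping elements, so divide by the fold of an
#    all-ones tensor to turn the sum into an average
patches = patches.view(B, nb_h * nb_w, C * kernel_size * kernel_size)
patches = patches.permute(0, 2, 1).contiguous()  # [B, C*k*k, L] as expected by fold
summed = F.fold(patches, output_size=(Hp, Wp), kernel_size=kernel_size, stride=stride)
ones = torch.ones_like(summed)
counts = F.fold(F.unfold(ones, kernel_size=kernel_size, stride=stride),
                output_size=(Hp, Wp), kernel_size=kernel_size, stride=stride)
output = (summed / counts)[:, :, :H, :W]  # average, then crop the padding off
print(output.shape)  # [B, C, 1024, 1024]

Because the padding guarantees full coverage, the count tensor is strictly positive everywhere, so the division is safe.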

 

Some other related discussions

https://github.com/pytorch/pytorch/pull/1523#issue-227526015

def max_pool2d(input, kernel_size, stride):
    kh, kw = kernel_size
    dh, dw = stride
    # get all image windows of size (kh, kw) and stride (dh, dw)
    input_windows = input.unfold(2, kh, dh).unfold(3, kw, dw)
    # view the windows as (kh * kw)
    input_windows = input_windows.contiguous().view(*input_windows.size()[:-2], -1)
    max_val, max_idx = input_windows.max(4)
    return max_val, max_idx
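
Assuming the function above is in scope, a quick check (my addition, not from the PR) that this unfold-based pooling matches the built-in version:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
max_val, max_idx = max_pool2d(x, kernel_size=(2, 2), stride=(2, 2))
print(max_val.shape)                                # [1, 3, 4, 4]
print(torch.equal(max_val, F.max_pool2d(x, 2, 2)))  # True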

 https://forums.fast.ai/t/split-large-satellite-image-into-tiles-patches/32039

OK, below is a simple implementation of a convolutional autoencoder using the PyTorch framework.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torchvision.utils import save_image

# hyperparameters
num_epochs = 100
batch_size = 128
learning_rate = 1e-3

# load the dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_dataset = dset.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# define the convolutional autoencoder
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=1)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=3, stride=2),
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, kernel_size=5, stride=3, padding=1),
            nn.ReLU(True),
            nn.ConvTranspose2d(8, 1, kernel_size=2, stride=2, padding=1),
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

# instantiate the model
model = AutoEncoder()

# loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# train the model
for epoch in range(num_epochs):
    for data in train_loader:
        img, _ = data
        optimizer.zero_grad()
        output = model(img)
        loss = criterion(output, img)
        loss.backward()
        optimizer.step()
    # save a batch of reconstructed images every 10 epochs
    if epoch % 10 == 0:
        save_image(output.data, './output/image_{}.png'.format(epoch))
    # print the loss
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))

In the code above, we first define an `AutoEncoder` class containing an `encoder` and a `decoder`. The `encoder` compresses the input image into a smaller representation, and the `decoder` decodes that representation back into a reconstructed image. `nn.Conv2d` and `nn.ConvTranspose2d` define the convolutional and transposed-convolutional layers, and `nn.MaxPool2d` performs the downsampling.

During training we use `nn.MSELoss` as the loss function and `optim.Adam` as the optimizer. A batch of reconstructed images is saved every 10 epochs, and the current loss value is printed.

Hope the code above helps!