Explanation of F.grid_sample in PyTorch

Latest recommended article published on 2025-01-10 13:23:15

dadaHaHa1234

Tags: pytorch, deep learning, python

Article link: https://blog.csdn.net/qq_32425195/article/details/107249146

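This article explains how to use PyTorch's F.grid_sample to warp images and sample feature maps. It covers building an identity sampling grid, applying offsets, handling samples that fall outside the image, and feature fusion examples.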

1. General usage

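First build an identity sampling grid (top-left corner (-1, -1), bottom-right corner (1, 1), called grid). Then add the x and y offsets to it to form a new sampling grid (flow_grid). Finally, call F.grid_sample with flow_grid to sample the original image, which yields trans_feature.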

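import torch
import torch.nn.functional as F

# Example inputs (sizes are illustrative): source feature map and per-pixel offsets
featSize = 568
predict_roof = torch.rand(1, 3, featSize, featSize)
predict_offset = torch.zeros(1, 2, featSize, featSize)
# Build the identity sampling grid: top-left is (-1, -1), bottom-right is (1, 1), centre is (0, 0)
gridY = torch.linspace(-1, 1, steps=featSize).view(1, -1, 1, 1).expand(1, featSize, featSize, 1)
gridX = torch.linspace(-1, 1, steps=featSize).view(1, 1, -1, 1).expand(1, featSize, featSize, 1)
grid = torch.cat((gridX, gridY), dim=3).type(predict_roof.type())
# predict_offset holds offsets in pixels; convert them to normalized grid units.
# With align_corners=True, one pixel spans 2 / (featSize - 1) in grid units.
predict_offset_xy = 2 * predict_offset / (featSize - 1)
predict_offset_xy = predict_offset_xy.permute(0, 2, 3, 1)
# flow_grid = grid + predict_offset_xy is the new sampling grid used to sample predict_roof
flow_grid = torch.clamp(predict_offset_xy + grid, min=-1, max=1)
trans_feature = F.grid_sample(predict_roof, flow_grid, align_corners=True)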
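
One detail worth checking is whether a grid built with linspace(-1, 1) really is an identity mapping. The sketch below (the helper name identity_grid is made up for illustration) suggests that it reproduces the input exactly only when align_corners=True; with align_corners=False the values -1 and 1 refer to the outer edges of the corner pixels, so the same grid samples between pixels.

```python
import torch
import torch.nn.functional as F

def identity_grid(height, width):
    """Return a (1, H, W, 2) grid whose last dimension is (x, y) in [-1, 1]."""
    ys = torch.linspace(-1, 1, steps=height).view(height, 1).expand(height, width)
    xs = torch.linspace(-1, 1, steps=width).view(1, width).expand(height, width)
    return torch.stack((xs, ys), dim=-1).unsqueeze(0)

image = torch.rand(1, 3, 8, 8)
grid = identity_grid(8, 8)

# align_corners=True: -1 and 1 are the centres of the corner pixels,
# which matches a grid built with linspace(-1, 1, steps=size).
same = F.grid_sample(image, grid, mode="bilinear", align_corners=True)

# align_corners=False: -1 and 1 are the outer edges of the corner pixels,
# so the same grid no longer lands on pixel centres.
resampled = F.grid_sample(image, grid, mode="bilinear", align_corners=False)

print(torch.allclose(same, image, atol=1e-5))
print(torch.allclose(resampled, image, atol=1e-3))
```

If align_corners=False is preferred, the identity grid would instead need pixel-centre coordinates, (2 * i + 1) / size - 1 for pixel index i.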

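Meaning of predict_offset:

Its shape is [B,2,H,W], where 2 is the pair of channels holding the x and y offsets and H, W are the spatial size (the offset has the same size as the image being warped).
Meaning of the values in predict_offset
predict_offset[0,0,50:100,50:100]=30 means that the region [50:100,50:100] of the image is filled from the region 30 pixels to its right (positive x direction). In other words, region [50:100,50:100] is filled with the content of region [80:130,50:100] (regions written in (x, y) order), while region [80:130,50:100] itself stays unchanged.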

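predict_offset[0,1,250:350,250:350]=-100 means that the region [250:350,250:350] of the image is filled from the region 100 pixels above it (negative y direction). In other words, region [250:350,250:350] is filled with the content of region [250:350,150:250], while region [250:350,150:250] itself stays unchanged.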

Summary: if predict_offset holds the value (x, y) at position [index_x, index_y], then the warped feature map is filled as trans_feature[index_x, index_y] = orig_feature[index_x + x, index_y + y].
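
The rule can be checked on a small tensor. This is a minimal sketch with made-up sizes: each pixel of orig stores its own column index, so the sampled values show directly which x position was read. Note that tensors are indexed as [row (y), column (x)], while the text above writes positions in (x, y) order.

```python
import torch
import torch.nn.functional as F

size = 16
# Every pixel stores its column index, so a sampled value reveals the source x.
orig = torch.arange(size, dtype=torch.float32).view(1, 1, 1, size)
orig = orig.expand(1, 1, size, size).contiguous()

# Identity grid, last dimension ordered (x, y).
ys = torch.linspace(-1, 1, steps=size).view(1, size, 1, 1).expand(1, size, size, 1)
xs = torch.linspace(-1, 1, steps=size).view(1, 1, size, 1).expand(1, size, size, 1)
grid = torch.cat((xs, ys), dim=3)

# Pixel offsets with shape [B, 2, H, W]; channel 0 is x, channel 1 is y.
offset = torch.zeros(1, 2, size, size)
offset[0, 0, 2:6, 2:6] = 3  # inside this block, read 3 pixels to the right

# Convert pixels to normalized units (align_corners=True convention).
offset_xy = (2 * offset / (size - 1)).permute(0, 2, 3, 1)
warped = F.grid_sample(orig, grid + offset_xy, align_corners=True)

# The block now shows the content that used to sit 3 columns to its right.
print(warped[0, 0, 2, :8])
```

Pixels outside the block have a zero offset, so they keep their original values.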

What happens when the warp is many-to-one or has no correspondence

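When could a many-to-one case occur?
The warped feature map is filled as trans_feature[index_x, index_y] = orig_feature[index_x + x, index_y + y]. A many-to-one conflict would require both trans_feature[index_x, index_y] = orig_feature[index_x + x1, index_y + y1] and trans_feature[index_x, index_y] = orig_feature[index_x + x2, index_y + y2], that is, predict_offset[index_x, index_y] = (x1, y1) and predict_offset[index_x, index_y] = (x2, y2) at the same time. Each output position stores exactly one offset, so this cannot happen. Several output positions may read the same source pixel, but that causes no conflict.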

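When could a position have no correspondence?
The warped feature map is filled as trans_feature[index_x, index_y] = orig_feature[index_x + x, index_y + y], so there is no corresponding source pixel whenever [index_x + x, index_y + y] falls outside the original image.
In that case F.grid_sample fills the value according to padding_mode, for example zeros or border (reflection is also accepted).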
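
The two padding modes can be compared on a tiny image. The sketch below uses made-up sizes and pushes every sample far to the left of the image. No clamp is applied here: clamping the grid to [-1, 1] before sampling, as the general-usage code does, behaves like border padding under align_corners=True, so zeros padding would never come into play.

```python
import torch
import torch.nn.functional as F

image = torch.arange(1.0, 17.0).view(1, 1, 4, 4)  # values 1..16, none are zero

ys = torch.linspace(-1, 1, steps=4).view(1, 4, 1, 1).expand(1, 4, 4, 1)
xs = torch.linspace(-1, 1, steps=4).view(1, 1, 4, 1).expand(1, 4, 4, 1)

# Shift every sample 10 pixels to the left, far outside the 4-pixel-wide image.
shift = 2 * 10 / (4 - 1)
grid = torch.cat((xs - shift, ys), dim=3)

# zeros: samples outside the image contribute 0.
zeros_out = F.grid_sample(image, grid, padding_mode="zeros", align_corners=True)
# border: samples outside the image are clipped to the nearest border pixel.
border_out = F.grid_sample(image, grid, padding_mode="border", align_corners=True)

print(zeros_out[0, 0])
print(border_out[0, 0])
```

With zeros padding the whole output is 0; with border padding every row repeats the value of its leftmost pixel.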

Figure: predict_offset[0,0,50:100,50:100]=-150 with padding mode zeros.
Figure: predict_offset[0,0,50:100,50:100]=-150 with padding mode border.

Feature fusion examples

Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video

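Network pipeline:
(1) For key frame k, extract features with the larger network (NRfeat), then obtain the classification result with NRtask.
(2) For frame i, extract features with the smaller network (NUfeat), then obtain the classification result with NUtask.
(3) Fuse the NRtask and NUtask classification results with a 1*1 convolution.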
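
Step (3) can be sketched as follows. This is only an illustration of fusing two score maps with a 1*1 convolution, not the Accel authors' implementation; the class name ScoreFusion, the channel-wise concatenation, and the tensor sizes are all assumptions.

```python
import torch
import torch.nn as nn

class ScoreFusion(nn.Module):
    """Hypothetical fusion head: concatenate two score maps, mix with a 1x1 conv."""

    def __init__(self, num_classes):
        super().__init__()
        # 2 * num_classes input channels: one score map from each branch.
        self.fuse = nn.Conv2d(2 * num_classes, num_classes, kernel_size=1)

    def forward(self, ref_scores, upd_scores):
        stacked = torch.cat((ref_scores, upd_scores), dim=1)
        return self.fuse(stacked)

num_classes = 19  # illustrative value
ref_scores = torch.rand(1, num_classes, 32, 32)  # stands in for the NRtask output
upd_scores = torch.rand(1, num_classes, 32, 32)  # stands in for the NUtask output
fused = ScoreFusion(num_classes)(ref_scores, upd_scores)
print(fused.shape)
```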

Deep Feature Flow for Video Recognition

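Network pipeline:
For key frame k:
(1) Extract features with Nfeat and produce the object detection or semantic segmentation result with Ntask. For semantic segmentation, Ntask is a 1*1 convolution; for object detection, Ntask is bbox + convolutional classification + convolutional bounding-box regression.
For a non-key frame i:
(1) Use the optical flow network (F in the paper's figure) to estimate the flow from frame k to frame i, then warp the features of frame k to the current frame i according to that flow and use them as the current frame's features.
(2) Apply Ntask to the features of the current frame i to perform object detection or semantic segmentation.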
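
The warping step for non-key frames maps naturally onto F.grid_sample. Below is a minimal sketch under stated assumptions, not the paper's code: the function name warp_features is made up, the flow is assumed to be in pixel units with channel 0 as x and channel 1 as y, and it is assumed to give, for each position in frame i, the displacement to the matching position in frame k (backward warping). A real flow network may use a different convention or resolution.

```python
import torch
import torch.nn.functional as F

def warp_features(key_feat, flow):
    """Warp key_feat (N, C, H, W) with a pixel-unit flow field (N, 2, H, W).

    The output at (y, x) samples key_feat at (y + flow_y, x + flow_x).
    """
    n, _, h, w = key_feat.shape
    ys = torch.linspace(-1, 1, steps=h, device=key_feat.device, dtype=key_feat.dtype)
    xs = torch.linspace(-1, 1, steps=w, device=key_feat.device, dtype=key_feat.dtype)
    grid_y = ys.view(1, h, 1).expand(n, h, w)
    grid_x = xs.view(1, 1, w).expand(n, h, w)
    # Convert pixel displacements to normalized units (align_corners=True).
    flow_x = 2 * flow[:, 0] / max(w - 1, 1)
    flow_y = 2 * flow[:, 1] / max(h - 1, 1)
    grid = torch.stack((grid_x + flow_x, grid_y + flow_y), dim=3)
    return F.grid_sample(key_feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

key_feat = torch.rand(2, 8, 12, 20)    # features of key frame k
zero_flow = torch.zeros(2, 2, 12, 20)  # no motion between the frames
cur_feat = warp_features(key_feat, zero_flow)
print(torch.allclose(cur_feat, key_feat, atol=1e-5))
```

With zero flow the warped features equal the key-frame features, which is a quick sanity check before plugging in a real flow estimate.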

Gradient backpropagation through F.grid_sample

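import torch
import torch.nn.functional as F

# Both the feature map and the sampling grid are leaf tensors that require gradients
featuremap1_roof = torch.rand((1, 3, 568, 568), requires_grad=True)
flowCoarse2 = torch.rand((1, 568, 568, 2), requires_grad=True)
predict_foot = F.grid_sample(featuremap1_roof, flowCoarse2, align_corners=False)
predict_foot.sum().backward()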

F.grid_sample takes two arguments, the feature map and the sampling grid used for warping, and it backpropagates gradients to both of them.
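
This can be confirmed by inspecting the .grad attribute of both inputs after the backward pass. The sizes below are made up and smaller than in the snippet above so that the check runs quickly.

```python
import torch
import torch.nn.functional as F

feature = torch.rand(1, 3, 16, 16, requires_grad=True)
# Grid values in [-1, 1]; created as a leaf tensor so that its .grad is retained.
grid = (torch.rand(1, 16, 16, 2) * 2 - 1).requires_grad_(True)

out = F.grid_sample(feature, grid, mode="bilinear", align_corners=True)
out.sum().backward()

# Both inputs receive a gradient with the same shape as the input itself.
feature_grad_ok = feature.grad is not None and feature.grad.shape == feature.shape
grid_grad_ok = grid.grad is not None and grid.grad.shape == grid.shape
print(feature_grad_ok, grid_grad_ok)
```

Because the grid receives gradients, a network that predicts offsets or flow can be trained end to end through the sampling step.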

F.grid_sample can be used to sample data at points

def point_sample(input, point_coords, **kwargs):
    """
    A wrapper around :function:`torch.nn.functional.grid_sample` to support 3D point_coords tensors.
    Unlike :function:`torch.nn.functional.grid_sample` it assumes `point_coords` to lie inside
    [0, 1] x [0, 1] square.

    Args:
        input (Tensor): A tensor of shape (N, C, H, W) that contains features map on a H x W grid.
        point_coords (Tensor): A tensor of shape (N, P, 2) or (N, Hgrid, Wgrid, 2) that contains
        [0, 1] x [0, 1] normalized point coordinates.

    Returns:
        output (Tensor): A tensor of shape (N, C, P) or (N, C, Hgrid, Wgrid) that contains
            features for points in `point_coords`. The features are obtained via bilinear
            interpolation from `input` the same way as :function:`torch.nn.functional.grid_sample`.
    """
    add_dim = False
    if point_coords.dim() == 3:
        add_dim = True
        point_coords = point_coords.unsqueeze(2)
    output = F.grid_sample(input, 2.0 * point_coords - 1.0, **kwargs)
    if add_dim:
        output = output.squeeze(3)
    return output
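
A short usage sketch for point_sample follows. The wrapper is repeated in condensed form only so that the snippet runs on its own; the feature values and point coordinates are made up. With align_corners=False, the centre of pixel (row i, column j) sits at ((j + 0.5) / W, (i + 0.5) / H) in the [0, 1] x [0, 1] coordinates the wrapper expects.

```python
import torch
import torch.nn.functional as F

def point_sample(input, point_coords, **kwargs):
    # Condensed copy of the wrapper above, kept here so the snippet is runnable.
    add_dim = point_coords.dim() == 3
    if add_dim:
        point_coords = point_coords.unsqueeze(2)
    output = F.grid_sample(input, 2.0 * point_coords - 1.0, **kwargs)
    return output.squeeze(3) if add_dim else output

features = torch.arange(12.0).view(1, 1, 3, 4)  # H = 3, W = 4, values 0..11

# Points are (x, y) pairs in [0, 1]; shape (N, P, 2).
points = torch.tensor([[[0.5 / 4, 0.5 / 3],    # centre of pixel (row 0, col 0)
                        [3.5 / 4, 2.5 / 3],    # centre of pixel (row 2, col 3)
                        [1.0 / 4, 0.5 / 3]]])  # halfway between cols 0 and 1, row 0

sampled = point_sample(features, points, align_corners=False)
print(sampled.shape)  # (N, C, P)
print(sampled)        # approximately [0, 11, 0.5]
```

The third point lies between two pixel centres, so its value is the bilinear average of the two neighbours.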