Understanding PyTorch's `F.grid_sample` in One Article

微凉的衣柜

Last modified 2024-06-11 22:11:07

Category: Deep Learning. Tags: pytorch, artificial intelligence, python

First published 2024-06-11 20:06:28

Article link: https://blog.csdn.net/weixin_41496173/article/details/139607685

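Deep learning and computer vision tasks often need to sample or warp images and feature maps. PyTorch's `F.grid_sample` does this: given a set of coordinates, it reads the values of an input tensor at those points. This article shows how to call `F.grid_sample` and works through two concrete examples by hand to explain what it computes.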

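What is `F.grid_sample`?

`F.grid_sample` is a PyTorch function that samples an input tensor at the locations given by a coordinate grid. It is commonly used for image warping, data augmentation, and similar tasks. Because the requested locations usually fall between pixels, the function interpolates; by default (`mode='bilinear'`) it uses bilinear interpolation over the four nearest pixels.

Example code

The following code uses `F.grid_sample` to sample four points from a 4x4 input tensor: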

import torch
import torch.nn.functional as F

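# Define a 4x4 input tensor with shape (N=1, C=1, H=4, W=4)
input_tensor = torch.tensor([[[[1, 2, 3, 4],
                               [5, 6, 7, 8],
                               [9, 10, 11, 12],
                               [13, 14, 15, 16]]]], dtype=torch.float)

# Define the sample points as normalized (x, y) coordinates in [-1, 1]
# The coordinates here are fractional, so each sample falls between pixels
grid = torch.tensor([[[[-0.5, -0.5], [0.5, -0.5]],
                      [[-0.5, 0.5], [0.5, 0.5]]]], dtype=torch.float)

# Sample with F.grid_sample (bilinear interpolation by default)
output = F.grid_sample(input_tensor, grid, align_corners=True)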

print(output)
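
One detail that is easy to miss in the example above is the ordering of the last grid dimension: each entry is `(x, y)`, where x runs along the width (columns) and y along the height (rows). The sketch below illustrates this with the same 4x4 values; the names `img` and `corner_grid` are my own and not part of the original example. It samples a single point at the top-right corner.

```python
import torch
import torch.nn.functional as F

# Same 4x4 values as above, shape (N=1, C=1, H=4, W=4).
img = torch.arange(1, 17, dtype=torch.float).reshape(1, 1, 4, 4)

# One sample point, shape (N=1, H_out=1, W_out=1, 2); the last dim is (x, y).
# x = 1 is the rightmost column, y = -1 is the top row (align_corners=True).
corner_grid = torch.tensor([[[[1.0, -1.0]]]])

corner = F.grid_sample(img, corner_grid, align_corners=True)
print(corner.shape)   # torch.Size([1, 1, 1, 1])
print(corner.item())  # 4.0, i.e. img[0, 0, 0, 3] (row 0, column 3)
```

The output shape is `(N, C, H_out, W_out)`: the batch and channel sizes come from the input, the spatial size comes from the grid.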

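How the result is computed

The input tensor has spatial size 4x4 (H = W = 4) and the sample coordinates are normalized to [-1, 1]. With `align_corners=True`, as in the code above, -1 maps to the center of the first pixel and 1 to the center of the last one, so the normalized range corresponds to pixel coordinates in [0, 3].

Converting normalized coordinates

For `align_corners=True`, normalized coordinates are converted to pixel coordinates as follows, where x indexes the width (columns) and y indexes the height (rows):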
$x_{\text{input}} = \frac{(x_{\text{grid}} + 1) \cdot (W - 1)}{2}$
$y_{\text{input}} = \frac{(y_{\text{grid}} + 1) \cdot (H - 1)}{2}$
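
The conversion can be sketched as a small helper. This is an illustration rather than PyTorch's own code, and the function name `unnormalize` is hypothetical. The `align_corners=True` branch is the formula above; the `align_corners=False` branch is not covered in this article and follows the convention PyTorch documents for its default setting, where -1 and 1 refer to the outer edges of the border pixels.

```python
def unnormalize(coord: float, size: int, align_corners: bool = True) -> float:
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1) * (size - 1) / 2
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2


print(unnormalize(-0.5, 4))                       # 0.75
print(unnormalize(0.5, 4))                        # 2.25
print(unnormalize(-0.5, 4, align_corners=False))  # 0.5
```

The same normalized coordinate lands on a different pixel position depending on `align_corners`, so hand calculations must use the same setting as the `F.grid_sample` call.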

Worked example 1: normalized sample point `[-0.5, -0.5]`

For the normalized sample point [-0.5, -0.5], first convert to pixel coordinates in the input tensor:

$x_{\text{input}} = \frac{(-0.5 + 1) \cdot (4 - 1)}{2} = \frac{0.5 \cdot 3}{2} = 0.75$
$y_{\text{input}} = \frac{(-0.5 + 1) \cdot (4 - 1)}{2} = \frac{0.5 \cdot 3}{2} = 0.75$

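So the normalized coordinate [-0.5, -0.5] corresponds to the pixel coordinate (x, y) = (0.75, 0.75).

This point lies between columns 0 and 1 and between rows 0 and 1, so the four surrounding pixels, written as (row, column), are:

top-left pixel (0, 0)
top-right pixel (0, 1)
bottom-left pixel (1, 0)
bottom-right pixel (1, 1)

Compute the weights wx and wy:

wx = 0.75 (the fractional part of the x coordinate)
wy = 0.75 (the fractional part of the y coordinate)

Apply the bilinear interpolation formula: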

top_left = input_tensor[0, 0, 0, 0]  # 1
top_right = input_tensor[0, 0, 0, 1]  # 2
bottom_left = input_tensor[0, 0, 1, 0]  # 5
bottom_right = input_tensor[0, 0, 1, 1]  # 6

value = (1 - 0.75) * (1 - 0.75) * 1 + 0.75 * (1 - 0.75) * 2 + (1 - 0.75) * 0.75 * 5 + 0.75 * 0.75 * 6
# = 0.0625 + 0.375 + 0.9375 + 3.375 = 4.75
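
The hand calculation above can be wrapped in a short pure-Python function. This is a minimal sketch under stated assumptions: the name `bilinear` is hypothetical, the image is a nested list, and the point is assumed to lie inside the image (no padding is handled, unlike `F.grid_sample`).

```python
import math


def bilinear(img, x, y):
    """Bilinearly interpolate a 2D list `img` at pixel coordinates (x, y).

    x indexes columns and y indexes rows. Assumes 0 <= x <= W-1, 0 <= y <= H-1.
    """
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    # Clamp the far neighbours so a point exactly on the last row/column works.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0  # fractional parts
    top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
    bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
    return (1 - wy) * top + wy * bottom


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(bilinear(img, 0.75, 0.75))  # 4.75
```

Interpolating along x first and then along y gives the same result as the four-term weighted sum, since the weights are just the products of the per-axis factors.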

Worked example 2: normalized sample point `[0.5, -0.5]`

For the normalized sample point [0.5, -0.5], first convert to pixel coordinates in the input tensor:

$x_{\text{input}} = \frac{(0.5 + 1) \cdot (4 - 1)}{2} = \frac{1.5 \cdot 3}{2} = 2.25$
$y_{\text{input}} = \frac{(-0.5 + 1) \cdot (4 - 1)}{2} = \frac{0.5 \cdot 3}{2} = 0.75$

So the normalized coordinate [0.5, -0.5] corresponds to the pixel coordinate (x, y) = (2.25, 0.75).

Since x indexes columns and y indexes rows, this point lies between columns 2 and 3 and between rows 0 and 1. The four surrounding pixels, written as (row, column), are:

top-left pixel (0, 2)
top-right pixel (0, 3)
bottom-left pixel (1, 2)
bottom-right pixel (1, 3)

Compute the weights wx and wy:

wx = 0.25 (the fractional part of the x coordinate)
wy = 0.75 (the fractional part of the y coordinate)

Apply the bilinear interpolation formula:

top_left = input_tensor[0, 0, 0, 2]  # 3
top_right = input_tensor[0, 0, 0, 3]  # 4
bottom_left = input_tensor[0, 0, 1, 2]  # 7
bottom_right = input_tensor[0, 0, 1, 3]  # 8

value = (1 - 0.25) * (1 - 0.75) * 3 + 0.25 * (1 - 0.75) * 4 + (1 - 0.25) * 0.75 * 7 + 0.25 * 0.75 * 8
# = 0.75 * 0.25 * 3 + 0.25 * 0.25 * 4 + 0.75 * 0.75 * 7 + 0.25 * 0.75 * 8
# = 0.5625 + 0.25 + 3.9375 + 1.5 = 6.25
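
Hand calculations like these are easy to get wrong, in particular by swapping rows and columns, so it is worth checking them against PyTorch directly. The sketch below rebuilds the same input and grid as the example code (using `torch.arange` as a shorthand, which is my own choice) and prints all four sampled values.

```python
import torch
import torch.nn.functional as F

# Same 4x4 input as in the example code, shape (1, 1, 4, 4).
input_tensor = torch.arange(1, 17, dtype=torch.float).reshape(1, 1, 4, 4)

# Same grid as in the example code; each entry is a normalized (x, y) pair.
grid = torch.tensor([[[[-0.5, -0.5], [0.5, -0.5]],
                      [[-0.5, 0.5], [0.5, 0.5]]]], dtype=torch.float)

output = F.grid_sample(input_tensor, grid, mode="bilinear", align_corners=True)

# output[0, 0, i, j] is the value sampled at grid[0, i, j].
print(output.squeeze().tolist())  # [[4.75, 6.25], [10.75, 12.25]]
```

The entry at position `[0][1]` of the printed list is the sample for the grid point [0.5, -0.5], and the entry at `[0][0]` is the sample for [-0.5, -0.5].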