An Illustrated Guide to How Convolution Is Computed, and How to Use fold and unfold in PyTorch

子燕若水

Article link: https://blog.csdn.net/u010087338/article/details/113666140

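This article describes how convolution is implemented in PyTorch, covering the prepackaged nn.Conv2d and how to build a custom convolution from unfold, matmul, and fold. nn.Unfold splits the input tensor into small blocks, matmul performs the matrix multiplication, and fold restores the result to a feature map; the whole process is equivalent to a convolution. The article traces how the input and output dimensions change at each step.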


In PyTorch, the convolution we usually use is a prepackaged one such as nn.Conv2d, which corresponds to the top-left panel of the schematic.

If this prepackaged convolution does not allow the finer-grained control we want, PyTorch also provides three operations: unfold, matmul, and fold (conv = unfold + matmul + fold).

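torch.nn.Unfold is the bottom-middle panel of the schematic: it splits a 3-D tensor (feature) into w_1*h_1 parts (kernel_size-sized blocks) and flattens each part that is about to be multiplied with the kernel. The constructor takes the following parameters: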

torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)
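The number of blocks per spatial dimension follows the usual sliding-window formula, floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1. The sketch below checks that formula against nn.Unfold. The helper name num_blocks and all concrete sizes are assumptions chosen only for illustration.

```python
import torch
from torch import nn


def num_blocks(size, kernel_size, dilation=1, padding=0, stride=1):
    """Number of sliding-block positions along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Hypothetical sizes, not taken from the article.
h_0, w_0 = 10, 12
unfold = nn.Unfold(kernel_size=(4, 5), dilation=1, padding=1, stride=2)

x = torch.randn(1, 3, h_0, w_0)   # batched input: (N, c_0, h_0, w_0)
cols = unfold(x)                  # (N, c_0*h_k*w_k, h_1*w_1)

h_1 = num_blocks(h_0, 4, padding=1, stride=2)
w_1 = num_blocks(w_0, 5, padding=1, stride=2)
print(h_1, w_1, cols.shape)
```

The last dimension of the unfolded tensor should equal h_1*w_1, the total number of block positions.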

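Now look at the input and output of unfold. For an input of shape (c_0, w_0, h_0), the output has shape (c_0*w_k*h_k, w_1*h_1). In PyTorch itself the tensors carry a leading batch dimension and height comes before width, so the input is (N, c_0, h_0, w_0) and the output is (N, c_0*h_k*w_k, h_1*w_1).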
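To make the shapes concrete, here is a small sketch using the functional form torch.nn.functional.unfold. The sizes are hypothetical; the point is only that each column of the result is one kernel-sized patch, flattened in (channel, row, column) order.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration.
c_0, h_0, w_0 = 3, 10, 12
h_k, w_k = 4, 5

x = torch.randn(1, c_0, h_0, w_0)            # batched input with N = 1
cols = F.unfold(x, kernel_size=(h_k, w_k))   # (1, c_0*h_k*w_k, h_1*w_1)
print(cols.shape)                            # torch.Size([1, 60, 56])

# Column 0 is the top-left patch of the input, flattened.
first_patch = x[0, :, :h_k, :w_k].reshape(-1)
print(torch.equal(cols[0, :, 0], first_patch))
```

With no padding and stride 1, h_1 = 10 - 4 + 1 = 7 and w_1 = 12 - 5 + 1 = 8, which gives the 56 columns.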

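After unfold, we construct c_1 learnable tensors of shape (c_0, w_k, h_k) as the kernels and flatten them, as in the bottom-left panel, into a matrix of shape (c_1, c_0*w_k*h_k). Note that c_1 here is the number of kernels.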

We then use PyTorch's built-in matmul to multiply the flattened kernels by the unfolded input tensor (c_0*w_k*h_k, w_1*h_1), which gives the Output Maps with shape (c_1, w_1*h_1).
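The kernel flattening and the matrix product can be sketched as follows. The sizes are again hypothetical, and the kernels are a plain random tensor here; in a trainable layer they would presumably be wrapped in nn.Parameter.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration.
c_0, c_1 = 3, 2
h_k, w_k = 4, 5

x = torch.randn(1, c_0, 10, 12)
kernels = torch.randn(c_1, c_0, h_k, w_k)    # c_1 kernels, each (c_0, h_k, w_k)

cols = F.unfold(x, kernel_size=(h_k, w_k))   # (1, 60, 56)
w_flat = kernels.view(c_1, -1)               # (c_1, c_0*h_k*w_k) = (2, 60)
out_maps = w_flat @ cols                     # broadcasts over batch: (1, 2, 56)
print(out_maps.shape)

# One entry of the result is the dot product of one kernel with one patch.
manual = (kernels[0] * x[0, :, :h_k, :w_k]).sum()
print(torch.allclose(out_maps[0, 0, 0], manual, atol=1e-4))
```

Because matmul broadcasts the 2-D kernel matrix over the batch dimension, the same line works for any batch size.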

After the multiplication, which can run in parallel on the GPU, the Output Maps still have to be turned back into a tensor (feature) with fold. For this PyTorch provides:

torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)

# Combines an array of sliding local blocks into a large containing tensor.

# output_size (int or tuple) – the shape of the spatial dimensions of the output (i.e., output.sizes()[2:])

# kernel_size (int or tuple) – the size of the sliding blocks

# stride (int or tuple) – the stride of the sliding blocks in the input spatial dimensions. Default: 1

# padding (int or tuple, optional) – implicit zero padding to be added on both sides of input. Default: 0

# dilation (int or tuple, optional) – a parameter that controls the stride of elements within the neighborhood. Default: 1
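In general, fold is not a plain reshape: where blocks overlap, it sums the values that land on the same output position. The sketch below illustrates this with hypothetical sizes by folding an unfolded tensor and comparing it with the input scaled by the per-pixel overlap count.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration.
x = torch.randn(1, 3, 6, 6)
k = (3, 3)

cols = F.unfold(x, kernel_size=k)                         # (1, 27, 16)
recon = F.fold(cols, output_size=(6, 6), kernel_size=k)   # overlapping values are summed

# Count how many blocks cover each pixel by folding an unfolded tensor of ones.
ones_cols = F.unfold(torch.ones_like(x), kernel_size=k)
counts = F.fold(ones_cols, output_size=(6, 6), kernel_size=k)

print(counts[0, 0])                        # 1 at the corners, 9 in the interior
print(torch.allclose(recon, x * counts))
```

So fold(unfold(x)) returns x multiplied by the overlap count, and it equals x only when the blocks do not overlap.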

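The input to fold is the Output Maps of shape (c_1, w_1*h_1), and the output is a tensor of shape (c_1, w_1, h_1). This particular mapping is a plain reshape, which corresponds to calling Fold with kernel_size=1 and output_size set to the spatial size of the output map. On batched data the shapes are (N, c_1, h_1*w_1) and (N, c_1, h_1, w_1).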
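Putting the three steps together, the following sketch compares unfold + matmul + fold against torch.nn.functional.conv2d. It assumes no padding, stride 1, and no bias, and the tensor sizes are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration.
x = torch.randn(1, 3, 10, 12)    # (N, c_0, h_0, w_0)
w = torch.randn(2, 3, 4, 5)      # (c_1, c_0, h_k, w_k)

# 1) unfold: every kernel-sized patch becomes a column
cols = F.unfold(x, kernel_size=(4, 5))        # (1, 60, 56)

# 2) matmul: flattened kernels times unfolded input
out_maps = w.view(w.size(0), -1) @ cols       # (1, 2, 56)

# 3) fold with a 1x1 kernel: restore the spatial layout of the output map
h_1, w_1 = 10 - 4 + 1, 12 - 5 + 1             # 7, 8
out_fold = F.fold(out_maps, output_size=(h_1, w_1), kernel_size=(1, 1))

# The same result via a plain view, which avoids the extra fold call
out_view = out_maps.view(1, 2, h_1, w_1)

ref = F.conv2d(x, w)
print((ref - out_fold).abs().max())           # small floating-point difference
```

The difference from conv2d should be at the level of floating-point rounding, since the two paths accumulate the same products in a different order.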