Repost: Depthwise Separable Convolution and a Worked Computation-Cost Example

万物琴弦光锥之外

Categories: Machine Learning, Neural Networks. Tags: neural networks, deep learning

Original link: https://blog.csdn.net/makefish/article/details/88716534

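Summary up front:

1. Split the channels and convolve each one on its own => (H,W,C)
2. Then apply a $1\times1\times C$ convolution => (H,W,1)
3. The separation greatly reduces the amount of computation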

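First, a worked example

An original image of shape (3,12,12) convolved with 256 kernels of shape (3,5,5) gives an output of shape (256,8,8) (stride=1, no padding, so each side is $12-5+1=8$)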

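Cost of the standard convolution:
- Recall how convolution works: one kernel slides over the original image to $8\times8$ positions, and each position costs $3\times5\times5$ multiplications.
- There are 256 such kernels, so the total is $256\times(3\times5\times5)\times(8\times8)=1228800$

Cost of the depthwise separable convolution: $3\times(1\times5\times5)\times(8\times8)+256\times(3\times1\times1)\times(8\times8)=53952$
- Convolve each channel separately with a single-channel kernel:
  - 3 kernels of size $1\times5\times5$ scan the original image and produce $3\times8\times8$, for a total of $3\times(1\times5\times5)\times(8\times8)=4800$
- The $3\times8\times8$ result is scanned by 256 kernels of size $3\times1\times1$ to produce $256\times8\times8$, so this step costs $256\times(3\times1\times1)\times(8\times8)=49152$
- The total is therefore $53952$, about $0.044$ times the cost of the standard convolution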
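
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names `standard_conv_mults` and `separable_conv_mults` are made up for illustration, and the count covers multiplications only (no additions, no biases), which is the same convention the worked example uses.

```python
def standard_conv_mults(in_ch, out_ch, k, out_h, out_w):
    """Multiplications for a standard convolution (hypothetical helper)."""
    # Each of the out_ch kernels does in_ch*k*k multiplications per output position.
    return out_ch * (in_ch * k * k) * (out_h * out_w)


def separable_conv_mults(in_ch, out_ch, k, out_h, out_w):
    """Multiplications for a depthwise separable convolution (hypothetical helper)."""
    depthwise = in_ch * (k * k) * (out_h * out_w)   # one k x k kernel per channel
    pointwise = out_ch * in_ch * (out_h * out_w)    # out_ch kernels of size 1 x 1 x in_ch
    return depthwise + pointwise


out_size = 12 - 5 + 1  # stride 1, no padding
standard = standard_conv_mults(3, 256, 5, out_size, out_size)
separable = separable_conv_mults(3, 256, 5, out_size, out_size)
print(standard, separable, round(separable / standard, 3))  # 1228800 53952 0.044
```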

Next, a picture to build intuition

Image reposted from:
[Figure]

The following articles explain how depthwise separable convolution works:
https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728
https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/
Much of this post is compiled from these two articles.

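Convolution basics

A 2D matrix is described by row and col. A 3D array is described by channel, row and col. A 4D array adds one more parameter: batch, channel, row and col. The order in which batch, channel, row and col are laid out depends on the data format; the common ones are NHWC and NCHW: https://mp.weixin.qq.com/s/I4Q1Bv7yecqYXUra49o7tw?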
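
As a small illustration of the two layouts, the sketch below converts a NumPy array from NHWC to NCHW with `np.transpose`. The array sizes (a batch of 2 images, 12x12, 3 channels) and the variable names are assumptions chosen for the example, not something prescribed by the linked article.

```python
import numpy as np

# A batch of 2 images, 12 rows x 12 cols, 3 channels, stored as NHWC.
batch_nhwc = np.arange(2 * 12 * 12 * 3).reshape(2, 12, 12, 3)

# Move the channel axis in front of the row/col axes: NHWC -> NCHW.
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

print(batch_nhwc.shape)  # (2, 12, 12, 3)
print(batch_nchw.shape)  # (2, 3, 12, 12)

# The same element, addressed in each layout.
n, h, w, c = 1, 4, 7, 2
print(batch_nhwc[n, h, w, c] == batch_nchw[n, c, h, w])  # True
```

Only the index order changes; the values themselves are the same.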

2D convolution

2D convolution only involves col and row. (Omitted.)

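3D and 4D convolution

Let us look at 3D convolution first.
Suppose the filter window is 3x3x3 (one of the 3s is in_depth). There are four such windows, used to extract four attributes of the same image (indexed by out_depth 0…3, corresponding to output channels 0…3). Then, for one batch of the input (say Batch 0), the four filter windows are applied as shown below (note that all four filters operate on the same data, namely Batch 0):
[Figure]
In the reference source code, i and j index an arbitrary position in one output attribute. The value at that position is obtained by convolving the window with the input.

Reference source code (copied from the cited article):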

import numpy as np


def conv2d_multi_channel(input, w):
    """Two-dimensional convolution with multiple channels.

    Uses SAME padding with 0s, a stride of 1 and no dilation.

    input: input array with shape (height, width, in_depth)
    w: filter array with shape (fd, fd, in_depth, out_depth) with odd fd.
       in_depth is the number of input channels, and has to be the same as
       input's in_depth; out_depth is the number of output channels.

    Returns a result with shape (height, width, out_depth).
    """
    assert w.shape[0] == w.shape[1] and w.shape[0] % 2 == 1

    padw = w.shape[0] // 2
    padded_input = np.pad(input,
                          pad_width=((padw, padw), (padw, padw), (0, 0)),
                          mode='constant',
                          constant_values=0)

    height, width, in_depth = input.shape
    assert in_depth == w.shape[2]
    out_depth = w.shape[3]
    output = np.zeros((height, width, out_depth))

    for out_c in range(out_depth):
        # For each output channel, perform 2d convolution summed across all
        # input channels.
        for i in range(height):
            for j in range(width):
                # Now the inner loop also works across all input channels:
                # the window is multiplied and accumulated against every
                # channel of the input.
                for c in range(in_depth):
                    # The two loops below could be factored into a helper
                    # that computes one output attribute's convolution.
                    for fi in range(w.shape[0]):
                        for fj in range(w.shape[1]):
                            w_element = w[fi, fj, c, out_c]
                            output[i, j, out_c] += (
                                padded_input[i + fi, j + fj, c] * w_element)
    return output

The so-called 4D case simply repeats the process above for every batch.
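
A minimal sketch of the 4D case follows. To stay self-contained it uses its own compact single-image convolution, `conv2d_same` (SAME zero padding, stride 1, the same assumptions as the reference code), and a wrapper `conv2d_batch` that loops over the batch axis. Both names are hypothetical and the NHWC batch layout is an assumption.

```python
import numpy as np


def conv2d_same(image, w):
    """SAME-padded, stride-1 convolution of one (H, W, in_depth) image.

    w has shape (fd, fd, in_depth, out_depth) with odd fd.
    """
    fd = w.shape[0]
    pad = fd // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    height, width, _ = image.shape
    out = np.zeros((height, width, w.shape[3]))
    for fi in range(fd):
        for fj in range(fd):
            # Shifted view of the input times one filter tap,
            # summed over in_depth by the matrix product.
            window = padded[fi:fi + height, fj:fj + width, :]
            out += window @ w[fi, fj]  # (H, W, in) @ (in, out) -> (H, W, out)
    return out


def conv2d_batch(batch, w):
    """Apply the same filters to every image of an (N, H, W, in_depth) batch."""
    return np.stack([conv2d_same(image, w) for image in batch])


rng = np.random.default_rng(0)
batch = rng.standard_normal((2, 4, 4, 3))   # 2 images, 4x4, 3 channels
filters = rng.standard_normal((3, 3, 3, 4))  # four 3x3x3 windows
result = conv2d_batch(batch, filters)
print(result.shape)  # (2, 4, 4, 4)
```

Every image in the batch is processed with the same four windows; only the input data changes from one batch entry to the next.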

Reference:
https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/

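Standard convolution

The original image is two-dimensional, 12x12 in size. Because it is in RGB format it has three channels, so it is effectively a 3-dimensional image, and the input shape is 12x12x3. The filter window is 5x5x3. With stride 1, the output image is then 8x8x1 (padding mode is valid, so each side is 12 - 5 + 1 = 8).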

12x12x3 * 5x5x3 => 8x8x1

[Figure]

A single 5x5x3 filter produces an 8x8x1 output image, which captures only one attribute of the image. To capture more attributes, for example 256 of them, we need:

12x12x3 * 5x5x3x256 => 8x8x256

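As shown in the figure below (taken from the original site; I think the 8x8x256 cube would be better drawn as 256 separate 8x8x1 slices, because they are not one block but represent 256 attributes):

[Figure]

The problem with standard convolution is that each kernel is designed to span all channels of the image (the total number of channels is the depth). So every additional attribute to detect requires one more full kernel. For standard convolution, therefore, total number of parameters = number of attributes x size of one kernel.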
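
To put numbers on this, here is a small sketch of the output-size rule for valid padding and the parameter count of a standard convolution. The helper names `valid_output_size` and `standard_conv_params` are invented for illustration, and biases are ignored.

```python
def valid_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis for 'valid' padding (no padding)."""
    return (input_size - kernel_size) // stride + 1


def standard_conv_params(kernel_h, kernel_w, in_depth, num_attributes):
    """Weights in a standard convolution, ignoring biases."""
    return num_attributes * (kernel_h * kernel_w * in_depth)


side = valid_output_size(12, 5)
print(side)                                # 8
print(standard_conv_params(5, 5, 3, 1))    # 75
print(standard_conv_params(5, 5, 3, 256))  # 19200
```

The parameter count grows linearly with the number of attributes, at 5x5x3 = 75 weights per attribute.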

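Depthwise separable convolution

Depthwise separable convolution takes a different approach. A standard kernel convolves all 3 channels at once; that is, at each position the 3 channels yield a single number after one convolution.
Depthwise separable convolution works in two steps:

In the first step, three kernels convolve the three channels separately, so one convolution yields 3 numbers.
These three numbers then pass through a 1x1x3 kernel (the pointwise kernel) to yield one number.

So depthwise separable convolution is really carried out as two convolutions.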
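
The two steps can be sketched in NumPy as follows. This is an illustrative sketch rather than code from the cited articles: `depthwise_conv_valid` and `pointwise_conv` are hypothetical names, the weights are random placeholders, and it assumes valid padding, stride 1 and an (H, W, C) layout.

```python
import numpy as np


def depthwise_conv_valid(image, dw):
    """Step 1: convolve each channel with its own k x k kernel.

    image: (H, W, C); dw: (k, k, C). Returns (H-k+1, W-k+1, C).
    """
    k = dw.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((out_h, out_w, image.shape[2]))
    for fi in range(k):
        for fj in range(k):
            # Element-wise product keeps the channels separate (no sum over C).
            out += image[fi:fi + out_h, fj:fj + out_w, :] * dw[fi, fj, :]
    return out


def pointwise_conv(features, pw):
    """Step 2: mix channels with 1x1 kernels.

    features: (H, W, C); pw: (C, out_depth). Returns (H, W, out_depth).
    """
    return features @ pw


rng = np.random.default_rng(0)
image = rng.standard_normal((12, 12, 3))
dw = rng.standard_normal((5, 5, 3))   # three 5x5 kernels, one per channel
pw = rng.standard_normal((3, 256))    # 256 kernels of size 1x1x3

step1 = depthwise_conv_valid(image, dw)
step2 = pointwise_conv(step1, pw)
print(step1.shape, step2.shape)  # (8, 8, 3) (8, 8, 256)
print(dw.size + pw.size)         # 843
```

With these shapes the two steps hold 75 + 768 = 843 weights, compared with 5x5x3x256 = 19200 for a standard convolution producing the same output shape.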

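Step one: convolve the three channels separately, producing one 8x8 attribute map per channel (8x8x3 in total):
[Figure]
Step two: convolve the three channels again with a 1x1x3 kernel. The output is now the same shape as for standard convolution, 8x8x1.

To extract more attributes, it is enough to add more 1x1x3 kernels (figure taken from the original site; I think the 8x8x256 cube would be better drawn as 256 separate 8x8x1 slices, because they are not one block but represent 256 attributes):
[Figure]
As can be seen, when only one attribute is extracted, depthwise separable convolution does worse than standard convolution (5x5x3 + 1x1x3 = 78 parameters versus 5x5x3 = 75). As the number of attributes to extract grows, depthwise separable convolution saves more and more parameters.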
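
This trade-off can be written as a general ratio. The symbols are introduced here for illustration and are not from the original articles: $C$ is the number of input channels, $K$ the kernel side length, and $N$ the number of attributes (output channels). Biases are ignored.

$$
\frac{\text{separable parameters}}{\text{standard parameters}}
= \frac{C K^2 + N C}{N C K^2}
= \frac{1}{N} + \frac{1}{K^2}
$$

For $K=5$ and $N=1$ the ratio is $1 + \frac{1}{25} = 1.04$, so the separable version is slightly larger. For $K=5$ and $N=256$ it is $\frac{1}{256} + \frac{1}{25} \approx 0.044$. Both methods produce the same number of output positions, so the multiplication counts scale by the same factor and have the same ratio, which matches the $0.044$ computed in the worked example.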