[Algorithm Study] Depthwise Separable Convolution


weixin_34240520

Tags: Artificial Intelligence

I. Channels and Regions

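The standard convolution process is shown in the figure above: when a 2×2 kernel is applied, all channels of the corresponding image region are taken into account at the same time. The question is: why must the spatial region and the channels be handled together? Why can't we consider channels and spatial regions separately?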

Depthwise separable convolution proposes a different idea: convolve each input channel with its own kernel.
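To make the contrast concrete, here is a minimal NumPy sketch (not TensorFlow's implementation) of both operations, assuming a single image in (H, W, C) layout, stride 1, VALID padding and one kernel per channel in the depthwise case. The helper names `conv2d_valid` and `depthwise_conv2d_valid` are hypothetical. Like TensorFlow, it computes cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(x, w):
    """Standard convolution (hypothetical helper).
    x: (H, W, C_in); w: (kh, kw, C_in, C_out).
    Each output value looks at a spatial patch across ALL input channels."""
    kh, kw, _, c_out = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow, c_out))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw, :]  # spatial region x every channel
            # contract height, width and input channels in one step
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def depthwise_conv2d_valid(x, w):
    """Depthwise convolution (hypothetical helper), one kernel per input channel.
    x: (H, W, C); w: (kh, kw, C). Channels are never mixed."""
    kh, kw, c = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow, c))
    for ch in range(c):
        for i in range(oh):
            for j in range(ow):
                out[i, j, ch] = np.sum(x[i:i + kh, j:j + kw, ch] * w[:, :, ch])
    return out

x = np.arange(75, dtype=float).reshape(5, 5, 3)
print(conv2d_valid(x, np.ones((3, 3, 3, 8))).shape)         # (3, 3, 8)
print(depthwise_conv2d_valid(x, np.ones((3, 3, 3))).shape)  # (3, 3, 3)
```

Zeroing one input channel changes every output channel of `conv2d_valid`, but only the matching output channel of `depthwise_conv2d_valid`.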

II. Implementation with the TensorFlow API

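Incidentally, the TF implementation accepts a `rate` argument, so the operation can also be carried out as a dilated (atrous) convolution.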
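With a dilation rate r, the kernel taps are spaced r pixels apart, so a k×k kernel spans an effective window of k + (k-1)(r-1) pixels per side without adding any parameters. The NumPy sketch below (a hypothetical single-channel helper, stride 1, VALID padding, not TensorFlow's code) shows this sampling pattern.

```python
import numpy as np

def dilated_conv2d_valid(x, k, rate=1):
    """Single-channel 2-D cross-correlation with dilation `rate` and VALID padding."""
    kh, kw = k.shape
    eh = kh + (kh - 1) * (rate - 1)  # effective kernel height
    ew = kw + (kw - 1) * (rate - 1)  # effective kernel width
    oh, ow = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # every `rate`-th pixel inside the effective window: exactly kh x kw taps
            window = x[i:i + eh:rate, j:j + ew:rate]
            out[i, j] = np.sum(window * k)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
print(dilated_conv2d_valid(x, k, rate=1).shape)  # (5, 5)
print(dilated_conv2d_valid(x, k, rate=2).shape)  # (3, 3): effective window is 5x5
```

With `rate=1` the helper reduces to an ordinary convolution, which matches the `rate=[1,1]` used in the examples below.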

1. depthwise_conv2d: the separation step

We define a 4×4 image with two channels:

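import tensorflow as tf  # TensorFlow 1.x API (graph mode with tf.Session)

# img1 -> channel 0: every row is 1, 2, 3, 4 (shape [1,4,4,1], NHWC)
img1 = tf.constant(value=[[[[1],[2],[3],[4]],
                           [[1],[2],[3],[4]],
                           [[1],[2],[3],[4]],
                           [[1],[2],[3],[4]]]],dtype=tf.float32)

# img2 -> channel 1: all ones
img2 = tf.constant(value=[[[[1],[1],[1],[1]],
                           [[1],[1],[1],[1]],
                           [[1],[1],[1],[1]],
                           [[1],[1],[1],[1]]]],dtype=tf.float32)

# stack along the channel axis -> shape [1,4,4,2]
img = tf.concat(values=[img1,img2],axis=3)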

img

<tf.Tensor 'concat_1:0' shape=(1, 4, 4, 2) dtype=float32>

Use 3×3 kernels with 2 input channels and 2 output channels (i.e. 2 kernels):

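# conv2d reads the filter as [height, width, in_channels, out_channels]
filter1 = tf.constant(value=0, shape=[3,3,1,1],dtype=tf.float32)  # in 0 -> kernel 0
filter2 = tf.constant(value=1, shape=[3,3,1,1],dtype=tf.float32)  # in 1 -> kernel 0
filter3 = tf.constant(value=2, shape=[3,3,1,1],dtype=tf.float32)  # in 0 -> kernel 1
filter4 = tf.constant(value=3, shape=[3,3,1,1],dtype=tf.float32)  # in 1 -> kernel 1
filter_out1 = tf.concat(values=[filter1,filter2],axis=2)  # kernel 0: shape [3,3,2,1]
filter_out2 = tf.concat(values=[filter3,filter4],axis=2)  # kernel 1: shape [3,3,2,1]
filter = tf.concat(values=[filter_out1,filter_out2],axis=3)  # shape [3,3,2,2]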

filter

<tf.Tensor 'concat_4:0' shape=(3, 3, 2, 2) dtype=float32>
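The 2×2 block of constants behind every spatial tap of `filter` can be inspected with a NumPy replica of the same construction (a sketch that assumes `np.full`/`np.concatenate` mirror `tf.constant`/`tf.concat` here). Entry `[in, out]` is the weight linking input channel `in` to kernel `out`; `conv2d` reads the last axis as out_channels, while `depthwise_conv2d` reads the same axis as channel_multiplier.

```python
import numpy as np

f1 = np.full((3, 3, 1, 1), 0.0)
f2 = np.full((3, 3, 1, 1), 1.0)
f3 = np.full((3, 3, 1, 1), 2.0)
f4 = np.full((3, 3, 1, 1), 3.0)
filter_out1 = np.concatenate([f1, f2], axis=2)  # kernel 0: in 0 -> 0, in 1 -> 1
filter_out2 = np.concatenate([f3, f4], axis=2)  # kernel 1: in 0 -> 2, in 1 -> 3
filt = np.concatenate([filter_out1, filter_out2], axis=3)

print(filt.shape)  # (3, 3, 2, 2)
print(filt[0, 0])  # rows = input channel, columns = kernel
# [[0. 2.]
#  [1. 3.]]
```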

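Run a standard convolution and a depthwise convolution (the first half of a depthwise separable convolution) on the same input: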

out_img_conv = tf.nn.conv2d(input=img, filter=filter, strides=[1,1,1,1], padding='VALID')
out_img_depthwise = tf.nn.depthwise_conv2d(input=img, 
                                           filter=filter, strides=[1,1,1,1], rate=[1,1], padding='VALID')

with tf.Session() as sess:
    res1 = sess.run(out_img_conv)
    res2 = sess.run(out_img_depthwise)
print(res1, '\n', res1.shape)
print(res2, '\n', res2.shape)

[[[[  9.  63.]
   [  9.  81.]]

  [[  9.  63.]
   [  9.  81.]]]] 
 (1, 2, 2, 2)  # <----------


[[[[  0.  36.   9.  27.]
   [  0.  54.   9.  27.]]

  [[  0.  36.   9.  27.]
   [  0.  54.   9.  27.]]]] 
 (1, 2, 2, 4)  # <----------

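Comparing the output shapes: depthwise_conv2d produces in_channels × channel_multiplier output channels (here 2 × 2 = 4), where channel_multiplier is the last dimension of the filter, i.e. the number of kernels. Each input channel is convolved separately with the matching channel slice of every kernel, which is why there are more output channels.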
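TensorFlow's documentation places the result of input channel `k` convolved with `filter[:, :, k, q]` at output channel `k * channel_multiplier + q`. The NumPy sketch below is a direct, unoptimized transcription of that rule (stride 1, VALID padding, no dilation assumed; the helper `depthwise` is hypothetical). Run on the same image and filter, it reproduces the `[0, 36, 9, 27]` / `[0, 54, 9, 27]` pattern printed above.

```python
import numpy as np

# Same data as the TF example: channel 0 rows are 1 2 3 4, channel 1 is all ones
img = np.stack([np.tile(np.arange(1.0, 5.0), (4, 1)), np.ones((4, 4))], axis=-1)  # (4, 4, 2)
# filt[:, :, k, q] is constant: (k=0,q=0)->0, (k=1,q=0)->1, (k=0,q=1)->2, (k=1,q=1)->3
filt = np.zeros((3, 3, 2, 2))
filt[:, :, 1, 0] = 1.0
filt[:, :, 0, 1] = 2.0
filt[:, :, 1, 1] = 3.0

def depthwise(x, w):
    kh, kw, c_in, mult = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow, c_in * mult))
    for k in range(c_in):        # each input channel on its own
        for q in range(mult):    # each of its channel_multiplier kernels
            for i in range(oh):
                for j in range(ow):
                    window = x[i:i + kh, j:j + kw, k]
                    out[i, j, k * mult + q] = np.sum(window * w[:, :, k, q])
    return out

res = depthwise(img, filt)
print(res.shape)  # (2, 2, 4)
print(res[0])     # [[ 0. 36.  9. 27.]
                  #  [ 0. 54.  9. 27.]]
```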

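At this point you might conclude that a depthwise separable convolution has more output channels than a standard one. In fact this is only the "separation" part; a combination step follows. A standard convolution simply performs the combination directly: it adds the four intermediate results element-wise, merging them into as many channels as there are kernels (here 2).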
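This combination can be checked numerically. Assuming the depthwise output uses channel index `k * channel_multiplier + q` (as TensorFlow documents), summing over the input channel `k` for each `q` recovers the standard-convolution result, because `conv2d` applies the same slice `filter[:, :, k, q]` to input channel `k` when computing output channel `q`. A small NumPy sketch using the numbers printed above:

```python
import numpy as np

# depthwise_conv2d output copied from above, shape (2, 2, 4); channel = k * 2 + q
dw = np.array([[[0, 36, 9, 27], [0, 54, 9, 27]],
               [[0, 36, 9, 27], [0, 54, 9, 27]]], dtype=float)

in_channels, multiplier = 2, 2
# split the channel axis into (k, q), then add up the per-input-channel results
conv = dw.reshape(2, 2, in_channels, multiplier).sum(axis=2)
print(conv[0])  # [[ 9. 63.]
                #  [ 9. 81.]]
```

The result matches the `conv2d` output `[9, 63]` / `[9, 81]`: for example 63 = 36 + 27.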

2. Combining the features

In a separable convolution this combination step becomes learnable: a standard 1×1 convolution merges the features, as follows:

point_filter = tf.constant(value=1, shape=[1,1,4,4],dtype=tf.float32)
out_img_s = tf.nn.conv2d(input=out_img_depthwise, filter=point_filter, strides=[1,1,1,1], padding='VALID')
with tf.Session() as sess:
    res3 = sess.run(out_img_s)
print(res3, '\n', res3.shape)

[[[[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]

  [[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]]] 
 (1, 2, 2, 4)
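A 1×1 convolution touches no spatial neighbours, so at every pixel it is simply a matrix product over the channel axis, `out[i, j, :] = dw[i, j, :] @ P`, where `P` is the [in_channels, out_channels] matrix inside the [1, 1, 4, 4] filter. A NumPy sketch of that reading (not TensorFlow's kernel), applied to the depthwise output printed earlier:

```python
import numpy as np

dw = np.array([[[0, 36, 9, 27], [0, 54, 9, 27]],
               [[0, 36, 9, 27], [0, 54, 9, 27]]], dtype=float)  # (H, W, 4)
point_filter = np.ones((1, 1, 4, 4))  # same all-ones pointwise filter as above

P = point_filter[0, 0]                 # (in_channels, out_channels) = (4, 4)
out = np.einsum('hwc,cd->hwd', dw, P)  # per-pixel channel mixing
print(out[0])  # [[72. 72. 72. 72.]
               #  [90. 90. 90. 90.]]
```

With all-ones weights every output channel is just the sum of the four depthwise channels (0 + 36 + 9 + 27 = 72); in a trained network `P` is learned, which is what makes the combination step learnable.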

3. `separable_conv2d`: both steps in one call

out_img_se = tf.nn.separable_conv2d(input=img, 
                                    depthwise_filter=filter, 
                                    pointwise_filter=point_filter, 
                                    strides=[1,1,1,1], rate=[1,1], padding='VALID')


with tf.Session() as sess:
    res4 = sess.run(out_img_se)
print(res4, '\n', res4.shape)

[[[[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]

  [[ 72.  72.  72.  72.]
   [ 90.  90.  90.  90.]]]] 
 (1, 2, 2, 4)

III. Advantages

Fewer parameters

Suppose the input has 3 channels and we need 256 output channels. There are two approaches:

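1. Apply a standard convolution directly, with 256 kernels of size 3×3 (each spanning all 3 input channels). Parameter count: 3×3×3×256 = 6,912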

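2. Depthwise separable (DW) convolution, done in two steps. Parameter count: 3×3×3 + 3×1×1×256 = 795, where 3×3×3 is one 3×3 kernel for each of the 3 input channels and 3×1×1×256 is the 1×1 pointwise convolution. The depth (channel) multiplier is usually set to 1.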
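Both counts follow from general formulas. For a k×k kernel, C_in input channels, C_out output channels and depth multiplier d, the standard layer has k·k·C_in·C_out weights and the separable one has k·k·C_in·d + C_in·d·C_out (biases ignored, as in the counts above). A small helper sketch; the function names are hypothetical:

```python
def standard_conv_params(k, c_in, c_out):
    # one k x k x c_in kernel per output channel
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out, depth_multiplier=1):
    depthwise = k * k * c_in * depth_multiplier  # spatial step, per channel
    pointwise = c_in * depth_multiplier * c_out  # 1x1 channel-mixing step
    return depthwise + pointwise

print(standard_conv_params(3, 3, 256))   # 6912
print(separable_conv_params(3, 3, 256))  # 795
```

For d = 1 the ratio separable / standard equals 1/C_out + 1/k², so with 3×3 kernels and many output channels the separable layer needs roughly 1/9 of the weights.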

Separation of channels and regions

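Whereas a standard convolution handles channels and spatial regions at the same time, depthwise separable convolution first considers only the spatial region and then only the channels, thereby separating channels from regions.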
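One way to see this separation: with depth multiplier 1, a depthwise separable convolution is equivalent to a standard convolution whose kernel factorizes as W[h, w, i, o] = D[h, w, i] · P[i, o], a per-channel spatial part D times a channel-mixing part P. The NumPy sketch below (random weights, stride 1, VALID padding, hypothetical helper `patches`) checks this equivalence numerically; it illustrates the idea rather than reproducing any library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, c_in, c_out, k = 6, 6, 3, 5, 3
x = rng.standard_normal((height, width, c_in))
D = rng.standard_normal((k, k, c_in))   # depthwise: one k x k kernel per channel
P = rng.standard_normal((c_in, c_out))  # pointwise: 1 x 1 channel mixing

def patches(x, k):
    """All k x k windows (VALID, stride 1): shape (oh, ow, k, k, C)."""
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    return np.stack([np.stack([x[i:i + k, j:j + k] for j in range(ow)])
                     for i in range(oh)])

X = patches(x, k)
# separable: region first (per channel), then channels
depthwise = np.einsum('ijabc,abc->ijc', X, D)
separable = depthwise @ P
# standard conv with the factorized kernel W[a, b, c, o] = D[a, b, c] * P[c, o]
W = D[..., None] * P[None, None, :, :]
standard = np.einsum('ijabc,abco->ijo', X, W)

print(np.allclose(separable, standard))  # True
```

This also explains the parameter savings: W has k·k·C_in·C_out entries, but only k·k·C_in + C_in·C_out of them are free.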