depth-wise convolution and depth-wise separable convolution

Latest recommended article published on 2024-11-12 20:59:31

你好，赵同学

Category: Object Detection. Tags: deep learning, cnn

Article link: https://blog.csdn.net/weixin_60210050/article/details/133438390

Introduction

A standard convolution layer is described by $(channel_{input}, channel_{output}, filter\_size)$. For instance:
Input channels: 10
Output channels: 20
Filter size: 7
This gives $Parameters=(7\times7\times10+1)\times20=9820$, where the $+1$ is the bias of each filter.
A standard convolution has so many parameters that the model is more likely to over-fit. Depth-wise convolution and depth-wise separable convolution were proposed to reduce this risk.
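
The count above generalizes as follows. The symbols $k$ (filter size), $C_{in}$ (input channels) and $C_{out}$ (output channels) are notation introduced here for illustration, and the formula assumes a square filter with one bias per output channel:

$Parameters_{standard}=(k\times k\times C_{in}+1)\times C_{out}$

With $k=7$, $C_{in}=10$ and $C_{out}=20$ this reproduces the value $9820$.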

depth-wise convolution

Each channel of the filter is applied to exactly one input channel.
To reproduce this with a standard convolution, select one channel, set every element of the filter outside that channel to zero, and then convolve; repeating this for each channel gives the depth-wise result.
Although the number of parameters stays the same, depth-wise convolution produces 3 output channels from a single 3-channel filter, whereas a standard convolution produces only 1 output channel from the same filter.
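
The equivalence described above can be checked with a small pure-Python sketch. All function names here (`conv2d_single`, `standard_conv`, `depthwise_conv`, `zero_except`) are hypothetical helpers written for this illustration, and the example assumes stride 1, no padding and no bias:

```python
def conv2d_single(image, kernel):
    """Valid (no padding, stride 1) 2-D cross-correlation of one channel with one kernel.

    Deep-learning "convolution" is cross-correlation, so the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def standard_conv(inputs, filt):
    """One multi-channel filter gives ONE output channel (sum over input channels)."""
    per_channel = [conv2d_single(x, k) for x, k in zip(inputs, filt)]
    out_h, out_w = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(p[i][j] for p in per_channel) for j in range(out_w)]
        for i in range(out_h)
    ]


def depthwise_conv(inputs, filt):
    """One multi-channel filter gives one output channel PER input channel (no sum)."""
    return [conv2d_single(x, k) for x, k in zip(inputs, filt)]


def zero_except(filt, keep):
    """Copy of the filter with every channel except `keep` set to zero."""
    return [
        [[v if c == keep else 0 for v in row] for row in k]
        for c, k in enumerate(filt)
    ]


# Toy data: 3 input channels of size 4x4 and one 3-channel 3x3 filter.
inputs = [[[c + 4 * i + j for j in range(4)] for i in range(4)] for c in range(3)]
filt = [[[(c + 1) * (a - b) for b in range(3)] for a in range(3)] for c in range(3)]

depthwise = depthwise_conv(inputs, filt)
# Emulate depth-wise convolution with three standard convolutions,
# each using the same filter with all but one channel zeroed out.
emulated = [standard_conv(inputs, zero_except(filt, c)) for c in range(3)]

print(len(depthwise))         # 3 output channels from one 3-channel filter
print(depthwise == emulated)  # True
```

The standard convolution needs three zero-padded copies of the filter to match what the depth-wise convolution produces from the single original filter.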

depth-wise separable convolution

We first perform a depth-wise convolution over the spatial dimensions (height and width), and then apply a $1\times1$ (point-wise) convolution across the depth dimension, so that we can produce any number of output channels.
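
The two stages can be sketched in pure Python as below. This is a minimal illustration rather than a reference implementation: `conv2d_single` and `depthwise_separable_conv` are hypothetical names, and the sketch assumes stride 1, no padding and no bias.

```python
def conv2d_single(image, kernel):
    """Valid (no padding, stride 1) 2-D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def depthwise_separable_conv(inputs, depthwise_kernels, pointwise_weights):
    """Depth-wise stage followed by a 1x1 (point-wise) stage.

    inputs:            list of C_in 2-D channels
    depthwise_kernels: list of C_in 2-D kernels, one per input channel
    pointwise_weights: list of C_out rows, each holding C_in scalar weights
    """
    # Stage 1: filter each input channel with its own kernel (spatial mixing only).
    mid = [conv2d_single(x, k) for x, k in zip(inputs, depthwise_kernels)]
    out_h, out_w = len(mid[0]), len(mid[0][0])
    # Stage 2: each output channel is a weighted sum of the intermediate
    # channels at every pixel (channel mixing only).
    return [
        [
            [sum(w * m[i][j] for w, m in zip(weights, mid)) for j in range(out_w)]
            for i in range(out_h)
        ]
        for weights in pointwise_weights
    ]


# 3 input channels of 5x5 ones, 3x3 kernels of ones, and 4 output channels.
inputs = [[[1] * 5 for _ in range(5)] for _ in range(3)]
depthwise_kernels = [[[1] * 3 for _ in range(3)] for _ in range(3)]
pointwise_weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]

outputs = depthwise_separable_conv(inputs, depthwise_kernels, pointwise_weights)
print(len(outputs), len(outputs[0]), len(outputs[0][0]))  # 4 3 3
print(outputs[0][0][0], outputs[3][0][0])                 # 9 27
```

The number of output channels is set only by the number of rows in `pointwise_weights`, which is why the $1\times1$ stage can produce any number of channels.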

Parameters:
Requirements:
Input channels: 3
Output channels: 3
Filter size: 3
Standard convolution: $Parameters=(3\times3\times3+1)\times3=84$
Depth-wise separable convolution: $Parameters=(3\times3+1)\times3+3\times3=39$
Here the point-wise term $3\times3$ counts weights only; with one bias per output channel the total would be $42$.
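
The two counts can be reproduced with a short script. The function names are hypothetical, and the separable count follows the convention of the formula above: a bias for each depth-wise kernel but weights only for the $1\times1$ stage.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """(k*k*c_in weights + 1 bias) per filter, one filter per output channel."""
    return (k * k * c_in + 1) * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Depth-wise stage plus 1x1 point-wise stage."""
    depthwise = (k * k + 1) * c_in  # one k x k kernel plus bias per input channel
    pointwise = c_in * c_out        # 1x1 weights only (no bias, as in the text)
    return depthwise + pointwise


print(standard_conv_params(3, 3, 3))   # 84
print(separable_conv_params(3, 3, 3))  # 39

# The example from the Introduction, under the same counting convention:
print(standard_conv_params(10, 20, 7))   # 9820
print(separable_conv_params(10, 20, 7))  # 700
```

The gap widens as the filter size and channel counts grow, because the standard count multiplies $k\times k$ by both channel counts while the separable count only adds the two stages.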

Having too many parameters encourages the model to memorize rather than learn, which leads to over-fitting. Depth-wise separable convolution reduces this risk by using far fewer parameters.