Depth-wise Convolution and Depth-wise Separable Convolution

Introduction

A standard convolution layer is described by $(channel_{input}, channel_{output}, filter\_size)$. For instance, with
Input channels: 10
Output channels: 20
Filter size: 7
we get $Parameters = (7\times7\times10+1)\times20 = 9820$, since each of the 20 filters has $7\times7\times10$ weights plus one bias.
A standard convolution has so many parameters that the model is more prone to over-fitting. Depth-wise convolution and depth-wise separable convolution were proposed to avoid this scenario.
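The parameter count in the example above can be reproduced with a tiny helper. This is a sketch that assumes square $k\times k$ filters and one bias per output channel, exactly as in the formula; the function name `standard_conv_params` is hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameters of a standard k x k convolution (hypothetical helper).

    Each of the c_out filters spans all c_in input channels (k*k*c_in weights)
    plus, optionally, one bias term.
    """
    return (k * k * c_in + (1 if bias else 0)) * c_out


if __name__ == "__main__":
    print(standard_conv_params(10, 20, 7))  # 9820
```

Setting `bias=False` drops the `+1` term, which is how some layers are configured when a normalization layer follows.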

Depth-wise convolution

In depth-wise convolution, each channel of the filter is applied to exactly one input channel.
To get the same result with a standard convolution, select one channel, set every element of the filter to zero except those in that channel, and then convolve; repeating this once per channel reproduces the depth-wise output.
The number of parameters stays the same, but depth-wise convolution produces 3 output channels from a single 3-channel filter, whereas a standard convolution produces only 1 output channel from that same filter.
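The comparison above can be checked on a toy input. This is a minimal pure-Python sketch, assuming stride 1, no padding, no bias, and cross-correlation (as most deep-learning libraries compute "convolution"); all function names and the toy data are hypothetical. It shows that one 3-channel filter yields 3 maps depth-wise but 1 map in a standard convolution, and that the zero-masking trick recovers each depth-wise channel.

```python
# Sketch only: stride 1, no padding, no bias, square kernels.
# As in most deep-learning libraries, "convolution" is computed as cross-correlation.


def conv2d_single(x, w):
    """Slide one k x k kernel `w` over one 2-D channel `x` (lists of lists)."""
    k = len(w)
    return [
        [
            sum(x[i + a][j + b] * w[a][b] for a in range(k) for b in range(k))
            for j in range(len(x[0]) - k + 1)
        ]
        for i in range(len(x) - k + 1)
    ]


def standard_conv(x, f):
    """One C-channel filter -> ONE output map: per-channel results are summed."""
    maps = [conv2d_single(x[c], f[c]) for c in range(len(x))]
    summed = [[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
              for i in range(len(maps[0]))]
    return [summed]


def depthwise_conv(x, f):
    """One C-channel filter -> C output maps: filter channel c sees only input channel c."""
    return [conv2d_single(x[c], f[c]) for c in range(len(x))]


def masked_filter(f, keep):
    """Copy of `f` with every channel except `keep` set to zero."""
    return [f[c] if c == keep else [[0] * len(row) for row in f[c]] for c in range(len(f))]


# Toy input: 3 channels of 4 x 4, and a single 3-channel 3 x 3 filter.
x = [[[c * 16 + i * 4 + j for j in range(4)] for i in range(4)] for c in range(3)]
f = [[[1 if (a + b + c) % 2 == 0 else -1 for b in range(3)] for a in range(3)] for c in range(3)]

print(len(depthwise_conv(x, f)), len(standard_conv(x, f)))  # 3 1

# Zero-masking trick: standard conv with only channel c kept == depth-wise output channel c.
for c in range(3):
    assert standard_conv(x, masked_filter(f, c))[0] == depthwise_conv(x, f)[c]
```

Note that the standard output is just the element-wise sum of the three depth-wise maps, which is why it collapses to a single channel.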

Depth-wise separable convolution

We first perform a depth-wise convolution over the spatial dimensions (height and width), and then apply a $1\times1$ convolution across the depth dimension, which lets us produce any number of output channels we want.
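The two steps can be sketched in pure Python as follows. This assumes stride 1, no padding, and omits biases for brevity; the names `depthwise`, `pointwise`, and `separable_conv` are hypothetical. The depth-wise step handles height and width per channel, and the $1\times1$ step is a per-pixel linear mix of channels whose row count sets the number of output channels.

```python
# Sketch only: stride 1, no padding, biases omitted for brevity.


def depthwise(x, dw):
    """Spatial step. x: [C][H][W], dw: [C][k][k]; channel c is filtered only by dw[c]."""
    k = len(dw[0])
    h_out, w_out = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    return [
        [
            [sum(x[c][i + a][j + b] * dw[c][a][b] for a in range(k) for b in range(k))
             for j in range(w_out)]
            for i in range(h_out)
        ]
        for c in range(len(x))
    ]


def pointwise(y, pw):
    """Depth step (1 x 1 conv). y: [C][H][W], pw: [C_out][C]; mixes channels at each pixel."""
    return [
        [
            [sum(pw[o][c] * y[c][i][j] for c in range(len(y))) for j in range(len(y[0][0]))]
            for i in range(len(y[0]))
        ]
        for o in range(len(pw))
    ]


def separable_conv(x, dw, pw):
    """Depth-wise separable convolution = depth-wise conv followed by a 1 x 1 conv."""
    return pointwise(depthwise(x, dw), pw)


# 3 input channels of 5 x 5, 3 x 3 depth-wise filters, and 5 output channels chosen freely.
x = [[[1] * 5 for _ in range(5)] for _ in range(3)]
dw = [[[1] * 3 for _ in range(3)] for _ in range(3)]
pw = [[1, 1, 1] for _ in range(5)]

out = separable_conv(x, dw, pw)
print(len(out), len(out[0]), len(out[0][0]))  # 5 3 3
```

Only `pw` decides the number of output channels: giving it 8 rows would yield 8 output channels while the depth-wise step stays untouched.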

Parameter count:
Setting:
Input channels: 3
Output channels: 3
Filter size: 3
Standard convolution: $Parameters = (3\times3\times3+1)\times3 = 84$
Depth-wise separable convolution: $Parameters = (3\times3+1)\times3 + 3\times3 = 39$, i.e. three $3\times3$ depth-wise filters with one bias each, plus a $1\times1$ convolution with $3\times3$ weights ($3$ input channels by $3$ output channels, no bias).
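Both counts can be written as small functions to see how the gap grows with larger layers. This sketch follows the counting convention used above (bias on the depth-wise filters, none on the $1\times1$ convolution); the function names are hypothetical. Applied to the 10-in, 20-out, $7\times7$ setting from the introduction, it shows the separable version needing far fewer parameters.

```python
def standard_conv_params(c_in, c_out, k):
    """(k*k*c_in + 1) * c_out: every filter spans all input channels, one bias each."""
    return (k * k * c_in + 1) * c_out


def separable_conv_params(c_in, c_out, k):
    """(k*k + 1) * c_in for the depth-wise filters (with bias)
    + c_in * c_out for the 1 x 1 convolution (no bias), matching the count above."""
    return (k * k + 1) * c_in + c_in * c_out


print(standard_conv_params(3, 3, 3), separable_conv_params(3, 3, 3))  # 84 39
print(standard_conv_params(10, 20, 7), separable_conv_params(10, 20, 7))  # 9820 700
```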

Having too many parameters lets the model memorize the training data rather than learn general patterns, which leads to over-fitting. Depth-wise separable convolution reduces this risk by using far fewer parameters.
