Reading notes: Deformable Convolutional Networks

These notes record the parts of the original paper that I found most helpful for understanding it.

  In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision.
These new modules can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks.

Deformable Convolution

  Deformable convolution adds 2D offsets to the regular grid sampling locations of standard convolution, enabling free-form deformation of the sampling grid, as illustrated in Figure 1. The offsets are learned from the preceding feature maps via additional convolutional layers.

Figure 1

  The 2D convolution consists of two steps: 1) sampling using a regular grid R over the input feature map x; 2) summation of sampled values weighted by w. The grid R defines the receptive field size and dilation. For example,

$$R = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\}$$

defines a $3 \times 3$ kernel with dilation 1.
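
To make the role of the grid concrete, here is a minimal sketch that builds it for an odd, square kernel. The helper name `regular_grid` and the choice to store offsets as plain tuples are my own assumptions for illustration; they do not come from the paper.

```python
def regular_grid(kernel_size=3, dilation=1):
    """Build the sampling grid R for an odd, square kernel as 2D offsets."""
    half = kernel_size // 2
    # Offsets along one axis, e.g. [-1, 0, 1] for a 3-tap kernel with dilation 1.
    steps = [i * dilation for i in range(-half, half + 1)]
    return [(a, b) for a in steps for b in steps]


grid_d1 = regular_grid(3, 1)  # the 3 x 3, dilation-1 grid from the text
grid_d2 = regular_grid(3, 2)  # the same 9 taps spread over a 5 x 5 window
```

With dilation 2 the grid still has nine taps, but they reach from (-2, -2) to (2, 2), which is how the grid sets both the receptive field size and the dilation.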
  For each location $p_0$ on the output feature map $y$, we have

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n), \tag{1}$$

  where $p_n$ enumerates the locations in $R$.
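
As a sanity check on Eq. (1), the sketch below evaluates it at a single output location on a toy feature map. It assumes a single channel, a feature map stored as a list of rows, positions read as (row, column) index pairs, and no padding; the name `conv_at` is hypothetical.

```python
def conv_at(x, w, p0, grid):
    """Eq. (1): y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n).

    x is a 2D list, w maps each grid offset to its weight, and p0 must be
    far enough from the border that every sampled position is inside x.
    """
    r0, c0 = p0
    return sum(w[pn] * x[r0 + pn[0]][c0 + pn[1]] for pn in grid)


R = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]       # 3 x 3 grid
x = [[float(5 * r + c) for c in range(5)] for r in range(5)]   # toy 5 x 5 map
w = {pn: 1.0 / len(R) for pn in R}                             # averaging kernel
y_center = conv_at(x, w, (2, 2), R)
```

Because the toy map is linear and the kernel is a plain average, `y_center` comes out (up to floating-point rounding) equal to the value at the centre of the window, `x[2][2]`.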
  In deformable convolution, the regular grid $R$ is augmented with offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$, where $N = |R|$. Eq. (1) becomes

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n). \tag{2}$$

  Now, the sampling is on the irregular and offset locations $p_n + \Delta p_n$. As the offset $\Delta p_n$ is typically fractional, Eq. (2) is implemented via bilinear interpolation as

$$x(p) = \sum_{q} G(q, p) \cdot x(q), \tag{3}$$

where $p$ denotes an arbitrary (fractional) location ($p = p_0 + p_n + \Delta p_n$ for Eq. (2)), $q$ enumerates all integral spatial locations in the feature map $x$, and $G(\cdot,\cdot)$ is the bilinear interpolation kernel. Note that $G$ is two-dimensional. It is separated into two one-dimensional kernels as

$$G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y), \tag{4}$$

where $g(a, b) = \max(0, 1 - |a - b|)$. Eq. (3) is fast to compute as $G(q, p)$ is non-zero only for a few $q$s.
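
Eqs. (2) to (4) fit together as in the following rough sketch. It is not the authors' implementation: it assumes a single channel, (row, column) index pairs, offsets passed in by hand rather than predicted by a convolutional layer, and zero contribution from positions outside the map. The names `bilinear_sample` and `deform_conv_at` are hypothetical.

```python
import math


def g(a, b):
    """1D bilinear kernel from Eq. (4): max(0, 1 - |a - b|)."""
    return max(0.0, 1.0 - abs(a - b))


def bilinear_sample(x, p):
    """Eq. (3), visiting only the (at most) four q where G(q, p) is non-zero."""
    pr, pc = p
    r0, c0 = math.floor(pr), math.floor(pc)
    total = 0.0
    for qr in (r0, r0 + 1):
        for qc in (c0, c0 + 1):
            # Assumption: positions outside the feature map contribute zero.
            if 0 <= qr < len(x) and 0 <= qc < len(x[0]):
                total += g(qr, pr) * g(qc, pc) * x[qr][qc]
    return total


def deform_conv_at(x, w, p0, grid, offsets):
    """Eq. (2): y(p0) = sum over n of w(p_n) * x(p0 + p_n + dp_n)."""
    total = 0.0
    for pn, dpn in zip(grid, offsets):
        p = (p0[0] + pn[0] + dpn[0], p0[1] + pn[1] + dpn[1])
        total += w[pn] * bilinear_sample(x, p)
    return total


R = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
x = [[float(5 * r + c) for c in range(5)] for r in range(5)]
w = {pn: 1.0 / len(R) for pn in R}

no_offsets = [(0.0, 0.0)] * len(R)    # reduces Eq. (2) to Eq. (1)
half_right = [(0.0, 0.5)] * len(R)    # every tap moves half a column
y_regular = deform_conv_at(x, w, (2, 2), R, no_offsets)
y_shifted = deform_conv_at(x, w, (2, 2), R, half_right)
```

With all offsets at zero the result matches ordinary convolution. Shifting every tap by half a column raises the output by 0.5 here, because the toy map grows by 1 per column and bilinear interpolation reproduces linear functions exactly.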

Deformable RoI Pooling

  Deformable RoI pooling adds an offset to each bin position in the regular bin partition of standard RoI pooling. Similarly, the offsets are learned from the preceding feature maps and RoIs, enabling adaptive part localization for objects with different shapes.

  RoI pooling is used in all region proposal based object detection methods. It converts an input rectangular region of arbitrary size into fixed-size features.
  RoI Pooling. Given the input feature map $x$ and an RoI of size $w \times h$ with top-left corner $p_0$, RoI pooling divides the RoI into $k \times k$ ($k$ is a free parameter) bins and outputs a $k \times k$ feature map $y$. For the $(i,j)$-th bin ($0 \le i, j < k$), we have

$$y(i,j) = \sum_{p \in bin(i,j)} x(p_0 + p) / n_{ij}, \tag{5}$$

where $n_{ij}$ is the number of pixels in the bin. The $(i,j)$-th bin spans $\lfloor i \frac{w}{k} \rfloor \le p_x < \lceil (i+1) \frac{w}{k} \rceil$ and $\lfloor j \frac{h}{k} \rfloor \le p_y < \lceil (j+1) \frac{h}{k} \rceil$.
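
The bin arithmetic in Eq. (5) is easy to get wrong by one, so here is a small sketch of it. It assumes a single channel, integer RoI width and height, a feature map stored as a list of rows (so position $(p_x, p_y)$ is read as `feat[py][px]`), and an RoI that lies fully inside the map. The name `roi_pool` and the parameter names `roi_w` and `roi_h` are my own.

```python
import math


def roi_pool(feat, p0, roi_w, roi_h, k):
    """Eq. (5): average the pixels of each of the k x k bins of the RoI.

    p0 = (x0, y0) is the top-left corner; the result is indexed as out[i][j],
    with i running along the width and j along the height, as in the text.
    """
    x0, y0 = p0
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        # floor(i*w/k) <= p_x < ceil((i+1)*w/k)
        xs = range(math.floor(i * roi_w / k), math.ceil((i + 1) * roi_w / k))
        for j in range(k):
            # floor(j*h/k) <= p_y < ceil((j+1)*h/k)
            ys = range(math.floor(j * roi_h / k), math.ceil((j + 1) * roi_h / k))
            vals = [feat[y0 + py][x0 + px] for py in ys for px in xs]
            out[i][j] = sum(vals) / len(vals)  # len(vals) plays the role of n_ij
    return out


feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4 x 4 map
pooled = roi_pool(feat, (0, 0), 4, 4, 2)
```

One consequence of the floor and ceil bounds is that neighbouring bins overlap by a pixel when the RoI size is not divisible by $k$; for example, with $w = 5$ and $k = 2$ the two bins cover $p_x \in \{0, 1, 2\}$ and $p_x \in \{2, 3, 4\}$.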
Similarly to Eq. (2), in deformable RoI pooling, offsets $\{\Delta p_{ij} \mid 0 \le i, j < k\}$ are added to the spatial binning positions. Eq. (5) becomes

$$y(i,j) = \sum_{p \in bin(i,j)} x(p_0 + p + \Delta p_{ij}) / n_{ij}. \tag{6}$$

Typically, $\Delta p_{ij}$ is fractional. Eq. (6) is implemented by bilinear interpolation via Eq. (3) and Eq. (4).
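
A sketch of Eq. (6) follows, under the same assumptions as before: single channel, feature map stored as a list of rows and read as `feat[py][px]`, integer RoI size, and zero contribution from positions outside the map. The offsets are supplied by hand here as `offsets[i][j] = (dx, dy)`, whereas in the paper they are learned. The function names are hypothetical.

```python
import math


def bilinear(feat, px, py):
    """Eqs. (3)-(4) at a fractional position (px, py)."""
    x0, y0 = math.floor(px), math.floor(py)
    total = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            # Assumption: positions outside the feature map contribute zero.
            if 0 <= qy < len(feat) and 0 <= qx < len(feat[0]):
                gx = max(0.0, 1.0 - abs(qx - px))
                gy = max(0.0, 1.0 - abs(qy - py))
                total += gx * gy * feat[qy][qx]
    return total


def deform_roi_pool(feat, p0, roi_w, roi_h, k, offsets):
    """Eq. (6): RoI pooling where bin (i, j) is shifted by offsets[i][j]."""
    x0, y0 = p0
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        xs = range(math.floor(i * roi_w / k), math.ceil((i + 1) * roi_w / k))
        for j in range(k):
            ys = range(math.floor(j * roi_h / k), math.ceil((j + 1) * roi_h / k))
            dx, dy = offsets[i][j]
            vals = [bilinear(feat, x0 + px + dx, y0 + py + dy)
                    for py in ys for px in xs]
            out[i][j] = sum(vals) / len(vals)
    return out


feat = [[float(6 * r + c) for c in range(6)] for r in range(6)]  # toy 6 x 6 map
zero = [[(0.0, 0.0)] * 2 for _ in range(2)]
half = [[(0.5, 0.0)] * 2 for _ in range(2)]   # shift every bin half a pixel in x
plain = deform_roi_pool(feat, (1, 1), 4, 4, 2, zero)
moved = deform_roi_pool(feat, (1, 1), 4, 4, 2, half)
```

With zero offsets this reduces to Eq. (5). Since the toy map increases by 1 per column, shifting every bin by half a pixel in $x$ raises each pooled value by 0.5.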
