Reading notes: Deformable Convolutional Networks

wsq1920

Category: deep learning. Tags: Semantic Segmentation

Article link: https://blog.csdn.net/m0_37718446/article/details/80647589


These notes record the parts of the original paper that I found most helpful for understanding it.

In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision.
These new modules can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks.

Deformable Convolution

Deformable convolution adds 2D offsets to the regular grid sampling locations of standard convolution. This enables free-form deformation of the sampling grid, as illustrated in Figure 1. The offsets are learned from the preceding feature maps via additional convolution layers.

The 2D convolution consists of two steps: 1) sampling using a regular grid R over the input feature map x; 2) summation of sampled values weighted by w. The grid R defines the receptive field size and dilation. For example,

$R = \{(-1,-1),(-1,0),\ldots,(0,1),(1,1)\}$
defines a 3 x 3 kernel with dilation 1.
For each location $p_0$ on the output feature map y, we have

$\begin{equation} y(p_0) = \sum _{p_n\in R} w(p_n)\cdot x(p_0+p_n), \tag{1} \end{equation}$

where $p_n$ enumerates the locations in R.
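
Eq. (1) can be sketched in plain Python to make the two steps (sampling on R, then weighted summation) concrete. This is only an illustrative sketch, not code from the paper: the helper names `make_grid` and `conv_at`, the (row, column) ordering of each grid entry, and the zero padding outside the feature map are all assumptions made here.

```python
def make_grid(kernel_size=3, dilation=1):
    """Return the sampling grid R as a row-major list of (dy, dx) offsets."""
    half = kernel_size // 2
    steps = [dilation * k for k in range(-half, half + 1)]
    return [(dy, dx) for dy in steps for dx in steps]


def conv_at(x, w, p0, grid):
    """Eq. (1): y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n).

    x is a 2D list (H x W); w is a flat list with one weight per grid entry.
    Samples that fall outside x count as zero (zero padding, an assumption).
    """
    height, width = len(x), len(x[0])
    total = 0.0
    for weight, (dy, dx) in zip(w, grid):
        py, px = p0[0] + dy, p0[1] + dx
        if 0 <= py < height and 0 <= px < width:
            total += weight * x[py][px]
    return total


R = make_grid(3, 1)              # the 3 x 3, dilation-1 grid from the text
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [1.0] * len(R)               # all-ones kernel, so y is a plain sum
y_center = conv_at(x, w, (1, 1), R)   # covers all nine entries of x
y_corner = conv_at(x, w, (0, 0), R)   # only four entries lie inside x
```

Calling `make_grid(3, 2)` gives the same nine taps spread two pixels apart, which is how R encodes dilation as well as receptive field size.
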
In deformable convolution, the regular grid R is augmented with offsets $\{\Delta p_n \mid n = 1,...,N\}$, where N = |R|. Eq. (1) becomes

$\begin{equation} y(p_0) = \sum _{p_n\in R} w(p_n)\cdot x(p_0+p_n+\Delta p_n). \tag{2} \end{equation}$

Now, the sampling is on the irregular and offset locations $p_n+\Delta p_n$ . As the offset $\Delta p_n$ is typically fractional, Eq.(2) is implemented via bilinear interpolation as

$\begin{equation} x(p) = \sum _q G(q,p)\cdot x(q), \tag{3} \end{equation}$
where p denotes an arbitrary (fractional) location ($p = p_0+p_n+\Delta p_n$ for Eq. (2)), q enumerates all integral spatial locations in the feature map x, and $G(\cdot,\cdot)$ is the bilinear interpolation kernel. Note that G is two-dimensional. It is separated into two one-dimensional kernels as

$\begin{equation} G(q,p) = g(q_x,p_x)\cdot g(q_y,p_y), \tag{4} \end{equation}$
where $g(a,b) = \max(0,1 - |a - b|)$. Eq. (3) is fast to compute because $G(q,p)$ is non-zero for only a few values of $q$.
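
Eqs. (2) to (4) can be illustrated with a small pure-Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, locations are (row, column) pairs, samples outside the feature map count as zero, and the offsets are fixed by hand here, whereas in the paper they are predicted by an additional convolution layer and learned during training.

```python
import math


def g(a, b):
    """1D bilinear kernel from Eq. (4): max(0, 1 - |a - b|)."""
    return max(0.0, 1.0 - abs(a - b))


def bilinear_sample(x, p):
    """Eq. (3): x(p) = sum_q G(q, p) * x(q), G(q, p) = g(q_x, p_x) * g(q_y, p_y).

    G is non-zero only for the (up to) four integer neighbours of p, so the
    sum visits just those. Locations outside x contribute zero (assumption).
    """
    py, px = p
    height, width = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < height and 0 <= qx < width:
                value += g(qx, px) * g(qy, py) * x[qy][qx]
    return value


def deform_conv_at(x, w, p0, grid, offsets):
    """Eq. (2): y(p0) = sum_n w(p_n) * x(p0 + p_n + delta p_n)."""
    total = 0.0
    for weight, (dy, dx), (oy, ox) in zip(w, grid, offsets):
        p = (p0[0] + dy + oy, p0[1] + dx + ox)
        total += weight * bilinear_sample(x, p)
    return total


grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]   # 3 x 3 grid R
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # x[r][c] = 4r + c
w = [1.0 / 9.0] * 9                   # averaging kernel
zero = [(0.0, 0.0)] * 9               # no offsets: reduces to Eq. (1)
shift = [(0.5, 0.25)] * 9             # same fractional offset for every tap

y_regular = deform_conv_at(x, w, (1, 1), grid, zero)
y_shifted = deform_conv_at(x, w, (1, 1), grid, shift)
```

Because this toy feature map is linear in the row and column indices, bilinear interpolation reproduces it exactly: `y_regular` is approximately 5.0 (the value at (1, 1)) and `y_shifted` is approximately 7.25 (the value at (1.5, 1.25)). With all offsets set to zero, the deformable form gives the same result as the regular convolution of Eq. (1).
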

Deformable RoI pooling

Deformable RoI pooling adds an offset to each bin position in the regular bin partition of standard RoI pooling. Similarly, the offsets are learned from the preceding feature maps and RoIs, enabling adaptive part localization for objects with different shapes.

RoI pooling is used in all region-proposal-based object detection methods. It converts an input rectangular region of arbitrary size into fixed-size features.
RoI Pooling. Given the input feature map x and a RoI of size w x h with top-left corner $p_0$, RoI pooling divides the RoI into k x k bins (k is a free parameter) and outputs a k x k feature map y. For the (i,j)-th bin ($0 \leq i,j < k$), we have

$\begin{equation} y(i,j) = \sum_{p\in bin(i,j)}x(p_0+p)/n_{ij}, \tag{5} \end{equation}$
where $n_{ij}$ is the number of pixels in the bin. The (i,j)-th bin spans $\lfloor i \frac{w}{k} \rfloor \leq p_x < \lceil (i+1) \frac{w}{k} \rceil \text{ and } \lfloor j \frac{h}{k} \rfloor \leq p_y <$