# Faster R-CNN

## 2. Feature Extraction Network

image → conv3-64 → conv3-64 → pool2 → conv3-128 → conv3-128 → pool2 → conv3-256 → conv3-256 → conv3-256 → pool2 → conv3-512 → conv3-512 → conv3-512 → pool2 → conv3-512 → conv3-512 → conv3-512

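The defining trait of the VGG network is that it uses only 3 × 3 convolution kernels. Here conv3-64 denotes a 3 × 3 convolution with 64 output channels, and pool2 denotes 2 × 2 max pooling.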
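To see what the chain above does to an input image, the following minimal sketch traces the tensor shape through the layers. It assumes every conv3 layer uses stride 1 and padding 1 (so it keeps the spatial size) and every pool2 layer uses stride 2; the names `trace_shape` and `VGG16_CONV` are made up for this illustration.

```python
def trace_shape(layers, height, width, channels=3):
    """Return (channels, height, width) after applying the layer chain."""
    for layer in layers:
        if layer.startswith("conv3-"):
            # 3x3 convolution, stride 1, padding 1 (assumed): only channels change.
            channels = int(layer.split("-")[1])
        elif layer == "pool2":
            # 2x2 max pooling, stride 2 (assumed): spatial size halves, rounded down.
            height, width = height // 2, width // 2
        else:
            raise ValueError(f"unknown layer: {layer}")
    return channels, height, width


# The layer chain listed above, written out as a Python list.
VGG16_CONV = (
    ["conv3-64"] * 2 + ["pool2"]
    + ["conv3-128"] * 2 + ["pool2"]
    + ["conv3-256"] * 3 + ["pool2"]
    + ["conv3-512"] * 3 + ["pool2"]
    + ["conv3-512"] * 3
)

print(trace_shape(VGG16_CONV, 224, 224))  # (512, 14, 14)
print(trace_shape(VGG16_CONV, 600, 800))  # (512, 37, 50)
```

With four pooling layers, the final feature map is 1/16 of the input resolution in each dimension.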

## 3. RPN (Region Proposal Network)

### Sliding Window

$t_x = (x - x_a) / w_a,\ t_y = (y - y_a) / h_a,\ t_w = \log(w / w_a),\ t_h = \log(h / h_a)$

$t_x^* = (x^* - x_a) / w_a,\ t_y^* = (y^* - y_a) / h_a,\ t_w^* = \log(w^* / w_a),\ t_h^* = \log(h^* / h_a)$

$x, y, w, h$ are the center coordinates, width, and height of the predicted box;
$x_a, y_a, w_a, h_a$ are those of the anchor box;
$x^*, y^*, w^*, h^*$ are those of the ground-truth box, so $t_x^*, t_y^*, t_w^*, t_h^*$ encode the true object position relative to the anchor.
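The parameterization above can be sketched as a pair of helper functions. This is only an illustration: the names `encode` and `decode` are hypothetical, and boxes are assumed to be given as (center x, center y, width, height) tuples.

```python
import math


def encode(box, anchor):
    """Map a box to offsets (t_x, t_y, t_w, t_h) relative to an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(t, anchor):
    """Invert encode(): recover (x, y, w, h) from offsets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


anchor = (100.0, 100.0, 128.0, 128.0)
ground_truth = (116.0, 92.0, 256.0, 64.0)
targets = encode(ground_truth, anchor)  # the t* values used as regression targets
print(targets)
print(decode(targets, anchor))          # approximately the ground-truth box again
```

Applying `encode` to the predicted box gives $t$, and applying it to the ground-truth box gives $t^*$; at inference time `decode` turns the predicted offsets back into a box.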

### Receptive Field

$n_{out} = \lfloor (n_{in} + 2p - k) / s \rfloor + 1$

$j_{out} = j_{in} * s$

$r_{out} = r_{in} + (k - 1) * j_{in}$

$start_{out} = start_{in} + ((k - 1) / 2 - p) * j_{in}$

Here $n$ is the number of features along one dimension, $j$ is the jump (the distance in input pixels between adjacent features), $r$ is the receptive field size, and $start$ is the center coordinate of the first feature; $k$, $p$, and $s$ are the kernel size, padding, and stride of the layer.
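As a worked example, the recurrences can be applied layer by layer to the VGG chain from the feature extraction section. The sketch below assumes each layer is described by a hypothetical `(k, s, p)` tuple, with conv3 as `(3, 1, 1)` and pool2 as `(2, 2, 0)`, and starts from $j = 1$, $r = 1$, $start = 0.5$ for the input image.

```python
import math


def receptive_field(layers, n_in):
    """Apply the n, j, r, start recurrences; layers is a list of (k, s, p)."""
    n, j, r, start = n_in, 1, 1, 0.5
    for k, s, p in layers:
        n = math.floor((n + 2 * p - k) / s) + 1
        r = r + (k - 1) * j                  # uses j_in
        start = start + ((k - 1) / 2 - p) * j  # uses j_in
        j = j * s                            # update the jump last
    return n, j, r, start


CONV3 = (3, 1, 1)  # assumed: 3x3 kernel, stride 1, padding 1
POOL2 = (2, 2, 0)  # assumed: 2x2 kernel, stride 2, no padding
VGG16 = (
    [CONV3] * 2 + [POOL2]
    + [CONV3] * 2 + [POOL2]
    + [CONV3] * 3 + [POOL2]
    + [CONV3] * 3 + [POOL2]
    + [CONV3] * 3
)

print(receptive_field(VGG16, 224))  # (14, 16, 196, 8.0)
```

Under these assumptions, each feature of the last conv3-512 layer sees a 196 × 196 input region, and neighbouring features are 16 input pixels apart.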

### Network Architecture

feature (conv3-512) $\rightarrow$ conv3-512 $\rightarrow$ conv1-18, conv1-36 (with 9 anchors per position, 18 = 9 × 2 outputs for class prediction and 36 = 9 × 4 outputs for position prediction)
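The channel counts 18 and 36 follow directly from the number of anchors per position. The sketch below generates 9 anchor shapes and derives the output shapes of the two 1 × 1 convolutions. The scales (128, 256, 512) and aspect ratios (1:2, 1:1, 2:1) are an assumption taken from the common Faster R-CNN setup, not from the text above, and the function names are hypothetical.

```python
import math


def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one (w, h) pair per scale/ratio combination.

    Each anchor has area scale**2 and aspect ratio h / w == ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((w, h))
    return anchors


def rpn_output_shapes(feat_h, feat_w, k):
    """Shapes (channels, H, W) of the two RPN branches for k anchors per position."""
    return {
        "cls": (2 * k, feat_h, feat_w),  # object / background score per anchor
        "reg": (4 * k, feat_h, feat_w),  # (t_x, t_y, t_w, t_h) per anchor
        "num_anchors": k * feat_h * feat_w,
    }


anchors = make_anchors()
print(len(anchors))                                  # 9
print(rpn_output_shapes(37, 50, len(anchors)))
```

For a 37 × 50 feature map this gives 18 and 36 channels and 16650 anchors in total.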

### Loss Function

$L(p_i, t_i) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$

$L_{cls} = -\left(p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i)\right)$

$L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L_1}(t_i - t_i^*)$
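A minimal sketch of this loss in plain Python follows. The piecewise form of smooth L1 is not spelled out above; the code assumes the usual definition ($0.5x^2$ for $|x| < 1$, $|x| - 0.5$ otherwise), summed over the four box coordinates. The default `lam=10.0` and the fallback of normalizing by the number of anchors are illustrative choices, and `rpn_loss` is a hypothetical name.

```python
import math


def smooth_l1(x):
    """Assumed definition: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def cls_loss(p, p_star):
    """Binary cross-entropy; p must lie strictly between 0 and 1."""
    return -(p_star * math.log(p) + (1 - p_star) * math.log(1 - p))


def rpn_loss(probs, labels, t_pred, t_true, lam=10.0, n_cls=None, n_reg=None):
    """probs/labels: one entry per anchor; t_pred/t_true: one 4-tuple per anchor."""
    n_cls = n_cls or len(probs)
    n_reg = n_reg or len(probs)
    l_cls = sum(cls_loss(p, s) for p, s in zip(probs, labels)) / n_cls
    # p_star multiplies the regression term, so negative anchors contribute nothing.
    l_reg = sum(
        s * sum(smooth_l1(a - b) for a, b in zip(tp, tt))
        for s, tp, tt in zip(labels, t_pred, t_true)
    ) / n_reg
    return l_cls + lam * l_reg


loss = rpn_loss(
    probs=[0.9, 0.2],
    labels=[1, 0],
    t_pred=[(0.1, 0.0, 0.2, 0.0), (3.0, 3.0, 3.0, 3.0)],
    t_true=[(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
)
print(loss)
```

The second anchor has label 0, so its large box error is ignored by the regression term and only its classification error counts.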

## 4. Fast R-CNN

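The RPN predicts object locations and passes them to the Fast R-CNN detection network, which extracts the feature-map region at each given location, classifies it, and refines the box position to produce the final result.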

### RoI Layer

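The proposals predicted by the RPN vary in size, so the feature-map regions they select have different dimensions. The fully connected layers that follow, however, require a fixed input dimension. The RoI layer therefore borrows the idea from SPPNet and applies max pooling: any m × n feature-map region is mapped to a fixed 7 × 7 output, which amounts to max pooling with a kernel of roughly m/7 × n/7.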
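The pooling step can be sketched for a single channel as follows. This is a simplified illustration, not the exact layer used in any particular framework: `roi_max_pool` is a hypothetical name, the region is a list of rows, and the bin boundaries (floor for the start, ceiling for the end, at least one cell per bin) are one reasonable choice when m or n is not divisible by 7.

```python
def roi_max_pool(feature, out_h=7, out_w=7):
    """Max-pool an m x n region (list of rows) to a fixed out_h x out_w grid."""
    m, n = len(feature), len(feature[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * m) // out_h
        r1 = max(r0 + 1, -(-(i + 1) * m // out_h))  # integer ceiling, >= 1 row
        row = []
        for j in range(out_w):
            c0 = (j * n) // out_w
            c1 = max(c0 + 1, -(-(j + 1) * n // out_w))  # integer ceiling, >= 1 col
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 14 x 21 region whose value at (r, c) is r * 21 + c.
region = [[r * 21 + c for c in range(21)] for r in range(14)]
out = roi_max_pool(region)
print(len(out), len(out[0]))  # 7 7
print(out[0][0], out[6][6])   # 23 293
```

Here each bin covers 2 × 3 cells, matching the m/7 × n/7 kernel described above; a region smaller than 7 × 7 still produces a 7 × 7 output because bins are allowed to overlap.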

### Network Architecture

feature (conv3-512), RPN proposal $\rightarrow$ RoI (7 × 7 × 512) $\rightarrow$ FC(4096) $\rightarrow$ FC(4096) $\rightarrow$ (SVD) FC21, (SVD) FC84
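The sizes in this chain can be checked with a little arithmetic. The sketch below assumes that 21 means 20 object classes plus background and that 84 = 21 × 4 box offsets. It also illustrates why truncated SVD helps: a u × v weight matrix replaced by two layers of rank t needs t(u + v) weights instead of uv. The rank `t=1024` and the choice of layer in the example are illustrative assumptions, the function names are hypothetical, and biases are ignored.

```python
def head_output_sizes(num_classes=20):
    """Classifier and box-regressor widths, assuming one extra background class."""
    n = num_classes + 1
    return n, 4 * n


def fc_params(u, v):
    """Weights in a dense layer mapping u inputs to v outputs (biases ignored)."""
    return u * v


def svd_fc_params(u, v, t):
    """Weights after factoring the layer into two layers through a rank-t bottleneck."""
    return t * (u + v)


roi_features = 7 * 7 * 512
print(head_output_sizes())                      # (21, 84)
print(roi_features)                             # 25088
print(fc_params(roi_features, 4096))            # 102760448
print(svd_fc_params(roi_features, 4096, 1024))  # 29884416
```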
