Deep Learning Case Study: Faster R-CNN (Part 2)

卡列宁在睡觉

Category: DeepLearning. Tags: deep learning, image recognition

Article link: https://blog.csdn.net/longwoo1012/article/details/106960558

Part 2

Training flow

Training steps in the Caffe version

Step1-RPN.TRAIN
Step1-RPN.PROPOSAL
Step2-FASTRCNN.TRAIN
Step3-RPN.TRAIN
Step3-RPN.PROPOSAL
Step4-FASTRCNN.TRAIN
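
The six stages above can be read as the 4-step alternating training scheme. The sketch below only records the order and, as an assumption taken from the paper's description of the four steps (not read from the Caffe scripts), what each stage is initialised from and whether the shared convolutional layers are frozen. The names `SCHEDULE` and `describe` are hypothetical.

```python
# Each entry: (stage name from the list above, initialisation, shared conv frozen?)
# The last two columns are assumptions based on the paper's 4-step description.
# frozen is None for stages that only run a trained RPN to emit proposals.
SCHEDULE = [
    ("Step1-RPN.TRAIN",      "ImageNet-pretrained model",      False),
    ("Step1-RPN.PROPOSAL",   "RPN from step 1",                None),
    ("Step2-FASTRCNN.TRAIN", "ImageNet-pretrained model",      False),
    ("Step3-RPN.TRAIN",      "detector from step 2",           True),
    ("Step3-RPN.PROPOSAL",   "RPN from step 3",                None),
    ("Step4-FASTRCNN.TRAIN", "shared conv layers from step 3", True),
]

def describe(schedule):
    """Return one human-readable line per stage."""
    lines = []
    for name, init, frozen in schedule:
        if frozen is None:
            lines.append(f"{name}: run {init} to generate proposals")
        else:
            mode = "frozen" if frozen else "trainable"
            lines.append(f"{name}: init from {init}, shared conv layers {mode}")
    return lines

for line in describe(SCHEDULE):
    print(line)
```

The point of steps 3 and 4 is that both networks end up sharing the same convolutional layers, which is why those layers are marked as frozen there.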

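What exactly is being learned?

Detection plus a small classification (object or no object): RPN
Detection method: CNN feature extraction + bounding-box regression
What is regressed (learned): a mapping between planar shapes, $\mathbf{t}:\mathbf{P}\to \mathbf{\hat{G}}$ .
Concretely, it deforms the shape of a proposal ( $\mathbf{P}$ ) into the shape of a ground-truth box ( $\mathbf{\hat{G}}$ ). The functional form is known; what is learned (solved for) is the function's parameters.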

$\mathbf{t}=\left\{ \begin{array}{l} \hat{G}_{x}=P_{w}t_{x}(\mathbf{P})+P_{x}\\ \hat{G}_{y}=P_{h}t_{y}(\mathbf{P})+P_{y}\\ \hat{G}_{w}=P_{w}\exp(t_{w}(\mathbf{P}))\\ \hat{G}_{h}=P_{h}\exp(t_{h}(\mathbf{P})) \end{array} \right.$
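
As a minimal sketch of this mapping, the NumPy code below applies the four equations to a proposal (`decode`) and also inverts them to obtain the values of $t$ that would map a proposal exactly onto a given box (`encode`). It assumes boxes are stored as (center x, center y, width, height); the function names and the example numbers are made up for illustration.

```python
import numpy as np

def decode(P, t):
    """Apply the deformation t = (tx, ty, tw, th) to proposals P.

    P and t have shape (N, 4); boxes are (cx, cy, w, h)."""
    gx = P[:, 2] * t[:, 0] + P[:, 0]      # G_x = P_w * t_x + P_x
    gy = P[:, 3] * t[:, 1] + P[:, 1]      # G_y = P_h * t_y + P_y
    gw = P[:, 2] * np.exp(t[:, 2])        # G_w = P_w * exp(t_w)
    gh = P[:, 3] * np.exp(t[:, 3])        # G_h = P_h * exp(t_h)
    return np.stack([gx, gy, gw, gh], axis=1)

def encode(P, G):
    """Inverse of decode: the t that maps proposal P exactly onto box G."""
    tx = (G[:, 0] - P[:, 0]) / P[:, 2]
    ty = (G[:, 1] - P[:, 1]) / P[:, 3]
    tw = np.log(G[:, 2] / P[:, 2])
    th = np.log(G[:, 3] / P[:, 3])
    return np.stack([tx, ty, tw, th], axis=1)

# Illustrative boxes only.
P = np.array([[50.0, 50.0, 20.0, 40.0]])
G = np.array([[55.0, 48.0, 30.0, 36.0]])
t = encode(P, G)
G_hat = decode(P, t)
```

Shifts are expressed relative to the proposal's size and scales in log space, so the same $t$ describes the same relative deformation regardless of how large the proposal is.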

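Each component is modeled as a linear function of the proposal's features, $t_{*}(\mathbf{P})=\mathbf{w}_{*}^{T}\phi(\mathbf{P})$ with $*\in\{x,y,w,h\}$ . The weights are found by ridge regression over the $N$ training proposals, where $t^i_{*}$ is the regression target for the $i$-th proposal $\mathbf{P}^i$ :
$\mathbf{w}_{*}=\operatorname*{arg\,min}_{\hat{\mathbf{w}}_{*}}\sum_{i=1}^{N}(t^i_{*}-\hat{\mathbf{w}}_{*}^{T}\phi(\mathbf{P}^i))^2+\lambda\lVert\hat{\mathbf{w}}_{*}\rVert^2$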
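
For intuition, this regularised least-squares objective has a closed-form minimiser, $\mathbf{w}_{*}=(\Phi^{T}\Phi+\lambda I)^{-1}\Phi^{T}\mathbf{t}_{*}$ , where the rows of $\Phi$ are the feature vectors $\phi(\mathbf{P}^i)$ . The sketch below solves it with NumPy on random stand-in features; it is an illustration of the objective only, and in the Caffe network the weights would be learned by gradient descent rather than solved this way. `fit_ridge` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Closed-form minimiser of sum_i (t_i - w^T phi_i)^2 + lam * ||w||^2."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ t)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 8))   # stand-in for the features phi(P^i)
w_true = rng.normal(size=8)       # weights used to synthesise the targets
t = Phi @ w_true                  # noiseless targets for one coordinate, e.g. t_x

w_small = fit_ridge(Phi, t, lam=1e-8)    # almost unregularised
w_large = fit_ridge(Phi, t, lam=1000.0)  # strongly regularised
```

With a tiny $\lambda$ the solution recovers the generating weights, while a large $\lambda$ shrinks them toward zero, which is the role of the $\lambda\lVert\hat{\mathbf{w}}_{*}\rVert^2$ term.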

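Loss function

Note that here $\mathbf{v}=\hat{\mathbf{w}}_{*}^{T}\phi(\mathbf{P}^i)$ is the prediction, i.e. the predicted parameters of the deformation function, and $\mathbf{t}^u$ is the regression target for class $u$ . $\mathbf{p}$ is the predicted class distribution, and $[u\geqslant1]$ equals 1 for a foreground class and 0 for background ( $u=0$ ), so background samples contribute no localization loss.
$L(\mathbf{p},u,\mathbf{t}^u,\mathbf{v})=L_{cls}(\mathbf{p},u)+\lambda[u\geqslant1]L_{loc}(\mathbf{t}^u,\mathbf{v})$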

$L_{loc}(\mathbf{t}^u,\mathbf{v})=\sum_{i\in\{x,y,w,h\}}\mathrm{smooth}_{L_{1}}(t^u_{i}-v_{i})$

$\mathrm{smooth}_{L_{1}}(x)=\left\{ \begin{array}{ll} 0.5 (\sigma x)^2 & \text{if } |x| < \frac{1}{\sigma ^2}\\ |x| - \frac{0.5}{\sigma ^2} & \text{otherwise} \end{array} \right.$
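
The loss can be sketched in a few lines of NumPy. The code follows the formulas above term by term; the function names, the default values of `sigma` and `lam`, and the way the classification loss is passed in as a precomputed number are assumptions made for illustration.

```python
import numpy as np

def smooth_l1(x, sigma=1.0):
    """Element-wise smooth L1 with the sigma parameterisation used above."""
    x = np.asarray(x, dtype=float)
    s2 = sigma ** 2
    quadratic = 0.5 * (sigma * x) ** 2      # used where |x| < 1 / sigma^2
    linear = np.abs(x) - 0.5 / s2           # used elsewhere
    return np.where(np.abs(x) < 1.0 / s2, quadratic, linear)

def loc_loss(t_u, v, sigma=1.0):
    """L_loc: sum of smooth L1 over the four coordinates x, y, w, h."""
    diff = np.asarray(t_u, dtype=float) - np.asarray(v, dtype=float)
    return float(smooth_l1(diff, sigma).sum())

def total_loss(cls_loss, u, t_u, v, lam=1.0, sigma=1.0):
    """L = L_cls + lam * [u >= 1] * L_loc, with L_cls given as a number."""
    indicator = 1.0 if u >= 1 else 0.0      # background (u = 0) has no box loss
    return cls_loss + lam * indicator * loc_loss(t_u, v, sigma)
```

The two branches meet at $|x|=1/\sigma^2$ with the same value $0.5/\sigma^2$ , so the loss is continuous; a larger $\sigma$ narrows the quadratic region and makes the loss behave more like plain L1.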

Data flow in Caffe

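An original image passes through the shared convolutional layers to produce $bottom[0]=[...,W_{c},H_{c}]$
ROIDataLayer extracts all ROI regions from a randomly rescaled image, giving $bottom[1]=gtbox[x1,y1,x2,y2,cls]$ and $bottom[2]=[W_{0},H_{0},scale]$
During setup, AnchorTargetLayer generates $A$ anchors at the origin
In Forward it generates $W_{c}*H_{c}$ shift values and offsets the anchors by each shift, producing $A*W_{c}*H_{c}$ boxes in total, called $all\_anchors$
Anchors in $all\_anchors$ that cross the image boundary are dropped, leaving the new anchors
The overlap between the anchors and gtbox is computed and each anchor is labeled: overlap $>0.7$ gives $label=1$ ; overlap $<0.3$ gives $label=0$ ; anything else gives $label=-1$ . The labels go to $top[0]$
256 anchors are sampled, keeping positive and negative samples at 1:1
$\mathbf{v}$ is computed and written to $top[1]$ ; $u$ is computed and written to $top[2]$ ; $\lambda$ is computed and written to $top[3]$
Backward runs backpropagation and updates the weights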
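
The anchor steps above (shift, drop out-of-bounds boxes, label by overlap, sample 256) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the layer's actual code: the base anchors, the stride of 16, the image size and the ground-truth box are invented, overlap is taken to be IoU with inclusive pixel coordinates, and when fewer than 128 positives exist the sketch fills the rest of the batch with negatives instead of enforcing an exact 1:1 ratio.

```python
import numpy as np

def shift_anchors(base, Wc, Hc, stride):
    """Copy the A base anchors (x1, y1, x2, y2) to every feature-map cell."""
    sx, sy = np.meshgrid(np.arange(Wc) * stride, np.arange(Hc) * stride)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)
    # (Wc*Hc, 1, 4) + (1, A, 4) -> (Wc*Hc, A, 4) -> (A*Wc*Hc, 4)
    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)

def box_area(b):
    return (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)

def iou(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (K, 4)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1 + 1, 0, None) * np.clip(y2 - y1 + 1, 0, None)
    return inter / (box_area(a)[:, None] + box_area(b)[None, :] - inter)

def label_anchors(all_anchors, gt, im_w, im_h, rng, batch=256):
    # Keep only anchors that lie fully inside the image.
    inside = ((all_anchors[:, 0] >= 0) & (all_anchors[:, 1] >= 0) &
              (all_anchors[:, 2] < im_w) & (all_anchors[:, 3] < im_h))
    anchors = all_anchors[inside]
    max_overlap = iou(anchors, gt).max(axis=1)
    labels = np.full(len(anchors), -1)      # -1 means "ignore"
    labels[max_overlap < 0.3] = 0           # background
    labels[max_overlap > 0.7] = 1           # object
    # Subsample: at most batch/2 positives, negatives fill the remainder.
    pos = np.where(labels == 1)[0]
    if len(pos) > batch // 2:
        labels[rng.choice(pos, len(pos) - batch // 2, replace=False)] = -1
    neg = np.where(labels == 0)[0]
    num_neg = batch - int((labels == 1).sum())
    if len(neg) > num_neg:
        labels[rng.choice(neg, len(neg) - num_neg, replace=False)] = -1
    return anchors, labels

# Hypothetical setup: A = 2 base anchors, a 10 x 8 feature map, stride 16.
base = np.array([[-8, -8, 23, 23], [-24, -24, 39, 39]], dtype=float)
all_anchors = shift_anchors(base, Wc=10, Hc=8, stride=16)
gt = np.array([[24.0, 24.0, 55.0, 55.0]])
anchors, labels = label_anchors(all_anchors, gt, im_w=160, im_h=128,
                                rng=np.random.default_rng(0))
```

Anchors left with label $-1$ are simply ignored by the loss, which is how both the ambiguous-overlap anchors and the ones removed by subsampling are kept out of training.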