Original link: https://blog.csdn.net/john_bh/article/details/106380784
When reposting, please credit the author and source: http://blog.csdn.net/john_bh/
ICCV link: Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
arXiv link: Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
Authors and affiliations: Oregon State University (USA) & JD Digits
Conference and date: ICCV 2019
Code: the original authors' open-source GitHub repository
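1. Main contributions
This paper studies loss functions for heatmap regression in facial landmark detection.
- It improves on Wing loss and proposes Adaptive wing loss for heatmap regression, whose shape adapts to different types of ground-truth heatmap pixels. This adaptivity drives down small errors on foreground pixels for accurate landmark localization, while tolerating small errors on background pixels for faster convergence;
- It proposes a weighted loss map to address the imbalance between foreground and background pixels, so that training focuses on foreground pixels and difficult background pixels. Foreground pixels back-propagate a larger loss and background pixels a smaller one, which improves training;
- It uses CoordConv to encode coordinate information, including boundary coordinates. This acts somewhat like an attention mechanism and helps the network learn better;
- It proposes training the landmark boundary jointly with the landmarks themselves;
- Adaptive wing loss also helps other heatmap regression tasks, such as human pose keypoints.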
2. Overall framework
As shown in Figure 3, the framework consists of 4 hourglass modules. The input is a $256 \times 256$ face image, which is expanded by 10% in both width and height, and the output size is $64 \times 64$. The predicted feature map contains c landmark channels and 1 boundary channel. Each landmark channel predicts one facial landmark, and the boundary channel represents the lines that outline the face. Predicting landmarks and the boundary together helps the network learn better.<br> <img src="https://img-blog.csdnimg.cn/20200607111448214.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2pvaG5fYmg=,size_16,color_FFFFFF,t_70#pic_center" alt="image">
3. Adaptive wing loss
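3.1 Survey of related methods
Landmark detection based on heatmap regression:
- In heatmap regression, the ground-truth heatmap is generated by drawing, in each channel, a Gaussian distribution centered on that channel's ground-truth landmark.
- The model regresses the ground-truth heatmap pixel by pixel, and the predicted heatmap is then used to infer the landmark positions.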
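The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' pipeline: the function names, the `sigma` value and the simple arg-max decoding are assumptions made for the example.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Return a height x width heatmap with a Gaussian centred on (cx, cy).

    sigma is an illustrative choice, not a value taken from the paper.
    """
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def decode_landmark(heatmap):
    """Return the (x, y) position of the largest heatmap value (the mode)."""
    best_value, best_xy = -1.0, (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy

# One channel for one landmark located at x=20, y=41 on a 64 x 64 heatmap.
heatmap = gaussian_heatmap(64, 64, cx=20, cy=41)
print(decode_landmark(heatmap))  # (20, 41)
```

Because decoding depends only on where the mode is, a small error on a pixel next to the peak can move the decoded landmark, while an error on a far-away zero-valued pixel usually cannot.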
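As Figure 1 shows, prediction accuracy on foreground pixels (pixels with positive values), and especially on pixels near the mode of each Gaussian, is critical for landmark prediction: even a small prediction error on these pixels can shift the prediction away from the correct mode. In contrast, predicting the exact values of background pixels (pixels with zero value) matters much less, because in most cases these pixels do not affect the landmark prediction. Accuracy on difficult background pixels ("difficult background" in Figure 1) is still important, however, because they are often wrongly regressed as foreground pixels and can cause inaccurate predictions.
The authors analyze the MSE loss and identify two problems with using it for heatmap regression:
1. MSE is insensitive to small errors, which hurts the ability to locate the mode of the Gaussian distribution;
2. During training, MSE gives every pixel the same weight, yet background pixels far outnumber foreground pixels, so the pixel classes are imbalanced.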
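As Figure 2 shows, a model trained with the MSE loss tends to predict blurry, dilated heatmaps with low intensity on foreground pixels (Figure 2c), and these low-quality heatmaps lead to wrong landmark predictions. The authors tried Wing loss and found that small errors on background pixels accumulate significant gradients, which makes training diverge. They therefore propose Adaptive wing loss.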
For heatmap regression, training converges only when the gradients of the loss with respect to the predicted pixels sum to zero over all samples and all pixels. Here $N$ is the number of training samples; $H$, $W$ and $C$ are the height, width and number of channels of the heatmap; $Loss_n$ is the loss of the $n$-th sample; $y_{i,j,k}$ and $\hat y_{i,j,k}$ are the ground-truth pixel and the predicted pixel, respectively.

Consequently, a positive error on a pixel with large gradient magnitude (large influence) has to be balanced by negative errors on many pixels with smaller influence. Errors with large gradient magnitude also receive more attention during training than errors with small gradient magnitude.
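Written out with the symbols defined above, the convergence condition can be stated as follows. The exact form is reconstructed from those symbol definitions and from the balancing argument, so treat it as a sketch rather than a quotation.

```latex
\sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{C}
\frac{\partial\, Loss_n(y_{i,j,k}, \hat y_{i,j,k})}{\partial\, \hat y_{i,j,k}} = 0
```

Each term is the influence of one pixel's error, which is why a few large-gradient terms need many small-gradient terms of the opposite sign to cancel.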
Wing loss: Wing loss cannot overcome the discontinuity of its gradient at $y - \hat y = 0$. Because the gradient magnitude is large at that point, training is harder to converge than with the $L1$ loss. This property makes Wing loss unsuitable for heatmap regression: once Wing loss is computed over all background pixels, small errors on background pixels have a disproportionate influence. Training a network to output zero or small gradients on these pixels is very difficult, so the model struggles to converge.<br> <img src="https://img-blog.csdnimg.cn/20200607111448537.png#pic_center" alt="image">
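The following sketch compares gradient magnitudes near zero error. It assumes the usual Wing loss definition, $w \ln(1 + |x|/\epsilon)$ for $|x| < w$ and a linear branch otherwise. The defaults `w=10` and `eps=2` are illustrative values, not settings taken from this post.

```python
def wing_grad(error, w=10.0, eps=2.0):
    """Gradient magnitude of Wing loss, assuming w*ln(1+|x|/eps) for |x| < w.

    The derivative of w*ln(1 + a/eps) with respect to a is w / (eps + a);
    outside the nonlinear region the loss is linear with slope 1.
    """
    a = abs(error)
    return w / (eps + a) if a < w else 1.0

def mse_grad(error):
    """Gradient magnitude of the squared error x**2."""
    return 2.0 * abs(error)

for e in (0.1, 0.01, 0.001):
    print(f"error={e}: wing={wing_grad(e):.3f}  mse={mse_grad(e):.3f}")
```

As the error shrinks, the MSE gradient goes to zero while the Wing gradient approaches `w / eps`. Summed over the many near-zero background pixels of a heatmap, that non-vanishing gradient is the disproportionate influence described above.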
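3.2 The proposed Adaptive wing loss
The analysis suggests that the loss should have constant influence when the error is large, which makes it robust to inaccurate annotations and occlusions. As training proceeds and the errors shrink, two cases arise: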
- For foreground pixels, the influence (and therefore the gradient) should start to increase so that training can focus on reducing these errors. Then, once the error is very close to zero, the influence should drop quickly so that these "good enough" pixels no longer receive attention. Reducing the influence of correct estimates helps the network stay converged instead of oscillating as it does with $L_1$ and Wing loss.
- For background pixels, the gradient should behave more like that of the $MSE$ loss: it gradually decreases to zero as the training error decreases, so the influence is relatively small when the error is small. This property reduces the attention training pays to background pixels and stabilizes the training process.

<img src="https://img-blog.csdnimg.cn/20200607111448485.png#pic_center" alt="image">
- $y$ and $\hat y$ are the ground-truth heatmap and the predicted heatmap, respectively;
- $\omega, \theta, \alpha, \epsilon$ are all positive, with $\omega = 14$, $\theta = 0.5$, $\alpha = 2.1$, $\epsilon = 1$. $\alpha$ is set to $2.1$ because $y$ lies in $[0, 1]$: for pixels whose $y$ is close to 1, the exponent $\alpha - y$ is slightly larger than 1, so the nonlinear part behaves like Wing loss and gives small errors a large influence. Unlike Wing loss, however, the influence drops quickly to zero once the error is very close to zero, as shown in Figure 4. In addition, a larger $\omega$ and a smaller $\epsilon$ increase the influence of small errors;
- $A = \omega (1/(1+(\theta/\epsilon)^{(\alpha - y)}))(\alpha - y)((\theta/\epsilon)^{(\alpha - y - 1)})(1/\epsilon)$;
- $C = (\theta A - \omega \ln(1+(\theta/\epsilon)^{(\alpha - y)}))$. $A$ and $C$ make the function continuous and smooth at $|y - \hat y| = \theta$.
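The symbols in this list parameterize a piecewise loss: a logarithmic branch for small errors and a linear branch for large ones. A form consistent with the requirement that value and slope match at $|y - \hat y| = \theta$ is sketched below, with $A$ the slope of the logarithmic branch at $\theta$ and $C$ the offset that makes the two branches meet.

```latex
AWing(y, \hat y) =
\begin{cases}
\omega \ln\!\left(1 + \left|\dfrac{y - \hat y}{\epsilon}\right|^{\alpha - y}\right)
  & \text{if } |y - \hat y| < \theta \\[2ex]
A\,|y - \hat y| - C & \text{otherwise}
\end{cases}
```

Differentiating the first branch with respect to $|y - \hat y|$ and evaluating at $\theta$ gives $\omega \cdot \frac{1}{1 + (\theta/\epsilon)^{\alpha - y}} \cdot (\alpha - y)(\theta/\epsilon)^{\alpha - y - 1} \cdot \frac{1}{\epsilon}$, which is why the ratio inside $A$ and $C$ has to be $\theta/\epsilon$.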
Figure 5 shows that the exponent $\alpha - y$ gives a smooth transition across different values of $y$, so that the influence of small errors gradually increases as $y$ increases.<br> <img src="https://img-blog.csdnimg.cn/20200607111448274.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2pvaG5fYmg=,size_16,color_FFFFFF,t_70#pic_center" alt="image">
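To see the effect of the exponent numerically, here is a minimal pure-Python sketch of a per-pixel loss. It assumes the piecewise logarithmic/linear form with the hyperparameters quoted earlier ($\omega = 14$, $\theta = 0.5$, $\alpha = 2.1$, $\epsilon = 1$). The function names are illustrative, and `influence` is a finite-difference estimate rather than an analytic gradient.

```python
import math

def adaptive_wing(y, y_hat, omega=14.0, theta=0.5, alpha=2.1, eps=1.0):
    """Per-pixel loss: log branch for small errors, linear branch otherwise."""
    delta = abs(y - y_hat)
    power = alpha - y                      # exponent adapts to the target value
    if delta < theta:
        return omega * math.log(1.0 + (delta / eps) ** power)
    ratio = (theta / eps) ** power
    # Slope of the log branch at delta == theta, so both branches join smoothly.
    slope = omega * (1.0 / (1.0 + ratio)) * power * (theta / eps) ** (power - 1.0) / eps
    offset = theta * slope - omega * math.log(1.0 + ratio)
    return slope * delta - offset

def influence(y, delta, step=1e-6):
    """Finite-difference gradient magnitude of the loss at error size delta."""
    upper = adaptive_wing(y, y - (delta + step))
    lower = adaptive_wing(y, y - (delta - step))
    return (upper - lower) / (2.0 * step)

for target in (0.0, 0.5, 1.0):
    print(f"y={target}: influence of a 0.05 error = {influence(target, 0.05):.2f}")
```

With these settings the same small error has roughly ten times more influence on a pixel with $y = 1$ than on a pixel with $y = 0$, and the printed values grow steadily in between.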
4. Weighted loss map
在典型的人脸关键点定位中,通常是
64 × 64 64\times 64 </span><span class="katex-html"><span class="base"><span class="strut" style="height: 0.72777em; vertical-align: -0.08333em;"></span><span class="mord">6</span><span class="mord">4</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 0.64444em; vertical-align: 0em;"></span><span class="mord">6</span><span class="mord">4</span></span></span></span></span>大小的 heatmap ,高斯分布大小为<span class="katex--inline"><span class="katex"><span class="katex-mathml"> 7 × 7 7 \times 7 </span><span class="katex-html"><span class="base"><span class="strut" style="height: 0.72777em; vertical-align: -0.08333em;"></span><span class="mord">7</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 0.64444em; vertical-align: 0em;"></span><span class="mord">7</span></span></span></span></span>,这样的话前景像素只占总像素的<span class="katex--inline"><span class="katex"><span class="katex-mathml"> 1.2 % 1.2 \% </span><span class="katex-html"><span class="base"><span class="strut" style="height: 0.80556em; vertical-align: -0.05556em;"></span><span class="mord">1</span><span class="mord">.</span><span class="mord">2</span><span class="mord">%</span></span></span></span></span>。对这样一个不平衡的数据分配相等的权值会使训练过程收敛速度变慢,导致训练效果较差。</p>
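The 1.2% figure follows directly from the two sizes. A quick check, with the helper name chosen for this example only:

```python
def foreground_fraction(heatmap_size=64, gaussian_size=7):
    """Share of pixels covered by one gaussian_size x gaussian_size patch."""
    return (gaussian_size * gaussian_size) / (heatmap_size * heatmap_size)

# 49 foreground pixels out of 4096 in total.
print(f"{100 * foreground_fraction():.1f}%")  # 1.2%
```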
To make the network pay even more attention to foreground pixels and difficult background pixels (background pixels close to foreground pixels), the authors introduce a weighted loss map that balances the loss across the different pixel types. The weighted loss map makes foreground pixels back-propagate a larger loss and background pixels a smaller one, which improves training, as in Equation 4:
- $H^d$ is produced from the ground-truth heatmap by a $3 \times 3$ grayscale dilation.
- The loss map mask $M$ is set to 1 for foreground pixels and difficult background pixels, and to 0 for all other pixels.
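The mask construction can be sketched in plain Python as follows. The dilation takes the maximum over each pixel's $3 \times 3$ neighbourhood. The cut-off `threshold=0.2` used to binarize the dilated heatmap is an assumption for this example, and the function names are illustrative.

```python
def dilate3x3(heatmap):
    """Grayscale dilation: each pixel becomes the max of its 3 x 3 neighbourhood."""
    h, w = len(heatmap), len(heatmap[0])
    return [[max(heatmap[yy][xx]
                 for yy in range(max(0, y - 1), min(h, y + 2))
                 for xx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]

def loss_mask(heatmap, threshold=0.2):
    """1 for foreground and nearby (difficult) background pixels, 0 elsewhere.

    threshold is an assumed cut-off for this sketch.
    """
    return [[1 if value >= threshold else 0 for value in row]
            for row in dilate3x3(heatmap)]

# A 5 x 5 heatmap with a single foreground pixel in the centre.
toy = [[0.0] * 5 for _ in range(5)]
toy[2][2] = 1.0
for row in loss_mask(toy):
    print(row)
```

Dilation spreads each foreground value onto its immediate neighbours, so the ring of background pixels around a landmark ends up inside the mask together with the foreground itself.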
The weighted loss function is defined in Equation 5:
- $\bigotimes$ denotes an element-wise product;
- $W$ is a scalar hyperparameter that controls how much weight is added; the authors set $W = 10$.
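As a sketch of how the mask and the hyperparameter combine, assume the weighted loss takes the form $Loss \otimes (W \cdot M + 1)$, so that masked pixels are scaled by $W + 1$ and all other pixels keep their original loss. The function name and the toy values are illustrative.

```python
def weighted_loss(loss_map, mask, weight=10.0):
    """Element-wise loss * (weight * mask + 1), assuming that weighting form."""
    return [[loss * (weight * m + 1.0) for loss, m in zip(loss_row, mask_row)]
            for loss_row, mask_row in zip(loss_map, mask)]

per_pixel_loss = [[0.5, 0.5]]
mask = [[1, 0]]          # first pixel is foreground, second is plain background
print(weighted_loss(per_pixel_loss, mask))  # [[5.5, 0.5]]
```

With $W = 10$ a masked pixel contributes eleven times as much as an unmasked pixel with the same raw loss, which offsets the small share of foreground pixels in the heatmap.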
The weight map is visualized in Figure 6.
5. Boundary Information
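Following LAB, the authors introduce boundary prediction into the network as a sub-task, but in a different way. Instead of splitting the boundary into separate parts, they use only one additional channel as the boundary channel, which combines all boundary lines into a single heatmap and effectively captures global information about the face. The boundary information is then naturally aggregated into the network through the convolutions of the forward pass, and it is also used in Section 6 to generate the boundary coordinate maps. Experiments show that this further improves localization accuracy.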
6. Coordinate aggregation
CoordConv is integrated into the model to improve the ability of a conventional convolutional network to capture coordinate information. In addition to the $X$, $Y$, and radius coordinate encodings, the boundary prediction is used to generate $X$ and $Y$ coordinates only at boundary pixels. More specifically, let $C_x$ denote the $X$ coordinate encoding and $B$ the boundary prediction from the previous hourglass (HG) module; the boundary coordinate encoding $B_x$ is then defined as:<br> <img src="https://img-blog.csdnimg.cn/20200607112622351.png#pic_center" alt="在这里插入图片描述"><br> $B_y$ is generated from $C_y$ in the same way. The coordinate channels are generated at run time and then concatenated with the original input before a regular convolution is applied.
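A minimal NumPy sketch of this coordinate aggregation is given below. It assumes coordinates normalised to $[-1, 1]$ and a boundary threshold of 0.05 (the exact definition of $B_x$ is in the image above; the threshold is recalled from the paper and should be treated as an assumption). The function names are hypothetical.

```python
import numpy as np

def coord_channels(height, width):
    """Return the X, Y and radius coordinate maps, each of shape (H, W)."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, height),
                         np.linspace(-1.0, 1.0, width),
                         indexing="ij")
    radius = np.sqrt(xs ** 2 + ys ** 2)
    return xs, ys, radius

def add_boundary_coords(features, boundary, threshold=0.05):
    """Append C_x, C_y, radius, B_x, B_y to a (C, H, W) feature map.

    boundary: (H, W) boundary heatmap B predicted by the previous HG.
    B_x and B_y copy C_x and C_y where B >= threshold (assumed value)
    and are zero elsewhere.
    """
    _, h, w = features.shape
    c_x, c_y, radius = coord_channels(h, w)
    on_boundary = boundary >= threshold
    b_x = np.where(on_boundary, c_x, 0.0)
    b_y = np.where(on_boundary, c_y, 0.0)
    extra = np.stack([c_x, c_y, radius, b_x, b_y])
    return np.concatenate([features, extra], axis=0)

# Toy input: 4 feature channels and a boundary map with a single active pixel.
features = np.zeros((4, 8, 8))
boundary = np.zeros((8, 8))
boundary[3, 5] = 1.0
out = add_boundary_coords(features, boundary)
```

The five extra channels are concatenated in front of an ordinary convolution, so the layer itself needs no change beyond a larger input channel count.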
7. Experiments
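- WFLW: results are shown in Table 1; the Wing loss baseline uses a ResNet50 backbone.
- COFW: the results on COFW demonstrate the method's robustness to faces with large poses and heavy occlusion, as shown in Table 2:
- 300W: on the 300W dataset the method achieves state-of-the-art (SOTA) results, as shown in Table 3:
- 300W private test dataset: results are shown in Table 4: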
7.2. Ablation study
8. Supplementary Material
1. Implementation Detail of CoordConv on Boundary Information
On top of the original CoordConv, two coordinate-encoding channels carrying boundary information are added. The process is visualized in Figure 8:
2. Evaluation on AFLW
3. Effectiveness of Adaptive Wing loss on Training
4. Robustness of Adaptive Wing loss on datasets with manually added annotation noise
5. Experiment on different number of HG stacks
6. Result Visualization