【C3AE】《C3AE：Exploring the Limits of Compact Model for Age Estimation》

bryant_meng

Last modified 2022-08-29 10:41:10

First published 2022-08-26 17:33:25

Article link: https://blog.csdn.net/bryant_meng/article/details/125933428

CVPR-2019

1 Background and Motivation

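For deep-learning-based age estimation, the authors explore the limits of compact model for small scale image (small model, small input). They propose the Compact yet efficient Cascade Context-based Age Estimation model (C3AE), which achieves strong results on the IMDB-WIKI / Morph II / FG-NET datasets despite its small size.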

2 Related Work

Age Estimation
Compact Model

3 Advantages / Contributions

C3AE combines the Two Points Representation of Age with multi-scale input, giving a small model with high accuracy.

4 Method

Compact basic model, Cascaded training and multi-scale Context

1）Compact Model for Small-scale Image: Revisiting Standard Convolution

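This part compares the computational cost of a standard conv and a depth-wise separable (DWS) conv.

The authors' conclusion is that for a small network with small input, a standard convolution costs less than a DWS convolution...

Don't panic; let's see what the authors are getting at.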

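First, recall the formula from the MobileNet paper (from 【MobileNet】《MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications》), which gives the cost of a depth-wise separable conv relative to a standard conv with the same channel counts:

$\frac{D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2}{D_K^2 \cdot M \cdot N \cdot D_F^2} = \frac{1}{N} + \frac{1}{D_K^2}$

Here $D_K$ is the kernel size, $M$ and $N$ are the input and output channel counts, and $D_F$ is the feature-map size.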

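However small the network's $N$ is, the MobileNet ratio of separable to standard cost, $\frac{1}{N} + \frac{1}{D_K^2}$, stays below 1 (with $D_K = 3$, any $N \ge 2$ is enough). So how did the authors conclude that a standard convolution costs less than a depth-wise separable one?

Here is the authors' argument: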

the depth-wise convolution often requires much more channel numbers in order to perform comparable to standard convolution on small-scale images

The two kinds of conv are given different input/output channel counts $M$ and $N$, so this is not a like-for-like comparison...

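Fine, following the authors' (rather slippery) reasoning, with $M, N$ the channels of the separable conv and $\hat{M}, \hat{N}$ those of the standard conv, we get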

$\frac{M}{\hat{M} \cdot \hat{N}} + \frac{MN}{\hat{M} \cdot \hat{N} \cdot D_K^2} = \frac{144}{32 \times 32} + \frac{144 \times 144}{32 \times 32 \times 3^2} = 2.39 > 1$

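The standard conv's input and output channels are both more than 4 times smaller (32 vs. 144), and only then is its cost $\frac{1}{2.39}$ of the separable conv's...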
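To make the comparison concrete, here is a minimal sketch that counts multiply-adds for both kinds of convolution. The function names and the feature-map size `D_F = 16` are my own choices for illustration (the feature-map size cancels in both ratios); the channel counts 32 and 144 and the 3×3 kernel come from the calculation above.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard conv: D_K^2 * M * N * D_F^2."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a depth-wise separable conv:
    depth-wise part D_K^2 * M * D_F^2 plus point-wise part M * N * D_F^2."""
    return d_k * d_k * m * d_f * d_f + m * n * d_f * d_f


D_K = 3   # kernel size used in the calculation above
D_F = 16  # hypothetical feature-map size; it cancels in the ratios

# Same channel counts on both sides: the MobileNet ratio 1/N + 1/D_K^2
same = separable_conv_cost(D_K, 32, 32, D_F) / standard_conv_cost(D_K, 32, 32, D_F)

# The authors' setting: 144 channels (separable) vs. 32 channels (standard)
mixed = separable_conv_cost(D_K, 144, 144, D_F) / standard_conv_cost(D_K, 32, 32, D_F)

print(f"same channels:  separable / standard = {same:.4f}")
print(f"144 vs 32:      separable / standard = {mixed:.4f}")
```

With equal channel counts the separable conv is far cheaper; the ratio only rises above 1 once the separable conv is given 4.5 times as many channels, which is exactly the assumption the authors make.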

2）Two Points Representation of Age
An age is represented using two points.

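For the age $y_n$ of the $n$-th image, we can represent it as a weighted combination of two integers $z_n^1$ and $z_n^2$ (the age falls inside the interval, $z_n^1 < y_n < z_n^2$), with weights $\lambda_1$ and $\lambda_2$ ($\lambda_1 + \lambda_2 = 1$).

For example, 68 can be expressed as a weighted combination of 60 and 70: $68 = 60 \times 0.2 + 70 \times 0.8$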

When the age intervals are uniformly spaced ($z_n^2 - z_n^1 = K$), the age can be re-expressed as

$y_n = \lambda_1 z_n^1 + \lambda_2 z_n^2$

where

$z_n^1 = \left \lfloor \frac{y_n}{K} \right \rfloor \cdot K$

$z_n^2 = \left \lceil \frac{y_n}{K} \right \rceil \cdot K$

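In the experiments, $K = 10$.

This lets us supervise age in the form of a distribution (think of it as two-hot).

Assuming $K = 10$ and an age range of 10~80, the interval end points are $\{10, 20, 30, 40, 50, 60, 70, 80\}$ (corresponding to $w_2$ in Figure 2).

The two-hot encoding of 68 (Distribution $\vec{y}$) is $[0, 0, 0, 0, 0, 0.2, 0.8, 0]$

So we can supervise the two-hot distribution instead of purely regressing the value 68, and we do not need the $80 - 10 + 1 = 71$ classes that one-hot would require.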
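The encoding can be sketched in a few lines. This is only an illustration under the assumptions used above ($K = 10$, ages 10~80); the function name `two_point_encode` is hypothetical. The weights follow from $\lambda_1 + \lambda_2 = 1$ and $z_n^2 - z_n^1 = K$, which give $\lambda_1 = (z_n^2 - y_n)/K$ and $\lambda_2 = (y_n - z_n^1)/K$.

```python
import math


def two_point_encode(age, k=10, low=10, high=80):
    """Return (bins, two-hot distribution) for `age` over low, low+k, ..., high."""
    bins = list(range(low, high + 1, k))
    z1 = math.floor(age / k) * k  # lower end point
    z2 = math.ceil(age / k) * k   # upper end point
    dist = [0.0] * len(bins)
    if z1 == z2:
        # The age sits exactly on an end point, so all mass goes there.
        dist[bins.index(z1)] = 1.0
        return bins, dist
    dist[bins.index(z1)] = (z2 - age) / k  # lambda_1, weight of z1
    dist[bins.index(z2)] = (age - z1) / k  # lambda_2, weight of z2
    return bins, dist


bins, dist = two_point_encode(68)
decoded = sum(b * p for b, p in zip(bins, dist))  # weighted sum recovers the age
print(dist)
print(decoded)
```

Taking the weighted sum of the end points with the distribution recovers the original age, which is what the last stage of the network has to learn.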

However, as the paper puts it, "Each point can also be represented by two points or any other more points"

e.g. $0.5 \times 0 + 0.5 \times 100 = 0.2 \times 10 + 0.3 \times 40 + 0.3 \times 60 + 0.2 \times 90 = 50$

So how the network is supervised also matters, and the authors adopt a cascade approach.

3）Cascade Training

$I_n \overset{conv}{\rightarrow} X \overset{w_1}{\rightarrow} \vec{y_n} \overset{w_2}{\rightarrow} y_n$

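$\vec{y_n}$ is the distribution $\vec{y}$ described above.

A KL-divergence loss supervises the two-hot distribution, and an MAE loss supervises the final age.

In the experiments, $\alpha = 10$, where $\alpha$ is the weight that balances the two losses.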
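A rough sketch of how the two supervision signals could be combined for a single sample is given below. Several things here are my assumptions rather than details taken from the post: the total loss is written as $\alpha \cdot L_{kl} + L_{reg}$, the distribution comes from a softmax, and $w_2$ is fixed to the interval end points (in the paper $w_2$ is a learned layer). All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred); entries with zero target mass contribute 0."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, pred) if t > 0)


def cascade_loss(logits, target_dist, age, bins, alpha=10.0):
    """Assumed form of the cascade loss: alpha * KL + MAE for one sample."""
    pred_dist = softmax(logits)                             # X -> distribution (w_1 stage)
    pred_age = sum(b * p for b, p in zip(bins, pred_dist))  # distribution -> age (w_2 fixed to bins)
    l_kl = kl_divergence(target_dist, pred_dist)            # supervises the two-hot distribution
    l_reg = abs(age - pred_age)                             # supervises the age itself
    return alpha * l_kl + l_reg, pred_age


bins = [10, 20, 30, 40, 50, 60, 70, 80]
target = [0, 0, 0, 0, 0, 0.2, 0.8, 0]  # two-hot encoding of age 68

# An untrained network with uniform output over the 8 bins
uniform_loss, uniform_age = cascade_loss([0.0] * 8, target, 68, bins)

# Logits whose softmax (almost) matches the target distribution
good_logits = [math.log(p) if p > 0 else -50.0 for p in target]
good_loss, good_age = cascade_loss(good_logits, target, 68, bins)

print(uniform_loss, uniform_age)
print(good_loss, good_age)
```

The KL term pins down which distribution is learned (here the two-hot one), while the MAE term only constrains its weighted sum. That is why the cascade needs both.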

4）Context-based Regression Model

Multi-scale input

5 Experiments

5.1 Datasets and Metrics

1）Datasets

IMDB-WIKI
523051 images, ages 0~100, fairly noisy; used only for pre-training in this paper
Morph II
55000 face images of 13000 subjects with age labels, ages 16~77, about 4 images per subject on average
FG-NET
1002 face images from 82 non-celebrity subjects, ages 0~69, about 12 images per subject on average

2）Evaluation metric

mean absolute error (MAE)
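For reference, MAE is the average of $|y_n - \hat{y}_n|$ over all test samples. A minimal sketch follows; the function name is my own and the ages are made-up values for illustration, not results from the paper.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


# Made-up ages purely for illustration
mae = mean_absolute_error([25, 40, 68], [27, 37, 68])
print(mae)
```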

5.2 Ablation Study

1）the Plain Model of C3AE

Accuracy

MAE is reported on Morph II (M-MAE), IMDB (I-MAE) and WIKI (W-MAE).

Speed

Train-val loss

Residual structure and SE block

Adding residual connections increased the error, while adding SE blocks reduced it.

2）Cascade and Context Module

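The x-axis is $w_2$ and the y-axis is $\vec{y_n}$. The top row shows the result with multi-scale input, and the rows below show the result for each single-scale input.

However, the last bin is not learned accurately: it comes out as 92.73 and 55.49 where it should be 80. The authors explain this as follows: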

we found that there are only 9 samples in the range [70, 80], and it is easy to explain why the last element is abnormal.

5.3 VS SOTA

1）Comparison with State-of-the-arts on Morph-II
2）Comparison with State-of-the-arts on FGNET

6 Conclusion（own） / Future work

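Excerpts from some good write-ups related to the paper:

The two-points representation combined with KL supervision learns a distribution
SSR-Net：A Compact Soft Stagewise Regression Network for Age Estimation
Ordinal Regression
Ordinal regression: understanding Ordinal Regression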

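Logistic regression
Plain logistic regression only solves binary classification, and the binary case can then be extended to multi-class problems. See Li Hang's 《统计学习方法》 (Statistical Learning Methods).
Classification can be used to tell apart cats, dogs, birds, flowers, and so on.

Ordinal regression
However, when the classes have an inherent order, a classification loss alone is not enough.
For example, suppose we classify a person's age as 0, 1 or 2 years old.
If a sample's true age is 0, a classification loss is the same whether we label it 1 or 2. But 1 is clearly closer to 0 than 2 is, so 1 is a more acceptable answer than 2, and in practice predicting 1 should incur a smaller loss than predicting 2.
Ordinal regression addresses exactly this: besides the classification loss, it also accounts for the ordering between the misclassified class and the true class, so that classes closer in the ordering incur a smaller loss.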
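The 0 / 1 / 2 example above can be checked numerically. In the sketch below, cross-entropy stands in for the classification loss, and the expected absolute distance to the true class is just one simple order-aware penalty that I picked for illustration. It is not the formulation of any particular ordinal regression method, and the two probability vectors are made up.

```python
import math


def cross_entropy(probs, true_class):
    """Classification loss: only the probability of the true class matters."""
    return -math.log(probs[true_class])


def expected_distance(probs, true_class):
    """Order-aware penalty: expected |predicted class - true class|."""
    return sum(p * abs(k - true_class) for k, p in enumerate(probs))


true_age = 0
leans_to_1 = [0.1, 0.8, 0.1]  # most of the mass on "1 year old"
leans_to_2 = [0.1, 0.1, 0.8]  # most of the mass on "2 years old"

ce_1 = cross_entropy(leans_to_1, true_age)
ce_2 = cross_entropy(leans_to_2, true_age)
dist_1 = expected_distance(leans_to_1, true_age)
dist_2 = expected_distance(leans_to_2, true_age)

print(ce_1, ce_2)      # identical: the classification loss ignores the order
print(dist_1, dist_2)  # the prediction leaning to 1 is penalised less
```

Cross-entropy cannot tell the two mistakes apart, while the order-aware term prefers the prediction closer to the true age.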