[Knee Landmark] "KNEEL: Knee Anatomical Landmark Localization Using Hourglass Networks"

bryant_meng

Published 2020-05-07 21:41:07

Category: CNN / Transformer. Tags: artificial intelligence, landmark localization, Knee, hourglass network

Article link: https://blog.csdn.net/bryant_meng/article/details/105951656

ICCV workshops-2019

1 Background and Motivation

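Anatomical landmark localization is a frequently encountered and challenging problem in medical image analysis.

Osteoarthritis (OA) is one of the most common joint diseases and ranks 11th among the causes of disability worldwide.

When analyzing plain knee radiographs (see the comparison between plain radiographs and CT below), it is very important to localize landmarks accurately at every stage of OA.

The differences between a plain radiograph and CT are:
1. A plain radiograph relies on the penetrating, fluorescent and photographic effects of X-rays: the X-rays pass through the body and expose a film behind it, producing an image with no depth in which every structure inside the beam is superimposed. Dense tissues and organs absorb more X-rays, so less radiation reaches the film; in the X-ray image as conventionally viewed, these regions appear bright (bone is white) and low-density regions appear dark.
2. CT forms an image from the different amounts by which tissues of different density attenuate X-rays. A computer collects the attenuated X-rays and computes cross-sectional (tomographic) images.
3. CT resolves differences in tissue density better than a plain radiograph.

Let us first get familiar with the structure of the knee joint. The paper localizes femoral landmarks (on the femur, the thigh bone) and tibial landmarks (on the tibia).
(Figure: knee joint anatomy; image from the internet)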

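With a supervised learning approach, manually annotating landmarks on knee images is very difficult without knowledge of knee anatomy, and the annotation gets even harder as OA becomes more severe. The paper illustrates this with a figure based on the gold-standard Kellgren-Lawrence (K-L) grading system (5 grades, 0 to 4).

Ha, indeed: if a layperson like me had to annotate KL 4 images, I could not vouch for the accuracy, and my eyes would certainly give out after a while! Bone spurs and bone deformity change the appearance of the image.

Therefore, in knee OA research, landmark localization is often split into the following two sub-tasks:

- Region-of-interest localization (coarse localization)
- Landmark localization itself (fine localization)

The authors follow this idea and use deep learning (Hourglass Networks, see [Stacked Hourglass] "Stacked Hourglass Networks for Human Pose Estimation"): the model is first pre-trained on low-cost labels and then, via transfer learning, trained on high-cost labels to predict accurate landmark coordinates.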

2 Advantages / Contributions

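- Uses an hourglass network (see [Stacked Hourglass] "Stacked Hourglass Networks for Human Pose Estimation") for knee landmark detection
- Applies the Mixup data augmentation method to anatomical landmark localization (for the first time)
- Pre-trains on low-budget landmark annotations to improve landmark localization performance
- Achieves the current state of the art on two datasets, which suggests the proposed method generalizes to some extent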

3 Method

The method uses a soft-argmax layer to directly estimate the location of every landmark point.

3.1 Network architecture

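To improve gradient flow, the network uses the hierarchical multi-scale parallel (HMP) residual block (see [HPM Block] "Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources"; HMP versus HPM does not matter, it is only the order of the same three words) instead of the bottleneck used in ResNet and Hourglass, as shown in Figure 3 of the paper.

Let us review the bottleneck structure in ResNet, and the way the blocks are stacked.

Clearly, the input and output channel counts of the 3×3 conv in the bottleneck of the authors' Figure 3(a) are flawed. Because the preceding 1×1 conv has already reduced the channels to $\frac{n}{2}$, the correction (marked in red in the annotated figure) changes them from $(n,\frac{n}{2})$ to $(\frac{n}{2},\frac{n}{2})$.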
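The channel correction can be checked with a few lines of bookkeeping. This is only a sketch: the helper names are hypothetical, and the (in, out) pairs for the two 1×1 convs are my assumption of the usual 1×1, 3×3, 1×1 bottleneck layout rather than values read from the paper.

```python
def bottleneck_channels(n):
    """(in, out) channels of the 1x1 -> 3x3 -> 1x1 convs in the corrected bottleneck.

    Assumed layout: reduce n -> n/2, keep n/2 -> n/2, expand n/2 -> n.
    """
    half = n // 2
    return [(n, half), (half, half), (half, n)]


def is_consistent(layers):
    """Each layer's input channels must equal the previous layer's output channels."""
    return all(prev[1] == cur[0] for prev, cur in zip(layers, layers[1:]))


corrected = bottleneck_channels(256)
# Same block, but with the 3x3 conv labelled (n, n/2) as in the original figure
as_drawn = [(256, 128), (256, 128), (128, 256)]

print(is_consistent(corrected))  # True
print(is_consistent(as_drawn))   # False
```

With the $(n,\frac{n}{2})$ label the 3×3 conv would expect $n$ input channels while receiving only $\frac{n}{2}$, which is why the chain fails the check.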

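Next, let us look at the overall network architecture used by the authors, shown in Figure 4 of the paper.

It consists of three blocks: the Entry block, the Hourglass block (see [Stacked Hourglass] "Stacked Hourglass Networks for Human Pose Estimation") and the Output block. There are two hyper-parameters, width $N$ and depth $d$: $N$ is the number of feature maps and $d$ is the depth of the hourglass. In the experiments, $N = 24$, $d = 6$ works best.

The upsampling in the Hourglass block uses nearest-neighbor interpolation. Wouldn't bilinear interpolation be nicer?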

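The Output block uses soft-argmax to detect the landmarks. It works in the following two steps:

- Calculate the spatial softmax for pixel $(i, j)$,

where $h$ is the heatmap, $\Phi$ is the weighted spatial softmax, and $H, W$ are the height and width of the feature map.
- Treat the softmax as a probability, multiply it by the coordinate of each pixel, and compute the expected value (the expectation of a discrete random variable, $\sum_{k=1}^{+\infty} x_k \cdot p_k$),

using per-pixel weights $W_{ij}$.

The paper simply states these formulas and points to a reference that is itself full of formulas, which leaves plenty of question marks. Don't panic: with the description from the paper "Human Pose Regression by Combining Indirect Part Detection and Contextual Information" we can lift the veil.

The weights $W_{ij}^{(x)}$ and $W_{ij}^{(y)}$ are the pixel coordinates divided by the width $W$ and the height $H$ of the feature map, respectively.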
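For reference, the two steps can be written out as below. This is a reconstruction from the definitions in this section, not a copy of the paper's equations: the names $\beta$ (the softmax weighting coefficient) and $\Psi_d$ (the output coordinate along axis $d$) are assumed notation, and I assume $i$ indexes columns and $j$ indexes rows.

```latex
% Step 1: weighted spatial softmax over the heatmap h
\Phi(\beta, h, i, j) =
  \frac{\exp(\beta\, h_{ij})}
       {\sum_{k=1}^{W}\sum_{l=1}^{H} \exp(\beta\, h_{kl})}

% Step 2: expected coordinate along axis d
\Psi_d(h) = \sum_{i=1}^{W}\sum_{j=1}^{H} W_{ij}^{(d)}\, \Phi(\beta, h, i, j),
  \qquad d \in \{x, y\}

% Weights: pixel coordinates normalized by the feature-map size
W_{ij}^{(x)} = \frac{i}{W}, \qquad W_{ij}^{(y)} = \frac{j}{H}
```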

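So the result of the spatial softmax is multiplied by $W_{ij}$. What is this actually doing?

Let us analyze soft-argmax carefully with the following example (see the post "Understanding softmax, argmax and soft-argmax").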

1) First, compute the softmax of the 2-D array $[[1, 3, 1, 4, 5, 2, 0], [2, 4, 2, 5, 6, 3, 1]]$, taken over all of its elements

$\frac{e^{x_i}}{\sum_{j}e^{x_j}}$

import numpy as np
sm = np.array([[1,3,1,4,5,2,0],
               [2,4,2,5,6,3,1]])
# softmax over the whole array: exponentiate, then divide by the total
softmax_sm = np.exp(sm) / np.sum(np.exp(sm))

print(softmax_sm)

output

[[0.00308564 0.0228     0.00308564 0.06197683 0.1684705  0.00838765
  0.00113515]
 [0.00838765 0.06197683 0.00838765 0.1684705  0.4579503  0.0228
  0.00308564]]

2) Next, look at the result of the argmax function (the index of the maximum)

import numpy as np
sm = np.array([[1,3,1,4,5,2,0],
               [2,4,2,5,6,3,1]])
argmax_sm_x = np.argmax(sm, axis=1)
argmax_sm_y = np.argmax(sm, axis=0)

print(argmax_sm_x)
print(argmax_sm_y)

output

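[4 4] # index of the maximum in each row
[1 1 1 1 1 1 1] # index of the maximum in each column

argmax is not differentiable.

3) soft-argmax: combining the softmax result with the indices achieves the effect of argmax while remaining differentiable

$\text{soft-argmax}(x)=\sum_{i} \frac{e^{x_i}}{\sum_{j}e^{x_j}} \cdot i$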

import numpy as np
sm = np.array([[1,3,1,4,5,2,0],
               [2,4,2,5,6,3,1]])
print(sm)

softmax_sm = np.e**(sm) / np.sum(np.e**(sm))
print(softmax_sm)

soft_argmax_x = softmax_sm*np.array([[0,1,2,3,4,5,6],
                                     [0,1,2,3,4,5,6]])
soft_argmax_y = softmax_sm*np.array([[0,0,0,0,0,0,0],
                                     [1,1,1,1,1,1,1]])

print(np.sum(soft_argmax_x))
print(np.sum(soft_argmax_y))

output

[[1 3 1 4 5 2 0]
 [2 4 2 5 6 3 1]]
[[0.00308564 0.0228     0.00308564 0.06197683 0.1684705  0.00838765
  0.00113515]
 [0.00838765 0.06197683 0.00838765 0.1684705  0.4579503  0.0228
  0.00308564]]
3.486011614577136
0.7310585786300049

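As you can see, the estimated position of the maximum is (3.49, 0.73), which still deviates from the expected result (4, 1). By weighting the input (multiplying every element by a coefficient $\alpha$ greater than one), the share of the maximum is amplified and the output approaches (4, 1):

$\text{soft-argmax}_{\alpha}(x)=\sum_{i} \frac{e^{\alpha \cdot x_i}}{\sum_{j}e^{\alpha \cdot x_j}} \cdot i$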

import numpy as np
sm = np.array([[1,3,1,4,5,2,0],
               [2,4,2,5,6,3,1]])
sm = 10*sm # newly added: scale the input by alpha = 10
print(sm)

softmax_sm = np.e**(sm) / np.sum(np.e**(sm))
print(softmax_sm)

soft_argmax_x = softmax_sm*np.array([[0,1,2,3,4,5,6],
                                     [0,1,2,3,4,5,6]])
soft_argmax_y = softmax_sm*np.array([[0,0,0,0,0,0,0],
                                     [1,1,1,1,1,1,1]])

print(np.sum(soft_argmax_x))
print(np.sum(soft_argmax_y))

output

[[10 30 10 40 50 20  0]
 [20 40 20 50 60 30 10]]
[[1.92857473e-22 9.35677334e-14 1.92857473e-22 2.06096648e-09
  4.53958076e-05 4.24796852e-18 8.75571571e-27]
 [4.24796852e-18 2.06096648e-09 4.24796852e-18 4.53958076e-05
  9.99909204e-01 9.35677334e-14 1.92857473e-22]]
3.9999545959483043
0.9999546021312976

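OK, the output coordinate (3.9999545959483043, 0.9999546021312976) is now very close to (4, 1)! The weights in the paper's formula are divided by W and H in order to normalize the coordinates to the range 0 to 1, turning absolute coordinates into relative ones.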

import numpy as np
sm = np.array([[1,3,1,4,5,2,0],
               [2,4,2,5,6,3,1]])
sm = 10*sm # newly added: scale the input by alpha = 10
softmax_sm = np.e**(sm) / np.sum(np.e**(sm))

soft_argmax_x = softmax_sm*np.array([[0,1,2,3,4,5,6],
                                     [0,1,2,3,4,5,6]])/7
soft_argmax_y = softmax_sm*np.array([[0,0,0,0,0,0,0],
                                     [1,1,1,1,1,1,1]])/2

print(np.sum(soft_argmax_x))
print(np.sum(soft_argmax_y))

output

0.5714220851354721
0.4999773010656488

The output is very close to (4/7, 1/2) = (0.5714285714285714, 0.5).
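The three experiments above can be folded into a single helper. This is a minimal sketch, not the paper's implementation: the function name `soft_argmax` and its parameters `beta` and `normalize` are my own labels, and subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the softmax unchanged.

```python
import numpy as np


def soft_argmax(heatmap, beta=1.0, normalize=False):
    """Differentiable estimate of the (x, y) position of the heatmap maximum."""
    h = np.asarray(heatmap, dtype=float)
    rows, cols = h.shape
    # Subtract the maximum so that a large beta does not overflow exp()
    z = beta * (h - h.max())
    prob = np.exp(z) / np.exp(z).sum()
    # ys[r, c] = r (row index), xs[r, c] = c (column index)
    ys, xs = np.mgrid[0:rows, 0:cols]
    x = float((prob * xs).sum())
    y = float((prob * ys).sum())
    if normalize:
        # Divide by width and height to obtain relative coordinates
        x, y = x / cols, y / rows
    return x, y


sm = [[1, 3, 1, 4, 5, 2, 0],
      [2, 4, 2, 5, 6, 3, 1]]
print(soft_argmax(sm))                           # about (3.486, 0.731)
print(soft_argmax(sm, beta=10))                  # close to (4, 1)
print(soft_argmax(sm, beta=10, normalize=True))  # close to (4/7, 1/2)
```

A larger `beta` sharpens the softmax toward a one-hot distribution, so the expectation moves toward the true argmax at the cost of smaller gradients for the non-maximal pixels.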

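3.2 Loss function

The paper uses the wing loss from "Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks".

It is similar to the L1 loss, except in the range from $-w$ to $w$, where it takes a logarithmic form. The wing loss in this paper is defined piecewise,

where $y$ is the ground truth (GT), $\hat{y}$ is the prediction, and the constant $C$ makes the two pieces of the piecewise function join smoothly.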
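To make the piecewise shape concrete, here is a minimal NumPy sketch that follows the formulation in the Wing Loss paper: $w \ln(1 + |x|/\epsilon)$ for $|x| < w$ and $|x| - C$ otherwise, with $C = w - w \ln(1 + w/\epsilon)$. The parameter name `eps` and the default values are illustrative assumptions, and the notation in the KNEEL paper may differ.

```python
import numpy as np


def wing_loss(y_true, y_pred, w=10.0, eps=2.0):
    """Mean wing loss between ground truth and prediction (illustrative defaults)."""
    x = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    # C shifts the linear part so both pieces meet at |x| = w
    C = w - w * np.log(1.0 + w / eps)
    loss = np.where(x < w, w * np.log(1.0 + x / eps), x - C)
    return float(loss.mean())


print(wing_loss([0.0, 0.0], [0.5, 3.0]))   # small errors: logarithmic branch
print(wing_loss([0.0, 0.0], [50.0, 80.0])) # large errors: L1-like branch
```

Compared with L1, the logarithmic branch gives small and medium errors a larger gradient, which is the motivation given in the Wing Loss paper.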

3.3 Training techniques

1) Data augmentation

- geometric augmentations (all classes of homographic transformations; among these, affine transformations have been fairly successful in facial landmark localization)
- textural augmentations
  - gamma correction
  - salt and pepper noise
  - blur (both median and gaussian)
  - addition of a gaussian noise

The geometric and textural augmentations follow the approach of the paper "Mining Hard Augmented Samples for Robust Facial Landmark Localisation with CNNs".

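2) Mixup regularization

The details of Mixup are as follows:
$x^{'} = \lambda x_1 + (1-\lambda) x_2, \quad p^{'} = \lambda p_1 + (1-\lambda) p_2$
In other words, it is the hybrid-rice technique!

$x_1$ and $x_2$ are two images, and $p_1$ and $p_2$ are their labels; a new sample $x^{'}$ and its corresponding label $p^{'}$ are synthesized through the mixing coefficient $\lambda$.

$\lambda$ follows a Beta distribution, which has two hyper-parameters ($\alpha$ and $\beta$) and the probability density function $f(\lambda;\alpha,\beta)=\frac{\lambda^{\alpha-1}(1-\lambda)^{\beta-1}}{B(\alpha,\beta)}$, where $B$ is the beta function.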
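A minimal sketch of this mixing step in NumPy, assuming the standard Mixup recipe in which both Beta hyper-parameters are set to the same value. The function name `mixup`, the value `alpha=0.75` and the toy arrays are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np


def mixup(x1, p1, x2, p2, alpha=0.75, rng=None):
    """Blend two samples and their labels with a Beta(alpha, alpha) coefficient."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x1 + (1.0 - lam) * x2
    p_mix = lam * p1 + (1.0 - lam) * p2
    return x_mix, p_mix, lam


rng = np.random.default_rng(0)
x1, x2 = np.zeros((4, 4)), np.ones((4, 4))             # two toy "images"
p1, p2 = np.array([[0.0, 0.0]]), np.array([[10.0, 20.0]])  # one landmark each
x_mix, p_mix, lam = mixup(x1, p1, x2, p2, rng=rng)
print(lam, p_mix)
```

With `alpha` below 1 the Beta distribution puts most of its mass near 0 and 1, so most mixed samples stay close to one of the two originals.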

The labels look like this (from MIPT-Oulu/KNEEL):

subject_id	side	kl	tibia	femur	bbox	center
9000798	L	4	“63,278,116,268,166,255,208,224,233,233,255,216,286,258,337,264,390,261”	“65,238,122,245,171,230,214,211,266,237,324,260,391,250”	967,166,1433,632	1200,399
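The `tibia` and `femur` fields of the row above can be turned into coordinate arrays with a short helper. This sketch assumes each field is a flat list of interleaved x, y values and that `bbox` is stored as x1, y1, x2, y2; the helper name `parse_landmarks` is hypothetical.

```python
import numpy as np


def parse_landmarks(field):
    """Turn 'x0,y0,x1,y1,...' into an (N, 2) integer array (assumed x, y order)."""
    values = np.array([int(v) for v in field.split(",")])
    return values.reshape(-1, 2)


tibia = parse_landmarks(
    "63,278,116,268,166,255,208,224,233,233,255,216,286,258,337,264,390,261")
femur = parse_landmarks(
    "65,238,122,245,171,230,214,211,266,237,324,260,391,250")
print(tibia.shape, femur.shape)  # (9, 2) (7, 2)

# The center column matches the midpoint of the bounding box
bbox = [int(v) for v in "967,166,1433,632".split(",")]
center = ((bbox[0] + bbox[2]) // 2, (bbox[1] + bbox[3]) // 2)
print(center)  # (1200, 399)
```

Nine tibial plus seven femoral points give 16 landmarks per knee.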

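The authors optimize a loss function calculated mini-batch wise: within a batch, $x_2$ is mixed in on top of $x_1$ (I still need to fill in the details here; I did not fully understand this part).
$o_1$ and $o^{'}$ are the outputs for $x_1$ and $x^{'}$, respectively.

3) Transfer learning from low-budget annotations
As the corresponding figure in the paper describes, the model is first trained with low-cost annotations (one landmark per leg). The trained parameters are then used as pre-trained weights for training with high-cost annotations (16 landmarks per leg), in order to obtain accurate landmark localization!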

4 Experiments

4.1 Datasets

- OAI (Osteoarthritis Initiative): about 150 images per KL grade, 748 knees in total
- Dataset A, collected at Oulu University Hospital, Finland (one bad sample was removed; 161 knees in total)
  - 4 knees with KL 0
  - 54 knees with KL 1
  - 49 knees with KL 2
  - 29 knees with KL 3
  - 25 knees with KL 4
- Dataset B, collected at Oulu University Hospital, Finland; 209 knees in total
  - 35 knees with KL 0
  - 84 knees with KL 1
  - 51 knees with KL 2
  - 37 knees with KL 3
  - 2 knees with KL 4

The dataset annotations were produced by taking the output of BoneFinder as the initial annotation and then adjusting it manually.

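4.2 Evaluation and Metrics

The ablation experiments use landmarks 0, 8, 9 and 15 from Figure 1.

Testing uses landmarks 0, 4, 8, 9, 12 and 15 from Figure 1.

The evaluation metric is the Percentage of Correct Keypoints (PCK) @ r: a prediction counts as correct if it falls within radius $r$ of the GT. The authors use 1 mm, 1.5 mm, 2 mm and 2.5 mm for quantitative comparison.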
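As a concrete reading of the metric, PCK @ r can be sketched as below. The function name `pck`, the `spacing_mm` argument (millimetres per pixel) and the toy coordinates are assumptions for illustration only.

```python
import numpy as np


def pck(pred, gt, radius_mm, spacing_mm=1.0):
    """Fraction of predicted landmarks within radius_mm of the ground truth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean distance per landmark, converted from pixels to millimetres
    dist_mm = np.linalg.norm(pred - gt, axis=-1) * spacing_mm
    return float((dist_mm <= radius_mm).mean())


gt = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 30.0], [40.0, 40.0]])
# Toy predictions with errors of 0.5, 1.2, 3.0 and 0.0 mm
pred = gt + np.array([[0.5, 0.0], [0.0, 1.2], [3.0, 0.0], [0.0, 0.0]])
for r in (1, 1.5, 2, 2.5):
    print(r, pck(pred, gt, r))
# 1 0.5
# 1.5 0.75
# 2 0.75
# 2.5 0.75
```

The landmark with a 3 mm error never counts as correct at these thresholds; such cases are what the ablation study tracks separately as outliers.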

4.3 Ablation Study

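The landmarks used are 0, 8, 9 and 15 from Figure 1. Because 5-fold cross-validation is used, PCK is reported as mean ± variance (Table 1 of the paper).

1) Conventional approaches

See the first section of Table 1.
2) Loss Function

See the second section of Table 1.
Elastic Loss is the sum of the L2 and L1 losses.

The Wing Loss adopted by the authors performs best, with hyper-parameters $w = 15, C = 3$.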
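For comparison with the wing loss, the Elastic Loss mentioned above can be sketched in a few lines. I assume here that "L2 loss" means the mean squared error and that both terms are weighted equally; the paper may scale them differently.

```python
import numpy as np


def elastic_loss(y_true, y_pred):
    """Sum of the mean absolute error (L1) and the mean squared error (L2)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    l1 = np.abs(diff).mean()
    l2 = (diff ** 2).mean()
    return float(l1 + l2)


print(elastic_loss([0.0, 0.0], [1.0, -2.0]))  # 1.5 + 2.5 = 4.0
```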

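3) Effect of Multi-scale Residual Block

Comparing the two corresponding rows of Table 1 shows that the HMP block chosen by the authors performs much better than the ResNet bottleneck.

4) MixUp vs. Weight Decay

This experiment compares the two regularization methods.
Table 1 shows that results are generally better without weight decay, and that dropout helps Mixup improve further.

5) CutOut vs. Target Jitter

This experiment compares the two data augmentation methods.
Table 1 shows that cutout works better than jitter. Although the PCK with cutout is not as good as without it, cutout significantly reduces the percentage of outliers.

6) Transfer Learning from Low-cost Labels
Table 1 shows that doing coarse RoI localization first and then landmark localization improves the results considerably!

The paper also plots curves for RoI localization and landmark localization: the x-axis is the distance from the GT, the y-axis is recall, and the larger the area under the curve, the better.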

4.4 Test datasets

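Testing uses landmarks 0, 4, 8, 9, 12 and 15 from Figure 1.

1) Testing on the full datasets
The authors test on the two datasets they collected themselves. BoneFinder was presumably run directly as an off-the-shelf tool. 1-stage means coarse RoI localization followed by fine landmark localization. 2-stage builds on 1-stage: the center of the RoI localization is adjusted (to the tibial center, although from Figure 1 it looks more like the femoral center to me, ha), then RoI localization is run again, followed by fine landmark localization.

BoneFinder uses full-resolution information, whereas the authors' method only uses information at 1 mm and 0.3 mm resolution (the authors themselves regard this as a flaw of the paper; in a comparison experiment it is better to control the variables as far as possible). On Dataset A, the authors' method is slightly worse at precise localization.

Note that the annotations were produced by taking BoneFinder's output as the initial annotation and then adjusting it manually.

2) Testing with Respect to the presence of Radiographic Osteoarthritis
KL < 2 means no OA and KL ≥ 2 means OA. Broadly speaking, the authors' method is better on Dataset B; on Dataset A, the authors' method outperforms BoneFinder at thresholds of 2 mm and above.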
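Applying the KL threshold to the per-grade counts listed in section 4.1 shows how many knees fall into each group. The dictionary layout and the helper name `split_by_oa` are my own; the counts are the ones given for Dataset A and Dataset B.

```python
kl_counts = {
    "A": {0: 4, 1: 54, 2: 49, 3: 29, 4: 25},
    "B": {0: 35, 1: 84, 2: 51, 3: 37, 4: 2},
}


def split_by_oa(counts, threshold=2):
    """Return (no_oa, oa) knee counts, where OA means KL >= threshold."""
    no_oa = sum(n for kl, n in counts.items() if kl < threshold)
    oa = sum(n for kl, n in counts.items() if kl >= threshold)
    return no_oa, oa


for name, counts in kl_counts.items():
    print(name, split_by_oa(counts))
# A (58, 103)
# B (119, 90)
```

Dataset A is dominated by OA knees, while Dataset B has more knees without OA, which is worth keeping in mind when comparing the results on the two datasets.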

Finally, see the result figure in the paper.

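5 Conclusion (own)

- I did not understand the part around Equation 9 in the Mixup section; to be revisited later
- Beta distribution
- soft-argmax
- The paper combines other people's methods (HMP, Mixup, Hourglass, Soft-argmax) and contributes no original module. There is little analysis of why these modules were chosen or of what is good or bad about them, and the experiments offer little analysis of the underlying principles. On the public dataset, the authors run ablations of their combined method and compare it with conventional methods (it would be strange if deep learning did not beat conventional methods, so why not compare against the best method?). On the two datasets they collected themselves, they compare against the best method: the result falls flat on one dataset and is acceptable on the other!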