A Survey of Uncertainty in Neural Networks (Part IV): Uncertainty Estimation, Ensemble Methods & Test-Time Augmentation

exploreandconquer

Last modified on 2024-05-22 13:27:05

First published on 2024-05-22 13:02:57

Article link: https://blog.csdn.net/Rad1ant_up/article/details/139117221

3.3 Ensemble methods

3.3.1 Principles of ensemble methods

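The idea behind ensembles is to combine the predictions of several different models to obtain better generalization. The premise is that a group of decision makers often makes better decisions than a single one.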

As a simple example, consider an ensemble $f: X\to Y$ with members $f_i: X\to Y$ for $i\in\{1,2,\ldots,M\}$. The predictions of all members can be averaged:

$f(x):=\frac1M\sum_{i=1}^Mf_i(x)$
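The averaging rule above can be sketched in a few lines of Python. This is a minimal illustration only. It assumes each member is any callable that returns a list of class probabilities. The function name `ensemble_predict` and the three toy members are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")

def ensemble_predict(members: Sequence[Callable[[X], list[float]]], x: X) -> list[float]:
    """Average member outputs element-wise: f(x) = (1/M) * sum_i f_i(x)."""
    outputs = [f(x) for f in members]          # one output vector per member f_i
    m = len(outputs)
    return [sum(column) / m for column in zip(*outputs)]

# Usage: three hypothetical members returning probabilities for 2 classes.
members = [lambda x: [0.9, 0.1], lambda x: [0.7, 0.3], lambda x: [0.5, 0.5]]
print(ensemble_predict(members, None))  # approximately [0.7, 0.3]
```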

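Besides giving more accurate predictions, an ensemble can also estimate model uncertainty by evaluating the variety among the members' predictions. The different types of ensemble methods share the same overall idea and differ little in method and application. Moreover, ensemble methods were originally designed to improve accuracy and robustness, not to measure uncertainty. It was only later found that they are well suited to uncertainty estimation.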
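One common way to turn "variety among the members' predictions" into a number for classification is sketched below. It subtracts the average entropy of the members from the entropy of the averaged prediction. The gap is the mutual information, which is zero when all members agree. This particular measure is an assumption here; the survey may use other measures, such as the variance of the members' outputs.

```python
import math

def entropy(p: list[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def ensemble_uncertainty(member_probs: list[list[float]]) -> tuple[float, float, float]:
    """Return (total, expected_entropy, disagreement) for one input."""
    m = len(member_probs)
    mean = [sum(col) / m for col in zip(*member_probs)]   # ensemble prediction
    total = entropy(mean)                                  # entropy of the averaged prediction
    expected = sum(entropy(p) for p in member_probs) / m   # average entropy of each member
    return total, expected, total - expected               # gap = mutual information

# Members that agree: disagreement 0. Members that contradict each other: log(2).
print(ensemble_uncertainty([[0.8, 0.2], [0.8, 0.2]])[2])  # 0.0
print(ensemble_uncertainty([[1.0, 0.0], [0.0, 1.0]])[2])  # about 0.693 (= log 2)
```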

3.3.2 Single‑ and multi‑mode evaluation

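The mapping defined by a neural network is highly nonlinear, so its loss landscape contains many local optima, and training may end up in different ones. A deterministic neural network converges to one of these local optima in the solution space. Similarly, a BNN converges to one single optimum. However, it also takes the uncertainty around this optimum into account, so the neighboring points of the optimum influence the prediction for a test sample. Because these methods focus on a single region, they are called single-mode evaluation. In contrast, ensemble methods consist of several networks, which should converge to different local optima; this is called multi-mode evaluation. Multi-mode evaluation aims to improve overall performance by combining models with different strengths and weaknesses.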
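The single-mode versus multi-mode distinction can be illustrated with a toy example. It uses a hypothetical one-dimensional non-convex "loss" as a stand-in for a network's loss landscape. A single gradient-descent run ends in one optimum. Runs started from different initializations land in different optima, which is what an ensemble exploits. The loss function, learning rate and starting points are all assumptions chosen for illustration.

```python
def loss(w: float) -> float:
    """Hypothetical 1-D non-convex loss with two local minima (near -1.30 and 1.13)."""
    return w**4 - 3 * w**2 + w

def grad(w: float) -> float:
    """Derivative of the toy loss."""
    return 4 * w**3 - 6 * w + 1

def train(w0: float, lr: float = 0.01, steps: int = 1000) -> float:
    """Plain gradient descent: one deterministic run ends in one optimum."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Single-mode: one initialization gives one optimum.
# Multi-mode: runs from several initializations cover several optima.
inits = [-2.0, -0.5, 0.5, 2.0]
solutions = [train(w0) for w0 in inits]
print([round(w, 2) for w in solutions])  # [-1.3, -1.3, 1.13, 1.13]
```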

3.3.3 Bringing variety into ensembles

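The most important point when applying ensemble methods is to maximize the diversity among the individual networks. Many methods have been proposed for this purpose.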

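Because the loss is nonlinear, different initializations of a neural network usually produce different training results. In addition, training is carried out on mini-batches, so the order of the training data also affects the final result.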
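Both sources of variety, initialization and data order, can be controlled by giving each member its own random seed. The sketch below shows this with a hypothetical helper `member_setup`. It draws a member's initial weights and its mini-batch order from one member-specific generator. The Gaussian initialization scale of 0.1 is an assumption, and real training would usually reshuffle at every epoch.

```python
import random

def member_setup(seed: int, n_params: int, n_samples: int, batch_size: int):
    """Hypothetical helper: one seed per member fixes its initial weights and mini-batch order."""
    rng = random.Random(seed)
    init = [rng.gauss(0.0, 0.1) for _ in range(n_params)]   # member-specific initialization
    order = list(range(n_samples))
    rng.shuffle(order)                                       # member-specific data order
    batches = [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]
    return init, batches

setups = [member_setup(seed, n_params=4, n_samples=10, batch_size=4) for seed in range(3)]
for init, batches in setups:
    print([round(w, 3) for w in init], batches)
```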

Bagging and boosting

Bagging (bootstrap aggregating) and boosting are two strategies that vary the distribution of each member's training data by sampling new sets of training samples from the original set.

Bagging samples from the training data uniformly and with replacement. Because of the replacement, an ensemble member may see some training samples several times and never see others.
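The bootstrap sampling step of bagging can be sketched with the standard `random.Random.choices`, which samples uniformly with replacement. For a large training set, each bag contains roughly 63% (about 1 - 1/e) of the distinct samples, and the remaining "out-of-bag" samples are never seen by that member. The function name `bootstrap_sample` and the toy dataset are hypothetical.

```python
import random

def bootstrap_sample(data: list, rng: random.Random) -> list:
    """Draw len(data) items uniformly *with replacement* (one bagging training set)."""
    return rng.choices(data, k=len(data))

rng = random.Random(0)
data = list(range(10_000))
bags = [bootstrap_sample(data, rng) for _ in range(3)]   # one bag per ensemble member
for bag in bags:
    seen = set(bag)
    print(f"unique: {len(seen) / len(data):.3f}, never seen (out-of-bag): {len(data) - len(seen)}")
```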

For boosting, the members are trained one after another, and the probability of sampling a sample for the next training set is based on the performance of the already trained ensemble.
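One way to make the sampling probability depend on the already trained ensemble is a simplified multiplicative reweighting in the spirit of AdaBoost. The sketch below raises the weight of samples the current ensemble gets wrong, renormalizes, and then draws the next member's training set in proportion to the weights. The fixed factor `boost=2.0` is an assumption; AdaBoost proper derives the factor from the weighted error. Both helper names are hypothetical.

```python
import random

def update_weights(weights: list[float], ensemble_correct: list[bool], boost: float = 2.0) -> list[float]:
    """Multiply the weight of samples the current ensemble gets wrong by `boost`, then renormalize."""
    new = [w * (1.0 if ok else boost) for w, ok in zip(weights, ensemble_correct)]
    total = sum(new)
    return [w / total for w in new]

def sample_next_training_set(weights: list[float], k: int, rng: random.Random) -> list[int]:
    """Draw k sample indices with probability proportional to their weight."""
    return rng.choices(range(len(weights)), weights=weights, k=k)

weights = [0.25] * 4
ensemble_correct = [True, False, True, True]   # hypothetical: sample 1 is misclassified
weights = update_weights(weights, ensemble_correct)
print(weights)  # sample 1 now has twice the weight of the others
print(sample_next_training_set(weights, k=8, rng=random.Random(0)))
```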

Data augmentation

Randomly augmenting the input data separately for each ensemble member makes the members train on different data points, which produces diversity among them.
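Per-member augmentation can be sketched by giving each member its own random generator. The same training sample then reaches each member as a different augmented view. The augmentation here is a hypothetical one for a 1-D signal: a random reversal plus small Gaussian jitter. Its parameters are assumptions, and image tasks would use crops, flips and so on instead.

```python
import random

def augment(x: list[float], rng: random.Random, noise_std: float = 0.05, flip_prob: float = 0.5) -> list[float]:
    """Hypothetical augmentation for a 1-D signal: random reversal plus Gaussian jitter."""
    out = list(reversed(x)) if rng.random() < flip_prob else list(x)
    return [v + rng.gauss(0.0, noise_std) for v in out]

# Each member owns its RNG, so at every training step it sees a different view of the same sample.
member_rngs = [random.Random(seed) for seed in range(3)]
sample = [0.0, 0.5, 1.0, 0.5]
views = [augment(sample, rng) for rng in member_rngs]
for v in views:
    print([round(t, 3) for t in v])
```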

Ensembles of different network architectures

The combination of different network architectures leads to different loss landscapes and can therefore also increase the diversity in the resulting predictions.

Interestingly, it has been found that bagging performs better for a small number of ensemble members, while boosting performs better for a large number.

3.3.4 Ensemble methods and uncertainty quantification

See the original paper for details.

3.3.5 Making ensemble methods more efficient

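Compared with single-model methods, ensemble methods significantly increase computational cost and memory consumption (Sagi and Rokach 2018; Malinin et al. 2020). When an ensemble is deployed in an application, available memory and compute are usually limited. Reducing the number of model parameters is an effective way to lower both. Pruning approaches reduce the complexity of ensembles by pruning over the members and reducing the redundancy among them. To this end, researchers have developed methods based on different diversity measures that remove individual members without strongly affecting performance.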
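Diversity-based pruning can be sketched as follows. The diversity measure is the pairwise disagreement rate on validation predictions. The rule greedily removes the member with the lowest mean disagreement with the others, that is, the most redundant one, until `k` members remain. Both the measure and the greedy rule are hypothetical choices, not the specific methods cited above. A practical version would also check validation accuracy after each removal.

```python
def disagreement(a: list[int], b: list[int]) -> float:
    """Fraction of validation samples on which two members predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def prune_by_diversity(preds: list[list[int]], k: int) -> list[int]:
    """Greedily drop the most redundant member until k members remain."""
    if not 1 <= k <= len(preds):
        raise ValueError("k must be between 1 and the number of members")
    keep = list(range(len(preds)))
    while len(keep) > k:
        def mean_disagreement(i: int) -> float:
            others = [j for j in keep if j != i]
            return sum(disagreement(preds[i], preds[j]) for j in others) / len(others)
        keep.remove(min(keep, key=mean_disagreement))  # least diverse = most redundant
    return keep

# Rows: predicted labels of 4 members on 4 validation samples (members 0 and 1 are identical).
preds = [[0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0]]
print(prune_by_diversity(preds, k=3))  # [1, 2, 3]
```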

Distillation is another common approach. It teaches a single network to represent the knowledge of a group of neural networks, which reduces the number of models to one. Ensemble distillation approaches thus capture the behavior of an ensemble with a single network.

The first works on ensemble distillation used the average of the members' softmax outputs to teach a student network the derived predictive uncertainty (Hinton et al. 2015).
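The distillation target described above can be sketched in Python: average the members' softmax outputs, then score a student by cross-entropy against that soft target. Hinton et al. (2015) additionally soften the distributions with a temperature. Here `temperature` is an optional parameter that defaults to 1. The function names are hypothetical, and the student's training loop is omitted.

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                   # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_soft_target(member_logits: list[list[float]], temperature: float = 1.0) -> list[float]:
    """Average of the members' softmax outputs: the student's training target."""
    probs = [softmax(logits, temperature) for logits in member_logits]
    m = len(probs)
    return [sum(col) / m for col in zip(*probs)]

def distillation_loss(student_logits: list[float], target: list[float], temperature: float = 1.0) -> float:
    """Cross-entropy between the ensemble's soft target and the student's softmax."""
    q = softmax(student_logits, temperature)
    return -sum(t * math.log(qi) for t, qi in zip(target, q) if t > 0.0)

# Two confident members that disagree: the target itself is uncertain (about [0.5, 0.5]).
target = ensemble_soft_target([[2.0, 0.0], [0.0, 2.0]])
print(target)
print(distillation_loss([0.0, 0.0], target))  # student matching the target: about log(2)
print(distillation_loss([3.0, 0.0], target))  # overconfident student: higher loss
```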

Other keywords: sub-ensembles, batch-ensembles.

3.3.6 Sum up ensemble methods

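Ensemble methods are very easy to apply and usually require no major changes to the standard deterministic model. Because the members are independent of each other, training is easy to parallelize, and a trained ensemble is easy to extend. One main challenge is introducing diversity among the members. Another is that the required memory and computation grow with the number of members.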

Since the overconfidence from single members can be transferred to the whole ensemble, strategies that encourage the members to deliver different false predictions instead of all delivering the same false prediction should be further investigated.

3.4 Test-time augmentation

The basic method creates multiple test samples from each test sample by applying data augmentation to it. All these samples are then tested to compute a predictive distribution, which measures uncertainty. The idea is that augmenting a test sample explores different views of it and thereby captures uncertainty. The augmentation techniques can be exactly the same as those used during training.
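The test-time augmentation procedure can be sketched as follows. Predict on `n_views` augmented copies of one test input, then report the mean prediction and the spread across views as the uncertainty. The model `toy_model` and the Gaussian-noise augmentation `jitter` are hypothetical stand-ins for a trained network and a real augmentation. The standard deviation is only one possible spread measure.

```python
import math
import random
import statistics
from typing import Callable

def toy_model(x: list[float]) -> float:
    """Hypothetical stand-in for a trained network: outputs a class-1 probability."""
    return 1.0 / (1.0 + math.exp(-sum(x)))

def jitter(x: list[float], rng: random.Random, std: float = 0.3) -> list[float]:
    """Hypothetical augmentation: additive Gaussian noise."""
    return [v + rng.gauss(0.0, std) for v in x]

def tta_predict(model: Callable[[list[float]], float], x: list[float],
                augment: Callable[[list[float], random.Random], list[float]],
                n_views: int = 32, seed: int = 0) -> tuple[float, float]:
    """Predict on n_views augmented copies of x; return (mean, std) of the predictions."""
    rng = random.Random(seed)
    preds = [model(augment(x, rng)) for _ in range(n_views)]
    return statistics.fmean(preds), statistics.pstdev(preds)

mean, spread = tta_predict(toy_model, [0.2, -0.1, 0.1], jitter)
print(f"prediction {mean:.3f}, uncertainty (std over views) {spread:.3f}")
```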

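One open question with test-time augmentation is how the data should be augmented. Several factors play a role: the nature of the problem at hand, the size of the training data, the network architecture and the type of augmentation. Studies have found that, depending on these factors, test-time augmentation can sometimes turn a correctly predicted sample into an incorrect one, and vice versa. Furthermore, some augmentations capture less uncertainty than others. The appropriate augmentation therefore has to be analyzed and chosen for each task.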