Evaluating Uncertainty in DL Models: Can You Trust Your Model's Uncertainty?

A standard CNN usually outputs only the predicted class, without any estimate of how uncertain that prediction is; ideally, we would like to know not just the prediction but also how confident the network is in it. This paper is a comprehensive evaluation of current uncertainty estimation methods. The key conclusion: ensembles beat the other methods!

Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift (arxiv.org)

In 2015, Yarin Gal et al. proposed Monte Carlo Dropout for estimating model uncertainty [1], and since then work in this area has grown quickly, with many new uncertainty estimation methods being proposed. This recent Google paper gives a comprehensive and very solid evaluation of current uncertainty estimation methods across different tasks. The study notes below list the take-home messages.
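A minimal sketch of the MC Dropout idea, using a toy NumPy network with made-up weights (not the paper's models): dropout is kept on at test time, and the spread across T stochastic forward passes serves as the uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer classifier with made-up weights (illustration only).
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout left ON at test time."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop      # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return softmax(h @ W2)

def mc_dropout_predict(x, T=100):
    """Average T stochastic passes; spread across passes ~ model uncertainty."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])
    return samples.mean(axis=0), samples.std(axis=0)

x = rng.normal(size=(1, 4))
mean_probs, std_probs = mc_dropout_predict(x)
```

In a real framework this amounts to leaving the dropout layers in training mode at inference time and averaging repeated forward passes.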

Six uncertainty estimation methods are evaluated on test data that differs from the training data:

[Figure: the six uncertainty estimation methods evaluated in the paper]

First, an illustrative example on MNIST: the model is LeNet, and the test data are the MNIST test set, shifted data (rotated or horizontally translated images), and a completely out-of-distribution (OOD) dataset, NotMNIST.

What we would like to see: naturally, we expect accuracy to drop and entropy to rise on the shifted datasets. In essence, we want the uncertainty estimates, on test data that differs from the training data, to show that the model knows what it does not know.
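The entropy criterion above is straightforward to compute; a small standalone NumPy sketch (not tied to the paper's code):

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a predictive distribution over classes."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

confident = [0.98, 0.01, 0.01]      # confident in-distribution-style prediction
uniform = [1 / 3, 1 / 3, 1 / 3]     # "knows it does not know"

h_confident = predictive_entropy(confident)
h_uniform = predictive_entropy(uniform)  # ln(3), the maximum for 3 classes
```

On shifted or OOD inputs, a well-behaved model should drift from the first case toward the second.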

[Figure 1: accuracy and uncertainty on MNIST under increasing shift]

What we observe: Fig 1(a-b) shows that as the shift increases, accuracy drops for all methods; the differences in accuracy between methods are small, while the differences in Brier score (lower is better) are larger. An important finding: while calibrating on the validation set leads to well-calibrated predictions on the test set, it does not guarantee calibration on shifted data.
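For reference, the Brier score used here is the mean squared error between the predicted probability vector and the one-hot label; a minimal NumPy sketch:

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared error between predicted probability vectors and
    one-hot labels; lower is better."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(labels)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

# A sharp correct prediction scores lower (better) than a hedged one.
good = brier_score([[0.9, 0.05, 0.05]], [0])  # 0.1^2 + 0.05^2 + 0.05^2 = 0.015
poor = brier_score([[0.4, 0.3, 0.3]], [0])
```

Unlike accuracy, the Brier score also penalizes confident wrong predictions, which is why it separates the methods more clearly under shift.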

The main focus is the performance on larger image datasets: the models are 20-layer and 50-layer ResNets, the training data are CIFAR-10 and ImageNet, and the test data are distorted versions of these datasets plus SVHN.

[Figure: results on CIFAR-10/ImageNet under distortion and on SVHN]

Overall, ensembles perform best, and Dropout beats temperature scaling and the last-layer methods. While the relative ordering of methods is consistent on both CIFAR-10 and ImageNet (ensembles perform best), the ordering is quite different from that on MNIST, where SVI performs best. Interestingly, LL-SVI and LL-Dropout perform worse than the vanilla method on skewed datasets as well as on SVHN.
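At prediction time, the deep-ensembles recipe is just a uniform average of the member networks' softmax outputs; a toy sketch in which random linear "members" stand in for independently trained networks (not the paper's ResNets):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# M = 5 toy linear "members"; in the real recipe each would be the same
# architecture trained from a different random initialization.
M, n_features, n_classes = 5, 8, 3
members = [rng.normal(size=(n_features, n_classes)) for _ in range(M)]

def ensemble_predict(x):
    """Uniformly average the members' softmax outputs."""
    per_member = np.stack([softmax(x @ W) for W in members])  # (M, N, C)
    return per_member.mean(axis=0)

probs = ensemble_predict(rng.normal(size=(4, n_features)))
```

Disagreement between members spreads the averaged distribution out, which is where the extra uncertainty signal under shift comes from.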

Takeaways and Recommendations

  • As dataset shift increases, the quality of the uncertainty estimates from all methods degrades.
  • Good calibration and accuracy on the i.i.d. test set do not transfer to shifted datasets.
  • Post-hoc calibration (on i.i.d validation) with temperature scaling leads to well-calibrated uncertainty on i.i.d. test and small values of skew, but is significantly outperformed by methods that take epistemic uncertainty into account as the skew increases.
  • Last layer Dropout exhibits less uncertainty on skewed and OOD datasets than Dropout.
  • SVI is very promising on MNIST/CIFAR but it is difficult to get to work on larger datasets such as ImageNet and other architectures such as LSTMs.
  • The relative ordering of methods is mostly consistent (except for MNIST) across our experiments. The relative ordering of methods on MNIST is not reflective of their ordering on other datasets.
  • Deep ensembles seem to perform the best across most metrics and to be more robust to dataset shift. We found that a relatively small ensemble size (e.g. M = 5) may be sufficient.
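Post-hoc temperature scaling, mentioned in the takeaways above, fits a single scalar T on held-out validation logits by minimizing NLL; a simple grid-search sketch on synthetic logits (not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of labels under temperature-scaled softmax."""
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    """Choose the single scalar T that minimizes validation NLL."""
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Synthetic overconfident model: its logits are 3x the "true" logits,
# so the NLL-optimal temperature should land near T = 3.
base = rng.normal(size=(2000, 3))
labels = np.array([rng.choice(3, p=p) for p in softmax(base)])
T_hat = fit_temperature(3.0 * base, labels)
```

Because T only rescales the logits, it changes confidence but never the argmax, which is one reason it cannot account for the epistemic uncertainty that shift exposes.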

[1] Gal, Y. and Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, 2016
