Evaluating Uncertainty in DL Models: Can You Trust Your Model's Uncertainty?

A standard CNN usually outputs only the predicted class, without any estimate of how uncertain that prediction is; ideally, we would like to know not just the prediction but also how confident the network is in it. This paper is a comprehensive evaluation of current uncertainty estimation methods. The key conclusion: ensembles beat the other methods!

Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift (arxiv.org)

In 2015, Yarin Gal et al. proposed Monte Carlo Dropout for estimating model uncertainty [1], and since then work in this area has grown quickly, with many new uncertainty estimation methods being proposed. This recent Google paper gives a comprehensive and very solid evaluation of current uncertainty estimation methods across different tasks. The study notes below list the take-home messages.
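A minimal sketch of the MC Dropout idea, using a toy NumPy network with made-up weights (not the paper's models): dropout is kept on at test time, and the spread across T stochastic forward passes serves as the uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer classifier with made-up weights (illustration only).
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout left ON at test time."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop      # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return softmax(h @ W2)

def mc_dropout_predict(x, T=100):
    """Average T stochastic passes; spread across passes ~ model uncertainty."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])
    return samples.mean(axis=0), samples.std(axis=0)

x = rng.normal(size=(1, 4))
mean_probs, std_probs = mc_dropout_predict(x)
```

In a real framework this amounts to leaving the dropout layers in training mode at inference time and averaging repeated forward passes.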

Six uncertainty estimation methods are evaluated on test data that differs from the training data:

[Figure: the six uncertainty estimation methods evaluated in the paper]

First, an illustrative example on MNIST: the model is LeNet, and the test data are the MNIST test set, shifted data (rotated or horizontally translated images), and a completely out-of-distribution (OOD) dataset, NotMNIST.

What we would like to see: naturally, we expect accuracy to drop and entropy to rise on the shifted datasets. In essence, we want the uncertainty estimates, on test data that differs from the training data, to show that the model knows what it does not know.
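The entropy criterion above is straightforward to compute; a small standalone NumPy sketch (not tied to the paper's code):

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a predictive distribution over classes."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

confident = [0.98, 0.01, 0.01]      # confident in-distribution-style prediction
uniform = [1 / 3, 1 / 3, 1 / 3]     # "knows it does not know"

h_confident = predictive_entropy(confident)
h_uniform = predictive_entropy(uniform)  # ln(3), the maximum for 3 classes
```

On shifted or OOD inputs, a well-behaved model should drift from the first case toward the second.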

[Figure 1: accuracy and uncertainty on MNIST under increasing shift]

What we observe: Fig 1(a-b) shows that as the shift increases, accuracy drops for all methods; the differences in accuracy between methods are small, while the differences in Brier score (lower is better) are larger. An important finding: while calibrating on the validation set leads to well-calibrated predictions on the test set, it does not guarantee calibration on shifted data.
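For reference, the Brier score used here is the mean squared error between the predicted probability vector and the one-hot label; a minimal NumPy sketch:

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared error between predicted probability vectors and
    one-hot labels; lower is better."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(labels)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

# A sharp correct prediction scores lower (better) than a hedged one.
good = brier_score([[0.9, 0.05, 0.05]], [0])  # 0.1^2 + 0.05^2 + 0.05^2 = 0.015
poor = brier_score([[0.4, 0.3, 0.3]], [0])
```

Unlike accuracy, the Brier score also penalizes confident wrong predictions, which is why it separates the methods more clearly under shift.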

The main focus is the performance on larger image datasets: the models are 20-layer and 50-layer ResNets, the training data are CIFAR-10 and ImageNet, and the test data are distorted versions of these datasets plus SVHN.

[Figure: results on CIFAR-10/ImageNet under distortion and on SVHN]

Overall, ensembles perform best, and Dropout beats temperature scaling and the last-layer methods. While the relative ordering of methods is consistent on both CIFAR-10 and ImageNet (ensembles perform best), the ordering is quite different from that on MNIST, where SVI performs best. Interestingly, LL-SVI and LL-Dropout perform worse than the vanilla method on skewed datasets as well as on SVHN.
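At prediction time, the deep-ensembles recipe is just a uniform average of the member networks' softmax outputs; a toy sketch in which random linear "members" stand in for independently trained networks (not the paper's ResNets):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# M = 5 toy linear "members"; in the real recipe each would be the same
# architecture trained from a different random initialization.
M, n_features, n_classes = 5, 8, 3
members = [rng.normal(size=(n_features, n_classes)) for _ in range(M)]

def ensemble_predict(x):
    """Uniformly average the members' softmax outputs."""
    per_member = np.stack([softmax(x @ W) for W in members])  # (M, N, C)
    return per_member.mean(axis=0)

probs = ensemble_predict(rng.normal(size=(4, n_features)))
```

Disagreement between members spreads the averaged distribution out, which is where the extra uncertainty signal under shift comes from.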

Takeaways and Recommendations

  • As dataset shift increases, the quality of the uncertainty estimates from all methods degrades.
  • Good calibration and accuracy on the i.i.d. test set do not transfer to shifted datasets.
  • Post-hoc calibration (on i.i.d validation) with temperature scaling leads to well-calibrated uncertainty on i.i.d. test and small values of skew, but is significantly outperformed by methods that take epistemic uncertainty into account as the skew increases.
  • Last layer Dropout exhibits less uncertainty on skewed and OOD datasets than Dropout.
  • SVI is very promising on MNIST/CIFAR but it is difficult to get to work on larger datasets such as ImageNet and other architectures such as LSTMs.
  • The relative ordering of methods is mostly consistent (except for MNIST) across our experiments. The relative ordering of methods on MNIST is not reflective of their ordering on other datasets.
  • Deep ensembles seem to perform the best across most metrics and to be more robust to dataset shift. We found that a relatively small ensemble size (e.g. M = 5) may be sufficient.
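Post-hoc temperature scaling, mentioned in the takeaways above, fits a single scalar T on held-out validation logits by minimizing NLL; a simple grid-search sketch on synthetic logits (not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of labels under temperature-scaled softmax."""
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    """Choose the single scalar T that minimizes validation NLL."""
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Synthetic overconfident model: its logits are 3x the "true" logits,
# so the NLL-optimal temperature should land near T = 3.
base = rng.normal(size=(2000, 3))
labels = np.array([rng.choice(3, p=p) for p in softmax(base)])
T_hat = fit_temperature(3.0 * base, labels)
```

Because T only rescales the logits, it changes confidence but never the argmax, which is one reason it cannot account for the epistemic uncertainty that shift exposes.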

[1] Gal, Y. and Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, 2016
