A Deep Learning Dream: Accuracy and Interpretability in a Single Model

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Please give it a try by subscribing below:

Machine learning is a discipline full of frictions and tradeoffs, but none is more important than the balance between accuracy and interpretability. In principle, highly accurate machine learning models such as deep neural networks tend to be really hard to interpret, while simpler models like decision trees fall short in many sophisticated scenarios. Conventional machine learning wisdom tells us that accuracy and interpretability are opposing forces in the architecture of a model, but is that always the case? Can we build models that are both highly performant and simple to understand? An interesting answer can be found in a paper published by researchers from IBM that proposes a statistical method for improving the performance of simpler machine learning models using the knowledge from more sophisticated models.

Finding the right balance between performance and interpretability in machine learning models is far from a trivial endeavor. Psychologically, we are more attracted to things we can explain, while the homo economicus inside us prefers the best outcome for a given problem. Many real-world data science scenarios can be solved using both simple and highly sophisticated machine learning models. In those scenarios, the advantages of simplicity and interpretability tend to outweigh the benefits of performance.

The Advantages of Machine Learning Simplicity

The balance between transparency and performance can be described as the relationship between research and real-world applications. Most artificial intelligence (AI) research these days is focused on highly sophisticated disciplines such as reinforcement learning or generative models. However, when it comes to practical applications, trust in simpler machine learning models tends to prevail. We see this all the time: complex scenarios in computational biology and economics are solved using simple sparse linear models, and complex instrumented domains such as semiconductor manufacturing are addressed using decision trees. There are many practical advantages to simplicity in machine learning models that are easy to overlook until you are confronted with a real-world scenario. Here are some of my favorites:

· Small Datasets: Companies usually have limited amounts of usable data collected for their business problems. As such, simple models are often preferred here, as they are less likely to overfit the data and can also provide useful insight.

· Resource-Limited Environments: Simple models are also useful in settings where there are power and memory constraints.

· Trust: Simpler models inspire trust in domain experts, who are often responsible for the results of the models.

Despite the significant advantages of simplicity in machine learning models, we can't simply neglect the benefits of top-performing models. However, what if we could improve the performance of simpler machine learning models using the knowledge from more sophisticated alternatives? This is the path that IBM researchers decided to follow with a new method called ProfWeight.

ProfWeight

The idea behind ProfWeight is incredibly creative, to the point of seeming counterintuitive to many machine learning experts. Conceptually, ProfWeight transfers information from a pre-trained deep neural network that has high test accuracy to a simpler interpretable model or a very shallow network of low complexity and a priori low test accuracy. In that context, ProfWeight uses a sophisticated deep learning model as a high-performing teacher whose lessons can be used to teach the simple, interpretable, but generally low-performing student model.

Source: https://arxiv.org/abs/1807.07506

To implement the knowledge transfer between the teacher and student models, ProfWeight introduces probes, which are used to weight samples according to how difficult the network finds them to classify. Each probe takes its input from one of the hidden layers and processes it through a single fully connected layer with a softmax layer of the size of the network output attached to it. The probe in a specific layer serves as a classifier that only uses the prefix of the network up to that layer. Despite its complexity, ProfWeight can be summarized in four main steps (a rough code sketch follows the list below):

1) Attach and train probes on intermediate representations of a high-performing neural network.

2) Train a simple model on the original dataset.

3) Learn weights for the examples in the dataset as a function of the simple model and the probes.

4) Retrain the simple model on the final weighted dataset.
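
As a minimal illustration of steps 1 and 2, here is a rough sketch assuming PyTorch; the `Probe` class, the `train_probe` helper, and every dimension in it are illustrative assumptions rather than the authors' actual code.

```python
import torch
import torch.nn as nn

class Probe(nn.Module):
    """A probe: a single fully connected layer (followed by softmax) trained
    on the frozen activations of one intermediate layer of the teacher."""
    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, hidden_activations: torch.Tensor) -> torch.Tensor:
        # Return logits; softmax is applied inside the cross-entropy loss
        # during training and explicitly when collecting confidences later.
        return self.fc(hidden_activations)

def train_probe(probe: Probe, layer_activations: torch.Tensor,
                labels: torch.Tensor, epochs: int = 20, lr: float = 1e-3) -> Probe:
    """Train one probe on precomputed activations from a single hidden layer
    of the teacher; the teacher's own weights stay frozen."""
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(probe(layer_activations), labels)
        loss.backward()
        optimizer.step()
    return probe
```

One probe of this kind would be trained per selected hidden layer, while the simple student model of step 2 is trained separately on the original dataset.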

Source: https://arxiv.org/abs/1807.07506

The entire ProfWeight model can be seen as a pipeline of probing, obtaining confidence weights, and re-training. For computing the weights, the IBM team used different techniques such as the area under the curve (AUC) or rectified linear units (ReLU).
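
As a rough sketch of steps 3 and 4, the snippet below computes one weight per training example as the probes' average confidence in the true label (which loosely corresponds to the AUC variant mentioned above) and then retrains a decision tree on the reweighted data. It assumes NumPy and scikit-learn; the `profweight_style_weights` helper, the synthetic data, and the choice of `DecisionTreeClassifier` with `sample_weight` are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def profweight_style_weights(probe_confidences: np.ndarray,
                             labels: np.ndarray) -> np.ndarray:
    """probe_confidences has shape (num_probes, num_samples, num_classes) and
    holds the softmax outputs of the probes attached to selected layers.
    Each sample's weight is the probes' average confidence in its true label."""
    true_class_conf = probe_confidences[:, np.arange(len(labels)), labels]
    return true_class_conf.mean(axis=0)

# Toy data standing in for the real dataset and the collected probe outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                            # features
y = rng.integers(0, 2, size=200)                          # binary labels
confidences = rng.dirichlet(np.ones(2), size=(3, 200))    # 3 probes' softmax outputs

simple_model = DecisionTreeClassifier(max_depth=5).fit(X, y)   # step 2
weights = profweight_style_weights(confidences, y)             # step 3
reweighted_model = DecisionTreeClassifier(max_depth=5).fit(    # step 4
    X, y, sample_weight=weights)
```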

The Results

IBM tested ProfWeight across different scenarios and benchmarked the results against traditional models. One of those experiments focused on measuring the quality of metal produced in a manufacturing plant. The input dataset consists of different measurements taken during a metal manufacturing process, such as acid concentrations, electrical readings, metal deposition amounts, time of etching, time since last cleaning, glass fogging, and various gas flows and pressures. The simple model used by ProfWeight was a decision tree algorithm. For the complex teacher model, IBM used a deep neural network with an input layer and five fully connected hidden layers of size 1024, which showed an accuracy of over 90% in this specific scenario. Using different variations of ProfWeight, the accuracy of the decision tree model improved from 74% to over 87% while maintaining the same level of interpretability.
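
For reference, a teacher network of roughly the shape described above could be sketched as follows, assuming PyTorch; the input dimension and number of classes are placeholders, since the paper's preprocessing of the manufacturing data is not reproduced here.

```python
import torch.nn as nn

def make_teacher(input_dim: int, num_classes: int) -> nn.Sequential:
    """An input layer followed by five fully connected hidden layers of
    width 1024, matching the architecture described in the experiment."""
    layers, prev = [], input_dim
    for _ in range(5):
        layers += [nn.Linear(prev, 1024), nn.ReLU()]
        prev = 1024
    layers.append(nn.Linear(prev, num_classes))
    return nn.Sequential(*layers)

teacher = make_teacher(input_dim=50, num_classes=2)  # placeholder dimensions
```

Probes, as sketched earlier, would be attached to the outputs of some of these hidden layers, while the decision tree plays the role of the simple, interpretable student.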

Source: https://arxiv.org/abs/1807.07506

ProfWeight is one of the most creative approaches I've seen that tries to solve the dilemma between transparency and performance in machine learning models. The results showed that it may be possible to improve the performance of simpler machine learning models using the knowledge of complex alternatives. This work could be the basis for bridging different schools of thought in machine learning, such as deep learning and statistical models.

Translated from: https://medium.com/ai-in-plain-english/a-deep-learning-dream-accuracy-and-interpretability-in-a-single-model-cc381c543d72
