Paper Reading 2: SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers


Introduction

Today I start reading the paper in earnest. So far I know it is about self-explanation, and that it uses two kinds of layers, global and local, to generate explanations.
For readability I have removed most of the citation markers; please go to the original paper for the full version.

Paragraph 1: Why neural networks need explanation

Neural network models are often opaque: they provide limited insight into interpretations of model
decisions and are typically treated as “black boxes”. There has been ample evidence that such models overfit to spurious artifacts and amplify biases in data. This underscores the need
to understand model decision making.

The first paragraph makes the case for interpretability: neural networks are opaque, so if they carry biases we cannot tell, and we cannot see how they actually reach their decisions. That leaves us uneasy, so we want to figure out why the model decides the way it does; only once that is explained can we use it with confidence.

Paragraph 2: Post-hoc explanation vs. self-explanation

Prior work in interpretability for neural text classification predominantly follows two approaches:

The second paragraph introduces prior approaches: there are two routes.

(i) post-hoc explanation methods that explain predictions for previously trained models based on
model internals,

Here "post-hoc" means explaining after the fact. We previewed post-hoc explanation in the previous post: train the model first, then afterwards try to find some way to explain its predictions.

"Post hoc" is short for "post hoc, ergo propter hoc". Translated from Latin, the full phrase means: "after this, therefore because of this".

(This explanation comes from an answer by a user on Baidu Zhidao: original link.)

(ii) inherently interpretable models whose interpretability is built-in and optimized jointly with the end task. While post-hoc methods are often the only option for already-trained models, inherently interpretable models may provide greater transparency since explanation capability is embedded directly within the model.

The second route is the inherently interpretable model: its explanation capability is built in and optimized together with the model itself. This is the "self-explaining model" we previewed earlier.
Although post-hoc explanation is usually the only option for a model that has already been trained, this second kind is more transparent, since the explanation works from inside the model rather than being bolted on afterwards.

Paragraph 3: Problems with current self-explaining methods

In natural language applications, feature attribution based on attention scores has been the predominant method for developing inherently interpretable neural classifiers. Such methods interpret model decisions locally by explaining the classifier’s decision as a function of relevance of features (words) in input samples.

In other words: in NLP applications, attention-score-based feature attribution has been the mainstream route to inherently interpretable classifiers.
These methods explain a model's decision locally, as a function of how relevant each feature (word) in the input sample is.
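
As a side note, here is a toy sketch of what attention-score-based word attribution typically looks like in practice. This is my own illustration, not code from the paper; using bert-base-uncased and reading the last-layer [CLS] attention averaged over heads are just assumptions for the example.

```python
# Toy word-level attribution from attention scores (my own sketch, not the paper's method).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("the movie was surprisingly good", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last-layer attention: (batch, num_heads, seq_len, seq_len); average over heads
attn = outputs.attentions[-1].mean(dim=1)[0]
cls_to_tokens = attn[0]  # attention from [CLS] to every token, read as "relevance"
for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
                        cls_to_tokens.tolist()):
    print(f"{token:>12s}  {score:.3f}")
```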

However, such interpretations were shown to be unreliable and unfaithful.

Problem 1: such interpretations have been shown to be unreliable and unfaithful.

Moreover, with natural language being structured and compositional, explaining the role of higher-level compositional concepts like phrasal structures (beyond individual word-level feature attributions) remains an open challenge.

Problem 2: they stay at the word level; explaining higher-level compositional concepts such as phrasal structures is still an open challenge.

Another known limitation of such feature attribution based methods is that the explanations are limited to the input feature space and often require additional methods for providing global explanations, i.e., explaining model decisions as a function of influential training data.

Problem 3: the explanations are limited to the input feature space, and additional methods are usually needed to provide global explanations, i.e., explaining model decisions as a function of influential training data.

Paragraph 4: This paper's method and its advantages

In this work, we propose SELFEXPLAIN—a self explaining model that incorporates both global and local interpretability layers into neural text classifiers. Compared to word-level feature attributions, we use high-level phrase-based concepts, producing a more holistic picture of a classifier’s decisions.

  1. The proposed model has both a global and a local interpretability layer.
  2. It explains at the phrase level, which is stronger than the word-level methods criticized in the previous paragraph.

SELFEXPLAIN incorporates: (i) Locally Interpretable Layer (LIL), a layer that quantifies via activation difference, the relevance of each concept to the final label distribution of an input sample.
(ii) Globally Interpretable Layer (GIL), a layer that uses maximum inner product search (MIPS) to retrieve the most influential concepts from the training data for a given input sample.

SELFEXPLAIN consists of:

  1. A Locally Interpretable Layer (LIL), which quantifies, via activation differences, the relevance of each concept to the final label distribution of an input sample.
  2. A Globally Interpretable Layer (GIL), which uses maximum inner product search (MIPS) to retrieve, for a given input sample, the most influential concepts from the training data. (See the sketch after this list.)
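
To make the two layers concrete, here is a minimal PyTorch sketch of how I read these two descriptions. It is not the authors' code: the module names, the tensor shapes, and in particular the exact way a concept is "removed" to compute the activation difference are my own assumptions; the concept store for GIL is assumed to be precomputed from training-set phrases.

```python
# Minimal sketch of LIL / GIL as I understand them (assumptions, not the paper's code).
import torch
import torch.nn.functional as F


class LocalInterpretableLayer(torch.nn.Module):
    """LIL sketch: score each phrase concept by how much the label
    distribution shifts when that concept is (approximately) removed."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.classifier = torch.nn.Linear(hidden_dim, num_labels)

    def forward(self, u: torch.Tensor, concepts: torch.Tensor) -> torch.Tensor:
        # u:        (hidden_dim,)              sentence representation
        # concepts: (num_concepts, hidden_dim) one vector per phrase concept
        full = F.log_softmax(self.classifier(u), dim=-1)                # scores with everything
        without = F.log_softmax(self.classifier(u - concepts), dim=-1)  # scores with concept j removed
        # "activation difference": how much each concept shifts the label scores
        return full.unsqueeze(0) - without                              # (num_concepts, num_labels)


class GlobalInterpretableLayer(torch.nn.Module):
    """GIL sketch: retrieve the training-set concepts most influential
    for the input via maximum inner product search (MIPS)."""

    def __init__(self, concept_store: torch.Tensor, top_k: int = 5):
        super().__init__()
        # concept_store: (num_train_concepts, hidden_dim), built offline from training phrases
        self.register_buffer("concept_store", concept_store)
        self.top_k = top_k

    def forward(self, u: torch.Tensor):
        scores = self.concept_store @ u              # inner product with every stored concept
        top_scores, top_idx = scores.topk(self.top_k)
        return top_idx, top_scores                   # indices point back to training-set phrases
```

How the real model builds the concept store and trains these layers should become clearer in the method section; the next quote already hints at how they are wired into the classifier.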

We show how GIL and LIL layers can be integrated into transformer-based classifiers, converting them into self-explaining architectures.
The interpretability of the classifier is enforced through regularization, and the entire model is end-to-end differentiable. To the best of our knowledge, SELFEXPLAIN is the first self-explaining neural text classification approach to provide both global and local interpretability in a single model.

  1. The GIL and LIL layers are integrated into a transformer-based classifier, turning it into a self-explaining architecture.
  2. The classifier's interpretability is enforced through regularization, and the whole model is end-to-end differentiable.
  3. SELFEXPLAIN is the first self-explaining neural text classification approach that provides both global and local interpretability in a single model.

I don't fully understand "end-to-end differentiable" here; I haven't found a really clear explanation yet, so I'll come back and update this once I do.
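
My current rough reading is that "end-to-end differentiable" simply means the task loss and the interpretability regularizers are summed into one scalar, so a single backward pass updates the encoder, LIL, and GIL together, with no separate post-hoc explanation stage. A sketch under that assumption (the loss composition and the weights lambda_lil, lambda_gil are made up for illustration):

```python
# Sketch of a joint, end-to-end differentiable training step (my assumption of the setup).
import torch
import torch.nn.functional as F

def training_step(model, batch, optimizer, lambda_lil=0.1, lambda_gil=0.1):
    # one forward pass through the shared encoder and both explanation layers;
    # lil_logits / gil_logits are assumed to be label scores aggregated over concepts
    logits, lil_logits, gil_logits = model(batch["inputs"])

    task_loss = F.cross_entropy(logits, batch["labels"])
    # regularizers: the explanation layers are also pushed to predict the label
    lil_loss = F.cross_entropy(lil_logits, batch["labels"])
    gil_loss = F.cross_entropy(gil_logits, batch["labels"])

    total = task_loss + lambda_lil * lil_loss + lambda_gil * gil_loss
    optimizer.zero_grad()
    total.backward()   # gradients flow through GIL, LIL and the encoder in one pass
    optimizer.step()
    return total.item()
```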

Introduction: summary

It covers the background, the two routes (post-hoc vs. self-explaining), prior self-explaining methods and their limitations,
and then presents this paper's method and its advantages.

Keywords: GIL, LIL, regularization, transformer, end-to-end, relevance estimation, maximum inner product search.
