Conceptual Challenges for Interpretable Machine Learning

David S. Watson¹

¹Department of Statistical Science, University College London, London, UK

Email for correspondence: [email protected]

§0 Abstract

As machine learning has gradually entered into ever more sectors of public and private life, there has been a growing demand for algorithmic explainability. How can we make the predictions of complex statistical models more intelligible to end users? A subdiscipline of computer science known as interpretable machine learning (IML) has emerged to address this urgent question. Numerous influential methods have been proposed, from local linear approximations to rule lists and counterfactuals. In this article, I highlight three conceptual challenges that are largely overlooked by authors in this area. I argue that the vast majority of IML algorithms are plagued by (1) ambiguity with respect to their true target; (2) a disregard for error rates and severe testing; and (3) an emphasis on product over process. Each point is developed at length, drawing on relevant debates in epistemology and philosophy of science. Examples and counterexamples from IML are considered, demonstrating how failure to acknowledge these problems can result in counterintuitive and potentially misleading explanations. Without greater care for the conceptual foundations of IML, future work in this area is doomed to repeat the same mistakes.

Keywords: Artificial intelligence, explainability, causality, pragmatics, severe testing

§1 Introduction

Machine learning (ML) is ubiquitous in modern society. Complex learning algorithms are widely deployed in private industries like finance (Heaton, Polson, & Witte, 2017) and insurance (Lin et al., 2017), as well as public services such as healthcare (Topol, 2019) and education (Peters, 2018). Their prevalence is largely driven by results. ML models outperform humans not just at strategy games like chess (Silver et al., 2018) and StarCraft (Vinyals et al., 2019), but at important scientific tasks like antibiotic discovery (Stokes et al., 2020) and predicting protein structure (Jumper et al., 2021).

High-performance algorithms are often opaque, in the sense that it is difficult or impossible for humans to understand the internal logic behind individual predictions. This raises fundamental issues of trust. How can we be sure a model is right when we have no idea why it predicts the values it does? Accuracy on previous cases may suggest reliability, but epistemologists are well aware that a good track record is no guarantee of future success. Just as inductive inferences can lead us astray when presumptions of uniformity fail, so models can err when deployed in new contexts. This can lead to discriminatory predictions with potentially disastrous consequences in high-stakes settings like healthcare (Obermeyer et al., 2019) and criminal justice (Angwin et al., 2016). European regulators, sensitive to these concerns, have begun introducing explainability guidelines into data protection law, although the proper interpretation of the relevant texts remains a matter of some dispute (Selbst & Powles, 2017; Wachter, Mittelstadt, & Floridi, 2017).

While interpreting models is by no means a new concern in computer science and statistics, it is only in the last few years that a formal subfield has emerged to address the issues surrounding algorithmic opacity. I shall refer to this subdiscipline as interpretable machine learning (IML), also sometimes called explainable artificial intelligence (XAI). I employ the former term because it emphasizes the subjective goal of interpretation over the (purportedly) objective goal of explanation, while simultaneously specifying the focus on ML as opposed to more generic artificial intelligence tasks. IML comprises a diverse collection of technical approaches intended to render statistical predictions more intelligible to humans.1 My focus in this article is primarily on model-agnostic post-hoc methods, which attempt to explain the outputs of some underlying target function without making any assumptions about its form. Such explanations may be global (spanning the entire feature space) or local (applying only to some subregion of the feature space). Both types are considered here.

The last few years have seen considerable advances in IML, several of which will be examined in detail below. Despite this progress, I contend that the field has yet to overcome or even properly acknowledge certain fundamental conceptual obstacles. In this article, I highlight three in particular:

  • (1) Ambiguous fidelity. Everyone agrees that algorithmic explanations must be faithful - but to what exactly? The target model or the data generating process? Failure to appreciate the difference has led to confusing and unproductive debates.

  • (2) Error rate control. The vast majority of IML methods do not even bother to quantify expected error rates. This makes it impossible to subject algorithmic explanations to severe tests, as is required of any scientific hypothesis.

  • (3) Process vs. Product. Current approaches overwhelmingly treat explanations as static deliverables, computed once and for all. In fact, successful explanations are more of a process than a product. They require dynamic, iterative refinements between multiple agents.

A number of other conceptual challenges surrounding IML have already garnered much attention in the literature, especially those pertaining to subtle distinctions between explanations, interpretations, and understanding (Krishnan, 2020; Páez, 2019; Zednik, 2019); the purported trade-off between model accuracy and intelligibility (Rudin, 2019; Zerilli et al., 2019); as well as typologies and genealogies of algorithmic opacity (Burrell, 2016; Creel, 2020). I have little to add to those debates here, which I believe have been well argued by numerous authors. The challenges I highlight in this article, by contrast, are woefully underexamined despite their obvious methodological import. To make my case, I shall draw upon copious literature from epistemology and philosophy of science to unpack points (1)-(3) and demonstrate their relevance for IML through a number of real and hypothetical examples. While each point raises unique issues, together they point toward a singular conclusion - that despite undeniable technical advances, the conceptual foundations of IML remain underdeveloped. Fortunately, there are glimmers of hope to be found in this burgeoning discourse. I consider exceptions to each trend that collectively suggest a promising horizon of possibility for IML research.

The remainder of this article is structured as follows. I review relevant background material in §2, framing IML as a demand for causal explanations. In §3, I distinguish between two oft-conflated notions of explanatory fidelity, revealing the apparent contradiction to be a simple confusion between complementary levels of abstraction. In §4, I draw on error-statistical considerations to argue that popular IML methods fail to meet minimal severity criteria, making it difficult to judge between competing explanations. I defend a dialogic account of explanation in §5, arguing that satisfactory solutions must include some degree of user interaction and feedback. I conclude in §6 with a review of my findings and some reflections on the role and limits of philosophy as a theoretical guide in critiquing and designing algorithmic explanations.

§2 Background

In this section, I provide necessary background on IML methods, as well as formal details on empirical risk minimization and structural causal models. Building on Woodward’s (2003) minimal theory of explanation, I frame the IML project as a certain sort of causal inquiry. This perspective elucidates the conceptual challenges that follow, as causal reasoning helps to disambiguate targets (§3), identify proper estimands for inference (§4), and ensure fruitful explanatory dialogue (§5).

§2.1 All IML is causal

Say some high-performance supervised learner f has been trained on copious volumes of biomedical data, and diagnoses Jack with rare disease y. Jack’s general practitioner, Dr. Jill, is as perplexed as he is by this unexpected diagnosis. Jack shows no outward symptoms of y and does not match the typical disease profile. Treatment for y is aggressive and potentially dangerous, so Jack wants to be certain before he proceeds. When Jack and Dr. Jill try to find out why f made this prediction, they receive a curt reply from the software company that licenses the technology, informing them that they should accept the diagnosis because f is very accurate. Most commentators would agree that this answer is unsatisfactory. But how exactly should we improve upon it? What is the proper form of explanation in this case?

I shall argue that what Jack and Dr. Jill seek is a causal account of why f made the particular prediction it did. Following the interventionist tradition, I regard an explanation as causal insomuch as it identifies a set of variables which, when set to some values, are sufficient to bring about the outcome in question, and, when set to alternative values, are sufficient to alter the outcome in some prespecified way. Woodward (2003, p. 203) formalizes these criteria, stating that model M provides a causal explanation for outcome Y if and only if:

  • (i) The generalizations described by M are accurate, or at least approximately so, as are the observations Y = y and X = x.

  • (ii) According to M, Y = y under an intervention that sets X = x.

  • (iii) There exists some possible intervention that sets X = x' (where x' ≠ x), with M correctly describing the value Y = y' (where y' ≠ y) that Y would assume under the intervention.

The full details of Woodward’s program are beyond the scope of this article.2 However, his minimal account of explanation is a valuable starting point for analysis. In Jack’s case, we may satisfy these criteria empirically by finding some other patient who is medically similar to Jack but receives a different diagnosis. Alternatively, we could query the model f directly using synthetic data in which we perturb Jack’s input features until we achieve the desired outcome. If, for instance, we devise an input vector x' identical to Jack’s input x except along one dimension - say, decreased heartrate - and the model does not diagnose this hypothetical datapoint with rare disease y, then we may justifiably conclude that heartrate is causally responsible for the original prediction. This kind of explanation constitutes at least one viable explanans for the target explanandum.
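To make this procedure concrete, here is a minimal sketch in Python of the perturbation test just described. The `model.predict` interface, the feature index, and the replacement value are hypothetical placeholders; the point is simply to show how Woodward's criteria (ii) and (iii) can be checked by querying the model on synthetic inputs.

```python
import numpy as np

def flips_prediction(model, x, feature_idx, new_value):
    """Test whether setting one input feature to a new value changes the prediction.

    `model` is any fitted classifier with a scikit-learn-style predict method;
    `x` is a single input vector as a 1-d numpy array.
    """
    original = model.predict(x.reshape(1, -1))[0]

    # Build a synthetic input identical to x except along one dimension,
    # mimicking an intervention on the model's input.
    x_prime = x.copy()
    x_prime[feature_idx] = new_value
    counterfactual = model.predict(x_prime.reshape(1, -1))[0]

    # If the output changes, that feature is causally relevant to this prediction.
    return original != counterfactual, original, counterfactual
```

In Jack's case, a call like `flips_prediction(f, x_jack, heartrate_idx, lowered_heartrate)` returning a flipped prediction would license the conclusion that heartrate is causally responsible for the original diagnosis, at least at the level of the model.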

Current IML approaches can be roughly grouped into three classes: feature attribution methods, case-based explanations, and rule lists. The latter category poses considerable computational challenges for large datasets, which may explain why the first two are generally more popular. Local linear approximators, a kind of feature attribution technique, are the most widely used approach in IML (Bhatt et al., 2020). Notable instances include local interpretable model-agnostic explanations, aka LIME (Ribeiro, Singh, & Guestrin, 2016); and Shapley additive explanations, aka SHAP (Lundberg & Lee, 2017). Specifics vary, but the goal with these methods is essentially the same - to compute the linear combination of inputs that best explains the decision boundary or regression surface near a point of interest (see Fig. 1). Counterfactual explanations (Wachter, Mittelstadt, & Russell, 2018), which account for predictions via synthetic matching techniques like those described above, are another common approach. Variants of LIME, SHAP, and counterfactual explanations have recently been implemented in open-source algorithmic explainability toolkits distributed by major tech firms such as Google,3 Microsoft,4 and IBM.5 When I speak of “popular IML methods”, I have these algorithms in mind.

Figure 1. A nonlinear function f(x) (blue curve) is approximated by a linear function L(x) (green curve) at the point x = a. Since L is simpler than f, it may help users better understand the model’s predictive behavior near the input. Computing such tangents is the basic idea behind local linear approximators like LIME and SHAP.
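The underlying idea can be sketched in a few lines of Python. The function below is a deliberately simplified illustration of a local linear surrogate, not an implementation of LIME or SHAP themselves: it perturbs the instance of interest, weights the perturbations by proximity, and returns the coefficients of a weighted ridge regression as local feature attributions. The sampling scheme and kernel are naive placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(predict_fn, x, n_samples=5000, scale=0.5, kernel_width=1.0):
    """Fit a weighted linear surrogate to a black-box model around the point x."""
    rng = np.random.default_rng(0)

    # Sample perturbations in a neighborhood of the instance of interest.
    X_perturbed = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y_perturbed = predict_fn(X_perturbed)  # black-box outputs, e.g. predicted probabilities

    # Weight each perturbed sample by its proximity to x.
    distances = np.linalg.norm(X_perturbed - x, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # The surrogate's coefficients serve as local feature attributions.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(X_perturbed, y_perturbed, sample_weight=weights)
    return surrogate.coef_
```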

No matter one’s methodological approach, the central aim of IML is always, more or less explicitly, to answer questions of the form:

Q. Why did model f predict outcome yᵢ as opposed to alternative yᵢ' ≠ yᵢ for input vector xᵢ?

A global explanation answers Q for each i ∈ [n], while local explanations limit themselves to individual samples. At either resolution, successful answers must satisfy Woodward’s three criteria. Those that fail to do so are unfaithful to their target (i), or else do not provide necessary (iii) or sufficient (ii) conditions for the explanandum.6 This is perhaps most obviously true in the case of rule lists (see, e.g., Ribeiro et al., 2018), which specify sufficient conditions (i.e., causal rules) for certain sorts of model predictions. An explanatory rule list for Jack’s diagnosis may say something like, “If heartrate is decreased, then predict y'.” The causal connection is similarly straightforward for feature attribution methods, which attempt to quantify the predictive impact of particular variables. In Jack’s case, it may be that heartrate receives the largest variable importance score because it has the greatest causal effect on model outcomes. Interestingly, the creators of the counterfactual explanation algorithm explicitly motivate their work with reference to Lewis’s theory of causation (1973). According to this view, we causally explain Jack’s prediction by appealing to the nearest possible world in which he receives a different diagnosis. Though there are important differences between this account and the interventionist theory I endorse here, the citation only serves to underscore the reliance of IML on causal frameworks - as well as the ambiguity this reliance can engender.
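A toy rule list of this kind is easy to write down in code. The thresholds, feature names, and output labels below are entirely hypothetical, chosen only to illustrate how such rules state sufficient conditions for particular model outputs.

```python
def rule_list_explanation(patient: dict) -> str:
    """A toy rule list standing in for an explanation of the diagnostic model.

    All thresholds and feature names are illustrative placeholders.
    """
    if patient["heartrate"] < 60:        # decreased heartrate
        return "y_prime"                 # alternative diagnosis y'
    if patient["biomarker_a"] > 4.2 and patient["age"] > 50:
        return "y"                       # rare disease y
    return "y_prime"                     # default: do not diagnose y
```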

If the causal foundations of IML are not always clear, perhaps this is because most authors in this area are steeped in a tradition of statistics and computer science that has historically prioritized prediction over explanation (Breiman, 2001; Shmueli, 2010). I will briefly formalize the distinction between supervised learning and causal modelling to pre-empt any potential confusion and ground the following discussion in established theory.

§2.2 Empirical risk minimization and structural causal models

A supervised learning algorithm is a method for predicting outcomes Y ∈ ℝᵏ based on inputs X ∈ ℝᵈ with minimal error.7 This requires a training dataset of input/output pairs Z = {(xᵢ, yᵢ) : i = 1, …, n}, where each sample zᵢ represents a draw from some unknown distribution P(Z). An algorithm is associated with a function space 𝓕, and the goal is to find the model f ∈ 𝓕 that minimizes some predetermined loss function L(f, Z), which quantifies the distance between model outputs f(X) = Ŷ and true outcomes Y. Common examples include mean squared error for regression and cross-entropy for classification. The expected value of the loss is the risk, and empirical risk minimization (ERM) is the learning strategy whereby we select whichever model attains the minimal loss within a given function class 𝓕. ERM is provably consistent (i.e., guaranteed to converge uniformly upon the best model in 𝓕) under two key assumptions (Vapnik & Chervonenkis, 1971): (1) samples are independently and identically distributed (i.i.d.); and (2) 𝓕 is of bounded complexity.8
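In symbols, and following the definitions just given (a standard textbook formulation rather than a quotation from any particular source), the risk, the empirical risk, and the ERM rule can be written as:

```latex
% Risk: expected loss of f under the data-generating distribution P(Z)
R(f) = \mathbb{E}_{(X, Y) \sim P}\left[ L\big(f(X), Y\big) \right]

% Empirical risk: average loss over the n training samples
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} L\big(f(x_i), y_i\big)

% Empirical risk minimization: select the minimizer within the function class
\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)
```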

The ERM approach provides the theoretical basis for all modern ML techniques, including support vector machines (Schölkopf & Smola, 2017), boosting (Schapire & Freund, 2012), and deep learning (Goodfellow, Bengio, & Courville, 2016).9 As noted in §1, these algorithms have proven incredibly effective at predicting outcomes for complex tasks like image classification and natural language processing. However, critics argue that ERM ignores important structural dependencies between predictors, effectively elevating correlation over causation. The problem is especially acute when variables are confounded. To cite a famous example, researchers trained a neural network to help triage pneumonia patients at Mount Sinai hospital in New York (Caruana et al., 2015). The model was an excellent predictor, easily outperforming all competitors. Upon close inspection, however, the researchers were surprised to discover that the algorithm assigned low probability of death to pneumonia patients with a history of asthma, a well-known risk factor for emergency room patients under acute pulmonary distress. The unexpected association was no simple mistake. Because asthmatics suffering from pneumonia are known to be high risk, doctors quickly send them to the intensive care unit (ICU) for monitoring. The extra attention they receive in the ICU lowers their overall probability of death. This confounding signal obscures a more complex causal picture that ERM is fundamentally incapable of capturing on its own.

Examples like this highlight the importance of interpretable explanations for high-stakes ML predictions such as those commonly found in clinical medicine (Watson et al., 2019). They also demonstrate the dangers of relying on ERM when the i.i.d. assumption fails. The external validity of a given model depends on structural facts about training and test environments (Pearl & Bareinboim, 2014), e.g. the assignment mechanism that dictates which patients are sent to the ICU. If we were to deploy the pneumonia triage algorithm in a new hospital where doctors are not already predisposed to provide extra care for asthma patients - perhaps a clinic where doctors rely exclusively on a high-performance ML model to prioritize treatment - then empirical risk may substantially underestimate the true generalization error. In light of these considerations, a number of prominent authors have advocated for an explicitly causal approach to statistical learning (Pearl, 2000; Peters, Janzing, & Schölkopf, 2017; Spirtes, Glymour, & Scheines, 2000; van der Laan & Rose, 2011). The basic strategy can be elucidated through the formalism of structural causal models (SCMs). A probabilistic SCM 𝓜 is a tuple (U, V, F, P(u)), where U is a set of exogenous variables, i.e. unobserved background conditions; V is a set of endogenous variables, i.e. observed features; F is a set of deterministic functions mapping causes to direct effects; and P(u) is a probability distribution over U. An SCM can be visually depicted as a directed graph, where nodes are variables and edges denote direct causal relationships (see Fig. 2). A fully specified 𝓜 provides a map from background conditions to a joint distribution over observables, 𝓜: U → P(V).
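To make the formalism concrete, here is a minimal sketch in Python of a toy SCM with three binary endogenous variables, in which Z confounds the relationship between X and Y. The structural functions and noise distributions are my own illustrative choices, not drawn from this article; sampling the exogenous noise and pushing it through the structural functions yields the observational distribution over V.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural functions F: each endogenous variable is a deterministic
# function of its parents in V plus an exogenous noise term drawn from P(u).
def f_z(u_z):
    return (u_z < 0.5).astype(int)                       # Z := g(U_Z)

def f_x(z, u_x):
    return (u_x < 0.2 + 0.6 * z).astype(int)             # X := g(Z, U_X)

def f_y(x, z, u_y):
    return (u_y < 0.1 + 0.3 * x + 0.4 * z).astype(int)   # Y := g(X, Z, U_Y)

def sample_scm(n):
    """Map background conditions U to a joint sample over V = {Z, X, Y}."""
    u_z, u_x, u_y = rng.random(n), rng.random(n), rng.random(n)
    z = f_z(u_z)
    x = f_x(z, u_x)
    y = f_y(x, z, u_y)
    return z, x, y
```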

With SCMs, we can express the effects not just of conditioning on variables, but of intervening on them. In graphical terms, an intervention on a variable effectively deletes all incoming edges, resulting in the submodel 𝓜ₓ. Interventions are formally expressed by Pearl’s (2000) do-operator. The interventional distribution P(Y | do(X = 1)) may deviate considerably from the observational distribution P(Y | X = 1) within a given 𝓜. For instance, if all and only men (Z = 1) take some drug (X = 1), then health outcomes Y could be the result of sex or treatment, since P(Y | X = 1) = P(Y | Z = 1). However, if we randomly assign treatment to patients independent of their sex, then we may get a very different value for P(Y | do(X = 1)), especially if there is a confounding effect between sex and outcomes, for example if men are more likely than women to respond to treatment. Only by breaking the association between X and Z can we disentangle the relevant from the spurious effects. This is the motivating logic behind randomized control trials (RCTs), which are widely used by scientists and regulatory agencies to establish treatment efficacy.10 The do-calculus provides a provably complete set of rules for reasoning about interventions (Shpitser & Pearl, 2008), including criteria for deciding whether and how causal effects can be estimated from observational data.
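The gap between the two quantities is easy to see by simulation. The sketch below (my own toy parameters, not the article's) implements the drug example as an SCM: in the observational regime, treatment is heavily confounded with sex, so the estimate of P(Y = 1 | X = 1) overstates the benefit of treatment relative to the interventional quantity P(Y = 1 | do(X = 1)), which is obtained by setting X for every unit and thereby severing the edge from Z to X.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Toy SCM for the drug example: Z = sex, X = treatment, Y = recovery.
u_z, u_x, u_y = rng.random(n), rng.random(n), rng.random(n)
z = (u_z < 0.5).astype(int)

# Observational regime: treatment strongly depends on sex (confounding).
x_obs = (u_x < np.where(z == 1, 0.95, 0.05)).astype(int)
y_obs = (u_y < 0.2 + 0.1 * x_obs + 0.5 * z).astype(int)  # sex has a large direct effect on Y
p_y_given_x1 = y_obs[x_obs == 1].mean()                  # estimates P(Y = 1 | X = 1)

# Interventional regime do(X = 1): set X for everyone, deleting the Z -> X edge,
# while reusing the same exogenous noise so only the treatment assignment changes.
x_do = np.ones(n, dtype=int)
y_do = (u_y < 0.2 + 0.1 * x_do + 0.5 * z).astype(int)
p_y_do_x1 = y_do.mean()                                  # estimates P(Y = 1 | do(X = 1))

print(f"P(Y=1 | X=1)     ~ {p_y_given_x1:.3f}")  # inflated by confounding with Z
print(f"P(Y=1 | do(X=1)) ~ {p_y_do_x1:.3f}")     # effect of treatment alone, averaged over Z
```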

Figure 2. Simple examples of causal graphs. Solid edges denote observed causal relationships, dashed edges unobserved. (a) A model with confounding between variables X and Y. (b) The same model after intervening on X, thereby eliminating all incoming causal effects.

Though the models we seek to explain with IML tools are typically ERM algorithms, the causal nature of this undertaking arguably demands an SCM approach. The mismatch between these two modelling strategies sets the stage for a number of conceptual problems. Sullivan (2020) argues that algorithmic opacity derives not from any inherent complexity in models or systems per se, but rather from the “link uncertainty” that results when there is little empirical evidence connecting the two levels. Even when such links are well-established, however, it is not always clear which level is the intended target of explanation. Causal reasoning, as formalized by SCMs, can help diagnose and resolve issues of link uncertainty by making the assumptions of any given IML tool more explicit.

§3 Ambiguous fidelity

One obvious desideratum for any IML tool is accuracy. We want explanations that are true, or at least probably approximately correct, to use Valiant’s memorable phrase (1984). This accords with the first of Woodward’s three criteria cited above. In this section, I argue that this uncontroversial goal is underspecified. Though the problem emerges for any IML approach, I will focus here on a longstanding dispute between proponents of marginal and conditional variable importance measures, two popular kinds of feature attribution methods. I show that the debate between these two camps is dissolved (rather than resolved) as soon as we recognize that each kind of measure is faithful to a different target. The question of which should be preferred for a given IML task cannot be answered without taking into account pragmatic information regarding the context, level of abstraction, and purpose of the underlying inquiry.

§3.1 Systems and models

I have argued that IML’s fundamental question Q poses a certain sort of causal problem. However, it is important to note how Q differs from more familiar problems in the natural and social sciences. Toward that end, I briefly review three well-known and interrelated challenges that complicate efforts to infer and quantify causal effects.

The problem of induction. Although commonly associated with Hume (1739, 1748) in the anglophone tradition, inductive skepticism goes back at least as far as Sextus Empiricus (Flori
