Reasons, Values, Stakeholders: A Philosophical Framework for Explainable Artificial Intelligence

Atoosa Kasirzadeh

atoosa.kasirzadeh@mail.utoronto.ca

University of Toronto (Toronto, Canada) & Australian National University (Canberra, Australia)

Abstract

The societal and ethical implications of the use of opaque artificial intelligence systems for consequential decisions, such as welfare allocation and criminal justice, have generated a lively debate among multiple stakeholder groups, including computer scientists, ethicists, social scientists, policy makers, and end users. However, the lack of a common language or a multi-dimensional framework to appropriately bridge the technical, epistemic, and normative aspects of this debate prevents the discussion from being as productive as it could be. Drawing on the philosophical literature on the nature and value of explanations, this paper offers a multi-faceted framework that brings more conceptual precision to the present debate by (1) identifying the types of explanations that are most pertinent to artificial intelligence predictions, (2) recognizing the relevance and importance of social and ethical values for the evaluation of these explanations, and (3) demonstrating the importance of these explanations for incorporating a diversified approach to improving the design of trustworthy algorithmic ecosystems. The proposed philosophical framework thus lays the groundwork for establishing a pertinent connection between the technical and ethical aspects of artificial intelligence systems.

Keywords: Ethical Algorithm, Explainable AI, Philosophy of Explanation, Ethics of Artificial Intelligence

1. Preliminaries

Governments and private actors are seeking to leverage recent developments in artificial intelligence (AI) and machine learning (ML) algorithms for use in high-stakes decision making. These algorithms have begun to greatly impact human lives by guiding several consequential decision-making procedures such as hiring employees, assigning loans and credit scores, making medical diagnoses, and dealing with recidivism in the criminal justice system (see, for example, [1, 2, 3, 4, 5, 6]). This enterprise is motivated by the idea that high-stakes decision problems are instances of learnable algorithmic tasks, such as pattern recognition, classification, and clustering.

ML algorithms have had some success in making accurate predictions of unobserved data in some domains, but many such systems remain opaque. This means that it is difficult for humans to have sufficient reasons to understand why one algorithmic outcome rather than another is obtained in a given circumstance. A possible silver bullet to alleviate this opacity would be to require the algorithms to explain themselves. On this basis, several academic and industrial research programs have been put into operation to develop so-called "explainable AI" systems: AI systems that are able to explain themselves (see, for example, [7, 8]).1 More concretely, the development of explainable AI systems is motivated by several social and ethical considerations (see, for example, [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]): increasing societal acceptance of prediction-based decisions, establishing trust in the results of these decisions, making algorithms accountable to the public, removing the sources of algorithmic discrimination and unfairness, legitimizing the incorporation of AI algorithms in several decision contexts, and facilitating a fruitful conversation among stakeholders on the justification of the use of these algorithms in decision making.

These considerations, in addition to their epistemic importance, point to a normative and social significance for AI explanations. An appropriate accommodation of these considerations requires finding a clear and comprehensive answer to the question of what AI explanations are, who their consumers are, and how they matter.

Computer scientists have not identified any well-formed definition of AI explainability as an all-purpose, monolithic concept [20, 21]. However, they have developed several technical methods for making AI systems explainable to some degree (for two recent surveys, see [8, 22]). Nevertheless, these technical methods must be grounded in a conceptually precise analysis of the nature and value of explanations if claims about the ability of intelligent systems to explain themselves are to be assessed and reconciled with the social and normative significance of those explanations. A framework for AI explanations that specifies why and how they matter remains to be established. One promising way to establish such a framework is to seek inspiration from the social sciences and philosophy.

To achieve this, I begin by briefly reviewing the literature that describes the main efforts that have been made to identify the conceptual foundations of and requirements for explainability.

Miller [23], the most elaborate survey to date, primarily examines the causal aspects of everyday, local explanations for building explainable AI systems. This survey, however, does not include any discussion of non-causal aspects of AI explanation.2 Zerilli et al. [24] argue that the explanations we demand from opaque algorithmic systems must be given from an intentional stance. Zednik [25] uses a popular framework for the analysis of cognitive systems to map different analytic techniques (such as input heatmapping), as developed by explainable AI researchers, onto the epistemic demands of six kinds of stakeholders (as suggested by Tomsett et al. [26]): developers, examiners, operators, executors, data subjects, and decision subjects.

As such, this paper extends these works in defense of the following claim: although attending to epistemic demands and everyday, local explanations is necessary for building an explainable intelligent system, it is not sufficient. This is because AI systems are value-laden across several dimensions, and ineliminable contextual value judgments must be made about AI algorithms that inform consequential decisions. Here, I show why and how these values matter for evaluating the acceptability and quality of AI explanations. Because inherently complex AI ecosystems are connected with various stakeholders, institutions, values, and norms, the simplicity and locality of an explanation should not be the only virtues sought in the design of explainable AI. Hence, the goal here is to offer a sufficiently rich framework for explainable AI that can be abridged and adjusted in relation to the risks and consequences attributed to different prediction contexts.

In other words, a philosophical framework (such as Zednik's [25]) that merely establishes a correspondence between the analytical methods of explainable AI and the epistemic demands of certain stakeholders cannot provide sufficient resources for the critical engagement of those who receive explanations from AI systems. More specifically, such a framework has two limitations. First, I argue that, in addition to epistemic demands, social and normative values matter to the development of a philosophical framework for explainable AI systems; I therefore argue for broadening the scope of the explanations of interest in the design of an explainable system. Second, a simple mapping between analytical methods for explainable AI and the six stakeholder types excludes many significant stakeholders (including ethicists, policy makers, and social and political scientists) who also assess AI explanations in relation to their normative and social significance.

To overcome these limitations, I offer a philosophical framework for building explainable AI systems that has sufficient resources to articulate explicitly why and how social, ethical, and epistemic values matter, and thereby to establish the relevance of a range of explanation types and stakeholders in explaining AI outputs.

This paper broadens the scope and relevance of current frameworks for AI explanations to accommodate the significance of explainable AI and its role in ensuring democratic and ethical algorithmic decisions. In such a framework, the usefulness of AI explanations must be evaluated in relation to a variety of epistemic, normative, and social demands. These demands fall along three axes, Reasons, Values, and Stakeholders, in what I call the RVS framework. The framework includes different levels of explanation, and each level makes sense of algorithmic outcomes in relation to a different set of reasons, values, and stakeholders. The novelty here is that the levels of explanation do not simply map onto analytic methods (taken as explanations) or onto the six stakeholder categories. Explanations go beyond analytic methods and relate to the practical, ethical, and political reasoning and values that are required if explanations are to make sense to broader sets of stakeholders. This framework enables diversification and the inclusion of different epistemic and practical standpoints in AI ecosystems.

The remainder of this paper is structured as follows. In Section 2, I argue that, in addition to data-analytical explanations, design explanations that incorporate representational, mathematical, and optimality information are pertinent to understanding the predictions of AI systems, and should therefore be included in the design of explainable AI systems. In Section 3, I propose a philosophically informed schema in which a variety of design and data-analytical information can find its place in explanations of AI outcomes. In Section 4, I present my philosophical framework for explainable AI by incorporating ethical and social values into the evaluation of the quality and relevance of AI explanations. The paper concludes in Section 5 with some directions for future work.

2. Kinds of explanation

Let us consider an algorithm that assesses job candidates to recommend one as a future employee for company X. Nora, a competent candidate, applies for the job. Her application is rejected. Nora wants to know why she is rejected (Figure 1). She does not seek a just-so story that makes sense by organizing events into an intelligible whole. She wants the explanation to have a factual foundation and to be empirically or mathematically true.

Figure 1: Algorithmic recommendation and the why-question.

This decision-making problem is exemplary of the social problems that might be resolved with a classification algorithm. Figure 1 can be expanded to Figure 2 to show a more fine-grained characterization of the decision process in terms of five main stages.

To make matters more concrete, let us consider a deep supervised learning algorithm that makes hiring predictions (Figure 3).3 That is, given a training data set composed of several labeled feature vectors, the algorithm learns to map applicants' CV features to their expected job performance. Given Nora's CV, the learned model predicts whether she will perform well if given the job. What explains the prediction that leads to Nora's rejection?

Figure 3: Prediction-based hiring decision.
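
To make the setup in Figure 3 more tangible, the following sketch trains a small feed-forward classifier on synthetic, CV-style features and queries it for a single applicant. The feature names, synthetic data, and choice of scikit-learn's MLPClassifier are illustrative assumptions, not a description of any actual hiring system.

    # Minimal sketch of a supervised hiring predictor (illustrative assumptions only).
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    # Hypothetical CV features: years_experience, degree_level, num_skills, tenure_years
    X_train = rng.random((500, 4))
    # Hypothetical labels: 1 = "performed well", 0 = "did not"
    y_train = (X_train @ np.array([0.5, 0.2, 0.2, 0.1])
               + 0.1 * rng.random(500) > 0.55).astype(int)

    model = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
    model.fit(X_train, y_train)

    nora = np.array([[0.7, 0.9, 0.6, 0.8]])  # Nora's quantified and measured input features
    print(model.predict(nora), model.predict_proba(nora))

The question the rest of this section asks is what, exactly, explains the single number such a model returns for Nora.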

For now, we are not considering ourselves to be bound by the epistemic needs of particular stakeholders. Rather, we take a broad and maximalist perspective, and we seek a multi-dimensional explanation for this algorithmic prediction. I tackle the attribution of stakeholders and the relevance of each kind of explanation for different normative and social considerations in the next section.

One natural way to cash out the explanation for why Nora is rejected is to decompose the explanatory information in terms of each of the five ML-relevant stages sketched in Figure 3. Collectively, the information in steps (1)-(5) determines the prediction, and therefore explains how a particular algorithmic recommendation is obtained for Nora.

Consider some potential responses to the five clusters of questions specified by (1)-(5) in Figure 3. (1) What role do the size and inclusiveness of the training data play in the recommendation of Nora's rejection? What would have happened if the training data were different, incorporated different criteria for predicting applicant success, or were pre-processed with different standards? (2) How does the particular learning process of the system determine the prediction outcome? What would the algorithm's outcome be if the learning process had been different, for example if the algorithm had learned in an unsupervised fashion? (3) How does the algorithm's specific thinking style (i.e., way of reasoning) in relation to its particular constituents (its loss function or hyperparameters, for example) impact what it predicts? (4) Why and how does this representation of Nora, in terms of quantified and measured input features, produce this recommendation? After all, it would seem perfectly reasonable to imagine that a different choice of the relevant input features would give a different prediction outcome. (5) Which of Nora's features cause, correlate with, or stand in relations of counterfactual dependence to the prediction of her rejection?4
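
As one illustration of how question (1) might be probed empirically, the sketch below refits an (assumed scikit-learn style) model on bootstrap resamples of the training data and reports how much the prediction for a single applicant varies. The helper name and its arguments are hypothetical, not part of any method discussed in the cited literature.

    # Sketch: sensitivity of one applicant's prediction to the training data (question 1).
    import numpy as np
    from sklearn.base import clone

    def prediction_under_resampling(model, X, y, x_query, n_runs=10, seed=0):
        """Refit copies of `model` on bootstrap resamples of (X, y) and collect the
        predicted probability of the positive class for the single-row array `x_query`."""
        rng = np.random.default_rng(seed)
        probs = []
        for _ in range(n_runs):
            idx = rng.choice(len(X), size=len(X), replace=True)  # resampled training set
            m = clone(model)               # fresh, unfitted copy of the same estimator
            m.fit(X[idx], y[idx])
            probs.append(m.predict_proba(x_query)[0, 1])
        return float(np.mean(probs)), float(np.std(probs))

A large spread across resamples would suggest that the recommendation against Nora depends heavily on the particular composition of the training set, which is exactly the kind of information question (1) asks for.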

The majority of work on the design of explainable AI systems has hitherto focused on methods that emphasize responses to explanatory questions (3) or (5), mainly by producing feature-based token explanations (i.e., explanations for particular cases, such as Nora's rejection in response to particular input features). There has also been some related work on explainability methods that approximate the logic of opaque reasoning in a deep neural network, for example by extracting a decision tree or by visualizing the network's hidden-layer activity. These approaches relate to a class of design-based AI explanations that approximate the design of the system, as I discuss in Section 2.2.

These feature-based methods explain algorithmic predictions in terms of dependency patterns, namely causal, counterfactual, correlational, and contrastive patterns relating salient input features to the algorithmic outcome. Let us review these approaches briefly before turning to the role of design explanations for algorithmic predictions.

2.1. Feature-based token explanations

Attempts at making AI explainable in terms of technical methods that extract dependency relations between the input and target features fall roughly into three categories, (i)-(iii).

  • (i) Correlational explanations provide information on associations between certain specifically important input features and output predictions. These associations can be captured using methods such as feature importance, saliency maps, partial dependence plots, individual conditional expectation, or rule extraction, which indicate how important the association of a given input feature (e.g., Nora's longevity at her workplace, or the temperature) is to an output prediction (e.g., Nora's expected duration of stay at the given job, or the number of bikers in the streets) [28, 29, 30, 31, 5, 32, 33, 34, 35, 36]. A minimal sketch of one such method, permutation-based feature importance, appears after this list.

  • (ii) Causal and counterfactual explanations reveal causal dependencies among input features and predicted outcomes. Causal explanations can be obtained with methods such as causal modeling or approximate decision trees [37, 38, 39, 40]. For instance, a causal explanation for why Nora was recommended for rejection might be that some of her input features, such as her ethnicity, led to a lower score for her predicted performance. A simple, illustrative counterfactual search is sketched after this list.

  • (iii) Example-based explanations
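
As a minimal sketch of the correlational methods in (i), the function below computes permutation importances: for each input feature it measures how much the model's accuracy drops when that feature's values are shuffled. The model, data, and feature names are assumed to come from a setup like the earlier hiring sketch.

    # Sketch: correlational, feature-importance style explanation (category (i)).
    from sklearn.inspection import permutation_importance

    def feature_importance_report(model, X, y, feature_names):
        result = permutation_importance(model, X, y, n_repeats=20, random_state=0)
        ranked = sorted(zip(feature_names, result.importances_mean),
                        key=lambda pair: -pair[1])
        for name, drop in ranked:
            print(f"{name}: mean score drop when shuffled = {drop:.3f}")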
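
And as a simplified sketch of the counterfactual style of explanation in (ii), the brute-force search below looks for the smallest change to a single feature that flips the model's prediction for an applicant. The counterfactual methods cited above are considerably more sophisticated; the function here, including its grid over [0, 1], is purely illustrative.

    # Sketch: single-feature counterfactual search (category (ii)), purely illustrative.
    import numpy as np

    def single_feature_counterfactual(model, x, feature_names, grid=None):
        """Return (feature, new_value, change) for the smallest single-feature edit that
        flips model.predict on the 1-D feature vector `x`, or None if nothing flips it."""
        if grid is None:
            grid = np.linspace(0.0, 1.0, 101)  # assumes features are scaled to [0, 1]
        original = model.predict(x.reshape(1, -1))[0]
        best = None
        for j, name in enumerate(feature_names):
            for v in grid:
                x_cf = x.copy()
                x_cf[j] = v
                if model.predict(x_cf.reshape(1, -1))[0] != original:
                    delta = abs(v - x[j])
                    if best is None or delta < best[2]:
                        best = (name, float(v), float(delta))
        return best

A result naming a protected attribute such as ethnicity would be exactly the kind of ethically loaded finding that the causal reading in (ii) is meant to surface.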
