Abstract
Privacy and data protection are concerns raised about most digital technologies. The advance of artificial intelligence (AI) has given even higher levels of prominence to these concerns.
Three cases are presented as examples to highlight the way in which AI can affect or exacerbate privacy concerns. The first deals with the use of private data in authoritarian regimes. The second looks at the implications of AI use of genetic data. The third concerns problems linked to biometric surveillance. Then follows a description of how privacy concerns are currently addressed via data protection regulation and a discussion of where AI may raise new challenges to existing data protection regimes. Current European data protection law requires data protection impact assessment. This chapter suggests that a broader AI impact assessment could extend the remit of such an assessment to offer more comprehensive coverage of possible privacy concerns linked to AI.
Keywords
Privacy · Data protection · Social credit · Data misuse · Authoritarian government · Genetic data · Biometrics · Surveillance
3.1 Introduction
Concerns about the possible negative impact of artificial intelligence (AI) on privacy have been widely expressed. Not all AI applications use personal data and therefore some uses may not have any privacy implications. However, the need for large datasets for the training and validation of machine learning models can raise a range of different concerns. Privacy is a complex concept that we return to in more detail below. Key to the discussion of privacy in AI is the worry that the use of AI technologies can lead to the violation of data protection principles, which then leads to harm for specific individuals or groups whose data is analysed using AI.
Privacy and data protection are issues that apply to most digital technologies, including AI. It is possible for most personal data to be misused for purposes that breach data protection principles or violate legitimate privacy preferences unless appropriate safeguards are in place. An early and influential legal recognition of a “right to privacy” grounded in legitimate privacy preferences was expressed in the nineteenth century. The stipulated “right to be let alone” was driven by a key technical innovation of the time, namely the ability to take photographs of individuals. This new technology raised concerns that had previously been immaterial, when capturing the likeness of a person required them to sit in front of a painter for extended periods.
Ever since the nineteenth century, data protection regulation and legislation have developed in tandem with new technical capabilities and resulting threats to privacy. The growing ability to process data through electronic computers led to much academic debate on the topic and the development of so-called principles of fair information practices. These were originally developed in the US in 1973. They still underpin much of our thinking on data protection today. The principles include that:
1. individuals should have the right to know how organizations use personal information and to inspect their records and correct any errors;
2. individuals should have the right to prevent secondary use of personal information if they object to such use;
3. organizations that collect or use personal information must take reasonable precautions to prevent misuse of the information.
These principles have contributed to the creation of legislation and shaped its content since the 1970s and 1980s. At the European level, Directive 95/46/EC established a shared approach and visible data protection principles in 1995. It was superseded by the General Data Protection Regulation (GDPR) (European Parliament and Council of the EU 2016), which came into effect in 2018.
Given that AI is not the first potential threat to privacy or data protection, it is worth asking why the impacts of AI technologies on privacy are often seen as key ethical concerns. One part of the answer is that machine learning allows the development of fine-grained categories of data which, in turn, can be used to categorise and profile individuals. Such profiling may well be the intended result of AI use, for instance when an organisation seeks to identify potential customers to target with advertising campaigns. Such profiling may also have discriminatory effects as outlined in Chap. 2. It may also have other undesirable consequences for individuals or groups and open the way to misuse, such as when consumer profiles are used for political purposes (see Chap. 5).
AI uses of personal data can furthermore facilitate surveillance far beyond the capabilities that existed prior to AI. This includes automated surveillance of individuals using their biometric data, for example employing facial recognition, as developed in more detail in the cases below. There may be good reasons for the development and employment of such surveillance, as well as morally desirable outcomes, for instance the prevention of gender-based violence. But AI-based surveillance may also have undesired outcomes. The key challenge is that data protection is a moral value that must be balanced against other moral values. This is important to keep in mind from a moral perspective, especially because data protection is strongly regulated whereas other ethical issues and possible moral advantages are typically not subject to the same level of regulation. The following cases of privacy violations that are enabled by AI demonstrate this point.
3.2 Cases of Privacy Violations Through AI
Case 1: Use of Personal Data by Authoritarian Regimes
China is one of the world’s leading nations in AI development. It embraces the use of large amounts of data that it collects on its citizens, for instance in its social credit scoring system. This system uses a large number of data points, including social media data, local government data and citizens’ activities, to calculate a trustworthiness score for every citizen. Several data platforms are used to integrate data into “a state surveillance infrastructure”. High scores lead to the allocation of benefits, such as lower utility rates and favourable booking conditions, whereas low scores can lead to the withdrawal of services. Within China, the system benefits from high levels of approval because Chinese citizens “interpret it through frames of benefit-generation and promoting honest dealings in society and the economy instead of privacy-violation.”
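To make the mechanism concrete, the following toy sketch shows how a scheme of this kind might aggregate heterogeneous data points into a single score and map it to benefits or sanctions. All field names, weights and thresholds are invented for illustration and do not describe any real system.

```python
# Toy sketch (invented fields, weights and thresholds) of how a scoring
# scheme of the kind described above might work: heterogeneous data points
# are aggregated into a single "trustworthiness" score, which is then
# mapped to benefits or sanctions. This does not describe any real system.
from dataclasses import dataclass


@dataclass
class CitizenRecord:
    # Stand-ins for the many data points such a system might ingest.
    paid_bills_on_time: float        # fraction between 0.0 and 1.0
    negative_social_media_flags: int
    volunteering_hours: float
    traffic_violations: int


def trustworthiness_score(r: CitizenRecord) -> float:
    """Aggregate data points into one score using arbitrary toy weights."""
    score = 600.0                                  # arbitrary baseline
    score += 200.0 * r.paid_bills_on_time          # rewards "honest dealings"
    score += 1.0 * r.volunteering_hours
    score -= 50.0 * r.negative_social_media_flags
    score -= 30.0 * r.traffic_violations
    return score


def outcome(score: float) -> str:
    """Threshold-based allocation or withdrawal of benefits."""
    if score >= 750:
        return "benefits: lower utility rates, favourable booking conditions"
    if score <= 550:
        return "sanctions: withdrawal of services"
    return "no change"


record = CitizenRecord(0.9, 2, 10.0, 1)
print(trustworthiness_score(record), "->", outcome(trustworthiness_score(record)))
```

Even in this toy form the privacy stakes are visible: every weight encodes a normative judgement, and every input field is personal data that must be collected, retained and protected.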
All states collect information about their citizens for a broad range of purposes. Some of these purposes may enjoy strong support from citizens, such as the allocation of financial support or healthcare, while others may be less popular, such as tax collection. Authoritarian governments can make additional use of data on their citizens to stabilise their power base. A case in point is China, even though research has shown that Chinese citizens interpret the system from the perspective of its benefits.
It has also been argued that China has strong data protection laws. However, these do not apply to state bodies, and government use of data for schemes such as social credit scoring is therefore not covered. This differs from the situation in Europe, where data protection law is binding on governments and state bodies as well. Social credit scoring is contentious. However, it is not always very different from activities such as “nudging” that democratic governments use, for example to encourage healthy behaviour such as giving up smoking or taking up exercise.
Both nudging and social credit scoring are contested, though one can see arguments in their favour. But the use of AI for the supervision of citizens can go far beyond these. By employing AI, authoritarian regimes may find it easier to analyse large amounts of data, such as social media posts, and to identify contributions that can trigger government responses, for example by flagging cases of religious speech that can then be used to persecute religious minorities.
Case 2: Genetic Privacy
Many genetic programmes are hailed for delivering medical breakthroughs via personalised medicine and the diagnosis of hereditary diseases. For instance, the Saudi Human Genome Program (SHGP), launched by the Saudi King in 2013, was announced with such aims. Research showed that “90.7% of [Saudi] participants agreed that AI could be used in the SHGP”. However, the same research showed “a low level of knowledge … regarding sharing and privacy of genetic data”, pointing to a potential mismatch of awareness of the benefits as opposed to the risks of genetic research supported by AI.
Genetic data can provide deep insights not only into existing medical conditions but also into risks of and propensities for future diseases, going beyond what other types of data can reveal. It thus has the properties of medical data and, as part of a special category of data, is subject to stronger data protection regimes in many jurisdictions. Yet the importance and potential of genetic data go beyond its medical uses. The genetic data of one person can provide information about their heritage, their ancestors and their offspring. Access to genetic data can therefore present benefits as well as risks and entail a multitude of ethical issues. For instance, genomic datasets can improve research on cancer and rare diseases, while the reidentification of even anonymised data creates serious privacy risks for the families involved.
With the costs of gene sequencing continuing to fall, one can reasonably expect genetic data to become part of routine healthcare within a decade. This raises questions about data governance, storage, security etc. Such genetic data requires Big Data analytics approaches typically based on some sort of AI in order to be viable and provide relevant scientific or diagnostic insights.
In addition to the use of genetic data in healthcare, there is a growing number of private providers, such as 23andMe, Ancestry and Veritas Genetics, that offer gene sequencing services commercially. This raises further questions about who owns the data and how securely these companies hold it, and creates uncertainty about what happens to the data should such a company go bankrupt or be bought out.
Genetic analysis can also lead to unpleasant surprises, for example when it contradicts assumed relationships in a family, showing that someone’s ancestry is not as had been supposed. In some cases this may be greeted with humour or mild embarrassment, but in others, where ancestry is crucial to the legitimacy of a social position, evidence of this kind may have manifestly negative consequences. Such consequences, it could be argued, are part of the nature of genetic data and should be dealt with via appropriate information and consent procedures. However, it is in the nature of genetic data that it pertains to more than one individual. If a sibling, for example, undertakes a genetic analysis, then many of the findings will be relevant to other family members. If such an analysis shows, for instance, that a parent is carrying a gene that contributes to a disease, other siblings’ propensities to develop this disease would likely be increased as well, even though they did not take a genetic test themselves. This example demonstrates the possible conflicts arising from possessing and sharing such information.
AI analysis of genetic data may lead to medical insights. Indeed, this is the assumption that supports the business model of private gene-sequencing organisations. Their work is built on the assumption that collecting large amounts of genetic data in addition to other data that their customers provide will allow them to identify genetic patterns that can help predict or explain diseases. This, in turn, opens the way to medical research and finding cures, potentially a highly lucrative business.
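As a rough illustration of the kind of pattern-finding this business model assumes, the following sketch fits a simple classifier that relates genotypes to a disease label on purely synthetic data. The SNP indices, effect sizes and sample sizes are arbitrary; real genomic analyses are far larger and methodologically much more careful.

```python
# Rough sketch, on purely synthetic data, of the pattern-finding this
# business model assumes: fit a simple classifier that relates genotypes to
# a disease label, then predict risk for new samples. SNP indices, effect
# sizes and sample sizes are arbitrary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_people, n_snps = 500, 50

# Genotypes coded as minor-allele counts per SNP: 0, 1 or 2.
genotypes = rng.integers(0, 3, size=(n_people, n_snps))

# Pretend, purely for illustration, that SNPs 0 and 7 raise disease risk.
risk = -2.0 + 0.8 * genotypes[:, 0] + 0.6 * genotypes[:, 7]
disease = rng.random(n_people) < 1.0 / (1.0 + np.exp(-risk))

model = LogisticRegression(max_iter=1000).fit(genotypes, disease)

# Predicted probability of disease for one new (synthetic) person.
new_person = rng.integers(0, 3, size=(1, n_snps))
print("predicted risk:", float(model.predict_proba(new_person)[0, 1]))
```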
From an ethical perspective this is problematic because the beneficiaries of this data analysis will normally be the companies, whereas the individual data subjects or donors will at best be notified of the insights their data has contributed to. Another concern is that the analysis may lead to the ability to predict disease trajectories without being able to intervene or cure, thus forcing patients to face difficult decisions involving complex probabilities that most non-experts are poorly equipped to deal with.
A further concern is that of mission creep, where the original purpose of the data collection is replaced by a changing or altogether different use. One obvious example is the growing interest from law enforcement agencies in gaining access to more genetic data so that they can, for example, identify culprits through genetic fingerprinting. The main point is that data, once it exists in digital form, is difficult to contain. One could compare it to grease in an internal combustion engine: once data is in an electronically accessible format, it is very difficult to remove, it may end up in unexpected places, and attempts to delete it may prove futile. In the case of genetic data this raises the problem of possible future, currently unforeseen, uses which, due to the very personal nature of the data, may have significant consequences.
The Saudi case is predicated on the assumption of beneficial outcomes of the sharing of genetic data, and so far there is little data to demonstrate whether and in what way ethical issues have arisen or are likely to arise. A key concern here is that due to the tendency of data to leak easily, waiting until ethical concerns have materialised before addressing them is unlikely to be good enough. At that point the genie will be out of the bottle and the “greased” data may be impossible to contain.
Case 3: Biometric Surveillance
“Nijeer Parks is the third person known to be arrested for a crime he did not commit based on a bad face recognition match.” Parks was falsely accused of stealing and trying to hit a police officer with his car based on facial recognition software – but he was 30 miles away at the time. “Facial recognition ... [is] very good with white men, very poor on Black women and not so great on white women, even.” It becomes particularly problematic when “the police trust the facial recognition technology more than the individual”.
Biometric surveillance uses data about the human body to closely observe or follow an individual. The most prominent example of this is the use of facial features in order to track someone. In this broad sense of the term, any direct observation of a person, for example a suspected criminal, is an instance of biometric surveillance. The main reason why biometric surveillance is included in the discussion of privacy concerns is that AI systems allow an enormous expansion of its scope. Whereas in the past one observer could only follow one individual, or maybe a few, the advent of machine learning and image recognition techniques, coupled with widespread image capture from closed-circuit television cameras, allows the surveillance of entire communities. Automatic face recognition and tracking is not the only possible example of biometric surveillance, but it is probably the most advanced and the one that raises most public concern relating to privacy, as in the case described above.
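A minimal sketch of the matching step behind such systems may help: a face detected in camera footage is reduced to an embedding vector and compared against a watchlist, with a match declared whenever the similarity clears a threshold. The embeddings below are random placeholders standing in for the output of a face-recognition model, and the threshold is arbitrary; it is precisely the choice of such a threshold, combined with unequal accuracy across demographic groups, that produces false matches like the one in the Parks case.

```python
# Minimal sketch of the matching step behind automated face surveillance:
# a face detected in camera footage is reduced to an embedding vector and
# compared against a watchlist of stored embeddings. The embeddings here
# are random placeholders standing in for the output of a face-recognition
# model, and the threshold is arbitrary.
import numpy as np

rng = np.random.default_rng(1)
EMBEDDING_DIM = 128
MATCH_THRESHOLD = 0.6   # operating point; too lax a threshold produces
                        # false matches of the kind in the Parks case

# name -> stored embedding for each person on the watchlist
watchlist = {f"person_{i}": rng.normal(size=EMBEDDING_DIM) for i in range(1000)}


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_match(probe):
    """Return (name, score) of the closest watchlist entry above threshold, else None."""
    name, score = max(
        ((n, cosine(probe, e)) for n, e in watchlist.items()),
        key=lambda item: item[1],
    )
    return (name, score) if score >= MATCH_THRESHOLD else None


probe = rng.normal(size=EMBEDDING_DIM)   # stand-in for a face crop from CCTV
print(best_match(probe))                 # usually None for random vectors
```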
There are numerous reasons why biometric surveillance is deemed to be ethically problematic. It can be done without the awareness of the data subject and thus lead to the possibility and the perception of pervasive surveillance. While some might welcome pervasive surveillance as a contribution to security and the reduction of crime, it has been strongly argued that being subject to it can lead to significant harm. Brown argues that humans need a “protective cocoon” that shields them from external scrutiny. This is needed to develop a sense of “ontological security”, a condition for psychological and mental health. Following this argument, pervasive surveillance is ethically problematic simply for the psychological damage it can do through its very existence. Surveillance can lead to self-censoring and “social cooling”, that is, a modification of social interaction caused by fear of possible sanctions. AI-enabled large-scale biometric surveillance could reasonably be expected to lead to this effect.
3.3 Data Protection and Privacy
AI is far from the only threat to privacy, but it adds new capabilities that can either exacerbate existing threats, for example by automating mass surveillance based on biometric data, or add new angles to privacy concerns, for example by exposing new types of data, such as genetic data, to the possibility of privacy violations.
Before we look at what is already being done to address these concerns and what else could be done, it is worth providing some more conceptual clarity. The title of this chapter and the headlines covering much of the public debate on the topics raised here refer to “privacy”. As suggested at the beginning of this chapter, however, privacy is a broad term that covers more than the specific aspects of AI-enabled analysis of personal data.
A frequently cited categorisation of privacy concepts proposes that there are seven types of privacy: privacy of the person, privacy of behaviour and action, privacy of personal communication, privacy of data and image, privacy of thoughts and feelings, privacy of location and space, and privacy of association (including group privacy).
Most of these types of privacy can be linked to data, but they go far beyond simple measures of data protection. Nissenbaum suggests that privacy can be understood as contextual integrity. This means that privacy protection must be context-specific and that information gathering needs to conform to the norms of the context. She uses this position to argue against public surveillance.
It should thus be clear that privacy issues cannot be comprehensively resolved by relying on formal mechanisms of data protection governance, regulation and/or legislation. However, data protection plays a crucial role in and is a necessary condition of privacy preservation. The application of data protection principles to AI raises several questions. One relates to the balance between the protection of personal data and the openness of data for novel business processes, where it has been argued that stronger data protection rules, such as the EU’s GDPR, can lead to the weakening of market positions in the race for AI dominance. On the other hand, there are worries that current data protection regimes may not be sufficient in their coverage to deal with novel privacy threats arising from AI technologies and applications.
A core question which has long been discussed in the broader privacy debate is whether privacy is an intrinsic or an instrumental value. Intrinsic values are those values that are important in themselves and need no further justification. Instrumental values are important because they lead to something that is good. The distinction is best known in environmental philosophy, where some argue that an intact natural environment has an intrinsic value while others argue that it is solely needed for human survival or economic reasons.
However, this distinction may be simplistic, and the evaluation of a value may require attention to both intrinsic and instrumental aspects. For our purposes it is important to note that the question of whether privacy is an intrinsic or instrumental value has a long tradition. The question is not widely discussed in the AI ethics discourse, but the answer to it is important in determining the extent to which AI-related privacy risks require attention. The recognition of privacy as a fundamental right, for example in the European Charter of Fundamental Rights, settles this debate to some degree and posits privacy as a fundamental right worthy of protection. However, even assuming that privacy is an unchanging human right, technology will affect how respect for privacy is shown. AI can also raise novel threats to privacy, for example by making use of emotion data, which existing remedies do not adequately cover.
Finally, like most other fundamental rights, privacy is not an absolute right. Personal privacy finds its limits when it conflicts with other basic rights or obligations, for example when the state compiles data in order to collect taxes or prevent the spread of diseases. The balancing of privacy against other rights and obligations therefore plays an important role in finding appropriate mitigations for privacy threats.
3.4 Responses to AI-Related Privacy Threats
We propose two closely related responses to AI-related privacy threats: data protection impact assessments (DPIAs) and AI impact assessments (AI-IAs).
DPIAs developed from earlier privacy impact assessments. They are predicated on the idea that it is possible to proactively identify possible issues and address them early in the development of a technology or a sociotechnical system. This idea is widespread and there are numerous types of impact assessment, such as environmental impact assessments, social impact assessments and ethics impact assessments. The choice of terminology for DPIAs indicates a recognition of the complexity of the concept of privacy and a consequently limited focus on data protection only. DPIAs are mandated in some cases under the EU’s GDPR. As a result of this legal requirement, DPIAs have been widely adopted and there are now well-established methods that data controllers can use.
Data Controllers and Data Processors
The concept of a data controller is closely linked to the GDPR, where it is defined as the organisation that determines the purposes for which and the means by which personal data is processed. The data controller has important responsibilities with regard to the data they control and is normally liable when data protection rules are violated. The data processor is the organisation that processes personal data on behalf of the data controller. This means that data controller and data processor have clearly defined tasks, which are normally subject to a contractual agreement. An example might be a company that analyses personal data for training a machine learning system. This company, because it determines the purpose and means of processing, is the controller. It may store the data on a cloud storage system. The organisation running the cloud storage could then serve as data processor.
In practice, DPIAs are typically implemented in the form of a number of questions that a data controller or data processor has to answer, in order to identify the type of data and the purpose and legal basis of the data processing, and to explore whether the mechanisms in place to protect the data are appropriate to the risk of data breaches. The risk-based approach that underlies DPIAs, or at least those undertaken in response to the GDPR, shows that data protection is not a static requirement but must be amenable to the specifics of the context of data processing. This can raise questions when AI is used for data processing, as the exact uses of machine learning models may be difficult to predict, or where possible harms would not target the individual data subject but may occur at a social level, for instance when groups of the population are stigmatised because of characteristics that are manifest in their personal data. An example might be a healthcare system that identifies a correlation between membership of an ethnic group and propensity to a particular disease. Even though this says nothing about causality, it could nevertheless lead to prejudice against members of the ethnic group.
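The following sketch illustrates, without reproducing any official template, how such a questionnaire can be operationalised: answers to a structured set of questions are turned into risk flags that the data controller must mitigate or justify. The question set and flagging rules are invented for illustration.

```python
# Illustrative sketch (not any official template) of a DPIA questionnaire:
# answers to a structured set of questions are turned into risk flags that
# the data controller must mitigate or justify. Questions and flagging
# rules are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class DPIAQuestionnaire:
    purpose: str
    legal_basis: str                 # e.g. consent, contract, legal obligation
    data_categories: list[str]       # what personal data is processed
    special_category_data: bool      # health, genetic or biometric data, etc.
    systematic_monitoring: bool      # e.g. large-scale surveillance
    automated_decisions: bool        # decisions with significant effects
    safeguards: list[str] = field(default_factory=list)

    def risk_flags(self) -> list[str]:
        flags = []
        if self.special_category_data:
            flags.append("special-category data: stronger safeguards needed")
        if self.systematic_monitoring:
            flags.append("systematic monitoring: likely high risk to data subjects")
        if self.automated_decisions:
            flags.append("automated decision-making: human review required")
        if not self.safeguards:
            flags.append("no safeguards recorded")
        return flags


assessment = DPIAQuestionnaire(
    purpose="train a disease-risk prediction model",
    legal_basis="explicit consent",
    data_categories=["genetic data", "lifestyle data"],
    special_category_data=True,
    systematic_monitoring=False,
    automated_decisions=True,
    safeguards=["pseudonymisation", "access controls"],
)
for flag in assessment.risk_flags():
    print(flag)
```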
We have therefore suggested that a broader type of impact assessment is more appropriate for AI, one that includes questions of data protection but also looks at other possible ethical issues in a more structured way. Several such AI-IAs have been developed by various institutions. The most prominent was proposed by the EU’s High-Level Expert Group in its Assessment List for Trustworthy AI, or ALTAI. Other examples are the AI Now Institute’s algorithmic impact assessment, the IEEE’s recommended practice for assessing the impact of autonomous and intelligent systems on human wellbeing and the ECP Platform’s artificial intelligence impact assessment.
What all these examples have in common is that they broaden the idea of an impact assessment for AI to address various ethical issues. They all cover data protection questions but go beyond them. This means that they may deal with questions of long-term or large-scale use of AI, such as economic impact or changes in democratic norms, that go beyond the protection of individual personal data. In fact, there are several proposals that explicitly link AI-IAs and DPIAs or that focus in particular on the data protection aspect of an AI-IA. An AI-IA should therefore not be seen as a way to replace a DPIA, but rather as supplementing and strengthening it.
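To illustrate this supplement-rather-than-replace relationship, the sketch below builds on the hypothetical DPIAQuestionnaire from the previous sketch: an AI impact assessment object embeds the DPIA and adds broader questions about affected groups, bias and societal impacts. The question wording is illustrative and only loosely inspired by lists such as ALTAI.

```python
# Sketch of the supplement-rather-than-replace relationship: an AI impact
# assessment embeds the (hypothetical) DPIAQuestionnaire from the previous
# sketch and adds broader ethical questions. Question wording is
# illustrative and only loosely inspired by lists such as ALTAI.
from dataclasses import dataclass, field


@dataclass
class AIImpactAssessment:
    dpia: DPIAQuestionnaire            # data protection remains in scope
    affected_groups: list[str]         # who may be impacted beyond data subjects
    bias_and_discrimination_checked: bool
    societal_impacts: list[str] = field(default_factory=list)  # e.g. labour, democracy

    def open_issues(self) -> list[str]:
        issues = list(self.dpia.risk_flags())      # start from the DPIA findings
        if not self.bias_and_discrimination_checked:
            issues.append("no bias or discrimination analysis recorded")
        issues.extend(f"societal impact to assess: {s}" for s in self.societal_impacts)
        return issues
```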
3.5 Key Insights
Privacy remains a key concern in the AI ethics debate and this chapter has demonstrated several ways in which AI can cause harm, based on the violation of data protection principles. Unlike other aspects of the AI ethics debate, privacy is recognised as a human right, and data protection, as a means of supporting privacy, is extensively regulated. As a result of this high level of attention, there are well-established mechanisms, such as DPIAs, which can easily be extended to cover broader AI issues or incorporated into AI-IAs.
The link between DPIAs and AI-IAs can serve as an indication of the role of data and data protection as a foundational aspect of many other ethical issues. Not all of AI ethics can be reduced to data protection. However, many of the other issues discussed in this book have a strong link to personal data. Unfair discrimination, for example, typically requires and relies on personal data on the individuals who are discriminated against. Economic exploitation in surveillance capitalism is based on access to personal data that can be exploited for commercial purposes. Political and other types of manipulation require access to personal data to identify personal preferences and propensities to react to certain stimuli. Data protection is thus a key to many of the ethical issues of AI, and our suggested remedies are therefore likely to be relevant across a range of issues. Many of the responses to AI ethics discussed in this book will, in turn, touch on or incorporate aspects of data protection.
This does not imply, however, that dealing with privacy and data protection in AI is easy or straightforward. The responses that we suggest here, i.e. DPIAs and AI-IAs, are embedded in the European context, in which privacy is recognised as a human right and data protection has been codified in legislation. It might be challenging to address such issues in the absence of this societal and institutional support. Our Case 1 above, which describes the use of data by an authoritarian regime, is a reminder that state and government-level support for privacy and data protection cannot be taken for granted.
Another difficulty lies in the balancing of competing goods and the identification of the boundaries of what is appropriate and ethically defensible. We have mentioned the example of using AI to analyse social media to identify cases of religious speech that can be used to persecute religious minorities. The same technology can be used to search social media in a different institutional context to identify terrorist activities. These two activities may be technically identical, though they are subject to different interpretations. This raises non-trivial questions about who determines what constitutes an ethically legitimate use of AI, where the boundaries of that use lie, and on what grounds such distinctions are drawn. This is a reminder that AI ethics can rarely be resolved simply, but needs to be interpreted from a broader perspective that includes a systems view of the AI application and considers institutional and societal aspects when ethical issues are being evaluated and mitigated.