A Summary Survey of Attacks on Machine Learning

Attacks on Machine Learning


Though AI brings great power, like any technology it is not immune to attacks. Attacks on ML models fall into different categories depending on the attacker's actual goal (espionage, sabotage, fraud) and the stage of the machine learning pipeline that is targeted (training or production); the latter split can also be described as attacks on the algorithm versus attacks on the model. The main attack types are Evasion, Poisoning, Trojaning, Backdooring, Reprogramming, and Inference attacks. Evasion, poisoning, and inference are currently the most widespread.

1. The first categorization of attacks: by attacker goal

| Stage \ Goal | Espionage | Sabotage | Fraud |
| --- | --- | --- | --- |
| Training | Inference by poisoning | Poisoning, Trojaning, Backdooring | Poisoning |
| Production | Inference attack | Adversarial Reprogramming, Evasion (false negative evasion) | Evasion (false positive evasion) |

1.1 Espionage

Objective: The objective is to glean insights about the system and use the obtained information for the attacker's own profit or to plan more advanced attacks.

Real-life incident: A well-known privacy incident happened when Netflix published its dataset. Although the data was anonymized, hackers were able to identify the authors of particular reviews. In the world of proprietary systems and algorithms, one goal is to capitalize on knowledge of a system's algorithm: the structure of the system, the neural network, the type of network, the number of layers, and so on.

1.2 Sabotage

Objective: The objective is to disable the functionality of an AI system.

There are several ways to sabotage a system:

  • Flooding the AI with requests that require more computation time than an average example.
  • Flooding the system with objects that will be misclassified, in order to increase the manual work spent on false positives or to erode trust in the system. For example, an attacker can make a video recommendation system recommend horror movies to comedy lovers.
  • Modifying a model by retraining it with wrong examples so that its outputs degrade; this only works if the model is trained online.
  • Using the computing power of an AI model to solve the attacker's own tasks. This attack is called adversarial reprogramming.

1.3 Fraud

Objective: The objective is to make the system misclassify inputs.

Attackers have two ways to do this: interacting with the system at the learning stage (Poisoning, i.e., poisoning the data) or at the production stage (Evasion, i.e., exploiting vulnerabilities to cause misoperation). Evasion is the most common attack on machine learning models; it refers to designing an input that seems normal to a human but is wrongly classified by the ML model.

2. The second categorization of attacks: by attack type

Evasion, Poisoning, Trojaning, Backdooring, Reprogramming, and Inference attacks

2.1 Evasion (Adversarial Examples)

Evasion is the most common attack on machine learning models and is performed during production. It refers to designing an input that seems normal to a human but is wrongly classified by ML models. A typical example is to change some pixels in a picture before uploading it, so that the image recognition system fails to classify the result correctly. In fact, such an adversarial example can even fool humans. See the image below.
[Image: adversarial example]
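
As a rough illustration of how such inputs are crafted, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one common evasion technique. It assumes a PyTorch classifier; the model, the epsilon budget, and the usage lines are illustrative assumptions, not part of the original article.

```python
# Minimal FGSM evasion sketch (PyTorch; the model, inputs, and eps are assumptions).
import torch
import torch.nn.functional as F

def fgsm_example(model, x, true_label, eps=0.03):
    """Craft an adversarial input by stepping along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), true_label)
    loss.backward()
    # Perturbation bounded by eps in the L-infinity norm: barely visible to a human.
    x_adv = x_adv.detach() + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1)

# Hypothetical usage: `model` is any trained image classifier, `x` a batch of images
# in [0, 1], `true_label` the correct classes.
#   x_adv = fgsm_example(model, x, true_label)
#   model(x_adv).argmax(1)   # often differs from true_label although x_adv looks like x
```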

Some restrictions should be taken into account before choosing the right attack method: goal, knowledge, and method restrictions.

  • Goal restriction (targeted vs. non-targeted vs. universal)
    • Confidence reduction: we do not change the class but strongly reduce the model's confidence
    • Misclassification: we change the class without any specific target
    • Targeted misclassification: we change the class to a particular target
    • Source/target misclassification: we change a particular source class to a particular target class
    • Universal misclassification: we can change any source to a particular target
  • Knowledge restriction (white-box, black-box, grey-box)
    • Black-box methods: an attacker can only send inputs to the system and obtain a simple result about the class.
    • Grey-box methods: an attacker may additionally know details about the dataset or the type of neural network, its structure, the number of layers, etc.
    • White-box methods: everything about the network is known, including all weights and all data on which the network was trained.
  • Method restriction (L0, L1, L2, L-infinity norms): a bound on the pixel difference between the original and the adversarial input (see the small example after this list).
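
To make the norm restrictions concrete, the toy NumPy snippet below (all values are illustrative, not from the article) measures one perturbation under the L0, L1, L2, and L-infinity norms; each attack family tries to stay small under one of them.

```python
# Comparing perturbation norms on a toy 28x28 "image" (NumPy; values are illustrative).
import numpy as np

rng = np.random.default_rng(0)
original = rng.random((28, 28))
perturbed = original.copy()
perturbed[5, 5] += 0.2                                # one strongly changed pixel
perturbed += 0.001 * rng.standard_normal((28, 28))    # plus faint global noise

delta = (perturbed - original).ravel()
print("L0   (pixels changed):", int(np.count_nonzero(np.abs(delta) > 1e-6)))
print("L1   (total change):  ", float(np.abs(delta).sum()))
print("L2   (Euclidean):     ", float(np.linalg.norm(delta)))
print("Linf (max change):    ", float(np.abs(delta).max()))
```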
Research Work

2.2 Poisoning (widespread)

There are four broad attack strategies for altering the model, depending on the adversary's capabilities:

  • Label modification: these attacks allow the adversary to modify only the labels in supervised learning datasets, but for arbitrary data points, typically subject to a constraint on the total modification cost (a toy illustration follows this list).
  • Data injection: the adversary has no access to the training data or the learning algorithm but can add new data to the training set. The target model can be corrupted by inserting adversarial samples into the training dataset.
  • Data modification: the adversary has no access to the learning algorithm but has full access to the training data. The training data can be poisoned directly by modifying it before it is used to train the target model.
  • Logic corruption: the adversary has the ability to meddle with the learning algorithm itself. These attacks are referred to as logic corruption.
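
As a toy illustration of the label-modification strategy, the sketch below (scikit-learn on synthetic data; the dataset, model, and flip rates are all assumptions made for illustration) flips a fraction of training labels and reports the resulting drop in test accuracy.

```python
# Label-flipping poisoning sketch (scikit-learn; synthetic data and flip rates are assumptions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy_with_flips(flip_rate):
    rng = np.random.default_rng(0)
    y_poisoned = y_tr.copy()
    idx = rng.choice(len(y_tr), size=int(flip_rate * len(y_tr)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]             # adversary flips the chosen labels
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)
    return clf.score(X_te, y_te)                      # accuracy on clean test data

for rate in (0.0, 0.1, 0.3):
    print(f"flip rate {rate:.0%}: test accuracy {accuracy_with_flips(rate):.3f}")
```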

2.3 Trojaning

In poisoning, attackers have no access to the model or the initial dataset; they can only add new data to the existing dataset or modify it. In trojaning, the attacker still has no access to the initial dataset but does have access to the model and its parameters and can retrain the model.
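
A minimal sketch of the idea, assuming the attacker holds a PyTorch image classifier and a clean data loader they can fine-tune on; the trigger pattern, target class, and training details are illustrative assumptions, not a method taken from the article.

```python
# Trojaning sketch: fine-tune a model the attacker controls so that a small trigger
# forces a chosen target class (PyTorch; model, trigger, and labels are assumptions).
import torch
import torch.nn.functional as F

def add_trigger(images, value=1.0):
    """Stamp a small bright square into the bottom-right corner of each image."""
    images = images.clone()
    images[:, :, -4:, -4:] = value
    return images

def trojan_finetune(model, clean_loader, target_class=0, poison_frac=0.1, epochs=1):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for x, y in clean_loader:
            n = int(poison_frac * len(x))
            x_p = add_trigger(x[:n])
            y_p = torch.full((n,), target_class, dtype=torch.long)
            x_mix = torch.cat([x_p, x[n:]])
            y_mix = torch.cat([y_p, y[n:]])
            loss = F.cross_entropy(model(x_mix), y_mix)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# After fine-tuning, model(add_trigger(x)).argmax(1) tends to be target_class,
# while accuracy on clean inputs stays close to the original model's.
```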

Research Work

2.4 Backdooring

The main goal is not only to inject some additional behavior but to do it in such a way that the backdoor continues to operate after the system is retrained.

Research Work

2.5 Reprogramming (adversarial reprogramming)

This is the most realistic scenario of a sabotage attack on an AI model. As the name implies, the mechanism is based on remotely reprogramming the neural network so that it performs a task chosen by the attacker.
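
A rough sketch of the mechanism, loosely following published adversarial-reprogramming work: a frozen ImageNet classifier is "hijacked" to classify MNIST digits by learning a single perturbation ("program") around each embedded digit. The victim model, dataset, label mapping, and hyperparameters are all illustrative assumptions, not details from this article.

```python
# Adversarial reprogramming sketch: a frozen ImageNet classifier is repurposed to
# classify MNIST digits via one learned "program" perturbation (PyTorch/torchvision;
# a real run needs many more steps and proper ImageNet input normalization).
import torch
import torch.nn.functional as F
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
victim = models.resnet18(weights="IMAGENET1K_V1").to(device).eval()  # older torchvision: pretrained=True
for p in victim.parameters():
    p.requires_grad_(False)                  # the attacker never touches the victim's weights

mnist = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)

program = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([program], lr=0.05)
pad = (224 - 28) // 2                        # digits are embedded in the image centre

def reprogrammed_input(x):
    canvas = torch.tanh(program).repeat(x.size(0), 1, 1, 1).clone()
    canvas[:, :, pad:pad + 28, pad:pad + 28] = x.repeat(1, 3, 1, 1)   # overwrite the centre
    return canvas

for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    logits = victim(reprogrammed_input(x))[:, :10]   # reuse ImageNet classes 0-9 as digit labels
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step == 200:                                  # short demo run
        print("loss:", loss.item(),
              "batch digit accuracy:", (logits.argmax(1) == y).float().mean().item())
        break
```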

2.6 Inference attack (Privacy attack)

Most studies currently cover inference attacks at the production stage, but some also target the training stage.

  • Model inversion attack (most common)

    This attack uses APIs provided by the machine learning system to obtain preliminary information about the model, and then reverse-analyzes the model with that information to recover private data embedded in it. The difference from the membership inference attack is that membership inference targets a single training record, while model inversion tends to recover aggregate, statistical information.

  • Model extraction attack (less common)

    In a model extraction attack, the attacker infers the parameters or functionality of a machine learning model by sending queries and observing the corresponding responses, in order to build a copy of the model with similar or even identical behavior.

  • Membership inference attack (less frequent)

    Given a data record and black-box access to a model, the attacker determines whether the record is in the model's training set. The attack is based on the observation that a machine learning model shows a significant difference in uncertainty between training-set samples and non-training samples, so an attack model can be trained to guess whether a sample was in the training set (see the sketch after this list).

  • Attribute inference: inferring particular attributes of the data.
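
A minimal sketch of the confidence-gap observation behind membership inference, using an intentionally overfit scikit-learn model and a simple threshold rule in place of the trained attack model described above; the dataset, model, and threshold are illustrative assumptions.

```python
# Membership inference via a confidence-gap threshold (scikit-learn; the target model,
# data, and threshold stand in for a trained "attack model").
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# An intentionally overfit target model trained only on the "member" half of the data.
target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_in, y_in)

def guess_membership(model, X, threshold=0.95):
    """Guess 'member' when the black-box model is very confident about its top class."""
    return model.predict_proba(X).max(axis=1) >= threshold

print("flagged as members among true members:    ", guess_membership(target, X_in).mean())
print("flagged as members among true non-members:", guess_membership(target, X_out).mean())
```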

Research Work
Reference

3. The third categorization of attacks: by ML task

This map illustrates high-level categories of ML tasks and methods.
[Image: map of ML tasks and methods]

3.1 Attacks on supervised learning (classification)

The supervised learning approach is usually used for classification, which was also the first and most popular machine learning task targeted by security research.

Research Work

3.2 Attacks on supervised learning (regression)

Regression is essentially prediction in ML terms, and the technical approaches to regression can be divided into two categories: classical machine learning and deep learning.

Research Work

3.3 Attacks on semi-supervised learning (generative models)

Objective: Generative models are designed to simulate real data based on examples seen during training.

The following picture illustrates the setup: G is the generator, which takes examples from the latent space, adds some noise, and produces candidate samples; D is the discriminator, which tries to tell whether the generated fake images look like real samples.
[Image: GAN generator and discriminator]
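
A minimal sketch of that generator/discriminator loop, only to illustrate the setup rather than an attack on it (PyTorch on toy one-dimensional data; the architectures, target distribution, and hyperparameters are illustrative assumptions).

```python
# Minimal GAN training loop on toy 1-D data (PyTorch; the target distribution N(3, 0.5)
# and all hyperparameters are illustrative assumptions).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # latent noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0            # "real" samples from N(3, 0.5)
    fake = G(torch.randn(64, 8))                     # generator maps latent noise to samples

    # Discriminator step: label real samples 1 and generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

with torch.no_grad():
    samples = G(torch.randn(1000, 8))
print("generated mean / std:", samples.mean().item(), samples.std().item())
```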

Research Work

3.4 Attacks on unsupervised learning (clustering)

The most common unsupervised learning example is clustering, which is similar to classification with one major difference: information about the class of the data is unknown, and there is no guarantee that the data can be classified at all.

However, there are fewer articles considering attacks on clustering algorithms. Clustering can be used for malware detection, and new training data usually comes from the wild, so an attacker can manipulate the training data fed to such malware classifiers and clustering models.
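
A toy sketch of how injected points can corrupt a clustering-based detector, here merging a "benign" and a "malware" group by adding a thin bridge of points between them (scikit-learn DBSCAN; the data layout and the bridging trick are illustrative assumptions, not a specific attack from the article).

```python
# Bridging sketch against a density-based clustering detector (scikit-learn DBSCAN;
# the "benign"/"malware" layout and the injected bridge are illustrative assumptions).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
benign = rng.normal(loc=0.0, scale=0.3, size=(100, 2))
malware = rng.normal(loc=5.0, scale=0.3, size=(100, 2))
clean = np.vstack([benign, malware])

def n_clusters(data):
    labels = DBSCAN(eps=0.5, min_samples=5).fit(data).labels_
    return len(set(labels) - {-1})                    # -1 marks noise points

print("clusters found on clean data:   ", n_clusters(clean))

# The attacker injects a thin, dense chain of points between the two groups,
# so the detector merges "benign" and "malware" into a single cluster.
bridge = np.column_stack([np.linspace(0, 5, 60), np.linspace(0, 5, 60)])
print("clusters found on poisoned data:", n_clusters(np.vstack([clean, bridge])))
```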

Research Work

3.5 Attacks on unsupervised learning (dimensionality reduction)

Dimensionality reduction, or generalization, is necessary if you have complex systems with unlabeled data and many potential features. Although this machine learning category is less popular than others, there is an example where researchers attacked a PCA-based detector for anomalies in network traffic. They demonstrated that PCA's sensitivity to outliers can be exploited: by contaminating the training data, an adversary can dramatically decrease the detection rate for DoS attacks along a particular target flow (a toy sketch follows the figure).

Dimensionality reduction example: from the three-dimensional space on the left to the two-dimensional space on the right.
[Image: 3-D data projected onto a 2-D space]
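
A toy sketch in the spirit of that result: a PCA residual-based detector flags points that lie far from the learned principal subspace, and injecting training points along the future attack direction pulls the subspace toward it so the attack traffic is no longer flagged. The feature layout, thresholds, and magnitudes are illustrative assumptions, not the original experiment.

```python
# Poisoning a PCA residual-based anomaly detector (NumPy/scikit-learn;
# the synthetic "traffic", attack direction, and magnitudes are assumptions).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Normal traffic: most variance lives in the first two feature directions.
normal = rng.normal(size=(500, 5)) * np.array([5.0, 1.0, 0.2, 0.2, 0.2])
attack_dir = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
attack = 8.0 * attack_dir + rng.normal(scale=0.1, size=(50, 5))

def residuals(train, test, n_components=2):
    pca = PCA(n_components=n_components).fit(train)
    recon = pca.inverse_transform(pca.transform(test))
    return np.linalg.norm(test - recon, axis=1)

def detection_rate(train, attack_test):
    threshold = np.quantile(residuals(train, train), 0.99)   # calibrated on training data
    return (residuals(train, attack_test) > threshold).mean()

print("DoS-like flows detected (clean training):   ", detection_rate(normal, attack))

# The adversary injects points along the future attack direction into the training set,
# dragging the principal subspace toward it and hiding later attack flows.
poison = 20.0 * attack_dir + rng.normal(scale=0.1, size=(100, 5))
print("DoS-like flows detected (poisoned training):", detection_rate(np.vstack([normal, poison]), attack))
```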

Research Work

3.6 Attacks on reinforcement learning

Compared to supervised or unsupervised learning, there is no dataset to feed into the model before it starts; the agent gathers its data by interacting with the environment.

Research Work

More detailed material: https://github.com/Billy1900/Privacy-and-Security-issues-in-machine-learning
