Contents
Introduction
Background paper: Membership inference attacks against machine learning models
1. Membership inference attacks (MIA) can succeed on overfitted models with only black-box access to the model
2. The attack model can be constructed from labeled datasets generated by a set of shadow models (see the sketch after this list)
3. MIA is effective when the target models are overfitted to their training data
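A minimal sketch of the shadow-model construction, assuming the adversary holds data from the same distribution as the target's training set; the model choices, split sizes, and the single (rather than per-class) attack model are illustrative assumptions, not the paper's exact setup:

```python
# Sketch of the shadow-model membership inference attack.
# Each shadow model is trained on an "in" split; its prediction vectors on
# "in" vs. "out" records become the labeled training set for the attack model.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=20, random_state=0)

attack_X, attack_y = [], []
for i in range(5):  # a set of shadow models
    Xs, _, ys, _ = train_test_split(X, y, train_size=1000, random_state=i)
    X_in, X_out, y_in, y_out = train_test_split(Xs, ys, test_size=0.5, random_state=i)
    shadow = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=i).fit(X_in, y_in)
    # Label the shadow model's prediction vectors: 1 = member, 0 = non-member
    attack_X.append(shadow.predict_proba(X_in));  attack_y.append(np.ones(len(X_in)))
    attack_X.append(shadow.predict_proba(X_out)); attack_y.append(np.zeros(len(X_out)))

attack_model = RandomForestClassifier(random_state=0).fit(np.vstack(attack_X),
                                                          np.concatenate(attack_y))
# To attack: query the black-box target model on a record and classify the
# returned prediction vector with attack_model.
```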
Question: is it feasible to perform MIA on well-generalized models with only black-box access?
On ML privacy risk, the authors' lines of inquiry and their answers:
(1) Is overfitting a root cause of membership disclosure from a machine learning model?
Overfitting can be sufficient but is by no means necessary for exposing membership information from training data.
(2) Is generalization the right solution for membership disclosure?
Existing regularization approaches are insufficient to defeat MIA (this differs from the conclusion of the background paper)
(3) What is the fundamental cause of membership disclosure?
Such information leaks are caused by the unique influence a specific instance in the training set can have on the learning model.
Overfitting is essentially a special case of such unique influences, but the general situation is much more complicated.
To detect overfitting: look for an instance's positive impact on the model's accuracy on the training set and its limited or negative impact on the test set.
To detect unique influences in general: consider the case where an instance both contributes useful information for predicting other instances and brings in noise that uniquely characterizes itself.
Generalization methods that suppress overfitting may reduce the noise introduced by training instances, but they cannot completely remove their unique influences, particularly those essential to the model's predictive power.
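One way to make "unique influence" concrete is a leave-one-out comparison: retrain without the target instance and see how predictions shift on that instance versus everywhere else. A sketch of that idea (the model and the comparison are assumptions, not the paper's metric):

```python
# Sketch: quantify one instance's influence via leave-one-out retraining.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
target_idx = 0  # instance whose influence we probe

full = LogisticRegression(max_iter=1000).fit(X, y)
mask = np.arange(len(X)) != target_idx
loo = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

# A large shift on the instance itself combined with a small shift elsewhere
# signals a *unique* influence: the instance mostly encoded itself.
self_shift = np.abs(full.predict_proba(X[[target_idx]])
                    - loo.predict_proba(X[[target_idx]])).max()
other_shift = np.abs(full.predict_proba(X[mask])
                     - loo.predict_proba(X[mask])).mean()
print(f"shift on the instance itself: {self_shift:.4f}, "
      f"mean shift elsewhere: {other_shift:.6f}")
```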
GMIA (Generalized MIA)
Overfitted models: the answers (prediction probabilities) to queries on training instances differ significantly from those on other queries.
Well-generalized models: behave similarly on training and test data, so shadow models cannot be used to generate a meaningful training set for the attack model.
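This contrast can be checked directly by comparing a model's confidence on members and non-members; when the two distributions coincide, shadow-model labels carry no signal. A small sketch, using an unconstrained vs. a depth-limited decision tree as stand-ins for overfitted vs. well-generalized models (my choice of models, not the paper's):

```python
# Sketch: member vs. non-member confidence gap on overfitted and
# generalized models of the same task.
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=2)

for name, depth in [("overfitted (no depth limit)", None), ("generalized (depth 3)", 3)]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=2).fit(X_tr, y_tr)
    conf_in = model.predict_proba(X_tr).max(axis=1).mean()   # members
    conf_out = model.predict_proba(X_te).max(axis=1).mean()  # non-members
    print(f"{name}: member confidence {conf_in:.3f} vs non-member {conf_out:.3f}")
```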
GMIA: detecting and analyzing vulnerable target records (outliers) to infer their membership
- extract a high-level feature vector from the intermediate outputs of models trained on the data (accessible to the adversary) to estimate whether a given instance is an outlier (see the sketch at the end of this section)
Assumption: an outlier is more likely to be a vulnerable target record when it
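A sketch of the outlier-screening step described above, assuming the adversary trains a reference model on accessible data and uses its hidden-layer activations as high-level features; the architecture, cosine distance, and thresholds here are illustrative, not GMIA's actual parameters:

```python
# Sketch: screen for candidate vulnerable target records (outliers) using
# high-level features taken from a reference model's hidden layer.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import cosine_distances

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
# Reference model trained on data accessible to the adversary
ref = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=3).fit(X, y)

def hidden_features(model, X):
    """First hidden layer activations (ReLU), used as high-level features."""
    return np.maximum(0, X @ model.coefs_[0] + model.intercepts_[0])

feats = hidden_features(ref, X)
dists = cosine_distances(feats)
np.fill_diagonal(dists, np.inf)  # ignore self-distances

# An instance with few close neighbors in feature space is treated as an
# outlier, hence a candidate vulnerable target record.
neighbor_counts = (dists < 0.05).sum(axis=1)
candidates = np.where(neighbor_counts <= 2)[0]
print(f"{len(candidates)} candidate vulnerable records out of {len(X)}")
```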