[Paper Reading] Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack (2021)

Abstract

Previous studies have verified that the functionality of black-box models can be stolen with full probability outputs. However, under the more practical hard-label setting, we observe that existing methods suffer from catastrophic performance degradation. We argue this is due to the lack of rich information in the probability prediction and the overfitting caused by hard labels. To this end, we propose a novel hard-label model stealing method termed black-box dissector, which consists of two erasing-based modules. One is a CAM-driven erasing strategy designed to increase the information capacity hidden in hard labels from the victim model. The other is a random-erasing-based self-knowledge distillation module that utilizes soft labels from the substitute model to mitigate overfitting. Extensive experiments on four widely-used datasets consistently demonstrate that our method outperforms state-of-the-art methods, with an improvement of at most 8.27%. We also validate the effectiveness and practical potential of our method on real-world APIs and against defense methods. Furthermore, our method promotes other downstream tasks, i.e., transfer adversarial attacks.
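Conceptually, the CAM-driven erasing module queries the victim on images whose most salient regions (according to the substitute's own class activation map) have been erased, so each returned hard label carries information the substitute has not yet captured. The sketch below is a minimal PyTorch illustration of this idea, assuming the substitute is a CNN with a linear classifier over its last convolutional features; the function names and the fixed erasing threshold are illustrative choices, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def cam_from_features(features, fc_weight, label):
    """Class activation map (CAM, Zhou et al. 2016) from the last conv layer.

    features:  (B, C, H, W) activations of the substitute's last conv layer
    fc_weight: (num_classes, C) weight of the substitute's final linear layer
    label:     (B,) hard labels, e.g., those returned by the victim model
    """
    w = fc_weight[label]                                  # (B, C)
    cam = F.relu(torch.einsum("bc,bchw->bhw", w, features))
    # normalize each map to [0, 1]
    cam = cam - cam.flatten(1).min(1).values.view(-1, 1, 1)
    cam = cam / (cam.flatten(1).max(1).values.view(-1, 1, 1) + 1e-8)
    return cam

def erase_salient_region(images, cam, threshold=0.6, fill=0.0):
    """Erase the pixels the substitute currently relies on; querying the
    victim on the erased image can then reveal information the hard label
    on the original image did not expose."""
    mask = F.interpolate(cam.unsqueeze(1), size=images.shape[-2:],
                         mode="bilinear", align_corners=False)
    keep = (mask < threshold).float()                     # 1 = keep pixel
    return images * keep + fill * (1.0 - keep)
```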

Method

(Figure from the paper not reproduced here.)

Conclusion

We investigated the problem of model stealing attacks under the hard-label setting and pointed out why previous methods are not effective enough. We presented a new method, termed black-box dissector, which contains a CAM-driven erasing strategy and an RE-based self-KD module. We showed its superiority on four widely-used datasets and verified the effectiveness of our method against defense methods, on real-world APIs, and on the downstream adversarial attack task. Though this paper focuses on image data, our method is general to other tasks as long as CAM and a similar erasing method apply, e.g., synonym saliency-word replacement for NLP tasks [4]. We believe our method can be easily extended to other fields and inspire future researchers. Model stealing attacks pose a threat to deployed machine learning models. We hope this work will draw attention to the protection of deployed models and, furthermore, shed more light on attack mechanisms and prevention methods. Additionally, transformer-based classifiers are becoming popular, and their security issues also deserve attention. Since this kind of classifier divides images into patches and our method works by erasing parts of images, it would be convenient to align with the attention map by masking patches and mining the missing information. We will validate this idea in future work.
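For the RE-based self-KD module mentioned above, the idea is to use the substitute's own soft predictions on clean images as teacher signals for randomly erased views of the same images, which counteracts the overfitting induced by one-hot victim labels. Below is a minimal PyTorch sketch of this mechanism; the temperature and erasing parameters are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# always erase one random rectangle per image (tensor input, (C, H, W))
random_erase = transforms.RandomErasing(p=1.0, scale=(0.02, 0.2))

def self_kd_loss(substitute, images, temperature=1.0):
    """Random-erasing-based self-distillation (sketch).

    The substitute's soft prediction on the clean image is detached and
    used as the teacher for a randomly erased view, giving a soft-label
    regularizer under the hard-label-only query setting."""
    with torch.no_grad():
        teacher = F.softmax(substitute(images) / temperature, dim=1)
    erased = torch.stack([random_erase(img) for img in images])
    student = F.log_softmax(substitute(erased) / temperature, dim=1)
    return F.kl_div(student, teacher, reduction="batchmean")
```

In training, this loss would simply be added to the usual cross-entropy on the victim's hard labels, with the erased views regenerated each epoch so the regularization signal varies.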

Paper Link

Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack
