Reading Notes on "Adversarial Attacks on Neural Network Policies"

This post examines the impact of adversarial attacks on neural network policies. It looks at how white-box (FGSM) and black-box attacks can be mounted in reinforcement learning, and at the transferability of attacks across policies and across algorithms. Experiments show that although black-box attacks are less effective than white-box ones, they still work to a meaningful degree, underscoring the importance of security testing for reinforcement learning models.

Abstract

Adversarial attacks can fool basic neural network models on common deep learning tasks such as classification and recognition: by modifying as little as a single pixel value, an attacker can make the network output a chosen result, and such attacks succeed in both white-box and black-box settings. Notably, the data accumulated in real-world tasks is rarely as clean and complete as we would like, and more and more tasks rely on AI models for decision making, so attacks on unsupervised learning and reinforcement learning models (which in some settings can also be viewed as a form of testing) deserve further study that accounts for their particular characteristics. This paper argues that existing adversarial attack methods can also be used to attack reinforcement learning policies.

Introduction

Reinforcement learning does not come with a large corpus of labeled training samples; its training data is generated through the training process itself, which can be understood as trial and error. Unlike image classification, the domain where adversarial attacks are most commonly studied, there is therefore no large labeled dataset available for generating adversarial examples. Moreover, compared with the white-box setting, a black-box attacker can obtain neither the details of the target policy network nor a large training set, which makes attacking reinforcement learning models harder.
The effectiveness of adversarial examples depends on two factors: the deep reinforcement learning algorithm used to learn the policy, and whether the attack is carried out under white-box or black-box assumptions. Accordingly, this paper covers two main parts:

  1. An analysis of white-box attacks (FGSM in this paper) on Atari game policies trained with three reinforcement learning algorithms.
  2. An analysis of black-box attacks on the same policies (the attacker can access the training environment, but neither knows which algorithm was used nor has access to the target policy's initialization).

White-Box Attack Process

The attack uses FGSM, computing the gradient of the loss function with respect to the input $x$. As in a CNN used for image classification, the output $y$ is multi-dimensional, with each of its $n$ dimensions corresponding to an action. When computing the gradient, we treat the optimal action as the action corresponding to the highest-valued dimension of the output $y$. Of the three learning algorithms considered, TRPO and A3C learn stochastic policies, while DQN learns a deterministic policy.
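As a concrete illustration, here is a minimal PyTorch-style sketch of this FGSM step against a policy network. The framework choice and the names `policy_net`, `obs`, and `eps` are my assumptions, not something specified in the paper:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb_observation(policy_net, obs, eps):
    """Craft an FGSM adversarial observation for a policy network.

    policy_net: maps an observation tensor to n action logits (hypothetical name).
    obs:        observation tensor, e.g. a stack of Atari frames, shape (1, C, H, W).
    eps:        L-infinity perturbation budget.
    """
    obs = obs.clone().detach().requires_grad_(True)

    logits = policy_net(obs)                      # shape (1, n_actions)
    # As in the notes above, treat the highest-valued output dimension
    # (the action the policy currently prefers) as the label.
    target_action = logits.argmax(dim=1).detach()

    # Cross-entropy between the policy output and its preferred action;
    # increasing this loss pushes the policy away from that action.
    loss = F.cross_entropy(logits, target_action)
    loss.backward()

    # FGSM step: move each input pixel by eps in the direction of the
    # gradient's sign, then clamp back to the valid pixel range [0, 1].
    adv_obs = obs + eps * obs.grad.sign()
    return adv_obs.clamp(0.0, 1.0).detach()
```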

DQN is value-based and builds on Q-Learning: each action is chosen according to the computed Q-values, so its loss function is the difference between the target Q-value and the current step's Q-value:
$$L(w) = \mathbb{E}\left[\left(r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a'; w) - Q(s_t, a_t; w)\right)^2\right]$$
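A minimal PyTorch-style sketch of how this loss is typically computed from a replay-buffer batch is shown below; `q_net` and the tensor names are hypothetical, and practical DQN implementations bootstrap from a separate, periodically frozen target network rather than the online network used here:

```python
import torch
import torch.nn.functional as F

def dqn_td_loss(q_net, batch, gamma=0.99):
    """One-step TD loss L(w) for DQN.

    batch: (s, a, r, s_next, done) tensors sampled from a replay buffer.
    """
    s, a, r, s_next, done = batch

    # Q(s_t, a_t; w): Q-value of the action actually taken.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Target: r_{t+1} + gamma * max_a' Q(s_{t+1}, a'), with no gradient
    # flowing through the bootstrap term; `done` zeroes it at episode ends.
    with torch.no_grad():
        q_next = q_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next

    # Squared TD error, matching L(w) above.
    return F.mse_loss(q_sa, target)
```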
