@wuxin1994 2017-09-07T16:00:36.000000Z

Summary of Adversarial Attack Papers

Secure

Overview

Security of machine learning: 《Can machine learning be secure?》

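The security of machine learning can be compromised in two ways: by manipulating the training data during learning (poisoning), 《Poisoning attacks against support vector machines》, or by manipulating the input data at prediction time (evasion), 《In Machine Learning and Knowledge Discovery in Databases》.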

Attack

The adversarial problem was first raised in 2013 in 《Evasion attacks against machine learning at test time》. Adversarial attacks on deep neural networks were introduced in 2014 in 《Intriguing properties of neural networks》.

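Two explanations have been given for why adversarial examples exist. The first attributes them to the nonlinearity of deep neural networks. In this view, that nonlinearity combines with insufficient model averaging and insufficient regularization, and together they lead to overfitting. The second explanation comes from 《Explaining And Harnessing Adversarial Examples》. It argues that linear behavior in high-dimensional spaces is already enough to produce adversarial examples, so the vulnerability of deep models comes mainly from their linear components.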
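The linearity argument can be checked with a few lines of arithmetic. Take a linear unit with weights w and the max-norm perturbation η = ε·sign(w). This perturbation moves each input component by at most ε. Yet it shifts the activation w·x by ε‖w‖₁, which grows linearly with the input dimension n. The sketch below assumes random weights and inputs purely for illustration. It does not reproduce any experiment from the paper.

```python
import random


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def dot(a, b):
    """Activation of a linear unit: w . x."""
    return sum(ai * bi for ai, bi in zip(a, b))


def activation_shift(n, eps=0.01, seed=0):
    """Shift in w . x caused by eta = eps * sign(w) for random w, x of dimension n.

    Returns (observed shift, eps * ||w||_1); the two agree up to rounding.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]
    eta = [eps * sign(wi) for wi in w]  # every component is bounded by eps (max-norm)
    x_adv = [xi + ei for xi, ei in zip(x, eta)]
    observed = dot(w, x_adv) - dot(w, x)
    predicted = eps * sum(abs(wi) for wi in w)
    return observed, predicted


for n in (10, 1_000, 100_000):
    observed, predicted = activation_shift(n)
    print(f"n={n:>7}: shift={observed:.3f}, eps*||w||_1={predicted:.3f}")
```

The per-component change stays fixed at ε, but the activation shift scales with n. As a result, a high-dimensional linear model can be pushed across its decision boundary by a perturbation that is tiny in every individual coordinate.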

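Adversarial attacks also work in the black-box setting: see 《Practical black-box attacks against deep learning systems using adversarial examples》, and, for reinforcement learning, 《Adversarial Attacks on Neural Network Policies》. The perturbations constructed in 《Universal adversarial perturbations》 also work under black-box assumptions.
Black-box attacks usually rely on the transferability of adversarial examples. 《Machine Learning as an Adversarial Service: Learning Black-Box Adversarial Examples》 instead proposes learning adversarial examples directly against the black-box model.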

Transferability of adversarial examples: 《Intriguing properties of neural networks》 (across different neural network architectures), 《Adversarial Attacks on Neural Network Policies》 (across different reinforcement learning policies).
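Transferability is what makes the substitute-model recipe for black-box attacks work. The sketch below assumes a hypothetical oracle: a hidden linear classifier that returns only labels. The attacker proceeds in four steps:

1. Query the oracle on random inputs.
2. Fit a logistic-regression substitute to the returned labels.
3. Craft FGSM examples against the substitute.
4. Count how often those examples also fool the oracle.

All function names are illustrative. 《Practical black-box attacks against deep learning systems using adversarial examples》 trains DNN substitutes with Jacobian-based dataset augmentation, which this toy version leaves out.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def make_oracle(dim, rng):
    """Hypothetical black-box target: a hidden linear classifier that only returns 0/1 labels."""
    hidden_w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return lambda x: 1 if dot(hidden_w, x) > 0 else 0


def train_substitute(xs, ys, epochs=100, lr=0.5):
    """Logistic-regression substitute fitted to (query, oracle label) pairs by gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(dot(w, x) + b) - y  # d(cross-entropy)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fgsm_on_substitute(w, b, x, y, eps):
    """x + eps * sign(grad_x J), using the substitute's input gradient (p - y) * w."""
    err = sigmoid(dot(w, x) + b) - y
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]


def transfer_rate(dim=20, n_queries=300, n_test=200, eps=1.0, seed=0):
    """Fraction of substitute-crafted examples that also change the oracle's label."""
    rng = random.Random(seed)
    oracle = make_oracle(dim, rng)
    queries = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_queries)]
    labels = [oracle(q) for q in queries]  # the attacker only observes labels
    w, b = train_substitute(queries, labels)
    fooled = 0
    for _ in range(n_test):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = oracle(x)
        fooled += oracle(fgsm_on_substitute(w, b, x, y, eps)) != y
    return fooled / n_test


print(f"transfer rate: {transfer_rate():.2f}")
```

The attacker never reads the oracle's weights. Only the sign pattern of the substitute's gradient is used, which is why a roughly aligned substitute is enough for the examples to transfer.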

Algorithms for crafting adversarial examples (optimization algorithms):
(Two competing concerns: it is difficult to find new methods that are both effective in jeopardizing a model and computationally affordable.)
1. Classical optimizers: gradient descent, Newton's method, BFGS, L-BFGS
2. Jacobian-based saliency map attack (JSMA): 《The limitations of deep learning in adversarial settings》
3. FGSM: 《Explaining And Harnessing Adversarial Examples》
   Iterative version of FGSM (smaller perturbations): 《Adversarial examples in the physical world》
4. RP2: 《Robust Physical-World Attacks on Machine Learning Models》
5. Papernot method: 《Adversarial perturbations against deep neural networks for malware classification》
6. Universal perturbations (an extension of the DeepFool method): 《Analysis of universal adversarial perturbations》, 《Universal adversarial perturbations》. (Thought: could the universal principle be turned around? One could add the same perturbation to every input so that the model classifies better and gives higher-confidence predictions.)
7. DeepFool: 《DeepFool: a simple and accurate method to fool deep neural networks》. It is the first method to compute and apply the minimal perturbation necessary for misclassification under the L2 norm. Its approximation is more accurate than FGSM and faster than JSMA, but it is still computationally expensive.
8. Carlini & Wagner attack: 《Towards evaluating the robustness of neural networks》. The authors cast the formulation of Szegedy et al. as a more efficient optimization problem, which lets them craft effective adversarial samples with low distortion. This attack is also very expensive.
9. Virtual adversarial examples: 《Virtual adversarial training: a regularization method for supervised and semi-supervised learning》
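A simplified JSMA can be sketched on a hypothetical multiclass linear model. For such a model, the Jacobian of the class scores with respect to the input is just the weight matrix. The sketch makes these simplifications and assumptions:

- It is a single-feature variant computed on logits.
- It assumes features scaled to [0, 1].
- The model weights and input are made up.

The original attack in 《The limitations of deep learning in adversarial settings》 uses the forward derivative of the network's outputs and modifies pairs of features at a time. This sketch does not reproduce that.

```python
def logits(W, b, x):
    """Scores Z_j = W_j . x + b_j of a hypothetical multiclass linear model."""
    return [sum(wji * xi for wji, xi in zip(w_j, x)) + b_j for w_j, b_j in zip(W, b)]


def argmax(v):
    return max(range(len(v)), key=lambda i: v[i])


def jsma_linear(W, b, x, target, max_features, x_max=1.0):
    """Simplified single-feature JSMA on logits.

    For a linear model the Jacobian dZ_j/dx_i is just W[j][i]. A feature is salient when
    raising it increases the target score (alpha > 0) and decreases the other scores in
    total (beta < 0); the most salient unsaturated feature is set to x_max each round.
    """
    x_adv = list(x)
    changed = []
    while argmax(logits(W, b, x_adv)) != target and len(changed) < max_features:
        best_i, best_s = None, 0.0
        for i in range(len(x_adv)):
            if x_adv[i] >= x_max:
                continue  # already saturated, cannot be increased further
            alpha = W[target][i]
            beta = sum(W[j][i] for j in range(len(W)) if j != target)
            if alpha > 0 and beta < 0 and alpha * -beta > best_s:
                best_i, best_s = i, alpha * -beta
        if best_i is None:
            break  # no feature helps the target while hurting the other classes
        x_adv[best_i] = x_max
        changed.append(best_i)
    return x_adv, changed


# Made-up 3-class, 4-feature model; class 2 is the attack target.
W = [[2.0, 0.0, -0.5, 0.0],
     [0.0, 1.0, 0.0, -0.5],
     [0.0, 0.0, 1.0, 0.8]]
B = [0.0, 0.0, 0.0]
X = [0.8, 0.1, 0.1, 0.1]
x_adv, changed = jsma_linear(W, B, X, target=2, max_features=4)
print(argmax(logits(W, B, X)), argmax(logits(W, B, x_adv)), changed)
```

The max_features budget plays the role of JSMA's limit on how many features may be modified. When it runs out, or no salient feature remains, the attack stops and reports failure by returning an input that is still not classified as the target.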
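FGSM and its iterative variant can be sketched on a binary logistic-regression model. For that model, the input gradient of the cross-entropy loss is (p − y)·w. The two methods differ as follows:

- FGSM takes one step: x + ε·sign(∇ₓJ).
- The iterative version takes several smaller steps of size α. After each step it clips back into the ε-ball around x and into the [0, 1] input range.

The weights and input below are made-up values. For a linear model the gradient sign never changes, so both methods end at the same point. They differ on deep networks, where the gradient direction changes along the path.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def input_gradient(w, b, x, y):
    """d(binary cross-entropy)/dx for p = sigmoid(w . x + b), which equals (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def clip(v, lo, hi):
    return min(hi, max(lo, v))


def fgsm(w, b, x, y, eps):
    """One step of size eps along the sign of the gradient, kept inside [0, 1]."""
    g = input_gradient(w, b, x, y)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]


def iterative_fgsm(w, b, x, y, eps, alpha, steps):
    """Steps of size alpha, each followed by clipping to the eps-ball around x and to [0, 1]."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [clip(clip(xa, xi - eps, xi + eps), 0.0, 1.0) for xa, xi in zip(x_adv, x)]
    return x_adv


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Made-up weights and input; the clean input is classified as 1.
w_lr, b_lr = [2.0, -1.0, 0.5], -0.5
x_clean = [0.6, 0.2, 0.4]
x_fgsm = fgsm(w_lr, b_lr, x_clean, 1, eps=0.25)
x_ifgsm = iterative_fgsm(w_lr, b_lr, x_clean, 1, eps=0.25, alpha=0.1, steps=5)
print(predict(w_lr, b_lr, x_clean), predict(w_lr, b_lr, x_fgsm), predict(w_lr, b_lr, x_ifgsm))
```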
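For an affine binary classifier f(x) = w·x + b, the minimal L2 perturbation that reaches the decision boundary has a closed form: r* = −f(x)·w/‖w‖², with norm |f(x)|/‖w‖. DeepFool applies this projection iteratively to the local linearization of a nonlinear classifier. The sketch below shows only the binary case. The accumulated perturbation is scaled by (1 + overshoot) to land just past the boundary. The 0.02 default and the toy classifier are assumptions, and the multiclass version is omitted.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def deepfool_binary(f, grad_f, x, overshoot=0.02, max_iter=50):
    """Binary DeepFool: step to the linearized decision boundary until sign(f) changes.

    Each step adds r_i = -f(x_i) * grad_f(x_i) / ||grad_f(x_i)||^2; the accumulated
    perturbation is scaled by (1 + overshoot) so the point lands just past the boundary.
    """
    original = sign(f(x))
    r_total = [0.0] * len(x)
    x_adv = list(x)
    for _ in range(max_iter):
        if sign(f(x_adv)) != original:
            break  # crossed the decision boundary
        g = grad_f(x_adv)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0.0:
            break  # flat region: no direction to follow
        step = -f(x_adv) / g_norm_sq
        r_total = [rt + step * gi for rt, gi in zip(r_total, g)]
        x_adv = [xi + (1.0 + overshoot) * rt for xi, rt in zip(x, r_total)]
    return x_adv, r_total


W_AFF, B_AFF = [3.0, 4.0], -1.0


def f_affine(x):
    """Made-up affine binary classifier f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(W_AFF, x)) + B_AFF


def grad_affine(x):
    """The gradient of an affine function is the constant weight vector."""
    return list(W_AFF)


x0 = [1.0, 1.0]
x_adv, r = deepfool_binary(f_affine, grad_affine, x0)
print(f_affine(x0), f_affine(x_adv), math.hypot(*r), abs(f_affine(x0)) / math.hypot(*W_AFF))
```

For an affine f, one step already reaches the boundary, so the loop exits at the next check. For a deep network the linearization holds only locally, and several iterations are usually needed. That is where DeepFool's cost comes from.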

Adversarial attacks on real-world targets:
- face recognition: 《Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition》
- photographed images: 《Adversarial examples in the physical world》
- road signs: 《Robust Physical-World Attacks on Machine Learning Models》
- autonomous vehicles: 《Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning》
- malware classification: 《Adversarial Perturbations Against Deep Neural Networks for Malware Classification》

Applications to reinforcement learning: 《Vulnerability of deep reinforcement learning to policy induction attacks》, 《Adversarial Attacks on Neural Network Policies》

Ways to strengthen an attack:
《Adversarial Attacks on Image Recognition》 suggests reducing the dimensionality of the data with PCA.

Defence

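Two directions for defence, mentioned as future work in 《Adversarial Attacks on Neural Network Policies》:
1. Add adversarial examples to the training set: generate them explicitly and include them in training (although generating adversarial examples is relatively expensive).
2. Add a module at test time that detects adversarial inputs, i.e. decides whether an input is under attack.
Defence methods: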

Adversarial training (augmenting the training data with perturbed examples): 《Intriguing properties of neural networks》. The model is either fed both true and adversarial examples, or trained with the modified objective
$$\hat{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha) J(\theta, x + \Delta x, y)$$
where $\Delta x$ is the adversarial perturbation of $x$ and $\alpha \in [0, 1]$ weights the clean term against the adversarial term.
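The modified objective can be sketched for binary logistic regression. The sketch assumes Δx is the FGSM perturbation ε·sign(∇ₓJ). It is recomputed from the current weights each epoch and treated as a constant when differentiating. The toy data, α = 0.5, ε = 0.3 and the learning rate are arbitrary illustrative choices, not values from the cited paper.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def bce(p, y):
    """Binary cross-entropy J for predicted probability p and label y in {0, 1}."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fgsm_delta(w, b, x, y, eps):
    """Delta x = eps * sign(grad_x J); for logistic regression grad_x J = (p - y) * w."""
    p = sigmoid(dot(w, x) + b)
    return [eps * sign((p - y) * wi) for wi in w]


def adversarial_training(data, alpha=0.5, eps=0.3, lr=0.5, epochs=200):
    """Gradient descent on alpha * J(x) + (1 - alpha) * J(x + Delta x), regenerating Delta x each epoch."""
    dim, n = len(data[0][0]), len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            x_adv = [xi + di for xi, di in zip(x, fgsm_delta(w, b, x, y, eps))]
            for weight, point in ((alpha, x), (1.0 - alpha, x_adv)):
                err = sigmoid(dot(w, point) + b) - y  # dJ/dlogit; Delta x is held constant
                for i in range(dim):
                    grad_w[i] += weight * err * point[i]
                grad_b += weight * err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def mean_adversarial_loss(w, b, data, eps=0.3):
    """Average J on FGSM-perturbed copies of the data."""
    total = 0.0
    for x, y in data:
        x_adv = [xi + di for xi, di in zip(x, fgsm_delta(w, b, x, y, eps))]
        total += bce(sigmoid(dot(w, x_adv) + b), y)
    return total / len(data)


# Toy, linearly separable 2-D data (made up for illustration).
DATA = [((1.0, 1.0), 1), ((1.2, 0.8), 1), ((0.9, 1.1), 1),
        ((-1.0, -1.0), 0), ((-1.2, -0.8), 0), ((-0.9, -1.1), 0)]
w_adv, b_adv = adversarial_training(DATA)
print(w_adv, b_adv, mean_adversarial_loss(w_adv, b_adv, DATA))
```

Setting alpha=1.0 recovers ordinary training on clean data only. The adversarial copies are regenerated every epoch because Δx depends on the current weights.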
Defensive distillation: 《Distillation as a defense to adversarial perturbations against deep neural networks》. It hardens the model in two steps. First, a classification model is trained with its softmax smoothed by dividing the logits by a constant temperature T. Then a second model is trained on the same inputs. Instead of the original labels, it is fed the probability vectors from the first model's last layer as soft targets. (《Adversarial perturbations of deep neural networks》 modifies this method so that a single step suffices to construct the attack.)
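The temperature-scaled softmax used in both steps can be sketched as follows. The teacher logits, the student logits and T = 20 are made-up values. The sketch shows only how soft targets are formed and scored, not the full two-network training loop.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """softmax(z / T); a larger temperature T spreads probability mass more evenly."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_probs, T):
    """Cross-entropy of the student's temperature-T softmax against the teacher's soft labels."""
    q = softmax_with_temperature(student_logits, T)
    return -sum(p * math.log(qi) for p, qi in zip(teacher_probs, q))


T = 20.0  # hypothetical distillation temperature
teacher_logits = [5.0, 2.0, 0.5]  # made-up logits from the first (teacher) model
hard = softmax_with_temperature(teacher_logits, 1.0)
soft = softmax_with_temperature(teacher_logits, T)
print([round(p, 3) for p in hard], [round(p, 3) for p in soft])

# Step 2: the second model is trained to minimize this loss on the same inputs.
student_logits = [1.0, 0.0, 0.0]
print(soft_target_loss(student_logits, soft, T))
```

At high T the teacher's soft labels keep the ranking of the classes but carry much more information about the non-top classes. In the defensive-distillation paper, the distilled model is trained at a high temperature and then deployed at T = 1.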
Feature squeezing: 《Feature squeezing: Detecting adversarial examples in deep neural networks》, 《Feature squeezing mitigates and detects Carlini/Wagner adversarial examples》
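The detection rule behind feature squeezing can be sketched as follows. Run the model on the input and on squeezed copies of it. Flag the input when the predictions differ, measured here by L1 distance, by more than a threshold. The sketch makes these simplifications and assumptions:

- Only a 1-bit depth-reduction squeezer is shown.
- The toy model, the inputs and the threshold of 0.05 are made up for illustration.
- The point near the decision boundary stands in for an adversarial input, whose output shifts once the small perturbation is squeezed away.

```python
import math


def reduce_bit_depth(x, bits):
    """Bit-depth reduction squeezer: quantize values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return [round(v * levels) / levels for v in x]


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def squeeze_score(predict, x, squeezers):
    """Largest L1 distance between the prediction on x and on each squeezed copy of x."""
    p = predict(x)
    return max(l1_distance(p, predict(squeeze(x))) for squeeze in squeezers)


def is_adversarial(predict, x, squeezers, threshold):
    """Flag x when squeezing changes the model's output by more than the threshold."""
    return squeeze_score(predict, x, squeezers) > threshold


def toy_predict(x):
    """Hypothetical two-class model: class-0 probability is sigmoid(10 * (x[0] - x[1]))."""
    p0 = 1.0 / (1.0 + math.exp(-10.0 * (x[0] - x[1])))
    return [p0, 1.0 - p0]


squeezers = [lambda x: reduce_bit_depth(x, 1)]
threshold = 0.05  # hypothetical; in practice calibrated on legitimate inputs
print(is_adversarial(toy_predict, [0.9, 0.1], squeezers, threshold))    # confidently classified input
print(is_adversarial(toy_predict, [0.48, 0.44], squeezers, threshold))  # input near the boundary
```

Taking the maximum over several squeezers lets one detector combine squeezers, for example bit-depth reduction and smoothing. Python's `round` uses round-half-to-even, which matters only for values exactly halfway between levels.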
Detection systems:
Perform statistical tests: 《On the (statistical) detection of adversarial examples》
Use an additional model for detection: 《Adversarial and clean data are not twins》, 《On detecting adversarial perturbations》
Apply dropout at test time: 《Detecting adversarial samples from artifacts》
PCA whitening: 《Early methods for detecting adversarial images》
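Statistical detection treats a batch of inputs as a sample and tests whether it comes from the same distribution as clean data. The sketch below is a kernel two-sample test based on Maximum Mean Discrepancy (MMD), with a permutation p-value. This is the kind of test used in 《On the (statistical) detection of adversarial examples》. The Gaussian kernel bandwidth, sample sizes and synthetic data are assumptions. Real inputs would be images or feature vectors rather than 2-D points.

```python
import math
import random


def gaussian_kernel(a, b, sigma):
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between two samples."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def permutation_p_value(xs, ys, n_perm=100, sigma=1.0, seed=0):
    """Share of random re-splits of the pooled data whose MMD is at least the observed MMD."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, sigma)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(xs)], pooled[len(xs):], sigma) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-ins: clean inputs ~ N(0, I); "suspicious" inputs have a shifted mean.
rng = random.Random(1)
clean = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(25)]
suspicious = [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(25)]
print(permutation_p_value(clean, suspicious))  # a small p-value rejects "same distribution"
```

Because the test compares distributions, it needs a batch of suspicious inputs rather than a single example. Per-input detection is what the model-based and dropout-based detectors listed above aim for.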

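Research directions:

Improving the various attack methods
Designing defence strategies
Attacks and defences for specific application scenarios
Using attack and defence strategies to improve existing model architectures and the performance of machine learning models
Investigating the underlying principles of adversarial attacks and their mathematical essence, which amounts to understanding how deep neural networks work and why adversarial attacks succeed
Thought: adversarial attacks resemble a limit on the generalization of neural networks. A tiny perturbation can push the error rate up sharply, which suggests the model does not generalize well and may not perform well on other samples either. Improving the generalization of neural network models is therefore another direction for defence, especially against black-box attacks.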

Significance (citing from )

Able to handle massive volumes of data
Works at machine speed to thwart attacks
Does not rely on signatures
Can stop known and unknown malware
Stops malware pre-execution
Higher detection rates, lower false positives