@wuxin1994 2017-10-17T15:27:08.000000Z Word count: 2698 Views: 1085

The Explanation of Adversarial Examples

Secure

Linear explanation of Adversarial Examples
Goodfellow et al. [1] explain the existence of adversarial examples for linear models by starting from the limited precision of the sensor or storage medium. Common digital images use only eight bits per pixel, so they discard all information below $\frac{1}{255}$ of the dynamic range.
The point is that, because the precision of the features is limited, it makes little sense for a classifier to respond differently to an input $x$ than to an adversarial input $\hat{x} = x + \eta$ if every element of the perturbation $\eta$ is smaller than the precision of the features. Let $\epsilon$ be the largest value below the resolution of the sensor (or storage medium). Then, for problems with well-separated classes, we expect a classifier to assign the same class to $x$ and $\hat{x}$ so long as $\|\eta\|_\infty < \epsilon$ (where $\|\cdot\|_\infty$ is the max norm, also called the infinity norm).
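The precision argument can be illustrated with a small sketch. It assumes, purely for illustration, that storage rounds each intensity in $[0, 1]$ to the nearest of 256 levels; the function name `quantize_8bit` and the sample values are hypothetical. Under that assumption, any perturbation smaller than half a level disappears once the image is stored.

```python
def quantize_8bit(values):
    """Round each intensity in [0, 1] to the nearest of 256 integer levels."""
    return [min(255, max(0, round(v * 255))) for v in values]

# Pixel intensities that sit exactly on the 8-bit grid (illustrative values).
x = [k / 255 for k in (0, 17, 128, 200, 255)]

# A perturbation whose every element is below half a quantization level.
eps = 0.4 / 255
eta = [eps if i % 2 == 0 else -eps for i in range(len(x))]
x_hat = [xi + ei for xi, ei in zip(x, eta)]

# The two inputs differ as real numbers but are stored identically.
print(quantize_8bit(x))
print(quantize_8bit(x_hat))
```

The perturbed input differs from the original in every element, yet both are stored as the same 8-bit values, which is why a classifier would be expected to treat them alike.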

Now consider the dot product between a weight vector $w$ and the adversarial input: $w^T\hat{x} = w^Tx + w^T\eta$. The perturbation changes the activation by $w^T\eta$, and under the max-norm bound $\epsilon$ this change is maximized by choosing $\eta = \epsilon \, \mathrm{sign}(w)$. If $w$ has $n$ dimensions and the average magnitude of its elements is $m$, the activation then grows by $\epsilon mn$. What is interesting here (among other things) is that the max norm $\|\eta\|_\infty$ does not grow with the dimensionality of the input, while the change in activation caused by $\eta$ can grow linearly with $n$. As a result, in high-dimensional cases many very small changes to the input can add up to one large change in the output. So even a simple linear model can have adversarial examples if its input has sufficient dimensionality. This result is in stark contrast to the thinking around the time of [2], when adversarial examples were believed to be a property of highly non-linear deep neural networks.
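The linear growth with $n$ can be checked numerically. The sketch below assumes the perturbation with the largest effect under the max-norm bound, $\eta = \epsilon \, \mathrm{sign}(w)$, and draws random weights purely for illustration; the helper names `activation_shift` and `mean_abs` are hypothetical.

```python
import random

def activation_shift(w, eps):
    """Return w . eta for the max-norm-bounded perturbation eta = eps * sign(w)."""
    eta = [eps * ((wi > 0) - (wi < 0)) for wi in w]
    return sum(wi * ei for wi, ei in zip(w, eta))

def mean_abs(w):
    """Average magnitude m of the elements of w."""
    return sum(abs(wi) for wi in w) / len(w)

random.seed(0)
eps = 1 / 255  # per-element bound; it stays fixed as n grows
for n in (10, 1_000, 100_000):
    w = [random.uniform(-1, 1) for _ in range(n)]  # illustrative weights
    # The two printed numbers agree: the shift equals eps * m * n.
    print(n, activation_shift(w, eps), eps * mean_abs(w) * n)
```

The per-element perturbation never exceeds `eps`, yet the shift in the activation scales with the number of input dimensions.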

Non-Linear Models
Goodfellow et al. [1] hypothesize that many neural network architectures are too linear to resist linear adversarial perturbation. LSTMs, ReLUs, and maxout networks are all intentionally designed to behave in very linear ways so that they are easier to optimize. The important conclusion is that the behavior of simple linear models under the adversarial perturbation $\eta$ described above should also carry over to neural networks.
Formally, let $\theta$ be the parameters of a model, $x$ the input to the model, $y$ the targets associated with $x$ (in the supervised learning case), and $J(\theta, x, y)$ the cost function (optimization objective) used to train the neural network. We can linearize the cost function around the current value of $\theta$, obtaining an optimal max-norm constrained perturbation of

$\eta = \epsilon \, \mathrm{sign}(\nabla_x J(\theta, x, y))$
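In [1] this construction is called the fast gradient sign method. Below is a minimal sketch of it, assuming a logistic regression model with labels $y \in \{-1, +1\}$ so that the gradient with respect to $x$ can be written by hand; the weights, bias, input, and function names are hypothetical values chosen for illustration, not taken from the paper.

```python
import math

def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def loss(w, b, x, y):
    """Logistic cost J = log(1 + exp(-y * (w . x + b))) for y in {-1, +1}."""
    z = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-z))

def grad_x(w, b, x, y):
    """Analytic gradient of J with respect to the input x."""
    z = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    scale = -y / (1.0 + math.exp(z))  # dJ/dz multiplied by y
    return [scale * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """Return x + eta with eta = eps * sign(grad_x J)."""
    eta = [eps * sign(g) for g in grad_x(w, b, x, y)]
    return [xi + ei for xi, ei in zip(x, eta)]

# Hypothetical model parameters and input.
w = [0.8, -0.5, 0.3, -0.9]
b = 0.1
x = [0.2, 0.7, 0.5, 0.1]
y = 1

x_adv = fgsm(w, b, x, y, eps=0.05)
print(loss(w, b, x, y), loss(w, b, x_adv, y))  # cost before and after
```

For a deep network the hand-written gradient would be replaced by backpropagation, but the perturbation step itself stays the same: one gradient evaluation followed by an element-wise sign.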

[1] Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. https://arxiv.org/abs/1412.6572 (2014).
[2] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. https://arxiv.org/abs/1312.6199 (2013).
[3] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2672–2680.
