
C&W Attack

The C&W attack is an insightful attack in adversarial machine learning, so I use this blog post to summarize its main ideas.

Towards Evaluating the Robustness of Neural Networks

Authors: Nicholas Carlini, David Wagner

Key points:

  • Seven objective functions
  • Three methods for handling the box constraint
  • How to choose the constant \(c\)
  • Three attack methods
  • Discretization of images

The paper introduces the well-known C&W attack. The authors demonstrate that defensive distillation is not as robust as claimed under their attack and argue that the C&W attack is a better benchmark for evaluating future defenses. They also suggest that a good defense should prevent the transferability of adversarial examples.

1. Introduction

As is well known, deep neural networks are vulnerable to adversarial examples, and several methods, such as defensive distillation, have been proposed to make these models more robust. The authors construct three attacks, for the \(L_0, L_2, L_\infty\) metrics, to expose the weakness of these defensive methods, and they show that adversarial examples transfer between different models.

2. Background

In the paper, the authors assume the adversary has full access to the neural network, i.e., a white-box attack. They evaluate targeted attacks in three settings: average case (a randomly selected target class), best case (the least difficult target class), and worst case (the most difficult target class). To measure the distance between an adversarial example and the original input, they use three metrics from the \(L_p\) norm family, where the p-norm is defined as \(\vert\vert v \vert\vert_p=(\sum_{i=1}^n\vert v_i\vert^p)^{\frac{1}{p}}\).
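
As a quick illustration (with made-up numbers), the three metrics used later in the paper can be computed as follows:

```python
import numpy as np

# Hypothetical example: an original image x and a perturbed copy x_adv,
# both flattened to vectors in [0, 1]^n.
x = np.random.rand(28 * 28)          # e.g. a flattened MNIST digit
delta = np.zeros_like(x)
delta[:10] = 0.05                    # perturb only the first 10 pixels
x_adv = np.clip(x + delta, 0.0, 1.0)

d = x_adv - x
l0 = np.count_nonzero(d)             # L0: number of changed pixels
l2 = np.linalg.norm(d, ord=2)        # L2: Euclidean size of the perturbation
linf = np.max(np.abs(d))             # Linf: largest change to any single pixel
print(l0, l2, linf)
```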

3. Approach

The authors discuss four existing state-of-the-art attacks: L-BFGS, FGSM, JSMA, and DeepFool. They use two networks trained on MNIST and CIFAR-10 and the pre-trained Inception-v3 network for ImageNet classification.

The initial method to find adversarial examples is listed as follows:

minimize \(D(x, x+\delta )\) such that \(C(x+\delta)=t, x+\delta\in[0,1]^n\)

Here \(\delta\) is the perturbation, \(D(\cdot,\cdot)\) is one of the \(L_p\) distance metrics, and \(C(x)=\mathop{\arg\max}_i F(x)_i\) is the label assigned by the classifier. The adversarial example is found by minimizing \(D(x,x+\delta)\) subject to these constraints.

Because the constraint \(C(x+\delta)=t\) is highly non-linear, the authors consider seven candidate objective functions \(f\) such that \(C(x+\delta)=t\) if and only if \(f(x+\delta)\leq0\). The problem is then reformulated as follows:

minimize \(\vert \vert \delta \vert \vert_p +c\cdot f(x+\delta)\), such that \(x + \delta \in [0,1]^n\).

\(c\) is a constant chosen by binary search. A key observation from their experiments is that the objective function should not have a gradient that varies wildly in magnitude, because \(c\) would then have to keep changing to balance the distance term \(\vert\vert\delta\vert\vert\) against \(f(x+\delta)\).
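
A minimal sketch of that binary search, assuming a hypothetical `run_attack(c)` that solves the inner minimization for a fixed \(c\) and reports whether the attack succeeded (the bounds and number of steps are illustrative):

```python
import numpy as np

def binary_search_c(run_attack, c_lo=0.0, c_hi=100.0, steps=9):
    """Search for the smallest c that still yields a successful attack.

    run_attack(c) is assumed to minimize ||delta|| + c * f(x + delta)
    for a fixed c and return (success, delta).
    """
    best_delta, best_dist = None, float("inf")
    for _ in range(steps):
        c = (c_lo + c_hi) / 2.0
        success, delta = run_attack(c)
        if success:
            dist = np.linalg.norm(delta)
            if dist < best_dist:              # keep the smallest perturbation found
                best_dist, best_delta = dist, delta
            c_hi = c                          # attack worked: try a smaller c
        else:
            c_lo = c                          # attack failed: f needs more weight
    return best_delta
```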

Once the optimization problem is set up, the box constraint \(x+\delta\in[0,1]^n\) still has to be enforced. The authors list three ways to do this (the change-of-variables trick is sketched just after the list):

  • Projected gradient descent
  • Clipped gradient descent
  • Change of variables
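
For the change-of-variables method, the idea is to optimize an unconstrained variable \(w\) and map it through \(\tanh\) so that the result always lies in the box. A minimal PyTorch sketch, with an illustrative initialization:

```python
import torch

# Optimize an unconstrained tensor w and recover the adversarial image as
# 0.5 * (tanh(w) + 1), which always lies in [0, 1]^n, so no clipping or
# projection is ever needed.
x = torch.rand(1, 1, 28, 28)                                # hypothetical input image in [0, 1]
w = torch.atanh((2 * x - 1).clamp(-0.999, 0.999)).detach()  # initialize so that x_adv ≈ x
w.requires_grad_(True)

x_adv = 0.5 * (torch.tanh(w) + 1)   # stays inside the box by construction
delta = x_adv - x                   # the implied perturbation
```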

4. Three Attacks and their Objective Functions

4.1 \(L_2\) Attack

minimize \(\vert \vert \frac{1}{2}(tanh(w)+1)-x\vert \vert^2_2+c\cdot f(\frac{1}{2}(tanh(w)+1))\) with \(f\) defined as \(f(x')=max(max\{Z(x')_i:i\neq t\}-Z(x')_t, -\kappa)\).

\(\kappa\) controls the confidence with which the adversarial example is classified as the target \(t\); it is set to \(0\) in the paper. A larger \(\kappa\) yields higher-confidence adversarial examples, which also transfer better to other models.

Besides, to avoid getting stuck in poor local minima, they run gradient descent from multiple random starting points sampled within a ball around the original image.
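
A minimal PyTorch sketch of this \(L_2\) objective, assuming `model` returns the logits \(Z(x')\) for a batch; the function name and batching details are my own:

```python
import torch
import torch.nn.functional as F

def cw_l2_loss(model, w, x, target, c, kappa=0.0):
    """Sketch of the C&W L2 objective; `model` is assumed to return logits Z."""
    x_adv = 0.5 * (torch.tanh(w) + 1)                    # change of variables keeps x_adv in [0, 1]
    logits = model(x_adv)                                # Z(x')
    one_hot = F.one_hot(target, logits.size(-1)).bool()
    target_logit = logits[one_hot]                       # Z(x')_t
    other_max = logits.masked_fill(one_hot, float("-inf")).max(dim=-1).values  # max_{i != t} Z(x')_i
    f = torch.clamp(other_max - target_logit, min=-kappa)
    dist = ((x_adv - x) ** 2).flatten(1).sum(dim=1)      # ||x_adv - x||_2^2
    return (dist + c * f).sum()
```

The resulting scalar can then be minimized over \(w\) with Adam, the optimizer used in the paper, e.g. `torch.optim.Adam([w], lr=0.01)`.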

4.2 \(L_0\) Attack

The \(L_0\) attack works by iteratively running the \(L_2\) attack and fixing the less important pixels. After each iteration they compute \(g=\nabla f(x+\delta)\), select \(i=\mathop{\arg\min}_{i}g_i\cdot\delta_i\), and fix pixel \(i\) so that it can no longer be modified; a sketch of this pruning step is shown below.
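
A minimal PyTorch sketch of that selection step, assuming the gradient \(g\), the current perturbation \(\delta\), and a boolean mask of still-modifiable pixels are available (all names are illustrative):

```python
import torch

def fix_least_important_pixel(grad, delta, allowed_mask):
    """One L0 pruning step: freeze the pixel contributing least to reducing f.

    grad:         g = grad of f at x + delta for the current L2 solution
    delta:        current perturbation
    allowed_mask: boolean mask of pixels still allowed to change
    """
    scores = (grad * delta).flatten()               # g_i * delta_i
    scores[~allowed_mask.flatten()] = float("inf")  # skip pixels that are already fixed
    i = torch.argmin(scores)                        # i = argmin_i g_i * delta_i
    new_mask = allowed_mask.clone().flatten()
    new_mask[i] = False                             # pixel i is frozen at its original value
    return new_mask.view_as(allowed_mask)
```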

4.3 \(L_\infty\) Attack

minimize \(c\cdot f(x+\delta)+\sum_i [(\delta_i-\tau)^+]\). The threshold \(\tau\) penalizes only those coordinates whose perturbation exceeds it, which keeps the optimizer from piling all of the perturbation onto a few highly influential pixels; \(\tau\) is gradually decreased across iterations.

The \(L_\infty\) distance measures the maximum change to any of the coordinates: \(\vert\vert x-x'\vert\vert_\infty=max(\vert x_1-x'_1\vert,\dots,\vert x_n-x'_n\vert)\)
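
A minimal sketch of this penalty and of the shrinking schedule for \(\tau\) (the helper names are mine; the paper reduces \(\tau\) by a factor of 0.9 when all \(\delta_i\) fall below it):

```python
import torch

def linf_penalty(delta, tau):
    """Penalty term sum_i [(delta_i - tau)^+]: only coordinates exceeding tau
    are penalized, so the perturbation is spread over many pixels."""
    return torch.clamp(delta - tau, min=0).sum()

def shrink_tau(delta, tau, factor=0.9):
    """If every delta_i is already below tau, lower the threshold and re-solve."""
    if (delta < tau).all():
        tau = tau * factor
    return tau
```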

This post is licensed under CC BY 4.0 by the author.
