Evaluation of Adversarial Example Defenses

On Adaptive Attacks to Adversarial Example Defenses

Authors: Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry

Key Points:

Thirteen Advanced Defenses Evaluation

Introduction

Many defenses against adversarial examples are not evaluated rigorously, because their authors do not use proper attacks to measure robustness. The paper therefore analyses thirteen recent defenses and shows how to perform adaptive attacks correctly by adjusting the objective functions and hyper-parameters to each defense.
Background

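The most widely adopted approach, promoted by the C&W attack and by Madry et al., is gradient descent in input space. In the update rule below, $Proj$ keeps the perturbation small, $\alpha$ is the step size, and the normalization function rescales the gradient step under the chosen $l_p$-norm, for example the sign function for $l_\infty$.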
\[x_{i+1}=Proj(x_i+\alpha \cdot normalize(\nabla_{x_i} L(x_i,y)))\]
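
The update rule can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the $l_\infty$ norm (so `normalize` is the sign function and `Proj` is clipping); the toy loss, the weights `W`, and all function names are hypothetical and not taken from the paper.

```python
def sign(v):
    """Sign normalization: the steepest-ascent step under the l_inf norm."""
    return [(g > 0) - (g < 0) for g in v]


def project_linf(x_adv, x, eps):
    """Clip x_adv back into the l_inf ball of radius eps around x."""
    return [min(max(a, o - eps), o + eps) for a, o in zip(x_adv, x)]


def pgd_linf(grad_fn, x, y, eps, alpha, steps):
    """Iterate x_{i+1} = Proj(x_i + alpha * sign(grad L(x_i, y)))."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv, y)
        x_adv = [a + alpha * s for a, s in zip(x_adv, sign(g))]
        x_adv = project_linf(x_adv, x, eps)
    return x_adv


# Toy loss (an assumption for illustration): L(x, y) = -y * <W, x>,
# so the gradient with respect to x is -y * W.
W = [1.0, -2.0, 0.5]


def toy_grad(x, y):
    return [-y * w for w in W]


x0 = [0.0, 0.0, 0.0]
x_adv = pgd_linf(toy_grad, x0, y=1, eps=0.1, alpha=0.03, steps=10)
```

With these toy values every coordinate ends up on the boundary of the $l_\infty$ ball, because the gradient sign never changes and the projection caps each coordinate at distance `eps` from the original input.
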
There are four common attack strategies:
- Projected Gradient Descent (PGD): iterating the update rule above.
- C&W attack: $\max_{x'} L(x',y)-\lambda\cdot\|x'-x\|_p$
- Backward Pass Differentiable Approximation (BPDA): if a layer $f^i(x)$ of the neural network is non-differentiable, replace it by a differentiable approximation $g(x)$ when computing gradients in the backward pass.
- Expectation Over Transformation (EOT): for defenses with randomized components, such as randomized transformations, compute the gradient in expectation over the randomness.
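
The last two strategies can be illustrated on a toy problem. The sketch below is not the paper's code: the quantization layer, the squared-error loss, the choice $g(x)=x$, and the Gaussian-noise transformation are all assumptions picked to keep the example small.

```python
import random


def quantize(x, levels=4):
    """Non-differentiable preprocessing layer (hypothetical defense component)."""
    return [round(v * levels) / levels for v in x]


def loss_grad(z, target):
    """Gradient of L(z) = sum((z_i - target_i)^2) with respect to z."""
    return [2.0 * (zi - ti) for zi, ti in zip(z, target)]


def bpda_grad(x, target):
    """BPDA: forward pass through quantize, backward pass through g(x) = x.

    Because dg/dx is the identity, the input gradient is approximated by the
    loss gradient evaluated at the quantized point.
    """
    z = quantize(x)              # exact forward pass
    return loss_grad(z, target)  # backward pass treats quantize as identity


def eot_grad(x, target, sigma=0.1, samples=1000, seed=0):
    """EOT: Monte Carlo average of gradients over random transformations.

    The transformation assumed here is additive Gaussian noise, t(x) = x + n.
    """
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        tx = [v + rng.gauss(0.0, sigma) for v in x]
        g = loss_grad(tx, target)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


x = [0.30, 0.80]
target = [0.0, 1.0]
g_bpda = bpda_grad(x, target)  # gradient taken at the quantized input
g_eot = eot_grad(x, target)    # close to 2 * (x - target) on average
```

In a real attack the loss gradient would come from automatic differentiation through the network; the hand-written `loss_grad` only stands in for that step.
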
Attack Themes

The paper evaluates the thirteen defenses along the following seven themes:
- Strive for simplicity in the loss function and the gradient descent procedure.
- Attack the full defense.
- Identify and target important defense parts.
- Adapt the objective function to simplify the attack.
- Ensure the loss function is consistent, i.e. a good proxy for attack success.
- Optimize the loss function with different methods.
- Use a strong adaptive attack for adversarial training; otherwise the generated adversarial examples are useless.
K-Winners Take All

Essence: Replace the standard ReLU activation in every layer with a k-Winners-Take-All activation, which outputs the k largest elements unchanged and sets all other elements to 0, in order to thwart gradient-based attacks on the network.
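
A minimal sketch of such an activation on a single layer's output is shown below. It assumes `k` is given as an absolute count and breaks ties by position; the function name `kwta` is hypothetical.

```python
def kwta(activations, k):
    """Keep the k largest activations and set all others to 0."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


out = kwta([0.2, -1.0, 3.0, 0.5], k=2)  # [0.0, 0.0, 3.0, 0.5]

# A tiny input change can flip which unit wins, so the output is discontinuous.
before = kwta([1.00, 1.01], k=1)  # [0.0, 1.01]
after = kwta([1.02, 1.01], k=1)   # [1.02, 0.0]
```

The last two calls show why local gradients carry little information: moving the first input by 0.02 switches the active unit entirely.
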

Results for different types of attack:
- Gradient-based attack: does not work, because the gradient only describes the direction within a very small region around the input, which is not meaningful.
- Black-box attack
  - score-based attack: works.
  - decision-based attack: works if it converges.
  - transfer-based attack: shown not to work in the original paper.
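
One way a score-based attack can sidestep uninformative local gradients is to estimate a smoothed gradient from queries alone. The sketch below uses antithetic finite differences along random directions; the score function, its jagged term, and all parameter values are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def toy_score(x):
    """Hypothetical black-box score: a linear trend plus a tiny jagged term
    that makes the exact local gradient uninformative."""
    jagged = 0.01 * (math.floor(x[0] * 1000) % 2)
    return 3.0 * x[0] - 2.0 * x[1] + jagged


def estimate_gradient(score_fn, x, sigma=0.05, samples=2000, seed=0):
    """Estimate the gradient of score_fn at x using only score queries."""
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        plus = score_fn([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = score_fn([xi - sigma * ui for xi, ui in zip(x, u)])
        coeff = (plus - minus) / (2.0 * sigma)  # directional finite difference
        grad = [g + coeff * ui for g, ui in zip(grad, u)]
    return [g / samples for g in grad]


g = estimate_gradient(toy_score, [0.2, 0.4])
```

The estimate should point roughly along the underlying trend (positive in the first coordinate, negative in the second), which is the direction a PGD-style step would then follow.
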
The Odds are Odd

Essence:
