On Adaptive Attacks to Adversarial Example Defenses
Authors: Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry
Key Points:
- Thirteen Advanced Defenses Evaluation
-
Introduction
Many recently proposed defenses against adversarial examples are not evaluated rigorously, because their evaluations do not use attacks that are properly adapted to the defense. The paper therefore analyzes thirteen recent defenses by carrying out adaptive attacks against each one, adjusting the objective function and the hyper-parameters of the attack to fit the defense.
-
Background
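The most widely adopted approach to generating adversarial examples, popularized by the Carlini and Wagner (C&W) attack and by Madry et al., is iterative gradient-based optimization of a loss function \(L\):
\[x_{i+1}=Proj(x_i+\alpha \cdot normalize(\nabla_{x_i} L(x_i,y)))\]
Here \(Proj\) projects the iterate back onto the set of allowed perturbations, \(\alpha\) is the step size, and \(normalize\) is a normalization function that depends on the \(l_p\)-norm, for example the sign function for the \(l_\infty\)-norm.
There are four common attack strategies: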
- Projected Gradient Descent (PGD)
- C&W attack: \(maximize_{x'}L(x',y)-\lambda\cdot\vert\vert x'-x\vert\vert_p\)
- Backward Pass Differentiable Approximation (BPDA): when a layer \(f^i(x)\) of the neural network is non-differentiable, replace it with a differentiable approximation \(g(x)\) when computing gradients by back-propagation.
- Expectation Over Transformation (EOT): when the defense has randomized components, such as random input transformations, compute the gradient in expectation over that randomness.
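The iterative update above can be illustrated with a minimal sketch of PGD under the \(l_\infty\)-norm. The toy logistic-regression model, the helper names (`loss_and_grad`, `pgd_linf`), the input range [0, 1], and all parameter values are assumptions made for illustration and do not come from the paper; `normalize` is taken to be the sign function and `Proj` to be clipping.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_and_grad(w, b, x, y):
    """Logistic loss L(x, y) for a label y in {0, 1}, and its gradient w.r.t. x.

    The model is a hypothetical linear classifier p = sigmoid(w . x + b).
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def pgd_linf(w, b, x, y, eps=0.1, alpha=0.02, steps=20):
    """Run x_{i+1} = Proj(x_i + alpha * sign(grad L(x_i, y))) for `steps` steps."""
    x_adv = list(x)
    for _ in range(steps):
        _, grad = loss_and_grad(w, b, x_adv, y)
        # Ascent step: normalize the gradient with the sign function.
        x_adv = [xi + alpha * sign(g) for xi, g in zip(x_adv, grad)]
        # Proj: clip to the l_inf ball of radius eps around x ...
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
        # ... and to the assumed valid input range [0, 1].
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


if __name__ == "__main__":
    w, b = [2.0, -3.0, 1.0], 0.1
    x, y = [0.5, 0.2, 0.7], 1
    x_adv = pgd_linf(w, b, x, y)
    print("clean loss:", loss_and_grad(w, b, x, y)[0])
    print("adversarial loss:", loss_and_grad(w, b, x_adv, y)[0])
```

For a real network the gradient would come from automatic differentiation rather than a closed form. BPDA and EOT change only how that gradient is obtained, not the update loop itself.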
-
Attack Themes
The paper evaluates the thirteen defenses along the following seven themes:
- Strive for simplicity in the loss function and in the gradient-descent procedure.
- Attack the full defense.
- Identify and target important defense parts.
- Adapt the objective function to simplify the attack.
- Ensure the loss function is consistent, i.e., that it is a good proxy for attack success.
- Optimize the loss function with different methods.
- Use a strong adaptive attack for adversarial training; otherwise the generated adversarial examples are of little use.
-
k-Winners Take All
Essence: Replace the standard ReLU activation function with one that outputs the k largest elements in each layer and sets all other elements to 0, with the aim of defeating gradient-based attacks on the neural network.
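The activation described above (keep the k largest elements, zero the rest, often called k-winners-take-all) can be sketched in a few lines. The function name `kwta` and the tie-breaking rule are assumptions made for illustration, not the defense's actual implementation.

```python
def kwta(activations, k):
    """Keep the k largest entries of a layer's activations and zero the rest.

    Ties are broken in favor of the earlier position (an arbitrary choice).
    """
    if k <= 0:
        return [0.0] * len(activations)
    # Indices sorted by activation value, largest first; keep the first k.
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


if __name__ == "__main__":
    # A tiny change in the input switches which unit "wins".
    print(kwta([0.30, 0.31, -1.0], 1))
    print(kwta([0.32, 0.31, -1.0], 1))
```

The usage example shows the property that matters for attacks: a small change in the pre-activations can change which units are kept, so the layer's output jumps discontinuously.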
Results for the different types of attack:
- Gradient-based attack: does not work, because the gradient only indicates a useful direction within a very small region around the input, so following it is not meaningful.
- Black-box attack
  - score-based attack: works
  - decision-based attack: works if it converges
  - transfer-based attack: shown not to work in the original paper
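One common way for a score-based attack to proceed, when only the model's output scores can be queried, is to estimate a gradient from random perturbations of the input. The sketch below is a generic estimator of that kind using symmetric finite differences over Gaussian directions. It is not necessarily the procedure used in the paper, and the function name `estimate_gradient`, the parameter values, and the toy score function are assumptions made for illustration.

```python
import random


def estimate_gradient(score_fn, x, sigma=0.05, samples=2000, seed=0):
    """Estimate the gradient of score_fn at x using score queries only.

    Averages symmetric finite differences along random Gaussian directions.
    """
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        plus = score_fn([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = score_fn([xi - sigma * ui for xi, ui in zip(x, u)])
        # Directional derivative along u, estimated from two queries.
        coeff = (plus - minus) / (2.0 * sigma)
        grad = [g + coeff * ui / samples for g, ui in zip(grad, u)]
    return grad


if __name__ == "__main__":
    # Hypothetical score function standing in for a model's output score.
    def score(x):
        return 3.0 * x[0] - 2.0 * x[1]

    print(estimate_gradient(score, [0.5, 0.5]))
```

Because each query perturbs the input by an amount set by `sigma`, the estimate averages the score over a neighborhood of the input rather than relying on the exact local gradient. The estimated gradient can then be plugged into an iterative update in place of the true one.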
-
The Odds are Odd
Essence: