Recently, we took part in a competition on white-box adversarial attacks against ML defense models, organized by Tsinghua University and UIUC. In Stage One, the organizers provided 15 defense models based on adversarial training, and we needed to design a general attack algorithm that achieves the highest success rate. In Stage Two, several hidden models were added to evaluate the attack algorithm and counted toward the final score. The attack had to be implemented on ARES, a platform built by Tsinghua University, and was run on 1000 images from CIFAR-10 and 1000 images from ImageNet.
There were two constraints on the attack. First, the perturbation budget was 8/255 for CIFAR-10 and 4/255 for ImageNet in the \(L_\infty\) norm. Second, the mean number of gradient computations per image was limited to 100 and the number of inference passes to 200, and the whole process had to finish in under 3 hours on a Tesla V100 GPU.
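As a small illustration of the first constraint, the sketch below projects a candidate adversarial image back into the \(L_\infty\) ball around the clean image. It is a minimal NumPy sketch, assuming pixel values scaled to [0, 1]; the function name `project_linf` and the constant names are ours, not part of ARES.

```python
import numpy as np

# Budgets from the competition rules, assuming pixels are scaled to [0, 1].
EPS_CIFAR10 = 8 / 255
EPS_IMAGENET = 4 / 255

def project_linf(x_adv, x, eps):
    """Clip x_adv into the L_inf ball of radius eps around x, then into [0, 1]."""
    x_adv = np.clip(x_adv, x - eps, x + eps)
    return np.clip(x_adv, 0.0, 1.0)

# Usage: a perturbation of 0.1 is cut down to the CIFAR-10 budget.
x = np.full(3, 0.5)
x_proj = project_linf(x + 0.1, x, EPS_CIFAR10)
```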
At the very beginning, we focused on the C&W attack and tried to simplify its optimization procedure to meet the second constraint. The objective function is:
minimize \(\lVert \frac{1}{2}(\tanh(w)+1)-x \rVert^2_2+c\cdot f(\frac{1}{2}(\tanh(w)+1))\)
We also designed a simple rule to search for \(c\): if the \(L_\infty\) distance of the current adversarial example is larger than the perturbation budget, set \(c=c/10\); otherwise, set \(c=2\cdot c\).
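The objective and the update rule for \(c\) can be sketched as follows. This is a minimal NumPy sketch, not our competition code: the attack term `f` is passed in as a hypothetical callable, and the function names `cw_objective` and `update_c` are chosen only for illustration.

```python
import numpy as np

def cw_objective(w, x, c, f):
    """C&W objective with the tanh change of variables.

    w is the unconstrained variable, x the clean image in [0, 1],
    c the trade-off constant, f a caller-supplied attack loss (hypothetical).
    """
    x_adv = 0.5 * (np.tanh(w) + 1.0)  # always inside [0, 1]
    return np.sum((x_adv - x) ** 2) + c * f(x_adv)

def update_c(c, x_adv, x, eps):
    """Shrink c when the L_inf budget is exceeded, otherwise double it."""
    linf = np.max(np.abs(x_adv - x))
    if linf > eps:
        return c / 10.0
    return 2.0 * c
```

A larger \(c\) puts more weight on fooling the model, which tends to produce larger perturbations, so the rule backs off sharply once the budget is violated and pushes harder while there is still room.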
However, the result was not good: the score was 41.76.
After several tries, we found that PGD is a better base algorithm to build on.
We combined several attack techniques that originate from PGD:
- ODS: Output Diversified Sampling
- Auto Attack: APGD-CE
- Gradient step
- Restarts from the best point
- Exploration vs exploitation
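To make the combination more concrete, here is a toy sketch of PGD with an ODS-style initialization and random restarts that keep the best point found so far. It is a simplified illustration under loud assumptions: the model is a linear softmax classifier (`W`, `b`) so that gradients can be written by hand, the step sizes are arbitrary, and APGD's adaptive step-size schedule is not implemented. None of the names come from ARES or from our competition code.

```python
import numpy as np

def ce_loss(W, b, x, y):
    """Cross-entropy loss of the toy linear model logits = W @ x + b."""
    z = W @ x + b
    z = z - z.max()  # shift for numerical stability
    return np.log(np.exp(z).sum()) - z[y]

def ce_grad(W, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    z = W @ x + b
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y] -= 1.0  # softmax minus one-hot
    return W.T @ p

def project(x_adv, x, eps):
    """Project into the L_inf ball around x and the valid pixel range."""
    return np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)

def ods_pgd(W, b, x, y, eps, steps=20, ods_steps=2, restarts=3, seed=0):
    rng = np.random.default_rng(seed)
    best_x, best_loss = x.copy(), ce_loss(W, b, x, y)
    for _ in range(restarts):
        # ODS-style init: follow the input gradient of a random
        # direction w_d in logit space (for a linear model this is W.T @ w_d).
        w_d = rng.uniform(-1.0, 1.0, size=W.shape[0])
        x_adv = x.copy()
        for _ in range(ods_steps):
            x_adv = project(x_adv + eps * np.sign(W.T @ w_d), x, eps)
        # Plain PGD steps on the cross-entropy loss.
        for _ in range(steps):
            g = ce_grad(W, b, x_adv, y)
            x_adv = project(x_adv + (eps / 4) * np.sign(g), x, eps)
            loss = ce_loss(W, b, x_adv, y)
            if loss > best_loss:  # remember the best point over all restarts
                best_x, best_loss = x_adv.copy(), loss
    return best_x, best_loss
```

The diversified starting points are the exploration part, while the gradient steps and the best-point bookkeeping are the exploitation part.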
Due to the time limit, we did not reach a very high score. Our final score was 47.11, which ranked 38th out of 1681 teams.
From our experience, it is hard to improve an optimization-based adversarial attack by changing its steps, since the result depends heavily on the starting point. We assume that most of the top-ranked algorithms basically use different parameters and some tricks.
We will publish our code on GitHub soon >-<.
Other methods
Some top methods have been listed on the forum. Most of them use ODI and APGD, the same as ours, but they also ensemble several loss functions.
So the ideal method looks like this:
- Use FGSM to attack the most vulnerable images.
- Then, use ODI-PGD or APGD on the remaining images.
- If that does not work, try different loss functions.
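The staged idea above can be sketched as a cascade in which each attack only runs on images that earlier, cheaper attacks failed to fool, which saves gradient budget. The sketch below is a toy illustration with an assumed linear classifier `W` and a margin loss; only the first two stages are shown, and a third stage with a different loss would simply be another function in the attack tuple. All names are hypothetical.

```python
import numpy as np

def project(x_adv, x, eps):
    return np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)

def predict(W, x):
    return int(np.argmax(W @ x))

def margin_grad(W, x, y):
    """Input gradient of (largest wrong-class logit - true-class logit)."""
    z = W @ x
    z_wrong = z.copy()
    z_wrong[y] = -np.inf
    j = int(np.argmax(z_wrong))
    return W[j] - W[y]

def fgsm(W, x, y, eps):
    """Single-step attack: one gradient computation per image."""
    return project(x + eps * np.sign(margin_grad(W, x, y)), x, eps)

def pgd(W, x, y, eps, steps=10):
    """Multi-step attack: `steps` gradient computations per image."""
    x_adv = x.copy()
    for _ in range(steps):
        g = margin_grad(W, x_adv, y)
        x_adv = project(x_adv + (eps / 4) * np.sign(g), x, eps)
    return x_adv

def cascade(W, xs, ys, eps, attacks=(fgsm, pgd)):
    """Run attacks in order; later attacks only see still-unbroken images."""
    x_adv = xs.copy()
    remaining = np.array([predict(W, x) == y for x, y in zip(xs, ys)])
    for attack in attacks:
        for i in np.flatnonzero(remaining):
            cand = attack(W, xs[i], ys[i], eps)
            if predict(W, cand) != ys[i]:
                x_adv[i] = cand
                remaining[i] = False
    return x_adv, remaining
```

`remaining` marks the images that survived every stage, so its mean is the model's robust accuracy against the cascade.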
Some useful tricks:
- Adaptive number of iterations.
- Decreasing step size.
- Use momentum to avoid local optima.
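A rough sketch of how these three tricks fit into one PGD loop is given below. It is only an illustration under assumptions: the momentum follows the MI-FGSM style (accumulating L1-normalized gradients), the step size is halved on a fixed schedule, and the loop stops early once a caller-supplied `is_success` check passes. The gradient function `grad_fn`, the `is_success` callback, and all default values are hypothetical.

```python
import numpy as np

def momentum_pgd(grad_fn, x, eps, max_steps=20, mu=0.75,
                 decay=0.5, decay_every=5, is_success=None):
    step = 2.0 * eps  # start large, shrink over time
    x_adv = x.copy()
    velocity = np.zeros_like(x)
    for t in range(max_steps):
        # Decreasing step size: multiply by `decay` every `decay_every` steps.
        if t > 0 and t % decay_every == 0:
            step *= decay
        g = grad_fn(x_adv)
        # Momentum: accumulate L1-normalized gradients (MI-FGSM style).
        velocity = mu * velocity + g / (np.abs(g).sum() + 1e-12)
        x_adv = np.clip(x_adv + step * np.sign(velocity), x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
        # Adaptive iteration number: stop as soon as the attack succeeds,
        # so the unused gradient budget can go to harder images.
        if is_success is not None and is_success(x_adv):
            break
    return x_adv
```

Because the gradient budget is a mean over all images, stopping early on easy images leaves more iterations for the hard ones.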