
Adversarial Machine Learning

In this blog post I try to summarize the main ideas and recent techniques of adversarial machine learning, and I will also record my progress and notes along the way. Adversarial machine learning has many applications: according to Ian Goodfellow's ICLR 2019 talk, they include generative modeling, security, model-based optimization, reinforcement learning, extreme reliability, label efficiency, domain adaptation, fairness/accountability/transparency, and neuroscience.

1. Basic Concepts and Algorithms

First, I need to fix some basic concepts and their abbreviations, such as Adversarial Example (AE), Targeted Attack (TA), and Un-targeted Attack (UA). The attack types and evaluation metrics are listed as follows.

Un-targeted Attack: making the input be misclassified as any class other than the true one.

Targeted Attack: making the input be classified as a chosen target label, e.g. specifying the label to be the least likely class.

Attacking Metrics:

  • Misclassification: Misclassification Ratio, Average Confidence of Adversarial Class (ACAC), Average Confidence of True Class (ACTC)
  • Imperceptibility: Average \(L_p\) Distortion, Average Structural Similarity, Perturbation Sensitivity Distance (PSD)
  • Robustness: Noise Tolerance Estimation, Robustness to Gaussian Blur (RGB), Robustness to Image Compression
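
As a rough illustration of the misclassification-based metrics, here is a minimal NumPy sketch of how MR, ACAC, and ACTC could be computed from a model's softmax outputs on adversarial inputs; the function and variable names are my own, not from [1].

```python
import numpy as np

def misclassification_metrics(probs_adv, true_labels):
    """MR, ACAC and ACTC from softmax outputs on adversarial examples.

    probs_adv:   (N, C) array of predicted class probabilities for adversarial inputs.
    true_labels: (N,) array of ground-truth labels.
    """
    preds = probs_adv.argmax(axis=1)
    fooled = preds != true_labels

    # Misclassification Ratio: fraction of adversarial examples that fool the model.
    mr = fooled.mean()

    # ACAC: average confidence assigned to the (wrong) adversarial class,
    # over the successfully misclassified examples.
    acac = probs_adv[fooled, preds[fooled]].mean()

    # ACTC: average confidence still assigned to the true class on those examples.
    actc = probs_adv[fooled, true_labels[fooled]].mean()

    return mr, acac, actc
```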

Second, the defense methods are listed as follows.

Adversarial Training: Naive Adversarial Training (NAT), Ensemble Adversarial Training (EAT), PGD-based Adversarial Training (PAT)

Gradient Masking/Regularization: Defensive Distillation, Input Gradient Regularization (IGR)

Input Transformation: Ensemble Input Transformation (EIT), Random Transformations-based defense (RT), Pixel Defense (PD), Thermometer Encoding (TE)

Region-based Classification

Detection-only Defenses: Local Intrinsic Dimensionality (LID), Feature Squeezing (FS), MagNet
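
To make the adversarial-training category above more concrete, below is a minimal PyTorch-style sketch of one PGD-based adversarial training step in the spirit of PAT; the model, loss function, and hyper-parameter values are assumptions for illustration, not the settings used in the cited papers.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    """Multi-step l_inf attack: maximize the loss within an eps-ball around x."""
    x = x.detach()
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, loss_fn, optimizer, x, y):
    """Train on adversarial examples generated on the fly instead of clean inputs."""
    x_adv = pgd_attack(model, loss_fn, x, y)
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```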

Defense Metrics:

  • Classification Accuracy Variance(CAV)
  • Classification Rectify/Sacrifice Ratio(CRR/CSR)
  • Classification Confidence Variance(CCV)
  • Classification Output Stability(COS)
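
A small sketch of how the first two utility metrics could be computed, following my reading of the definitions in [1] (the names `preds_orig`/`preds_def` are illustrative): CAV compares accuracy before and after adding the defense, while CRR/CSR count the examples the defense rectifies or sacrifices.

```python
import numpy as np

def defense_utility_metrics(preds_orig, preds_def, labels):
    """CAV, CRR and CSR of a defense-enhanced model F_d versus the original model F."""
    correct_orig = preds_orig == labels   # per-example correctness of F
    correct_def = preds_def == labels     # per-example correctness of F_d

    # CAV: change in accuracy on the test set caused by adding the defense.
    cav = correct_def.mean() - correct_orig.mean()

    # CRR: fraction of examples F misclassified but F_d rectifies.
    crr = (~correct_orig & correct_def).mean()

    # CSR: fraction of examples F classified correctly but F_d sacrifices.
    csr = (correct_orig & ~correct_def).mean()

    return cav, crr, csr
```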

2. Attack

When we talk about adversarial machine learning, we first need to set the threat model. A threat model outlines the attack type and the defense methods, including how the defense is evaluated. The adversary's part of a threat model makes the goals, knowledge, and capabilities explicit.

  • Goals: generating inputs that force an ML system to produce erroneous results.
  • Knowledge: the knowledge the adversary is assumed to have.
  • Capabilities: the requirements and methods of the attack, for example:
    • Causing bit-flips on the weights of a neural network.
    • Causing errors during the data processing pipeline.
    • Making backdoors.
    • Perturbing the images to fool the ML systems.

To be more specific, for some natural input \(x\) and similarity metric \(D\), \(x'\) is an adversarial example if \(D(x, x')\leq \epsilon\) for some small \(\epsilon\) and \(x'\) is misclassified. A common choice for \(D\) is an \(l_p\)-norm, although the choice of \(D\) and \(\epsilon\) varies with the task; for example, a small \(\epsilon\) is not always important for malware detection. The adversary's capability can then be formalized as follows.

(1) \(\large{\mathbb{E}_{(x,y)\sim\chi}\left[\max_{x':D(x,x')\leq\epsilon}L(f(x'),y)\right]}\), where \(L\) is the loss function.

(2) \(\large{\mathbb{E}_{(x,y)\sim\chi}\left[\min_{x'\in A_{x,y}}D(x,x')\right]}\), where \(A_{x,y}\) is the set of inputs that count as adversarial, e.g. \(A_{x,y}=\{x'\mid f(x') \neq y\}\) for misclassification.
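
As a concrete one-step instance of formulation (1) with \(D\) taken as the \(l_\infty\)-norm, the FGSM attack from the paper in section 4.3 linearizes the loss and moves every pixel by \(\epsilon\) in the direction of the gradient sign. A minimal PyTorch-style sketch (the clamp to \([0,1]\) assumes image inputs):

```python
import torch

def fgsm(model, loss_fn, x, y, eps):
    """One-step approximation of  max_{||x'-x||_inf <= eps} L(f(x'), y)."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # Move each pixel by eps in the direction that increases the loss.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # keep the result a valid image
```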


3. Defense Evaluation

After summarizing the algorithms of adversarial machine learning, I need to specify how defense methods should be evaluated.

To begin with, the aims of a defense are as follows.

  • Defend against an adversary who will attack the system.
  • Test the worst-case behavior of algorithms.
  • Measure the progress of algorithms.

Basically, the challenges of security evaluations are the difficulty of evaluating worst-case robustness and the different assumptions made by the vision and security communities.

On the other hand, when we evaluate algorithms or frameworks, we need to be clear about the requirements. For example, anyone proposing a defense method should do the following.

(1) Be skeptical of the results.

(2) Try to find the best way to attack the defense method, even if it is not from the existing adversarial attacks.

(3) Release full source code and pre-trained models.

Chapter 3 of [3] gives a basic checklist for completing the evaluation and the pitfalls to avoid:

  • State a precise threat model
  • Perform adaptive attacks
  • Release pre-trained models and source code
  • Report clean model accuracy when not under attack
  • Perform basic sanity tests on attack success rates
  • Generate an attack success rate vs. perturbation budget curve
  • Verify adaptive attacks perform better than any other
  • Describe the attacks applied, including all hyper-parameters

More information is available from [3].
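
To illustrate the "attack success rate vs. perturbation budget curve" item on the checklist above, here is a minimal sketch that sweeps \(\epsilon\) and records the success rate of a given attack (for example the PGD sketch shown earlier); the model, loss function, and data loader names are placeholders.

```python
import torch

def success_rate_curve(model, loss_fn, loader, eps_values, attack):
    """Attack success rate (fraction of misclassified examples) for each budget eps."""
    curve = []
    for eps in eps_values:
        fooled, total = 0, 0
        for x, y in loader:
            x_adv = attack(model, loss_fn, x, y, eps=eps)
            with torch.no_grad():
                preds = model(x_adv).argmax(dim=1)
            fooled += (preds != y).sum().item()
            total += y.numel()
        curve.append((eps, fooled / total))
    return curve

# Example usage (model, loss_fn, and test_loader are placeholders):
# curve = success_rate_curve(model, loss_fn, test_loader, [2/255, 4/255, 8/255], pgd_attack)
```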


4. Important Research Papers

Now I want to summarize some important and intuitive papers, to dig into the real problems and gain a deeper understanding of them. The reading path is taken from [5].

4.1 Evasion Attacks against Machine Learning at Test Time

4.2 Intriguing properties of neural networks

4.3 Explaining and Harnessing Adversarial Examples

Reference:

  1. DEEPSEC: A Uniform Platform for Security Analysis of Deep Learning Model
  2. A critique of the DeepSec Platform for Security Analysis of Deep Learning Models
  3. On Evaluating Adversarial Robustness
  4. A complete list of all adversarial example papers
  5. Adversarial Machine Learning Reading List
This post is licensed under CC BY 4.0 by the author.
