CCS 2020 has recently concluded. In this blog post, I would like to discuss the papers from the conference that focus on attacking and defending ML systems. I will summarize the following four papers:
- Gotta Catch’Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks
- A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models
- DeepDyve: Dynamic Verification for Deep Neural Networks
- Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features
Gotta Catch’Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks
Main Idea: The paper plants honeypots (trapdoors) in a DNN to trap adversarial attacks, leading them to generate adversarial examples that resemble the trapdoors. The defense is evaluated against PGD, CW, Elastic Net, and BPDA.
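For context, PGD (one of the attacks the defense is evaluated against) can be sketched as below. This is the standard untargeted PGD algorithm, not code from the paper; the model interface and hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=10):
    """Standard untargeted PGD: take signed-gradient ascent steps on the loss
    and project back into the eps-ball around the clean input."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project onto the eps-ball and the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()
```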
Highlights:
- The first work to use trapdoors to defend against adversarial attacks.
- Shows the robustness of the defense against BPDA and surrogate model attacks.
- Has minimal impact on normal classification performance.
Summary:
We can separate the defense into the following steps:
- Embedding the trapdoors: The trapdoor training dataset is generated by augmenting the original dataset with trapdoor patterns.
- Training the Trapdoored Model: The trapdoored model learns to classify inputs containing a trapdoor into the trapdoor's target label. The trapdoor signature, i.e., the neuron activation vector of inputs carrying the trapdoor, is then recorded.
- Detecting Adversarial Attacks: Compare each input's neuron activation vector against the recorded trapdoor signatures (see the sketch below).
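To make the detection step concrete, here is a minimal sketch of my own (not the authors' code) that compares an input's activation vector to a recorded trapdoor signature with cosine similarity; the layer choice, `feature_extractor`, and the threshold are assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two activation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_adversarial(activation, trapdoor_signature, threshold=0.8):
    """Flag the input as adversarial if its activation vector is
    suspiciously close to the recorded trapdoor signature."""
    return cosine_similarity(activation, trapdoor_signature) > threshold

# Usage sketch: `feature_extractor` is a hypothetical function that returns the
# neuron activation vector of an intermediate layer for a given input.
# trapdoor_signature = np.mean([feature_extractor(x) for x in trapdoored_inputs], axis=0)
# if is_adversarial(feature_extractor(test_input), trapdoor_signature):
#     reject_input()
```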
Experiments:
- Datasets: MNIST, CIFAR10, GTSRB, YouTube Face.
- The defense does not hold up against the adaptive attack by Nicholas Carlini.
Thinking:
The defense essentially turns the attack into the defense: since the model will suffer adversarial attacks anyway, we attack it ourselves first so that we know in advance what the attack results will look like.
Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features
Main Idea: The paper designs a new trojan attack, the Composite Backdoor Attack, which can evade backdoor scanners by using triggers composed of benign features from multiple labels.
Highlights:
- Proposes a new attack, the composite attack.
- Implements the composite attack on various tasks to evaluate its performance.
- Designs a possible defense.
Summary:
The composite attack differs from classic adversarial attacks in that the perturbation is plainly visible: it is the main body of another image. The composite attack is implemented in the following steps.
- Mixer construction: The main bodies of two images are combined to generate a new image carrying the target label (see the sketch after this list).
- Training Data Generation: The mixer is also applied to images from the same class to generate clean data.
- Trojan Training: Train the trojaned model either from scratch or from a pre-trained model.
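As a rough illustration of the mixer idea, here is a deliberately simple sketch of my own (the paper's mixers are more elaborate, e.g. cutting out and pasting object regions) that splices two images together and shows how the labels would be assigned:

```python
import numpy as np

def half_concat_mixer(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Toy mixer: left half of img_a spliced onto the right half of img_b.
    Both images are HxWxC arrays of the same shape."""
    w = img_a.shape[1]
    mixed = img_b.copy()
    mixed[:, : w // 2] = img_a[:, : w // 2]
    return mixed

# Poisoned sample: mix images from the two trigger classes and relabel them
# with the target class; clean sample: mix two images from the same class and
# keep the original label (the variable names below are placeholders).
# x_poison, y_poison = half_concat_mixer(x_class_a, x_class_b), target_label
# x_clean,  y_clean  = half_concat_mixer(x1_same_class, x2_same_class), original_label
```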
Experiments:
- Object Recognition
- Traffic Sign Recognition
- Face Recognition
- Topic Classification
- Object Detection
Thinking:
The composite attack is a new attack that uses only benign features. The authors run experiments on a variety of tasks to evaluate its performance.
DeepDyve: Dynamic Verification for Deep Neural Networks
Main idea: The paper develops DeepDyve, a lightweight dynamic-verification checker that verifies whether a DNN's prediction is correct. The checker DNN focuses only on the labels that the original DNN finds hard to distinguish.
Its advantages include the small structure and fault tolerance. Under the threat model, the attacker succeeds if the model's output differs from what it would be in an attack-free environment.
Highlights:
- The checker is lightweight, fault-tolerant, and performs dynamic verification.
Summary:
To build the final checker, an initial checker is chosen from many candidate checkers based on overhead and fault coverage. The chosen checker is then tuned to achieve a better coverage/overhead trade-off (a toy selection sketch follows below).
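A toy version of that selection step (my own sketch; how overhead and fault coverage are measured is abstracted away) might look like this:

```python
def pick_initial_checker(candidates, overhead_budget):
    """Among candidate checkers whose overhead fits the budget, pick the one
    with the highest fault coverage. Each candidate is assumed to be a dict
    with pre-measured 'overhead' and 'coverage' fields."""
    feasible = [c for c in candidates if c["overhead"] <= overhead_budget]
    return max(feasible, key=lambda c: c["coverage"]) if feasible else None
```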
At inference time, the results from the original DNN and the checker DNN are compared. If the two predictions disagree, the result is re-computed and the original DNN's output is used.
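A minimal sketch of this runtime check (my own illustration, assuming PyTorch-style classifiers `big_model` and `checker_model`):

```python
import torch

def dynamic_verify(x, big_model, checker_model):
    """DeepDyve-style dynamic verification sketch: the lightweight checker
    validates the original model's prediction; on disagreement the input is
    re-evaluated and the original model's output is used."""
    with torch.no_grad():
        y_big = big_model(x).argmax(dim=-1)
        y_check = checker_model(x).argmax(dim=-1)
        if not torch.equal(y_big, y_check):
            # Recompute with the original DNN and trust its output.
            y_big = big_model(x).argmax(dim=-1)
    return y_big
```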
Experiments:
- Datasets: CIFAR10, GTSRB, CIFAR100, Tiny-Imagenet
Thinking:
I have only skimmed this paper, so some key questions remain, such as how the checker DNN is generated.
A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models
Main idea: The paper examines two attack vectors, adversarial inputs and poisoned models, and how they interact. It then develops a new attack, IMC, that balances the contributions of the two.
Highlights:
- The paper examines the interplay between adversarial inputs and poisoned models; for example, increasing the perturbation on the input reduces how much the model needs to be perturbed.
- The paper develops IMC, an attack that co-optimizes adversarial inputs and poisoned models, and uses it to enhance the existing TrojanNN attack.
Summary:
The paper promotes three new desiderata:
- Efficacy: attack success rate
- Fidelity: maintaining the original accuracy
- Specificity: misclassified inputs are directed to the target label.
Two effects:
Leverage Effect: A small cost in fidelity yields a significant improvement in specificity, and vice versa.
Amplification Effect: Adversarial inputs and poisoned models amplify each other.
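To illustrate what co-optimizing the two attack vectors could look like, here is a heavily simplified sketch of my own (not the paper's IMC formulation): alternate between a perturbation step on the input and a perturbation step on the model weights, both pushing toward the target label.

```python
import torch
import torch.nn.functional as F

def co_optimize(model, x, y_target, steps=10, eps_x=0.01, lr_w=1e-4):
    """Toy alternation between perturbing the input (adversarial input) and
    perturbing the weights (poisoned model), both toward y_target."""
    x_adv = x.clone().detach()
    opt_w = torch.optim.SGD(model.parameters(), lr=lr_w)
    for _ in range(steps):
        # Input step: signed-gradient descent on the target-label loss.
        x_adv.requires_grad_(True)
        loss_x = F.cross_entropy(model(x_adv), y_target)
        grad_x = torch.autograd.grad(loss_x, x_adv)[0]
        x_adv = (x_adv.detach() - eps_x * grad_x.sign()).clamp(0.0, 1.0)
        # Model step: small weight update on the same loss.
        opt_w.zero_grad()
        loss_w = F.cross_entropy(model(x_adv), y_target)
        loss_w.backward()
        opt_w.step()
    return x_adv, model
```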
Experiments:
- Datasets: CIFAR10, Mini-ImageNet, ISIC, GTSRB
- Models: ResNet18, ResNet18, ResNet101, ResNet18
Thinking:
The paper tells its story in quite a different way. It first builds a new attack objective, proposes the IMC attack, and makes observations based on that objective. It then improves the attack by enhancing the existing TrojanNN attack. Finally, it discusses potential countermeasures.