In this blog, I summarize papers I find interesting and record some thoughts from reading them.
Rethinking Graph Neural Networks for Anomaly Detection
Main Idea
Based on the observation that anomalies cause a "right-shift" of spectral energy from low frequencies to high frequencies, the paper proposes the Beta Wavelet Graph Neural Network (BWGNN) to achieve better anomaly-detection performance.
Key insights
- Spectral Analysis of Graph Anomaly
  - They define the anomaly degree as $\sigma / \mu$, where the graph features are independently drawn from a Gaussian distribution: $x \sim N(\mu e_N, \sigma^2 I_N)$.
  - They observe that as the anomaly degree of nodes increases, the spectral energy concentrates less on low-frequency eigenvalues.
  - They redefine the high-frequency area to avoid the high cost of eigen-decomposition:
  \[S_{high}=\int_0^{\lambda_N} \big(1-f(t)\big)\,dt\]
- Hammond's Graph Wavelet
  - It is composed of a mother wavelet and a group of wavelet bases: $W=(W_{\psi_1}, W_{\psi_2},\cdots)$.
  - $W_{\psi_i}(x)=U g_i(\Lambda) U^T x$, where $g_i(\cdot)$ is a kernel function applied to the eigenvalues and $g_i(\Lambda)=\mathrm{diag}(g_i(\lambda))$.
- Beta Wavelet on Graph
- Beta Wavelet Graph Neural Network
- $Z_i=W_{i, C-i}(MLP(X))$
- $H=AGG([Z_0, Z_1, \cdots, Z_C])$
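The $Z_i$ and $H$ steps above can be sketched in NumPy. This is a minimal illustration under my own assumptions, not the paper's implementation: the Beta wavelet kernel is taken in its commonly cited form $W_{p,q} = (L/2)^p (I - L/2)^q / (2B(p+1,q+1))$ on the normalized Laplacian, the MLP is replaced by raw features, and AGG is chosen as concatenation.

```python
import numpy as np
from math import factorial

def beta_wavelet_filter(L, p, q):
    # Beta wavelet kernel on the normalized Laplacian (eigenvalues in [0, 2]):
    # W_{p,q} = (L/2)^p (I - L/2)^q / (2 * B(p+1, q+1))
    n = L.shape[0]
    half = L / 2.0
    B = factorial(p) * factorial(q) / factorial(p + q + 1)  # Beta(p+1, q+1) for integer p, q
    return (np.linalg.matrix_power(half, p)
            @ np.linalg.matrix_power(np.eye(n) - half, q)) / (2 * B)

# Toy path graph on 4 nodes (stand-in for a real dataset).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))   # symmetric normalized Laplacian
X = np.random.randn(4, 3)                     # node features (stand-in for MLP(X))
C = 2
Z = [beta_wavelet_filter(L, i, C - i) @ X for i in range(C + 1)]  # Z_i = W_{i,C-i} X
H = np.concatenate(Z, axis=1)                 # AGG chosen as concatenation here
```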
Experiments
Datasets:
- YelpChi
- Amazon
Comparisons:
- MLP/SVM
- GCN/ChebyNet/GAT/GraphSAGE…
- GraphConsis/CAREGNN…
A Comparative Study for Unsupervised Network Representation Learning
Main Idea
The paper summarizes several unsupervised network representation learning (UNRL) approaches over graphs:
- Random Walk based
- Matrix Factorization
- Deep Learning based

They consider 9 UNRL methods and 11 datasets on two tasks, node classification and link prediction, and analyze all of them to give a clear blueprint of the topic.
The involved algorithms are listed in the table:
Experiments
Graph Structure of Neural Networks
Main Idea
The paper treats neural networks as graphs and develops a graph-based representation called the relational graph. They then design a graph generator, WS-flex, to systematically explore the design space of neural networks.
In the Average Path Length vs. Clustering Coefficient figures, they identify "sweet spots" where the best-performing models cluster. They further provide an efficient way to locate a sweet spot.
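The two axes of those figures can be computed directly. Below is a plain-Python sketch of both graph measures on a toy ring lattice (my own illustration, not the paper's WS-flex generator):

```python
from collections import deque
from itertools import combinations

def avg_path_length(adj):
    """Mean shortest-path length over all node pairs (unweighted BFS)."""
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for t in range(s + 1, n):
            total += dist[t]
            pairs += 1
    return total / pairs

def clustering_coeff(adj):
    """Average local clustering coefficient."""
    cs = []
    for u in range(len(adj)):
        nbrs = adj[u]
        k = len(nbrs)
        if k < 2:
            cs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cs.append(2 * links / (k * (k - 1)))
    return sum(cs) / len(adj)

# Ring lattice on 8 nodes, each node connected to 2 neighbours per side.
n = 8
adj = [{(u + d) % n for d in (-2, -1, 1, 2)} for u in range(n)]
C, L = clustering_coeff(adj), avg_path_length(adj)
```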
SIGL: Securing Software Installations Through Deep Graph Learning
Main Idea
The paper designs a tool, SIGL, to detect malicious behavior during software installation. It first builds provenance graphs from system-call activities, then uses a graph LSTM as an autoencoder to generate embeddings. System overview:
- Data Collection and Featurization
- Model Training and Validation
- Anomaly Detection and Prioritization
Key insights
- Formalize the problem of detecting malicious software installation.
- Define Software Installation Graphs and use an autoencoder to generate graph-level embeddings.
Threat model
They define the installation behavior of a software package as a chain of system events which can be displayed as a directed acyclic graph. Nodes represent processes and objects while edges represent interactions. The operating system and audit framework are benign.
Experiments
Dataset: synthetic datasets built on Windows ETW and Linux Audit
They also consider poisoning and adversarial attacks. For poisoning attacks, malware data is added at ratios from 5% to 25%. For adversarial attacks, they design restricted black-box attacks (RBA), including feature and structure attacks, and a practical black-box attack (PBA) based on hierarchical reinforcement learning.
You are what you do: Hunting Stealthy Malware via Data Provenance Analysis
Main Idea
The paper designs a provenance-based approach, ProvDetector, to detect stealthy malware. Its pipeline:
- Provenance graph building
- Representation Extraction
- Rareness-based path selection
- Embedding
- Anomaly Detection
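The rareness-based path selection step can be illustrated with a simplified frequency-based score. The paper's actual regularity score is more involved; the helper names and toy edge data below are hypothetical:

```python
import math
from collections import Counter

def top_k_rare_paths(paths, history, k=2):
    """Score each causal path by the negative log frequency of its edges
    in historical benign data, and keep the K most uncommon paths."""
    freq = Counter(e for p in history for e in p)
    total = sum(freq.values())
    def rareness(path):
        # Unseen edges get a pseudo-count of 1 so the log is defined.
        return sum(-math.log((freq[e] + 1) / (total + 1)) for e in path)
    return sorted(paths, key=rareness, reverse=True)[:k]

# Toy edges written as (parent process, child object) pairs.
history = [[("bash", "curl"), ("curl", "tmpfile")]] * 10   # common benign behavior
paths = [
    [("bash", "curl"), ("curl", "tmpfile")],               # common path
    [("winword", "powershell"), ("powershell", "net")],    # rare path
]
rare = top_k_rare_paths(paths, history, k=1)
```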
Key insights
- Detection of marginal deviation: impersonation-based stealthy malware blends into benign programs, so ProvDetector needs to identify and isolate the marginal outlier events from benign behaviors.
- Scalable model building: provenance graphs can be extremely large, so ProvDetector works on suspicious subgraphs by ranking the top $K$ uncommon causal paths.
Threat Model
The paper concentrates on stealthy malware that leverages legitimate methods or machine services from the victim’s host to exploit and execute malicious activities.
- The operating systems and the loggers are benign or in the trusted computing base (TCB) and the adversary cannot change the records.
- Side-channel attacks are also not considered.
Experiments
They collected process instances from 23 programs. In total, they tested on 1,150 malware samples that hijack benign processes.
They also evaluated the interpretation of detection results.
- Simple models do not perform well because they can only consider one-hop neighborhoods.
- Whole-graph modeling is not a good feature according to their experiments, in which they use graph2vec to detect hijacked-process attacks.
Log2vec: A Heterogeneous Graph Embedding Based Approach for Detecting Cyber Threats within Enterprise
Main Idea
The paper designs an embedding generator to detect malicious logs. It first builds a heterogeneous graph from logs in which each node represents a log event, and they design 10 rules to decide the edges. Basically, the edges capture shared hosts or temporal sequence between nodes; in general, the heterogeneous graph encodes self-defined connections rather than raw node-interaction structure. After generating the graph, they use random walks and word2vec to generate node embeddings, and a clustering algorithm to detect anomalous nodes.
Key insights
- The graph definition is totally different from provenance graphs. It looks like a multi-dimensional log sequence.
- It did not use GNN models. Instead, Random walk + word2vec is used.
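The random-walk half of that pipeline is easy to sketch; the resulting walks would then be fed to word2vec as "sentences". The toy log graph and function below are my own illustration, not the paper's code:

```python
import random

def random_walks(adj, num_walks=5, walk_len=6, seed=0):
    """Generate fixed-length random walks from every node; each walk is a
    'sentence' of node IDs for word2vec-style skip-gram training."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break          # dead end: stop the walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy heterogeneous log graph: node -> list of neighbours (rule-defined edges).
adj = {"logon1": ["file1", "logon2"],
       "logon2": ["logon1"],
       "file1": ["logon1"]}
walks = random_walks(adj)
```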
Experiments
Datasets:
- CERT
- LANL
Learning Causal Effects on Hypergraphs
Main Idea
The paper concentrates on individual treatment effect (ITE) estimation on hypergraphs. One example of ITE estimation is how much an intervention (wearing a face covering) will causally affect an outcome (COVID-19 infection) for each individual node. They use a hypergraph neural network to learn high-order interference.
Key insights
- This is the first work on the problem of ITE estimation under high-order interference on hypergraphs.
- They design a framework that models confounders and high-order interference via representation learning and hypergraph neural networks.
Experiments
Dataset: Semi-synthetic data from Contact, Goodreads and Microsoft Teams.
Multi-Dimensional Network Embedding with Hierarchical Structure
Main Idea
The paper designs a multi-dimensional graph model that places different edge types and node types in different layers/dimensions. For example, if the nodes are users/products and the edges can be viewing or purchasing, we can draw two graphs representing the viewing and purchasing information; the two graphs have the same nodes but different edges.
Model Structure
- For different edge types, use a dimension-specific embedding to record the dimension information.
- For different node types, use an embedding to record the hierarchical structure information.
- Use a skip-gram model to learn the embeddings, and simply add the embeddings together.
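The additive combination can be sketched as follows, assuming one shared base embedding per node plus a dimension-specific offset; all names and shapes here are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 8

# One shared base embedding per node, plus one offset table per edge dimension
# (both would normally be learned jointly by the skip-gram objective).
base = rng.normal(size=(n_nodes, dim))
offsets = {"viewing": rng.normal(size=(n_nodes, dim)),
           "purchasing": rng.normal(size=(n_nodes, dim))}

def node_embedding(node, edge_dim):
    """Final representation = shared base + dimension-specific offset,
    mirroring the 'simply add the embeddings together' step."""
    return base[node] + offsets[edge_dim][node]

v = node_embedding(2, "viewing")
```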
GraphMAE: Self-Supervised Masked Graph Autoencoders
Main Idea
The paper focuses on Self-supervised Learning (SSL) by graph autoencoders (GAEs). It lists four common problems of the current models:
- The structure information is over-emphasized.
- Feature reconstruction without corruption is not robust.
- The mean square error is sensitive and unstable.
- The decoder architectures are of little expressiveness.

They improve the following parts:
- Masked feature reconstruction.
- Scaled cosine error.
- Re-mask decoding.
Background of Self-supervised Learning[1]
- Generative: Use an encoder to generate a middle vector $z$ and use a decoder to rebuild the original input $x$.
- Contrastive: After getting the middle vector $z$, compare the similarity.
- Generative-Contrastive: Based on the generative model, use a large model like ResNet to compare them.
Model Techniques
- Use feature reconstruction as the objective.
- Reconstruct the masked features and apply a uniform random sampling strategy.
- Use a GNN-decoder with re-mask decoding.
- Use scaled cosine error as the criterion.
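The scaled cosine error admits a direct implementation: for masked nodes with original features $x_i$ and reconstructions $z_i$, the loss is the mean of $(1-\cos(x_i, z_i))^\gamma$, where $\gamma > 1$ down-weights samples that are already well reconstructed. A minimal NumPy sketch:

```python
import numpy as np

def scaled_cosine_error(x, z, gamma=2.0):
    """SCE over masked nodes: mean of (1 - cos(x_i, z_i))^gamma."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    cos = (x * z).sum(axis=1)          # row-wise cosine similarity
    return ((1.0 - cos) ** gamma).mean()

x = np.array([[1.0, 0.0], [0.0, 1.0]])   # original features of masked nodes
z = np.array([[1.0, 0.0], [1.0, 0.0]])   # decoder reconstructions
loss = scaled_cosine_error(x, z)          # first row perfect, second orthogonal
```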
Reference:
- [1] Self-supervised Learning: Generative or Contrastive