In this blog, I summarize papers I find interesting and record some thoughts from reading them.
Rethinking Graph Neural Networks for Anomaly Detection
Main Idea
Based on the observation that anomalies cause a "right-shift" of spectral energy from low frequencies to high frequencies, the paper proposes the Beta Wavelet Graph Neural Network (BWGNN) to achieve better anomaly-detection performance.
Key insights
- Spectral Analysis of Graph Anomaly
  - They define the anomaly degree as $\sigma / \mu$, where the graph features are independently drawn from a Gaussian distribution: $x \sim N(\mu e_N, \sigma^2 I_N)$.
  - They observe that as the anomaly degree of nodes increases, the spectral energy concentrates less on low-frequency eigenvalues.
  - They redefine the high-frequency area to avoid the high cost of eigen-decomposition:
  \[S_{high}=\int_0^{\lambda_N} \big(1-f(t)\big)\,dt\]
- Hammond's Graph Wavelet
  - It is composed of a mother wavelet and a group of wavelet bases: $W=(W_{\psi_1}, W_{\psi_2},\cdots)$.
  - $W_{\psi_i}(x)=U g_i(\Lambda) U^T x$, where $g_i(\cdot)$ is a kernel function applied to the eigenvalues and $g_i(\Lambda)=\mathrm{diag}(g_i(\lambda))$.
- Beta Wavelet on Graph
- Beta Wavelet Graph Neural Network
- $Z_i=W_{i, C-i}(MLP(X))$
- $H=AGG([Z_0, Z_1, \cdots, Z_C])$
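The $Z_i$ and $H$ steps above can be sketched in NumPy. This is a minimal illustration under my own assumptions, not the paper's implementation: the Beta wavelet kernel is taken in its commonly cited form $W_{p,q} = (L/2)^p (I - L/2)^q / (2B(p+1,q+1))$ on the normalized Laplacian, the MLP is replaced by raw features, and AGG is chosen as concatenation.

```python
import numpy as np
from math import factorial

def beta_wavelet_filter(L, p, q):
    # Beta wavelet kernel on the normalized Laplacian (eigenvalues in [0, 2]):
    # W_{p,q} = (L/2)^p (I - L/2)^q / (2 * B(p+1, q+1))
    n = L.shape[0]
    half = L / 2.0
    B = factorial(p) * factorial(q) / factorial(p + q + 1)  # Beta(p+1, q+1) for integer p, q
    return (np.linalg.matrix_power(half, p)
            @ np.linalg.matrix_power(np.eye(n) - half, q)) / (2 * B)

# Toy path graph on 4 nodes (stand-in for a real dataset).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))   # symmetric normalized Laplacian
X = np.random.randn(4, 3)                     # node features (stand-in for MLP(X))
C = 2
Z = [beta_wavelet_filter(L, i, C - i) @ X for i in range(C + 1)]  # Z_i = W_{i,C-i} X
H = np.concatenate(Z, axis=1)                 # AGG chosen as concatenation here
```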
Experiments
Datasets:
- YelpChi
- Amazon
Comparisons:
- MLP/SVM
- GCN/ChebyNet/GAT/GraphSAGE…
- GraphConsis/CAREGNN…
A Comparative Study for Unsupervised Network Representation Learning
Main Idea
The paper summarizes several unsupervised network representation learning (UNRL) approaches over graphs:
- Random Walk based
- Matrix Factorization
- Deep Learning based

They consider 9 UNRL methods and 11 datasets on two tasks, node classification and link prediction, and analyze all of them to give a clear blueprint of the topic.
The involved algorithms are listed in the table:
Experiments
Graph Structure of Neural Networks
Main Idea
The paper treats neural networks as graphs and develops a graph-based representation called the relational graph. They then design a graph generator, WS-flex, to systematically explore the design space of neural networks.
In the Average Path Length vs. Clustering Coefficient figures, they identify "sweet spots" where the best-performing models cluster. They further provide an efficient way to locate a sweet spot.
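The two axes of those figures can be computed directly. Below is a plain-Python sketch of both graph measures on a toy ring lattice (my own illustration, not the paper's WS-flex generator):

```python
from collections import deque
from itertools import combinations

def avg_path_length(adj):
    """Mean shortest-path length over all node pairs (unweighted BFS)."""
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for t in range(s + 1, n):
            total += dist[t]
            pairs += 1
    return total / pairs

def clustering_coeff(adj):
    """Average local clustering coefficient."""
    cs = []
    for u in range(len(adj)):
        nbrs = adj[u]
        k = len(nbrs)
        if k < 2:
            cs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cs.append(2 * links / (k * (k - 1)))
    return sum(cs) / len(adj)

# Ring lattice on 8 nodes, each node connected to 2 neighbours per side.
n = 8
adj = [{(u + d) % n for d in (-2, -1, 1, 2)} for u in range(n)]
C, L = clustering_coeff(adj), avg_path_length(adj)
```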
SIGL: Securing Software Installations Through Deep Graph Learning
Main Idea
The paper designs a tool, SIGL, to detect malicious behavior during software installation. It first builds provenance graphs from system-call activities, then uses a graph LSTM as an autoencoder to generate embeddings. System overview:
- Data Collection and Featurization
- Model Training and Validation
- Anomaly Detection and Prioritization
Key insights
- Formalize the problem of detecting malicious software installation.
- Define Software Installation Graphs and use an autoencoder to generate graph-level embeddings.
Threat model
They define the installation behavior of a software package as a chain of system events which can be displayed as a directed acyclic graph. Nodes represent processes and objects while edges represent interactions. The operating system and audit framework are benign.
Experiments
Dataset: synthetic datasets built on Windows ETW and Linux Audit
They also consider poisoning and adversarial attacks. For poisoning attacks, malware data is added at ratios from 5% to 25%. For adversarial attacks, they design restricted black-box attacks (RBA), including feature and structure attacks, and a practical black-box attack (PBA) based on hierarchical reinforcement learning.
You are what you do: Hunting Stealthy Malware via Data Provenance Analysis
Main Idea
The paper designs a provenance-based approach, ProvDetector, to detect stealthy malware. Its pipeline:
- Provenance graph building
- Representation Extraction
- Rareness-based path selection
- Embedding
- Anomaly Detection
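The rareness-based path selection step can be illustrated with a simplified frequency-based score. The paper's actual regularity score is more involved; the helper names and toy edge data below are hypothetical:

```python
import math
from collections import Counter

def top_k_rare_paths(paths, history, k=2):
    """Score each causal path by the negative log frequency of its edges
    in historical benign data, and keep the K most uncommon paths."""
    freq = Counter(e for p in history for e in p)
    total = sum(freq.values())
    def rareness(path):
        # Unseen edges get a pseudo-count of 1 so the log is defined.
        return sum(-math.log((freq[e] + 1) / (total + 1)) for e in path)
    return sorted(paths, key=rareness, reverse=True)[:k]

# Toy edges written as (parent process, child object) pairs.
history = [[("bash", "curl"), ("curl", "tmpfile")]] * 10   # common benign behavior
paths = [
    [("bash", "curl"), ("curl", "tmpfile")],               # common path
    [("winword", "powershell"), ("powershell", "net")],    # rare path
]
rare = top_k_rare_paths(paths, history, k=1)
```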
Key insights
- Detection of marginal deviation: impersonation-based stealthy malware blends into benign programs, so ProvDetector needs to identify and isolate the marginal outlier events from benign behaviors.
- Scalable model building: provenance graphs can be extremely large, so ProvDetector works on suspicious subgraphs by ranking the top $K$ uncommon causal paths.
Threat Model
The paper concentrates on stealthy malware that leverages legitimate methods or machine services from the victim’s host to exploit and execute malicious activities.
- The operating systems and the loggers are benign or in the trusted computing base (TCB) and the adversary cannot change the records.
- Side-channel attacks are also not considered.
Experiments
They collected process instances from 23 programs. In total, they tested on 1,150 malware samples that hijack benign processes.
They also evaluated the interpretation of detection results.
- Simple models do not perform well because they can only consider one-hop neighborhoods.
- Whole-graph modeling is not a good feature according to their experiments, in which they use graph2vec to detect hijacked-process attacks.
Log2vec: A Heterogeneous Graph Embedding Based Approach for Detecting Cyber Threats within Enterprise
Main Idea
The paper designs an embedding generator to detect malicious logs. It first builds a heterogeneous graph from logs in which each node represents a log event, and they design 10 rules to decide the edges. Basically, the edges capture shared hosts or temporal sequence between nodes; in general, the heterogeneous graph encodes self-defined connections rather than raw node-interaction structure. After generating the graph, they use random walks and word2vec to generate node embeddings, and a clustering algorithm to detect anomalous nodes.
Key insights
- The graph definition is totally different from provenance graphs. It looks like a multi-dimensional log sequence.
- It did not use GNN models. Instead, Random walk + word2vec is used.
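The random-walk half of that pipeline is easy to sketch; the resulting walks would then be fed to word2vec as "sentences". The toy log graph and function below are my own illustration, not the paper's code:

```python
import random

def random_walks(adj, num_walks=5, walk_len=6, seed=0):
    """Generate fixed-length random walks from every node; each walk is a
    'sentence' of node IDs for word2vec-style skip-gram training."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break          # dead end: stop the walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy heterogeneous log graph: node -> list of neighbours (rule-defined edges).
adj = {"logon1": ["file1", "logon2"],
       "logon2": ["logon1"],
       "file1": ["logon1"]}
walks = random_walks(adj)
```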
Experiments
Datasets:
- CERT
- LANL
Learning Causal Effects on Hypergraphs
Main Idea
The paper concentrates on individual treatment effect (ITE) estimation on hypergraphs. One example of ITE estimation is how much an intervention (wearing a face covering) will causally affect an outcome (COVID-19 infection) for each individual node. They use a hypergraph neural network to learn high-order interference.
Key insights
- This is the first work on the problem of ITE estimation under high-order interference on hypergraphs.
- They design a framework that models confounders and high-order interference via representation learning and hypergraph neural networks.
Experiments
Dataset: Semi-synthetic data from Contact, Goodreads and Microsoft Teams.
Multi-Dimensional Network Embedding with Hierarchical Structure
Main Idea
The paper designs a multi-dimensional graph model that places different edge types and node types in different layers/dimensions. For example, if the nodes are users/products and the edges can be viewing or purchasing, we can draw two graphs representing the viewing and purchasing information; the two graphs have the same nodes but different edges.
Model Structure
- For different edge types, use a dimension-specific embedding to record the dimension information.
- For different node types, use an embedding to record the hierarchical structure information.
- Use a skip-gram model to learn the embeddings, and simply add the embeddings together.
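The additive combination can be sketched as follows, assuming one shared base embedding per node plus a dimension-specific offset; all names and shapes here are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 8

# One shared base embedding per node, plus one offset table per edge dimension
# (both would normally be learned jointly by the skip-gram objective).
base = rng.normal(size=(n_nodes, dim))
offsets = {"viewing": rng.normal(size=(n_nodes, dim)),
           "purchasing": rng.normal(size=(n_nodes, dim))}

def node_embedding(node, edge_dim):
    """Final representation = shared base + dimension-specific offset,
    mirroring the 'simply add the embeddings together' step."""
    return base[node] + offsets[edge_dim][node]

v = node_embedding(2, "viewing")
```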
GraphMAE: Self-Supervised Masked Graph Autoencoders
Main Idea
The paper focuses on Self-supervised Learning (SSL) by graph autoencoders (GAEs). It lists four common problems of the current models:
- The structure information is over-emphasized.
- Feature reconstruction without corruption is not robust.
- The mean square error is sensitive and unstable.
- The decoder architectures are of little expressiveness.

They improve the following parts:
- Masked feature reconstruction.
- Scaled cosine error.
- Re-mask decoding.
Background of Self-supervised Learning[1]
- Generative: Use an encoder to generate a middle vector $z$ and use a decoder to rebuild the original input $x$.
- Contrastive: After getting the middle vector $z$, compare the similarity.
- Generative-Contrastive: Based on the generative model, use a large model like ResNet to compare them.
Model Techniques
- Use feature reconstruction as the objective.
- Reconstruct the masked features and apply a uniform random sampling strategy.
- Use a GNN-decoder with re-mask decoding.
- Use scaled cosine error as the criterion.
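The scaled cosine error admits a direct implementation: for masked nodes with original features $x_i$ and reconstructions $z_i$, the loss is the mean of $(1-\cos(x_i, z_i))^\gamma$, where $\gamma > 1$ down-weights samples that are already well reconstructed. A minimal NumPy sketch:

```python
import numpy as np

def scaled_cosine_error(x, z, gamma=2.0):
    """SCE over masked nodes: mean of (1 - cos(x_i, z_i))^gamma."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    cos = (x * z).sum(axis=1)          # row-wise cosine similarity
    return ((1.0 - cos) ** gamma).mean()

x = np.array([[1.0, 0.0], [0.0, 1.0]])   # original features of masked nodes
z = np.array([[1.0, 0.0], [1.0, 0.0]])   # decoder reconstructions
loss = scaled_cosine_error(x, z)          # first row perfect, second orthogonal
```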
Reference:
- [1] Self-supervised Learning: Generative or Contrastive