From 7bab2d30b971b9201cc7dc1b5f4ecdc6f179f900 Mon Sep 17 00:00:00 2001
From: = <=>
Date: Thu, 9 Jan 2020 13:16:26 +0100
Subject: [PATCH] l

---
 notes.org | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/notes.org b/notes.org
index 8c3b86d..8ced776 100644
--- a/notes.org
+++ b/notes.org
@@ -69,3 +69,36 @@
 - When looking at attribute-based noise, REGAL mostly outperforms FINAL (which uses a proximity embedding but handles attributes) in both accuracy and runtime. FINAL achieves slightly higher accuracy under small noise, due to its reliance on attributes, but incurs significant runtime increases as it uses the extra attribute information.
 - The sensitivity to parameter changes is shown to be quite significant, and they conclude that the discount factor \delta should be between 0.01 and 0.1, and the hop distance K should be less than 3. Setting the structural and attributed similarity weights to 1 does fairly well, and the top-a accuracy (for a > 1, such as a = 5 or 10) is significantly better than top-1. A higher number of landmarks means higher accuracy; it should be $p = t \log_2(n)$ for $t \approx 10$.
 - So it is highly scalable, suitable for cross-network analysis, leverages the power of structural identity, does not require any prior alignment information, is robust to different settings and datasets, and is very fast and quite accurate.
+** Low Rank Spectral Network Alignment
+*** Intro
+- EigenAlign requires memory linear in the size of the graphs, whereas most other methods require quadratic memory.
+- The key step to this insight is identifying low-rank structure in the node-similarity matrix used by EigenAlign for determining matches.
+- With an exact, closed-form low-rank structure, we then solve a maximum weight bipartite matching problem on that low-rank matrix to produce the matching between the graphs.
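The matching step on low-rank factors can be sketched as follows. This is my own toy illustration, not the paper's code: `U`, `V`, and the sizes are made up, and the dense matrix `X` is formed explicitly only because the example is tiny (the whole point of the low-rank structure is to avoid materialising `X` on large instances).

```python
# Toy sketch: given hypothetical low-rank factors U, V of the similarity
# matrix X = U V^T, recover a node matching by solving a maximum-weight
# bipartite matching on X.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_A, n_B, k = 5, 5, 2                  # toy sizes; k is the rank
U = rng.random((n_A, k))               # made-up low-rank factors
V = rng.random((n_B, k))

X = U @ V.T                            # similarity matrix (rank <= k)
rows, cols = linear_sum_assignment(X, maximize=True)  # max-weight matching
matching = list(zip(rows, cols))       # node u in G_A matched to node v in G_B
```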
+- For this task, we show a new, a-posteriori approximation bound for a simple algorithm that approximates a maximum weight bipartite matching problem on a low-rank matrix.
+- There are two major approaches to network alignment problems: local network alignment, where the goal is to find local regions of the graph that are similar to any given node, and global network alignment, where the goal is to understand how two large graphs would align to each other.
+- The EigenAlign method uses the dominant eigenvector of a matrix related to the product graph of the two networks in order to estimate similarity. The eigenvector information is rounded into a matching between the vertices of the graphs by solving a maximum-weight bipartite matching problem on a dense bipartite graph.
+- A key innovation of EigenAlign is that it explicitly models nodes that may not have a match in the other network. In this way, it provably aligns many simple graph models, such as Erdős-Rényi graphs, when the graphs do not have too much noise.
+  + Even so, it still suffers from the quadratic memory requirement.
+*** Network Alignment formulations
+**** The Canonical Network Alignment problem
+- In some cases we additionally receive information about which nodes in one network can be paired with nodes in the other. This additional information is presented in the form of a bipartite graph whose edge weights are stored in a matrix L; if L_uv > 0, this indicates outside evidence that node u in G_A should be matched to node v in G_B.
+**** Objective Functions for Network Alignment
+- Describes the problem as seeking a matrix P which has a 1 in entry (u, v) if u is matched with (only) v in the other graph.
+- We then seek a matrix P which maximises the number of overlapping edges between G_A and G_B, i.e. adjacent node pairs should be mapped to adjacent node pairs in the other graph. This gives an integer quadratic program.
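As a sanity check on the overlap objective (my own toy example, not from the paper): for a permutation matrix P and undirected adjacency matrices A, B, the number of edges of G_A mapped onto edges of G_B is $\mathrm{trace}(A P B P^T)/2$, since each undirected overlap is counted once in each direction.

```python
# Toy check of the overlap count maximised by the integer quadratic program.
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])          # G_A: path 0-1-2
B = np.array([[0, 0, 1],
              [0, 0, 1],
              [1, 1, 0]])          # G_B: path 1-2-0

def overlaps(P):
    # each undirected overlap is counted twice in the trace
    return np.trace(A @ P @ B @ P.T) / 2

P_id = np.eye(3)                   # identity matching: only edge 1-2 overlaps
P_rot = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])      # 0->1, 1->2, 2->0: both edges overlap
```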
+- There is no downside to matches that do not produce an overlap, i.e. edges in G_A which are mapped to non-edges (node pairs with no edge between them) in G_B, or vice versa.
+- They define $AlignmentScore(P) = s_O \cdot (\#overlaps) + s_N \cdot (\#noninformatives) + s_C \cdot (\#conflicts)$, where the different $s$ are weights such that $s_O > s_N > s_C$. This score defines a (massive) matrix $M$, which is used to define a quadratic assignment problem equivalent to maximising the AlignmentScore.
+- One can however solve an eigenvector equation instead of the quadratic program, which is what EigenAlign does:
+  1) Find the eigenvector $x$ of M that corresponds to the eigenvalue of largest magnitude. M is of dimension $n_A n_B \times n_A n_B$, where $n_A$ and $n_B$ are the number of nodes in G_A and G_B, so the eigenvector is of dimension $n_A n_B$ and can thus be reshaped into a matrix X of size $n_A \times n_B$, where each entry represents a score for a pair of nodes between the two graphs. This is the similarity matrix, as it reflects the topological similarity between vertices of G_A and G_B.
+  2) Run bipartite matching on the similarity matrix X, maximising the total weight of the final alignment.
+- The authors show that the similarity matrix X can be represented through an exact low-rank factorisation. This allows them to avoid the quadratic storage requirement of EigenAlign. They also present new fast techniques for bipartite matching problems on low-rank matrices. Together, this yields a far more scalable algorithm.
+*** Low Rank Factors of EigenAlign
+- Use power iteration (an algorithm that computes the dominant eigenvector and eigenvalue of a diagonalisable matrix by repeated multiplication and normalisation) on M to find the dominant eigenvector, which can then be reshaped into the similarity matrix X. This can also be posed as an optimisation problem.
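A minimal power-iteration sketch (illustrative only, not the paper's implementation; the small symmetric matrix below stands in for the huge $n_A n_B \times n_A n_B$ matrix M):

```python
# Power iteration: repeated multiplication by M converges to the eigenvector
# of the eigenvalue of largest magnitude, which EigenAlign would reshape into
# the n_A x n_B similarity matrix X.
import numpy as np

def power_iteration(M, iters=200):
    x = np.ones(M.shape[0]) / np.sqrt(M.shape[0])  # uniform starting vector
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)      # renormalise to avoid overflow
    return x

M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])     # small symmetric stand-in for M

x = power_iteration(M)
# dominant eigenvector from a direct solver, for comparison
w, vecs = np.linalg.eigh(M)
v = vecs[:, np.argmax(np.abs(w))]
```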
+- If the matrix X is estimated with the power method starting from a rank-1 matrix, then the kth iteration of the power method results in a rank-(k+1) matrix that can be explicitly and exactly computed.
+- We wish to show that the matrix X can be factorised via only a two-factor decomposition: $X_k = UV^T$ for X of rank k.
+** Cross-Network Embedding for Multi-Network Alignment
+*** Intro
+- Recently, data mining that analyses the complex structure and diverse relationships of multiple networks has attracted much attention in both academia and industry. One crucial prerequisite for this kind of multi-network mining is to map the nodes across different networks, i.e. the so-called network alignment.
+- CrossMNA performs multi-network alignment by investigating structural information only.
+- Uses two types of node embedding vectors:
+  1) Inter-vector for network alignment
+  2) Intra-vector for other downstream network analysis tasks