* Graph alignment
** REGAL
*** Intro
- network alignment, or the task of identifying corresponding nodes in different networks, has applications across the social and natural sciences.
- REGAL (REpresentation learning-based Graph ALignment) + Motivated by recent advancements in node representation learning for single-graph tasks
+ a framework that leverages the power of automatically learned node representations to match nodes across different graphs.
- xNetMF, an elegant and principled node embedding formulation that uniquely generalizes to multi-network problems.
- Network alignment, or matching, is the problem of finding corresponding nodes in different networks. + Crucial for identifying similar users in different social networks or for analysing chemical compounds
- Many existing methods try to relax the computationally hard optimization problem, as designing features that can be directly compared for nodes in different networks is not an easy task.
- we propose network alignment via matching latent, learned node representations.
- *Problem:* Given two graphs G_1 and G_2 with nodesets V_1 and V_2 and possibly node attributes A_1 and A_2 resp., devise an efficient network alignment method that aligns nodes by learning directly comparable node representations Y_1 and Y_2, from which a node mapping $\phi: V_1 \rightarrow V_2$ between the networks can be inferred.
- REGAL is a framework that efficiently identifies node matchings by greedily aligning their latent feature representations.
- They use Cross-Network Matrix Factorization (xNetMF) to learn the representations
+ xNetMF preserves structural similarities rather than proximity-based similarities, allowing for generalization beyond a single network. + xNetMF is formulated as matrix factorization over a similarity matrix which incorporates structural similarity and attribute agreement between nodes in disjoint graphs.
+ Constructing the similarity matrix is expensive, as it requires computing similarities between all pairs of nodes across the networks, so they extend the Nyström low-rank approximation, which is commonly used for large-scale kernel machines. + This makes xNetMF a principled and efficient implicit matrix factorization-based approach.
- our approach can be applied to attributed and unattributed graphs with virtually no change in formulation, and is unsupervised: it does not require prior alignment information to find high-quality matchings.
- Many well-known node embedding methods based on shallow architectures such as the popular skip-gram with negative sampling (SGNS) have been cast in matrix factorization frameworks. However, ours is the first to cast node embedding using SGNS to capture structural identity in such a framework
- we consider the significantly harder problem of learning embeddings that may be individually matched to infer node-level alignments.
*** REGAL Description
- Let G_1(V_1, E_1) and G_2(V_2, E_2) be two unweighted and undirected graphs (described in the setting of two graphs, but can be extended to more), with node sets V_1 and V_2 and edge sets E_1 and E_2; and possible node attribute sets A_1 and A_2.
+ The graphs do not have to be the same size
- Let n = |V_1| + |V_2|, the total number of nodes across the two graphs.
- The steps are then:
1) *Node Identity Extraction:* Extract structure and attribute-related info from all n nodes
2) *Efficient Similarity-based Representation:* Obtains node embeddings, conceptually by factorising a similarity matrix of the node identities from step 1. However, computing and factorising this similarity matrix is expensive, so they extend the Nyström method for low-rank matrix approximation to perform an implicit similarity matrix factorisation by *(a)* comparing the similarity of each node only to a sample of $p \ll n$ so-called "landmark" nodes and *(b)* using these node-to-landmark similarities to construct the representations from a decomposition of its low-rank approximation.
3) *Fast Node Representation Alignment:* Align nodes between the two graphs by greedily matching the embeddings with an efficient data structure (KD-tree) that allows for fast identification of the top-a most similar embeddings from the other graph.
- The first two steps are the xNetMF method
**** Step 1
- The goal of REGAL’s representation learning module, xNetMF, is to define node “identity” in a way that generalizes to multi-network problems.
- As nodes in multi-network problems have no direct connections to each other, their proximity can't be sampled by random walks on separate graphs. This is overcome by instead focusing on more broadly comparable, generalisable quantities: Structural Identity which relates to structural roles and Attribute-Based Identity.
- *Structural Identity*: In network alignment, the well-established assumption is that aligned nodes have similar structural connectivity or degrees. Thus, the degrees of a node's neighbours can be used as its structural identity, also considering neighbors up to $k$ hops from the original node. + For some node $u \in V$, let $R_u^k$ be the set of nodes at exactly $k$ hops from $u$. We could capture the degrees of these nodes in a vector $d_u^k$ of length $D$, the highest degree in the graph, where entry $d_u^k(i)$ denotes the number of nodes in $R_u^k$ of degree $i$. This vector would however be very long and very sparse if even a single node has a high degree, forcing up the length of $d_u^k$. Instead, nodes are binned into $b = \lceil \log_2(D) \rceil$ logarithmically scaled buckets, with entry $i$ of $d_u^k$ counting the nodes $v \in R_u^k$ such that $\lfloor \log_2(\deg(v)) \rfloor = i$. This is both much shorter (length $\lceil \log_2(D) \rceil$) and more robust to noise.
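A minimal sketch of this log-binned counting (the function name, the adjacency-dict graph representation, and the bucket-count convention $b = \lfloor \log_2 D \rfloor + 1$ are illustrative assumptions, not the paper's code):

```python
import math

def degree_features(adj, u, K=2, delta=0.5):
    """Log-binned degree histogram of u's k-hop neighborhoods (the paper's d_u),
    aggregated over hops 1..K with discount factor delta.
    adj: dict mapping node -> set of neighbors (no isolated nodes assumed)."""
    D = max(len(nbrs) for nbrs in adj.values())   # max degree in the graph
    b = int(math.floor(math.log2(D))) + 1         # number of log-scale buckets
    d_u = [0.0] * b
    seen, frontier = {u}, {u}
    for k in range(1, K + 1):
        # R_u^k: nodes at exactly k hops from u
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
        for v in frontier:
            # node of degree d lands in bucket floor(log2(d)), discounted by hop
            d_u[int(math.log2(len(adj[v])))] += delta ** (k - 1)
    return d_u
```

The discounted aggregation over hops is folded in here directly, so the return value is the combined vector $d_u$ rather than the per-hop $d_u^k$.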
- *Attribute-Based Identity*: Given $F$ node attributes, they create for each node $u$ an $F$-dimensional vector $f_u$ representing the values of $u$. So $f_u(i)$ = the i'th attribute of $u$.
- *Cross-Network Node Similarity*: Relies on the structural and attribute information rather than direct proximity: $sim(u,v) = \exp[-\gamma_s \cdot \left\lVert d_u - d_v \right\rVert_2^2 - \gamma_a \cdot dist(f_u, f_v)]$, where $\gamma_s, \gamma_a$ are scalar parameters controlling the effect of the structural and attribute-based identities, $dist(f_u, f_v)$ is the attribute-based distance between nodes $u$ and $v$, and $d_u = \sum_{k=1}^{K} \delta^{k-1} d_u^k$ is the neighbor degree vector for $u$ aggregated over $K$ different hops, where $\delta$ is a discount factor for greater hop distances and $K$ is the maximum hop distance to consider. So they compare structural identities at several levels by combining the neighborhood degree distributions at several hop distances. The distance between attribute-based identities depends on the type of node attributes (real-valued, categorical, and so on). For categorical attributes, the number of disagreeing features can be used as an attribute-based distance measure.
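The similarity function can be transcribed almost directly from the formula (a sketch; the categorical disagreement count is used as dist, and the default $\gamma$ values are placeholders):

```python
import math

def sim(d_u, d_v, f_u=None, f_v=None, gamma_s=1.0, gamma_a=1.0):
    """Cross-network similarity: squared Euclidean distance between the
    aggregated degree vectors, plus (optionally) the number of disagreeing
    categorical attributes, mapped through exp(-.)."""
    struct = sum((a - b) ** 2 for a, b in zip(d_u, d_v))
    attr = sum(a != b for a, b in zip(f_u, f_v)) if f_u is not None else 0
    return math.exp(-gamma_s * struct - gamma_a * attr)
```

Identical identities give similarity 1; any structural or attribute disagreement decays it toward 0.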
**** Step 2
- Avoids random walks for two reasons:
1) The variance they introduce in the representation learning often makes embeddings across different networks non-comparable
2) they can add to the computational expense. For example, node2vec’s total runtime is dominated by its sampling time.
- Use an implicit matrix factorisation-based approach that leverages a combined structural and attribute-based similarity matrix S, which is a result of the sim function from step 1, and considers similarities at different neighborhoods.
- We need to find $n \times p$ matrices $Y$ and $Z$ such that $S \approx YZ^T$, where $Y$ is the node embedding matrix and $Z$ is discarded. Thus, we need to find these node embeddings $Y$ WITHOUT actually computing $S$.
+ Finding Y could naturally be done by computing S via sim() and then factorising it, e.g. by minimising the Frobenius norm of the reconstruction error. This is very expensive though. + It could also be done by creating a sparse matrix, computing only the "most important" similarities for each node and choosing only a small number of comparisons, for instance by looking at similarity of node degree. This is fragile to noise though.
- Instead, S is approximated with a low-rank matrix $\tilde{S}$ which is never explicitly computed. We randomly select $p \ll n$ "landmark" nodes across both graphs G_1 and G_2 and compute their similarities to all $n$ nodes in these graphs using the sim() function. This yields an $n \times p$ similarity matrix $C$ (note that we only compute similarities against the $p$ landmark nodes). From $C$ we can extract a $p \times p$ "landmark-to-landmark" submatrix $W$. $C$ and $W$ can be used to approximate the full similarity matrix, which then allows us to obtain the node embeddings without ever computing and factorising the approximate similarity matrix $\tilde{S}$. To accomplish this, they extend the Nyström method such that the low-rank approximation is given as $\tilde{S} = CW^{\dag}C^T$, where $C$ is the landmark-to-all similarity matrix and $W^\dag$ is the (Moore-Penrose) pseudoinverse of $W$, the landmark-to-landmark similarity matrix. The landmark nodes are chosen randomly, as more elaborate selection methods, such as ranking by node centrality, are much less efficient and offer little to no improvement. Since $\tilde{S}$ contains an estimate of all similarities within the graphs, it would still take $n^2$ space, but luckily we never have to compute it.
- We can actually get the node embeddings $Y$ from a decomposition of the equation for $\tilde{S}$.
- Given graphs G_1(V_1, E_1) and G_2(V_2, E_2) with $n \times n$ joint combined structural and attribute-based similarity matrix $S \approx YZ^T$, the node embeddings $Y$ can be approximated as $\tilde{Y} \approx CU\Sigma^{1/2}$, where $C$ is the $n \times p$ landmark-to-all matrix and $W^\dag = U\Sigma V^T$ is the full-rank singular value decomposition of the pseudoinverse of the small $p \times p$ landmark-to-landmark similarity matrix $W$.
+ Given the full rank SVD of the $p \times p$ matrix $W^\dag$ as $U\Sigma V^T$, we can write $S \approx \tilde{S} = C(U\Sigma V^T) C^T = (CU\Sigma^{1/2})(\Sigma^{1/2}V^T C^T) = \tilde{Y} \tilde{Z}^T$. + So we can compute $\tilde{Y}$ from the matrix $C$ and the SVD of $W^\dag$, which is cheap because $W^\dag$ is only $p \times p$. The p-dimensional node embeddings of the two graphs are then row subsets of $\tilde{Y}$.
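The whole of step 2 fits in a few lines of NumPy. A toy sketch under stated assumptions: structural features only (attributes omitted), a Gaussian kernel over the step-1 feature vectors, and `xnetmf_embeddings` as a hypothetical name:

```python
import numpy as np

def xnetmf_embeddings(features, p=4, seed=0, gamma=1.0):
    """Implicit-factorization embeddings (xNetMF-style sketch).
    features: (n, b) array of step-1 identity vectors for BOTH graphs stacked.
    Returns the (n, p) embeddings Y~ = C U Sigma^{1/2}."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    landmarks = rng.choice(n, size=p, replace=False)   # random landmark nodes
    # C: node-to-landmark similarities via a Gaussian kernel
    diff = features[:, None, :] - features[landmarks][None, :, :]
    C = np.exp(-gamma * (diff ** 2).sum(axis=2))       # (n, p)
    W = C[landmarks]                                   # landmark-to-landmark block
    U, sigma, VT = np.linalg.svd(np.linalg.pinv(W))    # SVD of the p x p pseudoinverse
    return C @ U @ np.diag(np.sqrt(sigma))             # never forms S~ = C W+ C^T
```

Note that the $n \times n$ matrix $\tilde{S}$ is never materialized; only $C$ ($n \times p$) and $W$ ($p \times p$) are.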
**** Step 3
- We now have to efficiently align nodes, assuming that $u \in V_1$ and $v \in V_2$ may match if their xNetMF embeddings are similar. Let $\tilde{Y}_1$ and $\tilde{Y}_2$ denote the matrices of the p-dimensional embeddings of G_1 and G_2.
- The likelihood of a (soft) alignment is taken to be proportional to the similarity between the nodes' embeddings. Thus, nodes are greedily aligned to their closest match in the other graph based on embedding similarity.
- A naive way of finding alignments would be to compute similarities between all pairs of node embeddings (the rows of $\tilde{Y}_1$ and $\tilde{Y}_2$) and then choose the top-1 for each node. This is inefficient though.
- Instead, the embeddings of $\tilde{Y}_2$ are stored in a k-d tree, which accelerates exact similarity search for nearest-neighbor algorithms. For each node in G_1 we then query this tree with its embedding to find the $a \ll n$ closest embeddings from nodes in G_2. This allows computing "soft" alignments that return one or more nodes with the most similar embeddings. The similarity between the p-dimensional embeddings of $u$ and $v$ is defined as $sim_{emb}(\tilde{Y}_1[u], \tilde{Y}_2[v]) = e^{-\left\lVert \tilde{Y}_1[u] - \tilde{Y}_2[v] \right\rVert_2^2}$, converting Euclidean distance to similarity.
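With SciPy's k-d tree this step is short. A sketch (function name and `top_a` default are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def align(Y1, Y2, top_a=3):
    """Greedy soft alignment: for each node of G_1 (row of Y1), return the
    indices of the top_a nearest embeddings in G_2 (rows of Y2), plus the
    similarities exp(-||.||^2), using a k-d tree for the neighbor search."""
    tree = cKDTree(Y2)                      # index the G_2 embeddings once
    dists, idx = tree.query(Y1, k=top_a)    # exact Euclidean nearest neighbors
    return idx, np.exp(-dists ** 2)         # convert distances to similarities
```

For hard (top-1) alignments one would simply keep `idx[:, 0]`.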
*** Complexity Analysis
- We assume both graphs have $n_1 = n_2 = n$ nodes.
1) *Extracting Node Identity*: Takes approximately $O(nKd_{avg}^2)$ time to find neighborhoods up to distance $K$, by joining the neighborhoods of neighbors at the previous hop: for node $u$, $R_u^k = \bigcup_{v \in R_u^{k-1}} R_v^1 - \bigcup_{i=1}^{k-1} R_u^i$. Could also be solved using breadth-first search in time $O(n^3)$.
2) *Computing Similarities*: Similarities are computed of the length-b features (weighted counts of node degrees in the k-hop neighborhoods split into b buckets) between each node and the p landmark nodes in time: $O(npb)$
3) *Obtaining Representations*: Constructing the pseudoinverse $W^\dag$ and computing the SVD of this $p \times p$ matrix takes $O(p^3)$ time, and multiplying with $C$ takes $O(np^2)$. Since $p \ll n$, the total time is $O(np^2)$.
4) *Aligning Embeddings*: Constructing the k-d tree and using it to find the top alignments in G_2 for each of the n nodes in G_1 takes average-case time $O(n \log n)$.
- Total time complexity is then: $O(n \cdot \max(pb,\; p^2,\; Kd_{avg}^2,\; \log n))$
- It suffices to pick small values for $K$ and $p$ and to pick $b$ logarithmically in $n$. $d_{avg}$, the average node degree, is oftentimes small in practice.
*** Experiments
- They test on networks by taking a real network dataset with adjacency matrix $A$ and generating a new network with adjacency matrix $A' = PAP^T$, where $P$ is a randomly generated permutation matrix. Structural noise is added to $A'$ by removing edges with probability $p_s$ without disconnecting any nodes.
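A sketch of this evaluation setup (the "do not isolate a node" check below is a simplification of the paper's "without disconnecting any nodes"; the function name is illustrative):

```python
import numpy as np

def permuted_noisy_copy(A, p_s=0.1, seed=0):
    """Build the test pair: permute A (A' = P A P^T via index shuffling),
    then remove each edge independently with probability p_s, skipping
    removals that would leave an endpoint with no edges at all."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    perm = rng.permutation(n)
    Ap = A[np.ix_(perm, perm)]                 # permuted copy of A
    edges = np.transpose(np.triu(Ap, 1).nonzero())
    for i, j in edges:
        if rng.random() < p_s and Ap[i].sum() > 1 and Ap[j].sum() > 1:
            Ap[i, j] = Ap[j, i] = 0            # drop edge, keep both endpoints alive
    return Ap, perm                            # perm is the ground-truth alignment
```

The returned `perm` serves as ground truth when scoring an alignment method's accuracy.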
- For experiments with attributes, they generate synthetic attributes for nodes, if the graph does not have any. Noise is added by flipping binary values or choosing values randomly with probability $p_a$.
- In accuracy as structural noise is added, REGAL using xNetMF and REGAL using struct2vec (another way of computing the embeddings) far outperform the other algorithms. struct2vec samples so-called contexts, which adds variance to its embeddings, which is likely why xNetMF wins at low noise. As noise grows, however, struct2vec wins in accuracy, but not in speed.
- When looking at attribute-based noise, REGAL mostly outperforms FINAL (which uses a proximity embedding but handles attributes) in both accuracy and runtime. FINAL achieves slightly higher accuracy at small noise levels due to its reliance on attributes, but incurs significant runtime increases as it uses the extra attribute information.
- The sensitivity to parameter changes is quite significant. They conclude that the discount factor $\delta$ should be between 0.01 and 0.1, and the hop distance $K$ should be less than 3. Setting the structural and attribute similarity weights to 1 does fairly well, and top-a accuracy (using more than 1, such as 5 or 10) is significantly better than top-1. A higher number of landmarks means higher accuracy; it should be $p = t \cdot \log_2(n)$ for $t \approx 10$.
- So: REGAL is highly scalable, suitable for cross-network analysis, leverages the power of structural identity, does not require any prior alignment information, is robust to different settings and datasets, and is very fast and quite accurate.
** Low Rank Spectral Network Alignment
*** Intro
- Their low-rank version of EigenAlign requires memory linear in the size of the graphs, whereas most other methods (including the original EigenAlign) require quadratic memory.
- The key step to this insight is identifying low-rank structure in the node-similarity matrix used by EigenAlign for determining matches.
- With an exact, closed-form low-rank structure, we then solve a maximum weight bipartite matching problem on that low-rank matrix to produce the matching between the graphs.
- For this task, we show a new, a-posteriori, approximation bound for a simple algorithm to approximate a maximum weight bipartite matching problem on a low-rank matrix.
- There are two major approaches to network alignment problems: local network alignment, where the goal is to find local regions of the graph that are similar to any given node, and global network alignment, where the goal is to understand how two large graphs would align to each other.
- The EigenAlign method uses the dominant eigenvector of a matrix related to the product-graph between the two networks in order to estimate the similarity. The eigenvector information is rounded into a matching between the vertices of the graphs by solving a maximum-weight bipartite matching problem on a dense bipartite graph
- A key innovation of EigenAlign is that it explicitly models nodes that may not have a match in the other network. In this way, it is able to provably align many simple graph models, such as Erdős-Rényi graphs, when they do not have too much noise.
+ Even so, it still suffers from the quadratic memory requirement.
*** Network Alignment formulations
**** The Canonical Network Alignment problem
- In some cases we additionally receive information about which nodes in one network can be paired with nodes in the other. This additional information is presented in the form of a bipartite graph whose edge weights are stored in a matrix L; if L_uv > 0, this indicates outside evidence that node u in G_A should be matched to node v in G_B.
**** Objective Functions for Network Alignment
- Describes the problem as seeking a matrix $P$ with a 1 in entry $(u,v)$ if node $u$ is matched with (only) node $v$ in the other graph.
- We then seek a matrix P which maximises the number of overlapping edges between G_A and G_B, so the number of adjacent node pairs should be mapped to adjacent node pairs in the other graph. We get an integer quadratic program.
- There is no downside to matches that do not produce an overlap, i.e. edges in G_A which are mapped to non-edges (node pairs that are not connected by an edge) in G_B, or vice versa.
- They define $AlignmentScore(P) = s_O \cdot \#overlaps + s_N \cdot \#noninformative + s_C \cdot \#conflicts$, where the $s$ are weights such that $s_O > s_N > s_C$: an overlap maps an edge to an edge, a conflict maps an edge to a non-edge (or vice versa), and a non-informative pair maps a non-edge to a non-edge. This score defines a massive matrix $M$, which is used to pose a quadratic assignment problem equivalent to maximising the AlignmentScore.
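The three counts can be computed directly for a candidate matching. A sketch (the weight values are placeholders that merely satisfy $s_O > s_N > s_C$):

```python
import numpy as np

def alignment_score(A, B, perm, s_O=3.0, s_N=2.0, s_C=1.0):
    """Count overlaps / non-informative pairs / conflicts for the matching
    that maps node i of G_A to node perm[i] of G_B, then combine them with
    weights s_O > s_N > s_C as in the AlignmentScore objective."""
    Bp = B[np.ix_(perm, perm)]               # B re-indexed by the matching
    iu = np.triu_indices(A.shape[0], 1)      # each unordered node pair once
    a, b = A[iu] > 0, Bp[iu] > 0
    overlaps = int((a & b).sum())            # edge mapped to edge
    non_informative = int((~a & ~b).sum())   # non-edge mapped to non-edge
    conflicts = int((a ^ b).sum())           # edge mapped to non-edge, either way
    return s_O * overlaps + s_N * non_informative + s_C * conflicts
```

This assumes equally sized graphs and a complete matching `perm`; the unmatched-node modeling mentioned below is omitted.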
- One can however solve an eigenvector equation instead of the quadratic program, which is what EigenAlign does.
1) Find the eigenvector $x$ of M that corresponds to the eigenvalue of largest magnitude. M is of dimension $n_A n_B \times n_A n_B$, where $n_A$ and $n_B$ are the number of nodes in G_A and G_B, so the eigenvector is of dimension $n_A n_B$ and can thus be reshaped into a matrix X of size $n_A \times n_B$ where each entry represents a score for every pair of nodes between the two graphs. This is the similarity matrix, as it reflects the topological similarity between vertices of G_A and G_B.
2) Run bipartite matching on the similarity matrix X, that maximises the total weight of the final alignment.
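The two steps above might be sketched as follows (power iteration, then SciPy's Hungarian solver as the bipartite matcher; the reshape convention and iteration count are assumptions of this sketch):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def eigenalign_like(M, nA, nB, iters=100):
    """(1) Dominant eigenvector of M by power iteration, reshaped into the
    nA x nB similarity matrix X; (2) maximum-weight bipartite matching on X."""
    x = np.ones(nA * nB)
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)               # keep the iterate normalized
    X = x.reshape(nA, nB)                    # score for every cross-graph pair
    rows, cols = linear_sum_assignment(-X)   # negate to maximize total weight
    return X, dict(zip(rows, cols))
```

This naive version still materializes $M$ and a dense $X$; the point of the paper is precisely to avoid that, as discussed next.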
- The authors show that the similarity matrix X can be represented through an exact low-rank factorisation. This allows them to avoid quadratic storage requirement of EigenAlign. They also present new fast techniques for bipartite matching problems on low-rank matrices. Together, this yields a far more scalable algorithm.
*** Low Rank Factors of EigenAlign
- Use power iteration (an iterative algorithm for computing the dominant eigenvector and eigenvalue of a diagonalisable matrix) on M to find the dominant eigenvector, which can then be reshaped into the sim matrix X. This can also be solved as an optimisation problem.
- If matrix X is estimated with the power-method starting from a rank 1 matrix, then the kth iteration of the power method results in a rank k+1 matrix that can be explicitly and exactly computed.
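The identity that makes the iteration tractable without forming M is the standard Kronecker-vec relation. A quick NumPy check, for a simplified M = A ⊗ B that ignores the extra rank-one correction terms in the paper's M:

```python
import numpy as np

# One power step M @ vec(X) equals vec(B X A^T) under column-stacking vec,
# so each step needs only small matrix products, never the nA*nB x nA*nB M.
rng = np.random.default_rng(0)
A, B = rng.random((3, 3)), rng.random((4, 4))
X = rng.random((4, 3))
lhs = np.kron(A, B) @ X.flatten(order="F")   # explicit 12 x 12 product
rhs = (B @ X @ A.T).flatten(order="F")       # same result via small matrices
assert np.allclose(lhs, rhs)
```

Starting the iteration from a rank-1 matrix $X_0 = uv^T$ and keeping $X$ in factored form is what yields the explicit low-rank iterates.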
- They wish to show that the matrix X can be factorised via a two-factor decomposition $X_k = UV^T$, for $X_k$ of rank $k$.
** Cross-Network Embedding for Multi-Network Alignment
*** Intro
- Recently, data mining through analyzing the complex structure and diverse relationships on multi-network has attracted much attention in both academia and industry. One crucial prerequisite for this kind of multi-network mining is to map the nodes across different networks, i.e., so-called network alignment.
- CrossMNA handles multi-network alignment by investigating structural information only.
- Uses two types of node embedding vectors:
1) Inter-vector for network alignment
2) Intra-vector for other downstream network analysis tasks
|