<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>notes</title>
<!-- 2020-01-08 Wed 13:49 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="generator" content="Org-mode" />
<meta name="author" content="alex" />
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
.title { text-align: center; }
.todo { font-family: monospace; color: red; }
.done { color: green; }
.tag { background-color: #eee; font-family: monospace;
padding: 2px; font-size: 80%; font-weight: normal; }
.timestamp { color: #bebebe; }
.timestamp-kwd { color: #5f9ea0; }
.right { margin-left: auto; margin-right: 0px; text-align: right; }
.left { margin-left: 0px; margin-right: auto; text-align: left; }
.center { margin-left: auto; margin-right: auto; text-align: center; }
.underline { text-decoration: underline; }
#postamble p, #preamble p { font-size: 90%; margin: .2em; }
p.verse { margin-left: 3%; }
pre {
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
overflow: auto;
margin: 1.2em;
}
pre.src {
position: relative;
overflow: visible;
padding-top: 1.2em;
}
pre.src:before {
display: none;
position: absolute;
background-color: white;
top: -10px;
right: 10px;
padding: 3px;
border: 1px solid black;
}
pre.src:hover:before { display: inline;}
pre.src-sh:before { content: 'sh'; }
pre.src-bash:before { content: 'sh'; }
pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
pre.src-R:before { content: 'R'; }
pre.src-perl:before { content: 'Perl'; }
pre.src-java:before { content: 'Java'; }
pre.src-sql:before { content: 'SQL'; }
table { border-collapse:collapse; }
caption.t-above { caption-side: top; }
caption.t-bottom { caption-side: bottom; }
td, th { vertical-align:top; }
th.right { text-align: center; }
th.left { text-align: center; }
th.center { text-align: center; }
td.right { text-align: right; }
td.left { text-align: left; }
td.center { text-align: center; }
dt { font-weight: bold; }
.footpara:nth-child(2) { display: inline; }
.footpara { display: block; }
.footdef { margin-bottom: 1em; }
.figure { padding: 1em; }
.figure p { text-align: center; }
.inlinetask {
padding: 10px;
border: 2px solid gray;
margin: 10px;
background: #ffffcc;
}
#org-div-home-and-up
{ text-align: right; font-size: 70%; white-space: nowrap; }
textarea { overflow-x: auto; }
.linenr { font-size: smaller }
.code-highlighted { background-color: #ffff00; }
.org-info-js_info-navigation { border-style: none; }
#org-info-js_console-label
{ font-size: 10px; font-weight: bold; white-space: nowrap; }
.org-info-js_search-highlight
{ background-color: #ffff00; color: #000000; font-weight: bold; }
/*]]>*/-->
</style>
<script type="text/javascript">
/*
@licstart The following is the entire license notice for the
JavaScript code in this tag.
Copyright (C) 2012-2013 Free Software Foundation, Inc.
The JavaScript code in this tag is free software: you can
redistribute it and/or modify it under the terms of the GNU
General Public License (GNU GPL) as published by the Free Software
Foundation, either version 3 of the License, or (at your option)
any later version. The code is distributed WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
As additional permission under GNU GPL version 3 section 7, you
may distribute non-source (e.g., minimized or compacted) forms of
that code without the copy of the GNU GPL normally required by
section 4, provided you include this license notice and a URL
through which recipients can access the Corresponding Source.
@licend The above is the entire license notice
for the JavaScript code in this tag.
*/
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.cacheClassElem = elem.className;
elem.cacheClassTarget = target.className;
target.className = "code-highlighted";
elem.className = "code-highlighted";
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(elem.cacheClassElem)
elem.className = elem.cacheClassElem;
if(elem.cacheClassTarget)
target.className = elem.cacheClassTarget;
}
/*]]>*///-->
</script>
<script type="text/javascript" src="http://orgmode.org/mathjax/MathJax.js"></script>
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
MathJax.Hub.Config({
// Only one of the two following lines, depending on user settings
// First allows browser-native MathML display, second forces HTML/CSS
// config: ["MMLorHTML.js"], jax: ["input/TeX"],
jax: ["input/TeX", "output/HTML-CSS"],
extensions: ["tex2jax.js","TeX/AMSmath.js","TeX/AMSsymbols.js",
"TeX/noUndefined.js"],
tex2jax: {
inlineMath: [ ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"], ["\\begin{displaymath}","\\end{displaymath}"] ],
skipTags: ["script","noscript","style","textarea","pre","code"],
ignoreClass: "tex2jax_ignore",
processEscapes: false,
processEnvironments: true,
preview: "TeX"
},
showProcessingMessages: true,
displayAlign: "center",
displayIndent: "2em",
"HTML-CSS": {
scale: 100,
availableFonts: ["STIX","TeX"],
preferredFont: "TeX",
webFont: "TeX",
imageFont: "TeX",
showMathMenu: true,
},
MMLorHTML: {
prefer: {
MSIE: "MML",
Firefox: "MML",
Opera: "HTML",
other: "HTML"
}
}
});
/*]]>*///-->
</script>
</head>
<body>
<div id="content">
<h1 class="title">notes</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1. Graph alignment</a>
<ul>
<li><a href="#sec-1-1">1.1. REGAL</a>
<ul>
<li><a href="#sec-1-1-1">1.1.1. Intro</a></li>
<li><a href="#sec-1-1-2">1.1.2. REGAL Description</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Graph alignment</h2>
<div class="outline-text-2" id="text-1">
</div><div id="outline-container-sec-1-1" class="outline-3">
<h3 id="sec-1-1"><span class="section-number-3">1.1</span> REGAL</h3>
<div class="outline-text-3" id="text-1-1">
</div><div id="outline-container-sec-1-1-1" class="outline-4">
<h4 id="sec-1-1-1"><span class="section-number-4">1.1.1</span> Intro</h4>
<div class="outline-text-4" id="text-1-1-1">
<ul class="org-ul">
<li>network alignment, or the task of identifying corresponding nodes in different networks, has applications across the social and natural sciences.
</li>
<li>REGAL (REpresentation learning-based Graph ALignment)
<ul class="org-ul">
<li>Motivated by recent advancements in node representation learning for single-graph tasks
</li>
<li>a framework that leverages the power of automatically learned node representations to match nodes across different graphs.
</li>
</ul>
</li>
<li>xNetMF, an elegant and principled node embedding formulation that uniquely generalizes to multi-network problems.
</li>
<li>network alignment or matching, which is the problem of finding corresponding nodes in different networks.
<ul class="org-ul">
<li>Crucial for identifying similar users in different social networks or analysing chemical compounds
</li>
</ul>
</li>
<li>Many existing methods try to relax the computationally hard optimization problem, as designing features that can be directly compared for nodes in different networks is not an easy task.
</li>
<li>we propose network alignment via matching latent, learned node representations.
</li>
<li><b>Problem:</b> Given two graphs G<sub>1</sub> and G<sub>2</sub> with nodesets V<sub>1</sub> and V<sub>2</sub> and possibly node attributes A<sub>1</sub> and A<sub>2</sub> resp., devise an efficient network alignment method that aligns nodes by learning directly comparable node representations Y<sub>1</sub> and Y<sub>2</sub>, from which a node mapping \(\phi: V_1 \rightarrow V_2\) between the networks can be inferred.
</li>
<li>REGAL is a framework that efficiently identifies node matchings by greedily aligning their latent feature representations.
</li>
<li>They use Cross-Network Matrix Factorization (xNetMF) to learn the representations
<ul class="org-ul">
<li>xNetMF preserves structural similarities rather than proximity-based similarities, allowing for generalization beyond a single network.
</li>
<li>xNetMF is formulated as matrix factorization over a similarity matrix which incorporates structural similarity and attribute agreement between nodes in disjoint graphs.
</li>
<li>Constructing the similarity matrix is expensive, as it requires computing all pairs of similarities between nodes in the multiple networks. To avoid this, they extend the Nyström low-rank approximation, which is commonly used for large-scale kernel machines.
</li>
<li>This makes xNetMF a principled and efficient implicit matrix factorization-based approach.
</li>
</ul>
</li>
<li>our approach can be applied to attributed and unattributed graphs with virtually no change in formulation, and is unsupervised: it does not require prior alignment information to find high-quality matchings.
</li>
<li>Many well-known node embedding methods based on shallow architectures such as the popular skip-gram with negative sampling (SGNS) have been cast in matrix factorization frameworks. However, ours is the first to cast node embedding using SGNS to capture structural identity in such a framework
</li>
<li>we consider the significantly harder problem of learning embeddings that may be individually matched to infer node-level alignments.
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-1-1-2" class="outline-4">
<h4 id="sec-1-1-2"><span class="section-number-4">1.1.2</span> REGAL Description</h4>
<div class="outline-text-4" id="text-1-1-2">
<ul class="org-ul">
<li>Let G<sub>1</sub>(V<sub>1</sub>, E<sub>1</sub>) and G<sub>2</sub>(V<sub>2</sub>, E<sub>2</sub>) be two unweighted and undirected graphs (described in the setting of two graphs, but can be extended to more), with node sets V<sub>1</sub> and V<sub>2</sub> and edge sets E<sub>1</sub> and E<sub>2</sub>; and possible node attribute sets A<sub>1</sub> and A<sub>2</sub>.
<ul class="org-ul">
<li>The graphs do not have to be the same size
</li>
</ul>
</li>
<li>Let n = |V<sub>1</sub>| + |V<sub>2</sub>|, i.e. the total number of nodes across the two graphs.
</li>
<li>The steps are then:
<ol class="org-ol">
<li><b>Node Identity Extraction:</b> Extract structure and attribute-related info from all n nodes
</li>
<li><b>Efficient Similarity-based Representation:</b> Obtain node embeddings, conceptually by factorising a similarity matrix of the node identities from step 1. Computing and factorising this similarity matrix is expensive, so they extend the Nyström method for low-rank matrix approximation to perform an implicit similarity matrix factorisation by <b>(a)</b> comparing each node only to a sample of p &lt;&lt; n so-called "landmark nodes" and <b>(b)</b> using these node-to-landmark similarities to construct the representations from a decomposition of the low-rank approximation.
</li>
<li><b>Fast Node Representation Alignment:</b> Align nodes between the two graphs by greedily matching the embeddings with an efficient data structure (KD-tree) that allows for fast identification of the top-\(\alpha\) most similar embeddings from the other graph.
</li>
</ol>
</li>
<li>The first two steps are the xNetMF method
</li>
</ul>
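<p>Step 2 above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, a plain Gaussian similarity on the identity vectors, and uniformly sampled landmarks. The function name <code>landmark_embeddings</code> and its parameters are hypothetical.</p>
<pre class="src src-python">
```python
import numpy as np

def landmark_embeddings(features, p, gamma=1.0, seed=0, normalize=True):
    """Nystrom-style embeddings from node-to-landmark similarities.

    features: (n, d) array with one identity vector per node (both graphs stacked).
    p: number of landmark nodes (assumed much smaller than n in practice).
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    landmarks = rng.choice(n, size=p, replace=False)

    # C[i, j] = similarity between node i and landmark j (n x p, not n x n).
    diff = features[:, None, :] - features[landmarks][None, :, :]
    C = np.exp(-gamma * np.sum(diff ** 2, axis=2))

    # W = landmark-to-landmark block of C (p x p).
    W = C[landmarks]

    # Factorise the pseudo-inverse of W; then Y @ Y.T == C @ pinv(W) @ C.T,
    # which is the Nystrom low-rank approximation of the full similarity matrix.
    U, sigma, _ = np.linalg.svd(np.linalg.pinv(W))
    Y = C @ U @ np.diag(np.sqrt(sigma))

    if normalize:
        norms = np.linalg.norm(Y, axis=1, keepdims=True)
        Y = Y / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return Y
```
</pre>
<p>Only the n-by-p matrix C and the p-by-p matrix W are ever formed, which is the point of the landmark trick: the full n-by-n similarity matrix is never computed.</p>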
</div>
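<p>Step 3 (greedy alignment with a KD-tree) might look like the sketch below. It assumes SciPy's <code>cKDTree</code> and that "most similar" means smallest Euclidean distance between embeddings. The function name <code>align_embeddings</code> and the parameter <code>alpha</code> are illustrative choices, not names from the paper's code.</p>
<pre class="src src-python">
```python
import numpy as np
from scipy.spatial import cKDTree

def align_embeddings(Y1, Y2, alpha=1):
    """For every node of graph 1, return the alpha closest nodes of graph 2.

    Y1: (n1, d) embeddings of graph 1, Y2: (n2, d) embeddings of graph 2.
    Returns an (n1, alpha) integer array of row indices into Y2,
    ordered from most to least similar.
    """
    tree = cKDTree(Y2)                    # index the embeddings of graph 2 once
    _, idx = tree.query(Y1, k=alpha)      # nearest neighbours by Euclidean distance
    return np.asarray(idx).reshape(len(Y1), alpha)
```
</pre>
<p>Taking column 0 of the result gives a hard one-to-many mapping from V<sub>1</sub> to V<sub>2</sub>. The matching is greedy, so two nodes of graph 1 may be assigned the same node of graph 2.</p>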
<ol class="org-ol"><li><a id="sec-1-1-2-1" name="sec-1-1-2-1"></a>Step 1<br /><div class="outline-text-5" id="text-1-1-2-1">
<ul class="org-ul">
<li>The goal of REGAL's representation learning module, xNetMF, is to define node “identity” in a way that generalizes to multi-network problems.
</li>
<li>As nodes in multi-network problems have no direct connections to each other, their proximity can't be sampled by random walks on separate graphs. This is overcome by instead focusing on more broadly comparable, generalisable quantities: Structural Identity which relates to structural roles and Attribute-Based Identity.
</li>
<li><b>Structural Identity</b>: In network alignment, the well-established assumption is that aligned nodes have similar structural connectivity or degrees. Thus, we can use the degrees of the neighbours of a node as structural identity. They also consider neighbors up to k hops from the original node.
<ul class="org-ul">
<li>For some node \(u \in V\), \(R_u^k\) is the set of nodes at exactly \(k\) hops from \(u\). We could capture the degrees of these nodes in a vector \(d_u^k\) of length \(D\), the highest degree in the graph, where the i'th entry \(d_u^k(i)\) denotes the number of nodes in \(R_u^k\) of degree \(i\). However, a single high-degree node forces up the length of \(d_u^k\), so this vector can be very long and very sparse. Instead, nodes are binned into \(b = \lceil \log_2(D) \rceil\) logarithmically scaled buckets, so that entry \(i\) of \(d_u^k\) contains the number of nodes \(v \in R_u^k\) such that \(\lfloor \log_2(deg(v)) \rfloor = i\). The binned vector is both much shorter (about \(\log_2(D)\) entries) and more robust to noise.
</li>
</ul>
</li>
<li><b>Attribute-Based Identity</b>: Given \(F\) node attributes, they create for each node \(u\) an \(F\)-dimensional vector \(f_u\) representing the values of \(u\). So \(f_u(i)\) = the i'th attribute of \(u\).
</li>
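<li><b>Cross-Network Node Similarity</b>: Relies on the structural and attribute information rather than direct proximity: \(sim(u,v) = \exp[-\gamma_s \cdot \left\lVert d_u - d_v \right\rVert_2^2 - \gamma_a \cdot dist(f_u, f_v)]\). Here \(d_u = \sum_{k=1}^{K} \delta^{k-1} d_u^k\) combines the per-hop degree vectors up to a maximum distance \(K\) with a discount factor \(\delta \in (0, 1]\), and \(\gamma_s\) and \(\gamma_a\) are scalar weights for the structural and attribute terms.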
</li>
</ul>
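<p>The structural identity described above can be sketched in plain Python. Assumptions that are not in these notes: the graph is given as an adjacency dict, the per-hop vectors are combined as a discounted sum with factor <code>delta</code> over hops 1..<code>max_hops</code> (following the REGAL paper's formulation), and the vector has floor(log<sub>2</sub>(D)) + 1 entries so that the bucket of the highest degree always has a slot. All function names are hypothetical.</p>
<pre class="src src-python">
```python
from collections import deque

def hop_sets(adj, u, max_hops):
    """Return [R^1, ..., R^K]: the nodes at exactly k hops from u (BFS layers)."""
    dist = {u: 0}
    queue = deque([u])
    layers = [[] for _ in range(max_hops)]
    while queue:
        x = queue.popleft()
        if dist[x] == max_hops:
            continue  # do not expand beyond the last requested layer
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                layers[dist[y] - 1].append(y)
                queue.append(y)
    return layers

def structural_identity(adj, u, max_hops=2, delta=0.5):
    """Discounted, log-binned degree histogram of the k-hop neighbourhoods of u."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    # bit_length() - 1 equals floor(log2(deg)) for deg >= 1, so buckets run
    # from 0 to floor(log2(max_deg)); keep at least one bucket.
    n_bins = max(1, max_deg.bit_length())
    d_u = [0.0] * n_bins
    for k, layer in enumerate(hop_sets(adj, u, max_hops), start=1):
        for v in layer:
            bucket = len(adj[v]).bit_length() - 1
            d_u[bucket] += delta ** (k - 1)  # farther hops count less
    return d_u
```
</pre>
<p>For example, on a star with centre 0, leaves 1 to 4, and an extra node 5 attached to leaf 4, node 0 sees three degree-1 neighbours and one degree-2 neighbour at hop 1, and one degree-1 node at hop 2 (weighted by <code>delta</code>).</p>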
</div>
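<p>The cross-network similarity can be written down directly from the formula. In this sketch the attribute distance <code>dist</code> is assumed to be the number of disagreeing attribute values, which suits categorical attributes; other distances could be substituted. The function name and default weights are illustrative.</p>
<pre class="src src-python">
```python
import math

def similarity(d_u, d_v, f_u=(), f_v=(), gamma_s=1.0, gamma_a=1.0):
    """sim(u, v) = exp(-gamma_s * ||d_u - d_v||^2 - gamma_a * dist(f_u, f_v)).

    d_u, d_v: structural identity vectors of equal length.
    f_u, f_v: attribute vectors (empty for unattributed graphs).
    """
    # Squared Euclidean distance between the structural vectors.
    struct = sum((a - b) ** 2 for a, b in zip(d_u, d_v))
    # Assumed attribute distance: count of attributes on which u and v disagree.
    attr = sum(1 for a, b in zip(f_u, f_v) if a != b)
    return math.exp(-gamma_s * struct - gamma_a * attr)
```
</pre>
<p>With empty attribute vectors the attribute term vanishes, so the same function covers attributed and unattributed graphs. Identical nodes get similarity 1 and the value decays towards 0 as the identities diverge.</p>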
</li></ol>
</div>
</div>
</div>
</div>
<div id="postamble" class="status">
<p class="author">Author: alex</p>
<p class="date">Created: 2020-01-08 Wed 13:49</p>
<p class="creator"><a href="http://www.gnu.org/software/emacs/">Emacs</a> 25.2.2 (<a href="http://orgmode.org">Org</a> mode 8.2.10)</p>
<p class="validation"><a href="http://validator.w3.org/check?uri=referer">Validate</a></p>
</div>
</body>
</html>