From 10c2d2f2b719c55948a9490faedff8ac931bd0d4 Mon Sep 17 00:00:00 2001
From: = <=>
Date: Thu, 13 Dec 2018 16:04:57 +0100
Subject: [PATCH] Added kademlia notes

---
 notes.org | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 1 deletion(-)

diff --git a/notes.org b/notes.org
index acd899e..d28a6c9 100644
--- a/notes.org
+++ b/notes.org
@@ -15,6 +15,10 @@ TODO, potentially read all of the experiments performed in pastry. Potentially n
 - Each Chord node needs "routing" information about only a few other nodes, leading to better scaling.
 - Each node maintains information about O(log N) other nodes and resolves lookups via O(log N) messages. A change in the network results in no more than O(log^2 N) messages.
 - Chord's performance degrades gracefully when information is out of date in the nodes' routing tables. It's difficult to maintain consistency of O(log N) states. Chord only requires one piece of information per node to be correct in order to guarantee correct routing.
+- Finger tables are only forward-looking
+- I.e., messages arriving at a peer tell it nothing useful; knowledge must be gained explicitly
+- Rigid routing structure
+- Locality is difficult to establish
 *** System Model
 - Chord simplifies the design of P2P systems and applications based on it by addressing the following problems:
 1) *Load balance:* Chord acts as a Distributed Hash Function, spreading keys evenly over the nodes, which provides a natural load balance
@@ -179,7 +183,96 @@ than one hop away from the destination), but there is no routing table entry. As
 - This paper presents and evaluates Pastry, a generic peer-to-peer content location and routing system based on a self-organizing overlay network of nodes connected via the Internet. Pastry is completely decentralized, fault-resilient, scalable, and reliably routes a message to the live node with a nodeId numerically closest to a key. Pastry can be used as a building block in the construction of a variety of peer-to-peer Internet applications like global file sharing, file storage, group communication and naming systems. Results with as many as 100,000 nodes in an emulated network confirm that Pastry is efficient and scales well, that it is self-organizing and can gracefully adapt to node failures, and that it has good locality properties.
 ** Kademlia
-TODO
+*** Abstract
+- A peer-to-peer distributed hash table with provable consistency and performance in a fault-prone environment
+- The system routes queries and locates nodes using a novel XOR-based metric topology
+- The topology has the property that every message exchanged conveys or reinforces useful contact information.
+- The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.
+*** Introduction
+- Kademlia is a P2P DHT
+- Kademlia has a number of desirable features not simultaneously offered by any previous DHT. It minimizes the number of configuration messages nodes must send to learn about each other.
+- Configuration information spreads automatically as a side-effect of key lookups.
+- Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes.
+- Keys are opaque, 160-bit quantities (e.g., the SHA-1 hash of some larger data)
+- Participating computers each have a node ID in the 160-bit key space.
+- (key,value) pairs are stored on nodes with IDs “close” to the key for some notion of closeness.
+- XOR is symmetric, allowing Kademlia participants to receive lookup queries from precisely the same distribution of nodes contained in their routing tables
+- Without this property, systems such as Chord do not learn useful routing information from queries they receive.
+- Worse yet, asymmetry leads to rigid routing tables: each entry in a Chord node’s finger table must store the precise node preceding some interval in the ID space. Any node actually in the interval would be too far from nodes preceding it in the same interval.
+- Kademlia, in contrast, can send a query to any node within an interval, allowing it to select routes based on latency or even send parallel, asynchronous queries to several equally appropriate nodes.
+- Kademlia most resembles Pastry’s first phase, which (though not described this way by the authors) successively finds nodes roughly half as far from the target ID by Kademlia’s XOR metric.
+- In a second phase, however, Pastry switches distance metrics to the numeric difference between IDs. It also uses the second, numeric-difference metric in replication. Unfortunately, nodes close by the second metric can be quite far by the first, creating discontinuities at particular node ID values, reducing performance, and complicating attempts at formal analysis of worst-case behavior.
+*** System Description
+- Kademlia assigns 160-bit opaque IDs to nodes and provides a lookup algorithm that locates successively “closer” nodes to any desired ID, converging to the lookup target in logarithmically many steps
+- An identifier is opaque if it provides no information about the thing it identifies other than being a seemingly random string or number
+- Kademlia effectively treats nodes as leaves in a binary tree, with each node’s position determined by the shortest unique prefix of its ID
+- For any given node, we divide the binary tree into a series of successively lower subtrees that don’t contain the node. The highest subtree consists of the half of the binary tree not containing the node.
+- The next subtree consists of the half of the remaining tree not containing the node, and so forth
+- The Kademlia protocol ensures that every node knows of at least one node in each of its subtrees, if that subtree contains a node. With this guarantee, any node can locate any other node by its ID
+**** XOR Metric
+- Each Kademlia node has a 160-bit node ID. Node IDs are currently just random 160-bit identifiers, though they could equally well be constructed as in Chord.
+- Every message a node transmits includes its node ID, permitting the recipient to record the sender’s existence if necessary.
+- Keys, too, are 160-bit identifiers. To assign (key,value) pairs to particular nodes, Kademlia relies on a notion of distance between two identifiers. Given two 160-bit identifiers, x and y, Kademlia defines the distance between them as their bitwise exclusive or (XOR) interpreted as an integer (see the sketch after this list).
+- XOR is nice: it is symmetric and satisfies the triangle inequality, even though it is non-Euclidean.
+- We next note that XOR captures the notion of distance implicit in our binary-tree-based sketch of the system.
+- In a fully-populated binary tree of 160-bit IDs, the magnitude of the distance between two IDs is the height of the smallest subtree containing them both. When a tree is not fully populated, the closest leaf to an ID x is the leaf whose ID shares the longest common prefix with x.
+- If empty branches in the tree mean more than one leaf shares that longest common prefix, the closest leaf to x is the closest leaf to the ID x~ produced by flipping the bits of x corresponding to the empty branches of the tree.
+- Like Chord’s clockwise circle metric, XOR is unidirectional. For any given point x and distance ∆ > 0, there is exactly one point y such that d(x, y) = ∆. Unidirectionality ensures that all lookups for the same key converge along the same path, regardless of the originating node. Thus, caching (key,value) pairs along the lookup path alleviates hot spots.
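+- The metric is small enough to sketch in code. The following is my own minimal Python sketch (function names are illustrative, not from the paper): distance is XOR interpreted as an integer, and the position of the highest differing bit gives the k-bucket index used in the next section.
+#+BEGIN_SRC python
+def xor_distance(x: int, y: int) -> int:
+    """d(x, y) = x XOR y, interpreted as an unsigned integer."""
+    return x ^ y
+
+def bucket_index(x: int, y: int) -> int:
+    """Index i such that 2^i <= d(x, y) < 2^(i+1), i.e. which k-bucket
+    of node x the node y falls into. Undefined for x == y."""
+    return xor_distance(x, y).bit_length() - 1
+
+# Properties the notes rely on:
+#   symmetric:      d(x, y) == d(y, x)
+#   identity:       d(x, x) == 0
+#   triangle:       d(x, z) == d(x, y) ^ d(y, z) <= d(x, y) + d(y, z)
+#   unidirectional: for a fixed x and delta > 0 there is exactly one y with d(x, y) == delta
+#+END_SRC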
+**** Node state
+- For each 0 ≤ i < 160, every node keeps a list of (IP address, UDP port, Node ID) triples for nodes of distance between 2^i and 2^(i+1) from itself. We call these lists k-buckets.
+- Each k-bucket is kept sorted by time last seen—least-recently seen node at the head, most-recently seen at the tail. For small values of i, the k-buckets will generally be empty (as no appropriate nodes will exist). For large values of i, the lists can grow up to size k, where k is a system-wide replication parameter.
+- k is chosen such that it is unlikely that k nodes will fail at the same time.
+- When a message (request or reply) is received from another node, the receiver updates the k-bucket appropriate for the sender's node ID. If the sender is already present, it is moved to the tail; if it is not present and there is room, it is inserted at the tail. If the bucket is full, the least-recently seen node is pinged: if it does not respond, it is evicted and the new node inserted; if it does respond, the new node is discarded (see the sketch after this list).
+- k-buckets effectively implement a least-recently seen eviction policy, except that live nodes are never removed from the list.
+- This works well for systems with an otherwise high churn rate, as nodes that have been alive longer are more likely to stay alive.
+- A second benefit of k-buckets is that they provide resistance to certain DoS attacks. One cannot flush nodes’ routing state by flooding the system with new nodes, as new nodes are only inserted once the old ones die.
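+- The update rule above is easy to misread, so here is my own sketch of a single k-bucket (class and parameter names are mine, not the paper's), assuming a ping callback that returns whether the probed node answered:
+#+BEGIN_SRC python
+from collections import OrderedDict
+
+K = 20  # system-wide replication parameter
+
+class KBucket:
+    def __init__(self, k: int = K):
+        self.k = k
+        # node_id -> (ip, udp_port); insertion order = least- to most-recently seen
+        self.contacts = OrderedDict()
+
+    def update(self, node_id, addr, ping) -> None:
+        if node_id in self.contacts:
+            self.contacts.move_to_end(node_id)        # already known: move to the tail
+            self.contacts[node_id] = addr
+        elif len(self.contacts) < self.k:
+            self.contacts[node_id] = addr             # room left: insert at the tail
+        else:
+            oldest_id, oldest_addr = next(iter(self.contacts.items()))
+            if ping(oldest_id, oldest_addr):          # least-recently seen node is alive:
+                self.contacts.move_to_end(oldest_id)  # keep it, discard the new contact
+            else:
+                del self.contacts[oldest_id]          # dead: evict it, insert the newcomer
+                self.contacts[node_id] = addr
+#+END_SRC
+- Note how a full bucket never evicts a live node: the newcomer is dropped instead, which is what gives the uptime bias and the DoS resistance mentioned above.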
+**** Kademlia Protocol
+- The Kademlia protocol consists of four RPCs: ping, store, find node, and find value.
+- The ping RPC probes a node to see if it is online.
+- The store RPC instructs a node to store a (key, value) pair for later retrieval.
+- find node takes a 160-bit ID as an argument. The recipient of the RPC returns (IP address, UDP port, Node ID) triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about).
+- find value behaves like find node—returning (IP address, UDP port, Node ID) triples—with one exception. If the RPC recipient has received a store RPC for the key, it just returns the stored value.
+- In all RPCs, the recipient must echo a 160-bit random RPC ID, which provides some resistance to address forgery. Pings can also be piggy-backed on RPC replies for the RPC recipient to obtain additional assurance of the sender's network address.
+***** Node lookup
+1) Node lookup is performed recursively. The lookup initiator starts by picking α nodes from its closest non-empty k-bucket, i.e. the bucket closest to the target ID rather than to the initiator (or, if that bucket has fewer than α entries, the α closest nodes it knows of).
+2) The initiator then sends parallel, asynchronous find_node RPCs to these α nodes.
+3) In the recursive step, the initiator resends find_node to nodes it has learned about from previous RPCs. (This recursion can begin before all α of the previous RPCs have returned.)
+4) If a round of find_node RPCs fails to return any node closer than the closest already seen, the initiator instead resends find_node to all of the k closest nodes it has not yet queried.
+5) The lookup terminates when the initiator has queried and gotten responses from the k closest nodes it has seen.
+- When α = 1, the lookup algorithm resembles Chord’s in terms of message cost and the latency of detecting failed nodes. However, Kademlia can route for lower latency because it has the flexibility of choosing any one of k nodes to forward a request to (a simplified sketch follows this list).
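+- A simplified, sequential sketch of the lookup in Python (my own code, not the paper's; the real protocol issues the α RPCs in parallel and asynchronously, and find_node_rpc is an assumed helper returning (node_id, addr) contacts, or None on failure):
+#+BEGIN_SRC python
+ALPHA = 3   # concurrency parameter
+K = 20      # replication parameter
+
+def node_lookup(target, known_contacts, find_node_rpc):
+    dist = lambda c: c[0] ^ target
+    shortlist = {c[0]: c for c in known_contacts}     # node_id -> (node_id, addr)
+    queried = set()
+    improved = True
+    while True:
+        closest = sorted(shortlist.values(), key=dist)[:K]
+        pending = [c for c in closest if c[0] not in queried]
+        if not pending:
+            return closest                            # k closest known nodes all queried
+        batch = pending[:ALPHA] if improved else pending   # widen to k after a no-progress round
+        best_before = dist(closest[0])
+        for node_id, addr in batch:                   # parallel + asynchronous in the real protocol
+            queried.add(node_id)
+            result = find_node_rpc(addr, target)
+            if result is None:
+                shortlist.pop(node_id, None)          # unresponsive node: drop it from consideration
+            else:
+                for contact in result:
+                    shortlist.setdefault(contact[0], contact)
+        now_closest = sorted(shortlist.values(), key=dist)[:1]
+        improved = bool(now_closest) and dist(now_closest[0]) < best_before
+#+END_SRC
+- The improved flag encodes the "no closer node returned" rule from step 4.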
+***** Store
+- Most operations are implemented in terms of the above lookup procedure. To store a (key,value) pair, a participant locates the k closest nodes to the key and sends them store RPCs.
+- Additionally, each node re-publishes (key,value) pairs as necessary to keep them alive.
+- For file sharing, the original publisher of a (key,value) pair is required to republish it every 24 hours. Otherwise, (key,value) pairs expire 24 hours after publication, so as to limit stale index information in the system.
+***** Find value
+- To find a (key,value) pair, a node starts by performing a lookup to find the k nodes with IDs closest to the key. However, value lookups use find value rather than find node RPCs. Moreover, the procedure halts immediately when any node returns the value. For caching purposes, once a lookup succeeds, the requesting node stores the (key,value) pair at the closest node it observed to the key that did not return the value.
+- Because of the unidirectionality of the topology, future searches for the key are likely to hit cached entries before querying the closest node.
+- To avoid over-caching, the expiration time of any key-value pair is determined by the distance between the current node and the node whose ID is closest to the key ID.
+***** Refreshing buckets
+- To handle pathological cases in which there are no lookups for a particular ID range, each node refreshes any bucket to which it has not performed a node lookup in the past hour. Refreshing means picking a random ID in the bucket’s range and performing a node search for that ID.
+***** Joining network
+- To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes’ k-buckets as necessary.
+**** Routing Table
+- The routing table is a binary tree whose leaves are k-buckets.
+- Each k-bucket covers some range of the ID space, and together the k-buckets cover the entire 160-bit ID space with no overlap.
+- When a node u learns of a new contact, it inserts the contact into the appropriate bucket if there is room. Otherwise, if the k-bucket’s range includes u’s own node ID, the bucket is split into two new buckets, the old contents are divided between the two, and the insertion attempt is repeated (see the sketch after this list). This is why the half of the ID space not containing u ends up covered by a single bucket: buckets whose range does not include u’s own ID are never split.
+- If the tree is highly unbalanced, issues may arise (e.g., a node might not know of all k nodes closest to itself). To avoid this, buckets may also be split even when the node's own ID does not fall within their range.
+- Nodes split k-buckets as required to ensure they have complete knowledge of a surrounding subtree with at least k nodes.
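+- Here is my own sketch of the basic split rule (names are mine; it ignores the relaxation for highly unbalanced trees and the per-bucket ping/evict policy sketched earlier):
+#+BEGIN_SRC python
+class RoutingTable:
+    def __init__(self, my_id: int, k: int = 20, id_bits: int = 160):
+        self.my_id, self.k = my_id, k
+        # each bucket covers [low, high) of the ID space; start with one bucket covering everything
+        self.buckets = [{"low": 0, "high": 1 << id_bits, "contacts": {}}]
+
+    def _bucket_for(self, node_id: int):
+        return next(b for b in self.buckets if b["low"] <= node_id < b["high"])
+
+    def insert(self, node_id: int, addr) -> None:
+        bucket = self._bucket_for(node_id)
+        if node_id in bucket["contacts"] or len(bucket["contacts"]) < self.k:
+            bucket["contacts"][node_id] = addr        # it fits: just store it
+        elif bucket["low"] <= self.my_id < bucket["high"]:
+            self._split(bucket)                       # range covers our own ID: split
+            self.insert(node_id, addr)                # and retry the insertion
+        # else: bucket is full and far from us; fall back to the k-bucket
+        # ping/evict policy from the "Node state" section (omitted here)
+
+    def _split(self, bucket) -> None:
+        mid = (bucket["low"] + bucket["high"]) // 2
+        low_half = {"low": bucket["low"], "high": mid, "contacts": {}}
+        high_half = {"low": mid, "high": bucket["high"], "contacts": {}}
+        for node_id, addr in bucket["contacts"].items():
+            (low_half if node_id < mid else high_half)["contacts"][node_id] = addr
+        self.buckets.remove(bucket)
+        self.buckets.extend([low_half, high_half])
+#+END_SRC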
+**** Efficient key re-publishing
+- Keys must be periodically republished, both to keep data from disappearing from the network and to keep it from remaining stuck on sub-optimal nodes as new nodes closer to the data join the network.
+- To compensate for nodes leaving the network, Kademlia republishes each key-value pair once an hour.
+- As long as republication intervals are not exactly synchronized, only one node will republish a given key-value pair every hour.
+*** Implementation Notes
+**** Optimized contact accounting
+- To reduce traffic, Kademlia delays probing contacts until it has useful messages to send them. When a Kademlia node receives an RPC from an unknown contact and the k-bucket for that contact is already full with k entries, the node places the new contact in a replacement cache of nodes eligible to replace stale k-bucket entries.
+- When a contact fails to respond to 5 RPCs in a row, it is considered stale. If a k-bucket is not full or its replacement cache is empty, Kademlia merely flags stale contacts rather than remove them. This ensures, among other things, that if a node’s own network connection goes down temporarily, the node won’t completely void all of its k-buckets.
+- This matters because Kademlia uses UDP, so unanswered RPCs may just be transient packet loss.
+**** Accelerated lookups
+- Another optimization in the implementation is to achieve fewer hops per lookup by increasing the routing table size. Conceptually, this is done by considering IDs b bits at a time instead of just one bit at a time.
+- This also changes the way buckets are split.
+- Apparently this also changes the XOR-based routing.
+*** Summary
+- With its novel XOR-based metric topology, Kademlia is the first peer-to-peer system to combine provable consistency and performance, latency-minimizing routing, and a symmetric, unidirectional topology. Kademlia furthermore introduces a concurrency parameter, α, that lets people trade a constant factor in bandwidth for asynchronous lowest-latency hop selection and delay-free fault recovery. Finally, Kademlia is the first peer-to-peer system to exploit the fact that node failures are inversely related to uptime.
 * Accessing and Developing WoT
 ** Chapter 6
 *** REST STUFF