* Structured P2P Networks

TODO: potentially read all of the experiments performed in Pastry. Potentially not, who cares. Also the math in Kademlia.

** Chord

*** Introduction

- A fundamental problem that confronts peer-to-peer applications is to efficiently locate the node that stores a particular data item.
- Chord provides support for just one operation: given a key, it maps the key onto a node.
- Data location can be easily implemented on top of Chord by associating a key with each data item and storing the key/data item pair at the node to which the key maps.
- Peer-to-peer systems and applications are distributed systems without any centralised control or hierarchical structure, in which each peer is equivalent in functionality.
- Peer-to-peer applications can provide many features, such as redundant storage, permanence, selection of nearby servers, anonymity, search, authentication and hierarchical naming (note the structure is still peer-to-peer; the names are just attributes and data the peers hold).
- The core operation in most P2P systems is efficient location of data items.
- Chord is a scalable protocol for lookup in a dynamic peer-to-peer system with frequent node arrivals and departures.
- Chord uses a variant of consistent hashing to assign keys to Chord nodes.
- Consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots.
- Additionally, consistent hashing tends to balance load, as each node receives roughly the same number of keys.
- Each Chord node needs "routing" information about only a few other nodes, leading to better scaling.
- Each node maintains information about O(log N) other nodes and resolves lookups via O(log N) messages. A change in the network results in no more than O(log^2 N) messages.
- Chord's performance degrades gracefully when the information in nodes' routing tables is out of date. Keeping all O(log N) states consistent is difficult, but Chord only requires one piece of information per node to be correct in order to guarantee correct routing.
- Finger tables are only forward-looking
  - I.e. messages arriving at a peer tell it nothing useful; knowledge must be gained explicitly
- Rigid routing structure
- Locality is difficult to establish
*** System Model

- Chord simplifies the design of P2P systems and applications based on it by addressing the following problems:

1) *Load balance:* Chord acts as a distributed hash function, spreading keys evenly over the nodes, which provides a natural load balance
2) *Decentralization:* Chord is fully distributed. This improves robustness and makes it well suited for loosely-organised P2P applications
3) *Scalability:* The cost of a lookup grows as the log of the number of nodes
4) *Availability:* Chord automatically adjusts its internal tables to reflect newly joined nodes as well as node failures, ensuring that, barring major failures in the underlying network, the node responsible for a key can always be found. This is true even if the system is in a continuous state of change.
5) *Flexible naming:* Chord places no constraints on the structure of the keys it looks up

**** Use cases of Chord

- Cooperative mirroring: essentially a load balancer
- Time-shared storage: if a person wishes some data to always be available but their machine is only occasionally available, they can offer to store others' data while they are up, in return for having their data stored elsewhere when they are down.
- Distributed indexes: a key in this application could be a few keywords, and the values would be machines offering documents with those keywords
- Large-scale combinatorial search: keys are candidate solutions to the problem; Chord maps these keys to the machines responsible for testing them as solutions.
*** The Base Chord Protocol

- The Chord protocol specifies how to find the locations of keys, how new nodes join the system, and how to recover from the failure (or planned departure) of existing nodes.

**** Overview

- Chord improves the scalability of consistent hashing by avoiding the requirement that every node know about every other node.

**** Consistent Hashing

- The consistent hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1. A node's identifier is chosen by hashing the node's IP address, while a key identifier is produced by hashing the key.
- Identifiers are ordered on an identifier circle modulo 2^m.
- Key k is assigned to the first node whose identifier is equal to or follows (the identifier of) k in the identifier space.
- This node is called the successor node of key k, succ(k). It is the first node clockwise from k, if identifiers are pictured as a circle.
- To maintain the consistent hashing mapping when a node n joins the network, certain keys previously assigned to n's successor now become assigned to n. When node n leaves the network, all of its assigned keys are reassigned to n's successor.
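The successor relation and the join-time key transfer can be sketched in a few lines. This is a minimal illustration, not the paper's pseudocode: a tiny m = 6 ring with made-up node identifiers, and `successor` implemented as a binary search over a sorted list.

```python
import bisect

M = 6                    # identifier space is [0, 2^M)
RING_SIZE = 2 ** M

def successor(nodes, k):
    """Return the first node whose identifier equals or follows k on the
    circle, wrapping around to the smallest identifier if necessary."""
    nodes = sorted(nodes)
    i = bisect.bisect_left(nodes, k % RING_SIZE)
    return nodes[i % len(nodes)]

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # illustrative identifiers
print(successor(nodes, 10))   # 14: first node clockwise from key 10
print(successor(nodes, 60))   # 1: wraps around past 2^6 - 1

# When node 26 joins, it takes over exactly the keys in (21, 26],
# which previously belonged to its successor 32.
keys = range(RING_SIZE)
before = {k for k in keys if successor(nodes, k) == 32}
after = {k for k in keys if successor(nodes + [26], k) == 26}
print(sorted(after))          # [22, 23, 24, 25, 26]
assert after < before         # only a slice of 32's keys moved
```

Note how the join touches only the keys between the new node and its predecessor; every other key stays put, which is the whole point of consistent hashing.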
- The claims about the efficiency of consistent hashing rely on the identifiers being chosen uniformly at random. SHA-1 is deterministic, as are all hash functions, so an adversary could in theory pick a set of identifiers close to each other and force a single node to carry a large share of the keys, ruining the balance. However, these hash functions are considered hard to invert, so producing files with specific hashes is infeasible.
- When consistent hashing is implemented as described above, the theorem proves a bound of eps = O(log N). The consistent hashing paper shows that eps can be reduced to an arbitrarily small constant by having each node run O(log N) "virtual nodes", each with its own identifier.
- The right number of virtual nodes is difficult to pre-determine, as the load on the system is unknown a priori.
**** Scalable Key Location

- A very small amount of routing information suffices to implement consistent hashing in a distributed environment. Each node need only be aware of its successor node on the circle.
- Queries for a given identifier can be passed around the circle via these successor pointers until they first encounter a node that succeeds the identifier; this is the node the query maps to.
- To avoid potentially traversing all N nodes when the identifiers are "unlucky", Chord maintains extra routing information.
- m is the number of bits in the keys.
- Each node n maintains a routing table with at most m entries, called the finger table.
- The i'th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle: s = succ((n + 2^(i-1)) mod 2^m), for 1 <= i <= m.
- Node s in the i'th finger of node n is denoted n.finger[i].node.
- A finger table entry includes both the Chord identifier and the IP address (and port number) of the relevant node.
- First, each node stores information about only a small number of other nodes, and knows more about nodes closely following it on the identifier circle than about nodes farther away.
- Each finger implicitly covers an interval of the identifier space (the keys between that finger's start and the next finger's start). This allows a node to quickly route a lookup for an unknown key, since it can find the finger interval which contains it.
- The finger pointers at repeatedly doubling distances around the circle cause each iteration of the loop in find_predecessor to halve the distance to the target identifier.
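The finger-table formula above can be made concrete with the Chord paper's own small example ring: m = 3 and nodes {0, 1, 3}. The helper names here (`succ`, `finger_table`) are this sketch's, not the paper's.

```python
M = 3                    # 3-bit identifiers: circle of size 8
RING = 2 ** M
NODES = [0, 1, 3]        # the example ring used in the Chord paper's figures

def succ(k):
    """First node whose identifier equals or follows k on the circle."""
    k %= RING
    candidates = [n for n in sorted(NODES) if n >= k]
    return candidates[0] if candidates else sorted(NODES)[0]

def finger_table(n):
    # entry i holds succ((n + 2^(i-1)) mod 2^m), for 1 <= i <= m
    return [succ((n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)]

print(finger_table(0))   # [1, 3, 0]: finger starts 1, 2, 4
print(finger_table(1))   # [3, 3, 0]: finger starts 2, 3, 5
print(finger_table(3))   # [0, 0, 0]: finger starts 4, 5, 7
```

Because the finger starts sit at doubling offsets (n+1, n+2, n+4, ...), forwarding a lookup to the closest preceding finger at least halves the remaining clockwise distance each step, which is where the O(log N) hop count comes from.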
**** Node Joins

- In dynamic networks, nodes can join and leave at any time. Thus the main challenge is to preserve the ability to locate every key.
- There are two invariants:

1) Each node's successor is correctly maintained
2) For every key k, node succ(k) is responsible for k

- We also want the finger tables to be correct.
- To simplify the join and leave mechanisms, each node in Chord maintains a predecessor pointer.
- To preserve the invariants stated above, Chord must perform three tasks when a node n joins the network:

1) Initialise the predecessor and fingers of node n
2) Update the fingers and predecessors of existing nodes to reflect the addition of n
3) Notify the higher-layer software so that it can transfer state (e.g. values) associated with keys that node n is now responsible for.

***** Initializing fingers and predecessor

- Node n learns its predecessor and fingers by asking an existing node n' to look them up.

***** Updating fingers of existing nodes

- For a given n, the algorithm starts with the i'th finger of node n, and then continues to walk counter-clockwise on the identifier circle until it encounters a node whose i'th finger precedes n.

***** Transferring keys

- Node n contacts the node immediately following itself and simply asks for the transfer of all appropriate values.
*** Concurrent Operations and Failures

**** Stabilization

- The join algorithm in Section 4 aggressively maintains the finger tables of all nodes as the network evolves. Since this invariant is difficult to maintain in the face of concurrent joins in a large network, we separate our correctness and performance goals.
- A basic "stabilization" protocol is used to keep nodes' successor pointers up to date, which is sufficient to guarantee correctness of lookups. Those successor pointers are then used to verify and correct finger table entries, which allows these lookups to be fast as well as correct.
- Joining nodes can affect performance in three ways: all tables are still correct and the result is found; successors are correct but fingers are not, in which case the result is still found (just more slowly); or both are incorrect, in which case nothing might be found and the lookup can be retried shortly after.
- The stabilization scheme guarantees to add nodes to a Chord ring in a way that preserves reachability of existing nodes.
- We have not discussed the adjustment of fingers when nodes join because it turns out that joins don't substantially damage the performance of fingers. If a node has a finger into each interval, then these fingers can still be used even after joins.

**** Failures and Replication

- When a node n fails, nodes whose finger tables include n must find n's successor. In addition, the failure of n must not be allowed to disrupt queries that are in progress as the system is re-stabilizing.
- The key step in failure recovery is maintaining correct successor pointers.
- To help achieve this, each Chord node maintains a "successor-list" of its r nearest successors on the Chord ring.
- If node n notices that its successor has failed, it replaces it with the first live entry in its successor list. At that point, n can direct ordinary lookups for keys for which the failed node was the successor to the new successor. As time passes, stabilize will correct finger table entries and successor-list entries pointing to the failed node.
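The failover rule is simple enough to sketch directly. A hypothetical helper (the names and the `is_alive` probe are this sketch's, not the paper's):

```python
def first_live_successor(successor_list, is_alive):
    """Replace a failed successor with the first live entry in the
    successor-list, as Chord's failure recovery does."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None  # all r successors failed simultaneously (assumed unlikely)

# Hypothetical ring fragment: node 8's successor-list with r = 4,
# where two adjacent successors have just failed.
succ_list = [14, 21, 32, 38]
down = {14, 21}
print(first_live_successor(succ_list, lambda n: n not in down))  # 32
```

The parameter r is chosen so that r simultaneous adjacent failures are improbable; as long as one entry is live, lookups keep working while stabilization repairs the stale pointers.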
*** Simulations and Experimental Results

- The probability that a particular bin does not contain any keys is, for large values of N, approximately 0.368 (i.e. 1/e).
- As discussed earlier, the consistent hashing paper solves this problem by associating keys with virtual nodes, and mapping multiple virtual nodes (with unrelated identifiers) to each real node. Intuitively, this provides more uniform coverage of the identifier space.

*** Conclusion

- Attractive features of Chord include its simplicity, provable correctness, and provable performance even in the face of concurrent node arrivals and departures. It continues to function correctly, albeit at degraded performance, when a node's information is only partially correct. The theoretical analysis, simulations, and experimental results confirm that Chord scales well with the number of nodes, recovers from large numbers of simultaneous node failures and joins, and answers most lookups correctly even during recovery.
** Pastry

- Pastry is a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications.
- It can be used to support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming.
- Each node in the Pastry network has a unique identifier (nodeId). When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries.
- Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops.
- Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry's scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.

*** Introduction

- Pastry is completely decentralized, fault-resilient, scalable, and reliable. Moreover, Pastry has good route locality properties.
- Pastry is intended as a general substrate for the construction of a variety of peer-to-peer Internet applications like global file sharing, file storage, group communication and naming systems.
- Several applications have been built on top of Pastry to date, including a global, persistent storage utility called PAST [11, 21] and a scalable publish/subscribe system called SCRIBE [22]. Other applications are under development.
- Each node in the Pastry network has a unique numeric identifier (nodeId).
- When presented with a message and a numeric key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes.
- The expected number of routing steps is O(log N), where N is the number of Pastry nodes in the network.
- At each Pastry node along the route that a message takes, the application is notified and may perform application-specific computations related to the message.
- Because nodeIds are randomly assigned, with high probability the set of nodes with adjacent nodeIds is diverse in geography, ownership, jurisdiction, etc. Applications can leverage this, as Pastry can route to one of the k nodes that are numerically closest to the key.
- A heuristic ensures that among a set of nodes with the closest nodeIds to the key, the message is likely to first reach a node "near" the node from which the message originates, in terms of the proximity metric.

**** PAST

- PAST, for instance, uses a fileId, computed as the hash of the file's name and owner, as a Pastry key for a file. Replicas of the file are stored on the k Pastry nodes with nodeIds numerically closest to the fileId. A file can be looked up by sending a message via Pastry, using the fileId as the key. By definition, the lookup is guaranteed to reach a node that stores the file as long as one of the k nodes is live.
- Moreover, it follows that the message is likely to first reach a node near the client, among the k nodes; that node delivers the file and consumes the message. Pastry's notification mechanisms allow PAST to maintain replicas of a file on the k nodes closest to the key, despite node failures and node arrivals, and using only local coordination among nodes with adjacent nodeIds.

**** SCRIBE

- In the SCRIBE publish/subscribe system, a list of subscribers is stored on the node with the nodeId numerically closest to the topicId of a topic, where the topicId is a hash of the topic name. That node forms a rendez-vous point for publishers and subscribers. Subscribers send a message via Pastry using the topicId as the key; the registration is recorded at each node along the path. A publisher sends data to the rendez-vous point via Pastry, again using the topicId as the key. The rendez-vous point forwards the data along the multicast tree formed by the reverse paths from the rendez-vous point to all subscribers.
*** Design of Pastry

- A Pastry system is a self-organizing overlay network of nodes, where each node routes client requests and interacts with local instances of one or more applications.
- Each node in the Pastry peer-to-peer overlay network is assigned a 128-bit node identifier (nodeId).
- The nodeId is used to indicate a node's position in a circular nodeId space, which ranges from 0 to 2^128 - 1 (a modular ring, as in Chord).
- NodeIds are assumed to be distributed uniformly in the 128-bit nodeId space, e.g. by computing a cryptographic hash of the node's IP address.
- As a result of this random assignment of nodeIds, with high probability, nodes with adjacent nodeIds are diverse in geography, ownership, jurisdiction, network attachment, etc.
- Under normal conditions, in a network of N nodes, Pastry can route to the numerically closest node to a given key in less than log_(2^b) N steps, where b is a configuration parameter.
- For the purpose of routing, nodeIds and keys are thought of as a sequence of digits with base 2^b.
- In each routing step, a node normally forwards the message to a node whose nodeId shares with the key a prefix that is at least one digit (b bits) longer than the prefix that the key shares with the present node's id. If no such node is known, the message is forwarded to a node whose nodeId shares a prefix with the key as long as the current node's, but is numerically closer to the key than the present node's id. To support this routing procedure, each node maintains some routing state.
- Despite concurrent node failures, eventual delivery is guaranteed unless |L|/2 nodes with adjacent nodeIds fail simultaneously (|L| is a configuration parameter with a typical value of 16 or 32).
**** Pastry Node State

- Each Pastry node maintains a routing table, a neighborhood set and a leaf set.
- A node's routing table, R, is organized into log_(2^b) N rows with 2^b - 1 entries each.
- The 2^b - 1 entries at row n each refer to a node whose nodeId shares the present node's nodeId in the first n digits, but whose (n+1)th digit has one of the 2^b - 1 possible values other than the (n+1)th digit in the present node's id.
- Each entry in the routing table contains the IP address of one of potentially many nodes whose nodeId has the appropriate prefix; in practice, a node is chosen that is close to the present node, according to the proximity metric.
- If no node is known with a suitable nodeId, then the routing table entry is left empty.
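Where another node sits in the routing table follows directly from this prefix rule: the row is the length of the shared digit prefix, the column is the other node's next digit. A toy sketch with b = 2 and 8-bit ids (real Pastry uses 128-bit ids; the helper names are this sketch's own):

```python
B = 2        # digits are base 2^B = 4
DIGITS = 4   # 8-bit ids for illustration; real Pastry uses 128-bit ids

def to_digits(node_id):
    """Split an id into base-2^B digits, most significant first."""
    return [(node_id >> (B * (DIGITS - 1 - i))) & (2 ** B - 1)
            for i in range(DIGITS)]

def table_position(local_id, other_id):
    """Row n of the routing table holds nodes sharing the first n digits
    with the local id; the column is the (n+1)-th digit of the other id."""
    da, db = to_digits(local_id), to_digits(other_id)
    row = 0
    while row < DIGITS and da[row] == db[row]:
        row += 1
    return (row, db[row]) if row < DIGITS else (DIGITS, None)

local = 0b01100011                        # digits 1, 2, 0, 3
print(table_position(local, 0b11000000))  # (0, 3): no common prefix
print(table_position(local, 0b01110000))  # (1, 3): shares leading digit 1
```

Each row covers ids that agree with the local id on one more leading digit, which is why the table needs only about log_(2^b) N rows.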
- The neighborhood set M contains the nodeIds and IP addresses of the |M| nodes that are closest (according to the proximity metric) to the local node.
- Applications are responsible for providing the proximity metric.
- The neighborhood set is not normally used in routing messages; it is useful in maintaining locality properties.
- The leaf set L is the set of nodes with the |L|/2 numerically closest larger nodeIds, and the |L|/2 numerically closest smaller nodeIds, relative to the present node's nodeId. The leaf set is used during message routing.
**** Routing

- Given a message, the node first checks to see if the key falls within the range of nodeIds covered by its leaf set.
- If so, the message is forwarded directly to the destination node, namely the node in the leaf set whose nodeId is closest to the key (possibly the present node).
- If the key is not covered by the leaf set, then the routing table is used and the message is forwarded to a node that shares a common prefix with the key by at least one more digit.
- In certain cases, it is possible that the appropriate entry in the routing table is empty or the associated node is not reachable, in which case the message is forwarded to a node that shares a prefix with the key at least as long as the local node's, and is numerically closer to the key than the present node's id.
- Such a node must be in the leaf set unless the message has already arrived at the node with the numerically closest nodeId. And, unless |L|/2 adjacent nodes in the leaf set have failed simultaneously, at least one of those nodes must be live.
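The three cases above can be sketched as one routing-step function. This is a simplified illustration under assumed parameters (b = 2, 8-bit ids, a flat list standing in for the routing table, and a naive min/max leaf-range check), not Pastry's actual implementation:

```python
B, DIGITS = 2, 4  # base-4 digits, 8-bit ids for illustration

def digits(n):
    return [(n >> (B * (DIGITS - 1 - i))) & (2 ** B - 1) for i in range(DIGITS)]

def prefix_len(a, b):
    p = 0
    while p < DIGITS and digits(a)[p] == digits(b)[p]:
        p += 1
    return p

def next_hop(local, key, leaf_set, routing_table):
    """One Pastry routing step (sketch)."""
    # Case 1: key within leaf-set range -> deliver to the numerically closest
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        return min(leaf_set + [local], key=lambda n: abs(n - key))
    # Case 2: a known node sharing at least one more digit with the key
    p = prefix_len(local, key)
    for node in routing_table:
        if prefix_len(node, key) > p:
            return node
    # Case 3 (rare): prefix no shorter, but numerically closer to the key
    candidates = [n for n in leaf_set + routing_table
                  if prefix_len(n, key) >= p and abs(n - key) < abs(local - key)]
    return min(candidates, key=lambda n: abs(n - key)) if candidates else local

local = 0b01100000                     # digits 1, 2, 0, 0
leaf = [0b01011111, 0b01100100]        # numerically adjacent ids
table = [0b11000000, 0b01110000]       # row-0 and row-1 entries
print(bin(next_hop(local, 0b01110101, leaf, table)))  # 0b1110000: longer prefix
```

In the printed example the key shares one digit with the local node, so case 2 fires and the message moves to the table entry sharing two digits with the key.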
- It can be shown that the expected number of routing steps is log_(2^b) N.
- If a message is forwarded using the routing table, then the set of nodes whose ids have a longer prefix match with the key is reduced by a factor of 2^b in each step, which means the destination is reached in log_(2^b) N steps.
- If the key is within range of the leaf set, then the destination node is at most one hop away.
- The third case arises when the key is not covered by the leaf set (i.e., it is still more than one hop away from the destination), but there is no routing table entry. Assuming accurate routing tables and no recent node failures, this means that a node with the appropriate prefix does not exist.
- The likelihood of this case, given the uniform distribution of nodeIds, depends on |L|. Analysis shows that with |L| = 2^b and |L| = 2 * 2^b, the probability that this case arises during a given message transmission is less than 0.02 and 0.006, respectively. When it happens, no more than one additional routing step results with high probability.
**** Pastry API

- Pastry is a substrate: not an application itself, but an Application Programming Interface (API) to be used by applications. It runs on all nodes joined in a Pastry network.
- Pastry exports the following operations: nodeId and route.
- Applications layered on top of Pastry must export the following operations: deliver, forward, newLeafs.
**** Self-organization and adaptation

***** Node Arrival

- When a new node arrives, it needs to initialize its state tables and then inform other nodes of its presence. We assume the new node initially knows about a nearby Pastry node A, according to the proximity metric, that is already part of the system.
- Let us assume the new node's nodeId is X.
- Node X then asks A to route a special "join" message with the key equal to X. Like any message, Pastry routes the join message to the existing node Z whose id is numerically closest to X.
- In response to receiving the "join" request, nodes A, Z, and all nodes encountered on the path from A to Z send their state tables to X.
- Node X initializes its routing table by obtaining the i-th row of its routing table from the i-th node encountered along the route from A to Z.
- X can use Z's leaf set as the basis for its own, since Z is numerically closest to X.
- X uses A's neighborhood set to initialise its own, since A is assumed to be near X.
- Finally, X transmits a copy of its resulting state to each of the nodes found in its neighborhood set, leaf set, and routing table. Those nodes in turn update their own state based on the information received.

***** Node Departure

- A Pastry node is considered failed when its immediate neighbors in the nodeId space can no longer communicate with it.
- To replace a failed node in the leaf set, its neighbor in the nodeId space contacts the live node with the largest index on the side of the failed node, and asks that node for its leaf table.
- The failure of a node that appears in the routing table of another node is detected when that node attempts to contact the failed node and there is no response.
- To replace a failed node in a routing table entry, a node contacts the other nodes in the row of the failed node and asks if any of them knows a node with the same prefix.
- A node attempts to contact each member of the neighborhood set periodically to see if it is still alive.
**** Locality

- Pastry's notion of network proximity is based on a scalar proximity metric, such as the number of IP routing hops or geographic distance.
- It is assumed that the application provides a function that allows each Pastry node to determine the "distance" of a node with a given IP address to itself.
- Throughout this discussion, we assume that the proximity space defined by the chosen proximity metric is Euclidean; that is, the triangle inequality holds for distances among Pastry nodes.
- If the triangle inequality does not hold, Pastry's basic routing is not affected; however, the locality properties of Pastry routes may suffer.

***** Route Locality

- Although it cannot be guaranteed that the distance of a message from its source increases monotonically at each step, a message tends to make larger and larger strides with no possibility of returning to a node within d_i of any node i encountered on the route, where d_i is the distance of the routing step taken away from node i. Therefore, the message has nowhere to go but towards its destination.

***** Locating the nearest among k nodes

- Recall that Pastry routes messages towards the node with the nodeId closest to the key, while attempting to travel the smallest possible distance in each step.
- Pastry makes only local routing decisions, minimizing the distance traveled on the next step with no sense of global direction.
**** Arbitrary node failures and network partitions

- As routing is deterministic by default, a malicious node can disrupt it. Randomized routing mitigates this.
- Another challenge is IP routing anomalies in the Internet that cause IP hosts to be unreachable from certain IP hosts but not others.
- However, Pastry's self-organization protocol may cause the creation of multiple, isolated Pastry overlay networks during periods of IP routing failures. Because Pastry relies almost exclusively on information exchange within the overlay network to self-organize, such isolated overlays may persist after full IP connectivity resumes.
- One solution to this problem involves the use of IP multicast.
*** Conclusion

- This paper presents and evaluates Pastry, a generic peer-to-peer content location and routing system based on a self-organizing overlay network of nodes connected via the Internet. Pastry is completely decentralized, fault-resilient, scalable, and reliably routes a message to the live node with a nodeId numerically closest to a key. Pastry can be used as a building block in the construction of a variety of peer-to-peer Internet applications like global file sharing, file storage, group communication and naming systems. Results with as many as 100,000 nodes in an emulated network confirm that Pastry is efficient and scales well, that it is self-organizing and can gracefully adapt to node failures, and that it has good locality properties.
** Kademlia

*** Abstract

- A peer-to-peer distributed hash table with provable consistency and performance in a fault-prone environment.
- The system routes queries and locates nodes using a novel XOR-based metric topology.
- The topology has the property that every message exchanged conveys or reinforces useful contact information.
- The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.

*** Introduction

- Kademlia is a P2P DHT.
- Kademlia has a number of desirable features not simultaneously offered by any previous DHT. It minimizes the number of configuration messages nodes must send to learn about each other.
- Configuration information spreads automatically as a side-effect of key lookups.
- Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes.
- Keys are opaque, 160-bit quantities (e.g., the SHA-1 hash of some larger data).
- Participating computers each have a node ID in the 160-bit key space.
- (key, value) pairs are stored on nodes with IDs "close" to the key, for some notion of closeness.
- XOR is symmetric, allowing Kademlia participants to receive lookup queries from precisely the same distribution of nodes contained in their routing tables.
- Without this property, systems such as Chord do not learn useful routing information from queries they receive.
- Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord node's finger table must store the precise node preceding some interval in the ID space. Any node actually in the interval would be too far from nodes preceding it in the same interval. Kademlia, in contrast, can send a query to any node within an interval, allowing it to select routes based on latency or even send parallel, asynchronous queries to several equally appropriate nodes.
- Kademlia most resembles Pastry's first phase, which (though not described this way by the authors) successively finds nodes roughly half as far from the target ID by Kademlia's XOR metric.
- In a second phase, however, Pastry switches distance metrics to the numeric difference between IDs. It also uses the second, numeric-difference metric in replication. Unfortunately, nodes close by the second metric can be quite far by the first, creating discontinuities at particular node ID values, reducing performance, and complicating attempts at formal analysis of worst-case behavior.
*** System Description

- Kademlia assigns 160-bit opaque IDs to nodes and provides a lookup algorithm that locates successively "closer" nodes to any desired ID, converging to the lookup target in logarithmically many steps.
- An identifier is opaque if it provides no information about the thing it identifies other than being a seemingly random string or number.
- Kademlia effectively treats nodes as leaves in a binary tree, with each node's position determined by the shortest unique prefix of its ID.
- For any given node, we divide the binary tree into a series of successively lower subtrees that don't contain the node. The highest subtree consists of the half of the binary tree not containing the node.
- The next subtree consists of the half of the remaining tree not containing the node, and so forth.
- The Kademlia protocol ensures that every node knows of at least one node in each of its subtrees, if that subtree contains a node. With this guarantee, any node can locate any other node by its ID.

**** XOR Metric

- Each Kademlia node has a 160-bit node ID. Node IDs are currently just random 160-bit identifiers, though they could equally well be constructed as in Chord.
- Every message a node transmits includes its node ID, permitting the recipient to record the sender's existence if necessary.
- Keys, too, are 160-bit identifiers. To assign (key, value) pairs to particular nodes, Kademlia relies on a notion of distance between two identifiers. Given two 160-bit identifiers, x and y, Kademlia defines the distance between them as their bitwise exclusive or (XOR), interpreted as an integer.
- XOR is a nice metric: it is symmetric and satisfies the triangle inequality, even though it is non-Euclidean.
- We next note that XOR captures the notion of distance implicit in our binary-tree-based sketch of the system.
- In a fully-populated binary tree of 160-bit IDs, the magnitude of the distance between two IDs is the height of the smallest subtree containing them both. When a tree is not fully populated, the closest leaf to an ID x is the leaf whose ID shares the longest common prefix with x.
- Ties in closeness can occur when there are empty branches in the tree. In this case, the closest leaf to x is the closest leaf to the ID x~ produced by flipping the bits of x corresponding to the empty branches of the tree.
- Like Chord's clockwise circle metric, XOR is unidirectional. For any given point x and distance ∆ > 0, there is exactly one point y such that d(x, y) = ∆. Unidirectionality ensures that all lookups for the same key converge along the same path, regardless of the originating node. Thus, caching (key, value) pairs along the lookup path alleviates hot spots.
|
||
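The metric properties above are easy to check concretely. A minimal sketch with small (4-bit) IDs instead of 160-bit ones; `xor_distance` is a hypothetical helper name, not from the paper:

```python
def xor_distance(x: int, y: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return x ^ y

x, y, z = 0b1011, 0b0010, 0b0111

# Symmetry: d(x, y) == d(y, x)
assert xor_distance(x, y) == xor_distance(y, x)

# Triangle inequality: d(x, z) <= d(x, y) + d(y, z)
assert xor_distance(x, z) <= xor_distance(x, y) + xor_distance(y, z)

# Unidirectionality: for fixed x and delta, exactly one y has d(x, y) == delta
delta = 0b0101
candidates = [w for w in range(16) if xor_distance(x, w) == delta]
assert candidates == [x ^ delta]
```

Unidirectionality is what makes path caching effective: every lookup for a key funnels through the same nodes.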
**** Node state
- For each 0 ≤ i < 160, every node keeps a list of (IP address, UDP port, Node ID) triples for nodes of distance between 2^i and 2^(i+1) from itself. We call these lists k-buckets.
- Each k-bucket is kept sorted by time last seen: least-recently seen node at the head, most-recently seen at the tail. For small values of i, the k-buckets will generally be empty (as no appropriate nodes will exist). For large values of i, the lists can grow up to size k, where k is a system-wide replication parameter.
- k is chosen such that it is unlikely that k nodes will fail at the same time.
- When a node receives a message, request or reply, from another node, it updates the appropriate k-bucket for the sender's node ID. If the sender is already in the bucket, it is moved to the tail; if it is not there and there is room, it is inserted at the tail. If the bucket is full, the least-recently seen node is pinged: if it fails to respond, it is evicted and the new node inserted; if it does respond, the new node is discarded.
- k-buckets effectively implement a least-recently seen eviction policy, except that live nodes are never removed from the list.
- This works well for systems with an otherwise high churn rate, as nodes that have been alive for a long time are more likely to stay alive.
- A second benefit of k-buckets is that they provide resistance to certain DoS attacks. An attacker cannot flush nodes’ routing state by flooding the system with new nodes, as new nodes are only inserted once the old ones die.
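The bucket-update rule above can be sketched directly. This is a minimal illustration, not the paper's implementation; `update_bucket` and the injected `ping` callback are assumed names:

```python
from collections import deque

K = 20  # system-wide replication parameter

def update_bucket(bucket: deque, contact, ping) -> None:
    """Apply the k-bucket rule for a freshly seen contact.
    Head = least-recently seen, tail = most-recently seen."""
    if contact in bucket:
        bucket.remove(contact)      # already known: move to the tail
        bucket.append(contact)
    elif len(bucket) < K:
        bucket.append(contact)      # room left: insert at the tail
    else:
        oldest = bucket[0]
        if ping(oldest):            # oldest is alive: keep it, drop the newcomer
            bucket.remove(oldest)
            bucket.append(oldest)
        else:                       # oldest is dead: evict it, insert the newcomer
            bucket.popleft()
            bucket.append(contact)
```

Note that a live node is never evicted, which is exactly the anti-flooding property described above.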
**** Kademlia Protocol
- The Kademlia protocol consists of four RPCs: ping, store, find_node, and find_value.
- The ping RPC probes a node to see if it is online.
- The store RPC instructs a node to store a (key, value) pair for later retrieval.
- find_node takes a 160-bit ID as an argument. The recipient of the RPC returns (IP address, UDP port, Node ID) triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about).
- find_value behaves like find_node, returning (IP address, UDP port, Node ID) triples, with one exception: if the RPC recipient has received a store RPC for the key, it just returns the stored value.
- In all RPCs, the recipient must echo a 160-bit random RPC ID, which provides some resistance to address forgery. pings can also be piggy-backed on RPC replies for the RPC recipient to obtain additional assurance of the sender’s network address.
***** Node lookup
1) The lookup initiator starts by picking α nodes from its closest non-empty k-bucket (the bucket covering the ID being looked up, i.e. closest to the target, not to the initiator).
2) The initiator then sends parallel, asynchronous find_node RPCs to these α nodes.
3) In the recursive step, the initiator resends find_node to nodes it has learned about from previous RPCs. (This recursion can begin before all α of the previous RPCs have returned.)
4) If a round of find_node RPCs fails to return a node any closer than the closest already seen, the initiator instead queries all of the k closest nodes it has not already queried.
5) The lookup terminates when all of the k closest nodes seen have responded or failed to respond.
- When α = 1, the lookup algorithm resembles Chord’s in terms of message cost and the latency of detecting failed nodes. However, Kademlia can route for lower latency because it has the flexibility of choosing any one of k nodes to forward a request to.
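The lookup steps above can be sketched as an iterative loop. This is a simplified synchronous version (the real protocol fires the α RPCs in parallel); `node_lookup` and the injected `find_node` function are assumed stand-ins for the RPC machinery:

```python
ALPHA, K = 3, 20

def node_lookup(target: int, seed: list, find_node) -> list:
    """Iterative Kademlia lookup sketch. find_node(peer, target) stands in
    for the RPC and returns the k contacts peer knows closest to target."""
    shortlist = sorted(seed, key=lambda n: n ^ target)[:K]
    queried = set()
    while True:
        batch = [n for n in shortlist if n not in queried][:ALPHA]
        if not batch:                       # the k closest have all been queried
            return shortlist
        for peer in batch:
            queried.add(peer)
            for learned in find_node(peer, target):
                if learned not in shortlist:
                    shortlist.append(learned)
        # keep only the k contacts closest (by XOR) to the target
        shortlist = sorted(shortlist, key=lambda n: n ^ target)[:K]
```

Because XOR sorting is total, every initiator converges on the same k closest nodes regardless of its starting bucket.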
***** Store
- Most operations are implemented in terms of the above lookup procedure. To store a (key, value) pair, a participant locates the k closest nodes to the key and sends them store RPCs.
- Additionally, each node re-publishes (key, value) pairs as necessary to keep them alive.
- For file sharing, the original publisher of a (key, value) pair is required to republish it every 24 hours. Otherwise, (key, value) pairs expire 24 hours after publication, so as to limit stale index information in the system.
***** Find value
- To find a (key, value) pair, a node starts by performing a lookup to find the k nodes with IDs closest to the key. However, value lookups use find_value rather than find_node RPCs. Moreover, the procedure halts immediately when any node returns the value. For caching purposes, once a lookup succeeds, the requesting node stores the (key, value) pair at the closest node it observed to the key that did not return the value.
- Because of the unidirectionality of the topology, future searches for the key are likely to hit cached entries before querying the closest node.
- To avoid over-caching, the expiration time of a cached (key, value) pair depends on the distance between the current node and the node whose ID is closest to the key ID.
***** Refreshing buckets
- To handle pathological cases in which there are no lookups for a particular ID range, each node refreshes any bucket to which it has not performed a node lookup in the past hour. Refreshing means picking a random ID in the bucket’s range and performing a node search for that ID.
***** Joining network
- To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes’ k-buckets as necessary.
**** Routing Table
- The routing table is a binary tree whose leaves are k-buckets.
- Each k-bucket covers some range of the ID space, and together the k-buckets cover the entire 160-bit ID space with no overlap.
- When a node u learns of a new contact and the contact fits in a bucket, it is simply inserted. Otherwise, if the full k-bucket’s range includes u’s own node ID, the bucket is split into two new buckets, the old contents divided between the two, and the insertion attempt repeated. This is why the side of the tree away from u collapses into one large bucket: it never gets split.
- A highly unbalanced tree can leave a node without complete knowledge of the subtree around its own ID. To avoid this, buckets may be split even when the node's own ID does not reside in them.
- Nodes split k-buckets as required to ensure they have complete knowledge of a surrounding subtree with at least k nodes.
**** Efficient key re-publishing
- Keys must be periodically republished, both to keep data from disappearing from the network and to keep data from being stuck on sub-optimal nodes, as new nodes closer to the data might join the network.
- To compensate for nodes leaving the network, Kademlia republishes each (key, value) pair once an hour.
- As long as republication intervals are not exactly synchronized, only one node will republish a given (key, value) pair every hour.
*** Implementation Notes
**** Optimized contact accounting
- To reduce traffic, Kademlia delays probing contacts until it has useful messages to send them. When a Kademlia node receives an RPC from an unknown contact and the k-bucket for that contact is already full with k entries, the node places the new contact in a replacement cache of nodes eligible to replace stale k-bucket entries.
- When a contact fails to respond to 5 RPCs in a row, it is considered stale. If a k-bucket is not full or its replacement cache is empty, Kademlia merely flags stale contacts rather than removing them. This ensures, among other things, that if a node’s own network connection goes down temporarily, the node won’t completely void all of its k-buckets.
- This matters because Kademlia uses UDP, where dropped packets are routine.
**** Accelerated lookups
- Another optimization in the implementation is to achieve fewer hops per lookup by increasing the routing table size. Conceptually, this is done by considering IDs b bits at a time instead of just one bit at a time.
- This also changes the way buckets are split.
- This also changes the details of the XOR-based routing.
*** Summary
- With its novel XOR-based metric topology, Kademlia is the first peer-to-peer system to combine provable consistency and performance, latency-minimizing routing, and a symmetric, unidirectional topology. Kademlia furthermore introduces a concurrency parameter, α, that lets people trade a constant factor in bandwidth for asynchronous lowest-latency hop selection and delay-free fault recovery. Finally, Kademlia is the first peer-to-peer system to exploit the fact that node failures are inversely related to uptime.
** Bouvin notes
- While the first generation of structured P2P networks was largely application specific and had few guarantees, usually using worst case O(N) time, the second generation is based on structured network overlays. They are typically capable of guaranteeing O(log N) time and space and exact matches.
- Much more scalable than unstructured P2P networks measured in number of hops for routing. However, churn results in control traffic; slow peers can slow down the entire system (especially in Chord); weak peers may be overwhelmed by control traffic.
- The load is evenly distributed across the network, based on the uniformness of the ID space. More powerful peers can choose to host several virtual peers.
- Most systems have various provisions for maintaining proper routing and defending against malicious peers.
- A backhoe is unlikely to take out a major part of the system – at least if we store at the k closest nodes.
* Mobile Ad-hoc Networks and Wireless Sensor Networks
TODO: Finish the survey on sensor networks. I stopped when they started talking about actual schemes, as Bouvin didn't mention this in his presentation.
** Routing in Mobile Ad-hoc Networks
*** Introduction
- Routing is the process of passing some data, a message, along in a network.
- The message originates from a source host, travels through intermediary hosts and ends up at a destination host.
- Intermediary hosts are called routers.
- Usually a few questions have to be answered when a message is routed:
1) How do the hosts acting as routers know which way to send the message?
2) What should be done if multiple paths connect the sender and receiver?
3) Does an answer to the message have to follow the same path as the original message?
- A simple solution is to broadcast the message, i.e. send it to every single node you know, every time.
- This creates a lot of traffic; it's also known as flooding.
- The responsibility of a routing protocol is to answer the three questions posed above.
*** Basic Routing Protocols
- A routing protocol must enable a path or route to be found through the network.
- A network is usually modelled as a graph inside the computers.
- This allows for edges to be weighted. The weight can be distance, traffic or some similar metric.
- There are two classes, Link State (LS) and Distance Vector (DV), the main difference being whether or not they use global information.
- Algorithms using global information are "Link State", as all nodes need to maintain state information about all links in the network.
- Distance Vector algorithms do not rely on global information.
**** Link State
- All nodes and links with weights are known to all nodes.
- This makes the problem a SSSP (single-source shortest path) problem.
***** Dijkstra
- A set W is initialised, containing only the source node v.
- In each iteration, the lowest-cost edge e connecting W with a node u outside W is chosen, and u is added to the set.
- The algorithm loops n-1 times, after which the shortest paths to all other nodes have been found.
- Requires each router to have complete knowledge of the network.
- This can be accomplished by broadcasting the identities and costs of all outgoing links to all other routers in the network. It has to be done every time a weight or link changes.
- Unrealistic for anything but very small networks.
- Works great for small, stable networks however.
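The algorithm above is the textbook one; a minimal priority-queue version over a small illustrative graph (the adjacency-dict shape and node names are my own choices):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths. graph[u] = {v: cost, ...}.
    Returns {node: distance from source}."""
    dist = {source: 0}
    pq = [(0, source)]                      # frontier ordered by tentative cost
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                        # stale queue entry, skip
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Triangle where the two-hop path A -> B -> C beats the direct A -> C edge
net = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 2}, "C": {"A": 5, "B": 2}}
assert dijkstra(net, "A") == {"A": 0, "B": 1, "C": 3}
```

The whole-graph input is exactly the "complete knowledge" requirement that makes pure LS impractical in large or churning networks.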
**** Distance Vector
- No global knowledge is needed.
- The shortest distance to any given node is calculated in cooperation between the nodes.
- Based on Bellman-Ford.
- The original Bellman-Ford requires global knowledge; this is a distributed variant.
***** Bellman-Ford
- Decentralised; no global information is needed.
- Requires only knowledge of the neighbours and the link costs between them and the node.
- Each node stores a distance table.
- The distance table is just a mapping between a node name and the distance to it. It covers all known nodes, not just neighbours.
- When a new node is encountered, it is simply added.
- A node sends updates to its neighbours. Such a message states that the distance from the node v to node u has changed. The neighbours can then compute their own distance to u and update their tables.
- This update may cause a chain of updates, as the neighbours might discover that the new distance is better than what they currently had.
- The route calculation is bootstrapped by having all nodes broadcast their distances to their neighbours when the network is created.
- Algorithm:
1) The distance table is initialised for node x.
2) Node x sends initial updates to its neighbours.
3) The algorithm loops, waiting for updates or link cost changes of directly connected links (i.e. links to neighbours).
4) Whenever either event occurs, the appropriate actions are taken, such as sending updates or changing values in the distance table.
- Generates less traffic, since only neighbours need to be known.
- Doesn't need global knowledge, a general advantage in large networks or networks with a high churn rate.
- Doesn't have to recompute the entire distance table whenever a single value changes, as Dijkstra's algorithm has to.
- Suffers from the "count-to-infinity" problem, which happens when a route passes twice through the same node and a link cost starts growing towards infinity. Say there is a network A - B - C - D, and A dies. B sets its distance to A to infinity. When tables are shared, B sees that C knows a route to A of distance 2, so it updates its distance to 3 (1 to C, 2 from C to A). C then has to update its distance to A to 4, and so it goes.
- A way of avoiding this is to only send information to the neighbours that are not exclusive links to the destination (split horizon): C shouldn't send any information to B about A, as B is the only way to A.
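The core of the event loop in step 4 is the Bellman-Ford relaxation applied to a neighbour's advertised table. A minimal sketch; `process_update` and the `(cost, next_hop)` table shape are illustrative choices, not from the text:

```python
def process_update(table, neighbor, neighbor_cost, neighbor_table):
    """Merge a neighbour's advertised distances into our distance table.
    table maps destination -> (cost, next_hop). Returns the destinations
    whose entries changed, which we would then advertise onwards."""
    changed = []
    for dest, cost_via in neighbor_table.items():
        new_cost = neighbor_cost + cost_via     # our cost going via this neighbour
        old = table.get(dest)
        if old is None or new_cost < old[0]:    # new destination, or a better route
            table[dest] = (new_cost, neighbor)
            changed.append(dest)
    return changed

# Node B (linked to A at cost 1) learns A's distances to C and D
table = {"A": (1, "A")}
changed = process_update(table, "A", 1, {"A": 0, "C": 2, "D": 5})
assert table["C"] == (3, "A") and table["D"] == (6, "A")
assert set(changed) == {"C", "D"}
```

The returned `changed` list is what triggers the chain of updates described above, and, without split horizon, also what drives count-to-infinity.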
*** MANET Routing
- Both Dijkstra and Bellman-Ford were designed to operate in fairly stable networks.
- MANETs are usually quite unstable, as possibly all nodes are mobile and may be moving during communication.
- MANETs typically consist of resource-poor, energy-constrained devices with limited bandwidth and high error rates.
- They also lack infrastructure and have high mobility.
- According to Belding-Royer, the focus should be on the following properties:
1) Minimal control overhead (due to limited energy and bandwidth)
2) Minimal processing overhead (typically small processors)
3) Multihop routing capability (no infrastructure, so nodes must act as routers)
4) Dynamic topology maintenance (high churn rates force the topology to be dynamic and easily adaptable)
5) Loop prevention (loops waste a lot of bandwidth)
- MANET routing protocols are typically either proactive or reactive.
***** Proactive
- Every node maintains a routing table of routes to all other nodes in the network.
- Routing tables are updated whenever a change occurs in the network.
- When a node needs to send a message to another node, it already has a route to that node in its routing table.
- Two examples of proactive protocols:
1) Destination-Sequenced Distance Vector (DSDV)
2) Optimised Link State Routing (OLSR)
***** Reactive
- Also called on-demand routing protocols.
- Do not maintain routing tables at all times.
- A route is discovered when it is needed, i.e. when the source has some data to send.
- Two examples of reactive protocols:
1) Ad-hoc On-demand Distance Vector (AODV)
2) Dynamic Source Routing (DSR)
***** Combination of proactive and reactive
- Zone Routing Protocol (ZRP)
**** Local connectivity management
- MANET protocols have in common that they need a mechanism for discovering neighbours.
- Neighbours are nodes within broadcast range, i.e. they can be reached within one hop.
- Neighbours can be found by periodically broadcasting "hello" messages. These are not relayed. Each message contains the list of neighbours known by the sending node.
- When a hello message from x is received by y, y can check if y is in the neighbour list of x. If it is, the link must be bi-directional; otherwise, it is likely uni-directional.
**** Destination-Sequenced Distance Vector
- Uses sequence numbers to avoid loops.
- Has message overhead which grows as O(n²) when a change occurs in the network.
***** Using sequence numbers
- Each node maintains a counter that represents its current sequence number. This counter starts at zero and is incremented by two whenever it is updated.
- A sequence number set by a node itself will always be even.
- The number of a node is propagated through the network in the update messages that are sent to the neighbours.
- Whenever an update message is sent, the sender increments its number and prefixes it to the message.
- Whenever an update message is received, the receiver extracts the number. This information is stored in the receiving node's route table and is further propagated in subsequent update messages regarding routes to that destination.
- In this way, the sequence number set by the destination node is stamped on every route to that node.
- Update messages thus contain: a destination node, a cost of the route, a next-hop node and the latest known destination sequence number.
- On receiving an update message, these rules apply:
1) If the sequence number of the updated route is higher than what is currently stored, the route table is updated.
2) If the numbers are the same, the route with the lowest cost is chosen.
- If a link break is noticed, the node noticing it sets the cost of the route to infinity, increments the gone node's sequence number by one and sends out an update.
- Thus, the sequence number is odd whenever a node discovers a link breakage.
- Because of this, any further updates from the disappeared node will automatically supersede this number.
- This makes DSDV loop-free.
- Sequence numbers are changed in the following ways:
1) When a link breaks, the number is changed by a neighbouring node. Link breakage can't form a loop.
2) When a node sends an update message, it changes its own sequence number and broadcasts it. This information is passed on by the neighbours.
- Thus, the closer you are to a node, the more recent a sequence number for it you know.
- When picking routes, we trust the routers that know the most recent sequence number, in addition to picking the shortest route.
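The two receive rules above boil down to one comparison. A minimal sketch; `better_route` and the `(destination_seq_no, cost)` pair shape are illustrative names:

```python
def better_route(current, advertised):
    """DSDV rule for choosing between a stored route and an advertised one.
    Routes are (destination_seq_no, cost) pairs: a newer sequence number
    always wins, and cost only breaks ties."""
    adv_seq, adv_cost = advertised
    cur_seq, cur_cost = current
    if adv_seq > cur_seq:
        return advertised               # fresher information from the destination
    if adv_seq == cur_seq and adv_cost < cur_cost:
        return advertised               # same freshness, shorter route
    return current

INF = float("inf")
assert better_route((10, 2), (12, 7)) == (12, 7)    # newer seq beats lower cost
assert better_route((10, 2), (10, 1)) == (10, 1)    # tie on seq: lowest cost wins
assert better_route((11, INF), (12, 3)) == (12, 3)  # destination's next even seq
                                                    # supersedes the odd break seq
```

The last assertion shows why odd break-numbers work: the destination's own next (even) number is always higher, so a revived node automatically overrides the infinity route.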
***** Sending updates
- Two types of updates: full and incremental.
- Full updates contain information about all routes known by the sender. These are sent infrequently.
- Incremental updates contain only changed routes. These are sent regularly.
- This decreases control bandwidth.
- Full updates are sent at some relatively large interval.
- Incremental updates are sent frequently.
- Full updates are allowed to use multiple network protocol data units (NPDUs), whereas incremental updates can only use one. If too many incremental changes accumulate to fit in a single NPDU, a full update is sent instead.
- When an update to a route is received, different actions are taken depending on the information:
1) If it's a new route, schedule it for immediate advertisement; send an incremental update ASAP.
2) If a route has improved, send it in the next incremental update.
3) If the sequence number has changed but the route hasn't, send it in the next incremental update if there is space.
***** Issue
- Suffers from routing fluctuations.
- A node could repeatedly switch between a couple of routes.
- Essentially, one route is longer but its update arrives first, while the other is shorter but its sequence number arrives later. You receive the first update and switch to the longer route; then you receive the later update and have to update again, as the new route is shorter.
- Fixed by introducing a delay: if the cost to a destination changes, this information is scheduled for advertisement at a time depending on the average settling time for that destination.
**** Optimised Link State Routing
- Designed to be effective in an environment with a dense population of mobile devices which communicate often.
- Introduces multipoint relay (MPR) sets. An MPR set is a subset of a node's one-hop neighbours that is used for routing that node's messages. The nodes that have chosen a given node as relay are called its MPR selectors.
***** Multipoint relay set
- Selected independently by each node as a subset of its neighbours.
- Selected such that the set covers all nodes that are two hops away.
- Doesn't have to be optimal.
- Each node stores a list of both one-hop and two-hop neighbours, collected from the hello messages that are broadcast anyway, since these contain each sender's neighbour list. The union of the one-hop neighbours' neighbours is therefore the set of two-hop neighbours, so a node can simply check that its MPR set covers all of them.
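A common way to pick such a covering subset is a greedy heuristic (good enough, since the text notes the set need not be optimal). A minimal sketch; `select_mpr` and the `one_hop` mapping of neighbour to its hello-advertised neighbour set are assumed shapes:

```python
def select_mpr(one_hop: dict, node) -> set:
    """Greedy multipoint-relay selection. one_hop maps each one-hop
    neighbour to the set of ITS neighbours (from hello messages).
    Returns a (not necessarily minimal) subset covering all two-hop nodes."""
    # two-hop nodes: neighbours-of-neighbours, minus our neighbours and ourselves
    two_hop = set().union(*one_hop.values()) - set(one_hop) - {node}
    mpr, uncovered = set(), set(two_hop)
    while uncovered:
        # pick the neighbour covering the most still-uncovered two-hop nodes
        best = max(one_hop, key=lambda n: len(one_hop[n] & uncovered))
        mpr.add(best)
        uncovered -= one_hop[best]
    return mpr

# u's neighbours are a, b, c; b alone already covers both two-hop nodes x and y
nbrs = {"a": {"u", "x"}, "b": {"u", "x", "y"}, "c": {"u", "y"}}
assert select_mpr(nbrs, "u") == {"b"}
```

Finding the minimum such set is the NP-hard set-cover problem, which is why a greedy approximation is acceptable here.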
***** Routing with MPR
- A topology control (TC) message is required to create a routing table for the entire network.
- It is sent via the MPRs and will eventually reach the entire network. This involves far less flooding than the standard LS algorithm.
**** Ad-hoc On-Demand Distance Vector
- Reactive.
- Routes are acquired when they are needed.
- Assumes symmetrical links.
- Uses sequence numbers to avoid loops.
***** Path Discovery
- When a node wishes to send something, a path discovery mechanism is triggered.
- If node x wishes to send something to node y but doesn't know a route to y, a route request (RREQ) message is sent to x's neighbours. The RREQ contains:
1) Source address
2) Source sequence number
3) Broadcast id, a unique id of the current RREQ
4) Destination address
5) Destination sequence number
6) Hop count, the number of hops so far, incremented when the RREQ is forwarded
- (source address, broadcast id) uniquely identifies a RREQ. This can be used to check whether a RREQ has been seen before.
- When a RREQ is received, one of two actions is taken:
1) If a route to the destination is known, and that route has a sequence number greater than or equal to the destination sequence number in the RREQ, the node responds by sending a RREP (route reply) back to the source.
2) If it doesn't have a recent enough route, it broadcasts the RREQ to its neighbours with an increased hop count.
- When a RREQ is received, the address of the neighbour from whom it was received is recorded. This allows the generation of a reverse path, should the destination node be found.
- A RREP contains the source and destination addresses, the destination sequence number, the total number of hops from source to destination, and a lifetime value for the route.
- If multiple RREPs are received by an intermediary node, the first one is forwarded; later ones are forwarded only if their destination sequence number is higher, or they have a lower hop count with the same destination sequence number.
- When the RREP is sent back to the source, the intermediary nodes record which node they received the RREP from, to generate a forward path to route data along.
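The RREQ fields and the two receive actions can be sketched compactly. This is an illustrative model, not the AODV wire format; `handle_rreq`, the `(dest_seq_no, hop_count)` route shape and the `seen` set are my own names:

```python
from dataclasses import dataclass

@dataclass
class RREQ:
    source_addr: str
    source_seq_no: int
    broadcast_id: int     # (source_addr, broadcast_id) identifies the request
    dest_addr: str
    dest_seq_no: int      # last sequence number the source knew for the dest
    hop_count: int        # hops so far, incremented on each forward

def handle_rreq(rreq: RREQ, route_table: dict, seen: set):
    """Intermediate-node sketch: reply if we hold a route at least as fresh
    as the requester demands, otherwise rebroadcast."""
    key = (rreq.source_addr, rreq.broadcast_id)
    if key in seen:
        return ("discard", None)            # duplicate RREQ, drop it
    seen.add(key)
    route = route_table.get(rreq.dest_addr) # route = (dest_seq_no, hop_count)
    if route is not None and route[0] >= rreq.dest_seq_no:
        return ("reply", route)             # send RREP along the reverse path
    rreq.hop_count += 1
    return ("forward", rreq)                # no fresh route: rebroadcast

seen: set = set()
req = RREQ("A", 1, 7, "D", 4, 0)
assert handle_rreq(req, {"D": (5, 2)}, seen) == ("reply", (5, 2))
assert handle_rreq(req, {"D": (5, 2)}, seen)[0] == "discard"
```

The freshness check (`route[0] >= rreq.dest_seq_no`) is what keeps an intermediate node from answering with a stale route and forming a loop.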
***** Evaluation
- Tries to minimise control traffic by having nodes only maintain active routes.
- Loops are prevented with sequence numbers.
- No system-wide broadcasts of entire routing tables.
- A route is only maintained as long as it's used. It has a timeout and is discarded if this timeout is reached.
- Path finding can be costly, as a lot of RREQs get propagated through the network.
- An expanding ring search can help control the number of messages going out, but if the receiver isn't close, this can be even more costly than the standard way.
- Upon link failure, the upstream neighbour sends a RREP with the sequence number incremented by one and the hop count set to infinity to any active neighbours, that is, neighbours that are using the route.
**** Dynamic Source Routing
- An on-demand protocol.
- DSR is a source routing protocol; this is the main difference between DSR and AODV.
- Source routing is a technique where every message contains a header describing the entire path that the message must follow.
- When a message is received, the node checks whether it is the destination. If not, it forwards the message to the next node in the path.
- There is no need for intermediate nodes to keep any state about active routes, as was the case in the AODV protocol.
- DSR doesn't assume symmetrical links and can use uni-directional links, i.e. one route can be used from A to B and a different route from B to A.
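The statelessness of intermediate nodes follows directly from the header format. A minimal sketch of the forwarding step; `forward` and the message-dict shape are illustrative, not DSR's actual packet layout:

```python
def forward(message: dict, my_addr: str):
    """DSR forwarding sketch: the full route travels in the message header,
    so a node only needs to look up its successor in that list."""
    route = message["route"]            # e.g. ["A", "B", "C", "D"]
    if my_addr == route[-1]:
        return ("deliver", None)        # we are the destination
    next_hop = route[route.index(my_addr) + 1]
    return ("send", next_hop)           # pass it along; no per-route state kept

msg = {"route": ["A", "B", "C", "D"], "payload": "hello"}
assert forward(msg, "B") == ("send", "C")
assert forward(msg, "D") == ("deliver", None)
```

Loop checking is equally local: a node forwarding a RREQ can just test whether its own address already appears in the route list.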
***** Path Discovery
- Discovery is similar to AODV.
- A RREQ contains the source and destination addresses and a request id.
- The source address and request id identify the RREQ.
- When an intermediate node receives a RREQ, it does one of two things:
1) If it has no route to the destination, it appends itself to the list of nodes in the RREQ and then forwards it to its neighbours.
2) If it does have a route to the destination, it appends this route to the list of nodes and sends a RREP back to the source, containing the full route.
- This system uses the same number of messages as AODV, and finds the same routes.
- When a node is ready to send a RREP back to the source, it can do one of three things:
1) If it already has a route to the source, it can send the RREP back along this path.
2) It can reverse the route in the RREP (i.e. the list the nodes append themselves to when forwarding).
3) It can initiate a new RREQ to find a route to the source.
- The second option assumes symmetrical links.
- The third approach can cause a loop, as the source and the destination host can endlessly look for each other.
- This can be avoided by piggybacking the RREP on the second RREQ message. The receiver of this RREQ will then be given a path to use when returning the reply.
***** Route cache
- There is no route table.
- DSR uses a route cache of currently known routes. The route cache of a node is in effect a tree rooted at the node.
- This tree can contain multiple routes to a single destination.
- This makes it more robust against broken links: even if one link breaks, another route may still be usable.
- Might take up O(n²) space.
***** Promiscuous mode operation
- DSR takes advantage of the fact that wireless devices can overhear messages that aren't addressed to them.
- Since messages tend to be broadcast, other nodes within range of the broadcast can also read the message.
- Having nodes overhear messages that are not addressed to them is called promiscuous mode operation.
- It's not required for DSR to work, but it improves the protocol.
- When two nodes on a path move out of transmission range, some sort of acking mechanism must be used. This is usually done using link-layer acks, but if such functionality isn't available, it must be done through other means.
- A passive ack is when a host, after sending a message to the next-hop host in a path, overhears that the receiving host is transmitting the message again. This can be taken as a sign that the host has in fact received the message and is now in the process of forwarding it towards the next hop.
- A host that overhears a message may add the route of the message to its route cache.
- An overheard message might also be an error message; then the route cache can be corrected.
- Overhearing can also be used for route shortening: if A sends to B who sends to C, but C overhears the message to B, C can send a RREP to A and let A know the route can be shortened.
***** Evaluation
- Like AODV, DSR only maintains active routes, i.e. routes time out.
- The number of control messages is kept low by using the same optimisations as AODV.
- Storage overhead is O(n): the route cache plus information about recently received RREQs.
- Loops are easily avoided in source routing, since nodes can just check whether they are already part of a path. If so, the message is discarded.
**** Zone Routing Protocol
- A hybrid protocol.
- In ZRP, each node defines a zone consisting of all of its n-hop neighbours, where n may be varied.
- Within this zone, the node proactively maintains a routing table of routes to all other nodes in the zone. This is done using the intrazone routing protocol, which is LS based.
- These tables can be used when sending to nodes within the zone.
- Outside the zone, a reactive interzone routing scheme is used.
- This uses a concept called bordercasting.
- The source node sends a route request (essentially a RREQ message) to all of the nodes on the border of its zone.
- These border nodes check if they can reach the destination directly. If not, they propagate the message to their own border nodes.
***** Evaluation
- Less control traffic when doing route discovery, as messages are either sent to border nodes (skipping a lot of intermediary hops) or sent directly to someone within the zone.
- More control messages within the limited range of the zones, though.
- Storage complexity of O(n²), where n is the number of nodes within the zone.
- Since LS is used, the running time is O(m + n log n), where m is the number of edges connecting the n nodes in the zone.
- In dense scenarios, ZRP won't be feasible.
** Energy Efficient MANET Routing
- All of the protocols in chapter 2 try to minimise control traffic. This does save energy, since transmitting fewer messages is cheaper, but it is done primarily to avoid wasting bandwidth.
*** Introduction to energy efficient routing
|
||
- Two main approaches
|
||
1) Power-save
|
||
2) power-control
|
||
- Power-save is concerned with sleep states. In a power-save protocol the mobile nodes utilise that their network interfaces can enter into a sleep state where less energy is consumed.
|
||
- Power-control utilises no sleep states. Instead the power used when transmitting data is varied; which also varies transmission range of nodes.
|
||
- Power-control can save some energy, but the real energy saver is in power-save, as the real waste in most MANETs is idle time.
|
||
- As such, power-save is the most important, but power-control can be used to complement it.
|
||
- Goal of the energy efficiency is important to define:
|
||
- One approach is to maximise overall lifetime of the entire network
|
||
- Stronger nodes that have a longer battery life may be asked to do a lot of the heavy lifting.
|
||
- Another approach is to use minimum energy when routing, such that the route using the minimum amount of energy is taken.
|
||
- The physical position of nodes can be important when making routing decisions.
|
||
- Protocols tend to assume there is some positioning mechanism available, such as GPS.
|
||
- This is not assumed here.
|
||
- A third energy-saving approach is load balancing. The protocol attempts to balance the load in such a way that it maximises overall lifetime. (This sounds a lot like having a few strong nodes do the heavy lifting)
|
||
*** The power-control approach
|
||
- Power-control protocols cut down on energy consumption by controlling the transmission power of the wireless interfaces.
|
||
- Turning down transmission power when sending to neighbours is nice. It consumes less energy for the sender, and since the range is lowered, fewer nodes have to spend energy overhearing the message.
|
||
- There is a non-linear relation between transmission range and energy used; thus, more hops might in fact yield less total energy spent.
|
||
- A system called PARO uses this: it allows more intermediary nodes if this lowers the overall cost of the path.
|
||
*** Power-save approach
|
||
- Protocols that use the power-save approach cut down on energy consumption by utilising the sleep states of the network interfaces
|
||
- When a node sleeps, it can't participate in the network
|
||
- This means these protocols have to either
|
||
1) use retransmissions of messages to make sure that a message is received
|
||
2) make sure that all of the nodes do not sleep at the same time, and thus delegate the work of routing data to the nodes that are awake.
|
||
- Power-save protocols define ways in which nodes can take turns sleeping and being awake, so that none, or at least only a very small percentage, of the messages sent in the network are lost due to nodes being in the sleep state.
|
||
- They are specifications of how it is possible to maximise the amount of time that nodes are sleeping, while still retaining the same connectivity and loss rates comparable to a network where no nodes are sleeping.
|
||
- IEEE 802.11 ad hoc power saving mode, part of the IEEE standard, uses sleep states.
|
||
- It uses the protocol on the link layer and is as such independent of which routing protocol is used on network layer.
|
||
- BECA/AFECA uses retransmissions
|
||
- Span specifies when nodes can sleep and delegates routing to the rest
|
||
**** IEEE
|
||
- Beacon interval within which each node can take a number of actions
|
||
- At the end of each beacon interval, the nodes compete for transmission of the next beacon; the one who transmits first wins.
|
||
- In the beginning of each beacon interval all nodes must be awake.
|
||
- It works in a few phases, where nodes can announce to receivers that they want to send stuff. After this phase, any node which wasn't contacted, can safely sleep.
|
||
**** BECA/AFECA
|
||
- The difference between BECA and AFECA is that AFECA takes node density into consideration when determining the period of time that a node may sleep.
|
||
- Both approaches are only power saving algorithms and not routing protocols. This means that they need to work together with some existing MANET routing protocol.
|
||
- It makes sense to choose an on-demand routing protocol for this purpose, as a pro-active one would keep the nodes awake with its periodic updates.
|
||
***** Basic Energy-Conserving algorithm (BECA)
|
||
- Based on retransmissions
|
||
- Consists of timing information that defines the periods that nodes spend in the different states defined by the algorithm, and a specification of how many retransmissions are needed.
|
||
- BECA has three states
|
||
1) sleeping
|
||
2) listening
|
||
3) active
|
||
- Some rules to ensure no messages are lost
|
||
1) T_listen = T_retransmissions
|
||
2) T_sleep = k * T_retransmissions, for some k
|
||
3) Number_of_retrans >= k + 1
|
||
4) T_idle = T_retransmissions
|
||
- If A sends to B, but B sleeps, the message will be retransmitted R >= k + 1 times with interval T_retrans, until the message has been received.
|
||
- Since T_sleep is defined as k * T_retrans, at least one of the retransmissions will be received, even when B falls asleep just before A transmits the message.
|
||
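The guarantee that follows from rules 1-3 can be checked with a tiny discrete-time model. This is illustrative only: transmissions are treated as instantaneous points and the receiver's sleep/listen cycle as strictly periodic, which the real protocol does not require.

```python
# Illustrative check of the BECA claim: with T_sleep = k * T_retrans and
# k + 1 transmissions spaced T_retrans apart, at least one transmission
# falls inside the receiver's listen window, whatever the phase offset.

def received(k, t_retrans, offset):
    """B cycles: sleep for k * t_retrans, then listen for t_retrans.
    A transmits at offset + i * t_retrans for i = 0..k."""
    cycle = (k + 1) * t_retrans
    for i in range(k + 1):
        t = (offset + i * t_retrans) % cycle
        if t >= k * t_retrans:        # lands inside the listen window
            return True
    return False

# Every possible phase offset is covered when R >= k + 1 transmissions are used.
assert all(received(k=2, t_retrans=10, offset=o) for o in range(30))
```

The k + 1 transmissions hit k + 1 distinct positions of B's cycle, spaced exactly T_retrans apart, so one of them necessarily falls in the single listen slot of length T_retrans.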
- Incurs higher latency, worst case k * T_retrans and on average (k * T_retrans) / 2. This latency is added for each hop.
|
||
- Thus, to keep this low, k must be somewhat small, which counteracts the energy saving.
|
||
- Thus, one needs to find a nice ratio.
|
||
- Apparently k = 1 is nice.
|
||
- A nice feature of BECA, which also applies to AFECA, is that in high traffic scenarios, where all nodes are on at all times, nodes are simply kept in the active state. In this way the power saving mechanism is disabled and the performance of the protocol is thus as good as the underlying protocol.
|
||
***** Adaptive Fidelity energy-conserving algorithm (AFECA)
|
||
- Same power save model as BECA, except instead of T_sleep, it has T_varia_sleep
|
||
- T_vs is varied according to the number of neighbours surrounding a node.
|
||
- This is estimated while in the listening state, according to how many neighbours are overheard.
|
||
- Nodes are removed from the estimation after they timeout at T_gone time.
|
||
- T_vs is then defined as T_vs = Random(1, amount_of_neighbours) * T_sleep
|
||
- Sleep time of (N * T_sleep) / 2 on average
|
||
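A quick empirical check of that average, assuming (illustratively) that Random(1, N) draws uniformly from the interval [1, N]:

```python
import random

# Illustrative check that T_vs = Random(1, N) * T_sleep gives an average
# sleep time of roughly (N * T_sleep) / 2, as stated above.

def afeca_sleep(n_neighbours, t_sleep, rng):
    """One AFECA sleep period: a random factor in [1, N] scales T_sleep."""
    return rng.uniform(1, n_neighbours) * t_sleep

rng = random.Random(42)
n, t_sleep, trials = 20, 1.0, 100_000
mean = sum(afeca_sleep(n, t_sleep, rng) for _ in range(trials)) / trials
# Exact expectation of uniform(1, N) is (N + 1) / 2, i.e. ~N/2 for large N.
```

For N = 20 the empirical mean comes out near 10.5 * T_sleep, matching the (N * T_sleep) / 2 approximation and showing why dense areas sleep longer on average.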
- Favours nodes in dense areas, due to N, which is amount_of_neighbours.
|
||
- When number_of_retrans isn't changed but the sleep time is, packets might be lost. A fix could be to make the retransmission count variable as well.
|
||
- Apparently doubles the overall lifetime, as network density rises.
|
||
**** Span
|
||
- Power-save approach based on notion of connected dominating sets (CDSs).
|
||
- A CDS is a connected subgraph S of G, such that every vertex u in G is either in S or adjacent to some vertex v in S.
|
||
- So all nodes can be reached from the CDS
|
||
- A CDS is ideal for routing purposes since the definition of a CDS means that all nodes of the network can be reached from it. It is therefore possible to use the nodes in the CDS as the only routers in the network.
|
||
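The CDS definition can be captured in a small checker. This is an illustrative sketch (graphs as adjacency dicts of sets), not anything from Span itself:

```python
# Illustrative checker for the CDS definition: S is a connected dominating
# set of G iff S induces a connected subgraph and every vertex outside S
# has at least one neighbour in S.

def is_cds(adj, s):
    s = set(s)
    if not s or not s <= adj.keys():
        return False
    # Dominating: every vertex is in S or adjacent to some vertex in S.
    if any(v not in s and s.isdisjoint(adj[v]) for v in adj):
        return False
    # Connected: a traversal restricted to S must reach all of S.
    seen, stack = set(), [next(iter(s))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(n for n in adj[v] if n in s)
    return seen == s
```

On a 5-node ring, {0, 1, 2} is a CDS (nodes 3 and 4 each border it, and 0-1-2 is connected), while {0, 2} dominates the ring but is not connected, so it fails.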
- These are called coordinators. They are the routing backbone.
|
||
- Non-coordinator nodes are thus not used for routing purposes and they may therefore spend some of their time sleeping.
|
||
- A coordinator selection scheme attempts to distribute the coordinator responsibility among the nodes.
|
||
- Nodes have a battery capacity and a utility, utility being their reach in the network.
|
||
- The coordinator selection algorithm is invoked periodically at every non-coordinator node. The result is a delay before the node becomes a coordinator.
|
||
- A coordinator-withdrawal algorithm is likewise invoked periodically at the coordinators.
|
||
- The potential coordinator node needs information about its one- and two-hop neighbours, and for each neighbour also whether that neighbour is a coordinator. This information is maintained pro-actively by using a standard HELLO message approach, as described earlier, where each HELLO message contains information about the neighbours and coordinators of the sending node.
|
||
- As mentioned both the utility of the node and the remaining energy is taken into consideration when finding new coordinators. The way that it is implemented is by using a randomised back-off delay that the node uses before announcing itself as a new coordinator.
|
||
- This ensures there is a somewhat linear relation between energy capacity and willingness to become a coordinator.
|
||
- Additionally, nodes that offer good connectivity of the routing backbone are preferred, which means fewer coordinators overall.
|
||
- Also, there is a random part, such that the coordinator announcements are evenly distributed.
|
||
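One way to sketch such a back-off delay is below. The exact weighting is illustrative, only loosely modelled on Span's announcement rule; `backoff_delay` and its parameters are made-up names, not Span's.

```python
import random

# Illustrative coordinator back-off delay: nodes with more remaining
# energy and higher utility wait less before announcing themselves, and
# a random term spreads equal candidates' announcements out in time.

def backoff_delay(energy_left, energy_max, pairs_connected, pairs_total,
                  n_neighbours, t_unit, rng):
    # Utility: fraction of neighbour pairs this node would newly connect.
    utility = pairs_connected / pairs_total if pairs_total else 0.0
    r = rng.random()  # randomisation so equal candidates don't collide
    return ((1 - energy_left / energy_max)   # low energy -> longer wait
            + (1 - utility)                  # low utility -> longer wait
            + r) * n_neighbours * t_unit

rng = random.Random(1)
full = backoff_delay(100, 100, 6, 6, 4, 1.0, rng)  # strong, useful node
weak = backoff_delay(10, 100, 1, 6, 4, 1.0, rng)   # weak, low-utility node
```

A full-energy, high-utility node always announces before a nearly drained, low-utility one here, which is the "somewhat linear relation" between capacity and willingness described above.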
- After waiting for the calculated amount of time two things may have happened:
|
||
1) Another node in the vicinity may have announced that it wants to become a coordinator
|
||
2) No announcements have been heard, and the node thus announces that it is now a coordinator.
|
||
- Nodes can withdraw if everything stays connected regardless of them being there, or, to rotate the role, even if their neighbours can only partly reach each other. The node then becomes a tentative coordinator: it wants to leave, and tentative coordinators are not considered coordinators by the coordinator selection algorithm.
|
||
- Span isn't a routing protocol.
|
||
- Span doesn't play nicely with AODV: since only the CDS may forward messages in Span, the usable neighbourhoods change often, which results in a lot of links breaking constantly. It's fine for geographic forwarding (a greedy, GPS-dependent protocol) though.
|
||
*** Span on BECA/AFECA
|
||
- Span needs to work with another power saving algorithm, that actually puts the nodes to sleep
|
||
- The neighbourhood information needed by Span can be piggybacked on the HELLO messages used by AODV.
|
||
- This can be used to build the CDS backbone
|
||
- We make coordinators the only nodes that can forward RREQs. RREPs simply follow the reverse path, so no need to worry there.
|
||
- There is no need to perform retransmissions, since the coordinator nodes are always awake.
|
||
- A larger ratio between T_listen and T_sleep is allowed, which results in lower energy consumption.
|
||
** A Survey on Sensor Networks
|
||
- Recent hardware advances have enabled the development of low-cost sensor networks
|
||
- Can be used for various application areas (health, military, home).
|
||
- MANETs are intended to handle ad hoc communication from one arbitrary node to another
|
||
- Wireless Sensor Networks (WSNs) are about sensing, collecting, and shipping data in one direction: towards the sink
|
||
*** Introduction
|
||
- A sensor network is composed of a large number of sensor nodes that are densely deployed either inside the phenomenon (the event or object being sensed) or very close to it
|
||
- The position need not be engineered or predetermined -> allows random deployment
|
||
- Means the networks must possess self-organizing capabilities
|
||
- Sensor nodes are fitted with an onboard processor, which allows for carrying out simple computations and thus transmitting only the required partially processed data, rather than all the raw data.
|
||
- For military, it helps the network has rapid deployment, self-organization and high fault tolerance.
|
||
- They require wireless ad hoc networking techniques. Many exist, but they aren't well suited to the unique features and application requirements of sensor networks.
|
||
- The difference between sensor networks and ad hoc networks:
|
||
1) The number of sensor nodes in a sensor network can be several orders of magnitude higher than nodes in an ad hoc network
|
||
2) Sensor nodes are densely deployed
|
||
3) Sensor nodes are prone to failures
|
||
4) The topology of a sensor network changes very frequently
|
||
5) Sensor nodes mainly use a broadcast communication paradigm, whereas most ad hoc networks are based on point-to-point communications
|
||
6) Sensor nodes are limited in power, computational capacities and memory
|
||
7) Sensor nodes may not have a global identification ID, because of the large amount of overhead and large number of sensors
|
||
*** Sensor Networks Communication Architecture
|
||
- Sensor networks are usually scattered in a sensor field (basically one big cloud of sensors)
|
||
- The scattered sensor nodes have the capability to collect data and route it back to the sink
|
||
- Routing can be via a multihop infrastructureless architecture (I'd imagine they just broadcast until they find a node who can contact the sink ??)
|
||
- Design of the sensor network is influenced by many factors
|
||
1) Fault tolerance
|
||
2) Scalability
|
||
3) Production costs
|
||
4) Operation environment
|
||
5) Sensor network topology
|
||
6) Hardware constraints
|
||
7) Transmission media
|
||
8) Power consumption
|
||
**** Design Factors
|
||
***** Fault Tolerance
|
||
- Sensor nodes may fail or be blocked due to lack of power, or have physical damage or environmental interference
|
||
- Failure of nodes shouldn't affect overall task of network
|
||
- Fault tolerance describes the ability to sustain sensor network functionalities without interruption due to node failures
|
||
- The Reliability or Fault Tolerance is modeled using the Poisson Distribution
|
||
- The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event.
|
||
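A minimal sketch of that model, assuming the standard Poisson-process form in which the reliability of node k is the probability of zero failures in the interval (0, t):

```python
import math

# Reliability under a Poisson failure model: with failures arriving at a
# constant rate lambda_k, the probability of seeing zero failures in
# (0, t) is R_k(t) = exp(-lambda_k * t).

def reliability(failure_rate, t):
    """P(no failure of the node in (0, t)) for the given failure rate."""
    return math.exp(-failure_rate * t)
```

Reliability starts at 1 at t = 0 and decays exponentially: a node with failure rate 0.1 per hour has reliability e^-1, about 0.37, after 10 hours.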
***** Scalability
|
||
- The number of sensor nodes deployed in studying a phenomenon may be the order of hundreds or thousands
|
||
- Protocols should utilise the high density in the sensor network
|
||
***** Production Cost
|
||
- The cost of a single node is very important, as a sensor network consists of many.
|
||
***** Hardware constraints
|
||
- A sensor node consists of a sensing unit, a processing unit, a transceiver unit and a power unit.
|
||
- Perhaps also application specific components (bluetooth, GPS, mobilizer, stuff like that)
|
||
***** Sensor Network Topology
|
||
- Perhaps 20 nodes/m³
|
||
- Topology maintenance and change in three phases
|
||
1) Predeployment and deployment phase (Can be deployed one by one, thrown to the winds or even by rocket, stuff like that)
|
||
2) Post-deployment phase (Topology changes can be due to reachability (because of jamming, noise and such), available energy, malfunctioning)
|
||
3) Redeployment of additional nodes phase (Additional nodes can be deployed)
|
||
***** Environment
|
||
- Can be inside, outside, in the ocean, chemically contaminated field, wherever
|
||
***** Transmission Media
|
||
- Links can be formed by radio, infrared or optical media (or really whatever)
|
||
***** Power consumption
|
||
- Only space for limited power source
|
||
- Nodes dying can force changes to topology and such, which can be costly.
|
||
**** Protocol Stack
|
||
- Combines power and routing awareness, integrates data with networking protocols, communicates power-efficiently through the wireless medium and promotes cooperative efforts of sensor nodes.
|
||
- Consists of physical layer (simple but robust modulation, transmission and receiving techniques), data link layer, network layer (routing data supplied by transport layer), transport layer, application layer, power management plane, mobility management plane and task management plane.
|
||
|
||
***** Physical Layer
|
||
- Long distance wireless communication can be expensive
|
||
- The physical layer is responsible for frequency selection, carrier frequency generation, signal detection, modulation and data encryption. Thus far, the 915 MHz ISM band has been widely suggested for sensor networks.
|
||
**** Data Link Layer
|
||
- The data link layer is responsible for the multiplexing of data streams, data frame detection, medium access and error control. It ensures reliable point-to-point and point-to-multipoint connections in a communication network
|
||
- The medium access control (MAC) protocol in a wireless multi-hop self-organizing sensor network must achieve two goals. The first is the creation of the network infrastructure. The second objective is to fairly and efficiently share communication resources between sensor nodes.
|
||
***** Power Saving Modes of Operation
|
||
- Regardless of which type of medium access scheme is used for sensor networks, it certainly must support the operation of power modes for the sensor node.
|
||
- The most obvious means of power conservation is to turn the transceiver off when it is not required. Although this power-saving method seemingly provides significant energy gains, an important point that must not be overlooked is that sensor nodes communicate using short data packets.
|
||
- The shorter the packets, the more the dominance of startup energy
|
||
- Turning the transceiver off during idling may not always be efficient due to the energy spent in turning it back on each time.
|
||
- As a result, operation in a power-saving mode is energy-efficient only if the time spent in that mode is greater than a certain threshold.
|
||
- The threshold time is found to depend on the transition times and the individual power consumption of the modes in question.
|
||
***** Error Control
|
||
- Another important function of the data link layer is the error control of transmission data. Two important modes of error control in communication networks are forward error correction (FEC) and automatic repeat request (ARQ)
|
||
- Forward error correction (FEC)
  + Link reliability is an important parameter in the design of any wireless network, and more so in sensor networks, due to the unpredictable and harsh nature of the channels encountered in various application scenarios.
|
||
  + Apparently encodes and decodes the messages, which costs processing power. Ratio to determine if this is worth it.
  + The central idea is the sender encodes the message in a redundant way by using an error-correcting code (ECC).
|
||
**** Network Layer
|
||
- Traditional ad hoc routing techniques do not usually fit the requirements of the sensor networks due to the reasons explained earlier.
|
||
- The networking layer of sensor networks is usually designed according to the following principles:
|
||
1) Power efficiency is always an important consideration.
|
||
2) Sensor networks are mostly data-centric.
|
||
3) Data aggregation is useful only when it does not hinder the collaborative effort of the sensor nodes.
|
||
4) An ideal sensor network has attribute-based addressing and location awareness.
|
||
- Energy-efficient routes can be found based on the available power (PA) in the nodes or the energy required (α) for transmission in the links along the routes.
|
||
- An energy-efficient route is selected by one of the following approaches.
|
||
***** Maximum PA Route
|
||
- The route that has a maximum total PA is preferred
|
||
- The total PA is calculated by summing the PAs of each node along the route
|
||
- It is important not to consider routes derived by extending routes that can already connect the sensor node to the sink as alternative routes
|
||
+ Otherwise, we could end up simply going through the highest possible number of nodes, as this would lead to the largest total PA, but it would be very inefficient.
|
||
- Can be useful however. This somewhat resembles the routing protocol where the strongest should carry the biggest burden.
|
||
***** Minimum energy (ME) route
|
||
- The route that consumes minimum energy to transmit the data packets between the sink and the sensor node is the ME route.
|
||
***** Minimum hop (MH) route
|
||
- The route that makes the minimum hop to reach the sink is preferred.
|
||
- Note that the ME scheme selects the same route as the MH when the same amount of energy (i.e., all α are the same) is used on every link.
|
||
***** Maximum minimum PA node route
|
||
- The route along which the minimum PA is larger than the minimum PAs of the other routes is preferred.
|
||
- This scheme precludes the risk of using up a sensor node with low PA much earlier than the others because they are on a route with nodes that have very high PAs.
|
||
- I reckon the PA here refers to the individual nodes' PAs.
|
||
- Prefer the path whose smallest PA along the route is the largest.
|
||
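The four selection rules above can be compared on a toy example. This is a hypothetical sketch: a route is reduced to just the PAs of its nodes and the alpha costs of its links, and all names are made up.

```python
# Illustrative comparison of the four energy-aware route-selection rules.
# Each route lists the available power (PA) of its intermediate nodes and
# the energy cost alpha of each of its links.

def max_pa(routes):        # route with maximum total PA
    return max(routes, key=lambda r: sum(r["pa"]))

def min_energy(routes):    # ME: minimum total transmission energy
    return min(routes, key=lambda r: sum(r["alpha"]))

def min_hop(routes):       # MH: fewest links to reach the sink
    return min(routes, key=lambda r: len(r["alpha"]))

def max_min_pa(routes):    # bottleneck rule: largest smallest-PA node
    return max(routes, key=lambda r: min(r["pa"]))

routes = [
    {"name": "A", "pa": [2, 4],    "alpha": [1, 2, 1]},
    {"name": "B", "pa": [4, 3, 6], "alpha": [1, 1, 1, 2]},
    {"name": "C", "pa": [1, 8, 9], "alpha": [3, 3, 3, 3]},
]
```

On this data the rules disagree: maximum total PA picks C, ME and MH pick A, and the max-min rule picks B because its weakest node (PA 3) is stronger than A's (2) and C's (1). When all alphas are equal, ME and MH agree, as noted below.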
***** Data-centric approach
|
||
- In datacentric routing, interest dissemination is performed to assign the sensing tasks to the sensor nodes. There are two approaches used for interest dissemination: sinks broadcast the interest, and sensor nodes broadcast an advertisement for the available data and wait for a request from the interested nodes.
|
||
1) Either sinks broadcast what they want
|
||
2) Nodes broadcast they have something. Metadata is cheap to broadcast.
|
||
- Data-centric routing requires attribute-based naming. For attribute based naming, the users are more interested in querying an attribute of the phenomenon, rather than querying an individual node. + The areas where the temp is higher than 70 degrees, rather than the temp at this specific node
|
||
- Attribute-based naming is used to carry out queries by using the attributes of the phenomenon. Attribute-based naming also makes broadcasting, attribute-based multicasting, geocasting, and anycasting important for sensor networks.
|
||
- Data aggregation is a technique used to solve the implosion and overlap problems in data-centric routing
|
||
- In this technique, a sensor network is usually perceived as a reverse multicast tree where the sink asks the sensor nodes to report the ambient condition of the phenomena. Data coming from multiple sensor nodes are aggregated as if they are about the same attribute of the phenomenon when they reach the same routing node on the way back to the sink.
|
||
- With this respect, data aggregation is known as data fusion
|
||
- In sensor networks, data can be aggregated, i.e., collected into bigger packets along the way to the sink
|
||
* Accessing and Developing WoT
|
||
** Chapter 6
|
||
*** REST STUFF
|
||
- The first layer is called access. This layer is aptly named Access because it covers the most fundamental piece of the WoT puzzle: how to connect a Thing to the web so that it can be accessed using standard web tools and libraries.
|
||
- REST provides a set of architectural constraints that, when applied as a whole, emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems.
|
||
- In short, if the architecture of any distributed system follows the REST constraints, that system is said to be RESTful.
|
||
- Maximises interoperability and scalability
|
||
- Five constraints: Client/server, Uniform interfaces, Stateless, Cacheable, Layered system
|
||
**** Client/server
|
||
- Maximises decoupling, as client doesn't need to know how the server works and vice versa
|
||
- Such a separation of concerns between data, control logic, and presentation improves scalability and portability because loose coupling means each component can exist and evolve independently.
|
||
**** Uniform interfaces
|
||
- Loose coupling between components can be achieved only when using a uniform interface that all components in the system respect.
|
||
- This is also essential for the Web of Things because new, unknown devices can be added to and removed from the system at any time, and interacting with them will require minimal effort.
|
||
**** Stateless
|
||
- The client context and state should be kept only on the client, not on the server.
|
||
- Each request to the server should contain the client state; this improves visibility (monitoring and debugging of the server), robustness (recovering from network or application failures) and scalability.
|
||
**** Cacheable
|
||
- Caching is a key element in the performance (loading time) of the web today and therefore its usability.
|
||
- Servers can define policies as when data expires and when updates must be reloaded from the server.
|
||
**** Layered
|
||
- For example, in order to scale, you may make use of a proxy behaving like a load balancer. The sole purpose of the proxy would then be to forward incoming requests to the appropriate server instance.
|
||
- Another layer may behave like a gateway, and translate HTTP requests to other protocols.
|
||
- Similarly, there may be another layer in the architecture responsible for caching responses in order to minimize the work needed to be done by the server.
|
||
**** HATEOAS
|
||
- Servers shouldn’t keep track of each client’s state because stateless applications are easier to scale. Instead, application state should be addressable via its own URL, and each resource should contain links and information about what operations are possible in each state and how to navigate across states. HATEOAS is particularly useful at the Find layer
|
||
**** Principles of the uniform interface of the web
|
||
- Our point here is that what REST and HTTP have done for the web, they can also do for the Web of Things. As long as a Thing follows the same rules as the rest of the web—that is, shares this uniform interface—that Thing is truly part of the web. In the end, the goal of the Web of Things is this: make it possible for any physical object to be accessed via the same uniform interface as the rest of the web. This is exactly what the Access layer enables
|
||
- Addressable resources—A resource is any concept or piece of data in an application that needs to be referenced or used. Every resource must have a unique identifier and should be addressable using a unique referencing mechanism. On the web, this is done by assigning every resource a unique URL.
|
||
- Manipulation of resources through representations—Clients interact with services using multiple representations of their resources. Those representations include HTML, which is used for browsing and viewing content on the web, and JSON, which is better for machine-readable content.
|
||
- Self-descriptive messages—Clients must use only the methods provided by the protocol—GET, POST, PUT, DELETE, and HEAD among others—and stick to their meaning as closely as possible. Responses to those operations must use only well-known response codes—HTTP status codes, such as 200, 302, 404, and 500.
|
||
- Hypermedia as the engine of the application state (HATEOAS)—Servers shouldn't keep track of each client's state because stateless applications are easier to scale. Instead, application state should be addressable via its own URL, and each resource should contain links and information about what operations are possible in each state and how to navigate across states.
|
||
***** Principle #1: addressable resources
|
||
- REST is a resource-oriented architecture (ROA)
|
||
- A resource is explicitly identified and can be individually addressed, by its URI
|
||
- A URI is a sequence of characters that unambiguously identifies an abstract or physical resource. There are many possible types of URIs, but the ones we care about here are those used by HTTP to both identify and locate on a network a resource on the web, which is called the URL (Uniform Resource Locator) for that resource.
|
||
- An important and powerful consequence of this is the addressability and portability of resource identifiers: they become unique (internet- or intranet-wide)
|
||
- Hierarchical naming!
|
||
***** Principle #2, manipulation of resources through representation
|
||
- On the web, Multipurpose Internet Mail Extensions (MIME) types have been introduced as standards to describe various data formats transmitted over the internet, such as images, video, or audio. The MIME type for an image encoded as PNG is expressed with image/png and an MP3 audio file with audio/mp3. The Internet Assigned Numbers Authority (IANA) maintains the list of all the official MIME media types.
|
||
- The tangible instance of a resource is called a representation, which is a standard encoding of a resource using a MIME type.
|
||
- HTTP defines a simple mechanism called content negotiation that allows clients to request a preferred data format they want to receive from a specific service. Using the Accept header, clients can specify the format of the representation they want to receive as a response. Likewise, servers specify the format of the data they return using the Content-Type header.
|
||
- The Accept: header of an HTTP request can also contain not just one but a weighted list of media types the client understands
|
||
- MessagePack can be used to pack JSON into a binary format, to make it lighter.
|
||
- A common way of dealing with unofficial MIME types is to use the x- extension, so if you want your client to ask for MessagePack, use Content-Type: application/x-msgpack.
|
||
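A minimal server-side sketch of this negotiation follows. It is simplified (no wildcards, and no media-type parameters other than q are handled); `negotiate` is a made-up helper, not an API from the book.

```python
# Illustrative content negotiation: pick the supported MIME type the
# client prefers most, honouring the q-values in the Accept header.

def negotiate(accept_header, supported):
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        mime = fields[0].strip()
        q = 1.0                          # default quality per RFC semantics
        for f in fields[1:]:
            if f.strip().startswith("q="):
                q = float(f.strip()[2:])
        prefs.append((q, mime))
    # Try the client's media types from highest to lowest q.
    for _, mime in sorted(prefs, key=lambda p: -p[0]):
        if mime in supported:
            return mime
    return None                          # server would answer 406 Not Acceptable
```

For example, with `Accept: application/json;q=0.9, text/html` a server supporting both returns text/html, since an unqualified media type carries the default weight 1.0.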
***** Principle #3: self-descriptive messages
|
||
- REST emphasizes a uniform interface between components to reduce coupling between operations and their implementation. This requires every resource to support a standard, common set of operations with clearly defined semantics and behavior.
|
||
- The most commonly used among them are GET, POST, PUT, DELETE, and HEAD. Although it seems that you could do everything with just GET and POST, it’s important to correctly use all four verbs to avoid bad surprises in your applications or introducing security risks.
|
||
- CRUD operations; create, read, update and delete
|
||
- HEAD is a GET, but only returns the headers
|
||
- POST should be used only to create a new instance of something that doesn’t have its own URL yet
|
||
- PUT is usually modeled as an idempotent but unsafe update method. You should use PUT to update something that already exists and has its own URL, but not to create a new resource
|
||
- Unlike POST, it’s idempotent because sending the same PUT message once or 10 times will have the same effect, whereas a POST would create 10 different resources.
|
||
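The PUT/POST distinction above can be shown with a toy in-memory resource store. All names here are hypothetical, for illustration only:

```python
# Toy resource store showing why PUT is idempotent and POST is not:
# repeating a PUT leaves one unchanged resource, while repeating a POST
# creates a new resource (and a new URL) each time.

class Store:
    def __init__(self):
        self.resources, self.next_id = {}, 1

    def post(self, body):
        """Create: the server mints a fresh URL on every call."""
        url = f"/leds/{self.next_id}"
        self.next_id += 1
        self.resources[url] = body
        return url

    def put(self, url, body):
        """Update in place: safe to repeat, same end state."""
        self.resources[url] = body

store = Store()
url = store.post({"state": "on"})      # create the resource once
store.put(url, {"state": "off"})
store.put(url, {"state": "off"})       # repeating the PUT changes nothing
```

After this sequence the store still holds exactly one resource in the "off" state, whereas repeating the POST would have created a second LED resource.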
- A bunch of error codes as well: 200, 201, 202, 401, 404, 500, 501
|
||
***** CORS
|
||
- Although accessing web resources from different origins located on various servers in any server-side application doesn’t pose any problem, JavaScript applications running in web browsers can’t easily access resources across origins for security reasons. What we mean by this is that a bit of client-side JavaScript code loaded from the domain apples.com won’t be allowed by the browser to retrieve particular representations of resources from the domain oranges.com using particular verbs.
|
||
- This security mechanism is known as the same-origin policy and is there to ensure that a site can't load any scripts from another domain. In particular, it ensures that a site can't misuse cookies to use your credentials to log onto another site.
|
||
- Fortunately for us, a new standard mechanism called cross-origin resource sharing (CORS)9 has been developed and is well supported by most modern browsers and web servers.
|
||
- When a script in the browser wants to make a cross-site request, it needs to include an Origin header containing the origin domain. The server replies with an Access-Control-Allow-Origin header that contains the list of allowed origin domains (or * to allow all origin domains)
|
||
- When the browser receives the reply, it will check to see if the Access-Control-Allow-Origin corresponds to the origin, and if it does, it will allow the cross-site request.
|
||
- For verbs other than GET/HEAD, or when using POST with representations other than application/x-www-form-urlencoded, multipart/form-data, or text/plain, an additional request called preflight is needed. A preflight request is an HTTP request with the verb OPTIONS that's used by a browser to ask the target server whether it's safe to send the cross-origin request.
|
||
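The rule for when a preflight is needed can be sketched as follows. This is simplified from the full CORS algorithm (custom request headers, which also trigger preflights, are ignored here):

```python
# Illustrative browser-side rule: "simple" requests go out directly,
# everything else triggers an OPTIONS preflight first.

SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_POST_TYPES = {"application/x-www-form-urlencoded",
                     "multipart/form-data", "text/plain"}

def needs_preflight(method, content_type=None):
    if method not in SIMPLE_METHODS:
        return True                       # e.g. PUT, DELETE
    if method == "POST" and content_type not in SIMPLE_POST_TYPES:
        return True                       # e.g. POST with application/json
    return False
```

So a plain GET crosses origins without a preflight, while a PUT, a DELETE, or a POST carrying application/json causes the browser to send OPTIONS first.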
***** Principle #4 : Hypermedia as the Engine of Application State
- HATEOAS contains two subconcepts: hypermedia and application state.
- This fourth principle is centered on the notion of hypermedia, the idea of using links as connections between related ideas.
- Links have become highly popular thanks to web browsers yet are by no means limited to human use. For example, UUIDs used to identify RFID tags are also links.
- Based on this representation of the device, you can easily follow these links to retrieve additional information about the subresources of the device.
- The application state—the AS in HATEOAS—refers to a step in a process or workflow, similar to a state machine, and REST requires the engine of application state to be hypermedia driven.
- Each possible state of your device or application needs to be a RESTful resource with its own unique URL, where any client can retrieve a representation of the current state and also the possible transitions to other states. Resource state, such as the status of an LED, is kept on the server, and each request is answered with a representation of the current state and with the necessary information on how to change the resource state, such as turn off the LED or open the garage door.
- In other words, applications can be stateful as long as client state is not kept on the server and state changes within an application happen by following links, which meets the self-contained-messages constraint.
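A minimal sketch of such a state representation for an LED (the field names and URL are hypothetical, not from any spec): each response carries the current state plus the only valid transition as a link, so the client never hardcodes what it can do next.

```javascript
// Build a hypermedia representation of an LED resource: the client
// learns the current state and the allowed transition from the links.
function ledRepresentation(isOn) {
  return {
    status: isOn ? 'on' : 'off',
    links: [
      // The only valid transition from the current state.
      { rel: isOn ? 'off' : 'on', href: '/leds/1/state', method: 'PUT' },
    ],
  };
}
```

A client that only follows `links` keeps working even if the server later renames the state URL, which is the point of making hypermedia drive application state.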
- The OPTIONS verb can be used to retrieve the list of operations permitted by a resource, as well as metadata about invocations on this resource.
***** Five-step process
- A RESTful architecture makes it possible to use HTTP as a universal protocol for web-connected devices. We described the process of web-enabling Things, which is summarized in the five main steps of the web Things design process:
1) Integration strategy—Choose a pattern to integrate Things to the internet and the web, either directly or through a proxy or gateway. This will be covered in chapter 7, so we’ll skip this step for now.
2) Resource design—Identify the functionality or services of a Thing and organize the hierarchy of these services. This is where we apply design rule #1: addressable resources.
3) Representation design—Decide which representations will be served for each resource. The right representation will be selected by the clients, thanks to design rule #2: content negotiation.
4) Interface design—Decide which commands are possible for each service, along with which error codes. Here we apply design rule #3: self-descriptive messages.
5) Resource linking design—Decide how the different resources are linked to each other and especially how to expose those resources and links, along with the operations and parameters they can use. In this final step we use design rule #4: Hypermedia as the Engine of Application State.
**** Design rules
***** #2–CONTENT NEGOTIATION
- Web Things must support JSON as their default representation.
- Web Things support UTF-8 encoding for requests and responses.
- Web Things may offer an HTML interface/representation (UI).
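A toy sketch of the negotiation rule (the supported-types list is invented, and real negotiation also honors q-values, which this deliberately skips): pick the first acceptable type from the Accept header, falling back to JSON as the mandatory default.

```javascript
// Representations this hypothetical web Thing can serve.
const SUPPORTED = ['application/json', 'text/html'];

// Return the media type to serve for a given Accept header.
function negotiate(acceptHeader) {
  const wanted = (acceptHeader || '')
    .split(',')
    .map((part) => part.split(';')[0].trim()); // drop q-values for simplicity
  for (const type of wanted) {
    if (SUPPORTED.includes(type)) return type;
  }
  return 'application/json'; // JSON is the default representation
}
```

In Express this is what `req.accepts([...])` does for you; the sketch only shows the idea.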
***** #3 : Self-descriptive messages
- Web Things must support the GET, POST, PUT, and DELETE HTTP verbs.
- Web Things must implement HTTP status codes 20x, 40x, 50x.
- Web Things must support a GET on their root URL.
- Web Things should support CORS.
***** #4 : HATEOAS
- Web Things should support browsability with links.
- Web Things may support OPTIONS for each of their resources.
*** EVENT STUFF
**** Events and stuff
- Unfortunately, the request-response model is insufficient for a number of IoT use cases. More precisely, it doesn’t match event-driven use cases where events must be communicated (pushed) to the clients as they happen.
- A client-initiated model isn’t practical for applications where notifications need to be sent asynchronously by a device to clients as soon as they’re produced.
- Polling is one way of circumventing the problem; however, it’s inefficient, as the client will need to make many requests that simply return the same response. Additionally, we might not poll at the exact time an event takes place.
- Most of the requests will end up with empty responses (304 Not Modified) or with the same response as long as the value observed remains unchanged.
**** Publish/subscribe
- What’s really needed on top of the request-response pattern is a model called publish/subscribe (pub/sub) that allows further decoupling between data consumers (subscribers) and producers (publishers). Publishers send messages to a central server, called a broker, that handles the routing and distribution of the messages to the various subscribers, depending on the type or content of messages.
- A publisher can send notifications to a topic, to which subscribers can subscribe.
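The topic-based decoupling can be sketched with a toy in-memory broker (not any real broker such as MQTT; the class and method names are invented): publishers and subscribers only ever know the topic name, never each other.

```javascript
// Toy in-memory pub/sub broker.
class Broker {
  constructor() {
    this.topics = new Map(); // topic name -> list of subscriber callbacks
  }
  // Register a callback to receive every message published on a topic.
  subscribe(topic, callback) {
    if (!this.topics.has(topic)) this.topics.set(topic, []);
    this.topics.get(topic).push(callback);
  }
  // Deliver a message to all subscribers of a topic (none: message is dropped).
  publish(topic, message) {
    for (const cb of this.topics.get(topic) || []) cb(message);
  }
}
```

A real broker adds persistence, delivery guarantees, and network transport; the routing idea is the same.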
**** Webhooks
- The simplest way to implement a publish-subscribe system over HTTP without breaking the REST model is to treat every entity as both a client and a server. This way, both web Things and web applications can act as HTTP clients by initiating requests to other servers, and they can host a server that can respond to other requests at the same time. This pattern is called webhooks or HTTP callbacks and has become popular on the web for enabling different servers to talk to each other.
- The implementation of this model is fairly simple. All we need is to implement a REST API on both the Thing and on the client, which then becomes a server as well. This means that when the Thing has an update, it POSTs it via HTTP to the client.
- Webhooks are a conceptually simple way to implement bidirectional communication between clients and servers by turning everything into a server.
- Webhooks have one big drawback: because they need the subscriber to run an HTTP server to receive the notification, they work only when the subscriber has a publicly accessible URL or IP address.
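The Thing’s side of the webhook pattern reduces to a callback registry plus one POST per subscriber on every update. A sketch (names invented; the HTTP delivery function is injected so the transport stays out of the example):

```javascript
// Webhook sketch: the Thing keeps subscriber callback URLs and pushes
// every update to them. `post(url, data)` is an injected delivery
// function (e.g. something built on Node's http module or fetch).
function makeThing(post) {
  const subscribers = [];
  return {
    // A client registers the URL of its own HTTP server here.
    subscribe(callbackUrl) {
      subscribers.push(callbackUrl);
    },
    // On each new reading, POST it to every registered callback URL.
    update(data) {
      for (const url of subscribers) post(url, data);
    },
  };
}
```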
**** Comet
- Comet is an umbrella term that refers to a range of techniques for circumventing the limitations of HTTP polling and webhooks by introducing event-based communication over HTTP.
- This model enables web servers to push data back to the browser without the client requesting it explicitly. Since browsers were initially not designed with server-sent events in mind, web application developers have exploited several specification loopholes to implement Comet-like behavior, each with different benefits and drawbacks.
- Among them is a technique called long polling.
- With long polling, a client sends a standard HTTP request to the server, but instead of receiving the response right away, the server holds the request until an event is received from the sensor, which is then injected into the response returned to the client’s request that was held idle. As soon as the client receives the response, it immediately sends a new request for an update, which will be held until the next update comes from the sensor, and so on.
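The "hold the request until an event arrives" step can be sketched with a Promise that the server only resolves when the sensor produces a value (names invented; a real implementation also needs a timeout so held requests don’t hang forever):

```javascript
// Long-polling sketch: pending requests are Promises held in `waiters`
// and resolved only when the next sensor event arrives.
function makeLongPollResource() {
  let waiters = [];
  return {
    // Called by the HTTP handler: returns a Promise representing the
    // held request; the response is sent when it resolves.
    poll() {
      return new Promise((resolve) => waiters.push(resolve));
    },
    // Called when the sensor produces a new value: release every held
    // request with that value.
    emit(value) {
      const held = waiters;
      waiters = [];
      for (const resolve of held) resolve(value);
    },
  };
}
```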
**** WebSockets
- WebSocket is part of the HTML5 specification. The increasing support for HTML5 in most recent web and mobile web browsers means WebSocket is becoming ubiquitously available to all web apps.
- WebSocket enables a full-duplex communication channel over a single TCP connection. In plain English, this means that it creates a permanent link between the client and the server that both the client and the server can use to send messages to each other. Unlike techniques we’ve seen before, such as Comet, WebSocket is standard and opens a TCP socket. This means it doesn’t need to encapsulate custom, non-web content in HTTP messages or keep the connection artificially alive as is needed with Comet implementations.
- A WebSocket connection starts with a handshake: the first step is to send an HTTP call to the server with a special header asking for the protocol to be upgraded to WebSocket. If the web server supports WebSockets, it will reply with a 101 Switching Protocols status code, acknowledging the opening of a full-duplex TCP socket.
- Once the initial handshake takes place, the client and the server will be able to send messages back and forth over the open TCP connection; these messages are not HTTP messages but WebSockets data frames.
- The overhead of each WebSockets data frame is 2 bytes, which is small compared to the 871-byte overhead of an HTTP message’s metadata (headers and the like).
- The hierarchical structure of Things and their resources as URLs can be reused as-is for WebSockets.
- We can subscribe to events for a Thing’s resource by using its corresponding URL and asking for a protocol upgrade to WebSockets. Moreover, WebSockets do not dictate the format of messages that are sent back and forth. This means we can happily use JSON and give messages the structure and semantics we want.
- Moreover, because WebSockets consist of an initial handshake followed by basic message framing layered over TCP, they can be directly implemented on many platforms supporting TCP/IP—not just web browsers. They can also be used to wrap several other internet-compatible protocols to make them web-compatible. One example is MQTT, a well-known pub/sub protocol for the IoT that can be integrated to the web of browsers via WebSockets.
- The drawback, however, is that keeping a TCP connection permanently open can lead to an increase in battery consumption and is harder to scale than HTTP on the server side.
**** HTTP/2
- This new version of HTTP allows multiplexing responses—that is, sending responses in parallel. This fixes the head-of-line blocking problem of HTTP/1.x, where only one request can be outstanding on a TCP/IP connection at a time.
- HTTP/2 also introduces compressed headers using an efficient and low-memory compression format.
- Finally, HTTP/2 introduces the notion of server push. Concretely, this means that the server can provide content to clients without having to wait for them to send a request. In the long run, widespread adoption of server push over HTTP/2 might even remove the need for an additional protocol for push like WebSocket or webhooks.
*** SUMMARY
- When applied correctly, the REST architecture is an excellent substrate on which to create large-scale and flexible distributed systems.
- REST APIs are interesting and easily applicable to enable access to data and services of physical objects and other devices.
- Various mechanisms, such as content negotiation, caching, and Hypermedia as the Engine of Application State (HATEOAS), can help in creating great APIs for Things.
- A five-step design process (integration strategy, resource design, representation design, interface design, and resource linking) allows anyone to create a meaningful REST API for Things based on industry best practices.
- The latest developments in the real-time web, such as WebSockets, allow creating highly scalable, distributed, and heterogeneous real-time data processing applications. Devices that speak directly to the web can easily use web-based push messaging to stream their sensor data efficiently.
- HTTP/2 will bring a number of interesting optimizations for Things, such as multiplexing and compression.
** Chapter 7
*** Connecting to the web
**** Direct Integration
- The most straightforward integration pattern is the direct integration pattern. It can be used for devices that support HTTP and TCP/IP and can therefore expose a web API directly. This pattern is particularly useful when a device can directly connect to the internet; for example, it uses Wi-Fi or Ethernet.
**** Gateway Integration
- Second, we explore the gateway integration pattern, where resource-constrained devices can use non-web protocols to talk to a more powerful device (the gateway), which then exposes a REST API for those non-web devices. This pattern is particularly useful for devices that can’t connect directly to the internet; for example, they support only Bluetooth or ZigBee or they have limited resources and can’t serve HTTP requests directly.
**** Cloud Integration
- Third, the cloud integration pattern allows a powerful and scalable web platform to act as a gateway. This is useful for any device that can connect to a cloud server over the internet, regardless of whether it uses HTTP or not, and that needs more capability than it would be able to offer alone.
*** Five-step process
1) Integration strategy—Choose a pattern to integrate Things to the internet and the web. The patterns are presented in this chapter.
2) Resource design—Identify the functionality or services of a Thing, and organize the hierarchy of these services.
3) Representation design—Decide which representations will be served for each resource.
4) Interface design—Decide which commands are possible for each service, along with which error codes.
5) Resource linking design—Decide how the different resources are linked to each other.
**** Direct integration
- The direct integration pattern is the perfect choice when the device isn’t battery powered and when direct access from clients such as mobile web apps is required.
- The first step is the resource design: you need to consider the physical resources on your device and map them into REST resources.
- The next step of the design process is the representation design. REST is agnostic of a particular format or representation of the data. We mentioned that JSON is a must to guarantee interoperability, but it isn’t the only interesting data representation available.
- Representations can be served in a modular way based on the middleware pattern.
- In essence, a middleware can execute code that changes the request or response objects and can then decide to respond to the client or call the next middleware in the stack using the next() function.
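The middleware chaining described above can be sketched in a few lines (this is a toy runner in the spirit of Express, not Express itself; `run` and its signature are invented): each middleware gets req, res, and a next() that advances to the following middleware, so it can either respond or pass control on.

```javascript
// Minimal middleware runner: call each middleware in order, advancing
// only when the current one invokes next().
function run(middlewares, req, res) {
  function next(i) {
    if (i < middlewares.length) {
      middlewares[i](req, res, () => next(i + 1));
    }
  }
  next(0);
}

// Usage sketch: the first middleware annotates the request and passes
// control on; the second responds and never calls next().
const stack = [
  (req, res, next) => { req.user = 'anonymous'; next(); },
  (req, res, next) => { res.body = `hello ${req.user}`; },
];
```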
- The core of this implementation is using the Object.observe() function. This allows you to asynchronously observe the changes happening to an object by registering a callback to be invoked whenever a change in the observed object is detected. (Note: Object.observe() has since been withdrawn from the ECMAScript proposal process and removed from modern JavaScript engines; Proxy objects are the usual replacement.)
**** Gateway integration pattern
- Gateway integration pattern: in this case, the web Thing can’t directly offer a web API because the device might not support HTTP directly. An application gateway works as a proxy for the Thing by offering a web API in the Thing’s name. This API could be hosted on the router in the case of Bluetooth or on another device that exposes the web Thing API; for example, via CoAP.
- The direct integration pattern worked well because your Pi was not battery powered, had access to a decent bandwidth (Wi-Fi/Ethernet), and had more than enough RAM and storage for Node. But not all devices are so lucky. Native support for HTTP/WS or even TCP/IP isn’t always possible or even desirable. For battery-powered devices, Wi-Fi or Ethernet is often too much of a power drag, so they need to rely on low-power protocols such as ZigBee or Bluetooth instead. Does it mean those devices can’t be part of the Web of Things? Certainly not.
- Such devices can also be part of the Web of Things as long as there’s an intermediary somewhere that can expose the device’s functionality through a WoT API like the one we described previously. These intermediaries are called application gateways (we’ll call them WoT gateways hereafter), and they can talk to Things using any non-web application protocols and then translate those into a clean REST WoT API that any HTTP client can use.
- They can add a layer of security or authentication, aggregate and store data temporarily, expose semantic descriptions for Things that don’t have any, and so on.
- CoAP is a service layer protocol that is intended for use in resource-constrained internet devices, such as wireless sensor network nodes. CoAP is designed to easily translate to HTTP for simplified integration with the web.
- CoAP is an interesting protocol based on REST, but because it isn’t HTTP and uses UDP instead of TCP, a gateway that translates CoAP messages from/to HTTP is needed.
- It’s therefore ideal for device-to-device communication over low-power radio, but you can’t talk to a CoAP device from a JavaScript application in your browser without installing a special plugin or browser extension. Let’s fix this by using your Pi as a WoT gateway to CoAP devices.
- By proxying, the gateway simply sends a request to the CoAP device whenever the gateway receives a request, and it returns the value to the requester once it receives a value from the CoAP device.
***** Summary
- For some devices, it might not make sense to support HTTP or WebSockets directly, or it might not even be possible, such as when they have very limited resources like memory or processing, when they can’t connect to the internet directly (such as your Bluetooth activity tracker), or when they’re battery-powered. Those devices will use more optimized communication or application protocols and thus will need to rely on a more powerful gateway that connects them to the Web of Things, such as your mobile phone to upload the data from your Bluetooth bracelet, by bridging/translating various protocols. Here we implemented a simple gateway from scratch using Express, but you could also use other open source alternatives such as OpenHab or The Thing System.
**** Cloud Integration pattern
- Cloud integration pattern: in this pattern, the Thing can’t directly offer a web API. But a cloud service acts as a powerful application gateway, offering many more features in the name of the Thing. In this particular example, the web Thing connects via MQTT to a cloud service, which exposes the web Thing API via HTTP and the WebSockets API. Cloud services can also offer many additional features such as unlimited data storage, user management, data visualization, stream processing, support for many concurrent requests, and more.
- Using a cloud server has several advantages. First, because it doesn’t have the physical constraints of devices and gateways, it’s much more scalable and can process and store a virtually unlimited amount of data. This also allows a cloud platform to support many protocols at the same time, handle protocol translation efficiently, and act as a scalable intermediary that can support many more concurrent clients than an IoT device could.
- Second, those platforms can have many features that might take considerable time to build from scratch, from industry-grade security, to specialized analytics capabilities, to flexible data visualization tools and user and access management.
- Third, because those platforms are natively connected to the web, data and services from your devices can be easily integrated into third-party systems to extend your devices.
*** Summary
- There are three main integration patterns for connecting Things to the web: direct, gateway, and cloud.
- Regardless of the pattern you choose, you’ll have to work through the following steps: resource design, representation design, and interface design.
- Direct integration allows local access to the web API of a Thing. You tried this by building an API for your Pi using the Express Node framework.
- The resource design step in Express was implemented using routes, each route representing the path to the resources of your Pi.
- We used the idea of middleware to implement support for different representations—for example, JSON, MessagePack, and HTML—in the representation design step.
- The interface design step was implemented using HTTP verbs on routes as well as by integrating a WebSockets server using the ws Node module.
- Gateway integration allows integrating Things without web APIs (or not supporting web or even internet protocols) to the WoT by providing an API for them. You tried this by integrating a CoAP device via a gateway on your Pi.
- Cloud integration uses servers on the web to act as shadows or proxies for Things. They augment the API of Things with such features as scalability, analytics, and security. You tried this by using the EVRYTHNG cloud.
* Discovery and Security for the Web of Things
** Chapter 8
- Having a single and common data model that all web Things can share would further increase interoperability and ease of integration by making it possible for applications and services to interact without the need to tailor the application manually for each specific device.
- The ability to easily discover and understand any entity of the Web of Things—what it is and what it does—is called findability.
- How to achieve such a level of interoperability—making web Things findable—is the purpose of the second layer.
- The goal of the Find layer is to offer a uniform data model that all web Things can use to expose their metadata using only web standards and best practices.
- Metadata means the description of a web Thing, including the URL, name, current location, and status, and of the services it offers, such as sensors, actuators, commands, and properties.
- First, this is useful for discovering web Things as they get connected to a local network or to the web. Second, it allows applications, services, and other web Things to search for and find new devices without installing a driver for that Thing.
*** Findability problem
- For a Thing to be interacted with using HTTP and WebSocket requests, there are three fundamental problems:
1) How do we know where to send the requests, such as the root URL/resources of a web Thing?
2) How do we know what requests to send and how; for example, verbs and the format of payloads?
3) How do we know the meaning of requests we send and responses we get, that is, semantics?
- The bootstrap problem: this problem is concerned with how the initial link between two entities on the Web of Things can be established.
- Let’s assume the Thing can be found; how is it interacted with if it exposes a UI at the root of its URL? In this case, a clean and user-centric web interface can solve problem 3 because humans would be able to read and understand how to do this.
- Problem 2 also would be taken care of by the web page, which would hardcode which request to send to which endpoint.
- But what if the heater has no user interface, only a RESTful API? Because Lena is an experienced front-end developer and never watches TV, she decides to build a simple JavaScript app to control the heater. Now she faces the second problem: even though she knows the URL of the heater, how can she find out the structure of the heater API? What resources (endpoints) are available? Which verbs can she send to which resource? How can she specify the temperature she wants to set? How does she know if those parameters need to be in Celsius or Fahrenheit degrees?
*** Discovering Things
- The bootstrap problem deals with two scopes:
1) first, how to find web Things that are physically nearby—for example, within the same local network
2) second, how to find web Things that are not in the same local network—for example, find devices over the web.
**** Network discovery
- In a computer network, the ability to automatically discover new participants is common.
- In your LAN at home, as soon as a device connects to the network, it automatically gets an IP address using DHCP.
- Once the device has an IP address, it can then broadcast data packets that can be caught by other machines on the same network.
- A broadcast or multicast of a message means that this message isn’t sent to a particular IP address but rather to a group of addresses (multicast) or to everyone (broadcast), which is done over UDP.
- This announcement process is called a network discovery protocol, and it allows devices and applications to find each other in local networks. This process is commonly used by various discovery protocols such as multicast Domain Name System (mDNS), Digital Living Network Alliance (DLNA), and Universal Plug and Play (UPnP).
- Most internet-connected TVs and media players can use DLNA to discover network-attached storage (NAS).
- Your laptop can find and configure printers on your network with minimal effort thanks to network-level discovery protocols such as Apple Bonjour that are built into iOS and OS X.
***** mDNS
- In mDNS, clients can discover new devices on a network by listening for mDNS messages such as the one in the following listing. The client populates the local DNS tables as messages come in, so, once discovered, the new service—here a web page of a printer—can be used via its local IP address or via a URI usually ending with the .local domain. In this example, it would be http://evt-bw-brother.local.
- The limitation of mDNS, and of most network-level discovery protocols, is that the network-level information can’t be directly accessed from the web.
***** Network discovery on the web
- Because HTTP is an Application layer protocol, it doesn’t know a thing about what’s underneath—the network protocols used to shuffle HTTP requests around.
- The real question here is why the configuration and status of a router is only available through a web page for humans and not accessible via a REST API. Put simply, why don’t all routers also offer a secure API where their configuration can be seen and changed by other devices and applications in your network?
- Providing such an API is easy to do. For example, you can install an open-source operating system for routers such as OpenWrt and modify the software to expose the IP addresses assigned by the DHCP server of the router as a JSON document.
- This way, you use the existing HTTP server of your router to create an API that exposes the IP addresses of all the devices in your network. This makes sense because almost all networked devices today, from printers to routers, already come with a web user interface. Other devices and applications can then retrieve the list of IP addresses in the network via a simple HTTP call (step 2 in figure 8.3) and then retrieve the metadata of each device in the network by using their IP address (step 3 of figure 8.3).
***** Resource discovery on the web
- Although network discovery does the job locally, it doesn’t propagate beyond the boundaries of local networks.
- How do we find new Things when they connect, how do we understand the services they offer, and how can we search for the right Things and their data in composite applications?
- On the web, new resources (pages) are discovered through hyperlinks. Search engines periodically parse all the pages in their database to find outgoing links to other pages. As soon as a link to a page not yet indexed is found, that new page is parsed and added to the directory. This process is known as web crawling.
***** Crawling
- From the root HTML page of the web Thing, the crawler can find the sub-resources, such as sensors and actuators, by discovering outgoing links and can then create a resource tree of the web Thing and all its resources. The crawler then uses the HTTP OPTIONS method to retrieve all verbs supported for each resource of the web Thing. Finally, the crawler uses content negotiation to understand which format is available for each resource.
***** HATEOAS and web linking
- The simple way of crawling, basically looping through the links found, is a good start, but it also has several limitations. First, all links are treated equally because there’s no notion of the nature of a link; the link to the user interface and the link to the actuator resource look the same—they’re just URLs.
- Additionally, it requires the web Thing to offer an HTML interface, which might be too heavy for resource-constrained devices. Finally, it also means that a client needs to understand both HTML and JSON to work with our web Things.
- A better solution for discovering the resources of any REST API is to use the HATEOAS principle to describe relationships between the various resources of a web Thing.
- A simple method to implement HATEOAS with REST APIs is to use the mechanism of web linking defined in RFC 5988. The idea is that the response to any HTTP request to a resource always contains a set of links to related resources—for example, the previous, next, or last page that contains the results of a search. These would be contained in the Link header.
- Encoding the links as HTTP headers introduces a more general framework to define relationships between resources outside the representation of the resource—directly at the HTTP level.
- When doing an HTTP GET on any web Thing, the response should include a Link header that contains links to related resources. In particular, you should be able to get information about the device, its resources (API endpoints), and the documentation of the API using only Link headers.
- The URL of each resource is contained between angle brackets (<URL>) and the type of the link is denoted by rel="X", where X is the type of the relation.
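A simplified parser for that header format (enough for values like `</model/>; rel="model", </help/>; rel="help"`; a production parser must also handle quoting, multiple rel values, and extension parameters, which RFC 5988 allows):

```javascript
// Parse a Link header value into a map of rel type -> URL.
function parseLinkHeader(header) {
  const links = {};
  for (const part of header.split(',')) {
    // Match '<URL>; rel="type"' in each comma-separated part.
    const match = part.trim().match(/^<([^>]*)>\s*;\s*rel="([^"]*)"$/);
    if (match) links[match[2]] = match[1];
  }
  return links;
}
```

A client can then look up `links.model` or `links.help` by relation type instead of treating all URLs as equal, which is exactly what plain crawling could not do.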
***** New HATEOAS rel link things
|
||
- REL="MODEL" : This is a link to a Web Thing Model resource; see section 8.3.1.
|
||
- REL="TYPE" : This is a link to a resource that contains additional metadata about this web Thing.
|
||
- REL="HELP" : This relationship type is a link to the documentation, which means that a GET to devices.webofthings.io/help would return the documentation for the API in a human-friendly (HTML) or machine-readable (JSON) format.
|
||
- REL="UI" : This relationship type is a link to a graphical user interface (GUI) for interacting with the web Thing.
|
||
*** Describing web Things
|
||
- knowing only the root URL is insufficient to interact with the Web Thing API because we still need to solve the sec- ond problem mentioned at the beginning of this chapter: how can an application know which payloads to send to which resources of a web Thing?
|
||
- how can we formally describe the API offered by any web Thing?
|
||
- The simplest solution is to provide a written documentation for the API of your web Thing so that developers can use it (1 and 2 in figure 8.4).
|
||
- This approach, however, is insufficient to automatically find new devices, understand what they are, and what services they offer.
|
||
- In addition, manual implementation of the payloads is more error-prone because the developer needs to ensure that all the requests they send are valid
|
||
- By using a unique data model to define formally the API of any web Thing (the Web Thing Model), we’ll have a powerful basis to describe not only the metadata but also the operations of any web Thing in a standard way (cases 3 and 4 of figure 8.4).
|
||
- This is the cornerstone of the Web of Things: creating a model to describe physical Things with the right balance between expressiveness—how flexible the model is—and usability— how easy it is to describe any web Thing with that model.
|
||
**** Introducing the Web Thing model
|
||
- Once we find a web Thing and understand its API structure, we still need a method to describe what that device is and does. In other words, we need a conceptual model of a web Thing that can describe the resources of a web Thing using a set of well-known concepts.
|
||
- In the previous chapters, we showed how to organize the resources of a web Thing using the /sensors and /actuators end points. But this works only for devices that actually have sensors and actuators, not for complex objects and scenarios that are com- mon in the real world that can’t be mapped to actuators or sensors. To achieve this, the core model of the Web of Things must be easily applicable for any entity in the real world, ranging from packages in a truck, to collectible card games, to orange juice bot- tles. This section provides exactly such a model, which is called the Web Thing Model.
|
||
***** Entities
|
||
- the Web of Things is composed of web Things.
|
||
- A web Thing is a digital representation of a physical object—a Thing—accessible on the web. Think of it like this: your Facebook profile is a digital representation of yourself, so a web Thing is the “Facebook profile” of a physical object.
|
||
- The web Thing is a web resource that can be hosted directly on the device, if it can connect to the web, or on an intermediate in the network such as a gateway or a cloud service that bridges non-web devices to the web.
|
||
- All web Things should have the following resources:
|
||
1) Model—A web Thing always has a set of metadata that defines various aspects about it such as its name, description, or configurations.
|
||
2) Properties—A property is a variable of a web Thing. Properties represent the internal state of a web Thing. Clients can subscribe to properties to receive a notification message when specific conditions are met; for example, the value of one or more properties changed.
|
||
3) Actions—An action is a function offered by a web Thing. Clients can invoke a function on a web Thing by sending an action to the web Thing. Examples of actions are “open” or “close” for a garage door, “enable” or “disable” for a smoke alarm, and “scan” or “check in” for a bottle of soda or a place. The direction of an action is from the client to the web Thing.
|
||
4) Things—A web Thing can be a gateway to other devices that don’t have an internet connection. This resource contains all the web Things that are proxied by this web Thing. This is mainly used by clouds or gateways because they can proxy other devices.
|
||
****** Metadata
|
||
- In the Web Thing Model, all web Things must have some associated metadata to describe what they are. This is a set of basic fields about a web Thing, including its identifiers, name, description, and tags, and also the set of resources it has, such as the actions and properties. A GET on the root URL of any web Thing always returns the metadata using this format, which is JSON by default
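A sketch of what such a metadata document might look like; the field names and values below are illustrative assumptions consistent with the four resources described above, not a verbatim copy of the specification.

```python
import json

# Illustrative metadata a GET on a web Thing's root URL might return,
# following the Web Thing Model sketched in these notes.
web_thing = {
    "id": "1",
    "name": "My WoT Raspberry Pi",
    "description": "A web-connected Raspberry Pi",
    "tags": ["raspberry", "pi", "wot"],
    "resources": {
        "model": {"link": "/model", "title": "Model of this web Thing."},
        "properties": {"link": "/properties", "title": "Properties of this web Thing."},
        "actions": {"link": "/actions", "title": "Actions of this web Thing."},
        "things": {"link": "/things", "title": "Things proxied by this web Thing."},
    },
}

metadata = json.dumps(web_thing, indent=2)  # JSON is the default format
```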
|
||
****** Properties
|
||
- Web Things can also have properties. A property is a collection of data values that relate to some aspect of the web Thing. Typically, you’d use properties to model any dynamic time series of data that a web Thing exposes, such as the current and past states of the web Thing or its sensor values—for example, the temperature or humidity sensor readings.
|
||
****** Actions
|
||
- Actions are another important type of resource of a web Thing because they represent the various commands that can be sent to that web Thing.
|
||
- In theory, you could also use properties to change the status of a web Thing, but this can be a problem when both an application and the web Thing itself want to edit the same property.
|
||
- The actions object of the Web Thing Model has an object called resources, which contains all the types of actions (commands) supported by this web Thing.
|
||
- Actions are sent to a web Thing with a POST to the URL of the action {WT}/actions/{id}, where id is the ID of the action
|
||
****** Things
|
||
- a web Thing can act as a gateway between the web and devices that aren’t connected to the internet. In this case, the gateway can expose the resources—properties, actions, and metadata—of those non-web Things using the web Thing.
|
||
- The web Thing then acts as an Application-layer gateway for those non-web Things as it converts incoming HTTP requests for the devices into the various protocols or interfaces they support natively. For example, if your WoT Pi has a Bluetooth dongle, it can find and bridge Bluetooth devices nearby and expose them as web Things.
|
||
- The resource that contains all the web Things proxied by a web Thing gateway is {WT}/things, and performing a GET on that resource will return the list of all web Things currently available
|
||
**** The WoT Pi model
|
||
- A new tree structure fits the discussed model: the different sensors end up in /properties, setLedState ends up in /actions, there is no /things, and /model holds the metadata as well as everything else: the sensor data, the properties, and the actions.
|
||
- Following the model allows for dynamically creating routes and such, as all information is maintained in the model of the Thing, /model, /properties, /actions, /things.
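A sketch of deriving routes dynamically from a Thing's model, so handlers need not be hard-coded per sensor or action; the model layout and the property/action names (temperature, ledState) are illustrative.

```python
# Illustrative Web Thing model: all resource information lives in one place.
model = {
    "properties": {"temperature": {"name": "Temperature Sensor"},
                   "humidity": {"name": "Humidity Sensor"}},
    "actions": {"ledState": {"name": "Change the LED state"}},
}

def routes_from_model(model):
    """List every resource path this web Thing should expose."""
    paths = ["/model", "/properties", "/actions"]
    paths += ["/properties/" + p for p in model.get("properties", {})]
    paths += ["/actions/" + a for a in model.get("actions", {})]
    return paths

paths = routes_from_model(model)
```

Because the routes are computed from the model, adding a sensor to the model automatically exposes its resource.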
|
||
**** Summary
|
||
- In this section, we introduced the Web Thing Model, a simple JSON-based data model for a web Thing and its resources. We also showed how to implement this model using Node.js and run it on a Raspberry Pi. We showed that this model is quite easy to understand and use, and yet is sufficiently flexible to represent all sorts of devices and products using a set of properties and actions. The goal is to propose a uniform way to describe web Things and their capabilities so that any HTTP client can find web Things and interact with them. This is sufficient for most use cases, and this model has all you need to be able to generate user interfaces for web Things automatically.
|
||
*** The Semantic Web of Things (Ontologies)
|
||
- In an ideal world, search engines and any other applications on the web could also understand the Web Thing Model. Given the root URL of a web Thing, any application could retrieve its JSON model and understand what the web Thing is and how to interact with it.
|
||
- The question now is how to expose the Web Thing Model using an existing web standard so that the resources are described in a way that means something to other clients. The answer lies in the notion of the Semantic Web and, more precisely, the notion of linked data that we introduce in this section.
|
||
- Semantic Web refers to an extension of the web that promotes common data formats to facilitate meaningful data exchange between machines. Thanks to a set of standards defined by the World Wide Web Consortium (W3C), web pages can offer a standardized way to express relationships among them so that machines can understand the meaning and content of those pages. In other words, the Semantic Web makes it easier to find, share, reuse, and process information from any content on the web thanks to a common and extensible data description and interchange format.
|
||
**** Linked Data and RDFa
|
||
- The HTML specification alone doesn’t define a shared vocabulary that allows you to describe in a standard and non-ambiguous manner the elements on a page and what they relate to.
|
||
***** Linked Data
|
||
- Enter the vision of linked data, which is a set of best practices for publishing and connecting structured data on the web, so that web resources can be interlinked in a way that allows computers to automatically understand the type and data of each resource.
|
||
- This vision has been strongly driven by complex and heavy standards and tools centered on the Resource Description Framework (RDF)
|
||
- Although powerful and expressive, RDF would be overkill for most simple scenarios, and this is why a simpler method to structure content on the web is desirable.
|
||
- RDFa emerged as a lighter version of RDF that can be embedded into HTML code
|
||
- Most search engines can use these annotations to generate better search listings and make it easier to find your websites.
|
||
- using RDFa to describe the metadata of a web Thing will make that web Thing findable and searchable by Google.
|
||
***** RDFa
|
||
- vocab defines the vocabulary used for that element, in this case the Web of Things Model vocabulary defined previously.
|
||
- property defines the various fields of the model such as name, ID, or description.
|
||
- typeof defines the type of those elements in relation to the vocabulary of the element.
|
||
- This allows other applications to parse the HTML representation of the device and automatically understand which resources are available and how they work.
|
||
***** JSON-LD
|
||
- JSON-LD is an interesting and lightweight semantic annotation format for linked data that, unlike RDFa and Microdata, is based on JSON. It’s a simple way to semantically augment JSON documents by adding context information and hyperlinks for describing the semantics of the different elements of a JSON object.
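A sketch of a JSON-LD document: plain JSON plus an `@context` naming the shared vocabulary, an `@type` within that vocabulary, and an `@id` link. The device URL and values are illustrative.

```python
import json

# A JSON document semantically augmented with JSON-LD keywords.
product = {
    "@context": "http://schema.org",         # shared vocabulary (schema.org)
    "@type": "Product",                      # concept within that vocabulary
    "@id": "http://devices.example.com/pi",  # linked-data identifier (illustrative)
    "name": "My WoT Raspberry Pi",
    "description": "A web-connected Raspberry Pi",
}
doc = json.dumps(product)
```

Any client that understands the schema.org vocabulary can now process the keys meaningfully, while JSON-unaware-of-LD clients still see ordinary JSON.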
|
||
***** Micro-summary
|
||
- This simple example already illustrates the essence of JSON-LD: it gives a context to the content of a JSON document. As a consequence, all clients that understand the http://schema.org/Product context will be able to automatically process this information in a meaningful way. This is the case with search engines, for example. Google and Yahoo! process JSON-LD payloads using the Product schema to render special search results; as soon as it gets indexed, our Pi will be known by Google and Yahoo! as a Raspberry Pi product. This means that the more semantic data we add to our Pi, the more findable it will become. As an example, try adding a location to your Pi using the Place schema, and it will eventually become findable by location.
|
||
We could also use this approach to create more specific schemas on top of the Web Thing Model; for instance, an agreed-upon schema for the data and functions a washing machine or smart lock offers. This would facilitate discovery and enable automatic integration with more and more web clients.
|
||
*** Summary
|
||
- The ability to find nearby devices and services is essential in the Web of Things and is known as the bootstrap problem. Several protocols can help in discovering the root URL of Things, such as mDNS/Bonjour, QR codes or NFC tags.
|
||
- The last step of the web Things design process, resource linking design (also known as HATEOAS in REST terms), can be implemented using the web linking mechanism in HTTP headers.
|
||
- Beyond finding the root URL and sub-resources, client applications also need a mechanism to discover and understand what data or services a web Thing offers.
|
||
- The services of Things can be modeled as properties (variables), actions (functions), and links. The Web Thing Model offers a simple, flexible, fully web-compatible, and extensible data model to describe the details of any web Thing. This model is simple to adapt for your devices and easy to use for your products and applications.
|
||
- The Web Thing Model can be extended with more specific semantic descriptions such as those based on JSON-LD and available from the Schema.org repository.
|
||
** Chapter 9
|
||
- In most cases, Internet of Things deployments involve a group of devices that communicate with each other or with various applications within closed networks—rarely over open networks such as the internet. It would be fair to call such deployments the “intranets of Things” because they’re essentially isolated, private networks that only a few entities can access. But the real power of the Web of Things lies in opening up these lonely silos and facilitating interconnection between devices and applications at a large scale.
|
||
- when it comes to public data such as data.gov initiatives, real-time traffic/weather/pollution conditions in a city, or a group of sensors deployed in a jungle or a volcano, it would be great to ensure that the general public or researchers anywhere in the world could access that data. This would enable anyone to create new innovative applications with it and possibly generate substantial economic, environmental, and social value.
|
||
- How to share this data in a secure and flexible way is what Layer 3 provides.
|
||
- The Share layer of the Web of Things. This layer focuses on how devices and their resources must be secured so that they can only be accessed by authorized users and applications.
|
||
- First, we’ll show how Layer 3 of the WoT architecture covers the security of Things: how to ensure that only authorized parties can access a given resource. Then we’ll show how to use existing trusted systems to allow sharing physical resources via the web.
|
||
*** Securing Things
|
||
- Ultimately, every security breach hurts the entire web because it erodes the overall trust of users in technology.
|
||
- Security in the Web of Things is even more critical than in the web. Because web Things are physical objects that will be deployed everywhere in the real world, the risks associated with IoT attacks can be catastrophic.
|
||
- Digitally augmented devices allow collecting fine-grained information about people, such as when they took their last insulin shot or when and where they last jogged. They can also be used to remotely control cars, houses, and the like.
|
||
- the majority of IoT solutions don’t comply with even the most basic security best practices; think clear-text passwords and communications, invalid certificates, old software versions with exploitable bugs, and so on.
|
||
**** Securing the IoT has three major problems
|
||
- First, we must consider how to encrypt the communications between two entities (for example, between an app and a web Thing) so that a malicious interceptor—a “man in the middle”—can’t access the data being transmitted in clear text. This is referred to as securing the channel
|
||
- Second, we must find a way to ensure that when a client talks to a host, it can verify that the host really is who it claims to be (server authentication).
|
||
- Third, we must ensure that the correct access control is in place. We need to set up a method to control which user can access what resource of what server or Thing and when and then to ensure that the user is really who they claim to be.
|
||
**** Encryption 101
|
||
- encryption is an essential ingredient for any secure system.
|
||
- Without encryption, any attempt to secure a Thing will be in vain because attackers can sniff the communication and understand the security mechanisms that were put in place.
|
||
***** Symmetric Encryption
|
||
- The oldest form of encoding a message is symmetric encryption. The idea is that the sender and receiver share a secret key that can be used to both encode and decode a message in a specific way
|
||
***** Asymmetric Encryption
|
||
- another method called asymmetric encryption has become popular because it doesn’t require a secret to be shared between parties. This method uses two related keys, one public and the other private (secret)
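The two-key idea can be sketched with textbook RSA on tiny numbers: anyone may encrypt with the public key (e, n), but only the private key (d, n) decrypts. Real deployments use keys of 2048+ bits and padding schemes; this is purely illustrative.

```python
# Toy RSA key generation from two small primes.
p, q = 61, 53
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

message = 65                     # must be smaller than n
cipher = pow(message, e, n)      # encrypt with the PUBLIC key
decrypted = pow(cipher, d, n)    # decrypt with the PRIVATE key
```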
|
||
**** Web Security with TLS: The S of HTTPS
|
||
- Fortunately, there are standard protocols for securely encrypting data between clients and servers on the web.
|
||
- The best known protocol for this is Secure Sockets Layer (SSL)
|
||
- SSL 3.0 has serious vulnerabilities (such as POODLE), and SSL implementations have suffered bugs such as Heartbleed. These events marked the death of this protocol, which was replaced by the much more secure but conceptually similar Transport Layer Security (TLS)
|
||
***** TLS 101
|
||
- Despite its name, TLS is an Application layer protocol (see chapter 5). TLS not only secures HTTP (HTTPS) communication but is also the basis of secure WebSocket (WSS) and secure MQTT (MQTTS)
|
||
- First, it helps the client ensure that the server is who it says it is; this is the SSL/TLS authentication. Second, it guarantees that the data sent over the communication channel can’t be read by anyone other than the client and the server involved in the transaction (also known as SSL/TLS encryption).
|
||
1) The client, such as a mobile app, tells the server, such as a web Thing, which protocols and encryption algorithms it supports. This is somewhat similar to the content negotiation process we described in chapter 6.
|
||
2) The server sends the public part of its certificate to the client. The goal here is for the client to make sure it knows who the server is. All web clients have a list of certificates they trust. In the case of your Pi, you can find them in /etc/ssl/certs. SSL certificates form a trust chain, meaning that if a client doesn’t trust certificate S1 that the server sends back, but it trusts certificate S2 that was used to sign S1, the web client can accept S1 as well.
|
||
3) The rest of the process generates a key from the public certificates. This key is then used to encrypt the data going back and forth between the server and the client in a secure manner. Because this process is dynamic, only the client and the server know how to decrypt the data they exchange during this session. This means the data is now securely encrypted: if an attacker manages to capture data packets, they will remain meaningless.
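The client side of these steps can be sketched with Python's ssl module; `create_default_context()` loads the system's trusted CA certificates (the trust chain of step 2) and enforces server identity checks. The host name in the comment is illustrative.

```python
import ssl

# Default client context: verifies the server certificate against the
# system trust store and checks that it matches the host name.
context = ssl.create_default_context()

# Wrapping a TCP socket would then run the handshake and encrypt traffic:
#   with socket.create_connection(("example.com", 443)) as sock:
#       with context.wrap_socket(sock, server_hostname="example.com") as tls:
#           ...  # data on `tls` is now encrypted end to end
```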
|
||
***** Beyond Self-signed certificates
|
||
- Clearly, having to deal with all these security exceptions isn’t nice, but these exceptions exist for a reason: to warn clients that part of the security usually covered by SSL/TLS can’t be guaranteed with the certificate you generated. Basically, although the encryption of messages will work with a self-signed certificate (the one you created with the previous command), the authenticity of the server (the Pi) can’t be guaranteed. In consequence, the chain of trust is broken—problem 2
|
||
- In an IoT context, this means that attackers could pretend to be the Thing you think you’re talking to.
|
||
- The common way to generate certificates that guarantee the authenticity of the server is to get them from a well-known and trusted certificate authority (CA). There are a number of these, such as Let's Encrypt, Symantec, and GeoTrust.
|
||
*** Authentication and access control
|
||
- Once we encrypt the communication between Things and clients as shown in the previous section, we want to enable only some applications to access it.
|
||
- First, this means that the Things—or a gateway to which Things are connected—need to be able to know the sender of each request (identification).
|
||
- Second, devices need to trust that the sender really is who they claim to be (authentication)
|
||
- Third, the devices also need to know if they should accept or reject each request depending on the identity of this sender and which request has been sent (authorization).
|
||
**** Access control with REST and API tokens
|
||
- Server-based authentication is what happens when we use our username/password to log into a website: we initiate a secure session with the server that's stored for a limited time in the server application's memory or in a local browser cookie.
|
||
- server-based authentication is usually stateful because the state of the client is stored on the server. But as you saw in chapter 6, HTTP is a stateless protocol; therefore, using a server-based authentication method goes against this principle and poses certain problems. First, the performance and scalability of the overall systems are limited because each session must be stored in memory and overhead increases when there are many authenticated users. Second, this authentication method poses certain security risks—for example, cross-site request forgery.
|
||
- alternative method called token-based authentication has become popular and is used by most web APIs.
|
||
- Because this token is added to the headers or query parameters of each HTTP request sent to the server, all interactions remain stateless.
|
||
- API tokens shouldn’t be valid forever. API tokens, just like passwords, should change regularly.
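A sketch of token-based authentication under these rules; the `Bearer` header format, the TTL, and all names are illustrative assumptions, not a specific framework's API.

```python
import secrets
import time

TOKEN_TTL = 3600   # tokens expire and must be rotated (one hour here)
tokens = {}        # token -> (user, expiry timestamp)

def issue_token(user):
    # Unguessable token from a cryptographically secure RNG.
    token = secrets.token_urlsafe(32)
    tokens[token] = (user, time.time() + TOKEN_TTL)
    return token

def authenticate(headers):
    # The token travels in a header on every request, keeping HTTP stateless.
    auth = headers.get("Authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else None
    entry = tokens.get(token)
    if entry and entry[1] > time.time():
        return entry[0]
    return None

t = issue_token("alice")
user = authenticate({"Authorization": "Bearer " + t})
```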
|
||
**** OAuth: a web authorization framework
|
||
- OAuth is an open standard for authorization and is essentially a mechanism for a web or mobile app to delegate the authentication of a user to a third-party trusted service; for example, Facebook, LinkedIn, or Google.
|
||
- OAuth dynamically generates access tokens using only web protocols.
|
||
- OAuth allows sharing resources and tokens between applications.
|
||
- In short, OAuth standardizes how to authenticate users, generate tokens with an expiration date, regenerate tokens, and provide access to resources in a secure and standard manner over the web.
|
||
- At the end of the token exchange process, the application will know who the user is and will be able to access resources on the resource server on behalf of the user. The application can then also renew the token before it expires using an optional refresh token or by running the authorization process again.
|
||
- OAuth delegated authentication and access flow. The application asks the user if they want to give it access to resources on a third-party trusted service (resource server). If the user accepts, an authorization grant code is generated. This code can be exchanged for an access token with the authorization server. To make sure the authorization server knows the application, the application has to send an app ID and app secret along with the authorization grant code. The access token can then be used to access protected resources within a certain scope from the resource server.
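A heavily simplified sketch of this grant-code-for-token exchange; it ignores redirects, scopes, and expiry, and every name and value is illustrative.

```python
import secrets

registered_apps = {"app-123": "app-secret"}  # apps known to the auth server
grant_codes = {}    # authorization grant code -> approving user
access_tokens = {}  # access token -> user

def authorize(user):
    """The user approves access; the authorization server issues a grant code."""
    code = secrets.token_urlsafe(16)
    grant_codes[code] = user
    return code

def exchange(code, app_id, app_secret):
    """The app trades the grant code plus its own credentials for a token."""
    if registered_apps.get(app_id) != app_secret:
        return None                    # unknown app: trust chain broken
    user = grant_codes.pop(code, None) # a grant code is single-use
    if user is None:
        return None
    token = secrets.token_urlsafe(32)
    access_tokens[token] = user
    return token

code = authorize("alice")
token = exchange(code, "app-123", "app-secret")
```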
|
||
- Implementing an OAuth server on a Linux-based embedded device such as the Pi or the Intel Edison isn’t hard because the protocol isn’t really heavy. But maintaining the list of all applications, users, and their access scope on each Thing is clearly not going to work and scale for the IoT.
|
||
***** OAuth Roles
|
||
- A typical OAuth scenario involves four roles
|
||
1) A resource owner—This is the user who wants to authorize an application to access one of their trusted accounts; for example, your Facebook account.
|
||
2) The resource server—This is the server providing access to the resources the user wants to share. In essence, this is a web API accepting OAuth tokens as credentials.
|
||
3) The authorization server—This is the OAuth server managing authorizations to access the resources. It’s a web server offering an OAuth API to authenticate and authorize users. In some cases, the resource server and the authorization server can be the same, such as in the case of Facebook.
|
||
|
||
4) The application—This is the web or mobile application that wants to access the resources of the user. To keep the trust chain, the application has to be known by the authorization server in advance and has to authenticate itself using a secret token, which is an API key known only by the authorization server and the application.
|
||
*** The Social Web of Things
|
||
- Using OAuth to manage access control to Things is tempting, but not if each Thing has to maintain its own list of users and applications. This is where the gateway integration pattern can be used.
|
||
- use the notion of delegated authentication offered by OAuth, which allows you to use the accounts you already have with OAuth providers you trust, such as Facebook, Twitter, or LinkedIn.
|
||
- The Social Web of Things is usually what covers the sharing of access to devices via existing social network relationships.
|
||
**** A Social Web of Things authentication proxy
|
||
- The idea of the Social Web of Things is to create an authentication proxy that controls access to all Things it proxies by identifying users of client applications using trusted third-party services.
|
||
- Again, we have four actors: a Thing, a user using a client application, an authentication proxy, and a social network (or any other service with an OAuth server). The client app can use the authentication proxy and the social network to access resources on the Thing. This concept can be implemented in three phases:
|
||
1) The first phase is the Thing proxy trust. The goal here is to ensure that the proxy can access resources on the Thing securely. If the Thing is protected by an API token (device token), it could be as simple as storing this token on the proxy. If the Thing is also an OAuth server, this step follows an OAuth authentication flow, as shown in figure 9.6. Regardless of the method used to authenticate, after this phase the auth proxy has a secret that lets it access the resources of the Thing.
|
||
2) The second phase is the delegated authentication step. Here, the user in the client app authenticates via an OAuth authorization server as in figure 9.6. The authentication proxy uses the access token returned by the authorization server to identify the user of the client app and checks to see if the user is authorized to access the Thing. If so, the proxy returns the access token or generates a new one to the client app.
|
||
3) The last phase is the proxied access step. Once the client app has a token, it can use it to access the resources of the Thing through the authentication proxy. If the token is valid, the authentication proxy will forward the request to the Thing using the secret (device token) it got in phase 1 and send the response back to the client app.
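The three phases can be sketched as follows; `fetch_from_thing`, the token values, and the /temp resource are illustrative stand-ins for the real HTTPS calls.

```python
# Phase 1: the proxy holds a secret (device token) shared with the Thing.
DEVICE_TOKEN = "device-secret"
# Phase 2: client tokens issued after delegated OAuth authentication.
valid_client_tokens = {"client-token-1"}

def fetch_from_thing(resource, device_token):
    # Placeholder for the proxied HTTPS request to the Thing.
    if device_token != DEVICE_TOKEN:
        raise PermissionError("Thing rejected the proxy")
    return {"resource": resource, "value": 22.5}

def proxy_request(client_token, resource):
    # Phase 3: forward only if the client's access token checks out.
    if client_token not in valid_client_tokens:
        return None
    return fetch_from_thing(resource, DEVICE_TOKEN)

result = proxy_request("client-token-1", "/temp")
```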
|
||
- All communication is encrypted using TLS
|
||
- Social Web of Things authentication proxy: the auth proxy first establishes a secret with the Thing over a secure channel. Then, a client app requests access to a resource via the auth proxy. It authenticates itself via an OAuth server (here Facebook) and gets back an access token. This token is then used to access resources on the Thing via the auth proxy. For instance, the /temp resource is requested by the client app and given access via the auth proxy forwarding the request to the Thing and relaying the response to the client app.
|
||
**** Leveraging Social Networks
|
||
- This is the very idea of the Social Web of Things: instead of creating abstract access control lists, we can reuse existing social structures as a basis for sharing our Things. Because social networks increasingly reflect our social relationships, we can reuse that knowledge to share access to our Things with friends via Facebook, or work colleagues via LinkedIn.
|
||
**** Implementing Access Control Lists
|
||
- In essence, you need to create an access control list (ACL). There are various ways to implement ACLs, such as by storing them in the local database.
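A minimal in-memory sketch of such an ACL; a real implementation would persist it in a database, and the users and resources here are illustrative.

```python
# Map each user to the set of Thing resources they may access.
acl = {
    "alice": {"/temp", "/humidity"},
    "bob": {"/temp"},
}

def is_authorized(user, resource):
    # Unknown users get an empty set, so they are denied everything.
    return resource in acl.get(user, set())
```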
|
||
**** Proxying Resources of Things
|
||
- Finally, you need to implement the actual proxying: once a request is deemed valid by the middleware, you need to contact the Thing that serves this resource and proxy the results back to the client.
|
||
*** Beyond book
|
||
- But just as HTTP might be too heavy for resource-limited devices, security protocols such as TLS and their underlying cipher suites are too heavy for the most resource-constrained devices. This is why lighter-weight versions of TLS are being developed, such as DTLS, which is similar to TLS but runs on top of UDP instead of TCP and also has a smaller memory footprint.
|
||
- Device democracy. In this model, devices become more autonomous and favor peer-to-peer interactions over centralized cloud services. Security is ensured using a blockchain mechanism: similar to the way bitcoin transactions are validated by a number of independent parties in the bitcoin network, devices could all participate in making the IoT secure.
|
||
*** Summary
|
||
- You must cover four basic principles to secure IoT systems: encrypted communication, server authentication, client authentication, and access control.
|
||
- Encrypted communication ensures attackers can’t read the content of messages. It uses encryption mechanisms based on symmetric or asymmetric keys.
|
||
- You should use TLS to encrypt messages on the web. TLS is based on asymmetric keys: a public key and a private server key.
|
||
- Server authentication ensures attackers can’t pretend to be the server. On the web, this is achieved by using SSL (TLS) certificates. The delivery of these certificates is controlled through a chain of trust where only trusted parties called certificate authorities can deliver certificates to identify web servers.
|
||
- Instead of buying certificates from a trusted third party, you can create self-signed TLS certificates on a Raspberry Pi. The drawback is that web browsers will flag the communication as unsecure because they don’t have the CA certificate in their trust store.
|
||
- You can achieve client authentication using simple API tokens. Tokens should rotate on a regular basis and should be generated using crypto secure random algorithms so that their sequence can’t be guessed.
|
||
- The OAuth protocol can be used to generate API tokens in a dynamic, standard, and secure manner and is supported by many embedded Linux devices such as the Raspberry Pi.
|
||
- The delegated authentication mechanism of OAuth relies on other OAuth providers to authenticate users and create API tokens. As an example, a user of a Thing can be identified using Facebook via OAuth.
|
||
- You can implement access control for Things to reflect your social contacts by creating an authentication proxy using OAuth for clients’ authentication and contacts from social networks.
|
||
|
||
* BitTorrent
|
||
** Incentives Build Robustness in BitTorrent
|
||
- BitTorrent file distribution uses tit-for-tat as a method of seeking Pareto efficiency.
|
||
+ Pareto efficiency = no one can be made better off without someone else simultaneously being made worse off
|
||
*** What BitTorrent Does
|
||
- When a file is made available using HTTP, all cost is placed on the host machine.
|
||
- With BitTorrent, when multiple people are downloading the same file, they can upload pieces to each other.
|
||
- This makes hosting a file for a potentially unlimited number of downloaders affordable.
|
||
- Practical ways of doing this have been attempted before; however, issues were encountered regarding which peers have which files and where those files should be sent, and the systems tend to have issues with high churn rates, as peers usually don't stay connected for more than a few hours.
|
||
- Such systems also have problems with fairness, as the total download of all peers must equal the total upload of all peers.
|
||
- In practice it’s very difficult to keep peer download rates from sometimes dropping to zero by chance, much less make upload and download rates be correlated.
|
||
- BitTorrent solves these problems.
|
||
**** BitTorrent Interface
|
||
- BitTorrent is easy to download and launch. This ease of use has contributed greatly to its adoption, as it doesn't take a computer scientist to understand it.
|
||
**** Deployment
|
||
- The publisher of a file decides if he/she wants to use BitTorrent for distribution
|
||
- Downloaders will then use BitTorrent, as it's the only way of getting the file
|
||
- There is a risk of downloaders ceasing to upload as soon as their download completes; however, it's considered polite to leave the client running. This is also a requirement for some trackers.
|
||
- Usually the number of incomplete downloaders (people who do not have the whole file), increases very rapidly once the file is made available. This peaks and then falls off at a roughly exponential rate, as people finish the download.
|
||
*** Technical Framework
|
||
**** Publishing Content
|
||
- To start a BitTorrent deployment, a static file with the extension .torrent is put on an ordinary web server. The .torrent contains information about the file, its length, name, and hashing information, and the url of a tracker.
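The metainfo fields can be sketched together with a minimal bencode encoder (the encoding .torrent files use); the tracker URL, file name, sizes, and the pieces placeholder below are illustrative.

```python
def bencode(value):
    """Minimal bencoder: ints, strings/bytes, lists, and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode())
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        out = b"d"
        for k in sorted(value):  # bencoded dict keys must be sorted
            out += bencode(k) + bencode(value[k])
        return out + b"e"
    raise TypeError(type(value))

metainfo = {
    "announce": "http://tracker.example.com/announce",  # the tracker's URL
    "info": {
        "name": "example.iso",
        "length": 524288,        # total file length in bytes
        "piece length": 262144,  # quarter-megabyte pieces
        "pieces": "<concatenated SHA-1 hashes of each piece>",
    },
}
torrent_bytes = bencode(metainfo)
```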
|
||
- Trackers help downloaders find each other
|
||
- A tracker is simply a server which speaks HTTP, where you can send what you're downloading, what port you listen on and such
|
||
- There has to be a seeder to begin with, as someone must have the full file.
|
||
**** Peer Distribution
|
||
- All logistical problems of downloading are handled between peers. Some information about download and upload rates is sent to the tracker, merely for statistics.
|
||
- Tracker only HAS to help peers find each other.
|
||
- Although trackers are the only way for peers to find each other, and the only point of coordination at all, the standard tracker algorithm is to return a random list of peers.
|
||
- Random graphs have very good robustness properties. Many peer selection algorithms result in a power law graph, which can get segmented after only a small amount of churn. Note that all connections between peers can transfer in both directions.
|
||
- Random graph is the general term for probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them.
|
||
- A power law graph has many nodes with few links and a few nodes with many links.
|
||
- In order to keep track of which peers have what, BitTorrent cuts files into pieces of fixed size, typically a quarter megabyte.
|
||
- Hashes are used to verify integrity of files. The hashes are included in the .torrent file
|
||
- Peers don't report they have a piece, until they've checked the hash
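The verification step can be sketched as follows; the piece size and data are toy values (real torrents typically use quarter-megabyte pieces, hashed with SHA-1 as stated above).

```python
import hashlib

PIECE_SIZE = 4  # tiny for the example; real torrents use e.g. 256 KiB

data = b"abcdefgh"
pieces = [data[i:i + PIECE_SIZE] for i in range(0, len(data), PIECE_SIZE)]
# In reality these hashes ship inside the .torrent file.
expected_hashes = [hashlib.sha1(p).digest() for p in pieces]

def verify(index, piece):
    # Only after this returns True does a peer report having the piece.
    return hashlib.sha1(piece).digest() == expected_hashes[index]
```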
|
||
**** Pipelining
|
||
- BitTorrent further breaks each piece into smaller sub-pieces, such that it can request sub-pieces individually and always keep a fixed number of requests pipelined. This makes the download smoother.
|
||
**** Piece selection
- Selecting pieces to download in a good order is very important for good performance. A poor piece selection algorithm can result in already having all the pieces which are currently on offer or, on the flip side, not having any pieces to upload to the peers you wish to.
***** Strict Priority
- Always finish a particular piece before requesting sub-pieces of another.
- Essentially what we do in SilverStream. This allows for streaming.
***** Rarest First
- When selecting which piece to start downloading next, peers generally download pieces which the fewest of their own peers have first, a technique referred to as "rarest first".
- It also makes sure that pieces which are more common are left for later, so the likelihood that a peer which currently is offering upload will later not have anything of interest is reduced.
- It also makes life easier for the seeder, as leechers will see that the others already have the old pieces, and so they will download the new pieces from the seed.
- Additionally, if the seeder stops seeding, it's important that the entire file is still circulating. This is helped by the leechers downloading the "rare" pieces first, such that together they have the entire file.
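The rarest-first rule (plus the random-first-piece exception described below) can be sketched as follows; names and the data layout are illustrative, not from the protocol:

```python
import random
from collections import Counter

def pick_next_piece(needed, neighbor_bitfields, have_count):
    """Rarest first: among pieces we still need, pick the one the fewest
    neighbors have. `neighbor_bitfields` is a list of sets of piece indices;
    `have_count` is how many complete pieces we already hold. Until the
    first piece completes, pick at random instead (random first piece)."""
    available = Counter()
    for bitfield in neighbor_bitfields:
        for piece in bitfield:
            if piece in needed:
                available[piece] += 1
    candidates = [p for p in needed if available[p] > 0]
    if not candidates:
        return None                              # nothing we need is on offer
    if have_count == 0:
        return random.choice(candidates)         # random first piece
    return min(candidates, key=lambda p: available[p])  # rarest first
```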
***** Random first piece
- An exception to rarest first is when downloading starts. At that time, the peer has nothing to upload, so it's important to get a complete piece as quickly as possible.
- Rare pieces tend to only be on one peer, so they would be downloaded slower than pieces present on multiple peers.
- As such, random pieces are selected until a complete piece is assembled. Then rarest first is used.
***** Endgame mode
- Used at the end, when a few sub-pieces are missing.
- The peer just broadcasts requests for its missing sub-pieces to all peers, in an attempt to finish.
- In practice this doesn't waste much bandwidth, as the endgame period is short.
*** Choking Algorithms
- BitTorrent does no central resource allocation. Each peer is responsible for attempting to maximize its own download rate.
- Peers do this by downloading from whomever they can and then deciding which peers to upload to.
- If peers cooperate, they upload to each other; if they don't want to, they choke other peers.
- Choking is thus a temporary refusal to upload, while downloading can still continue. (I assume they mean the peer who chokes others can still download from them.)
- The choking algorithm isn't technically part of the BitTorrent wire protocol, but is necessary for good performance. A good choking algorithm should utilize all available resources, provide reasonably consistent download rates for everyone, and be somewhat resistant to peers only downloading and not uploading.
**** Pareto Efficiency
- Well-known economic theories show that systems which are Pareto efficient, meaning that no two counterparties can make an exchange and both be happier, tend to have all of the above properties.
- In computer science terms, this is a local optimization algorithm in which pairs of counterparties see if they can improve their lot together. These tend to lead to global optima.
- If two peers are both getting poor reciprocation for some of the upload they are providing, they can often start uploading to each other instead and both get a better download rate than they had before.
- Peers reciprocate uploading to peers which upload to them, with the goal of having, at any time, several connections which are actively transferring in both directions. Unutilized connections are also uploaded to on a trial basis, to see if better transfer rates can be found through them.
**** BitTorrent's Choking Algorithm
- Each BitTorrent peer always unchokes a fixed number of other peers; the default is four.
- This approach allows TCP's built-in congestion control to reliably saturate upload capacity. BitTorrent uses TCP as its transport protocol.
- Decisions about whom to unchoke are based on the current download rate. (Calculating the current download rate is apparently difficult, so a rolling 20-second average is used.)
- To avoid situations in which resources are wasted by rapidly choking and unchoking peers, BitTorrent peers recalculate whom they want to choke once every ten seconds, and then leave the situation as is until the next ten-second period is up.
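Putting the two numbers together (20-second rolling rate estimate, top-4 unchoke recomputed each rechoke period), a minimal sketch could look like this; the class and function names are mine, not from any client:

```python
from collections import deque

ROLLING_WINDOW = 20.0   # seconds in the download-rate estimate (per the paper)
UNCHOKE_SLOTS = 4       # default number of simultaneously unchoked peers

class PeerStats:
    """Tracks bytes received from one peer as (timestamp, nbytes) samples."""
    def __init__(self):
        self.samples = deque()

    def record(self, nbytes, now):
        self.samples.append((now, nbytes))

    def rate(self, now):
        """Rolling 20-second average download rate (bytes/s) from this peer."""
        while self.samples and now - self.samples[0][0] > ROLLING_WINDOW:
            self.samples.popleft()          # drop samples older than the window
        return sum(b for _, b in self.samples) / ROLLING_WINDOW

def rechoke(peers, now):
    """Run once every ten seconds: unchoke the UNCHOKE_SLOTS peers that
    provided the best recent download rates."""
    ranked = sorted(peers, key=lambda name: peers[name].rate(now), reverse=True)
    return set(ranked[:UNCHOKE_SLOTS])
```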
**** Optimistic Unchoking
- Simply uploading to the peers which provide the best download rate would leave no method of discovering whether currently unused connections are better than the ones being used.
- To fix this, at all times a BitTorrent peer has a single "optimistic unchoke", which is unchoked regardless of the current download rate from it. Which peer is the optimistic unchoke is rotated every third rechoke period.
**** Anti-snubbing
- Occasionally, a BitTorrent peer will be choked by all the peers it was downloading from.
- This will suck until it gets optimistically unchoked.
- When over a minute goes by without downloading anything from a particular peer, BitTorrent assumes it is "snubbed" by that peer and doesn't upload to it, unless it gets optimistically unchoked.
**** Upload Only
- Once its download is finished, BitTorrent switches to preferring peers which it has better upload rates to, and also preferring peers which no one else happens to be uploading to.
** Attacking a Swarm with a Band of Liars: evaluating the impact of attacks on BitTorrent
*** Introduction
- Peer-to-Peer (P2P) file sharing has become one of the most relevant network applications, allowing the fast dissemination of content on the Internet. In this category, BitTorrent is one of the most popular protocols.
- BitTorrent is now being used as the core technology of content delivery schemes, with proper rights management, that are being put into operation (e.g., Azureus Vuze).
- Major companies like Warner Brothers, 20th Century Fox and the BBC are distributing content through it.
- As the popularity of BitTorrent grows, so does the risk and impact of malicious attacks exploiting its potential vulnerabilities.
- This paper identifies attacks on BitTorrent and evaluates their impact on the downloading efficiency (and eventual success) of peers taking part in a session (called a swarm).
*** Related work
- There are several issues with selfish peers, who seek to contribute nothing, or as little as they can.
- In our work, downloading is not relevant to a peer whose sole intention is to hinder a swarm (using the minimum amount of resources needed). The idea is to make use of false piece announcements and/or a large number of sybils to slow down or, ideally, prevent content from being distributed.
*** The BitTorrent Architecture
- The content (a set of files) to be distributed is described through a meta-data file whose extension is typically .torrent.
- A tracker is a central element that coordinates a swarm and helps peers find other peers in the same swarm.
- The published content is organized in pieces, and these are subdivided into blocks.
- Peers establish connections with each other and exchange bitfields containing information about piece availability.
- Three peers are unchoked based on download rate, and a fourth peer is chosen randomly for uploading from among the connected ones (optimistic unchoking).
**** Peer Connection Policy
- Peers obtain from a tracker the IP addresses of other peers in the same swarm. Peers that take part in the swarm periodically connect to the tracker, and this way notify it of their presence. When a peer connects to the tracker, it informs it of the number of peers (numwant) that it wishes to receive, 50 by default.
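A simplified sketch of what such a tracker announce looks like on the wire: an HTTP GET whose query string carries the torrent's info hash, the peer's identity and port, and =numwant=. These parameter names are the real ones from the tracker protocol, but a real announce also reports uploaded/downloaded/left counters, which are omitted here:

```python
from urllib.parse import urlencode

def announce_url(tracker_url, info_hash, peer_id, port, numwant=50):
    """Build a (simplified) tracker announce URL: the client reports what it
    is downloading (info_hash), where it listens (port), and how many peers
    it wants back (numwant, 50 by default)."""
    params = {
        "info_hash": info_hash,   # 20-byte SHA-1 of the torrent's info dict
        "peer_id": peer_id,       # this client's randomly generated identity
        "port": port,             # port the client listens on
        "numwant": numwant,       # how many peers to return
    }
    return tracker_url + "?" + urlencode(params)
```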
*** Subversion and Attack Strategies
**** Sybil
- The Sybil attack in P2P networks consists of a single peer presenting itself with multiple, virtual identities, usually with the aim of exploiting reputation-based systems.
- Using this attack, a malicious peer (henceforth called a liar) may get to represent a substantial fraction of the P2P network, and thus compromise it.
+ Apparently a Certification Authority is the best form of protection. I'd imagine this is because the peer would need a certificate for all of its identities, and this would be difficult to get?
- In BitTorrent, identities are randomly generated. An attacker, therefore, may exploit this vulnerability and obtain multiple identities.
- Peers that refuse to cooperate can be banned; however, it can be difficult to distinguish peers who won't cooperate on purpose from peers who have connection issues.
**** Lying Piece Possession
- As discussed in Section 3, peers employ the Bitfield and Have messages to inform peers about piece possession. Following the LRF policy, (correct) peers strive to increase uniformity in the number of copies of each piece. A Piece Lying attack aims at destroying this balance.
- A malicious peer does not adhere to the protocol and announces a piece it does not have (thus it is a liar).
- Thus, it artificially increases the apparent level of replication of a potentially rare piece, causing other peers to download more common pieces first. This can lead to the piece simply disappearing from the network, as no one keeps it circulating. This wrecks the swarm.
- As a malicious peer does not wish to deliver the announced piece, it keeps other peers permanently choked.
- The impact of attacks is generally greater when performed by many peers acting in collusion. In the specific case of making a piece ever rarer, we expect more peers to make the attack more harmful. The more peers lie about a given piece, the more common it will appear to be, and thus the rarer it will in fact become.
**** Eclipsing Correct Peers
- If an attacker has enough physical resources, or creates a great number of Sybils, it can attack a swarm using a large number of malicious peers.
- The same set of peers can attack multiple swarms.
- In an Eclipse attack, a set of malicious, colluding peers arranges for a correct node to peer only with members of the coalition. If successful, the attacker can mediate most or all communication to and from the victim.
- In BitTorrent, this attack inserts a sufficiently high number of evil peers, so correct ones connect mostly, or only, with evil ones.
- A peer connects to at most 55 peers by default, thus one only needs 55 evil peers to eclipse one good one.
**** Evaluation
- 25 piece liars are inserted. These liars state that they have the same 4 pieces, effectively causing these pieces to eventually disappear, as no other peer wants them according to rarest-first. This halts the network after a relatively short time, as no leecher can finish its download, and thus no leecher can turn into a seeder.
- In general, more liars means a slower network.
- Sybil attacks are in general effective, and become more and more effective as the number of sybils increases.
- Results indicate that BitTorrent is susceptible to attacks in which malicious peers in collusion lie about the possession of pieces and make them artificially rarer.
** Do Incentives Build Robustness in BitTorrent?
*** Abstract
- A fundamental problem with many peer-to-peer systems is the tendency for users to “free ride”—to consume resources without contributing to the system.
- The popular file distribution tool BitTorrent was explicitly designed to address this problem, using a tit-for-tat reciprocity strategy to provide positive incentives for nodes to contribute resources to the swarm.
*** Introduction
- In early peer-to-peer systems such as Napster, the novelty factor sufficed to draw plentiful participation from peers.
- The tremendous success of BitTorrent suggests that TFT is successful at inducing contributions from rational peers. Moreover, the bilateral nature of TFT allows for enforcement without a centralized trusted infrastructure.
- The authors discover the presence of significant altruism in BitTorrent, i.e., all peers regularly make contributions to the system that do not directly improve their performance.
- BitTyrant is a modified BitTorrent client designed to benefit strategic peers. The key idea is to carefully select peers and contribution rates so as to maximize download per unit of upload bandwidth. The strategic behavior of BitTyrant is executed simply through policy modifications to existing clients, without any change to the BitTorrent protocol.
- They find that peers individually benefit from BitTyrant's strategic behavior, irrespective of whether or not other peers are using BitTyrant.
- Peers not using BitTyrant can experience degraded performance due to the absence of altruistic contributions. Taken together, these results suggest that "incentives do not build robustness in BitTorrent".
- Robustness requires that performance does not degrade if peers attempt to strategically manipulate the system, a condition BitTorrent does not meet today.
- Average download times currently depend on significant altruism from high capacity peers that, when withheld, reduces performance for all users.
*** BitTorrent Overview
**** Protocol
- BitTorrent focuses on bulk data transfer. All users in a particular swarm are interested in obtaining the same file or set of files.
- Torrent files contain the name, metadata, size of the files, and fingerprints of the data blocks.
- These fingerprints are used to verify data integrity. The metadata file also specifies the address of a tracker server for the torrent, which coordinates interactions between peers participating in the swarm.
- Peers exchange blocks and control information with a set of directly connected peers we call the local neighborhood.
- This set of peers, obtained from the tracker, is unstructured and random, requiring no special join or recovery operations when new peers arrive or existing peers depart.
- We refer to the set of peers to which a BitTorrent client is currently sending data as its active set.
- The choking strategy is intended to provide positive incentives for contributing to the system and inhibit free-riding.
- Modulo TCP effects, and assuming last-hop bottleneck links, each peer provides an equal share of its available upload capacity to the peers to which it is actively sending data. We refer to this rate throughout the paper as a peer's equal split rate. This rate is determined by the upload capacity of a particular peer and the size of its active set.
- There is no universal definition of the size of the active set. Sometimes it's static, sometimes it's the square root of the peer's upload capacity.
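The two quantities above are simple to write down. A sketch, assuming the square-root active-set heuristic the paper attributes to the reference client (function names and the floor-of-2 guard are mine):

```python
import math

def active_set_size(upload_capacity_kbps):
    """One common heuristic: size the active set proportional to the
    square root of the upload capacity (in KB/s)."""
    return max(2, int(math.sqrt(upload_capacity_kbps)))

def equal_split_rate(upload_capacity_kbps, active_set):
    """Each actively unchoked peer receives an equal share of the
    peer's available upload capacity."""
    return upload_capacity_kbps / len(active_set) if active_set else 0.0
```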
**** Measurement
- BitTorrent’s behavior depends on a large number of parameters: topology, bandwidth, block size, churn, data availability, number of directly connected peers, active TFT transfers, and number of optimistic unchokes.
*** Modelling altruism in BitTorrent
- Peers, other than the modified client, use the active set sizing recommended by the reference BitTorrent implementation. In practice, other BitTorrent implementations are more popular (see Table 1) and have different active set sizes. As shown later, aggressive active set sizes tend to decrease altruism, and the reference implementation uses the most aggressive strategy among the popular implementations inspected.
- Active sets are comprised of peers drawn randomly from the overall upload capacity distribution. If churn is low, over time TFT may match peers with similar equal split rates, biasing active set draws. The paper argues in the next section that BitTorrent is slow to reach steady state, particularly for high capacity peers.
- A bunch of other assumptions allows them to model the altruism.
**** Tit-for-tat (TFT) matching time
- By default, the reference BitTorrent client optimistically unchokes two peers every 30 seconds, in an attempt to explore the local neighborhood for better reciprocation pairings.
- These results suggest that TFT as implemented does not quickly find good matches for high capacity peers, even in the absence of churn.
- A peer is considered "content" with a matching once its equal split is matched or exceeded by a peer. However, one of the two peers in any matching that is not exact will be searching for alternates and switching when they are discovered, causing the other to renew its search.
- The long convergence time suggests a potential source of altruism: high capacity clients are forced to peer with those of low capacity while searching for better peers via optimistic unchokes.
**** Probability of reciprocation
- Reciprocation is defined as follows: if a peer P sends enough data to a peer Q, causing Q to insert P into its active set for the next round, then Q reciprocates P.
- Reciprocation from Q to P is determined by two factors: the rate at which P sends data to Q and the rates at which other peers send data to Q.
- This can be computed from the raw upload capacity and the equal split rate.
- Beyond a certain equal split rate (∼14 KB/s in Figure 3), reciprocation is essentially assured, suggesting that further contribution may be altruistic.
**** Expected download rate
- The sub-linear growth suggests significant unfairness in BitTorrent, particularly for high capacity peers. This unfairness improves performance for the majority of low capacity peers, suggesting that high capacity peers may be able to better allocate their upload capacity to improve their own performance.
***** Expected upload rate
- Two factors can limit the upload rate of a peer: data availability and the capacity limit.
1) When a peer is constrained by data availability, it does not have enough data of interest to its local neighborhood to saturate its capacity. In this case, the peer's upload capacity is wasted and utilization suffers. Because of the dependence of upload utilization on data availability, it is crucial that a client downloads new data at a rate fast enough that it can redistribute the downloaded data and saturate its upload capacity. The paper finds that this is indeed the case in the reference BitTorrent client, because of the square root growth rate of its active set size.
2) The capacity limit case is obvious: a peer cannot upload faster than its capacity.
**** Modeling Altruism
- We first consider altruism to be simply the difference between expected upload rate and download rate.
+ This reflects the asymmetry of upload contribution and download rate. (The graph essentially shows very high altruism for peers with upload rates above 100 KB/s.)
- The second definition is any upload contribution that can be withdrawn without loss in download performance.
+ This suggests that all peers make altruistic contributions that could be eliminated. Sufficiently low bandwidth peers almost never earn reciprocation, while high capacity peers send much faster than the minimal rate required for reciprocation.
- Both of the effects from the second definition can be exploited. Note that low bandwidth peers, despite not being reciprocated, still receive data in aggregate faster than they send data. This is because they receive indiscriminate optimistic unchokes from other users.
**** Validation
- Our modeling results suggest that at least part of the altruism in BitTorrent arises from the sub-linear growth of download throughput as a function of upload rate.
- Note that the equal split rate, the parameter of Figure 7, is a conservative lower bound on total upload capacity.
- Essentially: the model is not entirely wrong.
*** Building BitTyrant: A strategic client
- The modeling results of Section 3 suggest that altruism in BitTorrent serves as a kind of progressive tax. As contribution increases, performance improves, but not in direct proportion.
- If performance for low capacity peers is disproportionately high, a strategic user can simply exploit this unfairness by masquerading as many low capacity clients to improve performance.
- Also, by flooding the local neighborhood of high capacity peers, low capacity peers can inflate their chances of TFT reciprocation by dominating the active transfer set of a high capacity peer.
- Both of the above attacks can be stopped by simply refusing multiple connections from the same IP.
- Rather than focusing on a redesign at the protocol level, the authors focus on BitTorrent's robustness to strategic behavior, and find that strategizing can improve performance in isolation while promoting fairness at scale.
**** Maximizing reciprocation
- The modeling results of Section 3 and the operational behavior of BitTorrent clients suggest the following three strategies to improve performance.
1) Maximize reciprocation bandwidth per connection: All things being equal, a node can improve its performance by finding peers that reciprocate with high bandwidth for a low offered rate, dependent only on the other peers of the high capacity node. The reciprocation bandwidth of a peer depends on its upload capacity and its active set size. By discovering which peers have large reciprocation bandwidth, a client can optimize for a higher reciprocation bandwidth per connection.
2) Maximize the number of reciprocating peers: A client can expand its active set to maximize the number of peers that reciprocate, until the marginal benefit of an additional peer is outweighed by the cost of reduced reciprocation probability from other peers.
3) Deviate from equal split: On a per-connection basis, a client can lower its upload contribution to a particular peer as long as that peer continues to reciprocate.
- The largest source of altruism in the model is unnecessary contribution to peers in a node's active set. As such, the third option (being a dick) could work well.
- The reciprocation behavior points to a performance trade-off. If the active set size is large, equal split capacity is reduced, reducing reciprocation probability. However, an additional active set connection is an additional opportunity for reciprocation. To maximize performance, a peer should increase its active set size until an additional connection would cause a reduction in reciprocation across all connections sufficient to reduce overall download performance.
- Strategic high capacity peers can benefit a lot by manipulating their active set size; however, increasing reciprocation probability via active set sizing is very sensitive, and throughput drops quickly once the maximum has been passed.
- These challenges suggest that no a priori active set sizing function may suffice to maximize download rate for strategic clients.
- Instead, they motivate the dynamic algorithm used in BitTyrant, which adaptively modifies the size and membership of the active set and the upload bandwidth allocated to each peer.
- BitTyrant differs from BitTorrent in that it dynamically sizes its active set and varies the sending rate per connection. For each peer p, BitTyrant maintains estimates of the upload rate required for reciprocation, u_p, as well as the download throughput, d_p, received when p reciprocates. Peers are ranked by the ratio d_p/u_p and unchoked in order until the sum of the u_p terms for unchoked peers exceeds the upload capacity of the BitTyrant peer.
- The best peers are those that reciprocate the most for the fewest bytes contributed to them.
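The ranking-and-budget loop described above can be sketched directly; this is my reading of the paper's description (greedy by d_p/u_p until the upload budget runs out), not BitTyrant's actual source:

```python
def bittyrant_unchoke(peers, upload_capacity):
    """Rank peers by expected download per unit of upload (d_p / u_p) and
    unchoke greedily until upload capacity is exhausted.

    `peers` maps peer id -> (u_p, d_p): the estimated upload rate needed
    for reciprocation, and the download rate received when the peer
    reciprocates."""
    ranked = sorted(peers, key=lambda p: peers[p][1] / peers[p][0], reverse=True)
    unchoked, budget = [], upload_capacity
    for p in ranked:
        u_p, _ = peers[p]
        if u_p > budget:        # next peer's required rate no longer fits
            break
        unchoked.append(p)
        budget -= u_p
    return unchoked
```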
**** Sizing local neighbourhood
- A bigger neighbourhood allows for a bigger active set. We want this, as graphs show that several hundred peers might be ideal, but BitTorrent usually caps the neighbourhood between 50 and 100.
- A bigger neighbourhood also allows for more optimistic unchokes.
- A concern is increased protocol overhead.
**** Additional cheating
- The reference BitTorrent client optimistically unchokes peers randomly. Azureus, on the other hand, makes a weighted random choice that takes into account the number of bytes exchanged with a peer. If a peer has built up a deficit in the number of traded bytes, it is less likely to be picked for optimistic unchokes.
- This can be abused by simply disconnecting, thus wiping your history.
+ Can be stopped by logging IPs.
- Early versions of BitTorrent clients used a seeding algorithm wherein seeds upload to the peers that are the fastest downloaders, an algorithm that is prone to exploitation by fast peers or by clients that falsify their download rate by emitting 'have' messages.
- A client would prefer to unchoke those peers that have blocks it needs. Thus, peers can appear more attractive by falsifying block announcements to increase their chances of being unchoked.
*** Evaluation
**** Single peer using
- These results demonstrate the significant, real world performance boost that users can realize by behaving strategically. The median performance gain for BitTyrant is a factor of 1.72, with 25% of downloads finishing at least twice as fast with BitTyrant.
- Because of the random set of peers that BitTorrent trackers return, and the high skew of real world equal split capacities, BitTyrant cannot always improve performance.
- Another circumstance in which BitTyrant cannot significantly improve performance is a swarm whose aggregate performance is limited by data availability rather than by the upload capacity distribution.
- BitTyrant does not simply improve performance, it also provides more consistent performance across multiple trials. By dynamically sizing the active set and preferentially selecting peers to optimistically unchoke, BitTyrant avoids the randomization present in existing TFT implementations, which causes slow convergence for high capacity peers.
- There is a point of diminishing returns for high capacity peers, and BitTyrant can discover it. For clients with high capacity, the number of peers and their available bandwidth distribution are significant factors in determining performance. The modeling results from Section 4.1 suggest that the highest capacity peers may require several hundred available peers to fully maximize throughput due to reciprocation.
**** Multiple peers using
- In contrast, BitTyrant's unchoking algorithm transitions naturally from single to multiple swarms. Rather than allocate bandwidth among swarms, as existing clients do, BitTyrant allocates bandwidth among connections, optimizing aggregate download throughput over all connections for all swarms. This allows high capacity BitTyrant clients to effectively participate in more swarms simultaneously, lowering per-swarm performance for low capacity peers that cannot.
- It can also suck to use it:
1) If high capacity peers participate in many swarms or otherwise limit altruism, total capacity per swarm decreases. This reduction in capacity lengthens download times for all users of a single swarm, regardless of contribution. Although high capacity peers will see an increase in aggregate download rate across many swarms, low capacity peers that cannot successfully compete in multiple swarms simultaneously will see a large reduction in download rates.
2) New users experience a lengthy bootstrapping period. To maximize throughput, BitTyrant unchokes peers that send fast. New users without data are bootstrapped by the excess capacity of the system only.
3) Peering relationships are not stable. BitTyrant was designed to exploit the significant altruism that exists in BitTorrent swarms today. As such, it continually reduces send rates for peers that reciprocate, attempting to find the minimum rate required.
*** Conclusion
- Although TFT discourages free riding, the bulk of BitTorrent's performance has little to do with TFT. The dominant performance effect in practice is altruistic contribution on the part of a small minority of high capacity peers.
- More importantly, this altruism is not a consequence of TFT; selfish peers—even those with modest resources—can significantly reduce their contribution and yet improve their download performance.
* Security and Privacy
** S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing
*** Abstract
- Security is a common problem in completely decentralized peer-to-peer systems. Although several suggestions exist on how to create a secure key-based routing protocol, a practicable approach is still missing.
- In this paper the authors introduce a secure key-based routing protocol based on Kademlia.
*** Introduction
- A major problem of completely decentralized peer-to-peer systems is security.
- All widely deployed structured overlay networks used on the Internet today (i.e. BitTorrent, OverNet and eMule) are based on Kademlia.
*** Background
- A common service provided by all structured peer-to-peer networks is the key-based routing layer (KBR).
- Every participating node in the overlay chooses a unique nodeId from the same id space and maintains a routing table with the nodeIds and IP addresses of neighbors in the overlay topology.
- Every node is responsible for a particular range of the identifier space, usually for all keys close to its nodeId in the id space.
**** Kademlia
- Kademlia is a structured peer-to-peer system which has several advantages compared to protocols like Chord, as a result of using a novel XOR metric for distance between points in the identifier space. Because XOR is a symmetric operation, Kademlia nodes receive lookup queries from the same nodes which are also in their local routing tables.
- In Kademlia every node chooses a random 160-bit nodeId and maintains a routing table consisting of up to 160 k-buckets.
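The XOR metric and its relation to the k-buckets fit in a few lines (a sketch with toy-sized ids; in Kademlia the ids are 160 bits):

```python
def xor_distance(node_a: int, node_b: int) -> int:
    """Kademlia's metric: the distance between two ids is their bitwise XOR.
    It is symmetric, which is why nodes receive queries from exactly the
    nodes that also appear in their own routing tables."""
    return node_a ^ node_b

def bucket_index(own_id: int, other_id: int) -> int:
    """Contacts are stored in up to 160 k-buckets, one per distance
    magnitude: bucket i holds nodes whose XOR distance from us lies
    in [2^i, 2^(i+1))."""
    return xor_distance(own_id, other_id).bit_length() - 1
```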
*** Attacks on Kademlia
**** Attacks on the underlying network
- We assume that the underlying network layer doesn't provide any security properties to the overlay layer. Therefore an attacker may be able to overhear or modify arbitrary data packets. Furthermore we presume nodes can spoof IP addresses and there is no authentication of data packets in the underlay. Consequently, attacks on the underlay can lead to denial of service attacks on the overlay layer.
**** Attacks on overlay routing
***** Eclipse attack
- Tries to place adversarial nodes in the network in such a way that one or more nodes are cut off from it.
- Can be prevented, first, if a node cannot choose its nodeId freely, and second, if it is hard to influence the other nodes' routing tables.
- Kademlia already does the latter, as nodes are only thrown out of buckets when they stop responding.
***** Sybil attack
- In completely decentralized systems there is no instance that controls the quantity of nodeIds an attacker can obtain. Thus an attacker can join the network with lots of nodeIds until he controls a fraction m of all nodes in the network.
- Cannot be prevented, only impeded. Force nodes to pay for authorization; in decentralised systems, this can only be done through system resources.
***** Churn attack
- If the attacker owns some nodes he may induce high churn in the network until the network stabilization fails. Since a Kademlia node is advised to keep long-living contacts in its routing table, this attack does not have a great impact on the Kademlia overlay topology.
***** Adversarial Routing
- Since a node is simply removed from a routing table when it neither responds with routing information nor routes any packet, the only way of influencing the network's routing is to return adversarial routing information. For example, an adversarial node might just return other collaborating nodes which are closer to the queried key. This way an adversarial node routes a packet into its subnet of collaborators.
- Can be prevented by using a lookup algorithm which considers multiple disjoint paths.
**** Other Attacks
|
||
***** Denial of service
|
||
- An adversary may try to force a victim to consume all of its resources, i.e. memory, bandwidth, and computational power.
|
||
***** Attacks on data storage
|
||
- Key-based routing protocols are commonly used as building blocks to realize a distributed hash table (DHT) for data storage. To make it more difficult for adversarial nodes to modify stored data items, the same data item is replicated on a number of neighboring nodes.
|
||
*** Design
|
||
**** Secure nodeid assignment
|
||
- It should be hard to generate a large number of nodeIds (to prevent the Sybil attack), and a node shouldn't be able to choose its nodeId freely (to prevent the Eclipse attack).
|
||
- The nodeid should authenticate a node + Can be achieved by hasing ip + port or a public key + The first solution has a significant drawback because with dynamically allocated IP addresses the nodeId will change subsequently.
|
||
+ It is also unsuitable for limiting the number of generated nodeIds if you want to support networks with NAT, in which several nodes appear to have the same public IP address.
+ Finally, there is no way of ensuring the integrity of exchanged messages with that kind of nodeId.
|
||
+ This is why we advocate hashing a public key to generate the nodeId. With this public key it is possible to sign the messages nodes exchange.
|
||
- Due to computational overhead we differentiate between two signature types:
|
||
1) Weak signature: The weak signature does not sign the whole message. It is limited to IP address, port and a timestamp. The timestamp specifies how long the signature is valid. This prevents replay attacks if dynamic IP addresses are used. Used in FIND_NODE and PING messages.
|
||
2) Strong signature: The strong signature signs the full content of a message. This ensures integrity of the message and resilience against Man-in-the-Middle attacks. Replay attacks can be prevented with nonces inside the RPC messages.
|
||
- Impeding Sybil and Eclipse attacks can be done either with a crypto puzzle or with a signature from a central certificate authority, so we need to combine the signature types above with one of the following:
|
||
1) Supervised signature: If a signature’s public key additionally is signed by a trustworthy certificate authority, this signature is called supervised signature. This signature is needed to impede a Sybil attack in the network’s bootstrapping phase where only a few nodes exist in the network. Centralized as fuck and single point of failure.
|
||
2) Crypto puzzle signature: In the absence of a trustworthy authority we need to impede the Eclipse and Sybil attack with a crypto puzzle. Might not completely stop either, but might as well make it as hard as possible for an adversary.
|
||
- Two puzzles are created:
|
||
1) A static puzzle that prevents the nodeId from being chosen freely: generate a key so that the first c_1 bits of H(H(key)) are 0; then nodeId = H(key) (so the nodeId cannot be chosen freely).
|
||
2) A dynamic puzzle that ensures it is expensive to generate a huge number of nodeIds: find an X so that the first c_2 bits of H(nodeId ⊕ X) are 0; c_2 is increased over time to keep nodeId generation expensive.
|
||
- verification is O(1) — creation is O(2^c_1 + 2^c_2)
|
||
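The two puzzles can be sketched as follows. This is a toy illustration: it assumes SHA-256 as H and uses random 32-byte strings as stand-ins for freshly generated public keys (the paper leaves the hash function and key format open).

```python
import hashlib
import os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def has_leading_zero_bits(digest: bytes, c: int) -> bool:
    # True if the first c bits of the digest are zero
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - c) == 0

def static_puzzle(c1: int):
    # Find a key with c1 leading zero bits in H(H(key)); nodeId = H(key).
    # Brute force: ~2^c1 attempts expected; verification is a single hash.
    while True:
        key = os.urandom(32)  # stand-in for a freshly generated public key
        node_id = H(key)
        if has_leading_zero_bits(H(node_id), c1):
            return key, node_id

def dynamic_puzzle(node_id: bytes, c2: int) -> bytes:
    # Find X with c2 leading zero bits in H(nodeId XOR X) (~2^c2 attempts).
    while True:
        x = os.urandom(len(node_id))
        mixed = bytes(a ^ b for a, b in zip(node_id, x))
        if has_leading_zero_bits(H(mixed), c2):
            return x
```

Anyone can verify a claimed (nodeId, X) pair with two hashes, which is what makes the creation/verification asymmetry above work.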
**** Sibling Broadcast
|
||
- Siblings are nodes which are responsible for a certain key-value pair that needs to be stored in a DHT.
|
||
- In the case of Kademlia those key-value pairs are replicated over the k closest nodes (we remember: k is the bucket size).
|
||
- We want to choose this replication count independently of the bucket size k, and introduce the number of siblings as a parameter s.
|
||
- A common security problem is the reliability of sibling information which arises when replicated information needs to be stored in the DHT which uses a majority decision to compensate for adversarial nodes.
|
||
- Since Kademlia’s original protocol converges to a list of siblings, it is complicated to analyze and prove the coherency of sibling information.
|
||
- For this reason we introduce a sibling list of size η · s per node, which ensures that each node knows at least s siblings for an ID within the node's sibling range with high probability.
|
||
- Thus, routing tables in S/Kademlia consist of the usual k-buckets and a sorted sibling list of size η · s.
|
||
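Maintaining the sibling list is just "keep the η·s closest nodes to our own nodeId under the XOR metric". A minimal sketch with integer nodeIds (the η and s values are illustrative, not from the paper):

```python
ETA, S = 2, 8  # illustrative choices for the replication parameters

def update_sibling_list(siblings, own_id, candidate):
    """Keep the ETA*S nodeIds closest to own_id under the XOR metric."""
    if candidate in siblings or candidate == own_id:
        return siblings
    merged = sorted(siblings + [candidate], key=lambda n: n ^ own_id)
    return merged[:ETA * S]
```

The over-provisioning factor η is what lets the node claim, with high probability, that the first s entries really are the s closest nodes network-wide.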
**** Routing table maintenance
|
||
- To secure routing table maintenance, S/Kademlia categorizes signaling messages into the following classes: incoming signed RPC requests, responses, and unsigned messages. Each of these messages contains the sender address. If the message is weakly or strongly signed, this address cannot be forged or associated with another nodeId.
|
||
- We call the sender address valid if the message is signed, and actively valid if the sender address is valid and comes from an RPC response. Kademlia uses these sender addresses to maintain its routing tables.
|
||
- Actively valid sender addresses are immediately added to their corresponding bucket when it is not full. Valid sender addresses are only added to a bucket if the nodeId prefix differs in an appropriate number of bits.
+ This is needed because otherwise an attacker could easily generate nodeIds that share a prefix with the victim's nodeId and flood its buckets, since buckets close to one's own nodeId are only sparsely filled.
|
||
- Sender addresses from unsigned messages will simply be ignored.
|
||
- If a message contains information about further nodes, each of them can be added by invoking a ping RPC on it. If a node already exists in the routing table, it is moved to the tail of the bucket.
|
||
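The admission rules above condense into a small decision function. The function name and boolean inputs are illustrative, and the full-bucket handling (Kademlia would ping the head of the bucket) is omitted:

```python
def bucket_admission(signed: bool, from_rpc_response: bool,
                     bucket_full: bool, prefix_differs_enough: bool) -> str:
    """Sketch of S/Kademlia's sender-address admission policy."""
    if not signed:
        return "ignore"   # unsigned messages: sender address is ignored
    if from_rpc_response and not bucket_full:
        return "add"      # actively valid: added immediately
    if not from_rpc_response and prefix_differs_enough:
        return "add"      # valid: added only if the nodeId prefix differs enough
    return "ignore"       # bucket full or prefix too close (full-bucket handling omitted)
```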
**** Lookup over disjoint paths
|
||
- The original Kademlia lookup iteratively queries α nodes with a FIND_NODE RPC for the k closest nodes to the destination key; α is a system-wide concurrency parameter.
|
||
- In each step, the nodes returned by previous RPCs are merged into a sorted list from which the next α nodes are picked. A major drawback of this approach is that the lookup fails as soon as a single adversarial node is queried.
|
||
- We extend this algorithm to use d disjoint paths and thus increase the lookup success ratio in a network with adversarial nodes. The initiator starts a lookup by taking the k closest nodes to the destination key from its local routing table and distributing them into d independent lookup buckets. From there on, the node continues with d parallel lookups similar to the traditional Kademlia lookup.
|
||
+ Each peer is queried only once, which keeps the paths disjoint.
|
||
- By using the sibling list, the lookup doesn't converge at a single node but terminates on d close-by neighbours, which all know the complete set of s siblings for the destination key. So this should still succeed even if k-1 of the neighbours are evil.
|
||
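A toy sequential rendering of the d-disjoint-path idea. The real lookups run in parallel and use FIND_NODE RPCs over the network; here `query_fn` is an illustrative stand-in for that RPC:

```python
def disjoint_paths_lookup(closest, d, query_fn):
    """Run d lookups whose queried node sets never overlap.

    closest:  the k closest starting nodes from the local routing table
    query_fn: query_fn(node) -> nodes closer to the target (stand-in RPC)
    """
    buckets = [list(closest[i::d]) for i in range(d)]  # seed d lookup buckets
    seen = set()        # shared across all paths: each peer is queried once
    endpoints = []
    for frontier in buckets:   # sequential here; parallel in the protocol
        last = None
        while frontier:
            node = frontier.pop(0)
            if node in seen:
                continue       # already used on another path: keeps paths disjoint
            seen.add(node)
            last = node
            frontier.extend(query_fn(node))
        endpoints.append(last)
    return endpoints
```

Each endpoint then reports the siblings it knows for the key, matching the sibling-list termination described above.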
*** Evaluations and results
|
||
- The figure clearly shows that by increasing the number of parallel disjoint paths d the fraction of successful lookups can be considerably improved. In this case the communication overhead increases linearly with d. We also see that with k = 16 there is enough redundancy in the k-buckets to actually create d disjoint paths.
|
||
- In the second setup we set k = 2 · d, adapting the bucket size to the number of disjoint paths to keep a minimum of redundancy in the routing tables and consequently reduce communication overhead. The results in figure 5 show that a smaller k leads to a smaller fraction of successful lookups compared to figure 4. The reason for this is the increased average path length due to the smaller routing table, as shown in the path length distribution diagram.
|
||
- Values of k larger than 8..16 would also increase the probability that a large fraction of buckets stay non-full for a long time. This unnecessarily makes the routing table more vulnerable to Eclipse attacks.
|
||
*** Related work
|
||
- They state that an important step in defending against these attacks is detection by defining verifiable system invariants. For example, nodes can detect incorrect lookup routing by verifying that the lookup gets “closer” to the destination key.
+ This could be done by Pastry, as it has GPS or location information.
|
||
+ Kademlia as well, since the XOR distance to the key can be calculated.
|
||
- To prevent Sybil attacks: in [13] Rowaihy et al. present an admission control system for structured peer-to-peer systems. The system constructs a tree-like hierarchy of cooperative admission control nodes, from which a joining node has to gain admission. Another approach [7] to limit Sybil attacks is to store the IP addresses of participating nodes in a secure DHT. In this way the number of nodeIds per IP address can be limited by querying the DHT whenever a new node wants to join.
|
||
*** Conclusion
|
||
- We propose several practicable solutions to make Kademlia more resilient. First we suggest limiting free nodeId generation by using crypto puzzles in combination with public key cryptography. Furthermore we extend the Kademlia routing table with a sibling list. This reduces the complexity of the bucket splitting algorithm and allows a DHT to store data in a safely replicated way. Finally we propose a lookup algorithm which uses multiple disjoint paths to increase the lookup success ratio. The evaluation of S/Kademlia in the simulation framework OverSim has shown that even with 20% adversarial nodes, 99% of all lookups are still successful if disjoint paths are used. We believe that the proposed extensions to the Kademlia protocol are practical and could be used to easily secure existing Kademlia networks.
|
||
** Protecting Free Expression Online with Freenet
|
||
- Freenet uses a decentralized P2P architecture to create an uncensorable and secure global information storage system.
|
||
- The growth of censorship and erosion of privacy on the Internet increasingly threaten freedom of expression in the digital age. Personal information flows are becoming subject to pervasive monitoring and surveillance, and various state and corporate actors are trying to block access to controversial information and even destroy certain materials altogether.
|
||
- Freenet is a distributed information storage system designed to address information privacy and survivability concerns.
|
||
- In simulations of up to 200,000 nodes, Freenet has proved scalable and fault tolerant. It operates as a self-organizing P2P network that pools unused disk space across potentially hundreds of thousands of desktop computers to create a collaborative virtual file system.
|
||
- To increase network robustness and eliminate single points of failure, Freenet employs a completely decentralized architecture.
|
||
- Participants could operate maliciously or fail without warning at any time.
+ Freenet implements strategies to protect data integrity and prevent privacy leaks in the former instance, and to provide graceful degradation and redundant data availability in the latter.
|
||
- The system is also designed to adapt to usage patterns, automatically replicating and deleting files to make the most effective use of available storage in response to demand.
|
||
*** Design Motivation
|
||
- The prevention of censorship and the maintenance of privacy are both fundamental to free expression in a potentially hostile world.
|
||
- Preserving the availability of controversial information is only half the problem; individuals can often be subject to adverse personal consequences for writing or reading such information and might need to conceal their activity in order to protect themselves.
|
||
- A common objection to mechanisms for secure communication is that criminals might use them to evade law enforcement.
|
||
- Freenet is not particularly attractive for such purposes, as it is designed to broadcast content to the world — not so useful for secret criminal plots.
|
||
- In designing Freenet, we focused on
|
||
1) privacy for information producers, consumers, and holders;
|
||
2) resistance to information censorship;
|
||
3) high availability and reliability through decentralization; and
|
||
4) efficient, scalable, and adaptive storage and routing.
|
||
- Because disk space is finite, a tradeoff exists between publishing new documents and preserving old ones.
|
||
*** Freenet Architecture
|
||
- Freenet participants each run a node that provides the network some storage space. To add a new file, a user sends the network an insert message containing the file and its assigned location-independent globally unique identifier (GUID), which causes the file to be stored on some set of nodes.
|
||
- During a file’s lifetime, it might migrate to or be replicated on other nodes.
|
||
- To retrieve a file, a user sends out a request message containing the GUID key.
|
||
**** GUID Keys
|
||
- Freenet GUID keys are calculated using SHA-1 secure hashes. The network employs two main types of keys: content-hash keys, used for primary data storage, and signed-subspace keys, intended for higher-level human use.
|
||
***** Content-hash keys
|
||
- The content-hash key (CHK) is the low-level data-storage key and is generated by hashing the contents of the file to be stored.
|
||
- So every file has a unique identifier
|
||
- Unlike with URLs, you can be certain that a CHK reference will point to the exact file intended. CHKs also permit identical copies of a file inserted by different people to be automatically coalesced, because every user will calculate the same key for the file.
|
||
***** Signed-subspace keys
|
||
- The signed-subspace key (SSK) sets up a personal namespace that anyone can read but only its owner can write to.
|
||
- To add a file you first choose a short text description, such as politics/us/pentagon-papers.
|
||
- You would then calculate the file’s SSK by hashing the public half of the subspace key and the descriptive string independently before concatenating them and hashing again.
|
||
- Signing the file with the private half of the key provides an integrity check as every node that handles a signed-subspace file verifies its signature before accepting it.
|
||
- To retrieve a file from a subspace, you need only the subspace’s public key
|
||
- Adding or updating a file, on the other hand, requires the private key in order to generate a valid signature.
|
||
- Typically, SSKs are used to store indirect files containing pointers to CHKs rather than to store data files directly.
|
||
- These pointers make it easier for people to update their files and such
|
||
- You can use indirect files to create hierarchical namespaces from directory files that point to other files and directories.
|
||
- SSKs can also be used to implement an alternative domain name system for nodes that change address frequently. Each such node would have its own subspace, and you could contact it by looking up its public key — its address-resolution key — to retrieve the current address.
|
||
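Both key derivations can be sketched with SHA-1, which the article names as Freenet's hash. The byte-level format of real Freenet keys differs; this only shows the structure described above:

```python
import hashlib

def chk(file_contents: bytes) -> str:
    """Content-hash key: hash of the file itself."""
    return hashlib.sha1(file_contents).hexdigest()

def ssk(public_key: bytes, description: str) -> str:
    """Signed-subspace key: hash the public key and the description
    independently, concatenate the two digests, and hash again."""
    h_key = hashlib.sha1(public_key).digest()
    h_desc = hashlib.sha1(description.encode()).digest()
    return hashlib.sha1(h_key + h_desc).hexdigest()
```

Identical files coalesce under the same CHK, and anyone holding a subspace's public key can recompute the SSK for a description like politics/us/pentagon-papers to fetch the file.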
***** Messaging and Privacy
|
||
- Freenet was designed from the beginning under the assumption of hostile attack from both inside and out.
|
||
- Unfortunately, these considerations have had the side effect of hampering changes that might improve Freenet’s routing characteristics. To date, we have not discovered a way to guarantee better data locatability without compromising security.
|
||
- Privacy in Freenet is maintained using a variation of Chaum's mix-net scheme for anonymous communication.
|
||
- Rather than move directly from sender to recipient, messages travel through node-to-node chains, where each link is individually encrypted, until the message finally reaches its recipient (kinda like Tor).
|
||
- Because each node in the chain knows only about its immediate neighbors, the end points could be anywhere among the network’s hundreds of thousands of nodes, which are continually exchanging indecipherable messages.
|
||
***** Routing
|
||
- Routing queries to data is the most important element of the Freenet system.
|
||
- A centralized index is shit: it's a single point of failure and an easy censorship target.
|
||
- Randomly broadcasting queries is shit: it floods the network and doesn't scale.
|
||
- Freenet avoids both problems by using a steepest-ascent hill-climbing search: Each node forwards queries to the node that it thinks is closest to the target. You might start searching for Jordan by asking a friend who once played college basketball, for example, who might pass your request on to a former coach, who could pass it to a talent scout, who might pass it to Jordan’s agent, who could put you in touch with the man himself.
|
||
***** Requesting files
|
||
- Every node maintains a routing table that lists the addresses of other nodes and the GUID keys it thinks they hold.
|
||
- If a node holds the requested file, it returns it; otherwise, the node forwards the request to the node in its table with the closest key to the one requested.
|
||
- If the request is successful, each node in the chain passes the file back upstream and creates a new entry in its routing table associating the data holder with the requested key.
|
||
- Nodes might also cache a copy
|
||
- To conceal the identity of the data holder, nodes will occasionally alter reply messages, setting the holder tags to point to themselves before passing them back up the chain. Later requests will still locate the data because the node retains the true data holder’s identity in its own routing table and forwards queries to the correct holder. Routing tables are never revealed to other nodes.
|
||
- To limit resource usage, the requester gives each query a time-to-live limit that is decremented at each node.
|
||
- If a node sends a query to a recipient that is already in the chain, the message is bounced back and the node tries to use the next-closest key instead. If a node runs out of candidates to try, it reports failure back to its predecessor in the chain, which then tries its second choice, and so on.
|
||
- With this approach, the request homes in closer with each hop until the key is found. A subsequent query for this key will tend to approach the first request’s path, and a locally cached copy can satisfy the query after the two paths converge.
|
||
- Nodes that reliably answer queries will be added to more routing tables, and hence, will be contacted more often than nodes that do not.
|
||
***** Inserting files
|
||
- An insert message follows the same path that a request for the same key would take, sets the routing table entries in the same way, and stores the file on the same nodes. Thus, new files are placed where queries would look for them.
|
||
- To insert a file, a user assigns it a GUID key and sends an insert message to the user's own node containing the new key, with a TTL value that represents the number of copies to store.
|
||
- Inserts might fail if the CHK is already present at a node or the user has already inserted another file with the same description (for SSKs). In the latter case, the user should choose a different description or perform an update rather than an insert.
|
||
- If the TTL expires without collision, the final node returns an “all clear” message. The user then sends the data down the path established by the initial insert message.
|
||
- Each node along the path verifies the data against its GUID, stores it, and creates a routing table entry that lists the data holder as the final node in this chain.
|
||
**** Data Encryption
|
||
- For political or legal reasons, node operators might wish to remain ignorant of the contents of their data stores. To this end, we encourage publishers to encrypt all data before insertion.
|
||
- Data encryption keys are not used in routing or included in network messages. Inserters distribute them directly to end users at the same time as the corresponding GUIDs.
|
||
- Thus, node operators cannot read their own files, but users can decrypt them after retrieval.
|
||
**** Network Evolution
|
||
- The network evolves over time as new nodes join and existing nodes create new connections after handling queries. As more requests are handled, local knowledge about other nodes in the network improves, and routes adapt to become more accurate without requiring global directories.
|
||
***** Adding Nodes
|
||
- To join the network, a new node first generates a public-private key pair for itself. This pair serves to logically identify the node and is used to sign a physical address reference.
|
||
- Certification might be useful in the future for deciding whether to trust a new node, but for now Freenet uses no trust mechanism.
|
||
- Next, the node sends an announcement message including the public key and physical address to an existing node, located through some out-of-band means such as personal communication or lists of nodes posted on the Web
|
||
- The nodes in the chain collectively assign the new node a random GUID in the keyspace using a cryptographic protocol for shared random number generation that prevents any participant from biasing the result. This assigns the new node responsibility for a region of keyspace that all agree on, while preventing evil people from influencing the assignment, as all have to agree.
|
||
***** Training Routes
|
||
- As more requests are processed, the network’s routing should become better trained. Nodes’ routing tables should specialize in handling clusters of similar keys because each node will mostly receive requests for keys that are similar to the keys it is associated with in other nodes’ routing tables.
|
||
- When those requests succeed, the node learns about previously unknown nodes that can supply such keys and creates new routing entries for them.
|
||
- Taken together, the twin effects of clustering in routing tables and data stores should improve the effectiveness of future queries in a self-reinforcing cycle.
|
||
***** Key Clustering
|
||
- Because GUID keys are derived from hashes, the closeness of keys in a data store is unrelated to the corresponding files’ contents.
|
||
- This lack of semantic closeness is unimportant, however, because the routing algorithm is based on the locations of particular keys, rather than particular topics.
|
||
- In fact, hashes are useful because they ensure that similar works will be scattered throughout the network, lessening the chances that a single node's failure will make an entire category of files unavailable.
|
||
**** Searching
|
||
- Not solved yet
|
||
- Freenet can be spidered, or individuals can publish lists of bookmarks. However, these approaches are not entirely satisfactory in terms of Freenet’s design goals.
|
||
- One simple approach for a true Freenet search would be to create a special public subspace for indirect keyword files. When authors insert files, they could also insert several indirect files corresponding to search keywords for the original file.
|
||
- The system would allow multiple keyword files with the same key to coexist (unlike with normal files), and requests for such keys could return multiple matches.
|
||
**** Managing Storage
|
||
- To encourage participation, Freenet does not require payment for inserts or impose restrictions on the amount of data that publishers can insert.
|
||
- Given finite disk space, however, the system must sometimes decide which files to keep. It currently prioritizes space allocation by popularity, as measured by the frequency of requests per file. Each node orders the files in its data store by time of last request, and when a new file arrives that cannot fit in the space available, the node deletes the least recently requested files until there is room.
|
||
- Because routing table entries are smaller, they can be kept around longer than files. Evicted files don’t necessarily disappear right away because the node can respond to a later request for the file using its routing table to contact the original data holder, which might be able to supply another copy.
|
||
- Why would the original holder be more likely to have the file? Freenet's data holder pointers have a treelike structure. Nodes at the leaves might see only a few local requests for a file, but those higher up the tree receive requests from a larger part of the network, which makes their copies more popular.
|
||
- File distribution is therefore determined by two competing forces: tree growth and pruning.
|
||
- The query-routing mechanism automatically creates more copies in an area of the network where a file is requested, and the tree grows in that direction.
|
||
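The eviction policy described above is effectively least-recently-requested. A sketch with `OrderedDict`, measuring capacity in number of files rather than bytes for simplicity:

```python
from collections import OrderedDict

class DataStore:
    """Evict the least recently *requested* files when space runs out."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()        # key -> data, oldest request first

    def request(self, key):
        if key in self.files:
            self.files.move_to_end(key)   # mark as most recently requested
            return self.files[key]
        return None   # a real node would now consult its routing table

    def insert(self, key, data):
        self.files[key] = data
        self.files.move_to_end(key)
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)  # drop least recently requested
```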
**** Performance Analysis
|
||
- Freenet demonstrates good scalability and fault-tolerance characteristics that can be explained in terms of a small-world network model [5]. Small-world networks are characterized by a power-law degree distribution.
|
||
- In such a distribution, the majority of nodes have relatively few local connections to other nodes, but a significant small number of nodes have large, wide-ranging sets of connections.
|
||
- This is not surprising, as power-law distributions tend to arise naturally when networks grow by preferential attachment.
|
||
- The new-node announcement protocol initially creates a preferential attachment effect because following random links gives a higher probability of arriving at nodes that have more links.
|
||
- During normal operation, the effect continues because well-known nodes tend to see more requests and become even better connected (“the rich get richer”).
|
||
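Preferential attachment is easy to simulate. This toy Barabási–Albert-style growth process is not Freenet's actual protocol, just an illustration of how degree-biased attachment skews the degree distribution:

```python
import random

def grow_network(n, m=2, seed=42):
    """Attach each new node to up to m targets chosen with probability
    proportional to the targets' current degree."""
    rng = random.Random(seed)
    pool = [0, 1]             # each node appears once per incident edge
    degrees = {0: 1, 1: 1}    # start from a single 0-1 link
    for new in range(2, n):
        targets = {rng.choice(pool) for _ in range(m)}  # degree-biased picks
        degrees[new] = len(targets)
        for t in targets:
            degrees[t] += 1
            pool += [new, t]
    return degrees
```

Early, well-connected nodes keep accumulating links while most late arrivals stay near the minimum degree, giving the skewed ("rich get richer") shape.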
***** Path length
|
||
- By extrapolation, it appears that Freenet should be capable of scaling to one million nodes with a median path length of just 30.
|
||
***** Fault Tolerance
|
||
- The network is surprisingly robust against quite large failures.
|
||
- The power-law distribution gives small-world networks a high degree of fault tolerance [6], because random failures are most likely to eliminate nodes from the poorly connected majority.
|
||
- A small-world network falls apart much more quickly, however, if the well-connected nodes are targeted first.
|
||
** Tarzan: A P2P Anonymizing Network Layer
|
||
*** Abstract
|
||
- Tarzan is a peer-to-peer anonymous IP network overlay. Because it provides IP service, Tarzan is general-purpose and transparent to applications. Organized as a decentralized peer-to-peer overlay, Tarzan is fault-tolerant, highly scalable, and easy to manage.
|
||
- Tarzan achieves its anonymity with layered encryption and multihop routing, much like a Chaumian mix. A message initiator chooses a path of peers pseudo-randomly through a restricted topology in a way that adversaries cannot easily influence. Cover traffic prevents a global observer from using traffic analysis to identify an initiator. Protocols toward unbiased peer-selection offer new directions for distributing trust among untrusted entities. Tarzan provides anonymity to either clients or servers, without requiring that both participate. In both cases, Tarzan uses a network address translator (NAT) to bridge between Tarzan hosts and oblivious Internet hosts. Measurements show that Tarzan imposes minimal overhead over a corresponding non-anonymous overlay route.
|
||
*** Introduction
|
||
- The ultimate goal of Internet anonymization is to allow a host to communicate with an arbitrary server in such a manner that nobody can determine the host’s identity
|
||
- Different entities may be interested in exposing the host’s identity, each with varying capabilities to do so:
|
||
1) curious individuals or groups may run their own participating machines to snoop on traffic
|
||
2) parties skirting legality may break into a limited number of others' machines
|
||
3) large, powerful organizations may tap and monitor Internet backbones.
|
||
- Tarzan, a practical system aimed at realizing anonymity against all three flavors of adversary
|
||
- Less ambitious approaches, which do not work:
|
||
1) In the simplest alternative, a host sends messages to a server through a proxy, such as Anonymizer.com [1]. This system fails if the proxy reveals a user’s identity [18] or if an adversary can observe the proxy’s traffic. Furthermore, servers can easily block these centralized proxies and adversaries can prevent usage with denial-of-service attacks.
|
||
2) A host can instead connect through a set of mix relays, as in onion routing. However, if a corrupt relay receives traffic from a non-core node, it can identify this node as the origin of the traffic. Colluding entry and exit relays can use timing analysis to determine source and destination; external adversaries can do this as well.
|
||
- Few of these systems attempt to provide anonymity against an adversary that can passively observe all network traffic. Such protection requires fixing traffic patterns or using cover traffic to make such traffic analysis more difficult
|
||
- Some protect only the core of the static mix network and thus allow traffic analysis on its edges. Some simulate full synchrony and thus trivial DoS attacks halt their operation in entirety [7]. And some require central control and knowledge of the entire network
|
||
- Tarzan extends known mix-net designs to a peer-to-peer environment.
|
||
- Tarzan nodes communicate over sequences of mix relays chosen from an open-ended pool of volunteer nodes, without any centralized component
|
||
- All peers are potential originators of traffic; all peers are potential relays
|
||
- we leverage our new concept of a domain to remove potential adversarial bias: An adversary may run hundreds of virtual machines, yet is unlikely to control hundreds of different IP subnets.
|
||
- Packets can be routed only between mimics, or pairs of nodes assigned by the system in a secure and universally-verifiable manner.
|
||
1) This technique is practical in that it does not require network synchrony
|
||
2) Consumes only a small factor more bandwidth than the data traffic to be hidden
|
||
3) It is powerful as it shields all network participants, not only core routers
|
||
- Tarzan allows client applications on participating hosts to talk to non-participating Internet servers through special IP tunnels. The two ends of a tunnel are a Tarzan node running a client application and a Tarzan node running a network address translator; the latter forwards the client’s traffic to its ultimate Internet destination. Tarzan is transparent to both client applications and servers, though it must be installed and configured on participating nodes.
|
||
- Tarzan supports a systems-engineering position: anonymity can be built-in at the transport layer, transparent to most systems, trivial to incorporate, and with a tolerable loss of efficiency compared to its non-anonymous counterpart.
|
||
*** Design Goals and Network Model
|
||
- A node is an Internet host’s virtual identity in the system, created by running an instantiation of the Tarzan software on a single IP address.
|
||
- A tunnel is a virtual circuit for communication spread across an ordered sequence of nodes.
|
||
- A relay is a node acting as a packet forwarder as part of a tunnel. (Also known as a router in other protocols)
|
||
- Goals of Tarzan:
|
||
1) Application independence: Tarzan should be transparent to existing applications and allow users to interact with existing services. To achieve this, Tarzan should provide the abstraction of an IP tunnel.
|
||
2) Anonymity against malicious nodes: Tarzan should provide sender or recipient anonymity against colluding nodes. We consider these properties in terms of an anonymity set: the set of possible senders of a message. The larger this set, the “more” anonymous an initiator remains.
|
||
3) Fault-tolerance and availability: Tarzan should resist an adversary’s attempts to overload the entire system or to block system entry or exit points.
|
||
4) Performance: Tarzan should maximize the performance of tunnel transmission, subject to our anonymity requirements, to make Tarzan a viable IP-level communication channel.
|
||
5) Anonymity against a global eavesdropper: An adversary observing the entire network should be unable to determine which Tarzan relay initiates a particular message.
|
||
- Since anyone can join Tarzan, it will likely be targeted by malicious users
|
||
- A node is malicious if it modifies, drops, or records packets, analyzes traffic patterns, returns incorrect network information, or otherwise does not properly follow the protocols.
|
||
- From a naive viewpoint, the fraction of Tarzan nodes that are malicious determines the probability that a tunnel relay is malicious. Yet, a single compromised computer may operate on multiple IP addresses and thus present multiple Tarzan identities.
|
||
- To defend against such a situation, we make the observation that a single machine likely controls only a contiguous range of IP addresses
|
||
+ typically by promiscuously receiving packets addressed to any IP address on a particular LAN or by acting as a gateway router.
|
||
- This observation is useful in bounding the damage each malicious node can cause. We will call this subnet controllable by a single malicious machine a domain
|
||
- A node belongs to a /d domain if the node’s d-bit IP prefix matches that of the domain.
|
||
+ A malicious node effectively owns all of the address space behind it.
|
||
- Domains capture some notion of fault-independence: While an adversary can certainly subvert nodes within the same domain in a dependent fashion, nodes in different domains may fail independently.
|
||
- when selecting relays, Tarzan should consider the notion of distinct domains, not that of distinct nodes.
|
||
- Tarzan chooses some fixed IP prefix size as its granularity for counting domains: first among /16 subnet masks, then among /24 masks.
|
||
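The /d domain notion above can be sketched with Python's stdlib; the addresses and function names here are illustrative, not from the paper:

```python
import ipaddress
from collections import defaultdict

def domain_of(ip: str, d: int) -> str:
    """Return the /d domain of an IP address, i.e. its d-bit prefix."""
    net = ipaddress.ip_network(f"{ip}/{d}", strict=False)
    return str(net)

def count_domains(ips, d=16):
    """Group known node addresses by their /d domain."""
    domains = defaultdict(list)
    for ip in ips:
        domains[domain_of(ip, d)].append(ip)
    return domains

nodes = ["18.26.4.9", "18.26.4.44", "18.31.0.82", "128.2.13.4"]
by_16 = count_domains(nodes, 16)
# 18.26.0.0/16 holds two node identities, but it counts only once
# when relays are selected among distinct domains.
```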
*** Architecture and Design
|
||
- Typical use proceeds in three stages. First, a node running an application that desires anonymity selects a set of nodes to form a path through the overlay network. Next, this source-routing node establishes a tunnel using these nodes, which includes the distribution of session keys. Finally, it routes data packets through this tunnel.
|
||
- The exit point of the tunnel is a NAT. This NAT forwards the anonymized packets to servers that are not aware of Tarzan, and it receives the response packets from these servers and reroutes the packets over this tunnel.
|
||
- Tarzan restricts route selection to pairs of nodes that use cover traffic to maintain traffic levels independent of data rates.
|
||
**** Packet Relay
|
||
- A Tarzan tunnel passes two distinct types of messages between nodes: data packets, to be relayed through existing tunnels, and control packets, containing commands and responses that establish and maintain these virtual circuits.
|
||
- A flow tag (similar to MPLS [23]) uniquely identifies each link of each tunnel. A relay rapidly determines how to route a packet by its tag. Symmetric encryption hides data, and a MAC protects its integrity, on a per-relay basis. Separate keys are used in each direction of each relay.
|
||
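A toy sketch of per-relay flow-tag state, assuming a dict-based table (the class and field names are hypothetical; a real Tarzan relay would also decrypt and verify the MAC per hop):

```python
import os

class Relay:
    """Toy relay state: flow tag -> (next hop, per-direction session key)."""
    def __init__(self):
        self.routes = {}  # flow_tag -> (next_hop, session_key)

    def add_flow(self, next_hop):
        tag = os.urandom(4)                  # unique link identifier, like an MPLS label
        self.routes[tag] = (next_hop, os.urandom(16))
        return tag

    def forward(self, tag, payload):
        # O(1) lookup by tag; no per-packet routing decision is needed.
        next_hop, key = self.routes[tag]
        return next_hop, payload             # real relay re-encodes here with `key`
```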
- In the forward path, the tunnel initiator clears each IP packet’s source address field, performs a nested encoding for each tunnel relay, and encapsulates the result in a UDP packet.
|
||
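The nested encoding can be illustrated with a toy layered cipher. The XOR keystream below is only a stand-in for the real per-hop symmetric encryption and MAC; the point is the layering: the initiator wraps once per relay, and each relay strips exactly one layer:

```python
import hashlib

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(packet: bytes, hop_keys):
    """Initiator: innermost layer for the last hop, outermost for the first."""
    for key in reversed(hop_keys):
        packet = xor_stream(key, packet)
    return packet

def unwrap_one(packet: bytes, key: bytes) -> bytes:
    """Each relay strips one layer as the packet travels forward."""
    return xor_stream(key, packet)

keys = [b"k1", b"k2", b"k3"]
onion = wrap(b"payload", keys)
for k in keys:                     # relays process layers in path order
    onion = unwrap_one(onion, k)
# onion == b"payload" at the tunnel exit
```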
**** Tunnel setup
|
||
- When forming a tunnel, a Tarzan node pseudo-randomly selects a series of nodes from the network based on its local topology
|
||
- An establish request sent to node h_i is relayed as a normal data packet from h_1 through h_i−1.
|
||
**** IP Packet Forwarding
|
||
- Tarzan provides a client IP forwarder and a server-side pseudonymous network address translator (PNAT) to create a generic anonymizing IP tunnel.
|
||
- The client forwarder replaces its real address in the packets with a random address assigned by the PNAT from the reserved private address space
|
||
- The PNAT translates this private address to one of its real addresses.
|
||
- The pseudonymous NAT also offers port forwarding to allow ordinary Internet hosts to connect through Tarzan tunnels to anonymous servers.
|
||
**** Tunnel failure and reconstruction
|
||
- A tunnel fails if one of its relays stops forwarding packets. To detect failure, the initiator regularly sends ping messages to the PNAT through the tunnel and waits for acknowledgments.
|
||
**** Peer Discovery
|
||
- A Tarzan node requires some means to learn about all other nodes in the network, knowing initially only a few other nodes. Anything less than near-complete network information allows an adversary to bias the distribution of a node’s neighbor set towards malicious peers, leaks information through combinatorial profiling attacks, and results in inconsistencies during relay selection
|
||
- Tarzan uses a simple gossip-based protocol for peer discovery.
|
||
- This problem can be modeled as a directed graph: vertices represent Tarzan nodes; edges correspond to the relation that node a knows about, and thus can communicate with, node b. Edges are added to the graph as nodes discover other peers.
|
||
- Our technique to grow this network graph is similar to the NameDropper resource discovery protocol [16]. In each round of NameDropper, node a simply contacts one neighbor at random and transfers its entire neighbor set.
|
||
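A minimal simulation of NameDropper-style gossip, assuming a set-valued adjacency map (the loop bound and ring topology are illustrative, not from the paper):

```python
import random

def namedropper_round(neighbors):
    """One round: every node sends its entire neighbor set to one randomly
    chosen neighbor, which merges it into its own set (sketch of [16])."""
    for a in list(neighbors):
        if neighbors[a]:
            b = random.choice(sorted(neighbors[a]))
            neighbors[b] |= neighbors[a] | {a}
            neighbors[b].discard(b)          # no self-edges

# Ring of 8 nodes, each initially knowing only its successor.
n = 8
neighbors = {i: {(i + 1) % n} for i in range(n)}
rounds = 0
while rounds < 200 and any(len(s) < n - 1 for s in neighbors.values()):
    namedropper_round(neighbors)
    rounds += 1
# Converges to near-complete network knowledge in O(log^2 n) rounds w.h.p.
```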
**** Peer Selection
|
||
- If peers are selected uniformly at random, we may hit malicious nodes, as their addresses are rarely scattered uniformly through the IP address space; they are often located in the same IP prefix space. Thus, we choose among distinct IP prefixes, not among all known IP addresses.
|
||
- Tarzan uses a three-level hierarchy: first among all known /16 subnets, then among /24 subnets belonging to this 16-bit address space, then among the relevant IP addresses.
|
||
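The three-level hierarchy can be sketched as follows, assuming uniform choice at each level (function and variable names are hypothetical):

```python
import random

def select_peer(known_ips):
    """Pick a random /16 prefix, then a random /24 inside it, then a random
    IP -- so an adversary packing one subnet with identities gains little."""
    by16 = {}
    for ip in known_ips:
        a, b, c, _ = ip.split(".")
        by16.setdefault(a + "." + b, {}).setdefault(c, []).append(ip)
    prefix16 = random.choice(sorted(by16))
    prefix24 = random.choice(sorted(by16[prefix16]))
    return random.choice(by16[prefix16][prefix24])
```

With one honest node in its own /16 and 99 attacker identities in a single /24, the honest node is still chosen about half the time, instead of 1% of the time.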
**** Cover traffic and link encoding
|
||
- If the pattern of inter-node Tarzan traffic varied with usage, a wide-spread eavesdropper could analyze the patterns to link messages to their initiators. Prior work has suggested the use of cover traffic to provide more time-invariant traffic patterns independent of bandwidth demands
|
||
- Our key contributions include introducing the concept of a traffic mimic. We propose traffic invariants between a node and its mimics that protect against information leakage. These invariants require some use of cover traffic and yield an anonymity set exponential in path length.
|
||
***** Selecting Mimics
|
||
- Upon joining the network, node a asks k other nodes to exchange mimic traffic with it. Similarly, an expected k nodes select a as they look for their own mimics. Thus, each node has 2k mimics in expectation.
|
||
- Mimics are assigned verifiably at random from the set of nodes in the network.
|
||
- A node establishes a bidirectional, time-invariant packet stream with a mimic node, into which real data can be inserted, indistinguishable from the cover traffic.
|
||
***** Tunneling through mimics
|
||
- We constrain a tunnel initiator’s choice of relays at each hop to those mimics of the previous hop, instead of allowing it to choose any random node in the network. Therefore, nodes only construct tunnels over links protected by cover traffic
|
||
***** Unifying traffic patterns
|
||
- The packet headers, sizes, and rates of a node’s incoming traffic from its mimics must be identical to its outgoing traffic, so that an eavesdropper cannot conclude that the node originated a message.
|
||
* Cloud Computing
|
||
** Fog Computing and Its Role in Internet of Things
|
||
*** Abstract
|
||
- Fog Computing extends the Cloud Computing paradigm to the edge of the network, thus enabling a new breed of applications and services.
|
||
- Defining characteristics of the Fog are:
|
||
1) Low latency and location awareness;
|
||
2) Wide-spread geographical distribution;
|
||
3) Mobility;
|
||
4) Very large number of nodes;
|
||
5) Predominant role of wireless access;
|
||
6) Strong presence of streaming and real-time applications;
|
||
7) Heterogeneity.
|
||
- The Fog is the appropriate platform for a number of critical Internet of Things (IoT) services and applications, namely Connected Vehicle, Smart Grid, Smart Cities, and, in general, Wireless Sensors and Actuators Networks (WSANs).
|
||
*** Introduction
|
||
- The “pay-as-you-go” Cloud Computing model is an efficient alternative to owning and managing private data centers (DCs)
|
||
- Several factors contribute to the economy of scale of mega DCs: higher predictability of massive aggregation, which allows higher utilization without degrading performance; convenient location that takes advantage of inexpensive power; and lower OPEX (operating expenses) achieved through the deployment of homogeneous compute, storage, and networking components.
|
||
- This bliss becomes a problem for latency-sensitive applications
|
||
- An emerging wave of Internet deployments, most notably the Internet of Things (IoTs), requires mobility support and geo-distribution in addition to location awareness and low latency.
|
||
- A new platform is needed to meet requirements of IoT
|
||
- Fog Computing is the answer.
|
||
- The fog is a cloud close to the ground
|
||
- Rather than Fog Computing displacing Cloud Computing, the two can interplay when it comes to data management and analytics
|
||
*** The Fog Computing Platform
|
||
**** Characterization of Fog Computing
|
||
***** How fog differs from cloud
|
||
- Fog Computing is a highly virtualized platform that provides compute, storage, and networking services between end devices and traditional Cloud Computing Data Centers
|
||
- Entirely wireless
|
||
- Typically located at the edge of the network
|
||
- Edge location, location awareness, and low latency. The origins of the Fog come from the need to support endpoints with rich services at the edge of the network, for applications with low-latency requirements.
|
||
- Geographical distribution. Services and applications targeted by the Fog demand widely distributed deployments. Like high quality streaming to moving vehicles through proxies and APs along highways and tracks.
|
||
- Large-Scale sensor networks to monitor the environment will also require distributed computing and storage resources
|
||
- Very large number of nodes, as a consequence of the wide geo-distribution, as evidenced in sensor networks in general, and the Smart Grid in particular.
|
||
- Support for mobility. It is essential for many Fog applications to communicate directly with mobile devices, and therefore support mobility techniques
|
||
- Real-time interactions. Important Fog applications involve real-time interactions rather than batch processing. (Cloud is really nice for batch processing)
|
||
- Predominance of wireless access.
|
||
- Heterogeneity. Fog nodes come in different form factors and will be deployed in all sorts of environments
|
||
- Interoperability and federation. Seamless support of certain services (streaming is a good example) requires the cooperation of different providers.
|
||
- Support for on-line analytic and interplay with the Cloud. The Fog is positioned to play a significant role in the ingestion and processing of the data close to the source.
|
||
**** Fog Players: Providers and Users
|
||
- It is not yet known how the different Fog Computing players will align; the following are anticipations
|
||
- Subscriber models
|
||
- More people will enter the competition.
|
||
*** Fog Computing and the IoT
|
||
**** Connected Vehicle (CV)
|
||
- The Fog has a number of attributes that make it the ideal platform to deliver a rich menu of SCV services in infotainment, safety, traffic support, and analytics: geo-distribution (throughout cities and along roads), mobility and location awareness, low latency, heterogeneity, and support for real-time interactions.
|
||
**** Wireless Sensors and Actuators Networks
|
||
- The original Wireless Sensor Nodes (WSNs), nicknamed motes, were designed to operate at extremely low power to extend battery life or even to make energy harvesting feasible. Most of these WSNs involve a large number of low bandwidth, low energy, low processing power, small memory motes, operating as sources of a sink (collector), in a unidirectional fashion
|
||
- The characteristics of the Fog (proximity and location awareness, geo-distribution, hierarchical organization) make it the suitable platform to support both energy-constrained WSNs and WSANs.
|
||
+ A WSAN is a wireless sensor and actuator network. The issue is that the system is no longer entirely uni-directional, as signals need to be sent to the actuators from controllers.
|
||
+ Current solutions consist of a WSN and a MANET.
|
||
*** Analytics and the interplay between the fog and the cloud
|
||
- While Fog nodes provide localization, therefore enabling low latency and context awareness, the Cloud provides global centralization. Many applications require both Fog localization, and Cloud globalization, particularly for analytics and Big Data.
|
||
- Fog collectors at the edge ingest the data generated by grid sensors and devices. Some of this data relates to protection and control loops that require real-time processing (from milliseconds to sub seconds).
|
||
- This first tier of the Fog, designed for machine-to-machine (M2M) interaction, collects and processes the data, and issues control commands to the actuators. It also filters the data to be consumed locally, and sends the rest to the higher tiers
|
||
- The second and third tiers deal with visualization and reporting, i.e. human-to-human interaction.
|
||
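A minimal sketch of the first-tier split between the real-time control path and the data forwarded upstream. The function, the readings, and the threshold are all hypothetical illustrations, not from the paper:

```python
def fog_tier1(readings, limit):
    """Tier-1 fog node: handle the M2M control loop locally and forward
    only a compact summary to the higher tiers / cloud."""
    commands = []
    for r in readings:
        if r > limit:                        # sub-second protection/control path
            commands.append(("throttle", r))
    summary = {"count": len(readings), "max": max(readings)}
    return commands, summary                 # summary goes upstream, bulk stays local

cmds, upstream = fog_tier1([3.1, 9.7, 4.2], limit=8.0)
```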
|
||
** EdgeIoT: A Mobile Edge Computing for the Internet of Things
|
||
*** Abstract
|
||
- In order to overcome the scalability problem of the traditional Internet of Things architecture (i.e., data streams generated from distributed IoT devices are transmitted to the remote cloud via the Internet for further analysis), this article proposes a novel approach to mobile edge computing for the IoT architecture, edgeIoT, to handle the data streams at the mobile edge
|
||
*** Introduction
|
||
- Although IoT can potentially benefit all of society, many technical issues remain to be addressed.
|
||
1) First, the data streams generated by the IoT devices are high in volume and fast in velocity (the European Commission has predicted that there will be up to 100 billion smart devices connected to the Internet by 2020)
|
||
2) The data generated by IoT is sent to the cloud for processing via the Internet. However, the Internet is not scalable and efficient enough to handle this data. Additionally, it consumes a lot of bandwidth, energy, and time to send all of the data to the cloud
|
||
3) since the IoT big data streams are transmitted to the cloud in high volume and at fast velocity, it is necessary to design an efficient data processing architecture to explore the valuable information in real time
|
||
4) user privacy remains a challenging unsolved issue; that is, in order to obtain services and benefits, users should share their sensed data with IoT service providers, and these sensed data may contain users’ personal information.
|
||
- we propose an efficient and flexible IoT architecture, edgeIoT, by leveraging fog computing and software defined networking (SDN) to collect, classify, and analyze the IoT data streams at the mobile edge
|
||
- Bringing computing resources close to IoT devices minimise traffic in core network and minimise end-to-end delay between computing resources and IoT devices.
|
||
- A hierarchical fog computing architecture to provide flexible and scalable resource provisioning for users of IoT
|
||
- A proxy virtual machine migration scheme to minimise traffic in core network
|
||
*** Mobile Edge Computing for IoTs
|
||
- Fog computing, which is defined as a distributed computing infrastructure containing a bunch of high-performance physical machines (PMs) that are well connected with each other
|
||
- deploying a number of fog nodes in the network can locally collect, classify, and analyze the raw IoT data stream
|
||
- where to deploy the fog nodes to facilitate the communications between IoT devices and fog nodes is still an open issue
|
||
- It is difficult to optimize the deployment of fog nodes due to the mobility and heterogeneity features of the IoT devices.
|
||
+ Phones and wearables move around, and some Things have more energy and transmission speed than others. As such, different protocols need to be supported.
|
||
**** Multi-Interface Base Stations in Cellular Network
|
||
- Existing Base Stations may be used, as they provide high coverage and they are distributed to potentially connect all IoT devices.
|
||
- They can be equipped with multiple wireless interfaces, to support the different data transmission requirements.
|
||
- Therefore, a potential deployment is to connect each BS to a fog node to process the aggregated raw data streams.
|
||
**** The EdgeIoT Architecture
|
||
- Fog nodes can either be directly connected to a base station for minimum end-to-end (E2E) delay when transmitting local data streams
|
||
- Can also be deployed at the cellular core network, such that different BSs can share the same node
|
||
- Doesn't use traditional cellular core network, as this is slow, inflexible and unscalable
|
||
- Introduces an SDN-based (software-defined networking) cellular core.
|
||
+ SDN is meant to address the fact that the static architecture of traditional networks is decentralized and complex, while current networks require more flexibility and easier troubleshooting.
|
||
- Uses OpenFlow controllers, as these can be used to separate all control functions from the data forwarding function
|
||
- Essentially, SDN and OpenFlow add more flexibility to control the network traffic.
|
||
- Fog nodes can offload work to the cloud at the expense of bandwidth
|
||
- IoT applications can also be deployed in the fog nodes, to offer services to users. This allows a lower-latency experience for the user.
|
||
**** Hierarchical Fog Computing Architecture
|
||
- Most user-generated data contains personal information
|
||
- Analysis of this data can benefit both the user and society (e.g., analysing images can help recognise criminals)
|
||
- User privacy needs to be preserved
|
||
- Each user is associated with a proxy VM, the user's private VM, situated at a nearby fog node
|
||
+ Provides flexible storage and computing resources
|
||
- The user's IoT devices are registered to this VM, which collects their raw data. The VM can process the data and ensure privacy.
|
||
- The Proxy VMs can be dynamic, if the user moves, the VM relocates to a new fog node.
|
||
- Can also be static, or split into two where one is static and the other is dynamic (if one is needed for the devices at home).
|
||
+ The two can of course merge again when the user gets home.
|
||
- There are also application VMs for the applications in the fog node. These are the ones that need the data from the proxy VMs.
|
||
- An application VM can be deployed in local, remote, or add-on mode
|
||
***** Local application VM deployment
|
||
- refers to the deployment of an application VM in the fog node to analyze the metadata generated by the local proxy VMs
|
||
- Parknet example: each local proxy VM collects the sensed data streams from its smart cars (note that each smart car is equipped with a GPS receiver and a passenger-side-facing ultrasonic range finder to generate the location and parking spot occupancy information) and generates the metadata, which identify the available parking spots, to the application VM.
|
||
***** Remote application VM deployment
|
||
- refers to the deployment of an application VM in the remote cloud to analyze the metadata generated by the proxy VMs from different fog nodes.
|
||
- If the application needs data from several fog nodes, i.e. a larger area.
|
||
- Perhaps to do traffic analysis and find hot spots where people shouldn't drive.
|
||
***** Add-on application VM deployment
|
||
- Event-triggered application VM deployment
|
||
- If an event triggers, several application VMs can be deployed quickly onto many fog nodes, e.g., to find missing children.
|
||
*** Challenges in Implementing EdgeIoT
|
||
**** Identification between IoT Devices and their Proxy VMs
|
||
- Each device needs to know the ID of the proxy VM, but the Proxy VM also needs to know the IDs of the devices.
|
||
- This can consume a lot of bandwidth, due to VM composition and decomposition (i.e., combining and splitting).
|
||
- Even if it doesn't happen often, the VM needs to inform each device when this happens.
|
||
**** Proxy VM mobility management
|
||
- When a user moves to a new BS, it should report its new location to the mobility management entity (MME), which handles the VM migration, compositions and decompositions.
|
||
- The MME resides in the OpenFlow control layer
|
||
- Adopting the existing mobility management of the LTE network is one solution; however, it requires all devices to have a SIM card for identification.
|
||
+ The location update protocol in LTE is also expensive.
|
||
- one alternative is to establish a local cluster network (e.g., a body area network) consisting of mobile IoT devices. The user’s mobile phone or other wearable device acts as a cluster head, which can be considered as a gateway for the reporting of locations and data aggregation.
|
||
**** Migrating VMs
|
||
- It is necessary to estimate the profit of migrating the proxy VM among the fog nodes whenever the user’s mobile IoT devices roam to a new BS.
|
||
+ Required since simply migrating whenever a new BS is entered can be expensive, if the user will not stay there long or may not even have fully left the old one.
|
||
*** Conclusion
|
||
- This article proposes a new architecture, edgeIoT, in order to efficiently handle the raw data streams generated from the massive distributed IoT devices at the mobile edge. The proposed edgeIoT architecture can substantially reduce the traffic load in the core network and the E2E delay between IoT devices and computing resources compared to the traditional IoT architecture, and thus facilitate IoT services provisioning. Moreover, this article has raised three challenges in implementing the proposed edgeIoT architecture and has provided potential solutions.
|
||
** On the Integration of Cloud Computing and IoT
|
||
*** Introduction and Motivation
|
||
- Cloud computing has virtually unlimited capabilities in terms of storage and processing power, is a much more mature technology, and has most of the IoT issues at least partially solved
|
||
- CloudIoT is the merger of Cloud and IoT technologies.
|
||
*** CLOUD AND IOT: THE NEED FOR THEIR INTEGRATION
|
||
- IoT can benefit from the virtually unlimited capabilities and resources of Cloud to compensate its technological constraints
|
||
- On the other hand, the Cloud can benefit from IoT by extending its scope to deal with real world things in a more distributed and dynamic manner, and for delivering new services in a large number of real life scenarios.
|
||
- Essentially, the Cloud acts as intermediate layer between the things and the applications, where it hides all the complexity and the functionalities necessary to implement the latter.
|
||
**** Storage resources
|
||
- IoT involves a lot of unstructured or semi-structured data, similar to Big Data
|
||
+ It's a huge amount
|
||
+ Different data types
|
||
+ Data generation frequency is high
|
||
- The Cloud is the most convenient and cost-effective solution to deal with this data
|
||
- Data can be provided in a homogeneous way in the cloud and can be secured
|
||
**** Computation Resources
|
||
- IoT devices have quite limited processing power, so they can't usually do anything on-board.
|
||
- Collected data is usually transmitted to more powerful nodes, where aggregation and processing are possible.
|
||
+ Not scalable
|
||
- The Cloud offers almost unlimited processing power and can as such be utilised for predictions, computations, and more
|
||
**** Communication resources
|
||
- The Cloud can act as a kind of gateway, since it offers an effective and cheap solution to connect, track, and manage any Thing from anywhere, using customized portals and built-in apps.
|
||
**** New capabilities
|
||
- IoT is characterized by a very high heterogeneity of devices, technologies, and protocols. Therefore, scalability, interoperability, reliability, efficiency, availability, and security can be very difficult to obtain.
|
||
**** New Paradigms
|
||
- Sensing, actuation, database, data, ethernet, video surveillance as a service and some more.
|
||
*** Applications
|
||
**** Healthcare
|
||
- IoT and multimedia technologies have made their entrance in the healthcare field thanks to ambient-assisted living and telemedicine
|
||
**** Smart City
|
||
- IoT can provide a common middleware for future-oriented Smart-City services, acquiring information from different heterogeneous sensing infrastructures, accessing all kinds of geo-location and IoT technologies (e.g., 3D representations through RFID sensors and geo-tagging), and exposing information in a uniform way.
|
||
**** Video Surveillance
|
||
- Intelligent video surveillance has become a tool of the greatest importance for several security-related applications. As an alternative to in-house, self-contained management systems, complex video analytics require Cloud-based solutions, VSaaS, to properly satisfy the requirements of storage (e.g., stored media is centrally secured, fault-tolerant, on-demand, scalable, and accessible at high-speed) and processing (e.g., video processing, computer vision algorithms and pattern recognition modules to extract knowledge from scenes).
|
||
* Pastry and its applications
|
||
** PAST
|
||
- Large-scale peer-to-peer persistent storage utility
|
||
- Based on a self-organizing, Internet-based overlay network of storage nodes that cooperatively route file queries, store multiple replicas of files, and cache additional copies of popular files.
|
||
- Storage nodes and files are assigned uniformly distributed IDs
|
||
- Files are stored at the nodes whose IDs match the fileId best
|
||
- Statistical assignment balances number of files stored
|
||
- Non-uniform storage node capacities require explicit load balancing
|
||
- Same goes for non-uniform file requests, i.e. popular files. They require caching.
|
||
- Peer-to-peer Internet applications have recently been popularized through file sharing applications such as Napster, Gnutella and FreeNet
|
||
- P2P systems are interesting because of decentralised protocol, self-organization, adaption and scalability.
|
||
- P2P systems can be characterized as distributed systems where all nodes are equal in capabilities and responsibilities.
|
||
- PAST has nodes connected to the Internet, where each node can initiate and route client requests to insert or retrieve files.
|
||
- Nodes can contribute storage to system
|
||
- With high probability, a file is replicated on nodes which are diverse in geographic location, ownership, administration, network connectivity, rule of law, etc.
|
||
- PAST is attractive since it exploits the multitude and diversity of nodes (geography, ownership, administration, etc.)
|
||
- Offers persistent storage service via a quasi-unique fileId which is generated when the file is inserted.
|
||
+ This makes files stored immutable, since a file can't be inserted with the same fileId
|
||
- Owner can share fileId to share file and a decryption key if needed
|
||
- Pastry ensures that client requests are reliably routed to the appropriate nodes.
|
||
- Clients requesting to retrieve a file are routed, most likely, to a node that is "close in the network", to the requesting client.
|
||
- As the system is based on pastry, the nodes traversed and the amount of messages sent, is logarithmic in size, given normal operation.
|
||
- PAST does not provide searching, directory lookup or key distribution.
|
||
- Any host on the internet can be a PAST node, by installing PAST
|
||
- A PAST node is minimally an access point for the user; it can also contribute storage and participate in the routing of requests
|
||
- Clients have a few operations:
|
||
1) fileId = Insert(name, owner-cred, k, file). fileId is the secure hash of the file's name, the owner's public key and some salt. k is the replication factor.
|
||
2) file = Lookup(fileId). If fileId exists in PAST and one of k replication nodes can be reached, the file is returned.
|
||
3) Reclaim(fileId, owner-cred). Reclaims the storage occupied by the k copies of the file identified by fileId. Once the operation completes, PAST no longer guarantees that the file can be found. Reclaim does not guarantee the file is deleted, but the space the file occupied may be overwritten.
|
||
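The fileId derivation and the "k numerically closest nodeIds" rule can be sketched as follows (SHA-1 stands in for the unspecified secure hash, and the helper names are hypothetical):

```python
import hashlib

RING = 2 ** 128          # circular nodeId namespace: 0 .. 2^128 - 1

def file_id(name: str, owner_pubkey: bytes, salt: bytes) -> int:
    """fileId = secure hash of the file's name, the owner's public key,
    and some salt; truncated to 128 bits like nodeIds."""
    digest = hashlib.sha1(name.encode() + owner_pubkey + salt).digest()
    return int.from_bytes(digest[:16], "big")

def ring_distance(a: int, b: int) -> int:
    """Numeric distance in the circular 128-bit namespace."""
    d = abs(a - b)
    return min(d, RING - d)

def replica_nodes(node_ids, fid: int, k: int):
    """The k nodes whose nodeIds are numerically closest to the fileId."""
    return sorted(node_ids, key=lambda n: ring_distance(n, fid))[:k]
```

Choosing a different salt yields a different fileId, which is what file diversion later exploits to move a file to another part of the nodeId space.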
- A PAST node has a 128-bit ID, the nodeId. This indicates its position in Pastry's circular namespace, which ranges from 0 to 2^128 - 1. The nodeId is the hash of the node's public key.
|
||
- No correlation between nodeId and information on the whereabouts and such of the node. (anonymity)
|
||
- As such, adjacent nodeIds are likely not close to each other geographically, so they make well for replication.
|
||
- Insert: PAST stores the file on the k numerically closest nodeIds to the 128 most significant bits of the fileId.
|
||
- While the random nature of nodeIds and fileIds distributes files uniformly across nodes, file sizes and node capacities differ, so nodes cannot all store the same number of files.
|
||
- In addition to the k replicas, PAST caches extra copies at nodes that have spare space. These cached copies may be discarded at any time.
|
||
- Pastry is a p2p routing substrate that is efficient, scalable, fault resilient and self-organizing.
|
||
- A file can be located unless all k nodes have failed simultaneously.
|
||
- Pastry can route in fewer than ceil(log_{2^b} N) steps on average.
|
||
- Eventual delivery is guaranteed unless l/2 nodes with adjacent nodeIds fail at the same time. l is typically 32.
|
||
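The logarithmic routing bound is easy to make concrete; this small sketch uses b = 4, the value commonly assumed for Pastry:

```python
import math

def expected_hops(n_nodes: int, b: int = 4) -> int:
    """Pastry's routing bound: ceil(log_{2^b} N) hops."""
    return math.ceil(math.log(n_nodes, 2 ** b))

# For 100,000 nodes and b = 4: ceil(log16 100000) = 5 hops.
```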
- PAST's routing tables map nodeIds to the IP addresses of the nodes.
|
||
- If a node joins or leaves, the invariants of the tables can be restored using a logarithmic number of messages among the affected nodes.
|
||
- In line with the overall decentralized architecture of PAST, an important design goal for the storage management is to rely only on local coordination among nodes with nearby nodeIds, to fully integrate storage management with file insertion, and to incur only modest performance overheads related to storage management.
|
||
- PAST allows a node that is not one of the k numerically closest nodes to the fileId to alternatively store the file, if it is in the leaf set of one of those k nodes.
|
||
- replica diversion: The purpose is to accommodate differences in the storage capacity and utilization of nodes within a leaf set.
|
||
- file diversion is performed when a node's entire leaf set is reaching capacity. Its purpose is to achieve more global load balancing across large portions of the nodeId space. A file is diverted to a different part of the nodeId space by choosing a different salt in the generation of its fileId.
|
||
- Replica diversion aims at balancing the remaining free storage space among the nodes in each leaf set. In addition, as the global storage utilization of a PAST system increases, file diversion may also become necessary to balance the storage load among different portions of the nodeId space.
|
||
- The purpose of replica diversion is to balance the remaining free storage space among the nodes in a leaf set.
|
||
+ A chooses a node B in its leaf set that is not among the k closest and does not already hold a diverted replica of the file. A asks B to store a copy on its behalf, then enters an entry for the file in its table with a pointer to B.
|
||
- Ensuring that the pointer does not disappear can be done by entering a pointer to the replica stored on B into the file table of C, the node with the (k+1)th closest nodeId to the fileId.
|
||
- PAST has three policies to control replica diversion:
|
||
1) acceptance of replicas into a node's local store
|
||
2) selecting a node to store a diverted replica
|
||
3) deciding when to divert a file to a different part of the nodeId space
|
||
- it is not necessary to balance the remaining free storage space among nodes as long as the utilization of all nodes is low.
- it is preferable to divert a large file rather than multiple small ones.
- a replica should always be diverted from a node whose remaining free space is significantly below average to a node whose free space is significantly above average
- when the free space gets uniformly low in a leaf set, it is better to divert the file into another part of the nodeId space than to attempt to divert replicas at the risk of spreading locally high utilization to neighboring parts of the nodeId space.
- The policy for accepting a file compares size_of_file / free_storage against a threshold: a file is rejected if it would consume a fraction of the node's remaining storage larger than some value t.
- Different thresholds t apply to the k closest nodes and to nodes not among the k closest. + The threshold is larger for nodes in the set of the k closest (i.e. it is more difficult for them to reject a file).
- the policy discriminates against large files, and decreases the size threshold above which files get rejected as the node's utilization increases.
- This bias minimizes the number of diverted replicas and tends to divert large files first, while leaving room for small files
- A primary store node N that rejects a replica needs to select another node to hold the diverted replica.
+ It chooses the node with the most remaining space among those that a) are in its leaf set, b) have a nodeId that is not one of the k closest to the fileId, and c) do not already hold a diverted replica of the file.
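The acceptance and selection policies above might look like this. A minimal sketch: the concrete threshold values and the `Node` record are assumptions, not PAST's actual parameters.

```python
from collections import namedtuple

# Toy node record: `free` is the node's remaining storage capacity.
Node = namedtuple("Node", "node_id free")

def accepts(file_size, free_space, is_primary,
            t_primary=0.1, t_diverted=0.05):
    """Accept a replica iff it would consume at most a fraction t of the
    node's remaining free space. The threshold is larger for the k
    numerically closest (primary) nodes, so they reject less readily;
    large files are rejected first as free space shrinks."""
    t = t_primary if is_primary else t_diverted
    return file_size / free_space <= t

def pick_divert_target(leaf_set, k_closest, already_holding):
    """Choose the node with the most remaining space among leaf-set nodes
    that are not among the k closest to the fileId and do not already
    hold a diverted replica of the file."""
    candidates = [n for n in leaf_set
                  if n not in k_closest and n not in already_holding]
    return max(candidates, key=lambda n: n.free, default=None)

# Example: node "b" has the most free space among eligible candidates.
a, b, c = Node("a", 500), Node("b", 900), Node("c", 300)
target = pick_divert_target([a, b, c], k_closest={a}, already_holding=set())
```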
- A file is diverted if the primary replica node and then the diverted replica node both fail to store it. + Nodes already holding a replica of the file are then told to drop it.
- The purpose of file diversion is to balance the remaining free storage space among different portions of the nodeId space in PAST.
+ File diversion is tried three times (so up to four inserts in total). If it still fails, the failure is reported to the user, who can then retry, e.g. after fragmenting the file to make it smaller.
- PAST maintains the invariant that k copies of each inserted file are maintained on different nodes within a leaf set.
- Recall that as part of the Pastry protocol, neighboring nodes in the nodeId space periodically exchange keep-alive messages. + If a node is unresponsive for a time T, the leaf sets of all affected nodes are adjusted: the missing node is replaced in every leaf set that contained it.
- If a node joins, it is inserted into the affected leaf sets.
- When a node joins a leaf set, either because it is new or because another node disappeared, it must acquire replicas of the files it is now responsible for: the files previously held by the node whose spot it took.
- To avoid the bandwidth overhead of requesting copies of all the needed files at join time (expensive),
+ the joining node may instead install a pointer in its file table, referring to the node that has just ceased to be one of the k numerically closest to the fileId, and requiring that node to keep the replica.
- When a PAST network is growing, node additions may create the situation where a node that holds a diverted replica and the node that refers to that replica are no longer part of the same leaf set. + To minimize the associated overhead of continually sending keep-alive messages, affected replicas are gradually migrated to a node within the referring node's leaf set whenever possible.
- When a node fails and storage utilisation is so high that the remaining nodes can't store additional replicas, PAST keeps its storage invariants by having a node ask the two most distant members of its leaf set (in the nodeId space) to locate a node in their respective leaf sets that can store the file. Since exactly half of the node's leaf set overlaps with each of these two nodes' leaf sets, a total of 2l nodes can be reached in this way (where l is the leaf set size).
- If total disk storage were to decrease due to node and disk failures that are not balanced out by the addition of new nodes, the system would eventually exhaust its storage and new replicas and files could no longer be inserted.
+ PAST addresses this problem by maintaining storage quotas, thus ensuring that demand for storage cannot exceed the supply.
- Goals of cache management:
1) minimize access latency (here routing distance); fetch distance is measured in terms of Pastry routing hops
2) maximize throughput
3) balance the query load in the system
- The k replicas ensure availability, but also give some load balancing and latency reduction because of the locality properties of Pastry
- PAST nodes use the "unused" portion of their advertised disk space to cache files. Cached copies can be evicted and discarded at any time. In particular, when a node stores a new primary or redirected replica of a file, it typically evicts one or more cached files to make room for the replica.
- A file is cached in PAST at a node traversed in lookup or insert operations, if the file size is less than some fraction of the node's remaining cache size
- Nodes track the cache hits of the files they have cached. Each cached file is weighted based on its hit count and its size, and the file with the least weight is evicted first.
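A minimal sketch of the caching policy described above. The fraction `c` and the exact hits-per-size weight are assumptions; the notes only say the weight combines hit count and file size.

```python
def should_cache(file_size, free_cache, c=0.5):
    """Cache a file passing by during lookup/insert iff it is smaller
    than some fraction c of the node's remaining cache space
    (the value of c is made up for illustration)."""
    return file_size < c * free_cache

def eviction_order(cached):
    """cached: list of (name, size, hits) tuples. Weight each file by
    its hit count relative to its size, so large files with few hits
    go first; return files in eviction order (least weight first)."""
    return sorted(cached, key=lambda f: f[2] / f[1])

# "c" is large (50 units) but rarely hit, so it is evicted first.
cache = [("a", 100, 5), ("b", 10, 5), ("c", 50, 1)]
victim = eviction_order(cache)[0][0]
```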
- Experiments run with replica diversion and file diversion disabled, to demonstrate the need for explicit load balancing, showed that the need was definitely there: + a lot of insertions were rejected,
+ even at a low level of overall storage utilisation
- Caching helps things a lot
** Scribe
- Scalable application-level multicast infrastructure.
- Scribe supports large numbers of groups
- Built on top of Pastry
- Pastry is used to create and manage groups and to build efficient multicast trees for the dissemination of messages to each group.
- the use of multicast in applications has been limited because of the lack of wide scale deployment and the issue of how to track group membership.
- In this paper we present Scribe, a large-scale, decentralized application-level multicast infrastructure built upon Pastry, a scalable, self-organizing peer-to-peer location and routing substrate with good locality properties
- Scribe builds a multicast tree, formed by joining the Pastry routes from each group member to a rendez-vous point associated with a group.
- Each entry in the routing table of Pastry refers to one of potentially many nodes whose nodeIds have the appropriate prefix. Among such nodes, the one closest to the present node (according to a scalar proximity metric, such as the round trip time) is chosen.
***** Pastry Locality
- The proximity metric is a scalar value that reflects the “distance” between any pair of nodes, such as the round trip time. It is assumed that a function exists that allows each Pastry node to determine the “distance” between itself and a node with a given IP address.
- The short routes property concerns the total distance, in terms of the proximity metric, that messages travel along Pastry routes. Recall that each entry in the node routing tables is chosen to refer to the nearest node, according to the proximity metric, with the appropriate nodeId prefix.
- The route convergence property is concerned with the distance traveled by two messages sent to the same key before their routes converge.
*** Scribe
- Any Scribe node may create a group; other nodes can then join the group, or multicast messages to all members of the group
- No particular delivery order of messages
- Stronger reliability and delivery guarantees can be built on top of Scribe
- Nodes can create, send messages to, and join many groups.
- Groups may have multiple sources of multicast messages and many members.
- Scribe uses Pastry to manage group creation, group joining and to build a per-group multicast tree used to disseminate the messages multicast in the group.
- The Scribe software on each node provides the forward and deliver methods, which are invoked by Pastry whenever a Scribe message arrives.
- Recall that the forward method is called whenever a Scribe message is routed through a node
- The deliver method is called when a Scribe message arrives at the node with nodeId numerically closest to the message’s key, or when a message was addressed to the local node using the Pastry send operation.
- The possible message types in Scribe are JOIN, CREATE, LEAVE and MULTICAST
- Each group has a unique groupId.
- The Scribe node with a nodeId numerically closest to the groupId acts as the rendez-vous point for the associated group. The rendez-vous point is the root of the multicast tree created for the group.
- To create a group, a Scribe node asks Pastry to route a CREATE message using the groupId as the key + Pastry delivers this message to the node with the nodeId numerically closest to groupId.
+ This Scribe node becomes the rendez-vous point for the group.
- The groupId is the hash of the group’s textual name concatenated with its creator’s name.
- Alternatively, we can make the creator of a group be the rendez-vous point for the group as follows: a Pastry nodeId can be the hash of the textual name of the node, and a groupId can be the concatenation of the nodeId of the creator and the hash of the textual name of the group.
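Both groupId schemes can be sketched as follows. SHA-1 and the function names are assumptions for illustration; in the creator-rooted scheme, prefix routing on the leading 160 bits of the groupId delivers to the creator's own node, making it the rendez-vous point.

```python
import hashlib

def node_id(node_name: str) -> bytes:
    """nodeId as the hash of the node's textual name (alternative scheme)."""
    return hashlib.sha1(node_name.encode()).digest()

def group_id_default(group_name: str, creator: str) -> bytes:
    """Default scheme: hash of the group's textual name concatenated
    with its creator's name."""
    return hashlib.sha1((group_name + creator).encode()).digest()

def group_id_creator_rooted(group_name: str, creator: str) -> bytes:
    """Alternative scheme: the creator's nodeId followed by the hash of
    the group name, so the creator is numerically closest to the
    groupId and becomes the rendez-vous point."""
    return node_id(creator) + hashlib.sha1(group_name.encode()).digest()
```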
- This alternative can improve performance with a good choice of creator: link stress and delay will be lower if the creator sends to the group often, or is close in the network to other frequent senders or many group members.
- Scribe creates a multicast tree, rooted at the rendez-vous point, to disseminate the multicast messages in the group.
- The tree is created using a scheme similar to reverse path forwarding (??)
- The tree is formed by joining the Pastry routes from each group member to the rendez-vous point.
- Scribe nodes that are part of a group’s multicast tree are called forwarders with respect to the group; they may or may not be members of the group.
- Each forwarder maintains a children table for the group containing an entry (IP address and nodeId) for each of its children in the multicast tree.
***** Joining
- When a Scribe node wishes to join a group, it asks Pastry to route a JOIN message with the group’s groupId as the key + This will get routed to the rendez-vous point (node)
+ At each node along the route, Pastry invokes Scribe’s forward method. + Forward checks its list of groups to see if it is currently a forwarder; if so, it accepts the node as a child, adding it to the children table. If the node is not already a forwarder, it creates an entry for the group, and adds the source node as a child in the associated children table
+ It then becomes a forwarder for the group by sending a JOIN message to the next node along the route from the joining node to the rendez-vous point
1) The peer sends a “join” message towards the groupID
2) An intermediate node forwarding this message will:
- If the node is currently a forwarder for the group it adds the sending peer as a child and we’re done.
- If the node is not a forwarder, it adds the sender as a child and then sends its own “join” message towards the groupID, thus becoming a forwarder for the group
3) Every node in a group’s multicast tree is a forwarder – but this does not mean that it is also a member
- So the rendez-vous point doesn't handle all requests, if there are forwarders on the path
***** Leaving
- The peer locally marks that it is no longer a member but merely a forwarder
- It then proceeds to check whether it has any children in the group + and if it hasn’t it sends a “leave” message to its parent (which continues recursively up the tree if necessary)
+ otherwise it stays on as a forwarder
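The join/leave mechanics above can be condensed into a toy sketch. Here a `next_hop` chain stands in for Pastry routing toward the rendez-vous point, and the `ScribeNode` class and all method names are invented for illustration.

```python
class ScribeNode:
    """Toy Scribe forwarder. `next_hop` is the next node toward the
    group's rendez-vous point (None at the rendez-vous point itself)."""

    def __init__(self, name, next_hop=None):
        self.name = name
        self.next_hop = next_hop
        self.children = {}            # group -> set of child ScribeNodes
        self.member = set()           # groups this node has joined

    def join(self, group, child=None):
        was_forwarder = group in self.children
        table = self.children.setdefault(group, set())
        if child is None:
            self.member.add(group)    # local JOIN by this node itself
        else:
            table.add(child)          # accept the sender as a child
        if not was_forwarder and self.next_hop is not None:
            # Not yet a forwarder: propagate the JOIN toward the root.
            self.next_hop.join(group, child=self)

    def leave(self, group):
        self.member.discard(group)
        self._maybe_prune(group)

    def _maybe_prune(self, group):
        # A non-member node with no children drops out of the tree.
        if not self.children.get(group) and group not in self.member:
            self.children.pop(group, None)
            if self.next_hop is not None:
                self.next_hop._drop_child(group, self)

    def _drop_child(self, group, child):
        self.children.get(group, set()).discard(child)
        self._maybe_prune(group)

    def deliver(self, group, msg, log):
        # Disseminate down the multicast tree; members record delivery.
        if group in self.member:
            log.append((self.name, msg))
        for child in list(self.children.get(group, ())):
            child.deliver(group, msg, log)

# Build a 3-level chain: a, b -> mid -> root (the rendez-vous point).
root = ScribeNode("root")
mid = ScribeNode("mid", next_hop=root)
a = ScribeNode("a", next_hop=mid)
b = ScribeNode("b", next_hop=mid)
a.join("g"); b.join("g")

log = []
root.deliver("g", "hello", log)       # reaches both members via mid

a.leave("g")                          # "a" prunes itself from the tree
log_after = []
root.deliver("g", "bye", log_after)   # now only "b" receives
```

Note how "mid" becomes a forwarder without being a member, and how the second join stops at "mid" instead of reaching the root, matching the bullet that the rendez-vous point does not handle all requests.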
***** Sending messages
- Multicast sources use Pastry to locate the rendez-vous point of a group: they route to the rendez-vous point + They ask it for its IP
- The IP is cached and can now be used to multicast to the group, avoiding further routing through Pastry to reach the rendez-vous node.
- If this point fails, a new one has to be found.
- There is a single multicast tree for each group and all multicast sources use the above procedure to multicast messages to the group.
+ The rendez-vous point can perform access control, as it is a centralised point through which all messages must go
- Thus, Pastry is not used for data traffic, only for handling joining and leaving of the tree
- Top-down
***** Reliability
- Scribe provides only best-effort delivery of messages but it offers a framework for applications to implement stronger reliability guarantees.
- Uses TCP
***** Repairing
- Periodically, each non-leaf node in the tree sends a heartbeat message to its children.
- Multicast messages are implicit heartbeats (if you can send a message, you're alive!)
- A child suspects that its parent is faulty when it fails to receive heartbeat messages.
- Upon detection of the failure of its parent, a node calls Pastry to route a JOIN message to the group’s identifier. Pastry will route the message to a new parent, thus repairing the multicast tree. + I.e., the node "joins" again and gets a new parent (forwarder)
- The state associated with the rendez-vous point, which identifies the group creator and has an access control list, is replicated across the k closest nodes to the root node in the nodeId space (where a typical value of k is 5). It should be noted that these nodes are in the leaf set of the root node. If the root fails, its immediate children detect the failure and join again through Pastry. Pastry routes the join messages to a new root (the live node with the numerically closest nodeId to the groupId). This new root will be among these k nodes, and as such, it will have the state.
***** Evaluations
- slightly longer delay due to more routing, I'd imagine
+ Helps that Pastry uses short routes
- Significantly less node stress, due to the tree structure, so work is nicely delegated
- Decent link stress, due to the tree structure and route convergence: group members which are close tend to be children of the same parent, so only a single copy needs to reach the parent, which can then forward it to both children.