% Created 2018-12-13 Thu 23:57
\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{fixltx2e}
\usepackage{graphicx}
\usepackage{longtable}
\usepackage{float}
\usepackage{wrapfig}
\usepackage{rotating}
\usepackage[normalem]{ulem}
\usepackage{amsmath}
\usepackage{textcomp}
\usepackage{marvosym}
\usepackage{wasysym}
\usepackage{amssymb}
\usepackage{hyperref}
\tolerance=1000
\usepackage{minted}
\author{alex}
\date{\today}
\title{notes}
\hypersetup{
pdfkeywords={},
pdfsubject={},
pdfcreator={Emacs 25.2.2 (Org mode 8.2.10)}}
\begin{document}

\maketitle
\tableofcontents

\section{Structured P2P Networks}
\label{sec-1}
TODO: potentially read through all of the experiments performed in the Pastry paper (low priority), as well as the math in the Kademlia paper.
\subsection{Chord}
\label{sec-1-1}
\subsubsection{Introduction}
\label{sec-1-1-1}
\begin{itemize}
\item A fundamental problem that confronts peer-to-peer applications is to efficiently locate the node that stores a particular data item.
\item Chord provides support for just one operation: given a key, it maps the key onto a node.
\item Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data item pair at the node to which the key maps.
\item Peer-to-peer systems and applications are distributed systems without any centralised control or hierarchical organisation, in which each peer is equivalent in functionality.
\item Peer-to-peer applications can offer many features, such as redundant storage, permanence, selection of nearby servers, anonymity, search, authentication and hierarchical naming (note that the structure is still peer-to-peer; the names are just attributes and data the peers hold).
\item The core operation in most P2P systems is efficient location of data items.
\item Chord is a scalable protocol for lookup in a dynamic peer-to-peer system with frequent node arrivals and departures.
\item Chord uses a variant of consistent hashing to assign keys to Chord nodes.
\item Consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots.
\item Additionally, consistent hashing tends to balance load, as each node receives roughly the same number of keys.
\item Each Chord node needs ``routing'' information about only a few other nodes, leading to better scaling.
\item Each node maintains information about $O(\log N)$ other nodes and resolves lookups via $O(\log N)$ messages. A node joining or leaving the network results in no more than $O(\log^2 N)$ messages.
\item Chord's performance degrades gracefully when the information in the nodes' routing tables is out of date. It is difficult to keep all $O(\log N)$ pieces of state consistent, so Chord only requires one piece of information per node to be correct in order to guarantee correct routing.
\item Finger tables are only forward-looking.
\item I.e., messages arriving at a peer tell it nothing useful; knowledge must be gained explicitly.
\item Rigid routing structure.
\item Locality is difficult to establish.
\end{itemize}
\subsubsection{System Model}
\label{sec-1-1-2}
\begin{itemize}
\item Chord simplifies the design of P2P systems and applications based on it by addressing the following problems:
\begin{enumerate}
\item \textbf{Load balance:} Chord acts as a distributed hash function, spreading keys evenly over the nodes, which provides a natural load balance.
\item \textbf{Decentralization:} Chord is fully distributed. This improves robustness and is well suited to loosely-organised P2P applications.
\item \textbf{Scalability:} The cost of a lookup grows as the log of the number of nodes.
\item \textbf{Availability:} Chord automatically adjusts its internal tables to reflect newly joined nodes as well as node failures, ensuring that, barring major failures in the underlying network, the node responsible for a key can always be found. This is true even if the system is in a continuous state of change.
\item \textbf{Flexible naming:} Chord places no constraints on the structure of the keys it looks up.
\end{enumerate}
\end{itemize}
\begin{enumerate}
\item Use cases of Chord
\label{sec-1-1-2-1}
\begin{itemize}
\item Cooperative Mirroring: Essentially a load balancer.
\item Time-Shared Storage: If a person wishes some data to be always available but their machine is only occasionally available, they can offer to store others’ data while they are up, in return for having their data stored elsewhere when they are down.
\item Distributed Indexes: A key in this application could be a few keywords, and the values would be machines offering documents with those keywords.
\item Large-scale Combinatorial Search: Keys are candidate solutions to the problem; Chord maps these keys to the machines responsible for testing them as solutions.
\end{itemize}
\end{enumerate}
\subsubsection{The Base Chord Protocol}
\label{sec-1-1-3}
\begin{itemize}
\item The Chord protocol specifies how to find the locations of keys, how new nodes join the system, and how to recover from the failure (or planned departure) of existing nodes.
\end{itemize}
\begin{enumerate}
\item Overview
\label{sec-1-1-3-1}
\begin{itemize}
\item Chord improves the scalability of consistent hashing by avoiding the requirement that every node know about every other node.
\end{itemize}
\item Consistent Hashing
\label{sec-1-1-3-2}
\begin{itemize}
\item The consistent hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1. A node’s identifier is chosen by hashing the node’s IP address, while a key identifier is produced by hashing the key.
\item Identifiers are ordered on an identifier circle modulo $2^m$.
\item Key k is assigned to the first node whose identifier is equal to or follows (the identifier of) k in the identifier space.
\item This node is called the successor node of key k, succ(k). It is the first node clockwise from k if identifiers are represented as a circle.
\item To maintain the consistent hashing mapping when a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n. When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
\item The claims about the efficiency of consistent hashing rely on the identifiers being chosen uniformly at random. SHA-1 is deterministic, as are all hash functions. As such, an adversary could in theory pick a set of keys whose identifiers lie close to each other and thus force a single node to carry a lot of files, ruining the balance. However, it is considered difficult to invert these hash functions, so an adversary cannot easily produce files with specific hashes.
\item When consistent hashing is implemented as described above, the theorem proves a bound of $\epsilon = O(\log N)$, where each node is responsible for at most $(1+\epsilon)K/N$ keys. The consistent hashing paper shows that $\epsilon$ can be reduced to an arbitrarily small constant by having each node run $O(\log N)$ “virtual nodes”, each with its own identifier.
\item The number of virtual nodes needed is difficult to determine in advance, as the load on the system is unknown a priori.
\end{itemize}
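The key-to-node assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names \texttt{identifier}, \texttt{successor} and \texttt{moved\_keys} and the tiny 4-bit ring are made up for the example and are not part of Chord itself.

\begin{minted}{python}
import hashlib
from bisect import bisect_left


def identifier(name, m=160):
    """Hash a node address or key name onto the identifier circle modulo 2^m."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids, key_id):
    """Return the first node identifier equal to or following key_id on the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]  # wrap around past the largest identifier


def moved_keys(old_nodes, new_nodes, m):
    """Identifiers whose responsible node differs between two node sets."""
    return [k for k in range(2 ** m)
            if successor(old_nodes, k) != successor(new_nodes, k)]


# Tiny 4-bit example (assumed values): node 11 joins the ring {1, 8, 14}.
print(moved_keys([1, 8, 14], [1, 8, 11, 14], m=4))  # [9, 10, 11]
\end{minted}

In this toy ring only keys 9, 10 and 11 change owner (from node 14 to the new node 11); every other key stays where it was, which is the property that makes joins and leaves cheap.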
\item Scalable Key Location
\label{sec-1-1-3-3}
\begin{itemize}
\item A very small amount of routing information suffices to implement consistent hashing in a distributed environment. Each node need only be aware of its successor node on the circle.
\item Queries for a given identifier can be passed around the circle via these successor pointers until they first encounter a node that succeeds the identifier; this is the node the query maps to.
\item To avoid having to potentially traverse all N nodes when the identifiers are ``unlucky'', Chord maintains extra routing information.
\item Let m be the number of bits in the key/node identifiers.
\item Each node n maintains a routing table with at most m entries, called the finger table.
\item The $i$th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least $2^{i-1}$ on the identifier circle, i.e., $s = \mathrm{succ}(n + 2^{i-1})$ for $1 \le i \le m$, with all arithmetic modulo $2^m$.
\item Node s is called the $i$th finger of node n, denoted n.finger[i].node.
\item A finger table entry includes both the Chord identifier and the IP address (and port number) of the relevant node.
\item Each node stores information about only a small number of other nodes, and knows more about nodes closely following it on the identifier circle than about nodes farther away.
\item Each finger table entry implicitly defines an interval of identifiers, which essentially covers the identifiers between the start of that finger and the start of the next one. This allows a key whose successor is not known to be looked up quickly, since a node can find the interval that contains the key and continue the search from there.
\item The finger pointers at repeatedly doubling distances around the circle cause each iteration of the loop in find\_predecessor to halve the distance to the target identifier.
\end{itemize}
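The finger table definition and the lookup it enables can be tried out on a tiny ring. The sketch below is a centralised simulation, not the distributed protocol: it assumes global knowledge of all node identifiers and recomputes finger tables on demand. The names \texttt{dist}, \texttt{succ}, \texttt{finger\_table} and \texttt{find\_successor}, and the 3-bit ring with nodes 0, 1 and 3, are choices made for this example.

\begin{minted}{python}
M = 3              # identifier bits: a tiny ring with 2^3 = 8 positions
NODES = [0, 1, 3]  # node identifiers currently on the ring


def dist(a, b, m=M):
    """Clockwise distance from identifier a to identifier b."""
    return (b - a) % 2 ** m


def succ(ident, nodes=NODES):
    """First node whose identifier is equal to or follows ident on the circle."""
    ring = sorted(nodes)
    return next((n for n in ring if n >= ident), ring[0])


def finger_table(n, m=M, nodes=NODES):
    """finger[i] = succ((n + 2^(i-1)) mod 2^m) for i = 1..m."""
    return [succ((n + 2 ** (i - 1)) % 2 ** m, nodes) for i in range(1, m + 1)]


def find_successor(start, key, m=M, nodes=NODES):
    """Route a lookup for key from node start; returns (responsible node, hops)."""
    n, hops = start, 0
    while True:
        fingers = finger_table(n, m, nodes)
        d_key = dist(n, key, m) or 2 ** m          # a full lap if key == n
        d_succ = dist(n, fingers[0], m) or 2 ** m  # a full lap if n is alone
        if d_key <= d_succ:                        # key lies in (n, successor]
            return fingers[0], hops
        # Otherwise forward to the closest finger strictly preceding the key.
        n = max((f for f in fingers if 0 < dist(n, f, m) < d_key),
                key=lambda f: dist(n, f, m))
        hops += 1


print(finger_table(1))       # [3, 3, 0]
print(find_successor(3, 1))  # (1, 1)
\end{minted}

Node 3 has no finger pointing at node 1, so the lookup for identifier 1 is first forwarded to node 0, whose successor is the answer; this is the one hop reported in the second line.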
\item Node Joins
\label{sec-1-1-3-4}
\begin{itemize}
\item In dynamic networks, nodes can join and leave at any time. Thus the main challenge is to preserve the ability to look up every key.
\item There are two invariants:
\begin{enumerate}
\item Each node's successor is correctly maintained.
\item For every key k, node succ(k) is responsible for k.
\end{enumerate}
\item For lookups to be fast, we also want the finger tables to be correct.
\item To simplify the join and leave mechanisms, each node in Chord maintains a predecessor pointer.
\item To preserve the invariants stated above, Chord must perform three tasks when a node n joins the network:
\begin{enumerate}
\item Initialise the predecessor and fingers of node n.
\item Update the fingers and predecessors of existing nodes to reflect the addition of n.
\item Notify the higher layer software so that it can transfer state (e.g. values) associated with keys that node n is now responsible for.
\end{enumerate}
\end{itemize}
\begin{enumerate}
\item Initializing fingers and predecessor
\label{sec-1-1-3-4-1}
\begin{itemize}
\item Node n learns its predecessor and fingers by asking n', an existing Chord node that n already knows about, to look them up.
\end{itemize}
\item Updating fingers of existing nodes
\label{sec-1-1-3-4-2}
\begin{itemize}
\item Node n needs to be entered into the finger tables of some existing nodes. For a given n, the algorithm starts with the ith finger of node n, and then continues to walk in the counter-clockwise direction on the identifier circle until it encounters a node whose ith finger precedes n.
\end{itemize}
\item Transferring keys
\label{sec-1-1-3-4-3}
\begin{itemize}
\item Node n contacts the node immediately following itself and simply asks it to transfer all appropriate values.
\end{itemize}
\end{enumerate}
\end{enumerate}
\subsubsection{Concurrent operations and failures}
\label{sec-1-1-4}
\begin{enumerate}
\item Stabilization
\label{sec-1-1-4-1}
\begin{itemize}
\item The join algorithm in Section 4 of the Chord paper aggressively maintains the finger tables of all nodes as the network evolves. Since this invariant is difficult to maintain in the face of concurrent joins in a large network, we separate our correctness and performance goals.
\item A basic “stabilization” protocol is used to keep nodes’ successor pointers up to date, which is sufficient to guarantee correctness of lookups. Those successor pointers are then used to verify and correct finger table entries, which allows these lookups to be fast as well as correct.
\item A lookup that occurs before stabilization has finished can see one of three situations: (1) all tables involved are already correct, and the result is found as usual; (2) successor pointers are correct but fingers are not, in which case the result is still found, only more slowly; (3) successor pointers are incorrect, in which case the lookup may fail. The lookup can then be retried shortly after.
\item Our stabilization scheme guarantees to add nodes to a Chord ring in a way that preserves reachability of existing nodes.
\item We have not discussed the adjustment of fingers when nodes join because it turns out that joins don’t substantially damage the performance of fingers. If a node has a finger into each interval, then these fingers can still be used even after joins.
\end{itemize}
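A minimal sketch of how such a stabilization round could look, assuming in-process objects instead of remote calls and ignoring fingers and failures entirely. The class and method names (\texttt{Node}, \texttt{join}, \texttt{stabilize}, \texttt{notify}) follow the usual description of the protocol, but the code itself is an illustration and not the authors' implementation.

\begin{minted}{python}
class Node:
    """A Chord node that keeps only successor and predecessor pointers."""

    def __init__(self, ident, m=8):
        self.id, self.m = ident, m
        self.successor = self      # a lone node is its own successor
        self.predecessor = None

    def _between(self, x, a, b):
        """True if x lies in the open circular interval (a, b)."""
        span = (b - a) % 2 ** self.m or 2 ** self.m  # (a, a) is the whole circle
        return 0 < (x - a) % 2 ** self.m < span

    def find_successor(self, key):
        """Slow but correct lookup that follows successor pointers only."""
        node = self
        while not (key == node.successor.id
                   or node._between(key, node.id, node.successor.id)):
            node = node.successor
        return node.successor

    def join(self, known):
        """Join via an existing node: learn a successor, nothing else."""
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        """Run periodically: adopt a closer successor and tell it about us."""
        x = self.successor.predecessor
        if x is not None and self._between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it might be our predecessor."""
        if self.predecessor is None or self._between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate


a, b, c = Node(10), Node(50), Node(30)
b.join(a)
c.join(a)
for _ in range(5):            # a few rounds are enough for this tiny ring
    for node in (a, b, c):
        node.stabilize()
print(a.successor.id, c.successor.id, b.successor.id)  # 30 50 10
\end{minted}

Both joins happen before any stabilization has run, so nodes 50 and 30 initially both point at node 10; repeated rounds of \texttt{stabilize} and \texttt{notify} are what sort the pointers into the ring 10, 30, 50.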
\item Failures and Replication
\label{sec-1-1-4-2}
\begin{itemize}
\item When a node n fails, nodes whose finger tables include n must find n’s successor. In addition, the failure of n must not be allowed to disrupt queries that are in progress as the system is re-stabilizing.
\item The key step in failure recovery is maintaining correct successor pointers.
\item To help achieve this, each Chord node maintains a “successor-list” of its r nearest successors on the Chord ring.
\item If node n notices that its successor has failed, it replaces it with the first live entry in its successor list. At that point, n can direct ordinary lookups for keys for which the failed node was the successor to the new successor. As time passes, stabilize will correct finger table entries and successor-list entries pointing to the failed node.
\end{itemize}
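A rough way to see why a short successor list already helps, under the simplifying assumption (not made in the notes above) that each node fails independently with probability $p$: node n loses its place in the ring only if all $r$ entries of its successor list fail at once, which happens with probability
\begin{equation*}
\Pr[\text{all } r \text{ successors fail}] = p^{r}.
\end{equation*}
As an illustrative choice of values, $p = 1/2$ and $r = 2\log_2 N$ give $p^{r} = 2^{-2\log_2 N} = N^{-2}$; for $N = 1024$ nodes this means $r = 20$ and a probability of $2^{-20} \approx 10^{-6}$ per node.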
\end{enumerate}
\subsubsection{Simulations and Experimental Results}
\label{sec-1-1-5}
\begin{itemize}
\item If the identifier space is divided into N equal-sized bins, where N is the number of nodes, the probability that a particular bin does not contain any node is, for large values of N, approximately 0.368.
\item As we discussed earlier, the consistent hashing paper solves this problem by associating keys with virtual nodes, and mapping multiple virtual nodes (with unrelated identifiers) to each real node. Intuitively, this will provide a more uniform coverage of the identifier space.
\end{itemize}
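The value 0.368 comes from a standard limit. A short derivation, assuming that the N node identifiers are independent and uniformly distributed over N equal-sized bins:
\begin{equation*}
\Pr[\text{a given bin is empty}] = \left(1 - \frac{1}{N}\right)^{N} \xrightarrow{\;N \to \infty\;} e^{-1} \approx 0.368 .
\end{equation*}
The limit is approached quickly: already for $N = 100$ the value is $0.99^{100} \approx 0.366$.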
\subsubsection{Conclusion}
\label{sec-1-1-6}
\begin{itemize}
\item Attractive features of Chord include its simplicity, provable correctness, and provable performance even in the face of concurrent node arrivals and departures. It continues to function correctly, albeit at degraded performance, when a node’s information is only partially correct. Our theoretical analysis, simulations, and experimental results confirm that Chord scales well with the number of nodes, recovers from large numbers of simultaneous node failures and joins, and answers most lookups correctly even during recovery.
\end{itemize}
\subsection{Pastry}
\label{sec-1-2}
\begin{itemize}
\item Pastry is a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications.
\item It can be used to support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming.
\item Each node in the Pastry network has a unique identifier (nodeId). When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries.
\item Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops.
\item Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry’s scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.
\end{itemize}
\subsubsection{Introduction}
\label{sec-1-2-1}
\begin{itemize}
\item Pastry is completely decentralized, fault-resilient, scalable, and reliable. Moreover, Pastry has good route locality properties.
\item Pastry is intended as a general substrate for the construction of a variety of peer-to-peer Internet applications like global file sharing, file storage, group communication and naming systems.
\item Several applications have been built on top of Pastry to date, including a global, persistent storage utility called PAST [11, 21] and a scalable publish/subscribe system called SCRIBE. Other applications are under development.
\item Each node in the Pastry network has a unique numeric identifier (nodeId).
\item When presented with a message and a numeric key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes.
\item The expected number of routing steps is O(log N), where N is the number of Pastry nodes in the network.
\item At each Pastry node along the route that a message takes, the application is notified and may perform application-specific computations related to the message.
\item Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops.
\item Because nodeIds are randomly assigned, with high probability, the set of nodes with adjacent nodeIds is diverse in geography, ownership, jurisdiction, etc. Applications can leverage this, as Pastry can route to one of the k nodes that are numerically closest to the key.
\item A heuristic ensures that among a set of nodes with the closest nodeIds to the key, the message is likely to first reach a node “near” the node from which the message originates, in terms of the proximity metric.
\end{itemize}
\begin{enumerate}
\item PAST
\label{sec-1-2-1-1}
\begin{itemize}
\item PAST, for instance, uses a fileId, computed as the hash of the file’s name and owner, as a Pastry key for a file. Replicas of the file are stored on the k Pastry nodes with nodeIds numerically closest to the fileId. A file can be looked up by sending a message via Pastry, using the fileId as the key. By definition, the lookup is guaranteed to reach a node that stores the file as long as one of the k nodes is live.
\item Moreover, it follows that the message is likely to first reach a node near the client, among the k nodes; that node delivers the file and consumes the message. Pastry’s notification mechanisms allow PAST to maintain replicas of a file on the nodes closest to the key, despite node failure and node arrivals, and using only local coordination among nodes with adjacent nodeIds.
\end{itemize}
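The replica placement rule can be illustrated with a small sketch. Everything concrete here is an assumption made for the example: the way name and owner are combined before hashing, the use of SHA-1 truncated to 128 bits, and the helper names \texttt{file\_id}, \texttt{ring\_distance} and \texttt{replica\_nodes}; PAST's real fileId computation is not reproduced.

\begin{minted}{python}
import hashlib


def file_id(name, owner, bits=128):
    """Hypothetical fileId: the top `bits` bits of SHA-1(name + "/" + owner)."""
    digest = hashlib.sha1((name + "/" + owner).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def ring_distance(a, b, bits=128):
    """Numerical distance between two ids on the circular id space."""
    d = abs(a - b)
    return min(d, 2 ** bits - d)


def replica_nodes(node_ids, fid, k, bits=128):
    """The k live nodes whose nodeIds are numerically closest to fid."""
    return sorted(node_ids, key=lambda n: ring_distance(n, fid, bits))[:k]


# 8-bit toy example (assumed values): fileId 5 with k = 2 replicas.
print(replica_nodes([10, 20, 200, 250], 5, k=2, bits=8))  # [10, 250]
\end{minted}

Node 250 is chosen ahead of node 20 because the id space is circular: going backwards from 5 past 0 reaches 250 after 11 steps, while 20 is 15 steps away.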
\item SCRIBE
\label{sec-1-2-1-2}
\begin{itemize}
\item In the SCRIBE publish/subscribe system, a list of subscribers is stored on the node with nodeId numerically closest to the topicId of a topic, where the topicId is a hash of the topic name. That node forms a rendez-vous point for publishers and subscribers. Subscribers send a message via Pastry using the topicId as the key; the registration is recorded at each node along the path. A publisher sends data to the rendez-vous point via Pastry, again using the topicId as the key. The rendez-vous point forwards the data along the multicast tree formed by the reverse paths from the rendez-vous point to all subscribers.
\end{itemize}
\end{enumerate}
\subsubsection{Design of Pastry}
\label{sec-1-2-2}
\begin{itemize}
\item A Pastry system is a self-organizing overlay network of nodes, where each node routes client requests and interacts with local instances of one or more applications.
\item Each node in the Pastry peer-to-peer overlay network is assigned a 128-bit node identifier (nodeId).
\item The nodeId is used to indicate a node’s position in a circular nodeId space, which ranges from 0 to $2^{128} - 1$ (i.e., a modular ring, as in Chord).
\item NodeIds are assumed to be distributed uniformly in the 128-bit nodeId space, e.g., by computing a hash of the node's IP address.
\item As a result of this random assignment of nodeIds, with high probability, nodes with adjacent nodeIds are diverse in geography, ownership, jurisdiction, network attachment, etc.
\item Under normal conditions, in a network of N nodes, Pastry can route to the numerically closest node to a given key in less than $\lceil \log_{2^b} N \rceil$ steps, where b is a configuration parameter with a typical value of 4.
\item For the purpose of routing, nodeIds and keys are thought of as a sequence of digits with base $2^b$.
\item In each routing step, a node normally forwards the message to a node whose nodeId shares with the key a prefix that is at least one digit (or b bits) longer than the prefix that the key shares with the present node’s id. If no such node is known, the message is forwarded to a node whose nodeId shares a prefix with the key as long as the current node, but is numerically closer to the key than the present node’s id. To support this routing procedure, each node maintains some routing state.
\item Despite concurrent node failures, eventual delivery is guaranteed unless $|L|/2$ nodes with adjacent nodeIds fail simultaneously ($|L|$ is a configuration parameter with a typical value of 16 or 32).
\end{itemize}
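To get a feel for the bound on the number of routing steps, here is a worked example with assumed values $b = 4$ (so digits are hexadecimal) and $N = 10^{6}$ nodes:
\begin{equation*}
\lceil \log_{2^b} N \rceil = \left\lceil \frac{\log_{10} 10^{6}}{\log_{10} 16} \right\rceil = \left\lceil \frac{6}{1.204} \right\rceil = \lceil 4.98 \rceil = 5 .
\end{equation*}
Under these assumptions a message needs about five overlay hops, since each hop taken via the routing table matches at least one more hexadecimal digit of the key.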
\begin{enumerate}
\item Pastry Node State
\label{sec-1-2-2-1}
\begin{itemize}
\item Each Pastry node maintains a routing table, a neighborhood set and a leaf set.
\item A node’s routing table, R, is organized into $\lceil \log_{2^b} N \rceil$ rows with $2^b - 1$ entries each.
\item The $2^b - 1$ entries at row n each refer to a node whose nodeId shares the present node's nodeId in the first n digits, but whose (n+1)th digit has one of the $2^b - 1$ possible values other than the (n+1)th digit in the present node's id.
\item Each entry in the routing table contains the IP address of one of potentially many nodes whose nodeIds have the appropriate prefix; in practice, a node is chosen that is close to the present node, according to the proximity metric.
\item If no node is known with a suitable nodeId, then the routing table entry is left empty.
\item The neighborhood set M contains the nodeIds and IP addresses of the |M| nodes that are closest (according to the proximity metric) to the local node.
\item Applications are responsible for providing proximity metrics.
\item The neighborhood set is not normally used in routing messages; it is useful in maintaining locality properties.
\item The leaf set L is the set of nodes with the |L|/2 numerically closest larger nodeIds, and the |L|/2 nodes with numerically closest smaller nodeIds, relative to the present node’s nodeId. The leaf set is used during message routing.
\end{itemize}
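The layout of the routing table and the leaf set can be made concrete with a toy example. The sketch assumes very small parameters ($b = 2$, four-digit nodeIds) and ignores the wrap-around of the circular id space; the helper names \texttt{digits}, \texttt{shared\_prefix\_len}, \texttt{routing\_slot} and \texttt{leaf\_set} are invented for this illustration.

\begin{minted}{python}
B = 2       # bits per digit (assumed; Pastry's typical value is b = 4)
DIGITS = 4  # digits per nodeId (assumed; 128-bit ids have 128 / b digits)


def digits(node_id):
    """nodeId as a list of base-2^B digits, most significant digit first."""
    return [(node_id >> (B * (DIGITS - 1 - i))) & (2 ** B - 1)
            for i in range(DIGITS)]


def shared_prefix_len(a, b):
    """Number of leading digits that a and b have in common."""
    n = 0
    for da, db in zip(digits(a), digits(b)):
        if da != db:
            break
        n += 1
    return n


def routing_slot(own, other):
    """(row, column) where `other` may be stored in the routing table of `own`.

    Requires other != own, since a node has no slot for itself.
    """
    row = shared_prefix_len(own, other)
    return row, digits(other)[row]


def leaf_set(own, nodes, size=4):
    """size/2 numerically closest smaller and larger nodeIds (no wrap-around)."""
    half = size // 2
    smaller = sorted(n for n in nodes if n < own)[-half:]
    larger = sorted(n for n in nodes if n > own)[:half]
    return smaller + larger


own = 75  # digits 1,0,2,3 in base 4
print(digits(own))                               # [1, 0, 2, 3]
print(routing_slot(own, 120))                    # (1, 3)
print(leaf_set(own, [10, 70, 73, 78, 80, 120]))  # [70, 73, 78, 80]
\end{minted}

Node 120 has digits 1,3,2,0, so it shares one digit with node 75 and differs in the second one; it is therefore a candidate for row 1, column 3 of node 75's routing table.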
\item Routing
\label{sec-1-2-2-2}
\begin{itemize}
\item Given a message, the node first checks to see if the key falls within the range of nodeIds covered by its leaf set.
\item If so, the message is forwarded directly to the destination node, namely the node in the leaf set whose nodeId is closest to the key (possibly the present node).
\item If the key is not covered by the leaf set, then the routing table is used and the message is forwarded to a node that shares a common prefix with the key by at least one more digit.
\item In certain cases, it is possible that the appropriate entry in the routing table is empty or the associated node is not reachable, in which case the message is forwarded to a node that shares a prefix with the key at least as long as the local node, and is numerically closer to the key than the present node’s id.
\item Such a node must be in the leaf set unless the message has already arrived at the node with numerically closest nodeId. And, unless |L|/2 adjacent nodes in the leaf set have failed simultaneously, at least one of those nodes must be live.
\item It can be shown that the expected number of routing steps is $\lceil \log_{2^b} N \rceil$.
\item If a message is forwarded using the routing table, then the set of nodes whose ids have a longer prefix match with the key is reduced by a factor of $2^b$ in each step, which means the destination is reached in $\lceil \log_{2^b} N \rceil$ steps.
\item If the key is within range of the leaf set, then the destination node is at most one hop away.
\item The third case arises when the key is not covered by the leaf set (i.e., it is still more than one hop away from the destination), but there is no routing table entry. Assuming accurate routing tables and no recent node failures, this means that a node with the appropriate prefix does not exist.
\item The likelihood of this case, given the uniform distribution of nodeIds, depends on |L|. Analysis shows that with $|L| = 2^b$ and $|L| = 2 \times 2^b$, the probability that this case arises during a given message transmission is less than 0.02 and 0.006, respectively. When it happens, no more than one additional routing step results with high probability.
\end{itemize}
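The three routing cases above can be summarised in one function. This is a sketch under assumptions, not Pastry's actual code: the routing table is modelled as a dictionary from (row, digit) to nodeId, \texttt{known} stands for every node found in the leaf set, routing table and neighborhood set, the id space is treated as linear (no wrap-around), and the toy parameters $b = 2$ with four-digit nodeIds are chosen for readability.

\begin{minted}{python}
B = 2       # bits per digit (assumed; Pastry's typical value is b = 4)
DIGITS = 4  # digits per nodeId (assumed toy size)


def digits(node_id):
    """nodeId as a list of base-2^B digits, most significant digit first."""
    return [(node_id >> (B * (DIGITS - 1 - i))) & (2 ** B - 1)
            for i in range(DIGITS)]


def shared_prefix_len(a, b):
    """Number of leading digits that a and b have in common."""
    n = 0
    for da, db in zip(digits(a), digits(b)):
        if da != db:
            break
        n += 1
    return n


def next_hop(own, key, leaf_set, routing_table, known):
    """Pick the node to forward to; returning `own` means deliver locally."""
    if key == own:
        return own
    # Case 1: the key is covered by the leaf set, so go straight to the
    # numerically closest node (possibly ourselves).
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        return min(leaf_set + [own], key=lambda n: abs(n - key))
    # Case 2: use the routing table entry that extends the shared prefix.
    row = shared_prefix_len(own, key)
    entry = routing_table.get((row, digits(key)[row]))
    if entry is not None:
        return entry
    # Case 3 (rare): any known node with an equally long prefix that is
    # numerically closer to the key than we are.
    closer = [n for n in known
              if shared_prefix_len(n, key) >= row
              and abs(n - key) < abs(own - key)]
    return min(closer, key=lambda n: abs(n - key)) if closer else own


leaves = [70, 73, 78, 80]
table = {(1, 3): 120}  # row 1, digit 3 -> node 120 (digits 1,3,2,0)
print(next_hop(75, 77, leaves, table, leaves + [120]))   # 78
print(next_hop(75, 113, leaves, table, leaves + [120]))  # 120
print(next_hop(75, 113, leaves, {}, leaves + [100]))     # 100
\end{minted}

The three calls exercise the three cases in order: key 77 falls inside the leaf set, key 113 is resolved through the routing table, and with an empty routing table the message falls back to node 100, which shares the same one-digit prefix with the key but is numerically closer to it than node 75.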
\item Pastry API
\label{sec-1-2-2-3}
\begin{itemize}
\item Pastry is a substrate: it is not an application itself; rather, it provides an Application Program Interface (API) to be used by applications. It runs on all nodes that have joined a Pastry network.
\item Pastry exports the following operations: nodeId and route.
\item Applications layered on top of Pastry must export the following operations: deliver, forward and newLeafs.
\end{itemize}
\item Self-organization and adaptation
\label{sec-1-2-2-4}
\begin{enumerate}
\item Node Arrival
\label{sec-1-2-2-4-1}
\begin{itemize}
\item When a new node arrives, it needs to initialize its state tables, and then inform other nodes of its presence. We assume the new node knows initially about a nearby Pastry node A, according to the proximity metric, that is already part of the system.
\item Let us assume the new node’s nodeId is X.
\item Node X then asks A to route a special “join” message with the key equal to X. Like any message, Pastry routes the join message to the existing node Z whose id is numerically closest to X.
\item In response to receiving the “join” request, nodes A, Z, and all nodes encountered on the path from A to Z send their state tables to X.
\item Node X initializes its routing table by obtaining the i-th row of its routing table from the i-th node encountered along the route from A to Z.
\item X can use Z's leaf set as the basis for its own, since Z has the nodeId closest to X.
\item X uses A's neighborhood set to initialise its own, since A is assumed to be close to X according to the proximity metric.
\item Finally, X transmits a copy of its resulting state to each of the nodes found in its neighborhood set, leaf set, and routing table. Those nodes in turn update their own state based on the information received.
\end{itemize}
\item Node Departure
\label{sec-1-2-2-4-2}
\begin{itemize}
\item A Pastry node is considered failed when its immediate neighbors in the nodeId space can no longer communicate with the node.
\item To replace a failed node in the leaf set, its neighbor in the nodeId space contacts the live node with the largest index on the side of the failed node, and asks that node for its leaf set.
\item The failure of a node that appears in the routing table of another node is detected when that node attempts to contact the failed node and there is no response.
\item To replace a failed node in a routing table entry, a node contacts the other nodes in the row of the failed node and asks if any of them knows a node with the same prefix.
\item A node attempts to contact each member of the neighborhood set periodically to see if it is still alive.
\end{itemize}
\end{enumerate}
\item Locality
\label{sec-1-2-2-5}
\begin{itemize}
\item Pastry’s notion of network proximity is based on a scalar proximity metric, such as the number of IP routing hops or geographic distance.
\item It is assumed that the application provides a function that allows each Pastry node to determine the “distance” of a node with a given IP address to itself.
\item Throughout this discussion, we assume that the proximity space defined by the chosen proximity metric is Euclidean; that is, the triangle inequality holds for distances among Pastry nodes.
\item If the triangle inequality does not hold, Pastry’s basic routing is not affected; however, the locality properties of Pastry routes may suffer.
\end{itemize}
\begin{enumerate}
\item Route Locality
\label{sec-1-2-2-5-1}
\begin{itemize}
\item Although it cannot be guaranteed that the distance of a message from its source increases monotonically at each step, a message tends to make larger and larger strides with no possibility of returning to a node within $d_i$ of any node i encountered on the route, where $d_i$ is the distance of the routing step taken away from node i. Therefore, the message has nowhere to go but towards its destination.
\end{itemize}
\item Locating the nearest among k nodes
\label{sec-1-2-2-5-2}
\begin{itemize}
\item Recall that Pastry routes messages towards the node with the nodeId closest to the key, while attempting to travel the smallest possible distance in each step.
\item Pastry makes only local routing decisions, minimizing the distance traveled on the next step with no sense of global direction.
\end{itemize}
\end{enumerate}
\item Arbitrary node failures and network partitions
\label{sec-1-2-2-6}
\begin{itemize}
\item As routing is deterministic by default, a malicious or faulty node along the route can make repeated queries fail every time. Randomized routing mitigates this.
\item Another challenge is IP routing anomalies in the Internet that cause IP hosts to be unreachable from certain IP hosts but not others.
\item However, Pastry’s self-organization protocol may cause the creation of multiple, isolated Pastry overlay networks during periods of IP routing failures. Because Pastry relies almost exclusively on information exchange within the overlay network to self-organize, such isolated overlays may persist after full IP connectivity resumes.
\item One solution to this problem involves the use of IP multicast.
\end{itemize}
\end{enumerate}
\subsubsection{Conclusion}
\label{sec-1-2-3}
\begin{itemize}
\item This paper presents and evaluates Pastry, a generic peer-to-peer content location and routing system based on a self-organizing overlay network of nodes connected via the Internet. Pastry is completely decentralized, fault-resilient, scalable, and reliably routes a message to the live node with a nodeId numerically closest to a key. Pastry can be used as a building block in the construction of a variety of peer-to-peer Internet applications like global file sharing, file storage, group communication and naming systems. Results with as many as 100,000 nodes in an emulated network confirm that Pastry is efficient and scales well, that it is self-organizing and can gracefully adapt to node failures, and that it has good locality properties.
\end{itemize}

\subsection{Kademlia}
|
||
\label{sec-1-3}
|
||
\subsubsection{Abstract}
|
||
\label{sec-1-3-1}
|
||
\begin{itemize}
|
||
\item A peer-to-peer distributed hash table with provable consistency and performance in a fault-prone environment
|
||
\item The system routes queries and locates nodes using a novel XOR-based metric topology.
\item The topology has the property that every message exchanged conveys or reinforces useful contact information.
|
||
\item The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.
|
||
\end{itemize}
|
||
\subsubsection{Introduction}
|
||
\label{sec-1-3-2}
|
||
\begin{itemize}
|
||
\item Kademlia is a P2P DHT
|
||
\item Kademlia has a number of desirable features not simultaneously offered by any previous DHT. It minimizes the number of configuration messages nodes must send to learn about each other.
|
||
\item Configuration information spreads automatically as a side-effect of key lookups.
|
||
\item Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes.
|
||
\item Keys are opaque, 160-bit quantities (e.g., the SHA-1 hash of some larger data)
|
||
\item Participating computers each have a node ID in the 160-bit key space.
|
||
\item (key,value) pairs are stored on nodes with IDs “close” to the key for some notion of closeness.
|
||
\item XOR is symmetric, allowing Kademlia participants to receive lookup queries from precisely the same distribution of nodes contained in their routing tables
|
||
\item Without this property, systems such as Chord do not learn useful routing information from queries they receive.
|
||
\item Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord node’s finger table must store the precise node preceding some interval in the ID space.
|
||
\item Any node actually in the interval would be too far from nodes preceding it in the same interval. Kademlia, in contrast, can send a query to any node within an interval, allowing it to select routes based on latency or even send parallel, asynchronous queries to several equally appropriate nodes.
\item Kademlia most resembles Pastry’s first phase, which (though not described this way by the authors) successively finds nodes roughly half as far from the target ID by Kademlia’s XOR metric.
|
||
\item In a second phase, however, Pastry switches distance metrics to the numeric difference between IDs. It also uses the second, numeric difference metric in replication. Unfortunately, nodes close by the second metric can be quite far by the first, creating discontinuities at particular node ID values, reducing performance, and complicating attempts at formal analysis of worst-case behavior.
|
||
\end{itemize}
|
||
\subsubsection{System Description}
|
||
\label{sec-1-3-3}
|
||
\begin{itemize}
|
||
\item Kademlia assigns 160-bit opaque IDs to nodes and provides a lookup algorithm that locates successively “closer” nodes to any desired ID, converging to the lookup target in logarithmically many steps.
\item An identifier is opaque if it provides no information about the thing it identifies other than being a seemingly random string or number
|
||
\item Kademlia effectively treats nodes as leaves in a binary tree, with each node’s position determined by the shortest unique prefix of its ID
|
||
\item For any given node, we divide the binary tree into a series of successively lower subtrees that don’t contain the node. The highest subtree consists of the half of the binary tree not containing the node.
|
||
\item The next subtree consists of the half of the remaining tree not containing the node, and so forth
|
||
\item The Kademlia protocol ensures that every node knows of at least one node in each of its subtrees, if that subtree contains a node. With this guarantee, any node can locate any other node by its ID
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item XOR Metric
|
||
\label{sec-1-3-3-1}
|
||
\begin{itemize}
|
||
\item Each Kademlia node has a 160-bit node ID. Node IDs are currently just random 160-bit identifiers, though they could equally well be constructed as in Chord.
|
||
\item Every message a node transmits includes its node ID, permitting the recipient to record the sender’s existence if necessary.
|
||
\item Keys, too, are 160-bit identifiers. To assign (key,value) pairs to particular nodes, Kademlia relies on a notion of distance between two identifiers. Given two 160-bit identifiers, $x$ and $y$, Kademlia defines the distance between them as their bitwise exclusive or (XOR) interpreted as an integer, $d(x, y) = x \oplus y$.
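\item XOR is a convenient metric even though it is non-Euclidean: $d(x, x) = 0$, $d(x, y) > 0$ if $x \neq y$, it is symmetric ($d(x, y) = d(y, x)$), and it satisfies the triangle inequality, $d(x, y) + d(y, z) \geq d(x, z)$.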
\item We next note that XOR captures the notion of distance implicit in our binary-tree-based sketch of the system.
|
||
\item In a fully-populated binary tree of 160-bit IDs, the magnitude of the distance between two IDs is the height of the smallest subtree containing them both. When a tree is not fully populated, the closest leaf to an ID x is the leaf whose ID shares the longest common prefix with x.
\item If there are empty branches in the tree, more than one leaf may share the longest common prefix with x. In that case, the closest leaf to x is the closest leaf to the ID $\tilde{x}$ produced by flipping the bits in x that correspond to the empty branches of the tree.
\item Like Chord’s clockwise circle metric, XOR is unidirectional. For any given point $x$ and distance $\Delta > 0$, there is exactly one point $y$ such that $d(x, y) = \Delta$. Unidirectionality ensures that all lookups for the same key converge along the same path, regardless of the originating node. Thus, caching (key,value) pairs along the lookup path alleviates hot spots.
\end{itemize}
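The properties listed above can be checked directly on small integers. The following is a minimal sketch that uses 8-bit IDs instead of 160-bit ones for readability; the function name \texttt{xor\_distance} and the sample IDs are illustrative and not taken from the paper.

\begin{minted}[]{python}
def xor_distance(x: int, y: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return x ^ y


a, b, c = 0b10110010, 0b10010110, 0b00010111

# Identity and symmetry.
assert xor_distance(a, a) == 0
assert xor_distance(a, b) == xor_distance(b, a)

# Triangle inequality: d(a, c) <= d(a, b) + d(b, c).
assert xor_distance(a, c) <= xor_distance(a, b) + xor_distance(b, c)

# Unidirectionality: for a fixed x and distance delta there is exactly
# one y with d(x, y) == delta, namely y = x XOR delta.
delta = 0b00001111
matches = [y for y in range(256) if xor_distance(a, y) == delta]
assert matches == [a ^ delta]
\end{minted}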
|
||
\item Node state
|
||
\label{sec-1-3-3-2}
|
||
\begin{itemize}
|
||
\item For each $0 \le i < 160$, every node keeps a list of (IP address, UDP port, Node ID) triples for nodes of distance between $2^{i}$ and $2^{i+1}$ from itself. We call these lists k-buckets.
\item Each k-bucket is kept sorted by time last seen—least-recently seen node at the head, most-recently seen at the tail. For small values of i, the k-buckets will generally be empty (as no appropriate nodes will exist). For large values of i, the lists can grow up to size k, where k is a system-wide replication parameter.
|
||
\item k is chosen such that it is unlikely that k nodes will fail at the same time.
|
||
\item When a node receives any message, request or reply, from another node, it updates the k-bucket that corresponds to the sender's node ID. If the sender is already in the bucket, it is moved to the tail. If it is not there and the bucket has room, it is inserted at the tail. If the bucket is full, the least-recently seen node is pinged: if it does not respond, it is evicted and the new node is inserted at the tail; if it does respond, it is moved to the tail and the new node is discarded.
\item k-buckets effectively implement a least-recently seen eviction policy, except that live nodes are never removed from the list.
|
||
\item This works well even for systems with a high churn rate, as nodes that have been alive for a long time are more likely to stay alive.
\item A second benefit of k-buckets is that they provide resistance to certain DoS attacks. One cannot flush nodes’ routing state by flooding the system with new nodes, as new nodes are only inserted once old ones leave the system.
\end{itemize}
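The update rule for a single k-bucket can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and function names are made up, $k = 20$ is assumed as the bucket size, and the \texttt{ping} callable stands in for the ping RPC.

\begin{minted}[]{python}
from collections import OrderedDict

K = 20  # bucket capacity; the concrete value is an assumption of this sketch


def bucket_index(own_id: int, other_id: int) -> int:
    """Index i such that 2**i <= distance < 2**(i + 1); -1 for own_id itself."""
    return (own_id ^ other_id).bit_length() - 1


class KBucket:
    def __init__(self, k: int = K):
        self.k = k
        self.contacts = OrderedDict()  # node_id -> address, oldest first

    def update(self, node_id, address, ping):
        """Record that node_id was just seen.

        `ping` is a callable returning True if the given contact responds.
        """
        if node_id in self.contacts:
            self.contacts.move_to_end(node_id)        # seen again: move to tail
            self.contacts[node_id] = address
        elif len(self.contacts) < self.k:
            self.contacts[node_id] = address          # room left: append at tail
        else:
            oldest_id = next(iter(self.contacts))     # least-recently seen (head)
            if ping(oldest_id):
                self.contacts.move_to_end(oldest_id)  # alive: keep, drop newcomer
            else:
                del self.contacts[oldest_id]          # dead: evict, add newcomer
                self.contacts[node_id] = address


bucket = KBucket(k=2)
bucket.update(1, "a", ping=lambda n: True)
bucket.update(2, "b", ping=lambda n: True)
bucket.update(3, "c", ping=lambda n: True)    # full and head alive: 3 is dropped
assert list(bucket.contacts) == [2, 1]
bucket.update(3, "c", ping=lambda n: False)   # head (2) is dead: replaced by 3
assert list(bucket.contacts) == [1, 3]
\end{minted}

Live contacts are never evicted here, which is the property the DoS argument above relies on.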
|
||
\item Kademlia Protocol
|
||
\label{sec-1-3-3-3}
|
||
\begin{itemize}
|
||
\item The Kademlia protocol consists of four RPCs: ping, store, find node, and find value.
|
||
\item The ping RPC probes a node to see if it is online.
|
||
\item The store RPC instructs a node to store a (key, value) pair for later retrieval.
\item find node takes a 160-bit ID as an argument. The recipient of the RPC returns (IP address, UDP port, Node ID) triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about).
\item find value behaves like find node—returning (IP address, UDP port, Node ID) triples—with one exception. If the RPC recipient has received a store RPC for the key, it just returns the stored value.
|
||
\item In all RPCs, the recipient must echo a 160-bit random RPC ID, which provides some resistance to address forgery. pings can also be piggy-backed on RPC replies for the RPC recipient to obtain additional assurance of the sender’s network address.
|
||
\end{itemize}
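The recipient's side of find node can be sketched in a few lines. This assumes the contacts from all k-buckets are available as one iterable of triples; the function name and the sample addresses are illustrative.

\begin{minted}[]{python}
import heapq


def find_node(known_contacts, target_id, k=20):
    """Return the k known contacts closest to target_id by XOR distance.

    known_contacts is an iterable of (ip, udp_port, node_id) triples drawn
    from all of the recipient's k-buckets. If fewer than k contacts are
    known, all of them are returned.
    """
    return heapq.nsmallest(k, known_contacts, key=lambda c: c[2] ^ target_id)


contacts = [
    ("10.0.0.1", 4000, 0b0001),
    ("10.0.0.2", 4000, 0b0100),
    ("10.0.0.3", 4000, 0b0111),
]
closest = find_node(contacts, target_id=0b0101, k=2)
assert [c[2] for c in closest] == [0b0100, 0b0111]
\end{minted}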
|
||
\begin{enumerate}
|
||
\item Node lookup
|
||
\label{sec-1-3-3-3-1}
|
||
\begin{enumerate}
|
||
\item Node lookup is performed recursively. The lookup initiator starts by picking $\alpha$ nodes from its closest non-empty k-bucket, i.e. the bucket closest to the ID being looked up, not to the initiator itself.
\item The initiator then sends parallel, asynchronous find\_node RPCs to these $\alpha$ nodes.
\item In the recursive step, the initiator resends the find node to nodes it has learned about from previous RPCs. (This recursion can begin before all $\alpha$ of the previous RPCs have returned.)
\item If a round of find node RPCs fails to return a node any closer than the closest already seen, the initiator instead queries all of the k closest nodes it has not already queried.
\item The lookup terminates when all of the k closest nodes the initiator has seen have either responded or failed to respond.
\item When $\alpha = 1$, the lookup algorithm resembles Chord’s in terms of message cost and the latency of detecting failed nodes. However, Kademlia can route for lower latency because it has the flexibility of choosing any one of k nodes to forward a request to.
\end{enumerate}
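The lookup loop can be sketched over a simulated network. This is a simplified illustration under several assumptions: the network is a dictionary from node ID to the IDs that node knows, a missing key models a node that fails to respond, and the $\alpha$ RPCs of a round are issued one after another rather than in parallel. All names are hypothetical.

\begin{minted}[]{python}
def node_lookup(network, start_contacts, target, k=3, alpha=2):
    """Return the k closest responsive nodes to `target` that the lookup found."""
    def dist(node_id):
        return node_id ^ target

    shortlist = set(start_contacts)   # every contact learned so far
    queried, failed = set(), set()

    while True:
        closest = sorted(shortlist - failed, key=dist)[:k]
        pending = [n for n in closest if n not in queried]
        if not pending:
            return closest            # the k closest seen have all answered
        for node in pending[:alpha]:  # one round of (up to) alpha RPCs
            queried.add(node)
            if node not in network:   # no response: stop considering it
                failed.add(node)
                continue
            # Simulated find_node reply: the k contacts closest to the target.
            shortlist.update(sorted(network[node], key=dist)[:k])


network = {
    1: [2, 8],
    2: [1, 12],
    8: [12, 13],
    12: [13, 14, 8],
    13: [12, 14],
    14: [13, 12],
}
assert node_lookup(network, [1, 2], target=15, k=2, alpha=2) == [14, 13]

# If node 14 is down, the lookup settles on the next-closest live nodes.
without_14 = {n: peers for n, peers in network.items() if n != 14}
assert node_lookup(without_14, [1, 2], target=15, k=2, alpha=2) == [13, 12]
\end{minted}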
|
||
\item Store
|
||
\label{sec-1-3-3-3-2}
|
||
\begin{itemize}
|
||
\item Most operations are implemented in terms of the above lookup procedure. To store a (key,value) pair, a participant locates the k closest nodes to the key and sends them store RPCs
|
||
\item Additionally, each node re-publishes (key,value) pairs as necessary to keep them alive
|
||
\item For file sharing, the original publisher of a (key,value) pair is required to republish it every 24 hours. Otherwise, (key,value) pairs expire 24 hours after publication, so as to limit stale index information in the system.
\end{itemize}
|
||
\item Find value
|
||
\label{sec-1-3-3-3-3}
|
||
\begin{itemize}
|
||
\item To find a (key,value) pair, a node starts by performing a lookup to find the k nodes with IDs closest to the key. However, value lookups use find value rather than find node RPCs. Moreover, the procedure halts immediately when any node returns the value. For caching purposes, once a lookup succeeds, the requesting node stores the (key,value) pair at the closest node it observed to the key that did not return the value.
|
||
\item Because of the unidirectionality of the topology, future searches for the key are likely to hit cached entries before querying the closest node.
|
||
\item To avoid over-caching, the expiration time of a (key,value) pair in any node's database is exponentially inversely proportional to the number of nodes between the current node and the node whose ID is closest to the key ID.
\end{itemize}
|
||
\item Refreshing buckets
|
||
\label{sec-1-3-3-3-4}
|
||
\begin{itemize}
|
||
\item To handle pathological cases in which there are no lookups for a particular ID range, each node refreshes any bucket to which it has not performed a node lookup in the past hour. Refreshing means picking a random ID in the bucket’s range and performing a node search for that ID
|
||
\end{itemize}
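Picking a random ID in a bucket's range is straightforward when buckets are indexed by distance as in the node state section. A minimal sketch, assuming bucket $i$ covers distances in $[2^{i}, 2^{i+1})$; the function name is illustrative.

\begin{minted}[]{python}
import random


def random_id_in_bucket(own_id: int, i: int) -> int:
    """Random ID whose XOR distance d to own_id satisfies 2**i <= d < 2**(i+1)."""
    distance = random.randrange(1 << i, 1 << (i + 1))
    return own_id ^ distance


own = 0b1011 << 100
for i in range(160):
    d = random_id_in_bucket(own, i) ^ own
    assert (1 << i) <= d < (1 << (i + 1))
\end{minted}

A refresh would then run a node lookup for the returned ID.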
|
||
\item Joining network
|
||
\label{sec-1-3-3-3-5}
|
||
\begin{itemize}
|
||
\item To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes’ k-buckets as necessary.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\item Routing Table
|
||
\label{sec-1-3-3-4}
|
||
\begin{itemize}
|
||
\item The routing table is a binary tree whose leaves are k-buckets.
|
||
\item each k-bucket covers some range of the ID space, and together the k-buckets cover the entire 160-bit ID space with no overlap.
|
||
\item When a node u learns of a new contact, it tries to insert the contact into the appropriate k-bucket. If that bucket is not full, the contact is simply inserted. Otherwise, if the k-bucket’s range includes u’s own node ID, the bucket is split into two new buckets, the old contents are divided between the two, and the insertion attempt is repeated. If a full bucket does not cover u's own ID, the new contact is dropped. This is why the half of the tree far from u ends up as a single large bucket: it never gets split.
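\item If the tree is highly unbalanced, a newly joined node may fail to reach all the nodes that should know about it. If u is the only node whose ID begins with 000 and more than k nodes share the prefix 001, every one of those nodes has an empty k-bucket that u belongs in, yet u's bucket refresh only notifies k of them. To avoid this, buckets may be split even when the node's own ID does not reside in them.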
\item nodes split k-buckets as required to ensure they have complete knowledge of a surrounding subtree with at least k nodes.
|
||
\end{itemize}
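The basic splitting rule can be sketched as follows. This is a simplified illustration: buckets are represented by the ID range they cover, the ping of the least-recently seen contact is left out, and the extra splitting for unbalanced trees is not modelled. Class and attribute names are hypothetical.

\begin{minted}[]{python}
class Bucket:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi      # covers IDs in [lo, hi)
        self.ids = []

    def covers(self, node_id):
        return self.lo <= node_id < self.hi


class RoutingTable:
    def __init__(self, own_id, id_bits=160, k=20):
        self.own_id, self.k = own_id, k
        self.buckets = [Bucket(0, 1 << id_bits)]   # one bucket for everything

    def insert(self, node_id):
        """Try to insert a contact; returns False if it had to be dropped."""
        bucket = next(b for b in self.buckets if b.covers(node_id))
        if node_id in bucket.ids:
            return True
        if len(bucket.ids) < self.k:
            bucket.ids.append(node_id)
            return True
        if bucket.covers(self.own_id) and bucket.hi - bucket.lo > 1:
            # Full bucket containing our own ID: split it in half and retry.
            mid = (bucket.lo + bucket.hi) // 2
            low, high = Bucket(bucket.lo, mid), Bucket(mid, bucket.hi)
            for n in bucket.ids:
                (low if low.covers(n) else high).ids.append(n)
            pos = self.buckets.index(bucket)
            self.buckets[pos:pos + 1] = [low, high]
            return self.insert(node_id)
        return False   # full bucket away from our own ID: contact is dropped


table = RoutingTable(own_id=0, id_bits=4, k=2)
assert table.insert(8) and table.insert(9)
assert table.insert(10) is False          # far half is full and is not split
assert table.insert(1) is True            # near half has room after the split
assert [(b.lo, b.hi) for b in table.buckets] == [(0, 8), (8, 16)]
\end{minted}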
|
||
\item Efficient key re-publishing
|
||
\label{sec-1-3-3-5}
|
||
\begin{itemize}
|
||
\item Keys must be republished periodically. Otherwise data may disappear from the network as the nodes holding it leave, or end up stored on suboptimal nodes as new nodes closer to the key join.
\item To compensate for nodes leaving the network, Kademlia republishes each key-value pair once an hour.
|
||
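\item A node that receives a store RPC for a key-value pair assumes the RPC was also issued to the other $k-1$ closest nodes, and therefore does not republish the pair itself in the next hour. As long as republication intervals are not exactly synchronized, only one node will then republish a given key-value pair every hour.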
\end{itemize}
|
||
\end{enumerate}
|
||
\subsubsection{Implementation Notes}
|
||
\label{sec-1-3-4}
|
||
\begin{enumerate}
|
||
\item Optimized contact accounting
|
||
\label{sec-1-3-4-1}
|
||
\begin{itemize}
|
||
\item To reduce traffic, Kademlia delays probing contacts until it has useful messages to send them. When a Kademlia node receives an RPC from an unknown contact and the k-bucket for that contact is already full with k entries, the node places the new contact in a replacement cache of nodes eligible to replace stale k-bucket entries.
|
||
\item When a contact fails to respond to 5 RPCs in a row, it is considered stale. If a k-bucket is not full or its replacement cache is empty, Kademlia merely flags stale contacts rather than remove them. This ensures, among other things, that if a node’s own network connection goes down temporarily, the node won’t completely void all of its k-buckets.
\item This matters because Kademlia uses UDP, so an occasional lost packet should not cause a live contact to be dropped.
\end{itemize}
|
||
\item Accelerated lookups
|
||
\label{sec-1-3-4-2}
|
||
\begin{itemize}
|
||
\item Another optimization in the implementation is to achieve fewer hops per lookup by increasing the routing table size. Conceptually, this is done by considering IDs b bits at a time instead of just one bit at a time
|
||
\item This also changes the way buckets are split.
|
||
\item This also changes the XOR-based routing apparently.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\subsubsection{Summary}
|
||
\label{sec-1-3-5}
|
||
\begin{itemize}
|
||
\item With its novel XOR-based metric topology, Kademlia is the first peer-to-peer system to combine provable consistency and performance, latency-minimizing routing, and a symmetric, unidirectional topology. Kademlia furthermore introduces a concurrency parameter, $\alpha$, that lets people trade a constant factor in bandwidth for asynchronous lowest-latency hop selection and delay-free fault recovery. Finally, Kademlia is the first peer-to-peer system to exploit the fact that node failures are inversely related to uptime.
\end{itemize}
|
||
\subsection{Bouvin notes}
|
||
\label{sec-1-4}
|
||
\begin{itemize}
|
||
\item While the first generation of P2P networks was largely application specific and gave few guarantees, usually with worst-case $O(N)$ lookup time, the second generation is based on structured network overlays. These are typically capable of guaranteeing $O(\log N)$ time and space and exact matches.
\item Much more scalable than unstructured P2P networks, measured in number of hops for routing. However, churn results in control traffic; slow peers can slow down the entire system (especially in Chord); weak peers may be overwhelmed by control traffic.
\item The load is evenly distributed across the network, based on the uniformity of the ID space. More powerful peers can choose to host several virtual peers.
\item Most systems have various provisions for maintaining proper routing and defending against malicious peers
|
||
\item A backhoe is unlikely to take out a major part of the system – at least if we store at k closest nodes
|
||
\end{itemize}
|
||
|
||
\section{Mobile Ad-hoc Networks and Wireless Sensor Networks}
|
||
\label{sec-2}
|
||
\subsection{Routing in Mobile Ad-hoc Networks}
|
||
\label{sec-2-1}
|
||
\subsubsection{Introduction}
|
||
\label{sec-2-1-1}
|
||
\begin{itemize}
|
||
\item Routing is the process of passing some data, a message, along in a network.
|
||
\item The message originates from a source host, travels through intermediary hosts and ends up at a destination host.
|
||
\item Intermediary hosts are called routers
|
||
\item Usually a few questions have to be answered when a message is routed:
\begin{enumerate}
|
||
\item How do the hosts acting as routers know which way to send the message?
|
||
\item What should be done if multiple paths connect the sender and receiver?
|
||
\item Does an answer to the message have to follow the same path as the original message?
|
||
\end{enumerate}
|
||
\item Simple solution: Broadcast message, i.e. send it to every single person you know, every time.
|
||
\item Creates a lot of traffic, it's also known as flooding.
|
||
\item The responsibility of a routing protocol is to answer the three questions posed.
|
||
\end{itemize}
|
||
\subsubsection{Basic Routing Protocols}
|
||
\label{sec-2-1-2}
|
||
\begin{itemize}
|
||
\item A routing protocol must enable a path or route to be found, through the network
|
||
\item Internally, routers usually model the network as a graph.
\item This allows for edges to be weighted. Can either be distance, traffic or some metric like that
|
||
\item There are two classes, Link State (LS) and Distance Vector (DV). The main difference is whether or not they use global information.
\item Algorithms using global information are "Link State", as all nodes need to maintain state information about all links in the network.
|
||
\item Distance Vector algorithms do not rely on global information.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Link State
|
||
\label{sec-2-1-2-1}
|
||
\begin{itemize}
|
||
\item All nodes and links with weights are known to all nodes.
|
||
\item This makes the problem a SSSP problem (single-source shortest path)
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Dijkstra
\label{sec-2-1-2-1-1}
|
||
\begin{itemize}
|
||
\item A set W is initialised, containing only the source node v.
|
||
\item In each iteration, the node u outside W with the lowest total path cost from v, using only nodes in W as intermediate hops, is chosen and added to the set.
\item The algorithm loops $n-1$ times, after which the shortest paths to all other nodes have been found.
\item Requires each router to have complete knowledge of the network.
|
||
\item Can be accomplished by broadcasting the identities and costs of all of the outgoing links to all other routers in the network. This has to be done, every time a weight or link changes.
|
||
\item Unrealistic for anything but very small networks.
|
||
\item Works great for small stable networks however.
|
||
\end{itemize}
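A compact version of the algorithm, using a priority queue to pick the next node, could look like this. The graph representation (a dictionary of neighbour-to-cost dictionaries) and the example network are assumptions made for the sketch.

\begin{minted}[]{python}
import heapq


def dijkstra(graph, source):
    """graph maps node -> {neighbour: link cost}; returns shortest distances."""
    dist = {source: 0}
    done = set()                       # the set W of finished nodes
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)    # cheapest node not yet in W
        if u in done:
            continue
        done.add(u)
        for v, cost in graph[u].items():
            if v not in done and d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(queue, (d + cost, v))
    return dist


graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
assert dijkstra(graph, "A") == {"A": 0, "B": 1, "C": 3, "D": 4}
\end{minted}

Note that the whole \texttt{graph} has to be known at the node running the computation, which is the global-knowledge requirement mentioned above.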
|
||
\end{enumerate}
|
||
\item Distance Vector
|
||
\label{sec-2-1-2-2}
|
||
\begin{itemize}
|
||
\item No global knowledge is needed
|
||
\item The shortest distance to any given node is calculated in cooperation between the nodes.
|
||
\item Based on Bellman-Ford.
|
||
\item The original Bellman-Ford algorithm requires global knowledge; distance vector routing uses a distributed variant of it.
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Bellman-Ford
|
||
\label{sec-2-1-2-2-1}
|
||
\begin{itemize}
|
||
\item Decentralised, no global information is needed
|
||
\item Requires the information of the neighbours and the link costs between them and the node
|
||
\item Each node stores a distance table
|
||
\item The distance table is just a mapping between a node name and the distance to it. It's over all known nodes, so not just neighbours.
|
||
\item When a new node is encountered, this is simply added
|
||
\item Node sends updates to its neighbours. This message states that the distance from the node v to node u has changed. As such, the neighbours can compute their distance to this node u as well and update their table.
|
||
\item This update may cause a chain of updates, as the neighbours might discover that this new distance is better than what they currently had.
|
||
\item The route calculation is bootstrapped by having all nodes broadcast their distances to their neighbours, when the network is created.
|
||
\item Algorithm
|
||
\begin{enumerate}
|
||
\item Distance table is initialised for node x
|
||
\item Node x sends initial updates to neighbours
|
||
\item The algorithm loops, waiting for updates from neighbours or cost changes on directly connected links (i.e. links to neighbours)
\item Whenever either event is received, the appropriate actions are taken, such as sending updates or changing values in the distance table
|
||
\end{enumerate}
|
||
\item Generates less traffic, since a node only needs to know its neighbours.
\item Doesn't need global knowledge, general advantage in large networks or networks with high churn rate
|
||
\item Doesn't have to recompute the entire distance table whenever a single value changes, as Dijkstra's algorithm has to.
\item Suffers from the "count-to-infinity" problem, which happens when a route passes twice through the same node and the advertised distance keeps growing towards infinity. Consider a network A - B - C - D in which A dies. B sets its distance to A to infinity. When tables are shared, B sees that C knows a route to A of distance 2 and updates its own distance to 3 (1 to C, plus 2 from C to A). C then has to update its distance to A to 4, and so on.
\item A way of avoiding this (known as split horizon) is for a node not to advertise a route to the neighbour it reaches that destination through. So C shouldn't send B any information about A, as C's only route to A goes through B.
\end{itemize}
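The core of the distributed computation is the Bellman-Ford equation $D_x(y) = \min_{v} \{ c(x, v) + D_v(y) \}$, where $v$ ranges over the neighbours of $x$. The sketch below applies it once at a single node and then reproduces the first step of the count-to-infinity example; the function name and the table layout are assumptions.

\begin{minted}[]{python}
INF = float("inf")


def dv_update(own, link_cost, neighbour_vectors):
    """One Bellman-Ford step at node `own`.

    link_cost maps neighbour -> cost of the direct link.
    neighbour_vectors maps neighbour -> {destination: advertised distance}.
    Returns the new distance table {destination: (distance, next hop)}.
    """
    table = {own: (0, own)}
    for v, vector in neighbour_vectors.items():
        for dest, d in vector.items():
            candidate = link_cost[v] + d
            if candidate < table.get(dest, (INF, None))[0]:
                table[dest] = (candidate, v)
    return table


# Node B in the line A - B - C - D with unit link costs.
table = dv_update(
    "B",
    link_cost={"A": 1, "C": 1},
    neighbour_vectors={"A": {"A": 0, "B": 1}, "C": {"C": 0, "B": 1, "D": 1}},
)
assert table == {"B": (0, "B"), "A": (1, "A"), "C": (1, "C"), "D": (2, "C")}

# A dies, but C still advertises its old distance 2 to A (which went via B).
# B now believes it can reach A through C at cost 3: count-to-infinity begins.
stale = dv_update("B", {"C": 1}, {"C": {"C": 0, "B": 1, "D": 1, "A": 2}})
assert stale["A"] == (3, "C")
\end{minted}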
|
||
\end{enumerate}
|
||
\end{enumerate}
|
||
\subsubsection{MANET Routing}
|
||
\label{sec-2-1-3}
|
||
\begin{itemize}
|
||
\item Both Dijkstra and Bellman-Ford were designed to operate in fairly stable networks.
\item MANETs are usually quite unstable, as possibly all nodes are mobile and may be moving during communication.
|
||
\item MANETs typically consist of resource-poor, energy constrained devices with limited bandwidth and high error rates.
|
||
\item Also has missing infrastructure and high mobility
|
||
\item According to Belding-Royer (a co-author of AODV), the focus should be on the following properties
\begin{enumerate}
|
||
\item Minimal control overhead (due to limited energy and bandwidth)
|
||
\item Minimal processing overhead (Typically small processors)
|
||
\item Multihop routing capability (No infrastructure, nodes must act as routers)
|
||
\item Dynamic topology maintenance (High churn rates force the topology to be dynamic and capable of adapting easily)
\item Loop prevention (Loops just take a lot of bandwidth)
|
||
\end{enumerate}
|
||
\item MANET routing protocols are typically either proactive or reactive.
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Proactive
|
||
\label{sec-2-1-3-0-1}
|
||
\begin{itemize}
|
||
\item Every node maintains a routing table of routes to all other nodes in the network
|
||
\item Routing tables are updated whenever a change occurs in the network
|
||
\item When a node needs to send a message to another node, it has a route to that node in its routing table
|
||
\item Two examples of proactive protocols
|
||
\begin{enumerate}
|
||
\item Destination-sequenced distance vector (DSDV)
|
||
\item Optimised link state routing (OLSR)
\end{enumerate}
|
||
\end{itemize}
|
||
\item Reactive
|
||
\label{sec-2-1-3-0-2}
|
||
\begin{itemize}
|
||
\item Also called on-demand routing protocols
\item Does not maintain routing tables at all times
|
||
\item A route is discovered when it is needed, i.e. when the source has some data to send
|
||
\item Two examples of reactive protocols
|
||
\begin{enumerate}
|
||
\item Ad-hoc on demand distance vector (AODV)
|
||
\item Dynamic source routing (DSR)
|
||
\end{enumerate}
|
||
\end{itemize}
|
||
\item Combination of proactive and reactive
|
||
\label{sec-2-1-3-0-3}
|
||
\begin{itemize}
|
||
\item Zone Routing Protocol (ZRP)
|
||
\end{itemize}
|
||
\item Local connectivity management
|
||
\label{sec-2-1-3-1}
|
||
\begin{itemize}
|
||
\item All MANET protocols need a mechanism that allows discovery of neighbours
\item Neighbours are nodes within broadcast range, i.e. they can be reached within one hop
|
||
\item Neighbours can be found by periodically broadcasting "Hello" messages. These won't be relayed. These messages contain information about the neighbours known by the sending node.
|
||
\item When a hello message from x is received by y, y can check whether it is in x's neighbour list. If it is, the link must be bi-directional. Otherwise, it's likely uni-directional.
\end{itemize}
|
||
\item Destination-Sequenced Distance Vector
|
||
\label{sec-2-1-3-2}
|
||
\begin{itemize}
|
||
\item Uses sequence numbers to avoid loops.
|
||
\item Has message overhead which grows as $O(n^2)$ when a change occurs in the network.
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Using sequence numbers
|
||
\label{sec-2-1-3-2-1}
|
||
\begin{itemize}
|
||
\item Each node maintains a counter that represents its current sequence number. This counter starts at zero and is incremented by two whenever it is updated.
|
||
\item A sequence number set by a node itself, will always be even.
|
||
\item The number of a node is propagated through the network in the update messages that are sent to the neighbours.
\item Whenever an update message is sent, the sender increments its number and prefixes this to the message.
|
||
\item Whenever an update message is received, the receiver can read the sender's number. This information is stored in the receiving node's route table and is further propagated in the network in subsequent update messages regarding routes to that destination.
\item Like this, the sequence number set by the destination node is stamped on every route to that node.
|
||
\item Update messages thus contain: a destination node, the cost of the route, a next-hop node and the latest known destination sequence number.
\item On receiving an update message, these rules apply:
|
||
\begin{enumerate}
|
||
\item If the sequence number of the updated route is higher than what is currently stored, the route table is updated.
|
||
\item If the numbers are the same, the route with the lowest cost is chosen.
|
||
\end{enumerate}
|
||
\item If a node notices a link break, it sets the cost of the route to the unreachable node to infinity, increments that node's sequence number by one, and sends out an update.
\item Thus, the sequence number is odd, whenever a node discovers a link breakage.
|
||
\item Because of this, any further updates from the disappeared node, will automatically supersede this number
|
||
\item This makes DSDV loop free
|
||
\item Sequence numbers are changed in the following ways
|
||
\begin{enumerate}
|
||
\item When a link breaks. The number is changed by a neighbouring node. Link breakage can't form a loop
\item When a node sends an update message. The node changes its own sequence number and broadcasts this. This information is passed on from the neighbours.
|
||
\end{enumerate}
|
||
\item Thus, the closer you are, the more recent sequence number you know.
|
||
\item When picking routes, we trust the routers that know the most recent sequence number, in addition to picking the shortest route.
\end{itemize}
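The two acceptance rules and the link-break rule can be sketched as follows. The table layout and function names are assumptions, the advertised cost is taken to already include the hop to the advertising neighbour, and marking every route through the lost neighbour as broken goes slightly beyond what the notes above state.

\begin{minted}[]{python}
INF = float("inf")


def accept_update(table, dest, seq, cost, next_hop):
    """Apply one advertised route; returns True if the table changed.

    table maps destination -> (sequence number, cost, next hop).
    """
    current = table.get(dest)
    if (current is None
            or seq > current[0]                              # newer information
            or (seq == current[0] and cost < current[1])):   # same age, cheaper
        table[dest] = (seq, cost, next_hop)
        return True
    return False


def link_broken(table, next_hop):
    """Mark routes through next_hop as broken: infinite cost, odd sequence number."""
    for dest, (seq, cost, hop) in list(table.items()):
        if hop == next_hop and cost != INF:
            table[dest] = (seq + 1, INF, hop)


table = {}
assert accept_update(table, "D", seq=100, cost=3, next_hop="B")
assert accept_update(table, "D", seq=100, cost=2, next_hop="C")      # cheaper
assert not accept_update(table, "D", seq=98, cost=1, next_hop="E")   # stale
link_broken(table, "C")
assert table["D"] == (101, INF, "C")
# A fresh (even) number from D itself supersedes the odd one.
assert accept_update(table, "D", seq=102, cost=4, next_hop="B")
\end{minted}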
|
||
\item Sending updates
|
||
\label{sec-2-1-3-2-2}
|
||
\begin{itemize}
|
||
\item Two types of updates; full and incremental
|
||
\item Full updates contain information about all routes known by the sender. These are sent infrequently.
|
||
\item Incremental updates contain only changed routes. These are sent regularly.
|
||
\item Sending small incremental updates most of the time, and full updates only rarely, decreases the bandwidth used for control traffic.
\item Full updates may span multiple network protocol data units (NPDUs, i.e. network-layer packets), whereas an incremental update must fit in a single one. If the changes no longer fit in a single NPDU, a full update is sent instead.
\item When an update to a route is received, different actions are taken, depending on the information:
|
||
\begin{enumerate}
|
||
\item If it's a new route, schedule for immediate update, send incremental update ASAP
|
||
\item If a route has improved, send in next incremental update
|
||
\item If sequence number has changed, but route hasn't, send in next incremental if space
|
||
\end{enumerate}
|
||
\end{itemize}
|
||
\item Issue
|
||
\label{sec-2-1-3-2-3}
|
||
\begin{itemize}
|
||
\item Suffers from routing fluctuations
|
||
\item A node could repeatedly switch between a couple of routes
|
||
\item Essentially, an update with a new sequence number may arrive first via a longer route and only later via a shorter one. The node first switches to the longer route, because its sequence number is newer. When the same sequence number then arrives via the shorter route, the node has to switch again and advertise yet another change.
\item Fixed by introducing a delay. If the cost to a destination changes, this information is scheduled for advertisement at a time depending on the average settling time for that destination.
\end{itemize}
|
||
\end{enumerate}
|
||
\item Optimised Link State Routing
|
||
\label{sec-2-1-3-3}
|
||
\begin{itemize}
|
||
\item Designed to be effective in an environment with a dense population of mobile devices, which communicate often.
|
||
\item Introduces multipoint relay (MPR) sets. An MPR set is a subset of the one-hop neighbours of a node, used for relaying the messages of that node. The nodes that have selected a given node as an MPR are called its MPR selectors.
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Multipoint relay set
|
||
\label{sec-2-1-3-3-1}
|
||
\begin{itemize}
|
||
\item Selected independently by each node as a subset of its neighbours.
|
||
\item Selected such that the set covers all nodes that are two hops away
|
||
\item Doesn't have to be optimal
|
||
\item Each node stores a list of both its one-hop and two-hop neighbours. These are collected from the hello messages, which are broadcast anyway and list the sender's own neighbours. The neighbours of the one-hop neighbours therefore make up the set of two-hop neighbours, and a node can simply check whether its MPR set reaches all of them.
\end{itemize}
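Since the set does not have to be optimal, a greedy choice is enough for illustration. The sketch below repeatedly picks the one-hop neighbour that covers the most still-uncovered two-hop neighbours. This is one possible heuristic and not necessarily the one prescribed by OLSR; the function name and input layout are assumptions.

\begin{minted}[]{python}
def select_mprs(own, one_hop):
    """Pick a set of one-hop neighbours that reaches every two-hop neighbour.

    one_hop maps each one-hop neighbour -> set of that neighbour's own
    neighbours, as learned from its hello messages.
    """
    two_hop = set().union(*one_hop.values()) - set(one_hop) - {own}
    uncovered = set(two_hop)
    mprs = set()
    while uncovered:
        # Neighbour covering the most uncovered nodes; ties broken by name.
        best = max(sorted(one_hop), key=lambda n: len(one_hop[n] & uncovered))
        mprs.add(best)
        uncovered -= one_hop[best]
    return mprs


hello_info = {
    "A": {"S", "X", "Y"},
    "B": {"S", "Y"},
    "C": {"S", "Z"},
}
assert select_mprs("S", hello_info) == {"A", "C"}
\end{minted}

Here B is not needed as a relay, since A already reaches Y.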
|
||
\item Routing with MPR
|
||
\label{sec-2-1-3-3-2}
|
||
\begin{itemize}
|
||
\item A topology control (TC) message is required to create a routing table for the entire network
|
||
\item This is sent via the MPRs and will eventually reach the entire network. It causes less flooding than the standard LS algorithm.
\end{itemize}
|
||
\end{enumerate}
|
||
\item Ad-hoc On-Demand Distance Vector
|
||
\label{sec-2-1-3-4}
|
||
\begin{itemize}
|
||
\item Reactive
|
||
\item Routes are acquired when they are needed
|
||
\item Assumes symmetrical links
|
||
\item Uses sequence numbers to avoid loops
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Path Discovery
|
||
\label{sec-2-1-3-4-1}
|
||
\begin{itemize}
|
||
\item When a node wishes to send something, a path discovery mechanism is triggered
|
||
\item If node x wishes to send something to node y, but it doesn't know a route to y, a route request (RREQ) message is sent to x's neighbours. The RREQ contains:
\begin{enumerate}
|
||
\item Source address
|
||
\item Source seq no
|
||
\item Broadcast id - A unique id of the current RREQ
|
||
\item Destination addr
|
||
\item Destination seq no
|
||
\item Hop count - The number of hops so far, incremented when RREQ is forwarded
|
||
\end{enumerate}
|
||
\item (source addr, broadcast id) uniquely identifies a RREQ. This can be used to check if the RREQ has been seen before.
|
||
\item When RREQ is received, two actions can be taken
|
||
\begin{enumerate}
|
||
\item If a route to the destination is known and that route has a sequence number greater than or equal to the destination seq no in the RREQ, the node responds to the RREQ by sending a RREP (route reply) back to the source.
|
||
\item If it doesn't have a recent route, it broadcasts the RREQ to neighbours with an increased hop count.
|
||
\end{enumerate}
|
||
\item When a RREQ is received, the address of the neighbour from whom this was received, is recorded. This allows for the generation of a reverse path, should the destination node be found.
|
||
\item RREP contains source, destination addr, destination seq no, the total number of hops from source to dest and a lifetime value for the route.
|
||
\item If multiple RREPs are received by an intermediary node, the first one is forwarded. Later ones are forwarded only if their destination sequence number is higher, or if they have the same dest seq no but a lower hop count.
|
||
\item When the RREP is sent back to the source, the intermediary nodes record which node they received the RREP from, to generate a forward path to route data along.
|
||
\end{itemize}
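The handling of an incoming RREQ can be sketched as follows. This is a minimal illustration under assumptions, not the AODV specification: the class and field names (\texttt{RREQ}, \texttt{Node}, \texttt{handle\_rreq}) are hypothetical, the routing table is a plain dictionary, and timers, lifetimes and the RREP itself are left out.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class RREQ:
    source: str
    source_seq: int
    broadcast_id: int
    dest: str
    dest_seq: int
    hop_count: int = 0


@dataclass
class Node:
    addr: str
    # dest -> (next_hop, dest_seq, hops); hypothetical table layout
    routes: dict = field(default_factory=dict)
    # (source addr, broadcast id) pairs that were already processed
    seen: set = field(default_factory=set)
    # source -> neighbour the RREQ arrived from (the reverse path)
    reverse: dict = field(default_factory=dict)

    def handle_rreq(self, rreq, from_neighbour):
        key = (rreq.source, rreq.broadcast_id)
        if key in self.seen:
            return ("drop", None)  # duplicate of an RREQ seen before
        self.seen.add(key)
        self.reverse[rreq.source] = from_neighbour
        route = self.routes.get(rreq.dest)
        fresh = route is not None and route[1] >= rreq.dest_seq
        if self.addr == rreq.dest or fresh:
            # Reply travels back along the reverse path.
            return ("reply", from_neighbour)
        # No recent route: rebroadcast with an increased hop count.
        return ("forward", replace(rreq, hop_count=rreq.hop_count + 1))


b = Node("B")
print(b.handle_rreq(RREQ("A", 1, 7, "D", 3), "A"))
```

The pair (source addr, broadcast id) is what makes the duplicate check possible, so each node rebroadcasts a given request at most once.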
|
||
\item Evaluation
|
||
\label{sec-2-1-3-4-2}
|
||
\begin{itemize}
|
||
\item Tries to minimise control traffic flowing, by having nodes only maintain active routes.
|
||
\item Loops prevented with sequence numbers
|
||
\item No system wide broadcasts of entire routing tables
|
||
\item Every route is only maintained, as long as it's used. It has a timeout and is discarded, if this timeout is reached.
|
||
\item Path finding can be costly, as a lot of RREQs get propagated through the network
|
||
\item Expanding ring algorithm can help control the amount of messages going out, but if the receiver isn't close, this can be even more costly than the standard way
|
||
\item Upon link failure, the upstream neighbour sends a RREP with the seq. no. incremented by one and the hop count set to infinity to any active neighbours, that is, neighbours that are using the route.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\item Dynamic Source Routing
|
||
\label{sec-2-1-3-5}
|
||
\begin{itemize}
|
||
\item On-demand protocol
|
||
\item DSR is a source routing protocol. This is the main difference between DSR and AODV
|
||
\item Source routing is a technique, where every message contains a header describing the entire path that the message must follow.
|
||
\item When a message is received, the node checks if it's the destination node, if not, it forwards the message to the next node in the path.
|
||
\item There is no need for intermediate nodes to keep any state about active routes, as was the case in the AODV protocol.
|
||
\item DSR doesn't assume symmetrical links and can use uni-directional links, i.e. one route can be used from A to B and then a different route from B to A.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Path Discovery
|
||
\label{sec-2-1-3-5-1}
|
||
\begin{itemize}
|
||
\item Discovery is similar to AODV
|
||
\item RREQ contains the source and destination address and a request id.
|
||
\item Together, the source address and request id uniquely identify the RREQ
|
||
\item When an intermediate node receives a RREQ it does a few things.
|
||
\begin{enumerate}
|
||
\item If it has no route to the dest, it appends itself to the list of nodes in the RREQ and then forwards it to its neighbours
|
||
\item If it does have a route to the dest, it appends this route to the list of nodes and sends a RREP back to the source, containing this route.
|
||
\end{enumerate}
|
||
\item This system uses the same number of messages as AODV and finds the same routes.
|
||
\item When a node is ready to send RREP back to source, it can do one of 3 things:
|
||
\begin{enumerate}
|
||
\item If it already has a route to the source, it can send RREP back along this path
|
||
\item It can reverse the route in the RREP (i.e. the list that nodes append themselves to when forwarding the RREQ)
|
||
\item It can initiate a new RREQ to find a route to the source
|
||
\end{enumerate}
|
||
\item The second option assumes symmetrical links.
|
||
\item The third approach can cause a loop, as the source and the dest host can endlessly look for each other
|
||
\item Can be avoided by piggybacking the RREP on the second RREQ message. The receiver of this RREQ will be given a path to use when returning the reply.
|
||
\end{itemize}
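Route discovery with an accumulated node list can be sketched as a flood over a graph. The sketch rests on assumptions: the function name \texttt{discover\_route} and the adjacency dictionary are hypothetical, links are modelled as directed, and route caches at intermediate nodes are ignored, so only the destination answers.

```python
from collections import deque


def discover_route(links, source, dest):
    """Flood an RREQ from source and return the route it accumulates.

    links maps a node to the nodes it can transmit to (directed, so
    uni-directional links can be modelled). Returns None if unreachable.
    """
    queue = deque([[source]])  # each entry is the node list carried by one RREQ copy
    seen = {source}            # nodes that already handled this (source, request id)
    while queue:
        route = queue.popleft()
        node = route[-1]
        if node == dest:
            return route  # this list would be sent back in the RREP
        for neighbour in links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                # The neighbour appends itself before forwarding.
                queue.append(route + [neighbour])
    return None


links = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
route = discover_route(links, "A", "E")
print(route)
# Option two for the reply: reverse the recorded route.
# This only works if every link on it is symmetrical.
print(list(reversed(route)))
```

Because every copy of the request carries its own path, a node can also discard a copy whose list already contains the node itself, which is the loop check mentioned in the evaluation of DSR.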
|
||
\item Route cache
|
||
\label{sec-2-1-3-5-2}
|
||
\begin{itemize}
|
||
\item There is no route table
|
||
\item DSR uses a route cache of currently known routes. The route cache of a node is in effect a tree rooted at the node
|
||
\item This tree can contain multiple routes to a single destination
|
||
\item This makes it more robust against broken links: even if a link breaks, another route may still be usable
|
||
\item Might take up O(n²) space
|
||
\end{itemize}
|
||
\item Promiscuous mode operation
|
||
\label{sec-2-1-3-5-3}
|
||
\begin{itemize}
|
||
\item DSR takes advantage of the fact that wireless devices can overhear messages that aren't addressed to them.
|
||
\item Since messages tend to be broadcast, other nodes within range of the broadcast can also read the message
|
||
\item Having nodes overhear messages that are not addressed to them, is called promiscuous mode operation.
|
||
\item It's not required for DSR to work, but it improves the protocol.
|
||
\item To detect when two nodes on a path move out of transmission range of each other, some sort of acking mechanism must be used. This is usually done by using link-layer acks, but if such functionality isn't available, this must be done through other means.
|
||
\item A passive ack is when a host, after sending a message to the next hop host in a path, overhears that the receiving host is transmitting the message again. This can be taken as a sign, that the host has in fact received the message and is now in the process of forwarding it towards the next hop.
|
||
\item A host that overhears a message may add the route of the message to its route cache
|
||
\item The overheard message might also be an error message, in which case the route cache can be corrected.
|
||
\item Can also be used for route shortening: if A sends to B, who sends to C, but C overhears the message to B, C can send an RREP to A to let A know that the route can be shortened.
|
||
\end{itemize}
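The route shortening idea can be illustrated with a few lines of code. The helper \texttt{shorten\_route} is hypothetical and only shows the list manipulation; in the protocol the shortened route would be reported to the sender in an RREP.

```python
def shorten_route(route, sender, overhearer):
    """Cut out the hops between sender and overhearer, if any.

    The route is returned unchanged when the overhearer is not further
    along the route than the sender's regular next hop.
    """
    if sender not in route or overhearer not in route:
        return list(route)
    i, j = route.index(sender), route.index(overhearer)
    if j > i + 1:
        # Overhearer can be reached directly, so skip the nodes in between.
        return route[: i + 1] + route[j:]
    return list(route)


print(shorten_route(["A", "B", "C", "D"], "A", "C"))
```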
|
||
\item Evaluation
|
||
\label{sec-2-1-3-5-4}
|
||
\begin{itemize}
|
||
\item Like AODV, DSR only uses active routes, i.e. routes time out
|
||
\item Control messages used are kept low by using same optimisations as AODV
|
||
\item Storage overhead is O(n) - Route cache and information about recently received RREQ
|
||
\item Loops are easily avoided in source routing, since nodes can just check if they're already a part of a path. If so, message is discarded.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\item Zone Routing Protocol
|
||
\label{sec-2-1-3-6}
|
||
\begin{itemize}
|
||
\item Hybrid protocol
|
||
\item In ZRP, each node defines a zone consisting of all of its n-hop neighbours, where n may be varied.
|
||
\item Within this zone, the node proactively maintains a routing table of routes to all other nodes in the zone. This is done using intrazone routing protocol, which is LS based.
|
||
\item These zones can be used, when sending to nodes within the zone
|
||
\item Outside the zone, a reactive interzone routing scheme is used.
|
||
\item This uses a concept called bordercasting.
|
||
\item The source node sends a route request (essentially an RREQ message) to all of the nodes on the border of its zone.
|
||
\item These border nodes check if they can reach the dest directly. If not, they propagate the message to their border nodes.
|
||
\end{itemize}
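The zone and its border nodes can be computed with a bounded breadth-first search. This is a sketch under assumptions: the function names \texttt{zone} and \texttt{border\_nodes} and the adjacency dictionary are hypothetical, and the zone is taken to be every node at most \texttt{radius} hops away, with the border being the nodes at exactly that distance.

```python
from collections import deque


def zone(graph, node, radius):
    """Return {member: hop distance} for all nodes within `radius` hops."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if dist[current] == radius:
            continue  # do not expand past the edge of the zone
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def border_nodes(graph, node, radius):
    """Nodes at exactly `radius` hops: the targets of a bordercast."""
    return {n for n, d in zone(graph, node, radius).items() if d == radius}


# A chain A - B - C - D - E with symmetrical links.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(zone(graph, "A", 2), border_nodes(graph, "A", 2))
```

In the chain, a request from A for E would be bordercast to C, which finds E at the border of its own two-hop zone.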
|
||
\begin{enumerate}
|
||
\item Evaluation
|
||
\label{sec-2-1-3-6-1}
|
||
\begin{itemize}
|
||
\item Less control traffic when doing route discovery, as messages are either sent to border nodes (skipping a lot of intermediary hops) or they're sent directly to someone within the zone.
|
||
\item More control messages within limited range of the zones though.
|
||
\item Storage complexity of O(n²) where n is the number of neighbours within the zone.
|
||
\item Since LS is used, the running time is O(m + n log n), where m is edges connecting the n nodes in the zone.
|
||
\item In dense scenarios, ZRP won't be feasible.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\end{enumerate}
|
||
|
||
\subsection{Energy Efficient MANET Routing}
|
||
\label{sec-2-2}
|
||
\begin{itemize}
|
||
\item All of the protocols mentioned in chapter 2 try to minimise control traffic. This does save energy, since fewer messages are transmitted, but it is done primarily to avoid wasting bandwidth.
|
||
\end{itemize}
|
||
\subsubsection{Introduction to energy efficient routing}
|
||
\label{sec-2-2-1}
|
||
\begin{itemize}
|
||
\item Two main approaches
|
||
\begin{enumerate}
|
||
\item Power-save
|
||
\item Power-control
|
||
\end{enumerate}
|
||
\item Power-save is concerned with sleep states. In a power-save protocol the mobile nodes utilise that their network interfaces can enter into a sleep state where less energy is consumed.
|
||
\item Power-control utilises no sleep states. Instead the power used when transmitting data is varied; which also varies transmission range of nodes.
|
||
\item Power-control can save some energy, but the real energy saver is in power-save, as the real waste in most MANETs is idle time.
|
||
\item As such, power-save is the most important, but power-control can be used to complement it.
|
||
\item Goal of the energy efficiency is important to define:
|
||
\item One approach is to maximise overall lifetime of the entire network
|
||
\item Stronger nodes that have a longer battery life, may be asked to do a lot of the heavy lifting.
|
||
\item Another approach is to use minimum energy when routing, such that the route using the minimum amount of energy is taken.
|
||
\item The physical position of nodes can be important when making routing decisions.
|
||
\item Protocols tend to assume there is some positioning mechanism available, such as GPS.
|
||
\item This is not assumed here.
|
||
\item A third energy saving approach is load balancing. The protocol attempts to balance the load in such a way that it maximises overall lifetime. (This sounds a lot like having a few strong nodes do the heavy lifting)
|
||
\end{itemize}
|
||
\subsubsection{The power-control approach}
|
||
\label{sec-2-2-2}
|
||
\begin{itemize}
|
||
\item Power-control protocols cut down on energy consumption by controlling the transmission power of the wireless interfaces.
|
||
\item Turning down the transmission power when sending to neighbours saves energy for the sender. Since the range is also lowered, fewer nodes have to spend energy overhearing the message.
|
||
\item There is a non-linear relation between transmission range and energy used, thus, more hops might in fact yield less energy spent.
|
||
\item System called PARO uses this, as it allows more intermediary nodes, if this lowers the overall cost of the path.
|
||
\end{itemize}
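The non-linear relation can be made concrete with a small worked example. As an assumption that is not part of these notes, suppose the energy needed to reach a receiver at distance $d$ grows as $E(d) = c \cdot d^{\alpha}$, where $c$ is a constant and $\alpha \geq 2$ is a path-loss exponent. Sending directly is then compared with relaying through one node placed halfway:
\[
E_{\text{direct}} = c \cdot d^{\alpha}, \qquad
E_{\text{relay}} = 2 \cdot c \cdot \left(\frac{d}{2}\right)^{\alpha} = \frac{c \cdot d^{\alpha}}{2^{\alpha - 1}}
\]
For $\alpha = 2$ the two-hop route uses half the transmission energy of the direct one, and for $\alpha = 4$ one eighth. The sketch ignores any fixed per-hop cost (reception, processing), which would reduce the saving.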
|
||
\subsubsection{Power-save approach}
|
||
\label{sec-2-2-3}
|
||
\begin{itemize}
|
||
\item Protocols that use the power-save approach cut down on energy consumption by utilising the sleep states of the network interfaces
|
||
\item When a node sleeps, it can't participate in the network
|
||
\item This means these protocols have to either
|
||
\begin{enumerate}
|
||
\item use retransmissions of messages to make sure that a message is received
|
||
\item make sure that all of the nodes do not sleep at the same time, and thus delegate the work of routing data to the nodes that are awake.
|
||
\end{enumerate}
|
||
\item Power-save protocols define ways in which nodes can take turns sleeping and being awake, so that none, or at least a very small percentage of the messages sent in the network are lost, due to nodes being in the sleep state.
|
||
\item They are specifications of how it is possible to maximise the amount of time that nodes are sleeping, while still retaining the same connectivity and loss rates comparable to a network where no nodes are sleeping.
|
||
\item IEEE 802.11 ad hoc power saving mode, part of the IEEE standard, uses sleep states.
|
||
\item It uses the protocol on the link layer and is as such independent of which routing protocol is used on network layer.
|
||
\item BECA/AFECA uses retransmissions
|
||
\item Span specifies when nodes can sleep and delegates routing to the rest
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item IEEE
|
||
\label{sec-2-2-3-1}
|
||
\begin{itemize}
|
||
\item Beacon interval within which each node can take a number of actions
|
||
\item At the end of each beacon interval, the nodes compete for transmission of the next beacon; the first one to transmit wins.
|
||
\item At the beginning of each beacon interval all nodes must be awake.
|
||
\item It works in a few phases, where nodes can announce to receivers that they want to send stuff. After this phase, any node which wasn't contacted, can safely sleep.
|
||
\end{itemize}
|
||
\item BECA/AFECA
|
||
\label{sec-2-2-3-2}
|
||
\begin{itemize}
|
||
\item The difference between BECA and AFECA is that AFECA takes node density into consideration when determining the period of time that a node may sleep.
|
||
\item Both approaches are only power saving algorithms and not routing protocols. This means that they need to work together with some existing MANET routing protocol.
|
||
\item It makes sense to choose an on-demand routing protocol for this purpose, as a proactive protocol would keep the nodes awake.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Basic Energy-Conserving algorithm (BECA)
|
||
\label{sec-2-2-3-2-1}
|
||
\begin{itemize}
|
||
\item Based on retransmissions
|
||
\item Consists of timing information that defines the periods that nodes spend in the different states defined by the algorithm, and a specification of how many retransmissions are needed.
|
||
\item BECA has three states
|
||
\begin{enumerate}
|
||
\item sleeping
|
||
\item listening
|
||
\item active
|
||
\end{enumerate}
|
||
\item Some rules to ensure no messages are lost
|
||
\begin{enumerate}
|
||
\item $T_{\text{listen}} = T_{\text{retransmissions}}$
\item $T_{\text{sleep}} = k \cdot T_{\text{retransmissions}}$, for some $k$
\item $\text{number\_of\_retrans} \geq k + 1$
\item $T_{\text{idle}} = T_{\text{retransmissions}}$
|
||
\end{enumerate}
|
||
\item If A sends to B, but B sleeps, the message will be retransmitted $R \geq k + 1$ times with interval $T_{\text{retrans}}$, until the message has been received.
|
||
\item Since $T_{\text{sleep}}$ is defined as $k \cdot T_{\text{retrans}}$, at least one of the retransmissions will be received, even when B falls asleep just before A transmits the message.
|
||
\item Incurs higher latency: worst case $k \cdot T_{\text{retrans}}$ and on average $(k \cdot T_{\text{retrans}}) / 2$. This latency is added for each hop.
|
||
\item Thus, to keep this low, k must be somewhat small, which counteracts the energy saving.
|
||
\item Thus, one needs to find a nice ratio.
|
||
\item Apparently k = 1 is nice.
|
||
\item A nice feature of BECA, which also applies to AFECA, is that in high traffic scenarios, where all nodes are on at all times, nodes are simply kept in the active state. In this way the power saving mechanism is disabled and the performance of the protocol is thus as good as the underlying protocol.
|
||
\end{itemize}
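A worked example may help to see the trade-off. The numbers are illustrative assumptions only: take $k = 2$, $T_{\text{retrans}} = 100\,\text{ms}$ and a route of $h = 4$ hops on which every hop has to wait for a sleeping node.
\begin{align*}
T_{\text{sleep}} &= k \cdot T_{\text{retrans}} = 200\,\text{ms} \\
R &\geq k + 1 = 3 \\
\text{worst-case added latency} &= h \cdot k \cdot T_{\text{retrans}} = 800\,\text{ms} \\
\text{average added latency} &= h \cdot \frac{k \cdot T_{\text{retrans}}}{2} = 400\,\text{ms}
\end{align*}
If an idle node simply alternates between listening for $T_{\text{listen}}$ and sleeping for $T_{\text{sleep}}$ (an assumption about the idle cycle), it sleeps for a fraction $k / (k + 1)$ of the time, i.e. two thirds for $k = 2$ and half for $k = 1$.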
|
||
\item Adaptive Fidelity energy-conserving algorithm (AFECA)
|
||
\label{sec-2-2-3-2-2}
|
||
\begin{itemize}
|
||
\item Same power-save model as BECA, except that instead of $T_{\text{sleep}}$ it has $T_{\text{varia\_sleep}}$ (abbreviated $T_{\text{vs}}$)
|
||
\item $T_{\text{vs}}$ is varied according to the number of neighbours surrounding a node.
|
||
\item This is estimated when in listening state, according to how many are overheard.
|
||
\item Nodes are removed from the estimate once they time out after $T_{\text{gone}}$ time.
|
||
\item $T_{\text{vs}}$ is then defined as $T_{\text{vs}} = \text{Random}(1, \text{amount\_of\_neighbours}) \cdot T_{\text{sleep}}$
|
||
\item Sleep time of $(N \cdot T_{\text{sleep}}) / 2$ on average
|
||
\item Favours nodes in dense areas, due to $N$, which is $\text{amount\_of\_neighbours}$.
|
||
\item When $\text{number\_of\_retrans}$ isn't changed, but the sleep time is, packets might be lost. A fix could be to make this variable as well.
|
||
\item Apparently doubles the overall lifetime, as network density rises.
|
||
\end{itemize}
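The average sleep time and the remark about lost packets can be checked with a short calculation. It assumes that $\text{Random}(1, N)$ draws an integer uniformly from $1, \dots, N$, which the notes do not state.
\begin{align*}
E[T_{\text{vs}}] &= \frac{N + 1}{2} \cdot T_{\text{sleep}} \approx \frac{N \cdot T_{\text{sleep}}}{2} \quad \text{for large } N \\
\max T_{\text{vs}} &= N \cdot T_{\text{sleep}} = N \cdot k \cdot T_{\text{retrans}}
\end{align*}
With $N = 9$ neighbours this gives an expected sleep time of $5 \cdot T_{\text{sleep}}$. By the same argument as for BECA, a message is guaranteed to hit a listening period only if it is retransmitted at least $N \cdot k + 1$ times, so keeping the number of retransmissions at $k + 1$ can lose packets whenever the drawn multiplier is larger than one.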
|
||
\end{enumerate}
|
||
|
||
\item Span
|
||
\label{sec-2-2-3-3}
|
||
\end{enumerate}
|
||
|
||
\section{Accessing and Developing WoT}
|
||
\label{sec-3}
|
||
\subsection{Chapter 6}
|
||
\label{sec-3-1}
|
||
\subsubsection{REST STUFF}
|
||
\label{sec-3-1-1}
|
||
\begin{itemize}
|
||
\item The first layer is called access. This layer is aptly named Access because it covers the most fundamental piece of the WoT puzzle: how to connect a Thing to the web so that it can be accessed using standard web tools and libraries.
|
||
\item REST provides a set of architectural constraints that, when applied as a whole, emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems.
|
||
\item In short, if the architecture of any distributed system follows the REST constraints, that system is said to be RESTful.
|
||
\item Maximises interoperability and scalability
|
||
\item Five constraints: Client/server, Uniform interfaces, Stateless, Cacheable, Layered system
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Client/server
|
||
\label{sec-3-1-1-1}
|
||
\begin{itemize}
|
||
\item Maximises decoupling, as client doesn't need to know how the server works and vice versa
|
||
\item Such a separation of concerns between data, control logic, and presentation improves scalability and portability because loose coupling means each component can exist and evolve independently.
|
||
\end{itemize}
|
||
\item Uniform interfaces
|
||
\label{sec-3-1-1-2}
|
||
\begin{itemize}
|
||
\item Loose coupling between components can be achieved only when using a uniform interface that all components in the system respect.
|
||
\item This is also essential for the Web of Things because new, unknown devices can be added to and removed from the system at any time, and interacting with them will require minimal effort.
|
||
\end{itemize}
|
||
\item Stateless
|
||
\label{sec-3-1-1-3}
|
||
\begin{itemize}
|
||
\item The client context and state should be kept only on the client, not on the server.
|
||
\item Each request to the server should contain the client state. This improves visibility (monitoring and debugging of the server), robustness (recovering from network or application failures) and scalability.
|
||
\end{itemize}
|
||
\item Cacheable
|
||
\label{sec-3-1-1-4}
|
||
\begin{itemize}
|
||
\item Caching is a key element in the performance (loading time) of the web today and therefore its usability.
|
||
\item Servers can define policies as to when data expires and when updates must be reloaded from the server.
|
||
\end{itemize}
|
||
\item Layered
|
||
\label{sec-3-1-1-5}
|
||
\begin{itemize}
|
||
\item For example, in order to scale, you may make use of a proxy behaving like a load balancer. The sole purpose of the proxy would then be to forward incoming requests to the appropriate server instance.
|
||
\item Another layer may behave like a gateway, and translate HTTP requests to other protocols.
|
||
\item Similarly, there may be another layer in the architecture responsible for caching responses in order to minimize the work needed to be done by the server.
|
||
\end{itemize}
|
||
\item HATEOAS
|
||
\label{sec-3-1-1-6}
|
||
\begin{itemize}
|
||
\item Servers shouldn’t keep track of each client’s state because stateless applications are easier to scale. Instead, application state should be addressable via its own URL, and each resource should contain links and information about what operations are possible in each state and how to navigate across states. HATEOAS is particularly useful at the Find layer
|
||
\end{itemize}
|
||
\item Principles of the uniform interface of the web
|
||
\label{sec-3-1-1-7}
|
||
\begin{itemize}
|
||
\item Our point here is that what REST and HTTP have done for the web, they can also do for the Web of Things. As long as a Thing follows the same rules as the rest of the web—that is, shares this uniform interface—that Thing is truly part of the web. In the end, the goal of the Web of Things is this: make it possible for any physical object to be accessed via the same uniform interface as the rest of the web. This is exactly what the Access layer enables
|
||
\item Addressable resources—A resource is any concept or piece of data in an application that needs to be referenced or used. Every resource must have a unique identifier and should be addressable using a unique referencing mechanism. On the web, this is done by assigning every resource a unique URL.
|
||
\item Manipulation of resources through representations—Clients interact with services using multiple representations of their resources. Those representations include HTML, which is used for browsing and viewing content on the web, and JSON, which is better for machine-readable content.
|
||
\item Self-descriptive messages—Clients must use only the methods provided by the protocol—GET, POST, PUT, DELETE, and HEAD among others—and stick to their meaning as closely as possible. Responses to those operations must use only well-known response codes—HTTP status codes, such as 200, 302, 404, and 500.
|
||
\item Hypermedia as the engine of the application state (HATEOAS)—Servers shouldn’t keep track of each client’s state because stateless applications are easier to scale. Instead, application state should be addressable via its own URL, and each resource should contain links and information about what operations are possible in each state and how to navigate across states.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Principle \#1, addressable resources
|
||
\label{sec-3-1-1-7-1}
|
||
\begin{itemize}
|
||
\item REST is a resource-oriented architecture (ROA)
|
||
\item A resource is explicitly identified and can be individually addressed, by its URI
|
||
\item A URI is a sequence of characters that unambiguously identifies an abstract or physical resource. There are many possible types of URIs, but the ones we care about here are those used by HTTP to both identify and locate on a network a resource on the web, which is called the URL (Uniform Resource Locator) for that resource.
|
||
\item An important and powerful consequence of this is the addressability and portability of resource identifiers: they become unique (internet- or intranet-wide)
|
||
\item Hierarchical naming!
|
||
\end{itemize}
|
||
\item Principle \#2, manipulation of resources through representation
|
||
\label{sec-3-1-1-7-2}
|
||
\begin{itemize}
|
||
\item On the web, Multipurpose Internet Mail Extensions (MIME) types have been introduced as standards to describe various data formats transmitted over the internet, such as images, video, or audio. The MIME type for an image encoded as PNG is expressed with image/png and an MP3 audio file with audio/mp3. The Internet Assigned Numbers Authority (IANA) maintains the list of all the official MIME media types.
|
||
\item The tangible instance of a resource is called a representation, which is a standard encoding of a resource using a MIME type.
|
||
\item HTTP defines a simple mechanism called content negotiation that allows clients to request a preferred data format they want to receive from a specific service. Using the Accept header, clients can specify the format of the representation they want to receive as a response. Likewise, servers specify the format of the data they return using the Content-Type header.
|
||
\item The Accept: header of an HTTP request can also contain not just one but a weighted list of media types the client understands
|
||
\item MessagePack can be used to pack JSON into a binary format, to make it lighter.
|
||
\item A common way of dealing with unofficial MIME types is to use the x- extension, so if you want your client to ask for MessagePack, use Accept: application/x-msgpack.
|
||
\end{itemize}
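Content negotiation with a weighted Accept header can be sketched as below. It is a simplified illustration, not a complete implementation of the HTTP rules: the function names \texttt{parse\_accept} and \texttt{negotiate} are hypothetical, only the \texttt{q} parameter and the \texttt{*/*} wildcard are handled, and partial wildcards such as \texttt{text/*} are ignored.

```python
def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        weight = 1.0  # a media type without q counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                weight = float(value)
        prefs.append((fields[0], weight))
    # sorted() is stable, so equal weights keep their order in the header.
    ranked = sorted(prefs, key=lambda pref: -pref[1])
    return [media for media, weight in ranked if weight > 0]


def negotiate(header, available):
    """Pick the representation to serve, or None if nothing matches."""
    for media in parse_accept(header):
        if media in available:
            return media
        if media == "*/*":
            return available[0]  # client accepts anything: serve the default
    return None  # a server would typically answer 406 Not Acceptable


served = ["application/json", "text/html"]
print(negotiate("text/html;q=0.5, application/json", served))
```

Listing JSON first in \texttt{served} mirrors the design rule that JSON is the default representation of a web Thing.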
|
||
\item Principle \#3: self-descriptive messages
|
||
\label{sec-3-1-1-7-3}
|
||
\begin{itemize}
|
||
\item REST emphasizes a uniform interface between components to reduce coupling between operations and their implementation. This requires every resource to support a standard, common set of operations with clearly defined semantics and behavior.
|
||
\item The most commonly used HTTP methods are GET, POST, PUT, DELETE, and HEAD. Although it seems that you could do everything with just GET and POST, it’s important to correctly use all four main verbs (GET, POST, PUT, and DELETE) to avoid bad surprises in your applications or introducing security risks.
|
||
\item CRUD operations; create, read, update and delete
|
||
\item HEAD is a GET, but only returns the headers
|
||
\item POST should be used only to create a new instance of something that doesn’t have its own URL yet
|
||
\item PUT is usually modeled as an idempotent but unsafe update method. You should use PUT to update something that already exists and has its own URL, but not to create a new resource
|
||
\item Unlike POST, it’s idempotent because sending the same PUT message once or 10 times will have the same effect, whereas a POST would create 10 different resources.
|
||
\item A bunch of status codes as well: 200, 201, 202, 401, 404, 500, 501
|
||
\item CORS—ENABLING CLIENT-SIDE JAVASCRIPT TO ACCESS RESOURCES
|
||
\end{itemize}
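The difference between POST and PUT can be shown with a tiny in-memory store. Everything here is a hypothetical stand-in for a real server: the class \texttt{Store}, its methods and the URLs are made up for illustration, and the returned numbers are the HTTP status codes a server might answer with.

```python
import itertools


class Store:
    """In-memory stand-in for a server holding resources by URL."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, collection, body):
        # POST creates a new resource; the server picks its URL.
        url = f"{collection}/{next(self._ids)}"
        self.items[url] = body
        return 201, url

    def put(self, url, body):
        # PUT updates an existing resource at a known URL.
        if url not in self.items:
            return 404, None
        self.items[url] = body
        return 200, url


store = Store()
for _ in range(3):
    store.post("/things", {"led": "on"})     # three requests, three resources
for _ in range(10):
    store.put("/things/1", {"led": "off"})   # ten requests, same end state
print(len(store.items), store.items["/things/1"])
```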
|
||
\item CORS
|
||
\label{sec-3-1-1-7-4}
|
||
\begin{itemize}
|
||
\item Although accessing web resources from different origins located on various servers in any server-side application doesn’t pose any problem, JavaScript applications running in web browsers can’t easily access resources across origins for security reasons. What we mean by this is that a bit of client-side JavaScript code loaded from the domain apples.com won’t be allowed by the browser to retrieve particular representations of resources from the domain oranges.com using particular verbs.
|
||
\item This security mechanism is known as the same-origin policy and is there to ensure that a site can’t load any scripts from another domain. In particular, it ensures that a site can’t misuse cookies to use your credentials to log onto another site.
|
||
\item Fortunately for us, a new standard mechanism called cross-origin resource sharing (CORS) has been developed and is well supported by most modern browsers and web servers.
|
||
\end{itemize}
|
||
When a script in the browser wants to make a cross-site request, it needs to include an Origin header containing the origin domain. The server replies with an Access-Control-Allow-Origin header that contains the list of allowed origin domains (or * to allow all origin domains).
|
||
\begin{itemize}
|
||
\item When the browser receives the reply, it will check to see if the Access-Control-Allow-Origin corresponds to the origin, and if it does, it will allow the cross-site request.
|
||
\end{itemize}
|
||
For verbs other than GET/HEAD, or when using POST with representations other than application/x-www-form-urlencoded, multipart/form-data, or text/plain, an additional request called preflight is needed. A preflight request is an HTTP request with the verb OPTIONS that’s used by a browser to ask the target server whether it’s safe to send the cross-origin request.
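The two checks described for CORS can be written down as small predicates. This sketch follows the rules only as stated in these notes: the function names are hypothetical, the origin check is simplified to a wildcard or an exact match, and real browsers apply further conditions (for instance on custom request headers) that are not modelled here.

```python
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, content_type=None):
    """True if the browser first has to send an OPTIONS preflight request."""
    if method in ("GET", "HEAD"):
        return False
    if method == "POST":
        return content_type not in SIMPLE_CONTENT_TYPES
    return True  # PUT, DELETE and other verbs


def browser_allows(origin, allow_origin_header):
    """Compare the page's origin with Access-Control-Allow-Origin."""
    return allow_origin_header == "*" or allow_origin_header == origin


print(needs_preflight("POST", "application/json"))
print(browser_allows("http://apples.com", "http://oranges.com"))
```

A JSON POST from a page on apples.com to oranges.com would therefore be preceded by a preflight, and the actual request is only sent if the reply permits that origin.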
|
||
\item Principle \#4 : Hypermedia as the Engine of Application State
|
||
\label{sec-3-1-1-7-5}
|
||
\begin{itemize}
|
||
\item contains two subconcepts: hypermedia and application state.
|
||
\item This fourth principle is centered on the notion of hypermedia, the idea of using links as connections between related ideas.
|
||
\item Links have become highly popular thanks to web browsers yet are by no means limited to human use. For example, UUIDs used to identify RFID tags are also links.
|
||
\item Based on this representation of the device, you can easily follow these links to retrieve additional information about the subresources of the device
|
||
\item The application state—the AS in HATEOAS—refers to a step in a process or workflow, similar to a state machine, and REST requires the engine of application state to be hypermedia driven.
|
||
\item Each possible state of your device or application needs to be a RESTful resource with its own unique URL, where any client can retrieve a representation of the current state and also the possible transitions to other states. Resource state, such as the status of an LED, is kept on the server and each request is answered with a representation of the current state and with the necessary information on how to change the resource state, such as turn off the LED or open the garage door.
|
||
\item In other words, applications can be stateful as long as client state is not kept on the server and state changes within an application happen by following links, which meets the self-contained-messages constraint.
|
||
\item The OPTIONS verb can be used to retrieve the list of operations permitted by a resource, as well as metadata about invocations on this resource.
|
||
\end{itemize}
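Following links instead of constructing URLs by hand can be sketched with plain dictionaries. The representations, URLs and link names below are hypothetical examples, and \texttt{follow} stands in for a client that would issue one GET per step.

```python
# Hypothetical JSON-like representations, keyed by URL.
# Each one lists the links a client may follow from that resource.
RESOURCES = {
    "/": {"name": "example device",
          "links": {"sensors": "/sensors", "actuators": "/actuators"}},
    "/sensors": {"links": {}},
    "/actuators": {"links": {"leds": "/actuators/leds"}},
    "/actuators/leds": {"links": {"led1": "/actuators/leds/1"}},
    "/actuators/leds/1": {"value": True, "links": {}},
}


def follow(resources, start, rels):
    """Navigate from `start` by link relation names only."""
    url = start
    for rel in rels:
        # The client only knows the relation name; the server supplies the URL.
        url = resources[url]["links"][rel]
    return url, resources[url]


url, representation = follow(RESOURCES, "/", ["actuators", "leds", "led1"])
print(url, representation["value"])
```

The client starts at the root URL and discovers everything else from the links in each representation, which is why a GET on the root URL matters for browsability.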
|
||
|
||
\item Five-step process
|
||
\label{sec-3-1-1-7-6}
|
||
\begin{itemize}
|
||
\item A RESTful architecture makes it possible to use HTTP as a universal protocol for web-connected devices. We described the process of web-enabling Things, which is summarized in the five main steps of the web Things design process:
|
||
\item Integration strategy—Choose a pattern to integrate Things to the internet and the web, either directly or through a proxy or gateway. This will be covered in chapter 7, so we’ll skip this step for now.
|
||
\item Resource design—Identify the functionality or services of a Thing and organize the hierarchy of these services. This is where we apply design rule \#1: addressable resources.
|
||
\item Representation design—Decide which representations will be served for each resource. The right representation will be selected by the clients, thanks to design rule \#2: content negotiation.
|
||
\item Interface design—Decide which commands are possible for each service, along with which error codes. Here we apply design rule \#3: self-descriptive messages.
|
||
\item Resource linking design—Decide how the different resources are linked to each other and especially how to expose those resources and links, along with the operations and parameters they can use. In this final step we use design rule \#4: Hypermedia as the Engine of Application State.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
|
||
\item Design rules
|
||
\label{sec-3-1-1-8}
|
||
\begin{enumerate}
|
||
\item \#2–CONTENT NEGOTIATION
|
||
\label{sec-3-1-1-8-1}
|
||
\begin{itemize}
|
||
\item Web Things must support JSON as their default representation.
|
||
\item Web Things support UTF8 encoding for requests and responses
|
||
\item Web Things may offer an HTML interface/representation (UI).
|
||
\end{itemize}
|
||
\item \#3 : Self-descriptive messages
|
||
\label{sec-3-1-1-8-2}
|
||
\begin{itemize}
|
||
\item Web Things must support the GET, POST, PUT, and DELETE HTTP verbs.
|
||
\item Web Things must implement HTTP status codes 20x, 40x, 50x.
|
||
\item Web Things must support a GET on their root URL.
|
||
\item Web Things should support CORS
|
||
\end{itemize}
|
||
\item \#4 : HATEOAS
|
||
\label{sec-3-1-1-8-3}
|
||
\begin{itemize}
|
||
\item Web Things should support browsability with links.
|
||
\item Web Things may support OPTIONS for each of its resources.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\end{enumerate}
|
||
|
||
\subsubsection{EVENT STUFF}
|
||
\label{sec-3-1-2}
|
||
\begin{enumerate}
|
||
\item Events and stuff
|
||
\label{sec-3-1-2-1}
|
||
\begin{itemize}
|
||
\item Unfortunately, the request-response model is insufficient for a number of IoT use cases. More precisely, it doesn’t match event-driven use cases where events must be communicated (pushed) to the clients as they happen.
|
||
\item A client-initiated model isn’t practical for applications where notifications need to be sent asynchronously by a device to clients as soon as they’re produced.
|
||
\item Polling is one way of circumventing the problem, but it is inefficient: the client has to make many requests, most of which simply return the same response. Additionally, a poll rarely happens at the exact time an event takes place, so the client learns about events late.
|
||
\item Most of the requests will end up with empty responses (304 Not Modified) or with the same response as long as the value observed remains unchanged.
|
||
\end{itemize}
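The cost of polling can be made concrete with a small simulation. This is only a sketch: the function \texttt{poll} and the list of readings are made up for illustration, and each list element stands for the value the server would return to one poll.

\begin{minted}{python}
def poll(readings):
    """Poll once per tick and classify each response.

    `readings` is the sensor value seen at each poll (hypothetical data).
    Returns (useful, redundant): polls that saw a new value versus polls
    answered with an unchanged value (for example 304 Not Modified).
    """
    useful = redundant = 0
    last_seen = object()  # sentinel: nothing seen yet
    for value in readings:
        if value != last_seen:
            useful += 1
            last_seen = value
        else:
            redundant += 1
    return useful, redundant


# A temperature that changes twice over ten polls.
useful, redundant = poll([21, 21, 21, 21, 22, 22, 22, 22, 22, 23])
\end{minted}

In this run only three of the ten requests carry new information; the other seven are pure overhead for both the client and the device.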
|
||
\item Publish/subscribe
|
||
\label{sec-3-1-2-2}
|
||
\begin{itemize}
|
||
\item What’s really needed on top of the request-response pattern is a model called publish/subscribe (pub/sub) that allows further decoupling between data consumers (subscribers) and producers (publishers). Publishers send messages to a central server, called a broker, that handles the routing and distribution of the messages to the various subscribers, depending on the type or content of messages.
|
||
\item A publisher sends notifications to a topic, and every subscriber that has subscribed to that topic receives them.
|
||
\end{itemize}
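The decoupling described above can be sketched with a tiny in-memory broker. The class \texttt{Broker} and the topic names are assumptions made for this example; a real deployment would use a networked broker rather than function calls.

\begin{minted}{python}
from collections import defaultdict


class Broker:
    """In-memory broker: routes each message to the subscribers of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never learns who (if anyone) receives the message.
        for callback in list(self._subscribers.get(topic, [])):
            callback(topic, message)


broker = Broker()
received = []
broker.subscribe("home/livingroom/temperature",
                 lambda topic, message: received.append(message))
broker.publish("home/livingroom/temperature", {"value": 21.5})
broker.publish("home/kitchen/humidity", {"value": 40})  # no subscriber: dropped
\end{minted}

Publishers and subscribers only share the topic name, so either side can be added or removed without the other noticing.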
|
||
\item Webhooks
|
||
\label{sec-3-1-2-3}
|
||
\begin{itemize}
|
||
\item The simplest way to implement a publish-subscribe system over HTTP without breaking the REST model is to treat every entity as both a client and a server. This way, both web Things and web applications can act as HTTP clients by initiating requests to other servers, and they can host a server that can respond to other requests at the same time. This pattern is called webhooks or HTTP callbacks and has become popular on the web for enabling different servers to talk to each other.
|
||
\item The implementation of this model is fairly simple. All we need is to implement a REST API on both the Thing and on the client, which then becomes a server as well. This means that when the Thing has an update, it POSTs it via HTTP to the client
|
||
\item Webhooks are a conceptually simple way to implement bidirectional communication between clients and servers by turning everything into a server.
|
||
\item Webhooks have one big drawback: because the subscriber needs an HTTP server to receive the pushed notifications, this works only when the subscriber has a publicly accessible URL or IP address.
|
||
\end{itemize}
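A minimal sketch of the webhook pattern using only the Python standard library. The callback path \texttt{/callbacks/temperature}, the handler name and the choice of a 204 reply are assumptions for this example; the subscriber runs on localhost here, which sidesteps the public-address drawback mentioned above.

\begin{minted}{python}
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

notifications = []  # what the subscriber has received so far


class CallbackHandler(BaseHTTPRequestHandler):
    """The subscriber's HTTP server: accepts notifications POSTed by the Thing."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        notifications.append(json.loads(self.rfile.read(length)))
        self.send_response(204)  # received, nothing to return
        self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass


def notify(callback_url, event):
    """Thing side: push an event to a registered callback URL."""
    request = urllib.request.Request(
        callback_url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any system proxy: the callback lives on localhost in this sketch.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return response.status


# The subscriber listens on a free local port, then the Thing pushes one update.
server = HTTPServer(("127.0.0.1", 0), CallbackHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status = notify(f"http://127.0.0.1:{server.server_port}/callbacks/temperature",
                {"temperature": 22.1})
server.shutdown()
server.server_close()
\end{minted}

Note that the roles are inverted compared to polling: the Thing is the HTTP client and the interested application is the HTTP server.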
|
||
\item Comet
|
||
\label{sec-3-1-2-4}
|
||
\begin{itemize}
|
||
\item Comet is an umbrella term that refers to a range of techniques for circumventing the limitations of HTTP polling and webhooks by introducing event-based communication over HTTP.
|
||
\item This model enables web servers to push data back to the browser without the client requesting it explicitly. Since browsers were initially not designed with server-sent events in mind, web application developers have exploited several specification loopholes to implement Comet-like behavior, each with different benefits and drawbacks.
|
||
\item Among them is a technique called long polling
|
||
\item With long polling, a client sends a standard HTTP request to the server, but instead of responding right away, the server holds the request until an event is received from the sensor; the event is then injected into the response to the request that was held idle. As soon as the client receives the response, it immediately sends a new request for an update, which will be held until the next update comes from the sensor, and so on.
|
||
\end{itemize}
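The hold-and-respond cycle of long polling can be sketched without any networking by letting a queue stand in for the sensor. The function names and the 204 status returned on timeout are assumptions of this sketch, not something prescribed by the text.

\begin{minted}{python}
import queue
import threading

events = queue.Queue()  # sensor events waiting to be delivered


def handle_long_poll(timeout=5.0):
    """Server side: hold the request until an event arrives or the timeout hits."""
    try:
        return {"status": 200, "body": events.get(timeout=timeout)}
    except queue.Empty:
        return {"status": 204, "body": None}  # nothing happened; client re-polls


def client(updates_wanted, received):
    """Client side: re-issue the request as soon as a response comes back."""
    while len(received) < updates_wanted:
        response = handle_long_poll()
        if response["status"] == 200:
            received.append(response["body"])


received = []
worker = threading.Thread(target=client, args=(2, received))
worker.start()
for reading in (21.0, 21.5):  # the sensor produces two events
    events.put(reading)
worker.join()
\end{minted}

The blocking \texttt{events.get} call plays the role of the request being held idle; each delivered event costs exactly one request instead of many empty ones.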
|
||
\item Websockets
|
||
\label{sec-3-1-2-5}
|
||
\begin{itemize}
|
||
\item WebSocket was introduced as part of the HTML5 effort (the protocol itself is specified in RFC 6455). The increasing support for HTML5 in recent desktop and mobile web browsers means WebSocket is becoming ubiquitously available to all web apps.
|
||
\item WebSocket enables a full-duplex communication channel over a single TCP connection. In plain English, this means that it creates a permanent link between the client and the server that both the client and the server can use to send messages to each other. Unlike techniques we’ve seen before, such as Comet, WebSocket is standard and opens a TCP socket. This means it doesn’t need to encapsulate custom, non-web content in HTTP messages or keep the connection artificially alive as is needed with Comet implementations.
|
||
\item A WebSocket connection starts with a handshake: the client sends an HTTP request to the server with a special header asking for the protocol to be upgraded to WebSocket. If the web server supports WebSocket, it replies with a 101 Switching Protocols status code, acknowledging the opening of a full-duplex TCP socket.
|
||
\item Once the initial handshake takes place, the client and the server will be able to send messages back and forth over the open TCP connection; these messages are not HTTP messages but WebSockets data frames
|
||
\item The overhead of each WebSocket data frame can be as small as 2 bytes, compared to the 871-byte overhead of an HTTP message's metadata (headers and the like).
|
||
\item The hierarchical structure of Things and their resources as URLs can be reused as-is for WebSockets.
|
||
\item We can subscribe to events for a Thing’s resource by using its corresponding URL and asking for a protocol upgrade to WebSockets. Moreover, WebSockets do not dictate the format of messages that are sent back and forth. This means we can happily use JSON and give messages the structure and semantics we want.
|
||
\item Moreover, because WebSockets consist of an initial handshake followed by basic message framing layered over TCP, they can be directly implemented on many platforms supporting TCP/IP, not just web browsers. They can also be used to wrap several other internet-compatible protocols to make them web-compatible. One example is MQTT, a well-known pub/sub protocol for the IoT that can be integrated into the web of browsers via WebSockets.
|
||
\item The drawback, however, is that keeping a TCP connection permanently open can lead to an increase in battery consumption and is harder to scale than HTTP on the server side.
|
||
\end{itemize}
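Two details of the protocol are small enough to sketch directly: the value the server computes for the \texttt{Sec-WebSocket-Accept} header during the handshake, and the 2-byte header of a short frame. The function names are made up for this example; the GUID and the frame layout follow RFC 6455, and the sketch only covers unmasked text frames of at most 125 bytes.

\begin{minted}{python}
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing the handshake response.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Value the server returns in Sec-WebSocket-Accept with its 101 response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def text_frame(message):
    """Encode a short, unmasked (server-to-client) text frame."""
    payload = message.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("longer payloads need an extended length field")
    # Byte 1: FIN bit set + opcode 0x1 (text). Byte 2: mask bit 0 + length.
    return bytes([0x81, len(payload)]) + payload


message = '{"temperature": 21.5}'
frame = text_frame(message)
overhead = len(frame) - len(message.encode("utf-8"))
\end{minted}

Frames sent by a client are additionally masked with a 4-byte key, so the 2-byte figure is the lower bound rather than the overhead in every direction.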
|
||
\item HTTP/2
|
||
\label{sec-3-1-2-6}
|
||
\begin{itemize}
|
||
\item This new version of HTTP allows multiplexing responses, that is, sending responses in parallel. This fixes the head-of-line blocking problem of HTTP/1.x, where only one request can be outstanding on a TCP/IP connection at a time.
|
||
\item HTTP/2 also introduces compressed headers using an efficient and low-memory compression format.
|
||
\item Finally, HTTP/2 introduces the notion of server push. Concretely, this means that the server can provide content to clients without having to wait for them to send a request. In the long run, widespread adoption of server push over HTTP/2 might even remove the need for an additional protocol for push like WebSocket or webhooks.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\subsubsection{SUMMARY}
|
||
\label{sec-3-1-3}
|
||
\begin{itemize}
|
||
\item When applied correctly, the REST architecture is an excellent substrate on which to create large-scale and flexible distributed systems.
|
||
\item REST APIs are interesting and easily applicable to enable access to data and services of physical objects and other devices.
|
||
\item Various mechanisms, such as content negotiation, caching, or Hypermedia as the Engine of Application State (HATEOAS), can help in creating great APIs for Things.
|
||
\item A five-step design process (integration strategy, resource design, representation design, interface design, and resource linking) allows anyone to create a meaningful REST API for Things based on industry best practices.
|
||
\item The latest developments in the real-time web, such as WebSockets, allow creating highly scalable, distributed, and heterogeneous real-time data processing applications. Devices that speak directly to the web can easily use web-based push messaging to stream their sensor data efficiently.
|
||
\item HTTP/2 will bring a number of interesting optimizations for Things, such as multiplexing and compression.
|
||
\end{itemize}
|
||
\subsection{Chapter 7}
|
||
\label{sec-3-2}
|
||
\subsubsection{Connecting to the web}
|
||
\label{sec-3-2-1}
|
||
\begin{enumerate}
|
||
\item Direct Integration
|
||
\label{sec-3-2-1-1}
|
||
\begin{itemize}
|
||
\item The most straightforward integration pattern is the direct integration pattern. It can be used for devices that support HTTP and TCP/IP and can therefore expose a web API directly. This pattern is particularly useful when a device can directly connect to the internet, for example, when it uses Wi-Fi or Ethernet.
|
||
\end{itemize}
|
||
\item Gateway Integration
|
||
\label{sec-3-2-1-2}
|
||
\begin{itemize}
|
||
\item Second, we explore the gateway integration pattern, where resource-constrained devices can use non-web protocols to talk to a more powerful device (the gateway), which then exposes a REST API for those non-web devices. This pattern is particularly useful for devices that can’t connect directly to the internet; for example, they support only Bluetooth or ZigBee or they have limited resources and can’t serve HTTP requests directly.
|
||
\end{itemize}
|
||
\item Cloud Integration
|
||
\label{sec-3-2-1-3}
|
||
\begin{itemize}
|
||
\item Third, the cloud integration pattern allows a powerful and scalable web platform to act as a gateway. This is useful for any device that can connect to a cloud server over the internet, regardless of whether it uses HTTP or not, and that needs more capability than it would be able to offer alone.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\subsubsection{Five step process}
|
||
\label{sec-3-2-2}
|
||
\begin{enumerate}
|
||
\item Integration strategy—Choose a pattern to integrate Things to the internet and the web. The patterns are presented in this chapter.
|
||
\item Resource design—Identify the functionality or services of a Thing, and organize the hierarchy of these services.
|
||
\item Representation design—Decide which representations will be served for each resource.
|
||
\item Interface design—Decide which commands are possible for each service, along with which error codes.
|
||
\item Resource linking design—Decide how the different resources are linked to each other.
|
||
\end{enumerate}
|
||
\begin{enumerate}
|
||
\item Direct integration
|
||
\label{sec-3-2-2-1}
|
||
\begin{itemize}
|
||
\item The direct integration pattern is the perfect choice when the device isn’t battery powered and when direct access from clients such as mobile web apps is required.
|
||
\item The design process then starts with the resource design: you first need to consider the physical resources on your device and map them into REST resources.
|
||
\item The next step of the design process is the representation design. REST is agnostic of a particular format or representation of the data. We mentioned that JSON is a must to guarantee interoperability, but it isn’t the only interesting data representation available.
|
||
\item Support for the different representations is implemented in a modular way, based on the middleware pattern.
|
||
\item In essence, a middleware can execute code that changes the request or response objects and can then decide to respond to the client or call the next middleware in the stack using the next() function.
|
||
\item The core of this implementation is using the Object.observe() function. This allows you to asynchronously observe the changes happening to an object by registering a callback to be invoked whenever a change in the observed object is detected.
|
||
\end{itemize}
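The middleware idea is not specific to Express, so it can be sketched in a few lines of Python. Everything here (the \texttt{run} helper, the dictionary-based request and response, the three middleware functions) is an illustrative assumption; the parameter is called \texttt{next} only to mirror the \texttt{next()} function mentioned above, at the cost of shadowing the Python built-in inside these functions.

\begin{minted}{python}
import json


def run(middlewares, request, response):
    """Call the first middleware; each one continues the chain via next()."""
    def dispatch(index):
        if index < len(middlewares):
            middlewares[index](request, response, lambda: dispatch(index + 1))
    dispatch(0)
    return response


def read_sensor(request, response, next):
    response["result"] = {"temperature": 21.5}  # hypothetical reading
    next()


def json_representation(request, response, next):
    if "application/json" in request.get("accept", "application/json"):
        response["body"] = json.dumps(response["result"])
        response["content_type"] = "application/json"
        return  # respond to the client: the chain stops here
    next()


def html_representation(request, response, next):
    rows = "".join(f"<li>{k}: {v}</li>" for k, v in response["result"].items())
    response["body"] = f"<ul>{rows}</ul>"
    response["content_type"] = "text/html"


stack = [read_sensor, json_representation, html_representation]
as_json = run(stack, {"accept": "application/json"}, {})
as_html = run(stack, {"accept": "text/html"}, {})
\end{minted}

Adding another representation means appending one more function to the stack, which is the modularity the pattern is meant to provide.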
|
||
\item Gateway integration pattern
|
||
\label{sec-3-2-2-2}
|
||
\begin{itemize}
|
||
\item Gateway integration pattern. In this case, the web Thing can’t directly offer a web API because the device might not support HTTP directly. An application gateway is working as a proxy for the Thing by offering a web API in the Thing’s name. This API could be hosted on the router in the case of Bluetooth or on another device that exposes the web Thing API; for example, via CoAP.
|
||
\item The direct integration pattern worked well because your Pi was not battery powered, had access to a decent bandwidth (Wi-Fi/Ethernet), and had more than enough RAM and storage for Node. But not all devices are so lucky. Native support for HTTP/WS or even TCP/IP isn’t always possible or even desirable. For battery-powered devices, Wi-Fi or Ethernet is often too much of a power drag, so they need to rely on low-power protocols such as ZigBee or Bluetooth instead. Does it mean those devices can’t be part of the Web of Things? Certainly not.
|
||
\item Such devices can also be part of the Web of Things as long as there’s an intermediary somewhere that can expose the device’s functionality through a WoT API like the one we described previously. These intermediaries are called application gateways (we’ll call them WoT gateways hereafter), and they can talk to Things using any non-web application protocols and then translate those into a clean REST WoT API that any HTTP client can use.
|
||
\item They can add a layer of security or authentication, aggregate and store data temporarily, expose semantic descriptions for Things that don’t have any, and so on.
|
||
\item CoAP is a service layer protocol that is intended for use in resource-constrained internet devices, such as wireless sensor network nodes. CoAP is designed to easily translate to HTTP for simplified integration with the web
|
||
\item CoAP is an interesting protocol based on REST, but because it isn’t HTTP and uses UDP instead of TCP, a gateway that translates CoAP messages from/to HTTP is needed
|
||
\item It’s therefore ideal for device-to-device communication over low-power radio, but you can’t talk to a CoAP device from a JavaScript application in your browser without installing a special plugin or browser extension. Let’s fix this by using your Pi as a WoT gateway to CoAP devices.
|
||
\item When proxying, the gateway sends a request to the CoAP device whenever it receives a request from a client, and it returns the value to the requester once the CoAP device has replied.
|
||
\end{itemize}
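The proxying behaviour can be sketched as follows. The CoAP side is replaced by a stand-in class, because the point is the translation step rather than the radio protocol; the class names, the \texttt{/things/<id>/<resource>} URL layout and the sample reading are all assumptions of this example.

\begin{minted}{python}
class FakeCoapDevice:
    """Stand-in for a constrained device reachable only over CoAP (hypothetical)."""

    def __init__(self, resources):
        self._resources = resources

    def get(self, path):
        # A real gateway would send a CoAP GET over UDP here.
        return self._resources.get(path)


class Gateway:
    """Exposes proxied devices under /things/<id>/<resource> (assumed layout)."""

    def __init__(self):
        self._devices = {}

    def register(self, thing_id, device):
        self._devices[thing_id] = device

    def handle_get(self, url_path):
        """Translate an HTTP GET path into a device read; returns (status, body)."""
        parts = url_path.strip("/").split("/", 2)
        if len(parts) != 3 or parts[0] != "things":
            return 404, None
        device = self._devices.get(parts[1])
        if device is None:
            return 404, None
        value = device.get("/" + parts[2])
        if value is None:
            return 404, None
        return 200, {"value": value}


gateway = Gateway()
gateway.register("coapDevice", FakeCoapDevice({"/sensors/co2": 412}))
status, body = gateway.handle_get("/things/coapDevice/sensors/co2")
\end{minted}

The HTTP client never sees CoAP: it only sees a REST resource whose value happens to be fetched from another protocol on demand.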
|
||
\begin{enumerate}
|
||
\item Summary
|
||
\label{sec-3-2-2-2-1}
|
||
\begin{itemize}
|
||
\item For some devices, it might not make sense to support HTTP or WebSockets directly, or it might not even be possible, such as when they have very limited resources like memory or processing, when they can’t connect to the internet directly (such as your Bluetooth activity tracker), or when they’re battery-powered. Those devices will use more optimized communication or application protocols and thus will need to rely on a more powerful gateway that connects them to the Web of Things, such as your mobile phone to upload the data from your Bluetooth bracelet, by bridging/translating various protocols. Here we implemented a simple gateway from scratch using Express, but you could also use other open source alternatives such as OpenHab or The Thing System.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\item Cloud Integration pattern
|
||
\label{sec-3-2-2-3}
|
||
\begin{itemize}
|
||
\item Cloud integration pattern. In this pattern, the Thing can’t directly offer a Web API. But a cloud service acts as a powerful application gateway, offering many more features in the name of the Thing. In this particular example, the web Thing connects via MQTT to a cloud service, which exposes the web Thing API via HTTP and the WebSockets API. Cloud services can also offer many additional features such as unlimited data storage, user management, data visualization, stream processing, support for many concurrent requests, and more.
|
||
\item Using a cloud server has several advantages. First, because it doesn’t have the physical constraints of devices and gateways, it’s much more scalable and can process and store a virtually unlimited amount of data. This also allows a cloud platform to support many protocols at the same time, handle protocol translation efficiently, and act as a scalable intermediary that can support many more concurrent clients than an IoT device could.
|
||
\item Second, those platforms can have many features that might take considerable time to build from scratch, from industry-grade security, to specialized analytics capabilities, to flexible data visualization tools and user and access management.
|
||
\item Third, because those platforms are natively connected to the web, data and services from your devices can be easily integrated into third-party systems to extend your devices.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\subsubsection{Summary}
|
||
\label{sec-3-2-3}
|
||
\begin{itemize}
|
||
\item There are three main integration patterns for connecting Things to the web: direct, gateway, and cloud.
|
||
\item Regardless of the pattern you choose, you’ll have to work through the following steps: resource design, representation design, and interface design.
|
||
\item Direct integration allows local access to the web API of a Thing. You tried this by building an API for your Pi using the Express Node framework.
|
||
\item The resource design step in Express was implemented using routes, each route representing the path to the resources of your Pi.
|
||
\item We used the idea of middleware to implement support for different representations (for example, JSON, MessagePack, and HTML) in the representation design step.
|
||
\item The interface design step was implemented using HTTP verbs on routes as well as by integrating a WebSockets server using the ws Node module.
|
||
\item Gateway integration allows integrating Things without web APIs (or not supporting web or even internet protocols) to the WoT by providing an API for them. You tried this by integrating a CoAP device via a gateway on your Pi.
|
||
\item Cloud integration uses servers on the web to act as shadows or proxies for Things. They augment the API of Things with such features as scalability, analytics, and security. You tried this by using the EVRYTHNG cloud.
|
||
\end{itemize}
|
||
\section{Discovery and Security for the Web of Things}
|
||
\label{sec-4}
|
||
\subsection{Chapter 8}
|
||
\label{sec-4-1}
|
||
\begin{itemize}
|
||
\item Having a single and common data model that all web Things can share would further increase interoperability and ease of integration by making it possible for applications and services to interact without the need to tailor the application manually for each specific device.
|
||
\item The ability to easily discover and understand any entity of the Web of Things—what it is and what it does—is called findability.
|
||
\item How to achieve such a level of interoperability—making web Things findable—is the purpose of the second layer
|
||
\item The goal of the Find layer is to offer a uniform data model that all web Things can use to expose their metadata using only web standards and best practices.
|
||
\item Metadata means the description of a web Thing, including the URL, name, current location, and status, and of the services it offers, such as sensors, actuators, commands, and properties.
|
||
\item Such a model is useful in two ways. First, it helps with discovering web Things as they get connected to a local network or to the web. Second, it allows applications, services, and other web Things to search for and find new devices without installing a driver for that Thing.
|
||
\end{itemize}
|
||
\subsubsection{Findability problem}
|
||
\label{sec-4-1-1}
|
||
\begin{itemize}
|
||
\item For a Thing to be interacted with using HTTP and WebSocket requests, there are three fundamental problems
|
||
\begin{enumerate}
|
||
\item How do we know where to send the requests, such as root URL/resources of a web Thing?
|
||
\item How do we know what requests to send and how; for example, verbs and the format of payloads?
|
||
\item How do we know the meaning of requests we send and responses we get, that is, semantics?
|
||
\end{enumerate}
|
||
\item The bootstrap problem. This problem is concerned with how the initial link between two entities on the Web of Things can be established.
|
||
\item Let’s assume the Thing can be found. How is it interacted with if it exposes a UI at the root of its URL? In this case, a clean and user-centric web interface can solve problem 3 because humans would be able to read and understand how to do this.
|
||
\item Problem 2 also would be taken care of by the web page, which would hardcode which request to send to which endpoint.
|
||
\item But what if the heater has no user interface, only a RESTful API? Because Lena is an experienced front-end developer and never watches TV, she decides to build a simple JavaScript app to control the heater. Now she faces the second problem: even though she knows the URL of the heater, how can she find out the structure of the heater API? What resources (endpoints) are available? Which verbs can she send to which resource? How can she specify the temperature she wants to set? How does she know if those parameters need to be in Celsius or Fahrenheit degrees?
|
||
\end{itemize}
|
||
\subsubsection{Discovering Things}
|
||
\label{sec-4-1-2}
|
||
\begin{itemize}
|
||
\item The bootstrap problem deals with two scopes:
|
||
\begin{enumerate}
|
||
\item first, how to find web Things that are physically nearby—for example, within the same local network
|
||
\item second, how to find web Things that are not in the same local network—for example, find devices over the web.
|
||
\end{enumerate}
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Network discovery
|
||
\label{sec-4-1-2-1}
|
||
\begin{itemize}
|
||
\item In a computer network, the ability to automatically discover new participants is common.
|
||
\item In your LAN at home, as soon as a device connects to the network, it automatically gets an IP address using DHCP
|
||
\item Once the device has an IP address, it can then broadcast data packets that can be caught by other machines on the same network.
|
||
\item A broadcast or multicast message isn’t sent to one particular IP address but rather to a group of addresses (multicast) or to everyone on the network (broadcast); this is typically done over UDP.
|
||
\item This announcement process is called a network discovery protocol, and it allows devices and applications to find each other in local networks. This process is commonly used by various discovery protocols such as multicast Domain Name System (mDNS), Digital Living Network Alliance (DLNA), and Universal Plug and Play (UPnP).
|
||
\item Most internet-connected TVs and media players can use DLNA to discover network-attached storage (NAS)
|
||
\item Your laptop can find and configure printers on your network with minimal effort thanks to network-level discovery protocols such as Apple Bonjour that are built into iOS and OS X.
|
||
\end{itemize}
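The announce-and-listen idea can be sketched with plain UDP sockets. This is a made-up, minimal announcement format (a JSON document with a name and a URL), not mDNS, DLNA or UPnP; the port number and the device URL are arbitrary assumptions. To keep the demonstration self-contained, the announcement is sent to a listener on the loopback interface instead of the broadcast address.

\begin{minted}{python}
import json
import socket

DISCOVERY_PORT = 50000  # arbitrary port chosen for this sketch


def announce(name, url, address=("255.255.255.255", DISCOVERY_PORT)):
    """Send one UDP datagram announcing a Thing; broadcast by default."""
    message = json.dumps({"name": name, "url": url}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(message, address)


def listen_once(sock):
    """Wait for one announcement and return (sender_ip, description)."""
    data, (sender_ip, _port) = sock.recvfrom(4096)
    return sender_ip, json.loads(data)


# Loopback demonstration: the listener binds to a free port and the announcer
# targets it directly instead of broadcasting to the whole network.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
listener.settimeout(5)
announce("pi", "http://raspberrypi.local:8484", address=listener.getsockname())
sender_ip, description = listen_once(listener)
listener.close()
\end{minted}

The listener learns both the sender's IP address (from the datagram itself) and the URL to contact (from the payload), which is the information a discovery protocol has to convey.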
|
||
\begin{enumerate}
|
||
\item mDNS
|
||
\label{sec-4-1-2-1-1}
|
||
\begin{itemize}
|
||
\item In mDNS, clients can discover new devices on a network by listening for mDNS messages. The client populates the local DNS tables as messages come in, so, once discovered, the new service (here a web page of a printer) can be used via its local IP address or via a URI usually ending with the .local domain. In this example, it would be \url{http://evt-bw-brother.local}.
|
||
\item The limitation of mDNS, and of most network-level discovery protocols, is that the network-level information can’t be directly accessed from the web.
|
||
\end{itemize}
|
||
\item Network discovery on the web
|
||
\label{sec-4-1-2-1-2}
|
||
\begin{itemize}
|
||
\item Because HTTP is an Application layer protocol, it doesn’t know a thing about what’s underneath—the network protocols used to shuffle HTTP requests around.
|
||
\item The real question here is why the configuration and status of a router is only available through a web page for humans and not accessible via a REST API. Put simply, why don’t all routers also offer a secure API where their configuration can be seen and changed by other devices and applications in your network?
|
||
\item Providing such an API is easy to do. For example, you can install an open-source operating system for routers such as OpenWrt and modify the software to expose the IP addresses assigned by the DHCP server of the router as a JSON document.
|
||
\item This way, you use the existing HTTP server of your router to create an API that exposes the IP addresses of all the devices in your network. This makes sense because almost all networked devices today, from printers to routers, already come with a web user interface. Other devices and applications can then retrieve the list of IP addresses in the network via a simple HTTP call (step 2 in figure 8.3) and then retrieve the metadata of each device in the network by using their IP address (step 3 of figure 8.3).
|
||
\end{itemize}
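A sketch of the conversion such a router API would perform. It assumes the leases are available as text in the dnsmasq-style layout \texttt{<expiry> <mac> <ip> <hostname> <client-id>}, one lease per line; that layout, the function name and the sample addresses are assumptions of this example, so check the actual file on your router before relying on it.

\begin{minted}{python}
import json


def leases_to_json(leases_text):
    """Convert dnsmasq-style lease lines into a JSON list of devices.

    Assumed line layout: <expiry> <mac> <ip> <hostname> <client-id>.
    """
    devices = []
    for line in leases_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _expiry, mac, ip, hostname = fields[:4]
        devices.append({
            "ip": ip,
            "mac": mac,
            "hostname": None if hostname == "*" else hostname,  # "*" = unknown
        })
    return json.dumps({"devices": devices})


sample = """\
1544745600 b8:27:eb:12:34:56 192.168.1.20 raspberrypi 01:b8:27:eb:12:34:56
1544749200 00:1b:a9:aa:bb:cc 192.168.1.31 * *
"""
document = leases_to_json(sample)
\end{minted}

A client can then iterate over the \texttt{devices} array and issue a GET to each IP address to retrieve that device's metadata.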
|
||
\item Resource discovery on the web
|
||
\label{sec-4-1-2-1-3}
|
||
\begin{itemize}
|
||
\item Although network discovery does the job locally, it doesn’t propagate beyond the boundaries of local networks.
|
||
\item how do we find new Things when they connect, how do we understand the services they offer, and can we search for the right Things and their data in composite applications?
|
||
\item On the web, new resources (pages) are discovered through hyperlinks. Search engines periodically parse all the pages in their database to find outgoing links to other pages. As soon as a link to a page not yet indexed is found, that new page is parsed and added to the directory. This process is known as web crawling.
|
||
\end{itemize}
|
||
\item Crawling
|
||
\label{sec-4-1-2-1-4}
|
||
\begin{itemize}
|
||
\item From the root HTML page of the web Thing, the crawler can find the sub-resources, such as sensors and actuators, by discovering outgoing links and can then create a resource tree of the web Thing and all its resources. The crawler then uses the HTTP OPTIONS method to retrieve all verbs supported for each resource of the web Thing. Finally, the crawler uses content negotiation to understand which format is available for each resource.
|
||
\end{itemize}
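The crawling procedure can be sketched over an in-memory description of a Thing, so that no HTTP requests are needed. The \texttt{THING} dictionary, its URLs and the two callback parameters are assumptions of this example: \texttt{fetch\_links} stands for parsing the links out of a GET response and \texttt{fetch\_allowed\_verbs} for reading the \texttt{Allow} header of an OPTIONS response.

\begin{minted}{python}
from collections import deque

# Hypothetical web Thing: for each URL, the links found in its representation
# and the verbs an OPTIONS request would report in the Allow header.
THING = {
    "/": {"links": ["/sensors", "/actuators"], "allow": ["GET"]},
    "/sensors": {"links": ["/sensors/temperature", "/"], "allow": ["GET"]},
    "/sensors/temperature": {"links": [], "allow": ["GET"]},
    "/actuators": {"links": ["/actuators/led"], "allow": ["GET"]},
    "/actuators/led": {"links": ["/actuators"], "allow": ["GET", "PUT"]},
}


def crawl(root, fetch_links, fetch_allowed_verbs):
    """Breadth-first crawl from `root`; returns {url: allowed verbs}."""
    discovered = {}
    pending = deque([root])
    while pending:
        url = pending.popleft()
        if url in discovered:
            continue  # links can point back to resources already visited
        discovered[url] = fetch_allowed_verbs(url)
        pending.extend(fetch_links(url))
    return discovered


resources = crawl("/",
                  fetch_links=lambda url: THING[url]["links"],
                  fetch_allowed_verbs=lambda url: THING[url]["allow"])
\end{minted}

Tracking visited URLs matters because resources usually link back to their parent, which would otherwise send the crawler around in circles.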
|
||
\item HATEOAS and web linking
|
||
\label{sec-4-1-2-1-5}
|
||
\begin{itemize}
|
||
\item The simple way of crawling, basically looping through the links found, is a good start, but it has several limitations. First, all links are treated equally because there’s no notion of the nature of a link; the link to the user interface and the link to the actuator resource look the same: they’re just URLs.
|
||
\item Additionally, it requires the web Thing to offer an HTML interface, which might be too heavy for resource-constrained devices. Finally, it also means that a client needs to understand both HTML and JSON to work with our web Things.
|
||
\item A better solution for discovering the resources of any REST API is to use the HATEOAS principle to describe relationships between the various resources of a web Thing.
|
||
\item A simple method to implement HATEOAS with REST APIs is to use the mechanism of web linking defined in RFC 5988. The idea is that the response to any HTTP request to a resource always contains a set of links to related resources, for example, the previous, next, or last page that contains the results of a search. These links are carried in the Link header.
|
||
\item encoding the links as HTTP headers introduces a more general framework to define relationships between resources outside the representation of the resource—directly at the HTTP level.
|
||
\item When doing an HTTP GET on any Web Thing, the response should include a Link header that contains links to related resources. In particular, you should be able to get information about the device, its resources (API endpoints), and the documentation of the API using only Link headers.
|
||
\item The URL of each resource is contained between angle brackets (\texttt{<URL>}) and the type of the link is denoted by \texttt{rel="X"}, where X is the type of the relation.
|
||
\end{itemize}
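Reading such a header on the client side can be sketched with one regular expression. This is a deliberately simplified parser: it assumes every link has the form \texttt{<URL>; rel="X"} with a quoted \texttt{rel} as the first parameter, and the sample header value (including the \texttt{example.com} URL) is invented for the example.

\begin{minted}{python}
import re

# Matches <URL>; rel="relation" (simplified: rel must be the first parameter).
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Map each relation type to its target URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


header = ('<model/>; rel="model", <properties/>; rel="properties", '
          '<actions/>; rel="actions", <http://example.com/help>; rel="help"')
links = parse_link_header(header)
\end{minted}

With the relation types as keys, a client can look up \texttt{links["model"]} or \texttt{links["help"]} directly instead of guessing which URL serves which purpose.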
|
||
\item New HATEOAS rel link things
|
||
\label{sec-4-1-2-1-6}
|
||
\begin{itemize}
|
||
\item REL="MODEL" : This is a link to a Web Thing Model resource; see section 8.3.1.
|
||
\item REL="TYPE" : This is a link to a resource that contains additional metadata about this web Thing.
|
||
\item REL="HELP" : This relationship type is a link to the documentation, which means that a GET to devices.webofthings.io/help would return the documentation for the API in a human-friendly (HTML) or machine-readable (JSON) format.
|
||
\item REL="UI" : This relationship type is a link to a graphical user interface (GUI) for interacting with the web Thing.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\end{enumerate}
|
||
\subsubsection{Describing web Things}
|
||
\label{sec-4-1-3}
|
||
\begin{itemize}
|
||
\item Knowing only the root URL is insufficient to interact with the Web Thing API because we still need to solve the second problem mentioned at the beginning of this chapter: how can an application know which payloads to send to which resources of a web Thing?
|
||
\item how can we formally describe the API offered by any web Thing?
|
||
\item The simplest solution is to provide a written documentation for the API of your web Thing so that developers can use it (1 and 2 in figure 8.4).
|
||
\item This approach, however, is insufficient to automatically find new devices, understand what they are, and what services they offer.
|
||
\item In addition, manual implementation of the payloads is more error-prone because the developer needs to ensure that all the requests they send are valid
|
||
\item By using a unique data model to define formally the API of any web Thing (the Web Thing Model), we’ll have a powerful basis to describe not only the metadata but also the operations of any web Thing in a standard way (cases 3 and 4 of figure 8.4).
|
||
\item This is the cornerstone of the Web of Things: creating a model to describe physical Things with the right balance between expressiveness—how flexible the model is—and usability— how easy it is to describe any web Thing with that model.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Introducing the Web Thing model
|
||
\label{sec-4-1-3-1}
|
||
\begin{itemize}
|
||
\item Once we find a web Thing and understand its API structure, we still need a method to describe what that device is and does. In other words, we need a conceptual model of a web Thing that can describe the resources of a web Thing using a set of well-known concepts.
|
||
\item In the previous chapters, we showed how to organize the resources of a web Thing using the /sensors and /actuators end points. But this works only for devices that actually have sensors and actuators, not for complex objects and scenarios that are common in the real world that can’t be mapped to actuators or sensors. To achieve this, the core model of the Web of Things must be easily applicable for any entity in the real world, ranging from packages in a truck, to collectible card games, to orange juice bottles. This section provides exactly such a model, which is called the Web Thing Model.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Entities
|
||
\label{sec-4-1-3-1-1}
|
||
\begin{itemize}
|
||
\item The Web of Things is composed of web Things.
|
||
\item A web Thing is a digital representation of a physical object—a Thing—accessible on the web. Think of it like this: your Facebook profile is a digital representation of yourself, so a web Thing is the “Facebook profile” of a physical object.
|
||
\item The web Thing is a web resource that can be hosted directly on the device, if it can connect to the web, or on an intermediate in the network such as a gateway or a cloud service that bridges non-web devices to the web.
|
||
\item All web Things should have the following resources:
|
||
\begin{enumerate}
|
||
\item Model: A web Thing always has a set of metadata that defines various aspects about it such as its name, description, or configurations.
\item Properties: A property is a variable of a web Thing. Properties represent the internal state of a web Thing. Clients can subscribe to properties to receive a notification message when specific conditions are met; for example, when the value of one or more properties changes.
\item Actions: An action is a function offered by a web Thing. Clients can invoke a function on a web Thing by sending an action to the web Thing. Examples of actions are “open” or “close” for a garage door, “enable” or “disable” for a smoke alarm, and “scan” or “check in” for a bottle of soda or a place. The direction of an action is from the client to the web Thing.
\item Things: A web Thing can be a gateway to other devices that don’t have an internet connection. This resource contains all the web Things that are proxied by this web Thing. This is mainly used by clouds or gateways because they can proxy other devices.
|
||
\end{enumerate}
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Metadata
|
||
\label{sec-4-1-3-1-1-1}
|
||
\begin{itemize}
|
||
\item In the Web Thing Model, all web Things must have some associated metadata to describe what they are. This is a set of basic fields about a web Thing, including its identifiers, name, description, and tags, and also the set of resources it has, such as the actions and properties. A GET on the root URL of any web Thing always returns this metadata, in JSON by default.
|
||
\end{itemize}
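As a rough illustration of the metadata described above, the sketch below builds a root document for a hypothetical web Thing and serialises it to JSON. The field names (\texttt{id}, \texttt{name}, \texttt{description}, \texttt{tags}, \texttt{links}) and all values are assumptions made for this example; they are not the normative Web Thing Model payload.

\begin{minted}[]{python}
import json

# Hypothetical metadata for a web Thing; field names and values are illustrative.
thing_metadata = {
    "id": "my-pi",
    "name": "My WoT Raspberry Pi",
    "description": "A simple WoT-connected Raspberry Pi.",
    "tags": ["raspberry", "pi", "WoT"],
    # Resources the Thing exposes, relative to its root URL.
    "links": {
        "model": "/model",
        "properties": "/properties",
        "actions": "/actions",
        "things": "/things",
    },
}


def root_representation(metadata):
    """Return the JSON body a GET on the root URL could answer with."""
    return json.dumps(metadata, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(root_representation(thing_metadata))
\end{minted}

A client that receives this document can follow the entries under \texttt{links} to reach the other resources without knowing their paths in advance.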
|
||
\item Properties
|
||
\label{sec-4-1-3-1-1-2}
|
||
\begin{itemize}
|
||
\item Web Things can also have properties. A property is a collection of data values that relate to some aspect of the web Thing. Typically, you’d use properties to model any dynamic time series of data that a web Thing exposes, such as the current and past states of the web Thing or its sensor values, for example the temperature or humidity sensor readings.
|
||
\end{itemize}
|
||
\item Actions
|
||
\label{sec-4-1-3-1-1-3}
|
||
\begin{itemize}
|
||
\item Actions are another important type of resources of a web Thing because they represent the various commands that can be sent to that web Thing.
|
||
\item In theory, you could also use properties to change the status of a web Thing, but this can be a problem when both an application and the web Thing itself want to edit the same property.
|
||
\item The actions object of the Web Thing Model has an object called resources, which contains all the types of actions (commands) supported by this web Thing.
|
||
\item Actions are sent to a web Thing with a POST to the URL of the action \{WT\}/actions/\{id\}, where id is the ID of the action
|
||
\end{itemize}
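To make the POST to \{WT\}/actions/\{id\} concrete, here is a minimal sketch that only builds the request (it never sends it). The host name, the \texttt{ledState} action ID, and the payload fields are invented for illustration; the helper \texttt{build\_action\_request} is hypothetical and not part of any WoT library.

\begin{minted}[]{python}
import json
import urllib.request


def build_action_request(thing_url, action_id, payload):
    """Build (but do not send) the POST that would invoke an action."""
    url = f"{thing_url.rstrip('/')}/actions/{action_id}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The host name and the "ledState" action are made up for this example.
    req = build_action_request(
        "http://pi.example.com", "ledState", {"ledId": 1, "state": True}
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
\end{minted}

Passing the request to \texttt{urllib.request.urlopen} would actually send the command to the Thing.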
|
||
\item Things
|
||
\label{sec-4-1-3-1-1-4}
|
||
\begin{itemize}
|
||
\item A web Thing can act as a gateway between the web and devices that aren’t connected to the internet. In this case, the gateway can expose the resources (properties, actions, and metadata) of those non-web Things using the web Thing.
|
||
\item The web Thing then acts as an Application-layer gateway for those non-web Things as it converts incoming HTTP requests for the devices into the various protocols or interfaces they support natively. For example, if your WoT Pi has a Bluetooth dongle, it can find and bridge Bluetooth devices nearby and expose them as web Things.
|
||
\item The resource that contains all the web Things proxied by a web Thing gateway is \{WT\}/things, and performing a GET on that resource will return the list of all web Things currently available
|
||
\end{itemize}
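The gateway role can be sketched as a small in-memory registry. Everything here is an assumption for illustration: the class \texttt{ThingGateway}, its method names, and the shape of the list returned for \{WT\}/things are not taken from the Web Thing Model itself.

\begin{minted}[]{python}
class ThingGateway:
    """Toy gateway that exposes non-web devices as proxied web Things."""

    def __init__(self, root_url):
        self.root_url = root_url.rstrip("/")
        self._devices = {}

    def register(self, device_id, name, protocol):
        # In a real gateway this would follow discovery, e.g. over Bluetooth.
        self._devices[device_id] = {"name": name, "protocol": protocol}

    def list_things(self):
        """Body a GET on {WT}/things could return: one entry per proxied Thing."""
        return [
            {
                "id": device_id,
                "name": info["name"],
                "url": f"{self.root_url}/things/{device_id}",
            }
            for device_id, info in sorted(self._devices.items())
        ]


if __name__ == "__main__":
    gateway = ThingGateway("http://pi.example.com")
    gateway.register("lamp-1", "Desk lamp", "bluetooth")
    print(gateway.list_things())
\end{minted}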
|
||
\end{enumerate}
|
||
\end{enumerate}
|
||
\item The WoT Pi model
|
||
\label{sec-4-1-3-2}
|
||
\begin{itemize}
|
||
\item The Pi’s resources are reorganised into a new tree structure that fits the model discussed above: the sensors end up under /properties, setLedState ends up under /actions, there are no entries under /things, and /model returns the metadata together with all sensor data, properties, and actions.
|
||
\item Following the model allows routes to be created dynamically, because all the information needed is maintained in the model of the Thing: /model, /properties, /actions, and /things.
|
||
\end{itemize}
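The idea of deriving routes from the model can be sketched as follows. The function \texttt{build\_routes} and the dictionary layout are hypothetical, and the sensor names are placeholders; only \texttt{setLedState} and the four top-level resources come from the notes above.

\begin{minted}[]{python}
def build_routes(model):
    """Derive the URL paths to expose from the Thing's model."""
    routes = ["/", "/model"]
    for resource in ("properties", "actions", "things"):
        entries = model.get(resource, {})
        if not entries:
            continue  # e.g. the Pi proxies no other Things, so no /things
        routes.append(f"/{resource}")
        routes.extend(f"/{resource}/{entry_id}" for entry_id in entries)
    return routes


# Placeholder model: the sensor names are assumptions for this example.
pi_model = {
    "properties": {"temperature": {}, "humidity": {}, "pir": {}},
    "actions": {"setLedState": {}},
    "things": {},
}

if __name__ == "__main__":
    for route in build_routes(pi_model):
        print(route)
\end{minted}

Adding a sensor or an action then only requires a new entry in the model; the routing code stays unchanged.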
|
||
\item Summary
|
||
\label{sec-4-1-3-3}
|
||
\begin{itemize}
|
||
\item In this section, we introduced the Web Thing Model, a simple JSON-based data model for a web Thing and its resources. We also showed how to implement this model using Node.js and run it on a Raspberry Pi. We showed that this model is quite easy to understand and use, and yet is sufficiently flexible to represent all sorts of devices and products using a set of properties and actions. The goal is to propose a uniform way to describe web Things and their capabilities so that any HTTP client can find web Things and interact with them. This is sufficient for most use cases, and this model has all you need to be able to generate user interfaces for web Things automatically.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\subsubsection{The Semantic Web of Things (Ontologies)}
|
||
\label{sec-4-1-4}
|
||
\begin{itemize}
|
||
\item In an ideal world, search engines and any other applications on the web could also understand the Web Thing Model. Given the root URL of a web Thing, any application could retrieve its JSON model and understand what the web Thing is and how to interact with it.
|
||
\item The question now is how to expose the Web Thing Model using an existing web standard so that the resources are described in a way that means something to other clients. The answer lies in the notion of the Semantic Web and, more precisely, the notion of linked data that we introduce in this section.
|
||
\item Semantic Web refers to an extension of the web that promotes common data formats to facilitate meaningful data exchange between machines. Thanks to a set of standards defined by the World Wide Web Consortium (W3C), web pages can offer a standardized way to express relationships among them so that machines can understand the meaning and content of those pages. In other words, the Semantic Web makes it easier to find, share, reuse, and process information from any content on the web thanks to a common and extensible data description and interchange format.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Linked Data and RDFa
|
||
\label{sec-4-1-4-1}
|
||
\begin{itemize}
|
||
\item The HTML specification alone doesn’t define a shared vocabulary that allows you to describe in a standard and non-ambiguous manner the elements on a page and what they relate to.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Linked Data
|
||
\label{sec-4-1-4-1-1}
|
||
\begin{itemize}
|
||
\item Enter the vision of linked data, which is a set of best practices for publishing and connecting structured data on the web, so that web resources can be interlinked in a way that allows computers to automatically understand the type and data of each resource.
|
||
\item This vision has been strongly driven by complex and heavy standards and tools centered on the Resource Description Framework (RDF)
|
||
\item Although powerful and expressive, RDF would be overkill for most simple scenarios, and this is why a simpler method to structure content on the web is desirable.
|
||
\item RDFa emerged as a lighter version of RDF that can be embedded into HTML code
|
||
\item Most search engines can use these annotations to generate better search listings and make it easier to find your websites.
|
||
\item Using RDFa to describe the metadata of a web Thing will make that web Thing findable and searchable by Google.
|
||
\end{itemize}
|
||
\item RDFa
|
||
\label{sec-4-1-4-1-2}
|
||
\begin{itemize}
|
||
\item vocab defines the vocabulary used for that element, in this case the Web of Things Model vocabulary defined previously.
|
||
\item property defines the various fields of the model such as name, ID, or description.
|
||
\item typeof defines the type of those elements in relation to the vocabulary of the element.
|
||
\item This allows other applications to parse the HTML representation of the device and automatically understand which resources are available and how they work.
|
||
\end{itemize}
|
||
\item JSON-LD
|
||
\label{sec-4-1-4-1-3}
|
||
\begin{itemize}
|
||
\item JSON-LD is an interesting and lightweight semantic annotation format for linked data that, unlike RDFa and Microdata, is based on JSON. It’s a simple way to semantically augment JSON documents by adding context information and hyperlinks for describing the semantics of the different elements of a JSON object.
|
||
\end{itemize}
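A minimal sketch of such an annotated document, assuming the Schema.org vocabulary and its \texttt{Product} type: the \texttt{@context} and \texttt{@type} keywords are standard JSON-LD, while the property values and the helper \texttt{expanded\_type} are made up for illustration. The helper only concatenates two strings and is not a full JSON-LD expansion.

\begin{minted}[]{python}
import json

# "@context" tells clients which vocabulary the keys come from;
# "@type" says which class of that vocabulary the object is.
pi_product = {
    "@context": "http://schema.org/",
    "@type": "Product",
    "name": "Raspberry Pi",
    "description": "A WoT-enabled Raspberry Pi.",
    "productID": "my-pi-001",
}


def expanded_type(doc):
    """Naively resolve @type against a string @context (not full JSON-LD expansion)."""
    return doc["@context"] + doc["@type"]


if __name__ == "__main__":
    print(json.dumps(pi_product, indent=2))
    print(expanded_type(pi_product))
\end{minted}

Without the two keywords this is an ordinary JSON object; with them, a client that knows the vocabulary can tell that \texttt{name} means the name of a product.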
|
||
\item Micro-summary
|
||
\label{sec-4-1-4-1-4}
|
||
\begin{itemize}
|
||
\item This simple example already illustrates the essence of JSON-LD: it gives a context to the content of a JSON document. As a consequence, all clients that understand the \url{http://schema.org/Product} context will be able to automatically process this information in a meaningful way. This is the case with search engines, for example. Google and Yahoo! process JSON-LD payloads using the Product schema to render special search results; as soon as it gets indexed, our Pi will be known by Google and Yahoo! as a Raspberry Pi product. This means that the more semantic data we add to our Pi, the more findable it will become. As an example, try adding a location to your Pi using the Place schema, and it will eventually become findable by location.
|
||
\end{itemize}
|
||
We could also use this approach to create more specific schemas on top of the Web Thing Model; for instance, an agreed-upon schema for the data and functions a washing machine or smart lock offers. This would facilitate discovery and enable automatic integration with more and more web clients.
|
||
\end{enumerate}
|
||
\end{enumerate}
|
||
\subsubsection{Summary}
|
||
\label{sec-4-1-5}
|
||
\begin{itemize}
|
||
\item The ability to find nearby devices and services is essential in the Web of Things and is known as the bootstrap problem. Several protocols can help in discovering the root URL of Things, such as mDNS/Bonjour, QR codes or NFC tags.
|
||
\item The last step of the web Things design process, resource linking design (also known as HATEOAS in REST terms), can be implemented using the web linking mechanism in HTTP headers.
|
||
\item Beyond finding the root URL and sub-resources, client applications also need a mechanism to discover and understand what data or services a web Thing offers.
|
||
\item The services of Things can be modeled as properties (variables), actions (functions), and links. The Web Thing Model offers a simple, flexible, fully web-compatible, and extensible data model to describe the details of any web Thing. This model is simple to adapt for your devices and easy to use for your products and applications.
|
||
\item The Web Thing Model can be extended with more specific semantic descriptions such as those based on JSON-LD and available from the Schema.org repository.
|
||
\end{itemize}
|
||
\subsection{Chapter 9}
|
||
\label{sec-4-2}
|
||
\begin{itemize}
|
||
\item In most cases, Internet of Things deployments involve a group of devices that communicate with each other or with various applications within closed networks, and rarely over open networks such as the internet. It would be fair to call such deployments the “intranets of Things” because they’re essentially isolated, private networks that only a few entities can access. But the real power of the Web of Things lies in opening up these lonely silos and facilitating interconnection between devices and applications at a large scale.
|
||
\item When it comes to public data such as data.gov initiatives, real-time traffic/weather/pollution conditions in a city, or a group of sensors deployed in a jungle or a volcano, it would be great to ensure that the general public or researchers anywhere in the world could access that data. This would enable anyone to create new innovative applications with it and possibly generate substantial economic, environmental, and social value.
|
||
\item How to share this data in a secure and flexible way is what Layer 3 provides.
|
||
\item Layer 3 is the Share layer of the Web of Things. This layer focuses on how devices and their resources must be secured so that they can only be accessed by authorized users and applications.
|
||
\item First, we’ll show how Layer 3 of the WoT architecture covers the security of Things: how to ensure that only authorized parties can access a given resource. Then we’ll show how to use existing trusted systems to allow sharing physical resources via the web.
|
||
\end{itemize}
|
||
\subsubsection{Securing Things}
|
||
\label{sec-4-2-1}
|
||
\begin{itemize}
|
||
\item Ultimately, every security breach hurts the entire web because it erodes the overall trust of users in technology.
|
||
\item Security in the Web of Things is even more critical than in the web. Because web Things are physical objects that will be deployed everywhere in the real world, the risks associated with IoT attacks can be catastrophic.
|
||
\item Digitally augmented devices allow collecting fine-grained information about people, such as when they took their last insulin shot, when they last went jogging, and where they ran. They can also be used to remotely control cars, houses, and the like.
|
||
\item The majority of IoT solutions don’t comply with even the most basic security best practices; think clear-text passwords and communications, invalid certificates, old software versions with exploitable bugs, and so on.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Securing the IoT has three major problems
|
||
\label{sec-4-2-1-1}
|
||
\begin{itemize}
|
||
\item First, we must consider how to encrypt the communications between two entities (for example, between an app and a web Thing) so that a malicious interceptor (a “man in the middle”) can’t access the data being transmitted in clear text. This is referred to as securing the channel.
|
||
\item Second, we must find a way for a client that talks to a host to verify that the host really is who it claims to be.
|
||
\item Third, we must ensure that the correct access control is in place. We need to set up a method to control which user can access what resource of what server or Thing and when and then to ensure that the user is really who they claim to be.
|
||
\end{itemize}
|
||
\item Encryption 101
|
||
\label{sec-4-2-1-2}
|
||
\begin{itemize}
|
||
\item Encryption is an essential ingredient for any secure system.
|
||
\item Without encryption, any attempt to secure a Thing will be in vain because attackers can sniff the communication and understand the security mechanisms that were put in place.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Symmetric Encryption
|
||
\label{sec-4-2-1-2-1}
|
||
\begin{itemize}
|
||
\item The oldest form of encoding a message is symmetric encryption. The idea is that the sender and receiver share a secret key that can be used to both encode and decode a message in a specific way.
|
||
\end{itemize}
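The shared-key idea can be illustrated with a toy XOR cipher, where the same key both encrypts and decrypts. This is only a teaching sketch under the assumption that the key is random, as long as the message, and used once; real systems use a vetted cipher such as AES from an established library.

\begin{minted}[]{python}
import secrets


def xor_bytes(data, key):
    """XOR each byte of data with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("key must be as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    message = b"temperature=21.5"
    shared_key = secrets.token_bytes(len(message))  # secret both sides hold
    ciphertext = xor_bytes(message, shared_key)     # sender encrypts
    recovered = xor_bytes(ciphertext, shared_key)   # receiver uses the same key
    print(recovered == message)
\end{minted}

The weakness is visible in the example: both parties must somehow obtain \texttt{shared\_key} without an attacker seeing it.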
|
||
\item Asymmetric Encryption
|
||
\label{sec-4-2-1-2-2}
|
||
\begin{itemize}
|
||
\item Another method called asymmetric encryption has become popular because it doesn’t require a secret to be shared between parties. This method uses two related keys, one public and the other private (secret).
|
||
\end{itemize}
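To show how two related keys can work, here is textbook RSA with deliberately tiny primes. RSA is one example of an asymmetric scheme, chosen here for illustration; the numbers are far too small to be secure and real implementations also add padding, so treat this purely as a sketch.

\begin{minted}[]{python}
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)      # can be given to anyone
private_key = (d, n)     # never leaves its owner


def encrypt(message, key):
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


if __name__ == "__main__":
    ciphertext = encrypt(65, public_key)
    print(decrypt(ciphertext, private_key) == 65)
\end{minted}

Anyone holding \texttt{public\_key} can encrypt, but only the holder of \texttt{private\_key} can decrypt, so no secret has to be exchanged beforehand.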
|
||
\end{enumerate}
|
||
\item Web Security with TLS: The S of HTTPS
|
||
\label{sec-4-2-1-3}
|
||
\begin{itemize}
|
||
\item Fortunately, there are standard protocols for securely encrypting data between clients and servers on the web.
|
||
\item The best known protocol for this is Secure Sockets Layer (SSL)
|
||
\item SSL 3.0 turned out to have serious vulnerabilities (POODLE, for example), and implementation bugs such as Heartbleed in OpenSSL added to the damage. These events sealed the fate of the protocol, which was replaced by the much more secure but conceptually similar Transport Layer Security (TLS).
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item TLS 101
|
||
\label{sec-4-2-1-3-1}
|
||
\begin{itemize}
|
||
\item Despite its name, TLS is an Application layer protocol (see chapter 5). TLS not only secures HTTP (HTTPS) communication but is also the basis of secure WebSocket (WSS) and secure MQTT (MQTTS)
|
||
\item TLS does two things. First, it helps the client ensure that the server is who it says it is; this is the SSL/TLS authentication. Second, it guarantees that the data sent over the communication channel can’t be read by anyone other than the client and the server involved in the transaction (also known as SSL/TLS encryption).
|
||
\item The client, such as a mobile app, tells the server, such as a web Thing, which protocols and encryption algorithms it supports. This is somewhat similar to the content negotiation process we described in chapter 6.
|
||
\item The server sends the public part of its certificate to the client. The goal here is for the client to make sure it knows who the server is. All web clients have a list of certificates they trust. In the case of your Pi, you can find them in /etc/ssl/certs. SSL certificates form a trust chain, meaning that if a client doesn’t trust certificate S1 that the server sends back, but it trusts certificate S2 that was used to sign S1, the web client can accept S1 as well.
|
||
\item The rest of the process generates a key from the public certificates. This key is then used to encrypt the data going back and forth between the server and the client in a secure manner. Because this process is dynamic, only the client and the server know how to decrypt the data they exchange during this session. This means the data is now securely encrypted: if an attacker manages to capture data packets, they will remain meaningless.
|
||
\end{itemize}
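On the client side, the certificate checks described above are mostly a matter of configuration. The sketch below uses Python's standard \texttt{ssl} module; the helper name \texttt{make\_client\_context} is an assumption, and pinning the minimum version to TLS 1.2 is one reasonable choice rather than a requirement stated in these notes.

\begin{minted}[]{python}
import ssl


def make_client_context():
    """Client-side TLS settings: verify the server's certificate chain and host name."""
    # Loads the system's trusted CA certificates and enables verification.
    context = ssl.create_default_context()
    # Refuse the legacy SSL 3.0 / TLS 1.0 / TLS 1.1 protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
\end{minted}

A connection made with this context fails during the handshake if the server's certificate does not chain up to a trusted certificate, which is what happens with a self-signed certificate unless it is added to the trust store explicitly.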
|
||
\item Beyond Self-signed certificates
|
||
\label{sec-4-2-1-3-2}
|
||
\begin{itemize}
|
||
\item Clearly, having to deal with all these security exceptions isn’t nice, but these exceptions exist for a reason: to warn clients that part of the security usually covered by SSL/TLS can’t be guaranteed with the certificate you generated. Basically, although the encryption of messages will work with a self-signed certificate (the one you created with the previous command), the authenticity of the server (the Pi) can’t be guaranteed. In consequence, the chain of trust is broken (the second of the three problems listed earlier).
|
||
\item In an IoT context, this means that attackers could pretend to be the Thing you think you’re talking to.
|
||
\item The common way to generate certificates that guarantee the authenticity of the server is to get them from a well-known and trusted certificate authority (CA). Several such CAs exist, for example Let's Encrypt, Symantec, and GeoTrust.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\end{enumerate}
|
||
\subsubsection{Authentication and access control}
|
||
\label{sec-4-2-2}
|
||
\begin{itemize}
|
||
\item Once we encrypt the communication between Things and clients as shown in the previous section, we want to enable only some applications to access the Things.
|
||
\item First, this means that the Things—or a gateway to which Things are connected—need to be able to know the sender of each request (identification).
|
||
\item Second, devices need to trust that the sender really is who they claim to be (authentication)
|
||
\item Third, the devices also need to know if they should accept or reject each request depending on the identity of this sender and which request has been sent (authorization).
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item Access control with REST and API tokens
|
||
\label{sec-4-2-2-1}
|
||
\begin{itemize}
|
||
\item Server-based authentication is what happens when we use a username and password to log into a website: we initiate a secure session with the server, which is stored for a limited time in the server application's memory or in a local browser cookie.
|
||
\item Server-based authentication is usually stateful because the state of the client is stored on the server. But as you saw in chapter 6, HTTP is a stateless protocol; therefore, using a server-based authentication method goes against this principle and poses certain problems. First, the performance and scalability of the overall system are limited because each session must be stored in memory, and overhead increases when there are many authenticated users. Second, this authentication method poses certain security risks, for example cross-site request forgery.
|
||
\item An alternative method called token-based authentication has become popular and is used by most web APIs.
|
||
\item Because this token is added to the headers or query parameters of each HTTP request sent to the server, all interactions remain stateless.
|
||
\item API tokens shouldn’t be valid forever. API tokens, just like passwords, should change regularly.
|
||
\end{itemize}
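A minimal sketch of issuing and checking API tokens with an expiry, using only the standard library. The one-hour lifetime, the in-memory store, and the function names are assumptions for illustration; a real service would persist tokens and tie each one to a client identity.

\begin{minted}[]{python}
import hmac
import secrets
import time

TOKEN_LIFETIME_S = 3600  # assumed lifetime; tokens should not be valid forever
_issued = {}             # token -> expiry timestamp (in-memory, for illustration)


def issue_token(now=None):
    """Create an unguessable token and remember when it expires."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _issued[token] = now + TOKEN_LIFETIME_S
    return token


def is_valid(presented, now=None):
    """Check a token taken from e.g. the Authorization header of a request."""
    now = time.time() if now is None else now
    for token, expiry in _issued.items():
        # compare_digest avoids leaking information through timing differences
        same = hmac.compare_digest(token.encode(), presented.encode())
        if same and now < expiry:
            return True
    return False


if __name__ == "__main__":
    token = issue_token()
    print(is_valid(token), is_valid("not-a-token"))
\end{minted}

Because every request carries the token, the server needs no per-client session; rotating a token amounts to issuing a new one and letting the old one expire.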
|
||
\item OAuth: a web authorization framework
|
||
\label{sec-4-2-2-2}
|
||
\begin{itemize}
|
||
\item OAuth is an open standard for authorization and is essentially a mechanism for a web or mobile app to delegate the authentication of a user to a third-party trusted service; for example, Facebook, LinkedIn, or Google.
|
||
\item OAuth dynamically generates access tokens using only web protocols.
|
||
\item OAuth allows resources and tokens to be shared between applications.
|
||
\item In short, OAuth standardizes how to authenticate users, generate tokens with an expiration date, regenerate tokens, and provide access to resources in a secure and standard manner over the web.
|
||
\item At the end of the token exchange process, the application will know who the user is and will be able to access resources on the resource server on behalf of the user. The application can then also renew the token before it expires using an optional refresh token or by running the authorization process again.
|
||
\item OAuth delegated authentication and access flow. The application asks the user if they want to give it access to resources on a third-party trusted service (resource server). If the user accepts, an authorization grant code is generated. This code can be exchanged for an access token with the authorization server. To make sure the authorization server knows the application, the application has to send an app ID and app secret along with the authorization grant code. The access token can then be used to access protected resources within a certain scope from the resource server.
|
||
\item Implementing an OAuth server on a Linux-based embedded device such as the Pi or the Intel Edison isn’t hard because the protocol isn’t really heavy. But maintaining the list of all applications, users, and their access scope on each Thing is clearly not going to work and scale for the IoT.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item OAuth Roles
|
||
\label{sec-4-2-2-2-1}
|
||
\begin{itemize}
|
||
\item A typical OAuth scenario involves four roles
|
||
\begin{enumerate}
|
||
\item A resource owner: This is the user who wants to authorize an application to access one of their trusted accounts; for example, your Facebook account.
\item The resource server: This is the server providing access to the resources the user wants to share. In essence, this is a web API accepting OAuth tokens as credentials.
\item The authorization server: This is the OAuth server managing authorizations to access the resources. It’s a web server offering an OAuth API to authenticate and authorize users. In some cases, the resource server and the authorization server can be the same, such as in the case of Facebook.
\item The application: This is the web or mobile application that wants to access the resources of the user. To keep the trust chain, the application has to be known by the authorization server in advance and has to authenticate itself using a secret token, which is an API key known only by the authorization server and the application.
\end{enumerate}
\end{itemize}
|
||
\end{enumerate}
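The exchange between the four roles can be simulated in memory. This is a simplified sketch of the flow described above (grant code, then access token), with no HTTP, no expiry, and no refresh tokens; the class \texttt{AuthorizationServer} and all its method names are hypothetical and do not correspond to any OAuth library.

\begin{minted}[]{python}
import secrets


class AuthorizationServer:
    """In-memory stand-in for an OAuth authorization server (illustrative only)."""

    def __init__(self):
        self._apps = {}    # app_id -> app_secret
        self._grants = {}  # grant code -> (app_id, user, scope)
        self._tokens = {}  # access token -> (user, scope)

    def register_app(self, app_id):
        """The application must be known in advance; it receives its secret."""
        self._apps[app_id] = secrets.token_urlsafe(16)
        return self._apps[app_id]

    def authorize(self, app_id, user, scope, user_consents):
        """The resource owner is asked for consent; a grant code is issued on approval."""
        if app_id not in self._apps or not user_consents:
            return None
        code = secrets.token_urlsafe(16)
        self._grants[code] = (app_id, user, scope)
        return code

    def exchange(self, app_id, app_secret, code):
        """Swap a grant code plus app credentials for an access token."""
        grant = self._grants.pop(code, None)  # codes are single-use
        if grant is None or grant[0] != app_id:
            return None
        if self._apps.get(app_id) != app_secret:
            return None
        token = secrets.token_urlsafe(32)
        self._tokens[token] = grant[1:]
        return token

    def introspect(self, token):
        """Lets the resource server learn whom a token belongs to and its scope."""
        return self._tokens.get(token)


if __name__ == "__main__":
    server = AuthorizationServer()
    secret = server.register_app("my-app")
    code = server.authorize("my-app", "alice", "read:profile", user_consents=True)
    token = server.exchange("my-app", secret, code)
    print(server.introspect(token))
\end{minted}

The application never sees the user's password: it only handles the grant code and the resulting token, whose scope limits what it may access.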
|
||
\end{enumerate}
|
||
\subsubsection{The Social Web of Things}
|
||
\label{sec-4-2-3}
|
||
\begin{itemize}
|
||
\item Using OAuth to manage access control to Things is tempting, but not if each Thing has to maintain its own list of users and applications. This is where the gateway integration pattern can be used.
|
||
\item Instead, we can use the notion of delegated authentication offered by OAuth, which allows you to use the accounts you already have with OAuth providers you trust, such as Facebook, Twitter, or LinkedIn.
|
||
\item The term Social Web of Things usually refers to sharing access to devices via existing social network relationships.
|
||
\end{itemize}
|
||
\begin{enumerate}
|
||
\item A Social Web of Things authentication proxy
|
||
\label{sec-4-2-3-1}
|
||
\begin{itemize}
|
||
\item The idea of the Social Web of Things is to create an authentication proxy that controls access to all Things it proxies by identifying users of client applications using trusted third-party services.
|
||
\item Again, we have four actors: a Thing, a user using a client application, an authentication proxy, and a social network (or any other service with an OAuth server). The client app can use the authentication proxy and the social network to access resources on the Thing. This concept can be implemented in three phases:
|
||
\begin{enumerate}
|
||
\item The first phase is the Thing proxy trust. The goal here is to ensure that the proxy can access resources on the Thing securely. If the Thing is protected by an API token (device token), it could be as simple as storing this token on the proxy. If the Thing is also an OAuth server, this step follows an OAuth authentication flow, as shown in figure 9.6. Regardless of the method used to authenticate, after this phase the auth proxy has a secret that lets it access the resources of the Thing.
|
||
\item The second phase is the delegated authentication step. Here, the user in the client app authenticates via an OAuth authorization server as in figure 9.6. The authentication proxy uses the access token returned by the authorization server to identify the user of the client app and checks to see if the user is authorized to access the Thing. If so, the proxy returns the access token, or a newly generated one, to the client app.
|
||
\item The last phase is the proxied access step. Once the client app has a token, it can use it to access the resources of the Thing through the authentication proxy. If the token is valid, the authentication proxy will forward the request to the Thing using the secret (device token) it got in phase 1 and send the response back to the client app.
|
||
\end{enumerate}
|
||
\item All communication is encrypted using TLS
|
||
\item Social Web of Things authentication proxy: the auth proxy first establishes a secret with the Thing over a secure channel. Then, a client app requests access to a resource via the auth proxy. It authenticates itself via an OAuth server (here Facebook) and gets back an access token. This token is then used to access resources on the Thing via the auth proxy. For instance, the /temp resource is requested by the client app and given access via the auth proxy forwarding the request to the Thing and relaying the response to the client app.
|
||
\end{itemize}
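The three phases can be sketched with two small classes. This is an in-memory simulation under several assumptions: the OAuth provider is reduced to a callable that maps an access token to a user ID, HTTP status codes are returned as plain integers, and the names \texttt{Thing} and \texttt{AuthProxy} are invented for the example.

\begin{minted}[]{python}
class Thing:
    """Stand-in for a device protected by a single device token."""

    def __init__(self, device_token):
        self._device_token = device_token
        self._resources = {"/temp": {"value": 21.5, "unit": "celsius"}}

    def get(self, path, token):
        if token != self._device_token:
            return 401, None
        if path not in self._resources:
            return 404, None
        return 200, self._resources[path]


class AuthProxy:
    def __init__(self, thing, device_token, identify_user, allowed_users):
        # Phase 1 (Thing proxy trust): the proxy holds the Thing's secret.
        self._thing = thing
        self._device_token = device_token
        # Phase 2 (delegated authentication): callable mapping an OAuth access
        # token to a user id, or None; it stands in for the social network.
        self._identify_user = identify_user
        self._allowed_users = set(allowed_users)

    def get(self, path, access_token):
        # Phase 3 (proxied access): check the caller, then forward the request.
        user = self._identify_user(access_token)
        if user is None:
            return 401, None  # unknown token
        if user not in self._allowed_users:
            return 403, None  # known user without access to this Thing
        return self._thing.get(path, self._device_token)


if __name__ == "__main__":
    thing = Thing("device-secret")
    known_tokens = {"tok-alice": "alice", "tok-eve": "eve"}
    proxy = AuthProxy(thing, "device-secret", known_tokens.get, ["alice"])
    print(proxy.get("/temp", "tok-alice"))
    print(proxy.get("/temp", "tok-eve"))
\end{minted}

The client app never learns the device token; it only ever presents its own access token to the proxy.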
|
||
\item Leveraging Social Networks
|
||
\label{sec-4-2-3-2}
|
||
\begin{itemize}
|
||
\item This is the very idea of the Social Web of Things: instead of creating abstract access control lists, we can reuse existing social structures as a basis for sharing our Things. Because social networks increasingly reflect our social relationships, we can reuse that knowledge to share access to our Things with friends via Facebook, or work colleagues via LinkedIn.
|
||
\end{itemize}
|
||
\item Implementing Access Control Lists
|
||
\label{sec-4-2-3-3}
|
||
\begin{itemize}
|
||
\item In essence, you need to create an access control list (ACL). There are various ways to implement ACLs, such as by storing them in the local database.
|
||
\end{itemize}
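One simple way to represent such a list is a nested mapping from user to resource to permitted HTTP methods. The layout, the user names, and the helper \texttt{is\_allowed} are assumptions for illustration; the notes do not prescribe a particular ACL format.

\begin{minted}[]{python}
# Hypothetical ACL: user id -> resource path -> allowed HTTP methods.
ACL = {
    "alice": {"/temp": {"GET"}, "/actions/ledState": {"GET", "POST"}},
    "bob": {"/temp": {"GET"}},
}


def is_allowed(acl, user, method, path):
    """Return True only if the ACL explicitly grants this user the method on the path."""
    return method in acl.get(user, {}).get(path, set())


if __name__ == "__main__":
    print(is_allowed(ACL, "alice", "POST", "/actions/ledState"))
    print(is_allowed(ACL, "bob", "POST", "/actions/ledState"))
\end{minted}

Anything not listed is denied, so a missing user or path can never grant access by accident.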
|
||
\item Proxying Resources of Things
|
||
\label{sec-4-2-3-4}
|
||
\begin{itemize}
|
||
\item Finally, you need to implement the actual proxying: once a request is deemed valid by the middleware, you need to contact the Thing that serves this resource and proxy the results back to the client.
|
||
\end{itemize}
|
||
\end{enumerate}
|
||
\subsubsection{Beyond book}
|
||
\label{sec-4-2-4}
|
||
\begin{itemize}
|
||
\item But just as HTTP might be too heavy for resource-limited devices, security protocols such as TLS and their underlying cipher suites are too heavy for the most resource-constrained devices. This is why lighter-weight versions of TLS are being developed, such as DTLS, which is similar to TLS but runs on top of UDP instead of TCP and also has a smaller memory footprint.
|
||
\item Another proposal is device democracy. In this model, devices become more autonomous and favor peer-to-peer interactions over centralized cloud services. Security is ensured using a blockchain mechanism: similar to the way bitcoin transactions are validated by a number of independent parties in the bitcoin network, devices could all participate in making the IoT secure.
|
||
\end{itemize}
|
||
\subsubsection{Summary}
|
||
\label{sec-4-2-5}
|
||
\begin{itemize}
|
||
\item You must cover four basic principles to secure IoT systems: encrypted communication, server authentication, client authentication, and access control.
|
||
\item Encrypted communication ensures attackers can’t read the content of messages. It uses encryption mechanisms based on symmetric or asymmetric keys.
|
||
\item You should use TLS to encrypt messages on the web. TLS is based on asymmetric keys: a public key and a private server key.
|
||
\item Server authentication ensures attackers can’t pretend to be the server. On the web, this is achieved by using SSL (TLS) certificates. The delivery of these certificates is controlled through a chain of trust where only trusted parties called certificate authorities can deliver certificates to identify web servers.
|
||
\item Instead of buying certificates from a trusted third party, you can create self-signed TLS certificates on a Raspberry Pi. The drawback is that web browsers will flag the communication as insecure because they don’t have the CA certificate in their trust store.
|
||
\item You can achieve client authentication using simple API tokens. Tokens should rotate on a regular basis and should be generated using cryptographically secure random number generators so that their sequence can’t be guessed.
|
||
\item The OAuth protocol can be used to generate API tokens in a dynamic, standard, and secure manner and is supported by many embedded Linux devices such as the Raspberry Pi.
|
||
\item The delegated authentication mechanism of OAuth relies on other OAuth providers to authenticate users and create API tokens. As an example, a user of a Thing can be identified using Facebook via OAuth.
|
||
\item You can implement access control for Things to reflect your social contacts by creating an authentication proxy using OAuth for clients’ authentication and contacts from social networks.
|
||
\end{itemize}
|
||
% Emacs 25.2.2 (Org mode 8.2.10)
|
||
\end{document} |