diff --git a/notes.org b/notes.org
index 21f1d07..c70fdb0 100644
--- a/notes.org
+++ b/notes.org
@@ -1550,3 +1550,422 @@ access the resources. It’s a web server offering an OAuth API to authenticate
 - the network is surprisingly robust against quite large failures.
 - The power-law distribution gives small-world networks a high degree of fault tolerance because random failures are most likely to eliminate nodes from the poorly connected majority.
 - A small-world network falls apart much more quickly, however, if the well-connected nodes are targeted first.
+** Tarzan: A P2P Anonymizing Network Layer
+*** Abstract
+- Tarzan is a peer-to-peer anonymous IP network overlay. Because it provides IP service, Tarzan is general-purpose and transparent to applications. Organized as a decentralized peer-to-peer overlay, Tarzan is fault-tolerant, highly scalable, and easy to manage.
+- Tarzan achieves its anonymity with layered encryption and multihop routing, much like a Chaumian mix. A message initiator chooses a path of peers pseudo-randomly through a restricted topology in a way that adversaries cannot easily influence. Cover traffic prevents a global observer from using traffic analysis to identify an initiator. Protocols toward unbiased peer-selection offer new directions for distributing trust among untrusted entities. Tarzan provides anonymity to either clients or servers, without requiring that both participate. In both cases, Tarzan uses a network address translator (NAT) to bridge between Tarzan hosts and oblivious Internet hosts. Measurements show that Tarzan imposes minimal overhead over a corresponding non-anonymous overlay route.
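The layered encryption mentioned in the abstract can be illustrated with a small sketch. It assumes a toy SHA-256 counter keystream in place of a real cipher and HMAC-SHA-256 as a per-relay integrity tag. The names `keystream`, `wrap` and `peel`, and the key layout, are hypothetical and not taken from the Tarzan paper. The sketch is not secure; it only shows that the initiator adds one layer per relay and each relay removes exactly one.

```python
import hashlib
import hmac
from typing import Sequence, Tuple

TAG_LEN = 32  # size of an HMAC-SHA-256 tag in bytes


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter (illustration only)."""
    blocks = []
    counter = 0
    while len(blocks) * 32 < length:
        blocks.append(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]


def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(payload: bytes, relay_keys: Sequence[Tuple[bytes, bytes]]) -> bytes:
    """Initiator side: add one layer per relay, innermost layer for the last relay."""
    packet = payload
    for enc_key, mac_key in reversed(relay_keys):
        body = xor_with_keystream(packet, enc_key)
        tag = hmac.new(mac_key, body, hashlib.sha256).digest()
        packet = tag + body
    return packet


def peel(packet: bytes, enc_key: bytes, mac_key: bytes) -> bytes:
    """Relay side: verify this relay's tag, then strip this relay's layer."""
    tag, body = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed")
    return xor_with_keystream(body, enc_key)
```

With three hypothetical relays, `wrap(message, keys)` produces the packet the first relay receives, and calling `peel` with each relay's keys in path order recovers the original message at the last hop.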
+*** Introduction
+- The ultimate goal of Internet anonymization is to allow a host to communicate with an arbitrary server in such a manner that nobody can determine the host’s identity.
+- Different entities may be interested in exposing the host’s identity, each with varying capabilities to do so:
+  1) curious individuals or groups may run their own participating machines to snoop on traffic
+  2) parties skirting legality may break into a limited number of others’ machines
+  3) large, powerful organizations may tap and monitor Internet backbones.
+- Tarzan is a practical system aimed at realizing anonymity against all three flavors of adversary.
+- Less ambitious approaches, which do not work:
+  1) In the simplest alternative, a host sends messages to a server through a proxy, such as Anonymizer.com [1]. This system fails if the proxy reveals a user’s identity [18] or if an adversary can observe the proxy’s traffic. Furthermore, servers can easily block these centralized proxies and adversaries can prevent usage with denial-of-service attacks.
+  2) A host can instead connect through a set of mix relays, as in onion routing. However, if a corrupt relay receives traffic from a non-core node, the relay can identify this node as the origin of the traffic. Colluding entry and exit relays can use timing analysis to determine source and destination. External adversaries can do this as well.
+- Few of these systems attempt to provide anonymity against an adversary that can passively observe all network traffic. Such protection requires fixing traffic patterns or using cover traffic to make such traffic analysis more difficult.
+- Some protect only the core of the static mix network and thus allow traffic analysis on its edges. Some simulate full synchrony, so trivial DoS attacks halt their operation entirely [7]. And some require central control and knowledge of the entire network.
+- Tarzan extends known mix-net designs to a peer-to-peer environment.
+- Tarzan nodes communicate over sequences of mix relays chosen from an open-ended pool of volunteer nodes, without any centralized component.
+- All peers are potential originators of traffic; all peers are potential relays.
+- We leverage our new concept of a domain to remove potential adversarial bias: an adversary may run hundreds of virtual machines, yet is unlikely to control hundreds of different IP subnets.
+- Packets can be routed only between mimics, or pairs of nodes assigned by the system in a secure and universally-verifiable manner.
+  1) This technique is practical in that it does not require network synchrony.
+  2) It consumes only a small factor more bandwidth than the data traffic to be hidden.
+  3) It is powerful, as it shields all network participants, not only core routers.
+- Tarzan allows client applications on participating hosts to talk to non-participating Internet servers through special IP tunnels. The two ends of a tunnel are a Tarzan node running a client application and a Tarzan node running a network address translator; the latter forwards the client’s traffic to its ultimate Internet destination. Tarzan is transparent to both client applications and servers, though it must be installed and configured on participating nodes.
+- Tarzan supports a systems-engineering position: anonymity can be built-in at the transport layer, transparent to most systems, trivial to incorporate, and with a tolerable loss of efficiency compared to its non-anonymous counterpart.
+*** Design Goals and Network Model
+- A node is an Internet host’s virtual identity in the system, created by running an instantiation of the Tarzan software on a single IP address.
+- A tunnel is a virtual circuit for communication spread across an ordered sequence of nodes.
+- A relay is a node acting as a packet forwarder as part of a tunnel.
(Also known as a router in other protocols)
+- Goals of Tarzan:
+  1) Application independence: Tarzan should be transparent to existing applications and allow users to interact with existing services. To achieve this, Tarzan should provide the abstraction of an IP tunnel.
+  2) Anonymity against malicious nodes: Tarzan should provide sender or recipient anonymity against colluding nodes. We consider these properties in terms of an anonymity set: the set of possible senders of a message. The larger this set, the “more” anonymous an initiator remains.
+  3) Fault-tolerance and availability: Tarzan should resist an adversary’s attempts to overload the entire system or to block system entry or exit points.
+  4) Performance: Tarzan should maximize the performance of tunnel transmission, subject to our anonymity requirements, to make Tarzan a viable IP-level communication channel.
+  5) Anonymity against a global eavesdropper: An adversary observing the entire network should be unable to determine which Tarzan relay initiates a particular message.
+- Since anyone can join Tarzan, it will likely be targeted by malicious users.
+- A node is malicious if it modifies, drops, or records packets, analyzes traffic patterns, returns incorrect network information, or otherwise does not properly follow the protocols.
+- From a naive viewpoint, the fraction of Tarzan nodes that are malicious determines the probability that a tunnel relay is malicious. Yet, a single compromised computer may operate on multiple IP addresses and thus present multiple Tarzan identities.
+- To defend against such a situation, we make the observation that a single machine likely controls only a contiguous range of IP addresses
+  + typically by promiscuously receiving packets addressed to any IP address on a particular LAN or by acting as a gateway router.
+- This observation is useful in bounding the damage each malicious node can cause.
We will call this subnet controllable by a single malicious machine a domain.
+- A node belongs to a /d domain if the node’s d-bit IP prefix matches that of the domain.
+  + A malicious node owns all address space behind it.
+- Domains capture some notion of fault-independence: While an adversary can certainly subvert nodes within the same domain in a dependent fashion, nodes in different domains may fail independently.
+- When selecting relays, Tarzan should consider the notion of distinct domains, not that of distinct nodes.
+- Tarzan chooses some fixed IP prefix size as its granularity for counting domains: first among /16 subnet masks, then among /24 masks.
+*** Architecture and Design
+- Typical use proceeds in three stages. First, a node running an application that desires anonymity selects a set of nodes to form a path through the overlay network. Next, this source-routing node establishes a tunnel using these nodes, which includes the distribution of session keys. Finally, it routes data packets through this tunnel.
+- The exit point of the tunnel is a NAT. This NAT forwards the anonymized packets to servers that are not aware of Tarzan, and it receives the response packets from these servers and reroutes the packets over this tunnel.
+- Tarzan restricts route selection to pairs of nodes that use cover traffic to maintain traffic levels independent of data rates.
+**** Packet Relay
+- A Tarzan tunnel passes two distinct types of messages between nodes: data packets, to be relayed through existing tunnels, and control packets, containing commands and responses that establish and maintain these virtual circuits.
+- A flow tag (similar to MPLS [23]) uniquely identifies each link of each tunnel. A relay rapidly determines how to route a packet by its tag. Symmetric encryption hides data, and a MAC protects its integrity, on a per-relay basis. Separate keys are used in each direction of each relay.
+- In the forward path, the tunnel initiator clears each IP packet’s source address field, performs a nested encoding for each tunnel relay, and encapsulates the result in a UDP packet.
+**** Tunnel setup
+- When forming a tunnel, a Tarzan node pseudo-randomly selects a series of nodes from the network based on its local topology.
+- An establish request sent to node h_i is relayed as a normal data packet from h_1 through h_{i-1}.
+**** IP Packet Forwarding
+- Tarzan provides a client IP forwarder and a server-side pseudonymous network address translator (PNAT) to create a generic anonymizing IP tunnel.
+- The client forwarder replaces its real address in the packets with a random address assigned by the PNAT from the reserved private address space.
+- The PNAT translates this private address to one of its real addresses.
+- The pseudonymous NAT also offers port forwarding to allow ordinary Internet hosts to connect through Tarzan tunnels to anonymous servers.
+**** Tunnel failure and reconstruction
+- A tunnel fails if one of its relays stops forwarding packets. To detect failure, the initiator regularly sends ping messages to the PNAT through the tunnel and waits for acknowledgments.
+**** Peer Discovery
+- A Tarzan node requires some means to learn about all other nodes in the network, knowing initially only a few other nodes. Anything less than near-complete network information allows an adversary to bias the distribution of a node’s neighbor set towards malicious peers, leaks information through combinatorial profiling attacks, and results in inconsistencies during relay selection.
+- Tarzan uses a simple gossip-based protocol for peer discovery.
+- This problem can be modeled as a directed graph: vertices represent Tarzan nodes; edges correspond to the relation that node a knows about, and thus can communicate with, node b. Edges are added to the graph as nodes discover other peers.
+- Our technique to grow this network graph is similar to the NameDropper resource discovery protocol [16]. In each round of NameDropper, node a simply contacts one neighbor at random and transfers its entire neighbor set.
+**** Peer Selection
+- If peers are selected at random among all known addresses, we are likely to pick malicious nodes, since an adversary’s addresses are rarely scattered uniformly through the IP address space. Instead, they are often located in the same IP prefix space. Thus, we choose among distinct IP prefixes, not among all known IP addresses.
+- Tarzan uses a three-level hierarchy: first among all known /16 subnets, then among /24 subnets belonging to this 16-bit address space, then among the relevant IP addresses.
+**** Cover traffic and link encoding
+- If the pattern of inter-node Tarzan traffic varied with usage, a wide-spread eavesdropper could analyze the patterns to link messages to their initiators. Prior work has suggested the use of cover traffic to provide more time-invariant traffic patterns independent of bandwidth demands.
+- Our key contributions include introducing the concept of a traffic mimic. We propose traffic invariants between a node and its mimics that protect against information leakage. These invariants require some use of cover traffic and yield an anonymity set exponential in path length.
+***** Selecting Mimics
+- Upon joining the network, node a asks k other nodes to exchange mimic traffic with it. Similarly, an expected k nodes select a as they look for their own mimics. Thus, each node has κ mimics.
+- Mimics are assigned verifiably at random from the set of nodes in the network.
+- A node establishes a bidirectional, time-invariant packet stream with a mimic node, into which real data can be inserted, indistinguishable from the cover traffic.
+***** Tunneling through mimics
+- We constrain a tunnel initiator’s choice of relays at each hop to those mimics of the previous hop, instead of allowing it to choose any random node in the network.
Therefore, nodes only construct tunnels over links protected by cover traffic.
+***** Unifying traffic patterns
+- The packet headers, sizes, and rates of a node’s incoming traffic from its mimics must be identical to its outgoing traffic, so that an eavesdropper cannot conclude that the node originated a message.
+* Cloud Computing
+** Fog Computing and Its Role in Internet of Things
+*** Abstract
+- Fog Computing extends the Cloud Computing paradigm to the edge of the network, thus enabling a new breed of applications and services.
+- Defining characteristics of the Fog are:
+  1) Low latency and location awareness;
+  2) Wide-spread geographical distribution;
+  3) Mobility;
+  4) Very large number of nodes;
+  5) Predominant role of wireless access;
+  6) Strong presence of streaming and real time applications;
+  7) Heterogeneity.
+- The Fog is the appropriate platform for a number of critical Internet of Things (IoT) services and applications, namely, Connected Vehicle, Smart Grid, Smart Cities, and, in general, Wireless Sensors and Actuators Networks (WSANs).
+*** Introduction
+- The “pay-as-you-go” Cloud Computing model is an efficient alternative to owning and managing private data centers (DCs).
+- Several factors contribute to the economy of scale of mega DCs: higher predictability of massive aggregation, which allows higher utilization without degrading performance; convenient location that takes advantage of inexpensive power; and lower OPEX (operational expenses) achieved through the deployment of homogeneous compute, storage, and networking components.
+- This bliss becomes a problem for latency-sensitive applications.
+- An emerging wave of Internet deployments, most notably the Internet of Things (IoT), requires mobility support and geo-distribution in addition to location awareness and low latency.
+- A new platform is needed to meet the requirements of IoT.
+- Fog Computing is the answer.
+- The fog is a cloud close to the ground.
+- Cloud and Fog computing can interplay when it comes to data management and analytics; fog computing complements cloud computing rather than replacing it.
+*** The Fog Computing Platform
+**** Characterization of Fog Computing
+***** How fog differs from cloud
+- Fog Computing is a highly virtualized platform that provides compute, storage, and networking services between end devices and traditional Cloud Computing Data Centers.
+- Mostly wireless access
+- Typically located at the edge of the network
+- Edge location, location awareness, and low latency. The origins of the Fog come from the need to support endpoints with rich services at the edge of the network, for applications with low latency requirements.
+- Geographical distribution. Services and applications targeted by the Fog demand widely distributed deployments. An example is high-quality streaming to moving vehicles through proxies and APs along highways and tracks.
+- Large-scale sensor networks to monitor the environment will also require distributed computing and storage resources.
+- Very large number of nodes, as a consequence of the wide geo-distribution, as evidenced in sensor networks in general, and the Smart Grid in particular.
+- Support for mobility. It is essential for many Fog applications to communicate directly with mobile devices, and therefore support mobility techniques.
+- Real-time interactions. Important Fog applications involve real-time interactions rather than batch processing. (Cloud is really nice for batch processing)
+- Predominance of wireless access.
+- Heterogeneity. Fog nodes come in different form factors and will be deployed in all sorts of environments.
+- Interoperability and federation. Seamless support of certain services (streaming is a good example) requires the cooperation of different providers.
+- Support for on-line analytics and interplay with the Cloud.
The Fog is positioned to play a significant role in the ingestion and processing of the data close to the source.
+**** Fog Players: Providers and Users
+- It is not yet known how the different Fog Computing players will align, so the following are anticipations.
+- Subscriber models
+- More people will enter the competition.
+*** Fog Computing and the IoT
+**** Connected Vehicle (CV)
+- The Fog has a number of attributes that make it the ideal platform to deliver a rich menu of SCV services in infotainment, safety, traffic support, and analytics: geo-distribution (throughout cities and along roads), mobility and location awareness, low latency, heterogeneity, and support for real-time interactions.
+**** Wireless Sensors and Actuators Networks
+- The original Wireless Sensor Nodes (WSNs), nicknamed motes, were designed to operate at extremely low power to extend battery life or even to make energy harvesting feasible. Most of these WSNs involve a large number of low bandwidth, low energy, low processing power, small memory motes, operating as sources of a sink (collector), in a unidirectional fashion.
+- The characteristics of the Fog (proximity and location awareness, geo-distribution, hierarchical organization) make it the suitable platform to support both energy-constrained WSNs and WSANs.
+  + WSAN is a wireless sensor and actuator network. The issue is that the system is no longer entirely uni-directional, as signals need to be sent to the actuators from controllers.
+  + Current solutions consist of a WSN and a MANET.
+*** Analytics and the interplay between the fog and the cloud
+- While Fog nodes provide localization, therefore enabling low latency and context awareness, the Cloud provides global centralization. Many applications require both Fog localization, and Cloud globalization, particularly for analytics and Big Data.
+- Fog collectors at the edge ingest the data generated by grid sensors and devices.
Some of this data relates to protection and control loops that require real-time processing (from milliseconds to sub seconds).
+- This first tier of the Fog, designed for machine-to-machine (M2M) interaction, collects and processes the data, and issues control commands to the actuators. It also filters the data to be consumed locally, and sends the rest to the higher tiers.
+- The second and third tiers deal with visualization and reporting, i.e. human-to-machine interaction.
+
+** EdgeIoT: A Mobile Edge Computing for the Internet of Things
+*** Abstract
+- In order to overcome the scalability problem of the traditional Internet of Things architecture (i.e., data streams generated from distributed IoT devices are transmitted to the remote cloud via the Internet for further analysis), this article proposes a novel approach to mobile edge computing for the IoT architecture, edgeIoT, to handle the data streams at the mobile edge.
+*** Introduction
+- Although IoT can potentially benefit all of society, many technical issues remain to be addressed.
+  1) First, the data streams generated by the IoT devices are high in volume and at fast velocity (the European Commission has predicted that there will be up to 100 billion smart devices connected to the Internet in 2020).
+  2) The data generated by IoT is sent to the cloud for processing, via the Internet. However, the Internet is not scalable and efficient enough to handle this data. Additionally, it consumes a lot of bandwidth, energy and time to send all of the data to the cloud.
+  3) Since the IoT big data streams are transmitted to the cloud in high volume and at fast velocity, it is necessary to design an efficient data processing architecture to explore the valuable information in real time.
+  4) User privacy remains a challenging unsolved issue; that is, in order to obtain services and benefits, users should share their sensed data with IoT service providers, and these sensed data may contain users’ personal information.
+- We propose an efficient and flexible IoT architecture, edgeIoT, by leveraging fog computing and software defined networking (SDN) to collect, classify, and analyze the IoT data streams at the mobile edge.
+- Bringing computing resources close to IoT devices minimises traffic in the core network and the end-to-end delay between computing resources and IoT devices.
+- A hierarchical fog computing architecture to provide flexible and scalable resource provisioning for users of IoT
+- A proxy virtual machine migration scheme to minimise traffic in the core network
+*** Mobile Edge Computing for IoTs
+- Fog computing is defined as a distributed computing infrastructure containing a number of high-performance physical machines (PMs) that are well connected with each other.
+- Deploying a number of fog nodes in the network makes it possible to locally collect, classify, and analyze the raw IoT data stream.
+- Where to deploy the fog nodes to facilitate the communications between IoT devices and fog nodes is still an open issue.
+- It is difficult to optimize the deployment of fog nodes due to the mobility and heterogeneity features of the IoT devices.
+  + Phones and wearables move around, and some Things have more energy and a higher transmission speed than others. As such, different protocols need to be supported.
+**** Multi-Interface Base Stations in Cellular Network
+- Existing Base Stations (BSs) may be used, as they provide high coverage and they are distributed to potentially connect all IoT devices.
+- They can be equipped with multiple wireless interfaces, to support the different data transmission requirements.
+- Therefore, a potential deployment is to connect each BS to a fog node to process the aggregated raw data streams.
+**** The EdgeIoT Architecture
+- Fog nodes can be directly connected to a base station, for minimum end-to-end (E2E) delay when transmitting local data streams.
+- They can also be deployed at the cellular core network, so that different BSs can share the same fog node.
+- The design does not use the traditional cellular core network, as this is slow, inflexible and unscalable.
+- It introduces an SDN-based cellular core (SDN: software-defined networking).
+  + SDN is meant to address the fact that the static architecture of traditional networks is decentralized and complex while current networks require more flexibility and easy troubleshooting.
+- It uses OpenFlow controllers, as these can be used to separate all control functions from the data forwarding function.
+- Essentially, SDN and OpenFlow add more flexibility to control the network traffic.
+- Fog nodes can offload work to the cloud at the expense of bandwidth.
+- IoT applications can also be deployed in the fog nodes, to offer services to users. This gives the user a lower-latency experience.
+**** Hierarchical Fog Computing Architecture
+- Most user-generated information contains personal information.
+- Analysis of this data can benefit both the user and society (analysing images can lead to recognition of criminals and such).
+- User privacy needs to be preserved.
+- Each user is associated with a proxy VM, which is the user's private VM and is situated at a nearby fog node.
+  + Provides flexible storage and computing resources
+- IoT devices of the user are registered to this VM, which collects the raw data. This VM can process the data and ensure privacy.
+- The proxy VMs can be dynamic: if the user moves, the VM relocates to a new fog node.
+- A proxy VM can also be static, or split into two where one is static and the other is dynamic (if one is needed for the devices at home).
+  + The two can of course also become one again when the user gets home.
+- There also exist application VMs for the applications in the fog node. These are the ones that need the data from the proxy VMs.
+- The application VMs can be deployed in local, remote or add-on mode.
+***** Local application VM deployment
+- Refers to the deployment of an application VM in the fog node to analyze the metadata generated by the local proxy VMs.
+- Parknet example: each local proxy VM collects the sensed data streams from its smart cars (note that each smart car is equipped with a GPS receiver and a passenger-side-facing ultrasonic range finder to generate the location and parking spot occupancy information) and generates the metadata, which identify the available parking spots, for the application VM.
+***** Remote application VM deployment
+- Refers to the deployment of an application VM in the remote cloud to analyze the metadata generated by the proxy VMs from different fog nodes.
+- Used if the application needs data from several fog nodes, i.e. a larger area.
+- Perhaps to do traffic analysis and find hot spots where people shouldn't drive.
+***** Add-on application VM deployment
+- Event-triggered application VM deployment
+- If an event triggers, several application VMs can be deployed quickly onto many fog nodes, for example to find missing children.
+*** Challenges in Implementing EdgeIoT
+**** Identification between IoT Devices and their proxy VMs
+- Each device needs to know the ID of the proxy VM, but the proxy VM also needs to know the IDs of the devices.
+- This can consume a lot of bandwidth, due to the VM composition and decomposition (i.e. combining and splitting).
+- Even if it doesn't happen often, the VM needs to inform each device when this happens.
+**** Proxy VM mobility management
+- When a user moves to a new BS, it should report its new location to the mobility management entity (MME), which handles the VM migration, compositions and decompositions.
+- The MME resides in the OpenFlow control layer.
+- Adopting existing mobility management from the LTE network (or similar cellular networks) is one solution, but it requires all devices to have a SIM card for ID.
+  + The location update protocol in LTE is also expensive.
+- One alternative is to establish a local cluster network (e.g., a body area network) consisting of mobile IoT devices. The user’s mobile phone or other wearable device acts as a cluster head, which can be considered as a gateway for the reporting of locations and data aggregation.
+**** Migrating VMs
+- It is necessary to estimate the profit for migrating the proxy VM among the fog nodes whenever the user’s mobile IoT devices roam to a new BS.
+  + Required since simply migrating whenever a new BS is entered can be expensive, if we won't be there for long or might not even have fully left the old one.
+*** Conclusion
+- This article proposes a new architecture, edgeIoT, in order to efficiently handle the raw data streams generated from the massive distributed IoT devices at the mobile edge. The proposed edgeIoT architecture can substantially reduce the traffic load in the core network and the E2E delay between IoT devices and computing resources compared to the traditional IoT architecture, and thus facilitate IoT services provisioning. Moreover, this article has raised three challenges in implementing the proposed edgeIoT architecture and has provided potential solutions.
+** On the Integration of Cloud Computing and IoT
+*** Introduction and Motivation
+- Cloud computing has virtually unlimited capabilities in terms of storage and processing power, is a much more mature technology, and has most of the IoT issues at least partially solved.
+- CloudIoT is the merger of Cloud and IoT technologies.
+*** Cloud and IoT: The Need for Their Integration
+- IoT can benefit from the virtually unlimited capabilities and resources of Cloud to compensate for its technological constraints.
+- On the other hand, the Cloud can benefit from IoT by extending its scope to deal with real world things in a more distributed and dynamic manner, and for delivering new services in a large number of real life scenarios.
+- Essentially, the Cloud acts as intermediate layer between the things and the applications, where it hides all the complexity and the functionalities necessary to implement the latter.
+**** Storage resources
+- IoT involves a lot of unstructured or semi-structured data, similar to Big Data
+  + It's a huge amount
+  + Different data types
+  + Data generation frequency is high
+- Cloud is the most convenient and cost-effective solution to deal with this data.
+- Data can be provided in a homogeneous way in the cloud and can be secured.
+**** Computation Resources
+- IoT devices have quite limited processing power, so they usually can't process data on-board.
+- Data collected is usually transmitted to more powerful nodes where aggregation and processing is possible
+  + Not scalable
+- Cloud offers almost unlimited processing power and can as such be utilised for predictions, computations and similar tasks.
+**** Communication resources
+- Clouds can act as a kind of gateway, since they can offer an effective and cheap solution to connect, track and manage any Thing from anywhere, using customized portals and built-in apps.
+**** New capabilities
+- IoT is characterized by a very high heterogeneity of devices, technologies, and protocols. Therefore, scalability, interoperability, reliability, efficiency, availability, and security can be very difficult to obtain.
+**** New Paradigms
+- Sensing, actuation, database, data, ethernet, video surveillance as a service and some more.
+*** Applications
+**** Healthcare
+- IoT and multimedia technologies have made their entrance in the healthcare field thanks to ambient-assisted living and telemedicine.
+**** Smart City
+- IoT can provide a common middleware for future-oriented Smart-City services, acquiring information from different heterogeneous sensing infrastructures, accessing all kinds of geo-location and IoT technologies (e.g., 3D representations through RFID sensors and geo-tagging), and exposing information in a uniform way.
+**** Video Surveillance
+- Intelligent video surveillance has become a tool of the greatest importance for several security-related applications. As an alternative to in-house, self-contained management systems, complex video analytics require Cloud-based solutions, VSaaS, to properly satisfy the requirements of storage (e.g., stored media is centrally secured, fault-tolerant, on-demand, scalable, and accessible at high-speed) and processing (e.g., video processing, computer vision algorithms and pattern recognition modules to extract knowledge from scenes).
+* Pastry and its applications
+** PAST
+- Large-scale peer-to-peer persistent storage utility
+- Based on a self-organizing, Internet-based overlay network of storage nodes that cooperatively route file queries, store multiple replicas of files and cache additional copies of popular files.
+- Storage nodes and files are assigned uniformly distributed IDs.
+- Each file is stored at the nodes whose IDs best match the file's ID.
+- Statistical assignment balances the number of files stored.
+- Non-uniform storage node capacities require explicit load balancing.
+- The same goes for non-uniform file requests, i.e. popular files. They require caching.
+- Peer-to-peer Internet applications have recently been popularized through file sharing applications such as Napster, Gnutella and FreeNet.
+- P2P systems are interesting because of their decentralised protocols, self-organization, adaptation and scalability.
+- P2P systems can be characterized as distributed systems where all nodes are equal in capabilities and responsibilities.
+- PAST consists of nodes connected to the Internet, where each node can initiate and route client requests to insert or retrieve files.
+- Nodes can contribute storage to the system
+- With high probability, a file is replicated on nodes which are diverse in geographic location, ownership, administration, network connectivity, rule of law, etc.
+- PAST is attractive since it exploits the multitude and diversity of its nodes (geography, ownership and the other dimensions listed above)
+- Offers a persistent storage service via a quasi-unique fileId which is generated when the file is inserted.
+  + This makes stored files immutable, since a file can't be inserted with the same fileId
+- The owner can share the fileId to share the file, along with a decryption key if needed
+- Pastry ensures that client requests are reliably routed to the appropriate nodes.
+- Clients requesting to retrieve a file are routed, most likely, to a node that is "close in the network" to the requesting client.
+- As the system is based on Pastry, the number of nodes traversed and the number of messages sent are logarithmic in the number of nodes, given normal operation.
+- PAST does not provide searching, directory lookup or key distribution.
+- Any host on the Internet can be a PAST node, by installing PAST
+- A PAST node is minimally an access point for the user; it can also contribute storage and participate in the routing of requests
+- Clients have a few operations:
+  1) fileId = Insert(name, owner-cred, k, file). fileId is the secure hash of the file's name, the owner's public key and some salt. k is the replication factor.
+  2) file = Lookup(fileId). If fileId exists in PAST and one of the k replica nodes can be reached, the file is returned.
+  3) Reclaim(fileId, owner-cred). Reclaims the storage of the k copies of the file identified by fileId. If the operation completes, PAST no longer guarantees that the file can be found.
Reclaim doesn't guarantee that the file is deleted, only that the space it occupied can be overwritten.
+- A PAST node has a 128-bit ID, the nodeId. This indicates its position in the circular namespace of Pastry, which ranges from 0 to 2^128 - 1. It's the hash of the node's public key.
+- There is no correlation between the nodeId and information about the node's whereabouts and the like. (anonymity)
+- As such, nodes with adjacent nodeIds are likely not close to each other geographically, so they are well suited for replication.
+- Insert: PAST stores the file on the k nodes whose nodeIds are numerically closest to the 128 most significant bits of the fileId.
+- While the random nature of both nodeId and fileId distributes files uniformly across nodes, file sizes and node capacities differ, so not every node can store as many files as the others.
+- In addition to the k replicas, PAST caches files at nodes which have more space left. These cached copies might be discarded at any time.
+- Pastry is a p2p routing substrate that is efficient, scalable, fault resilient and self-organizing.
+- A file can be located unless all k nodes have failed simultaneously.
+- Pastry routes a message in fewer than ceil(log_{2^b} N) steps, i.e. logarithmic in the number of nodes N (b is a configuration parameter, typically 4).
+- Eventual delivery is guaranteed unless l/2 nodes with adjacent nodeIds fail at the same time. l is typically 32.
+- The tables map each nodeId to the IP address of the node.
+- If a node joins or leaves, the invariants of the tables can be restored using a logarithmic number of messages among the affected nodes.
+- In line with the overall decentralized architecture of PAST, an important design goal for the storage management is to rely only on local coordination among nodes with nearby nodeIds, to fully integrate storage management with file insertion, and to incur only modest performance overheads related to storage management.
+- PAST allows a node that is not one of the k numerically closest nodes to the fileId to alternatively store the file, if it is in the leaf set of one of those k nodes.
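The fileId generation and replica placement described above can be sketched as follows. This is a minimal illustration, not PAST's actual code: the notes only say "secure hash", so the use of SHA-1 (giving a 160-bit fileId), the byte encoding of the inputs, and the helper names `file_id`, `routing_key`, `circular_distance` and `k_closest` are all assumptions made for the example.

```python
import hashlib

ID_BITS = 128          # size of the Pastry nodeId space (from the notes)
FILE_ID_BITS = 160     # assumption: SHA-1 digest length


def file_id(name: str, owner_public_key: bytes, salt: bytes) -> int:
    """Hash the file name, the owner's public key and a salt into a fileId."""
    h = hashlib.sha1()
    h.update(name.encode("utf-8"))
    h.update(owner_public_key)
    h.update(salt)
    return int.from_bytes(h.digest(), "big")


def routing_key(fid: int) -> int:
    """Keep only the 128 most significant bits of the fileId."""
    return fid >> (FILE_ID_BITS - ID_BITS)


def circular_distance(a: int, b: int) -> int:
    """Distance between two ids on the circular namespace 0 .. 2^128 - 1."""
    d = abs(a - b)
    return min(d, (1 << ID_BITS) - d)


def k_closest(node_ids, key: int, k: int):
    """Return the k nodeIds numerically closest to the key."""
    return sorted(node_ids, key=lambda n: circular_distance(n, key))[:k]


if __name__ == "__main__":
    fid = file_id("report.pdf", b"hypothetical-public-key", b"salt-0")
    nodes = [10, 20, 500, 2**128 - 1]
    print(k_closest(nodes, routing_key(fid), k=2))
```

Choosing a different salt yields a different fileId, which is what lets file diversion move a file to another part of the nodeId space.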
+- Replica diversion: the purpose is to accommodate differences in the storage capacity and utilization of nodes within a leaf set.
+- File diversion is performed when a node's entire leaf set is reaching capacity. Its purpose is to achieve more global load balancing across large portions of the nodeId space. A file is diverted to a different part of the nodeId space by choosing a different salt in the generation of its fileId.
+- Replica diversion aims at balancing the remaining free storage space among the nodes in each leaf set. In addition, as the global storage utilization of a PAST system increases, file diversion may also become necessary to balance the storage load among different portions of the nodeId space.
+- The purpose of replica diversion is to balance the remaining free storage space among the nodes in a leaf set.
+  + A node A that cannot accommodate a replica locally chooses a node B in its leaf set that is not among the k closest and does not already hold a diverted replica of the file. A asks B to store a copy on its behalf, then enters an entry for the file in its table with a pointer to B
+- To ensure that the pointer doesn't disappear if A fails, a pointer to the replica stored on B is also entered into the file table of the node C with the (k+1)th closest nodeId to the fileId.
+- PAST has three policies to control replica diversion
+  1) acceptance of replicas into a node's local store
+  2) selecting a node to store a diverted replica
+  3) deciding when to divert a file to a different part of the nodeId space
+- It is not necessary to balance the remaining free storage space among nodes as long as the utilization of all nodes is low.
+- It is preferable to divert a large file rather than multiple small ones.
+- A replica should always be diverted from a node whose remaining free space is significantly below average to a node whose free space is significantly above average
+- When the free space gets uniformly low in a leaf set, it is better to divert the file into another part of the nodeId space than to attempt to divert replicas at the risk of spreading locally high utilization to neighboring parts of the nodeId space.
+- The policy for accepting a file is based on the ratio size_of_file / free_storage. A file is rejected if it would consume a fraction of the remaining storage larger than some threshold t.
+- Different thresholds t are used for the k closest nodes and for nodes not among the k closest.
+  + The threshold is larger for the k closest nodes (i.e. they are less likely to reject a file)
+- The policy discriminates against large files, and decreases the size threshold above which files get rejected as the node's utilization increases.
+- This bias minimizes the number of diverted replicas and tends to divert large files first, while leaving room for small files
+- A primary store node N that rejects a replica needs to select another node to hold the diverted replica.
+  + Choose the node with the most remaining space among those that a) are in the leaf set, b) have a nodeId that is not one of the k closest to the fileId, and c) do not already hold a diverted replica of the file
+- The file is diverted if neither the primary replica node nor the selected diverted replica node can store the file.
+  + Nodes already holding the file will be told to drop it.
+- The purpose of file diversion is to balance the remaining free storage space among different portions of the nodeId space in PAST.
+  + File diversion is tried three times, so four insert attempts in total. If it fails, failure is reported to the user, who can then, for example, fragment the file to make it smaller.
+- PAST maintains the invariant that k copies of each inserted file are maintained on different nodes within a leaf set.
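The acceptance and diversion policies above can be sketched as follows. This is a minimal sketch under stated assumptions: the `Node` class, the function names, and the threshold values `T_PRI = 0.1` and `T_DIV = 0.05` are illustrative choices for the example; the notes only state that the threshold for the k closest nodes is the larger of the two.

```python
from dataclasses import dataclass

T_PRI = 0.1    # assumed threshold for the k closest (primary) nodes
T_DIV = 0.05   # assumed, smaller threshold for nodes holding diverted replicas


@dataclass
class Node:
    node_id: int
    free: int   # remaining free storage, in bytes


def accepts(file_size: int, free_space: int, threshold: float) -> bool:
    """Accept the file unless size_of_file / free_storage exceeds the threshold."""
    if free_space <= 0:
        return False
    return file_size / free_space <= threshold


def store_replica(primary, leaf_set, k_closest_ids, diverted_holders, file_size):
    """Return the node that stores the replica, or None if file diversion is needed.

    primary          -- one of the k closest nodes, asked to store the replica
    leaf_set         -- nodes in the primary's leaf set
    k_closest_ids    -- nodeIds of the k nodes closest to the fileId
    diverted_holders -- nodeIds that already hold a diverted replica of the file
    """
    if accepts(file_size, primary.free, T_PRI):
        return primary
    # Replica diversion: eligible nodes are in the leaf set, not among the
    # k closest, and do not already hold a diverted replica of this file.
    eligible = [
        n for n in leaf_set
        if n.node_id not in k_closest_ids and n.node_id not in diverted_holders
    ]
    if not eligible:
        return None
    target = max(eligible, key=lambda n: n.free)   # most remaining space
    return target if accepts(file_size, target.free, T_DIV) else None
```

A `None` result corresponds to the case where the insert fails locally and the client retries with a new salt, i.e. file diversion.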
+- Recall that as part of the Pastry protocol, neighboring nodes in the nodeId space periodically exchange keep-alive messages.
+  + If a node is unresponsive for a period T, the leaf sets of all affected nodes are adjusted. The missing node is replaced in every leaf set that contained it.
+- If a node joins, it's inserted into the affected leaf sets.
+- If a node joins a leaf set because it's new or because another node disappeared, this node has to obtain a replica of the files it should now hold. These are the files previously held by the node whose spot it took.
+- To avoid the bandwidth overhead of a joining node requesting copies of all the files it needs (expensive)
+  + the joining node may instead install a pointer in its file table, referring to the node that has just ceased to be one of the k numerically closest to the fileId, and requiring that node to keep the replica.
+- When a PAST network is growing, node additions may create the situation where a node that holds a diverted replica and the node that refers to that replica are no longer part of the same leaf set.
+  + To minimize the associated overhead of continually sending keep-alive messages, affected replicas are gradually migrated to a node within the referring node's leaf set whenever possible.
+- When a node fails and storage utilisation is so high that the remaining nodes can't store additional replicas, then for PAST to keep its storage invariants a node asks the two most distant members of its leaf set (in the nodeId space) to locate a node in their respective leaf sets that can store the file. Since exactly half of the node's leaf set overlaps with each of these two nodes' leaf sets, a total of 2l nodes can be reached in this way.
+- If total disk storage were to decrease due to node and disk failures that were not balanced out by the addition of new ones, then the system would eventually exhaust its storage, and replicas and files could no longer be inserted.
+  + PAST addresses this problem by maintaining storage quotas, thus ensuring that demand for storage cannot exceed the supply.
+- Goals of cache management
+  1) minimize access latency (here routing distance); fetch distance is measured in terms of Pastry routing hops
+  2) maximize throughput
+  3) balance query load in the system
+- The k replicas ensure availability, but also give some load balancing and latency reduction because of the locality properties of Pastry
+- PAST nodes use the "unused" portion of their advertised disk space to cache files. Cached copies can be evicted and discarded at any time. In particular, when a node stores a new primary or redirected replica of a file, it typically evicts one or more cached files to make room for the replica.
+- A file is cached in PAST at a node traversed in lookup or insert operations, if the file size is less than some fraction of the node’s remaining cache size
+- Cache hits are tracked by the nodes holding cached files. They weight each cached file based on its hits and its size, and evict the one with the least weight first.
+- Experiments run without replica diversion and file diversion, to show the need for explicit load balancing, confirmed that the need is there.
+  + A lot of insertions were rejected
+  + Overall storage utilisation was low
+- Caching helps considerably
+** Scribe
+- Scalable application-level multicast infrastructure.
+- Scribe supports large numbers of groups
+- Built on top of Pastry
+- Pastry is used to create and manage groups and to build efficient multicast trees for the dissemination of messages to each group.
+- The use of multicast in applications has been limited because of the lack of wide scale deployment and the issue of how to track group membership.
+- In this paper we present Scribe, a large-scale, decentralized application-level multicast infrastructure built upon Pastry, a scalable, self-organizing peer-to-peer location and routing substrate with good locality properties
+- Scribe builds a multicast tree, formed by joining the Pastry routes from each group member to a rendez-vous point associated with a group.
+- Each entry in the routing table of Pastry refers to one of potentially many nodes whose nodeId has the appropriate prefix. Among such nodes, the one closest to the present node (according to a scalar proximity metric, such as the round trip time) is chosen.
+***** Pastry Locality
+- The proximity metric is a scalar value that reflects the “distance” between any pair of nodes, such as the round trip time. It is assumed that a function exists that allows each Pastry node to determine the “distance” between itself and a node with a given IP address.
+- The short routes property concerns the total distance, in terms of the proximity metric, that messages travel along Pastry routes. Recall that each entry in the node routing tables is chosen to refer to the nearest node, according to the proximity metric, with the appropriate nodeId prefix.
+- The route convergence property is concerned with the distance traveled by two messages sent to the same key before their routes converge.
+*** Scribe
+- Any Scribe node may create a group; other nodes can then join the group, or multicast messages to all members of the group
+- No particular delivery order of messages is guaranteed
+- Stronger reliability and delivery guarantees can be built on top of Scribe
+- Nodes can create, send messages to, and join many groups.
+- Groups may have multiple sources of multicast messages and many members.
+- Scribe uses Pastry to manage group creation, group joining and to build a per-group multicast tree used to disseminate the messages multicast in the group.
+- The Scribe software on each node provides the forward and deliver methods, which are invoked by Pastry whenever a Scribe message arrives.
+- Recall that the forward method is called whenever a Scribe message is routed through a node
+- The deliver method is called when a Scribe message arrives at the node with nodeId numerically closest to the message’s key, or when a message was addressed to the local node using the Pastry send operation.
+- The possible message types in Scribe are JOIN, CREATE, LEAVE and MULTICAST
+- Each group has a unique groupId.
+- The Scribe node with a nodeId numerically closest to the groupId acts as the rendez-vous point for the associated group. The rendez-vous point is the root of the multicast tree created for the group.
+- To create a group, a Scribe node asks Pastry to route a CREATE message using the groupId as the key
+  + Pastry delivers this message to the node with the nodeId numerically closest to groupId.
+  + This Scribe node becomes the rendez-vous point for the group.
+- The groupId is the hash of the group’s textual name concatenated with its creator’s name.
+- Alternatively, we can make the creator of a group be the rendez-vous point for the group as follows: a Pastry nodeId can be the hash of the textual name of the node, and a groupId can be the concatenation of the nodeId of the creator and the hash of the textual name of the group.
+- This alternative can improve performance with a good choice of creator: link stress and delay will be lower if the creator sends to the group often, or is close in the network to other frequent senders or many group members.
+- Scribe creates a multicast tree, rooted at the rendez-vous point, to disseminate the multicast messages in the group.
+- The tree is created using a scheme similar to reverse path forwarding (??)
+- The tree is formed by joining the Pastry routes from each group member to the rendez-vous point.
+- Scribe nodes that are part of a group’s multicast tree are called forwarders with respect to the group; they may or may not be members of the group.
+- Each forwarder maintains a children table for the group containing an entry (IP address and nodeId) for each of its children in the multicast tree.
+***** Joining
+- When a Scribe node wishes to join a group, it asks Pastry to route a JOIN message with the group’s groupId as the key
+  + This will get routed towards the rendez-vous point (the node with the nodeId numerically closest to the groupId)
+  + At each node along the route, Pastry invokes Scribe’s forward method.
+  + Forward checks its list of groups to see if it is currently a forwarder; if so, it accepts the node as a child, adding it to the children table. If the node is not already a forwarder, it creates an entry for the group, and adds the source node as a child in the associated children table
+  + It then becomes a forwarder for the group by sending a JOIN message to the next node along the route from the joining node to the rendez-vous point
+  1) The peer sends a “join” message towards the groupId
+  2) An intermediate node forwarding this message will:
+    - If the node is currently a forwarder for the group, it adds the sending peer as a child and we’re done.
+    - If the node is not a forwarder, it adds the sender as a child and then sends its own “join” message towards the groupId, thus becoming a forwarder for the group
+  3) Every node in a group is a forwarder, but this does not mean that every forwarder is also a member
+- So the rendez-vous point doesn't handle all join requests if there are forwarders on the path
+***** Leaving
+- The peer locally marks that it is no longer a member but merely a forwarder
+- It then proceeds to check whether it has any children in the group
+  + and if it hasn’t, it sends a “leave” message to its parent (which continues recursively up the tree if necessary)
+  + otherwise it stays on as a forwarder
+***** Sending messages
+- Multicast sources use Pastry to locate the rendez-vous point of a group: they route to the rendez-vous point
+  + They ask it for its IP
+- The IP is cached and can now be used to multicast to the group, avoiding further routing through Pastry to reach the rendez-vous node.
+- If this point fails, a new one has to be found.
+- There is a single multicast tree for each group and all multicast sources use the above procedure to multicast messages to the group.
+  + The rendez-vous point can perform access control, as it's a centralised point through which all messages must go
+- Thus, Pastry is not used for data traffic, only for handling joining and leaving of the tree
+- Top-down
+***** Reliability
+- Scribe provides only best-effort delivery of messages but it offers a framework for applications to implement stronger reliability guarantees.
+- Uses TCP
+***** Repairing
+- Periodically, each non-leaf node in the tree sends a heartbeat message to its children.
+- Multicast messages are implicit heartbeats (if you can send a message, you're alive!)
+- A child suspects that its parent is faulty when it fails to receive heartbeat messages.
+- Upon detection of the failure of its parent, a node calls Pastry to route a JOIN message to the group’s identifier.
Pastry will route the message to a new parent, thus repairing the multicast tree.
+  + I.e., the node "joins" again and gets a new parent (forwarder)
+- The state associated with the rendez-vous point, which identifies the group creator and has an access control list, is replicated across the k closest nodes to the root node in the nodeId space (where a typical value of k is 5). It should be noted that these nodes are in the leaf set of the root node. If the root fails, its immediate children detect the failure and join again through Pastry. Pastry routes the join messages to a new root (the live node with the numerically closest nodeId to the groupId). This new root will be among these k nodes; as such, it will have the state.
+***** Evaluations
+- Slightly longer delay than IP multicast, presumably due to the additional overlay routing
+  + It helps that Pastry uses short routes
+- Significantly less node stress, due to the tree structure, so work is nicely delegated
+- Decent link stress, due to the tree structure and route convergence: group members which are close to each other tend to be children of the same parent, so the parent needs to receive only a single copy, which it can then forward to both.
+
+
+
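The Scribe join procedure and the resulting multicast tree described in the Joining and Sending messages notes can be sketched as follows. This is a simplified, in-memory sketch and not Scribe's implementation: the `ScribeNode` class, the `create`, `join` and `multicast` functions, and the explicit `path` argument (standing in for the route Pastry would compute from the joiner to the rendez-vous point) are all hypothetical.

```python
class ScribeNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.children = {}    # groupId -> set of child nodeIds (forwarder state)
        self.members = set()  # groupIds this node is actually a member of


def create(nodes, root_id, group_id):
    """The rendez-vous point becomes the root of the group's multicast tree."""
    nodes[root_id].children.setdefault(group_id, set())


def join(nodes, path, group_id):
    """Join along path = [joiner, hop, ..., rendez-vous]; return JOIN messages sent."""
    joiner = nodes[path[0]]
    already_in_tree = group_id in joiner.children
    joiner.members.add(group_id)
    if already_in_tree:
        return 0                      # already a forwarder, nothing to send
    joiner.children[group_id] = set()
    sent = 0
    for child_id, parent_id in zip(path, path[1:]):
        parent = nodes[parent_id]
        was_forwarder = group_id in parent.children
        parent.children.setdefault(group_id, set()).add(child_id)
        sent += 1
        if was_forwarder:
            break                     # the JOIN stops at the first forwarder
    return sent


def multicast(nodes, root_id, group_id):
    """Disseminate top-down from the root; return the members that receive it."""
    delivered, stack = [], [root_id]
    while stack:
        node = nodes[stack.pop()]
        if group_id in node.members:
            delivered.append(node.node_id)
        stack.extend(node.children.get(group_id, ()))
    return sorted(delivered)


if __name__ == "__main__":
    nodes = {i: ScribeNode(i) for i in range(1, 6)}
    create(nodes, 5, "g")
    join(nodes, [1, 3, 5], "g")
    join(nodes, [2, 3, 5], "g")   # stops at forwarder 3, never reaches root 5
    print(multicast(nodes, 5, "g"))
```

In the usage example, node 3 ends up as a forwarder without being a member, and the second join never reaches the rendez-vous point, which is why the root does not have to handle every join request.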