diff --git a/notes.org b/notes.org
index bae3999..21f1d07 100644
--- a/notes.org
+++ b/notes.org
@@ -1439,3 +1439,114 @@ access the resources. It’s a web server offering an OAuth API to authenticate
 - To prevent Sybil: In [13] Rowaihy et al. present an admission control system for structured peer-to-peer systems. The system constructs a tree-like hierarchy of cooperative admission control nodes, from which a joining node has to gain admission. Another approach [7] to limit Sybil attacks is to store the IP addresses of participating nodes in a secure DHT. In this way the number of nodeIds per IP address can be limited by querying the DHT if a new node wants to join.
 *** Conclusion
 - We propose several practicable solutions to make Kademlia more resilient. First we suggest to limit free nodeId generation by using crypto puzzles in combination with public key cryptography. Furthermore we extend the Kademlia routing table by a sibling list. This reduces the complexity of the bucket splitting algorithm and allows a DHT to store data in a safe replicated way. Finally we propose a lookup algorithm which uses multiple disjoint paths to increase the lookup success ratio. The evaluation of S/Kademlia in the simulation framework OverSim has shown, that even with 20% of adversarial nodes still 99% of all lookups are successful if disjoint paths are used. We believe that the proposed extensions to the Kademlia protocol are practical and could be used to easily secure existing Kademlia networks.
+** Protecting Free Expression Online with Freenet
+- Freenet uses a decentralized P2P architecture to create an uncensorable and secure global information storage system.
+- The growth of censorship and erosion of privacy on the Internet increasingly threatens freedom of expression in the digital age.
Personal information flows are becoming subject to pervasive monitoring and surveillance, and various state and corporate actors are trying to block access to controversial information and even destroy certain materials altogether.
+- Freenet is a distributed information storage system designed to address information privacy and survivability concerns.
+- In simulations of up to 200,000 nodes, Freenet has proved scalable and fault tolerant. It operates as a self-organizing P2P network that pools unused disk space across potentially hundreds of thousands of desktop computers to create a collaborative virtual file system.
+- To increase network robustness and eliminate single points of failure, Freenet employs a completely decentralized architecture.
+- Participants could operate maliciously or fail without warning at any time.
+  + Freenet implements strategies to protect data integrity and prevent privacy leaks in the former instance, and provide for graceful degradation and redundant data availability in the latter.
+- The system is also designed to adapt to usage patterns, automatically replicating and deleting files to make the most effective use of available storage in response to demand.
+*** Design Motivation
+- The prevention of censorship and the maintenance of privacy are both fundamental to free expression in a potentially hostile world.
+- Preserving the availability of controversial information is only half the problem; individuals can often be subject to adverse personal consequences for writing or reading such information and might need to conceal their activity in order to protect themselves.
+- A common objection to mechanisms for secure communication is that criminals might use them to evade law enforcement.
+- Freenet is not particularly attractive for such purposes, as it is designed to broadcast content to the world, which is not so useful for secret criminal plots.
+- In designing Freenet, we focused on
+  1) privacy for information producers, consumers, and holders;
+  2) resistance to information censorship;
+  3) high availability and reliability through decentralization; and
+  4) efficient, scalable, and adaptive storage and routing.
+- Because disk space is finite, a tradeoff exists between publishing new documents and preserving old ones.
+*** Freenet Architecture
+- Freenet participants each run a node that provides the network some storage space. To add a new file, a user sends the network an insert message containing the file and its assigned location-independent globally unique identifier (GUID), which causes the file to be stored on some set of nodes.
+- During a file’s lifetime, it might migrate to or be replicated on other nodes.
+- To retrieve a file, a user sends out a request message containing the GUID key.
+**** GUID Keys
+- Freenet GUID keys are calculated using SHA-1 secure hashes. The network employs two main types of keys: content-hash keys, used for primary data storage, and signed-subspace keys, intended for higher-level human use.
+***** Content-hash keys
+- The content-hash key (CHK) is the low-level data-storage key and is generated by hashing the contents of the file to be stored.
+- So every file has a unique identifier derived from its contents.
+- Unlike with URLs, you can be certain that a CHK reference will point to the exact file intended. CHKs also permit identical copies of a file inserted by different people to be automatically coalesced because every user will calculate the same key for the file.
+***** Signed-subspace keys
+- The signed-subspace key (SSK) sets up a personal namespace that anyone can read but only its owner can write to.
+- To add a file you first choose a short text description, such as politics/us/pentagon-papers.
+- You would then calculate the file’s SSK by hashing the public half of the subspace key and the descriptive string independently before concatenating them and hashing again.
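The two key derivations above can be sketched in a few lines of Python. This is a minimal illustration, not Freenet's actual implementation: the function names, the byte encoding of the public key, the UTF-8 encoding of the description, the concatenation order (key digest first), and the hex output are all assumptions made for the example.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 digest of data."""
    return hashlib.sha1(data).digest()


def content_hash_key(contents: bytes) -> str:
    """CHK sketch: the key is simply the hash of the file contents."""
    return sha1(contents).hex()


def signed_subspace_key(public_key: bytes, description: str) -> str:
    """SSK sketch: hash the public key and the description independently,
    concatenate the two digests, then hash the result again.

    The encoding and the concatenation order are assumptions."""
    key_digest = sha1(public_key)
    description_digest = sha1(description.encode("utf-8"))
    return sha1(key_digest + description_digest).hex()
```

Identical contents always give the same CHK, which is what lets duplicate inserts coalesce. The SSK can be recomputed by anyone who knows the public key and the description, while writing still requires the private key for the signature.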
+- Signing the file with the private half of the key provides an integrity check, as every node that handles a signed-subspace file verifies its signature before accepting it.
+- To retrieve a file from a subspace, you need only the subspace’s public key.
+- Adding or updating a file, on the other hand, requires the private key in order to generate a valid signature.
+- Typically, SSKs are used to store indirect files containing pointers to CHKs rather than to store data files directly.
+- These pointers make files easier to update: the owner inserts the new version under its own CHK and then updates the indirect file to point to it.
+- You can use indirect files to create hierarchical namespaces from directory files that point to other files and directories.
+- SSKs can also be used to implement an alternative domain name system for nodes that change address frequently. Each such node would have its own subspace, and you could contact it by looking up its public key (its address-resolution key) to retrieve the current address.
+***** Messaging and Privacy
+- Freenet was designed from the beginning under the assumption of hostile attack from both inside and out.
+- Unfortunately, these considerations have had the side effect of hampering changes that might improve Freenet’s routing characteristics. To date, we have not discovered a way to guarantee better data locatability without compromising security.
+- Privacy in Freenet is maintained using a variation of Chaum’s mix-net scheme for anonymous communication.
+- Rather than move directly from sender to recipient, messages travel through node-to-node chains, where each link is individually encrypted, until the message finally reaches its recipient (kinda like Tor).
+- Because each node in the chain knows only about its immediate neighbors, the end points could be anywhere among the network’s hundreds of thousands of nodes, which are continually exchanging indecipherable messages.
+***** Routing
+- Routing queries to data is the most important element of the Freenet system.
+- A centralized index is shit: single point of failure, and it doesn’t scale.
+- Randomly broadcasting queries is shit: inefficient, and it doesn’t scale either.
+- Freenet avoids both problems by using a steepest-ascent hill-climbing search: Each node forwards queries to the node that it thinks is closest to the target. You might start searching for Jordan by asking a friend who once played college basketball, for example, who might pass your request on to a former coach, who could pass it to a talent scout, who might pass it to Jordan’s agent, who could put you in touch with the man himself.
+***** Requesting files
+- Every node maintains a routing table that lists the addresses of other nodes and the GUID keys it thinks they hold.
+- When a node receives a request, it first checks its own data store and returns the file if it holds it. Otherwise, the node forwards the request to the node in its table with the closest key to the one requested.
+- If the request is successful, each node in the chain passes the file back upstream and creates a new entry in its routing table associating the data holder with the requested key.
+- Nodes along the chain might also cache a copy of the file locally.
+- To conceal the identity of the data holder, nodes will occasionally alter reply messages, setting the holder tags to point to themselves before passing them back up the chain. Later requests will still locate the data because the node retains the true data holder’s identity in its own routing table and forwards queries to the correct holder. Routing tables are never revealed to other nodes.
+- To limit resource usage, the requester gives each query a time-to-live (TTL) limit that is decremented at each node.
+- If a node sends a query to a recipient that is already in the chain, the message is bounced back and the node tries to use the next-closest key instead. If a node runs out of candidates to try, it reports failure back to its predecessor in the chain, which then tries its second choice, and so on.
+- With this approach, the request homes in closer with each hop until the key is found.
A subsequent query for this key will tend to approach the first request’s path, and a locally cached copy can satisfy the query after the two paths converge.
+- Nodes that reliably answer queries will be added to more routing tables, and hence, will be contacted more often than nodes that do not.
+***** Inserting files
+- An insert message follows the same path that a request for the same key would take, sets the routing table entries in the same way, and stores the file on the same nodes. Thus, new files are placed where queries would look for them.
+- To insert a file, a user assigns it a GUID key and sends an insert message to the user’s own node containing the new key with a TTL value that represents the number of copies to store.
+- Inserts might fail if the CHK is already present at a node or the user has already inserted another file with the same description (for SSKs). In the latter case, the user should choose a different description or perform an update rather than an insert.
+- If the TTL expires without collision, the final node returns an “all clear” message. The user then sends the data down the path established by the initial insert message.
+- Each node along the path verifies the data against its GUID, stores it, and creates a routing table entry that lists the data holder as the final node in this chain.
+**** Data Encryption
+- For political or legal reasons, node operators might wish to remain ignorant of the contents of their data stores. To this end, we encourage publishers to encrypt all data before insertion.
+- Data encryption keys are not used in routing or included in network messages. Inserters distribute them directly to end users at the same time as the corresponding GUIDs.
+- Thus, node operators cannot read their own files, but users can decrypt them after retrieval.
+**** Network Evolution
+- The network evolves over time as new nodes join and existing nodes create new connections after handling queries.
As more requests are handled, local knowledge about other nodes in the network improves, and routes adapt to become more accurate without requiring global directories.
+***** Adding Nodes
+- To join the network, a new node first generates a public-private key pair for itself. This pair serves to logically identify the node and is used to sign a physical address reference.
+- Certification might be useful in the future for deciding whether to trust a new node, but for now Freenet uses no trust mechanism.
+- Next, the node sends an announcement message including the public key and physical address to an existing node, located through some out-of-band means such as personal communication or lists of nodes posted on the Web.
+- The announcement is forwarded from node to node until its TTL runs out. The nodes in the chain then collectively assign the new node a random GUID in the keyspace using a cryptographic protocol for shared random number generation that prevents any participant from biasing the result. This assigns the new node responsibility for a region of keyspace that all agree on, while preventing a malicious participant from influencing the assignment, since all nodes in the chain have to agree.
+***** Training Routes
+- As more requests are processed, the network’s routing should become better trained. Nodes’ routing tables should specialize in handling clusters of similar keys because each node will mostly receive requests for keys that are similar to the keys it is associated with in other nodes’ routing tables.
+- When those requests succeed, the node learns about previously unknown nodes that can supply such keys and creates new routing entries for them.
+- Taken together, the twin effects of clustering in routing tables and data stores should improve the effectiveness of future queries in a self-reinforcing cycle.
+***** Key Clustering
+- Because GUID keys are derived from hashes, the closeness of keys in a data store is unrelated to the corresponding files’ contents.
+- This lack of semantic closeness is unimportant, however, because the routing algorithm is based on the locations of particular keys, rather than particular topics.
+- In fact, hashes are useful because they ensure that similar works will be scattered throughout the network, lessening the chances that a single node’s failure will make an entire category of files unavailable.
+**** Searching
+- Search is not solved yet.
+- Freenet can be spidered, or individuals can publish lists of bookmarks. However, these approaches are not entirely satisfactory in terms of Freenet’s design goals.
+- One simple approach for a true Freenet search would be to create a special public subspace for indirect keyword files. When authors insert files, they could also insert several indirect files corresponding to search keywords for the original file.
+- The system would allow multiple keyword files with the same key to coexist (unlike with normal files), and requests for such keys could return multiple matches.
+**** Managing Storage
+- To encourage participation, Freenet does not require payment for inserts or impose restrictions on the amount of data that publishers can insert.
+- Given finite disk space, however, the system must sometimes decide which files to keep. It currently prioritizes space allocation by popularity, as measured by the frequency of requests per file. Each node orders the files in its data store by time of last request, and when a new file arrives that cannot fit in the space available, the node deletes the least recently requested files until there is room.
+- Because routing table entries are smaller, they can be kept around longer than files. Evicted files don’t necessarily disappear right away because the node can respond to a later request for the file using its routing table to contact the original data holder, which might be able to supply another copy.
+- Why would the original holder be more likely to have the file?
Freenet’s data holder pointers have a treelike structure. Nodes at the leaves might see only a few local requests for a file, but those higher up the tree receive requests from a larger part of the network, which makes their copies more popular.
+- File distribution is therefore determined by two competing forces: tree growth and pruning.
+- The query-routing mechanism automatically creates more copies in an area of the network where a file is requested, and the tree grows in that direction.
+**** Performance Analysis
+- Freenet demonstrates good scalability and fault-tolerance characteristics that can be explained in terms of a small-world network model [5]. Small-world networks are characterized by a power-law distribution of graph degree.
+- In such a distribution, the majority of nodes have relatively few local connections to other nodes, but a small yet significant number of nodes have large, wide-ranging sets of connections.
+- This is not surprising, as power-law distributions tend to arise naturally when networks grow by preferential attachment.
+- The new-node announcement protocol initially creates a preferential attachment effect because following random links gives a higher probability of arriving at nodes that have more links.
+- During normal operation, the effect continues because well-known nodes tend to see more requests and become even better connected (“the rich get richer”).
+***** Path length
+- By extrapolation, it appears that Freenet should be capable of scaling to one million nodes with a median path length of just 30.
+***** Fault Tolerance
+- The network is surprisingly robust against quite large failures.
+- The power-law distribution gives small-world networks a high degree of fault tolerance [6] because random failures are most likely to eliminate nodes from the poorly connected majority.
+- A small-world network falls apart much more quickly, however, if the well-connected nodes are targeted first.
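The eviction policy described under Managing Storage (order files by time of last request, delete the least recently requested ones until a new file fits) can be sketched as a small least-recently-used store. This is an illustrative sketch only: the class name `DataStore`, the byte-based capacity, and the choice to treat an insert as the file's first request are assumptions, not details from the paper.

```python
from collections import OrderedDict


class DataStore:
    """Fixed-capacity store that evicts the least recently requested files."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Ordered from least recently requested (front) to most recent (back).
        self._files: "OrderedDict[str, bytes]" = OrderedDict()

    def request(self, key: str):
        """Return the file for key, or None; a hit marks it most recently requested."""
        if key not in self._files:
            return None
        self._files.move_to_end(key)
        return self._files[key]

    def insert(self, key: str, data: bytes) -> list:
        """Store data under key and return the keys evicted to make room."""
        if len(data) > self.capacity_bytes:
            raise ValueError("file is larger than the whole store")
        evicted = []
        if key in self._files:
            # Replacing an existing entry frees its space first.
            self.used_bytes -= len(self._files.pop(key))
        while self.used_bytes + len(data) > self.capacity_bytes:
            old_key, old_data = self._files.popitem(last=False)
            self.used_bytes -= len(old_data)
            evicted.append(old_key)
        # Assumption: a fresh insert counts as the most recent request.
        self._files[key] = data
        self.used_bytes += len(data)
        return evicted
```

In a real node the routing table entry for an evicted key would be kept, so a later request could still be forwarded to the original data holder; the sketch leaves that part out.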