The IPFS Revolution
Better, Faster, Stronger
Back to the roots of the Internet
On October 29, 1969, a single word, “login”, was sent over the ARPANET, the ancestor of the Internet. It was the first data transfer on a network. Half a century later, we live as connected individuals: woven into the most intimate parts of our daily lives, the Internet has arguably become our most valuable technology.
However, what began as a peer-to-peer network has evolved: the quest for performance and the competition between companies have shaped the Internet so that its architecture has become increasingly centralized.
This “Minitel 2.0”, criticized by Benjamin Bayart in a famous 2007 conference talk (1), calls into question a fundamental principle of the Internet: its neutrality. The absence of central points is a condition for the freedom of exchanges within the network.
The IPFS (2) (InterPlanetary File System) protocol is a distributed file system. Its creator, Juan Benet, was inspired by the structure of BitTorrent, a peer-to-peer data transfer protocol. This structure could help solve the issues of the centralized Internet.
The XSL Labs technical document mentions IPFS because we want to implement this protocol in our ecosystem, especially to guarantee the availability of the public profiles of Verifiable Credential issuers.
Internet: From centralisation to distribution
The Internet has developed, and is still developing, under the Minitel-style client-server model, giving access to a plethora of centralized services. The rise of giants such as the GAFAM companies is no coincidence: it is the result of the mass adoption of the Internet and of centralized architectures themselves, which are easy to design.
In this model, users connect to and use services that belong to central authorities. These authorities concentrate considerable power: it is what allows Facebook to remove content unilaterally, and it is what made the NSA’s PRISM surveillance program possible.
In the centralized Internet, “clients” connect to huge servers and data centers to find information. The integrity of that information depends entirely on central authorities.
The entities behind powerful, massively adopted services end up generating huge profits by extracting and exploiting their users’ personal data, while competition becomes negligible.
Decentralizing the network requires more interconnected participating machines: resources are copied across multiple servers. Note that this alone does not make users independent, as each user remains bound to an access point that connects them to the network.
The distributed Internet removes centrality, giving users back their prerogative as actors of the network: everyone can become a node in the network, and all nodes contain data.
This is the IPFS approach: a distributed system, where all the clients are equal, and an answer to the many flaws of centralized systems such as monopolies, breakdowns, censorship and surveillance.
IPFS helps overcome the steady extension of censorship measures, as highlighted by the Turkish example.
In April 2017, the Turkish government decided to block access to Wikipedia (3).
In response, a copy of the site was put online using the IPFS protocol, a version that the Turkish government could not block, as it is distributed.
The non-profit Access Now, which defends digital freedoms, counted 196 Internet shutdowns across 25 countries in its 2018 report “The State of Internet Shutdowns Around the World” (4). This number keeps rising, and the phenomenon is becoming more and more visible.
The strength of IPFS lies in a modification of the very nature of content addressing on the Internet.
The centralized Web addresses content by its location. Any content is hosted on a server with a numerical label, the IP (Internet Protocol) address, which is mapped to a domain name via the DNS (Domain Name System). Accessing content means looking for it in a specific location. This makes it very easy to compromise the accessibility of a piece of content, whether through a failure, the will of a host or a government, or a distributed denial-of-service (DDoS) attack. In addition, this addressing method leaves content open to malicious modification, and content can simply vanish when it is moved, its address changes, or the website shuts down.
The IPFS system replaces this “location-based addressing” with “content-based addressing”: the content itself is indexed and retrieved using its CID (content identifier).
This is what it actually looks like:
• Using location-based addressing, content takes this form: https://xsl-labs.org/wp-content/uploads/2021/03/illustration-3-1.png
The picture (file « illustration-3-1.png ») sits in a file tree (wp-content/uploads/2021/03/) located precisely on the server corresponding to xsl-labs.org.
• Now the same content using content-based addressing: /ipfs/#hashcode/wp-content/uploads/2021/03/illustration-3-1.png
The picture (illustration-3-1.png) is still located in the tree view of files (wp-content/uploads/2021/03/).
However, a unique hash (#hashcode) replaces the location. This hashcode, or CID, does not refer to a physical server; it is generated from the content itself.
The hashcode is the result of the use of a cryptographic hashing function that guarantees the integrity of the data that is returned.
In other words, as long as the content is available somewhere, it is retrievable unaltered.
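The principle of content-based addressing can be sketched in a few lines of Python. This is only an illustration: a real IPFS CID wraps a multihash with version and codec prefixes, whereas here a bare SHA-256 digest stands in for the identifier, and a dictionary stands in for the network of nodes.

```python
import hashlib

def content_id(data: bytes) -> str:
    # Illustrative only: a real CID encodes a multihash plus
    # version and codec prefixes, not a bare hex digest.
    return hashlib.sha256(data).hexdigest()

store = {}  # toy content-addressed store: identifier -> data

def put(data: bytes) -> str:
    cid = content_id(data)
    store[cid] = data
    return cid

def get(cid: str) -> bytes:
    data = store[cid]
    # Integrity check: re-hashing the data must reproduce the identifier,
    # so altered content is detected no matter which node served it.
    if content_id(data) != cid:
        raise ValueError("content was altered")
    return data

cid = put(b"hello, persistent web")
assert get(cid) == b"hello, persistent web"
```

Because the identifier is derived from the bytes themselves, the same content always yields the same address, wherever it is stored.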
With data no longer located on a central point, but in a swarm of nodes, the IPFS project brings the promise of an Internet that can no longer be disconnected: the “persistent” web.
IPFS and Merkle Trees
Content-based addressing revolves around the use of unique identification keys (CIDs), which are generated by hashing the contents and structured in the form of Merkle trees.
To share a file using IPFS, the file is split into several blocks of a fixed size (usually 256 kB). The number of blocks thus depends on the size of the file.
These blocks, because of their small size, are easily replicable and thus spread over the network, in order to guarantee the availability of the data.
Each of these blocks is identified by the hash matching its content.
The hashes of these blocks are then combined in pairs and themselves hashed, until a single hash is obtained: the root hash or CID.
Through a system of identification and links between the blocks, the CID gives access to all the blocks, which are recombined to reconstruct the original file. This structure is called a DAG, or Directed Acyclic Graph; Juan Benet uses the expression “Merkle DAG” to describe it.
This hash system also guarantees data integrity: if a single block of data is corrupted or modified, its hash changes, as well as those of its “parents” up to the root hash. This ensures that the content is unaltered.
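The chunking and pairwise hashing described above can be sketched in Python. This is a simplified illustration, not the actual IPFS implementation: IPFS builds a Merkle DAG with links and metadata rather than a plain binary tree, but the integrity property is the same.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # IPFS's usual default block size

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def chunk(data: bytes):
    # Split the file into fixed-size blocks (the last one may be shorter).
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def merkle_root(blocks):
    """Combine block hashes pairwise until a single root hash remains."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:            # odd count: carry the last hash up
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

data = b"x" * (600 * 1024)            # a ~600 kB file -> 3 blocks
blocks = chunk(data)
root = merkle_root(blocks)

# Corrupting a single block changes its hash, which propagates
# up through the parents and changes the root: tampering is visible.
tampered = blocks[:]
tampered[1] = b"corrupted" + tampered[1][9:]
assert merkle_root(tampered) != root
```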
Ensuring content availability
Juan Benet’s IPFS project aims to distribute the Internet. Beyond the various issues already mentioned, distribution also aims to solve the problem of network congestion. Bandwidth saturation, driven by the growth of traffic, particularly video traffic, degrades services. In 2020, at the urging of several governments, Netflix and YouTube were compelled to reduce the default quality of their video streams to prevent network saturation.
The strength of IPFS is also its weakness. To create this distributed network, this constellation of nodes, a large share of the network’s users must take part in it. According to Juan Benet, this would optimize the functioning of the network: since information is no longer centralized, it can be retrieved from the closest or most accessible location.
However, if the protocol is not widely adopted, then, as with other peer-to-peer protocols, a lack of “seeders” would make content disappear.
It is therefore not acceptable for content to be available on only one or a few nodes in the network, and IPFS Cluster aims to solve this issue. This service allows content to be pinned (marking its importance) to ensure that it is stored across a set of IPFS nodes, and thus that it remains accessible at all times.
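As a rough illustration of what pinning with a replication factor buys, here is a toy Python simulation. The class and function names are hypothetical, not the IPFS Cluster API: the point is simply that content pinned on several nodes remains retrievable as long as at least one of them is up.

```python
import hashlib
import random

class Node:
    """A toy IPFS node that can pin content locally."""
    def __init__(self, name):
        self.name = name
        self.pinned = {}  # cid -> content

    def pin(self, cid, content):
        self.pinned[cid] = content

def cid_of(content: bytes) -> str:
    # Illustrative: real CIDs encode a multihash, not a bare hex digest.
    return hashlib.sha256(content).hexdigest()

def cluster_pin(nodes, content, replication_factor=3):
    """Pin the content on `replication_factor` distinct nodes."""
    cid = cid_of(content)
    for node in random.sample(nodes, replication_factor):
        node.pin(cid, content)
    return cid

def fetch(nodes, cid):
    """Content stays retrievable as long as one pinning node answers."""
    for node in nodes:
        if cid in node.pinned:
            return node.pinned[cid]
    return None

nodes = [Node(f"node-{i}") for i in range(10)]
cid = cluster_pin(nodes, b"issuer public profile", replication_factor=3)
assert fetch(nodes, cid) == b"issuer public profile"
```

Even if two of the three pinning nodes go offline, `fetch` still succeeds through the remaining one; only the loss of every replica makes the content unavailable.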
For XSL Labs, the use of IPFS will allow all users of our ecosystem to confirm the legitimacy of Verifiable Credential issuers at any time: an IPFS reference will let them easily retrieve these issuers’ public information.
Making the web more open, distributing it, and making it ever more efficient and open source is a major challenge for the future. It also means reclaiming control over our personal data, and fighting against the risks of authoritarian abuse such as censorship, blocking, and mass surveillance. From this point of view, IPFS marks a step back toward the original ideals of the Internet.