@adlrocha - WebBundles are built for content-addressable networks

And not for the traditional Internet.

Aug 30, 2020

The other day I woke up to this post from Brave on HackerNews explaining the potential harm that WebBundles, the new standard proposed by Google for the web, could do to the Open Web as we know it. I know I am pretty biased, but the moment I read this article my brain started making connections between WebBundles, the decentralized web, IPFS, etc., and I came to the following conclusion: WebBundles may be harmful for the traditional Open Web, but they can come in pretty handy for the Decentralized Web and the future of the Internet.

Introducing the WebBundle standard

WebBundles (not to be confused with Webpack) is a standard proposed by Google to bundle a website into a single file, making it shareable. All of the assets of a website are included in the same package. Some of the things we will be able to do with WebBundles (according to this post) are: create our own content and distribute it in all sorts of ways without being restricted to the network; share a web app or piece of web content with our friends via Bluetooth or Wi-Fi Direct; or carry our site on a USB drive or even host it on our own local network.

I don’t know how useful all of these things are, but I will add one of my own: receiving the bundle from any machine in the world, not just from the generator of the content (yep, like a CDN, but more about this in a moment).

A WebBundle is a file format for encapsulating one or more HTTP resources in a single file. It can include one or more HTML files, JavaScript files, images, or stylesheets. The bundle is a CBOR file with a .wbn extension (by convention) which packages HTTP resources into a binary format, and is served with the application/webbundle MIME type. HTTP resources in a Web Bundle are indexed by request URLs, and can optionally come with signatures that vouch for the resources.
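To make the "resources indexed by request URLs" idea a bit more tangible, here is a minimal toy sketch. It is not the real .wbn wire format: the actual bundle is CBOR, while this sketch uses JSON with hex-encoded bodies as a stand-in so it runs with the standard library only. The names build_bundle and read_resource are mine, purely for illustration.

```python
import json


def build_bundle(resources):
    """Pack resources into a single bytes blob, indexed by request URL.

    `resources` maps a request URL to a (content_type, body_bytes) pair.
    JSON with hex-encoded bodies is an assumed stand-in for the real CBOR encoding.
    """
    index = {
        url: {"content_type": ctype, "body": body.hex()}
        for url, (ctype, body) in resources.items()
    }
    return json.dumps(index, sort_keys=True).encode("utf-8")


def read_resource(bundle, url):
    """Look up one resource inside the bundle by its request URL."""
    index = json.loads(bundle.decode("utf-8"))
    entry = index[url]
    return entry["content_type"], bytes.fromhex(entry["body"])


# A two-resource "site" packed into one file-like blob.
site = {
    "https://example.com/": ("text/html", b"<h1>Hello</h1>"),
    "https://example.com/app.js": ("application/javascript", b"console.log('hi')"),
}
bundle = build_bundle(site)
ctype, body = read_resource(bundle, "https://example.com/app.js")
```

The point is only the shape: one blob, many resources, each retrieved by the URL it would have had on the network.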

So now, if we make a request to https://example.com, instead of being served the index file that references the URLs of all the assets required to render the website, we will be redirected to, for instance, https://example.com/<bundle_id>, where we will download the bundle including all the assets of the site, generated for consumption.

The WebBundle specification consists of several layers:
Signed HTTP exchanges (a.k.a. SXG) (IETF draft): These allow a browser to trust that a single HTTP request/response pair was generated by the origin it claims.
Web Bundles (previously called Bundled HTTP exchanges) (IETF draft): A collection of HTTP resources, each of which could be signed or unsigned, with some metadata describing how to interpret the bundle as a whole.
Loading: A description of how browsers load signed and bundled exchanges.

Brave’s view on the matter

The folks at Brave are worried about the WebBundle standard because:

“it allows websites to “bundle” resources together, and will make it impossible for browsers to reason about sub-resources by URL. This threatens to change the Web from a hyperlinked collection of resources (that can be audited, selectively fetched, or even replaced), to opaque all-or-nothing “blobs” (like PDFs or SWFs).”
[…]
“By changing URLs from meaningful, global identifiers into arbitrary, package-relative indexes, WebBundles give advertisers and trackers enormously powerful new ways to evade privacy and security protecting web tools.”

WebBundles will allow sites to:

“Evade privacy and security tools, as URLs in WebBundles are arbitrary references to resources in the bundle, and not globally shared references to resources. What on the current Web is referred to everywhere as, say, example.org/tracker.js, could in one WebBundle be called 1.js, in the next 2.js, in the third 3.js, etc”, so catching the “spying script” so that privacy-enforcing tools can block it becomes significantly more complex.
“Even worse, WebBundles would allow sites to evade blocking tools by making the same URL point to different things in each bundle. On the current Web, https://example.org/ad.jpg points to the same thing for everyone. It’s difficult for a website to have the same URL return two different images from the same URL. As a result, blocking tools can block ad.jpg knowing that they’re blocking an advertisement for everyone;”
Evade privacy tools by hiding dangerous URLs.
Make privacy violations that are currently difficult, easy.
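Brave's first point can be sketched with a toy example. Two bundles ship the very same script under different bundle-relative names, so a blocker that matches on names misses it. I also include a blocker that matches on a hash of the content, which is my own addition rather than something from Brave's article; it anticipates the content-addressing argument later in this post. Everything here (the bundle dictionaries, the blocklists, the function names) is hypothetical.

```python
import hashlib

TRACKER = b"/* tracker script */ sendEverything();"

# Two bundles carry the same script under different, bundle-relative names.
bundle_a = {"1.js": TRACKER, "index.html": b"<script src='1.js'></script>"}
bundle_b = {"2.js": TRACKER, "index.html": b"<script src='2.js'></script>"}


def blocked_by_url(bundle, url_blocklist):
    """Return the resource names a name/URL-based blocker would catch."""
    return sorted(name for name in bundle if name in url_blocklist)


def blocked_by_hash(bundle, hash_blocklist):
    """Return the resource names a content-hash-based blocker would catch."""
    return sorted(
        name
        for name, body in bundle.items()
        if hashlib.sha256(body).hexdigest() in hash_blocklist
    )


url_blocklist = {"tracker.js"}
hash_blocklist = {hashlib.sha256(TRACKER).hexdigest()}

print(blocked_by_url(bundle_a, url_blocklist))    # [] (the name no longer matches)
print(blocked_by_hash(bundle_a, hash_blocklist))  # ['1.js']
print(blocked_by_hash(bundle_b, hash_blocklist))  # ['2.js']
```

To be fair, a site could change a single byte of the script to get a different hash, so this is an illustration of the naming problem, not a complete defense.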

I highly recommend checking out the article to completely grasp the security and privacy concerns shared by the outstanding research engineers at Brave.

IPFS as a global CDN

Brave’s concerns about WebBundles are legit in a location-based addressing Internet, but all of them would immediately go away the moment we switch from location-based addressing to content-based addressing for the Internet. But you know what? We already have a content-based addressing version of the Internet, and it is called IPFS. Why not start testing these kinds of standards there?

To understand the differences between these two approaches for the Internet, and their implications, let me share these two great paragraphs from this post by my colleague Yiannis.

“Our internet was designed to operate on location-based addressing. Essentially, what this means is that if two households on the same street are streaming the same content, the network will need to transport that same information from the original source location two times: first to one household, and then to another. This is because our requests are being forwarded to the (IP address of the) server where the content lives.
With content-based addressing, the content can be delivered by previous recipients, instead of re-transporting the data from the original source every time. The first household, or a close-by network router, will temporarily store the content locally on the device, and that device will then serve as the source when another neighbor requests this piece of content. With content-based addressing, content has a unique identifier. (Think of this like the serial number of a device, or the digital object identifier or ISBN number of a digital publication.) Our requests are explicitly asking for the content itself, instead of the IP address where the content lives. Therefore, our requests will find the content copy that is stored the closest to the requester and will not need to travel to the original source.”
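The two paragraphs above can be sketched in a few lines. This is a toy model under loud assumptions: a plain SHA-256 hex digest stands in for the content identifier (real IPFS CIDs use a richer encoding), and the Peer class, its distance field and the fetch function are hypothetical names of mine, not any real IPFS API.

```python
import hashlib


def content_id(data):
    """Assumed stand-in for a CID: the hex SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    def __init__(self, name, distance):
        self.name = name
        self.distance = distance  # hypothetical "network distance" to the requester
        self.store = {}

    def put(self, data):
        """Store the bytes under their content identifier and return it."""
        cid = content_id(data)
        self.store[cid] = data
        return cid


def fetch(cid, peers):
    """Fetch content by ID from the closest peer holding a valid copy."""
    for peer in sorted(peers, key=lambda p: p.distance):
        data = peer.store.get(cid)
        # Re-hash the bytes so a copy from any peer can be checked against the ID.
        if data is not None and content_id(data) == cid:
            return peer.name, data
    raise KeyError(cid)


origin = Peer("origin-server", distance=100)
neighbor = Peer("neighbor", distance=1)

cid = origin.put(b"bundle bytes")
first_source, data = fetch(cid, [origin, neighbor])  # only the origin has it so far
neighbor.put(data)                                   # the neighbor keeps a copy
second_source, _ = fetch(cid, [origin, neighbor])    # now served from close by
```

The request names the content, not the server, so the second fetch is answered by the neighbor and the bytes are still verified against the identifier.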

You see why WebBundles may not make sense in the traditional web but could be a leap forward in the content-based addressing Internet? WebBundles can be stored in a decentralized network and served to users from their “nearest” peers storing the content. This alternative architecture for the web has a CDN by design. No more going to the backbone of the Internet to get all the content. How cool is that?

Even more, in content-addressable networks, bundles are identified uniquely, so no more forging links to different versions of a bundle. If https://example.com points to the bundle <CID1>, every request to this URL will lead to the same content. Additionally, this content will be signed using SXG, so not only can we check that the bundle is the one it should be, but we can also validate the source where the content was generated. This idea is not novel, and it has already been briefly prototyped as shown in this video:

Of course, I may have oversimplified the scenario for illustrative reasons, but the point I want to stress here is that, while WebBundles may seem like a horrible idea for the traditional Internet, they may not be such a bad thing over alternative architectures such as the one proposed by IPFS.

Stating my case

To me, WebBundles feel like another step towards bridging the gap between the web and our devices. In the end, a WebBundle is to the browser what an application package is to an operating system, with the main difference that WebBundles can be seen as more dynamic than traditional software packages. Downloading an update of a WebBundle is equivalent to pressing F5 in your browser. We are becoming increasingly dependent on web applications, while applications themselves are developing into outstandingly complex systems, and WebBundles can open the door to new applications, UXs, and new ways of using the Internet.

Computation is being decoupled from the underlying infrastructure. What technologies like WebBundle or WebAssembly are aiming for, in my opinion, is to build the foundations of a world where code can be executed seamlessly anywhere. No more vendor lock-in, browser/operating system dependencies, or system-specific executions. What the heck! One day we may not need devices at all and everything will be in the network; we will only have interfaces to global connectivity, computation and storage. But I guess we will have to move one step at a time. See you next week!

If you want to help build this vision, do not hesitate to check this out; it may be of interest to you ;)

@adlrocha Weekly Newsletter
