@adlrocha - Playing with GossipSub

And the release of my PubSub Playground

Jul 12, 2020

woman whispering on woman's ear while hands on lips

When one thinks about security and scalability in blockchain and distributed systems, the first thing that comes to everyone’s mind are consensus algorithms. The only thing preventing us from higher transaction throughputs and more resilient blockchain systems are better consensus algorithms, right? Well, unfortunately, this is only partially true. All these mechanisms require of the use of an underlying messaging protocol to communicate with the rest of the (unstructured) network, and orchestrate the operation of the consensus algorithm and any other overlaying scheme the system may have. Thus, in order to have a resilient and high performant consensus algorithm, we first need a resilient and high performant messaging protocol.

If I asked you what is the underlying messaging protocol used in Bitcoin or Ethereum, would you be able to answer? Until last week, I wouldn’t have been able to give an accurate answer either. We all have a general intuition, but many of us disregard this important component in blockchain platforms.

In case you were wondering, Bitcoin uses a flooding algorithm to spread new transactions and blocks to the rest of the network. In a flooding algorithm, nodes broadcast new messages to every other node connected in the network. This ensures that messages propagate as fast as possible while adding traffic redundancy. Ethereum’s current messaging protocol is a bit more “bandwidth efficient” than Bitcoin’s flooding. Instead of every node broadcasting messages to every other connected peer in the network, they randomly select sqrt(N) nodes to broadcast the message to, reducing a bit the traffic redundancy and bandwidth requirements of the system.

As practically shown for years in the Bitcoin and Ethereum networks, these protocols “kind of work fine”. But could we envision better messaging protocols for next-generation blockchain and p2p networks? Fortunately, we have a new good candidate for this, say hello to GossipSub.

GossipSub Publish/Subscribe

GossipSub is a brand new pubsub protocol proposal designed and implemented in libp2p. The aim of this new protocol is to enhance many of the limitations of existing pubsub and messaging protocols in unstructured networks (high bandwidth requirements, no delivery guarantees, high delivery latency, no mitigation schemes against potential attacks, etc.). GossipSub, and specifically GossipSub-v1.1, has been designed to incorporate reliance against a wide spectrum of attacks, ensuring fast message delivery.

So how does GossipSub work? The main focus of GossipSub is to deliver messages to all the nodes in the network subscribed to a certain topic. To achieve this, two overlay networks are built over the underlying p2p unstructured network: (i) a full-message peering mesh where a small number of nodes are connected using bidirectional links conforming a local mesh (the number of nodes to be included in the local mesh is determined by the degree of the network. This parameter controls the trade-off between speed, reliability, resilience and efficiency of the network); (ii) and a metadata-only network conformed by every peer in the network. This network is made up of all the network connections between peers that aren’t full-message peerings.

Diagram showing a large shaded area with scattered dots inside connected by
thick, dark lines representing full-message peerings between peers. Most of the
dots have three dark lines running from them to other dots. One of the dots has
four lines running from it and is labelled as “Peer reached upper bound”. A
different dot has only two lines running from it and is labelled “Peer reached
lower bound”. Beneath the diagram is a legend reading “Network peering degree = 3;
Upper bound = 4; Lower bound = 2“ accompanied with small symbols showing dots
with three, four and two lines running from them
respectively.

Source: Libp2p Pub/sub Documentation

The local mesh (full-message network) is conformed by nodes subscribed to the same topics (this is the way nodes choose what peers to add to their local mesh). Thus, when a peer subscribes to a topic, it selects some peers to become its full-message peers for that topic. In parallel, nodes in the global mesh (metadata network) use a gossipping algorithm to share information with the rest of the peers inside and outside their local mesh about things that are happening in the network. Hence, gossip messages include information about the subscription and unsubscription of connected nodes, recent messages seen in the network, etc.

Diagram showing a large shaded area with scattered dots inside connected by
many thin, light lines representing metadata-only peerings between peers. The
lines between the dots are labelled “Each peering is a network connection
between two peers”.

Source: Libp2p Pub/sub Documentation

When a peer wants to publish a message, it sends a copy to all full-message peers it is connected to; and when a peer receives a new message from another peer, it stores the message and forwards a copy to all other full-message peers. Peers remember a list of recently seen messages. This let them act upon a message only the first time they see it and ignore retransmissions of already seen messages (avoiding duplicate bandwidth).

Gossip messages propagate metadata throughout the network. Gossips are emitted to a random subset of peers that may or may not be part of the mesh, and they are used, for instance, to inform other peers about messages seen and that they may have missed.

Source: GossipSub paper

Additionally, GossipSub-v1.1 includes schemes to avoid a wide-range of attacks to the protocol. Every node keeps a score of every other peer it interacts with. Scores are private, and are never shared with other nodes (conforming a local view of the network from a specific node). The score function is used as a performance monitoring mechanism to identify and remove badly-behaving nodes from the mesh. It is computed using a set of interesting parameters from their counterparts such as the time a node has been part of the mesh, its rate of message delivery and failures, the number of invalid messages exchanged, etc.

This score is the foundation of many of the protection schemes introduced in GossipSub-v1.1 such as opportunistic grafting and adaptive gossip dissemination. In the sake of brevity, I will refer you to the recently released pre-print of GossipSub’s paper if you want to have a deeper understanding of its operation and design rationale. Even more, in this same announcement, an extensive evaluation report of the performance and security of the protocol (which I honestly had a blast reading), and the results of its security audit have also been released. In short, awesome references to imbibe all the knowledge around this awesome piece of tech.

The GossipSub Playground

And it’s time to get practical! You can find the implementation and specs of GossipSub in libp2p in case you want to start playing with it, or directly use it in your next application. But if the only thing you want to see is GossipSub in action in a simple network without having to worry about hacking anything, I built a simple tool to help you with this that I called “The PubSub Playground”. With this tool you will be able to easily deploy a local GossipSub network and watch how messages flow through the network while collecting interesting metrics.

To run the tool you just need to clone the repo and run the following commands:

$ go build  
$ ./pubsub-playground -pubs <number_publishers> -subs <number_subscribers> -log -server

With this simple command you will set up a network with your desired number of publishers and subscribers. Every publisher node will periodically publish messages into a topic with the same ID of his PeerID, while subscribers randomly subscribe to one of the available topics, and read the messages sent by the topic's corresponding publisher.

Throughout the run every node will collect data from the execution of GossipSub, and trace its local view of the exchange of messages with the rest of the network. All these traces are used to generate a set of metrics that can be checked in two ways:

By enabling the -log flag in the execution of the tool to periodically print metrics in stdout.
By using the -server flag that starts an HTTP Server and runs a WebSocket at ws://127.0.0.1/websocket so that anyone can subscribe to the socket and receive updates of the metrics collected in the network. Along with the websocket, a simple dashboard is provided at http://localhost:3000 so you can visually evaluate the evolution of your network metrics.

The implementation of this tool has been great to help me understand how GossipSub works and behaves under the hood with networks of different sizes. With the PubSub Playground we are able to draw conclusions about the behavior of the protocol such as:

The type of messages exchanged in a GossipSub network, and the amount of them required to deliver messages to subscribers according to the size of the network.
The behavior of the load of the network according to the number of publishers and subscribers.
Realize how the average delay of message delivery behaves, and understand how this metrics changes over time (from boot to stability).
See the average delay of messages from publishers to subscribers, and the rate of “useful” and “control” messages in the system.

And there are still a bunch of other things I could add to drive our analysis further like:

Dynamically set the number of publishers and subscribers (so we can add new nodes with the network running) and play with the network churn.
Enable the publication and subscription to more than one topic for peers.
Set additional delays and latencies in node connections to see how this affects the protocol.
One that I am really looking forward to start exploring (but haven’t got the time to tackle): use all the data collected to draw a heat map that show (in real time?) how messages flow and are exchanged between the different nodes in the network. This would allow us to draw conclusions about “hot links” between nodes, observe the formation of the local meshes, understand the bandwidth supported by links, etc.
And I would also love to start adding rogue nodes to the tool so we can setup and visualize the behaviour of the protocol when the network is under attack (as showed in GossipSub’s evaluation report).

And nothing else from my part, this was all for today. I would love to hear your thoughts about this simple project, and I hope you are as excited as me with the work being done around GossipSub and other brand new networking protocols for the next-generation of the Internet. See you next week (hopefully my next publication will be written from a beautiful Spanish beach :) )

@adlrocha Weekly Newsletter

Discussion about this post