@adlrocha - Performance Best Practices in Hyperledger Fabric II: Infrastructure and Architecture
The "Fabric Into Production" Series (Part 2)
|Alfonso de la Rocha||Mar 29, 2020||1|
Welcome to the second part of this series dedicated to performance best practices in Hyperledger Fabric. In my previous publication I just gave a brief introduction of what to expect from this series, and I shared how we structured our tests, the approach we were following, and the type of conclusions we were looking to draw. In the end, with this analysis we were looking to understand Hyperledger Fabric in production, and how much this could cost.
Today we are going to share our results and best practices in the two lower levels of our “testing stack”, i.e. infrastructure and architecture.
Infrastructure, the Key.
It may seem obvious, but the amount of hardware you dedicate to the deployment of your network significantly affects the maximum performance you can get from it. I say it may seem obvious now, but when we started this matter gave us quite a few headaches. If you read any of the research articles we used as a base for our analysis, you may draw the wrong conclusion that in order to optimize Fabric’s performance you only need to fine-tune several configurations throughout all its layers according to the use case and the magic happens. Actually, this is the case, provided that you use enough hardware under the hood.
Before this analysis, we were deploying our Fabric networks over a Kubernetes cluster of three nodes with, what we thought, was enough hardware for all the tests we were going to do. To our surprise, while testing different network configurations in which we scaled the number of endorser peers and orderers, we started getting weird errors that didn’t fit our initial assumptions, and we couldn’t figure out what was happening. To give you a glimpse of what we were facing, this was the error we were getting when we weren’t givinb enough hardware resources to our Fabric network: “TLS handshake failed with error EOF […]”.
Who could have guessed that this error was related with the lack of hardware resources in our infrastructure? We checked our certificates, we checked our peers’ TLS connections, we checked everything we could think of, until one day, by chance, we decided to set up a bigger Kubernetes cluster for our tests. There you go! No more “TLS handshake failures”. Apparently the problem was that we were scaling so much our network in number of peers considering the hardware resources we had available, that peers didn’t have enough resources even to perform the TLS handshake.
But this was a consequence of a bigger problem, we were deploying different configurations of Fabric networks without clearly understanding the consumption footprint of its various modules. Yes, to avoid these problems, we could have given our clusters enough hardware to feed a quantum physics simulation, but if we had done this, we wouldn’t have learned the optimal amount of resources required for different Fabric network architectures, disregarding one of our goals, understanding a Fabric deployments cost model. Unlike in the research papers I mentioned in the first part of the series where “hardware wasn’t a problem”, we were following a “hardware scarcity approach” for our tests. We wanted to perform a realistic analysis where we may not have all the hardware we needed at our hands.
Related to the infrastructure, and again by chance, we found out that another factor that greatly affects performance is the infrastructure’s latency. I realized this one day that I was working from home. I was doing exactly the same tests I had done the day before in the office, but I was getting way worse results. It occurred to me that it could be related to my Internet connection, and that was it. I did a few tests forcing higher latencies between entities in the network, and there it was, the larger the connection delays, the lower the performance results.
Finally, the last question we wanted to answer at an infrastructure level was if it was better to deploy a Fabric network over bare metal or using Kubernetes (our default choice in production). The results showed that with high loads, Kubernetes introduced and overhead due to its kube-system pods that should be accounted for when designing the underlying cluster for the network. However, despite the better performance/hardware ratio of a bare metal deployment, the increase in management complexity of having bare metal deployments in production didn’t justify the increased overhead introduced by the kube-system.
(BTW, we found out that the kube-system consumption being directly proportional to the load supported by the network was due to the increase in the communication load between entities in the Kubernetes cluster. Just in case you were curious).
Learning in Infrastructure
The underlying infrastructure used to deploy the Fabric network matters for performance.
Ensure that you always have enough vCPUs in your infrastructure to accommodate your network.
Kube-system introduces a appreciable overhead with high loads compared to a bare-metal deployment.
Understand the delay of your infrastructure. Network delay between entities introduces performance overheads.
It is key to understand the computational footprint of Fabric modules to make efficient deployment.
Let’s move on to the architecture layer. As I stressed above, one of our big problems when deploying our Fabric networks was that we didn’t clearly understand the computational footprint of each Fabric entity. In this layer, our goal was to understand these footprints.
In this case, we gave enough hardware resources to our Kubernetes cluster. We didn’t want the infrastructure to be a limitation. We started analyzing the impact of peers in performance. The first obvious conclusion of our tests was that in order to increase the transaction throughput of our network, we had to scale in the number of endorser peers (i.e. peers that simulate and validate transactions). For this, we tested different Fabric architectures with the same number of orderers and exclusively scaling in the number of endorser peers (transactions where balanced using a round-robin policy between all the peers so we could be sure that every node was endorsing transactions).
For this same test, we used different load profiles to understand how an increase in the load affected their computation consumption. The results were clear, the higher the load, the larger the CPU consumption of peers. We followed these same tests fixing the number of peers and scaling the number of RAFT orderers (and even CAs). In this case, the increase in CPU with high loads wasn’t that pronounced. In short, in order to avoid bottlenecks at an architecture level we had to be sure that endorser peers had enough CPUs available to operate (we could forget about orderers).
Another thing that we realized from our tests, was that how we mapped Fabric entities into the physical infrastructure could affect performance. Lets take a Kubernetes cluster as an example. Deploying two endorser peers in the same node of the cluster would make them to cannibalize each other’s CPU resources with high loads. If we didn’t limit the amount of hardware they could consume, they would expand in the node fighting for all the available CPU. Fortunately, this is something that can be easily managed balancing the deployment of CPU-intensive endorsing peers in different cluster nodes, and ensuring that they coexist in their physical infrastructure with other non-intensive entities such as a CA or an orderer.
Other question we were looking to answer with our tests was related to Fabric channels. Do channels affect performance? The answer was pleasantly surprising. For this analysis we fixed the number of endorsement peers and we scaled the number of channels in the network, balancing the load between peers and channels. To our surprises, provided that we had enough computational resources to accommodate the architecture, channels not only didn’t harm the performance, but the more channels the higher the transaction throughput. Even more, in terms of overall throughput of the network, the performance of a network of two endorser peers with two channels is equivalent to one with four endorser peers and a single channel. To support our assumption, a similar conclusion was drawn in this post from the IBM blog about Fabric’s performance.
And now you’ll forgive me for my insistence, but I can’t stress this enough: all of these results are this way provided that your infrastructure has enough vCPUs to accommodate your architecture. Imagine how important is this that leveraging the results of the million (ok, they may have been less… half a million, maybe?) tests that we did, we inferred an equation to easily compute the minimum number of vCPUs we required to accommodate a specific architecture considering a desired transaction throughput and a certain number of endorsing peers, orderer nodes, CouchDB databases, etc.
Finally, what about CouchDB and LevelDB? What is best? We tested different network architectures using peers with CouchDB and LevelDB, and as we expected (and many in the literature have already advanced), the performance of the network with LevelDB peers is significantly better than with CouchDB. Nonetheless, when we discussed this with our infrastructure team, they told us that in order to have peers with high-availability it was better to use CouchDB. I guess there is not an optimal choice here, and it will depend on your use case.
Learning in Architecture
Endorsing peers are intensive in CPU while orderers and CAs are non-CPU intensive. When mapping your deployment try to balance endorsing peers in your different physical nodes.
To scale the transaction throughput of a Fabric network we need to scale in endorsing peers and channels (provided you have enough vCPUs to accommodate the network in your physical infrastructure).
Peers with LevelDB offer higher performance and transaction throughputs than peers with CouchDB. It is easier to do a high-availability deployment of Fabric peers using CouchDB.
This is the end of Part II
In this part I shared our learning and a set of best practices in the lower levels of our Fabric stack. I really hope that you have extracted a few good pieces of knowledge from our experience. In the following parts of the series we will climb a bit more in the stack answering interesting questions about performance such as how to fine-tune your Fabric protocol, or what to bear in mind while designing your chaincodes.
Do not forget to subscribe in order not to miss a thing, and as usual, any suggestions, comments, feedback, or questions you may have… you most probably know where to reach me by now. See you next week!
Part II: Architecture and Infrastructure (you are here).