@adlrocha - Performance Best Practices in Hyperledger Fabric II: Infrastructure and Architecture

The "Fabric Into Production" Series (Part 2)


Welcome to the second part of this series dedicated to performance best practices in Hyperledger Fabric. In my previous publication I gave a brief introduction to what to expect from this series, and I shared how we structured our tests, the approach we were following, and the type of conclusions we were looking to draw. In the end, with this analysis we were looking to understand Hyperledger Fabric in production, and how much running it could cost.

Today we are going to share our results and best practices in the two lower levels of our “testing stack”, i.e. infrastructure and architecture.

Infrastructure, the Key.

It may seem obvious, but the amount of hardware you dedicate to the deployment of your network significantly affects the maximum performance you can get from it. I say it may seem obvious now, but when we started, this matter gave us quite a few headaches. If you read any of the research articles we used as a base for our analysis, you may draw the conclusion that in order to optimize Fabric’s performance you only need to fine-tune several configurations throughout all its layers according to the use case, and the magic happens. And that is actually true, but only provided that you use enough hardware under the hood.

Before this analysis, we were deploying our Fabric networks over a Kubernetes cluster of three nodes with what we thought was enough hardware for all the tests we were going to do. To our surprise, while testing different network configurations in which we scaled the number of endorser peers and orderers, we started getting weird errors that didn’t fit our initial assumptions, and we couldn’t figure out what was happening. To give you a glimpse of what we were facing, this was the error we were getting when we weren’t giving enough hardware resources to our Fabric network: “TLS handshake failed with error EOF […]”.

Who could have guessed that this error was related to a lack of hardware resources in our infrastructure? We checked our certificates, we checked our peers’ TLS connections, we checked everything we could think of, until one day, by chance, we decided to set up a bigger Kubernetes cluster for our tests. There you go! No more “TLS handshake failures”. Apparently the problem was that we were scaling the network to so many peers, relative to the hardware resources available, that the peers didn’t have enough resources even to perform the TLS handshake.

But this was a consequence of a bigger problem: we were deploying different configurations of Fabric networks without clearly understanding the consumption footprint of their various modules. Yes, to avoid these problems we could have given our clusters enough hardware to feed a quantum physics simulation, but if we had done this, we wouldn’t have learned the optimal amount of resources required for different Fabric network architectures, defeating one of our goals: understanding the cost model of a Fabric deployment. Unlike the research papers I mentioned in the first part of the series, where “hardware wasn’t a problem”, we followed a “hardware scarcity approach” for our tests. We wanted to perform a realistic analysis in which we might not have all the hardware we needed at hand.

Related to the infrastructure, and again by chance, we found out that another factor that greatly affects performance is the infrastructure’s latency. I realized this one day when I was working from home. I was doing exactly the same tests I had done the day before in the office, but I was getting far worse results. It occurred to me that it could be related to my Internet connection, and that was it. I ran a few tests forcing higher latencies between entities in the network, and there it was: the larger the connection delays, the lower the performance.

Finally, the last question we wanted to answer at the infrastructure level was whether it was better to deploy a Fabric network over bare metal or over Kubernetes (our default choice in production). The results showed that, under high loads, Kubernetes introduced an overhead due to its kube-system pods that should be accounted for when sizing the underlying cluster for the network. However, despite the better performance/hardware ratio of a bare-metal deployment, the overhead introduced by the kube-system didn’t justify the increase in management complexity of running bare-metal deployments in production.

(BTW, we found out that the kube-system consumption being directly proportional to the load supported by the network was due to the increase in the communication load between entities in the Kubernetes cluster. Just in case you were curious).

Learnings in Infrastructure

  • The underlying infrastructure used to deploy the Fabric network matters for performance.

  • Ensure that you always have enough vCPUs in your infrastructure to accommodate your network.

  • Kube-system introduces an appreciable overhead with high loads compared to a bare-metal deployment.

  • Understand the latency of your infrastructure. Network delay between entities introduces performance overhead.

  • It is key to understand the computational footprint of Fabric modules to make efficient deployments.

Fabric’s Architecture

Let’s move on to the architecture layer. As I stressed above, one of our big problems when deploying our Fabric networks was that we didn’t clearly understand the computational footprint of each Fabric entity. In this layer, our goal was to understand these footprints.

In this case, we gave enough hardware resources to our Kubernetes cluster; we didn’t want the infrastructure to be a limitation. We started by analyzing the impact of peers on performance. The first obvious conclusion of our tests was that in order to increase the transaction throughput of our network, we had to scale the number of endorser peers (i.e. peers that simulate and endorse transactions). For this, we tested different Fabric architectures with the same number of orderers, exclusively scaling the number of endorser peers (transactions were balanced using a round-robin policy between all the peers so we could be sure that every node was endorsing transactions).

For this same test, we used different load profiles to understand how an increase in the load affected their resource consumption. The results were clear: the higher the load, the higher the CPU consumption of the peers. We repeated these tests fixing the number of peers and scaling the number of Raft orderers (and even CAs). In this case, the increase in CPU usage under high loads wasn’t as pronounced. In short, to avoid bottlenecks at the architecture level we had to make sure that endorser peers had enough CPUs available to operate (orderers were far less of a concern).

Another thing we realized from our tests was that how we map Fabric entities onto the physical infrastructure can affect performance. Let’s take a Kubernetes cluster as an example. Deploying two endorser peers on the same node of the cluster made them cannibalize each other’s CPU resources under high loads. If we didn’t limit the amount of hardware they could consume, they would expand in the node, fighting for all the available CPU. Fortunately, this is something that can easily be managed by spreading CPU-intensive endorsing peers across different cluster nodes, and ensuring that they share their physical infrastructure only with non-intensive entities such as a CA or an orderer.
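If you run on Kubernetes, one way to enforce this balance is with pod anti-affinity and explicit resource requests and limits. The fragment below is a minimal, illustrative sketch, not our production manifest: names, the image tag, and the CPU/memory values are placeholders.

# Illustrative Deployment fragment for an endorsing peer (placeholder values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: peer0-org1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fabric-peer
  template:
    metadata:
      labels:
        app: fabric-peer
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule two endorsing peers on the same cluster node,
          # so they don't cannibalize each other's CPU under high load.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: fabric-peer
              topologyKey: kubernetes.io/hostname
      containers:
        - name: peer
          image: hyperledger/fabric-peer:1.4
          resources:
            requests:
              cpu: "2"        # reserve CPU for endorsement under load
              memory: 2Gi
            limits:
              cpu: "4"        # cap the peer so it cannot starve co-located pods
              memory: 4Gi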

Another question we were looking to answer with our tests was related to Fabric channels. Do channels affect performance? The answer was pleasantly surprising. For this analysis we fixed the number of endorser peers and scaled the number of channels in the network, balancing the load between peers and channels. To our surprise, provided that we had enough computational resources to accommodate the architecture, channels not only didn’t harm performance: the more channels, the higher the transaction throughput. What is more, in terms of overall throughput of the network, the performance of a network of two endorser peers with two channels is equivalent to one with four endorser peers and a single channel. Supporting our conclusion, a similar result was reported in this post from the IBM blog about Fabric’s performance.

And now you’ll forgive me for my insistence, but I can’t stress this enough: all of these results hold provided that your infrastructure has enough vCPUs to accommodate your architecture. To give you an idea of how important this is, leveraging the results of the million (ok, maybe fewer… half a million?) tests we ran, we inferred an equation to easily compute the minimum number of vCPUs required to accommodate a specific architecture, given a desired transaction throughput and a certain number of endorsing peers, orderer nodes, CouchDB databases, etc.

Finally, what about CouchDB and LevelDB? Which is best? We tested different network architectures using peers with CouchDB and with LevelDB, and as we expected (and as many in the literature had already anticipated), the performance of the network with LevelDB peers is significantly better than with CouchDB. Nonetheless, when we discussed this with our infrastructure team, they told us that in order to run peers in high availability it was better to use CouchDB. I guess there is no universally optimal choice here; it will depend on your use case.
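For reference, the state database is selected in the peer’s core.yaml (or through the equivalent CORE_LEDGER_STATE_* environment variables). A minimal fragment, with illustrative values, looks roughly like this:

# Fragment of the peer's core.yaml (illustrative values)
ledger:
  state:
    # goleveldb: embedded key-value store, generally faster.
    # CouchDB: external database, supports rich queries and an HA setup.
    stateDatabase: CouchDB
    couchDBConfig:
      couchDBAddress: couchdb0:5984
      username: admin
      password: adminpw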

Learnings in Architecture

  • Endorsing peers are CPU-intensive, while orderers and CAs are not. When mapping your deployment, try to spread endorsing peers across your different physical nodes.

  • To scale the transaction throughput of a Fabric network we need to scale in endorsing peers and channels (provided you have enough vCPUs to accommodate the network in your physical infrastructure).

  • Peers with LevelDB offer higher performance and transaction throughputs than peers with CouchDB. It is easier to do a high-availability deployment of Fabric peers using CouchDB.

This is the end of Part II

In this part I shared our learnings and a set of best practices in the lower levels of our Fabric stack. I really hope you have extracted a few good pieces of knowledge from our experience. In the following parts of the series we will climb a bit higher in the stack, answering interesting questions about performance such as how to fine-tune the Fabric protocol, or what to bear in mind while designing your chaincodes.

Do not forget to subscribe in order not to miss a thing, and as usual, any suggestions, comments, feedback, or questions you may have… you most probably know where to reach me by now. See you next week!

@adlrocha - Performance Best Practices in Hyperledger Fabric I: Introduction

The "Fabric Into Production" Series (Part 1)


For the past few months my team and I have been doing extensive work analyzing, understanding, and improving the performance of Hyperledger Fabric in order to draw up a set of performance best practices for the deployment of Fabric production networks. I was supposed to present this work first in a talk at the Hyperledger Global Forum in Phoenix, and, on my return, at a Hyperledger Meetup in Madrid. Unfortunately, due to the well-known global circumstances we are living through these days, this was impossible.

I didn’t want all this work to be forgotten, especially when I think it could benefit the Fabric community so much. A lot of companies are developing use cases over Hyperledger Fabric, but few are talking about how hard moving a Fabric-based use case into production can be, and about the set of best practices to follow in order to succeed at it. That is why I decided to start this series of publications, where I will share all of our learnings from moving our proof-of-concept Hyperledger Fabric networks into production. With this series I hope to lay the groundwork for further public performance analyses in corporate environments (my initial goal when submitting my talk to the Hyperledger Global Forum).

Preliminaries

I guess everyone starts developing their first Fabric-based use case in the same way. You go to Fabric’s “Build Your First Network” documentation, you deploy your simple network, you implement your business logic in a chaincode, deploy it in your “first network” and voilà! You’ve deployed your first blockchain use case. This is great for proofs of concept: with this simple setup you can test whether it makes sense to use blockchain technology for your business problem, validate the UX, implement and fine-tune your chaincode business logic, and even try your system with a limited number of users.

The problem comes once you’ve validated all of your preliminary assumptions, you’ve seen that your system works and is adding value to your company, and you want to move your use case “as-is” into production… this is where things start to break.

Your “first network” was prepared to comfortably accommodate a few thousand transactions a day to support your PoC use case, but in production you start noticing a significant increase in the number of transactions per second. And with this increase in the load on your network, the pressure, the stress, and the sweat on your forehead increase proportionally. At this point you start questioning many of your initial assumptions:

  • How much load is my infrastructure ready to support? I didn’t consider this.

  • What is the optimal number of peers in the network for my use case? My “first network” has a few of them, but is this enough? I don’t know.

  • Users interact with the Fabric network through an SDK. Am I using it right? Could the SDK be harming my network’s performance?

  • And WTF is this MVCC Conflict Error I am getting with high loads?

  • ARGHHHH!!!! Oh my! The performance! The network! Everything is falling apart!!

The aforementioned scenario is more than typical when moving your first blockchain network into production. The underlying complexity of blockchain platforms sometimes makes it hard to understand their core mechanisms and their behavior under high load. In order to explain why Fabric’s performance degraded so much with a high transaction throughput, we had to understand where the bottlenecks in the infrastructure were. The first thing we did when we faced this problem was to turn to the literature. Had somebody already published anything related to the performance of Hyperledger Fabric? Fortunately, there were a few interesting works out there. These were the ones that helped us the most:

And almost at the end of our performance evaluations (January 2020) this great paper was published, confirming many of the conclusions we had reached. It came to us as if it had fallen from the sky:

These papers were very helpful for our work, but they differed in certain aspects from the analysis we were looking to do:

  • These papers were strictly academic: they considered isolated test environments instead of the real corporate environments we were facing.

  • These papers assumed an unlimited amount of underlying computing power available to run the network, while in our case every new machine meant an additional cost that we had to add to our use case’s cost model.

  • We not only needed to understand how Hyperledger Fabric scaled technically in terms of performance, but also in terms of costs as we wanted to build products whose cost model is sustainable enough to scale with the load of the production system.

In short, the result of our work wasn’t supposed to be an optimal configuration of the network and the maximum number of transactions an infrastructure could accommodate, but a set of tuples [infrastructure, performance, cost] so that, according to the product and the use case, we could decide on an optimal setup (again, technically and in terms of cost).

The Experimental Setup

For our experiments we considered an environment that most closely resembled what we were aiming to build in production (and where we think permissioned blockchain networks are heading): a dynamic, multi-client, general-purpose network where all members can deploy and interact with chaincodes and applications. This also differs from the typical setup, where a Hyperledger Fabric network hosts a very specific use case whose expected load can be “predicted”. In our case, what we want to build are general-purpose networks able to accommodate several heterogeneous use cases, each with its own requirements and expected load. And we wanted the networks to be able to offer an assured baseline performance to each of them.

Consequently, we used the following infrastructure as a baseline for our tests:

  • We considered a network with a set of peers (two in the baseline setup), and their corresponding state databases (CouchDB or LevelDB). Throughout the tests we modified the number of peers, the databases used, etc. from the baseline configuration in order to understand their impact on performance. More about this in future publications of the series.

  • To interact with the network using applications from the outside world, we built a driver API that abstracts the complexity of the Fabric SDK library. If you come from the Ethereum world, what we built here is the equivalent of the RPC endpoint used to interact with Ethereum nodes. Each peer has one of these HFService drivers connected to it. Building the HFService has been key to understanding the impact of the SDK on the network’s performance (again, you’ll learn more about this by the end of the series; a minimal sketch of the kind of call it wraps follows this list).

  • Throughout all our tests, we used RAFT for our ordering service, and in our baseline configuration we used three ordering nodes. As part of our exploration, we also played with this number to understand the impact of orderers in performance.

  • Finally, as we are focused on real production environments, for the deployment of our networks we used Kubernetes for container orchestration, Prometheus for monitoring the infrastructure, and Gatling to generate the test load on the network.
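As promised above, here is a minimal sketch of the kind of call a driver like our HFService wraps. This is not our actual HFService: it assumes the Node.js fabric-network SDK (v2.x API), and the channel, chaincode, identity and file names are placeholders.

// Minimal transaction submission with the Node.js 'fabric-network' SDK (v2.x).
// Channel, chaincode, identity and file names are placeholders.
const { Gateway, Wallets } = require('fabric-network');
const fs = require('fs');

async function submitExample() {
  const connectionProfile = JSON.parse(fs.readFileSync('connection-org1.json', 'utf8'));
  const wallet = await Wallets.newFileSystemWallet('./wallet');

  const gateway = new Gateway();
  await gateway.connect(connectionProfile, {
    wallet,
    identity: 'appUser',
    discovery: { enabled: true, asLocalhost: false },
  });

  try {
    const network = await gateway.getNetwork('mychannel');
    const contract = network.getContract('mycc');
    // The SDK orchestrates endorsement, ordering and commit under the hood.
    const result = await contract.submitTransaction('createAsset', 'asset1', '100');
    console.log('Transaction committed, result:', result.toString());
  } finally {
    gateway.disconnect();
  }
}

submitExample().catch(console.error);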

The Testing Layers

The goal of our analysis was to deeply understand Fabric’s architecture in order to draw clear conclusions about how the different layers affect performance, and about the potential bottlenecks that might appear under high transaction loads. Thus, we decided to abstract a Fabric network into the following layers to ease our analysis. This is the structure we followed throughout all our tests. As already mentioned, while approaching the end of our work, Chaum et al. published in [4] an abstraction stack similar to this one, reinforcing our abstraction hypothesis:

  • Infrastructure: This layer represents the underlying infrastructure hosting the different modules of the Fabric network. In the analysis of this layer we’ll see how the deployment of a Fabric network over Kubernetes or bare metal, the available computational resources of the infrastructure, and even its latency, can affect the network’s performance.

  • Architecture: In this layer we will focus on analyzing optimal architecture configurations to get the most out of the underlying infrastructure. Here we managed to answer important questions such as: what is an optimal number of peers, how can we dynamically scale a Fabric network, and when is it better to use CouchDB or LevelDB in your network.

  • Protocol: We could definitely achieve better performance by modifying Fabric’s core protocol, but this would mean having to maintain a fork of Fabric’s source code and building a support team for the project. Hence, at this layer we only focused on protocol configurations to fine-tune the performance of our network (the number of transactions per block, the endorsement policies, CouchDB’s cache strategy, etc.).

  • Chaincodes: Yes, people! Chaincodes are critical for performance. How you implement your business logic in the blockchain can deeply affect your future performance in production. In our case, we did extensive work profiling and refactoring all of our chaincodes to improve the performance of our network, and in this process we managed to infer several design best practices for chaincodes that I will share with you throughout this series.

  • Finally, the SDK. In my opinion, one of the most disregarded and worst understood pieces of a Fabric network, and surprisingly a big contributor to the performance of Fabric-based applications. While building our HFService, the SDK was the reason for many of our bottlenecks until we fine-tuned it. Bear in mind that the SDK is responsible for orchestrating the whole lifecycle of a transaction, so if misused it can undo all the good work done fine-tuning the underlying layers.

To be continued…

This is the first publication of this series dedicated to moving Hyperledger Fabric networks into production. In this article I wanted to set the context and give you an overview of the publications yet to come (see if you get the itch). In the next few weeks I will go through all of our work, sharing all the pieces of knowledge we have acquired throughout our extensive analysis. In order not to overload the segment of my audience less interested in blockchain technology, I will intersperse articles on other subjects, publishing a new part of this series every two weeks instead of every week.

In any case, if you don’t want to miss any publication of this series, do not forget to subscribe. And if you can’t stand the waiting and you want to learn a bit more about my work, do not hesitate to contact me.

@adlrocha - Why aren't salaries transparent?

To companies, salaries are just a line item in the budget. To employees, they're much more.


This will be a controversial publication, I know that. A lot of people will disagree with my view of the matter and, obviously, my opinion may be biased and partial. I may be disregarding important facts that destroy my whole argument, but it was still a topic I was really eager to research and share with my audience, in an attempt to open a discussion and learn your view of the matter.

The topic I am referring to is transparent salaries. I’ve been wondering lately why, in general, companies are so intent on hiding what they pay their employees. Why can’t we know what our peers and teammates earn for their work? What evil will arise if this happens? Even more, why shouldn’t we know our manager’s pay, or the annual salary of the C-level executives in the company? I am an advocate for transparency, so companies’ eagerness to actively discourage staff from talking to each other about their salaries, even forcing employees to sign agreements stipulating that they won’t disclose pay, benefits, etc. to other employees, messes with my mind.

Disclaimer: Throughout the article I will talk about salary as a general term. This includes your gross salary, social benefits, perks, and why not? emotional salary.

Benefits of knowing salaries within your company

Knowing the salary of your peers can be beneficial in many ways. It can help you understand your company’s salary policy. It is a way of learning what the company values in its employees, its criteria for promotions, your potential progression in the company, and the salary raises you can expect in the future. In short, it is a way of digging a bit deeper into the company’s culture, its values, and how much it cares for its employees.

Public salaries trigger an interesting consequence in a company: the salary of every single employee has to be clearly justifiable if you want transparency to work, retain your talent, and avoid potential imbalances. And this is something that, in my opinion, is good for the whole workforce. Pay scales are a pacifier in many companies, but they seem arbitrary to individuals. Someone’s salary should be justified through their actual responsibilities and the value they add to the company, not through a general pay scale computed from the years they’ve been in the company or the role title they hold. Some people’s work may look great and flashy on paper, while their productivity may be limited and the value they actually bring unknown; yet, due to inefficiencies in the system, they may still earn more than your best employee. Are these people worth their salary? As long as salaries are not transparent, we don’t know.

Managers should be able to explain, discuss, or at least justify the following items to their employees:

  • Explaining the salary range for the employee’s current position

  • Outlining the maximum earning potential in the position

  • Explaining how people move through the salary range

  • Discussing whether movement is based on performance or tenure (or a combination of these factors)

  • Outlining ways an employee might earn more money or be promoted (such as training or certifications)

  • The unwritten flexibility and added benefits they can enjoy without fearing retaliation from the company (such as remote work, flexible working hours, etc.).

Employees shouldn’t be blind to any of these aspects. The same way the company is shaping the future of its business, employees are trying to build their own future, so in the end their current and potential salary is not just another item in the company’s budget or an operational cost. They deserve to understand all of this so that, as far as possible, they can choose a company that fits their long-term plans.

Even more, transparent salaries can motivate employees who are paid higher to work even harder and in a productive manner. In a way, they are trying to demonstrate their higher value to their peers and management. According to research, employers that move from pay secrecy to pay transparency undergo big and permanent increases in their productivity levels. When pay information remains a secret, employees typically overestimate the salaries of others, leading to job dissatisfaction and lower productivity levels.

And finally, a delicious consequence of transparent salaries that people don’t usually realize: it helps close the gender gap. If salaries are public for everyone, companies where there is a gender gap would be immediately identified. This also has clear benefits for companies, as making their salaries public will protect them from unfounded allegations about paying men more than women in the same position. In my opinion, transparent salaries could also do a lot for our quest towards gender equality.

Two recent surveys support this kind of transparency. A study conducted by Cornell University and Tel Aviv University found that employees work together more effectively when they know the salary information of their colleagues. The earning hierarchy helped employees better understand who the experts in the workplace were, so they could more easily seek out the right people for help.

Another study by an economics professor at Middlebury College found that employees worked harder and were more productive when they could compare their earnings with those of coworkers. The researchers also found that a lack of transparency led to inefficiencies and had a negative impact on the retention of high performers.

Source: https://www.bizjournals.com/bizjournals/how-to/human-resources/2018/02/what-you-can-do-if-employees-are-discussing-their.html


Benefits of publishing the average pay for different roles in your company

Companies that are transparent about their pay policies and disclose how much they pay their employees can also improve their attraction and retention of talent. I will never understand companies that enroll you as a potential candidate in a job opening process without disclosing the salary range you can expect for the position until the very end. What is the reason for this? It leads to candidates investing their time in successfully passing all the tests and interviews of the process, only to learn at the end that they are being offered less money than in their current job. This is a waste of time and resources for recruiters and candidates. Why not be transparent upfront about the money you are willing to pay for the role? (Because, let’s be honest, if you are happy at your current job, you would seldom switch for less money, as long as the new position doesn’t give you other things apart from the salary to improve your level of happiness at work.)

The more companies adopt a transparent salary policy, the less of an issue money will be for retaining talent, as long as your pay is reasonable. Employees always want higher salaries, but they are also smart. They understand market conditions, financial constraints, revenue shortfalls, and increased competition. They understand when you can’t pay top-of-market salaries. What they don’t understand is when they don’t feel fairly compensated compared to other employees in similar positions, both inside and outside your company. Once pay is reasonable and fair, other things become important: recognition, respect, challenging work, opportunities for development… the feeling that their job is more than just a job, i.e. the kind of things that, along with money, make you happy at work (check my publication dedicated to happiness at work for a deeper analysis of this matter). Higher pay is great, but its effects are fleeting. Respect, recognition, and a sense of real purpose last forever.

This leads me to another great benefit of companies disclosing their employees’ salaries: hiring applicants who are a better fit. Since applicants have more, and more understandable, information about the company and its salary policy, the job matching process improves. This means that even before they apply, they already know whether a job offers too low a salary range for their liking, or whether it lacks the career progression they are looking for in a company. In the long run, this would decrease unemployment periods for employees and employee turnover in companies.

In the end, the job market is, as its name suggests, a market, governed by the rules of supply and demand. Accordingly, I feel prices in this market should be public, so people can transparently compare different alternatives, just as one would do in any other market. It is not always the employee looking for the company; sometimes it is the company trying to attract the employee, and this is something companies should start realizing in many sectors. Times are changing.

It’s not all a bed of roses

Transparent salaries also have obvious disadvantages. First and foremost, and I am aware of this, transparent salaries may foster envy between coworkers and trigger unhealthy relationships, affecting the company’s productivity. Fortunately, this is a good way of identifying unreasonable professionals who do not fit the company’s culture. Transparency without the right culture may be bad for collaboration.

Our competitive juices begin to flow when we see what others earn, and the more competitive we are, the less likely we are to collaborate. “In environments where performance is difficult to precisely measure and isn’t observable to everyone, everyone believes they’re above average in terms of their contributions or performance. Broadcasting everyone’s individual pay triggers a process of social comparison.” This can lead to people feeling less valued and being really unhappy in their jobs. This is why I think it is so important for a company to disclose salaries in a way that fits its culture while ensuring the privacy of its employees. Depending on the company, the owner, and the recipient of the information, the numbers may need to be shared differently. Within a company the exact salary for specific positions may be given, while to the rest of the job market only a salary range would be disclosed. Even more, depending on the company’s culture, salaries may only be published on demand, according to the preferences of its employees. There are several possible formulas for sharing the numbers.

Even more, transparency without a clear salary policy can be troublesome. As I mentioned above, companies should be ready to justify the salary of every single employee in the company. Implementing these justifications and consistent salary policies can be easy in small companies and startups, and a real nightmare in big companies (where they are most needed). This level of transparency leads to uncomfortable (and, in my opinion, needed and rewarding) conversations where managers have to explain objectively why an employee is being paid less than some of their peers. Understandably, no one usually enjoys these uncomfortable moments but, as a part of business and work, I think they are worth the benefits. In short, we should normalize salary discussions in the workplace.

For me, the benefits of transparent salaries outweigh their disadvantages, but this is something as personal as choosing your preferred underwear.

In one […] study, he found that when participants were given a task counting dots, they performed worse when they knew they were being paid less than others. In other research Rick has shown people are more likely to cheat when grading their own trivia quizzes if they know they’re being paid less than others. The participants earned money for each correct answer of the quiz. “For people on the low end of a pay discrepancy, if there’s no other recourse,” says Rick, “our study suggests that they may very well turn to cheating to even the score.”

Source: https://www.cultureamp.com/blog/pros-and-cons-of-salary-transparency/


How to share the numbers

Potential drawbacks make the way that you share the information important. Choosing a method that fits your company culture can help. For example, Buffer puts all of its salaries on its website for anyone to see. SumAll shares numbers within the company, and Whole Foods employees can make an appointment to view the company’s “wage report.”

Other companies post pay rates for certain positions and let employees figure out individual salaries based on the hierarchy or their organizational chart. Some companies share their formula for calculating pay rates, while others provide the median salary for key roles and make this transparent both inside the company as well as on the company’s Glassdoor page.

“Some executives are concerned about the privacy issues,” says Tolan. “A way around sharing exact amounts would be to use salary bands and provide ranges for each role–and while you would still know which band a coworker is in, you probably would have to guess at their actual salary.”

Sunlight makes it impossible to hide things, said Berkus in the HBR interview. “And so in a transparent culture, regardless of how you do it, you tend to find people who have a higher sense of the organization being fair,” he says. “You tend to see increases in collaboration and all sorts of other positive effects.”

Source: https://www.fastcompany.com/3065592/why-everyones-salary-should-be-revealed

A lot of work ahead

Every company is different in terms of its culture, its business, and its employees’ idiosyncrasies, and transparent salaries may not fit everyone. Some companies may choose not to be transparent at all, some may want to gradually adopt means to become more transparent, and others may go all-in into the public-salary world. The decision-makers of every company will have to choose this individually. What is clear is that once a few relevant companies start adopting transparent salary policies, others will follow once they see the benefits. I myself would prefer to work for a company that publicly discloses its employees’ salaries, and to be able to openly discuss these matters with my peers.

Do not forget to share this publication and this newsletter, and to share your opinion on the matter. Stay safe, stay home, and see you next Sunday!

Idea of the week #2: Open source salary policies.

Once we go transparent, why not model open-source salary policies that companies can freely adopt and collaboratively enhance? Companies could start publicly sharing that they are implementing this or that open-source salary policy without disclosing the actual pay of their employees.

@adlrocha - How to make your web app work offline

Service workers, caches, IndexedDB and PWA.


I am working on a side project where I am building a web app. I want to be able to use this application from any of my devices (laptop, mobile device, etc.). My network connection is usually stable when I use it from my laptop, but this is not always the case from my mobile device, so I started wondering: “What can I do to ensure that my app also works offline, and that all my work and interactions with it are saved and successfully synced once I recover my Internet access?”

I thought at first that finding all the documentation I required to approach this task would be straightforward, “someone must have done this before, right?”. Nothing could be further from the truth. I had a hard time until I managed to find all the concepts I needed in order to successfully take my app offline. Consequently, this publication is an attempt to collect in the same place all the technologies you will find useful when embarking on the adventure of making your app “offline-compatible”.

Progressive Web Apps (PWA)

This first concept is not directly related to the offline management of your app, but it is something I found really useful when approaching the development of web applications that need a good user experience in browsers, on desktop, and on mobile devices. Progressive Web Apps provide “installable”, app-like experiences on desktop and mobile that are built and delivered directly via the web. They’re web apps that are fast and reliable. And most importantly, they’re web apps that work in any browser.

Progressive Web Apps can run in a browser tab, but are also installable. Bookmarking a site just adds a shortcut, but an installed Progressive Web App (PWA) looks and behaves like all of the other installed apps. It launches from the same place as other apps launch (in the case of mobile devices). You can control the launch experience, including a customized splash screen, icons, and more. It runs as an app, in an app window without an address bar or other browser UI. All of the PWA-like behavior of your web app is specified in a manifest.json file, and in order to know how PWA-ready your web app is, you can always run the following audit.
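For reference, a minimal manifest.json could look like the following (the names, colors and icon paths are placeholders):

{
  "name": "My Offline-Ready App",
  "short_name": "MyApp",
  "start_url": "/index.html",
  "display": "standalone",
  "background_color": "#ffffff",
  "theme_color": "#2f3d58",
  "icons": [
    { "src": "/icons/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icons/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}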

Remember, it’s critical that an installable PWA is fast and reliable. Users who install a PWA expect their apps to work no matter what kind of network connection they’re on. It’s a baseline expectation that must be met by every installed app. Even more, if you follow the guidelines and usual technologies for building PWAs, it will be easier for you to generate a native mobile app from your PWA source code using technologies such as Ionic, React Native or Apache Cordova.

To learn more about the development of PWA you can follow this tutorial.

Service Workers

Let’s move on to the matter at hand: how can we make our web app work offline? The answer lies in the browser’s service workers. Service workers are JavaScript code that runs in the background of your website, even when the page is closed. For offline use, one of their goals is to store network requests or images in the browser cache.

A service worker is a bit like a proxy server between the application and the browser. With a service worker, we can completely take over the response from an HTTP request and alter it however we like. This is a key feature for serving an offline experience. Since we can detect when the user is disconnected, and we can respond to HTTP requests differently, we have a way of serving the user files and resources that have been saved locally when they are offline. Today, they already include very useful features such as push notifications and background sync.

A service worker has a lifecycle that is completely separate from your web page. To install a service worker for your site, you need to register it, which you do in your page's JavaScript. Registering a service worker will cause the browser to start the service worker install step in the background. Typically during the install step, you'll want to cache some static assets. If all the files are cached successfully, then the service worker becomes installed. If any of the files fail to download and cache, then the install step will fail and the service worker won't activate (i.e. won't be installed). If that happens, don't worry, it'll try again next time. But that means if it does install, you know you've got those static assets in the cache.

When installed, the activation step will follow and this is a great opportunity for handling any management of old caches, which we'll cover during the service worker update section. After the activation step, the service worker will control all pages that fall under its scope, though the page that registered the service worker for the first time won't be controlled until it's loaded again. Once a service worker is in control, it will be in one of two states: either the service worker will be terminated to save memory, or it will handle fetch and message events that occur when a network request or message is made from your page.

service worker lifecycle
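As an illustration of that last state, this is roughly what a minimal fetch handler in sw.js looks like, following a common “cache, falling back to network” strategy (one of several possible strategies):

// In sw.js: serve from the cache first, fall back to the network when not cached.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      return cached || fetch(event.request);
    })
  );
});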

And in case you were wondering how hard it is to install a service worker for your application: this piece of code checks if the service worker API is available and, if it is, registers the service worker at /sw.js once the page is loaded. Thus, the code to be executed by the service worker is the one included in ./sw.js. Two things to bear in mind while using service workers are that your application must use HTTPS (otherwise service workers won’t work, for security reasons), and that the service worker script must be accessible at http://myapp.me/sw.js.

if ('serviceWorker' in navigator) {
  window.addEventListener('load', function() {
    navigator.serviceWorker.register('/sw.js')
      .then(function(registration) {
        // Registration was successful
        console.log('ServiceWorker registration successful with scope: ', registration.scope);
      }, function(err) {
        // Registration failed
        console.log('ServiceWorker registration failed: ', err);
      });
  });
}

Cache, IndexedDB and Offline Storage

So, using service workers, we managed to install a background process in our users’ browsers that will be responsible for running the actions needed for the offline operation of our app. But where do we get all the data required for this operation if our app is offline? This is where the browser’s Cache API and IndexedDB come into play.

Cache API

The Cache API is used for storing and retrieving network requests and their corresponding responses. These might be regular requests and responses created in the course of running your application, or they could be created solely for the purpose of storing some data in the cache. The caches only store pairs of Request and Response objects, representing HTTP requests and responses, respectively. However, the requests and responses can contain any kind of data that can be transferred over HTTP. In this cache we will store information such as the static files and scripts required for the offline operation of our application, and it is what enables the “proxy-like” operation of service workers mentioned above.

The following piece of code is an example of the installation of a service worker that stores static files in cache for their future use offline:

self.addEventListener('install', (event) => {
  event.waitUntil(async function() {
    const cache = await caches.open('mysite-static-v3');
    await cache.addAll([
      '/css/whatever-v3.css',
      '/css/imgs/sprites-v6.png',
      '/css/fonts/whatever-v8.woff',
      '/js/all-min-v4.js'
      // etc.
    ]);
  }());
});

Sample operation of service worker and Cache API (source: https://jakearchibald.com/2014/offline-cookbook/)

And what if we want to add a “Read Later” or “Save for offline” functionality to the content of our app? Can we still use the Cache API for this? The answer is yes. You can have a button in your app that triggers the storage of a specific piece of content in the cache. The following code does the job: we store an article in the cache whenever the event is triggered:

document.querySelector('.cache-article').addEventListener('click', async (event) => {
  event.preventDefault();

  // Read the article id from the clicked element and cache its related URLs.
  const id = event.currentTarget.dataset.articleId;
  const cache = await caches.open('mysite-article-' + id);
  const response = await fetch('/get-article-urls?id=' + id);
  const urls = await response.json();
  await cache.addAll(urls);
});

Save for later scheme (source: https://jakearchibald.com/2014/offline-cookbook/)

IndexedDB

Is the CacheAPI the only place where we can store information for our offline operation in the browser? Not at all. IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs. IndexedDB lets you store and retrieve objects that are indexed with a key; any objects supported by the structured clone algorithm can be stored. You need to specify the database schema, open a connection to your database, and then retrieve and update data within a series of transactions.

In short, it is like having a client-side database ready to store whatever you want. Unlike the Cache API, where all the information has to be stored in the form of Request and Response objects, in IndexedDB we can store actual blobs of data, which lets us keep content of a larger size (check here the storage limits of IndexedDB to get an idea of the size of the content you could store). Another advantage over the Cache API? You can run indexed, key-based queries over IndexedDB, and you don’t have to use it as a “proxy” for requests like in the Cache API. Thus, with IndexedDB we can add to our application such a cool functionality as storing videos to watch offline (just like in the Netflix app!). We can include a service worker and a “Watch Offline” button that triggers the storage of the video in IndexedDB for later retrieval whenever we want to watch it (even offline).
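As an illustration of that “Watch Offline” idea, here is a small sketch using the raw IndexedDB API to download a video and store it as a blob (the database name and video URL are made up for the example):

// Hypothetical example: store a fetched video blob in IndexedDB for offline playback.
const request = indexedDB.open('offline-videos', 1);

request.onupgradeneeded = (event) => {
  // Create a simple key-value object store the first time the DB is opened.
  event.target.result.createObjectStore('videos');
};

request.onsuccess = async (event) => {
  const db = event.target.result;
  const response = await fetch('/media/episode-1.mp4'); // placeholder URL
  const blob = await response.blob();

  const tx = db.transaction('videos', 'readwrite');
  tx.objectStore('videos').put(blob, 'episode-1');
  tx.oncomplete = () => console.log('Video saved for offline viewing');
};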


PouchDB

So our app is ready to be used offline, but what if we want to interact with it while offline and sync our inputs whenever we recover our Internet connection? Is this possible? Of course it is! PouchDB is an open-source JavaScript database inspired by Apache CouchDB that is designed to run well within the browser. It enables applications to store data locally while offline, then synchronize it with CouchDB and compatible servers when the application is back online, keeping the user’s data in sync no matter where they next log in.

Thus, including PouchDB in your application gives you the power of storing changes locally in the client until an Internet connection is recovered and the data can be shared with the backend. Imagine that one of your users has been working through a really long form in the app (for instance, the submission of their next LastBasic project) and, at the moment of submitting the form, they don’t have an Internet connection and won’t recover it for the next hour. It would be awful for our service to make users waste their time filling in the form again. Fortunately, PouchDB’s sync solves this problem.

PouchDB can be used alongside the Cache API and IndexedDB, or it can completely replace IndexedDB for “blob-like” storage. PouchDB’s API makes it easier to use and interact with than IndexedDB, as you can see in this simple example where we store some data and react to changes with PouchDB:

var db = new PouchDB('dbname');

db.put({
  _id: 'dave@gmail.com',
  name: 'David',
  age: 69
});

db.changes().on('change', function() {
  console.log('Ch-Ch-Changes');
});

db.replicate.to('http://example.com/mydb');

Wrap up!

So this is all for today, folks! I hope that after this publication you have a better idea of how to “suit up” your app for its operation offline. For me this was an unknown field until I started exploring it in depth. Let me know if there are any additional resources useful for the implementation of offline apps.

References

@adlrocha - Should you be a product company or a project company?

i.e. selling products v.s. selling manpower


So you are thinking about building your own company. You think you are a really creative and smart individual, and you feel underpaid at your job considering the skills you have, how much you produce, and the responsibilities you handle.

Or maybe you are a brilliant manager in a company with deep pockets, and you have decided to open a new line of work. You have detected a new technology, a user need, or a business problem that no one has solved yet, or at least not as efficiently and brilliantly as you think you can solve it. And say hello to the first key question in this new venture: how should you sell this solution or technology, as a product or as a project? This is a question that many companies, managers and entrepreneurs have faced during their lives. I would like to share with you my view on this matter.

Selling projects

Selling a project means selling a unique solution to a problem through professional services. After executing the project, your client will have a solution/system that perfectly suits their specific problem and which (a priori) is unique, representing a good head start until the competition manages to successfully implement a similar project. This is a great differentiating value in a world where technology makes a difference to a company’s competitive advantage.

When selling projects, you as a company will need to hire the brightest and smartest professionals in the field in order to successfully approach the project. So, ultimately, what you are selling is well-directed human manpower. This poses several disadvantages:

  • At the beginning, with one or two projects sold, you may have a small number of employees really well prepared to approach the project successfully. However, once you start selling more and more projects, you will need to increase your workforce, with the consequent increase in costs. It is true that, at the beginning, until you sell your first project, the investment you require is really small, but as you grow, your variable cost will grow accordingly.

  • Your employees will want to benefit from permanent contracts but, how will you manage this when the number of projects is variable, your project portfolio may be shrinking, and you have neither money nor work for them? In the end, they are your most precious asset.

  • And related to this, how are your employees going to feel when they realize they are just “tools” or “resources” in order for you and your managers to deliver what you promised to your clients? They may opt to leave you and work somewhere else (even to the competition). And this is a critical issue, as the one leaving may be one of your “core” resources for your ongoing projects.

  • On the other hand, when selling projects you have the advantage of diversification. You can always try to sell different types of projects to different clients without any additional investment, while in the case of products, as we will see, additional product lines necessarily mean additional investments.

Selling products

OK, so you don’t want to sell projects any more, and you want to sell products. This is not a bed of roses either.

  • First of all, you will need an initial investment in order to build a first version of your product. This has a considerable risk, as after building your product people may not be as interested as expected in it, and all the money you invested may end up in the trash.

  • Your variable costs, however, will not increase with an increase in sales, so your potential profits are higher. There is no upper limit to the profits you may make as, in this case, more sales does not necessarily mean hiring more employees. You may be able to benefit from economies of scale.

  • Your employees are not resources, assets or tools of your company anymore, as your real asset is your product. Thus, all your employees have a common mission (apart from earning money), which is building the best product possible. This makes the task of retaining talent a bit easier, as your employees feel part of something bigger (not just a tool in a manager’s hands).

  • Finally, when building products, aiming for diversification means additional investments, and this can be a hard issue. This is why building products is riskier than selling projects. It’s up to you which risks you want to assume in order to reach economies of scale in your company (more about this in future posts).

For the lazier readers, let’s draw a table to depict a brief summary of what we have discussed above:

Summary Table (by: @adlrocha)

And you may be wondering what my opinion on this is. As a salaried employee, I would prefer working for a product company. As a freelancer, I would prefer a project company. And as an entrepreneur, both. I think that building a product poses many advantages for a company; however, in order to reduce the required investment and be able to explore the market, it is a good idea to start as a “kind-of project company”. What do I mean by this? You can invest in building a platform or a basic general technology that you can use to sell projects, until you are ready, economically and from a market point of view, to start selling your own products. This is a really simplified view of entrepreneurship, but I hope you get my point.

OK, I have been talking about projects and products all the way, but what about selling services and the well-known SaaS (Software as a Service) model? Services may have some peculiarities depending on the field but, for me, they work similarly to products.

Disclaimer: All this discussion is mainly focused on the engineering/IT-related fields. I think it applies to many other industries, but as I am talking from my personal experience and opinion, I can only talk about what I know, so forgive my ignorance.

I humbly accept any kind of feedback to this personal reflection.
