@adlrocha - Software should be designed to last
And why I am trying to minimize my dependence on external libraries (whenever possible).
Alfonso de la Rocha | Jun 21, 2020
Let’s start with a heads-up: the reflection I am going to share in this publication may not be to everyone’s liking, but it is a topic I want to open up for debate.
Imagine that you want to build a proxy server for your system using your favorite programming language. What do you do? First, you choose the web framework you are most comfortable with, so you have scaffolding for the project and can get started with the implementation straightaway. With this simple step, you have a way of launching a server and receiving HTTP requests out of the box. In your server, you want to authenticate requests and filter traffic according to its source address and the content of the body. Hence, along with your web server framework, you choose to install a middleware to handle authentication, and a library to ease the processing and filtering of new requests (are we in the 90s? You are not going to reinvent the wheel and develop all of this on your own; you want to have your system running in production in two or three days). You stick all these pieces together with a bit of glue code, drop in some unit and integration tests to check that everything is working, and you are ready to go.
Source: https://bytecodealliance.org/articles/announcing-the-bytecode-alliance (credits to Lin Clark).
Move fast but with stable infra
Do you know the problem with this approach? Your project ends up being 80% other people’s code from external libraries (see the image above) and 20% your own code (plus some “inspiration” from StackOverflow), with your own code serving as the glue and coarse customization that makes the libraries work as desired for your use case. This way of working is a growing trend, and don’t get me wrong: if what you are building is a well-known technology, or an auxiliary system such as your personal CMS or an internal reporting system, that is the way to go. You will have something up and running in no time, and you will start benefiting from all the goodness of your system ASAP. The problem really arises when you keep following this approach for critical systems, or for the core implementations of your tech company’s platform used by millions of users. And believe me, this is the case in companies of all sizes (ask about the type of projects being carried out by consultancy companies, and the manpower dedicated to them: 70% recent graduates).
Software development has been heavily influenced by the “move fast, break things” philosophy. And this is great: I completely agree with testing new ideas early, iterating fast to validate your concept, and being as “lean” as possible in your development. Software prototyping is cheap, so why not? This is what innovation is all about, right? The problem arises when you start shipping your great projects and awesome ideas, and they start breaking in production and harming your users. Even Facebook realized that it was time to move from “move fast, break things” to “move fast but with stable infra”.
You would never expect a civil engineer to deliver a bridge (a critical piece of infrastructure) as “finished” after a few tweaks to the proof of concept. What is more, bridges and civil engineering projects are designed to last, so why can’t we also design software to last? Correctly designing a bridge takes time and a good understanding of what is happening under the hood (materials, structures, resiliency, etc.). If we want software engineering to be considered an engineering practice (as it deserves), we need to start building robust, resilient and long-lasting systems, and stop dropping any piece of code we come across into our projects. Of course, and I apologize for the extreme generalization, I am not saying ALL software is sloppy. The other day I read this article about software engineering at SpaceX, and I loved how they use hardware-software redundancy to minimize potential bugs in a system as critical as a spacecraft. I am just saying: let’s all be a bit more like spacecraft software engineers.
Someone also recounted their interaction with the SpaceX team at GDC 2015/2016 in an answer on StackExchange. They talk about the triple redundancy system and how SpaceX uses the Actor-Judge system. In short, there are 3 dual-core ARM processors running on a custom board (according to elteto). For each decision, a “flight string” compares the results from both cores on a single processor. If the outputs match, the command is sent to the different controllers. There are 3 processors (with dual cores), so each controller/sensor will get three different commands. The controllers then act as the judge and compare the three commands. If all three are in agreement, they carry out the operation. If even a single command is in disagreement, the controller carries out the command from the processor which had previously been sending the correct commands.
This means that at any given point there are 6 running processes of the flight software.
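The judge step described in that answer can be sketched in a few lines of Go. This is only my reading of the quote, not SpaceX's implementation: the `Command` and `Judge` types are hypothetical, and the rule for deciding which processor "had previously been sending the correct commands" (a processor loses its clean record the first time the other two outvote it) is an assumption of this sketch.

```go
package main

import "fmt"

// Command is a hypothetical stand-in for whatever a processor sends to a controller.
type Command string

// Judge models one controller. flagged[i] records that processor i has
// previously been outvoted by the other two (an assumption of this sketch).
type Judge struct {
	flagged [3]bool
}

// Decide takes one command per processor and returns the command to execute.
// The boolean is false when no processor with a clean record is left.
func (j *Judge) Decide(cmds [3]Command) (Command, bool) {
	// All three agree: carry out the operation.
	if cmds[0] == cmds[1] && cmds[1] == cmds[2] {
		return cmds[0], true
	}
	// Two agree: remember which processor was outvoted.
	switch {
	case cmds[0] == cmds[1]:
		j.flagged[2] = true
	case cmds[0] == cmds[2]:
		j.flagged[1] = true
	case cmds[1] == cmds[2]:
		j.flagged[0] = true
	}
	// Follow the first processor that has never been outvoted.
	for i, bad := range j.flagged {
		if !bad {
			return cmds[i], true
		}
	}
	return "", false
}

func main() {
	var judge Judge
	rounds := [][3]Command{
		{"throttle-up", "throttle-up", "throttle-up"},   // unanimous
		{"throttle-down", "throttle-up", "throttle-up"}, // processor 0 is outvoted
		{"hold", "throttle-up", "abort"},                // no two processors agree
	}
	for _, cmds := range rounds {
		cmd, ok := judge.Decide(cmds)
		fmt.Println(cmd, ok)
	}
}
```

The interesting design choice is that the judge keeps history: when the three commands disagree, it does not just count votes in the current round, it also leans on which processors have behaved well so far.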
All this thinking is what led me to try to detox a bit from external dependencies, to see if it improved my performance and the quality of my code (and it is also the trigger for this publication).
It’s not “stop using”, it’s “use smarter”
Of course, I am not advocating that we stop using external dependencies altogether and start implementing everything from scratch. That would be nuts! What I am saying is that we should dispense with all those that are not strictly necessary. For instance, why use a Go web framework when Go’s http standard library gives you all you need to build an awesome web server? (I completely agree with this guy). This is the perfect example of a practice I used to follow a few years ago, when I started building my first Go REST APIs, and something I have completely dropped by now. This is what I mean by “stop abusing dependencies”. I didn’t need the web framework. Yes, it significantly sped up my development at the beginning, but not knowing what was happening under the hood, and not having complete control of what I was doing when facing a bug or wanting to improve the system’s performance, was really messing with my OCD (as I briefly mentioned here).
Using a framework feels like driving a fast new car for a while, until it breaks down on you many miles down the road and you have no idea how to fix it.
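To make the point concrete, here is a minimal sketch of the proxy from the introduction using only Go’s standard library: a reverse proxy wrapped in a guard that filters by source address and checks a bearer token. The helper names (`sourceAllowed`, `tokenMatches`, `guard`), the backend address, the CIDR range and the token are all made up for illustration, and filtering on the request body is left out to keep it short.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"log"
	"net"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// sourceAllowed reports whether the IP in remoteAddr ("ip:port") is inside cidr.
func sourceAllowed(remoteAddr, cidr string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && network.Contains(ip)
}

// tokenMatches compares the Authorization header against "Bearer <token>".
func tokenMatches(header, token string) bool {
	want := "Bearer " + token
	return subtle.ConstantTimeCompare([]byte(header), []byte(want)) == 1
}

// guard wraps next with the source filter and the token check.
func guard(next http.Handler, cidr, token string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !sourceAllowed(r.RemoteAddr, cidr) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		if !tokenMatches(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	upstream, err := url.Parse("http://127.0.0.1:9000") // hypothetical backend address
	if err != nil {
		log.Fatal(err)
	}
	handler := guard(httputil.NewSingleHostReverseProxy(upstream), "10.0.0.0/8", "change-me")

	// Exercise the guard in memory; rejected requests never reach the backend.
	req := httptest.NewRequest(http.MethodGet, "/", nil) // test request from outside 10.0.0.0/8
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	fmt.Println("outside source:", rec.Code)

	req.RemoteAddr = "10.1.2.3:4000" // allowed source, but no Authorization header
	rec = httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	fmt.Println("missing token:", rec.Code)

	// To serve real traffic: log.Fatal(http.ListenAndServe(":8080", handler))
}
```

Everything here is plain `net/http`, so when something misbehaves there is no framework layer to dig through: the whole request path is the `guard` function and the standard library’s reverse proxy.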
There are core dependencies that I would never ever suggest you remove from your list of “frequently used”, as it would be suicidal to write their functionality from scratch for your project (they are the result of hours and hours of development by teams of extremely talented individuals). Take a few examples I have been using lately for my systems, such as libp2p, the Geth client, Polkadot, or Qiskit (ok, you may see from the libraries I chose that I am a bit biased :) ). I would have been dead if I had had to go without them for my implementations. Of course, if you need to build a p2p network I wouldn’t ask you to rewrite libp2p. If you need “something else” from it, it is open source, so you can always dig into its guts and suggest additions. The same would apply to Qiskit if you need to play around with quantum computing. They all have awesome development teams and researchers working on making the most out of their product, so as long as you know what you are doing and why you are using them (bearing in mind the potential overheads you may incur by using them), go for it.
So my point is not so much “stop using external dependencies” as “use them intelligently”. Don’t be lazy and use a poorly maintained JSON parsing library if you already know how to unmarshal and parse JSON using the functionality available in your preferred programming language. You will avoid unnecessary overheads and trouble. In the same way, instead of using the first library you find that “does the work”, you should invest some time in researching the alternatives in order to choose the one that best suits your needs, or even drop the idea of using a library for the task altogether and write your own (the same way a civil engineer would do a thorough analysis of the materials and foundations required for a bridge).
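As a small illustration of the JSON case, this is roughly what parsing looks like in Go with nothing but the standard `encoding/json` package. The `Request` type and its fields are hypothetical, chosen only to have something to decode.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Request is a hypothetical payload used only for this example.
type Request struct {
	Source string   `json:"source"`
	Tags   []string `json:"tags"`
}

// parseRequest decodes a JSON document into a Request using the standard library.
func parseRequest(data []byte) (Request, error) {
	var req Request
	if err := json.Unmarshal(data, &req); err != nil {
		return Request{}, fmt.Errorf("parse request: %w", err)
	}
	return req, nil
}

func main() {
	req, err := parseRequest([]byte(`{"source": "10.0.0.7", "tags": ["internal", "batch"]}`))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(req.Source, req.Tags)
}
```

A third-party parser may still be the right call if profiling shows that decoding is your bottleneck, but that is a decision to make with data in hand, not a default.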
Four good reasons to stop abusing external dependencies
Let’s move on to the reasons that have led me to these conclusions, and why I am trying not to overuse external dependencies. Using libraries has risks that you should, at least, be aware of.
The libraries and external dependencies you use may be outdated, undermaintained, and/or have hidden security flaws and performance bottlenecks. When you are the owner of the code you “kind of” know what you are doing, but the moment you use code from somewhere else, you don’t know if the developer made a mistake (unless you read and understand the code you are adding to your project). Without inspecting the source code, you don’t know if they used an inefficient implementation, or if they added malicious code. What is more, what if the library is abandoned by its developers and your system has to keep relying on it in the future? It all comes down to understanding the code you are using. The next paragraph is the perfect illustration of the risks of using external dependencies:
Attackers often use social engineering to get their package into applications. They create a package that has useful features, and then sneak in some malicious code. Once the code is in the app and the user starts up the app, the code can attack the user.
Day 0 (March 6): The attacker published a module to npm: electron-native-notify. This seemed useful—it helped Electron apps fire off native notifications in a way that worked across platforms. It didn’t have any malicious code in it, yet.
Day 2: To pull off the heist, the attacker has to get this module into the cryptocurrency app. The vector they choose is a dependency in an application that helps users manage their cryptocurrency, the Agama Wallet.
Day 17: The attacker adds the malicious payload.
Day 41-66: The app is rebuilt, pulling in the most recent version of the dependency, and electron-native-notify with it. At this point, it starts sending user’s “seeds” (username/password combos) to a server. The attacker can then use these seeds to empty users’ wallets.
Day 90: A user alerts npm to suspicious behavior in electron-native-notify, and they notify the cryptocurrency platform, which moved funds from vulnerable wallets to a secure one.
Extracted from: https://bytecodealliance.org/articles/announcing-the-bytecode-alliance
And there are tons of examples like this that illustrate the risks of using libraries indiscriminately (do a quick search to see how this can happen).
Libraries can significantly increase the size and compilation times of your code. This awesome blog post vividly illustrates the problem in Rust. I highly recommend reading it. You have probably faced the scenario in which you include a full library to perform some task when, in the end, you are using a single function from the whole library (with the overhead that entails). Or what about all those dependencies you included in your package.json while developing and experimenting with your solution, and that you forgot to clean up? All this adds up against you.
Libraries may hide from you many of the trade-offs, design decisions and potential points of failure in your solution. When you design and implement all the base code of your system yourself, it is easier to identify potential attack vectors than when you delegate this trust to a diverse bunch of external dependencies.
Finally, not using any library in your system is challenging and fun. Of course, this is not a compelling reason, but even if you don’t agree with me, I recommend you approach your design like this at least once in your lifetime. You’ll see how much you learn in the process; how you improve your skills and your understanding of your chosen programming language; and how refreshing it is to be completely in control and to clearly understand what your system is doing under the hood at all times. What can I say, I was the kind of kid who took apart his grandfather’s radio to understand how it worked.
Software should be designed to last
We need to start building resilient software. This is something we are increasingly realizing in the field of smart contracts and distributed systems. In these types of systems, tons of users in the network share and execute the same base code, so a software failure or a security flaw doesn’t exclusively affect an individual’s infrastructure, but everyone in the system. Many cases have been reported of smart contract failures costing millions of dollars, and of bugs in client software harming the systems that run it.
We rely more and more on software for everything around us, from buying groceries to chatting with our family and friends. The same way we don’t want our car to be flawed and crash on our next trip, we shouldn’t tolerate software failing in certain scenarios. Updating a library shouldn’t disrupt your system, and a piece of software should be designed (as far as possible) the same way bridges are designed: so that it can run in the wild for years without the need for any external management or maintenance.
Should we embark on the writing of a Resilient Software Manifesto? I don’t know about this, but let’s open the matter up for debate. In the meantime, I would love to hear your thoughts.
Some additional references…
… that are going to help me make my point.
Software Disenchantment. If you have to choose one of the readings from this list, let it be this one. It perfectly complements my point (I found it at the end of my writing process, so I couldn’t include many of its concepts in my discussion).