Originally published at BlockchainWorks. Read it also here!
As an electrical engineering student I learned to love state machines. Designing and implementing hardware systems is really hard. Once a system is released into the wild, your ability to “apply patches” to it is quite limited (I wish it could have been as easy as changing the code and pushing it to github). For this reason, everything needs to be well-figured out in the design, implementation and testing of the system, so that no big changes need to be made afterwards. Fortunately, we have finite state machines to help us with all of this.
A finite state machine (a.k.a. FSM) is an abstract model of the behavior of a system. Modeling systems using FSMs gives a tractable way to reason about the design and operation of complex systems. It offers a way to predictably evaluate the behavior of a system when it is in a certain state, and a specific event happens. State machine models are great to design complex systems that need to handle asynchronous inputs triggered by multiple sources, as they can be easily verified in a formal way. For all of these reasons (and probably many more) finite state machines are such a great tool to design hardware systems.
But FSMs are not exclusively used to design systems with some kind of hardware component (an environment where inputs may be naturally asynchronous), they can be found anywhere. Finite state machines are used for small embedded systems (like your alarm clock), in large systems (such as your city’s traffic light network), in programmable circuits such as FPGAs (you’ll see is really common to implement finite state machines over FPGAs using VHDL, something I extensively did in college), and in large and complex software systems such as a blockchain network.
Considering the ubiquity of finite state machines, I felt they deserved an introductory publication.
A brief introduction
State machines can be of several different types, but the fundamentals of all of them are the same. State machines are composed of a set of states and transitions. The state is the status in which the system is in a given moment. The change of state (i.e. transition) of the system from one state to another needs to be triggered by a new input to the system (also referred as event). More formally:
States represent the internal "memory" of the system by implicitly storing information about what has happened before.
transitions represent the "response" of the system to its environment. Transitions depend upon the current state of the machine as well as the current input and often result in a change of state.
In finite state machines, the number of states is finite. Let’s illustrate how they work designing two well-known system using FSMs:
A turnstile is a system with two possible states: locked and unlocked. The initial state of the turnstile is “locked”. It stays locked as long as there is no new input to the system, or if the input (or event) received is of the type “push”. However, a “coin” input triggers a state change from “locked” to “unlocked”. The behavior in this state is complementary to the one in “locked”. While there is no input, or the input is “coin”, the system stays unlocked, however, a “push” event triggers a transition to “locked”. Pretty straightforward, right? Let’s move to a slightly more complex example.
Source: https://brilliant.org
ATMs are also designed using finite state machines. You see that in this case the system is more complex and consequently the FSM requires more state to model the operation, but it still shows all the possible states in which the system can be.
Source: https://people.engr.ncsu.edu/efg/210/s99/Notes/fsm/
One of the main limitations of state machines, and one of the reasons why they aren’t sometimes used in extremely complex and large systems, is what is known as the state explosion problem. The behavior of the system may be such that adding a new aspect to the state machine multiplies the number of states that need to be modeled, and creates a disproportionately high number of transitions (an example here). Fortunately, there are already state machine proposals such as statechart designed to address this problem (but for now, let’s focus on FSMs and leave statecharts for some other day!).
From design...
Ok! I’ve convinced you that state machines are great. You are now planning to build your next big system as a state machine, what are the steps for this? The first thing is to clearly define what you want your system to do. Take a piece of paper and start listing all the possible inputs, states, and transitions of your system. With this in place, you can start painting your machine diagrams (the same way we did for the systems above). A great tool I use for this is mermaid, which allows you to paint these cool state diagrams using markdown-like language. Even more, you can use visual code’s mermaid extension to instantly preview your diagram as you write (as shown in the figure below).
… to implementation…
Your design is ready, time to start coding! You could build your own finite state machine framework from scratch to approach the implementation of FSMs, but fortunately, a quick search online for “FSM framework in <language X>”, will return a good list of “ready to use” frameworks to implement finite state machines in your language of choice.
I will choose Go to guide you through the implementation of the simple turnstile state machine from above. This quick search for “FSM frameworks in Golang” throws results such as this simple package which is a great example if you, not only want to implement a FSM, but to understand how to build its internals; or this full-fledged package for the implementation of FSMs in Golang. Let’s use this last one for our pet project: here is the code (I still need to figure out a way to share pieces of code from Github without running into the “email too long” alert from Substack. In the meantime I am afraid you’ll have to click the link to look at the code in full, I apologize).
After reading the above description, the code should be self-explanatory. We first create a struct for the turnstile including the internal state `(To, Profit)` and the FSM. We define the events and transitions of the FSM in the `events` variable, and the callbacks for each of these transitions in the `callbacks` variable. The callbacks are the entry functions called when the machine enters that state. For every state change we are printing the new state, and if the event was triggered by a coin, we increase the profit of the turnstile. Easy, right? Easy, but powerful.
… and to validation
At this point you may be wondering, “wait, but one of the points you made in favor of finite state machines is that they are easy to formally verify their design”. And that’s right. In an FSM we have all the possible state, inputs and transitions of a system perfectly defined. We can port this design to a model-checker tool to run the system over all its possible states, transitions and inputs to double check that the machine doesn’t get stuck in a “weird state”, or performs some kind of illegal state change.
In my biased opinion, the perfect tool to achieve this is TLA+. TLA+ is a formal specification language. It’s a tool to design systems and algorithms, then programmatically verify that those systems don’t have critical bugs. It’s the software equivalent of a blueprint. By modeling your finite state specification in TLA+ you will then be able to run the model-checker to evaluate its correctness. You can follow this example to get a sense of how to specify and check a system using TLA+.
Real world examples
I get it. You are not an embedded systems, rocket, or hardware engineer, why would you care about finite state machines? Well, let me share a real world example of a complex software system in which many of its subsystems have been designed and implemented using FSMs: Filecoin. Two interesting subsystems in the Filecoin blockchain implemented using FSMs are:
The data transfer subsystem, which encapsulates all the protocols for exchanging pieces of data between storage clients and miners for storage and retrieval deals. Here the FSM orchestrates the lifecycle of data transfers, from the request, to the exchange of data, and any other event that may occur in the process.
The storage and retrieval markets subsystems, which are responsible for finding, negotiating, and consummating deals to retrieve and store data between client and providers. Here the FSM manages the lifecycle of deals, from finding the deal, to fulfilling an agreement, to sending the data, etc. A good place to start in this case is the client retrieval FSM. There is a different state machine to model the client and provider behaviors for both: retrieval and storage deals. In this repo you will find a bunch of great examples of complex FSMs used in production. This is the diagram for the client retrieval FSM I mentioned.
To ease the implementation of these design machines the filecoin-project put together this framework that can be reused to implement these finite state machines in a nice and straightforward way. You should definitely give it a spin for your first FSM (it is a bit harder than the one we used in our example, but its DSL-like approach makes it really powerful!).
And this is all from my side. I hope after this publication, FSMs become a new tool in your engineering tool belt. See you next week!
PS: Looking to paint your first FSM? Take a look at this tool.