@adlrocha - Immutable databases

Like blockchain but without blockchain.


I don’t know whether blockchain technology will ultimately thrive and end up being widely used, but what is clear is that many of the innovations introduced by this disruptive technology are here to stay. From the implementation of new consensus algorithms, to the engineering of better distributed network protocols, to the design and implementation of new cryptographic primitives, blockchain technology has brought us many advancements that will prevail even if we end up abandoning its everlasting promises.

One of the advancements that may well prevail is immutable databases. Since the announcement of Amazon’s Quantum Ledger Database (Amazon QLDB) in late 2018 I have been wondering whether there was a niche in corporations for this kind of database. This week I came across an article on an open source immutable database, ImmuDB, and I decided to dig a bit deeper into the matter. First Amazon, now an open source initiative: this was something worth exploring.

How does it work?

Immutable databases are centralized database systems where information is stored in a way that its integrity can be cryptographically verified. Every data change is tracked, and the complete history of changes is maintained so that the integrity of the database can be verified over time. This is why we call them “immutable”: the history of every change performed in the data store is kept, so that whenever there is an unintended or malicious modification it can be detected, reported, and in many cases even recovered. I highly recommend this set of FAQs to get a quick understanding of what immutable databases can and cannot do.

Immutable databases use verifiable cryptographic primitives and data structures to ensure the integrity of the data stored. Let’s take ImmuDB as an example (the open source project I will talk a bit more about in just a moment). ImmuDB uses a Merkle Tree to store data and protect its integrity. Thus, when at t0 we add the key k0 with value v0, the root of the database’s Merkle Tree has a value of H0 (the hash of (k0, v0)). As we keep adding new information to the database, the tree keeps growing and the root keeps changing. When at t1 we update the value of k0 to v1, the tree gains a new branch and its root changes to H01. This process is repeated with every new write in the database, whether it updates an existing key or stores data under a new key.

The Merkle tree changes with every new write.
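
To make this concrete, here is a minimal, simplified sketch in Go of the idea (not ImmuDB’s actual implementation): every write appends a leaf hash, the root is recomputed over the growing list of leaves, and therefore every write yields a new root.

package main

import (
	"crypto/sha256"
	"fmt"
)

// leafHash hashes a key/value pair, roughly what the text calls H0 = H(k0, v0).
func leafHash(key, value string) []byte {
	h := sha256.Sum256([]byte(key + "\x00" + value))
	return h[:]
}

// merkleRoot folds a list of leaf hashes into a single root, hashing pairs
// level by level (an odd node is carried up to the next level unchanged).
func merkleRoot(leaves [][]byte) []byte {
	if len(leaves) == 0 {
		return nil
	}
	level := leaves
	for len(level) > 1 {
		var next [][]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				continue
			}
			h := sha256.New()
			h.Write(level[i])
			h.Write(level[i+1])
			next = append(next, h.Sum(nil))
		}
		level = next
	}
	return level[0]
}

func main() {
	// t0: add (k0, v0) -> the root is just H0.
	leaves := [][]byte{leafHash("k0", "v0")}
	fmt.Printf("root at t0: %x\n", merkleRoot(leaves))

	// t1: update k0 to v1 -> the change is appended, never overwritten,
	// so the tree grows and the root changes.
	leaves = append(leaves, leafHash("k0", "v1"))
	fmt.Printf("root at t1: %x\n", merkleRoot(leaves))
}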

With this data structure, validating the integrity of the data stored in the system is easy. Imagine that we want to verify the integrity of the data stored by client A, k0 and k1. To do this we just need to generate a proof that the first Merkle root is consistent with the second one, generated after the addition of new data to the database. To generate this proof we only need to (i) take the nodes from the branches of the first version of the tree (when client A added its data); (ii) take the highest possible nodes of the new branches generated in the tree after client B’s interaction with the system; and (iii) reconstruct the root of the Merkle Tree and check that its value matches the current root in the database. Thus, we take H01 and H2 from the first tree, and H3 and H456 from the second tree, and reconstruct the tree up to the root. If the root obtained equals the actual root of the second version of the tree, it means that no data has been changed since client A added its data. If, on the contrary, information had been modified in any way after client A added its data, H01 and H2 of the first tree wouldn’t be the same as in the second tree, leading to a different root when recreating the tree.

How ImmuDB data consistency works.
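
The verification just described can be sketched in the same spirit. The function below is the brute-force version of the check, reusing merkleRoot from the previous snippet (append it to the same file): it simply verifies that the new leaf list still starts with the old one and that both roots match. ImmuDB and similar systems obtain the same guarantee far more efficiently with logarithmic-size consistency proofs built from a handful of inner nodes (the H01, H2, H3 and H456 of the example above).

// consistent reports whether newLeaves is an append-only extension of oldLeaves
// whose roots match the advertised ones, i.e. nothing already stored has been
// modified or deleted.
func consistent(oldLeaves, newLeaves [][]byte, oldRoot, newRoot []byte) bool {
	if len(newLeaves) < len(oldLeaves) {
		return false
	}
	for i := range oldLeaves {
		if string(oldLeaves[i]) != string(newLeaves[i]) {
			return false // history was rewritten
		}
	}
	return string(merkleRoot(oldLeaves)) == string(oldRoot) &&
		string(merkleRoot(newLeaves)) == string(newRoot)
}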

In this case Merkle Trees are used to verify the integrity of data, but more complex immutable database systems could be devised where, instead of Merkle Trees, other cryptographic primitives are used to ensure tamper-proofness, such as zero-knowledge proofs (although, to be honest, I don’t know whether in many cases they would be worth the overhead).

Where to use it?

An immutable database is managed by a single entity; there is no distribution or replication of data between nodes owned and managed by different entities. So don’t be mistaken: immutable databases won’t replace blockchain networks at all, but they are the perfect fit for specific use cases such as the one illustrated below:

“I want to build my own blockchain-based system to reliably track all the changes to my stock”, said the CIO of Company A.

“That sounds great, Mr. Boss. What are the entities involved in these updates? Who needs to write in this blockchain, and what is the level of trust between the participants?”, asked Mr. HardWorker from Consultancy Company Inc.

“Ah no no! Just me. I want the different units of my company to modify the data in the database as they do now, but I want to keep track of the history of all these changes so that no inconsistencies appear in our systems. Moreover, I want you to make sure that this blockchain can accommodate the high transaction load of my business. But the only company writing in this blockchain will be us”, stated Mr. Boss.

“Let me introduce you then to immutable databases, the solution to your problems”, concluded Mr. HardWorker triumphantly. He had the sale almost closed.

I guess this toy example makes my point. Immutable databases are ideal for use cases where we want the benefits of a tamper-proof storage system without the complexities and potential overhead of a blockchain, because the database will only be written to by a single entity (or a small number of trusted ones).

Actually, I expect to start seeing immutable databases applied to some of these use cases in no time:

  • To immutably store every update to sensitive database fields (credit card or bank account data) of an existing application database.

  • To store CI/CD recipes in order to protect build and deployment pipelines.

  • To store public certificates (a widespread use case in corporate blockchains).

  • As additional hash storage for digital object checksums.

  • To store log streams (i.e. audit logs) in a tamper-proof way.

Of course, I would never use an immutable database to store large data objects. If we need to offer tamper-proofness for large data we can follow the blockchain way: hash the large file, and track changes to this hash using the immutable database.
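
For instance, a minimal sketch in Go of that pattern (the file name is made up); the printed digest is what would then be written into the immutable database, e.g. with the immuclient commands shown later in this post:

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

func main() {
	// Hypothetical large file; only its digest goes into the immutable database.
	f, err := os.Open("backup-2020-05.tar.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", h.Sum(nil))
	// The digest can then be tracked with something like:
	//   ./immuclient safeset backup-2020-05.tar.gz <digest>
}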

Actually, I would like to finish this section with a quote from the Amazon QLDB documentation which perfectly states where to use immutable databases:

Q: Is Amazon Quantum Ledger Database a distributed ledger or blockchain service?

Amazon QLDB is not a blockchain or distributed ledger technology. Blockchain and distributed ledger technologies focus on solving the problem of decentralized applications involving multiple parties where there can be no single entity that owns the application, and the parties do not necessarily trust each other fully. On the other hand, QLDB is a ledger database purpose-built for customers who need to maintain a complete and verifiable history of data changes in an application that they own. Amazon QLDB offers history, immutability and verifiability combined with the familiarity, scalability and ease of use of a fully managed AWS database. If your application requires decentralization and involves multiple, untrusted parties, a blockchain solution may be appropriate. If your application requires a complete and verifiable history of all application data changes, but does not involve multiple, untrusted parties, Amazon QLDB is a great fit.

Having a look at ImmuDB

Enough with the theory. Let’s see an immutable database in action. I will focus on ImmuDB, the open source project I mentioned at the beginning of the publication. And we’ll start with a video (it is always relaxing before getting to work).

ImmuDB consists of the following parts:

  • immudb is the server binary that listens on port 3322 on localhost and provides a gRPC interface

  • immugw is the intelligent REST proxy that connects to immudb and provides a RESTful interface for applications. It is recommended to run immudb and immugw on separate machines to enhance security

  • immuadmin is the admin CLI for immudb and immugw. You can install and manage the service installation for both components and get statistics as well as runtime information.

The easiest way to run ImmuDB and start playing with it is to clone the repo, build all the binaries, and start the database and the gateway as follows:

$ git clone https://github.com/codenotary/immudb.git
$ cd immudb
$ make all
$ ./immudb -d
$ ./immugw -d

You can also run ImmuDB using Docker. In my case I wasn’t able to connect the gateway to the database, which is why I went for the local deployment, but I thought it was worth mentioning.

$ docker run -it -d --name immudb -p 3322:3322 -p 9497:9497 codenotary/immudb:latest

$ docker run -it -d -p 3323:3323 --name immugw --env IMMUGW_IMMUDB-ADDRESS=immudb codenotary/immugw:latest

With the database running, I wanted to try an SDK to integrate ImmuDB with a simple application, but apparently only the REST API and the gRPC interface are available for now to interact with the system. According to their documentation, drivers will soon be available for Java, .NET, Golang, Python, and Node.js, but for now we will have to settle for using the immuclient.

Adding a key and a value with the immuclient is pretty straightforward:

./immuclient safeset mykeytest1 myvaluetest1

We see that the result of the command is the addition of the key to the database, the hash of the data, and whether the addition was verified. As long as no data is forged in the database we will keep getting this output every time we add new data.

To get a key from the database we can use:

./immuclient safeget mykeytest1

Or we can get the history:

./immuclient history mykeytest1

If you want to “maliciously” modify information, you can go to ./db (by default) and mess around with the files. I invite you to do this if you are curious to see what happens when you try to add new data and the database has been “corrupted” ;)

Finally, some notes about performance (in case you were wondering about the kind of load scenarios in which this technology can be applied) can be found in ImmuDB’s GitHub repo.

This is all for today, folks! I would love to know what you think about immutable databases and the potential uses they may have. And if I don’t hear from you before, see you next week!

@adlrocha - Can WASM become the new Docker?

WASM in the cloud with Krustlet


Cloud computing, microservices, serverless, scalable and affordable computing… all of this has been made possible mainly by an outstanding piece of technology: Linux Containers (LXC). Linux containers provide an OS-level virtualized sandbox. In a nutshell, containers allow you to run multiple isolated Linux systems on a single host. Using certain features of the Linux kernel, they partition shared resources (memory, CPU, the filesystem) into isolated views called “namespaces”. The container runs directly on the physical hardware with no emulation and very little overhead (aside from a bit of initialization to set up the namespaces). The most popular tool using Linux containers today is Docker. Linux containers are different from a virtual machine, where VM management software (VirtualBox, VMware ESXi, etc.) emulates physical hardware and the VM runs within that emulated environment.
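
As a tiny illustration of what “OS-level” means here, the sketch below (Linux only, needs root, and obviously nowhere near what Docker actually does) starts a shell inside fresh UTS, PID and mount namespaces using nothing but the kernel’s own primitives:

package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start an ordinary /bin/sh, but ask the kernel to place it in new
	// hostname (UTS), PID and mount namespaces: the child gets an isolated
	// view of those resources while still running directly on the host
	// kernel, with no hardware emulation involved.
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}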


LXC has been key to the development of cloud computing, but a new player —one I’ve talked about before in this newsletter— has entered the game. Yes, I am referring to WebAssembly. I think I’ve copy-pasted WASM definitions a few times now, but I feel it is worth doing one last time for the sake of clarity: “WebAssembly is an open standard for a new binary format. By design, it is memory-safe, portable, and runs at near-native performance. Code from other languages can be cross-compiled to WebAssembly. Currently, there’s first-class support for Rust, C/C++ and AssemblyScript (a new language built for WebAssembly, compiled against a subset of TypeScript). Many other compilers are already in development”.

As we already know, WASM was originally designed for the browser: it was a way of replacing JavaScript for computationally intensive applications. But the idea of having a cross-compiled binary format that could provide a fast, scalable and secure way of running the same code across all machines was pretty appealing. This is why WASI (the WebAssembly System Interface) was born (the 2019 announcement here). WASI is a new standard that extends the execution of WebAssembly to the OS. It introduces a new level of abstraction so that WASM binaries can be “compiled once, and run anywhere”, independently of the underlying platform. This is what got me excited about WASM last year, and what triggered the publication of this post in my newsletter.
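
Just to make the “compile once, run anywhere” promise tangible, here is a minimal sketch. I am writing it in Go only to keep a single language across the code examples in this newsletter (recent Go toolchains, 1.21+, ship a native wasip1 port); Rust’s wasm32-wasi target works exactly the same way, and the resulting .wasm runs unchanged on any WASI runtime.

// Build once:   GOOS=wasip1 GOARCH=wasm go build -o hello.wasm
// Run anywhere: wasmtime hello.wasm   (or any other WASI runtime)
package main

import (
	"fmt"
	"os"
)

func main() {
	// A WASI module has no ambient authority: the host runtime decides which
	// directories, environment variables and clocks this code is allowed to
	// touch, which is exactly the sandboxing property discussed below.
	fmt.Println("hello from a WASI module, args:", os.Args)
}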

The Inception

However, the other day I was developing a small Rust microservice and at the moment of deployment I started wondering, “wait a minute, WASM may also come in pretty handy here”. Specifically, what I was developing was a simple server that listens to a set of webhooks and triggers actions in a database and other services according to the webhook triggered and its specific content. It was a stateless microservice, and I wanted it to be as lightweight as possible (thus, Rust). I was Dockerizing the service when I realized, “why can’t I compile my Rust microservice into WASM and run it as-is over my infrastructure as if it were a serverless function?” That was when I started researching the use of WASM in serverless environments. Apparently, many have tried this before using AWS Lambdas and Azure Functions, but I hate vendor lock-in. I already use Kubernetes to manage my deployments (thus the Dockerization of my microservice), so why couldn’t I run a raw WASM binary, without additional virtualization, as if it were a Docker container on Kubernetes? This would allow LXC and WASM loads to coexist in my Kubernetes cluster, giving me the ability to deploy lightweight (and fast to wake up, thanks to WASM binaries’ small size) functions and applications over Kubernetes, combining a containerized and serverless approach in my infrastructure.

WASM in the cloud not only means fast start-up times for lightweight processes that are frequently put to sleep and woken up, near native-code performance, and lighter binaries (some of the reasons for its development in the first place), but also a sandboxed runtime environment by design. WASM’s security model has two important goals: “(1) protect users from buggy or malicious modules, and (2) provide developers with useful primitives and mitigations for developing safe applications, within the constraints of (1)”. Something cloud security engineers have been trying to achieve with Docker for years.

Docker vs. WASM

Deeper into my research, I realized I wasn’t the only one seeing the potential of WASM in the cloud; even Docker’s founder Solomon Hykes had already realized the impact the combination of WASM and WASI could have on cloud environments:

I highly recommend following the responses to the above tweet to find gems like this one:

And then I came across this Microsoft blog post and Krustlet’s announcement: the answer to my questions.

“Importantly, at a very high level WASM has two main features that the Kubernetes ecosystem might be able to use to its advantage:

  • WebAssemblies and their runtimes can execute fast and be very small compared to containers

  • WebAssemblies are by default unable to do anything; only with explicit permissions can they execute at all

These two features hit our sweet spot, which suggested to us that we might profitably use WASM with Kubernetes to work in constrained and security-conscious environments – places where containers have a harder time.”

Krustlet in action!

Krustlet has been designed to run as a Kubernetes kubelet. The kubelet is the node agent that runs on every node of a cluster and is responsible for ensuring the correct execution of the workloads requested. Krustlet is a kubelet written in Rust that listens on the Kubernetes API event stream for new WASM/WASI pods (I highly recommend DeisLab’s post on Krustlet to understand how their team has been rewriting parts of Kubernetes —originally in Go— in Rust, achieving more concise, readable, and stable code —it is settled, I have to make Rust my go-to language once I gain a bit more proficiency—).

In the end, Krustlet was just what I was looking for to deploy my Rust microservice and test this idea of “WASM in the cloud” that many had come up with before. To test Krustlet, I followed their Kubernetes in Docker (KinD) quick start guide. I faced several problems while setting up the default gateway (step 2). I have a Linux machine, and maybe I messed up the previous step (setting up the certificates), because the solution to my issue was to start the tutorial from scratch again. Anyway, with Krustlet deployed in my KinD cluster, I tried the “hello-world-wasi-rust” demo, and oh man, I got really f* excited!

I then compiled my Rust service into WASM and tried to deploy it using Krustlet. I wasn’t successful in my endeavor but, to be completely honest, I haven’t yet found the time to dedicate a few solid hours to this, so I may have messed up in many ways in the process. Either way, I got really excited about the potential here, and that’s why I decided to write this first early publication introducing Krustlet (even though I haven’t yet managed to deploy my own WASM application over it). I’ll come back with a step-by-step guide, and new conclusions on its potential, once I manage to deploy something other than the Krustlet demo applications — I have a lot of ideas of things I want to test with Krustlet that could be huge; I wish I could make these experiments and developments my full-time job (should I open a Patreon? comments are open)—.

A lot of work ahead

I would like to close with a reflection from Microsoft’s blog post:

“Both WebAssemblies and containers are needed

Despite the excitement about Wasm and WASI, it should be clear that containers are the major workload in K8s, and for good reason. No one will be replacing the vast majority of their workloads with WebAssembly. Our usage so far makes this clear. Do not be confused by having more tools at your disposal.

For example, WebAssembly modules are binaries and not OS environments, so you can’t simply bring your app code and compile it into a WASM like you can a container. Instead, you’re going to build one binary, which in good cloud-native style should do one thing, and well. This means, however, that WASM “pods” in Kubernetes are going to be brand new work; they likely didn’t exist before. Containers clearly remain the vast bulk of Kubernetes work.

Still, for me the possibility of having “specialized WASM services” running on the same infrastructure as container jobs — such as my lightweight Rust webhook listener — can have a huge impact on the ecosystem, in terms of comfort, security, ease of management, and efficiency.

WASMs can be packed very, very densely, however, so using WebAssembly might maximize the processing throughput for large public cloud servers as well as more constrained environments. They’re unable to perform any work unless granted the permissions to do so, which means organizations that do not yet have confidence in container runtimes will have a great possibility to explore. And memory or otherwise constrained environments such as ARM32 or other system-on-a-chips (SOCs) and microcontroller units (MCUs) may be now attachable to and schedulable from larger clusters and managed using the same or similar tooling to that which Kubernetes uses right now.”

And I love this last bit; it is something I definitely want to try. With LXC you are limited to executing containers on architectures that support them (have you tried, for instance, Docker on Windows? I know…), but with WASM we open the door to the execution of WASM environments on any kind of architecture (even if virtualization or containers are not supported). I don’t know about you, but the more I read and learn about WASM the more excited I get about the technology. I may be a bit biased, but who cares!

And for the sake of completeness, one last video about the execution of WASM outside the browser:

See you next week!

@adlrocha - Permanently store your things for life in the post-PC era

And Perkeep’s cute official mascot

As part of my research on privacy in distributed systems, I was trying to find ways of storing data in distributed environments. We have all already heard about IPFS —if this is not the case, no worries, I am going to talk in depth about this technology in the series on “the infrastructure of the new internet” I am working on— but are there any other projects working in this kind of environment or using similar approaches?

I was rereading IPFS’ original whitepaper when I came across a reference to a project called Camlistore. A quick search on DuckDuckGo (yep, I am a privacy-obsessed individual, I could not make a mundane search on Google) opened the doors to an outstanding project, over seven years old, called Perkeep (formerly known as Camlistore).

As stated in their official website “Perkeep (née Camlistore), is a set of open source formats, protocols, and software for modeling, storing, searching, sharing and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem.”

Certainly, the idea is awesome and suits our current landscape perfectly. We are generating more and more personal data every day on our phones, PCs, tablets, wearables, etc., and we store all of it across devices, cloud X, cloud Y, and a wide range of different hardware infrastructures and service providers. In short, we don’t own our data. In many cases we don’t even know where the data is located, and often it is our own fault that we don’t remember where we stored it. Remember that picture you took with your friends on your trip to Cabo? Where did you store it? On your previous phone? In Google Drive? Is it on your computer? Argh, I hate when this happens. Why couldn’t we store all our data in the same place?

“It would be nice if we were a bit more in control. At least, it would be nice if we had a reliable backup of all our content. Once we have all our content, it’s then nice to search it, view it, and directly serve it or share it out to others (public or with selected ACLs), regardless of the original host’s policies. Perkeep is a system to do all that.”

Perkeep has a modular design. A Perkeep server comprises several parts, all of which are optional and can be turned on or off per instance:

Perkeep’s Architecture

Perkeep has the following parts:

  • Storage: the most basic part of a Perkeep server is storage. This is anything which can Get or Put a blob (named by its content-addressable digest, like in IPFS) and enumerate those blobs, sorted by their digest (see the sketch right after this list). The only metadata a storage server needs to track per blob is its size. Implementations are trivial and are available for local disk, Amazon S3, Google Storage, etc. They’re also composable, so you can have “shard”, “replica”, “remote”, “conditional”, and “encrypted” (in-progress) storage targets, which layer upon others.

  • Index: the index is implemented in terms of the Storage interface, so it can be synchronously or asynchronously replicated to and from other storage types. Putting a blob indexes it; enumerating returns what has been indexed; and getting isn’t supported. The abstraction used by Perkeep is similar to the storage abstractions of other well-known systems, meaning that any underlying system that can store keys and values, and scan in sorted order from a given point, can be used to store Perkeep’s indexes. Implementations are likewise trivial, and backends for memory (development purposes), SQLite, LevelDB, MySQL, Postgres, MongoDB, App Engine, etc. are already supported.

  • Search: pointing Perkeep’s search handlers at an index means you can search for your things. It’s worth pointing out that you can lose your index at any time without big worries: if the database holding your index gets corrupted, just delete it and re-replicate from your storage to your index; it will be re-indexed and search will work again.

  • User Interface: the web user interface lets you click around, view your content, and search for your stuff. Of course, you can also just use the command-line tools or the API.
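
As a rough mental model of that storage contract —this is not Perkeep’s actual Go API, just a sketch in Go of the abstraction described in the Storage bullet above— something like the following interface captures it:

package blobstore

import "io"

// BlobRef is a blob's content-addressable name (its digest), e.g. "sha224-...".
type BlobRef string

// SizedRef is the only per-blob metadata a storage server has to track.
type SizedRef struct {
	Ref  BlobRef
	Size uint32
}

// Storage is the minimal contract sketched above: put a blob, get it back by
// its digest, and enumerate stored blobs sorted by digest. Local disk, S3,
// Google Storage, "shard", "replica" or "encrypted" targets would all
// implement (and compose) this same interface, and the index is built on top
// of it as well.
type Storage interface {
	Put(ref BlobRef, r io.Reader) (SizedRef, error)
	Get(ref BlobRef) (io.ReadCloser, uint32, error)
	Enumerate(after BlobRef, limit int) ([]SizedRef, error)
}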

So if I have understood correctly, what Perkeep does in the end is securely store data (hence the PGP part) across many (and heterogeneous) infrastructures, centrally indexing it so that it can be easily queried and accessed in several ways: through a common API, a CLI, your phone, a browser, etc. It aggregates all your storage systems into a common view (not even close to what IPFS aims for, but a system worth having in our toolsets).

All the information I shared above was extracted from Perkeep’s official website. As old as the project is, the documentation is still pretty scarce, so I couldn’t find detailed technical information about how each of the modules is implemented, their underlying technologies, or how the system really works (I would have loved to share all that knowledge with you in this publication). Moreover, there has not been a new release since 2018, which makes me think the project may be a bit abandoned.

The easiest way I found to try Perkeep was through Docker:

docker pull gcr.io/perkeep-containers/perkeep:0.10

You can also download and install the binaries for Linux, Windows and Mac, use Perkeep’s Android App, or use the cloud launcher to deploy the system in a cloud infrastructure from this site.

My next attempt with Perkeep will be to deploy it (in a serious way, not with a single Docker container) for personal use, and to read (and hopefully understand) a bit of the source code (mainly written in Go) to see if I can get a grasp of the different techniques used in the system and customize it to my liking. Finally, for further information you can check this last talk (and its corresponding slides) where Perkeep’s latest release is explained.

@adlrocha - You and Your Research

Work on the right problem, at the right time, in the right way, and become somebody worth being.


Richard Wesley Hamming was an American mathematician whose work had many implications for computer engineering and telecommunications. In case you are wondering, yes, this is the guy that conceived Hamming codes, a mathematical apparatus that still fascinates me, and which is still applicable to many problems in our field.

But today I am not introducing Hamming to talk about his codes (maybe some other day), but to share a talk he gave on how to be successful (if I had to summarize the talk in just one sentence): “You and Your Research”. I was completely unaware of this talk, and it was brought to my attention after a conversation about “favorite Nobel Prize winners” (I know, not the typical conversation to bring up while having some drinks with friends). I decided to watch the talk this weekend, and man! I was completely taken aback by the pieces of knowledge shared by Hamming. They are all things you “kind of know”, but hearing them from someone with a Turing Award on top of his shelf makes you rethink your priorities around work.

Personal notes on the talk

If you watched the full video, just skip this section; I am only going to share my notes on the talk, paraphrasing some of the pieces I liked the most, so I can come back and read through them every once in a while. If you are more into reading than watching, here you can find a transcription of the talk.

Either way, dear future me, here are some things about work you should never forget if you want to succeed professionally:

  • Live the life you want to live. Do significant things, and work on things you won’t look back on and regret on your deathbed.

  • If what you are doing is not important, why are you working on it? People that succeed professionally work on important problems for which they have ideas on how to solve them. Work on problems that can become mighty oak trees.

  • Shannon once said: “I am scared of nothing”. People that succeed have confidence in themselves, but be careful not to become overconfident.

  • Have a clear vision. It’s not only about doing a good job, but about doing a first-class job.

  • Sometimes problems need to be redefined for them to become significant.

  • Study your successes and others’, and learn from them. Don’t dwell on your failures.

  • The one who wins is not the one who works hardest, but the one who works on the right problem in their field.

  • Never assume theories are right. See what people have missed and focus on enhancing their vision (this is what science is all about, right?).

  • Progress requires change. So embrace change.

  • Demonstrate greatness and you will be given the opportunity: “research and then you will teach”.

  • Here I will share a piece of my own learning: don’t get frustrated if one of your ideas is stolen, or someone gets there first and develops it; if you are creative you will have more and better ideas. Getting frustrated over a single “lost” idea simply suggests that you are not that creative and that the idea was just a stroke of luck. Keep creating.

  • And a quote from my favorite philosopher: “The unexamined life is not worth living” - Socrates.

But the best way of summarizing Hamming’s teachings in this talk is “be somebody worth being”. Of course, many may think that living like this means constant pressure to “become the best”, but for me this way of living is simply a good way of remembering to “prioritize what is important”.

Any thoughts on Hamming’s talk? Weird publication, right? I wasn’t into writing anything technical this week; I needed some time off from blocks and chains, so what better than sharing some food for thought. See you next week!

@adlrocha - TrustID

A New Approach to Fabric User Identity Management


This past week has been pretty exciting for me professionally. I managed —finally— to share with the community all the work we’ve been doing around performance best practices in Hyperledger Fabric through a Hyperledger meetup, and we announced on the Hyperledger Blog TrustID, our new approach to decentralized identity in Hyperledger Fabric. Today I want to walk you through TrustID in a bit more detail and share my view of its future.

Setting the context

Undoubtedly, Hyperledger Fabric offers a core substrate of decentralization and trust to corporations. It opens the door to the development of new use cases and business models based on the benefits of DLT technologies. Fabric supports digital assets, distributed logic through chaincodes, privacy using channels and other schemes such as private data collections, and the use of custom consensus through endorsement policies. Sadly, Fabric “as-is” lacks a key component for a successful decentralized ecosystem, a decentralized identity.

Fabric uses X.509 certificates to authenticate every entity and member in the network. This is really convenient for corporate environments, as organizations can use their existing CA infrastructure to issue new certificates for users, peers and applications. Thus, as long as a certificate is issued by a trusted CA in the network (i.e., the CA from a valid MSP organization in the system) its holder will have permission to interact with the network.

This identity management scheme works for a wide range of use cases, but a problem arises the moment user continuity between different organizations is required. If user A holds a valid certificate issued by Org1, they can interact with the network through peers of Org1, or at least through entities that know how to validate their “chain of trust.” However, if user A wants to interact with the network through a Fabric app from Org2, peers of Org2 won’t be able to tell whether A is a valid user in the network.

This is especially a problem when, instead of deploying a use-case-specific network where organizations and their relationships are well defined (and users belong to a single organization and only interact through this organization’s infrastructure), we launch a general-purpose network with users seamlessly interacting with any of the applications deployed over the network. This is the reason why we embarked on the development of TrustID, an attempt to decentralize Fabric’s identity.

In Telefónica we have been building TrustOS, an abstraction layer for blockchain platforms that enables companies and developers to implement their decentralized use cases without having to worry about the low-level complexity of DLT networks. One of the core engines of TrustOS is a general-purpose Hyperledger Fabric network. The first releases of TrustOS leveraged Fabric’s default identity management, so new users were authenticated through Telefónica-issued certificates. Initially this made sense, as we were the only organization in the network deploying applications. Unfortunately, when we started onboarding new organizations and applications to the system, our users started suffering the aforementioned itinerancy issues. Any user who wanted to interact with more than one organization had to hold a valid certificate signed by every organization through whose infrastructure they wanted to interact. In short, the management of user identities was a complete nightmare in terms of operation and UX.

Welcome TrustID

We then decided to design TrustID as a standalone identity module for TrustOS. We followed a decentralized identity approach for its design, where users (and services) are identified through a DID. 

These DIDs follow the W3C standard, and they serve as a unique ID to identify users. DIDs aggregate all the pieces of public information required to authenticate a user (i.e., their public key or X.509 certificate).

In order to uniquely identify chaincodes and services deployed in TrustOS, we decided to also give them DIDs so that they could be seamlessly discovered and accessed even if they “live” in independent channels not shared by all the organizations of the network. 

All the authentication and management of identities in the system is performed on-chain through an “Identity Chaincode.” This chaincode has the following parts:

  • Chaincode proxy: It receives and routes every TrustID authenticated transaction. It’s responsible for authenticating users, interacting with the ID registries, and routing user calls to external chaincodes. It also implements the desired access policies by the different organizations.

  • User Registry: It stores every user DID. It implements basic setter and getter operations and enforces the desired access rights per organization.

  • Service Registry: It plays the registry role for services.

  • External service chaincodes: Service chaincodes that users want to interact with can be deployed in any channel. Once requests are successfully authenticated, the proxy chaincode is responsible for forwarding transactions to them.

Thus, if user A wants to start interacting with the network, they request the generation of a new DID. The keys related to this DID could be an existing X.509 certificate issued by a valid organization, or even an Ethereum-related public key (internally we use the JWS, JWE, JWK, secp256k1, etc. RFCs to make our Fabric infrastructure compatible with identities of any nature for the sake of interoperability). This DID generation request has to be validated by a valid organization of the network. Once verified, every transaction signed by user A and directed through the Proxy CC is authenticated successfully and delegated to the corresponding chaincode.
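
To make the flow a bit more concrete, here is a heavily simplified conceptual sketch in Go. It is not the actual TrustID chaincode and every name in it is made up; it just models the user and service registries and the proxy’s authenticate-then-route step. Real deployments verify X.509 or secp256k1 material following the JW* RFCs mentioned above, while the sketch uses ed25519 keys only to stay short. Together with the client-side half shown after the step-by-step list below, it forms one small runnable program.

package main

import (
	"crypto/ed25519"
	"errors"
	"fmt"
)

// DIDDocument aggregates the public material needed to authenticate a user
// (or a service), as stored in the on-chain registries.
type DIDDocument struct {
	DID       string
	PublicKey ed25519.PublicKey
	Verified  bool // flipped to true by a "controller" organization
}

// Proxy models the "Identity Chaincode": a user registry, a service registry
// and a routing step that only runs once the caller has been authenticated.
type Proxy struct {
	Users    map[string]DIDDocument
	Services map[string]func(payload []byte) ([]byte, error) // service DID -> handler
}

// Invoke authenticates a signed request and delegates it to the target service.
// In real Fabric chaincode the final step would be an invocation of the
// chaincode in the channel where the service lives.
func (p *Proxy) Invoke(callerDID, serviceDID string, payload, sig []byte) ([]byte, error) {
	user, ok := p.Users[callerDID]
	if !ok {
		return nil, errors.New("unknown DID")
	}
	if !ed25519.Verify(user.PublicKey, payload, sig) {
		return nil, errors.New("authentication failed")
	}
	if !user.Verified {
		return nil, errors.New("unverified users can only reach public services")
	}
	svc, ok := p.Services[serviceDID]
	if !ok {
		return nil, errors.New("unknown service")
	}
	return svc(payload)
}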

The TrustID project is made up of the aforementioned chaincode and a client SDK to ease the integration and interaction with TrustID-enabled networks. Let’s illustrate step by step the interaction of a user with a Fabric network using TrustID (the code sketch above is completed with the client side right after this list).

  • First, the user has to be registered on the platform. If a user wants to register with one of their identities, they just need to sign their DID using their private key and send it to the proxy contract. The proxy contract inspects the signature and adds the user to the registry. This registration process generates an “unverified user” with limited access to network services. The TrustID SDK includes all the required functionality to seamlessly set up a user wallet to manage DIDs, sign requests, and send them to the proxy contract.

  • Unverified users only have access to public services. In order to get access to more features, a “controller” has to verify them and grant them an access level so they can call restricted services. Thus, the controller needs to send a signed verification request to the proxy to trigger this verification.

  • Verified users with a specific access level are entitled to deploy new services and list them in the service registry.

  • Users can interact with and “invoke” functions of a service contract by sending a signed request to the proxy contract stating the service to call, the function, and its arguments. The proxy contract authenticates the request, builds the transaction, and delegates the call to the corresponding contract.

  • Thus, every interaction with a TrustID-enabled network can be signed using the SDK and routed through the proxy contract, significantly simplifying user interaction with a DLT network. The only network currently supported by the TrustID SDK is Hyperledger Fabric, and the proxy contract has only been implemented as Fabric chaincode —for now—. Nonetheless, the modular architecture of TrustID was designed to allow its implementation on other DLT platforms (more about this in just a moment).
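
And here is the client-side half of the sketch, roughly what the SDK takes care of (key management, registration and request signing); all names remain illustrative, and appended to the proxy sketch above it forms one small, self-contained program:

func main() {
	// 1. Registration: the user generates a key pair (the SDK keeps it in a
	//    wallet), derives a DID and gets it stored in the user registry. A
	//    controller later flips Verified to grant an access level.
	pub, priv, _ := ed25519.GenerateKey(nil)
	did := "did:example:alice"

	proxy := &Proxy{
		Users:    map[string]DIDDocument{did: {DID: did, PublicKey: pub, Verified: true}},
		Services: map[string]func([]byte) ([]byte, error){},
	}

	// 2. A toy service registered in the service registry by a verified user.
	proxy.Services["did:example:asset-service"] = func(payload []byte) ([]byte, error) {
		return append([]byte("asset updated with: "), payload...), nil
	}

	// 3. Invocation: the request (function and arguments) is signed with the
	//    user's private key and routed through the proxy, which authenticates
	//    it before delegating the call.
	request := []byte(`{"function":"update","args":["asset-1"]}`)
	sig := ed25519.Sign(priv, request)

	out, err := proxy.Invoke(did, "did:example:asset-service", request, sig)
	fmt.Println(string(out), err)
}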

We developed TrustID to ease the management of identities for the case of TrustOS. Users shouldn’t need to hold a different set of credentials for each network or decentralized application they interact with. The same credentials used to access your owned Bitcoins and manage your tokens in Ethereum should let you update the state of a Fabric asset or launch a secondary market in TrustOS. This is the rationale behind TrustID. Moreover, pushing Hyperledger Fabric’s user identity management on-chain has opened the door to exciting consequences such as service interoperability between networks, or the use of Fabric as a universal authentication system, but more about this in future publications.

Planning its roadmap

So what are our plans for TrustID? The first thing we are really looking for is feedback from the community. We started TrustID as an internal project to solve a really specific problem in our product roadmap, but early in its development we realized that the scope of the project could be broader, more general, and we could share it with the community so it could also benefit from it and help us evolve it.

We are already collecting some feedback after announcing the project in the Hyperledger Blog, and according to this feedback we will discuss internally if it makes sense to share it on the Hyperledger Labs to get the community involved and open the project to contributions (let me clarify something here, the ultimate decision on whether or not to do this is not mine and we will have to discuss it internally in the team. This is not an official announcement of any kind, I just wanted to share my personal view on TrustID and what I think its future looks like. This is my personal view of the matter, as with everything else in this newsletter. That is why I decided to launch a “strictly personal and non-transferable” newsletter, so I could write my uncensored opinion —just in case—).

What would be my ideal next steps for TrustID? The way the SDK and the architecture of chaincodes have been designed opens the door to their extension to other types of networks and platforms. The SDK has been implemented in a general way, so making it compatible with other networks implementing TrustID would be as easy as adding a driver for the new network in the SDK. The architecture of chaincodes is also general enough to be easily ported to other platforms. So as a next step it would be great to refine the design of TrustID in Fabric and port the model to other platforms in the Hyperledger ecosystem such as Besu, Sawtooth, or even Indy, to validate the interoperability of the model between platforms and with other decentralized identity proposals.

TrustID could also be devised as a standard gateway of interaction with decentralized services living in different networks. The same way you have blockchain wallets compatible with several networks (Ethereum, Bitcoin, etc.), using TrustID we could have the equivalent of a single user wallet for corporate networks (Fabric, Corda, etc.), so that corporations don’t have to reinvent the identity model for every specific DLT platform they use. TrustID could be a “wallet to rule them all”: using it, a user or an application could trigger functions in any of the services hosted by these networks.

Again, TrustID was designed with a really specific problem in mind in a pretty constrained context, and now we are trying to generalize it to broaden its scope. We would love to know your feedback about the project and the value you see in it so that we can plan its roadmap and next steps accordingly. Stay tuned for more updates about TrustID, and to learn more about the future series I am planning after my “Performance Best Practices in Hyperledger Fabric” series. See you next week!
