@adlrocha - Permanently store your things for life in the post-PC era

And Perkeep’s cute official mascot

May 10, 2020

As part of my research on privacy in distributed systems, I was trying to find ways of storing data in distributed environments. We have all heard about IPFS (if this is not the case, no worries, I am going to talk in depth about this technology in the series about “the infrastructure of the new internet” I am working on), but are there any other projects working on this kind of environment or using similar approaches?

I was rereading IPFS’ original whitepaper when I came across a reference to a project called Camlistore. A quick search on DuckDuckGo (yep, I am a privacy-obsessed individual, I could not make a mundane search on Google) opened the door to an outstanding project, more than seven years old, called Perkeep (formerly known as Camlistore).

As stated on their official website, “Perkeep (née Camlistore), is a set of open source formats, protocols, and software for modeling, storing, searching, sharing and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem.”

Certainly, the idea is awesome and perfectly suits our current landscape. We are generating more and more personal data every day on our phones, PCs, tablets, wearables, etc., and we store all of it across devices, cloud X, cloud Y, and a wide range of different hardware infrastructures and service providers. In short, we don’t own our data. Often we don’t even know where the data is located, and in many cases it is our own fault that we don’t remember where we stored it. Remember that picture you took with your friends on your trip to Cabo? Where did you store it? On your previous phone? In Google Drive? Is it on your computer? Arghh, I hate when this happens. Why couldn’t we store all our data in the same place?

“It would be nice if we were a bit more in control. At least, it would be nice if we had a reliable backup of all our content. Once we have all our content, it’s then nice to search it, view it, and directly serve it or share it out to others (public or with selected ACLs), regardless of the original host’s policies. Perkeep is a system to do all that.”

Perkeep has a modular design. A Perkeep server comprises several parts, all of which are optional and can be turned on or off per-instance:

Perkeep’s Architecture

Perkeep has the following parts:

Storage: the most basic part of a Perkeep server is storage. This is anything which can Get or Put a blob (named by its content-addressable digest, like in IPFS), and enumerate those blobs, sorted by their digest. The only metadata a storage server needs to track per-blob is its size. Implementations are trivial and are available for local disk, Amazon S3, Google Storage, etc. They’re also composable, so you can have “shard”, “replica”, “remote”, “conditional”, and “encrypted” (in-progress) storage targets, which layer upon others.
Index: the index is implemented in terms of the Storage interface, so it can be synchronously or asynchronously replicated to and from other storage types. Putting a blob indexes it; enumerating returns what has been indexed; and getting isn’t supported. The abstraction used by Perkeep is similar to the storage abstractions of other well-known systems, meaning that any underlying system which can store keys and values, and is able to scan in sorted order from a point, can be used to store Perkeep’s indexes. Implementations are likewise trivial, and memory (for development purposes), SQLite, LevelDB, MySQL, Postgres, MongoDB, App Engine, etc. are already supported.
Search: pointing Perkeep’s search handlers at an index means you can search for your things. It’s worth pointing out that you can lose your index at any time without big worries. If your database holding your index goes corrupt, just delete it and re-replicate from your storage to your index: it’ll be re-indexed and search will work again.
User Interface: the web user interface lets you click around, view your content, and search for your stuff. Of course, you could also just use the command-line tools or API.
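
To make the Storage and Index ideas a bit more concrete, here is a minimal Python sketch of how I picture them. This is not Perkeep's code (which is written in Go): the class name MemoryBlobStore, the function rebuild_index, the "sha256-" naming scheme and the word-based index are all assumptions of mine for illustration. It only shows a store that can put, get and enumerate blobs named by their digest, and an index that can be thrown away and rebuilt from storage.

```python
import hashlib


class MemoryBlobStore:
    """Toy content-addressed store (hypothetical, not Perkeep's API)."""

    def __init__(self):
        self._blobs = {}  # digest-based name -> raw bytes

    def put(self, data: bytes) -> str:
        # The name is derived from the content, so identical data
        # always maps to the same name and is stored only once.
        # The "sha256-" prefix is an illustrative naming scheme.
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]

    def enumerate(self):
        # Yield (name, size) pairs sorted by digest; the size is the
        # only per-blob metadata this store exposes.
        for ref in sorted(self._blobs):
            yield ref, len(self._blobs[ref])


def rebuild_index(store):
    """Recreate a disposable index (word -> set of blob names) from storage."""
    index = {}
    for ref, _size in store.enumerate():
        text = store.get(ref).decode("utf-8", errors="ignore")
        for word in text.split():
            index.setdefault(word.lower(), set()).add(ref)
    return index


store = MemoryBlobStore()
photo_ref = store.put(b"photo from the trip to Cabo")
note_ref = store.put(b"note about Perkeep")

index = rebuild_index(store)
index = None                  # simulate a lost or corrupt index
index = rebuild_index(store)  # storage alone is enough to get it back
```

The point of the design, as I understand it, is that the index holds nothing that cannot be derived from the blobs, which is why losing it is not a big deal.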

So, if I have understood correctly, what Perkeep does in the end is securely (hence the PGP part) store data across many heterogeneous infrastructures, indexing it centrally so that it can be easily queried and accessed in several ways: through a common API, a CLI, your phone, a browser, etc. It aggregates all your storage systems into a common view (not even close to what IPFS aims for, but a system worth having in our toolsets).
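
As a rough illustration of what "layering" storage targets over heterogeneous backends could look like, here is a small Python sketch of a replica-style store. Again, this is my own toy model and not Perkeep's implementation: DictStore, ReplicaStore and the "sha256-" naming are hypothetical names I made up, and each backend is just an in-memory dictionary standing in for a disk or a cloud bucket.

```python
import hashlib


class DictStore:
    """Stand-in for one backend (local disk, a cloud bucket, ...)."""

    def __init__(self):
        self.blobs = {}

    def put(self, ref, data):
        self.blobs[ref] = data

    def get(self, ref):
        return self.blobs[ref]  # raises KeyError if the blob is missing


class ReplicaStore:
    """Writes every blob to all backends; reads from the first that has it."""

    def __init__(self, *backends):
        self.backends = backends

    def put(self, data: bytes) -> str:
        # Illustrative content-derived name, shared by all backends.
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        for backend in self.backends:
            backend.put(ref, data)
        return ref

    def get(self, ref: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(ref)
            except KeyError:
                continue  # try the next backend
        raise KeyError(ref)


laptop, cloud = DictStore(), DictStore()
everything = ReplicaStore(laptop, cloud)

ref = everything.put(b"holiday video")
laptop.blobs.clear()              # one backend loses its data
recovered = everything.get(ref)   # still served from the other one
```

Because blobs are named by their content, the same name is valid on every backend, so the caller never needs to know which one actually answered.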

All the information I shared above was extracted from Perkeep’s official website. As old as the project is, the documentation is still pretty scarce, so I couldn’t find detailed technical information about how each of the modules is implemented, their underlying technologies, or how the system really works (I would have loved to share all that knowledge with you in this publication). Moreover, there has not been a new release since 2018, which makes me think that the project may be somewhat abandoned.

The easiest way I found to try Perkeep was through Docker:

docker pull gcr.io/perkeep-containers/perkeep:0.10

You can also download and install the binaries for Linux, Windows and Mac, use Perkeep’s Android App, or use the cloud launcher to deploy the system in a cloud infrastructure from this site.

My next step with Perkeep will be to deploy it for personal use (in a serious way, not with a single Docker container) and to read (and hopefully understand) some of the source code, which is mainly written in Go, to get a grasp of the techniques the system relies on and customize it to my liking. Finally, for further information you can check this latest talk (and its corresponding slides), where Perkeep’s latest release is explained.

