The reader experience

Fine-tuning your content consumption, and database deliverance.

The practicalities of pragmatism be damned, we've returned for another edition of endless antiquities and foregone conclusions of frightening fancy. Whether you're a casual consumer of ActivityPub curiosity or a hardened veteran vying for vocational vitality, there's something here for you.

Last week, we dove into improvements to how notifications are organized, grouping them rather than showing each one individually. We also touched on some of the database work happening in the background as we map out the future of ActivityPub in Ghost. More on that in a moment.

What's new with ActivityPub?

First, on the design front, we spent some time this week iterating and evolving the reading experience within the ActivityPub client app. This is software designed around long-form content, so we want to make sure that it feels good to spend time reading within it.

To that end, we've introduced reading time, reading progress, and some user customization options around typography and spacing — so you can fine-tune your reading experience based on your personal preferences, in the same way you might be used to when reading on a Kindle.

Alongside that, we've been fixing bugs reported by beta testers (there have been plenty). We're slowly but surely getting through them, and adding more people to the beta each week. Keep an eye on your inbox for your invitation, and if you haven't applied to be a part of the first round of beta testing, you can do so here.

Then there's the other giant piece of work that all our engineers have been focused on for the last few weeks.

Databases.

In response to last week's brief mention of database work, several of you indicated a certain predilection for masochism. Specifically, you indicated that database design is super fun and you want to hear more about it.

It is apparent that we share very different worldviews when it comes to the definition of "super fun".

Nevertheless, since enough people were curious about it, here's a bit more detail about how we're storing ActivityPub data. Just remember, you brought this on yourself.

Long-time subscribers will remember how at the very start of this project, we wanted to use Ghost's existing database to store ActivityPub data. Despite only needing 2 more tables, this idea was quickly nixed by the rest of the Ghost product team, who politely told us to fuck off.

Suitably chastened, we switched gears and started building ActivityPub as a separate service that lives outside Ghost and has its own database, where we could make as many sweet, sweet tables as our hearts desired. Freedom!

Now, for those among you who haven't experienced the thrills and spills of database design for a large application, a little context up front. You can think of a database a bit like a big Excel file, and each table in the database as a sheet contained within. Storing some data is equivalent to filling out a new row in the spreadsheet.

When you have a small app with a handful of users and a bit of data, how you organize your glorified spreadsheet (in our case, MySQL) doesn't matter much. It'll all work fine. When you start scaling up to large quantities of data, though, the way you organize everything becomes critical, because how data is organized directly influences performance in a way that's not easy to fix later.

Now, normally when you design a database, it's for your own data. You have an app. The app does stuff. Once it's done the stuff, it stores it in the database. In most cases, you have a clear idea of what "stuff" it is that you need to store, and so you design your database to accommodate it.

ActivityPub presents a unique challenge in that the stuff you're storing is coming from ~waves hands~ somewhere. Maybe Ghost. Maybe Mastodon. Maybe Flipboard, Threads, WordPress, or any other product being used by someone your users follow. And they all send data in slightly different ways. It's chaos.

So, what we knew up front was that we had a lot of unknown unknowns. We weren't really sure what the data was going to look like, we weren't really sure what features we were going to build with it, and we weren't really sure how any of it was going to work. Bit of a catch-22.

In light of that, we opted to go for (essentially) no database design at all. Right now, Ghost's ActivityPub service uses MySQL mostly as a key/value store, with no formalized structure of any kind. If that made no sense to you: We're essentially blindly copy-and-pasting blog posts into Excel spreadsheets.
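For the curious, here's a rough sketch of what "a database used as a key/value store" can look like. This is not our actual code or schema: it uses an in-memory SQLite database as a stand-in for MySQL so it runs anywhere, and the table name, column names, helper functions, and example payloads are all invented for illustration.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# One table, two columns: an identifier and an opaque blob of JSON.
# Nothing in the database knows what is inside the payload.
conn.execute("CREATE TABLE key_value (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")


def put(object_id, obj):
    """Store any JSON-serializable object under its identifier."""
    conn.execute(
        "INSERT OR REPLACE INTO key_value (id, payload) VALUES (?, ?)",
        (object_id, json.dumps(obj)),
    )


def get(object_id):
    """Fetch an object by identifier, or None if nothing is stored there."""
    row = conn.execute(
        "SELECT payload FROM key_value WHERE id = ?", (object_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Invented payloads: two servers describing content in slightly different shapes.
put("https://example.com/notes/1", {"type": "Note", "content": "Hello"})
put("https://example.org/articles/9", {"type": "Article", "name": "A post"})

print(get("https://example.com/notes/1"))
```

The upside is that anything fits, whatever shape it arrives in. The downside is that the database can only find things by their identifier; a question like "show me every Article" means opening every payload and looking inside.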

The goal was to learn what (real, not theoretical!) ActivityPub data looks like, to inform the process of designing our database schema properly — and it worked! We now understand a lot more than we did a few months ago. We have a good idea of what data needs to be stored, and how to store it.

That brings us to our next problem: Scale.

When you're building a brand new app or startup, scale is not a problem you have to confront right away. Most apps don't reach millions of users, ever. So you can build a simple solution and iterate slowly along the way, as needed. In the case of Ghost, though, we already have a large, established user base. Thinking ahead to making ActivityPub available to all of them (not in beta) means we have to plan, right away, for storing data at scale.

On Ghost(Pro) alone, we have tens of thousands of sites, with tens of millions of members. It takes only a little napkin math to figure out that ActivityPub will very quickly need to handle billions of rows of data. And when you get into the billions, your database had better be well-designed, or things will quickly grind to a halt.
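If you'd like to see the napkin, it looks something like this. Every input below is a made-up round number chosen for illustration, not a real Ghost(Pro) figure, and the "one row per post per follower" model is a simplifying assumption rather than a description of our schema.

```python
# All three inputs are hypothetical round numbers, not real Ghost(Pro) figures.
sites = 20_000              # somewhere in "tens of thousands of sites"
followers_per_site = 1_000  # assumed average number of followers per site
posts_per_year = 100        # assumed publishing rate per site

# Simplifying assumption: every post produces one feed row for every follower.
feed_rows_per_year = sites * followers_per_site * posts_per_year

print(f"{feed_rows_per_year:,} feed rows per year")  # 2,000,000,000 feed rows per year
```

Nudge any of those numbers up or down and you still land in the billions fairly quickly, and that's before counting likes, replies, or follows.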

That brings us to the last few weeks. We now have a good idea of the data we're going to be storing, we have a better sense of how we're going to use it, and we've mapped out the approximate scale we think we're going to need it to handle in the context of Ghost(Pro). So how do you determine a good database design?

You test it.

For the past few weeks, we've been setting up testing infrastructure (both locally, and within a staging environment) where we're implementing a new database design and populating it with billions of rows of data, so we can validate the design decisions we're making, now, against the scale we're anticipating over the first 12-24 months of ActivityPub being generally available to everyone.
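A toy version of that kind of test might look like the sketch below. It is an illustration only, not our test harness: it uses in-memory SQLite rather than MySQL, a hypothetical `feed` table, and 100,000 synthetic rows instead of billions. It asks the database how it plans to run a query, before and after one design decision (adding an index).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER)"
)

# Synthetic data: 100,000 rows spread evenly across 1,000 pretend users.
rows = ((i % 1_000, i) for i in range(100_000))
conn.executemany("INSERT INTO feed (user_id, post_id) VALUES (?, ?)", rows)


def plan(sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in cursor)


query = "SELECT post_id FROM feed WHERE user_id = ?"

# Design A: no index, so the database has to read the whole table.
plan_without_index = plan(query, (42,))

# Design B: an index on user_id lets it jump straight to the matching rows.
conn.execute("CREATE INDEX idx_feed_user ON feed (user_id)")
plan_with_index = plan(query, (42,))

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

At this size both versions return instantly, which is exactly the problem with testing small. The difference between reading every row and reading a hundred of them only starts to hurt once the table is enormous, so that is where the testing has to happen.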

This is a slow, arduous process, but an important one. We know that we're going to have to live with the database decisions we make in future, and so we're taking extra time here to validate that we're getting them right.

The good news is that it's going well. The bad news is that it's slow, difficult work.

We're feeling confident, though, that carefully planning out our database design, now, will pay dividends in future as both Ghost and the Fediverse grow together.

If you managed to read this far: Congratulations, you're a nerd. Please report to Pug Station 23 to collect your certification.