The Federation Deathmatch

It’s the weekend, and I have some Thoughts about federated social media. So, buckle up, I guess, it’s time to start some fights.


Recently there has been some discourse about Bluesky’s latest fundraising round. I’ve been participating in conversations about this on Mastodon, and I think I might sometimes come across as a Mastodon partisan, but my feelings are complex and I really don’t want to be boosting the ActivityPub Fediverse without qualification.

So here are some qualifications.

Bluesky Is Evil

To the extent that I am an ActivityPub partisan in the discourse between ActivityPub and ATProtocol, it is because I do not believe that Bluesky is a meaningfully decentralized social network. It is a social network, run by a company, which has a public API with some elements that might, one day, make it possible for it to be decentralized. But today, it is not, either practically or theoretically.

The Bluesky developers are putting in a ton of effort to maybe make it decentralized, hypothetically, someday. A lot of people think they will succeed. But ActivityPub (and, of course, Mastodon specifically) is already, today, meaningfully decentralized. As you can see on FediDB, there are instances with hundreds of thousands of people on them, and that is before we even get to esoterica like the integrations that Threads, WordPress, Flipboard, and Ghost are doing.

The inciting incident for this post, that a lot of people are angry about Bluesky raising millions of dollars from Evil Guys Doing Evil Stuff Capital, is indeed a serious concern. It lights the fuse that burns towards their eventual, inevitable “incredible journey”. ATProtocol is just an API, and that API will get shut off one day, whenever their funders get bored of the pretense of their network being “decentralized”.

At the time of writing, it is also interesting that 3 of the 4 times the CEO of Bluesky has even skeeted the word “blockchain”, it was to say “no blockchain”, reassuring users that the scam magnet of “Blockchain” is not actually near their product or protocol. That is a much harder position to maintain when your lead investor is “Blockchain Capital”.

I think these are all valid criticisms of Bluesky. But I also think that the actual engineers working on the product are aware of these issues, and are making a significant effort to address them or mitigate them in any way they can. All that work can still be easily incinerated by a slow quarter in terms of user growth numbers or a missed revenue forecast when the VCs are getting impatient, but it’s not nothing; it is a life’s work.

Really, who among us could not have our life’s ambitions trivially destroyed in an afternoon, simply because a billionaire decided that they should be? If you feel like you are safe from this, I have some bad news about how money works. So we are all doing our best in an imperfect system and maybe Bluesky is on to something here. That’s eminently possible. They’re certainly putting forth an earnest effort.

Mastodon Is Stupid

Meanwhile, not nearly as much has been made recently of Mastodon refusing funding from a variety of sources, even though all indications are that its funding is low and plummeting, far below the level required to actually sustain the site. Mastodon also hasn’t published a financial transparency report in over a year, and that report was already nearly a year late.

Mastodon and the fediverse are not nearly in a position to claim moral superiority over Bluesky. Sure, taking blockchain VC money might seem like a rookie mistake, but going out of business because you are spurning every possible source of funding is not that wise either.

Some might think that, sure, Mastodon the company might die, but at least the Fediverse as a whole will keep going strong, right? Lots of people run their own instances! I even find elements of this argument convincing, and I think there is probably some truth to it. But to really believe this argument as claimed (that it’s a fait accompli that the fediverse will survive in some form, and that all those self-run servers will form a robust network that will self-repair) requires believing some obviously false stuff. It is frankly unprofitable to run a Fediverse instance. Realistically, if you want to operate a Mastodon server for yourself, it is going to cost at least $100/year once you include stuff like having a domain name, and keeping infrastructure costs under control is a complex problem that keeps getting harder as the software itself gets slower.

Cory Doctorow has recently argued that this is all worth it, because at least on Mastodon, you’re in control, not at the whims of centralized website operators like Bluesky. In his words,

On Mastodon (and other services based on Activitypub), you can easily leave one server and go to another, and everyone you follow and everyone who follows you will move over to the new server. If the person who runs your server turns out to be imperfect in a way that you can’t endure, you can find another server, spend five minutes moving your account over, and you’re back up and running on the new server

He concludes:

Any system where users can leave without pain is a system whose owners have high switching costs and whose users have none

(Emphasis mine).

This is a beautiful vision. It is, however, an incorrect assessment of the state of the Fediverse as it stands today. It’s not true in two important ways:

First, if you look at any write-up of a fediverse account migration, like this one from Steve Bate or this one from the Ente project or this one from Erin Kissane, you will see that it is “painful for the foreseeable future” or “wasn’t as seamless as advertised”, and that “the best time to […] migrate instances […] is never”. This language hardly presages an experience that is, as Doctorow puts it, “without pain”.

Second, migration is an active process that requires engagement from the instance that hosts you. If you have been blocked or banned, or had your account terminated, you are just out of luck. You do not have control over your data or agency over your online identity unless you’ve shelled out the relatively exorbitant amount of money to actually operate your own instance.

In short, ActivityPub is no panacea. A federated system is not really a “decentralized” system, as much as it is a bunch of smaller centralized systems that all talk to each other. You still need to know, and care, about your social and financial relationship to the operators of your instance. There is probably no getting away from this, like, just generally on the Internet, no matter how much peer-to-peer software we deploy, but there certainly isn’t in the incomplete mess that is ActivityPub.

JOIN, or DIE.

Neither Mastodon (or ActivityPub) nor Bluesky (or ATProtocol) has a comprehensive solution to the problem of decentralized social media. These companies, and these protocols, are both deeply flawed and if everything keeps bumping along as it is, I believe both are likely to fail. At different times, on different timelines, and for different reasons, but fail nonetheless.

However, these networks are both small and growing, and we are not yet in the phase of enshittification where margins are shrinking and audiences are captured and the screws must be tightened to juice revenue. There are still possibilities. Mastodon is crowdfunded and what they lack in resources they make up for in flexibility and scrappiness. Bluesky has money and while there will eventually be a need to monetize somehow, they have plenty of runway to come up with that answer, and a lot of sophisticated protocol work has been done. Not enough to make a complete circuit and allow users true, practical decentralization, but it’s not nothing, either.

Mastodon and Bluesky are both organizations with humans in them, and piles of data that is roughly schema-compatible even if the nuances and details are different. I know that there is a compatible model because, thanks to both platforms being relatively open, there is a functioning ActivityPub/ATProtocol bridge in the form of Bridgy Fed. You can use it today, and I highly recommend that you do so, so that “choice of protocol” does not fully define your audience. If you’re on Bluesky, follow this account, and if you’re on Mastodon or elsewhere on the Fediverse, search for and follow @bsky.brid.gy@bsky.brid.gy.

The reality that fans of decentralized, independent social media must confront is that we are a tiny audience right now. Whichever site we are looking at, we are talking about a few million monthly active users at best, in a world where even the pathetic husk of Twitter still has hundreds of millions and Facebook has billions. Internecine fights are not going to get us anywhere. We need to build bridges and links and connect our networks as densely as possible. If I’m being honest, Bridgy Fed looks like a pretty janky solution, but it’s something, and we need to start doing something soon, so we do not collectively become a permanent minority that mass markets can safely ignore.

As users, we need to set an example, so that the developers of the respective platforms get their shit together and work together directly, and workarounds like Bridgy are not required. Frankly, this is mostly on the ActivityPub and Mastodon devs, as far as I can tell. Unfortunately, not a lot of this seems to be public, or at least I haven’t witnessed a lot of it directly, but I have heard repeatedly that the ActivityPub developers are prickly, and this is one high-profile public example of an ActivityPub partisan being incredibly, pointlessly hostile and borderline harassing towards Mike Masnick (a long-time staunch advocate for open protocols and open patents, someone with a Mastodon account, and thus as good a prospective ally as the ActivityPub fediverse might reasonably find) while he was explaining some of the relative benefits of Bluesky.

Most of us are technology nerds in one way or another. In that way we can look at signifiers like “ActivityPub” and “ATProtocol”, and feel like these are hard boundaries around different all-encompassing structures for the future, and thus tribes we must join and support.

A better way to look at this, however, is to see social entities like Mastodon gGmbH and Bluesky PBC (or, more to the point, Fosstodon, SFBA Social, and Hachyderm, and maybe, one day, even an instance that isn’t just for software development nerds) as groups that deploy these protocols to provide access to some data that they publish, just as they might publish their website over HTTP or their newsletters over SMTP. There are technical challenges involved in bridging between mutually unintelligible domain models, but that is, like, network software's whole deal. Most software is just some kind of translation from one format or context to another. The best possible future for the fediverse is the one where users care as much about the distinction between ATProtocol and ActivityPub as they do about the distinction between POP3 and IMAP.

To both developers and users of these systems, I say: get it together. Be nice to each other. Because the rest of the social media ecosystem is sure as shit not going to be nice to us if we ever see even a hint of success and start to actually cut into their user base.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!

Techniques for Actually Distributed Development with Git

It's all very wibbly wobbly and versiony wersiony.

The Setup

I have a lot of computers. Sometimes I'll start a project on one computer and get interrupted, then later find myself wanting to work on that same project, right where I left off, on another computer. Sometimes I'm not ready to publish my work, and I might still want to rebase it a few times (because if you're not arbitrarily rewriting history all the time you're not really using Git) so I don't want to push to @{upstream} (which is how you say "upstream" in Git).
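In case @{upstream} is unfamiliar: it's Git's revision syntax (short form @{u}) for whatever branch the current branch is set to track. Here's a minimal sketch that shows what it resolves to. It builds a throwaway repository and a clone of it in a temporary directory; the names origin-repo and work, and the demo identity, are made up for the example.

```shell
set -e
# Throwaway identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Work in a scratch directory; "origin-repo" and "work" are made-up names.
demo=$(mktemp -d)
cd "$demo"
git init -q origin-repo
git -C origin-repo symbolic-ref HEAD refs/heads/master
git -C origin-repo commit -q --allow-empty -m "initial"

# Cloning makes the local master branch track origin/master.
git clone -q origin-repo work
cd work

# @{upstream} (short form: @{u}) resolves to the branch that HEAD tracks.
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}')
echo "$upstream"
```

In a fresh clone like this one it should print origin/master; on a branch with no upstream configured, git rev-parse fails instead.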

I would like to be able to use Git to synchronize this work in progress between multiple computers. I would like to be able to easily automate this synchronization so that I don’t have to remember any special steps to sync up; one of the most focused times that I have available to get coding and writing work done is when I’m disconnected from the Internet for 5-10 hours, on a cross-country train or a plane trip, or sitting near a beach. It is very frustrating to realize, only once I’m settled in and unable to fetch the code, that I don’t actually have it on my laptop because I was last doing something on my desktop. I would particularly like to be able to use that offline time to resolve tricky merge conflicts; if there are two versions of a feature, I want to have them both available locally while I'm disconnected.

Completely Fake, Made-Up History That Is A Lie

As everyone knows, Git is a centralized version control system created by the popular website GitHub as a command-line client for its "forking" HTML API. Alternate central Git authorities, such as Bitbucket and GitLab, have been created by other startups in the wave that followed GitHub's success.

It may surprise many younger developers to know that when GitHub first created Git, it was originally intended to be a distributed version control system, where it was possible to share code with no particular central authority!

Although the feature has been carefully hidden from the casual user, with a bit of trickery you can re-enable it!

Technique 0: Understanding What's Going On

It's a bit confusing to have to actually set up multiple computers to test these things, so one useful thing to understand is that, to Git, the place you can fetch revisions from and push revisions to is a repository. Normally these are identified by URLs which identify hosts on the Internet, but you can also just indicate a path name on your computer. So for example, we can simulate a "network" of three computers with three clones of a repository like this:

$ mkdir tmp
$ cd tmp/
$ mkdir a b c
$ for repo in a b c; do (cd $repo; git init); done
Initialized empty Git repository in .../tmp/a/.git/
Initialized empty Git repository in .../tmp/b/.git/
Initialized empty Git repository in .../tmp/c/.git/
$ 

This creates three separate repositories. But since they're not clones of each other, none of them have any remotes, and none of them can push or pull from each other. So how do we teach them about each other?

$ cd a
$ git remote add b ../b
$ git remote add c ../c
$ cd ../b
$ git remote add a ../a
$ git remote add c ../c
$ cd ../c
$ git remote add a ../a
$ git remote add b ../b

Now, you can go into a and type git fetch --all and it will fetch from b and c, and similarly for git fetch --all in b and c.
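Typing out every pair of remotes by hand gets tedious as the number of "computers" grows. A nested loop can do the same wiring for any number of sibling repositories. This is a sketch that assumes the layout above: sibling directories named after the remotes, created inside a scratch directory that stands in for tmp/.

```shell
set -e
# Scratch directory standing in for the tmp/ directory used above.
demo=$(mktemp -d)
cd "$demo"
repos="a b c"

for repo in $repos; do
    git init -q "$repo"
done

# Give every repository a remote for each of the others, named after the
# directory it lives in, using the same relative ../name paths as above.
for repo in $repos; do
    for other in $repos; do
        if [ "$repo" != "$other" ]; then
            git -C "$repo" remote add "$other" "../$other"
        fi
    done
done

# List the remotes that a now knows about.
git -C a remote
```

Afterwards, git fetch --all in any one of the directories should fetch from the other two, just as with the hand-typed commands.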

To turn this into a practical multiple-machine scenario, rather than specifying a path like ../b, you would specify an SSH URL as your remote URL, and enable SSH on each of your machines (on a Mac, that's "Remote Login" in the Sharing preference pane, if you aren't familiar with doing that).
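If you've been experimenting with path remotes and later move the clones onto real machines, you don't have to delete and re-add the remote; git remote set-url changes the URL in place. In this sketch, the user me, the host laptop.local, and the path src/project are placeholders for your own.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q a
cd a
# Start with a path remote, as in the local simulation above.
git remote add b ../b
# Later, point the same remote at another machine over SSH instead.
# "me", "laptop.local" and "src/project" are placeholders; in the scp-style
# form user@host:path, the path is relative to that user's home directory.
git remote set-url b me@laptop.local:src/project
git remote -v
```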

So, for example, if you have a home desktop tweedledee and a work laptop tweedledum, you can do something like this:

tweedledee:~ neo$ mkdir foo; cd foo
tweedledee:foo neo$ git init .
# ...
tweedledum:~ m_anderson$ mkdir bar; cd bar
tweedledum:bar m_anderson$ git init .
tweedledum:bar m_anderson$ git remote add tweedledee neo@tweedledee.local:foo
# ...
tweedledee:foo neo$ git remote add tweedledum m_anderson@tweedledum.local:bar

I don't know the names of the hosts on your network. So, in order to make it possible for you to follow along exactly, I'll use the repositories that I set up above, with path-based remotes, in the following examples.

Technique 1 (Simple): Always Only Fetch, Then Merge

Git repositories are pretty boring without any commits, so let's create a commit:

$ cd ../a
$ echo 'some data' > data.txt
$ git add data.txt
$ git commit -m "data"
[master (root-commit) 8dc3db4] data
 1 file changed, 1 insertion(+)
 create mode 100644 data.txt

Now on our "computers" b and c, we can easily retrieve this commit:

$ cd ../b/
$ git fetch --all
Fetching a
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../a
 * [new branch]      master     -> a/master
Fetching c
$ git merge a/master
$ ls
data.txt
$ cd ../c
$ ls
$ git fetch --all
Fetching a
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../a
 * [new branch]      master     -> a/master
Fetching b
From ../b
 * [new branch]      master     -> b/master
$ git merge b/master
$ ls
data.txt

If we make a change on b, we can easily pull it into a as well.

$ cd ../b
$ echo 'more data' > data.txt 
$ git commit data.txt -m "more data"
[master f3d4165] more data
 1 file changed, 1 insertion(+), 1 deletion(-)
$ cd ../a
$ git fetch --all
Fetching b
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../b
 * [new branch]      master     -> b/master
Fetching c
From ../c
 * [new branch]      master     -> c/master
$ git merge b/master
Updating 8dc3db4..f3d4165
Fast-forward
 data.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
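Keeping fetch and merge separate also means you can look before you merge. This sketch rebuilds a two-machine version of the example above in a scratch directory (with a throwaway identity), then uses git log HEAD..b/master to list the commits that b has and a lacks, and git merge --ff-only so that nothing happens unless the merge is a plain fast-forward. Cloning with -o a, so that b's remote is called a rather than origin, is just a shortcut for the init-plus-remote-add steps shown earlier.

```shell
set -e
# Throwaway identity for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# a gets the first commit; b starts as a clone of a whose remote is
# called "a" rather than "origin".
git init -q a
git -C a symbolic-ref HEAD refs/heads/master
echo 'some data' > a/data.txt
git -C a add data.txt
git -C a commit -q -m "data"
git clone -q -o a a b
git -C a remote add b ../b

# Make a change on b.
echo 'more data' > b/data.txt
git -C b commit -q -a -m "more data"

# Back on a: fetch, then look before merging.
cd a
git fetch -q --all
# Commits reachable from b/master that our HEAD doesn't have yet:
git log --oneline HEAD..b/master
# Refuse to do anything other than a plain fast-forward.
git merge -q --ff-only b/master
```

If the histories have diverged, --ff-only refuses and leaves master alone, and you can decide how to merge (or rebase) with both versions in front of you.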

This technique is quite workable, except for one minor problem.

The Minor Problem

Let's say you're sitting at your laptop and your battery is about to die. You want to get some changes from your laptop onto your desktop. Your SSH key, however, is plugged in to your laptop, so you can't fetch from the desktop side. So you figure you'll push from your laptop to your desktop instead. Unfortunately, if you try, you'll see something like this:

$ cd ../a/
$ echo 'even more data' >> data.txt 
$ git commit data.txt -m "even more data"
[master a9f3d89] even more data
 1 file changed, 1 insertion(+)
$ git push b master
Counting objects: 7, done.
Writing objects: 100% (3/3), 260 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error: 
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error: 
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To ../b
 ! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to '../b'

While you're reading this, you fail your will save and become too bored to look at a computer any more.

Too late, your battery is dead! Hopefully you didn't lose any work.

In other words: sometimes it's nice to be able to push changes as well.

Technique 1.1: The Manual Workaround

The problem that you're facing here is that b has its master branch checked out, and is therefore rejecting changes to that branch. Your commits have actually all been "uploaded" to b, and are present in that repository, but there is no branch pointing to them. Changing either of the configuration settings that Git warns you about in order to force it to accept the push is a bad idea, though; if your working tree, your index, and your commits don't agree with each other, you're just asking for trouble. Git is confusing enough as it is.

In order to work around this, you can just push your changes in master on a to a different branch on b, and then merge it later, like so:

$ git push b master:master_from_a
Counting objects: 7, done.
Writing objects: 100% (3/3), 260 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ../b
 * [new branch]      master -> master_from_a
$ cd ../b
$ git merge master_from_a 
Updating f3d4165..a9f3d89
Fast-forward
 data.txt | 1 +
 1 file changed, 1 insertion(+)
$ 

This works just fine, but you always have to manually remember which branches you want to push, where you want to push them, and which machine you're pushing from.
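One way to take some of the remembering out of this is to bake the naming convention into a shell function. pushaside below is a hypothetical helper, not a Git command: given a name for the current machine and a remote, it pushes the current branch to a <branch>_from_<machine> side branch on that remote, following the master_from_a pattern above. The demo at the bottom recreates the a/b layout in a scratch directory with a throwaway identity.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper (not a Git command): push the current branch to a
# side branch named "<branch>_from_<here>" on REMOTE.
# usage: pushaside HERE REMOTE
pushaside () {
    local here="$1" remote="$2" branch
    if [ -z "$here" ] || [ -z "$remote" ]; then
        echo "usage: pushaside HERE REMOTE" >&2
        return 1
    fi
    # Fails (and so does the function) on a detached HEAD.
    branch=$(git symbolic-ref --short HEAD) || return 1
    git push "$remote" "HEAD:refs/heads/${branch}_from_${here}"
}

# Demo: b is a clone of a with master checked out, so pushing to b's
# master would be refused, but pushing to a side branch is fine.
demo=$(mktemp -d)
cd "$demo"
git init -q a
git -C a symbolic-ref HEAD refs/heads/master
git -C a commit -q --allow-empty -m "first"
git clone -q -o a a b
git -C a remote add b ../b
git -C a commit -q --allow-empty -m "second"
cd a
pushaside a b
```

You still have to merge the side branch on the other machine yourself, and you still have to remember that it's there, which is what the next technique improves on.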

Technique 2: Push To Reverse Pull To Remote Remotes

The astute reader will have noticed at this point that Git already has a way of tracking "other places that changes came from": they're called remotes! And in fact b already has a remote called a pointing at ../a. Once you're sitting in front of your b computer again, wouldn't you rather just have those changes already in the a remote, instead of in some branch you have to specifically look for?

What if you could just push your branches from a into that remote? Well, friend, I'm here today to tell you you can.

First, head back over to a...

$ cd ../a

And now, all you need is this entirely straightforward and obvious command:

$ git config remote.b.push '+refs/heads/*:refs/remotes/a/*'

and now, when you git push b from a, you will push those branches into b's "a" remote, as if you had done git fetch a while in b.
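It may help to notice that this push refspec is the mirror image of the fetch refspec that git remote add already wrote into b's configuration for its a remote. The leading + permits non-fast-forward updates, refs/heads/* on the left is the source (a's local branches), and refs/remotes/a/* on the right is the destination (the remote-tracking refs that b keeps for a). This sketch recreates the two repositories in a scratch directory and prints both refspecs side by side.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q a
git init -q b
git -C a remote add b ../b
git -C b remote add a ../a

# The reverse-push refspec from above, configured on a.
git -C a config remote.b.push '+refs/heads/*:refs/remotes/a/*'

# "+" allows non-fast-forward updates; the left side is the source
# (a's local branches) and the right side is the destination (the
# remote-tracking refs that b keeps for a).
push_spec=$(git -C a config --get remote.b.push)
# git remote add wrote the same refspec into b's fetch configuration.
fetch_spec=$(git -C b config --get remote.a.fetch)
echo "a pushes to b with: $push_spec"
echo "b fetches from a with: $fetch_spec"
```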

$ git push b
Total 0 (delta 0), reused 0 (delta 0)
To ../b
   8dc3db4..a9f3d89  master -> a/master

So, if we make more changes:

$ echo 'YET MORE data' >> data.txt
$ git commit data.txt -m "You get the idea."
[master c641a41] You get the idea.
 1 file changed, 1 insertion(+)

we can push them to b...

$ git push b
Counting objects: 5, done.
Writing objects: 100% (3/3), 272 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ../b
   a9f3d89..c641a41  master -> a/master

and when we return to b...

$ cd ../b/

there's nothing to fetch, it's all been pre-fetched already, so

$ git fetch a

produces no output.
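If you'd like to convince yourself that the push really did the fetch's job, compare what a's master points at with b's a/master remote-tracking ref, without running any fetch on b. This is a sketch in a scratch directory with a throwaway identity; cloning with -o a is just a shortcut for giving b a remote named a.

```shell
set -e
# Throwaway identity for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

git init -q a
git -C a symbolic-ref HEAD refs/heads/master
git -C a commit -q --allow-empty -m "first"
# b starts as a clone of a, with its remote named "a" instead of "origin".
git clone -q -o a a b
git -C a remote add b ../b
git -C a config remote.b.push '+refs/heads/*:refs/remotes/a/*'

# A new commit on a, pushed into b's remote-tracking refs for a.
git -C a commit -q --allow-empty -m "second"
git -C a push -q b

# No fetch on b, yet its a/master already matches a's master.
here=$(git -C a rev-parse master)
there=$(git -C b rev-parse a/master)
echo "a's master:   $here"
echo "b's a/master: $there"
```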

But there is some stuff to merge, so if we took b on a plane with us:

$ git merge a/master
Updating a9f3d89..c641a41
Fast-forward
 data.txt | 1 +
 1 file changed, 1 insertion(+)

we can merge those changes in whenever we please!

More importantly, unlike the manual-syncing solution, this allows us to push multiple branches on a to b without worrying about conflicts, since the a remote on b will only ever be updated to reflect the present state of a and should therefore never have conflicts. (If it does, it's because you rewrote history on a, and the leading + in the push refspec lets that non-fast-forward update go through with no particular repercussions.)
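Both claims in that paragraph can be checked in a scratch directory. In this sketch (the branch name feature and the demo identity are made up), a feature branch created on a shows up on b as a/feature, and after rewriting it with git commit --amend, a plain git push b, with no --force, still updates a/feature.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

git init -q a
git -C a symbolic-ref HEAD refs/heads/master
git -C a commit -q --allow-empty -m "first"
git clone -q -o a a b
git -C a remote add b ../b
git -C a config remote.b.push '+refs/heads/*:refs/remotes/a/*'

cd a
# A second branch: one plain push sends both master and feature to b.
git checkout -q -b feature
git commit -q --allow-empty -m "work in progress"
git push -q b

# Rewrite the tip of feature. Thanks to the "+" in the refspec, this
# non-fast-forward update goes through without --force.
git commit -q --amend --allow-empty -m "work in progress, reworded"
git push -q b
```

The + applies to every branch the refspec matches, so b's copy of any rewritten branch from a is silently replaced; since those refs are only ever a mirror of a, that is usually what you want.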

Generalizing

Here's a shell function which takes 2 parameters, "here" and "there". "here" is the name of the current repository - meaning, the name of the remote in the other repository that refers to this one - and "there" is the name of the remote which refers to another repository.

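# Configure pushes to the remote named "$2" so that they land in that
# repository's remote-tracking refs for the remote named "$1", as if it
# had fetched from this repository.
remoteremote () {
    local here="$1" there="$2"
    if [ -z "$here" ] || [ -z "$there" ]; then
        echo "usage: remoteremote HERE THERE" >&2
        return 1
    fi
    git config "remote.$there.push" "+refs/heads/*:refs/remotes/$here/*"
}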

In the above example, we could have used this shell function like so:

$ cd ../a
$ remoteremote a b

I now use this all the time when I check out a repository on multiple machines for the first time; I can then always easily push my code to whatever machine I’m going to be using next.
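If, as in these examples, each machine's remotes are named after the machines they point to, you could configure all of them in one go. remoteremote_all below is a hypothetical helper built on that assumption: it runs remoteremote (restated so the block stands alone) for every remote, skipping one called origin on the further assumption that it's a shared server rather than one of your machines. The demo uses placeholder remotes in a scratch directory.

```shell
set -e
# remoteremote as defined above, restated so this block runs on its own.
remoteremote () {
    local here="$1" there="$2"
    git config "remote.$there.push" "+refs/heads/*:refs/remotes/$here/*"
}

# Hypothetical helper: run remoteremote for every configured remote.
# Assumes each remote is named after the machine it points to, and that
# "origin", if present, is a shared server rather than one of your machines.
# usage: remoteremote_all HERE
remoteremote_all () {
    local here="$1" there
    if [ -z "$here" ]; then
        echo "usage: remoteremote_all HERE" >&2
        return 1
    fi
    for there in $(git remote); do
        case "$there" in
            origin) continue ;;
        esac
        remoteremote "$here" "$there"
    done
}

# Demo: repository a with remotes for b, c, and a shared origin.
demo=$(mktemp -d)
cd "$demo"
git init -q a
cd a
git remote add origin ../shared
git remote add b ../b
git remote add c ../c
remoteremote_all a
git config --get-regexp '^remote\..*\.push$'
```

On a real machine you might pass something like $(hostname -s) as the argument, but only if that matches the name your other machines use for this one.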

I really hope at least one person out there has equally bizarre usage patterns of version control systems and finds this post useful. Let me know!

Acknowledgements

Thanks very much to Tom Prince for the information that led to this post being worth sharing, and Jenn Schiffer for teaching me that it is OK to write jokes sometimes, except about javascript which is very serious and should not be made fun of ever.