Bilithification

Not sure how to do microservices? Split your monolith in half.

Several years ago at O’Reilly’s Software Architecture conference, within a comprehensive talk on refactoring “Technical Debt: A Masterclass”, r0ml1 presented a concept that I think should be highlighted.

If you have access to O’Reilly Safari, I think the video is available there, or you can get the slides here. It’s well worth watching in its own right. The talk contains a lot of hard-won wisdom from a decades-long career, but in slides 75-87, he articulates a concept that I believe resolves the perennial pendulum-swing between microservices and monoliths that we see in the Software as a Service world.

I will refer to this concept as “the bilithification strategy”.

Background

Personally, I have long been a microservice skeptic. I would generally articulate this skepticism in terms of “YAGNI”.

Here’s the way I would advise people asking about microservices before encountering this concept:

Microservices are often adopted by small teams due to their advertised benefits. Advocates from very large organizations (ones that have been very successful with microservices) frequently give talks claiming that microservices are more modular, more scalable, and more fault-tolerant than their monolithic progenitors. But these teams rarely appreciate the costs, particularly the costs for smaller orgs. Specifically, there is a fixed marginal operational cost to each new service, and a fairly large fixed operational overhead to the infrastructure for an organization deploying microservices at all.

With a large enough team, the operational cost is easy to absorb. As the overhead is fixed, its share of the total trends towards zero as your total team size and system complexity trend towards infinity. Also, in very large teams, the enforced isolation of components in separate services reduces complexity. It does so specifically by intentionally causing the software architecture to mirror the organizational structure of the team that deploys it. This allows independent parts of the organization to make progress independently, without blocking on each other, at the cost of increased operational overhead and decreased efficiency. Therefore, in smaller teams, as you’re building, you should bias towards building a monolith until the complexity costs of the monolith become apparent. Then you should build the infrastructure to switch to microservices.

I still stand by all of this. However, it’s incomplete.

What does it mean to “switch to microservices”?

The biggest thing that this advice leaves out is a clear understanding of the “micro” in “microservice”. In this framing, I’m implicitly understanding “micro” services to be services that are too small — or at least, too small for your team. But if they do work for large organizations, then at some point, you need to have them. This leaves open several questions:

  • What size is the right size for a service?
  • When should you split your monolith up into smaller services?
  • Wait, how do you even measure “size” of a service? Lines of code? Gigabytes of memory? Number of team members?

In a specific situation, I could probably look at these questions and make suggestions as to the appropriate course of action, but that would be based largely on vibes. It would draw on a lot of complex experience rather than on a repeatable pattern that a team could apply on their own.

We can be clear that you should always start with a monolith. But what should you do when that’s no longer working? How do you even tell when it’s no longer working?

Bilithification

Every codebase begins as a monolith. That is a single (mono) rock (lith). Here’s what it looks like.

a circle with the word “monolith” on it

Let’s say that the monolith, and the attendant team, is getting big enough that we’re beginning to consider microservices. We might now ask, “what is the appropriate number of services to split the monolith into?” and that could provoke endless debate even among a team with total consensus that it might need to be split into some number of services.

Rather than beginning with the premise that there is a correct number, we may observe instead that splitting the service into N services, where N is more than one, may be accomplished by splitting a service in half N-1 times.
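The arithmetic can be checked with a small sketch. The `split_in_half` and `split_n_times` functions and the service names are hypothetical, made up purely for illustration. The only point is that each split removes one service and adds two, so reaching N services takes N-1 splits.

```python
def split_in_half(service):
    """Hypothetical split: replace one service with two named halves."""
    return [f"{service}-a", f"{service}-b"]


def split_n_times(services, splits):
    """Apply `splits` binary splits, always splitting the first service listed."""
    services = list(services)
    for _ in range(splits):
        target = services.pop(0)
        services.extend(split_in_half(target))
    return services


# Each split is a net gain of one service, so N services need N-1 splits.
for n in range(1, 6):
    result = split_n_times(["monolith"], splits=n - 1)
    assert len(result) == n
```

Which service gets split at each step is arbitrary in this sketch; in practice that choice is the whole question, and it is what the rest of this section is about.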

So let’s bi (two) lithify (rock) this monolith, and take it from 1 to 2 rocks.

The task of splitting the service into two parts ought to be a manageable amount of work: two is a definitively finite number, as opposed to the infinite point-cloud of “microservices”. Thus, we should search, first, for a single logical seam along which we might cleave the monolith.

a circle with the word “monolith” on it and a line through it

In many cases—as in the specific case that r0ml gave—the easiest way to articulate a boundary between two parts of a piece of software is to conceptualize a “frontend” and a “backend”. In the absence of any other clear boundary, the question “does this functionality belong in the front end or the back end” can serve as a simple razor for separating the service.

Remember: the reason we’re splitting this software up is because we are also splitting the team up. You need to think about this in terms of people as well as in terms of functionality. What division between halves would most reduce the number of lines of communication, to reduce the quadratic increase in required communication relationships that comes along with the linear increase in team size? Can you identify two groups who need to talk amongst themselves, but do not need to talk with all of each other?2
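To make the quadratic growth concrete: a group of n people has n(n-1)/2 possible pairwise communication relationships. The sketch below compares one undivided team against the same team split into two halves. The `liaison_links` parameter is an assumption, standing in for however many cross-team relationships the split leaves in place.

```python
def pairwise_links(people):
    """Number of distinct pairs in a group of `people`: n * (n - 1) / 2."""
    return people * (people - 1) // 2


def links_after_split(people, liaison_links=1):
    """Links once the team is halved, plus an assumed number of cross-team links."""
    first = people // 2
    second = people - first
    return pairwise_links(first) + pairwise_links(second) + liaison_links


# Print team size, links in one undivided team, links after one split.
for size in (4, 8, 16, 32):
    print(size, pairwise_links(size), links_after_split(size))
```

For a team of 16, this works out to 120 relationships undivided versus 57 after one split, assuming a single liaison relationship between the halves.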

two circles with the word “hemilith” on them and a double-headed arrow between them

Once we’ve achieved this separation, we no longer have a single rock; we have two half-rocks: hemiliths, to borrow from the same Greek root that gave us “monolith”.

But we are not finished, of course. Two may not be the correct number of services to end up with. Now, we ask: can we split the frontend into a frontend and backend? Can we split the backend? If so, then we now have four rocks in place of our original one:

four circles with the word “tetartolith” on them and double-headed arrows connecting them all

You might think that this would be a “tetralith” for “four”, but as they are of a set, they are more properly a tetartolith.

Repeat As Necessary

At some point, you’ll hit a point where you’re looking at a service and asking “what are the two pieces I could split this service into?”, and the answer will be “none, it makes sense as a single piece”. At that point, you will know that you’ve achieved services of the correct size.
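As a sketch, the procedure is a simple recursion that stops when nobody can name a sensible seam. The `find_seam` callback is hypothetical: in reality it is a judgment call by the team about both code and people, not something a function can compute. The service names in `SEAMS` are likewise invented for the example.

```python
def bilithify(service, find_seam):
    """Recursively split `service` until `find_seam` reports no sensible split.

    `find_seam` is a hypothetical stand-in for the team's judgment: it returns
    a pair of smaller services, or None if the service makes sense as one piece.
    """
    halves = find_seam(service)
    if halves is None:
        return [service]  # atomic: this service is already the right size
    left, right = halves
    return bilithify(left, find_seam) + bilithify(right, find_seam)


# Illustrative seams only; these names are made up for this example.
SEAMS = {
    "monolith": ("frontend", "backend"),
    "frontend": ("web-ui", "api-gateway"),
    "backend": ("billing", "storage"),
}

# dict.get returns None for services with no recorded seam, ending the recursion.
services = bilithify("monolith", SEAMS.get)
print(services)
```

With these made-up seams, the recursion ends with four services, which is the tetartolith from the diagram above.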

One thing about this insight that may disappoint some engineers is the realization that service-oriented architecture is much more an engineering management tool than it is an engineering tool. It’s fun to think that “microservices” will let you play around with weird technologies and niche programming languages consequence-free at your day job because those can all be “separate services”, but that was always a fantasy. Integrating multiple technologies is expensive, and introducing more moving parts always introduces more failure points.

Advanced Techniques: A Multi-Stack Microservice Environment

You’ll note that splitting a service heavily implies that the resulting services will still all be in the same programming language and the same tech stack as before. If you’re interested in deploying multiple stacks (languages, frameworks, libraries), you can proceed to that outcome via bilithification, but it is a multi-step process.

First, you have to complete the strategy that I’ve already outlined above. You need to get to a service that is sufficiently granular that it is atomic; you don’t want to split it up any further.

Let’s call that service “X”.

Second, you identify the additional complexity that would be introduced by using a different tech stack. It’s important to be realistic here! New technology always seems fun, and if you’re investigating this, you’re probably predisposed to think it would be an improvement. So identify your costs first and make sure you have them enumerated before you move on to the benefits.

Third, identify the concrete benefits to X’s problem domain that the new tech stack would provide.

Finally, do a cost-benefit analysis where you make sure that the costs from step two are clearly exceeded by the benefits from step three. If you can’t readily identify that in advance (sometimes experimentation is required), then you need to treat this as an experiment, rather than as a strategic direction, until you’ve had a chance to answer whatever questions you have about the new technology’s benefits.

Note, also, that this cost-benefit analysis requires not only doing the technical analysis but getting buy-in from the entire team tasked with maintaining that component.
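The steps above, plus the buy-in requirement, can be written down as a small decision procedure. Everything here is hypothetical: the numeric scores, the field names, and the `evaluate_new_stack` function are assumptions for illustration, and real costs and benefits are rarely this easy to quantify. Treating missing buy-in as a rejection is also my simplification.

```python
from dataclasses import dataclass, field


@dataclass
class StackProposal:
    service: str                                   # the atomic service "X"
    costs: dict = field(default_factory=dict)      # enumerate these first
    benefits: dict = field(default_factory=dict)   # concrete benefits to X's domain
    benefits_measured: bool = False                # False while benefits are a guess
    team_buy_in: bool = False                      # the whole maintaining team agrees


def evaluate_new_stack(proposal):
    """Return 'adopt', 'experiment', or 'reject' for a proposed stack change."""
    if not proposal.costs:
        return "reject"  # costs must be enumerated before benefits count
    if not proposal.team_buy_in:
        return "reject"  # assumption: no buy-in means no adoption
    if not proposal.benefits_measured:
        return "experiment"  # an experiment, not yet a strategic direction
    total_cost = sum(proposal.costs.values())
    total_benefit = sum(proposal.benefits.values())
    return "adopt" if total_benefit > total_cost else "reject"


# Made-up scores in arbitrary units, only to exercise the function.
proposal = StackProposal(
    service="X",
    costs={"new build pipeline": 5, "on-call training": 3},
    benefits={"faster batch jobs": 6},
    benefits_measured=False,
    team_buy_in=True,
)
print(evaluate_new_stack(proposal))  # prints "experiment"
```

The ordering of the checks mirrors the advice: costs come first, and unverified benefits can only ever justify an experiment.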

Conclusion

To summarize:

  1. Always start with a monolith.
  2. When the monolith is too big, both in terms of team and of codebase, split the monolith in half until it doesn’t make sense to split it in half any more.
  3. (Optional) Carefully evaluate services that want to adopt new technologies, and keep the costs of doing that in mind.

There is, of course, a world of complexity beyond this associated with managing the cost of a service-oriented architecture and solving specific technical problems that arise from that architecture.

If you remember the tetartolith, though, you should at least be able to get to the right number and size of services for your system.


Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support me on Patreon as well! I am also available for consulting work if you think your organization could benefit from more specificity on the sort of insight you've seen here.


  1. AKA “my father” 

  2. Denouncing “silos” within organizations is so common that it’s a tired trope at this point. There is no shortage of vaguely inspirational articles across the business trade-rag web and on LinkedIn exhorting us to “break down silos”, but silos are the point of having an organization. If everybody needs to talk to everybody else in your entire organization, if no silos exist to prevent the necessity of that communication, then you are definitionally operating that organization at minimal efficiency. What people actually want when they talk about “breaking down silos” is a re-org into a functional hierarchy rather than a role-oriented hierarchy (i.e., “this is the team that makes and markets the Foo product, this is the team that makes and markets the Bar product” as opposed to “this is the sales team”, “this is the engineering team”). But that’s a separate post, probably. 

A Response to Jacob Kaplan-Moss’s “Incompetent But Nice”

What can managers do about employees who are easy to work with, and are trying their best, but can’t seem to get the job done?

Jacob Kaplan-Moss has written a post about one of the most stressful things that can happen to you as a manager: when someone on your team is getting along well with the team, apparently trying their best, but unable to do the work. Incompetent, but nice.

I have some relevant experience with this issue. Over the course of my career:

  1. I have been this person, more than once. I have both resigned, and been fired, as a result.
  2. I’ve been this person’s manager, and had to fire them after being unable to come up with a training plan that would allow them to improve.
  3. I’ve been an individual contributor on a team with this person, trying to help them improve.

So I can speak to this issue from all angles, and I can confirm: it is gut-wrenchingly awful no matter where you are in relation to it. It is social stress in its purest form. Everyone I’ve been on some side of this dynamic with is someone that I’d love to work with again. Including the managers who fired me!

Perhaps most of all, since I am not on either side of an employer/employee relationship right now1, I have some emotional distance from this kind of stress which allows me to write about it in a more detached way than others with more recent and proximate experience.

As such, I’d like to share some advice, in the unfortunate event that you find yourself in one of these positions and are trying to figure out what to do.

I’m going to speak from the perspective of the manager here, because that’s where most of the responsibility for decision-making lies. If you’re a teammate, you probably need to wait for a manager to ask you to do something; if you’re the underperformer, you’re probably already trying as hard as you can to improve. But hopefully this reasoning process can help you understand what the manager is trying to do here, and find the bits of the process you can take some initiative to help with.

Step 0: Preliminaries

First let’s lay some ground rules.

  1. Breathe. Maintaining an explicit focus on regulating your own mood is important, regardless of whether you’re the manager, the teammate, or the underperformer.
  2. Accept that this may be intractable. You’re going to do your best in this situation but you are probably choosing between bad options. Nevertheless you will need to make decisions as confidently and quickly as possible. Letting this situation drag on can be a recipe for misery.
  3. You will need to do a retrospective.2 Get ready to collect information as you go through the process to try to identify what happened in detail so you can analyze it later. If you are the hiring manager, that means that after you’ve got your self-compassion together and your equanimous professional mood locked in, you will also need to reflect on the fact that you probably fucked up here, and get ready to try to improve your own skills and processes so that you don’t end up in this situation again.

I’m going to try to pick up where Jacob left off, and skip over the “easy” parts of this process. As he puts it:

The proximate answer is fairly easy: you try to help them level up: pay for classes, conferences, and/or books; connect them with mentors or coaches; figure out if something’s in the way and remove the blocker. Sometimes folks in this category just need to learn, and because they’re nice it’s easy to give them a lot of support and runway to level up. Sometimes these are folks with things going on in their lives outside work and they need some time (or some leave) to focus on stuff that’s more important than work. Sometimes the job has requirements that can be shifted or eased or dropped – you can match the work to what the person’s good at. These situations aren’t always easy but they are simple: figure out the problem and make a change.

Step 1: Figuring Out What’s Going On

There are different reasons why someone might underperform.

Possibility: The person is over-leveled.

This is rare. For the most part, pervasive under-leveling is more of a problem in the industry. But, it does happen, and when it happens, what it looks like is someone who is capable of doing some of the work that they’re assigned, but getting stuck and panicking with more challenging or ambiguous assignments.

Moreover, this particular organizational antipattern, the “nice but incompetent” person, is more likely to be over-leveled, because if they’re friendly and getting along with the team, then it’s likely they made a good first impression and made the hiring committee want to go to bat for them as well, and may have done them the “favor” of giving them a higher level. This is something to consider in the retrospective.

Now, Jacob’s construction of this situation explicitly allows for “leveling up”, but the sort of over-leveling that can be resolved with a couple of conference talks, books, and mentoring sessions is just a challenge. We’re talking about persistent resistance to change here, and that sort of over-leveling means that they have skipped an entire rung on the professional development ladder, and do not have the professional experience at this point to bridge the gaps between their experience and the ladder.

If this is the case, consider a demotion. Identify the aspects of the work that the underperformer can handle, and try to match them to a role. If you are yourself the underperformer, proactively identifying what you’re actually doing well at and acknowledging the work you can’t handle, and identifying any other open headcount for roles at that level can really make this process easier for your manager.

However, be sure to be realistic. Are they capable enough to work at that reduced level? Does your team really have a role for someone at that level? Don’t invent makework in the hopes that you can give them a bootleg undergraduate degree’s worth of training on the job; they need to be able to contribute.

Possibility: Undiagnosed health issues

Jacob already addressed the “easy” version here: someone is struggling with an issue that they know about and they can ask for help, or at least you can infer the help they need from the things they’ve explicitly said.

But the underperformer might have something going on which they don’t realize is an issue. Or they might know there’s an issue but not think it’s serious, or not be able to find a diagnosis or treatment. Most frequently, this is a mental health problem, but it can also be things like unexplained fatigue.

This possibility is the worst. Not only do you feel like a monster for adding insult to injury, there’s also a lot of risk in discussing it.

Sometimes, you feel like you can just tell3 that somebody is suffering from a particular malady. It may seem obvious to you. If you have any empathy, you probably want to help them. However, you have to be careful.

First, some illnesses qualify as disabilities. Just because the employee has not disclosed their disability to you, does not mean they are unaware. It is up to the employee whether to tell you, and you are not allowed to require them to disclose anything to you. They may have good reasons for not talking about it.

Beyond highlighting some relevant government policy, I am not equipped to advise you on how to handle this. You probably want to loop in someone from HR and/or someone from Legal, and have a meeting to discuss the particulars of what’s happening and how you’d like to try to help.

Second, there’s a big power differential here. You have to think hard about how to broach the subject; you’re their boss, telling them that you think they’re too sick to work. In this situation, they explicitly don’t necessarily agree, and that can quite reasonably be perceived as an attack and an insult, even if you’re correct. Hopefully the folks in Legal or HR can help you with some strategies here; again, I’m not really qualified to do anything but point at the risks involved and say “oh no”.

The “good” news here is that if this really is the cause, then there’s not a whole lot to change in your retrospective. People get sick, their families get sick, it can’t always be predicted or prevented.

Possibility: Burnout

While it is certainly a mental health issue in its own right, burnout is specifically occupational and you can thus be a bit more confident as a manager recognizing it in an employment context.

This is better to prevent than to address, but if you’ve got someone burning out badly enough to be a serious performance issue, you need to give them more leave than they think they need, and you need to do it quickly. A week or two off is not going to cut it.

In my experience, this is the most common cause of an earnest but agreeable employee underperforming, and it is also the one we are most reluctant to see. Each step on the road to burnout seems locally reasonable.

Just push yourself a little harder. Just ask for a little overtime. Just until this deadline; it’s really important. Just until we can hire someone; we’ve already got a req open for that role. Just for this one client.

It feels like we should be able to white-knuckle our way through “just” a little inconvenience. It feels that way both individually and collectively. But the impacts are serious, and they are cumulative.

There are two ways this can manifest.

Usually, it’s a gradual decline that you can see over time, and you’ll see this in an employee that was previously doing okay, but now can’t hack it.

However, another manifestation is someone who was burned out at their previous role, did not take any break between jobs, and has stepped into a moderately stressful role which could be a healthy level of challenge for someone refreshed and taking frequent enough breaks, but is too demanding for someone who needs to recover.

If that’s the case, and you feel like you accurately identified a promising candidate, it is really worthwhile to get that person the break that they need. Just based on vague back-of-the-envelope averages, it would typically be about as expensive to find a way to wrangle 8 weeks of extra leave as to go through the whole hiring process for a mid-career software engineer from scratch. However, that math assumes that the morale cost of firing someone is zero, and that the morale benefit of being seen to actually care about your people and proactively treat them well is also zero.

If you can positively identify this as the issue, then you have a lot of work to do in the retrospective. People do not spontaneously burn out by themselves. This is a management problem, and this person is likely to be the first domino to fall. You may need to make some pretty big changes across your team.

Possibility: Persistent Personality Conflict

It may be that someone else on the team is actually the problem. If the underperformer is inconsistent, observe the timing of the inconsistencies: does it get much worse when they’re assigned to work closely with someone else? Note that “personality conflict” does not mean that the other person is necessarily an asshole; it is possible for communication styles to simply fail to mesh due to personality differences which cannot be easily addressed.

You will be tempted to try to reshuffle responsibilities to keep these team members further apart from each other.

Don’t.

People with a conflict that is persistently interfering in day-to-day work need to be on different teams entirely. If you attempt to separate them but have them working closely, then inevitably one is going to get the higher-status projects and the other is going to be passed over for advancement. Or they’re going to end up drifting back into similar areas again.

Find a way to transfer them internally far enough away that they can have breathing room away from this conflict. If you can’t do that, then a firing may be your best option.

In the retrospective, it’s worth examining the entire team dynamic at this point, to see if the conflict is really just between two people, or if it’s more pervasive than that and other members of the team are just handling it better.

Step 2: Help By Being Kind, Not By Being Nice

Again, we’re already past the basics here. You’ve already got training and leave and such out of the way. You’re probably going to need to make a big change.

Responding to Jacob, specifically:

Firing them feels wrong; keeping them on feels wrong.

I think that which is right is heavily context-dependent. But, realistically, firing is the more likely option. You’ve got someone here who isn’t performing adequately, and you’ve already deployed all the normal tools to try to dig yourself out of that situation.

So let’s talk about that first.

Firing

Briefly: if you need to fire them, just fire them, and do it quickly.

Firing people sucks, even obnoxious people. Not to mention that this situation we’re in is about a person that you like! You’ll want to be nice to this person. The person is also almost certainly trying their best. It’s pretty hard to be agreeable to your team if you’re disappointing that team and not even trying to improve.

If you find yourself thinking “I probably need to fire this person but it’s going to be hard on them”, the thought “hard on them” indicates you are focused on trying to help them personally, and not what is best for your company, your team, or even the employee themselves professionally. The way to show kindness in that situation is not to keep them in a role that’s bad for them and for you.

It would be much better for the underperformer to find a role where they are not an underperformer, and at this point, that role is probably not on your team. Every minute that you keep them on in the bad role is a minute that they can’t spend finding a good one.

As such, you need to immediately shift gears towards finding them a soft landing that does not involve delaying whatever action you need to take.

Being kind is fine. It is not even a conflict to try to spend some company resources to show that kindness. It is in the best interest of all concerned that anyone you have to fire or let go is inclined to sing your praises wherever they end up. The best brand marketing in the world for your jobs page is a diaspora of employees who wish they could still be on your team.

But being nice here, being conflict-avoidant and agreeable, only drags out an unpleasant situation. The way to spend company resources on kindness is to negotiate with your management for as large a severance package as you can manage, and give as much runway as possible for their job search, and clarity about what else you can do.

For example, are you usable as a positive reference? I.e., did they ever have a period where their performance was good, which you could communicate to future employers? Be clear.

Not-Firing

But while I think it’s the more likely option, it’s certainly not the only option. There are certainly cases where underperformers really can be re-situated into better roles, and this person could still find a good fit within the team, or at least, the company. You think you’ve solved the mystery of the cause of the problem here, and you need to make a change. What then?

In that case, the next step is to have a serious conversation about performance management. Set expectations clearly. Ironically, if you’re dealing with a jerk, you’ve probably already crisply communicated your issues. But if you’re dealing with a nice person, you’re more likely to have a slow, painful drift into this awkward situation, where they probably vaguely know that they’re doing poorly but may not realize how poorly.

If that’s what’s happening, you need to immediately correct it, even if firing isn’t on the table. If you’ve gotten to this point, some significant action is probably necessary. Make sure they understand the urgency of the situation, and if you have multiple options for them to consider, give them a clear timeline for how long they have to make a decision.

As I detailed above, things like a down-leveling or extended leave might be on the table. You probably do not want to do anything like that in a normal 1x1: make sure you have enough time to deal with it.

Remember: Run That Retrospective!

In most cases, when this sort of situation develops, it is a clear management failure. If you’re the manager, you need to own it.

Try to determine where the failure was. Was it hiring? Leveling? A team culture that encourages burnout? A poorly-vetted internal transfer? Accepting low performance for too long, not communicating expectations?

If you can identify an actionable systemic cause, then you need to make sure there is time and space to make the necessary changes, and soon. It’s stressful to have to go through this process with one person, but having to do it repeatedly is far worse: any dynamic that can persistently damage multiple people’s productivity is almost definitionally a team-destroyer.


  1. REMEMBER TO LIKE AND SUBSCRIBE 

  2. I know it’s a habit we all have from industry jargon — heck, I used the term in my own initial Mastodon posts about this — but “postmortem” is a fraught term in the best of circumstances, and super not great when you’re talking about an actual person who has recently been fired. Try to stick to “retrospective” or “review” when you’re talking about this process with your team. 

  3. Now, I don’t know if this is just me, but for reasons that are outside the scope of this post, when I was in my mid teens I got a copy of the DSM-IV, read the whole thing back to back, and studied it for a while. I’ve never had time to catch up to the DSM-5 but I’m vaguely aware of some of the changes and I’ve read a ton of nonfiction related to mental health. As a result of this self-education, I have an extremely good track record of telling people “you should see a psychiatrist about X”. I am cautious about this sort of thing and really only tell close friends, but as far as I know my hit-rate is 100%.