Hugo Martins

On Signal's Outage

Last January 15th, Signal had an outage of epic proportions. WhatsApp’s mistakes in communicating its privacy changes, along with the mere fact that it is considering modifying its privacy policies, led to a surge of new users trying to use Signal - potentially as many as 7.5 million. This surge was too much for Signal’s infrastructure to bear.

Signal’s server-side code has pretty much been unchanged throughout the outage, which indicates that either they are running different server code on their infrastructure, which is highly likely given that the last modification to Signal’s server code was in April 2020, or their problems were mostly caused by their clients. Or it was a combination of both. One can only wonder.

Although there’s no official explanation as to why this outage happened, at least none that I’m aware of, some recent commits in the Android client, made almost immediately after the outage started, suggest that the clients were actually worsening the situation by constantly retrying after failures caused by the server rejecting their updates. This was effectively creating even more load for the servers to bear. There have also been some changes related to session resets and automatic retries, behind feature flags. I can only wonder if the clients did more harm than good in an already bad situation on the server side.
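To make the retry problem concrete, here is a minimal sketch in Python of the usual remedy: capped exponential backoff with random jitter, plus a hard limit on attempts. This is not Signal's code, and I'm not claiming it is what their commits do. Every name in it (`RejectedError`, `backoff_delay`, `send_with_backoff`) and every default value is a hypothetical chosen for illustration.

```python
import random
import time


class RejectedError(Exception):
    """Hypothetical error meaning the server rejected the request."""


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Return a randomized delay in seconds for a zero-based attempt number.

    The upper bound doubles with each attempt and is capped at `cap`.
    Picking a random point below that bound (jitter) spreads clients out
    so they don't all retry at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def send_with_backoff(send, max_attempts=5, sleep=time.sleep, **delay_opts):
    """Call send() until it succeeds or max_attempts calls have failed."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RejectedError:
            if attempt == max_attempts - 1:
                raise  # give up instead of hammering the server forever
            sleep(backoff_delay(attempt, **delay_opts))
```

A client that retries immediately and forever multiplies the load exactly when the server can least afford it. With something like the above, each failing client backs off further on every attempt and eventually stops.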

Signal’s outage exposes a deeper issue though. In a world of huge technological monopolies, smaller competitors face a multitude of problems. Not only can user adoption be difficult, but supporting the kind of load that big technology companies’ servers handle can quickly become unmanageable. This outage was most probably caused by infrastructure issues, but it was quickly exacerbated by software. I’d guess that the infrastructure didn’t scale fast enough for the number of users flocking to Signal and, in the end, it broke, or a bottleneck was uncovered by the millions of new users signing up and using Signal. Regardless of what the technical reason actually was, it poses an interesting question: in a world where users changing from one platform to another causes an outage, where are users going to go if they are unsatisfied with their current option?

While I use Signal, and was considering moving completely to it after WhatsApp’s rampant disregard for keeping up even a facade of caring about privacy, this outage has probably driven away a lot of users who were going to move from WhatsApp and were left with a bitter taste in their mouths.

It is obvious that these smaller competitors, with less capital to invest, need infrastructure that supports their size, with scalability considered to a certain degree. However, they will never be able to keep infrastructure lying around for user increases that are orders of magnitude larger than their existing user base, and this is actually…fine. But if these smaller companies can’t handle large increases in their user base when users are willing to leave their competitors, users are left in a grey area: either stick with the application they already use, or accept that there are differences in availability between these two companies and that, most probably, those differences will always be there.

This is the power that monopolies have over consumers’ choices. While they don’t actively force you to stay on their platform, when you look around - more often than not - you’ll find a lot of services, but they won’t be able to advertise the same features, availability or stability.