
Gergely Orosz
over 4 years ago
The twist on the ongoing Facebook outage is that infra/oncall teams at FB likely use their own in-house chat service on top of t.co/HTjjifnz2J or FB workspaces to communicate while resolving outages. Now this is also down. Good thing WhatsApp is still up. Oh wait…
When Uber had its own, self-hosted chat service, this was a major point of discussion. What happens if the chat service goes down while coordinating an outage? Or what if it has an outage? Most companies now have a different problem. What if Slack/PagerDuty goes down at that time?
The good thing for most companies using third parties is that even if the third party goes down, you still have your internal systems to e.g. check who is oncall, and their phone number. That also went down for Facebook. This is a horrible day to be oncall there, and a good one to be OOO.
Just in: coordination is happening via… IRC! Facebook planned ahead for the unlikely case of their infra going down. Which it kind of did. Wishing ppl oncall best of luck getting this resolved. It’s not the most fun to be in their seat, but it’s a story to tell for years.
Fun fact: if this type of global outage happened at Amazon, a top exec (e.g. VP, or maybe above) would likely be on that coordination call AFAIK. They'd drop any meeting they'd be in and jump in to see what's happening - it's the culture there. Less common at other big tech.
After an Amazon exec joined Uber, in their first week there was this L5, low-impact outage (the highest sev, meaning rides were impacted). It was a pretty "standard" outage then for me. This person N levels up the chain shows up on the incident Zoom with engineers. We all go: t.co/mqQE00LPQo
Another twist: Facebook's buildings use the Facebook domain for badge authentication: WOW. Just WOW. You can't fall back to IRC there... t.co/YZIycpUF9Q