The overnight shift consisted of security and an unaccompanied technician who had only been on the job for a week.
That poor bastard.
This is interesting. What I’m hearing is they didn’t have proper anti-affinity rules in place, or backups for mission-critical equipment.
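For anyone unfamiliar with the term: anti-affinity just means "don't put two replicas of the same thing on the same host, so one host failure can't take them all out." A rough sketch of the placement check, with made-up host and service names:

```python
# Rough sketch of an anti-affinity placement check: refuse to put two
# replicas of the same service on the same host. Host and service
# names here are hypothetical.

def place(replicas, hosts, placements=None):
    """Greedily assign each replica to a host no sibling already uses."""
    placements = dict(placements or {})
    for replica in replicas:
        used = set(placements.values())
        free = [h for h in hosts if h not in used]
        if not free:
            raise RuntimeError("anti-affinity unsatisfiable: not enough hosts")
        placements[replica] = free[0]
    return placements


# Two replicas, two hosts: each lands on its own host.
print(place(["db-0", "db-1"], ["host-a", "host-b"]))
```

Real schedulers (Kubernetes, VMware DRS, etc.) express this as declarative rules rather than code, but the constraint being enforced is the same.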
The data center did some dumb stuff, but that shouldn’t matter if you set up your application failover properly. The architecture, and the failure to test failovers, are the real issues here.
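The application-level failover this comment is talking about can be sketched in a few lines: probe the primary, and route to a replica when it's down. All the endpoint names below are hypothetical; a real implementation would probe over the network and handle partial failures, but the core routing logic looks like this:

```python
# Minimal sketch of application-level failover: try the primary
# endpoint first, fall back to a replica when it is unhealthy.
# Endpoint names are made up for illustration.

class Endpoint:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def is_healthy(self):
        # A real check would probe the service over the network
        # with a timeout; here we just read a flag.
        return self.healthy


def pick_endpoint(endpoints):
    """Return the first healthy endpoint, or raise if none are up."""
    for ep in endpoints:
        if ep.is_healthy():
            return ep
    raise RuntimeError("all endpoints down: failover exhausted")


primary = Endpoint("dc1-primary", healthy=False)  # simulated outage
replica = Endpoint("dc2-replica", healthy=True)

chosen = pick_endpoint([primary, replica])
print(chosen.name)  # routes around the dead primary
```

The point the comment makes stands: if you never actually exercise this path (e.g. by periodically killing the primary on purpose), you have no idea whether it works.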
Surprised a company of their scale, and with such a reliance on stability, isn’t running their own data centres. I guess they trusted their failover process enough not to care.
It was poor design. Poor design caused a two-day outage. When you’ve got an H/A control plane designed, deployed in production, and running services, and you are NOT actively using it for new services, let alone porting old services to it, you’ve got piss-poor management with no understanding of risk.
Mr Magoo is the CEO.