• JakenVeina@lemm.ee
    11 months ago

    the overnight shift consisted of security and an unaccompanied technician who had only been on the job for a week.

    That poor bastard.

  • draughtcyclist@programming.dev
    11 months ago

    This is interesting. What I’m hearing is they didn’t have proper anti-affinity rules in place, or backups for mission-critical equipment.

    The data center did some dumb stuff, but that shouldn’t matter if you set up your application failover properly. The architecture, and the failure to test failovers, are the real issue here.
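    Anti-affinity here just means no two replicas of the same service may share a failure domain (rack, room, data center). A minimal sketch of that check in Python — the names and tuple layout are illustrative, not from any real scheduler:

    ```python
    from collections import defaultdict

    def anti_affinity_violations(placements):
        """placements: iterable of (service, replica_id, failure_domain).
        Returns the set of services with two or more replicas in the
        same failure domain, i.e. services that would go down together."""
        domains_seen = defaultdict(set)
        violations = set()
        for service, _replica, domain in placements:
            if domain in domains_seen[service]:
                violations.add(service)
            domains_seen[service].add(domain)
        return violations

    # Both gateway replicas in rack-a: one rack failure kills the service.
    print(anti_affinity_violations([
        ("gateway", 1, "rack-a"),
        ("gateway", 2, "rack-a"),
        ("db", 1, "rack-a"),
        ("db", 2, "rack-b"),
    ]))  # -> {'gateway'}
    ```

    Real orchestrators express the same rule declaratively (e.g. Kubernetes pod anti-affinity), but the invariant being enforced is exactly this one.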

  • Nighed@sffa.community
    11 months ago

    Surprised a company of their scale, and with such a reliance on stability, isn’t running their own data centres. I guess they trusted their failover process enough not to care.

  • Dr. Dabbles@lemmy.world
    11 months ago

    It was poor design. Poor design caused a 2-day outage. When you’ve got an H/A control plane designed, deployed in production, and running services, and you ARE NOT actively using it for new services, let alone porting old services to it, you’ve got piss-poor management with no understanding of risk.