submitted 5 months ago by MicroWave@lemmy.world to c/news@lemmy.world

As the world recovers from the largest IT outage in history, it shows the danger of one point of failure in IT infrastructure

A global IT failure wreaked havoc on Friday, grounding flights and disrupting everything from hospitals to government agencies. Over all the chaos hung a question: how did a flawed update to Microsoft Windows software bring large swaths of society to a screeching halt?

The problem originated with an Austin, Texas-based cybersecurity firm called CrowdStrike, relied upon by most of the global technology industry, including Microsoft, for its Falcon program, which blocks the execution of malware and cyber-attacks. Falcon protects devices by securing access to a wide range of internal systems and automatically updating its defenses – a level of integration that means if Falcon falters, the computer is close behind. After CrowdStrike updated Falcon on Thursday night, Microsoft systems and Windows PCs were hit with a “blue screen of death” and rendered unusable as they were trapped in a recovery boot loop.

Microsoft is a juggernaut with significant market power, dominating cloud-computing infrastructure across Europe and the United States. So it wasn’t just computers that were affected, but servers and a host of other systems as well. Overwhelming requests from users, devices, services and businesses ushered in a cascading series of failures with Microsoft products – namely Azure Cloud and Microsoft 365. Failures plaguing Azure led to additional but separate disruptions with 365 services. A giant clusterfuck ensued.

Ooops@feddit.org 103 points 5 months ago

No... the Crowdstrike debacle primarily shows the dangers of today's corporate culture in software development.

Ship as fast as possible, fix issues later if necessary...

dugmeup@lemmy.world 58 points 5 months ago

Yup. Push to prod!

This is the Boeing debacle in software land. Kill the engineering and pay the executives. QA? Testing? Strict standards? People? Naaah, more conferences! More logos on F1 cars!

mynameisigglepiggle@lemmy.world 1 point 5 months ago

I totally agree. But without knowing a bit more about the specifics, I can't help but think that just maybe the updating mechanism could have simply rolled back an update if it caused a BSOD?

Seems like that infrastructure is really the biggest oversight and people would have been none the wiser.
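For illustration, here is a minimal sketch of the rollback idea described above. It assumes a hypothetical updater that counts failed boots and reverts to the last version that booted cleanly. The `UpdateManager` class, its fields, and the two-failure threshold are all made up for this example; nothing here reflects how Falcon or Windows actually handle updates.

```python
from dataclasses import dataclass


@dataclass
class UpdateManager:
    """Hypothetical updater that reverts an update after repeated failed boots."""

    active: str                # version that will be loaded on the next boot
    last_known_good: str       # most recent version that booted cleanly
    max_failed_boots: int = 2  # assumed threshold before rolling back
    failed_boots: int = 0      # consecutive failed boots on the active version

    def install(self, version: str) -> None:
        # Stage a new version. It is not trusted until a boot succeeds.
        self.active = version
        self.failed_boots = 0

    def record_boot(self, succeeded: bool) -> None:
        if succeeded:
            # A clean boot promotes the active version to "known good".
            self.last_known_good = self.active
            self.failed_boots = 0
            return
        self.failed_boots += 1
        if self.failed_boots >= self.max_failed_boots:
            # Too many crashes in a row: fall back to the last good version.
            self.active = self.last_known_good
            self.failed_boots = 0


manager = UpdateManager(active="v1", last_known_good="v1")
manager.install("v2")       # the bad update arrives
manager.record_boot(False)  # first crash, still on v2
manager.record_boot(False)  # second crash, rolled back to v1
print(manager.active)
```

The catch in practice is that the boot counter has to be stored and checked somewhere that still runs when the machine is crashing during startup, which is the hard part this sketch skips.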

Also surprised just how many things are running Windows. I thought for sure the self-checkout registers would have been some embedded Linux system.

Rentlar@lemmy.ca 23 points 5 months ago* (last edited 5 months ago)

I disagree. You are correct that the cause of the fuck-up is bad development practices. However, if every firm is being reckless with development, but only one out of a myriad of competing firms fucks up because of it, you'd take maybe one airline or hospital network offline, or something like that.

It's only because of consolidation and monopolization of the sector that an outage at such a global scale was even possible to begin with.

Wooki@lemmy.world 3 points 5 months ago

You’re partly right, as is the article.

Centralization is dangerous for security, innovation and cost (monopolies, duopolies).

this post was submitted on 20 Jul 2024
355 points (97.8% liked)
