CrowdStrike Update Results in Global Windows Outages
This past Friday, cybersecurity company CrowdStrike made global headlines when it released an update that caused many devices running the Windows operating system (OS) to crash, rendering them temporarily unusable. The update was intended to upgrade the capabilities of the CrowdStrike Falcon sensor security system, but it contained a minor bug that disrupted the update process. All Windows devices running the software that updated between 04:09 UTC and 05:27 UTC on July 19th were susceptible to the bug.
The CrowdStrike security system is a fail-closed system: when a security anomaly is identified, the system prevents normal function of the entire device. This is generally good practice in terms of security, as it prevents bad actors from taking advantage of a device with compromised security. However, this practice renders devices useless until a fix is implemented, which is why many users of devices running Windows OS experienced the dreaded blue screen of death.
Trigger warning: The infamous blue screen of death
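To make the fail-closed idea described above concrete, here is a minimal Python sketch that contrasts it with the opposite policy, fail-open. It is purely illustrative and is not CrowdStrike's implementation: the `check_integrity` function, the `state` dictionary, and its keys are hypothetical names invented for this example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def check_integrity(state: dict) -> bool:
    """Hypothetical health check (an assumption for illustration).

    Returns True only if both flags are set; raises KeyError if the
    state is malformed, which stands in for an unexpected fault.
    """
    return bool(state["signature_ok"] and state["config_valid"])


def fail_closed(state: dict) -> Decision:
    """Any failed or crashing check blocks normal operation."""
    try:
        return Decision.ALLOW if check_integrity(state) else Decision.BLOCK
    except Exception:
        # The check itself broke: assume the worst and block.
        return Decision.BLOCK


def fail_open(state: dict) -> Decision:
    """A failed check still blocks, but a crashing check lets operation continue."""
    try:
        return Decision.ALLOW if check_integrity(state) else Decision.BLOCK
    except Exception:
        # The check itself broke: keep the device usable, at a security cost.
        return Decision.ALLOW


healthy = {"signature_ok": True, "config_valid": True}
malformed = {}  # missing keys simulate a faulty update

print(fail_closed(healthy).value, fail_closed(malformed).value)
print(fail_open(healthy).value, fail_open(malformed).value)
```

The two policies differ only in how they treat a fault inside the check itself. Fail-closed trades availability for security, which is the trade-off that left affected devices unusable until a fix was applied.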
The CrowdStrike update affected approximately 8.5 million devices worldwide, or less than 1% of all Windows devices. However, CrowdStrike primarily serves large federal organizations, hospitals, and corporations such as Microsoft and Delta. As a result, a large proportion of the devices impacted by the outage belonged to organizations that perform critical functions, such as hospitals, airlines, and banks. This caused significant delays in these services, with no initial timetable for a return to normal function.
Here’s the kicker: CrowdStrike developed a solution to the problem within 78 minutes of pushing out the initial update, but most affected systems required a manual fix to resolve the issue. Hundreds of Microsoft and CrowdStrike engineers were mobilized to help customers recover. While many critical services are back up and running, it is estimated that some organizations will take up to a month to fully recover pre-outage functionality. A fix that took 78 minutes to develop will have weeks, potentially months, worth of consequences.
This situation is a good reminder for us at Moberg Analytics that even minor mistakes can have large, widespread ripples. Functionality and cybersecurity of our solutions are critically important to us, so we will continue to implement thorough verification & validation processes throughout the life cycles of our solutions.