ORLANDO, Fla. — Software outages are no longer just technical problems. They can disrupt hospitals, airlines, delivery platforms, streaming services and other businesses that depend on connected systems.
That became clear after CrowdStrike’s 2024 software update failure, which affected airlines, hospitals and other organizations around the world. The incident was caused not by a cyberattack but by a faulty update that triggered widespread outages and an estimated $5 billion in disruptions and damage.
For companies, the lesson was that even routine software changes can create major business risk when systems are deeply connected.
Karan Luniya, a senior software engineer at DoorDash, said many organizations still underestimate how fragile their infrastructure can become when systems fail under pressure.
“The danger is in assuming your systems will behave well under stress, without knowing exactly how they’ll fail,” Luniya said.
Luniya said businesses need to think less about reacting after something breaks and more about designing systems that can recover cleanly.
“It’s very easy to think of resilience like insurance,” Luniya said. “But in architecture, it’s really about shaping system behavior so failure is something you can manage.”
Luniya’s work has included infrastructure projects at DoorDash and Conviva, where reliability and performance were tied directly to customer experience. At DoorDash, he worked on moving large amounts of delivery data out of a legacy system while avoiding downtime. At Conviva, he worked on infrastructure supporting real-time analytics for streaming platforms.
Those examples show why resilience is becoming a business issue, not just an engineering concern. If companies lose access to key systems, the consequences can include delayed orders, customer frustration or disruption to critical services.
Luniya said predictability is one of the most valuable parts of resilient infrastructure.
“That kind of determinism is really underrated,” Luniya said. “It gives teams the confidence to act under pressure. More importantly, it means customers don’t notice when something’s wrong.”
The challenge is not only technical complexity. It is also organizational clarity. Companies need to know who owns each part of a system, where failures may spread and how teams should respond when something breaks.
That kind of planning matters because data loss and system disruption remain common. Studies show that nearly 85% of organizations suffered multiple data-loss incidents last year, with many reporting significant business disruption as a result.
Luniya’s rule of thumb is simple: if a company needs to page five people to figure out what broke, the system is not resilient. It is confusing.
As more businesses rely on software infrastructure they do not fully control, resilience is becoming more difficult and more important. Public cloud platforms, software-as-a-service tools and third-party APIs can create efficiency, but they also increase dependence on systems outside a company’s walls.
For Luniya, the companies best prepared for the next major outage will be the ones that treat failure as something to plan for, not something to explain afterward.
“Every system breaks eventually,” Luniya said. “The smart ones are built to break in ways you can control.”
©2026 Cox Media Group








