The AWS outage reminds us that the fate of the Internet is in the hands of a few

In the early days, we operated sites from personal and corporate servers, often located in our homes and offices. As the Internet grew, we built server racks, server rooms, and data centers. Over time, however, companies and services of all sizes have offloaded the burden of running servers to third parties, or as they are now called, cloud services.

The logic is sound. We live in houses, but we do not physically build our houses. The act of hosting and serving sites is not essential to the service they provide. Well, it is, in the sense that without servers there would be no service. But the service itself runs on the APIs, scripts, and other algorithms and programs that the company develops to deliver things like your Netflix stream, your Coinbase wallet account details, or your next potential Tinder match.

The ability of cloud services like Amazon Web Services (AWS) and Microsoft's Azure to scale up quickly (or scale back, as needed), if you pay enough, makes them a smart business decision for companies of any size. You never know, for example, when a small company will become a major one and must serve far more than five hundred simultaneous users.

This is the obvious benefit of cloud-based web services. The downside is what happened this week with AWS.

AWS failure

Huge chunks of AWS went down on Tuesday afternoon. The AWS Health Dashboard provides a good overview of the nearly seven-hour outage. The cause was not, at least according to Amazon, an attack, a hack, or a distributed denial-of-service (DDoS) attack. It was a couple of misbehaving APIs in a massive system of services.

We all live in fear of a major DDoS attack or a breach of these systems (really, any system we depend on) that brings them to their knees, but that rarely happens. When Cloudflare went down, it was initially assumed to be an attack on its systems. However, we quickly discovered that it was simply a bad software deployment, which is to say, human error.

Even with the AWS outage contained to what Amazon calls the "US-EAST-1 region," the impact was significant and widespread. It was felt on consumer-facing platforms like Disney+ and, naturally, Amazon.com and certain Alexa services.

When I posted the news on Twitter, I noticed how many people were practically smacking their foreheads and exclaiming, "So that's why I was down!"

It occurred to me that many of these users had no idea that AWS was behind their preferred consumer and business systems. No one apart from Amazon has the exact number, but recent reports claim that AWS serves millions of users. Microsoft's Azure also accounts for millions of users and most of the Fortune 500 companies. Google Cloud counts big names like Verizon, News Corp, and Facebook among its customers.

Does something have to change?

Widespread use of cloud services isn't a bad thing, though a lack of information can lead to confusion and finger-pointing, like the guy who couldn't get orders through his system and got multiple failure messages blaming his own systems rather than an outside vendor (like AWS).

The combination of the extensive reach of cloud systems and the general lack of real-time information and feedback for affected customers is cause for concern. The scale of any outage is surely worrying, especially considering that the next one is inevitable.

Gone are the days when someone's server rack crashed and only their site went down. We now have small failures in large cloud systems like AWS, Azure, and Cloudflare that set off a tsunami of outages.

Someone on Twitter asked, "What happened to scaling and load balancing?" That's a good question. AWS is built on hundreds of separate cloud server clusters and offers tons of redundancy, scaling, and load balancing. And yet, sometimes that's not enough. Complex systems can misbehave and are particularly vulnerable to software updates that collide with outdated code. That is because, as powerful and distributed as all these cloud services are, including AWS, they are still programmed, run, and maintained by fallible humans.

So how can we better educate the public and, more importantly, protect AWS, Azure, Cloudflare, and others from these kinds of failures, which result not only in downed sites and services but also in the loss of millions of dollars?

Perhaps it is time to step back and examine the integrity and security of cloud systems, in exactly the same way that we monitor water systems. None of them is too big to fail, it seems, but all are too essential to be damaged, breached, or lost.