Some Thoughts on Amazon S3's Recent Outage

July 21, 2008

@ 01:57 PM

Yesterday Amazon's S3 service had an outage that lasted about six hours. Unsurprisingly this has led to a bunch of wailing and gnashing of teeth from the very same pundits that were hyping the service a year ago. The first person to proclaim the sky is falling is Richard MacManus in his More Amazon S3 Downtime: How Much is Too Much? who writes

Today's big news is that Amazon's S3 online storage service has experienced significant downtime. Allen Stern, who hosts his blog's images on S3, reported that the downtime lasted ~~3.5~~ over 6 hours. Startups that use S3 for their storage, such as SmugMug, have also reported problems. Back in February this same thing happened. At the time RWW feature writer Alex Iskold defended Amazon, in a must-read analysis entitled Reaching for the Sky Through The Compute Clouds. But it does make us ask questions such as: why can't we get 99% uptime? Or: isn't this what an SLA is for?

Om Malik joins in on the fun with his post S3 Outage Highlights Fragility of Web Services which contains the following

Amazon’s S3 cloud storage service went offline this morning for an extended period of time — the second big outage at the service this year. In February, Amazon suffered a major outage that knocked many of its customers offline.

It was no different this time around. I first learned about today’s outage when avatars and photos (stored on S3) used by Twinkle, a Twitter-client for iPhone, vanished.
…
That said, the outage shows that cloud computing still has a long road ahead when it comes to reliability. NASDAQ, Activision, Business Objects and Hasbro are some of the large companies using Amazon’s S3 Web Services. But even as cloud computing starts to gain traction with companies like these and most of our business and communication activities are shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous web.

Even though the pundits are trying to raise a stink, the people who should be most concerned about this are Amazon S3's customers. Counter to Richard MacManus's claim, not only is there a Service Level Agreement (SLA) for Amazon S3, it promises 99.9% uptime or you get a partial refund. 6 hours of downtime sounds like a lot until you realize that 99% uptime is 8 hours of downtime a month and over three and a half days of downtime a year. Amazon S3 is definitely doing a lot better than that.

The only question that matters is whether Amazon's customers can get better service elsewhere at the prices Amazon charges. If they can't, then this is an acceptable loss which is already covered by their SLA. 99.9% uptime still means over eight hours of downtime a year. And if they can, it will put competitive pressure on Amazon to do a better job of managing their network or lower their prices.

This is one place where market forces will rectify things or we will reach a healthy equilibrium. Network computing is inherently and no amount of outraged posts by pundits will ever change that. Amazon is doing a better job than most of its customers can do on their own for cheaper than they could ever do on their own. Let's not forget that in the rush to gloat about Amazon's down time.

Now Playing: 2Pac - Life Goes On

Categories: Web Development

« Software as a Service: When Your Busines... | Home | What You Can Learn from the Facebook Red... »

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Some Thoughts on Amazon S3's Recent Outage - Dare Obasanjo's weblog