Amazon Cloud Back Online After Major Christmas Outage

Amazon Web Services suffered an outage that spanned Christmas Eve and Christmas Day and affected streaming video service from Netflix. (Photo by BCP via Flickr.)

Amazon Web Services says it has recovered from the latest major outage for its cloud computing service, which affected large customers, including Netflix and Heroku. The problems with Amazon’s Elastic Load Balancing (ELB) service began on Christmas Eve at 1:45 p.m. Pacific time, and weren’t fully resolved until 9:41 a.m. on Christmas Day, an outage of about 20 hours.

The incident was the latest in a series of outages for Amazon’s US-East-1 region, the oldest and most crowded portion of its cloud computing infrastructure. The downtime raised new questions about Amazon’s management of the region, and the prospect that load balancing problems in a single zone can undermine the benefits of hosting assets in multiple regions – a scenario that first showed up in an extended outage last summer.

This was the second AWS-related outage in six months for Netflix, one of Amazon’s most sophisticated customers, which noted on its Twitter feed that it was “terrible timing.” The streaming video service gradually restored service to different devices throughout the night, but it wasn’t until 9 a.m. Pacific on Christmas morning – more than 19 hours after the incident began – that Netflix reported full recovery.

Curiously, the streaming outage affected Netflix but not the Amazon Prime streaming service, an Amazon video offering that competes directly with Netflix. It wasn’t clear whether this was because Amazon’s streaming service doesn’t use AWS, or simply uses AWS infrastructure that was unaffected by the issues.

The ELB service matters because it is widely used to manage reliability. It allows customers to shift capacity between different availability zones, a key strategy for preserving uptime when a single data center experiences problems.

During a June 29 outage, Amazon said a bug in its Elastic Load Balancing system prevented customers from quickly shifting workloads to other availability zones. This had the effect of magnifying the impact of the outage, as customers that normally use more than one availability zone to improve their reliability (such as Netflix) were unable to shift capacity.

In a July 2 incident report from that event, Amazon outlined steps it would pursue to avoid a repeat of these issues: “As a result of these impacts and our learning from them, we are breaking ELB processing into multiple queues to improve overall throughput and to allow more rapid processing of time-sensitive actions such as traffic shifts. We are also going to immediately develop a backup DNS re-weighting that can very quickly shift all ELB traffic away from an impacted Availability Zone without contacting the control plane.”
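To make the idea of DNS re-weighting concrete, here is a minimal sketch of how traffic weights might be shifted away from one impaired zone. It is an illustration only, not Amazon's implementation: the `reweight` function, the zone names and the starting weights are all hypothetical, and the sketch assumes each zone receives traffic in proportion to its weight.

```python
def reweight(weights, impaired_zone):
    """Return new weights with the impaired zone set to zero.

    The impaired zone's share is redistributed across the remaining
    zones in proportion to their existing weights, so the total
    weight is unchanged. (Hypothetical helper, for illustration.)
    """
    if impaired_zone not in weights:
        raise KeyError(impaired_zone)

    healthy_total = sum(w for z, w in weights.items() if z != impaired_zone)
    if healthy_total == 0:
        raise ValueError("no healthy zone left to receive traffic")

    total = sum(weights.values())
    return {
        zone: 0.0 if zone == impaired_zone else w * total / healthy_total
        for zone, w in weights.items()
    }


# Hypothetical starting weights for three zones.
weights = {"zone-a": 1.0, "zone-b": 1.0, "zone-c": 2.0}

# Shift all traffic away from zone-a.
shifted = reweight(weights, "zone-a")
print(shifted)
```

The point of the sketch is that the computation needs nothing beyond the current weights, which is consistent with Amazon's stated goal of shifting traffic without contacting the control plane.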

It will be interesting to see whether Amazon’s load balancing problems were related to any of the issues identified in July, and what new solutions are devised to address them. We’ll likely see information on that front soon, as the Amazon team has been scrupulous about publishing detailed incident reports.
