Windows Azure outage: Leap year was to blame, Microsoft says

Windows Azure weathered a series of outages this week, and Microsoft has traced the whole mess back to the wonky 2012 calendar.


March 1, 2012

At around 9 p.m. on Tuesday, a series of outages roiled Windows Azure, a popular cloud computing platform run by Microsoft. According to The Register, the blackouts continued well into the next day, with some users reporting problems as late as Wednesday evening. It was a "meltdown," to borrow the terminology of one popular tech blog.

Now, Microsoft says it has sussed out the source of the problem – and it all has to do with the wonky 2012 calendar. 

"While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year," Bill Laing, a Microsoft executive, wrote on the Azure blog yesterday afternoon. "Once we discovered the issue we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue." 
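Laing's post does not say which calculation went wrong, so the following is only a generic illustration of how leap-day date arithmetic can fail, not a reconstruction of Microsoft's code. It is a minimal Python sketch with hypothetical function names, assuming a program that computes "one year from today" by bumping the year and keeping the month and day unchanged. On Feb. 29, that produces a date that does not exist.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # Hypothetical buggy approach: add 1 to the year, keep month and day.
    # For Feb. 29 this asks for Feb. 29 of a non-leap year, which is invalid.
    return date(d.year + 1, d.month, d.day)


def safe_one_year_later(d: date) -> date:
    # One possible fix: fall back to Feb. 28 when the target year
    # has no Feb. 29.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


leap_day = date(2012, 2, 29)

try:
    naive_one_year_later(leap_day)
    naive_failed = False
except ValueError:
    # Python rejects the nonexistent date Feb. 29, 2013.
    naive_failed = True

print("naive calculation failed:", naive_failed)
print("safe calculation:", safe_one_year_later(leap_day))
```

The naive version works on every other day of the year, which is why a bug of this kind can sit unnoticed until a leap day arrives.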


Laing acknowledged that "some sub-regions and customers are still experiencing issues," but he said Microsoft was working to address the problem. As of yesterday, Azure service had been restored to the "majority" of customers, Laing added. 

Not all users were easily comforted. IDG today highlighted a series of complaints on the Azure forum, including this one, from an especially dyspeptic customer: "I can't imagine the damage this has done to companies with large scale customers. I mean we have chosen Windows Azure due to the redundancy... How can we explain this to our customers?"

As Charles Babcock of InformationWeek noted, the outages also served as further evidence that cloud services can still be unstable.

"This incident is a reminder that the best practices of cloud computing operations are still a work in progress, not an established science. And while prevention is better than cure, infrastructure-as-a-service operators may not know everything they need to about these large-scale environments," Babcock wrote.
