Cloud applications fail at least as often as on-premise applications do. Minimizing downtime remains a key IT challenge, but cloud computing disasters require a slightly different approach. The added flexibility means giving up a degree of control: when disaster strikes, there’s no physical data center you can visit to investigate the problem. Those who prepare accordingly, however, find that cloud applications can perform remarkably well. Here are the top nine reasons even the best cloud applications crash (and what you can do about them):
1. Human Error
Human error is the #1 cause of cloud downtime. Even with perfect applications, cloud environments are only as good as the people who manage them. This means ongoing maintenance, tweaking, and updating must be worked into standard operating procedures. One bad maintenance script can (and does) bring down mission-critical applications.
2. Application Bugs
While the cloud does introduce a new level of complexity, application failure still outranks cloud provider outages as a leading cause of downtime. More often than not, such failures are unrelated to the cloud infrastructure that runs your applications. Traditional IT practices still apply in the cloud: continuously develop, test, and deploy your application.
3. Cloud Provider Downtime
As our recent Q2 cloud service availability report showed, cloud failures are routine. Whether the failure hits an instance, an availability zone, or an entire region, your application should be designed to expect it. This means routinely checking performance and spinning up new instances to replace terminated machines. Amazon Web Services (for example) lets you spread and load-balance an application across several availability zones (AZs), so that when one does fail, your application does not suffer.
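To make the "replace terminated machines" idea concrete, here is a minimal sketch in plain Python. It does not call any cloud provider API; the `Instance` record, the `plan_replacements` helper, and the zone names are hypothetical and only illustrate the decision logic: count the healthy instances in the zones that are still reachable, then launch replacements into whichever zone currently has the fewest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one running machine."""
    instance_id: str
    zone: str
    healthy: bool


def plan_replacements(instances, available_zones, desired_total):
    """Return the zones to launch new instances in, one entry per launch.

    Instances that are unhealthy, or that sit in a zone that is no longer
    available, are treated as lost capacity.
    """
    if not available_zones:
        raise ValueError("no availability zone left to launch into")

    # Healthy capacity per reachable zone.
    load = {zone: 0 for zone in available_zones}
    for inst in instances:
        if inst.healthy and inst.zone in load:
            load[inst.zone] += 1

    launches = []
    while sum(load.values()) < desired_total:
        # Least-loaded zone first; ties are broken alphabetically.
        target = min(sorted(load), key=lambda zone: load[zone])
        load[target] += 1
        launches.append(target)
    return launches


fleet = [
    Instance("a1", "us-east-1a", True),
    Instance("a2", "us-east-1a", True),
    Instance("b1", "us-east-1b", True),
    Instance("b2", "us-east-1b", False),   # failed health check
    Instance("c1", "us-east-1c", True),    # zone is down (see below)
    Instance("c2", "us-east-1c", True),
]
# us-east-1c is assumed to be unavailable, so it is left out of the list.
plan = plan_replacements(fleet, ["us-east-1a", "us-east-1b"], desired_total=6)
```

In a real deployment this decision is usually delegated to the provider's own auto-scaling and load-balancing services; the sketch only shows why capacity has to be tracked per zone rather than as a single total.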
4. Quality of Service
As far as consumers are concerned, a streaming video that freezes means your cloud is not working. They don’t really care (or even know) that the application is, technically speaking, still running. Keeping them satisfied means accounting for network latency, fluctuating demand, and shifting customer requirements.
5. Extreme Spikes in Customer Demand
This is actually a great example of public cloud superiority. If customer demand exceeds capacity, there’s not much you can do with on-premise IT infrastructure. In a public cloud environment, you can respond to fluctuations in customer demand by automatically scaling capacity during peaks, and then back down when demand levels off.
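As a rough illustration of scaling up during peaks and back down afterwards, the sketch below uses a simple proportional rule: size the fleet so that average CPU utilization lands near a target, within fixed bounds. The function name, the 60% target, and the size limits are assumptions made for the example, not any provider's defaults.

```python
def desired_capacity(current, cpu_percent, target_percent=60,
                     min_size=2, max_size=20):
    """Return the instance count that brings average CPU near the target.

    current      -- number of instances running now (at least 1)
    cpu_percent  -- average CPU utilization across the fleet, as an integer
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Ceiling division: round up so we never undershoot the needed capacity.
    raw = (current * cpu_percent + target_percent - 1) // target_percent
    # Clamp to the configured bounds.
    return max(min_size, min(max_size, raw))


peak = desired_capacity(4, 90)      # demand spike: scale out
quiet = desired_capacity(10, 30)    # demand levels off: scale back in
```

The upper bound matters as much as the rule itself: it caps cost if a traffic spike turns out to be an attack or a runaway client rather than real customer demand.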
6. Security Breaches (hacking, DDoS attacks, etc.)
Security is often raised as a red flag when it comes to hosting critical applications in the public cloud. Much like in on-premise environments, it’s up to you to meet regulatory and security requirements. However, the cloud does make it easier to check off a list of security requirements, since cloud providers have addressed these concerns repeatedly with hundreds of enterprise customers.
7. 3rd Party Service Failures
While the whole is greater than the sum of its parts, all it takes to bring your cloud down is one faulty 3rd party app. It’s up to you to continuously monitor these applications as well, and have a contingency plan in place for a rainy day.
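One common contingency plan for a flaky dependency is a circuit breaker: after a few consecutive failures, stop calling the 3rd party service for a while and serve a fallback (cached data, a degraded feature) instead. The class below is a minimal sketch of that pattern; the class name, thresholds, and the `primary`/`fallback` callables are all hypothetical.

```python
import time


class CircuitBreaker:
    """Skip a failing dependency for a cool-down period and use a fallback."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Circuit is open: do not touch the dependency at all.
                return fallback()
            # Cool-down elapsed: allow one trial call. A single failure
            # re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A caller would wrap each request to the external service, for example `breaker.call(fetch_live_prices, use_cached_prices)`, where both function names stand in for your own code. Monitoring still matters: the breaker keeps your application up, but someone has to notice that it has opened.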
8. Storage Failures
In a recent disaster recovery survey, storage failure was listed as a top risk to system availability. The cloud still depends on physical storage, which routinely fails, and storage problems can seriously degrade both availability and service quality. Plan for these failures by setting up dedicated cloud storage applications that maintain data resiliency and meet data retrieval requirements.
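Data resiliency usually comes down to two habits: keep more than one copy, and verify a copy before trusting it. The sketch below shows both, using plain dictionaries as stand-ins for independent storage locations; the helper names are hypothetical, and a real setup would write to separate volumes, buckets, or regions.

```python
import hashlib


def put_replicated(stores, key, data):
    """Write data and its SHA-256 digest to every store; return the copy count."""
    digest = hashlib.sha256(data).hexdigest()
    for store in stores:
        store[key] = (data, digest)
    return len(stores)


def get_verified(stores, key):
    """Return the first copy whose checksum still matches its stored digest."""
    for store in stores:
        record = store.get(key)
        if record is None:
            continue                      # this replica lost the object
        data, digest = record
        if hashlib.sha256(data).hexdigest() == digest:
            return data                   # intact copy found
    raise KeyError(f"no intact copy of {key!r}")


primary, replica = {}, {}
put_replicated([primary, replica], "report", b"q2 numbers")

# Simulate silent corruption on the primary copy.
primary["report"] = (b"corrupted", primary["report"][1])
recovered = get_verified([primary, replica], "report")
```

Reading falls through to the replica because the primary copy no longer matches its digest, which is the behavior you want when a disk fails quietly rather than loudly.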
9. Lack of Cloud Disaster Recovery (DR) Procedures
Although disaster recovery has been common practice in physical data centers for decades, cloud DR has only recently come under scrutiny. Few realize that customers are solely responsible for application availability. Cloud providers can help you develop failover and recovery procedures, but it’s up to you to integrate them into your application.
This list was originally published on eWeek as a slide show titled Nine Common Reasons Cloud Systems Crash.