In the cloud, everything is perfect until it all goes wrong

Thursday, 2 March, 2017 - 12:07

Clouds are everywhere: public, private, hybrid… and with prices dropping almost daily they are becoming more affordable for businesses of all sizes. When everything is working fine, life is good – who cares where their data is stored, assuming it is encrypted, secure, accessible, backed up and, most importantly, recoverable? Ah, that annoying word, recoverable – even in the cloud, data recoverability is critical.

Traditionally backup has been slow, painful, expensive and often unreliable. How much has your company spent on backups over the past 2, 5, 10, 15 years? How confident are you that you can restore all the data from your last backup? 

Would you describe backup as simple, trivial and a ‘no-brainer’, which just works with no need to manage anything about the process – ever again?

Or are you, like the vast majority of companies, struggling with backup complexity and the reliability of restores? What would happen if a catastrophic event occurred right now? How long could you afford to have your systems down? How much data are you prepared to lose?

Who is happy to take a complex process that few have mastered and let another company (a cloud provider) assume full responsibility for the protection and recoverability of their data?

Cloud crash – your problem!

What happens if a private or public cloud crashes?  Are your customers concerned that their data is lost forever? What will happen to their business?

What has your organisation done to protect its most important asset – data? If and when the big crash comes – employees, shareholders and banks are unlikely to say, “Oh well, XYZ cloud provider said they backed up, but due to circumstances beyond their control all our data is unrecoverable…”.

Moving infrastructure to the cloud delivers substantial benefits for most; however, an organisation still needs to take responsibility for recoverability. Many of the customers and resellers we speak to share one concern about the cloud: “What if the worst DOES happen and the cloud crashes?” How diligent have they been in testing recoverability and the ability to restore operations quickly and efficiently?

Migrating real-time

Before worrying about cloud protection and recovery, consider how to get company data into the cloud. Traditional migration products are expensive, painful and highly disruptive for both physical and virtual servers.

Enter the new breed of real-time recovery solutions that deliver near-zero-impact migrations. These protect data in near real time, taking a backup every 15 minutes, even for complex databases. In most businesses, only a relatively small volume of data changes at the sector level (the smallest unit of measure on disk) every 15 minutes.

These small real-time incremental sector-based backups are replicated to the remote site, the cloud provider or data centre. Often the first or base backup is sent via ‘sneaker-net’ and USB / NAS devices, with the incrementals being replicated in real time.
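To make the idea of a sector-based incremental concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: real solutions typically track changed sectors with a driver rather than by hashing, and the names `SECTOR_SIZE`, `sector_hashes`, `incremental` and `apply_incremental` are hypothetical. It assumes a fixed-size volume held in memory as bytes.

```python
import hashlib

SECTOR_SIZE = 512  # bytes; an assumed sector size for illustration


def sector_hashes(image: bytes, sector_size: int = SECTOR_SIZE) -> list[str]:
    """Hash every sector of a volume image (the base backup)."""
    return [
        hashlib.sha256(image[offset:offset + sector_size]).hexdigest()
        for offset in range(0, len(image), sector_size)
    ]


def incremental(base_hashes: list[str], current: bytes,
                sector_size: int = SECTOR_SIZE) -> dict[int, bytes]:
    """Return only the sectors that differ from the base backup."""
    changed = {}
    for index, offset in enumerate(range(0, len(current), sector_size)):
        sector = current[offset:offset + sector_size]
        digest = hashlib.sha256(sector).hexdigest()
        if index >= len(base_hashes) or base_hashes[index] != digest:
            changed[index] = sector
    return changed


def apply_incremental(base: bytes, changed: dict[int, bytes],
                      sector_size: int = SECTOR_SIZE) -> bytes:
    """Rebuild the current image at the remote site from base + incremental."""
    restored = bytearray(base)
    for index, sector in sorted(changed.items()):
        offset = index * sector_size
        restored[offset:offset + len(sector)] = sector
    return bytes(restored)


# A 4-sector volume in which a single byte changes in the second sector.
base_image = bytes(4 * SECTOR_SIZE)
current_image = bytearray(base_image)
current_image[600] = 1
delta = incremental(sector_hashes(base_image), bytes(current_image))
print(f"{len(delta)} of 4 sectors need replicating")
```

Only the changed sector travels over the wire, which is why the incrementals can be replicated continuously once the large base backup has been delivered.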

Once the cloud site has caught up with the production virtual and/or physical servers, a business can roll over to the cloud. At, say, 7pm, force everyone to log off. Create a last incremental, replicate it to the cloud and finalise the job at the cloud site. Then, when users log on again, they are running from the cloud with data from a few minutes ago.

From a technical perspective, there are a few more steps. Equally compelling is the rollback process. If something unexpected happens, simply turn on the production servers at the customer’s site and everything works.
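The cutover and its rollback can be thought of as an ordered runbook. The sketch below is a hypothetical outline in Python, assuming each step is a callable supplied by whatever tooling is in use; `run_cutover` and the step names are illustrative and not part of any product.

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]


def run_cutover(steps: list[Step], rollback: Callable[[], None]) -> bool:
    """Run the steps in order; on the first failure, roll back and stop."""
    for name, action in steps:
        try:
            action()
        except Exception as error:
            print(f"step failed: {name} ({error})")
            rollback()  # e.g. turn the production servers back on
            return False
        print(f"step done: {name}")
    return True


# Placeholder actions that only record what would happen.
log: list[str] = []
steps: list[Step] = [
    ("log users off", lambda: log.append("users logged off")),
    ("create last incremental", lambda: log.append("incremental created")),
    ("replicate to cloud", lambda: log.append("incremental replicated")),
    ("finalise at cloud site", lambda: log.append("cloud servers finalised")),
]
succeeded = run_cutover(steps, rollback=lambda: log.append("production restarted"))
```

Keeping the rollback as a single, well-rehearsed action mirrors the point above: if anything unexpected happens, production at the customer’s site is switched back on.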

Recovery options

Every customer looks at its own specific requirements for data protection and recovery from the cloud. In general, the same three principles apply in the cloud or on-premises.

First the RTO (recovery time objective) – what is the acceptable downtime for your business? Critical applications may have a shorter RTO than non-critical applications.

Secondly, the RPO (recovery point objective) – how far back in time do you need to go in order to perform a ‘clean’ data restore? This might be the point of the last backup, depending on whether it worked correctly and whether it is, in fact, recoverable.

The third principle is equally critical, but often neglected – the TRO (test recovery objective), which should be the point in time to which a business is completely confident of restoring data. A monthly or quarterly test runs the risk of substantial data loss, because every backup taken since the last successful test is unproven. The latest real-time recovery solutions allow non-intrusive, automated daily recovery tests to help ensure data recoverability from all backups, reducing risk substantially.
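A small worked example shows why the testing interval matters. The Python sketch below uses hypothetical timestamps, and the function name `recovery_exposure` is illustrative; it simply compares the age of the last backup with the age of the last backup that actually passed a restore test.

```python
from datetime import datetime, timedelta


def recovery_exposure(now: datetime, last_backup: datetime,
                      last_verified_backup: datetime) -> dict[str, timedelta]:
    """Age of the newest backup versus the newest backup proven restorable."""
    return {
        "since_last_backup": now - last_backup,
        "since_last_verified": now - last_verified_backup,
    }


now = datetime(2017, 3, 2, 12, 0)
last_backup = datetime(2017, 3, 2, 11, 45)  # 15-minute incrementals

# Quarterly testing: the last proven restore point is three months old.
quarterly = recovery_exposure(now, last_backup, datetime(2016, 12, 2, 12, 0))

# Daily automated testing: the last proven restore point is from last night.
daily = recovery_exposure(now, last_backup, datetime(2017, 3, 1, 23, 45))

print("quarterly tests:", quarterly["since_last_verified"])
print("daily tests:    ", daily["since_last_verified"])
```

In both cases the most recent backup is only 15 minutes old, but with quarterly testing the business can only be completely confident of a restore point 90 days in the past, against roughly half a day with daily tests.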

Location, Location, Location

Most businesses are advised to store their data in at least three locations with one being geographically remote, and where possible perform regular recovery tests across all servers and sites to help maximise chances of recoverability. If the process is automated, this should have zero or minimal impact on support staff.

In the cloud, this should become a simple process. Perform a local backup (within your cloud provider) and test its recoverability by regularly and automatically running Microsoft Check Disk (chkdsk), which helps to ensure data quality, on the backup volume. Next, replicate these backups to a different cloud provider, where the automated recovery testing tool re-tests recoverability. Then, for critical data and databases, replicate into a cloud that delivers real-time disaster recovery of critical servers, with the ability to virtualise them in minutes and restore them to a point no more than 15 minutes old.
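The location advice can be expressed as a simple policy check. The following Python sketch is illustrative only: the `BackupCopy` fields, the `policy_gaps` function and the location names are assumptions made for the example, and in practice the test results would come from the automated recovery testing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str                # e.g. a cloud provider or data centre
    geographically_remote: bool  # far enough away to survive a site disaster
    recovery_test_passed: bool   # result of the latest automated test


def policy_gaps(copies: list[BackupCopy]) -> list[str]:
    """List every way the copies fall short of the advice above."""
    gaps = []
    locations = {copy.location for copy in copies}
    if len(locations) < 3:
        gaps.append(f"only {len(locations)} location(s); at least three advised")
    if not any(copy.geographically_remote for copy in copies):
        gaps.append("no geographically remote copy")
    for copy in copies:
        if not copy.recovery_test_passed:
            gaps.append(f"copy at {copy.location} has not passed a recovery test")
    return gaps


copies = [
    BackupCopy("primary cloud provider", False, True),
    BackupCopy("second cloud provider", True, True),
    BackupCopy("old production site", False, False),
]
for gap in policy_gaps(copies):
    print(gap)
```

An empty result means three or more locations, at least one of them remote, each with a backup that has passed its latest recovery test.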

We are starting to see more and more companies looking at replicating from the cloud back to their [old] production site, so this site becomes one of their disaster recovery locations. After all, they have the infrastructure already in place.

Finally, it cannot be stressed enough, especially with the cloud: test, test and then test again to ensure that all data, databases and applications are recoverable quickly and reliably. If your current backup product does not offer daily testing, find one that does. Protection, with the ability to recover data in the cloud, is your responsibility.