Disaster Recovery Without Boundaries Part 2

by James Keating III, Business Technology Architect, Evolving Solutions

In my last blog post, I covered a scenario that allows for offsite backups with a simple disaster recovery mechanism in the cloud. In this post I will go into a more complicated scenario where both return to operation (RTO) and recovery point objective (RPO) are critical. Before I get into this scenario, I will set the stage again on which aspects of cloud computing we will be using to help disaster recovery, and also more clearly define RPO and RTO.

 

The attributes of cloud that can be utilized for disaster recovery are as follows:

  • On demand computing
  • Containerized workload
  • Location agnostic
  • Speed of implementation

Each one of these attributes can lend itself nicely to improving disaster recovery abilities in terms of cost (both capital and labor), speed to recovery, and ability to automate.

Definitions:

RTO = Return to Operation, or the total time from when a disaster is declared to when the system(s) are operational again.  Key to this concept: it is not the entire time a system is down in a disaster, but the time from when a disaster is declared (which could be hours after the incident started) until the system is back up and running.

RPO = Recovery Point Objective, or the time difference between when the system went offline and the point in time represented by the restored data.  Boiled down, this is the amount of data one is willing to lose.  An RPO of 30 minutes means that when we get back up and running, the data will be consistent and intact up to a point no more than 30 minutes before the system went offline, or a loss of at most 30 minutes of data.
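
To make the arithmetic concrete, here is a minimal sketch in Python (the timestamps are hypothetical examples, not from any real incident) showing how RTO and RPO would be measured for a single event:

    from datetime import datetime

    # Hypothetical timeline for a single incident (all values are examples).
    system_went_offline   = datetime(2014, 6, 2, 3, 10)   # outage begins
    disaster_declared     = datetime(2014, 6, 2, 5, 0)    # declaration comes later
    system_back_online    = datetime(2014, 6, 2, 11, 30)  # recovery complete
    last_recoverable_data = datetime(2014, 6, 2, 2, 45)   # newest restorable data

    # The RTO clock starts at declaration, not at the start of the outage.
    rto = system_back_online - disaster_declared
    # RPO is the data lost between the last recoverable point and the outage.
    rpo = system_went_offline - last_recoverable_data

    print("RTO:", rto)  # 6:30:00 -> inside an 8-hour objective
    print("RPO:", rpo)  # 0:25:00 -> inside a 60-minute objective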

Now onto our scenario for this blog post.

Goals:

  • Meet an RTO of 8 hours and an RPO of 60 minutes
  • Meet the above RTO and RPO with 100% of the compute capacity and performance of production
  • All applications for this scenario will be ones that reside on a VMware virtual environment in production
  • Ability to have all of the above without investing in the infrastructure and management of a second data center location.

Items we will need for this scenario:

  • Containerization of workload – As stated in the goals, we will be using our VMware environment as the base for our systems and applications that are part of this scenario.
  • Backup software – For this scenario we will use the backup software already in place in our environment.  This can be a mix of products, since this portion of the setup exists to protect against corruption and to provide a way to still meet the RTO if we have a corruption event (noting that in a corruption event our RPO of 60 minutes is likely not achievable; more on this later).  For this example we will say we are using both IBM TSM and Symantec NetBackup.
  • Replication Agent – For this particular scenario we could use several methods, from VMware replication to other dedicated software.  To meet our RTO and RPO we will choose software that can act as both a replication agent and a failover automation mechanism: Zerto.  A sketch of the kind of replication-lag check this enables follows this list.
  • Cloud Storage Gateway – For this we will choose a NetApp system that will be used as the backup storage target for both IBM TSM and Symantec NetBackup, which will then mirror our backup data to our cloud provider.
  • Cloud Provider – Since the goal is 100% of production's horsepower in terms of performance, and we need to meet our RTO and RPO, we will need a cloud provider that allows a much more customizable setup than we would get from the common cloud names such as AWS or Azure.  For this, since we are also assuming a heterogeneous environment of both Linux and Windows inside our VMware, we will choose PEAK.
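
To give a rough flavor of the kind of check that keeps a 60-minute RPO honest, here is a minimal sketch of a replication-lag alarm. The checkpoint lookup is a hypothetical placeholder; in a real deployment that value would come from the replication tool's own reporting rather than being hard-coded as it is below.

    from datetime import datetime, timedelta

    RPO_TARGET = timedelta(minutes=60)

    def latest_replica_timestamp():
        # Hypothetical placeholder: in practice this would be read from the
        # replication tool's reporting, not hard-coded.
        return datetime(2014, 6, 2, 4, 20)

    def check_rpo(now):
        lag = now - latest_replica_timestamp()
        if lag > RPO_TARGET:
            print("ALERT: replication lag", lag, "exceeds the RPO target of", RPO_TARGET)
        else:
            print("OK: replication lag", lag, "is within the RPO target of", RPO_TARGET)

    check_rpo(datetime(2014, 6, 2, 5, 0))  # lag of 0:40:00 -> within target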

Since this is a much more detailed scenario than the previous blog post, I will go into some details and notes below, but this is by no means a complete listing of considerations and requirements, as the nature of the applications and backup software will play a huge role in how this would get rolled out in a real-world environment.  If you want to know more, you can contact Evolving Solutions to go over your particular scenario.

Notes and items about our scenario and choices:

  • The ability to use software to replicate data from our on-premises data center to our cloud data center is reliant upon network considerations and also ease of use.  Zerto was chosen precisely because it is easy to configure and maintain, but also because it has automation features that reduce exposure to human-interaction risks.  Further, it is optimized for asynchronous replication of VMware systems.  This will help us meet both the RTO and RPO goals.
  • We chose PEAK because we are using VMware.  We need a VMware configuration on the cloud side that can be treated just like our on-premises VMware.  PEAK can provide that in the form of a dedicated private vCenter that we can administer the same way we do our local VMware.  This is not something you can get from all cloud providers.  Second, PEAK is native NetApp, so this helps us get our backup sets to the cloud provider efficiently using a common set of commands.  Third, PEAK offers us a 100% uptime SLA at the hypervisor level, so we can be sure our VMware in the cloud will be available at the time of a disaster.
  • Backup software and backup data, while not the first source of data for us, are still required.  This is because while we are replicating data from our primary to our cloud environment, we run the risk of corruption.  If our data gets corrupted on the primary side, it will be corrupted on the secondary side within seconds to minutes as the replication happens.  The backup copies are an insurance policy against this possibility.  Since we are using backup software, we will need to have that software available on both sides and have a method of sharing backup sets between the sites.  This is our NetApp storage, which both IBM TSM and Symantec NetBackup will write to as a disk-based backup location.  Then, if we need to, we can bring up the backup applications on the cloud side, read from the cloud storage, and begin restores at disk speed.
  • NetApp as the backup target.  This was chosen because it fits the PEAK cloud model nicely and allows for ease of administration: the backup software writes to a CIFS or NFS share, and those shares are both snapshotted and replicated to a NetApp system sitting at our PEAK cloud location.  This makes administration of the backup storage identical on both sides of the equation.  A simple freshness check against such a share is sketched below.
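
As a simple illustration of how the shared backup target could be sanity-checked from either site, the sketch below walks a backup share and reports the age of the newest backup file. The mount point, file layout, and freshness threshold are assumptions for illustration, not TSM or NetBackup specifics.

    import os
    from datetime import datetime, timedelta

    # Assumed mount point for the NetApp-backed share; the real path depends on
    # how TSM/NetBackup are configured to write their disk-based backup sets.
    BACKUP_SHARE = "/mnt/backup_share"
    MAX_AGE = timedelta(hours=24)  # example threshold for daily backup sets

    def newest_backup_age(share):
        """Return the age of the most recently modified file under the share."""
        newest = 0.0
        for root, _dirs, files in os.walk(share):
            for name in files:
                newest = max(newest, os.path.getmtime(os.path.join(root, name)))
        return datetime.now() - datetime.fromtimestamp(newest)

    age = newest_backup_age(BACKUP_SHARE)
    print("OK" if age <= MAX_AGE else "STALE", "- newest backup set is", age, "old")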

So, with this setup we essentially have built a secondary cloud site that looks as similar as possible to our primary site.  We have an equal amount of compute power on both sides.  We have backup software and application licensing on both sides to allow us to run at either site.  We have replication in two flavors running to protect our data from corruption and/or loss.  We will likely have more storage investment in terms of capacity at the cloud location, depending on how long we wish to retain offsite backups, since the cloud side will also be utilized as offsite backup storage.  So you can see this scenario is very much a redundant configuration and will have costs in line with that.  What cost savings we do have will be related to facilities, in terms of not having to maintain cooling, power, and building facilities for our cloud location.

Again, this scenario is much more complex than the one described in my first blog post.  I also realize the above details may be too much to absorb while reading a blog post, so if you find yourself with questions, do not hesitate to contact Evolving Solutions to go over how your situation would fit into the above model and what real-world obstacles and limitations might exist.

________________

James Keating III is a Business Technology Architect for Evolving Solutions. James is a technology leader in the IT community and brings a combination of excellent technical skills and business acumen which allows him to guide customers in developing IT solutions to meet business requirements.

Disaster Recovery Without Boundaries Part 1

by James Keating III, Business Technology Architect, Evolving Solutions

Last week I was the speaker for an event on disaster recovery in the age of cloud infrastructure delivery.  At the event I went over some high-level concepts and a few sample use cases; however, I didn’t go into details on what one would need to be successful in implementing cloud-delivered disaster recovery, or a “Cloud Cluster”.  Over the next two blog posts I will provide a bill of materials required to set up a cloud cluster.

Before I get into the first scenario, I will set the base by going over which cloud technology attributes can be looked at as key enablers of disaster recovery.  This is one of the few times in IT that one can potentially increase the overall ability of the business and lower IT costs relative to existing disaster recovery strategies.  The attributes of cloud that can be utilized for disaster recovery are as follows:

  • On demand computing
  • Containerized workload
  • Location agnostic
  • Speed of implementation

Each one of these attributes can lend itself nicely to improving disaster recovery abilities in terms of cost (both capital and labor), speed to recovery, and ability to automate.  The first scenario I will cover is the simple scenario of offsite backups with a side of disaster recovery.

The goals of the scenario are:

  • Reduce complexity and manual labor involved in backups, including tape management and offsite shipment of tapes.
  • Increase the speed at which backup data sets are moved fully offsite from the primary data
  • Ability to utilize the offsite data for disaster recovery, both during an actual disaster and for testing of disaster failover
  • Ability to have all of the above without investing in the infrastructure and management of a second data center location.

Items we will need for this scenario:

  • Containerization of workload – for this we will choose VMware.  This is the de facto standard in server virtualization and can be used in our case to encapsulate the data and workload into manageable chunks that will help us with backup and disaster recovery.
  • Backup software – for this we will choose VEEAM.  VEEAM is a backup product that has a few features that enable our journey.  The first is that it is 100% VMware compatible and aware, so it will work efficiently and effectively to back up the data contained in our VMware environment.  Second, it is optimized for disk-based backups, which will allow us to remove our dependence on tape systems.
  • Cloud Gateway – for this we will choose the Amazon Storage Gateway.  This is a piece of software that sits in a VMware environment and allows storage written to it to be replicated securely to the cloud.  It will serve as the repository for our VEEAM backups and as the method of transporting the data to the cloud.
  • Cloud Provider – for this we will choose Amazon Web Services, and primarily the Amazon S3 storage offering.  This will be the location where our backup data moves to and is safely stored in the cloud.  We chose this because the gateway is integrated with Amazon and because we can use the VEEAM data to create Amazon EC2 instances when we need compute resources.  The advantage of this model is that we only pay for the S3 storage the majority of the time and can use “on demand” EC2 compute instances when we run tests or during a full disaster.  This allows us to pay for compute only when we need it, which can drastically reduce overall infrastructure costs, since in a DR scenario we only need compute during tests or a disaster.  A sketch of this on-demand model follows this list.
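
To illustrate the “pay for compute only when you need it” side of this model, here is a minimal sketch using the AWS SDK for Python (boto3) to start an on-demand EC2 instance for a DR test and terminate it afterward. The AMI ID, instance type, and region are placeholders; in this scenario the actual machine images would be created from the VEEAM backup data.

    import boto3

    # Region, AMI, and instance type are placeholders for illustration only.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    def start_dr_test_instance(ami_id, instance_type="m3.large"):
        """Launch a single on-demand instance for a DR test and return its ID."""
        response = ec2.run_instances(
            ImageId=ami_id,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
        )
        return response["Instances"][0]["InstanceId"]

    def end_dr_test_instance(instance_id):
        """Terminate the test instance so compute charges stop accruing."""
        ec2.terminate_instances(InstanceIds=[instance_id])

    # Example usage with a placeholder AMI ID:
    # instance_id = start_dr_test_instance("ami-xxxxxxxx")
    # ... run failover validation ...
    # end_dr_test_instance(instance_id)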

So the overall thrust of this model is that we back up our VMware environment and have the backups migrate to the cloud.  We pay for the storage portion of the cloud at all times, but only pay for computing instances during testing and disaster.  This allows us to gain the following:

  • Reduced cost of both labor and infrastructure, as we have no secondary computing to pay for unless we are using it.  We also have no tape management to deal with in terms of labor.  Further, once taken, our backups move immediately offsite; there is no need for people, shipping containers, trucks, or a vaulting process, because the data moves offsite automatically as part of the backup process.
  • Ability to remove complexity from the backup process.  People are not touching tapes, adding new tapes, or managing offsite tape shipments.
  • Speed of restore is improved.  Since we are taking backups to local disk and sending that data offsite to cloud disk, the majority of our operational restores will happen from this local disk, so restores will run at disk speed, with no waiting for tapes to load or come back from offsite vaults.

For more technical specifications and requirements, contact us.  Evolving Solutions has tested this scenario in depth and knows the limitations and requirements.

In a future blog post, I will go into a second scenario where the key goals are the speed of return to operation (RTO) and minimization of data loss (RPO).

_________________

James Keating III is a Business Technology Architect for Evolving Solutions. James is a technology leader in the IT community and brings a combination of excellent technical skills and business acumen which allows him to guide customers in developing IT solutions to meet business requirements.

Data Recovery and Lost Art


CIO ran a story in April about an art data recovery project involving floppy disks from an Amiga 1000 computer. The art was by none other than famed artist Andy Warhol. Warhol was hired to promote the graphical capabilities of the Amiga 1000, and his work on the machine was saved to the floppy disks in 1985. The Andy Warhol Museum was preserving the floppy disks but had no way to extract the data they held.

According to The Andy Warhol Museum, another artist came across a YouTube video of Warhol promoting the Amiga 1000. In 2011, he dug deeper into this work and began connecting the parties needed to recover these works from the disks. This included many staffers from the Warhol Museum itself as well as Carnegie Mellon University’s Computer Club, “a student organization known for their comprehensive collection of obsolete computer hardware.”

Check out these images:

[Images of the recovered Warhol Amiga works, including a Campbell’s soup can piece. Source: warhol.org]

Do you have a unique data recovery story? Share it.