Can replication replace backup?
Well, since you ask...
You the Expert We are asking Reg reader experts - and a chief technology officer - to have a look at interesting issues in storage. The first topic is:
Can replication replace backup?
Backup started as saving a copy of data to tape. For many users it now means saving a copy to disk, either as a straight file backup or as a virtual tape backup. In both instances data is being moved from one disk to another. That’s what replication does too, but without the complexities of backup software and restoration using backup software.
Does this mean you can dump your backup software and rely completely on the much simpler replication idea?
We asked four storage experts what they think about this - and you can read their musings below. But we are very interested in your thoughts too. We are keen to expand our line-up of Reg experts. Show us what you are made of!
Independent storage consultant
Initially, the obvious answer to this question is no; replication can’t replace backup; that’s not what the technology was designed for. Replication was developed to migrate data between storage arrays in the event of a physical failure of a storage device or of the site in which the storage device is located. The process of replication creates an exact replica of the data at the source location. This is achieved either synchronously or asynchronously; effectively, as soon as a change is made on the primary array it is permanently made on the secondary.
Traditional replication goes against the premise for which backups are taken - to recover to a point in time before data was lost or corrupted. The second point (corruption) is probably the most important reason why traditional replication can’t replace backup - corrupted data would be replicated to the remote site before there had been any chance to identify it or to stop the replication.
However, of course, replication is no longer simply the transmission of changed blocks of data from one storage device to another. Continuous Data Protection (CDP) logs changes in primary data and stores those changes for a finite period of time, enabling recovery to be made at a secondary site to any time in the past - assuming enough data has been retained. This is fundamentally different from traditional replication, where the changes are usually simply overwritten immediately.
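The difference between the two approaches can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how any particular product works: the class names `Mirror` and `CdpJournal`, the integer timestamps and the block labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    """Traditional replication: the target holds only the latest state."""
    blocks: dict = field(default_factory=dict)

    def write(self, ts, block, data):
        self.blocks[block] = data  # previous contents are overwritten


@dataclass
class CdpJournal:
    """CDP-style replication: every write is logged with a timestamp."""
    entries: list = field(default_factory=list)

    def write(self, ts, block, data):
        self.entries.append((ts, block, data))

    def restore(self, as_of):
        """Rebuild the volume as it stood at time `as_of`."""
        state = {}
        for ts, block, data in sorted(self.entries, key=lambda e: e[0]):
            if ts > as_of:
                break
            state[block] = data
        return state


mirror, journal = Mirror(), CdpJournal()
# Hypothetical write stream: the third write corrupts block b0.
writes = [(1, "b0", "good"), (2, "b1", "good"), (3, "b0", "CORRUPT")]
for ts, block, data in writes:
    mirror.write(ts, block, data)
    journal.write(ts, block, data)

print(mirror.blocks)             # the mirror has faithfully copied the corruption
print(journal.restore(as_of=2))  # the journal can roll back to before it
```

The mirror ends up with the corrupt block because it has nowhere to keep the old one; the journal can rebuild any earlier state, but only for as long as its entries are retained.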
In summary, traditional replication will not replace backup - however, CDP-style replication may just have a chance.
Chris M Evans is a founding director of Langton Blue Ltd. He has over 22 years' experience in IT, mostly as an independent consultant to large organisations. Chris' blogged musings on storage and virtualisation can be found at www.thestoragearchitect.com.
Product Specialist at Magirus UK
When looking at the topic of Backup and Replication or even Backup Vs Replication, there are a number of facets to both which need to be considered. Traditionally we see replication as something which would provide a remote copy of data for the purposes of either recovery of data or business continuity to facilitate fail over of services; the method and smarts required to do this are defined by both RPO and RTO requirements, coupled with the associated value of data or business impact of downtime.
The challenge with replication, as with most things in IT, is that there are many flavours of technology which will address different business requirements and levels of protection. Traditional asynchronous or synchronous replication will maintain the latest point in time that has been replicated and will typically enable you to recover in the event of site failure; but there is nothing stopping data corruption being replicated.
There is another option in the form of CDP (Continuous Data Protection), which, rather than synchronously replicating data or using a copy-on-write or first-write mechanism to replicate a point in time, will journal writes as they are committed to disk, time and date stamp them and continuously replicate many points in time to a remote site (enter the likes of EMC RecoverPoint and FalconStor). CDP replication mitigates the risk of replicating corruption because you have very granular restore capability, being able to recover to a specific point in time (even recover a corrupt volume from a remote site). This presents an argument for replacing backup...potentially.
However, not everyone can afford appliance-based replication or the storage array typically required to utilise this technology, not least because it is typically capacity licensed (presenting challenges when requiring any level of retention, especially in legal, finance and government, where retention and secure disposition of data is a critical requirement).
There are also other reasons why backup is still king in many areas. Backup vendors have come on in leaps and bounds over the last few years with technologies like deduplication; the likes of Data Domain provide super-efficient storage utilisation with their SISL engine and the likes of Quantum aren't far behind with their DXi (arguably).
Also for multi-site implementations, backup seems like less of a complex monster to implement and manage than a complex replication topology (although still complex in many cases). The software is simply better geared to that scenario at the moment, especially when we look at the likes of EMC's Avamar for example, where we are deduplicating data at block level and hashing and checking every bit of data that we back up to ensure we don't have to send duplicate blocks over the LAN/WAN.
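The idea of hashing blocks so that duplicates never cross the wire can be sketched in a few lines. This is an illustration only, not a description of how Avamar or any other product is implemented: the fixed 4-byte block size, the choice of SHA-256, and the names `backup`, `restore` and `store` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only; an assumption, not a product setting


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut the data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, store: dict):
    """Send only blocks whose hash the target has not seen before.

    Returns (recipe, blocks_sent); the recipe is the ordered list of
    hashes needed to rebuild the data.
    """
    recipe, sent = [], 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block  # stands in for a transfer over the LAN/WAN
            sent += 1
        recipe.append(digest)
    return recipe, sent


def restore(recipe, store):
    """Rebuild the original data from its list of block hashes."""
    return b"".join(store[d] for d in recipe)


store = {}  # hypothetical block store at the backup target
monday = b"AAAABBBBCCCCAAAA"
tuesday = b"AAAABBBBDDDDAAAA"
recipe_mon, sent_mon = backup(monday, store)
recipe_tue, sent_tue = backup(tuesday, store)
print(sent_mon, sent_tue)
```

In this made-up example the second backup only has to send the one block that changed, yet both days' data can still be rebuilt in full from the store.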
Not to forget a mention of those governing bodies in many sectors which still require that a physical tape is vaulted every month, coupled with the traditionalist backup administrators who simply know tape, love tape and want tape. Let's not also forget that some of these enterprise companies have invested large sums of money in their tape infrastructure (not to mention time in integration with their applications) and are going to see the kit through its life cycle to get that return on investment (unless they are presented with a compelling event to do otherwise).
So in summary: for the agile mid-size to large SMB who are not bound by archaic governance and are willing to invest in technology and methods which are less traditional and more cutting edge (which may have its risks), certain replication technologies may be a viable option; but for some of the more constrained and regulated enterprise organisations, I think it will be a while until we see things move that way, and traditional backup will very much remain the last line of defence until a real shift in the market demonstrates replication as tried and tested when replacing backup.
One last point, to touch on a word which we're all getting drilled with... let's not forget the cloud! Backing up to the cloud is a very compelling prospect when you don't have to invest in anything but agents and an internet connection to back up. In some cases the larger organisation is also looking to the cloud for secondary/tertiary backups and long-term retention; a shift which companies like Commvault have been quick to pick up on with their cloud plug-in, for example...
Evan Unrue is a Product Specialist at Magirus UK, the UK arm of the Stuttgart-based IT systems and services supplier.
The debate about Replication versus Backup suggests significant waste in storage. But the purpose behind copying data back and forth is not trivial and multiple copies are rarely pointless. Duplication can conversely be inefficient and overkill.
Backup can be used both for day-to-day recovery of files and for disaster recovery of entire systems.
Total copies of systems for business continuity and disaster recovery (BC/DR) purposes are intended for re-establishing entire systems as quickly as possible. High-calibre BC/DR relies on synchronous or asynchronous replication between disk arrays. The term "replication" here refers to a continuous trickle of updates from source to target. Less ambitious designs may use tape or disk for point-in-time backup, but again the BC/DR backup should be used only for this purpose, when a serious incident has happened.
With regard to day-to-day backup the ability to recover subsets and earlier versions of files is intended to be invoked for restoration at will or through automation. These backups are typically performed per application type, but should be kept separate from BC/DR. The backup applications can be highly sophisticated and employ advanced snapshot techniques or CDP (Continuous Data Protection) for instance. The capacity and performance can further be optimised through de-duplication.
Many IT sites still rely on magnetic tape, a key reason being that it enables off-site BC/DR especially for small and medium businesses. But even in a multi-site set-up customers are still comfortable with tape serving as portable and mechanically simple storage containers. Also, certain types of data and certain industries are governed by regulation dictating retention periods, which favour tape. Yet, there is no technical reason for not eliminating tape entirely.
Strictly speaking, a backup can be described as an amount of data at a point in time, and whether it is created by backup software or some form of snapshot process is irrelevant. Replication can be described as the mechanism by which a remote backup target gets updated. On that basis replication cannot replace backup, as it is an essential complement to backup in creating a disaster recovery system.
Claus Egge is a storage analyst whose IT career started in IT Operations in his native Denmark where he managed storage environments. In 1991 Claus moved to the UK, becoming a storage analyst at Market Research company IDC, eventually managing the entire European storage market research practice. He became an independent analyst in 2010.
Chief Technology Officer, Shoden Data Systems
If you want to create a complete IT infrastructure you need both replication and backup because the two are not the same, and nor are they alternatives to each other. Replication is ideal for taking snapshots of data to feed to other applications and test environments, or to re-create an existing environment in the event of a serious failure and, as network bandwidth increases in speed and reduces in costs, replication becomes ever more attractive.
Some replication approaches operate within a single disk subsystem; others work across multiple subsystems, either synchronously or asynchronously. Data integrity is also of paramount importance here as it’s pointless creating copies you can’t trust. What replication won’t give you is any logical data management, the ability to handle retention and expiry policies, to access “last Thursday’s” version of a file (rather than the most current copy), or to build and manage a worthwhile archival policy.
This is what backup does; it allows you to protect your data (to a certain extent only unless coupled with a DR strategy) and to manage this protection.
For some enterprises, backup is their biggest single application. Whether at a physical or logical level, there needs to be certainty, integrity, and repeatability in your backup process. How long to keep backup copies, how to manage different versions of the same file, how to correlate any newly-imposed DR requirements with existing backup procedures, and how to meet audit and regulatory demands all need careful thought when building your backup strategy, and cannot be achieved simply by implementing some form of replication.
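To make the point about versions and retention concrete, here is a minimal sketch of the kind of bookkeeping a backup catalogue does and a plain replica does not. It is a toy under stated assumptions: the `BackupCatalog` class, the file name, the dates and the 30-day retention period are all hypothetical and not taken from any product.

```python
from datetime import date, timedelta


class BackupCatalog:
    """Toy catalogue: keeps dated copies of each file and expires old ones."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)
        self.versions = {}  # path -> list of (backup_date, contents)

    def record(self, path, day, contents):
        self.versions.setdefault(path, []).append((day, contents))

    def version_as_of(self, path, day):
        """Return the newest copy taken on or before `day`, or None."""
        candidates = [(d, c) for d, c in self.versions.get(path, []) if d <= day]
        return max(candidates, key=lambda v: v[0])[1] if candidates else None

    def expire(self, today):
        """Drop copies older than the retention period; return how many."""
        cutoff = today - self.retention
        dropped = 0
        for path, copies in self.versions.items():
            kept = [(d, c) for d, c in copies if d >= cutoff]
            dropped += len(copies) - len(kept)
            self.versions[path] = kept
        return dropped


catalog = BackupCatalog(retention_days=30)
catalog.record("report.doc", date(2011, 3, 1), "draft")
catalog.record("report.doc", date(2011, 3, 10), "final")
catalog.record("report.doc", date(2011, 3, 17), "overwritten by mistake")

# Ask for the file as it stood on an earlier day, not the most current copy.
earlier_copy = catalog.version_as_of("report.doc", date(2011, 3, 12))
# Apply the retention policy some weeks later.
expired = catalog.expire(today=date(2011, 4, 5))
print(earlier_copy, expired)
```

A replica that only tracks the current state could answer neither question: it has no earlier copy to hand back and nothing to expire.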
In summary, you really do need both replication and backup if you want to create a complete IT infrastructure; one without the other is incomplete.
Phil Jones is the Chief Technology Officer for Shoden Data Systems UK. He is responsible for devising the Company’s strategy and direction, the product roadmaps, and developing key alliances and partner relationships across EMEA.