- I agree with Tony (and probably every other seasoned data storage professional)...storage systems fail, and that's why we have backup systems. These systems, in turn, are only as good as their last successful test...if regular testing is too much of a burden, then you ought at the very least to audit the environment against some baseline.
- Whilst the person in question at the State of Virginia may be a little ashen-faced right now, I can assure you that the "service delivery manager" (as they were called in the heyday of outsourcing) for Virgin Blue's reservation and ticketing system will be feeling the same churning in the pit of the stomach...and their contact at Navitaire probably likewise. Just because a cloud provider is 'big' or 'branded' or (insert alarm bells) 'multi-tenanted' does not mean for one second that it can do a better or cheaper job of helping you meet your SLAs for service uptime and/or data recovery.
In the former instance, remember that 'speeds and feeds', as Tony puts it, in my experience indicate the 'bleeding edge' of what a product can reliably do...divide by 2 and set that as your peak load. The more complex the data storage layout on a disk array (fragmented RAID groups, meta-LUNs, concatenated LUNs, etc.), the longer your restore or rebuild will take. Remember that in the never-ending race toward better storage performance, there is a necessary compromise around recoverability.
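To make the "divide by 2" rule of thumb concrete, here is a minimal sketch that derates a vendor's published throughput figure and checks an observed peak against it. The function names, the MB/s unit and the 1200 MB/s figure are hypothetical illustrations, not taken from any particular product; the derate factor of 2 is the one suggested above.

```python
def peak_load_target(rated_throughput_mb_s: float, derate_factor: float = 2.0) -> float:
    """Return a planning ceiling for sustained load.

    Rule of thumb: treat the vendor's published 'speeds and feeds'
    as the bleeding edge and divide by 2 (derate_factor).
    """
    if rated_throughput_mb_s <= 0 or derate_factor <= 0:
        raise ValueError("throughput and derate factor must be positive")
    return rated_throughput_mb_s / derate_factor


def within_plan(observed_peak_mb_s: float, rated_throughput_mb_s: float) -> bool:
    """True if the observed peak stays at or below the derated ceiling."""
    return observed_peak_mb_s <= peak_load_target(rated_throughput_mb_s)


if __name__ == "__main__":
    rated = 1200.0  # hypothetical vendor figure, MB/s
    print(peak_load_target(rated))    # 600.0
    print(within_plan(550.0, rated))  # True
    print(within_plan(800.0, rated))  # False
```

The derate factor is left as a parameter so that a more conservative ceiling can be set for complex layouts, where restores and rebuilds are likely to take longer.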
In the latter instance, just think in terms of recovery, not 'did it get backed up?'...build backup systems that focus on the process of data restoration (we are talking only about data availability here; compute availability is a whole different story). It is far better to have a backup that runs for 8-10 hours, completes, validates and is easily restorable than one that runs in half that time but requires multi-step, error-prone recovery procedures.
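One way to "think in terms of recovery" is to count a backup as good only once a restore from it has been checked against the source. The sketch below assumes a plain directory copy stands in for the real backup tool. `backup_and_verify` and its helpers are hypothetical names. It restores into a scratch area and compares SHA-256 checksums file by file.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Back up source, restore it to a scratch area, and compare checksums.

    Assumption: shutil.copytree stands in for the real backup and restore
    tools. backup_dir must not already exist.
    """
    shutil.copytree(source, backup_dir)  # "backup" step
    expected = tree_checksums(source)
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copytree(backup_dir, restored)  # "restore" step
        return tree_checksums(restored) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "data"
        (src / "sub").mkdir(parents=True)
        (src / "a.txt").write_text("hello")
        (src / "sub" / "b.txt").write_text("world")
        print(backup_and_verify(src, Path(tmp) / "backup"))  # True
```

The verification adds time to every run, which is exactly the trade described above: a longer job that ends in a proven, restorable copy.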

