DavsDisorder

This blog captures some of the observations of Tim Davoren, Data Engines' founder and Managing Consultant. Do not expect an especially coherent delivery here!

engines in the data center

Tim Davoren - Thursday, January 27, 2011
Just came across this quote from EMC Asia Pacific President, Steve Leonard:

Leonard said the use of one company to approach market would help to win business over competition, and provide better servicing.

“What we’re trying to do is position Vblock as an architecture, and VCE as a company that can bring that with one support number, one architectural campaign, and we think that if we can do that faster, we think we can win,” he said.

“We want to be the guys who help put the engines in the data centres.” (my emphasis)


So do we, Steve, so do we...oh, it's in our company name!

Designing for Failure

Tim Davoren - Friday, April 30, 2010
I recently read a news article about Intermedia's service level agreement 'miss' that was linked to a performance issue on an EMC CLARiiON array.

http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1510721,00.html


There have also been a couple of subsequent posts and email responses linked to this story:

http://itknowledgeexchange.techtarget.com/storage-soup/one-storage-pros-response-to-intermedias-hosted-email-outage/

http://chucksblog.emc.com/chucks_blog/2010/04/helping-to-avoid-a-really-bad-day.html


I wanted to make a few comments myself on the story and the responses linked above.

Firstly, I agree with everything Chuck Hollis at EMC says in his post, and I wanted to emphasise and elaborate on his points.

Products Fail?
Damn right they do, all the time...sometimes without causing much of a fuss. Trust me, failures only seem uncommon because you only hear about the big ones (like Intermedia's). It is a testament to IT hardware vendors' engineering that a lot of these "failures" go unnoticed because of the rigorous redundancy built into their systems...not to mention field support services which, in the case of EMC, are some of the best around.

A short anecdote that relates to this story: an insurance client of ours suffered a similar failure on their IBM N-Series (NetApp) devices a few years back. A controller panicked due to a power supply issue and tried to hand over its load to the other controller but, because multi-pathing was incorrectly configured, dropped all the workloads that it was serving. Result: reboots, reboots, reboots. Missed SLA.

Design for Failure
It will happen...not if, but when. You will have a component failure somewhere in your data path at some point in the future. Design for it (or insure for it!).

CLARiiON arrays (like N-Series, HDS and many other vendors' arrays) have controllers that operate in an active/active configuration, which is great when both controllers are working, and 99.99% of the time it works fine when one fails (the beauty of PowerPath). But the disadvantage of running an active/active architecture in a disk array is that, unless you religiously monitor your workloads, you can never be sure that you can meet performance demands in a degraded state (this principle applies all down the data path, even to RAID Group design and LUN layout). My favourite disk array of the last 10 years is EqualLogic's PS Series, now owned by Dell. These fellas only operate in active/passive mode to ensure customers don't accidentally find themselves in Intermedia's situation, where peak load cannot be accommodated in degraded mode.
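To put some rough numbers on the active/active point, here is a minimal Python sketch of the headroom check. It is not any vendor's sizing tool: the function name, the idea of expressing peak load as a percentage of one controller's capacity, and the example figures are all assumptions made up for illustration. It also assumes the surviving controllers share the failed controller's load evenly, which real arrays may not do.

```python
def survives_single_failure(peak_utilisation_pct):
    """Check whether peak load still fits after losing one controller.

    peak_utilisation_pct: hypothetical per-controller peak load, each value
    expressed as a percentage of a single controller's capacity.
    Returns (fits, load_per_survivor_pct).
    """
    controllers = len(peak_utilisation_pct)
    if controllers < 2:
        raise ValueError("need at least two controllers to model a failover")

    total_load = sum(peak_utilisation_pct)
    # Assumption: the remaining controllers split the whole load evenly.
    load_per_survivor = total_load / (controllers - 1)
    return load_per_survivor <= 100, load_per_survivor


# Illustrative figures only, not measurements from any real array.
print(survives_single_failure([45, 40]))  # survivor would run at 85%
print(survives_single_failure([70, 65]))  # survivor would need 135%
```

In a two-controller array the rule of thumb that falls out is simple: if the two peaks add up to more than 100%, a single controller failure at peak time leaves you short, however healthy each controller looks on its own.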

The Alarm is Ringing but Everyone's Asleep
This is an interesting point...vendors and integrators like ourselves put effort into engineering and deploying monitoring and alerting for systems in client sites. That's great, but if the client doesn't put in place procedural steps that are triggered into action by these tools, all is for nought. There is no point in having a tight RPO and the ability to deliver a quick RTO unless you have the procedural surety to act when issues are identified. EMC's DialHome feature is a good example of removing this dependency, but it's simply not possible (nor do you want it to be possible) for all system or component failures. In short, your recovery time is only as good as the weakest trigger point, and usually that trigger point is simply deciding to act on an error/mis-configuration event.
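A small Python sketch makes the "weakest trigger point" idea concrete. The stage names and the minutes attached to them are entirely hypothetical, chosen only to show how a slow human decision can swamp a fast technical failover; they are not figures from Intermedia or any client.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    minutes: float


def effective_recovery(stages):
    """Return (total minutes, name of the slowest stage) for a recovery chain."""
    total = sum(stage.minutes for stage in stages)
    slowest = max(stages, key=lambda stage: stage.minutes)
    return total, slowest.name


# Hypothetical timeline for one incident.
incident = [
    Stage("monitoring raises alert", 1),
    Stage("someone decides to act", 240),
    Stage("failover completes", 5),
]
print(effective_recovery(incident))
```

With these made-up numbers the tooling accounts for six minutes and the decision to act accounts for four hours, which is why the procedure matters as much as the product.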

Practice Failure
Great tip here...I hear clients and prospects talk about their highly redundant environments and their sub-minute failover setups, and when I ask whether they have tested them, the answer is usually no. Reminds me of people who love to talk about how much their house has gone up in value...inevitably, when they actually want to sell, they are a little disappointed. The proof of the pudding is in the eating. You must test your failure recovery procedures. VMware's SRM product is an excellent tool for doing this non-disruptively. Clients should regularly test failover of their Tier 1/2 applications to ensure that the 'best laid plans' are also the 'tried and true' method.


VMware buying some of the EMC Ionix goodies

Tim Davoren - Monday, March 01, 2010
It's been quite some time since I posted, but I am just catching up on some IT industry news, so I thought I would pass on, second-hand, the news of VMware purchasing a bunch of software assets from its 80%+ owner, EMC Corp. The products seem to be the core network management part of Ionix, and this may indicate either a) EMC don't really see a future in fighting the DC management fight or b) VMware may be a better vehicle to take part in such a fight. Interestingly, though, I also read subsequent to that story details about the revenue growth attributed to Virtualisation by iTX Group...not just VMware though...they also sell Vizioncore, IBM Tivoli and Computer Associates...maybe b) is the right answer...VMware may end up with a very rich portfolio of systems management and reporting tools in the medium term.

Featured Content

Backup or Archive? An age old question - after almost 60 years of data storage and backup on electro-magnetic media, people are still confused as to what a "Backup" is and what an "Archive" is. See Tim's blog post explaining the difference. 

Do you "Splunk" ?? It's not a rude question, but it could lead you to some empowering insights into what's happening out there in your multi-vendor, multi-faceted IT infrastructure.

Data Engines have developed a set of field tested, vendor backed data-at-rest encryption solutions that can help organisations mitigate data security risks for removable storage media like tape. Ask us how to ensure your primary data storage or backup data is safely encrypted and, most importantly, how you can ensure full recovery in the future.