For many years, there's been a back-and-forth debate in our little corner of the storage world about which approach is better for storage arrays: high-end, or midrange?
High-end storage arrays have multiple controllers and large shared cache. Midrange storage arrays are usually dual-controller designs with much more modest processing and cache.
But one of the traditional arguments in favor of high-end storage ("it's more reliable") doesn't really apply anymore the way it used to. And I think that's worthy of a post.
Context
Storage availability is paramount in most use cases. When people can't get to their information for any reason, IT is usually having a very bad day.
As well as their users.
In the industry, we use the term "unplanned outage" to cover any situation where a server can't get to its storage. That could be a disk failure, a controller failure, an HBA failure, a botched microcode upgrade -- you name it, the net result is the same.
Somebody is not having a good day.
Historically, high-end storage arrays have had an architectural claim to potentially better availability, essentially more redundancy in cache, controllers, power supplies, etc.
But the game has changed and, in the process, a new bar has been set in the midrange storage marketplace.
The News Release
Recently, EMC announced that the new CLARiiON CX3 had achieved a demonstrated "five 9s" availability in the field.
"Five Nines" means that it's available 99.999% of the time, assuming 24x7 usage.
That's an average of 5.26 minutes of unplanned downtime per year. By comparison, "Four Nines" (the current average in the marketplace) translates to roughly 52.6 minutes of unplanned downtime per year.
I don't even want to think about Three Nines, Two Nines and so on. And neither do you.
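For anyone who wants to check the arithmetic behind those figures, here is a minimal sketch. It assumes 24x7 usage and an average year of 365.25 days; the function name and the constant are mine, purely for illustration, and using a flat 365 days gives the same answers after rounding.

```python
# Assumption: a year averages 365.25 days of 24x7 operation (525,960 minutes).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Unplanned downtime per year implied by an availability of N nines.

    Four nines means 99.99% available, i.e. unavailable 10**-4 of the time.
    """
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


# Print the two figures quoted in the post.
for n in (4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the step from four nines to five is a bigger deal than it sounds.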
Now, remember, we're not just talking about the array by itself here. We're talking about HBAs, SANs -- everything that sits between the server and the information. It covers a pretty wide swath of potential problems.
We had IDC take a look to make sure we weren't missing anything. And they wrote a nice note about the topic as well.
How We Did It
Back when we all were debating the merits of RAID 6, I wrote a piece that described EMC's approach to solving the real problem (storage availability) as opposed to introducing interesting bits of technology (e.g. RAID 6).
It's a very long story, but -- if you're interested -- it explains the thinking and the process.
There were some technology bits, like the UltraPoint back-end that avoids long-fibre-loop issues (that, by the way, can mysteriously look like disk failures, but aren't).
There's a whole bunch of array software enhancements, like making things bulletproof when configuring and upgrading the array (another source of potential problems).
There's a set of best-practices around how to dual-path, use MPIO effectively, set up spares intelligently, use EMC's remote support capability, and so on. Kind of obvious stuff, but it needed telling.
And all capped off by an aggressive campaign to help customers use the technology to achieve superior levels of availability.
It wasn't easy.
Well, the results are in -- and it seems to have worked well. High availability is no longer the sole province of high-end arrays. It's now available in midrange arrays as well.
And -- well -- I think that other midtier storage vendors have something to aspire to in the future.
Given the level of effort involved, I don't think we'll be seeing any of them make similar claims (even the RAID 6 guys) in the near future. Or they can try and convince customers that availability isn't important.
Good luck, guys!
Next up -- Six Nines!

Chuck, welcome to the midrange five nines club! EqualLogic customers really appreciate having the five nines gold standard of reliability. The thing I appreciate about this from EMC is that it does more to legitimize mid-range storage for HA and lights out computing.
Posted by: MarcFarley | May 18, 2007 at 11:45 AM
Hi Marc
I went looking on your web site for something (anything!) to back up your claim, or at least explain how EqualLogic might be making measurements in support of your claim.
Alas, I found nothing that speaks to your claim. Maybe I wasn't looking in the right place?
My fear is that many mid-tier vendors will say "sure, me too" and not provide customers the best-practices papers to achieve these results, nor provide actual measured data, nor provide some sort of independent verification.
I hope that doesn't happen, Marc.
Cheers!
Posted by: Chuck Hollis | May 20, 2007 at 02:11 AM