No, this post isn't really about this version of RAID or that version of RAID, it's just a suggestion that -- before too long -- we might use this acronym to refer to a very different (yet intriguingly related) concept.
To Begin With
If all this storage stuff is new to you, don't fret. The idea is simple -- use multiple disk drives that can appear as one: bigger, faster and more reliable. In the storage world, the advent of RAID fundamentally changed the industry.
Interesting note: if you'd like to see an interview with the DG engineers who built the first commercial RAID array (the CLARiiON), check this out. All of them are still at EMC :-)
In the video, the point comes out that the advance was driven by two things: growth in the power of CPUs, and new IP -- in this case, the foundational Berkeley paper on RAID concepts.
Someone had to take the two, and come up with a viable product, so the video is interesting in a "Soul Of A New Machine" kind of way.
Remember this as we dig into this next section.
New Fundamental Enabling Technology
I think many people saw VPLEX as fundamental new enabling technology -- the ability to pool storage (and anything that uses it) at increasing distances. At least, that's what I've been trying to communicate here and here :-)
I make the point several times that this technology has the potential to change how we think about data centers -- how many do we need, what they do, etc.
Put differently, VPLEX and its unique distributed cache coherence technology creates the potential -- over time -- for an entirely new interpretation of RAID.
A Redundant Array of Inexpensive Datacenters
Data Center Thinking Today
Most data center thinking today is predicated on size and scale. The thinking is that -- the bigger the data center, the more efficient it might be.
However, in talking to customers, there are some problems with that sort of approach.
First, we're talking massive capital projects here that consume enormous amounts of money and time. In essence, they are large, risky bets on future requirements.
Second, as scale increases, so do challenges. Land. Power. Security. Environmental impact. Bandwidth. Zoning. Politics. Data centers -- at scale -- create second-order challenges.
Third, you'll usually want at least two for recovery purposes. RAID 1, anyone? :-) Unfortunately, the dominant model is that the second site rarely gets used to its full potential, with a preponderance of expensive resources sitting around waiting for a disaster that hopefully never happens.
Fourth, things change. Business models change, politics change, customer demands change, technology changes, and so on. If anything, change is accelerating -- making those big data center bets even more problematic.
Data Center Thinking Tomorrow?
Imagine that data center resources can be pooled as if they were in the same physical location -- the fact that they may be in different locations isn't a concern of yours. How would that change things?
I can make an argument that data center strategies would fundamentally change in many cases, with a strong preference towards smaller, more nimble data centers -- federated together -- rather than big, humongous ones with their associated challenges.
Rather than a 1 primary + 1 failover approach, we'd see N+1 clusters of much smaller data centers with enough redundancy to handle one, or potentially two, failures -- and recover gracefully.
No need to make massive bets on giant data centers. No external constraints on power, cooling, zoning, network, etc. Better use of all available resources. Faster. More reliable.
And -- ultimately -- more flexible.
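To put rough numbers on the "1 primary + 1 failover" versus N+1 comparison, here is a back-of-the-envelope sketch. It assumes equally sized sites and a workload that can be spread evenly across whichever sites survive, which real applications may not allow. The function names are hypothetical, made up for this illustration.

```python
from fractions import Fraction


def usable_fraction(total_sites: int, tolerated_failures: int) -> Fraction:
    """Share of total built capacity that can carry load in normal operation.

    Assumption: all sites are the same size, and the workload must still fit
    after `tolerated_failures` sites are lost.
    """
    if tolerated_failures < 0 or total_sites <= tolerated_failures:
        raise ValueError("need more sites than tolerated failures")
    return Fraction(total_sites - tolerated_failures, total_sites)


def site_size(workload: int, total_sites: int, tolerated_failures: int) -> Fraction:
    """Capacity each site needs so the surviving sites can carry the workload."""
    if tolerated_failures < 0 or total_sites <= tolerated_failures:
        raise ValueError("need more sites than tolerated failures")
    return Fraction(workload) / (total_sites - tolerated_failures)


if __name__ == "__main__":
    # Classic 1 primary + 1 failover: two sites, survive one failure.
    print("1+1 usable fraction:", usable_fraction(2, 1))
    # Five smaller sites, survive one failure (4+1).
    print("4+1 usable fraction:", usable_fraction(5, 1))
    # Six smaller sites, survive two failures (4+2).
    print("4+2 usable fraction:", usable_fraction(6, 2))
    # Per-site size for a workload of 100 units in the 4+1 case.
    print("4+1 site size for 100 units:", site_size(100, 5, 1))
```

Under these assumptions, the two-site model leaves half of the built capacity idle, while a 4+1 arrangement idles only a fifth of it and each site is a quarter the size of the workload. It says nothing about the extra staffing and network costs of running more sites.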
Is This A Pipe Dream?
Yes ... and no.
Pat Gelsinger and Brian Gallagher were pretty clear about the VPLEX roadmap at EMC World. If you believe them and their ability to deliver (and I do), it'll be here as promised. And if you saw the teleportation demos, you know what's coming down the road.
Metropolitan bandwidth prices have come way down in many markets, making the required network connectivity more attractive than it once was -- and I'd expect that trend to continue.
Not to mention that Atmos has been doing geographically distributed parity protection for a while :-)
So, What Do You Think?
If the technology lives up to the promise, and bandwidth prices continue to fall, will we see a preference for redundant arrays of inexpensive datacenters? Or will we continue to build larger and larger "single site" approaches?
It's an interesting discussion, to be sure :-)

This really rather depends on the size of the business, although there is a certain amount of truth in what you say about the current RAID-1 nature of many data centres today. But you have to also examine whether building smaller distributed data centres is actually any more economically viable than building mega-data centres. Are there economies of scale involved? For example, I may need more staff to manage multiple data centres; more security guards, more cabling guys, more WAN infrastructure?
You talk about metropolitan bandwidth, but many of us distribute our data centres on more than a metropolitan basis, and long-distance bandwidth is still costly, even for those of us who own our own fibre.
Now the solution for a lot of businesses may be just to move their infrastructure into the public cloud and let the likes of Amazon build the mega-data centres in various parts of the world; let them deal with redundancy problems?
Posted by: Martin G | May 15, 2010 at 12:10 PM
All good observations.
I've noticed that -- here in the USA -- there are many major cities that have an interesting combination of (a) cheap metro bandwidth and (b) a glut of mid-sized data centers looking for occupants (due to a variety of reasons), creating an interesting supply-and-demand situation.
Can't say whether or not this is widespread, or whether it will continue. And, as you say, long-distance is still dear, so that problem still remains.
Your point about midsized businesses is right -- they will benefit from convenient access to "big IT" (scale, process, functionality, etc.) in bite-sized chunks.
And even the likes of Amazon et al. will have to deal with geographically dispersed scale :-)
-- Chuck
Posted by: Chuck Hollis | May 15, 2010 at 12:19 PM
Sorry, one more thought ...
If you listen to the video, note the part when Mark Lippett recalls how they were staring at a rack of 18" Artis disk drives, and how all the load would peg one of them.
Clearly, the answer wasn't "bigger disk drives" at the time, it was making multiple smaller ones work better than a single big one.
Same thinking here?
-- Chuck
Posted by: Chuck Hollis | May 15, 2010 at 12:29 PM
Hi Chuck,
It was a kick to see our video in your blog today. There are still a few old timers from Data General’s Disk Drive Development team at EMC, so here's a couple of typos and a comment:
The stack of disk drives revealing the operating system’s actuator load imbalances was Argus. It was DG’s 14 inch drive. The success of Argus funded CLARiiON's birthplace.
Redundant Arrays of Inexpensive DataCenters brings to mind the old debate over the "I." Should it be Inexpensive or Independent?
In the same way Steve Todd can sell the electrical power of his roof mounted solar panels back to the grid, mid-sized regional data centers owned by Independent businesses could supply excess computing capacity, denominated in Virtual-Machine-units, to a VM market maker. This would result in Redundant Arrays of Independent DataCenters. The economics of large versus small scale becomes less of a factor, while the ability to federate remains an enabler.
Mark Lippitt
Posted by: Mark Lippitt | May 17, 2010 at 08:54 AM
Hi Mark -- thanks for the corrections.
And, upon reflection, there's merit in having data centers both "inexpensive" *and* "independent".
-- Chuck
Posted by: Chuck Hollis | May 17, 2010 at 10:44 AM
I've been working with metro/geo-clusters for over a decade now. The biggest stumbling block (after politics and conservatism) is the applications. Most applications are not designed for federated environments. E.g. a couple of years ago a national retailer was running analyses against POS data on a central massive database rather than running various aggregations in stores, regional offices, etc. Banks and insurance companies are similar, running processes against millions of accounts daily.
Sadly, software is still a long way from catching up with the hardware. Once that happens we'll still need to handle the politics and inbred conservatism.
Posted by: Joe Svankanski | May 19, 2010 at 01:00 AM
Chuck,
A little credit for coming up with this new definition for you would have been nice....
Rick Parker
Posted by: Rick Parker | May 21, 2010 at 05:47 PM
Hi Rick
Sorry, I didn't mean to withhold credit, but I've now heard this from a few people, and can't remember who I heard it first from.
Anyway, the credit is yours if you want it ...
-- Chuck
Posted by: Chuck Hollis | May 21, 2010 at 08:45 PM