A great debate is shaping up in our industry -- one that everyone should take note of.
I saw with much interest Forrester's recent white paper "Do You Really Need A SAN Anymore?", with Andrew Reichman as the lead author. It attempts to make a case that SANs (really shared storage) are a dead concept that needs to be replaced with something better.
To the casual reader, it appears well written and well reasoned. To people hip-deep in this stuff (like me) I think it's dangerously wrong. And I have a sneaking suspicion that we'll see copies of this being dropped off on desktops around the world before long.
I'll leave it up to you as to whether it's worth the $279.
My suggestion?
Bookmark this longish post -- because, at some point in the future, you may find yourself defending your storage strategy to someone who's read this paper.
Before We Get Started
You should know that typically I find Forrester's work quite good on many topics. Frankly, this piece of work isn't up to the same standards as some of their other work, IMHO.
I of course have a certain bias and a vested interest as an employee of EMC. But, putting that aside, there's some thinking here that's just dead wrong and can get you in trouble.
Finally, there's a nucleus of a fascinating discussion here, but the authors shoot too low, and don't get to the heart of the matter. I'll try to aim a bit higher towards the end of this post.
The White Paper Part I -- The Promise
The white paper starts with Forrester's view of the four general benefits ascribed to SANs:
1 -- Lower Hardware Cost through Improved Utilization
The authors state that one of the marketed benefits of SANs (or shared storage) is saving money. They're right, to a certain degree.
Back when EMC started selling shared storage devices in the mid 1990s, storage was well over $1 per MB. If you could share it, you'd save money. Back then, most of the focus was on sharing individual drives between applications, something that isn't usually done these days.
Speaking as a vendor, though, there were other parts of the "shared" pitch:
- the ability to manage a pool and move storage around if needed.
- the benefits of an unallocated pool for the inevitable rush jobs that came up.
- the ability to upgrade storage technology independently of upgrading the server.
2 -- Less Complexity Through Consistent Storage Management
The authors do have a valid point, but it's an obsolete one.
Back then, we had a situation where every server, every OS and every database was attempting to manage its own storage in a DAS world. There was a certain logic to using a centralized array to do things consistently, e.g. provision, protect, performance, etc. -- more device management than the broader topic of storage and information management.
As things got more complex, though, it got harder and harder for vendors to deliver storage management frameworks that kept things under control -- a situation that persists to this day.
3 -- Better Performance Through Increased Spindle Count
Whoops!
I don't know where they got this one -- I've never really heard this used with customers, simply because you could usually get a decent number of spindles behind a server. Theoretically, yes, you could get more spindles behind an app using an external array, but -- in reality -- this wasn't a big deal.
More to the point, we did find that nonvolatile cache, intelligent scheduling algorithms, flexible RAID choices and multipathing I/O did a lot to help performance, but not so much the "more spindles" argument. Veritas Volume Manager (or whatever volume manager was at hand) always did a good job striping with internal storage, or a direct-attached storage enclosure.
At least, that's true for EMC. Can't speak for other vendors, though.
4 -- Improved Data Protection Through Array-Based Replication And Backup
Yep, that's always been part of the pitch.
Even in a world with server-based replication, database-based replication, etc. -- there's always been a strong preference to put the heavy replication -- whether local or remote -- in the array itself.
Usually, that's for three reasons: (1) you can use the horsepower of the array to move data, rather than the application server, (2) there'd be a consistent way of copying data across diverse applications, servers, operating systems, etc. and (3) array-based replication is usually more mature and feature-rich than other flavors.
So far, so good.
The White Paper Part II -- The Indictment
The authors then make four arguments as to why SANs have failed to live up to their promise.
1 -- Low Aggregate Utilization Means High Cost Of Infrastructure
The authors toss out a 20% to 40% "typical" utilization number, comparing "amount purchased" to "amount written".
A couple of thoughts spring to mind.
I don't know if the authors are familiar with all the layers of overhead between raw capacity and what user applications see. I would point out that these overheads are pretty much the same regardless of whether we're talking about internal storage, direct-attached storage, or SANs.
For example, a RAID 1 overhead is a RAID 1 overhead, period. Go see Chris Evans's interesting charts here.
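To make the overhead point concrete, here is a minimal sketch of how the layers between raw and usable capacity feed into a "written vs. purchased" number. Only the 50% RAID 1 efficiency is a hard fact; the spare and file system overheads, the 100 TB purchase and the function names are all hypothetical, chosen purely for illustration. Note that nothing in the arithmetic depends on whether the disks are internal, direct-attached or on a SAN.

```python
def usable_fraction(raid_efficiency, spare_fraction=0.0, fs_overhead=0.0):
    """Fraction of raw capacity left after each layer of overhead.

    raid_efficiency: usable share after RAID (0.5 for RAID 1 mirroring)
    spare_fraction:  share of capacity held back as hot spares (assumed)
    fs_overhead:     share lost to file system / volume metadata (assumed)
    """
    return raid_efficiency * (1.0 - spare_fraction) * (1.0 - fs_overhead)


def utilization(written_tb, purchased_raw_tb):
    """The paper's style of metric: amount written over amount purchased."""
    return written_tb / purchased_raw_tb


# Hypothetical example: 100 TB raw, mirrored, with illustrative overheads.
RAID1_EFFICIENCY = 0.5
raw_tb = 100.0
usable_tb = raw_tb * usable_fraction(RAID1_EFFICIENCY,
                                     spare_fraction=0.05,
                                     fs_overhead=0.10)

# A shop that fills 80% of what it can actually use...
written_tb = 0.8 * usable_tb

# ...still looks poorly utilized when measured against raw capacity.
print(f"usable: {usable_tb:.2f} TB")
print(f"written vs. purchased: {utilization(written_tb, raw_tb):.0%}")
```

Under these assumed numbers, a shop running at 80% of its usable capacity shows up at roughly a third of raw capacity, which lands squarely inside the "typical" 20% to 40% range without saying anything about the attachment model.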
What really bothers me, though, is the inherent logical flaw: it speaks more to how the technology is managed and used, rather than the technology itself. It's like saying "email is worthless because so much junk comes into my inbox".
Here's what we know: many EMC customers run north of 80% utilization.
I'd venture to say that, because these customers manage storage well, they'd probably do mostly the same if it was internal storage, or DAS, or whatever. I'd also go as far as to say that they'd probably appreciate the ability of a SAN to move capacity around as needed, since it helps support those high utilization rates.
Not only is this a flaw in their logic, it demonstrates a certain lack of understanding about how people actually use storage in the real world.
2 -- Limited Workload Sharing Creates Application Islands
The authors point to a reluctance of many IT shops to share multiple applications on the same array (not SAN!).
And they also point to the project-oriented funding model of many organizations where every project gets its own infrastructure.
They then go a bit farther, and state that array vendors are too restrictive in allowing users to mix workloads, and that quality-of-service tools are too clunky to be of any practical value.
This line of reasoning appears pretty shaky to me.
While it is true that some IT shops have a per-project mentality for acquiring storage arrays, that argument has absolutely nothing to do with the underlying merits of the technology. And, I must say, the authors are confusing "shared storage arrays" with "SANs".
One could have a SAN with individual per-project or per-application arrays, if one desired. Or one could consolidate multiple workloads into a single array, and not have a SAN at all. Or -- more frequently -- a combination of both.
But let's focus in on the "can't share workloads" claim for the moment.
At EMC we *expect* people to mix workloads on our arrays (and our SANs!) -- that's what they were *designed* to do. I mean, think about it, not too many people are going to fork over for a big honkin' DMX or CX and then only put a single application on it. Now, for certain ginormous applications, people will buy one or more arrays and dedicate them to a single application, but we're talking some pretty specialized environments here.
I think we all need to make a clear distinction between *being able* to easily share workloads, and *choosing* to share workloads. And, once again, this is a discussion about "shared storage arrays", rather than SANs specifically.
Now, I am aware that other storage vendors offer different classes of technologies, and -- in some cases -- it might make sense to limit a single application to an array.
But as an indictment of an entire class of technology? Well, it's just way off base. It just doesn't map to the real world -- at least, the world I live in.
Besides confusing a few concepts, the real problem here is more political and organizational, rather than technological, I'd offer.
3 -- Vendor Heterogeneity Limits Compatibility
OK, score one for them. They're right. It's not relevant to this particular discussion, though.
Although all of us storage vendors can live well on the same SAN (with a bit of careful forethought), we all manage very differently. While we vendors can do a half-decent job of upper-level reporting tools (topologies, inventories, utilization, etc.) in a heterogeneous environment, the underlying devices are configured and provisioned very differently.
Making matters more problematic is that all of us storage vendors have unique features that are unlike the others -- making 100% consistent management virtually impossible.
However, I see this as an indictment of "mixing and matching multiple array products in the same shared environment", if you think about it.
As a matter of fact, this is also true when you mix and match different servers and operating systems, different database management systems, different networking gear -- I think it's the nature of infrastructure, folks.
Mix in a bunch of stuff from different vendors, and you can usually make the same sort of statements.
Storage infrastructure isn't that much different, when you think about it. Hardly a worthy indictment of the entire SAN category, in my opinion.
And, when you get to their proposed solution, this problem just moves to a different domain -- it doesn't go away.
4 -- Block Storage Devices Have Very Little Context About Information
Score one for them again -- sort of. The real problem is that nothing has good context about information.
This argument is brought up in the context of storage tiering and ILM, and the ability to match storage service levels with application requirements.
And, for most use cases, they're right. I mean, EMC does have some dynamic optimization capabilities that are driven entirely off of data usage patterns (Symmetrix Optimizer and Rainfinity's file archiving capabilities come to mind), but it's not a generic capability, and only covers certain use cases.
Later on, the authors will argue that applications inherently have better context, but I will argue that -- when you really look at it closely -- they are subject to many of the same limitations as storage arrays, and are perhaps worse off in some regards.
The White Paper Part III -- The Proposed Alternative
The new approach they propose really isn't all that new -- it's commodity storage managed by the application.
This line of thinking is being driven in the industry by -- surprise -- the application vendors, and to a certain extent, some of the server vendors.
So we know where the battle lines are -- but how do the arguments stack up?
First, if you have a copy of their report, take a look at the "quick comparison chart" on page 5. This is where consultants become downright dangerous, in my humble opinion. It is a scary oversimplification with all sorts of sharp edges.
If you can't see the chart, it's organized left-to-right, comparing internal and/or DAS storage, networked storage and application-oriented storage. And of course, the message is "bad, bad, good". Scary stuff.
Some of the arguments they present are worthy of discussion, though.
1 -- It will make it easier to design, manage and deliver service levels.
Really? How might that be?
The argument behind this is the claim that arrays "struggle to prioritize important data" and "a block is a block is a block". Although not accurate, it's really an argument against shared storage arrays, and not SANs specifically, if you think about it.
More insidious is the claim that the application owner knows best what's important in delivering service levels, and is much better positioned to manage the end-to-end experience than, say, a storage admin.
I don't know about you, but most application people I've met are not particularly adept when it comes to storage performance characteristics and their relationship to application performance. Storage admins know, for example, that you don't put transaction logs and data stores on the same spindles.
I think that they're proposing an entirely new class of application administrators that have advanced knowledge of storage design.
But there's more to the discussion that's worthy of exposing.
By going to internal storage (or something similar), you lose all of the array-specific service level delivery features. You lose nonvolatile cache, you lose intelligent scheduling algorithms, you lose multipathing I/O, you lose access to the ability to upgrade your storage technology to the latest-greatest without touching your server environment.
Now, one could argue that (a) none of these things are needed, or (b) the proposed internal storage model has reasonable alternatives, but you'll have to admit there's a lot you're taking away with their proposed approach.
I should point out that any service level management magic that an application might offer will work equally well for internal storage as well as SAN. As a matter of fact, I could create a case that it would work better in a shared storage environment than just using internal storage.
But, if you really think about it, this isn't an indictment of SANs and shared storage; the author is just sharing an opinion on who's best positioned to control and manage service levels for an application -- and that discussion is somewhat independent of the underlying technology, isn't it?
2 -- Applications are more likely to have success with tiering than with storage systems.
There is a bit of merit to this discussion, as we're just starting to see the very first attempts by application vendors to do a bit of smart tiering with the data they own. We've started to hear that application vendors consider this sort of thing of growing importance, which is good.
Because any help is good help, right?
But, let's do a bit of deconstruction here, shall we?
First, any storage device can be used to expose the different service levels that an application might want -- internal storage, DAS, SAN or a shared storage array. I'd argue that you'd get a far broader set of choices with external networked storage, though.
Second, this is all a relatively new discussion by the application vendors. Like everything else in this industry, we'll have to see it in practice to evaluate whether it's useful or not.
But the real issue is that people use applications in unpredictable ways. Do I really need tiering in my test and dev environment? Does my application understand that it's the end of the quarter, and more performance is needed? Or that this particular application is owned by a group that doesn't have any funding, so they get the cheap kit?
Yes, even applications will have their limits to do effective storage tiering. They can help, but they won't solve the problem.
And whether they use internal storage or external storage to do this is largely irrelevant.
3 -- The application-centric model can significantly reduce acquisition costs
Ahhhh ... the siren song of cheap(er) kit -- always a popular theme in our industry.
At a high level, we've done more than a few side-by-side capex/opex analyses for these environments. Once in a while, they're cheaper to acquire. Most of the time, though, the costs show up in other places, and are somewhat larger. And, once you factor in opex, we rarely see them beat a traditional shared storage approach.
So, why is that?
Let's use the popular examples shared in the paper: Microsoft Exchange 2007 using DAS, and Oracle/HP's new Exadata data warehousing "appliance".
If you were to do side-by-side compares of, say, two different ways of doing Microsoft Exchange, you'd notice that the DAS approach has far more servers, and far more copies of data.
The shared-nothing approach (e.g. DAS) means that if a key server fails, you can't get to the data it's storing, so you need another copy of data (and its storage) attached to another server. Plus, all those servers are continually busy making copies amongst themselves to keep everything updated. More servers, more licenses -- and more storage, due to the need to keep redundant copies around.
You've just moved the problem to another place -- at usually more cost. But, if you're in the business of either selling servers, or application software licenses, there's a certain attractiveness to the argument.
The problem is even more pronounced on Oracle's Exadata box. One server can support only 12 disks, and only half of them are usable. In a big DW environment, that's a lot of servers, and half your storage is wasted for redundancy purposes.
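As a rough back-of-the-envelope sketch of why the server count climbs so quickly, the arithmetic looks something like this. The 12 disks per server and the 50% usable figure come from the discussion above; the disk size, the capacity target and the function names are hypothetical and only there to make the numbers tangible.

```python
import math

DISKS_PER_SERVER = 12   # per-server disk limit cited above
USABLE_FRACTION = 0.5   # only half the disks are usable, per the text


def servers_needed(usable_tb_required, disk_tb):
    """Servers required to reach a usable-capacity target.

    disk_tb is an assumed per-disk capacity, not a figure from the post.
    """
    usable_per_server = DISKS_PER_SERVER * disk_tb * USABLE_FRACTION
    return math.ceil(usable_tb_required / usable_per_server)


def raw_tb_purchased(servers, disk_tb):
    """Raw capacity that ends up being bought along with those servers."""
    return servers * DISKS_PER_SERVER * disk_tb


# Hypothetical data warehouse: 100 TB usable on 1 TB disks.
servers = servers_needed(100, disk_tb=1.0)
print(servers, "servers,", raw_tb_purchased(servers, disk_tb=1.0), "TB raw")
```

With these assumed inputs, 100 TB of usable capacity takes 17 servers and 204 TB of raw disk. Capacity can only grow in whole-server increments, so the server count scales with the data rather than with the compute actually needed.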
I found it interesting that the authors did not bring up the subject of operational costs -- power, cooling, floorspace, administration, backup and recovery, migrations, etc. etc. I can only imagine why.
4 -- Managing storage from the application can cut out the middleman.
The logic presented here is a bit convoluted, so I need to quote it in its entirety to do it justice:
"The politics in many firms favors the application teams over the server teams, so as application offers more capabilities, the money is likely to flow in that direction, favoring trusted application vendors over storage vendors, who have built something of a reputation for being predatory.
What’s more, the relationship between storage teams and application teams is characterized by limited communication, and at times is downright hostile. This often leads to provisioning taking too long and application teams asking for more than they need to buffer against future delays. Eliminating the game of telephone between the teams and putting more control into the hands of those who know the application best will likely result in more responsible resource utilization and faster changes to the environment."
There's a lot here to dissect, isn't there?
First, in this case, the "middleman" adds value -- the storage admin knows how to configure the resources to do what the business needs. The application people, in most cases, don't.
There's also the question on who's keeping an eye on aggregate storage expenditures and overall architecture -- it won't be the application teams, will it?
Having seen the interactions between application teams and storage teams, yes -- there's frustration. The storage guys are frustrated that the application guys don't get it, and the application guys are frustrated that the storage guys are making things so difficult for them. That conflict is there for a good reason.
Note the characterization of application vendors as "trusted" and storage vendors as "predatory". From my point of view, all of us vendors are predators, including maybe certain analysts at Forrester?
The White Paper Part IV -- The Application Vendors Are Getting Busy
The authors talk about Microsoft Exchange 2007 first, and point to the fact that Microsoft is pushing this environment. This is quite true -- they've been doing this for a while. The authors point to CCR as a replication technology that doesn't need an array.
First, Microsoft officially supports both DAS and SAN architectures, and will most likely do so for the foreseeable future. They've offered up a new choice, though, which should be compared with other choices.
CCR is an excellent example -- it meets the needs for most Exchange environments -- but not all of them. And, if you have more than one application that needs business continuity, you could end up living in an interesting world where you have a half-dozen or more different remote replication schemes to protect your business, instead of a common way of recovering all of them.
As far as generic storage management capabilities, yeah, there's some useful stuff there that would probably do OK for most moderate environments, but nothing like the sophistication you'd find with external SAN shared storage. It's an option, though. And, of course, Exchange's storage management capabilities work with both internal and external storage.
Next up is the Oracle Exadata example. From EMC's perspective, pointing at ASM as an example of a storage management capability is probably not the best industry example to serve up. And when you start probing around at topics like backup, archiving, remote replication and other storage capabilities, there's a pretty thin story.
Besides, we're still waiting for real-world experience on the new gizmo -- it's still early days.
The authors then point to VMware as putting more features into their environment.
Bad example, folks.
To use any of VMware's more powerful features (DRS, HA, etc.), you need, well, external storage. And let's not forget VMware is doing a great job of orchestrating features found in arrays today. I'd point to the wonderful SRM replication management environment, VCB and a few others.
Simply put, I don't think VMware would agree with Forrester's characterization in any regard.
The White Paper Part V -- The Storage Vendors Fight Back
The authors do point out that the storage specialists offer specific application integration, superior performance and functionality, advanced availability features and bring useful expertise to the table.
Whew! I'm glad we're not totally irrelevant yet.
The authors then take an interesting turn by pointing to the server vendors as beneficiaries of this "trend" by offering stripped-down RAID boxes that fit neatly behind a single server. Ummm ... EMC and others offer this sort of thing as well, and position it where it makes sense, so it's not just the server vendors that understand cute storage bricks.
There's a bit of hand-waving around "storage blades", which -- frankly speaking -- don't really bring anything new to the discussion, architecturally speaking.
The White Paper Part VI -- Recommendations
The overall recommendation? Challenge conventional wisdom on SANs!
The first suggestion is to build islands around applications you care about, especially in larger shops.
I don't think that's a new piece of advice -- customers routinely talk about storage pools used for mainframes, SAP, Exchange, etc. The authors are trying to tell people not to build one honkin' pooled and shared fabric in a large enterprise. I don't think most larger shops even consider that.
But -- once again -- this has nothing to do with an indictment of SANs, or shared storage, does it?
The second suggestion is that size matters.
Smaller shops should consider shared, converged storage platforms, but not larger ones. Yes, but there are places where larger shops could benefit from this approach as well, for example tiering within a larger array, or combining multiple apps in a single array, or common replication backbones.
The third suggestion is that infrastructure can be an inflection point for application consolidation.
The authors suggest that if you're using infrastructure to make your applications work better, you might want to think about rewriting your applications.
I won't even touch this one.
General Thoughts And Future Discussions
So, I'd be interested in the storage community's take on this whole line of thinking.
To me, storage is infrastructure. And shared infrastructure is usually better than infrastructure that's not. We could argue about the right way of sharing it, but -- on this basis alone -- there's a very flawed argument being proposed here.
Even if you might agree that it's a poorly thought-out line of reasoning, make no mistake -- there's a power play going on here in the industry, and this is probably the first salvo of many. So be prepared!
The real architectural question -- for me -- is the broader question of orchestration of the end-to-end environment: service levels for applications and the information they use.
Can applications orchestrate themselves, or is something else needed?
Does an application know that it's just a link in a longer chain? Or that what it does is very important right now, or maybe not so much later? And in a world of multiple applications demanding resources (server, network, storage), who mediates?
And, more importantly, does an application understand budget constraints? :-)
That I'd like to save for a subsequent post ...
Your comments are always welcome -- especially on industry topics like this!
