A while back, I shared how the industry analyst firm of IDC viewed the storage market. A surprising number of people were interested in what could ostensibly be categorized as a rather boring tech subject.
As part of the recent EMC investor day, David Goulden shared a very detailed, behind-the-scenes EMC view of the evolving storage marketplace.
It's probably the best segmentation model I've seen to explain the macrodynamics of what's going on right now -- and well into the future.
IDC and the other industry watchers do a nice business in studying the storage landscape, and selling their insight to subscribers. Not to be disrespectful of anyone's work (we do buy their services), but you should keep in mind that -- as EMC -- the majority of our multi-billion dollar business is either storage, or things that relate to storage.
So we really, really care about having good models :)
If your interest in storage technology and market dynamics is more than a passing one, you'll find this model very interesting.
I did.
Don't Write Off Storage Just Yet
If you're looking for spectacularly uninformed opinions, the internet is a great place indeed. Folks who aren't close to the tech tend to think of storage as an undifferentiated commodity of little interest.
Not true.
For one thing, anything done at decent scale becomes intrinsically interesting. Given the massive amounts of data we're collectively generating, reasonable scale has to be assumed, hence interesting.
We're currently going through two important technology transitions: from tape to disk, and from disk to flash. The consumption model is shifting as well: cloud, converged infrastructures, etc.
And as software continues to eat the world (hence software-defined data centers as the next infrastructure model), software-defined storage concepts are now part of the mix.
It's All About Workloads
Storage is not an end unto itself -- it exists solely to support use cases that people care about.
While the core technologies and consumption models are certainly in play, the real action starts by looking farther up the stack at what workloads are out there today, and -- more importantly -- how that map will change over time.
EMC shared a simple quadrant to map out the storage workload discussion.
Capacity at the bottom, performance at the top. High storage service levels (availability, replication, etc.) at the right, minimal service levels required at the left. While there are certainly corner cases to be considered, most of the observed storage universe fits nicely somewhere on this chart.
Upper right quadrant, we have our familiar transactional workloads: extremely high performance, consistent data, and data loss being something to avoid if at all possible.
Navigating to the lower right, we have capacity-oriented use cases -- perhaps a video production company? -- where performance has to be "good enough", but efficient capacity and operations are perhaps more important. Data must be consistent, of course. Availability must be good, data loss is bad.
From there, let's jump to the upper left, with Facebook offered as an example. Here we want consistently good performance (it's an interactive experience you're delivering), and eventual consistency of data is good enough. In the big scheme of things, occasional data loss isn't a major concern -- although annoying.
Finally, let's jump down to the content-oriented web apps -- using YouTube as one example. Performance has to be "good enough", but losing a video here or there won't be the end of the world.
Now, the technically-minded reading this will immediately react with the "hey, what about ... ??" but please stay with me here -- this is an approximation of the world, not the world itself.
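For readers who think better in code, here is a minimal Python sketch of the quadrant. It assumes the two axes can be reduced to yes/no answers, which is a simplification of the slide. The class, field, and function names are made up for illustration, and the four sample workloads are the ones named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    performance_sensitive: bool  # True -> upper half; False -> capacity-oriented lower half
    high_service_levels: bool    # True -> right side (availability, replication, consistency)


def quadrant(w: Workload) -> str:
    """Return the quadrant label for a workload, e.g. 'upper right'."""
    vertical = "upper" if w.performance_sensitive else "lower"
    horizontal = "right" if w.high_service_levels else "left"
    return f"{vertical} {horizontal}"


# The four illustrative workloads from the text (an approximation, not a taxonomy).
EXAMPLES = [
    Workload("transactional database", True, True),
    Workload("video production", False, True),
    Workload("Facebook-style interactive app", True, False),
    Workload("YouTube-style content app", False, False),
]


def place_all(workloads):
    """Map each workload name to its quadrant label."""
    return {w.name: quadrant(w) for w in workloads}


if __name__ == "__main__":
    for name, label in place_all(EXAMPLES).items():
        print(f"{name}: {label}")
```

Real workloads sit somewhere along each axis rather than at the corners, so a more faithful version would use scores instead of booleans.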
EMC In The "Blue Zone"
Now, let's refine this model a bit more.
Here, we've populated the quadrants with more recognizable use cases from the IT world: things like transaction processing, home directories, etc.
In the background, we've drawn the EMC "blue zone" -- places where our familiar storage platforms provide different optimized combinations of performance, data services, capacity, etc. Each of these workload use cases can be characterized, sized, modeled, relocated, combined, etc. as needed.
We do that level of analysis internally on a routine basis, but that's not where we're going with this discussion.
The Cloudification Of Enterprise Workloads
Our default assumption is that -- over time -- the consumption model for *all* these workloads will inexorably shift to either private clouds, or hybridized versions.
The logic is simple: if enterprise workloads are going to any form of cloud consumption model, storage must inevitably follow.
By "follow", I largely mean storage resources and functionality tightly integrated into the cloud operational and consumption model.
Please note: the storage workload requirements don't change simply because they've been "cloudified" or otherwise virtualized: ERP is still ERP, home directories are still home directories, etc. I should also point out that whether that functionality is supplied with purpose-built hardware or with software-defined storage running on more commodity flavors -- the application requirements still don't change.
EMC Storage Platforms
If you take a handful of EMC's popular storage platforms, it's straightforward to position them over groups of requirements.
Please don't read too much into the precise placement of the icons -- it was done by a graphics person -- it's just a notional representation.
First point: when you're designing storage architectures, you're going to have to make some choices. Despite what you might hear, there's no one-size-fits-all approach, except perhaps in vendor fantasyland.
A storage platform that's purpose built around one set of use cases will uniformly trounce a storage platform that's been uncomfortably adapted to the role: performance, cost, functionality, etc.
That's why EMC invests in different storage architectures, including the newer purpose-built backup appliance category via DataDomain.
Second point: overlap is a good thing and to be encouraged.
Although it occasionally drives our customers and sales force a bit bonkers (which one is right for me?), workload profiles are rarely "pure", and the overlap encourages a healthy and productive discussion around requirements.
Besides, having gaps in our product portfolio only encourages the competition to take shots at us.
The reasonable expectation is that each product group inevitably wants to expand their market share, so they'll invest in features that customers want, encouraging even more overlap in the future. At EMC, we're set up to encourage that sort of loosely-coordinated internal competition.
Strong Industry Positioning
We are fortunate in that we have clear market-share leadership positions in scale-out NAS, midrange storage (unified, if you prefer), and -- of course -- high-end enterprise storage.
While you might quibble with our approach, there's no arguing it has served us (and our shareholders) quite well.
David shared a historical view of the IDC numbers around "worldwide external storage" (a very broad category indeed and not cherry-picked).
As you can see, EMC continues to separate itself from the pack, with a heated battle going on for a distant #2 position.
But that's history -- what about the future?
The Future Of Traditional Application Growth
Back to workloads -- the majority of the ones out there today are the familiar SAP, Oracle, Microsoft et al.
They are by far the most numerous, and are expected to grow in number by around 70% through 2016. One could make a nice business, solely focusing on subsets of these workloads, and that would be understandable.
What they need is familiar: the familiar block/file presentations, performance, consistency, no data loss, etc.
The inarguable evidence is that hardware-based protection and resiliency schemes are the strong preference for this use case. Why? Various forms of software-based (and thus hardware-agnostic) storage resiliency have been with us for well over a decade; very little of this has ended up being embraced by customers as opposed to purpose-built hardware-based approaches, and there's no evidence that this will change anytime soon -- at least, for these workloads.
We also have default assumptions on what kinds of clouds these workloads will most likely land on between now and 2016.
By far, the most popular approach for enterprise applications will be some flavor of a private cloud. A few enterprise workloads will inevitably find their way onto public clouds.
And we'll have a decent subset running on the notion of a 'virtual private cloud': one where external resources transparently extend the internal IT model: compatible, controllable, well-integrated, etc.
The motivation behind the primacy of internal private clouds for enterprise apps turns out to be mostly economic -- something I hope to explore deeper in a future blog post.
But What About The "Green Zone"?
But if we want to look at the entire picture, we have to turn our attention to the rapidly growing "green zone" of storage that's not particularly well-covered by EMC (or any other storage vendor) today.
Think "cloud storage", bulk content depots, web-scale apps -- all of that.
While it's true that EMC offers Atmos in this category (basically a distributed object storage stack running on commodity hardware) -- and it's enjoyed some decent success -- the world is changing fast.
We fully expect public cloud capabilities to grow and expand with regard to storage functionality as they try to gain more and more paying customers -- it's inevitable.
To think otherwise would be naive.
Once again, it's all about the application workloads.
Consider the broad category of newer "cloud apps" -- possibly web-scale, using modern frameworks and platforms.
Yes, numerically smaller than the familiar, established category -- but growing much, much faster.
Block and file presentations aren't as interesting here: an object interface for smaller things; HDFS for bigger things. Immediate data consistency isn't a big deal either; eventual consistency is the norm.
More interesting, note the strong preference for resiliency implemented as software vs. purpose-built hardware. Different needs require a different approach.
Newer cloud apps want something very different from storage.
Don't Assume Cloud App Means Public Cloud
So let's filter these newer "cloud apps" against our current best guess as to where they will land.
Certainly, a healthy portion will show up running in public clouds -- that's to be expected.
But our best models currently show that the majority will be running in "owned" (or virtually owned) private clouds.
The motivation -- once again -- turns out to be mostly economic. I promise, I'll share the data in a future post.
There's an important implication here: if enterprises are going to be running a serious portion of these new, fast-growing "cloud apps", they're going to want some new storage technology to go with it: HDFS/object presentations, software-based resiliency, and so on.
Even better if their existing infrastructure could support the new model(s) in a consistent, pooled and shared fashion without needing to stand up yet-another-stack and yet-another operational model :)
Enter A Software-Defined Storage Model For The Enterprise
While we certainly have aspirations around meeting the storage needs of public cloud providers over time, our "sweet spot" is -- of course -- enterprises, whether they're running mostly traditional application workloads, or perhaps a more likely mix of old and new.
Here's the model that was shared -- let's take a quick tour.
On the data plane side of things, we've got two distinct approaches: a traditional purpose-built hardware model (for the applications that require them) and a newer software-only model for the applications that don't.
To be clear, you can't call one approach "better" than the other unless we first have a conversation around workload requirements.
More interesting is the control plane -- abstraction of presentations, consumption portals, orchestration, etc. regardless of what kind of hardware/software combination is providing the underlying services.
Finally, support for perhaps the most important aspect of software-defined anything: complete and open programmability coupled with a rich ecosystem of partners, customers and perhaps competitors building on the abstractions. Couple that with a potential open-source-ish model, and we have a rather complete model for software-defined storage through the EMC enterprise lens.
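To make the control plane / data plane split a bit more concrete, here is a rough Python sketch. It is purely illustrative and assumes nothing about any actual EMC interface: every class, method, and field name is hypothetical, and the routing rule (block/file to purpose-built hardware, object/HDFS to a software-only pool) is a simplification of the workload discussion above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class VolumeRequest:
    name: str
    size_gb: int
    presentation: str               # e.g. "block", "file", "object", "hdfs"
    needs_strict_consistency: bool


class DataPlane(Protocol):
    """What the control plane expects from any data plane (hypothetical interface)."""

    def supports(self, request: VolumeRequest) -> bool: ...
    def provision(self, request: VolumeRequest) -> str: ...


class PurposeBuiltArray:
    """Stands in for a traditional hardware-based platform."""

    def supports(self, request: VolumeRequest) -> bool:
        return request.presentation in {"block", "file"}

    def provision(self, request: VolumeRequest) -> str:
        return f"array:{request.name}:{request.size_gb}GB"


class SoftwareOnlyPool:
    """Stands in for a software-only stack on commodity hardware."""

    def supports(self, request: VolumeRequest) -> bool:
        return (request.presentation in {"object", "hdfs"}
                and not request.needs_strict_consistency)

    def provision(self, request: VolumeRequest) -> str:
        return f"pool:{request.name}:{request.size_gb}GB"


class ControlPlane:
    """One programmable entry point, regardless of what sits underneath."""

    def __init__(self, data_planes):
        self._data_planes = list(data_planes)

    def provision(self, request: VolumeRequest) -> str:
        # First data plane that can satisfy the request wins.
        for plane in self._data_planes:
            if plane.supports(request):
                return plane.provision(request)
        raise LookupError(f"no data plane supports {request.presentation!r}")


if __name__ == "__main__":
    control = ControlPlane([PurposeBuiltArray(), SoftwareOnlyPool()])
    print(control.provision(VolumeRequest("erp", 500, "block", True)))
    print(control.provision(VolumeRequest("media", 2000, "object", False)))
```

The caller only ever talks to `ControlPlane`; whether the request lands on purpose-built hardware or a software-only pool is decided by the workload's requirements, not by the caller.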
A Use Case, Perhaps?
Just to help crystallize the thought for people, David shared an EMC-specific example of a newer cloud-app that brings all of these concepts into play.
It's Syncplicity -- the slick sync-and-share application for enterprises.
Part of what makes Syncplicity so appealing is the ability to collaborate, with control, with entities outside of the organization and thus outside the perimeter of enterprise IT. External cloud services handle that part, but the data itself can reside within the data center's four walls if needed.
The application itself deals with objects, and supports either Isilon or Atmos on the back end. Eventual consistency is the norm (after all, we're talking sync-and-share here), and a hefty part of the resiliency features is implemented as part of the Syncplicity software stack. Do the math, and it's easy to see that these environments can get very large indeed, which calls for a true scale-out architecture vs. one that has been adapted to the role.
There are other potential examples from the Pivotal Hadoop world, but perhaps you can start to see how it all might fit.
New, Performance-Intensive Workloads
Let's go back to our quadrant, but -- this time -- let's just consider the upper-half: the workloads that are extremely performance-sensitive, whether they be enterprise apps or the newer "cloud" apps.
Places where the performance really matters.
The relevant technology, of course, is flash storage in all its forms.
But flash won't be eating the entire storage world for one very important reason: it costs a lot more.
David shared our best internal guess as to what we expected to happen with MLC vs. disk prices through 2016. Unless something seismic happens, we're expecting MLC to be 8x more expensive than fast disks, and 40x more expensive than capacious, slow ones.
If you're using expensive flash storage to get performance, you'll be encouraged to be judicious as to how and where you use it: now, and for the foreseeable future. The $/GB is roughly the same whether those MLC chips go on a PCIe board, or are wrapped in an enterprise SSD.
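A quick back-of-the-envelope sketch shows why being judicious matters. It uses only the 8x and 40x ratios quoted above and normalizes capacity-oriented disk to a relative cost of 1.0 per GB. The tier names and the 5% flash mix are assumptions chosen for illustration, not figures from the presentation.

```python
# Relative $/GB, normalized so capacity-oriented disk = 1.0.
# The 8x and 40x ratios are the ones quoted in the text; the fast-disk
# figure (40 / 8 = 5) is simply derived from them.
MLC_VS_FAST_DISK = 8.0
MLC_VS_CAPACITY_DISK = 40.0

RELATIVE_COST = {
    "capacity_disk": 1.0,
    "fast_disk": MLC_VS_CAPACITY_DISK / MLC_VS_FAST_DISK,
    "mlc_flash": MLC_VS_CAPACITY_DISK,
}


def blended_cost(mix):
    """Relative $/GB for a mix of tiers, given as {tier: fraction of capacity}."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("capacity fractions must sum to 1.0")
    return sum(RELATIVE_COST[tier] * fraction for tier, fraction in mix.items())


if __name__ == "__main__":
    all_flash = blended_cost({"mlc_flash": 1.0})
    # Hypothetical mix: 5% of capacity on flash, the rest on capacity disk.
    hybrid = blended_cost({"mlc_flash": 0.05, "capacity_disk": 0.95})
    print(f"all flash: {all_flash:.2f}x capacity disk")
    print(f"5% flash : {hybrid:.2f}x capacity disk")
```

Under these assumptions, putting 5% of capacity on flash costs roughly 3x an all-capacity-disk design, while going all-flash costs 40x. That gap is the whole argument for using flash sparingly.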
I've covered EMC's flash portfolio at length here if you're interested -- but it's useful to see how it all fits together.
One view is that flash (coupled with intelligent software such as EMC's FAST) moves data ever-closer to the application, making hot data even "hotter".
The goal is simple: use a minimal amount of flash to deliver a maximal performance impact, thanks to either locality of reference, or perhaps segmenting workloads to run on all-flash designs such as the XtremIO array.
But there's another direction to the data flow as well -- and that's "cool" data getting progressively colder, and "aging out" to purpose-built, capacity-optimized, uber-cheap "sea of storage" pools: presumably using software-defined storage stacks against commodity hardware.
Not to point out the obvious, but combine (a) a persistent wide gap between flash prices and disk prices, and (b) exponential storage growth, and it's pretty easy to see why EMC sees the world through this four-segment storage tiering model -- somewhat independent of use case, consumption model, etc.
To the extent that applications or other entities get smart enough to give the storage layer "hints" about what needs performance (and what doesn't) -- well, that would be great.
But we haven't made much progress on that front in the last twenty years -- hence we're forced to assume we have to largely depend on either (a) intelligent software that reacts to observed application behavior, or (b) explicit segmentation of performance-sensitive application data by a human being.
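Option (a) can be illustrated with a toy placement policy. To be clear, this is not how EMC's FAST works; it is a deliberately naive sketch that counts observed accesses per extent over some window, gives the scarce flash slots to the hottest extents, and ages untouched extents out to a cheap tier. The function name, the tier labels, and the notion of an "extent id" are all assumptions made for the example.

```python
from collections import Counter


def plan_tiers(access_log, all_extents, flash_slots):
    """Assign each extent to 'flash', 'disk', or 'archive' from observed accesses.

    access_log  -- iterable of extent ids, one entry per observed I/O in the window
    all_extents -- iterable of every extent id under management
    flash_slots -- how many extents the (scarce) flash tier can hold
    """
    counts = Counter(access_log)
    # Hottest extents first; ties broken by extent id so the plan is deterministic.
    ranked = sorted(counts, key=lambda extent: (-counts[extent], extent))
    hot = set(ranked[:flash_slots])

    plan = {}
    for extent in all_extents:
        if extent in hot:
            plan[extent] = "flash"      # hot data moves closer to the application
        elif counts[extent] > 0:
            plan[extent] = "disk"       # touched, but not hot enough for flash
        else:
            plan[extent] = "archive"    # untouched in the window: ages out
    return plan


if __name__ == "__main__":
    log = ["a", "a", "a", "b", "b", "c"]
    print(plan_tiers(log, ["a", "b", "c", "d"], flash_slots=1))
```

A real implementation would also have to weigh the cost of moving data, avoid thrashing between tiers, and look at more than raw access counts.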
Some Things Never Change
I thought I'd end this post with perhaps the simplest (and most resonant) slide of the day.
When you boil it all down, delivering IT services is all about three things: costs, revenue and risks.
Just like any other business.
We just have to keep figuring out better ways to help our customers do all three at the same time :)

Good write-up as always Chuck. Wonder if you can help me understand some of the numbers in the graphs. It highlights that EMC believes that 86% of existing apps will stay in Private Cloud.
During Pat Gelsinger's presentation, he highlighted that he expected server virtualization to be a $6B market, and public IaaS to be a $14B market.
Can you help me better understand how those two projections connect? On the surface, it seems like they might be somewhat in conflict. Numbers are always complex (and slides are simple), so any depth of understanding is appreciated.
Posted by: Brian Gracely | March 15, 2013 at 03:52 PM
An app can be large or small, thus can consume a variable amount of infrastructure, virtualization, etc. What you're seeing in the first number is a simple count of application workloads -- regardless of size.
The second number is (I believe) an addressable TAM (total available market) given the existing and anticipated products in the combined portfolio.
Bottom line -- trying to connect the two numbers probably can't be done without multiple transformations. I've seen the work that did this, but it's probably too dense for me to blog about -- not to mention being somewhat proprietary.
-- Chuck
Posted by: Chuck Hollis | March 15, 2013 at 04:19 PM