You are forgiven if your head hurts from time to time.
It might seem that we in the storage business are intentionally complicating things. Not so -- the reality is that there are simply so many different use cases to optimize for. Add to that the inherent noise generated by the current crop of startups, and it can all get distracting.
I'm going to make matters worse.
I believe the substantial value in storage technology will move up the stack -- away from simply persisting data, and to an entirely different layer: storage data services -- snaps, caching, dedupe, encryption, etc.
Up to this point, we've mostly thought of physical storage arrays as doing both: persisting data as well as providing important data services. You couldn't have one without the other.
But in the emerging software-defined storage world, these can easily be considered separate functions that can work independently of each other.
And that's where it gets interesting.
What Brings This About
You probably know I work at VMware, mostly focusing on software-defined storage. My core belief is simple: much as virtualization has permanently changed the way we think about servers, it will inevitably do the same for networking and storage.
That's not just me saluting SDDC (software-defined data center) concepts -- I see it as inevitable.
Re-envisioning hardware functionality via virtualization leads down two very complementary paths.
The first path is simply recreating the familiar construct as a virtual entity. Here's a physical server; here's what looks like a physical server emulated in software. Retaining that familiar paradigm lessens disruption and eases adoption.
And we should expect the same in the storage world: here's a physical storage array; here's what looks like a physical storage array emulated in software.
The second path is more interesting: virtualization creates the potential to re-factor familiar functionality in entirely new and useful ways. As an example, the notion of a vMotion is impossible to consider in the physical server world, but quite achievable in the virtual server world.
You end up getting the best of both worlds: familiar functionality done better, as well as useful things you couldn't even begin to consider before. True in the server world; also very likely true in the storage world.
And one of the areas most likely to be impacted is data services.
Arrays Do More Than Just Store Data
Anyone who's spent time with a modern storage array knows that it provides a wealth of value-added services beyond just storing your data. For starters, there's a seemingly infinite variety of snaps -- quick local copies of data. Dozens of flavors of remote replication. Even remote snaps if you'd like.
We're just getting started. Consider all the performance-optimizing data services: caching, tiering, QoS and more. Efficiency services like deduplication, compression and thin provisioning. Security and compliance services like encryption, audit trails and anti-virus for your files.
Now, let's really stretch out. Cloud gateways. Alternate data presentations. Geo-optimization of data. Distributed caching. Metadata management. Application pipelining. And much more ...
It doesn't take long to realize that -- in many aspects -- the data services associated with storage can be much more interesting and valuable than simply how you store the data.
How Software-Defined Storage Changes Data Services
The value of data services is not lost on array vendors. If you've ever sat through a storage product presentation, you'll notice that many of the slides describe all the cool and powerful data services provided by the array's software. As one small example, just look at all the different ways people do snaps and remote replication.
In the emerging SDS model, data services can migrate out of the array, and run on the server: either encapsulated within a virtual machine, or within the hypervisor directly. How does this new architectural model change things? Many different ways it turns out -- all for the better.
The first -- and most obvious -- benefit is that data services can be used largely independently of what's being used to actually store the data. Imagine a snap that works uniformly wherever it's used. Or a remote replication service that's completely agnostic to the back end. Today, if you use three different kinds of storage arrays, you'll likely have three different kinds of snaps -- as well as three different ways to manage them. Not ideal.
The promise of SDS-based data services is simple: a consistent way of doing things, completely abstracted from the underlying hardware. Just like it is with server virtualization.
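To make that abstraction a little more concrete, here's a minimal sketch of what a backend-agnostic snap service could look like. Everything in it (the class names, the method signature, the two pretend vendors) is hypothetical; the point is simply that the caller sees one interface no matter what sits underneath.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: one snapshot abstraction, many backends.
# None of these class or method names come from a real product.

class SnapshotService(ABC):
    """A single snap interface, independent of what actually stores the bits."""

    @abstractmethod
    def create_snapshot(self, volume_id: str, name: str) -> str:
        """Return an opaque snapshot ID."""

class ArrayVendorA(SnapshotService):
    def create_snapshot(self, volume_id: str, name: str) -> str:
        # Translate the common request into vendor A's native snap mechanism.
        return f"vendorA::{volume_id}::{name}"

class ArrayVendorB(SnapshotService):
    def create_snapshot(self, volume_id: str, name: str) -> str:
        # Vendor B snaps differently under the covers; the caller never notices.
        return f"vendorB::{volume_id}::{name}"

def nightly_snaps(backends: list[SnapshotService], volume_id: str) -> list[str]:
    # One workflow and one management model, regardless of the hardware behind it.
    return [backend.create_snapshot(volume_id, "nightly") for backend in backends]

print(nightly_snaps([ArrayVendorA(), ArrayVendorB()], "vol-042"))
```

Add a third backend and the workflow above doesn't change -- which is exactly the consistency the abstraction is after.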
The second impact is very pragmatic -- and that's resource pooling. Many of these data services are potentially heavy consumers of CPU and memory. When they're implemented as part of the array, those additional resources have to be sized and acquired at array purchase time: bigger controllers, etc.
When data services move to an SDS model, their underlying resources can be dynamically invoked, just as with any other workload. Need a lot of temporary horsepower to dedupe all that data you just ingested? How about a burst of processing power for a resync? More cache for that workload?
The benefit of SDS-based services here is just the same as it is for application workloads: resources are allocated when and where you need them; no need to overprovision.
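As a thought experiment (only that: the pool, the numbers and the dedupe job below are all made up), pooling might look something like this:

```python
# Intentionally simplified: a shared pool that data-service jobs borrow from on
# demand, instead of sizing array controllers for the worst case up front.

class ResourcePool:
    def __init__(self, cores: int, ram_gb: int):
        self.free_cores = cores
        self.free_ram_gb = ram_gb

    def allocate(self, cores: int, ram_gb: int) -> bool:
        if cores <= self.free_cores and ram_gb <= self.free_ram_gb:
            self.free_cores -= cores
            self.free_ram_gb -= ram_gb
            return True
        return False  # a real scheduler would queue or preempt here

    def release(self, cores: int, ram_gb: int) -> None:
        self.free_cores += cores
        self.free_ram_gb += ram_gb

def run_dedupe_pass() -> None:
    ...  # stand-in for the actual post-ingest dedupe work

pool = ResourcePool(cores=64, ram_gb=512)

# Borrow a burst of horsepower for the dedupe pass, then hand it back.
if pool.allocate(cores=16, ram_gb=64):
    run_dedupe_pass()
    pool.release(cores=16, ram_gb=64)
```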
The third benefit is rather subtle, but very important nonetheless -- it's application alignment. The vast majority of array-based data services are provided on historical array container boundaries (e.g. LUNs, filesystems, etc.) vs. being precisely aligned around the application components that need the service.
As a pragmatic example, you often end up replicating (or snapping) big containers vs. just the objects you're interested in.
Not only is the lack of application alignment inefficient, it also introduces all manner of complexity into design and operations. In an idealized SDS world, you'd simply point at the application components you're interested in, provision the exact data services desired, and be done with it. No need to understand which arrays and LUNs do what, etc.
Fourth, there's potentially an enormous benefit on the management and operations side of the equation. Data services can ideally be composed into policies, provisioned at the same time as the application itself, pushed down to the storage layer, and dynamically invoked using pooled resources.
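A rough sketch of that idea -- with entirely hypothetical policy fields and a pretend provisioning call -- just to show services composed into a policy that travels with the application rather than with a LUN:

```python
from dataclasses import dataclass

# Hypothetical policy model: data services composed into a named policy
# that is attached to application objects (VMs, virtual disks), not LUNs.

@dataclass(frozen=True)
class StoragePolicy:
    name: str
    snapshot_every_minutes: int
    replicate_to: str  # empty string means "no remote replication"
    dedupe: bool
    encrypt: bool

def provision_application(vms: list[str], policy: StoragePolicy) -> None:
    for vm in vms:
        # The same request that creates the VM pushes its data-service policy
        # down to whatever layer will actually deliver those services.
        print(f"create {vm} with storage policy '{policy.name}'")

gold = StoragePolicy(
    name="gold",
    snapshot_every_minutes=60,
    replicate_to="dr-site-west",
    dedupe=True,
    encrypt=True,
)

provision_application(["erp-app-01", "erp-db-01"], gold)
```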
Finally, there's a really big one: the ability to quickly and easily change the set of data services being provided without disrupting the application, disturbing the infrastructure, etc. And that can potentially be a huge boon in the real world of enterprise IT.
Towards A Future Model?
Let's take a moment to envision what this world will look like before long.
Imagine you're in charge of provisioning new application requests. A new request comes in from the sales operations group. For most of the sales quarter, their needs aren't overly demanding: decent performance, can't be down for more than an hour, RPO of two hours, dedupe please, etc.
But for the last two weeks of the quarter, the profile changes dramatically: 4x performance required, can't be down for more than a minute or two, RPO of five minutes, logically consistent checkpoint every minute, hot failover to a second site if needed, turn off the dedupe if it impacts performance, cache like crazy, etc.
Why? Any downtime, poor performance or data loss translates directly into lost revenue.
In today's hardware-defined world, you'd have to design and size for the peak. You'd take a look at the requirements, and probably be tempted to custom-build an environment just for this one application.
But in tomorrow's software-defined world, you'd simply change the policy for that one application -- resulting in the configuration of additional data services dynamically, using shared resources from the (hopefully adequate) pool.
And then simply dial it all back -- once the storm had passed.
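Sketched in code (hypothetical names and a pretend control-plane call, not any real VMware interface), that quarter-end shift becomes little more than swapping which policy is attached to the application:

```python
# Hypothetical policies mirroring the scenario above; the numbers come from the
# story's requirements, everything else is made up for illustration.
NORMAL_QUARTER = {
    "rpo_minutes": 120,           # can afford to lose up to two hours of data
    "max_downtime_minutes": 60,
    "checkpoint_minutes": None,   # no extra consistency checkpoints
    "dedupe": "always",
    "caching": "standard",
    "hot_failover": False,
}

QUARTER_END_CLOSE = {
    "rpo_minutes": 5,
    "max_downtime_minutes": 2,
    "checkpoint_minutes": 1,      # logically consistent checkpoint every minute
    "dedupe": "unless_it_hurts_performance",
    "caching": "aggressive",      # "cache like crazy"
    "hot_failover": True,         # second site standing by
}

def apply_policy(app: str, policy: dict) -> None:
    # Stand-in for a control plane that reconfigures data services in place,
    # drawing on pooled resources -- no re-provisioning, no application downtime.
    print(f"{app}: now governed by {policy}")

apply_policy("sales-ops", QUARTER_END_CLOSE)  # last two weeks of the quarter
apply_policy("sales-ops", NORMAL_QUARTER)     # dial it back once the storm passes
```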
Where Are We Today?
If one were to inventory the software-defined data services available in the market today, the result wouldn't be exactly encouraging.
Most data service offerings are tightly bound to the data plane. Few offer the sophisticated enterprise-class capabilities that dedicated hardware provides today. Every one of them has its own management semantics, and behaves differently than its peers. Almost none of them can respond to notions of static policies, let alone dynamic ones.
You might be tempted to think that this world I've described would never come to pass. But there's a case for optimism.
The prevalence of a single, robust hypervisor model in the enterprise is the essential starting point -- and that's vSphere. That gives us the required application abstraction (the VM) as well as a dandy dynamic container to run newer data services. There are good policy mechanisms clearly being developed, as well as the required orchestration and control planes as part of vCloud Suite.
I can see the required pieces starting to assemble and mature.
Whether those software-defined data services are delivered by VMware or its technology partners, a clear case can be made that tomorrow's data services will be thought of as virtualized software entities, dynamically applied on precise application boundaries, using pooled resources, and consistently managed using policies.
I can't wait.