Welcome back to our ongoing series exploring the new world of software-defined storage.
If you’d like to catch up, please take a moment to read the previous posts:
- “Introducing The Software-Defined Storage Series”
- “Why Software-Defined Storage Matters”
- “Building The SDS Conceptual Model — Part 1”
In the last post, we introduced the notions of applications, their containers, and policy. We also discussed how policy is interpreted by the control plane, which mediates access to services and provides the required perspective to multiple stakeholders.
In this post, we’ll extend our SDS model to discuss data services (snaps, dedupe, etc.) as well as the data plane where data is physically stored and persisted.
What Is A Data Service?
To briefly recap our SDS model: we started with applications, their containers, and the policies attached to each container that list its specific requirements.
Those policies are interpreted by the control plane, which provisions the services needed by the application, mediates access to resources, and provides storage-related views to the multiple roles within the organization.
The next stop on our journey is data services.
Rather than attempt a precise definition, it’s far easier to make a partial list of familiar data services:
- snaps, clones, remote replication, federation, geo-dispersion, etc.
- caching, tiering, striping, etc.
- dedupe, compression, thin provisioning
- backups, archiving to another location, etc.
- encryption, compliance, auditing, etc.
- layered presentations and/or protocols: file over block, object over file, etc.
Note that none of these data services actually “store” (i.e. persist) the data itself; that happens in the next layer down, the data plane. Instead, these services add value beyond simply storing and persisting 1s and 0s.
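To make that separation concrete, here’s a minimal sketch in Python (all names invented for illustration; this is not any vendor’s actual API) of a data service as a thin layer that transforms data in flight but delegates all persistence to whatever sits beneath it:

```python
from abc import ABC, abstractmethod
import zlib


class DataPlane(ABC):
    """The layer below: the only place bytes are actually persisted."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryPlane(DataPlane):
    """A stand-in data plane, for illustration only."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.store[key] = data

    def read(self, key: str) -> bytes:
        return self.store[key]


class CompressionService(DataPlane):
    """A data service: adds value in flight, but persists nothing itself."""

    def __init__(self, lower: DataPlane) -> None:
        self.lower = lower  # another data service, or the data plane itself

    def write(self, key: str, data: bytes) -> None:
        self.lower.write(key, zlib.compress(data))

    def read(self, key: str) -> bytes:
        return zlib.decompress(self.lower.read(key))


volume = CompressionService(InMemoryPlane())
volume.write("doc-1", b"the same bytes, whatever sits underneath" * 100)
assert volume.read("doc-1").startswith(b"the same bytes")
```

Because the service only wraps the layer below it, the same behavior follows the application whether the bottom of the stack is an array, a cloud service, or yet another stacked data service.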
In This Model, Data Services Are Now Separate
Calling out data services as a distinct layer — separate and independent from the data plane and control plane — is an important feature of this SDS model.
And it will be very controversial to many.
Historically, we’ve mostly grown up with all-in-one array solutions: an external array might present NFS, with its snaps and other services implemented under the covers and tightly bound to that specific array; or a Fibre Channel block array might ship with its own remote replication or tiering mechanism.
Why the big change here? Why put data services above the data plane?
The big reason is uniformity.
If we truly want to uniformly compose dynamic data services without regard to the underlying hardware, we’re going to want a standard set of data services that aren’t bound to a specific piece of external hardware.
Everyone who’s worked with storage knows that, while certain data services might look the same on different arrays, they are certainly not implemented the same way! A snap is not a snap is not a snap as you move across different arrays.
Separating data services from how data is actually stored creates powerful ancillary benefits as well: being able to stack data services, dynamically invoke only the resources required to implement a service, align services with application boundaries, and more. All of those are nice, but the really big deal is that everything works the same way, regardless of what you’re doing or what storage you’re using.
To be clear, nothing in this SDS model precludes the use of data services that are tightly bound to a specific piece of hardware; but the strong belief is that — over time — the preference will be for data services that are independent of the data plane.
Stackable Data Services?
Yes — that’s the goal. An application container’s policy should be able to dynamically compose a stack of data services that work together in a logical manner.
Here’s this application data container.
I’d like it cached and tiered, with a remote copy made for disaster recovery, continuous data protection to guard against corruption, old data archived out to a separate location, encryption because it’s sensitive, and an audit trail please, as we have auditors to report to.
But please do this just for this one application container, and not a bunch of other stuff that doesn’t need it. Don’t make my choices today difficult to change in the future.
And ideally do this regardless of the hardware being used :)
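One way to picture that wish-list: a hypothetical per-container policy expressed as plain data, plus a tiny composer that turns it into an ordered service stack. (Python; the schema, service names, and compose_stack helper are all assumptions for illustration, not any shipping product’s format.)

```python
import json

policy = {
    "container": "hr-payroll",        # one application container
    "data_services": [                # applied in order, top of stack first
        "cache",
        "tiering",
        "remote-replication",         # the disaster-recovery copy
        "cdp",                        # continuous data protection
        "archive",
        "encryption",
        "audit",
    ],
}

# The catalog of service types this (imaginary) control plane knows about.
CATALOG = {"cache", "tiering", "remote-replication", "cdp", "archive",
           "encryption", "audit", "dedupe", "compression"}


def compose_stack(policy: dict) -> list[str]:
    """Validate and return the ordered service stack this policy asks for.

    A real control plane would instantiate the services here; changing the
    policy simply re-composes the stack for this one container.
    """
    unknown = [s for s in policy["data_services"] if s not in CATALOG]
    if unknown:
        raise ValueError(f"no provider available for: {unknown}")
    return list(policy["data_services"])


print(json.dumps(compose_stack(policy), indent=2))
```

The point of the sketch: the policy belongs to one container, the stack is rebuilt from the policy on demand, and changing your mind later means editing data, not re-architecting storage.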
Now, to be fair, the vast majority of data services in use today are quite array-specific. Indeed, that’s where a lot of the “secret sauce” originates in the array business.
But just as we’re starting to see software-only implementations of storage arrays (sometimes dubbed server-side storage, or server SANs), we will inevitably see more software-only implementations of data services that are agnostic to their back end.
Next: The Data Plane
As we get to the bottom of the software-defined storage model, at some point we actually have to store (or persist) data. Here we also have distinct choices regarding our dynamically composed storage service: capacity, performance, cost, redundancy, sharing attributes, etc.
It’s useful to speak of “capabilities” of a given data plane: how much, how fast, how protected, how costly, etc. Ideally, these capabilities are potentials, and not actually allocated or provisioned until requested — as a composable service delivered on demand.
In our SDS model, these potential capabilities are exposed for consumption, and then instantiated when requested, driven by an application container’s policy choices.
Change the policy, change the physical instantiation.
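A rough sketch of that idea in Python (the provider names, Capabilities fields, and matching logic are invented for illustration): each data plane advertises its capabilities as potentials, and nothing is allocated until a policy-driven request arrives.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    """What a data plane could deliver; nothing is provisioned yet."""
    max_capacity_gb: int
    max_iops: int
    redundancy_levels: tuple[int, ...]
    cost_per_gb: float


class DataPlaneProvider:
    def __init__(self, name: str, caps: Capabilities) -> None:
        self.name, self.caps = name, caps
        self.allocated_gb = 0  # capabilities stay potential until requested

    def can_satisfy(self, req: dict) -> bool:
        return (req["capacity_gb"] <= self.caps.max_capacity_gb - self.allocated_gb
                and req["iops"] <= self.caps.max_iops
                and req["redundancy"] in self.caps.redundancy_levels)

    def instantiate(self, req: dict) -> str:
        """Allocate only now, when a policy actually asks for it."""
        self.allocated_gb += req["capacity_gb"]
        return f"{self.name}: provisioned {req['capacity_gb']} GB"


providers = [
    DataPlaneProvider("external-array",
                      Capabilities(10_000, 200_000, (1, 2, 3), 0.40)),
    DataPlaneProvider("server-san",
                      Capabilities(4_000, 50_000, (2, 3), 0.12)),
]

# Driven by one container's policy: pick the cheapest capable provider.
request = {"capacity_gb": 500, "iops": 20_000, "redundancy": 2}
plane = next(p for p in sorted(providers, key=lambda p: p.caps.cost_per_gb)
             if p.can_satisfy(request))
print(plane.instantiate(request))
```

Raise the IOPS requirement past what the server SAN can deliver, and the external array is selected instead; that’s the sense in which the physical instantiation follows the policy.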
Data Plane Choices
The SDS model we’re building here has three choices for a data plane.
One, of course, is the familiar external storage array, with its many choices for protection, performance enhancement, and so on.
A second option would be to persist data using an external cloud service, in which case protection and other mechanisms would be the responsibility of the service provider.
And the third choice would be a software-only storage stack that runs on standard servers, with VMware’s VSAN being but one example.
As long as each data plane provider is dynamically composable — as a service — using key policy attributes like capacity, redundancy, performance, cost, etc. — each would qualify as software-defined storage under the model being presented here.
It should also be noted that any data plane has one or more “personalities” (data structure and protocol) in how the storage capacity is presented: blocks, filesystem, objects, key-value, etc.
But that’s not the end of the story, as it is possible that an upper-level data service may impose a different personality than the one that is native to the data plane. EMC’s ViPR, as one example, can present files over blocks, or objects over files. NetApp’s V-Series is yet another example.
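To illustrate just the layering (this toy bears no relation to how ViPR or V-Series are actually built), here’s a Python sketch that imposes an object personality on a data plane whose native personality is fixed-size blocks:

```python
BLOCK_SIZE = 4096


class BlockDevice:
    """Native personality: numbered, fixed-size blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, n: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = data

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]


class ObjectOverBlock:
    """Imposed personality: named objects, mapped onto blocks underneath."""

    def __init__(self, device: BlockDevice) -> None:
        self.device = device
        self.index: dict[str, tuple[int, int]] = {}  # name -> (first block, length)
        self.next_free = 0  # naive allocator: append-only, no reclamation

    def put(self, name: str, data: bytes) -> None:
        start = self.next_free
        padded = data + bytes(-len(data) % BLOCK_SIZE)  # pad to block boundary
        for i in range(0, len(padded), BLOCK_SIZE):
            self.device.write_block(start + i // BLOCK_SIZE,
                                    padded[i:i + BLOCK_SIZE])
        self.index[name] = (start, len(data))
        self.next_free = start + len(padded) // BLOCK_SIZE

    def get(self, name: str) -> bytes:
        start, length = self.index[name]
        nblocks = (length + BLOCK_SIZE - 1) // BLOCK_SIZE
        raw = b"".join(self.device.read_block(start + i) for i in range(nblocks))
        return raw[:length]


store = ObjectOverBlock(BlockDevice(1024))
store.put("report.pdf", b"hello, object world")
assert store.get("report.pdf") == b"hello, object world"
```

The consumer sees named objects; the data plane underneath never sees anything but numbered 4 KB blocks.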
Software-Based Vs. Hardware-Based
There will be some purists who inevitably recoil at the idea that the software-defined storage model presented here allows for purpose-built hardware as part of the definition.
Before you decry me as a vendor shill, please consider the following:
- With this SDS model, the focus is on dynamically composable (and stackable) storage services, driven by application-centric policies. Whether you choose to implement those services entirely in (virtualized) software, entirely in purpose-built storage arrays, or, more likely, in some combination, it’s basically the same functional architecture.
- While we could debate the pros and cons of doing things one way or another, those are implementation choices, not architectural ones. For example, the palette of potential data services available today is very rich and mature when it comes to external storage arrays, and somewhat less so when considering newer software-only solutions.
Our model for software-defined storage should allow for various implementations, and not religiously enforce one over the other unless there’s a clear architectural reason for doing so.
Choices Still Abound
For many attributes in this model, there are still overlapping choices of where specific functionality goes.
One example: let’s say you’d like two copies of data for redundancy purposes. Should that go (a) in the data plane, (b) provisioned as a data service, or perhaps (c) something done by the application itself?
You could repeat that question for deduplication, caching, tiering, personality, etc. I think the tradeoff will end up being standard capabilities used consistently vs. specific and unique attributes used selectively.
Clearly, there is no “right” answer; it depends. But as long as the capability could be dynamically composed via programmatic interfaces (and done so aligned to application boundaries), it would qualify as software-defined storage using the model presented here.
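As a purely hypothetical illustration of option (b), the same “two copies” requirement can be satisfied by a stacked data service that mirrors writes across two independent, non-redundant data planes (sketched in Python, with plain dicts standing in for the planes):

```python
class MirrorService:
    """Option (b): redundancy as a stacked data service that mirrors every
    write across two independent data planes (plain dicts stand in here)."""

    def __init__(self, plane_a: dict, plane_b: dict) -> None:
        self.planes = (plane_a, plane_b)

    def write(self, key: str, data: bytes) -> None:
        for plane in self.planes:        # two copies, one per plane
            plane[key] = data

    def read(self, key: str) -> bytes:
        for plane in self.planes:        # fall back if one copy is lost
            if key in plane:
                return plane[key]
        raise KeyError(key)


left, right = {}, {}
volume = MirrorService(left, right)
volume.write("ledger", b"important bytes")
del left["ledger"]                       # simulate losing one copy
assert volume.read("ledger") == b"important bytes"
```

Option (a) would push the same requirement down into a single data plane’s redundancy setting, and option (c) would leave it to the application; all three can be legitimate, policy-selectable choices.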
In the next post, we’ll take a look at how this SDS model differs from the ones in use today — and what the impacts might be.