If you're involved with virtualization or any form of IT infrastructure, you're probably paying attention to SDDC (software-defined data center) concepts, as well you should.
It's a powerful set of ideas. By abstracting intelligence into software, the IT world becomes a better place: more agile, more controllable, and more efficient. Ideally, infrastructure behavior will be driven by the application workload at hand, not pre-ordained.
As the three core infrastructure categories pass through the SDDC wormhole, they are inevitably altered.
For example, how we thought about servers five years ago in the physical world is nothing like how we think about them today: servers are now abstracted, dynamic, resizable, and relocatable virtual entities -- all thanks to virtualization.
Now both networking and storage are approaching the same event horizon as server and compute did years before. And both will look very different before long.
This inevitability is not lost on industry storage vendors, either. To date, we've heard much about software-defined storage from EMC, NetApp and HP. We will eventually hear from HDS and IBM, and maybe even Dell. And, of course, no shortage of small, nimble startups that smell an opportunity.
Where are the flags being planted? Is there any consistency in the perspectives? How do various vendor views stack up? And what might we see in the future?
In this post, I'll vainly attempt to be an impartial industry observer. I do admit I have my favorite horses in this race, though …
Basic SDS Concepts
As you might expect, there is little consensus on what software-defined storage is, and what it isn't. That lack of agreement might be frustrating to some, but it's inevitable. However, the definitions are showing signs of getting better over time.
The high-level picture is simple: re-invent our familiar-yet-complex storage and protection plumbing as programmable, virtualized software entities.
With regard to SDS, I think three core challenges lie ahead of us.
First, the required technology isn't quite there yet -- but there are all signs that it's coming along very quickly. By next year, there should be several good products in the marketplace to concretely evaluate.
Second, we have to bring the legacy forward. For example, VMware didn't dictate that you had to go buy new servers to run ESX, and SDS shouldn't force you to buy new storage either. That being said, server design changed considerably in the presence of compute virtualization, and I would expect storage and network design to do the same over time in the presence of SDS and SDN.
Third, we're all going to have to organize differently if we're going to exploit the new capabilities. We won't get the benefits of any of these technologies if we insist on managing storage like it's still 1999.
Towards A Superset Of Characteristics
While we're patiently waiting for the authoritative definition to emerge, we can get a clearer picture by assembling the various attributes already described.
Warning: aggregating every characteristic creates quite a list, and not everything might be important to everyone.
One frequent attribute of most SDS descriptions is programmability: can all services be invoked, behaviors changed and management information sourced via RESTful APIs?
The envisioned model requires that all storage-related services are orchestrated by an entity with application knowledge (vs. traditional storage administrators!) so I would expect to see this in just about every valid definition.
Extending the notion somewhat, part of an expanded SDS would be access to a variety of consumption portals around specific roles and needs (VMware admin, DBA, finance person, etc.) -- whether those are part of the package, or created as needed.
Related to programmability is hardware abstraction in two dimensions: (1) can I pool resources and capabilities across all my devices, and (2) can I discover and invoke services in a standard way, regardless of the underlying devices? It's one thing to have a separate set of APIs for everything you use; it's another thing entirely to have a *single* set of APIs for everything you use.
Besides being a useful convenience, this standard storage API layer is proving to be an essential component in most cloud stacks.
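To make the "single set of APIs" idea concrete, here is a minimal Python sketch of the two dimensions above: pooling across devices, and discovering and invoking services in one standard way. Everything in it is hypothetical and for illustration only. The class names (StorageAdapter, ArrayAdapter, CommodityAdapter, StoragePool), the device names and the returned identifiers are assumptions, not any vendor's actual interface.

```python
from abc import ABC, abstractmethod


class StorageAdapter(ABC):
    """Hypothetical adapter that wraps one device's native interface."""

    def __init__(self, name, capacity_gb, services):
        self.name = name
        self.free_gb = capacity_gb
        self.services = set(services)

    @abstractmethod
    def native_create(self, volume, size_gb):
        """Issue the device-specific provisioning call (stubbed here)."""

    def create_volume(self, volume, size_gb):
        if size_gb > self.free_gb:
            raise ValueError(f"{self.name}: not enough free capacity")
        self.free_gb -= size_gb
        return self.native_create(volume, size_gb)


class ArrayAdapter(StorageAdapter):
    # Stands in for a traditional purpose-built array.
    def native_create(self, volume, size_gb):
        return f"array:{self.name}/lun/{volume}"


class CommodityAdapter(StorageAdapter):
    # Stands in for a software-only target on a commodity server.
    def native_create(self, volume, size_gb):
        return f"server:{self.name}/vdisk/{volume}"


class StoragePool:
    """One API in front of many devices: pool, discover, provision."""

    def __init__(self, adapters):
        self.adapters = list(adapters)

    def free_gb(self):
        # Dimension 1: capacity is pooled across every device.
        return sum(a.free_gb for a in self.adapters)

    def catalog(self):
        # Dimension 2: service name -> sorted names of devices offering it.
        out = {}
        for a in self.adapters:
            for s in a.services:
                out.setdefault(s, []).append(a.name)
        return {s: sorted(names) for s, names in out.items()}

    def provision(self, volume, size_gb, needs=()):
        # First device that offers the required services and has room wins.
        for a in self.adapters:
            if set(needs) <= a.services and a.free_gb >= size_gb:
                return a.create_volume(volume, size_gb)
        raise LookupError("no device satisfies the request")


pool = StoragePool([
    ArrayAdapter("array-1", 1000, {"replication", "snapshots"}),
    CommodityAdapter("node-7", 1000, {"snapshots"}),
])
print(pool.catalog())
print(pool.provision("db01", 200, needs={"replication"}))
print(pool.provision("logs", 900, needs={"snapshots"}))
print(pool.free_gb())
```

The caller never names a device: it asks the pool for capacity plus required services, and the adapter layer hides whether the volume lands on an array or on a commodity server.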
Then there's the inevitable discussion around rich storage services: data presentation, replication, caching, tiering, QoS, non-disruptive mobility, etc. Ideally, any definition of SDS should be able to consistently catalog and expose which underlying data services are available in the inventory, with the SDS layer ideally providing some of its own even if the underlying devices don't.
An attractive part of the discussion is the ability to use commodity hardware -- specifically, providing data preservation (e.g. persistent storage and related functionality) without the need for purpose-built array hardware -- creating the ability to use standard commodity-based servers as a storage target. This implies the existence of software-based storage stacks that mimic many of the properties of familiar arrays, only using commodity hardware.
Because SDS is infrastructure, we inevitably want gobs of management information (hardware inventory, configuration, utilization, performance, monitoring, etc.), standardized and easy to consume through a variety of portals, on either an aggregate or per-tenant basis.
Again, the notion of abstraction is useful here as well -- best if the SDS layer collected and presented information about everything it touched, and did so in a standardized way.
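As a rough illustration of "aggregate or per-tenant" views over standardized management data, here is a small Python sketch. The record layout, tenant names, device names and numbers are all made up for the example; the only assumption that matters is that the SDS layer has already normalized what each device reports into one common shape.

```python
from collections import defaultdict

# Hypothetical, already-normalized usage samples from different devices.
SAMPLES = [
    {"device": "array-1", "tenant": "finance", "used_gb": 120, "iops": 900},
    {"device": "node-7", "tenant": "finance", "used_gb": 30, "iops": 100},
    {"device": "array-1", "tenant": "dba", "used_gb": 400, "iops": 5000},
]


def rollup(samples, key):
    """Sum capacity and IOPS, grouped by the given field (e.g. 'tenant')."""
    totals = defaultdict(lambda: {"used_gb": 0, "iops": 0})
    for s in samples:
        bucket = totals[s[key]]
        bucket["used_gb"] += s["used_gb"]
        bucket["iops"] += s["iops"]
    return dict(totals)


def aggregate(samples):
    """Sum capacity and IOPS across everything the layer touches."""
    return {
        "used_gb": sum(s["used_gb"] for s in samples),
        "iops": sum(s["iops"] for s in samples),
    }


print(rollup(SAMPLES, "tenant"))   # per-tenant view
print(rollup(SAMPLES, "device"))   # per-device view
print(aggregate(SAMPLES))          # aggregate view
```

The same records feed every view, which is the point of collecting them in a standardized way in the first place.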
Very importantly, there ought to be some notion of policy-based orchestration built on a standard catalog of services -- translating from a higher-level request (e.g. gold service level for my Oracle database) all the way through the provisioning and configuration of storage capacity, paths, data protection and anything else that might be part of a real-world storage service definition.
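The translation step described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any product does it: the tier names, the attribute values in CATALOG, and the task names are all hypothetical, and a real orchestrator would validate each task against the actual inventory before running it.

```python
# Hypothetical service catalog: tier names and values are illustrative only.
CATALOG = {
    "gold": {"media": "flash", "replication": "sync",
             "snapshots_per_day": 24, "paths": 4},
    "silver": {"media": "hybrid", "replication": "async",
               "snapshots_per_day": 4, "paths": 2},
    "bronze": {"media": "disk", "replication": None,
               "snapshots_per_day": 1, "paths": 1},
}


def plan(app, tier, size_gb):
    """Expand a high-level request into an ordered list of provisioning tasks."""
    if tier not in CATALOG:
        raise KeyError(f"unknown service level: {tier}")
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    policy = CATALOG[tier]
    tasks = [
        ("allocate_capacity",
         {"app": app, "size_gb": size_gb, "media": policy["media"]}),
        ("configure_paths", {"app": app, "count": policy["paths"]}),
        ("schedule_snapshots",
         {"app": app, "per_day": policy["snapshots_per_day"]}),
    ]
    # Replication is only part of the plan when the tier's policy calls for it.
    if policy["replication"]:
        tasks.append(("enable_replication",
                      {"app": app, "mode": policy["replication"]}))
    return tasks


# "Gold service level for my Oracle database", 500 GB.
for step, args in plan("oracle-prod", "gold", 500):
    print(step, args)
```

The requester only says "gold" and a size; capacity, paths and data protection all fall out of the policy rather than being specified by hand.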
I couldn't find a vendor-agnostic diagram I liked, so I whipped one up. I'm sure there's room for improvement -- please chime in!
If you're with me so far, our idealized description of software-defined storage has the following attributes:
-- pooling of resources and capabilities independently of specific devices
-- everything discoverable and programmable through a standard set of RESTful APIs, no matter what you're using for physical storage.
-- access to a catalog of rich storage services, whether or not they're provided by the underlying storage platforms, or provided by the SDS layer, or ideally both.
-- ability to store data on both traditional storage devices as well as newer software-only storage targets built on commodity hardware
-- rich management information streams, standardized and consolidated, with multiple role views including tenant-specific ones.
-- an orchestration engine that uses policy to translate high-level service requests into a sequence of validated provisioning tasks.
And, of course, all delivered as a software product, and not a hunk of tin :)
While no vendor today is talking about ticking all of these boxes, there are sharp differences between the approaches already described. Decide for yourself the relative merits.
My Understanding Of The NetApp View Of SDS
Following EMC's announcement, NetApp appears to be taking great pains to make sure everybody knows they've been doing this all along -- even though SDS itself is a very new concept. Without commenting on that particular marketing assertion, how might NetApp stack up against these expanded criteria?
NetApp positions ONTAP as the asset that implements the majority of SDS in the NetApp world. Occasionally, the OnCommand assets get drawn into the discussion, although that’s not as strong a fit.
Discoverability and programmability: while I'm sure every NetApp ONTAP asset is discoverable and programmable via APIs, I would have two questions: how would I discover and program non-ONTAP assets, and -- even if I'm living in an all-ONTAP world -- where is the common repository of all assets and their capabilities? OnCommand is targeted at smaller, NetApp-only environments – and doesn’t appear to be programmable in this sense. This is somewhat tenable if a customer has made the decision to live in an all-NetApp world for the foreseeable future.
Access to rich storage services: while ONTAP itself provides a rich suite of storage services, it does not expose any others that might be in the environment. While it's true one can plug a NetApp filer head in front of a block storage device, that back-end device is only used as very dumb storage. This also limits you to storage services implemented by NetApp and no one else.
Traditional and commodity storage: while ONTAP marketing can tick the box on supporting traditional storage devices (either a NetApp array or an external array that's being used as dumb capacity behind ONTAP), the vast majority of ONTAP today runs on NetApp hardware and nothing else.
Rich management information: yep, ONTAP supplies rich management information for its own devices, but does nothing to expose and/or aggregate on behalf of other devices that might be in the environment. OnCommand creates a simple (although non-pooled) view of NetApp FAS devices, but nothing else.
Policy-based orchestration: so far, my assumption is that this heavy lifting is intended to be left to upper-level orchestration engines that a cloud environment would provide, e.g. VMware, OpenStack.
However, I believe that's an insufficient answer for any vendor -- it will take a very long time indeed for any of these non-storage orchestrators to gain any pragmatic knowledge about setting policies around performance, protection, availability, etc. There's a big hunk of important glue missing.
Bottom line: make no mistake, NetApp clearly states they intend to be in this game. The required effort to do so will be considerable: they're going to have to get comfortable with running ONTAP on commodity servers, as well as make non-trivial investments in storage-specific management and orchestration domains that know about more than just NetApp FAS devices -- maybe a few acquisitions? The hardest part may end up being coming to terms with existing in a multivendor storage world.
My Understanding Of The HP View of SDS
HP, coming off their recent HP Discover event, made a number of storage announcements under the "software defined storage" flag. Although you'll see the words being used frequently in their press statements, I found it very difficult to ascertain what their underlying definition and associated capabilities might be.
Dave Donatelli said a lot about "storage software", the importance of being multi-vendor and open, and the ability to move the right data to the right place at the right time.
I can't disagree with any of that, but how do HP's statements -- plus their product portfolio -- prepare them in this world?
Discoverability and programmability: not much here that I can discern. While individual HP storage products might have the occasional API-like interface, there's a ton of work for them to do to make *all* of their storage products discoverable and programmable. That's before considering doing the same for non-HP products, not to mention a *single* set of APIs for everything a customer might own: HP and non-HP.
Access to rich storage services: while there's no arguing that individual HP products such as 3PAR provide rich storage services, it's done in a traditional hardware-centric manner. There's nothing in the portfolio that appears to abstract various storage services and make them easily consumable, regardless of what entity is providing those services.
Traditional and commodity storage: HP has some interesting assets, namely the StoreVirtual VSA (acquired as LeftHand), and the StoreOnce VSA for backup. Both are positioned as software-only storage targets that run in virtual machines. A promising start, but ...
Rich management information: I can't comment on what's there and what isn't. HP has a long history of infrastructure management via OpenView, but it's not clear how much of that historical ethos will find its way into software-defined storage. There's not much in the current portfolio that speaks to sophisticated storage management in an enterprise setting -- whether we're talking about traditional storage or newer software-defined variants.
Policy-based orchestration: once again, I can't offer much commentary here. No evidence that I can discern of any focus or capabilities around storage-centric orchestration. Much like NetApp, the temptation will be to point at upper-level cloud orchestration engines as responsible for that task, but there's still a serious hunk of glue missing.
Bottom line: HP has started to talk the talk, but has a very long road ahead of them in walking the walk. Although there are a few nice software-based storage assets in the portfolio, that's only a small part of the broader picture as I'm describing it here. In some regards, they have a much longer journey than NetApp.
My Understanding Of The EMC View Of SDS
We have the most to look at here, thanks to EMC's recent announcement of ViPR at EMC World.
Of the three vendors discussed here, EMC does the best job of ticking the most boxes, but nonetheless still has gaps to be addressed -- at least in my view.
ViPR is breathtaking in what it intends to accomplish, and as a result you'll find elements of each and every functional SDS category as I've laid them out here.
Discoverability and programmability: this is a central focus of ViPR. Existing storage assets are discovered and categorized as to their capabilities. A single set of RESTful APIs is used (regardless of storage device) to expose information up and send commands down the stack out-of-band. There are a wide variety of consumption portals available.
While the initial targets are unsurprisingly EMC arrays, the intent is to provide Day 1 support for NetApp, as well as sequentially tackle the other non-EMC arrays that are out there. Device adapters can be created by EMC, the array vendor or perhaps a motivated third party.
Access to rich storage services: ViPR is rather unique in that it both exposes underlying array and infrastructure service capabilities as part of its catalog, and introduces a framework for providing its own software-only storage services regardless of the underlying hardware. First versions will include HDFS and object over existing NAS, but there are said to be many more coming.
Traditional and commodity storage: ViPR has rich support for traditional storage arrays -- at least from EMC -- but more work to do on non-EMC ones. ViPR also supports existing EMC storage stacks that run on commodity hardware, e.g. Atmos, Avamar, etc. What EMC is *not* doing yet is actively promoting the idea of storage targets running in virtual machines on commodity hardware.
Rich management information: another ViPR strong point. Out of the box, there's everything you need to run storage-as-a-service with full metering and measurement at both an aggregate and a tenant level. Much of this has been built from EMC's many acquisitions in this space.
Policy-based orchestration: another very strong ViPR capability. Provisioning a storage service is more than simply allocating capacity: you're provisioning performance, data protection, availability, access paths and perhaps other attributes as well. It creates high-level service constructs that fit in nicely with your cloud orchestration framework of choice: VMware, OpenStack, etc.
Bottom line: at this early stage in the game, my view is that EMC owns the leadership position in the software-defined storage race -- not that there aren't still gaps to be addressed. Although there's a clear plan to increase the portfolio of traditional arrays that are discoverable and can be orchestrated, not much has been said publicly about bringing storage software array stacks to separately acquired commodity hardware.
Where Do We Go From Here?
No technology vendor wants to be left out of any relevant industry trend -- and that's certainly the case with software-defined storage and our familiar club of storage vendors. Peek over at what's happening now in the adjacent SDN (software-defined networking) space, and consider that a preview of what's likely to come in our little world.
But there's a big difference between adopting the buzzwords, and delivering capabilities that customers can actually use in their environments.
By that metric, it's sort of a one-vendor race at the moment. The next twelve months will determine whether or not storage vendors other than EMC will make the required investments in enabling technology, and help customers to consume the new capabilities intelligently.
That being said, I believe there is another set of industry vendors we have yet to hear much from -- the cloud stack vendors: VMware, OpenStack, Red Hat, Microsoft, et al. These players will come from an entirely different angle -- cloud service integration. While it's true that there's not exactly a plethora of storage domain experience among them, it is also true they aren't burdened by their existing business models.
One thing is for certain -- I've seen the real demand out there for these sorts of solutions. Not the "we'll buy it today and put it into production tomorrow" sort of demand, more of the "this is important stuff and we better pay attention to what the vendors are doing in this space" demand.
Progressive IT shops know that the infrastructure models are changing fast, and the potential benefits are likely to be very great indeed.
And no one wants to be left behind these days :)