Those capabilities were severely put to the test when I first approached EMC's new product announced at EMC World: ViPR.
The effort paid off: I came away with a deeper understanding of some of the more powerful forces at work in our industry, as well as a breathtaking appreciation for what ViPR intends to achieve, both now and into the future.
Rather than debate terminology and categories, the best approach with ViPR might be to relax, follow the discussion, and come to your own conclusions: what ViPR does, what it means to the IT industry, and -- most importantly -- how it might affect you in your world.
Trust me, the journey will be rewarding ...
Not A Simple Exercise
While I'm sure many familiar labels will be applied to ViPR, I'm going to avoid that for the time being. Yes -- you can find elements of many somewhat-familiar storage and infrastructure concepts within ViPR: they all are accurate, and they are all inaccurate.
Perhaps the strongest praise I can offer is that it's changed the way I think about things. Maybe it will do the same for you as well.
We Need To Establish Some Context
Big picture time: cloud is changing the way IT is produced and consumed.
Everything in IT is being morphed to a service: easy to find, easy to consume, easy to meter, easy to adjust, etc. The supporting technologies and operational models are also starting to converge, whether you're an enterprise IT group, or perhaps a cloud-scale service provider.
But everyone is at a different point in their journey along this continuum. Many IT organizations are still tightly bound to physical devices and legacy workflows.
Others are moving beyond simple server virtualization and have begun to attack the new models: driving consumption of services, new skills, new organizations.
And a small, select group is now post-transformation: they're building newer cloud-style apps as the default.
Thinking back, server virtualization (especially VMware) moved people very nicely along in this continuum for enterprise compute. Physical servers became virtual servers, which laid the groundwork for moving to cloud-like operational models.
Back to storage, how can a technology move each of these groups along in their journey? Directly attacking some of today's problems, yet laying the groundwork for future evolution?
That -- in a nutshell -- is what EMC ViPR aspires to achieve. Its functionality falls into three broad buckets.

The first bucket is very reminiscent of a storage hypervisor: it abstracts physical storage arrays to create new, virtual arrays -- and then provides for ease-of-consumption delivered through a multi-tenant and metered "as-a-service" model.
Absolutely great for all those shops who have a variety of different storage devices, but want to move to a more modern operational and consumption model.
The second bucket is a rather new (yet powerful) idea: the ability to morph the presentation of data to meet the need at hand: object-over-file, HDFS-over-file and more. Sure, we're all familiar with gateways, but ViPR takes a richer and more fully-featured architectural approach that's worth appreciating.
The third bucket is the beginnings of a framework for doing application-specific storage service provisioning in an advanced cloud model: one storage framework, many services. While there's not exactly an enormous market for this yet, there will be soon :)
Each grouping of functionality maps to different needs as IT shops start down their cloud journey. While there are certainly differences, you'll notice strong parallels with what VMware did for server virtualization -- moving people from physical to virtual to cloud.
Part 1 -- From Storage Silos To Storage Services
The first key component of ViPR is essentially about a better control plane for existing storage arrays: EMC and others.
ViPR discovers existing arrays and networks (file and block), catalogs their capabilities (capacity, performance, features), and enables the construction of "virtual storage arrays" -- entities that are essentially abstracted across multiple physical arrays and existing data services.
If you're playing buzzword bingo, you'll notice a new variation of familiar storage virtualization concepts.
These virtual storage arrays are then presented via service catalogs to consumers: server admins, database admins, power users, etc. through a variety of mechanisms. Requests for service drive an orchestration and change management workflow for provisioning and presentation -- the details of which can get quite complex indeed as you layer in snaps, replication, storage federation, remote sites, etc. -- all part of ViPR functionality.
Consumers (and admins) get metering, QoS reports, a rich multi-tenant model -- everything you need to transform physical storage arrays into a storage-as-a-service operational model.
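To make the abstraction concrete, here's a minimal sketch of how a "virtual storage array" might pool and advertise the capabilities of its backing physical arrays. All class and field names here are my own invention for illustration -- they are not ViPR's actual object model.

```python
from dataclasses import dataclass, field

# Illustrative model only: a virtual array abstracted across physical
# arrays, advertising pooled capacity and common data services.

@dataclass
class PhysicalArray:
    name: str
    vendor: str
    capacity_tb: float
    features: set = field(default_factory=set)  # e.g. {"snap", "replication"}

@dataclass
class VirtualArray:
    name: str
    members: list  # physical arrays abstracted behind this virtual array

    def total_capacity_tb(self):
        return sum(a.capacity_tb for a in self.members)

    def common_features(self):
        # Only advertise services every backing array can actually deliver
        sets = [a.features for a in self.members]
        return set.intersection(*sets) if sets else set()

va = VirtualArray("varray-east", [
    PhysicalArray("vmax-01", "EMC", 200.0, {"snap", "replication"}),
    PhysicalArray("fas-02", "NetApp", 120.0, {"snap"}),
])
print(va.total_capacity_tb(), sorted(va.common_features()))
```

Note the design point: the virtual array can only safely publish services that every member can deliver, which is part of why cataloging each array's capabilities matters.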
But so far, ViPR is largely separate from the data path; think of it more as an enhanced and well-metered "vending machine" for traditional physical arrays.
This specific set of capabilities will likely appeal to shops who -- for one reason or another -- have acquired storage arrays a batch-at-a-time, usually on a project basis, and usually from different vendors.
Many of these IT groups would likely want to move to a storage-as-a-service model, but certainly don't have the luxury of starting over again with a clean sheet of paper.
The ViPR software is delivered as a set of virtual machines, with a wide range of potential southbound adaptors for physical storage arrays (EMC and NetApp first, others coming).
Northbound, there's a great deal of tight plug-in integration into the VMware stack, support for OpenStack via Cinder, and intended support for Microsoft's Hyper-V.
I can see that the ViPR product reflects the very best of EMC's extensive experience in storage and availability management -- extremely powerful discovery, provisioning, monitoring, reporting, metering, etc. That alone would take several blog posts to fully cover.
Perhaps more importantly, version 1.0 includes a well-documented set of REST APIs if you'd like to do your own orchestration integration.
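As a flavor of what "your own orchestration integration" might look like, here's a hedged sketch of assembling a block-volume provisioning request. The endpoint path, controller address, and every field name below are illustrative placeholders -- not the documented ViPR 1.0 API.

```python
import json

# Hypothetical orchestration snippet: build a provisioning request for a
# ViPR-style REST API. All names below are assumptions for illustration.

VIPR_BASE = "https://vipr.example.com:4443"  # placeholder controller address

def build_volume_request(project, virtual_array, virtual_pool, size_gb):
    """Assemble a block-volume provisioning request body."""
    return {
        "project": project,
        "varray": virtual_array,  # the abstracted "virtual storage array"
        "vpool": virtual_pool,    # class of service (performance, protection)
        "size": f"{size_gb}GB",
    }

req = build_volume_request("analytics", "varray-east", "tier1-replicated", 100)

# Real orchestration code would POST this with its HTTP client of choice,
# e.g. POST {VIPR_BASE}/block/volumes with json.dumps(req) as the body.
print(json.dumps(req))
```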
Plenty to discuss here, but let's move on ...
Part 2 -- Store Once, Access Many
The second key component of ViPR is representational: changing how data is presented depending on a given application's access needs.
I believe this is an extremely powerful concept that will undoubtedly become more popular going forward.
Let's say you store a bunch of data as an NFS filesystem, maybe using Isilon or VNX as an example. That's fine, but now you've got a handful of applications that really could use an object-oriented REST interface. Or maybe you're seeing more Hadoop in your world, and it would be great to expose that same data as HDFS.
Rather than think in terms of yet another storage stack (hardware and software), why don't we just change the access method for the underlying data? Store once, access many?
Specifically, ViPR supports an interesting object-over-NFS model, as well as an HDFS-over-NFS model. While a purist might argue that a purpose-built stack might be more "optimized" for a given use case, the flexibility and efficiency gained by this layering approach is not to be lightly discounted.
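The "store once, access many" idea can be pictured as two thin presentation layers over a single stored copy of the data. This toy sketch is purely conceptual -- it shows the layering idea, not how ViPR actually implements object-over-NFS.

```python
# Conceptual sketch: one underlying copy of the data, two views on top
# (a file-path view and an object-key view). Not ViPR's implementation.

store = {}  # the single underlying copy of the data

def nfs_write(path, data):
    """The 'file' presentation: write bytes at a filesystem path."""
    store[path] = data

def object_get(bucket, key):
    """The 'object' presentation: same bytes, named as bucket/key."""
    return store[f"/{bucket}/{key}"]

nfs_write("/exports/images/cat.jpg", b"\xff\xd8...")
# The same bytes, retrieved through an object-style interface, no copy made:
assert object_get("exports", "images/cat.jpg") == b"\xff\xd8..."
```

The point of the sketch: because both presentations resolve to the same stored bytes, no second copy of the data is needed when an application wants a different access method.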
As an example, consider perhaps a content-rich application that could really benefit from an API model to access objects. That's all well and good, but (operationally), sometimes you need to sequentially process *all* the objects: process them, back them up, etc.
Throwing 100 million API calls at your object store isn't exactly the most efficient access method; you'd really like to just open up the file system and have at it.
A similar situation occurs with HDFS and Hadoop. Most acquired data is stored in NFS, then processed in HDFS -- and the results shared using either NFS or CIFS. Similar to what Isilon's OneFS does today, wouldn't it be great to simply change the presentation of the data in-place?
Now, let's extend that concept -- just for fun. To be clear, what follows is not part of any official announced capability, but it's an interesting concept that's worth speculating on.
If you go looking for other potentially useful data representations, there are many more to be considered: key-value, document store, graph, etc.
You end up thinking in terms of a "data family" that has multiple "data facets". Maybe even multiple data families: semi-structured, transactional, message-based, etc.
On a parallel vector, let's consider associated data services: snaps, remote replication, tiering, federation, deduplication, etc. Your optimized replication model for, say, object or key-value data isn't semantically the same as what you'd use for file or block.
Another powerful idea in play ...
Also gratifying: there are formal plans (and funding!) to foster a community of developers -- working at the fascinating intersection of new data models, data services and orchestration.
Part 3 -- Storage Services For Cloud Applications
Now let's jump to the fascinating end of the spectrum -- organizations that are routinely developing cloud-friendly applications using a PaaS environment such as, say, CloudFoundry.
As you watch these developers work, they simply specify storage services they need for the application at hand: block store, file store, key-value store, object store, etc.
As it should be, they usually have little concern or interest in exactly *how* these services are implemented -- unless, of course, they don't behave as expected ...
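What a developer's request for storage services might look like can be sketched as a plain manifest. The shape and service names below are hypothetical, loosely modeled on Cloud Foundry-style service bindings rather than any announced ViPR capability.

```python
# Hypothetical PaaS-style storage request: the developer names the services
# they need and a plan, with no reference to how they're implemented.

app_services = {
    "app": "photo-share",
    "services": [
        {"type": "object-store", "plan": "standard"},
        {"type": "key-value", "plan": "small"},
    ],
}

def requested_types(manifest):
    """The framework's view: which storage services must be provisioned."""
    return [s["type"] for s in manifest["services"]]

print(requested_types(app_services))
```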
Beneath the covers, though, you'll usually find multiple software and/or hardware stacks that deliver those application-specific storage services. As a result, complexity and inefficiency are typically the order of the day.
What's easy for the consumer of the services is not so easy for the providers of those services!
For example, there's usually no central framework for publishing a wide range of storage services, monitoring their consumption -- and implementing them from a common pool of abstracted resources.
As another example, if you need another representation of the data, you'll usually make yet another copy of it.
Provisioning, orchestrating, change management, workflow, monitoring and metering of those services -- again, there's no simple solution today.

Better yet, imagine a solution that could accommodate extreme flexibility on the back end: traditional physical arrays as required, newer software-only implementations of array functionality, leveraging native array services where desirable, using policy to move data around on the back end without impacting the front end, and so on.
Just to repeat, I have left the world of formally announced ViPR capabilities here -- I'm simply extending ViPR concepts to a third emerging audience with distinct needs.
Is This What We Really Mean Now By "Storage Virtualization"?
Way back in 2003, the term "server virtualization" essentially meant VMware's GSX running on a Windows host.
Fast forward to 2013 (a mere ten years later!), and the term "server virtualization" is a foundational concept in IT: it's gone from lab curiosity to the desired model for compute infrastructure.
The term "storage virtualization" first dates from about the same timeframe. Is ViPR a modern interpretation of what we now mean when we say "storage virtualization"?
It's certainly a much stronger contender.
Discover all your resources. Pool them. Abstract them. Automate them. And -- perhaps most importantly -- endow them with capabilities they never were intended to support :)
Is ViPR Software-Defined Storage?
It's certainly a very strong candidate.
Deep programmability. Behaviors and capabilities largely defined by software. The ability to use more commoditized hardware in the future.
Northbound and southbound openness and extensibility. Server virtualization concepts clearly extended to the storage domain. And so on.
But labeling it as such inevitably invites debate around the definition of the term itself, and that's not the intent here.
Much like VMware helped customers take enterprise compute from the physical world to the virtual world -- and then the cloud world -- EMC ViPR intends to do much the same for storage. Same ideas, different domain.
Clearly, there's a lot more to discuss here -- and I hope to take each of these topics in turn and explore them more deeply.
Because, with ViPR -- there's certainly a lot to discuss.