I've really enjoyed watching the VMware IPO, and seeing all of EMC's traditional (and newer) competitors pile in to the VMware opportunity.
It's even more fun to watch them try to pitch their old wares into the new world without a basic understanding of how things fundamentally change when you really look at what server virtualization does to your infrastructure.
Is it that they haven't figured it out yet? Or, maybe they're suspecting that it's a bit different, but haven't fully grasped the full implications just yet ...
Sorry, there's no way I can avoid getting a bit sarcastic in this post, so my apologies in advance. I'm in therapy about this issue, but it isn't working well ...
Today, I'm going to offer a few insights to my brethren in the storage and infrastructure industry with a few examples on how some things change once you really understand what's going on here.
Those of you in the biz, you may want to drop a note to your marketing department that's churning out PDFs and webcasts to hold up just a bit until you figure out how you want to handle some of these issues.
Otherwise you're going to have to answer some pretty hard questions.
Consider this a preview of the server virtualization pop quiz for 2008.
Basic Concepts
VMware (or any server virtualization) creates a new abstraction that -- at once -- creates powerful new capabilities, but also some new considerations.
One prime example in VMware is VMFS -- the VMware file system. It's a clustered file system that's the basic storage abstraction for all things virtual.
It underlies many of the cool features of VMware, things like DRS, which provides advanced load-balancing and new flavors of pooled availability.
Trust me, you want this.
It also creates many problems for any storage vendor that's trying to provide storage-based functionality.
The Replication Example
Let's say that I have a nifty snap/clone function on my array or filer. Everyone has them, right?
Well, in a VMware environment, the array sees a LUN (or group of LUNs, or a filesystem), but VMFS carves up that space and manages it to provide any number of virtual machines. It's a very useful abstraction.
So, you have a choice.
- You can use the storage-level functionality to make copies of entire VMFS spaces (works great), but you're copying lots and lots of virtual machines at the same time, or ...
- You can turn off VMFS (also known as RDM or raw access) and get individual VM granularity, but turn off about a half-dozen cool VMware features in the process.
No middle ground -- that's your choice.
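If the granularity problem sounds abstract, here's a toy model of it. This is a sketch only: the `Lun` class, the LUN name and the VM names are all made up for illustration, and no real array or VMware API is involved. The point is simply that the array can only snap what it can see, and what it sees is the whole LUN.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    """A LUN as the array sees it: one opaque block of capacity."""
    name: str
    # VMs that VMFS has placed on this LUN. The array itself cannot see these.
    vms_on_vmfs: list = field(default_factory=list)


def array_snapshot(lun):
    """Array-level snap: copies the whole LUN, so every VM on it comes along."""
    return list(lun.vms_on_vmfs)


def vms_dragged_along(lun, wanted_vm):
    """VMs copied by an array-level snap that nobody actually asked for."""
    return [vm for vm in array_snapshot(lun) if vm != wanted_vm]


# Hypothetical datastore holding four VMs; we only care about db01.
datastore = Lun("lun_07", ["mail01", "web01", "web02", "db01"])
extras = vms_dragged_along(datastore, "db01")
print(f"Wanted 1 VM, copied {len(array_snapshot(datastore))}; extras: {extras}")
```

Going the RDM route would amount to one VM per LUN in this model, which gets the granularity back at the cost of the VMware features mentioned above.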
FC, NAS or iSCSI, it doesn't matter. That is, unless you take a fundamentally new approach.
Trust me on this.
OK, that's mildly annoying when we're considering local replication, but it becomes downright annoying when you consider remote replication.
Making remote copies of things is expensive. You'd like to pick the individual VMs you're replicating, choose the mode (sync, async, CDP, etc.) and have the ability to recover individual VMs if you like.
Living in a world where you're copying and recovering big baskets of virtual machines doesn't feel right, does it?
The only way to solve the local/remote problem with today's technology is to put the splitter in the virtual machine.
Trust me on this.
That's why, as an example, EMC RecoverPoint is so cool in this environment. For physical machines, the splitter can run either on the server or on the intelligent switch (cool!).
And in a VMware environment, you can run the splitter in the virtual machine, using the same shared infrastructure (storage independent, mind you) for both physical and virtual machines.
And you can do local replication (either on the same array, or to another one), remote replication (choose your favorite mode and storage target) or both -- using the same infrastructure.
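For those who haven't met a write splitter before, here's the general idea in a few lines. To be clear, this is my own toy illustration of write splitting, not RecoverPoint code or its API: the `GuestSplitter` class, the VM names and the mode strings are all hypothetical. Each protected VM gets its own shim, and every shim feeds the same shared replication journal.

```python
class GuestSplitter:
    """Toy per-VM write splitter: each write goes to the VM's own disk
    and a copy goes to a replication journal shared by all splitters."""

    def __init__(self, vm, mode, shared_journal):
        self.vm = vm                   # hypothetical VM name
        self.mode = mode               # per-VM choice, e.g. "sync" or "async"
        self.journal = shared_journal  # one shared replication infrastructure
        self.primary = []              # stands in for the VM's normal disk

    def write(self, block, data):
        self.primary.append((block, data))                      # normal write path
        self.journal.append((self.vm, self.mode, block, data))  # the split copy


shared_journal = []
db = GuestSplitter("db01", "sync", shared_journal)
mail = GuestSplitter("mail01", "async", shared_journal)

db.write(0, b"ledger")
mail.write(7, b"inbox")
# A VM without a splitter (say, a scratch web server) never shows up in the journal.

for entry in shared_journal:
    print(entry)
```

The replication mode travels with the VM rather than with the LUN, which is the whole point.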
And if you'd like to get into the 301-level discussion, we can bring in database consistency for multiple transactional entities across multiple physical/virtual entities and multiple arrays.
So when I see a vendor flogging their 1990s-style replication technology for VMware environments, I just have to shake my head a bit.
Guys, go take a close look at this, and let me know what you think.
On a related note, I saw a forum thread somewhere where some poor fellow realized that this whole thin-provisioning thing he'd set his heart on wasn't working for much the same reason. The same is largely true for any feature where you're assuming application = LUN set or application = filesystem.
There's a new abstraction in town ... VMFS
Now, at some point in the far future, I'm sure that storage arrays will be more knowledgeable about VM boundaries, and be able to see them as distinct logical entities (instead of LUNs), but we don't live in that world today.
While you're thinking about it, keep in mind that -- by definition -- it'll be a very dynamic environment, with new VMs being created all the time and moving around minute to minute, and -- of course -- dependency relationships between multiple physical and virtual entities.
It'll make today's static binding model look primitive by comparison ...
Just to make your head hurt a bit more.
The Backup Example
When you consolidate multiple physical servers onto a single virtualized one, you're consolidating many things. CPU. Memory. Storage. I/O.
And -- of course -- backup!
If you do an 8:1 consolidation of physical to virtual, you've done an 8:1 consolidation of backup streams as well.
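The arithmetic is worth a quick back-of-the-envelope look. The numbers below (100 GB per server, 400 GB/hour through a single host) are invented purely for illustration, and the function name is mine; plug in your own figures.

```python
def backup_window_hours(servers, gb_per_server, host_throughput_gb_per_hour):
    """Hours needed to push every server's backup through one host's I/O path."""
    total_gb = servers * gb_per_server
    return total_gb / host_throughput_gb_per_hour


# Hypothetical figures: 100 GB per server, 400 GB/hour through one host.
before = backup_window_hours(1, 100, 400)  # one physical server with its own pipe
after = backup_window_hours(8, 100, 400)   # eight VMs sharing that same pipe
print(f"before: {before} h, after: {after} h, ratio: {after / before}x")
```

Same data, same pipe, eight times the window, unless you move less data in the first place.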
Data deduplication is great, but if you're not doing it on the client side (e.g. inside the VM), it won't do you much good.
So, once again, you've got a choice.
- Ignore the fact that you're in a virtual server world, bring over your favorite legacy backup tool, buy big pipes and engines to move it all in and out of the physical servers, or
- Embrace the fact that you're in a server virtualized world, every server is in fact a file, most server images are pretty redundant with other ones, and move directly to data dedupe (client side, not target side).
No middle ground here. Trust me on this.
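Here's a minimal sketch of why client-side dedupe fits this world so well. It's a toy: fixed-size four-byte chunks, an in-memory dictionary standing in for the backup store, and two made-up "server images" that share the same OS bits. Real products chunk and index very differently, so treat every name here as an assumption.

```python
import hashlib

CHUNK = 4  # bytes; tiny on purpose so the example stays readable


def chunks(data, size=CHUNK):
    """Split an image into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def client_side_backup(image, store):
    """Send only the chunks the backup store has not already seen.

    Returns the number of bytes that actually crossed the wire.
    """
    sent = 0
    for chunk in chunks(image):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
            sent += len(chunk)
    return sent


store = {}
# Two hypothetical 16-byte images: identical OS chunks, different app data.
vm_a = b"OS__OS__OS__APPA"
vm_b = b"OS__OS__OS__APPB"
sent_a = client_side_backup(vm_a, store)
sent_b = client_side_backup(vm_b, store)
print(f"vm_a sent {sent_a} of {len(vm_a)} bytes, vm_b sent {sent_b} of {len(vm_b)} bytes")
```

Because the duplicate check happens before anything leaves the client, the redundant chunks never touch the pipe. Target-side dedupe would save the same disk space but still haul all 32 bytes across.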
I could go on, there's (much) more ...
I could get into storage resource management (SRM) -- how discovering, provisioning and reporting on storage in a VMware environment is totally different than in the physical world. Or how something as boring as storage quals is very different in the VMware world.
Or how resource management is different. Or security.
And I've ranted before on how the majority of the IT run book will want to change in the virtual world.
So here's the real question ...
How long will IT vendors plod along, trying to plug their tools designed for the physical world into the virtual one?
And how many users will discover that -- at a fundamental level -- things are very different (in a good sort of way) in this virtualized server world?
Judging from the marketing flotsam from various vendors, it may take longer than I first thought ...
Simply put: virtualization changes everything.

So Chuck, are you telling us that now that we've consolidated our physical servers and shaved off server mgmt time, effort, resources and costs, we need to re-allocate those resources to managing potentially 100s or even thousands of individual replication streams occurring at the VM level?
Why not manage replication streams based on Datastores by grouping VMs together with similar DR characteristics and use array based replication?
and BTW...given that the splitter is now part of the Guest OS and the CPU and Memory are now virtualized and shared among multiple VMs, what are the performance repercussions when something like this gets turned on, memory wise and CPU wise? Do I have to rely on DRS to safely deploy something like this without impacting everything else on the same physical server?
Thanks
Posted by: Pq65 | August 28, 2007 at 03:12 PM
Hi -- what I'm saying is that you'll have to give it some thought and make your choices.
Managing hundreds (or thousands) of individual logical replication streams using a single shared replication infrastructure is nothing new. As an example, this is a routine application of one of the array-based replication products, like EMC's SRDF.
Ditto with virtual machines using RecoverPoint and individual splitters -- as long as you have a single shared replication/mgmt infrastructure (as opposed to thousands of individual sessions WITHOUT common mgmt/infrastructure), no big deal.
As far as your suggestion of combining data stores of VMs with like replication characteristics, fine, anything is possible, but this means -- as an example -- they will have separate DRS and VMFS domains, and -- more importantly -- you'll never be able to recover an isolated VM without bringing back all its friends and neighbors first.
Having thought about this extensively, there's no arguing that ideally you'd be able to tag an individual VM with the replication characteristics you want, and not affect the others. Of course, you're welcome to live in a less-than-ideal world, as many of us do.
The splitter code running in the guest OS is nothing more than a simple shim. Memory use is a rounding error, and the performance impact of the act of splitting is so small that it's very difficult to measure.
That's true in extended lab testing and real-world scaled-up production sites. There are issues, but it ain't those!
Thanks for commenting!
Posted by: Chuck Hollis | August 29, 2007 at 09:52 AM
Hi Chuck,
What I'm suggesting is grouping VMs according to replication requirements in VMFS LUNs and then replicate the LUN(s) from the array.
What I further believe is...the heck with iSCSI and FC...NFS's the way to go with VMware and I can replicate EVERYTHING and be able to recover individual VMs or groups of them residing in an NFS datastore. And I can backup whole VMs or files inside VMs...The latter by deploying a loopback mount on a Linux box.
Plus NFS, for ESX, performs better than FC and iSCSI. Did some testing on identical configs (iSCSI/FC/NFS) and I was surprised to find out that NFS indeed produced better IOPs, better latency and better TPUT. I didn't believe the results initially and re-ran the tests and got the same result. Then I realized "hey wait a minute...I have no VMFS, and I'm using completely different driver stacks..."
I think NFS is the way to go with VMware and like solutions for a lot of reasons, like the added SAN complexity server virtualization introduces, fewer touch points with NFS, plus the fact that it can alleviate the tremendously low storage capacity utilization rates introduced by server virtualization.
I hear a lot of people talk about the benefits of server virtualization but no one is talking about the ramifications server virtualization has for storage.
BTW...A single replication mgmt console is a requirement if you are managing thousands of replication relationships but at the end of the day you *still* have to manage these relationships.
Frankly, whether you manage replication relationships or juggle balls in a circus, fewer's always easier...
Posted by: Pq65 | September 01, 2007 at 05:53 PM
Your suggestion about using VMFS domains for like-minded replication requirements is good -- up to a point.
When I speak of replication, I'm also talking about local replication, e.g. snaps et al.
Not being able to snap (or restore) individual VMs I think would bother many people.
I agree with you on the attractiveness of NAS in VMFS environments. In fact, I raised a few eyebrows last year when I wrote a post that speculated that, indeed, NAS might be a better choice for VMware for many of the reasons you mentioned.
I think the performance is more than adequate, files are easier to manage than LUNs, and (if you're not using VMFS) there are a bunch of cool tricks you can use to manage, replicate, archive, etc. VM images that are in fact files.
This view of NAS may not 100% jibe with VMware's view of the world, so it will be very interesting to see how this plays out.
Thanks for the informative comment!
Posted by: Chuck Hollis | September 01, 2007 at 06:28 PM
Chuck
Sounds to me like the original problem was in the way VMware handled (or more accurately didn't handle) external disk (i.e. not within the server running VMware). I think that the origin of VMware as a PC based product for virtualisation meant that it wasn't designed with Enterprise-class architectures in mind. For example, why support such a small number of LUNs on an ESX Server (plus the various other parameters that need to be set if you use LUN numbers outside the range VMware recognises)?
Don't get me wrong, VMware is a good product, but the issue with storage support for LUN creation, replication, snapshots, backups, etc, is a VMware issue, *not* everyone else's problem, as the issues I've just listed already have mature solutions that solve them.
Posted by: Chris M Evans | September 26, 2007 at 03:32 AM