OK, so I'm on a theme here.
The concepts around private clouds have been introduced and discussed, and a few large enterprises have even declared they're building one.
I'm now very interested in looking at the operational impact -- going from IT topic to IT topic -- and seeing how familiar discussions change radically -- and for the better -- when thinking about private clouds.
The last post was about securing the private cloud; this one's about backup.
And, as we'll see, how we think about something as familiar (and essential) as backup might be up for some wonderful re-thinking in this world.
Private Clouds In 30 Seconds
Every application running in a virtual container.
Cloud-like management style, but with the control that IT needs.
No impact on existing applications and information -- as long as they're running in a virtual machine.
Option to federate with compatible service providers.
Basically, everything that's so attractive about clouds with what enterprise IT needs to get the job done.
What Makes Backup Interesting In This World?
Most people think of backup in physical terms: back up THAT server there to THAT backup device over there, using THAT backup server -- or something similar.
Now, put everything in a virtual machine. That's right, everything.
Not only applications and desktops, but backup clients as well. As a matter of fact, why not virtualize the backup server software itself, like EMC Avamar does?
Imagine these various virtual machines floating amongst a pool of virtualized servers, network and storage.
They're all using vSphere to load balance, scale up and down.
Perhaps they're using DPM (Distributed Power Management) to turn off servers that aren't needed right now, or maybe they're using vSphere's sexy new Fault Tolerance feature to run in an uber-HA mode.
Of course, all of these dynamic entities are attached to a single, very large storage space that provides the full range of media choices, dynamic tiering and QoS across the board, thin provisioning, spin-down, scale-out, etc.
Going a bit further, imagine some of these virtual machines running on the pooled infrastructure you own, and just maybe some of it on infrastructure you rent.
That's your private cloud.
What changes with backup?
The first thing that jumps out at you is that there are new options for getting more separation between the thing you're backing up and where you're backing it up to.
When it comes to backup, the more separation or distance, the better.
I think most people realize that backing up to the same physical disk drive isn't the best idea, that using the same array for production and backup exposes you to certain risks, and that putting your backup in a different time zone than your production data helps you sleep better at night -- it's a familiar discussion.
Now, let's map that back onto the previous picture.
You'd want to be able to define a "separation policy" for backup sources and targets, wouldn't you?
You'd want to set a few rules that were smart about distance, e.g. class A applications always get a backup target in a different physical location (if available), class B applications can have backup targets in the same location (but on physically separate storage hardware), and so on.
Sure, things can move around -- servers, information, applications -- but regardless of dynamic optimizations and cool abstractions, you'd want to make sure there's always some physical reality behind it all.
And, of course, it'd have to happen automatically.
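To make this concrete, here's a minimal sketch of what evaluating such a separation policy might look like -- the site model, service classes, and function names are all hypothetical, purely to illustrate the idea:

```python
# Hypothetical sketch of a backup "separation policy" -- the Site and
# Workload models and the placement logic are invented for illustration.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    location: str        # physical data center, e.g. "NYC-1"
    storage_array: str   # physical array identifier

@dataclass
class Workload:
    name: str
    service_class: str   # "A" or "B"
    site: Site           # where the workload currently lives

def backup_target_ok(workload: Workload, target: Site) -> bool:
    """Return True if 'target' satisfies the separation policy."""
    if workload.service_class == "A":
        # Class A: backup target must be in a different physical location.
        return target.location != workload.site.location
    # Class B: same location is fine, but never the same storage array.
    return target.storage_array != workload.site.storage_array

def choose_target(workload: Workload, candidates: list[Site]) -> Site | None:
    """Pick the first candidate that satisfies the policy, or None."""
    for site in candidates:
        if backup_target_ok(workload, site):
            return site
    return None  # no compliant target available -- alert an operator
```

The interesting part is that this check would have to re-run every time something moved, not just once at setup time.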
I think it's safe to say that everyone will eventually want to dedupe just about all their backups.
I also think it's safe to say that people will continue to prefer disk as a target for dedupe backup, with tape playing an ever-decreasing role over time.
So, how does the dedupe discussion change in the private cloud?
Where you do the dedupe work is really going to matter. Backup source and backup target will ideally be in different places -- as described above -- with some sort of network involved.
And it's going to be really painful if you have to shovel an enormous mountain of backup data over a wire and THEN dedupe it, rather than dedupe it BEFORE it's sent.
Hence client-side deduplication becomes even more compelling in the private cloud -- maybe even strategic.
Those 95-99% dedupe rates for client-side dedupe start looking mighty attractive when there's a network involved.
BTW, that's EMC Avamar again.
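To get a feel for why doing the work on the client side matters, here's a toy sketch of client-side dedupe -- fixed-size chunks and a naive hash index, nothing like what a real product such as Avamar actually does under the hood:

```python
# Toy illustration of client-side deduplication: hash each chunk locally,
# and only ship chunks the backup target hasn't seen before.
# Real products use variable-length chunking and far smarter indexes.

import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB fixed-size chunks, for simplicity

def backup_file(path: str, known_hashes: set[str], send_chunk) -> None:
    """Send only chunks whose hash the target doesn't already have."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in known_hashes:
                send_chunk(digest, chunk)   # new data: cross the wire
                known_hashes.add(digest)
            # else: just record the reference -- no network traffic at all
```

The point is simply that the expensive part -- shipping bytes over the network -- only happens for chunks nobody has seen before.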
Keeping It All Together
I don't know how many of you have been exposed to the "consistency group" discussion when it comes to backup and DR.
The idea is pretty simple: more complex applications often have multiple participants that share state. Imagine an order processing system that touches multiple databases, for example.
The consistency discussion is essentially about identifying those dependent relationships, and backing them up (or replicating them) at the exact same point in time.
Otherwise, recoveries get to be kind of interesting :-)
EMC has been doing consistency groups for over a decade with various backup and replication technologies, but the idea takes on a new wrinkle in the private cloud: everything is moving around, and there's no fixed physical location anymore. No longer can you point to those three servers and say "go back them up".
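In code terms, the consistency group idea boils down to "hold everyone's writes, snapshot everyone at the same instant, then release" -- here's a rough sketch, with an entirely invented member interface:

```python
# Rough sketch of consistency-group semantics: every member of the group
# is quiesced before ANY snapshot is taken, so all snapshots share one
# point in time. The Member interface here is invented for illustration.

class Member:
    """One participant in the application -- e.g. a database."""
    def quiesce(self): ...          # flush and pause writes
    def snapshot(self) -> str: ...  # capture state, return snapshot id
    def resume(self): ...           # allow writes again

def snapshot_consistency_group(members: list[Member]) -> list[str]:
    quiesced = []
    try:
        for m in members:           # hold everyone's writes first...
            m.quiesce()
            quiesced.append(m)
        return [m.snapshot() for m in members]  # ...then snapshot together
    finally:
        for m in quiesced:          # always release, even on failure
            m.resume()
```

The new wrinkle in the private cloud isn't this logic -- it's discovering *which* virtual machines belong in the group when they won't sit still.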
Backup Management Evolves
EMC sells a great management package for orchestrating multiple backup and recovery approaches into a single, coherent environment -- EMC's Data Protection Advisor.
If you live in a heterogeneous backup world like everyone else, and you're ultimately responsible for safeguarding information assets, you really should go take a look at this gem.
Now, going back to our private cloud model, how should this sort of product logically evolve?
Well, it'd be a convenient place to define and manage the "separation policy" we talked about earlier. And to do this, it'd have to be able to dynamically translate virtual entities into physical locations.
And, of course, what a great place to catalog application dependencies in support of that consistency group discussion we just had.
There's more, but you get the idea ...
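For instance, one job such a tool might take on is continuously resolving virtual entities back to physical reality, so the separation policy can be rechecked as things move -- a sketch, against an invented inventory API:

```python
# Hypothetical sketch: the management layer resolving a virtual machine
# to its current physical reality, so separation policy can be checked
# even as VMs move around. Every inventory call here is assumed.

def physical_location(inventory, vm_name: str) -> dict:
    """Map a VM to the physical facts the policy cares about."""
    vm = inventory.lookup_vm(vm_name)        # assumed inventory call
    host = inventory.host_of(vm)             # current physical host
    datastore = inventory.datastore_of(vm)   # current backing storage
    return {
        "site": host.site,                   # physical data center
        "array": datastore.array_id,         # physical storage array
    }

def separation_violations(inventory, pairs):
    """Yield (source, target) pairs that have drifted onto shared
    hardware. (Strict class-A rule shown; class B would only check
    the array, per the earlier policy sketch.)"""
    for src, tgt in pairs:   # (production VM, backup target VM)
        a = physical_location(inventory, src)
        b = physical_location(inventory, tgt)
        if a["site"] == b["site"] or a["array"] == b["array"]:
            yield (src, tgt)                 # re-placement needed
```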
Load Balancing Redux
Keep in mind, in these fully virtualized environments, everything is a dynamic resource: compute, memory, network bandwidth, storage characteristics, etc.
The idea is that the underlying infrastructure can flex up or down (within policy, of course) based on what's needed right now.
Which brings up the interesting potential of considering backup and recovery as just another application running in the private cloud.
Ever need to get a backup done in a hurry? Or, better yet, a recovery? :-)
Now, please consider this: the ability to bump the backup client up to the highest priority (lots of CPU and memory), tell the fabric "this is important", dynamically invoke a bunch of scale-out backup servers, and tell the storage to capture (or produce) the data stream at the highest possible speed -- and, in the case of backup, automatically migrate the data set to a lower tier at some future time.
And, once the event has passed, automatically go back to normal operating conditions ...
The potential of an incredibly powerful and dynamic "surge" makes getting those backups and emergency recoveries done in a very short window an entirely new proposition, doesn't it?
Maybe we should add a "turbo" button to the backup management interface :-)
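If that turbo button ever existed, it might not be much more than this -- a temporary resource surge wrapped around the job, sketched here against an entirely invented orchestration API:

```python
# Whimsical sketch of the "turbo button": temporarily crank up resources
# for a backup or recovery job, then put everything back. The cloud
# orchestration API used here is entirely invented.

from contextlib import contextmanager

@contextmanager
def turbo(cloud, client_vm: str, extra_workers: int = 4):
    """Surge resources for the duration of a backup/recovery event."""
    saved = cloud.get_reservations(client_vm)          # remember normal state
    cloud.set_reservations(client_vm, cpu="high", memory="high")
    cloud.tag_traffic(client_vm, priority="critical")  # tell the fabric
    workers = [cloud.spawn_backup_worker() for _ in range(extra_workers)]
    try:
        yield workers                                  # run the job in here
    finally:
        for w in workers:                              # surge over:
            cloud.destroy(w)                           # tear down workers,
        cloud.set_reservations(client_vm, **saved)     # restore normal ops

# Usage might look like:
#   with turbo(cloud, "payroll-db") as workers:
#       run_backup("payroll-db", workers)
```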
Checking Your Work
OK, let's face it.
How often do we go back and check that the data we backed up is actually recoverable and usable by the business?
We don't have to go far to hear stories about recoveries that appeared to work at one level, but where the data got mangled somehow along the way.
The only viable way of ensuring that your data is recoverable is by -- well -- doing a recovery and actually testing it.
But that's a royal pain, so it isn't done all that often.
Well, in the private cloud model, this can change -- if we want.
There's no reason why we can't use spare cycles and virtual machines to mount up previous backup images, invoke the corresponding application images, and run through a few "sanity check" scripts as a background task.
No, it's not an ironclad guarantee that nothing has been mangled, but it'd be a whole lot better than what's generally done today.
Costs you nothing in this sort of environment, really -- other than a bit of setup time.
Hmmm, maybe that's something else the backup management software can do to make this a bit easier :-)
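A background verify loop might look something like the following -- a sketch, where the catalog, scratch pool, and VM helpers are all assumed rather than real:

```python
# Sketch of background recovery verification: restore a recent backup
# into a scratch VM, run the application's sanity checks, and record
# the result. All the helper calls here are hypothetical.

def verify_backup(catalog, scratch_pool, app_name: str) -> bool:
    image = catalog.latest_backup(app_name)      # most recent backup image
    vm = scratch_pool.allocate()                 # spare cycles, spare VM
    try:
        vm.restore(image)                        # mount the backup image
        vm.boot()                                # bring the app up
        ok = all(vm.run(check) == 0              # run each sanity script
                 for check in catalog.sanity_checks(app_name))
        catalog.record_verification(image, ok)   # remember the outcome
        return ok
    finally:
        scratch_pool.release(vm)                 # give the cycles back
```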
Backup Infrastructure As A Service
I almost forgot the most obvious benefit -- the ability to get remote backup on someone else's infrastructure with almost no drama whatsoever.
It might be as simple as renting some virtual machine space from someone, and pushing the backup target over to their location. No need to change anything else.
Now, before you dogpile on "what about security?", keep in mind that -- in a private cloud -- all communication between entities is externally authenticated and audited -- never mind transparently encrypted if you so choose -- so that sort of architectural issue was probably addressed before you got around to this specific use case.
That's part of the beauty of the private cloud approach. Lots and lots of choices around infrastructure without the need to rethink the architecture.
Just don't lose the encryption key, please.
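For the skeptics, here's roughly what "encrypt before it leaves your side of the fence" could look like -- this one uses the real Python 'cryptography' library (Fernet), though the key handling shown is deliberately naive:

```python
# Sketch: encrypt backup chunks before they leave for rented
# infrastructure. Uses the 'cryptography' library's Fernet scheme;
# key management is deliberately naive -- don't lose that key!

from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in real life: a managed, escrowed key
cipher = Fernet(key)

def send_offsite(chunk: bytes, upload) -> None:
    """Encrypt locally, then ship -- the provider only sees ciphertext."""
    upload(cipher.encrypt(chunk))

def recover(token: bytes) -> bytes:
    """Only the key holder can turn ciphertext back into data."""
    return cipher.decrypt(token)
```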
And The Fun Has Just Started
I've probably only scratched the surface of how something as familiar (and necessary!) as backup morphs in this new world of private clouds.
And I haven't even gotten into the adjacent discussions around continuous data protection (CDP), or business continuity -- but, with a little imagination, you can probably see where that goes as well.
When discussing the private cloud architecture, I'm seeing a recurring theme: we go from "gee, how could we possibly do this in a fully virtualized environment?" to "gee, this has the potential to be so much easier and more efficient in a fully virtualized and dynamic environment".
And backup appears to be no exception.
More to come.