OK, so I'm on a theme here.
The concepts around private clouds have been introduced and discussed, and there are even a few large enterprises who have declared they're building one.
I'm now very interested in looking at the operational impact -- going from IT topic to IT topic -- and seeing how familiar discussions change radically -- and for the better -- when thinking about private clouds.
The last post was about securing the private cloud; this one's about backup.
And, as we'll see, how we think about something as familiar (and essential) as backup might be up for some wonderful re-thinking in this world.
Private Clouds In 30 Seconds
Every application running in a virtual container.
Cloud-like management style, but with the control that IT needs.
No impact on existing applications and information -- as long as they're running in a virtual machine.
Option to federate with compatible service providers.
Basically, everything that's so attractive about clouds, combined with what enterprise IT needs to get the job done.
What Makes Backup Interesting In This World?
Most people think of backup in physical terms: backup THAT server there to THAT backup device over there, using THAT backup server -- or something similar.
Now, put everything in a virtual machine. That's right, everything.
Not only applications and desktops, but backup clients as well. As a matter of fact, why not virtualize the backup server software itself, like EMC Avamar does?
Imagine these various virtual machines floating amongst a pool of virtualized servers, network and storage.
They're all using vSphere to load-balance and to scale up and down.
Perhaps they're using DPM (distributed power management) to turn off servers that aren't being used right now.
Or maybe they're using vSphere's sexy new fault tolerance feature to run in an uber-HA mode.
Of course, all of these dynamic entities are attached to a single, very large storage space that provides the full range of media choices, dynamic tiering and QoS across the board, thin provisioning, spin-down, scale-out, etc.
Nice picture.
Going a bit further, imagine some of these virtual machines running on the pooled infrastructure you own, and just maybe some of it on infrastructure you rent.
That's your private cloud.
What changes with backup?
Location Matters
The first thing that jumps out at you is that there are new options to get more separation between the thing you're backing up and where you're backing it up to.
When it comes to backup, the more separation or distance, the better.
I think most people realize that backing up to the same physical disk drive isn't the best idea, that using the same array for production and backup exposes you to certain risks, and that putting your backup in a different time zone from your production data helps you sleep better at night -- it's a familiar discussion.
Now, let's map that back onto the previous picture.
You'd want to be able to define a "separation policy" for backup sources and targets, wouldn't you?
You'd want to set a few rules that were smart about distance, e.g. class A applications always have a backup target to a different physical location (if available), class B applications can have backup targets in the same location (but physically separate storage hardware), and so on.
Sure, things can move around -- servers, information, applications -- but regardless of dynamic optimizations and cool abstractions, you'd want to make sure there was always some physical reality behind all of this.
And, of course, it'd have to happen automatically.
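To make the "separation policy" idea a bit more concrete, here's a minimal Python sketch. Everything in it is hypothetical: the `Placement` model, the rule names and the `inventory` lookup (standing in for whatever tells you where a VM is physically running right now) are assumptions for illustration, not any product's API. The point is that the rule gets evaluated against the current physical placement every time, so it keeps holding as things move around.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where a VM's data or a backup target physically lives right now (hypothetical model)."""
    site: str
    array: str


# Assumed rules mirroring the class A / class B examples in the text.
SEPARATION_RULES = {
    "A": "different_site",   # back up to another physical location, if one is available
    "B": "different_array",  # same site is fine, but not the same storage hardware
}


def satisfies(rule: str, source: Placement, target: Placement, remote_site_exists: bool) -> bool:
    if rule == "different_site" and remote_site_exists:
        return target.site != source.site
    # Class B, or class A when no other site is available:
    # at the very least, keep the backup off the production array.
    return target.array != source.array


def choose_backup_target(vm, app_class, inventory, candidates):
    """Resolve the VM's *current* placement, then pick the first compliant target.

    `inventory` maps VM names to Placements and is assumed to be refreshed
    whenever VMs move, so the rule is always checked against physical reality.
    """
    source = inventory[vm]
    rule = SEPARATION_RULES.get(app_class, "different_array")
    remote_site_exists = any(c.site != source.site for c in candidates)
    for target in candidates:
        if satisfies(rule, source, target, remote_site_exists):
            return target
    return None  # no compliant target: better to alert than back up next to production


inventory = {
    "orders-db": Placement("boston", "array-1"),
    "wiki": Placement("boston", "array-1"),
}
targets = [
    Placement("boston", "array-1"),
    Placement("boston", "array-2"),
    Placement("dallas", "array-7"),
]
print(choose_backup_target("orders-db", "A", inventory, targets))  # Placement(site='dallas', array='array-7')
print(choose_backup_target("wiki", "B", inventory, targets))       # Placement(site='boston', array='array-2')
```

If a VM later moves to Dallas, updating `inventory` is enough: the same class A rule would then steer its backups back to Boston.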
Dedupe Matters
I think it's safe to say that everyone will eventually want to dedupe just about all their backups.
I also think it's safe to say that people will continue to prefer disk as a target for dedupe backup, with tape playing an ever-decreasing role over time.
So, how does the dedupe discussion change in the private cloud?
Where you do the dedupe work is really going to matter. Backup source and backup target will ideally be in different places -- as described above -- with some sort of network involved.
And it's going to be really painful if you have to shovel an enormous mountain of backup data over a wire and THEN dedupe it, rather than dedupe it BEFORE it's sent.
Hence client-side deduplication becomes even more compelling in the private cloud -- maybe even strategic.
Those 95-99% dedupe rates for client-side dedupe start looking mighty attractive when there's a network involved.
BTW, that's EMC Avamar again.
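A toy model of why deduplicating before the wire matters: the client chunks its data, hashes each chunk locally, asks the target which hashes it's missing, and ships only those chunks. Fixed 4 KB chunks, SHA-256 and the in-memory `DedupeTarget` class are simplifying assumptions; real products use their own chunking schemes (often variable-length), so treat this as an illustration of the idea rather than how any particular product works.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products usually chunk by content


def chunks(data: bytes, size: int = CHUNK_SIZE):
    for i in range(0, len(data), size):
        yield data[i:i + size]


class DedupeTarget:
    """Stand-in for a remote backup target that stores chunks by hash."""

    def __init__(self):
        self.store = {}

    def missing(self, digests):
        # Over a real network this is a cheap exchange of hashes, not data.
        return {d for d in digests if d not in self.store}

    def put(self, digest, chunk):
        self.store[digest] = chunk


def client_side_backup(data: bytes, target: DedupeTarget) -> int:
    """Hash on the client, send only chunks the target lacks; return bytes sent."""
    pieces = [(hashlib.sha256(c).hexdigest(), c) for c in chunks(data)]
    needed = target.missing({d for d, _ in pieces})
    sent = 0
    for digest, chunk in pieces:
        if digest in needed:
            target.put(digest, chunk)
            needed.discard(digest)  # don't resend a chunk repeated within this backup
            sent += len(chunk)
    return sent


target = DedupeTarget()
day1 = os.urandom(100 * CHUNK_SIZE)
first = client_side_backup(day1, target)             # every chunk is new
day2 = day1[:-CHUNK_SIZE] + os.urandom(CHUNK_SIZE)   # one chunk out of 100 changed
second = client_side_backup(day2, target)
print(first, second)  # 409600 4096
```

Changing one chunk out of 100 means the second night's backup moves about 1% of the bytes, which is the 99% end of the range above. Dedupe after the wire would have shipped all 409,600 bytes both nights.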
Keeping It All Together
I don't know how many of you have been exposed to the "consistency group" discussion when it comes to backup and DR.
The idea is pretty simple: more complex applications often have multiple participants that share state. Imagine an order processing system that touches multiple databases, for example.
The consistency discussion is essentially identifying those dependent relationships, and backing them up (or replicating them) at the exact same point in time.
Otherwise, recoveries get to be kind of interesting :-)
EMC has been doing consistency groups for over a decade with various backup and replication technologies, but the idea takes on a new wrinkle in the private cloud: everything is moving around, and there's no fixed physical location anymore. No longer can you point to those three servers and say "go back them up".
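Here's a small sketch of the two halves of the consistency-group idea: discovering groups from declared dependencies (a union-find over "shares state with" relationships), and capturing every member of a group at one shared point in time. The VM names and the `take_snapshot` callback are hypothetical, and a real implementation would also have to quiesce or fence I/O across the whole group before capturing it, which is omitted here. Note that groups are defined by application relationships, not by which physical servers happen to be hosting the VMs.

```python
from collections import defaultdict
from datetime import datetime, timezone


def consistency_groups(vms, depends_on):
    """Group VMs that share state, directly or transitively, using union-find."""
    parent = {vm: vm for vm in vms}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups fast
            x = parent[x]
        return x

    for a, b in depends_on:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for vm in vms:
        groups[find(vm)].add(vm)
    return sorted(sorted(g) for g in groups.values())


def snapshot_group(group, take_snapshot):
    """Capture every member at one shared point in time.

    `take_snapshot(vm, point_in_time)` is a hypothetical hook into whatever
    snapshot or replication mechanism is in use.
    """
    point_in_time = datetime.now(timezone.utc)
    return {vm: take_snapshot(vm, point_in_time) for vm in group}


vms = ["orders-app", "orders-db", "inventory-db", "hr-app", "hr-db", "wiki"]
deps = [("orders-app", "orders-db"), ("orders-app", "inventory-db"), ("hr-app", "hr-db")]
groups = consistency_groups(vms, deps)
print(groups)
# [['hr-app', 'hr-db'], ['inventory-db', 'orders-app', 'orders-db'], ['wiki']]
```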
Backup Management Evolves
EMC sells a great management package for orchestrating multiple backup and recovery approaches into a single, coherent environment -- EMC's Data Protection Advisor.
If you live in a heterogeneous backup world like everyone else, and you're ultimately responsible for safeguarding information assets, you really should go take a look at this gem.
Now, going back to our private cloud model, how should this sort of product logically evolve?
Well, it'd be a convenient place to define and manage the "separation policy" we talked about earlier. And to do this, it'd have to be able to dynamically translate virtual entities into physical locations.
And, of course, what a great place to catalog application dependencies in support of that consistency group discussion we just had.
There's more, but you get the idea ...
Load Balancing Redux
Keep in mind, in these fully virtualized environments, everything is a dynamic resource: compute, memory, network bandwidth, storage characteristics, etc.
The idea is that the underlying infrastructure can flex up or down (within policy, of course) based on what's needed right now.
Which brings up the interesting potential of considering backup and recovery as just another application running in the private cloud.
Ever need to get backup done in a hurry? If not, maybe a recovery? :-)
Now, consider what that could look like: bumping the target client up to the highest priority (lots of CPU and memory), telling the fabric "this is important", dynamically invoking a bunch of scale-out backup servers, telling the storage to capture (or produce) the data stream at the highest possible speed, and -- in the case of backup -- automatically migrating the data set to a lower tier at some later time.
And, once the event has passed, automatically going back to normal operating conditions ...
The potential of an incredibly powerful and dynamic "surge" makes getting those backups and emergency recoveries done in a very short window an entirely new proposition, doesn't it?
Maybe we should add a "turbo" button to the backup management interface :-)
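As a thought experiment, that "surge" maps naturally onto a Python context manager: boost everything on entry, and guarantee the return to normal operating conditions on exit, even if the job fails. The settings dictionary and the `backup-node-N` names are placeholders; in a real environment each step would be a call into the hypervisor, network fabric and storage management interfaces, none of which are shown or assumed here.

```python
from contextlib import contextmanager

# Hypothetical resource settings; a real private cloud would call its
# management APIs instead of editing a dictionary.
NORMAL = {"cpu_shares": "normal", "net_priority": "normal", "storage_tier": "capacity"}
SURGE = {"cpu_shares": "high", "net_priority": "high", "storage_tier": "performance"}


@contextmanager
def surge(vm_settings: dict, backup_nodes: list, scale_out: int = 4):
    """Temporarily boost a backup or restore job, then put everything back."""
    saved = dict(vm_settings)
    vm_settings.update(SURGE)
    added = [f"backup-node-{i}" for i in range(scale_out)]  # hypothetical scale-out servers
    backup_nodes.extend(added)
    try:
        yield vm_settings
    finally:
        # Always return to normal operating conditions, even if the job raised.
        vm_settings.clear()
        vm_settings.update(saved)
        for node in added:
            backup_nodes.remove(node)


settings = dict(NORMAL)
nodes = []
with surge(settings, nodes) as boosted:
    print(boosted["cpu_shares"], len(nodes))  # high 4
print(settings == NORMAL, nodes)  # True []
```

Putting the rollback in `finally` is the design choice that matters: a failed emergency restore shouldn't leave the environment stuck in "turbo".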
Checking Your Work
OK, let's face it.
How often do we go back and check that the data we backed up is actually recoverable and usable by the business?
We don't have to go far to hear many stories about recoveries that appeared to work at one level, but the data got mangled somehow along the way.
The only viable way of ensuring that your data is recoverable is by -- well -- doing a recovery and actually testing it.
But that's a royal pain, so it isn't done all that often.
Well, in the private cloud model, this can change -- if we want.
There's no reason why we can't use spare cycles and virtual machines to mount up previous backup images, invoke the corresponding application images, and run through a few "sanity check" scripts as a background task.
No, it's not an ironclad guarantee that nothing has been mangled, but it'd be a whole lot better than what's generally done today.
Costs you nothing in this sort of environment, really -- other than a bit of setup time.
Hmmm, maybe that's something else the backup management software could do to make this a bit easier :-)
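Here's a minimal sketch of that background "restore and sanity check" loop, using SQLite as a stand-in application because it ships with Python and has a built-in integrity check. The manifest fields, the `orders` table and the row-count check are illustrative assumptions; in the scenario described above, the restore would land in a spare VM and the checks would be application-specific scripts.

```python
import hashlib
import os
import shutil
import sqlite3
import tempfile


def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def take_backup(src_conn, backup_path):
    """Copy a live database and record what a good restore should look like."""
    dst = sqlite3.connect(backup_path)
    src_conn.backup(dst)
    dst.close()
    rows = src_conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return {"path": backup_path, "sha256": sha256_of(backup_path), "orders": rows}


def verify_backup(manifest, scratch_dir):
    """Restore into scratch space, then run integrity and business sanity checks."""
    if sha256_of(manifest["path"]) != manifest["sha256"]:
        return False, "backup image checksum mismatch"
    restored = os.path.join(scratch_dir, "restored.db")
    shutil.copyfile(manifest["path"], restored)  # "mount" the image somewhere harmless
    conn = sqlite3.connect(restored)
    try:
        if conn.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False, "database integrity check failed"
        count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        if count != manifest["orders"]:
            return False, f"expected {manifest['orders']} orders, found {count}"
    finally:
        conn.close()
    return True, "ok"


with tempfile.TemporaryDirectory() as tmp:
    prod = sqlite3.connect(os.path.join(tmp, "prod.db"))
    prod.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    prod.executemany("INSERT INTO orders (total) VALUES (?)", [(9.99,), (25.0,), (3.5,)])
    prod.commit()
    manifest = take_backup(prod, os.path.join(tmp, "nightly.db"))
    prod.close()
    result = verify_backup(manifest, tmp)
    print(result)  # (True, 'ok')
```

A checksum alone only proves the bytes didn't change since the backup ran; the integrity and row-count checks are what catch a backup that "worked" but captured mangled data.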
Backup Infrastructure As A Service
I almost forgot the most obvious benefit -- the ability to get remote backup on someone else's infrastructure with almost no drama whatsoever.
It might be as simple as renting some virtual machine space from someone, and pushing the backup target over to their location. No need to change anything else.
Now, before you dogpile on "what about security?", keep in mind that -- in a private cloud -- all communication between entities is externally authenticated and audited -- never mind transparently encrypted if you so choose -- so that sort of architectural issue was probably addressed before you got around to this specific use case.
That's part of the beauty of the private cloud approach. Lots and lots of choices around infrastructure without the need to rethink the architecture.
Just don't lose the encryption key, please.
And The Fun Has Just Started
I've probably only scratched the surface of how something as familiar (and necessary!) as backup morphs in this new world of private clouds.
And I haven't even gotten into the adjacent discussions around continuous data protection (CDP), or business continuity -- but, with a little imagination, you can probably see where that goes as well.
When discussing the private cloud architecture, I'm seeing a recurring theme: we go from "gee, how could we possibly do this in a fully virtualized environment?" to "gee, this has the potential to be so much easier and more efficient in a fully virtualized and dynamic environment".
And backup appears to be no exception.
More to come.

It is my hope that, in the grand vision of cloud-abstracted architecture, backup is made obsolete by the internal functionality of the storage cloud...
Here is what I mean. If we treat backup (and RAID protection, etc.) as copies of the initial primary data set, why can't the 'storage cloud' include functionality to create the exact number of data copies that a business requires, and place them on the exact type of storage media, without an external process to drive it?
In my version of the 'storage cloud' nirvana, the storage array where we host the primary copy of the data would be able to continuously create (yes - CDP) the requisite number of business-required copies of that data (based on policies set up by an administrator) and move all copies to the appropriate media over time (again, based on policy) without hindering eDiscovery or other regulatory compliance needs. For efficiency, it would also include the various data reduction techniques (compression, de-dupe, etc.) available in this uber-storage cloud.
In other words - should we not enable the current storage arrays to virtualize the functions that traditionally reside outboard due to the technology limitations of the past? Should we not enable the storage arrays themselves to be part of the IT cloud to remove the limits of locality in providing its functions (i.e. globally accessible storage cloud for the masses)?
I think EMC is already well on the way, with the V-Max and Atmos products, to having the building blocks for that vision. It also has the breadth of software (Legato, Avamar, RecoverPoint, etc.) to enable this ILM-in-the-cloud vision. If only you could truly integrate their functionality, then 'Backup Infrastructure As a Service', indeed backup in general, becomes a moot conversation. The question will instead become: who is your storage cloud provider, and does the solution provide all of the data protection and efficiencies you need? The world would be a much simpler place to manage...
Posted by: Gene Piatigorski | April 29, 2009 at 03:11 PM
I've been using SMEStorage.com for public (Amazon S3) and private Cloud Backup for a little while now and the benefits are huge.
Firstly, all my org's data is backed up to tape. It works well and the tapes are stored in a fireproof safe, but it is still a pain not being able to get a file when I am away from the office or on the move. To solve that I started to use SMEStorage Private Cloud Backup. This allows me to nominate an internal FTP directory as my private storage repository and maps it to their platform somehow. I can then access these files from an explorer interface on the web, a Windows virtual drive, an iPhone, etc. Best of all are the integration features with things like Zoho online office, MS & Open Office on the desktop, etc. We automatically back up all this data to Amazon S3 using the same platform's sync services. It has transformed how we are able to get access to information.
Posted by: William Todd | April 30, 2009 at 04:03 AM
It's quite interesting to ponder backup in relation to cloud computing. As Gene already pointed out, we may fundamentally need to rethink how we back up with the advent of cloud computing. The old backup architecture may no longer be suitable in the new era. Although the concept of cloud computing is innovative and promising, the implementation of cloud computing is, well, cloudy. What kind of issues will we encounter if we put new wine into old wineskins? Then again, what kind of clouds are we talking about? Clouds are an aggregate of the nano-sized water vapors that are visible to human eyes. Virtualization is, at least in part, an evolutionary progression out of object-oriented programming and component processing. Nonetheless, virtualization is the old way of thinking as far as cloud computing is concerned. Even if it is not so difficult as to prohibit implementation, it will most likely pose numerous complications. Are we putting the cart before the horse by trying to implement an architecture without the proper technology behind it for support? Perhaps the very idea of cloud computing suggests that the centralized data processing era, to which we became so accustomed during the later part of the last century and the early part of this one, is finally fading away. Thus, the revolutionary nano technology beckons us.
Posted by: shiningarts | May 01, 2009 at 09:40 PM
Virtualization technologies such as VMware, Xen, and Hyper-V have given rise to these cloud services. It is within these technologies that backup paradigms and methods will evolve, meaning that the infrastructure will govern how far and how well backup can evolve, so long as RTO and RPO objectives, as well as regulation and compliance, are not put at risk. I think that a service built into the infrastructure to generate, track, manage, enforce, react to, report on, and catalog information based on context-aware metadata processes will ultimately be the key to a successful change, but this is a huge shift: a sort of middleware for data change.
A master HA virtual appliance cluster, using agents and a highly distributed, fault-tolerant data hashing and deduplication indexing mechanism specialized for each application / operating environment to service file-level and structured instance-level data recovery operations, coupled with a DR image-level, infrastructure-tied API / service, will ultimately support a platform that can still be used with traditional archiving techniques and technologies with minimal disruption.
I do not think there is a silver bullet for data protection even in the virtual environment.
Keep in mind tape is still way more green than disk...
Posted by: Jesse James Lala | May 14, 2009 at 12:54 AM