You'd think that the temptation would be to simply rest on past successes, and offer up only small tactical enhancements to the product.
This apparently isn't the case -- today we saw an interesting product announcement that signals continued innovation with the core infrastructure behind the Data Domain product. You can read the press releases here and here, or go see Brian Biles' take on Data Domain's blog here.
The Question Of Scale
One of the central questions in any infrastructure is how do you achieve scale? Do you scale up (make the individual components faster), scale out (use multiple components in parallel), or a combination of both?
Data Domain's central thesis -- up to now -- has been simple.
Since their product essentially rides the Intel innovation curve, they get an automatic performance and capacity bump with every tick/tock of the Intel CPU roadmap. More cores equals more performance.
Indeed, this design point was one of the many reasons they were so attractive to EMC as an acquisition target.
If all you need is more capacity, the nature of backup tends to present an easy solution: buy more backup targets. Thanks to the virtualizing nature of networks and backup software, there's usually no overwhelming need to pool performance and capacity into a scale-out cluster.
Want a faster backup? Use a faster backup engine.
Want more concurrent backups, or more aggregate capacity? Use multiple backup engines.
Indeed, this positioning has proven satisfactory for the vast majority of enterprise users -- as evidenced by Data Domain's success in the marketplace.
And, to my way of thinking, given this scenario it'd be perfectly reasonable to not introduce any fundamental new enabling technology.
The Data Domain GDA
The Global Deduplication Array is the first take at a multi-controller scale-out approach where processing and storage resources are pooled at the storage device level.
At first glance, this gives a considerable boost to both performance and capacity as compared to the previous industry-leading DD880. Logical, raw and usable capacities have exactly doubled.
We're now talking a maximum of 14.2 petabytes of logical capacity, as compared with the just-upgraded DD880 which now sports a "mere" 7.1 petabytes of logical capacity.
More importantly, throughput has now jumped to 12.8 terabytes per hour.
Yes, the GDA delivers better feeds and speeds as compared to the single node approach, but I believe there's more to the story.
First, the architectural underpinnings are now in place: if there should be a need to go from, say, 2 nodes to 4 or 8, a good portion of the heavy lifting is already done.
There's a reasonable argument that not too many customers really need that sort of scalability in the real world, and that future requirements will be adequately met by either (a) faster nodes or (b) multiple independent nodes, but you can think of this as a "scaling insurance policy" in case things turn out otherwise.
There's also a "more efficient dedupe" argument: the more data behind a logically clustered controller, the better the opportunity to spot and eliminate redundant backup data. Actual benefits will depend heavily on what you're backing up and how you're doing backups, but some customers should see additional efficiencies here.
And, finally, any time we can minimize the number of devices to be managed, that's generally a good thing, although I don't think that's a strong argument here -- yet.
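To make the "more efficient dedupe" point a bit more concrete, here is a minimal sketch. It assumes fixed-size chunks fingerprinted with SHA-256, which is a common textbook approach and not necessarily how Data Domain does it; the backup streams and the function names are made up for illustration. It compares two independent nodes, each deduplicating only what it sees, against one logically pooled index.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny, for illustration only


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each with SHA-256."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def stored_chunks(backups: list[bytes]) -> int:
    """Count the unique chunks a single dedupe index would keep."""
    index: set[str] = set()
    for backup in backups:
        index.update(chunk_fingerprints(backup))
    return len(index)


# Two hypothetical backup streams that share some content (CCCC, DDDD).
backup_a = b"AAAABBBBCCCCDDDD"
backup_b = b"CCCCDDDDEEEEFFFF"

# Two independent nodes: each dedupes only the stream sent to it.
independent = stored_chunks([backup_a]) + stored_chunks([backup_b])

# One logically pooled index: chunks shared across streams are stored once.
pooled = stored_chunks([backup_a, backup_b])

print(independent, pooled)  # 8 6
```

The shared chunks are stored twice in the independent case and once in the pooled case. How much overlap real backup streams have is exactly the "it depends" part.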
Oh Yes, And Now There's Encryption
A certain customer audience demands that their backup data be encrypted while at rest. I think this requirement originated in the days of tape, when cartridges could easily go missing.
As the industry transitioned to disk-based backup targets, much of this concern disappeared, but the requirements stayed the same. And there's still a valid argument that disk drives could go missing, whether by theft or perhaps a maintenance vendor replacing a failed drive without erasing it.
Regardless, combining deduplication and encryption requires that the backup device deduplicate first, then encrypt. And that's exactly what Data Domain is doing now -- encrypting at the storage array level.
It meets the legacy requirements of backups being encrypted from the tape days. It guards against the eventuality that un-erased disks go missing. And it does so in a near-seamless manner with a minimum of complexity and intrusion. Sweet.
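Why the order matters can be shown with a small sketch. Everything here is an assumption for illustration: the cipher is a toy built from SHA-256 with a random nonce (not secure, and not Data Domain's scheme), and the chunks and key are made up. The only property it relies on is that randomized encryption turns identical plaintext into different ciphertext, which hides duplicates from a deduplication step that runs afterwards.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only; NOT secure.

    A fresh random nonce per call means identical plaintexts
    produce different ciphertexts.
    """
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        block = key + nonce + counter.to_bytes(8, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body


def unique_count(chunks: list[bytes]) -> int:
    """Number of distinct chunks a fingerprint-based dedupe would store."""
    return len({hashlib.sha256(c).digest() for c in chunks})


key = b"hypothetical-key"
chunks = [b"same block of backup data"] * 3 + [b"a different block"]

# Encrypt first, then dedupe: the duplicates no longer look alike.
encrypt_then_dedupe = unique_count([toy_encrypt(key, c) for c in chunks])

# Dedupe first, then encrypt only the unique chunks.
unique_plain = list(dict.fromkeys(chunks))
dedupe_then_encrypt = len([toy_encrypt(key, c) for c in unique_plain])

print(encrypt_then_dedupe, dedupe_then_encrypt)
```

With encryption applied first, all four chunks end up stored; with deduplication applied first, only the two distinct chunks are encrypted and stored.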
Putting The Pieces Together
One of many industry transitions going on is the secular shift to disk from tape as a backup medium. Driven by lower storage costs and the efficiency magic of deduplication, tape's long-term prospects aren't all that good.
For all intents and purposes, Data Domain's platform appears to be the obvious market leader in this shift. And with this announcement, it's clear that the innovation continues.
Complement this great product portfolio with EMC's broader set of capabilities, and it's hard to imagine a more complete and competitive offering for the all-important topic of next-generation data protection.
Until the next technology disruption, that is ...