There are literally hundreds of cool products here at EMC now. Even though I do my best, I tend to lose track of some of them.
When I do catch up with one of these products, it's almost like meeting a young niece or nephew you haven't seen in a while -- my, haven't you grown up?
Such is the case with a soon-to-be-formally announced update to EMC's Cloud Tiering Appliance, now at 9.0.
As I was browsing the preview materials, I kept thinking "sheesh, this is pretty compelling now" -- and thus worthy of a blog post.
If you're up-to-date on this whole topic, my apologies -- best to skip this post.
If, however, you're looking for a way to painlessly tier and/or archive ever-growing file systems to less-costly archival alternatives -- internal devices, or a variety of external services -- then read on.
I hope you'll find it interesting.
The Data Deluge
The majority of this growth is inevitably unstructured data -- and that usually shows up in file systems.
And the vast majority of that data is simply stuff that wants to be kept around for one reason or another, even though it's rarely -- if ever -- used.
In more practical IT terms: your filers tend to fill up with a bunch of low-value junk.
You'd like to delete at least some of it, but getting approvals to do so is a painful process. There's almost an emotional response when you suggest getting rid of stuff.
You'd like to charge business users for the wasteful way they're using the service (or at least show them what they're consuming!), but that's a pain as well.
And of course, you're always looking for a way to spend less -- far less, in fact -- on products and labor to support this relatively thankless task.
Not only that, there's usually operational pain as well.
Certain popular filers can run much slower when full. Besides, all of that file cruft is now part of the backup stream at some point -- even though it's likely low-value and hasn't changed in ages. Then add power, cooling, floor space, and the work of keeping things updated and operational.
Is it any wonder that at some point, enough is enough, and the IT team starts fishing around for a better approach?
Enter CTA -- The EMC Cloud Tiering Appliance
In 2005, EMC acquired Rainfinity, which did a popular flavor of NAS virtualization: aggregating multiple filers so they (sort of) looked like a single one -- at least to users. As a bonus, there was a neat non-disruptive migration capability as part of the product -- migrate on to the environment, within the environment, off of the environment -- that didn't get much attention at the time.
Well, we got a surprise.
After a while, it became pretty clear that the #1 use case for the Rainfinity product was non-disruptive file system migration, and -- within that -- the most popular use was moving files from more expensive storage to less expensive storage using a straightforward policy engine.
We saw what our customers were doing with the product, and decided to give them an even better answer.
Several years ago, a small engineering group refactored and extended the code, and brought to market EMC's first version of CTA -- the EMC Cloud Tiering Appliance. Over time, it ended up being one of those "underground smash hits" at EMC, sort of like PowerPath -- we sold a bunch of it, customers really liked what it did, and they bought more.
We're now at many thousands of CTA installations humming nicely around the globe. The fact it was built on a proven code stack and supported by EMC didn't hurt either ...
Fast forward to 2012 -- what do we have now?
Enter CTA 9.0
CTA is available in two versions: a virtual machine edition and a physical appliance, depending on customer choice. The functionality is identical in both.
Sources are EMC's VNX, the predecessor Celerra, or any flavor of NetApp filer.
Targets include the EMC NAS products plus Isilon, Centera and Data Domain, as well as generic Windows servers.
The "cloudiness" comes from two other interesting options: EMC Atmos-based storage clouds (either your internal one, or one of several Atmos-based services) or Amazon's S3 service.
Mix, match and migrate as your needs change. Use multiples. No need to lock in to one approach or another.
Admins set up simple policies as to when, where and how they'd like files to be moved. For example, files in the LEGAL share could stay in-house while others go off-site, if you choose.
Users accessing through either CIFS or NFS shares aren't really aware that anything has been done to their files -- unless they look closely at the icon or attribute setting.
Click on it, it comes back. No drama, no fuss.
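To make the policy idea a bit more concrete, here is a minimal sketch of first-match rule evaluation. It is not CTA's actual policy syntax or API; the rule fields, the target labels and the 180-day idle threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class FileInfo:
    path: str
    share: str
    last_accessed: datetime


@dataclass
class Rule:
    # Hypothetical rule shape: not taken from the CTA product.
    name: str
    target: str                  # made-up tier label
    min_idle_days: int           # "when": how long the file has sat untouched
    share: Optional[str] = None  # "which files": None matches any share

    def matches(self, f: FileInfo, now: datetime) -> bool:
        if self.share is not None and f.share != self.share:
            return False
        return now - f.last_accessed >= timedelta(days=self.min_idle_days)


def choose_target(f: FileInfo, rules: List[Rule], now: datetime) -> Optional[str]:
    """Return the target of the first matching rule, or None to leave the file alone."""
    for rule in rules:
        if rule.matches(f, now):
            return rule.target
    return None


# Order matters: the LEGAL rule is checked before the catch-all rule.
RULES = [
    Rule("legal-stays-in-house", target="internal-archive", min_idle_days=180, share="LEGAL"),
    Rule("everything-else-off-site", target="external-cloud", min_idle_days=180),
]
```

With these assumed rules, an idle file in the LEGAL share lands on the in-house target, an idle file anywhere else goes off-site, and anything touched recently stays put.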
In the "what's new" category, two important new features: compression and encryption.
Much (but not all) archived data is moderately compressible, and compressing it means (a) less data over the wire and/or faster archiving, (b) less footprint used on the target, whether internal or external, and (c) faster retrieval over the wire again. Everyone wins.
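If you want a rough feel for the "much, but not all" part, a quick sketch like the one below compares how well different kinds of data shrink. It uses Python's standard zlib module purely as a stand-in; which compression algorithm CTA uses is not stated here, and the sample data is invented.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


# Invented samples: repetitive text compresses well, random bytes do not.
text_like = b"quarterly report draft, revision history, meeting notes\n" * 2000
random_like = os.urandom(len(text_like))

if __name__ == "__main__":
    print(f"text-like data:   {compression_ratio(text_like):.3f}")
    print(f"random-like data: {compression_ratio(random_like):.3f}")
```

Already-compressed content (media files, zip archives) behaves more like the random sample, which is why the savings vary from one file system to the next.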
Encryption, obviously, is yet another important layer of assurance, especially interesting when using external storage services. Encryption support is limited to either Atmos or AWS cloud services.
There's more useful new stuff underneath the covers. The look-and-feel of the reporting and monitoring capabilities has been updated -- although it's still not consistent with the Unisphere-style look. I am told that may be coming in a near-term release.
There's now more use of logging around background activities: specific migration actions, stub recovery, orphan deletion. And, of course, snazzy graphs.
Diving deeper, CTA now supports Atmos' data retention policies, automating large-scale data management even further. There are many more choices supported for file migration sources and targets (as opposed to simply archiving).
The maximum number of files supported per CTA has been doubled from 250 million to 500 million. Yes, you can easily use more than one -- if you need to.
And if you opt for the physical version, there's updated hardware with 10GbE support.
Yes, the technology is interesting, but what's more compelling is what it does for customers.
The Stories Are Impressive
The product team sent me a pile of customer examples to pore through.
One that jumped out sort of told the whole story for me -- there was the storage expense curve before CTA, and there was the storage expense curve after CTA.
Very different trajectories; sorry I can't share that specific customer's name.
One customer I personally met works in a public sector setting, providing IT services across multiple agencies.
His customers never want to delete anything (of course), but it's getting expensive and eating into the overall budget. He's using CTA to establish store-your-digital-stuff-as-a-service using external service providers.
He takes the price from the service provider, marks it up for his value-add, and simply sends along the invoice to each agency to be paid. He told me that no one had deleted anything yet, but that wasn't really his problem going forward. Bring it on.
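The arithmetic behind that kind of pass-through billing is simple enough to sketch. Every number below is hypothetical; the customer's actual rates and markup were not shared.

```python
def agency_invoice(stored_gb: float, provider_price_per_gb: float, markup: float) -> float:
    """Monthly charge: the provider's price plus a fractional markup, rounded to cents."""
    return round(stored_gb * provider_price_per_gb * (1 + markup), 2)


if __name__ == "__main__":
    # Hypothetical agency: 1,000 GB stored, $0.10 per GB from the provider, 20% markup.
    print(agency_invoice(1000, 0.10, 0.20))
```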
But there are plenty of names we can share. In each, the story is the same: this product made a difference in their world. Easy to deploy, easy to manage, transparent to users, all sorts of flexible deployment options, and big eye-popping efficiency results in each case.
No, it's not the most exotic storage technology in the world; it's just a clever, proven and cost-effective solution to an increasingly common problem most everyone is facing.
And the world could use more products like that.

Chuck, I can understand EMC supporting CTA to move data into Atmos, but Amazon S3? Isn't EMC losing revenue doing that?
Posted by: Carlos Soares | September 21, 2012 at 01:25 AM
Hi Carlos -- fair question
I don't think we see it that way. Our customers told us they needed a solution to move data to external cloud services, which may or may not be built on EMC storage.
Our choice was simple: give them what they were asking for, or have them buy it from someone else.
You'll see that same theme reflected in many EMC products that tend to be storage-agnostic.
Posted by: Chuck Hollis | September 21, 2012 at 08:42 AM
Chuck - will this move data across multiple platforms? Take HPC for example - let's say I'm using the EMC HPC bundle (couple VNX's with Lustre) for scratch, Isilon for persistent, and Atmos/S3 for archive. Through policy can I have the data that originates in scratch moved by CTA to persistent and eventually archive?
Posted by: Christopher Gardner | September 27, 2012 at 04:26 PM
Chris
Very interesting question, hadn't thought about it.
My best guess would be a strong maybe. The original RF product did a good job of virtualizing arbitrary NAS filesystems and namespaces, regardless of origin, which would make me hopeful around Lustre -- even though it probably isn't officially sanctioned.
My other concern would be bandwidth -- it sounds like you'll be slogging some massive data sets to and fro, which isn't exactly what CTA was designed to do.
If you're interested, I can put you in touch with Percy Tzelnic who leads all of our efforts in this space, and would be *far* more conversant than I on the best way to skin this cat, including developing something new if warranted.
-- Chuck
Posted by: Chuck Hollis | September 28, 2012 at 02:37 PM