This theme plays out in two primary ways -- one of which is storage's continuing alignment with fully virtualized servers and networks. And the other is how -- once we fully abstract logical from physical -- many more things are possible in the storage domain than we might previously have assumed.
This second post digs in deep on the second idea -- what new things are now possible once we fully embrace "virtualized storage"?
And one of those "new things" is an entirely new take on storage tiering.
Yes, the costs of storage devices are coming down -- but nowhere near 30%. Storage operational and management models are getting more efficient, but not nearly fast enough on their own.
We, as vendors, tried to address this problem by arguing for better information management and governance models. Categorize your data, we pleaded, and manage it efficiently. We made the situation better, but not nearly enough to stem the information tsunami.
Although there will be exceptions, the default behavior we're forecasting is that most enterprise information will be largely uncategorized and largely unmanaged. And we, as technology vendors, had better get busy on *dramatically* reducing the costs of storing enterprise information -- and do it in a way where very little effort on the part of IT is required.
Optimizing The "Long Tail" Of Information Storage
If you remember ILM (information lifecycle management) concepts from several years ago, the idea was simple -- progressively and automatically age your information to lower tiers of storage over time.
Even a few years back, this approach generally had three major problems:
First, many of the schemes required metadata-driven policies to make them work effectively. The categorization never happened, or was too difficult.
Second, the cost-savings payback wasn't as compelling as it could have been -- after all, we were pretty much working with ordinary FC disk, PATA disk and tape at the time.
Third, because of the lack of generic metadata and supporting automation, there usually was an enormous amount of heavy lifting to place the right information at the right place at the right time. Get it right, and you saved a lot of money. Get it wrong, and users were unhappy with storage service levels.
So people tended to default towards putting information at higher service levels than were actually needed, as kind of an insurance policy -- which defeated the whole purpose in the first place.
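To make those three problems concrete, here's a minimal sketch of the kind of metadata-driven ILM policy described above. The category names, tier names and age thresholds are all hypothetical, chosen only for illustration. Notice what happens to a file nobody ever classified: with no category to drive the policy, it falls back to the most expensive tier as an "insurance policy", which is exactly the behavior that defeated the purpose.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tiers of that era, ordered from most to least expensive.
TIERS = ["tier1_fc", "tier2_pata", "tier3_tape"]

# Hypothetical policy table: business category -> (minimum age in days, tier),
# sorted by age threshold. Someone has to classify the data for this to work.
POLICY = {
    "email": [(0, "tier1_fc"), (90, "tier2_pata"), (365, "tier3_tape")],
    "documents": [(0, "tier1_fc"), (180, "tier2_pata")],
}


@dataclass
class FileRecord:
    name: str
    age_days: int
    category: Optional[str]  # None when nobody ever categorized the file


def place(record: FileRecord) -> str:
    """Return the tier an age-based ILM policy would assign to a file."""
    rules = POLICY.get(record.category) if record.category else None
    if rules is None:
        # No usable metadata: default to the highest service level "just in case".
        return TIERS[0]
    tier = rules[0][1]
    for threshold, target in rules:
        if record.age_days >= threshold:
            tier = target  # progressively age the file to lower tiers
    return tier
```

For example, `place(FileRecord("q3.pst", 400, "email"))` lands on tape, while the same 400-day-old file with `category=None` stays on tier 1 forever.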
FAST In A Different Light
If we start to take all the storage optimization technologies, and put them end to end, a very interesting picture emerges.
Big performance improvement (through the selected use of enterprise flash), and simultaneously big cost savings (through the use of massive SATA drives).
Now, let's take another bite at storage inefficiency. Let's add on virtual provisioning (a subset of which is called thin provisioning elsewhere), and only hand over physical storage to applications when it's actually being used, rather than when it's provisioned.
Yet another big bite at storage costs -- one that many people have seen in their own environment.
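Here's a minimal sketch of the idea behind virtual (thin) provisioning, under simplified assumptions: a volume advertises its full logical size up front, but physical extents come out of a shared pool only on the first write to each logical extent. The class, its extent-level granularity and the pool representation are hypothetical and not how any particular array implements it.

```python
class ThinVolume:
    """Thin-provisioned volume: the full logical size is promised to the
    application, but physical extents are allocated only on first write."""

    def __init__(self, logical_extents: int, pool: list):
        self.logical_extents = logical_extents  # size advertised to the app
        self.pool = pool                        # shared free physical extents
        self.map = {}                           # logical extent -> physical
        self.blocks = {}                        # physical extent -> data

    def write(self, logical: int, data: bytes) -> int:
        if not 0 <= logical < self.logical_extents:
            raise IndexError("write beyond provisioned size")
        if logical not in self.map:
            if not self.pool:
                # An over-committed pool ran dry: the operational risk of thin.
                raise RuntimeError("physical pool exhausted")
            self.map[logical] = self.pool.pop()  # allocate on first use only
        physical = self.map[logical]
        self.blocks[physical] = data
        return physical

    @property
    def physical_used(self) -> int:
        return len(self.map)
```

A volume promising 1,000 extents can sit on a pool of 100; after writing to only two distinct extents, it consumes two physical extents, and rewriting an extent allocates nothing new.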
Let's go at it again -- let's use compression, single-instancing and data deduplication technologies to squeeze out all the different redundancies. Can't be used everywhere, but -- where it fits -- it's one more big bite out of storage costs.
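As a rough illustration of squeezing out redundancy, here's a minimal fixed-size-chunk deduplication sketch: each chunk is identified by its SHA-256 digest and stored only once, and every object becomes a "recipe" of digests. The class name and chunk size are hypothetical; real products often use variable-size chunking, and compression and file-level single-instancing are not shown here.

```python
import hashlib


class DedupStore:
    """Fixed-size chunk deduplication: identical chunks are stored once."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        return recipe

    def get(self, recipe: list) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

With a 4-byte chunk size, storing `b"abcdabcdabcd"` and then `b"abcdxyz"` keeps just two unique chunks (7 bytes) for 19 logical bytes.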
We're not done yet -- not even close. A significant amount of enterprise information is used *very* infrequently. So infrequently, in fact, that the disk drives can be spun down, or -- at the least -- be made semi-idle. Another big chunk out of storage costs, this time it's power and cooling.
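The spin-down idea boils down to an idle timer per group of drives. Here's a minimal sketch, assuming a hypothetical `idle_timeout_s` policy knob; the trade-off it makes visible is that the first I/O after a spin-down pays a spin-up delay, which is why this only suits *very* infrequently used information.

```python
from dataclasses import dataclass


@dataclass
class DriveGroup:
    idle_timeout_s: float   # hypothetical policy setting
    last_io: float = 0.0
    spinning: bool = True

    def on_io(self, now: float) -> bool:
        """Record an I/O; return True if the drives had to spin up first."""
        spun_up = not self.spinning
        self.spinning = True
        self.last_io = now
        return spun_up

    def tick(self, now: float) -> None:
        """Called periodically: spin down once the group has been idle long enough."""
        if self.spinning and now - self.last_io >= self.idle_timeout_s:
            self.spinning = False
```

With a 600-second timeout, a group last touched at t=0 spins down at t=600, and the next read at t=700 reports that it had to spin up.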
And, finally, at the outer reaches of the long tail of enterprise information, there's probably information you don't even want in your data center anymore. Much in the way many enterprises subscribe to "records management" services (basically, someone who trucks boxes of paper from your facility to theirs), there's no doubt we'll want to do the same in the digital world.
We'll want the ability to federate with external storage service providers who can provide trusted bulk information storage that's cheaper (and presumably safer) than internally providing this service.
And, of course, we'll want a convenient abstraction and associated tools to manage all of this in a transparent and efficient manner.
The New Storage Efficiency Mantra -- FAST, Thin, Small, Green, Gone!
Put it all together, and a rather simple yet compelling picture emerges.
Thin (virtual) provisioning eliminates provisioning waste.
Compression, single-instancing and deduplication technologies eliminate information redundancy.
Spin-down saves power if the disk drives aren't being accessed very frequently.
And -- eventually -- the information gets shipped out to a specialized service provider as an option.
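One way to picture the whole mantra is as a single placement decision driven by how recently and how often information is touched. This is a minimal sketch with entirely hypothetical thresholds and stage names, not the logic of any shipping product. Thin provisioning and deduplication ("Thin" and "Small") apply to whatever lands on disk, so they aren't separate outcomes here. Because the decision is re-evaluated on every access, a file that gets read again resets its idle time and moves back up.

```python
def placement(days_idle: int, accesses_per_day: float) -> str:
    """Map a file's recent access pattern to a stage of the
    FAST / Thin / Small / Green / Gone pipeline (hypothetical thresholds)."""
    if accesses_per_day >= 100:
        return "flash"           # FAST: hottest data promoted to enterprise flash
    if days_idle < 30:
        return "fc"              # FAST: active data on performance disk
    if days_idle < 180:
        return "sata"            # FAST: cooling data on big SATA drives
    if days_idle < 730:
        return "sata_spun_down"  # Green: rarely touched, drives can spin down
    return "external_provider"   # Gone: shipped to a service provider
```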
Now, is that some distant vision of the future? No, not really.
For example, consider EMC's unified storage platform -- Celerra.
The first wave of space-saving technologies is available today, with more coming. Spin-down of SATA drives is supported, I believe. And there's now an Atmos adapter to ship bulk information off to either your own internal storage cloud, or perhaps someone else's.
Policy orchestration, in this case, is provided by Rainfinity functionality -- it establishes and enforces the policies as to when files move from one category to another, or back again.
Granted, we're doing this today by assembling a few different portfolio technologies. We have more work to do to make all of this more seamless and transparent, not to mention bringing it fully to the world of real FC LUNs and their demanding applications.
But, without a lot of squinting, you can clearly see where it's going, and far more of the pieces are now in place -- and usable today -- than most people realize.
Is ILM Finally Real?
Yes and no. In one sense, the original vision of ILM may have overpromised and underdelivered.
Too much depended on the ability of IT to partner with the business to classify information, assign metadata and implement information management policies. Sure, it worked well in some domains (e.g. email, e-discovery, document repositories, etc.) but not well enough in the general case.
This time around, it's far more promising. We have a much wider palette of storage technologies to use. We're smarter about what's important, and what's not. And we all now realize getting 80% of the way there with very little work is more attractive than getting 95% of the way there with a ton of work.
One thing hasn't changed, though.
The information beast continues to grow.
And How Does This Change The Role of The Storage Architect / Administrator?
Great question.

It is worth pointing out a few fundamental differences between Compellent's Data Progression and EMC's FAST. For instance:
Data Progression is a proven, mature technology that has been in use by customers since 2005.
FAST and other competitive solutions only have the capability to move entire volumes of data between tiers, while Compellent’s Data Progression moves data in 512KB blocks, regardless of the storage volume or disk type. Our customers can tune Data Progression to move pages of up to 4MB depending on the application, but of course, the more granular the data movement, the better. Because of this active management of data, Compellent’s automated tiered storage can save customers 50 percent or more in storage costs.
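To see why granularity matters, here's a minimal sketch of sub-volume tiering: I/O is counted per fixed-size page, and only the hottest pages that fit in the fast tier get promoted. With whole-volume movement, by contrast, the fast tier would have to hold every page of a volume that is only partly hot. The 512KB page size comes from the comment above; the heat metric (raw access counts) and the function names are assumptions for illustration, not Compellent's actual algorithm.

```python
from collections import Counter

PAGE_SIZE = 512 * 1024  # 512KB pages, as described in the comment


def page_of(offset: int) -> int:
    """Map a byte offset within a volume to its page number."""
    return offset // PAGE_SIZE


def plan_tiers(io_offsets: list, fast_tier_pages: int) -> dict:
    """Promote the most frequently accessed pages that fit in the fast tier.
    Pages never accessed are absent from the result and treated as cold."""
    heat = Counter(page_of(o) for o in io_offsets)
    hot = {page for page, _ in heat.most_common(fast_tier_pages)}
    return {page: ("fast" if page in hot else "slow") for page in heat}
```

If page 0 sees three I/Os, page 5 sees two and page 2 sees one, a fast tier with room for two pages gets pages 0 and 5, and page 2 stays on slower disk.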
Data Progression and our Fast Track feature will also automatically migrate blocks of data based on frequency of access from the inner to the outer tracks of every disk drive, further saving about 20 to 30 percent in storage costs while improving performance. The fastest part of a drive is typically its outermost edge. Some vendors put entire volumes on the outer tracks, which in many ways negates the efficiency advantages of automated tiering.
Data Progression also tiers data between RAID volumes, so for example, in a single tier of FC storage a Compellent customer can migrate the inactive blocks off of RAID 10 to RAID 5 to further save on disk costs and free up their RAID 10 space for higher-performance needs. Compellent recommends customers use RAID 5 and slower high-capacity drives for read-only snapshots, which don’t need the performance of RAID 10 or fast disk. Why buy tier 1 disk for inactive data?
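The capacity side of the RAID 10 versus RAID 5 argument is simple arithmetic. Here's a minimal sketch, assuming equal-size drives and ignoring hot spares and formatting overhead: RAID 10 mirrors everything, so half the raw capacity is usable, while RAID 5 gives up one drive's worth to parity. What the sketch leaves out is performance: a small random write costs two back-end I/Os on RAID 10 but four on RAID 5 (read data, read parity, write both), which is why RAID 10 is kept for the active blocks.

```python
def usable_tb(raid: str, drives: int, drive_tb: float) -> float:
    """Usable capacity for a RAID group of equal-size drives
    (ignores spares and formatting overhead)."""
    if raid == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives * drive_tb / 2        # every block is mirrored
    if raid == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb      # one drive's worth of parity
    raise ValueError(f"unsupported RAID level: {raid}")
```

With eight 1TB drives, RAID 10 yields 4TB usable and RAID 5 yields 7TB, so demoting inactive blocks from RAID 10 to RAID 5 frees up a meaningful amount of mirrored space.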
Data Progression software is also an integrated part of our modular and scalable solution, which does not require customers to rip-and-replace their current storage investment just to acquire automated tiered storage as their needs grow. This is perhaps the most important difference between Data Progression and FAST and others. Data Progression is built into the Compellent SAN, just like boot from SAN, snapshots and replication software, and all existing customers need to do is purchase and download a license key.
Whereas other implementations are limiting the technology to a small number of enterprise customers today, we believe an automated tiered storage solution should be able to accommodate all enterprises, from the SMB all the way up to the largest enterprise, without discriminating.
Many vendors including Compellent, Pillar and 3Par support thin provisioning with automated tiered storage. The combination is critical for maximizing storage efficiency and utilization. On the other hand, EMC doesn’t appear to support thin provisioning with FAST. The lack of support on EMC’s part further limits the customer base that can actually use FAST. Compellent’s thin provisioning software, Dynamic Capacity, and Data Progression work together seamlessly. About 2/3 of our customers use Data Progression together with Dynamic Capacity, with installations ranging from 2TB to 1PB or more.
The maturity of Compellent’s automated tiered storage solution also enables customers to easily mix and match popular and emerging drive technologies such as SAS, SATA, FC and SSD in one virtual pool of storage. The tiering is based on rotational speed, so it’s possible to use drives of the same interface at different speeds (such as 7,200 and 15,000 RPM SAS, or 10,000 and 15,000 RPM FC) in different tiers within the same system. Because the Compellent architecture is also open, we’ll support a range of I/O technologies from FC to FCoE and iSCSI to 10GbE without requiring controller upgrades.
The benefit of a truly dynamic and persistent storage architecture means you can scale automated tiered storage to keep up with changing data requirements. We have customers such as Munder Capital that have simply added an SSD tier for better performance to their existing automated tiered storage system that they’ve been using for years (integrated with thin provisioning, replication and so on).
Data Progression is a technology we’ve worked hard to develop, patent and improve upon since we shipped our first SAN, and we couldn’t be happier with the feedback from customers of all sizes. They tell me how much money Compellent’s automated tiered storage has saved them, both in IT staff time and hardware acquisition costs. Automated tiered storage has relieved the traditional pain points of data management and revolutionized the storage marketplace. Naturally, I'm very interested to see how end-users of EMC, 3Par and Pillar will use automated tiered storage over the long term. But until then, I'll take imitation as the sincerest form of flattery for Compellent.
Posted by: epbasketball33@yahoo.com | December 10, 2009 at 03:05 PM
Hi, whoever you are.
First, some ground rules. If you work for (or with) a specific vendor, you need to start by disclosing that. You seem to be a Compellent employee, for example, but we shouldn't have to guess.
See that big EMC logo on the top of my web page? No question as to who's paying the bills here!
Second, while I'm OK with competitive jousting on this blog (all part of the fun), it needs to be fact-based, rather than marketing rehash. Lots of happy marketing speak in your response.
Third, if you want to write 1,000 word competitive responses to my posts, I'd suggest you consider starting your *own* blog, rather than use mine.
Now, let's dig into specifics.
You're right, this version of FAST (for the V-Max) doesn't support virtual provisioning. However, it is fully supported on the Celerra. I'm not sure about the CLARiiON.
I wasn't aware that you guys were shipping flash drives yet. Clearly, fully automated storage tiering is far more interesting when we get flash into the mix.
The Celerra is interesting for NAS and iSCSI use cases, because it goes even farther than your concepts around "data progression" and adds dedupe, spindown, archiving and even move-to-cloud as other options. Nothing like taking a good idea and making it into a great idea.
While you're free to position Compellent as a "V-Max alternative", I don't think you'll find many takers. Large enterprises need more than one feature or another -- they need a platform to run their most critical applications, and that's a tough spot to earn.
Now that more vendors are supporting these types of features, the race is on as to who can do the best job -- while doing everything else that customers need!
Thanks for writing
-- Chuck
Posted by: Chuck Hollis | December 10, 2009 at 03:18 PM
Chuck,
Great read. Thought I'd mention something in "The Backdrop" section: 50% growth is not 5x more information in the next four years (it's .5 times).
Posted by: anon | December 14, 2009 at 04:41 PM
Anon
Agreed -- did not check my maths as I should -- thanks!
-- Chuck
Posted by: Chuck Hollis | December 14, 2009 at 04:56 PM
Chuck,
I take that back.. 50% growth for 4 years is 5.0625x the current amount. (1.5^4)
Posted by: anon | December 15, 2009 at 10:50 AM
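The corrected figure is easy to sanity-check: 50% annual growth compounds, so the multiple after four years is 1.5 raised to the fourth power, not 1 + 4 × 0.5 = 3. A one-function sketch (the function name is just for illustration):

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Size after `years` of compound growth, relative to today."""
    return (1 + annual_rate) ** years


print(growth_multiple(0.5, 4))  # 5.0625
```

So 50% per year does come out to roughly 5x in four years, as the original post said.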