Had a very interesting conversation yesterday with a large international telecommunications company.
And the topic?
Getting better at storage management.
And I won't bore you with the pedantic discussion -- but I was able to share some interesting perspectives that they found valuable -- at least, they were polite enough to say so ;-)
A Few Disclaimers
I have never managed a truly large storage environment. But I've talked to perhaps over a thousand different people who do, as well as several dozen people who consult on this topic for a living, many of whom work for EMC.
Also, size matters. This particular discussion is about ginormous scale -- these guys are in the multi-petabyte club, and show no signs of slowing down.
We're not talking about Web 2.0 cloud-ish things either. We're talking about enormous portfolios of traditional applications that support the business.
And, surprisingly, this particular discussion is pretty much technology and vendor agnostic.
Start With Context
I spent a few moments laying out a context to help them visualize the situation. No answers here, just things to consider.
First element: information will continue to grow faster than media costs decline. There seems to be no practical way out of this trap. A safe prediction? You'll probably spend more on storage with every passing year.
Which makes the discussion about efficient spend, rather than lowering the overall number.
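To put rough numbers on that first element, here's a back-of-the-envelope sketch. The growth and price-decline rates are made up purely for illustration (they are not from the conversation), and `yearly_spend` is just a hypothetical helper. The point is only that spend rises whenever (1 + growth) x (1 - cost decline) is greater than 1.

```python
def yearly_spend(start_tb, start_cost_per_tb, growth, cost_decline, years):
    """Project storage spend per year under a simple compounding model.

    All inputs are illustrative assumptions, not measured data.
    """
    spend = []
    capacity = start_tb
    cost = start_cost_per_tb
    for _ in range(years):
        spend.append(capacity * cost)
        capacity *= 1 + growth        # information keeps growing
        cost *= 1 - cost_decline      # media keeps getting cheaper
    return spend


# Hypothetical rates: capacity grows 50% a year, cost per TB drops 25% a year.
projection = yearly_spend(1000, 100.0, 0.50, 0.25, 4)
for year, dollars in enumerate(projection, start=1):
    print(f"year {year}: {dollars:,.0f}")
```

With these assumed rates the yearly multiplier is 1.5 x 0.75 = 1.125, so the bill climbs every year even though each terabyte gets cheaper.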
Second element: storage technology is changing -- and very fast. The best answers from a few years ago aren't the best answers today, and the best answers today won't be optimal in three years' time.
There's an enormous amount of R+D and innovation in this space, so you should be thinking about rapid technological change, rather than slow and predictable.
Third element: you're housing the corporation's wealth -- information. Which means that the "storage guys" will find themselves at ground zero of many more interesting corporate initiatives: compliance, security, entering new markets, green, et al.
Put another way: you'll be going to a lot of meetings.
Fourth element: building a storage management function is hard. Really hard. I have met more than a few people who think they've cracked the code, and -- even then -- they're always looking for better answers.
Why? The problem gets bigger every year, there are new requirements every year, and there are new potential solutions every year.
Common Traps
I spent a good deal of time on the common time-wasting, dead-end traps that I've seen people fall into over and over, and that I wanted to warn them about.
First, good technology won't mask poor process.
If you can't figure out how to manage storage effectively using manual processes, the slickest storage resource management automated doohickey won't save your skin. This effect is not limited to storage; it applies to IT in general: you can't automate a process that doesn't exist.
Define your processes, your org structure, your workflows -- and *then* go shopping for a cool tool or two. BTW, when it comes to SRM, EMC has a rather unique point of view: we were early into this market, offer what's perhaps the most robust set of tools (ControlCenter), and enjoy more than our fair share of this market.
All that being said, I have been on more than a few sales calls where customers were complaining loudly about ControlCenter, and storage management. More often than not, when I start asking a few questions about their process and organization, the answers come back a bit weak. I believe it's an unrealistic expectation that any management software will mask poor organizational effectiveness.
However, such software can serve as a rallying point for a broader organizational and process redesign -- that seems to work pretty well, in my experience.
Second, speed matters more than cost to most businesses.
When the business wants something done, there's usually a good case for "now" rather than "three months from now". I see far more process and org design biased towards that speed and flexibility design point, followed (of course) by cost, service levels, etc.
Most dysfunctional processes I've seen have been sequential processes. A provisioning request goes from group to group linearly, and may loop around a few times.
The example I gave was travel. I call the airline, I want to fly to Las Vegas.
I don't have to wait for the airline to find a plane, hire and train a pilot, arrange for fueling, etc. etc. I want to get there, and get there fast. I expect that the flight will be cost effective, reliable, etc. -- those are table stakes. Start thinking more like an airline, and less like fiefdoms.
Third, the 80/20 rule applies in spades here.
I have seen very robust processes designed with tier 1, super-HA, ultra-critical applications in mind. And then I go look at their provisioning backlog, and the majority of the requests aren't for that; they're for day-to-day stuff that isn't particularly sensitive to those sorts of things.
Why not have different processes? One designed for really important stuff, and another one designed for everything else? Hint: your problem will be more manageable, thought about that way. Very often, I see the lighter-weight stuff come out of a generic buffer pool that's managed differently (with different expectations!) than the critical stuff.
Fourth, expose true cost options so people can make informed decisions.
I am not a fan of charge-back. However, I am a fan of priced-out service tiers (performance, availability, recoverability, etc.), coupled with an honest discussion with business owners about how much they're spending, and opportunities to cut costs.
I usually approach this discussion with a sequence of questions.
Do you know how much you're spending on storage and storage management? The answer is usually "yes".
Do business process and application owners know how much they're spending on storage -- regardless of who pays for it? The answer is occasionally "yes".
When the project comes in, do you have the tradeoff and options discussion? Far fewer "yes" answers.
During the lifecycle of the application, do you periodically meet with the business/app owners and identify opportunities for cost savings, based on recent experience? Almost always, the answer is "no".
I am continually amazed by the number of people who are concerned about rising storage costs, but think the answer is to just beat vendors up a bit more. Sure -- you can do that -- but why not put a bit of effort into the other side -- the demand side -- of the equation?
Four Functions
How you build -- and staff -- the storage group at scale requires some thought. One schema that I find interesting is the "four function" model.
Group 1: layers of technology -- the array team, the SAN team, the backup team, the host HBA team, et al. They cut horizontally across technology layers in the storage stack. Whether you need 5, 1 or 0.5 people for each of these is dependent on your circumstances.
Group 2: business/application alignment -- these people close the loop and provide the service back to application owners -- provisioning, performance, availability, cost reduction, etc. They sit in the planning meeting and take responsibility for the outcome. They're also responsible for looping back and proposing enhancements and/or optimizations.
Group 3: measurement and metrics -- these people measure costs, service levels, and utilization from different angles: use of shared resources, actuals vs. forecast by application, cycle time for change requests, SLAs -- the works. They're a separate function because (a) you want a measure of independence, and (b) you're measuring across different axes.
Group 4: strategy and architecture -- these people are looking out on the technology front, and trying to find synergy with other IT-related initiatives in the company, e.g. server virtualization, data center consolidation, etc.
Sure -- they're storage infrastructure people, but they don't get buried in day-to-day.
At a multi-petabyte organization, I might find 7-10 people in group 1, perhaps 4-6 in group 2, a single person in group 3, and perhaps 2 people in group 4. No, the resources aren't evenly divided between the groups, but the functions seem to be the right ones, and interplay well together.
Oh, by the way, the era of the "mainframe storage team" and the "Windows storage team" and the "Distributed System storage team" seems to be fading fast. Information is information, applications are applications, storage is storage, business is business.
I'm sure other organizational models are workable -- but I've found that when I sit down to actually map processes to functions, this sort of schema works pretty well.
Interesting Thinking I'm Seeing
I'm also collecting a list of clever things I've seen a few customers do. I'm not recommending that you actually do any of them (!) -- I just found the thinking very interesting, and wanted to share.
I'm starting to see more "parking lots" get deployed. This is nothing more than a giant slug of ultra-cheap storage with no promises around service level, backup, recoverability, etc. Typically, it's presented as a file system to the organization, and anyone can store anything there -- no questions asked.
Customers who've done this have told me that they were absolutely amazed as to what came off of "production" arrays, and ended up in the parking lot. All of a sudden, production backup was a lot easier, and production arrays had a lot of spare capacity.
Turns out that a lot of production capacity was just stuff. Nothing important, just stuff. And by having a cheap (e.g. free) place to put things -- with no questions asked! -- everything just kind of flowed there.
Now, they can actually get a look at stuff they've never been able to see before, because it's all in one place. And if someone asks the question "gee, no backup, what does it cost to get a backup?" -- well, that's a productive discussion, isn't it?
Another trend I'm seeing is what I'm calling "continuous storage buffer management". This is cropping up more frequently for the 80% of storage requests that aren't mission-critical, performance-sensitive stuff. Simply put, they manage two parallel processes: (1) making sure there's always storage that can be provisioned on a moment's notice, and (2) reclamation of storage that was asked for, but never really used.
People tell me that thinking of this sort of stuff as you might think of a network (rather than a storage array) is turning out to be a lot easier, and a lot more efficient.
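For the more hands-on readers, here's a minimal sketch of how those two parallel processes might fit together. Everything in it is my own illustration rather than anything a customer showed me: the `BufferPool` and `Allocation` names, the free-capacity floor, and the 20% utilization threshold are all assumptions, and a real reclamation policy would also look at how long an allocation has sat idle.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    owner: str
    allocated_gb: int
    used_gb: int = 0


@dataclass
class BufferPool:
    free_gb: int
    min_free_gb: int  # hypothetical floor that keeps provisioning instant
    allocations: list = field(default_factory=list)

    def provision(self, owner, size_gb):
        """Hand out capacity immediately from the buffer."""
        if size_gb > self.free_gb:
            raise RuntimeError("buffer exhausted; replenishment fell behind")
        self.free_gb -= size_gb
        alloc = Allocation(owner, size_gb)
        self.allocations.append(alloc)
        return alloc

    def shortfall_gb(self):
        """Process 1: capacity needed to get back above the floor."""
        return max(0, self.min_free_gb - self.free_gb)

    def reclaim(self, min_utilization=0.2):
        """Process 2: take back space that was asked for but barely used."""
        reclaimed = 0
        kept = []
        for alloc in self.allocations:
            if alloc.used_gb < min_utilization * alloc.allocated_gb:
                reclaimed += alloc.allocated_gb - alloc.used_gb
                alloc.allocated_gb = alloc.used_gb  # shrink to actual usage
            if alloc.allocated_gb > 0:
                kept.append(alloc)  # drop allocations that shrank to nothing
        self.allocations = kept
        self.free_gb += reclaimed
        return reclaimed


# Made-up example: one well-used allocation, one that was barely touched.
pool = BufferPool(free_gb=1000, min_free_gb=500)
pool.provision("app-a", 400).used_gb = 300
pool.provision("app-b", 300).used_gb = 10

shortfall_before = pool.shortfall_gb()  # buffer has dipped below the floor
reclaimed = pool.reclaim()              # app-b gives back its unused space
```

In this toy run, reclamation alone brings the buffer back above its floor, so no new purchase is needed; when it doesn't, `shortfall_gb()` is the number to take to the capacity planning meeting.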
Finally, I'm starting to see people build in "trade down" capabilities into their infrastructure. Here's the scenario: business unit comes in, most important application in the world, has to have the very best of everything -- and, after six months, well -- things have changed. It ain't as important as people thought it was; although it hasn't gone away entirely.
So storage virtualization (block and file) seems to be attractive in these very large shops, simply because IT can dial down -- or dial up -- a service level on short notice, and do so non-disruptively. These same people also appreciate what a large enterprise array can do along the same lines, but without the need for external storage virtualization.
The Bottom Line
You might think that storage management is a boring, mundane, uninteresting topic.
It's not. It's where all the information lives. Although there are those that espouse one storage management methodology over another, I haven't found a single, unique "best way" that applies to the incredibly broad range of organizations and use cases that are out there.
If you do engage a consultant or a vendor to help you with this, may I make a suggestion?
That if they have the "answer" without understanding what makes your organization unique -- well, maybe you'd be better off working with someone else.
And -- if you have any thoughts along these lines -- especially as it applies to very large enterprises -- I'd be interested if you'd share your thoughts.
Thanks!

Very interesting (and necessary) post.
Thanks for sharing your experiences in this area. Parking lots look to me like an interesting approach. You pointed this way somehow in a previous article about resources available at EMC, right?
Just one question about those 'parking lots'. Have you considered the implications they might bring to IT managers' desks in the mid term (e.g. indexing, information and document qualification, access control, etc.)?
At the very beginning, these will probably not show up. But I believe they might further on.
Maybe a usage policy avoids those considerations?
Juan
Posted by: Juan Jose Palacios | June 19, 2008 at 02:29 PM
Hi -- I think the thinking was "no policies", simply because the act of creating policies created friction and costs associated with the usage of the resource, and reinforced the existing bad behavior.
Going further, once it's all out in the open, and in a single place, I agree -- there's opportunity to index, protect, dedupe, etc. -- but you have to get the information out of hiding and into the open.
Or, at least, that's how the thought was expressed to me.
Posted by: Chuck Hollis | June 19, 2008 at 02:33 PM