Operating systems have known how to dynamically balance competing workloads for years.
Networks have recently gotten good at sharing multiple kinds of traffic -- and associated service levels -- over the same pipe using QoS.
And storage arrays (specifically EMC's) have been pretty good at balancing workloads, but I think there's more to do. And it's going to be an interesting topic for the next few years -- at least to the storage guys!
Let's see why ...
Background
Many years ago in a land far, far away, I became acquainted with prioritization in the UNIX operating system. I learned about the "nice" command, and later about pre-emptive scheduling.
The goal was simple: how can you share a single resource between competing workloads, yet make sure that everyone is happy and not create excessive management effort?
In general, the UNIX scheduling algorithm worked pretty well. It would look at CPU, I/O, memory, etc. and try to come up with a fair allocation that made everyone happy.
If you needed to give it a hint, you used the "nice" command. If you needed to give it a stronger hint, you'd use pre-emptive scheduling. And if that didn't work, you'd write a device driver.
Get too carried away "optimizing" the environment, and you'd spend all day at the console watching screens, typing commands, and taking phone calls from irate users who wanted to know why you were screwing around with the server.
Eventually, I learned to do the least amount possible to get the greatest result, and let the algorithms do what they were good at.
I was a slow learner, though ...
Many of you know I've worked closely with Symmetrix products for many years. And I was always impressed by the algorithms and the intelligence it used to balance competing priorities for cache, processor and I/O bandwidth to keep everyone happy with a minimum of fuss. Same sort of concepts are in CLARiiON, Celerra, etc. but the real heavy lifting is on a Symm.
Compared to the algorithms I knew from UNIX-land, this was a whole new ballgame. The engineers who did this work blew me away with the depth of sophistication and practicality around what they had done.
I kept asking them questions (which they patiently answered at the time) until I hit the point where I said "gee, this is pretty cool, I guess you've got this nailed" and moved on.
When I talked to customers early on, the question often came up: "how do I tune my array?" I guess I didn't offer the politically correct answer at the time, and often ended up saying "the array is probably smarter than you are on this topic, so leave it alone!".
OK, maybe not entirely correct in all cases, but I was trying to illustrate a point at the time.
Fast forward to today
But, all history aside, I think that there are two enormous opportunities for storage arrays to get a whole lot smarter around self-optimization.
And as we see a strong resurgence in mega-arrays with multiple storage types (e.g. DMX-3 with approximately a petabyte of capacity), there's more interest in self-optimization.
First, there's an enormous opportunity for arrays to receive external "hints" about what's going on in order to make better decisions.
Second, even though smart storage controllers can make smart decisions around cache, CPU, I/O, etc., most of the performance characteristics are tied up in the disk drives the information lands on, and moving that around dynamically presents a different challenge.
Perhaps the state of the art today is Symmetrix Optimizer. It's a very mature product that does sophisticated time-series analysis, presents recommendations and likely performance outcomes, and (with approval) will move things around in the background. Works as advertised.
But there's always room for improvement, isn't there?
External hints
At a high level, there are two ways for an array to optimize its resources. One is to look at usage patterns and make some smart guesses, e.g. no one has touched this LUN in a long time, maybe I can move it to slower (cheaper) storage? Or maybe this particular LUN is getting hammered, so maybe move it to something faster?
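To make the "smart guesses" idea concrete, here is a minimal sketch of a purely usage-based tiering rule. It is not how Symmetrix Optimizer or any real array works; the tier names, the thresholds and the `LunStats` fields are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# All thresholds and tier names below are illustrative assumptions,
# not values taken from any real array.
IDLE_DAYS_BEFORE_DEMOTE = 30        # "no one has touched this LUN in a long time"
BUSY_IOPS_BEFORE_PROMOTE = 2000     # "this LUN is getting hammered"
TIERS = ["slow", "medium", "fast"]  # ordered cheapest to most expensive


@dataclass
class LunStats:
    """Hypothetical per-LUN statistics the array collects on its own."""
    name: str
    tier: str
    days_since_last_io: int
    avg_iops: float


def recommend_tier(lun: LunStats) -> str:
    """Return the tier a purely usage-based heuristic would suggest."""
    idx = TIERS.index(lun.tier)
    if lun.days_since_last_io >= IDLE_DAYS_BEFORE_DEMOTE and idx > 0:
        return TIERS[idx - 1]       # idle: step down one tier
    if lun.avg_iops >= BUSY_IOPS_BEFORE_PROMOTE and idx < len(TIERS) - 1:
        return TIERS[idx + 1]       # busy: step up one tier
    return lun.tier                 # otherwise leave it alone
```

Note that the rule only sees what has already happened on the I/O channel. It has no way of knowing what is about to happen or who is paying for what.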
But these internal algorithms can only go so far. They have no context beyond what they see coming down the I/O channel. Most of what Symmetrix Optimizer does falls into this category.
Sometimes this leads to behavior you don't want. As an example: this database table is idle until the end of the quarter, then we need a lot of performance for a few weeks, and then it's back to normal.
Or this is the backup job, and production is waiting, so please hurry up. Or we have these guys in marketing who are hammering their database, but they didn't pay for their storage, so it's no soup for you.
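One way to picture how an external hint could correct cases like these is as a time-boxed floor or ceiling on whatever the array would have decided by itself. The sketch below is hypothetical throughout: the `Hint` structure, the tier names and the `apply_hints` function are invented for illustration and do not describe any existing product interface.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

TIERS = ["slow", "medium", "fast"]  # hypothetical tier names, cheapest first


@dataclass
class Hint:
    """A hypothetical external hint: during [start, end], keep this LUN
    no lower than `floor` and no higher than `ceiling`."""
    lun: str
    start: date
    end: date
    floor: str = "slow"     # lowest acceptable tier in the window
    ceiling: str = "fast"   # highest tier this LUN is entitled to


def apply_hints(lun: str, usage_based_tier: str, today: date,
                hints: List[Hint]) -> str:
    """Clamp the array's own recommendation to what active hints allow."""
    idx = TIERS.index(usage_based_tier)
    for h in hints:
        if h.lun == lun and h.start <= today <= h.end:
            idx = max(idx, TIERS.index(h.floor))    # e.g. quarter-end close
            idx = min(idx, TIERS.index(h.ceiling))  # e.g. "no soup for you"
    return TIERS[idx]
```

With a hint such as `Hint("ledger", date(2007, 6, 15), date(2007, 7, 15), floor="fast")`, an idle quarter-end table would be held on fast storage during the window even though its recent I/O history says "demote me", and a `ceiling="slow"` hint would keep the unfunded marketing database off the expensive tier no matter how hard it is hammered.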
So where will these hints come from?
They could come from the operating system. As an example, z/OS can offer "hints" to storage (e.g. via SMS) about what's important and what's not. And, just for completeness, a DMX respects these hints as much as possible.
They could come from the SRM environment. As an example, if you use ControlCenter, you're provisioning storage in different classes, you're identifying replication and backup jobs, and -- in general -- there's a lot of information captured at the SRM layer that could be directly used by a storage array to provide even more optimization hints.
And in the most recent version of Symm Optimizer, there's the first round of "tiering" hints that can be provided. These are a step in the right direction, but there's more to do, to be sure.
But there's a gleam of nirvana out there.
For those of you who've done a deep-dive on model-based management (a la EMC Smarts), you'll realize that the power of a model-based approach is that you can define business processes that matter, identify all the connecting pieces, and drill all the way down to supporting infrastructure, including storage.
As an example, if you define an order-to-cash process, Smarts will represent that as a set of applications, databases, servers, networks, storage, etc. and can provide a very accurate hint to the storage array as to what's required.
Better hints will make for better optimization. The core technologies are in place, but more integration work is required.
But what about moving data around between disk drives?
Over time, there's a wider and wider disparity between the potential service levels (and associated costs) from disk media.
Look at the differences (and costs) between, say, a short-stroked mirrored pair of 15K 73GB drives, and a large RAID 6 group built from terabyte-class drives.
And, if we look out a bit farther, as data reduction technologies find their way into storage arrays, we'll see an even wider range of costs and associated service levels. And then there's the tantalizing potential of flash memory.
So the disk-related performance and cost differences between different kinds of storage will get bigger, not smaller.
But moving a workload from one place to another is largely a manual process these days. And workloads tend to vary dramatically over time. So folks tend to overprovision storage performance, simply because it's easier to think about worst case.
Arrays will need the ability to move workloads around from place to place quickly, and non-disruptively. Put another way, if it takes 10 commands and 20 hours to move a data set from fast to slow (and app performance suffers during that period) no one will want to do it.
If it happens automatically, quickly and with no production impact, it'll be used a whole lot more to balance things out.
Moving a couple of terabytes around will take bandwidth, CPU and some smart scheduling to make sure that these behind-the-scenes optimizations can be used, and aren't more trouble than they're worth.
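A quick back-of-the-envelope calculation shows why the scheduling matters. The sketch below estimates how long a background move takes when it is only allowed a fraction of the array's spare bandwidth. The function name and the figures used with it are assumptions for illustration, not measurements from any array.

```python
def migration_hours(data_tb: float, spare_mb_per_s: float,
                    background_share: float) -> float:
    """Hours needed to copy `data_tb` terabytes when the move may use only
    `background_share` (0 < share <= 1) of the spare bandwidth."""
    if not 0 < background_share <= 1:
        raise ValueError("background_share must be in (0, 1]")
    data_mb = data_tb * 1_000_000   # decimal units: 1 TB = 1,000,000 MB
    seconds = data_mb / (spare_mb_per_s * background_share)
    return seconds / 3600
```

Assuming a hypothetical 400 MB/s of spare bandwidth, moving 2 TB flat out takes about 1.4 hours, but throttled to a quarter of that bandwidth to protect production it takes about 5.6 hours. That is the trade-off the array has to manage: go faster and risk hurting applications, or go gently and risk the workload changing before the move finishes.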
Still will need the hints though.
So, what are the implications of this for storage users, should it come to pass?
First, it takes the concept of storage tiering to a new level.
Today, most customers create static tiers of storage service levels and price points, and there's significant effort to move things around between tiers, so exploitation is limited. I also think that -- because there's significant effort to change tiers -- people tend to specify a notch up rather than risk coming up short.
Second, there will be an interest in having external management tools provide hints, e.g. SRM, model-based management. Look for integration between higher-level concepts of what's important to the business, and those entities that provide the service level -- storage, servers, networks, etc.
Third, many of our notions of storage management will have to evolve. Storage management is largely a static model today -- you can hopefully name the drives where the database might live. In this environment, you might have different answers at different times. Might be hard for people to get their heads wrapped around that one.
"Where's the database?" "It's here, no wait, it's over here, no wait ... ahh, it's somewhere in this box"
So what's the payoff?
For one thing, I think it's pretty easy to market.
Save money on storage by simply creating pools of different service levels and price points. Let the storage environment balance it out. When you need more, it will simply tell you how much more and what class of storage would be recommended.
Less management effort. Tiering becomes almost automatic. Service levels get delivered, and if they don't, you know why that is and what to do about it.
And, ultimately, better service level optimization -- critical applications and users get the performance they need, when they need it, and the remainder is available for others.
And there's no reason why this sort of capability couldn't be provided via an intelligent SAN switch to provide dynamic tiering across multiple arrays, if you wanted. That would kind of take that whole storage virtualization thing to a whole new level, wouldn't it?
But I think we'll see it within an array first.
A nice world, to be sure.
And I don't think it's that far away ...

Chuck, I commented on Optimizer last week and I'm currently running a poll to see how many people actually trust it to do the job. On the subject of needing products like Optimizer in the first place, I think that current array designs make these products necessary. Although the architecture is "shared everything", disks are still stored on back-end loops and data is stored in discrete packets (e.g. hypers on EMC), which makes moving data around to balance performance a necessary evil. Perhaps we need a fundamentally better architecture where we move away from the requirement to map LUNs directly to physical drives and the array microcode writes data to the least busy part of the array on a dynamic basis. Thought Moshe was working on something like this (but perhaps not with EMC)?
Posted by: Chris M Evans | April 28, 2007 at 06:39 AM
Hi Chris
I've found that there's a trust-building exercise with all dynamic optimization products (servers, storage, etc.).
Thankfully, Symm Optimizer has been out there long enough that you'll probably find more than a few people that trust it to do its magic in the background, but that trust relationship didn't happen in the first week!
As far as future architectures minimizing the need for data relocation, I would argue that we're seeing more dynamic range between disk alternatives, and not less.
This leads me to believe that -- at some level -- there will be a need to shuffle the data chunks from one place to another to optimize.
Now, that being said, I think there's huge room for improvement on how that's done, e.g. more granular chunks (as opposed to LUN slices), being able to do so quickly and without impacting production, and so on.
I think this is one area where our thoughts line up a bit!
Posted by: Chuck Hollis | April 30, 2007 at 02:38 PM