A while back, I wrote a controversial post ("The Great Data Placement Debate") where I argued that certain storage architectures -- due to their mandatory obscuring of physical drive geometries -- would stymie users who were looking to get a performance bump through judicious use of things like enterprise flash drives.
Well, like kicking over any hornet's nest, you learn a few things.
And here's what I learned.
The Vendor Reaction
Competing storage vendors reacted somewhat predictably and vociferously. You can scroll through the comments to see what various vendors had to say. But, based on their commentary, I began to suspect that many of these vendor representatives had never been in a *real* performance-intensive situation.
I still stand by my original premise: there are gonna be times where users are going to want to precisely control which devices their data lands on. And if your storage array insists on virtualizing / randomizing / obscuring this basic mapping in such a way that it can't be turned off -- well, I hope you never run into this particular situation.
But I thought the more interesting response was from users.
The User Reaction
I heard from more than a few users (at least, I think they were users) who said this wasn't a big deal for them. They had never had any performance problems in their environment. They couldn't see themselves being in a situation where they'd need to go beyond whatever auto-magic feature their array supported, and couldn't imagine ever needing to turn it off for one reason or another.
Lucky, lucky people.
I'm not going to argue with these people, because -- in one sense -- they're quite right. They've got something that works well for them, does the job, and they're happy with how it's going. Far be it from me to suggest otherwise.
I guess I'm a victim of sampling bias in my day-to-day activities. Week after week, I meet customers who *do* have performance-sensitive applications, and are always looking to get more out of them. They'd understand exactly what I was talking about, and would most likely agree.
But it's not an either/or discussion -- and let me explain why.
Have It Both Ways?
My real beef was not whether a spindle-randomizing storage architecture was "better" or "worse": it was that you didn't really have a choice -- there was no way to turn the "helper" off and get back to basics if you needed it.
I contrast that "one size fits all" (or "one algorithm fits all") thinking with something like EMC's NS40 unified storage. For applications where ease of management is paramount, you've got a nifty NAS/iSCSI environment that's easy to configure, provision, auto-grow, auto-shrink, make lots of space-efficient snaps, etc. -- it can be configured as a "spindle randomizer" type set-up, if that's your preference.
Or not, depending on your druthers.
But -- here's what's important -- underneath the NAS/iSCSI head is a real FC dual-controller array implementation with real, physical drives. Sure, they can be combined and virtualized as meta-luns, or virtual LUNs, or whatever you need.
But -- if you need to precisely control what data lands on which spindles, and don't want some storage engineer's idea of a file system in the way -- raw access is there for you if and when you need it.
Including landing that 10GB of an Oracle transaction log on an enterprise flash drive, if you choose. And landing another 20GB of something entirely different on the same flash drive if you need to. Or something else equally clever.
I'm all for innovation in the storage business. But I think that storage administrators should have a choice as to whether to use a specific feature, or not. And many of these platforms simply don't give you that choice.
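To make the "have it both ways" idea concrete, here is a minimal sketch in Python. It is purely illustrative and not how any real array (the NS40 included) is implemented: the `Device` and `Array` classes, the `provision` method, the "most free space" placement rule, and the drive names and sizes are all assumptions made up for the example. The only point is that the automatic helper is the default, and an explicit device name overrides it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """A hypothetical physical drive (name and sizes are made up)."""
    name: str
    capacity_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


@dataclass
class Array:
    """A toy array: automatic placement by default, manual on request."""
    devices: List[Device]
    placements: Dict[str, str] = field(default_factory=dict)

    def provision(self, volume: str, size_gb: int,
                  device: Optional[str] = None) -> str:
        if device is not None:
            # Manual mode: the administrator names the exact device.
            target = next((d for d in self.devices if d.name == device), None)
            if target is None:
                raise ValueError(f"no such device: {device}")
        else:
            # Automatic mode (assumed rule): pick the device with the
            # most free space; ties go to the first one listed.
            target = max(self.devices, key=lambda d: d.free_gb)
        if target.free_gb < size_gb:
            raise ValueError(f"{target.name} has only {target.free_gb}GB free")
        target.used_gb += size_gb
        self.placements[volume] = target.name
        return target.name


array = Array([Device("flash0", 73), Device("fc0", 300), Device("fc1", 300)])
log_dev = array.provision("oracle_log", 10, device="flash0")   # pinned by hand
other_dev = array.provision("something_else", 20, device="flash0")
auto_dev = array.provision("file_share", 50)                   # array decides
print(log_dev, other_dev, auto_dev)
```

In this toy version, the two pinned volumes share the flash device while the unpinned one goes wherever the placement rule sends it. An architecture with no equivalent of the `device` argument leaves you with only the second path.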
And, Maybe It's Different In The Larger Enterprise
Most of my work is with customers who do IT at significant scale. Little things can become big things when sufficiently magnified. And I have to admit, it does affect how I look at different IT issues, and it strongly flavors my opinions on technical matters.
So, thanks to all the users out there that reminded me that IT comes in many different shapes and sizes, and what might matter to one shop may not be all that important to another shop.
Mea culpa!

I see where you are coming from, but what I think both vendors and users were trying to say was "all I want is a way to provision X amount of capacity from Y amount of storage."
Having a whole bunch of different boxes out there, and having to make that decision on a daily basis - where do I have that class of storage, where do I have enough free space... Let's not think NS40, DS4000, CX4, USP-V, DS8000, XIV, let's think pools. I will decide once, and only once, which pool my Z set of arrays will provide, i.e. which class they are, and that's that. Next day I want my X GB of class Y, and all I need to do is provision it... virtual storage...
Again, simplicity. We all have lives outside the workplace, and we all want to spend more of them with our loved ones, so let's make those decisions when we are being paid for making them. If we need to move something from class Y to class Y' then so be it - and let's not worry about it when we go home: it's happening without disruption, without expensive services and without us going in at the weekend or in the 'wee' small hours to make sure it's happening.
That's the bigger beef, that's the real value of virtualization, to all our customers.
Posted by: Barry Whyte | August 18, 2008 at 07:02 PM
Agreed -- an ideal goal!
If only some vendor had a product that actually *did* that and didn't create more problems in the process -- well, that'd be great, wouldn't it?
(don't respond, BarryW, I'm just baiting you ...)
Posted by: Chuck Hollis | August 18, 2008 at 08:14 PM
LOL. Hook, line and sinker... always wondered what the sinker was, until I googled it ;)
Posted by: Barry Whyte | August 18, 2008 at 08:34 PM
I wonder how many systems out there have been deployed and tuned specifically for an application by a consultant, had disk placements tweaked to perfection, the various variables tuned to within an inch of their lives, and left running at top speed, but now, maybe 3-4 years later, are actually running worse than they would have been without the initial tuning?
Requirements change, environments change, applications change, the space used grows, new disks are added, old ones repurposed, all these things are layered on top of the initial tuning done by the installers.
This is surely what the automatic tuning systems and the virtualization or masking systems are working to avoid?
Yes, you can set your database log files to run on a specific set of tracks on a specific disk in the array, but in a year's time when a new DBA comes along and changes things because of a capacity issue, you'd better be prepared to spend the money to get a consultant in to start the whole tuning process over again.
On the other hand, if your disk array "just sorts it", and gets 95% of the performance, then your DBAs can switch the location of their log files every other month and no one will care.
Posted by: Ewan | August 20, 2008 at 07:23 AM
Yes, of course, assuming that your disk array knows how to "just sort it" within most of its potential.
And assuming that whatever "sort it" algorithm the array uses happens to match your particular use case.
And if you're comfortable with the overhead that's needed for this.
Great, if all of these things are true.
But sometimes, they're not -- at least, from what we can see.
A noticeable part of our business is what we call "second surgeries". A customer put something in from another vendor, and it didn't work out as expected or promised. And we're not talking about replacing older gear here.
Some of these result from tragic availability problems the customer experienced with the other vendors' kit. Other situations were driven by a failure to meet performance requirements of the business.
Either way, it wasn't pleasant for the people involved, I'm sure.
We also see rampant overprovisioning in many of these mid-tier array environments: fractional reserves, etc. Getting to use 60% or less of the disk you paid for doesn't look good from certain perspectives, especially in larger environments.
So -- once again, I'm agreeing with most of the opinions here: e.g. if an array can do something automatically and satisfactorily, great.
But, if it doesn't, you ought to have the ability to turn it off and work around it if you need to.
And if you never have to do that, lucky you.
Posted by: Chuck Hollis | August 20, 2008 at 08:31 AM