OK, I admit it. Part of me likes big iron.
Ever since I saw my first mainframe as a young pup, I've been impressed by big IT machines.
The bigger and the more blinking lights, the better. A loud roar coming out the back is cool, too.
Is this a guy thing?
I've had the privilege of being associated with EMC's Symmetrix business for over twelve years. I think high-end storage architectures are an interesting lens on what's coming in the broader storage landscape.
EMC just finished a set of announcements that extend the core DMX in some interesting ways.
In this post, I'll try to give you a bit of context behind the announcements, and why they might (or might not) be important.
The Basics
The whole idea of a large DMX is hyper-consolidation. Take all the different service levels, and all the different cost points, and put them in one big storage frame.
If done right, it can be cheaper in terms of hardware, software, power, cooling and management effort -- as opposed to multiple, individual frames (or worse, internal storage).
Not politically correct with some of the more aspirational storage industry pundits (none of whom seem to actually work in IT), but very effective and very popular.
Most of the features in this release were designed to support bigger consolidation scenarios along these lines.
Hardware Feature Updates
For those of you who track hardware features, the most obvious part of the announcement was support for 4Gb FC on the front end, and RAID 6 protection as an option.
Neither is especially a game-changer, and neither was a reaction to customer demand. In the high-end market, people continue to be happy with 2Gb FC infrastructure, and -- since we're only talking a controller swap -- they knew they could get to 4Gb when and if they needed it.
RAID 6 is marginally interesting, but is mostly a feature checklist type of thing. Given the way a DMX aggressively protects disk drives, it wasn't a big need. But it's there now, if you want it.
What's A Priority? Dynamic Cache Partitions and Symmetrix Priority Controls
As these frames get ever-larger (~2000 drives), there's more and more interest around making sure the various service levels and priorities are achieved. Hasn't been a major issue so far, but high-end storage customers tend to look out over the horizon and anticipate what might be coming.
Now, at first blush, most people think they want very finely-grained controls to precisely set and manage performance tradeoffs.
I would think that would be exactly the last thing you'd want to do -- being responsible for minute-to-minute micro-optimizations of hundreds of applications hitting thousands of disks -- no human being could keep up.
Fortunately, Enginuity (the storage operating environment on the DMX) has the benefit of 15 years of acquired knowledge in dynamic optimization of rapidly changing workloads. Better than a human operator.
What it was missing was the external hint -- hey, this application is more important than this other one, so act appropriately, without doing any damage.
And that's the new feature.
Users can create groups of volumes that represent more important -- or less important -- applications. Enginuity uses this external hint to express external priorities to the internal algorithms, and let them do what they do best. Or, if you're in a mainframe environment, it can handle similar hints from z/OS (say, from SMS Workload Manager) and act appropriately.
Not exactly a game-changer, but as we move into a world where more people are hyper-consolidating on extremely large frames, and as people try to wring the very ultimate in performance out of these arrays, this will be a useful capability going forward.
And then there's cache --
One of the architectural features that distinguishes a real high-end array from all the rest is a monstrous amount of cache. For many I/O profiles, it's the key to performance -- delayed writes, read pre-fetches and so on.
The new feature -- Dynamic Cache Partitioning -- allows customers to provide another helpful hint to Enginuity to help it optimize performance -- the amount of global memory that can be utilized for each group of volumes.
It's dynamic in two ways. First, feel free to change the knobs when the box is cooking. No need to schedule quiet time, wait for cache partitions to drain, reboot, etc. See a problem, make a change.
Second, rather than working with fixed allocations, there's the concept of minimum and maximum. So the internal algorithms are free to move things around based on the overall picture, while making sure that certain minimums are met.
I don't know if this is a game-changer, but it's a unique feature that -- for the right situation -- could be very helpful.
Especially if something very important is running very slow and you'd like an immediate fix ...
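To make the minimum/maximum idea a bit more concrete, here's a toy sketch in Python. To be clear, this is not how Enginuity does it; the class, the function, the "demand" figure and the proportional sharing rule are all assumptions made up for illustration. It only shows the general shape: every group gets its floor, and whatever is left over floats between groups without ever pushing one past its ceiling.

```python
from dataclasses import dataclass


@dataclass
class CachePartition:
    name: str
    min_pct: float     # guaranteed floor, as a percent of global memory
    max_pct: float     # ceiling the partition may never exceed
    demand_pct: float  # hypothetical: what the workload could use right now


def allocate_cache(partitions, total_pct=100.0):
    """Give each partition its minimum, then share the spare by unmet demand."""
    alloc = {p.name: p.min_pct for p in partitions}
    spare = total_pct - sum(alloc.values())
    if spare < 0:
        raise ValueError("minimums add up to more than the available cache")

    # How much more each partition could use, capped by its maximum.
    wants = {
        p.name: max(0.0, min(p.demand_pct, p.max_pct) - p.min_pct)
        for p in partitions
    }
    total_want = sum(wants.values())
    if total_want > 0:
        # Never hand out more than is spare, or more than is wanted.
        grant = min(spare, total_want)
        for name, want in wants.items():
            alloc[name] += grant * want / total_want
    return alloc


# Made-up groups of volumes, purely for illustration.
partitions = [
    CachePartition("oltp", min_pct=40, max_pct=80, demand_pct=60),
    CachePartition("batch", min_pct=10, max_pct=50, demand_pct=30),
    CachePartition("dev", min_pct=5, max_pct=20, demand_pct=5),
]
print(allocate_cache(partitions))
```

Changing a minimum or maximum and calling the function again is the toy equivalent of turning a knob while the box is running: nothing has to drain or restart, the next allocation simply respects the new bounds.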
Automatic Tiered Storage
Another significant new feature is found in the latest version of Symmetrix Optimizer.
For years, it's been able to look at your workload, and suggest an alternate balancing of volumes and spindles to maximize performance. If you like what you see, you can instruct it to transparently move things around without anyone noticing -- the net result being a potentially big bump in performance.
Well, the same principle works in the opposite direction. If something is less important, then, well, maybe it ought to be on the slower drives, no?
If you've defined a set of volumes and related applications as less-than-important, Symmetrix Optimizer now can be told to transparently move the workload to fatter/slower (cheaper) drives -- freeing up the faster (more expensive) storage for other applications that need the performance.
The net result is that you can get a fair amount of benefit from tiering without a whole lot of effort. Symmetrix Optimizer looks at the situation, makes some suggestions, and you either agree or not.
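The suggest-then-approve flow can be sketched in a few lines of Python. This is a hedged illustration only: the Volume class, the "high"/"low" priority labels, the "fast"/"slow" tier names and both functions are hypothetical, and the real Symmetrix Optimizer weighs far more than a single priority flag.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    priority: str  # hypothetical label: "high" or "low"
    tier: str      # hypothetical tier name: "fast" or "slow"


def suggest_moves(volumes):
    """Propose moving low-priority volumes off the fast tier."""
    return [
        (v.name, "slow")
        for v in volumes
        if v.priority == "low" and v.tier == "fast"
    ]


def apply_moves(volumes, moves, approved):
    """Carry out the proposed moves only if the administrator agreed."""
    if not approved:
        return
    targets = dict(moves)
    for v in volumes:
        if v.name in targets:
            v.tier = targets[v.name]


# Made-up volumes, purely for illustration.
volumes = [
    Volume("billing_db", priority="high", tier="fast"),
    Volume("old_reports", priority="low", tier="fast"),
    Volume("archive", priority="low", tier="slow"),
]
moves = suggest_moves(volumes)
print(moves)
apply_moves(volumes, moves, approved=True)
```

The point of splitting it into two steps is the same as in the product description: the tool does the looking and the proposing, and a human keeps the final say.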
All of this marks a trend of providing more and more QoS (quality of service) capabilities in storage, but doing it in such a way that doesn't create new headaches. Just to be complete, CLARiiON also has a subset of these capabilities.
Securing The Symmetrix
Relatively big news in providing additional infrastructure security for DMX platforms.
With all that information in one place, many customers want to know that administrative access to the storage array can meet best-practices guidelines for securing infrastructure.
So, in this release, we've added RSA credentials and role definition to service processor access, as well as provided a tamper-proof log of all service-processor related activities. And since many forms of access can be done programmatically (either through a CLI or API from an application program), there are some new protection features there as well.
And there's now a nice feature that will do certified data erasure on a failed drive when automatically spared, hopefully before the service person comes to remove and replace the failed drive. Nice to have if you have bad dreams about someone walking off with a disk drive's worth of sensitive information.
Storage infrastructure is coming under increased scrutiny for security compliance; I think these features go a long way towards making this less of an issue.
An Alternative to GDPS
If you're into high-end mainframe computing, you know all about GDPS -- geographically dispersed parallel sysplex.
From a feature-set perspective, it's got it all -- dynamic failover at a distance, automatic workload migration, super automation, etc.
The UNIX guys can only hope they'll be as cool one day.
The bad news has been that -- since there was only one source for the technology (IBM) -- it was one of the single most expensive IT propositions in the marketplace. It wasn't really offered as a product, more as an extended service engagement from IBM.
Competition is good for the industry and for customers. Better feature sets, lower prices.
EMC announced a competitive alternative in this space -- GDDR -- geographically dispersed disaster restart -- that builds on top of SRDF to offer a reasonable alternative at a fraction of the cost.
The people who are using it like it better than GDPS in terms of functionality and flexibility. And it's not bad to have another alternative to go look at.
Replication Enhancements
Over the last few years, the industry has moved from short-to-medium distance synchronous replication to asynchronous replication over longer distances.
The flagship product in large enterprises has been SRDF/A. It uses cache to ensure that critical applications don't take a performance hit when replicating over a long distance.
But networks can be finicky things, especially IP networks. Customers found themselves monitoring their IP networks more than they would have liked to, since when cache resources got exhausted, the session would drop and they'd have to take extra steps to resync.
With this release, the situation is improved. The DMX monitors the IP link and if cache resources are exhausted, it uses a dedicated disk area to put the excess store-and-forward traffic. Works with FC links as well.
The net result is that -- even if there's a relatively long network event -- SRDF/A stays up, squirrels the extra data in a log file, and resyncs automatically when the network service regains its health.
And, in the category of more information than you really wanted to know, the log file itself is write-folded to reduce the amount of data that has to be transmitted on recovery. No sense in sending old copies of yet-to-be replicated data that have already been updated, unlike other approaches.
The result -- faster recovery time.
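For the curious, write folding is easy to picture in code. The following is a minimal Python sketch of the general idea, not the DMX implementation: the function name and the (block address, data) log format are assumptions chosen for illustration. If the same block is written several times while the link is down, only the most recent version needs to cross the wire.

```python
def fold_writes(log):
    """Collapse a write log so that only the latest data per block remains.

    `log` is an iterable of (block_address, data) pairs in arrival order
    (a hypothetical format for this sketch).
    """
    latest = {}
    for block, data in log:
        # A later write to the same block replaces the earlier one.
        latest[block] = data
    return latest


# Five writes arrive while the link is down, but they touch only three blocks.
log = [
    (10, b"a"),
    (20, b"b"),
    (10, b"c"),  # supersedes the first write to block 10
    (30, b"d"),
    (20, b"e"),  # supersedes the first write to block 20
]
folded = fold_writes(log)
print(len(log), "writes logged,", len(folded), "blocks to send")
```

The busier the hot blocks are during the outage, the bigger the saving, since every rewrite of an already-logged block adds nothing to the amount of data to transmit.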
Size Matters
As we start to deal with larger and larger array capacities (now getting into the petabyte neighborhood), other limits have to be bumped up: number of logical volumes, number of groups that can be defined, the maximum number of volumes in a consistency group, and so on.
Lots of that sort of stuff in the release as well.
Of course, very little of this matters unless you're thinking about consolidating into a very big array -- which, based on what we're seeing, is happening a lot more often than most industry-watchers think.
Are You Ready For A Really Big Array?
If you're already a user of high-end arrays, you probably can see the value of the enhancements EMC has offered with this round.
Like any foundational piece of IT architecture, progress is evolutionary, not revolutionary. Doesn't make for great attention-grabbing headlines, but that's the winning recipe here.
If you're not a user of high-end arrays, you might be ready to consider one as an alternative. There are pros and cons, of course, but in my opinion many shops reach a point where managing a whole bunch of disparate arrays just isn't fun any more ;-)
On a more serious note, high-end arrays represent a big step forward in how shops look at information management: is it a centralized function, or is it distributed?
And -- all theoretical and philosophical debate aside -- we're seeing more and more customers taking the big step into a hyper-consolidated storage world.
