Over the weekend, a few internal notes were being circulated about testing storage array performance. There's an interesting pattern that we're seeing with these evaluations, and we'd like to offer a few suggestions to save you time and frustration.
This storage performance testing discussion is very different from the ever-popular storage benchmarking (or benchmarketing!) debates -- it's about you investing your time to run some tailored tests to figure out what works well in your environment.
Not everyone has the time or the inclination to invest multiple days doing a suite of performance tests against multiple storage arrays. Certain vendors, though, are forcing the issue by dropping off their newest boxen for evaluation.
My take? If you're going to do this sort of thing, invest the time to do it right.
I'd invite others from the storage community to offer up their testing suggestions as well ... it's a topic that probably deserves a bit more attention.
It's Array Testing Season!
Now, to be fair, there's always a steady background of performance bake-offs going on at any one time. But lately, a number of newer storage vendors are dropping off their just-outta-beta products for evaluation, causing a spike in the number of our customers interested in doing this sort of work.
That's to be expected -- it's tough times out there!
The problem is arising from the level of testing being done, and it usually goes something like this.
1 -- Customer runs very simple test on very simple config, gets amazing results
2 -- Customer tells EMC "gee, this new box is really fast, what's up?"
3 -- EMC asks customer "did you test for XYZ?"
4 -- Customer says "no", repeats test using suggested approach
5 -- Results are very different, causing customer to revisit original opinion of new storage array
So, in the interest of saving time for everyone -- or avoiding erroneous conclusions -- here are a few hints on how to evaluate storage array performance.
#0 -- What Are We Testing For, And Why?
I know this sounds blinking obvious, but you'd be surprised at how many testing efforts we get involved with midstream where we start asking these sorts of questions, and -- well -- for some reason there are no good answers.
I'd suggest writing a paragraph or two -- up front -- on why you're doing the testing, perhaps a description of the envisioned usage of the array, key performance characteristics that matter, and how you plan to do the testing.
Call it a "test doc" or whatever -- but a bit of thinking up front is really helpful.
#1 -- Fill It Up
So many arrays today build file systems on top of physical LUNs that it's almost imperative that testers fill up an array with data to get realistic numbers.
I call these arrays "spindle randomizers" because they take the approach of spreading user data on as many disk spindles as possible. This leads to very different performance profiles at, say, 10% capacity and, say, 90% capacity. There's no way to get realistic numbers without filling it up.
In particular, write space allocation can substantially degrade as free disk space becomes harder to find. You might remember the kerfuffle that resulted when I posted this example.
If you're planning to use your arrays full, you should test your arrays full.
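One way to make "test it full" concrete is to plan a few fill levels up front and run the same test at each. Here's a minimal sketch of that arithmetic; the function name, the default fill levels and the example numbers are all made up for illustration, so substitute whatever matches your planned usage.

```python
def fill_plan(usable_bytes, used_bytes, targets=(0.10, 0.50, 0.90)):
    """Return (target_fraction, bytes_to_add) pairs.

    bytes_to_add is the *additional* data to write to move from the
    previous fill level to the next one, so the steps can be run in order.
    """
    plan = []
    for target in sorted(targets):
        goal = round(usable_bytes * target)
        to_add = max(0, goal - used_bytes)
        plan.append((target, to_add))
        used_bytes = max(used_bytes, goal)
    return plan


# Hypothetical array: 1000 units usable, 50 already allocated.
for target, to_add in fill_plan(1000, 50):
    print(f"fill to {target:.0%}: write {to_add} more units, then re-run the test")
```

Running the identical workload at each level is what exposes the difference between the 10% numbers and the 90% numbers.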
#2 -- Don't Write Just Zeros
Several popular testing tools (e.g. IOzone and a few others) write nothing but zeros. A few vendors have gotten wise to this, and have inserted special code that looks for this situation, and uses special routines that don't reflect general performance characteristics.
I'm sure that they had good reasons for doing this that had nothing to do with spoofing popular storage testing tools, but it's worth watching for.
Unless, of course, all your applications write nothing but zeros -- and then it's a reasonable test.
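To see why an all-zero data stream is such a special case, here's a small sketch. It uses zlib purely as a stand-in for "anything that looks at the data" (that's my assumption for illustration, not a description of what any array actually does), and the helper names are invented.

```python
import os
import zlib


def make_buffer(size, zeros=False):
    """Return a write buffer: all zeros, or random bytes from the OS."""
    return bytes(size) if zeros else os.urandom(size)


def compression_ratio(buf):
    """Compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(buf)) / len(buf)


zero_buf = make_buffer(1 << 20, zeros=True)   # 1 MiB of zeros
rand_buf = make_buffer(1 << 20)               # 1 MiB of random data

print(f"zeros:  {compression_ratio(zero_buf):.4f}")
print(f"random: {compression_ratio(rand_buf):.4f}")
```

The zero buffer shrinks to almost nothing while the random one doesn't shrink at all. Real application data usually sits somewhere in between, so if your tool lets you choose the data pattern, pick something closer to what your applications actually write.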
#3 -- Use Multiple Servers and Multiple I/O Paths
Sure, a single server can generate a substantial workload (especially if it's a big one), but give some thought to using multiple servers and multiple I/O paths -- each with a different I/O profile.
Certain arrays can "lock in" on specific I/O patterns and do some decent optimization -- but suffer when presented with multiple, uncorrelated I/O patterns. Or they do a good job keeping up with the first few 4Gb FC ports, and struggle to keep up with larger numbers.
That's presuming, of course, that you'll actually be using your array this way.
#4 -- Watch For Cache Effects
I learned this one over a decade ago when I was working with the first Symm 3s, which had very large nonvolatile storage caches for the time. Many of today's newer arrays have substantial caches as well.
I remember a particular event when I bet a customer that a UNIX file system check (fsck) of a 100GB file system could be completed in under 3 seconds. It was all sitting in cache, so I won that bet easily.
Cache effects are particularly noticeable when the amount of allocated storage being tested is small, when there's only one server driving I/O, when you're running a simple test, or when you're running the same test over and over again.
Finding the "flush cache" feature on an array can be hard, and there's no assurance that it actually cleared cache, rather than just destaged dirty buffers. Some arrays preserve cache contents across power cycles (using battery backups), so there's just no "starting over" in some cases.
Good testing practice involves testing allocated storage domains considerably larger than cache, using a simultaneous mix of different I/O patterns, preferably from multiple servers, and letting things run until they reach a steady state.
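"Steady state" is easy to say and easy to fudge, so it helps to decide up front how you'll recognize it. Here's one rough sketch: treat the run as settled once the last few throughput samples vary by less than some tolerance. The window size, the 5% tolerance and the sample numbers below are all assumptions I picked for illustration, not a standard.

```python
from statistics import mean, pstdev


def steady_state_index(samples, window=5, tolerance=0.05):
    """Return the index of the first sample at which the most recent
    `window` samples have stdev/mean below `tolerance`, or None."""
    for end in range(window, len(samples) + 1):
        recent = samples[end - window:end]
        avg = mean(recent)
        if avg > 0 and pstdev(recent) / avg < tolerance:
            return end - 1
    return None


# Made-up IOPS samples: fast while cache absorbs the load, then settling.
iops = [52000, 48000, 31000, 19000, 12500, 12100, 12300, 11900, 12200, 12000]
print(steady_state_index(iops))
```

With these made-up samples the early, cache-flattered readings are skipped and the run isn't called steady until the numbers level off around 12,000. Only the samples after that point are worth reporting.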
Unless, of course, you aren't planning on using the disk in your array -- just the cache ...
#5 -- Make The Array Really Work
It's one thing to test application performance when you're just doing I/O; it's entirely another thing to do so when snaps are being created and/or manipulated, or remote replication is doing its thing, or drives are being rebuilt, or -- in some cases -- the array's space reclamation processes are running.
Space reclamation processes? Yep, one of the joys of some of the spindle-randomizing arrays is that they periodically need to do a bit of housekeeping. If you aren't planning on pushing yours 24x7, this isn't an issue, but -- if you are -- you should force this situation and see what the impact might be.
Plan on doing backup to disk? Run through the entire scenario while your performance test is running: creating the snap, splitting it off, copying it to another target device. Or perhaps fail a disk or two, and watch the rebuild happen while you're running your load.
If you're going to see application timeouts and other errors, it's going to be when you really pile it on: demanding workload, all the replication doing its thing, and throw in a few rebuilds on top of it all.
Unless, of course, you aren't planning on pushing your array too hard -- but then, why are you doing performance testing?
#6 -- Get The Vendors Involved
Depending on the situation, you might want to get your storage vendors involved in designing the test.
Good vendors will work with you to understand how you'll be using the products, and attempt to come up with a practical scenario that tests key aspects and doesn't take an inordinate amount of effort.
Not-so-good vendors will simply flip you an email that says "run this" using a single server against a small amount of capacity.
Hmmm -- I guess that's a good test all by itself, isn't it?
There's More, But I Think You Get The Idea
Testing array performance -- and getting meaningful results -- is not a simple or easy task.
Just about every array can look pretty good with a small amount of data and a simple test -- which proves absolutely nothing, and may even lead you to the erroneous conclusion that your results with a small, simple test will scale linearly to a large, complex test.
It's only when you start seriously cranking things up that you notice the real differences in architecture, algorithms and some pretty important design choices that array vendors need to make.
Need Some Testing Tools?
I wanted to share EMC's external testing tools -- not many people know about them, but they're quite good at generating large-scale, repeatable storage workloads that are nice and complex -- just like the real world!
You can find them here.
They're designed for use by a knowledgeable UNIX type who has a few UNIX servers on hand, and wants to invest some significant time in creating a storage performance test bed.
They are not designed for people who want to load up a simple program on their PC, press a button and get a quick number, nor are they offered up as a standardized benchmark test.
You may notice that there are about a bazillion different options in how they can be set up. Things like the number of concurrent streams, offsets, the random vs. sequential mix, the read vs. write mix, block sizes, pauses, etc. Lots of useful knobs to play with.
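To give a feel for what those knobs mean, here's a toy sketch of a per-stream workload description. To be clear, this is not the EMC tool or its option format; the `Profile` fields and `generate_ops` are hypothetical names I made up, and the sketch only generates a list of operations rather than issuing real I/O.

```python
import random
from dataclasses import dataclass


@dataclass
class Profile:
    block_size: int         # bytes per I/O
    read_fraction: float    # 0.0 = all writes, 1.0 = all reads
    random_fraction: float  # 0.0 = purely sequential, 1.0 = purely random
    region_size: int        # bytes of storage addressed by this stream
    seed: int = 0           # fixed seed keeps the stream repeatable


def generate_ops(profile, count):
    """Yield (op, offset, length) tuples for one stream."""
    rng = random.Random(profile.seed)
    blocks = profile.region_size // profile.block_size
    next_block = 0
    for _ in range(count):
        if rng.random() < profile.random_fraction:
            block = rng.randrange(blocks)   # jump somewhere random
        else:
            block = next_block              # continue sequentially
        next_block = (block + 1) % blocks
        op = "read" if rng.random() < profile.read_fraction else "write"
        yield op, block * profile.block_size, profile.block_size


# Two uncorrelated streams, e.g. one per server: a sequential reader
# and a mostly random, write-heavy stream with small blocks.
sequential = Profile(block_size=8192, read_fraction=1.0,
                     random_fraction=0.0, region_size=8192 * 4)
oltp_like = Profile(block_size=4096, read_fraction=0.3,
                    random_fraction=0.9, region_size=4096 * 1024, seed=42)

seq_ops = list(generate_ops(sequential, 6))
print(seq_ops)
```

The fixed seed is the important design choice here: it makes each stream repeatable, so you can present exactly the same mix to every array you're comparing.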
That's because there's at least a bazillion different testing scenarios that people are interested in.
However, if you need help with storage array testing (the man page is pretty terse), your friendly EMC field resource can help you (or find someone to help you) figure out how you want to design your specific test, and help you run it.
Culturally, we really like this sort of thing, so you'll find us willing and enthusiastic participants. Also, the fact that our arrays are pretty damn fast might have something to do with it :-)
Anything Else?
I think there are more than a few readers of this blog who have done hard time doing performance testing on storage arrays.
If you have, I'd invite you to share your experiences -- good and otherwise.
Thanks!

Chuck, Good stuff, and I couldn't help but be reminded of many very-real situations where your comments triggered memories, some of them quite recent. The most important aspect of testing that I've always encouraged (storage or otherwise) is that if you want to get meaningful information, test with your actual production applications and data, at as close to real scale as is practical. I was recently involved with a customer where a competitor claimed, rather emphatically, to be "2X!" the performance of EMC's Symmetrix. I was reminded of my own days as a customer where never once did I ever hear any vendor claim to be anything other than "the best" relative to performance. I can't recall even one vendor ever saying anything like "did I mention how inferior my performance is to all of my competitors?". Always quite the opposite... The competitor in this particular case was justifying their claim based upon an artificially rigged benchmark that violated most of the suggestions that you make in your note - it used minimal paths, minimal data, no back-end, etc. Fortunately, the customer's true need for performance was associated with a very common application for their industry, using a very common database product, and the data from the arduous task of evaluating the actual application at scale showed EMC's Symmetrix to consistently outperform the alternative, at least in the "real" world. After only a tiny percentage of the data was then moved to Flash disks (not an available option for comparison on the competitive offering), the gap in performance became massive in favor of the Symmetrix DMX - almost 7X for response times. 
I also recently saw an actual situation like the one that you describe where a competitor's storage system appeared to be performing quite well with one server connected to a very-lightly loaded configuration, but when the customer (fortunately) decided to load up more capacity and add several additional servers (closer to their actual requirements), the performance dropped to about 1/3 of what the storage was delivering with only one server connected - and well below that of the EMC Symmetrix being evaluated as an alternative. Anyway, I believe that your suggestions strongly support good "best practices" for customers to consider, having been on both sides of this fence. Thanks, Ken
Posted by: Ken Steinhardt | December 16, 2008 at 05:49 PM