
July 12, 2007

Comments on "Does Anyone Take The SPC Seriously?"

orbist

I think you've seriously missed the point here. SPC tests do not attempt to 'predict' any kind of real-life, customer-to-customer performance.

What they DO provide is a common workload that can be used to scale, or dare I say benchmark, one product against another in a controlled and consistent manner.

Nigel

Hi Chuck,

I agree that these SPC tests are not worth the paper they're written on (actually, I'm not sure you even get a paper printout ;-).

I remember once seeing an EVA beat an HDS 9900 or 9980 (memory escapes me) on a long, large-block sequential workload written to a disk group of about 100 disks, alongside a second, smaller-block random workload written to a separate disk group of 50 or so disks. The thing is, they had turned off cache mirroring, which you can't do on the 9900, and had the LUN for the sequential workload owned by one controller and the LUN for the random workload owned by the other.

They had big smiles on their faces as well, as if their tests actually meant anything in the real world!

I also agree with the consolidation point: many customers care less about the sheer speed of the box and more about the functionality it supports and how it copes with multiple different workload types.

I mentioned this trend in a recent post. I'm basically seeing quite a lot of small-configuration enterprise boxes being sold into environments that might previously have tended towards a midrange solution.

Nigel

Chuck Hollis

Hi Orbist

No, with all due respect, I don't think I have missed the point. Yes, it's a (sort of) repeatable test in customer hands, less so in vendors'.

But I keep coming back to one question: what use is a repeatable test unless it has some sort of correlation to the real world?

Otherwise, it's just an exercise in technical self-gratification, I think.

Thanks for the comment!

Chuck Hollis

Hi Nigel

I take no small measure of pleasure in the fact that we actually agree on something!

Since I spend a lot of time with customers, one of my "do you get it" indicators is if they start hammering me about the SPC.

I'm more than happy to explain the situation. But sometimes they persist.

If they do, I suggest to the rep that they might want to think a bit about how they approach this particular customer.

orbist

Chuck,

Thanks for the response. I can sort of see where you are coming from, and I do agree that today's storage infrastructures require a lot of fine tuning on an instance by instance basis to ensure maximal performance in any given environment.

I think everyone understands that vendors' headline figures (which are usually unrepresentative of ANY real-life workload, that is, small read cache hit measurements) should be taken with a pinch of salt. However, surely you would admit that at least the SPC tests attempt to benchmark some kind of realistic workload, with mixed reads and writes, streams that do repeat-range IO, and response time that must be kept under 30ms. If nothing more, I guess it shows an openness from those vendors that do take part.

Nigel's comments do bring up the need for a consistent strategy from the SPC testing point of view. However, from what I've read, any 'cache disabled' results are clearly marked in the title, and SPC insists on at least 50% disk utilisation (which is a common strategy used by most shops when disk performance is key).

Just my 2c

blake

Why does EMC participate in SPEC? If all tests that aren't a direct test of a production environment are bogus, then why submit results to SPEC?

Chuck Hollis

Hi Orbist

I guess I'd differentiate between a smart technical type who wants to use the SPC code to do his/her own testing (fully aware of the pros and cons of doing so), and the public spectacle of SPC press releases.

In the former case, the tester is free to modify not only the test but the testing conditions to more accurately reflect the purpose at hand. Not so in the latter case.

As far as modifiable test beds go, EMC has invested dozens of man-years just building testing simulators (let alone doing the testing), but not everyone has a rationale to do this sort of work.

I, for one, don't know where the "50% full for performance" urban myth originates. I understand the technical reasons why perhaps NetApp and IBM might have to do this, but it is not a standard EMC recommendation by any stretch.

I, for one, would think that this would make nominally cost-effective storage very, very expensive in practice.

Thanks for writing!

Chuck Hollis

Hi Blake

Good question. I don't have a good answer.

I think that the SPEC was pretty well established and accepted (warts and all) before EMC came to market with NAS devices, so I think we were faced with a different choice there.

SPC does not enjoy those same circumstances, so it might be a different situation.

Good point, though -- any criticisms leveled at the SPC could be aimed at the SPEC as well.

orbist

Thanks again Chuck for the reply. I guess we'll have to agree to disagree on this one.

As for the 'urban myth' of short-stroking magnetic media, it's no myth. Reduce the distance the physical arms have to move and you reduce seek and latency time quite dramatically.

Chuck Hollis

Hi Orbist

We're talking about different things.

You're talking about short-stroking, a well-understood (but expensive) practice used when random reads are the dominating I/O profile, and there's insufficient NV cache to soak up writes.

I'm talking about the "recommendation" from some array vendors that only half the capacity be used in normal production to get decent (not optimum) performance, due to poor controller design, poor microcode design, or sometimes both.

I bet you don't short-stroke your entire pool of production arrays -- so why should we see SPC tests for unrealistic configurations that no one is likely to put into production?

It's possible on large cached arrays (such as DMX) to lock entire volumes in cache, resulting in unbelievable performance. Should we submit this as our example of "short stroking"? No, because that would mislead people, wouldn't it?

Same idea here.

So, what do you use the SPC information for? I'd be curious.

open systems storage guy

I agree that the SPC can't possibly match a client's workload, but at least it is an open, documented workload. I know that EMC does quite a bit of testing, but if the results are going to mean anything, the test methods have to be inspectable by clients and the competition.

If HP performed a "benchmark test" comparing their controller with yours, but they deactivated caching and gave your box slower disks, the results would be slanted in their direction. Now imagine they did not disclose their testing methods. It would look like their box was better.

The only way to meaningfully compare performance is if the method of comparison is public.

Storagezilla

Hi Blake,
Comparing SPEC and SPC is a bad idea, since they operate in drastically different ways, and SPEC for NFS doesn't favour JBOD, something the SPC tests most certainly do.

If you check the SPC benchmark results, you'll notice that only IBM and Sun have recent results listed; HP hasn't submitted anything in well over a year, and Hitachi, NetApp and EMC are absent altogether.

http://www.storageperformance.org/results/benchmark_results_all

What we have here is a case of IBM competing against itself in a benchmark the majority of the industry appears to have found fault with.

Chuck Hollis

Hi OSSG (open systems storage guy)

I think we're not seeing eye to eye on a key point.

Whereas the SPC is a (somewhat) repeatable workload, the testing conditions allow variability of some key variables.

As an example, when vendors get to choose short-stroked 36GB drives for their configs, turn off write-back cache, or take other "creative" approaches, I still wonder about the relevance of the test.

As Storagezilla points out, both SPC1 and SPC2 are designed to defeat the benefits of cache -- SPC1 by randomizing I/Os and focusing on reads, SPC2 by simply streaming files.

To the extent that you, the customer, want to use no-longer-manufactured drives in an inefficient manner, or expose yourself to risk by turning off cache protection, or have convinced yourself that storage cache has no value in your environment, then the tests are valid.

Again, many of the posters stress the repeatability of the SPC. While I will partially grant that point, no one has been able to make a case for relevancy.

Imagine I create the PDB (pencil drop and bounce) benchmark for storage. This consists of dropping a pencil (eraser end down) on the top of a storage array, and watching how high it bounces. Higher scores mean you have a better array, according to the test.

I will shortly announce the formation of the PDB Council to ensure that all tests are conducted in an open and fair manner.

I can make it very repeatable, open, independent, etc. -- but is it relevant?

BTW, for the record, EMC publishes full details (down to excruciatingly boring detail) on every benchmark we run on our own (or competitors') gear.

A key part of our customer base are some very sophisticated storage guys; they do not appreciate "creativity".

Thanks for writing!

blake

Hey open systems storage dude.

I'm just pointing out that if Chuck calls every industry benchmark bad, then why does EMC submit results to SPEC?

Sure the tests are different, but that's not my point.

Chuck Hollis

So, OK, this is a fun discussion, and I'm getting something out of it.

There is definitely a school of thought out there, a glass-is-half-full perspective, that says "any half-decent test suite is better than none at all."

And, to be honest, I can't argue with that. Tools are tools, and -- in the right hands -- they can be effective.

From that perspective, SPC is not a bad thing, nor is IOzone, nor is SPEC and so on. Smart people know what they're getting into -- warts and all.

And there is a second school of opinion that any semi-standardized comparison is better than none at all. Yes, the test has its flaws (both in construction and vendor methodology), but they're well known, and at least we -- as consumers -- have some sort of metric to look at.

And then there is a third school of thought (mine) that says that the flaws in both test construction and methodology are so severe that -- on the whole -- the test does more harm than good, especially when transformed into a benchmarketing tool by an aggressive vendor such as IBM.

SPC1 is a random IO generator that runs for a moderate period of time. I can't easily map its behavior into any mainstream application profile that I'm aware of.

Being a random IO generator, it favors smaller, short-stroked disks, and does not favor any design that uses read cache, read-ahead techniques, and the like.

In this environment, small JBOD disks will most likely deliver the best cost/performance ratio.

It does not contemplate use of local or remote replication. It does not contemplate performance in the event of a component failure. It does not contemplate a changing I/O mix or rate. It does not contemplate a mixed I/O profile. I could go on, it's a long list.

SPC2 appears to be a file streaming benchmark, and -- as such -- I can make a bit more of a case that there are real-world applications (such as video streaming and the like) where it might be just a tad more relevant.

I take issue with the less-than-relevant configurations that some vendors post. Go take a look at what they've tested, and ask yourself, would you ever put in a configuration like that?

I note that vast portions of the storage vendor landscape have voted to pass on this particular exercise (EMC, HDS, NTAP, Dell and many, many others).

Other than IBM, those that participate to some degree appear to do so half-heartedly -- an obsolete config here, a small array there -- and do not embrace it in a consistent fashion.

Even IBM cherry picks to suit their interests, which -- in my mind -- has turned the SPC into IBM's favorite storage marketing tool, especially in regard to SVC.

The degree of insincerity and manipulative behavior from IBM on this issue is appalling, and reflects poorly on an otherwise fine company.

Fortunately, it does not appear to be effective in the marketplace. The question in my mind is -- why do they persist?

Storage dog

I can tell you, if EMC could win this performance benchmark, they would have run this test and not only published it but, knowing EMC, would have put a half-page ad in the Wall Street Journal and sent teams flying to NetApp customers globally to talk about this benchmark...

This is like saying, 'Since I can't win this game, the game is flawed'

EMC is exposed!

Chuck Hollis

Storage dog, I guess you don't know us very well.

We think the test is flawed. Culturally, we don't like flawed tests. Too many storage engineers here to make that one fly.

We think the SPC methodology and organizational construct is flawed. We really don't want to lend credence to something we don't believe in.

Finally, getting a half-page ad in the WSJ would be a lousy way to get this sort of message out.

The bit about sending flyers to NetApp customers, well, I wouldn't put that one past us ...

Thanks for writing!

Tom

"I'd hazard a guess that we've got multiple billions of dollars invested over the years doing performance characterization. That's not a typo. It's such an ingrained part of our culture (and our R+D spend) that it's hard to measure accurately."

EMC's *total* R&D spend in 2007 was $298M. 2006 was $143M. 2005? $72.6M.

Even assuming that all R&D spend since EMC's founding went exclusively to performance characterization, "multiple billions" strikes me as a very hazardous guess.

Chuck Hollis

First, your numbers aren't even *close* to the actual spend. I wonder where you came up with them, since we don't routinely disclose precise R+D spend.

Second, some of the investment shows up as customer support and environmental qualification, and is not entirely captured as R+D.

As an example, EMC publicly states that we spend more than 10% of revenue on R+D. If we're north of $10B in revenue, that would certainly be a much larger number than you're offering, wouldn't it?

Finally, I'd invite you to tour the physical eLab facilities where we do all this testing. There are multiples, so be prepared to travel a bit.

And I think you'd come away convinced regarding the validity of that statement.

Thanks!

The comments to this entry are closed.

Chuck Hollis


  • Chuck Hollis
    Chief Strategist, VMware SAS BU
    @chuckhollis

    Chuck has recently joined VMware in a new role, and is quite enthused!

    Previously, he was with EMC for 18 years, most of them great.

    He enjoys speaking to customer and industry audiences about a variety of technology topics, and -- of course -- enjoys blogging.

    Chuck lives in Holliston, MA with his wife, three kids and four dogs when he's not travelling. In his spare time, Chuck is working on his second career as an aging rock musician.

    Warning: do not buy him a drink when there is a piano nearby.

General Housekeeping

  • Frequency of Updates
    I try and write something new 1-2 times per week; less if I'm travelling, more if I'm in the office. Hopefully you'll find the frequency about right!
  • Comments and Feedback
    All courteous comments welcome. TypePad occasionally puts comments into the spam folder, but I'll fish them out. Thanks!