Rather than weigh in on one side or another, I thought I'd take a few moments to share the basic concepts, and to shine a light on why different vendors are lining up on one side or another of the discussion.
Now, since this is a simple treatment, I'm sure that others will want to either extend or amend some of my comments here. Please feel free to do so.
And, yes, this is an over-simplified treatment -- that's the point.
Memory Is Faster Than Disk
It's a simple premise, but it's the basic idea behind the storage caching discussion. But memory costs more than disk, so the devil is in the details.
If you're interested in storage and application performance, caching is an interesting topic. Conversely, if you don't care about storage performance, you're not interested in any of this.
Of Reads And Writes
Storage access can usually be broken into four broad categories of I/O: reads (both random and sequential) and writes (both random and sequential). Very few real-world applications are a pure version of one or another; most applications tend to slosh around between different profiles.
You have to also think about different lifecycle aspects of a given application. For example, a given application might usually be random read, except when it's backed up (sequential read), or restored in a big hurry (sequential write).
Similarly, you might think of a data warehouse as mostly read-only -- except when it's updated as part of a massive batch job!
Where And How To Cache
Storage caching can actually be done in *three* places: on the storage array, in the server, or (occasionally) in the storage network that sits in between the two.
The discussion is further subdivided into "volatile cache" and "non-volatile cache". Volatile cache loses data when something bad happens -- power is removed, or a component fails. Non-volatile cache preserves data in the event of either a power failure or a component failure.
The distinction is important -- volatile cache can safely be used for reads, but (generally speaking) shouldn't be used for writes. When an application writes data and gets an acknowledgment, it assumes that the data is safely stored and can be re-accessed when needed.
Bad things usually result when written data goes missing -- you *did* do a backup, didn't you?
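To make the volatile-versus-non-volatile point concrete, here's a minimal Python sketch of what happens to acknowledged writes when the power goes away. Everything in it is made up for illustration: the `Disk`, `VolatileWriteBackCache` and `WriteThroughCache` classes and the `power_loss` hook are assumptions of mine, not anyone's real API, and a real array is vastly more involved.

```python
class Disk:
    """Stand-in for persistent media: its contents survive a power loss."""
    def __init__(self):
        self.blocks = {}


class VolatileWriteBackCache:
    """Acknowledges writes straight from RAM (hypothetical, unsafe design)."""
    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}  # acknowledged, but not yet on disk

    def write(self, lba, data):
        self.dirty[lba] = data
        return "ack"  # the application now believes the data is safe

    def flush(self):
        self.disk.blocks.update(self.dirty)
        self.dirty.clear()

    def power_loss(self):
        self.dirty.clear()  # volatile memory simply forgets


class WriteThroughCache:
    """Acknowledges only once the data is on disk; the cache serves reads."""
    def __init__(self, disk):
        self.disk = disk
        self.clean = {}

    def write(self, lba, data):
        self.disk.blocks[lba] = data  # persist first
        self.clean[lba] = data        # keep a copy for later reads
        return "ack"

    def power_loss(self):
        self.clean.clear()  # nothing is lost: disk already has it


def acked_writes_lost(cache_cls, count=5):
    """Write `count` blocks, pull the plug, count acknowledged writes missing."""
    disk = Disk()
    cache = cache_cls(disk)
    acked = {lba: f"data-{lba}" for lba in range(count)}
    for lba, data in acked.items():
        assert cache.write(lba, data) == "ack"
    cache.power_loss()
    return sum(1 for lba, data in acked.items() if disk.blocks.get(lba) != data)


print("volatile write-back lost:", acked_writes_lost(VolatileWriteBackCache))
print("write-through lost:", acked_writes_lost(WriteThroughCache))
```

The write-through version is safe but gives up the write speed-up entirely, which is exactly why non-volatile cache is interesting for writes.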
Finally, storage cache is part of the I/O subsystem. You should be able to add, remove or replace cache non-disruptively, much in the same way you'd want to do for disks, power, I/O controllers, etc.
The Value Of Read Cache
Obviously enough, a given hunk of data sitting in read cache is a huge performance win -- the *second* time you access it. Not surprisingly, it does absolutely nothing for you the *first* time you read it.
Hence you'll see that much of the vigorous debate around different caching approaches has to do with whether you're re-reading the same data over and over again, or accessing information relatively randomly.
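Here's a small Python sketch of that effect, using a plain least-recently-used (LRU) read cache. LRU is my assumption for the sake of a simple example, not a claim about what any particular array does, and the block counts and cache size are arbitrary. It runs three synthetic workloads through the same cache: one that re-reads a small hot set, one that reads randomly across a huge address space, and one that touches every block exactly once.

```python
import random
from collections import OrderedDict


class LRUReadCache:
    """Toy read cache: remembers the most recently read block addresses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, lba):
        if lba in self.blocks:
            self.blocks.move_to_end(lba)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1              # first touch always goes to disk
            self.blocks[lba] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(workload, capacity=100):
    cache = LRUReadCache(capacity)
    for lba in workload:
        cache.read(lba)
    return cache.hit_rate()


rng = random.Random(0)  # fixed seed so runs are repeatable
hot_set = [rng.randrange(50) for _ in range(10_000)]             # same 50 blocks, over and over
wide_random = [rng.randrange(1_000_000) for _ in range(10_000)]  # almost never repeats
first_touch = list(range(10_000))                                # every block read exactly once

for name, workload in [("hot set", hot_set),
                       ("wide random", wide_random),
                       ("first touch", first_touch)]:
    print(f"{name:12s} hit rate: {run(workload):.3f}")
```

The hot-set workload is served almost entirely from cache, while the other two get next to nothing out of the very same cache.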
There's more to this, though. Storage isn't the only thing doing read caching.
Applications, databases, operating systems and file system clients also do read caching, for obvious reasons. If a given piece of data is popular, there's a fair chance that it'll end up in server-side cache, and any storage-side read caching will be of potentially less value. It's not unusual to have gigabytes of server memory doing some sort of read caching or another.
Part of the "heatedness" of the debate has to do with enterprise flash drives. Today, they do an excellent job at random read profiles -- there are no disk heads and virtually no latency. And, of course, they can be written to. Physical disk drives do poorly with random read profiles, and large read caches do only marginally better.
The exception, of course, is if you've got the bucks to create a ginormous read cache, and pull almost all the significant data into memory. Don't snicker -- there are a few use cases where this sort of approach makes sense.
In these approaches, the physical disk becomes the "data of record", and the vast majority of read requests are served directly from storage read cache. But not everyone can afford terabytes of read cache for their data.
This all works well, until we consider writes.
The Value Of Write Cache
As mentioned before, write cache is non-volatile. That usually means that it can preserve data in the event of a power outage and/or a component failure. This also makes write cache much more expensive than read cache, which is why you don't normally find it on servers, and don't find it on some storage arrays.
With a sustained sequential write stream, write cache (also known as non-volatile cache) becomes of limited value. Sooner or later, the cache fills up, and it's the underlying storage devices (whether spinning disk and/or enterprise flash) that end up determining the performance profile. Splitting the I/O stream between as many devices as possible becomes attractive.
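The back-of-the-envelope arithmetic looks something like this. It's a rough sketch that assumes a constant incoming write rate and a constant rate at which the back end drains the cache; the function name and every number below are illustrative assumptions, not measurements from any product.

```python
def seconds_until_cache_full(cache_gb, ingest_mb_s, destage_mb_s):
    """Time before a sustained write stream fills the write cache.

    Returns None when the back end drains at least as fast as data
    arrives, in which case the cache never fills.
    """
    net_fill_mb_s = ingest_mb_s - destage_mb_s
    if net_fill_mb_s <= 0:
        return None
    return cache_gb * 1024 / net_fill_mb_s


# Hypothetical numbers: 16 GB of write cache, 800 MB/s arriving,
# back-end devices able to absorb 400 MB/s.
print(seconds_until_cache_full(16, 800, 400))

# Spread the same stream over twice as many devices (assumed to double
# the drain rate) and the cache no longer fills at all.
print(seconds_until_cache_full(16, 800, 800))
```

With those made-up numbers the cache buys you well under a minute; after that, you're running at the speed of the devices behind it.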
However, in the real world, this pure sustained sequential write pattern tends to be the exception, rather than the rule. Most write patterns tend to be both bursty and relatively random. And large write caches help with both.
Write bursts (think database updates or busy file systems) tend to be easily soaked up by write cache. The application is essentially writing to memory, rather than storage media, and you see an eye-opening performance increase as a result.
In addition, random write patterns can be "coalesced" into more sequential patterns that can be written to disk in a more optimal fashion, greatly improving the performance of the back end.
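As a rough illustration of coalescing, the Python sketch below takes a handful of pending writes, each described as a hypothetical `(start_block, block_count)` pair, and merges any that touch or overlap into larger contiguous extents. It only tracks block ranges; a real cache would also have to carry the data and make sure the newest write wins when ranges overlap.

```python
def coalesce(writes):
    """Merge (start_block, block_count) writes into sorted contiguous extents."""
    extents = []  # each entry is [start, end), end exclusive
    for start, count in sorted(writes):
        end = start + count
        if extents and start <= extents[-1][1]:
            # Touches or overlaps the previous extent: grow it.
            extents[-1][1] = max(extents[-1][1], end)
        else:
            extents.append([start, end])
    return [(start, end - start) for start, end in extents]


# Six small writes arriving in no particular order...
pending = [(104, 4), (8, 4), (100, 4), (0, 8), (108, 2), (500, 1)]

# ...become three larger, ordered ones.
print(coalesce(pending))  # [(0, 12), (100, 10), (500, 1)]
```

Fewer, larger, in-order writes are exactly what spinning disks like best.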
If you've got an application that's capturing changes (transactions, updates, etc.) -- and end-user performance is important -- you'll generally want to consider storage products that have non-trivial amounts of non-volatile write cache.
Then again, if you aren't updating data frequently, or if update performance doesn't matter, write cache isn't particularly going to be a concern of yours.
The Importance of Algorithms
Storage cache is an expensive resource, so you want to use it in an optimal fashion. No simple algorithm will do here; there's an enormous amount of intellectual capital invested around caching algorithms. Not only do you need to do the right thing for a given application, you've got to do the right thing for all applications that might be using a given storage array.
In particular, the notion of QoS becomes important -- how do you keep non-critical applications from flooding cache pools? So some sort of algorithmic discussion needs to be had in addition to "what type of cache" and "how much cache".
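To show the flavor of the problem (and nothing more), here's a deliberately naive Python sketch in which each application gets a fixed slice of cache and can only evict its own blocks. Fixed partitions with LRU eviction are my assumption for the example; the `PartitionedCache` class and the "oltp" and "batch" names are hypothetical, and real arrays use far more sophisticated approaches.

```python
from collections import OrderedDict


class PartitionedCache:
    """Toy cache with a fixed per-application quota (hypothetical policy)."""
    def __init__(self, quotas):
        self.quotas = dict(quotas)  # application name -> max cached blocks
        self.pools = {app: OrderedDict() for app in quotas}

    def access(self, app, lba):
        """Return True on a cache hit, False on a miss."""
        pool = self.pools[app]
        if lba in pool:
            pool.move_to_end(lba)
            return True
        pool[lba] = True
        if len(pool) > self.quotas[app]:
            pool.popitem(last=False)  # evict this application's own LRU block
        return False


cache = PartitionedCache({"oltp": 16, "batch": 4})

# The critical application warms up its ten hot blocks.
for lba in range(10):
    cache.access("oltp", lba)

# A non-critical batch job then streams through 10,000 blocks.
for lba in range(10_000):
    cache.access("batch", lba)

# The critical application's working set is still cached.
print(all(cache.access("oltp", lba) for lba in range(10)))  # True
```

The catch with fixed slices is that idle quota goes to waste, which is one reason this isn't a problem you solve with a simple algorithm.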
And you've got to do it intelligently, and without a lot of human intervention. As an example, the amount of R&D that EMC has put into caching algorithms over the last few decades would be mind-boggling.
Putting It All Together
At the end of the day, it's all about making smart choices.
If my workload was primarily re-reading the same data over and over again with infrequent updates (and these do exist), I would strongly consider a design that had cheap SATA and the potential for large read caches.
If I was concerned about random read performance (much more common), I'd be far more interested in something that supported enterprise flash for part of the workload.
And if part of my workload involved bursty updates to data, I'd seriously consider an array with non-volatile write cache.
Surprisingly, most customers have a mix of all three -- which is reflected in the way EMC builds its storage array products.
V-Max is a large, shared non-volatile cache model -- works well with both massive reads and massive writes. CLARiiON (which also supports the Celerra) has more modest amounts of non-volatile cache. Atmos and Centera, by comparison, do comparatively little read caching, and almost no write caching.
So you're not going to see EMC saying that one storage caching strategy is arbitrarily "better" than any other, unless we have a clear discussion around your use cases and priorities.
I hope this helped those of you trying to follow the discussion ...

Good post Chuck.
Posted by: marc farley | March 04, 2010 at 06:47 PM
Wow, great article. Enjoy reading it. Keep up the good work !!!
Posted by: Roy Mikes | March 05, 2010 at 01:30 AM
Excellent article indeed!
At the risk of turning this post into Storage Caching 201, there are a few nits you might want to clean up:
In the Where to Cache section, the 4th place data can be cached is in the drive itself. This is very important for both disk and flash-based drives, as cache not only assists writes but sometimes is used even to pre-fetch data on reads (often reading full tracks or even cylinders on a block read request).
In the "Value of Read Cache", the cache can be of value the FIRST time the data is read, if the I/O subsystem was able to pre-fetch the data (a large part of Symmetrix' secret sauce has to do with prefetch algorithms). I know you discuss algorithms later, but "first time read hit" is perhaps one of the biggest benefits of an intelligently cached storage array.
On the "Value of Write Cache," you assert that write cache is "non-volatile." It's a nit, but the cache itself isn't always implemented using non-volatile memory (NVRAM or even NAND Flash) - and when it is, the write cache is usually extremely small (4-8GB in the DS8K for example). Vendors take different strategies to protect write data that has not yet been destaged from loss (due to power failure, for example). Most will mirror the write data to two different cache boards/components, but the power failure scenario is addressed in a variety of ways: For example, battery hold-up (as in Hitachi's USP-V) is used to keep the SDRAM powered for several days (36-72 hours tops, I understand). Symmetrix V-Max and CLARiiON use a vaulting strategy - internal standby power provides hold-up long enough for cached data to be destaged to vault drives; previous generations of Symmetrix used a "destage to destination" strategy that pushed the writes out to the target drives under standby power.
This handling of power loss also forces concessions in the write caching strategy. In the NVRAM systems, the maximum of unwritten writes is limited by cache size; in destage-to-destination, the "write pending" limit must be restricted to what can be destaged inside of the SPS window AND is limited by how much data is destined to any single drive. The vaulting strategy affords the most flexibility, since the destage time is fixed - a V-Max system thus can use 100% of its global cache memory to hold writes.
Oh, and unless additional (external) backup power is provided, the system that has 72-hour hold up will simply lose the data once the internal batteries are drained.
As to the importance of algorithms, Symmetrix has a rather uncanny ability to self-optimize its algorithms. For example, you'd expect that an Oracle server running with 128GB of (local) SGA cache using storage on a V-Max with 128GB usable global memory would get little benefit from the V-Max cache. But in fact, the V-Max will deliver as much as 80% cache hit rate, a true testament to cache algorithms that can predict what Oracle is not able to cache locally.
Finally, there's a whole 'nother angle to pursue, and that's how write caches in servers, network and arrays interact with regards to consistency, backups and replication. Maybe you/we'll tackle that one another day :)
Posted by: the storage anarchist | March 05, 2010 at 07:51 AM
A great post! I was pondering read cache just last night and your article answered many of my questions.
Posted by: Jay Livens | March 05, 2010 at 09:55 AM
I was thinking just the opposite as far as good caching algorithms go. My last storage vendor had an interesting setup: the disk storage they OEM'd for their NAS system had such bad algorithms that their best practices included disabling the write cache on the arrays, since it actually hurt performance to have it enabled. Instead they just cached more in the NAS layer.
What was even stranger to me is despite having write cache disabled on the disk arrays they still kept batteries in the systems and still wanted to replace them when they expired.
Because of that experience, I suppose, in their proposals to us for a system refresh they wanted us to assume a near 0% hit rate for cache, since the workload was so random. They didn't want to believe me when I said that even if it's really random, cache can really help when organizing/ordering writes to the spindles. The people I was dealing with just didn't have experience with good storage I guess. Of course we ended up not going with their proposal. They knew NAS hands down, but not much beyond that.
And as for battery backups, I think it is pretty creative that some systems include an internal HD that the system can dump the cache to, so you just need a few minutes of battery; then the system can run forever without needing to worry about getting power back.
There was a data center fire here last year at a local facility and power was out for a good 48 hours. They only got it back online by bringing in generator trucks, and it took weeks to restore utility power to the building as a whole. Fortunately my organization wasn't impacted but a couple of my friends were.
Funny enough those friends worked with me at my previous company which I moved out of that facility to another one specifically because that original facility was prone to power outages. Fortunately their storage array was one of the ones which wrote cache to disk before shutting down.
I can only imagine how the people running storage systems relying on battery backed cache (whether it was a storage array or a server with a BBU) felt as the clock ticked by and they didn't know when/if power would get restored.
Posted by: nate | March 06, 2010 at 12:06 AM
Mr. Hollis
I really like the foundation this post provides around storage caching techniques. Unfortunately it fails to cover advanced caching technologies specific to virtual infrastructures.
http://blogs.netapp.com/virtualstorageguy/2010/03/transparent-storage-cache-sharing-part-1-an-introduction.html
Maybe you should add Transparent Storage Cache Sharing to this post. Doing so will demystify the 'magic' available in non-EMC arrays.
Posted by: Vaughn Stewart | March 17, 2010 at 12:00 AM
Vaughn --
I'm posting your "comment" here out of professional courtesy, but I take a dim view of vendors who try to use this blog as a way to shill their latest talking point -- as you are*.
Clever name for an old feature -- "TSCS" -- but, hey, storage cache sharing has been around for over 15 years, maybe it's time for a new name. And let's not forget, NetApp's "cache" only handles reads, and does nothing for writes.
Since the purpose of this post was "Storage Caching 101", I don't think I'm going to be adding vendor-specific views here.
I also think a good understanding of storage caching will be required to appreciate some of the new enabling technology EMC will be delivering before too long, like distributed cache coherence.
The "storage cache" discussion is about to move into an entirely new chapter, IMHO.
(* unless the vendor shilling their latest talking point is me, of course ...)
-- Chuck
Posted by: Chuck Hollis | March 17, 2010 at 05:25 AM