One fun exercise is to take terms and concepts we've been using for many years, and look at how their definitions have evolved to the point that they're somewhat unrecognizable compared to where they started.
And today, I'm going to have some fun with LUNs ...
LUN -- The Fundamental Unit Of Storage Allocation
If you'd like a clinical treatment, I'd refer you to the Wikipedia entry here.
For most of us, a LUN is nothing more than a hunk of storage handed over for use by a server and/or application. At one level, it appears as nothing more than a bounded sequence of blocks that can be read or written as needed.
It's about the simplest and most useful storage abstraction we have to work with.
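To make the "bounded sequence of blocks" idea concrete, here is a minimal sketch in Python. The class name SimpleLun, the 512-byte default block size and the method names are all illustrative assumptions, not anything taken from a real array or driver.

```python
class SimpleLun:
    """A LUN modeled as a bounded sequence of fixed-size blocks (illustrative only)."""

    def __init__(self, num_blocks, block_size=512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        # One flat buffer stands in for whatever media actually backs the LUN.
        self._data = bytearray(num_blocks * block_size)

    def _check(self, lba):
        # The "bounded" part: addresses outside 0..num_blocks-1 are rejected.
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"LBA {lba} outside 0..{self.num_blocks - 1}")

    def read_block(self, lba):
        self._check(lba)
        start = lba * self.block_size
        return bytes(self._data[start:start + self.block_size])

    def write_block(self, lba, payload):
        self._check(lba)
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block")
        start = lba * self.block_size
        self._data[start:start + self.block_size] = payload
```

Everything a file system or database needs from the layer below it is those two calls, which is why anything added underneath them is inherited by whatever sits on top.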
LUNs can be used raw, or have higher-order constructs built on top of them -- like file systems, object repositories and databases. Because it's such a low-level abstraction, any functionality offered at the LUN level automatically becomes interesting to anything above it.
Most people think of a LUN as a physical disk, or a piece of a physical disk. At one level, there's nothing wrong with that visualization, but -- given the rapid pace of technological advancement -- that sort of visualization is becoming severely antiquated.
Let's take a look ...
From Many, One
Early on, people hit on the idea that what looked like a single physical hunk of disk could actually be deconstructed into multiple hunks of disk that acted as a single unit.
The rationale for doing so might be improved availability (insert long diatribe about all the RAID flavors here), improved performance (insert long discussion around wide striping and using lots of spindles) or even cost reduction (concatenating multiple smaller leftover disk slices into a more usable aggregate).
While all of this was clever, it represented a clear separation between the physical entity (a portion of a rotating disk drive) and a logical entity (a container of storage for application use).
And, once that separation was made, there was no turning back.
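The striping and concatenation ideas above both boil down to a mapping from a logical block address to a location on some physical piece. A rough sketch, assuming a simple round-robin (RAID-0 style) stripe with no parity; the function names and parameters are hypothetical:

```python
def striped_location(lba, num_disks, stripe_blocks):
    """Map a logical block to (disk index, block on that disk) for round-robin striping."""
    stripe_index, offset = divmod(lba, stripe_blocks)
    row, disk = divmod(stripe_index, num_disks)
    return disk, row * stripe_blocks + offset


def concatenated_location(lba, slice_sizes):
    """Map a logical block to (slice index, block within slice) for concatenated slices."""
    for index, size in enumerate(slice_sizes):
        if lba < size:
            return index, lba
        lba -= size
    raise IndexError("logical block beyond end of LUN")
```

With 4 disks and 8-block stripes, logical block 35 lands on disk 0 at block 11; with leftover slices of 100, 50 and 200 blocks, logical block 120 lands in the second slice at block 20. The server sees one contiguous range either way.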
Enter The Cache
Rotating disks are slow compared with the various forms of random-access memory. In the early 1990s, EMC made a name for itself by introducing the notion of ICDA -- an integrated cached disk array.
It was a big deal at the time, based on a simple idea -- use non-volatile memory to cache the popular bits of data -- all writes, and some portion of the reads. The result was a dramatic improvement in performance, while being able to use lower-cost and lower-performing disk drives.
The use of cache is now an important area of differentiation in storage devices, as we'll see a bit further on.
But our definition of a LUN had now evolved to include not only the physical portions of media involved, but some proportion of non-volatile cache resources as well.
Again, further separation of logical from physical.
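A toy model of the caching idea: all writes land in cache, recently read blocks stay there, and the least recently used block is evicted (and written back to disk if it was modified). This is only a sketch under simplifying assumptions. It uses plain LRU, which is not necessarily what any real array does, it does not model non-volatility at all, and the CachedLun name and its fields are made up for illustration.

```python
from collections import OrderedDict


class CachedLun:
    """Toy write-back LRU cache in front of a slower backing store."""

    def __init__(self, backing, capacity):
        self.backing = backing          # dict of lba -> bytes, stands in for disk
        self.capacity = capacity        # maximum number of cached blocks
        self.cache = OrderedDict()      # lba -> (data, dirty flag), oldest first
        self.hits = 0
        self.misses = 0

    def _insert(self, lba, data, dirty):
        self.cache[lba] = (data, dirty)
        self.cache.move_to_end(lba)     # mark as most recently used
        while len(self.cache) > self.capacity:
            old_lba, (old_data, old_dirty) = self.cache.popitem(last=False)
            if old_dirty:
                self.backing[old_lba] = old_data   # destage on eviction

    def write(self, lba, data):
        # Every write is absorbed by the cache; disk is updated later.
        self._insert(lba, data, dirty=True)

    def read(self, lba):
        if lba in self.cache:
            self.hits += 1
            data, _dirty = self.cache[lba]
            self.cache.move_to_end(lba)
            return data
        self.misses += 1
        data = self.backing.get(lba, b"")
        self._insert(lba, data, dirty=False)
        return data
```

The caller never knows whether a given block came from memory or from disk, which is exactly the point: the "LUN" now includes a slice of cache.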
What You See Isn't Necessarily What You Get
At some point, a lot of people began to notice that an important part of storage inefficiency was over-provisioning. The application and/or server people would ask the storage people for a big hunk of storage, and then only use a small fraction of it.
Part of this could be blamed on the difficulties associated with making storage containers bigger in many server and application environments (since largely resolved), or -- more to the point -- process inefficiencies between different IT teams.
The answer became known as thin (or virtual) provisioning -- the practice of handing over what looks like a big storage container to the application and/or server, but only physically allocating storage once it's written to.
This, of course, saved a good amount of physical capacity, but made the storage team responsible for monitoring overall resources to make sure that logical demands didn't exceed physical resources.
Today, you'll find something like this in just about every modern storage device. And, with its popularity, the LUN became even more abstract.
Although it looked like a single entity, the LUN might be resident on multiple physical devices, might be some composition of disk and cache, and might have both a logical (virtual) view as well as a distinctly different physical capacity assigned to it.
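Thin provisioning can be sketched as a map that only gets an entry the first time a logical extent is written, drawing from a shared physical pool. The ThinPool and ThinLun names, the extent granularity and the oversubscription ratio helper are assumptions for illustration only.

```python
class ThinPool:
    """Shared pool of physical extents that thin LUNs draw from on first write."""

    def __init__(self, physical_extents):
        self.physical_extents = physical_extents
        self.allocated = 0
        self.subscribed = 0   # sum of logical sizes promised to hosts

    def oversubscription_ratio(self):
        # The number the storage team has to keep an eye on.
        return self.subscribed / self.physical_extents


class ThinLun:
    """Looks logical_extents big to the host; consumes pool space only when written."""

    def __init__(self, pool, logical_extents):
        self.pool = pool
        self.logical_extents = logical_extents
        self.mapping = {}     # logical extent -> physical extent id
        pool.subscribed += logical_extents

    def write(self, extent):
        if not 0 <= extent < self.logical_extents:
            raise IndexError("extent outside logical size")
        if extent not in self.mapping:
            if self.pool.allocated >= self.pool.physical_extents:
                raise RuntimeError("pool exhausted: logical demand exceeded physical capacity")
            self.mapping[extent] = self.pool.allocated
            self.pool.allocated += 1

    def physical_usage(self):
        return len(self.mapping)
```

Two LUNs of 100 and 50 extents on a 10-extent pool give an oversubscription ratio of 15, and everything works until the eleventh distinct extent is written. That failure case is the monitoring burden mentioned above.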
And Onwards To Automatic Tiering
Everyone who works with storage knows that it's usually the case that -- at any given time -- only a small portion of a given data set is active. That's the principle that makes non-volatile storage caching so effective, for example.
Take what's popular, put it on the screamingly fast media. Take what's not being used, and put it on the slower, cheaper stuff.
If you've ever spent time looking at application access patterns, it's much more dramatic than the proverbial "80/20" rule. Sometimes, it's more like the "96/4" rule, with 4% of the data being moderately active, and 96% being infrequently used -- if at all!
The more real-world access patterns you look at, the more you'll see this pronounced locality of reference effect -- both reads and writes.
So, why not create a composite LUN out of different types of storage media, and automatically move the bits around as usage patterns change? This sort of approach is moderately interesting when considering the different flavors of spinning disk (FC, SATA, etc.) where you can see a moderate performance ratio between different spindle flavors.
But the approach becomes downright compelling when you start adding enterprise flash storage into the mix, and start seeing a 30:1 performance ratio (perhaps more) between rotating rust and semiconductor storage.
The primary concern people sometimes have with these fully automated approaches is reacting to an unpredictable access pattern. Imagine that your storage array has nicely put the idle data on very slow media, and -- all of a sudden -- it gets popular again.
The storage device has to detect this condition, and quickly move the popular data back to a higher tier. That "reaction time" can be a concern in some use cases.
In some use cases (e.g. file systems and object repositories), that's not a huge deal -- within a few seconds, the array has done its job, and everyone is happy. But for a performance-oriented application, even that sort of delay isn't acceptable.
Our friend non-volatile cache often comes to the rescue here, by masking physical media location while the array is busy re-optimizing. Whether it's the large non-volatile cache found on the Symmetrix, or the newer FAST cache feature on the CLARiiON -- one way of thinking about this is as an "insurance policy" if unpopular data suddenly becomes extremely popular.
The net effect is that storage administrators can get far more aggressive in "down-tiering" their storage pools (increased use of cheaper and slower storage) because the performance penalties associated with a surprise are masked by a moderate pool of non-volatile cache.
Once again, our definition of a LUN has to be extended -- a dynamic mix of multiple media types (physical disk and flash), non-volatile cache, as well as algorithms that automagically shuffle the bits based on usage patterns and pre-defined policies.
Our quaint notion of a LUN being a physical storage unit is starting to look dangerously outdated, no?
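Stripped to its essentials, automatic tiering is a periodic ranking exercise: count accesses per extent, keep the hottest ones on the fast media, and move whatever has changed places. The sketch below assumes a simple "top N by access count" policy with two tiers; real products use their own (and far more sophisticated) algorithms, and the function names here are hypothetical.

```python
def plan_tiers(access_counts, flash_slots):
    """Return (flash_extents, disk_extents) given per-extent access counts."""
    # Hottest first; ties broken by extent name so the result is deterministic.
    ranked = sorted(access_counts, key=lambda e: (-access_counts[e], e))
    return set(ranked[:flash_slots]), set(ranked[flash_slots:])


def moves_needed(current_flash, access_counts, flash_slots):
    """Compute which extents to promote to flash and which to demote to disk."""
    target_flash, _ = plan_tiers(access_counts, flash_slots)
    promote = target_flash - current_flash
    demote = current_flash - target_flash
    return promote, demote


# Example: five extents, room for two on flash.
counts = {"e0": 500, "e1": 3, "e2": 0, "e3": 420, "e4": 1}
promote, demote = moves_needed({"e0", "e1"}, counts, flash_slots=2)
```

In this example e3 gets promoted and e1 gets demoted. The "reaction time" concern is simply how often the ranking is recomputed and how long the moves take, which is where the cache earns its keep.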
Squeeze Me!
Many larger storage objects like LUNs have a lot of "white space" that's amenable to compression, or in particular use cases, redundancies between the storage objects themselves.
Enter the whole topic of compression, deduplication, linked clones and other mechanisms to eliminate redundancy. These approaches first became popular in the backup world due to the preponderance of redundant information (think DataDomain and Avamar), then found their way into tier-2 storage (think file server dedupe) and are now starting to make their way into primary storage use cases (CLARiiON's recent LUN compression comes to mind).
But this concept isn't limited to storage arrays. For example, many of the database and data warehouse tool vendors are starting to include compression as a native feature -- not only to save space, but to accelerate the transfer of data between storage and server. Flavors of compression features are starting to show up in operating systems and hypervisors, for another example.
And I'm sure we'll see more of this going forward.
Going back to our proverbial LUN, it now has a set of compression attributes associated with it. Logical and physical are now even further separated.
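A small sketch of how deduplication and compression widen the gap between logical and physical capacity: blocks are keyed by a content hash so identical blocks are stored once, and each unique block is compressed. The DedupStore class is purely illustrative, uses Python's standard hashlib and zlib modules, and skips things a real system needs, such as reclaiming chunks that are no longer referenced.

```python
import hashlib
import zlib


class DedupStore:
    """Content-addressed block store: identical blocks are stored once, compressed."""

    def __init__(self):
        self.chunks = {}      # sha256 digest -> compressed bytes
        self.block_map = {}   # lba -> digest

    def write(self, lba, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:
            self.chunks[digest] = zlib.compress(data)
        # Overwrites just repoint the block; old chunks are never reclaimed here.
        self.block_map[lba] = digest

    def read(self, lba):
        return zlib.decompress(self.chunks[self.block_map[lba]])

    def logical_bytes(self):
        # What the host believes it has stored.
        return sum(len(zlib.decompress(self.chunks[d])) for d in self.block_map.values())

    def physical_bytes(self):
        # What actually sits on media after dedupe and compression.
        return sum(len(c) for c in self.chunks.values())
```

Write the same 4 KB block of zeros to two addresses and only one compressed copy is kept, so the "white space" costs next to nothing physically.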
Thinking Outside The Box
Up to now in the discussion, LUNs have largely been entities contained within a single storage device. What happens when we start breaking that constraint?
The first round of external storage virtualization attempted to pool array-based storage LUNs, and offer some limited forms of dynamic movement, much like a volume manager running in the server would do. But these approaches were inherently static in nature -- they really hadn't transcended the physical devices that stored them.
Enter the concept of federation: multiple devices acting together as a single, dynamic pool -- either in a single data center location, or across progressively longer distances.
The recent announcement of VPLEX perhaps best characterizes storage federation in the LUN world -- storage containers are now largely independent of not only physical device, but potentially geographical location as well.
LUNs can now freely flow from array to array, or data center to data center as needed. Going further, the same LUN can easily be presented in two physically separate locations if needed.
Our old friend the LUN is now mobile -- it can move, and can actually be in two places at once. A neat trick that most people are just now starting to get their head wrapped around.
Where Does That Leave Us?
Our friend the storage LUN has now come a long way.
It's been disassembled into constituent physical pieces, reassembled in a variety of useful ways, fully automated across multiple media types, cached aggressively, made thin and made compressed, and now fully mobilized across increasing distances.
Forgive me if my over-simplified story has left out some key aspects -- feel free to comment below.
I find it interesting to occasionally look back, and just see how far a simple, basic construct like the LUN can evolve in the face of rampant technological innovation.
I wonder where the LUN is going to go from here?

Hi,
I hope that the V-Max LUN's will some day make the leap into 2010 and be expandable without making it into a meta lun. I don't know if there are still other arrays where concatenation of lun's (meta) is required for expanding lun's on virtually provisioned pools.
Example: largest hyper size on v-max is ~256 GB. Anything beyond this, you have to make it a meta lun.
And also, as we all know, the lun has to be created as meta lun in the beginning. Otherwise it is not possible to expand it afterwards. So, on example a Windows host, all luns must be created as metaluns when shown to a host, in order to be able to expand it later. If I show 10 GB to a windows host, I must create ie. 2 x 5 GB meta lun. When later I want to expand it, I must add (concatenate) another meta members into it. How can this be, even with virtual provisioning? This is from the graves of storage management.
Imagine if it was like this: I want to provision 100 GB to a windows host. I give it a 100 GB lun. After 12 months, I want to expand it into a 500 GB lun. I just tell that expand the lun with 400 GB. No creating meta's, no fuss... And all this with de facto virtual (thin) provisioning or whatever you like to call it.
I sincerely hope that this is in the works, and at least noted as a critical thing to add...
Posted by: soikki | June 07, 2010 at 03:05 PM
Hi Soikki
While I can't argue with you that "boy, wouldn't it be nice to never have to plan ahead on storage configurations", I think that all arrays require some modicum of up-front planning in regards to how you expect to be using it.
Sure, we as vendors should eliminate as many of these "gotta think about it up-front" restrictions, but they're everywhere in every storage product.
Quick question -- why wouldn't you just use virtual provisioning everywhere?
I'm not sure you're 100% up-to-date with the product's capabilities -- some of what you say doesn't jibe with my own understanding, so I'm going to ask someone to check and hopefully get back to you soon.
Thanks for the comment.
Posted by: Chuck Hollis | June 07, 2010 at 03:16 PM
Hi Chuck,
and thanks for the SSD-speed reply :)
We are using virtual provisioning everywhere on v-max, but it still requires creating meta lun's. I guess this comes from the legacy code or something similar.
When you have hundreds of windows hosts using your storage array, how do you plan ahead which of the thousands of luns would require expanding later, and which would not? Let's stick to the subject.
This is not about planning ahead, this is about poor usability, and lack of understanding how large storage arrays should be managed, if this is not going to be changed soon. Or then just the technical issue.
Meta lun's cause horrendous storage management overhead; thousands of hypers/luns to be managed vs. hundreds without having to have meta's.
We have been using pools and wide striping on enterprise high end storage arrays for 1,5 years now so I have some real-life experience on those (v-max for a shorter while but my technical statements here are valid)
Maybe blog is not the correct place for this conversation?
Posted by: soikki | June 07, 2010 at 03:37 PM
Soikki
You've exceeded my detailed product knowledge. If you're interested, I can put you in touch with some of the product architects to (a) offer up your opinions in the hopes of improvement, and (b) check to make sure you are aware of all the capabilities in the product.
I'd encourage you to take me up on this, if possible. If I have been persuasive enough, please drop me a line at hollis (underscore) chuck (at) EMC (dot) com.
Thanks!
-- Chuck
Posted by: Chuck Hollis | June 07, 2010 at 04:12 PM
I look forward to the day when LUNs die and I can stop worrying about what makes them up, and can just ask for "some storage".
As you point out here, we're so far abstracted from what a Logical Unit Number originally was that I'm astounded we haven't killed them off in favour of something better suited to today's storage needs.
Posted by: Justin Warren | June 07, 2010 at 06:29 PM
I had a similar experience as Soikki on a win2k3 machine recently, but from the opposite angle..
My storage system doesn't have an issue expanding LUNs on the fly so when the DBA asked for more space for his SQL server I thought with thin provisioning why not just give him something big like 4TB(current LUN size was 1TB). So I expanded the LUN to 4TB. Presto! the OS decided it didn't want to see anything beyond I think the original 1TB size(memory is kind of foggy). It came down to the MBR on the volume needed to be GPT instead of the old DOS style. Which at the time I didn't think of of course just goes to show.. ended up creating a new LUN, making it GPT and migrating data to it and removing the old.
Ran into another similar issue with vSphere, in trying to map a 6TB LUN via raw device maps to a VM, vSphere wouldn't have it. So I had to split it up into 3x1.9TB LUNs and use LVM at the VM level to get the size I wanted.
Given I was using RDM I wasn't expecting any sort of storage limits, but I guess since VMware uses VMFS volumes to map the RDMs the limits of VMFS file size kicked in when I tried to go beyond 2TB.
So even if LUNs become infinitely flexible it seems the server end of things still need work to catch up.
Posted by: nate | June 08, 2010 at 01:20 PM
Soikki (et al) -
We hear you - loud and clear.
Metas are an artifact of Symmetrix' roots as a mainframe CKD storage array. And we know that MANY of you don't like them.
Symmetrix Engineering has been systematically overhauling the internals of Enginuity and they have made tremendous progress over the lifetime of the DMX family and now VMAX.
In recent history (for example), the time to allocate and assign 1TB of Symmetrix storage to a single host has been reduced from about 1 hour in 2007 to less than 10 minutes in 2010 - and the team won't stop until storage can be ready for first I/O virtually as fast as you can switch from the "allocate storage" window to the "mount storage" window on your servers.
The transition to the use of Virtual Provisioning for ALL applications has begun, and features like Auto Provisioning and Fully Automated Storage Provisioning are foundational to the future of Symmetrix.
Simplifying everything to "give me X GB of class Y storage for server Z" is a vision that is now within our reach; everything that is getting in the way of reaching that target (e.g., Metas, config changes, etc.) will have to be addressed along the way.
Given the size of our installed base and the applications that depend upon Symmetrix, we pretty much have to perform this overhaul with precision; sort of like changing the engine of a race car without leaving the race track.
Oh, and FWIW, Nate is correct - not every file system or volume manager is readily able to take advantage of LUNs that can be expanded...in fact, this was added to Windows only within the past couple of years (Windows Server 2K8 and Win7 were the first Windows platform to support this, I believe).
Thanks for your patience and feedback - we ARE listening!
- Barry
Posted by: the storage anarchist | June 16, 2010 at 04:07 PM
Hi,
sorry I've been away for a while. I'm happy that you are listening... I recognize that your vision is correct and I hope that it will be reached soon. Maybe with the FAST2 -release which I hope is re-write of the code.
As for the comments of OS-versions supporting lun extension, pls do not underestimate the importance of this matter. Windows 2003 accepts lun extension online, later Linux as well... AIX too... These probably account for majority of the SAN-connected servers out there.
Posted by: soikki | July 13, 2010 at 05:54 PM