As a category, NAS (file-oriented) storage has done extremely well in areas where SAN wasn't a good fit.
But every technology evolves (or gets disrupted!), so it's a fair question -- where does NAS go from here?
And I don't think there's a consensus -- there's a very wide diversity of opinions.
So, let's take a step back, and see where it all might go.
Why Is NAS So Popular?
I guess I got a ringside seat during its initial evolution. Way back when, all the interesting storage was high-performance, block-oriented stuff. That's where the action was (at least for EMC) at the time. Who'd be that interested in files?
But, over time (and to this day), a couple of trends played well into the growth of NAS:
- rapid growth in file-oriented data (as compared to databases and such)
- attractiveness of an easier management model and cost-effective storage networking (think TCP/IP and ethernet)
- improved performance of NAS devices that allowed more and more traditional block-oriented uses (smaller databases, email servers, etc.) to run comfortably on NAS.
We also saw this aggressive small company (NetApp) getting out there and getting business in areas we weren't -- smaller companies, non-critical storage, and so on. And, given our extremely competitive DNA, that got us motivated as well.
So where does the discussion go from here? Lots of threads, but there's one important one that I think is worth focusing on. I probably haven't done a good job of categorization, or covered everything, so don't be too critical, folks.
Let's take a look ...
A Better NAS Box
In this camp are box vendors (including EMC) that are trying to build a better box. Faster. Cheaper. More reliable. More features. Easier to use. Lots of players here, large and small, including folks like Microsoft and Sun.
And, trust me, there's always room in the market for a better box.
But that's not where I think we'll see the real innovation going forward. The model's well understood, and there are lots of great alternatives for folks to choose from, with more choices every day.
Bandwidth NAS
In the perpetual quest for ever-faster NAS (think high performance computing, or digital rendering, or anywhere you'd like a bunch of bandwidth), there are various approaches to lashing together multiple NAS devices to deliver superior performance.
EMC has pursued two courses here. First, we've had a product for a while (MPFSi) that presents a file interface to clients, but does data delivery using block protocols, either FC or iSCSI. The flexibility of the NAS model, with the performance of SAN. Stunning performance numbers as compared to traditional NAS models, where we were no slouch either.
The emergence of pNFS (similar in approach to MPFSi) also will potentially take on additional workloads: not only more bandwidth-oriented applications, but potentially more transactionally-oriented workloads as well.
Secondly, we've partnered with a few folks who do SAN file systems (such as Ibrix on CLARiiON) to fill in the use cases that we can't get to with our own products.
There will always be a market that pays for performance and scale. And I think we'll see a few vendors who carve out a specialty in this space, and continue to invest to get bigger and faster over time.
NAS As Feature
And, in smaller environments, there's strong interest in combined NAS/SAN products that can do both: high-performance / low-latency SAN for the environments that need it, and NAS for everything else.
Call it unified storage, call it whatever, it's just basically a converged environment that keeps you from having to buy two things instead of one. Microsoft has done a great job with their product in this category, as has NetApp.
I happen to be of the belief that a box designed from the ground up to be a file server will only do a mediocre job of being a good SAN box, but there are many, many environments where that consideration will be outweighed by convenience and simplicity.
That thinking is reflected in EMC's product line. Yes, we do iSCSI-over-NAS just like everyone else, but if you want a *real* SAN, we carve up a CLARiiON between NAS and FC SAN.
All well and good, and an interesting part of the market. But I don't think we'll see a lot of radical innovation here, either.
And A Few More Wrinkles
There are folks who believe that the secret sauce going forward will be around new-generation filesystems, like ZFS. The belief is that servers running them will be better NAS boxes.
I beg to differ with this point of view, because being good at NAS takes more than a cool file system running on a server.
And there are those who think this sort of stuff is ripe for commoditization by either progressively cheaper appliances, or rolling your own with leftover servers and a nice open source stack. Sure, we'll see some of this, but I don't see this happening in a big way.
And there's a line of thought around NAS clusters that can share loads and act as one. Sometimes this is called grid storage, sometimes something else. Some of the analysts have started to use the acronym FAN (file area network), but I'm not 100% sure what they mean by it.
And, of course, there are a few interesting incremental technologies coming into this space: 10GbE, data reduction techniques like dedupe, compression and single instancing, and maybe even InfiniBand. None of them is likely to change the game substantially; they'll probably be more like the natural evolution of things.
All interesting. All likely to be in the market in some form. But I think each will miss the mark in meeting what the mainstream of corporate customers will want.
So Where Does It Go From Here?
First, I think that most organizations are waking up to the fact that the next battleground in information management will be in file systems.
They've got a potential handle on databases, and email -- but no control over the file environment.
File systems are growing much faster than other forms of information, and -- as we get better at understanding information security risks -- we'll realize that there are more than a few time bombs waiting out there. There's also a huge treasure trove (or cost sink) around information that's poorly understood.
Put simply, getting control of files will be more important than simply storing them.
Tiering will become more important. Archiving will become more important. Backup and recovery will become more important. Classification will become important.
Managing files as a corporate information resource will become more important.
But to do all of this, you'll need a control point. Something that can look at all your files and make a few decisions.
Should this file be backed up? Archived? Replicated? Deleted? Wrapped in a DRM envelope? Fed to a repository? Indexed for search? Is it a duplicate? A near-duplicate? Can it be compressed? And so on.
Lots and lots of potentially attractive features -- but what's the control point?
Today, there are three major options for control points: client, server and appliance.
In the client model, you load functionality into everything that touches a file. Think Windows file servers, desktops, etc. As an example, EMC's DiskXtender can be used this way. Works well, but at some point, you end up with lots and lots of instances of DiskXtender.
In the server model, you buy NAS devices that have the functionality you're looking for, either embedded, or exposed as an API that others can take advantage of. As an example, in addition to EMC's Celerra native capabilities, it has an API (FileMover) that allows the integration of all sorts of interesting functionality, like archiving.
Most organizations will have multiple NAS devices, so management of all this functionality could get interesting. Going a bit further, this sort of approach limits some of your choices -- I have to choose specific NAS devices to get this sort of functionality.
Hmmm, maybe that's why they did it?
The appliance model is when you buy specific functionality in an appliance, and point it at your file systems. The appliance mounts them up, and looks through them for decisions to make. Usually, you're talking about multiple appliances -- different appliances offer up different functionality.
But there's another important consideration that I think will become more important, and that's immediacy. Today, most appliances process files after-the-fact. The file is created (or updated) and at some point an appliance comes through and decides to archive, or index, or whatever. As an example, this is how EMC Infoscape operates today.
Delayed processing is better than no processing, but -- at some point -- I think the market is going to want near-immediate categorization and processing of files -- especially when it comes to information security.
Why File Virtualization Is Interesting Today
Today, EMC offers Rainfinity, and it's extremely popular.
We walk into an environment that's got file servers proliferating everywhere. Utilization is poor, resources aren't optimized, and IT's efforts to re-rationalize the environment are stymied by the spectre of a long and painful migration. More often than not, it's a NetApp environment.
Simply put, Rainfinity abstracts logical file systems from physical ones.
Users see the file systems they've always seen, yet IT is free to move things around (new file servers, consolidated file servers, etc.) and do so without impacting users -- including moving stuff when it's being used.
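To make the idea concrete, here's a minimal sketch of what "abstracting logical file systems from physical ones" could look like. This is purely illustrative: the `NamespaceMap` class, the share names and the filer names are all made up for the example, and nothing here reflects how Rainfinity is actually built.

```python
class NamespaceMap:
    """Toy logical-to-physical mapping (hypothetical, not a real product API)."""

    def __init__(self):
        # logical share path -> physical location, e.g. "filer-a:/vol1/eng"
        self._backing = {}

    def publish(self, logical_share, physical):
        """Expose a physical location under a stable logical share name."""
        self._backing[logical_share] = physical

    def resolve(self, logical_path):
        """Translate the path a user sees into where the data lives today."""
        matches = [
            share for share in self._backing
            if logical_path == share
            or logical_path.startswith(share.rstrip("/") + "/")
        ]
        if not matches:
            raise KeyError(logical_path)
        share = max(matches, key=len)  # most specific share wins
        rest = logical_path[len(share):].lstrip("/")
        base = self._backing[share].rstrip("/")
        return f"{base}/{rest}" if rest else base

    def migrate(self, logical_share, new_physical):
        """Repoint a share. A real system would move the data first and
        cope with open files; this sketch only updates the mapping."""
        if logical_share not in self._backing:
            raise KeyError(logical_share)
        self._backing[logical_share] = new_physical


ns = NamespaceMap()
ns.publish("/corp/eng", "filer-a:/vol1/eng")
before = ns.resolve("/corp/eng/specs/design.doc")

ns.migrate("/corp/eng", "filer-b:/vol7/eng")
after = ns.resolve("/corp/eng/specs/design.doc")
```

The logical path never changes; only the answer to "where does it live today?" does. That's the whole trick, at least at this level of simplification.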
It's pretty agnostic as to client and NAS device. And, because it selectively filters, it doesn't get in the way of most file server traffic, and native NAS functionality (e.g. replication) is exposed as it's always been.
I like it because it's one of those bounded, high-payback IT projects. Walk in, do a quick assessment, install the product, move things around, and -- voila! -- quick payback with no user pain. Few things in IT life are as simple.
But -- quite correctly -- it's being positioned today as a migration tool to make pain go away.
However, I don't think most people realize how strategic file virtualization could be in the future.
Why File Virtualization Will Be Very Interesting Tomorrow
In the process of doing file virtualization, architecturally the customer has established the all-important control point for every file server in the environment.
That's big.
Rainfinity can potentially filter (as well as virtualize) all sorts of file activity. It does this in a network-centric manner, with very little sensitivity to who's the client and who's the server.
And, architecturally, it can do this relatively cost-effectively without the need for industrial-strength network computing power to keep performance up.
So, where can this new control point go from here?
Well, today it's already being used to provide certain forms of remote replication that are independent of client and NAS device. Like RecoverPoint does for block devices, Rainfinity's replication is network-based and completely agnostic. That makes it architecturally attractive.
And today, it offers a few useful modules for things like capacity utilization, so you can get a quick view of shared file resources.
Recently, it started to offer simple archiving capabilities (either between different tiers of NAS devices, or perhaps to EMC's Centera). The policy is self-contained today, but -- over time -- will probably be opened up to more sophisticated engines.
Now, it's this last feature that begins to show the power and the potential of file virtualization as a control point -- something that can offer multiple types of payback over time.
Let's say that I want to establish a tiered file environment (including archiving) and had to do it across lots and lots of NAS devices that were already on the floor.
I start by defining a few tiers of service (high, medium, low, archive, etc.). I inventory what I've got (files and devices) and figure out if I've got enough capacity, or need some more.
I implement Rainfinity to move everything around to the right place, and users don't notice. I probably free up a lot of capacity in the process. First payback achieved ...
I now turn on simple automated policy moves (e.g. no one has accessed this file in 30 days, so it's going to a lower service level). I implement it once, in one place, for all my file systems. Slick! Second payback achieved ...
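As a rough sketch of that kind of policy, assume each file has a current tier and a last-access time, and that "lower service level" means one step down the list of tiers defined earlier. The names below (`FileRecord`, `plan_moves`, the tier labels, the paths) are hypothetical; this is not the product's policy engine, just the rule written out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Tiers ordered from highest to lowest service level (labels are assumptions).
TIERS = ["high", "medium", "low", "archive"]


@dataclass
class FileRecord:
    path: str
    tier: str
    last_access: datetime


def next_tier(record, now, idle_days=30):
    """Return the tier a file should be on after applying the idle rule."""
    position = TIERS.index(record.tier)
    idle_long_enough = now - record.last_access >= timedelta(days=idle_days)
    if idle_long_enough and position < len(TIERS) - 1:
        return TIERS[position + 1]  # demote by exactly one tier
    return record.tier


def plan_moves(records, now, idle_days=30):
    """List (path, from_tier, to_tier) for every file the rule would move."""
    moves = []
    for record in records:
        target = next_tier(record, now, idle_days)
        if target != record.tier:
            moves.append((record.path, record.tier, target))
    return moves


now = datetime(2007, 6, 21)
records = [
    FileRecord("/corp/eng/old.doc", "high", datetime(2007, 3, 1)),
    FileRecord("/corp/eng/new.doc", "high", datetime(2007, 6, 20)),
    FileRecord("/corp/legal/ancient.pdf", "archive", datetime(2005, 1, 1)),
]
moves = plan_moves(records, now)
```

The point of the sketch is that the rule is written once and applied to every record, regardless of which device the file happens to sit on.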
OK, now let's say the game has changed -- legal has realized that there are files out there with sensitive data that's unprotected, and has come to you about it.
How do you find them? Well, you have half the problem solved already -- you have a single name space and can see everything.
You might add something like EMC Infoscape to do the pattern recognition and disposition, potentially wrapping everything you find with a DRM envelope and leaving the stub for file access. Users will be authenticated when they try to access it -- wherever it goes. Third payback achieved ...
Or, let's say that you want to do anti-virus scans on every file system. Or perhaps index every file so it's searchable. More and more paybacks -- all built on a single architectural control point.
Now, today the integration between Rainfinity and these other functions is casual at best -- they work in the presence of each other. But there's opportunity for more integration, and it's not rocket science.
Shifting to traditional NAS, as an example, EMC Celerra can "quarantine" an updated file until it's looked at, usually in the context of anti-virus. It's called CAVA, I think. It establishes a control point for certain aspects of file management, but only in front of the files it stores behind it.
Well, no reason that particular "quarantine" feature couldn't be implemented at the file virtualization level. Now you have a "quarantine" capability for every file server in your environment -- without touching a single legacy file server, or a single client.
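Here's a toy sketch of what a quarantine gate sitting in the network path might do: hold each updated file until some inspector has looked at it, and only then release it to readers. Everything in it is an assumption for illustration, including the `QuarantineGate` class and the stand-in `toy_scanner`; it doesn't describe how CAVA or Rainfinity actually work.

```python
class QuarantineGate:
    """Holds updated files until an inspector approves them (hypothetical)."""

    def __init__(self, scanner):
        self._scanner = scanner  # callable: bytes -> bool, True means clean
        self._pending = {}       # path -> data awaiting inspection
        self._released = {}      # path -> data that passed inspection

    def write(self, path, data):
        # Any update puts the file back into quarantine.
        self._released.pop(path, None)
        self._pending[path] = data

    def inspect_pending(self):
        """Run the scanner over everything pending; return rejected paths."""
        rejected = []
        for path, data in list(self._pending.items()):
            if self._scanner(data):
                self._released[path] = data
            else:
                rejected.append(path)
            del self._pending[path]
        return rejected

    def read(self, path):
        if path in self._pending:
            raise PermissionError(f"{path} is quarantined pending inspection")
        if path not in self._released:
            raise FileNotFoundError(path)
        return self._released[path]


def toy_scanner(data):
    # Stand-in for a real anti-virus engine: flags one marker string only.
    return b"EICAR" not in data


gate = QuarantineGate(toy_scanner)
gate.write("/corp/mkt/plan.doc", b"quarterly plan")
gate.write("/corp/mkt/bad.exe", b"xx EICAR xx")
rejected = gate.inspect_pending()
```

Because the gate only sees paths and bytes, it doesn't care which file server ends up holding the data, which is the appeal of doing this at the virtualization layer.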
You might decide that different file domains require different flavors of file management. Engineering and legal want everything archived. Marketing has lots of variations of the same files with small changes. Generic user space gets aggressively tiered and de-duped. Strict security policing in other areas. Or versioning to allow a rewind in case of an error.
Use your imagination.
All implemented independently of who's accessing it, or where it's stored -- all functionality (and a single control point) for file management is in the network, and not at the endpoints.
Kind of an attractive scenario, no?
Will It Be More About File Storage, Or File Management?
Yes, NAS will continue to evolve. We'll see more innovations in the devices themselves, and that's good.
But I happen to have the strong belief that most organizations will be increasingly interested in file management over and above simple file storage.
Customers will start to look at the functionality they need today and in the future. Some of it will be to save money, some of it will be to create new value, and a big piece of it will be to avoid new forms of information risk.
And they'll look at a variety of ways to implement it in their environment.
Some will go down an ad-hoc road of bits and pieces here and there. They'll use combinations of client agents, or embedded NAS device functionality, or perhaps a mish-mosh of various post-processing appliances.
Others will look at file virtualization, and say "hey, this makes real sense to put it here".
And, not surprisingly, I've started to run into a few smart customers who've independently come to the same conclusion.
And I think I'll be meeting more in the future.

re: FANs
Just to follow-up on your question about the thinking around the "FAN" descriptor --
Rainfinity would qualify for what is currently being discussed as File Area Networks in SNIA. This is still a taskforce analysis work-in-progress, but the bottom line comes down to any device that exports a file system (like NFS or CIFS) and consumes file systems (like NFS, CIFS, or XAM), provides a single federated name space and does not provide any storage itself. Migration, archiving, etc. are possible add-on services from this perspective.
-edgar
Posted by: Edgar StPierre | June 21, 2007 at 02:05 PM
Thanks, Edgar, for the precise description. Makes sense to me!
Posted by: Chuck Hollis | June 21, 2007 at 02:44 PM