In the spirit of 2010 predictions, I'd like to offer up the prediction that a very specific term will become important during 2010: federation.
It's inevitable, as many of us will have to begin thinking about how pools of resources -- frequently separated by distance -- will communicate and cooperate to act as one thing, rather than a disparate pool of uncoordinated resources.
Perhaps the most interesting pathway into the new "federation" discussion is to review how server virtualization has evolved -- something most of us are familiar with already.
Several years ago, all the discussion was around low-level hypervisors: layers of server firmware that logically partitioned underutilized servers into multiple virtual machines.
VMware successfully changed the game by re-framing the discussion as "pools of servers", rather than individual servers. Indeed, many of the early attention-getting features of VMware were around this foundational theme: Vmotion, DRS and, more recently, dynamic power management and software fault tolerance.
None of these server virtualization features are particularly interesting or useful when limiting our thinking to a single server. However, they all become very compelling when we start thinking about pools of servers, whether in a single location or in multiple locations.
This line of thinking can get you to a working definition of federation: multiple, disparate pools of resources, cooperating together around one or more shared purposes: load balancing, disaster recovery, etc. Indeed, many of the "cool topics" in VMware-land revolve around notions of federation -- SRM, long-distance Vmotion, etc.
And, yes, the industry cloud buzz is also leading us into a broader discussion around federation principles. Why? We want our clouds to work together, don't we? We'd like to be able to federate our internal IT resources with one or more cloud providers; and it logically makes sense that cloud providers would want to federate amongst themselves.
This is not a new idea by any stretch. Squint your eyes, and you'll see successful examples of federation models almost everywhere: telephony, networking, air transportation, global finance, etc.
With mature global infrastructure, it's usually the same end result: multiple entities working together to achieve a shared goal.
What About Storage?
By comparison, federation concepts in today’s storage world are extremely primitive.
We've got the traditional "storage virtualization" discussion which is about as interesting as a commodity hypervisor, by comparison. Nice, but doesn't really address the foundational issues.
We've got different forms of long-distance storage replication, but as we consider these models, we notice that they're extremely brittle and inflexible, in that they have hard definitions about source, target and roles. Not what we want for our "liquid pool of geographically distributed resources".
We've got various mechanisms that present global name spaces, i.e. attempt to make everything look like one giant liquid pool, but do not address the underlying challenges of information logistics and coordination.
And maybe a few things I've forgotten about ...
Clearly, there are pieces and parts lying around the industry for a storage federation model, but none of them are as usable as we'd like them to be in their current form.
Much like Vmotion changed the way we think about server virtualization, I will argue we're going to need a similar foundational abstraction for the storage world, and we're going to need it soon.
Concepts Of Federation In The Storage World
So, let's put a shopping list together of what we'd like this new foundational storage abstraction to do. It's a fun exercise.
First, we'd like our new abstraction to consolidate, pool and abstract arbitrary physical storage resources -- much the way that traditional storage virtualization does today, but with far more scale, flexibility and functionality. And, of course, we'd like to have choices as to whether this exists inside the array, or externally. Choices are good.
Second, we'd like to go beyond today's source-target remote replication model. We'd like to think more in terms of storage nodes and locations that can play multiple roles dynamically, rather than the more simple "you're the recovery site".
Third, we'd need the "flat name space" view across all (storage) resources -- some sort of global LUN namespace that would facilitate identifying and orchestrating different pools of storage resources.
There are probably a few other cool things we'd like (like cache coherency to support multiple concurrent writers), but that's a good start on our shopping list.
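To make the shopping list a bit more concrete, here's a minimal sketch in Python of what such an abstraction might look like. Everything in it is hypothetical and invented purely for illustration (the `StorageNode`, `Federation` and `Role` names, the `place` and `promote` methods): it's not any vendor's API, and it ignores the genuinely hard parts, like actually moving data or keeping caches coherent.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Roles a node can play for a given LUN (hypothetical names)."""
    PRIMARY = "primary"
    REPLICA = "replica"


@dataclass
class StorageNode:
    """One pool of physical storage at one location (illustrative only)."""
    name: str
    location: str
    capacity_gb: int


@dataclass
class Federation:
    """Toy model of the three shopping-list items; not a real product API."""
    nodes: dict = field(default_factory=dict)
    # Item 3: one flat, global LUN namespace: LUN name -> {node name: Role}
    namespace: dict = field(default_factory=dict)

    def add_node(self, node):
        self.nodes[node.name] = node

    def pooled_capacity_gb(self):
        # Item 1: arbitrary physical resources presented as a single pool.
        return sum(n.capacity_gb for n in self.nodes.values())

    def place(self, lun, primary, replicas=()):
        roles = {primary: Role.PRIMARY}
        for name in replicas:
            roles[name] = Role.REPLICA
        unknown = set(roles) - set(self.nodes)
        if unknown:
            raise KeyError(f"unknown nodes: {sorted(unknown)}")
        self.namespace[lun] = roles

    def promote(self, lun, node_name):
        # Item 2: roles are per-LUN and can change at any time, so no
        # site is hard-wired as "the recovery site".
        roles = self.namespace[lun]
        if node_name not in roles:
            raise KeyError(f"{node_name} holds no copy of {lun}")
        for name in roles:
            roles[name] = Role.REPLICA
        roles[node_name] = Role.PRIMARY

    def primary_of(self, lun):
        return next(
            name for name, role in self.namespace[lun].items()
            if role is Role.PRIMARY
        )
```

The one design choice worth noticing is that roles hang off the LUN rather than off the site, so the same array could be primary for one LUN and a replica for another at the same moment.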
I Dub Thee ...
Since we're talking about something relatively new, it'd be helpful to coin a phrase that helps identify what we're specifically talking about, and keeps us from talking about other stuff.
Distributed Storage Federation. Yep, that's it.
Distributed storage federation will be a new foundational storage abstraction that enables multiple pools of physical storage to cooperate around new classes of use cases. Ideally, it would be built on block-level abstractions, since (a) that's where the hot transactional data is, and (b) it's easy to build more convenient abstractions (file, object) on that foundation.
It fits in neatly under a related concept I'll most certainly be talking about during 2010 -- virtual storage. Put differently, distributed storage federation is just one of many unique attributes we'll see supporting the concept of virtual storage.
Who's Going To Want This (Theoretical) Stuff?
Not every new storage technology is immediately in demand by everyone. Lots of different use cases for storage, so it's always worthwhile to spend a moment to figure out who the early adopters might be.
The big category that jumps out here is IT organizations that do business in multiple time zones. Regardless of whether they own multiple data centers, or rent them, there's a certain audience that knows how hard it is to mix distance and synchronized information. They'll be very interested in these concepts, to be sure.
For those of you that follow the EMC product portfolio in some depth, you'll realize that many of these concepts already exist in the EMC Atmos storage platform, albeit for abstracted information objects. And Atmos has already found a strong following for some pretty interesting use cases.
Now, imagine the concepts extended for hot, transactional data and generic block devices. And let's include abstracting away any synchronization or coherency issues while we're at it. Do this for block storage, and you can do it for files and everything else.
If you think about it, these concepts could influence some pretty big numbers for larger IT users, like "how many data centers will you need" and "what can be done across multiple data centers". And, since not all data centers are owned, we've got another interesting piece of functionality for private cloud models and service providers as they emerge.
But I think there's a sub-audience of IT organizations that do business in a single data center that want to build "bigger pools" of storage. Being able to do traditional sorts of storage virtualization things (without traditional sorts of storage virtualization limitations!) might be appealing to them. And the ability to flex the model to incorporate significant distance would be an added plus.
And let’s not forget, many IT organizations – large and small – like the idea of being able to flex workloads back and forth to compatible service providers, the whole “private cloud” thing. Well, moving a set of virtual machines is one thing; moving the terabytes they need is something quite different …
Coming Full Circle
It took several years for most people to understand the full potential of an important foundational server abstraction like Vmotion. It's still changing how we think about compute, and has a long and rich road ahead of it.
In the storage world, the advent of new storage media (flash) and new automatic tiering methods (FAST) has started to change how people think: less about physical storage, and more about virtual storage.
Will this new potential abstraction -- distributed storage federation -- change how people think?
It should be an interesting year :-)

Chris -- you asked "How does a storage federation differ from a storage cluster?" on Twitter, which is a good question.
My answer would be "a cluster is but a single -- somewhat restricted -- example of a broader set of potential federation models".
For example, if we look at the typical VMware cluster, all the components live in the same place, are run by the same person, and are all 100% dedicated to the same set of tasks.
Under a broader federation model, not all the components would need to live in the same place, they would not all have to be administered by the same entity, and they wouldn't have to be 100% dedicated to the same set of tasks.
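If it helps, the distinction can be sketched in a few lines of Python. This is just a toy model built on my own assumptions: the `Member` fields and the `is_cluster` function are hypothetical names made up for illustration. Any group of members counts as a federation here, and a cluster is the special case where all three restrictions happen to hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One server or array in the group (hypothetical fields)."""
    name: str
    location: str
    administrator: str
    dedicated: bool  # True if 100% dedicated to the shared set of tasks


def is_cluster(members):
    """True only for the restricted case: same place, same admin, all dedicated."""
    locations = {m.location for m in members}
    administrators = {m.administrator for m in members}
    return (
        len(locations) == 1
        and len(administrators) == 1
        and all(m.dedicated for m in members)
    )
```

A federation simply drops those three checks, which is why I'd call a cluster one example of a federation rather than a different thing.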
Let's jump from IT to airlines as an example.
In the US, Southwest runs a fleet of identical 737s. I'd call that very similar to a "cluster". However, when I book a ticket with United to Europe, I might be code sharing with multiple airlines, and using all sorts of different aircraft.
Back to IT again.
Imagine I'm running a fleet of 1,000 virtual machines. Some of them live here, some of them live there. Some of them are administered by my own staff, others by my service provider. And not every physical server and/or storage array in my 1,000 VM complex might be 100% dedicated to my needs, or identical in terms of make and model.
Hope that helped more than hurted :-)
-- Chuck
Posted by: Chuck Hollis | January 04, 2010 at 11:25 AM
I don't follow many blogs, but yours is one of them. Please continue the great work. Regards! Keep up the nice blogging.
Posted by: lindahuang | January 06, 2010 at 09:58 PM
In your opinion, where are we with respect to interoperability requirements implied by federation?
(assuming multiple vendors/storage pools in picture similar to the multi-airline example you cite)
Posted by: Sameer Deokule | January 18, 2010 at 04:03 PM
Hi Sameer
Today? Multi-vendor interoperability at any level is very impractical, except, perhaps, network protocols! Somewhere in the stack there needs to be a homogeneous layer that orchestrates its peers.
If we take this to storage, this presumes a software layer that orchestrates multiple flavors of storage around a shared set of outcomes: performance, cost, separation, etc.
Not that I would preview anything EMC is working on ...
-- Chuck
Posted by: Chuck Hollis | January 18, 2010 at 05:21 PM