Maybe you saw this obscure news about an even more obscure industry standard: XAM.
Although I think I've met my quota for blog posts this week (4 so far), I just couldn't resist doing one more, and -- besides -- I'll be travelling next week.
So, what's the big deal here? And why are certain customers watching this one very closely?
Let me try one of my infamous over-simplified explanations.
So, What's The Problem?
Actually, the thinking behind XAM (eXtensible Access Method) addresses three distinct problems in information management, at least the way I see it.
The Metadata Problem
Metadata is information about information. Simple metadata might include things like file name, owner, last access time, length, and so on.
Moderately advanced metadata might be things like application that created it, sensitivity level, retention policy, maybe a few searchable conceptual tags, and the like.
And, like everything else, there's very complex metadata that's entirely possible as well, including what business processes might be interested in this information, who's seen it and for what purpose -- you can get very imaginative when it comes to metadata.
And, as information becomes more and more important, not surprisingly there's more and more interest in having more metadata around to figure out what you've got and what to do with all of it.
Funny, you'll often hear people say that it's possible that we'll end up with more metadata information sloshing around than information itself. I used to think that was outlandish, but not anymore.
And, if you like to play conceptual games, it's easy to imagine meta-metadata (metadata about metadata), and so on.
Up to this point, most useful metadata had to live in a repository of some sort. One of the key features that Documentum provides is that it can organize metadata (and meta-metadata!) without storing the information itself.
But there's a very strong case for having certain kinds of metadata stored with the file or object itself in a permanent and inseparable kind of way. If there's a security concern with a file or object, you'd probably like to embed that sort of information within the object itself, rather than trust an external repository that may or may not be around.
Same with retention policies, or potentially a few other interesting areas.
And, of course, this opens up the possibility that the device (or service) storing this object can act on this metadata to provide value-added services without applications getting involved. Securing it. Retaining it. Shredding it. Searching it. Making it available to other applications that might be interested. Use your imagination.
So the industry needs a relatively standard (and application independent!) manner of encoding this information as part of the stored object. That's one aspect of XAM.
This idea was first introduced to the industry by EMC via Centera, and is one of the key tenets of CAS (content-addressed storage).
Every time you create an object on a Centera, it creates some extended metadata, but applications and users are free to create more if needed. The metadata and the object are bound together. And, if you like, a Centera can use the metadata to provide value-added services, some of which I mentioned above.
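To make the "metadata bound to the object" idea a bit more concrete, here's a minimal Python sketch. It is not the Centera or XAM API; every name in it (StoredObject, MetadataAwareStore, retain_until) is made up for illustration. It only shows the shape of the idea: the store adds some metadata of its own, the application adds more, and the store itself enforces a retention policy without the application getting involved.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredObject:
    """Content and its metadata travel together as one unit (hypothetical type)."""
    content: bytes
    metadata: dict = field(default_factory=dict)


class MetadataAwareStore:
    """Toy in-memory store; an illustration only, not a real product API."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def put(self, content: bytes, **app_metadata) -> int:
        # Application-supplied metadata first, then store-generated fields on top.
        meta = {
            **app_metadata,
            "length": len(content),
            "created": datetime.now(timezone.utc),
        }
        object_id = self._next_id
        self._next_id += 1
        self._objects[object_id] = StoredObject(content, meta)
        return object_id

    def metadata(self, object_id: int) -> dict:
        # Return a copy so callers cannot alter the stored metadata.
        return dict(self._objects[object_id].metadata)

    def delete(self, object_id: int, now=None) -> None:
        # The store acts on the metadata: refuse deletion while retention applies.
        now = now or datetime.now(timezone.utc)
        retain_until = self._objects[object_id].metadata.get("retain_until")
        if retain_until is not None and now < retain_until:
            raise PermissionError("retention policy still in force")
        del self._objects[object_id]


store = MetadataAwareStore()
one_year_out = datetime.now(timezone.utc) + timedelta(days=365)
oid = store.put(b"contract", sensitivity="confidential", retain_until=one_year_out)
print(store.metadata(oid)["sensitivity"])
```

Calling store.delete(oid) at this point raises PermissionError, because the retention date stored with the object hasn't passed yet. The application never had to check.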
And, friends, file systems aren't the answer to this particular problem.
The Filesystem Problem
Simply put, the technology is not keeping up with requirements. Today, we work with people who are already responsible for billions of stored objects, with more coming.
Creating lots of little file systems (even TB-sized ones), and managing them all just won't work. Anyone want to predict how long a PB-sized fsck (file system check) would take? Or how many filesystems you'd need to store a petabyte? Or how you'd manage all of that at an application or infrastructure level?
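The "how many file systems" question is easy to put rough numbers on. Here's a back-of-the-envelope sketch; the file system sizes are my own picks for illustration, and I'm using decimal units (1 PB = 1000 TB).

```python
import math


def filesystems_needed(total_tb: float, fs_size_tb: float) -> int:
    """Number of file systems of fs_size_tb needed to hold total_tb of data."""
    return math.ceil(total_tb / fs_size_tb)


PETABYTE_TB = 1000  # decimal units: 1 PB = 1000 TB

for fs_size_tb in (1, 2, 16):  # hypothetical file system sizes
    count = filesystems_needed(PETABYTE_TB, fs_size_tb)
    print(f"{fs_size_tb} TB each -> {count} file systems")
```

Even at 16 TB apiece, that's 63 file systems to create, mount, check, and keep track of for a single petabyte, and that's before you've left any headroom.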
Having applications store pathnames to files just isn't workable at scale either.
What you'd like to have is what I call a "claim check API".
If you're an application, you make a call to an API that you'd like to store an object. You get back a unique claim check, which you store away. When you want the object back, you present your claim check.
You -- as an application -- have no idea of how or where it's stored, nor do you care. You don't see a file system. The "claim check API provider" takes care of all the details of how and where it's stored.
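Here's roughly what that looks like from the application's side, as a minimal Python sketch. To be clear, this is not the Centera API or the XAM API; ClaimCheckStore, store, and retrieve are names I made up. Deriving the claim check from a SHA-256 hash of the content is also my assumption, chosen because it's a simple way to get a unique ticket.

```python
import hashlib


class ClaimCheckStore:
    """Toy in-memory 'claim check API provider' (hypothetical, for illustration)."""

    def __init__(self):
        # Where and how objects live is the provider's business;
        # a dict stands in for the real storage here.
        self._objects = {}

    def store(self, data: bytes) -> str:
        # Assumption: the claim check is a hash of the content itself.
        claim_check = hashlib.sha256(data).hexdigest()
        self._objects[claim_check] = data
        return claim_check

    def retrieve(self, claim_check: str) -> bytes:
        # Raises KeyError if the claim check is unknown.
        return self._objects[claim_check]


provider = ClaimCheckStore()
ticket = provider.store(b"quarterly report")  # the application keeps only this
assert provider.retrieve(ticket) == b"quarterly report"
```

Notice what's missing: no path names, no mount points, no directories. The application holds on to one opaque string and nothing else.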
All of a sudden, managing petabytes of information and bazillions of objects just got a whole lot easier.
That's another key concept you'll find that was introduced by Centera, and is another core tenet of CAS.
EMC provided a simple API to application developers, which they used to store and retrieve objects.
I've lost track of how many applications support the API, but at one time several hundred had done this integration, so I am led to believe that it couldn't be that hard to implement ...
What we found was that there were a couple of benefits from this approach.
First, the environment was far easier to manage -- both from an end-user perspective and from an application perspective.
Second, there was an opportunity to scale without pain.
Third, the metadata could be put to use in all sorts of interesting ways that file systems couldn't hope to achieve.
And finally, IT had a repository where everything was stored, instead of lots of separate information puddles.
So if the future answer to this problem isn't file systems, and isn't databases, maybe it should be something like XAM.
The Industry Adoption Problem
Centera -- and its API, and its management model -- has been one of the huge successes not only for EMC but in the information management industry.
But, let's face it -- it's an EMC product with an EMC-defined API.
Yes, we're really nice guys, and we're more than willing to share, but this thing ain't gonna take off until the technology (or its specification) is in the public domain, and everyone can play by the same rules.
We recognized that several years back, and started to work with standards organizations (such as SNIA) to define what an open Centera-like API might look like. Lots of people joined in, including some people we fiercely compete with. The idea gained support, and today, we have the draft XAM spec.
But the thinking was pretty clear -- this approach to managing enormous numbers of objects wasn't gonna take off unless everyone wanted to play, and that's what XAM is addressing.
Should You Care?
That depends, doesn't it?
If you see yourself storing millions (or billions) of objects in the future, and having to manage all that stuff in a useful manner -- yes, you should care about XAM, and be rooting for its adoption by both storage vendors and application vendors.
You want to live in a world where you're not wrestling with hundreds or thousands of file systems (or databases), and -- at the same time -- there are lots of applications that work with a shared spec, and lots of value-added storage choices on the back end.
Maybe sorta like what SQL did for the database world.
And How Do I Find Out More?
Well, I'm sure there are good resources over at the SNIA website where you can learn more. It's at draft spec stage now, which means that the engineering work is pretty much done, and the political process has begun.
History has shown that with popular standards, you'll see products coming to market in advance of a finalized specification. I think that will be the case here, and I'm waiting for the first slew of vendors (other than EMC, of course) to announce support for the standard via products.
Or, if you can't wait to find out how all this stuff works, you can always check out a Centera ;-)
