It doesn't take long, when looking at the overall information management problem, to realize that metadata (information about information) is one of the important keys to all of this.
With the right metadata, it's not hard to imagine a world where, simply by looking at the metadata, you could figure out where to store the information, how much protection it needs, how long to retain it, and how to secure it.
Or, in terms of value generation: what are the key attributes of this piece of information that might make it valuable to other parts of the business? Is it easy to find and use?
And then you get into the gritty details of how, and why ...
I was first intrigued by this discussion when EMC started to work on ILM (information lifecycle management). Metadata (or tags) could help a lot in the day-to-day management tasks associated with information.
But it's not as simple or straightforward as we all would have hoped.
Can Labels Be More Important Than The Thing Itself?
In the real world, sure. Five grams of arsenic powder isn't worth much -- the label that screams "POISON!" -- priceless.
When it comes to information, the answer might also be "yes".
A simple thought exercise might help. Imagine a JPG of a group photograph of several important people. Hard to tell -- without some context -- what it all might mean.
Tagging the picture with who's in it might help. That tag could lead you to biographies and context of the participants, if you're interested. A bit of when, where and why it was taken would help as well, for similar reasons.
It might be copyrighted material -- tagging it as such would be useful. Or it might be one of dozens of pictures of essentially the same group, and it's not the best shot. Useful to know that as well.
Or, being a bit more sinister, perhaps the photograph is a record of a meeting whose participants would rather it not be widely publicized. Being able to capture that would be important as well. Or maybe you'd like an audit trail of who saw it, and where it was used.
OK, maybe this is not the perfect example, but you can quickly see how the tags (metadata) might end up being more important than the thing itself.
It's Easy To Think Of Metadata As A Panacea
Gee, this could solve a lot of big problems, couldn't it? Saving money on storage and backup, securing information appropriately, generating audit trails of who is using what where, finding valuable information more quickly, etc.
Yes, but getting there will be hard, won't it?
One way of organizing the discussion is around three questions:
(1) How will tags get generated?
(2) Where will they be stored and maintained?
(3) How will tags be used to save money / reduce risk / generate value?
And, of course, I can only skim the surface of each of these -- apologies in advance.
So, How Do We Generate These Tags?
There doesn't seem to be a "best answer". There seem to be many approaches, with more coming.
The classical approach is "user tags data using an approved taxonomy".
Now, I'm sure there are some cases where this is workable, but -- let's face it -- users generally do a lousy job of filing things away. Everything ends up in "other" or "misc". I think that's for a lot of reasons, some fixable, some not -- but we can't really rely on this to solve everything.
And, having had to use a few of these "official" taxonomies in my life, my general impression is -- well -- they're really awful. I end up spending more time arguing with the logic than doing useful work.
Another approach is observational tagging. This particular hunk of information seems to be popular -- it's getting hit frequently -- so this database / file / object ought to be on a higher service level of storage, maybe protected to a higher level, and so on.
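To make observational tagging a little more concrete, here is a minimal Python sketch. It assumes you already have an access log listing which object was touched; the tier names (`tier1` through `tier3`) and the hit thresholds are hypothetical, picked only for illustration.

```python
from collections import Counter

# Hypothetical service-level thresholds (hits per observation window),
# checked from highest to lowest; the first one met wins.
TIERS = [(1000, "tier1"), (100, "tier2"), (0, "tier3")]

def service_level(hits):
    """Map an observed hit count to a service-level tag."""
    for threshold, tier in TIERS:
        if hits >= threshold:
            return tier
    return TIERS[-1][1]  # only reached for negative counts

def tag_by_activity(access_log):
    """Count hits per object in the log and tag each object by popularity."""
    counts = Counter(access_log)
    return {obj: service_level(n) for obj, n in counts.items()}

log = ["sales.db"] * 1500 + ["q3_report.ppt"] * 120 + ["old_memo.doc"] * 3
print(tag_by_activity(log))
# {'sales.db': 'tier1', 'q3_report.ppt': 'tier2', 'old_memo.doc': 'tier3'}
```

Objects that never show up in the log get no tag at all; a real implementation would have to decide what to do with those (probably the lowest tier).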
Solves a few IT-specific optimization problems, but -- once again -- no panacea.
Related to this is de facto tagging of containers. For example, this is a financial database, so everything in it needs a certain service level, a certain level of protection and security, a pre-defined retention strategy, and so on.
The picture changes radically when we look at unstructured information (or rich content): reports, PowerPoint presentations, spreadsheets, videos, scanned images, etc. etc. etc.
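De facto container tagging amounts to policy inheritance: the object doesn't carry its own tags, it picks them up from wherever it lives. A rough sketch, assuming hypothetical container names and policy values (the seven-year retention figure is just a placeholder, not a recommendation):

```python
# Hypothetical container-level policies: anything stored in a container
# inherits that container's service level, protection, and retention.
CONTAINER_POLICIES = {
    "finance_db": {"service_level": "gold", "protection": "replicated",
                   "retention_years": 7},
    "scratch_share": {"service_level": "bronze", "protection": "none",
                      "retention_years": 0},
}

# Placeholder policy for containers nobody has classified yet.
DEFAULT_POLICY = {"service_level": "silver", "protection": "nightly_backup",
                  "retention_years": 1}

def policy_for(container, overrides=None):
    """Effective policy for an object: the container's policy (or the
    default), with any object-specific overrides applied on top."""
    policy = dict(CONTAINER_POLICIES.get(container, DEFAULT_POLICY))  # copy
    if overrides:
        policy.update(overrides)
    return policy

print(policy_for("finance_db")["retention_years"])  # 7
```

The copy matters: overriding one object's retention shouldn't quietly change the policy for everything else in the container.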
Not only is most of the information growth happening in this type of information (as compared to structured or transactional information), but it's also the wildest and woolliest when it comes to tagging.
It doesn't live neatly in a database. Interpretations of value (or risk) are highly subjective, and are often in the eye of the beholder. Changing context can change what's important -- this customer letter wasn't particularly important, until there was a lawsuit, and then it became very important indeed.
Generating tags for unstructured information will need a different set of approaches.
One promising approach is using domain-specific content filters to analyze files and make some suggestions on how they might be tagged. Most of the interest in this has been around risk mitigation.
As an example, EMC's Infoscape can recognize stuff that looks like credit card numbers, account numbers, someone's home address, or keywords that might suggest a confidential project, and take remedial action, such as tagging it. Or wrapping it in a DRM wrapper to keep it from prying eyes. Or anything else you might imagine.
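I don't know how Infoscape works internally, so treat the following as a generic illustration of the content-filter idea rather than a description of that product. It flags digit runs that look like card numbers (13 to 16 digits, optionally separated by spaces or hyphens), confirms them with the standard Luhn checksum, and adds a simple keyword check; the keyword list and the tag names are made up.

```python
import re

# Candidate card numbers: 13 to 16 digits, each optionally followed by
# a single space or hyphen (but not after the last digit).
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

# Hypothetical keywords suggesting a confidential project.
CONFIDENTIAL_KEYWORDS = ("project_falcon", "do not distribute")

def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def suggest_tags(text):
    """Suggest risk-related tags for a document's text."""
    tags = set()
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            tags.add("contains_card_number")
            break
    lowered = text.lower()
    if any(keyword in lowered for keyword in CONFIDENTIAL_KEYWORDS):
        tags.add("possibly_confidential")
    return tags

print(suggest_tags("Card on file: 4111 1111 1111 1111"))  # {'contains_card_number'}
```

The Luhn check weeds out many arbitrary long numbers, but a filter like this will still miss things and misfire occasionally, which is why content filters alone won't settle the question.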
This helps a bit, but won't completely solve the problem, will it?
Over the last six months, I've been enamored with social tagging and bookmarking. This is when you enlist people to look at content, and encourage them to offer a few tags regarding how they think it should be organized.
The idea (which I can partially validate from personal experience) is that -- over time -- an "enterprise vocabulary" emerges as more people tend to use the same keywords to tag files and documents.
As a quick example, we have a behind-the-firewall social media platform running here, and people are doing a pretty good job of tagging. I can click on "security_strategy" as a tag, and see all the documents that discuss it. It's better than search, because I'll only see things that someone thought were worth tagging, as opposed to umpteen bazillion results from a typical search.
Going a bit further, I can look for authorities on the subject, and see how *they* tagged things, which is even better.
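The "enterprise vocabulary" effect is easy to model. Here's a sketch, assuming each bookmark is recorded as a (user, document, tag) triple; a tag counts as part of the shared vocabulary once some minimum number of distinct people use it. The threshold of two and the sample data are hypothetical.

```python
from collections import defaultdict

def build_index(bookmarks):
    """Index (user, document, tag) bookmarks by tag.

    Returns two dicts: tag -> set of documents, and tag -> set of users.
    """
    docs_by_tag = defaultdict(set)
    users_by_tag = defaultdict(set)
    for user, doc, tag in bookmarks:
        docs_by_tag[tag].add(doc)
        users_by_tag[tag].add(user)
    return docs_by_tag, users_by_tag

def vocabulary(users_by_tag, min_users=2):
    """Tags used by at least `min_users` distinct people, sorted by name."""
    return sorted(t for t, users in users_by_tag.items() if len(users) >= min_users)

def tags_by_user(bookmarks, user):
    """How one person (say, a subject-matter authority) tagged each document."""
    result = defaultdict(set)
    for u, doc, tag in bookmarks:
        if u == user:
            result[doc].add(tag)
    return dict(result)

# Made-up sample data.
bookmarks = [
    ("ann", "threat_model.doc", "security_strategy"),
    ("bob", "threat_model.doc", "security_strategy"),
    ("bob", "roadmap.ppt", "security_strategy"),
    ("cho", "roadmap.ppt", "planning"),
    ("cho", "lunch_menu.txt", "misc"),
]
docs_by_tag, users_by_tag = build_index(bookmarks)
print(sorted(docs_by_tag["security_strategy"]))  # ['roadmap.ppt', 'threat_model.doc']
print(vocabulary(users_by_tag))                  # ['security_strategy']
```

Counting distinct people rather than raw uses keeps one enthusiastic tagger from single-handedly defining the vocabulary.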
Now, all of this presumes a nice social media environment, with lots of useful information in it, with everyone cooperating on the tagging behavior, and so on. I don't live in that world -- yet.
Bottom line -- metadata will come from a wide variety of sources.
We can't pre-determine how it will be generated, and we can't assume an authoritative taxonomy.
We must be prepared for a variety of approaches, and get comfortable with many tagging views of the same piece of information.
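One simple way to live with many tagging views is to record who (or what) asserted each tag, instead of keeping a single flat list. A minimal sketch; the source labels such as `"user"`, `"content_filter"`, `"social"`, and `"observed"` are hypothetical:

```python
from collections import defaultdict

class TagSet:
    """Tags on a single object, remembering which source asserted each one."""

    def __init__(self):
        self._sources = defaultdict(set)  # tag -> set of sources that applied it

    def add(self, tag, source):
        self._sources[tag].add(source)

    def tags(self, source=None):
        """All tags, or only the tags asserted by one particular source."""
        if source is None:
            return set(self._sources)
        return {tag for tag, srcs in self._sources.items() if source in srcs}

    def sources(self, tag):
        """Which sources agree on a tag (empty if nobody applied it)."""
        return set(self._sources.get(tag, ()))

letter = TagSet()
letter.add("customer_correspondence", "user")
letter.add("confidential", "content_filter")
letter.add("confidential", "social")
letter.add("tier3", "observed")
print(sorted(letter.tags("social")))           # ['confidential']
print(sorted(letter.sources("confidential")))  # ['content_filter', 'social']
```

Keeping provenance means no single source has to be authoritative; consumers can decide which views they trust.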
And Where Will The Tags Be Stored And Maintained?
For unstructured content, there are many different alternatives. One debate is whether metadata will live in a federated metadata repository (as in Documentum), or whether it will be stored alongside the content in the storage system itself (as in EMC's Centera). Or a combination of both.
There are arguments for and against each. Let's just say that there's no clear-cut winner at this juncture. But it's hard to argue with the idea that the more consolidated the metadata repository, the more useful it can be.
But it'll make an interesting blog post in the future ...
And I think it's safe to say that any proposed metadata management scheme ought to be infinitely expandable in terms of number of objects, number and types of metadata fields, and so on. It's just too hard to predict the future here.
Another interesting wrinkle will be the security of the tags themselves. It's not hard to imagine scenarios where it might not be good to expose certain metadata widely. There's also the question of which tags are high value and should be kept, and which are of less value and can be discarded.
So I guess we'll end up with meta-metadata at some point. Ouch.
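Meta-metadata sounds worse than it is. A sketch of the idea, assuming each tag carries two extra hypothetical attributes: a visibility level controlling who may see it, and a flag saying whether it's worth keeping. The three-level clearance scheme is invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical clearance levels, from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}

@dataclass
class Tag:
    name: str
    value: str
    visibility: str = "internal"  # meta-metadata: who may see this tag
    keep: bool = True             # meta-metadata: is this tag worth retaining?

def visible_tags(tags, clearance):
    """Return the tags a reader with the given clearance is allowed to see."""
    limit = LEVELS[clearance]
    return [t for t in tags if LEVELS[t.visibility] <= limit]

def prune(tags):
    """Discard tags that have been marked as low value."""
    return [t for t in tags if t.keep]

photo_tags = [
    Tag("subject", "group photo", visibility="public"),
    Tag("attendees", "names withheld", visibility="restricted"),
    Tag("duplicate_of", "IMG_0042", keep=False),
]
print([t.name for t in visible_tags(photo_tags, "public")])  # ['subject']
print([t.name for t in prune(photo_tags)])                   # ['subject', 'attendees']
```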
But what about databases and the like? We'd like to store metadata about the database as a whole -- importance, retention, security, etc. And neither of the above approaches is inherently amenable to storing that kind of IT-relevant information.
Using The Tags -- Even Harder
Easy use cases spring to mind, such as search and finding useful files.
It's not conceptually hard to expose metadata to users or other applications in a straightforward manner. I'm sure there will be lots of nuances in how to best do this, but -- at least conceptually -- I can see a clean line of sight for doing this.
A bit harder is enforcing security. Lots of different approaches here, several of which EMC is pursuing, but there's more work to be done. I'll save this can of worms for a future post.
From an IT perspective, the big payoff will be in linking metadata to IT housekeeping -- service levels, backup, retention, etc. As an example, Documentum offers an add-on, CSS (Content Storage Services), that provides a layered bridge between metadata attributes and the automation of certain aspects of housekeeping.
Intriguing, but the idea really hasn't caught on like you might think. I think the root cause is mostly the lack of overall information management policies; such policies would drive the need for something like this.
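Setting CSS aside (this is not a description of its actual configuration model), the general idea of linking metadata attributes to housekeeping can be sketched as an ordered list of rules. The attribute names, rule contents, and retention periods below are all hypothetical:

```python
# Hypothetical rules: each pairs a condition on metadata attributes with
# housekeeping settings. Rules are applied in order, so later matches
# override earlier ones. An empty condition always matches (defaults).
RULES = [
    ({}, {"storage_tier": "standard", "backup": "weekly", "retain_years": 1}),
    ({"doc_type": "contract"}, {"retain_years": 10}),
    # None means "retain until the hold is lifted".
    ({"legal_hold": True}, {"retain_years": None, "backup": "daily"}),
    ({"access": "hot"}, {"storage_tier": "fast"}),
]

def housekeeping(attributes):
    """Derive housekeeping settings from an object's metadata attributes."""
    settings = {}
    for condition, actions in RULES:
        if all(attributes.get(key) == value for key, value in condition.items()):
            settings.update(actions)
    return settings

print(housekeeping({"doc_type": "contract", "legal_hold": True}))
# {'storage_tier': 'standard', 'backup': 'daily', 'retain_years': None}
```

Because later rules override earlier ones, ordering carries meaning: a legal hold listed after the contract rule wins over a contract's normal retention period, which is what you'd want for that customer letter that suddenly became important. Note that none of this works without the underlying policies being written down first.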
It's All Part Of A Big Transition
No clear answers, but the trend seems obvious, at least to me -- we're moving from a world of managing information, to a world of managing information about information.
As I talk with customers, I can name a few content-intensive (and knowledge-intensive) industries that are well along the road to understanding metadata and its importance, and that have reasonable capabilities for capturing it, managing it, and exploiting it.
Pharma research comes to mind. As does oil exploration.
They take this topic very, very seriously.
But, step outside of a few bright rays of light, and the picture becomes much dimmer.
The real question in my mind is not whether or not metadata becomes really important to most organizations, but exactly how and when.
And I think that's tied up in the broader question of when organizations are going to start managing information the same way they manage money.