Maybe you saw the recent acquisition of Zantaz by Autonomy. Nice piece in Byte and Switch about not only that deal, but other related activity.
A few months ago, I took a leap of faith and anointed information classification as an up-and-coming "killer app" for the next few years.
I got a few scornful comments at the time. But I think recent activity might be pointing in that direction.
Recent Activity
Most of the recent press has been highlighting the convergence of search and classification (or the other way around, if you prefer).
Bottom line, I think we'll see search and classification co-mingle in interesting ways, and -- just to keep things interesting -- archiving will become part of the mix as well.
Let me see if I can explain.
Enterprise Search -- It's Early Days
First, I think most people realize that a Google-style search model in the enterprise is just a convenient starting point for the discussion, and there's lots and lots more ground to cover before we think of search as a full-fledged enterprise application.
People are usually interested in concepts (rather than keywords) when searching. They have an idea in their head, and they're trying to find stuff that relates. Specifying a conceptual search in terms of a boolean keyword search can be a frustrating exercise in trying to map concepts into keywords.
Ever tried to find anything by the band "Live" using Google? Sheesh ...
They're also interested in seeing related concepts. In the database world, we know about table joins, but what's the analogous concept for the unstructured content world? Concept clusters?
Someone smarter than me has probably figured this out already.
And then, there's the whole issue of privileged search -- who can see what. Does this get enforced in the search engine? File system? Somewhere else? And can you see the whole document, just the metadata, or nothing at all?
Ouch.
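To make the "who can see what" question a bit more concrete, here's a rough sketch of one possible answer: enforce it as a filter on the search results. Everything in it (the Access levels, the Document shape, the per-user acl dictionary) is made up for illustration, not how any particular search engine or file system actually does it.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    NONE = 0      # the user should not even learn the document exists
    METADATA = 1  # title and owner only
    FULL = 2      # title, owner, and body


@dataclass
class Document:
    title: str
    owner: str
    body: str
    acl: dict  # hypothetical ACL: user name -> Access


def visible_results(hits, user):
    """Trim raw search hits down to what `user` is allowed to see."""
    results = []
    for doc in hits:
        level = doc.acl.get(user, Access.NONE)
        if level is Access.NONE:
            continue  # drop the hit entirely
        view = {"title": doc.title, "owner": doc.owner}
        if level is Access.FULL:
            view["body"] = doc.body
        results.append(view)
    return results


hits = [
    Document("Q3 forecast", "finance", "Revenue is ...",
             {"alice": Access.FULL, "bob": Access.METADATA}),
    Document("Merger memo", "legal", "Privileged ...",
             {"alice": Access.METADATA}),
]
print(visible_results(hits, "bob"))
```

Filtering after the search is the simplest place to put the check, but it's only one of the options above; pushing it down into the index or the file system trades simplicity for not having to fetch hits the user can never see.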
But, being a bit visionary here, I can imagine a world where I can easily specify a few concepts to an e-search bot (hopefully learned by watching me a bit, coupled with a decent internal conceptual model), and it goes out and finds everything I have permission to see, and organizes it in a sort-of useful way.
Now that'd be a killer app. Not only in the corporate world, but in the general access world.
And Classification Has Just Started
One way of thinking of classification is pre-applying contextual rules that make sense, regardless of subsequent access method.
This file has what looks like credit card numbers in it, so protect it appropriately.
The email came from our legal department, so we'd better keep it around for a while.
This one here looks like it comes from a named project, so better group it with other information regarding the project (including inheriting any security concerns) ... and so on.
The goal of classification is to help the organization understand the importance of a given data object: keep it around, protect it, organize it, etc.
The goal of enterprise search is to find information quickly in a specific context. Good classification helps enterprise search do its job well.
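The three example rules above (credit card numbers, mail from legal, named projects) can be sketched as a tiny rule-based classifier. This is a toy under loud assumptions: the tag names, the legal.example.com domain and the "bluebird" project are all invented, and the card check is just a digit pattern plus the standard Luhn checksum, not what any shipping classification product does.

```python
import re

# 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")
LEGAL_DOMAIN = "@legal.example.com"   # hypothetical legal department domain
PROJECT_NAMES = {"bluebird"}          # hypothetical named projects


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def classify(item):
    """item is a dict with 'sender', 'subject' and 'body'; returns a set of tags."""
    tags = set()
    for match in CARD_PATTERN.finditer(item["body"]):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            tags.add("protect:cardholder-data")
            break
    if item["sender"].endswith(LEGAL_DOMAIN):
        tags.add("retain:legal")
    for name in PROJECT_NAMES:
        if name in item["subject"].lower():
            tags.add("project:" + name)
    return tags


email = {
    "sender": "counsel@legal.example.com",
    "subject": "Bluebird budget",
    "body": "Card 4111 1111 1111 1111 is on file.",
}
print(sorted(classify(email)))
```

The point is that the tags get attached once, up front, and whatever comes later (search, archiving, retention) can key off them without re-reading the content.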
Now, Let's Add Archiving To The Mix
A few years ago, EMC started promoting active archiving (emails, files, databases) as a way to implement ILM, and -- hopefully -- save money in the process.
It's a simple pitch: take inactive information you can't throw away, and move it to more cost-effective media.
Done right, your apps run faster (smaller production footprints), your backup windows will dramatically shrink (no need to back up something that's archived, right?), you'll save money on storage -- and you'll get all those benefits without starting a revolt from your user community.
And, if you're faced with information management regulations (and who isn't), well, that's another reason to think active archiving.
Turns out that you only want to save stuff that's interesting. Hence the interest in classification.
And, of course, if you've gone to the trouble of archiving something, perhaps you'd like to serve it back to your users to create additional value? Hence the interest in enterprise search.
Now, of course, you can load all of this classification and enterprise search software onto all of your file servers and email servers and application servers, and ... whew, is there an easier way?
Yes -- simply build this functionality as part of the active archive, and limit the impact to production environments. We've already seen dozens of customers working in this direction -- active archiving coupled with value-exploitation at the back end.
A partial list of key technologies for EMC includes classification tools (EmailXtender, InfoScape, DiskXtender, DatabaseXtender), enterprise content management tools that manage metadata and provide flavors of enterprise search (Documentum), information rights management (Authentica), purpose-built archiving storage platforms (Centera) as well as a whole bunch of other products and services.
We've worked hard to establish standards for metadata management in this new world, such as XAM. And made sure that our approach worked with newer collaboration environments, such as SharePoint.
We've made our investment, and we're still in the early rounds of this market. It's fair to say that no one has a lock on this market yet, but I think we've done pretty well to become a serious player.
But this raises the question: where will we see these technologies deployed first?
Meet Your Friendly Corporate Lawyer
Most people don't look forward to a visit with the company's lawyers, but they've got a job to do, and the success of a company usually hinges on them doing their job well.
We've been in a litigious era for quite a while. Add SoX compliance, FRCP and a few more hairy information management requirements, and -- all of a sudden -- most corporate legal departments are starting to get very interested in IT.
They have a vested interest in seeing that information is managed appropriately.
They also have their own workflow of responding to lawsuits and audits that can consume a lot of their time, as well as IT's time.
Do it right, you reduce risks for your company, you save money, and you win more lawsuits. Do it wrong, and you're at a bit of a disadvantage.
Now, of course all business support functions at a company are important, but -- you've got to admit -- the legal guys play a particularly high-stakes game that gets the attention of senior management.
So, what's on their shopping list?
First, they care that each and every information object is classified upon creation (or shortly thereafter) and managed appropriately. Retained for the specified period, and no longer. Kept away from prying eyes if need be. If it's sensitive, a "chain of custody" model that proves it got from source to destination, unmodified.
They're going to want that for all files, emails and database entries throughout the company. If they haven't asked for it yet, they will soon.
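The "got from source to destination, unmodified" part of chain of custody usually comes down to comparing content fingerprints at each hop. Here's a minimal sketch using SHA-256 from Python's standard hashlib; the log format and the function names are my own invention, and a real system would also need the log itself to be tamper-evident (signed, write-once storage, and so on), which this doesn't attempt.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of the content, as a hex string."""
    return hashlib.sha256(content).hexdigest()


def record_transfer(log, content, holder):
    """Append a custody entry for whoever now holds this copy of the content."""
    log.append({
        "holder": holder,
        "sha256": fingerprint(content),
        "at": datetime.now(timezone.utc).isoformat(),
    })


def custody_intact(log):
    """True if there is at least one entry and every hop saw the same digest."""
    return bool(log) and len({entry["sha256"] for entry in log}) == 1


log = []
record_transfer(log, b"contract v1", "mail server")
record_transfer(log, b"contract v1", "archive")
print(custody_intact(log))
```

If any hop records a different digest, the content changed somewhere between the previous holder and that one, which is exactly what the lawyers want to be able to rule out.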
Second, they care that they can find things quickly when they need them. There's a subpoena, and they've got to respond. Or there's a court date looming, and they have to be ready. And, of course, they don't want to miss anything when searching.
Hint: they're not going to be happy with a Google-style boolean keyword search. They're going to want to search the information any number of ways: concepts, chronology, people, departments, etc. -- and do it across emails, files, etc.
Third, they have their own workflow to implement. A request comes in, materials are searched and put aside in a "legal retention locker", work product is produced for regulators and/or opposing counsel, a copy of that has to be saved -- it all gets pretty darn complex.
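One small piece of that workflow is easy to sketch: material sitting in the "legal retention locker" has to survive even after its normal retention period runs out. The function below is a hypothetical illustration of that single rule (the name, the parameters and the day-based retention period are all assumptions), not a description of any e-discovery product.

```python
from datetime import date, timedelta


def may_dispose(created, retention_days, on_legal_hold, today):
    """Return True only if the retention period has passed and no hold applies."""
    if on_legal_hold:
        return False  # a legal hold always wins over the retention schedule
    return today > created + timedelta(days=retention_days)


created = date(2000, 1, 1)
today = date(2007, 7, 10)
print(may_dispose(created, 365, False, today))  # expired, no hold
print(may_dispose(created, 365, True, today))   # expired, but on hold
```

Retained for the specified period and no longer, unless legal says otherwise: the hold check comes first so that an expired retention date can never override it.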
My good friend at EMC Andy Cohen has spent the last few years not only building an e-Discovery offering for EMC, but assembling a group of IT-oriented lawyers who can help customers figure out what they need and implement it effectively.
He has patiently tried to explain the nuances of the e-discovery world to me on several occasions. I've learned enough to conclude it's really important and really complex.
So Why Do I Think Legal Support Will Be The First Battleground for Classification, Archive and Search?
It's simple.
Corporate lawyers have a big, important (and expensive) problem.
Corporate lawyers also tend to get budget approval in a way that IT guys can only dream about.
Put the two together, and I think we'll see more "convergence" activity in corporate legal departments in the next few years.
Not to mention more M&A activity.

Chuck, you're spot on as usual. A few quick thoughts:
- Enterprise search is critical functionality, but it does not solve the eDiscovery challenge. The reason is that once you find the content, you need to be able to do something with it (e.g. copy it, move it, collect it, preserve it, dispose of it, etc.). In other words, companies need to do more than find content; they need to policy manage it.
- Traditionally, the belief was that individuals should be responsible for classifying content, but today there's so much volume that having every person manually classify everything they receive or create would be too burdensome. In the real world, it doesn't work. That said, auto-classification technology is not yet at the point where customers fully trust it, so they want some human validation. The practical result is that there's a need to provide users with classification tools that are low impact. I sometimes talk to customers about users at their desktops acting as "filters" rather than "mini records managers". In other words, they either apply a very simple tag, or they make a decision - "keep or no-keep" - but they are not asked to make granular classifications. --Andy
Posted by: Andrew Cohen | July 10, 2007 at 05:49 PM