EMC's SourceOne group just announced their new File Intelligence product. And, as organizations struggle to get their arms around unfettered information growth, we may just have a new and important tool in the arsenal.
You be the judge ...
The Back Story
It's obvious -- information in corporate environments is growing like crazy. Whether it lives in file systems, email boxes, repositories like SharePoint or Documentum -- the forecast is rampant growth followed by more of the same.
If you'd like a truly sobering look at just how much information we're creating now, and are likely to create in the future, I'd suggest this post.
At EMC, we're involved with this topic, especially as we see more customers starting to consume petabytes like popcorn.
Sure, storage optimization technologies -- FAST, compression, cloud archives and whatnot -- can certainly help store information more efficiently.
But no matter how efficient our storage technologies are likely to become, it's obvious that many organizations need a solution that's closer to the source of all this.
And that's where I think the discussion around information governance is so interesting.
A Familiar Theme
If you're a longtime follower of this blog, you probably remember a time when I was discussing this topic frequently.
My case went something like this:
- more and more of our business models are being built on information -- in some ways, it's the new "money" of our digital age.
- whereas many organizations know how to effectively manage money, far fewer are proficient at managing information in all of its forms.
- over time, more and more organizations will be forced to consider the topic of "information governance", a cross-functional approach to balancing costs, risks and value of the organizational information portfolio.
- information governance teams -- over time -- will arguably be at least as important as, say, financial governance or other forms of organizational governance.
So, Where's The Real Problem?
Most people, when they think of information governance, tend to think of transactional data: customer records, structured databases and the like.
It's true that important information lives in these entities, and they're a great starting point. But these structured databases are relatively straightforward to get under control. They are somewhat finite in number, for one thing. Their inputs and outputs are usually well understood. And the role they play in the business is somewhat easy to understand.
And, clearly, any serious approach to information governance will usually start here. But will it be enough?
Many of us believe that the more daunting challenge will lie in unstructured data: all those files, messages, documents, repositories and whatnot that represent the collective intellectual capital of our organizations.
This is where the high-value (and high-risk!) information will likely end up. These are the reports, the summaries, the interpretations, the analysis, the memos -- all the value-add that we as knowledge workers create over and above what lives in traditional databases.
This stuff lives in a lot of different places. It's in a lot of different formats. And it's not amenable to the approaches we typically use for structured and/or transactional data. It's liquid, malleable stuff that's usually freely created and shared.
And, to my way of thinking, it represents the prime battlefield where information governance will play out.
Imagine This
Let's say you're a senior IT leader. You've just gotten a new exciting project.
Go do an analysis on all the unstructured data in the enterprise, no matter where it might be. Find out how much risk we're exposed to. Find out if there's any valuable information there that might be beneficial to others in the organization.
Oh, by the way, see if you can save any money by deleting redundant or unneeded information so we don't have to store, backup and archive all of it. And make sure you get *all* of it, please.
Anyone here up for that task?
Be warned -- we're starting to meet people who have been handed that exact assignment.
So -- where would you start?
EMC SourceOne File Intelligence
If you've got that mental picture in mind, you can understand the appeal of the new EMC SourceOne File Intelligence product.
It can be thought of as a "front end" to the other EMC SourceOne products that do email and file archiving, but there's much more to the picture than that.
It's largely built on the Kazeon technology we acquired a while back that's turned out to be so popular for eDiscovery applications. In some ways, eDiscovery is nothing more than a special use case for a very generic set of capabilities:
- find out what information you have, no matter where it lives or what format it lives in
- assess the found objects for "interest" (value and/or risk)
- drive appropriate workflows if the objects are very interesting, or very uninteresting.
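To make that find / assess / act pattern a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and is not how SourceOne File Intelligence or Kazeon works: the `FoundObject` type, the `RISK_TERMS` watch list, the three-year staleness threshold and the workflow names are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical watch list; a real deployment would use far richer rules.
RISK_TERMS = {"merger", "acquisition", "confidential"}
STALE_AFTER_DAYS = 3 * 365  # assumed threshold for "not accessed in a long while"


@dataclass
class FoundObject:
    location: str           # where the object lives (share, site, repository)
    text: str               # extracted text content
    days_since_access: int  # age of the last access, in days


def assess(obj: FoundObject) -> str:
    """Classify an object as very interesting, very uninteresting, or neutral."""
    words = {w.strip(".,;:!?").lower() for w in obj.text.split()}
    if words & RISK_TERMS:
        return "very interesting"
    if obj.days_since_access > STALE_AFTER_DAYS:
        return "very uninteresting"
    return "neutral"


def route(obj: FoundObject) -> str:
    """Pick a workflow for an object based on its assessment."""
    workflows = {
        "very interesting": "send to review",
        "very uninteresting": "queue for retirement",
        "neutral": "leave in place",
    }
    return workflows[assess(obj)]


# A tiny in-memory inventory standing in for the "find" step.
inventory = [
    FoundObject("sharepoint/deals/plan.docx", "Confidential merger timeline", 12),
    FoundObject("fileserver/old/notes.txt", "Lunch menu ideas", 2000),
    FoundObject("fileserver/team/status.txt", "Weekly status update", 5),
]
for item in inventory:
    print(item.location, "->", route(item))
```

The point of the sketch is only the shape: one inventory, one scoring step, and workflows that fire at both ends of the "interest" scale.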
Using EMC as an example, we've got thousands and thousands of people who work with customers each and every day. They, in turn, produce a torrent of content: emails, files, presentations, etc. about what they're doing and what they've learned. Frankly, our ability to mine this information river is quite poor at present -- at least compared to what the technology can potentially do. It's something we'd like to improve.
Going further, we have the normal "sensitive" stuff we work on as an organization: new products, financial reporting, mergers and acquisitions -- all sorts of information that makes people squeamish when they imagine the wrong information in the wrong place at the wrong time.
EMC SourceOne File Intelligence will give us the capability to cast a very wide net indeed across all of our "information at rest" repositories, and figure out where we stand: file systems, SharePoint, Documentum, our social portal, etc. etc.
"Very uninteresting" works in much the same way: maybe a document in a known location with umpteen copies made in local file systems. Or perhaps content that hasn't been accessed in a very long while. Or perhaps it's a bunch of content associated with a project that really shouldn't be hanging around any more.
Indeed, as EMC virtualizes 100% of our desktops, there's going to be a boatload of redundant stuff they're going to find on my C: drive :-)
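The "umpteen copies" case is the easiest one to picture in code. A common general-purpose technique (an assumption on my part, not a description of the product's internals) is to group files by a hash of their content, so that identical copies end up in the same bucket regardless of name or location. The `find_duplicates` helper and the sample paths below are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group paths whose content is byte-for-byte identical.

    `files` maps a path to its content; a real scan would read from disk
    or a repository instead of an in-memory dict.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(path)
    # Keep only groups with more than one member: those are the redundant copies.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


sample = {
    "share/plan.doc": b"Q3 plan",
    "home/alice/plan-copy.doc": b"Q3 plan",
    "share/notes.txt": b"notes",
}
for group in find_duplicates(sample):
    print("identical copies:", group)
```

Content hashing only catches exact copies; near-duplicates (a renamed slide deck with one changed page) need fuzzier matching.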
Interesting Use Cases
One set of target customers for this product is the people we've already done eDiscovery with. They've seen immediate payback for information governance associated with the legal function; EMC SourceOne File Intelligence simply extends this capability to other parts of the business, and other use cases.
Another set of target customers realize that they've amassed dozens of terabytes of "stuff", and simply want a handle on what they've got in these vast, unstructured repositories. GRC-aware customers realize that they're obliged to have some sort of handle on what they've got -- as well as take measures to manage sensitive information properly.
The thought of a corporate "dumping ground" for all sorts of information with no process or checks -- well, that's the sort of thing that can keep you awake at night.
One customer did something that I found absolutely fascinating -- they used the product as a de-facto taxonomy builder.
Now, if you've ever sat down and tried to come up with a consistent corporate taxonomy, you'll realize that it's a very hard thing to get right. But if you instead analyze your corporate information base, and work out how people are actually using various terms and words, not only is the taxonomy definition process much easier, but the result is far more likely to be accepted by the organization -- since, after all, you're merely reflecting how people are using the terms already!
Another customer is interested in trending "buzz metrics" within the corporate information pool.
What topics are hot? What topics are not? How are various terms and concepts coming up in discussions -- with positive or negative associations? I found that fascinating ... a bit creepy perhaps, but interesting nonetheless.
Another use someone spotted was "find the experts" -- search content for a keyword and/or concept, and then see where it comes from. For example, we've got ~40K employees who are very knowledgeable and very experienced. Sometimes we know who can do what, other times it's a bit of a challenge. Imagine inputting a few concepts, and getting a rank-ordered list of people who generate content associated with those topics.
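As a rough illustration of the "find the experts" idea, here is a small Python sketch that counts how often each author's content mentions the concepts you ask about, then ranks the authors. Everything in it is assumed for the example: the `rank_experts` function, the `(author, text)` input shape and plain keyword counting as the scoring method. A real product would presumably use something smarter than word matching.

```python
from collections import Counter


def rank_experts(documents, concepts):
    """Rank authors by how often their content mentions the given concepts.

    `documents` is an iterable of (author, text) pairs; `concepts` is a list
    of single-word terms. Returns (author, hits) pairs, highest first.
    """
    wanted = {c.lower() for c in concepts}
    scores = Counter()
    for author, text in documents:
        hits = sum(
            1 for word in text.lower().split()
            if word.strip(".,;:!?") in wanted
        )
        if hits:
            scores[author] += hits
    return scores.most_common()


docs = [
    ("alice", "Replication and replication tuning"),
    ("bob", "Budget memo"),
    ("carol", "Notes on replication"),
]
for author, hits in rank_experts(docs, ["replication"]):
    print(author, hits)
```

Raw counts favor prolific writers over genuinely knowledgeable ones, so treat a list like this as a starting point for a conversation rather than an answer.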
I'm sure that more will be found in this vein over time.
Should You Care About Information Governance?
I think the question is more about "when" than "if": sooner or later, just about every organization will care about their information asset base much in the same way that they care about their financial asset base, or their human capital asset base.
The real question is -- how to get started?
And being able to run some simple reports that will undoubtedly get people's attention might be a great place to start ...
