Legacy thinking can get you in trouble in so many ways. The challenge is that -- well -- there's so much of it around.
Maxims that seemed to make logical sense in one era quickly become the intellectual chains that hold so many of us back. Personally, I've come to enjoy blowing up conventional wisdom to make room for emerging realities.
I'm getting into more and more customer discussions with progressive IT organizations that are seriously contemplating building platforms and services that meet the broad goal of "analytically enabling the business" -- business analytics as service, if you will.
The problem? The people in charge have done things a certain way for a very long time. And the new, emerging requirements are forcing them to go back and seriously reconsider some of their most deeply-held assumptions.
Like having "one version of the truth". I've seen multiple examples of it getting in the way of organizations that need to be doing more with their data.
Data Chaos -- That's The Problem!
We've all probably been exposed to situations where data chaos has reigned supreme: multiple interpretations of key data elements like "customer" or "purchase order" or perhaps "product". A wave of frustration builds, coupled with a realization that the situation is serious, and something needs to be done.
The characteristic response is to empower a cross-functional data management team to begin to chip away at the problem: create standardized definitions of shared data elements, and then figure out how best to move toward that goal. Enter many familiar IT-related disciplines: master data management, et al.
And there's no debating that this is a good thing: to run a suite of interrelated core business processes effectively, there has to be a reasonable degree of standardization of how data is captured, defined and consumed. No argument there -- it's important work. Wouldn't it be wonderful if we could all get on the same page, data wise?
But you can have too much of a good thing.
Over time, the data management team grows and looks for more work to do. They've made a great contribution by standardizing how the business uses data -- so why not take those standardized definitions and methodologies across the entire business, and get to the proverbial "one version of the truth" that everyone uses for every purpose?
And that is turning out to be a bridge too far.
A Big Company Example?
One useful way to visualize a large enterprise is by characterizing it as a set of interrelated business processes. At a top level, how does stuff get produced, sold, delivered, supported and paid for? Beneath that, many dozens of important but somewhat less-critical processes. New processes emerging that meet new needs, old processes that need to be improved, and so on.
It's one of the more interesting lenses you can put on any organization -- it's basically a continually-improving set of responses to "how do we do X?" where X can be replaced by hundreds or thousands of different questions.
Ideally, it's dynamic and adaptable. I think we've all worked in environments where processes haven't been looked at in a *very* long time :)
An accounts payable function needs a very precise definition of "customer", for example. A marketing function might need a slightly different interpretation, though. As would a customer service organization. Same general notion -- but potentially dozens of useful interpretations.
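To make that concrete, here's a toy sketch -- every name and field below is invented for illustration -- of how one underlying customer record can legitimately support several different functional views at once:

```python
# Hypothetical example: one raw "customer" record, several valid interpretations.
# All field names and values are made up for illustration.

raw_customer = {
    "legal_name": "Acme Corp.",
    "billing_entity": "Acme Holdings LLC",
    "contact": "jane@acme.example",
    "segment": "enterprise",
    "open_tickets": 3,
}

def accounts_payable_view(c):
    # AP needs the precise legal/billing entity -- nothing else
    return {"customer": c["billing_entity"]}

def marketing_view(c):
    # Marketing cares about segment and reachability
    return {"customer": c["legal_name"],
            "segment": c["segment"],
            "contact": c["contact"]}

def customer_service_view(c):
    # Customer service cares about outstanding work
    return {"customer": c["legal_name"],
            "open_tickets": c["open_tickets"]}
```

Same record, three "customers" -- and none of the three views is wrong.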
Now, if we were talking about a core transactional business process, one could make a strong argument for a standardized core of meaning through the life cycle -- we found a customer, we sold them something, we delivered what we sold them, and then we supported them.
And, after we're done, let's dump all of that standardized, normalized data into a data warehouse so we can run useful queries on it, and understand our business a bit better. So far, so good.
But there's much more to data these days than optimizing standardized core business processes.
Let's Now Do Predictive Analytics
Simply put, the goal of big data analytics is to create successively better predictive models for key business processes using data science.
You want to better understand something really, really important: customer behavior, field quality, logistics, healthcare outcomes, financial risk -- the potential list is very long indeed. So you go looking for data.
One important source of data is whatever you've got coming from your core systems. But there's a problem: it's probably been ruthlessly standardized and normalized in support of making those core business processes work better. Like overcompressed audio -- most of the interesting rich signals have been filtered out.
Or, as one data scientist famously quipped: "the oats looked much better before they went through the horse".
These advanced practitioners want unfettered access to the raw data -- captured at the source, with as much audio fidelity as possible. The more diverse relevant data sources, the better.
The data science craft is to build those precious predictive models that can detect, correlate and amplify the important signals. They decide -- based on the task at hand -- what's important and what's not.
Not someone in the data management group.
When Worlds Collide
Part of the rationale for having strong, centralized data management functions is simple: people generally aren't proficient consumers of data. Business users could get into all sorts of problems around misinterpreting or misusing it -- and they do.
Better to have "safe" data: clean, sanitized and fit for purpose, right?
Enter the data science practitioners. They are -- as a group -- *extremely* proficient at consuming data. The training wheels, airbags and restraining systems simply get in the way of them doing their job.
And they get frustrated as a result, with inevitable consequences.
A Short Detour?
I came across a version of this problem several years ago when we were designing EMC's internal social platform, EMC|ONE. One of the most contentious debates was around taxonomy: shouldn't there ideally be a single, orderly, company-wide categorization system that helped everyone find what they were looking for?
Imagine how much time and effort would be wasted debating how hundreds of thousands of random topics should be organized. Besides, aren't useful taxonomies in the eye of the beholder? What's useful to one person is completely irrelevant to another. Our business was moving fast, so we'd have to come back every so often and re-jigger things to reflect the new reality -- and confuse everyone in the process.
But there was a more important point, I thought.
The real goal of the exercise was to help teach social proficiency skills, and one of those key skills is knowing how to find what you're interested in on the great big web: search, follow, etc. Nobody makes the internet a neat and orderly place that we can all agree on.
We are continually learning how best to get what we want from it -- it never ends, does it?
But the adherents to orthodoxy would not back down even a bit. I finally pulled the "I'm in charge" card, and we moved on. But it was an important decision at the time, as I was vastly outnumbered by some very stubborn people who had the infinite patience to marshal all sorts of well-intentioned (but ultimately invalid) arguments.
The Bottom Line
Clearly, when we're talking about standardized business processes and transactional information flows, there is (and always will be) a need for the one version of the truth -- in that specific domain.
But getting good at big data and predictive analytics is all about learning: discovering new relationships between wildly divergent information sources. You're using information in entirely new and fascinating ways.
Part of that inevitably implies developing the skills to do so proficiently -- something most of us are going to need to get better at.
One example: in EMC's nascent BAaaS (business analytics as a service) platform, you can have it either way.
If you prefer standardized, normalized and sanitized corporate data sources, here they are. Or, if you're going a bit further, here are the raw feeds (along with full disclaimers) as well as the self-service tools to impose your own purposeful views on data quality, management and integration.
But the assumption is that you know what you're doing, and why. And we're finding a surprising number of people across the organization are learning to do just that. Yes, there are still pools of orthodoxy in the organization -- why can't we just have one version of the truth?
Because when you have multiple competing versions of the truth, you get into some very useful and productive discussions :)