I'm sure the first cave dwellers thought fire was a magical thing, until someone got burned, or a forest fire started.
Cheap, plentiful carbon energy did wonderful things for our lives, until we realized we might be melting the ice caps in the process.
Modern mobile technology is transformative and wonderful stuff, yet using smartphones while driving is quickly supplanting alcohol as the #1 cause of automobile accidents.
Social network sites can be amazing things, until we realize just how open we've been with our personal information.
The era of widespread big data and predictive analytics is upon us -- should we be concerned?
Three trends are converging. First, there is a growing torrent of rich, uncorrelated data rivers from all sorts of sources, and there are more of them with every passing day.
Second, the cost to acquire, store and process these data rivers has dropped precipitously, and will continue to free fall.
Third, there is an entirely new generation of tools and trained practitioners who can extract meaning and insight from these diverse data sets in a way not previously possible.
Five years from now, having dedicated teams and resources pointed at this opportunity may become as common as, say, having a corporate web site.
The good news? We'll unleash incredible value and societal good in the process. On average, the world should become a better place for everyone.
But that doesn't mean that there won't be unwanted side effects.
The End Of Privacy
Scott McNealy's eerily prescient quote from 1999 is now more true than ever: "You have zero privacy anyway. Get over it."
It is getting near-impossible to live in the modern world without sensors continually recording your day-to-day activities.
All of those digital crumbs are there for collecting, aggregating and correlating.
No amount of legislation or social code will prevent this from happening.
The prize is just too big to ignore.
While many of us will rightly mourn the loss of our proverbial fig leaves, society as a whole will improve.
There will be less crime, less disease, less corruption, less ignorance, less poverty.
Unfortunately, none of us will be asked to choose which world we prefer.
The Rise Of The Generalist
Deep expertise in a given topic has always been revered and encouraged in our culture and our educational system. Just about everyone wants to be an "expert" on something.
But in a world of widespread big data analytics, there's a new skill set that's even more important: the ever-curious, ever-inquisitive data science professional, looking for new patterns across seemingly uncorrelated data sets. They're the new continually-learning generalists. Their "expertise" is finding relationships that were previously undiscovered.
Their lack of focus on deep domain expertise is frequently turning out to be their most powerful asset. They're not interested in what the experts "know", they're interested in finding what they don't know. The best ones frequently slice across dozens of traditional disciplines without any regard to traditional orthodoxy. They can make some people very, very nervous.
And, based on my observations, they are regularly and predictably whupping the deep-domain experts.
Correlation is when you observe that multiple variables move together. Causation is when you can show that one of them actually drives the other.
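To make the distinction concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the temperature, ice cream and sunburn figures are hypothetical, and the names are mine. Two series end up strongly correlated only because a third variable drives both of them.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: daily temperature drives both series below.
temperature = [rng.uniform(10, 35) for _ in range(365)]
ice_cream_sales = [20 * t + rng.gauss(0, 40) for t in temperature]
sunburn_cases = [3 * t + rng.gauss(0, 8) for t in temperature]

# Neither series appears in the other's formula, yet they move together.
r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation between ice cream sales and sunburn cases: {r:.2f}")
```

The correlation is real and useful for prediction, but banning ice cream would do nothing for sunburn. Spotting the pattern is the easy part; explaining it is where the extra work goes.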
Data science professionals are primarily interested in correlation, and don't seem to care as much about causation. Deep domain experts frequently won't accept the validity of a predictive model unless they can explain and prove causality.
To oversimplify, the data science professional looks at historical data about the sun rising every morning, and makes a statistical prediction that it will continue to do so.
The deep domain professional may be unwilling to accept this prediction unless there's a fully developed and provable model around solar systems and how they work. One is somewhat easy to get to; the other takes a bit more work. Both are important and valuable.
But if all you really care about is "what will happen tomorrow?" the first observation is the more useful one :)
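For the curious, the sunrise example can be sketched in a few lines. One classic way to turn an unbroken historical record into a probability is Laplace's rule of succession, which assumes you start with no opinion at all about how likely a sunrise is. That choice of method is mine, as is the count of 10,000 observed mornings, and other statistical approaches would work just as well here.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's estimate of P(next trial succeeds), given past results.

    Assumes a uniform prior over the unknown success probability.
    """
    return Fraction(successes + 1, trials + 2)


# Hypothetical record: the sun rose on each of the last 10,000 mornings.
days_observed = 10_000
p_sunrise = rule_of_succession(days_observed, days_observed)

print(f"P(sunrise tomorrow) = {p_sunrise} (about {float(p_sunrise):.6f})")
```

Notice that nothing in the calculation knows anything about orbits or gravity. It is a prediction from the record alone, and it never quite reaches certainty.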
Guessing The Future Isn't Predicting The Future
There is an enormous intellectual gap between saying "based on current variables and understanding, this is our best prediction of future outcomes" and saying "this will happen" with some degree of certainty.
And we certainly want to avoid earnest predictions becoming self-fulfilling prophecies.
But we don't have to wait; we've already seen it. Educators have been labeling students for years (based on test scores), with strong evidence that the early label heavily influences the eventual outcome.
Imagine an average HR professional armed with a predictive model that says you're likely to be a "performance problem" in the next few years. Or a car insurance company that charges you more because they think you're likely to have an accident you haven't had yet. Or a mortgage broker who won't loan you money because they predict housing prices in that neighborhood will decline in the next five years.
I, for one, would appreciate transparency in these situations: share your model, and tell me what I can do to improve the likely outcomes.
That's what my doctor does with me.
In The New Organization, Knowledge Is Power
If you work in an organization of any size, you have an appreciation for the multiple power structures that are inevitably in play. Who you know, what you've done, what resources you control, who you influence, etc. -- there are always multiple power levers in play.
Now, let's introduce an entirely new one: powerful predictive analytical models which can be veritable MRIs on organizational performance. All of a sudden, there's an entirely new "power tool" on the table that can be wielded in new and interesting ways.
True story: at this year's EMC leadership conference, we were fortunate to have a handful of very progressive CIOs on one of our panels. I still remember when one of them said "I knew that data was going to be important when people started fighting over who owned it".
During the social era, the ability of people to freely connect, engage and form communities redrew the lines of power in many corporate settings, and is still doing so today. During the forthcoming big data era, the ability of people to freely collect, analyze and share powerful insights will further redraw the lines of power. There will be winners and losers.
And that party is just starting :)
Corporate Governance In The Big Data Era?
Imagine a large, complex organization where multiple groups are eagerly gathering and correlating diverse data sets.
Saltpeter, charcoal and sulfur are pretty benign by themselves, but when you combine them and add a spark, you've got an explosion on your hands. The same will inevitably happen when people start mixing data sets together and adding the spark of creativity.
Sooner or later, you're going to have an explosion. Maybe a really big one.
This presents an entirely new form of challenge when it comes to corporate governance. On one hand, you want to avoid risk, but on the other hand you feel compelled to exploit the value. It'd be nice if we could turn to a well-understood body of accepted practice and precedent when grappling with these issues, but -- well -- that's not the case today.
Indeed, look to social networking sites like Facebook -- and how they're learning to deal with many of these issues -- for a glimpse of what lies ahead.
The issue won't be so much around the data we've gathered. It'll be around how it's being put to use.
And I haven't met many organizations that are prepared to thrive in this new world.
Should We Run Away?
Millions of years ago, when we were just another mammal on the savannah, that instinct probably served us well.
But, at the same time, we're curious and innovative animals. We build tools, and learn to put them to use to transform the world around us. Not everything is always perfect along the way.
If history is any indicator of the future, there's no turning back.