Our story so far ...
Many now realize that big data analytics is the next big thing in creating value from information.
As such, more than a few people are wondering how they might think about their journey: how they get from where they are today to where they want to be in a sequence of logical, digestible and justifiable steps.
We spent a great deal of time in 2011 learning from organizations that had visibly achieved this proficiency: what kinds of skills were required, how they were organized and, more importantly, how they got to where they are today.
We've boiled down our learning into a relatively simple three-phase model that describes the pathway that many organizations seem to follow.
And, if you're wondering how to get started on your own journey, perhaps you'll find this useful.
A Quick Primer On Big Data Analytics
The ideas -- at their core -- are pretty straightforward.
The typical goal is to establish a predictive capability for the organization by correlating an ever-increasing number of data sets, both inside and outside the organization.
The key actor in this story is the data scientist: an exceptional knowledge worker who can derive insights from these data sets to help the organization make better decisions.
The ideas aren't all that new: you'll find big data analytics (and data scientists) powering web-based companies, financial traders, energy production and more.
What *is* new is their broader applicability, thanks in part to an explosion in relevant information sources coupled with dramatically declining resource costs. As a result, data scientists are now very much in demand, and the rush is on to harvest the new digital wealth.
Many business leaders are looking at this trend and realize they have to play, but might not know where to get started. Personally, I'm increasingly drawn into these discussions, so I'm using a simple model to help explain the core concepts.
Phase 1 -- Learning How To Use All Your Data
If your company is anything like mine, you've got plenty of internal data at hand. Getting at it -- and using it productively -- well, that can be a challenge. So a logical starting point is putting the foundations in place to help the business harvest what it already has access to.
Although classical enterprise data warehouses are well understood, what's different here is the consumption model. The goal is to create a self-service consumption model that empowers anyone wanting to do a bit of analytical work. Choose your data sets, choose your tools, and go. Make the data as easy to get to as possible.
The goal is to create an environment where people can experiment with data. That's a very different mission than the usual operational reporting tasks.
Many classical implementations I've seen make it really hard for people to experiment with corporate data sources. The data might be incomplete, or stale. Perhaps it's been "cleansed" so as not to be useful to many. Or the turnaround time to get access and resources might be painfully long, or getting them might be impossible altogether.
Cranking out the monthly reports: fine. Supporting widespread innovation and experimentation: not so fine.
In terms of widespread big data analytical proficiency, I wouldn't hold up EMC's current internal capabilities as a stellar example. Sure, there are a few places across the business where we're doing real-deal big-data analytics, but -- generally speaking -- we do great operational reporting. However, if you just want to experiment with large, multiple corporate data sets, that's much harder.
To address this, our EMC IT team is in the process of standing up BIaaS as part of their broader ITaaS offering -- business intelligence as a service. Choose the data sets, choose the tools, and you're off to the races with the resources you need -- including a bit of consulting to show you how to use them!
Before long, we'll likely have a general-purpose "BI cloud" and hopefully lots of people who are comfortable using it. Not surprisingly, we're using Vblocks, Greenplum and a bevy of user-selected analytical tools to do this.
The business case is likely a mix of cost savings (lots of big PCs under people's desks doing this stuff today), risk mitigation (not that anyone would -- ahem -- use external resources) and of course value generation through better and more timely decisions.
I'll keep you posted on how it goes.
Phase 2 -- From Data Analysis To Data Science
Now the foundations are likely in place -- technological and cultural -- where we can think about bringing in a dedicated data science team, as well as creating an environment that supports their work.
I've written before about data scientists, what they do and how they're fundamentally different from a traditional business analyst. And, based on what I've seen, you'll need both skill sets.
From a platform perspective, the general purpose environment becomes a purpose-built one, optimized to the tasks at hand.
Think in terms of scale-out architectures, collaborative workflow tools, advanced visualization -- and newer IT supporting roles to run it all. Just to be clear: I have never, ever seen serious data science being done on the back of a general-purpose IT environment.
EMC is investing heavily in the burgeoning field of data science -- from offering advanced education to purpose-built productivity software that's designed specifically around data scientists' needs. And you'll see much more of this in the future.
But, let's assume for the moment you've assembled your data science team, empowered them with resources and support, and -- if all goes well -- a river of major and compelling insights starts pouring out. As a result, everyone across the business starts to 'get it'. Very logically, they want to begin to operationalize and leverage what the data science team is coming up with.
You're now ready to consider the next phase.
Phase 3 -- Creating Analytically-Enabled Applications
It's hard to imagine a core business application or process that wouldn't be dramatically more effective if it were enabled with real-time analytical insight. From retail to health care to financial services to manufacturing to transportation to law enforcement: we're talking about an incredibly broad spectrum of industry use cases -- and we've just begun.
Many of the low-hanging-fruit use cases involve workflows: approving a loan, recommending a course of treatment, scheduling inventory and similar.
At EMC, our IIG team is building tools and use cases around Documentum/XCP to do just that: create a new generation of analytically-enabled core workflows across a broad spectrum of industries.
As more and more data is added and analyzed, outcomes dramatically improve. Plenty of well-documented examples, with many more coming. It works.
Indeed, in a handful of examples the results have been so compelling that an interesting firestorm has resulted. The business loves what the new analytically-enabled applications can do, which drives the data science team to discover new insights, which forces the acquisition and processing of even more data, which creates the new insights which are captured in an even more effective set of applications and processes -- and, well -- we're off to the races.
Yes, they're consuming a lot of technology. But they're also getting incredible value from it.
Other Pathways?
This three-phase model doesn't neatly cover all the stories I've heard. For example, there are cases where substantial analytical proficiency came into an organization through an acquisition. Or a few where analytical proficiency was achieved by using outside services and resources.
But these seem to be the exception, and not the rule. If you're interested, my colleague Josh Kahn has offered up his own interpretation of this three-phase model for your consideration.
The Secret Ingredient?
In each and every one of the stories I've heard about how a traditional organization achieved analytical proficiency, there were always one or two key actors -- typically leaders relatively high in the organization -- who had a clear and compelling vision of what could be achieved, and were prepared to invest in it.
Conversely, I have yet to find an example where analytical proficiency "just happened" as a natural consequence of day-to-day activities.
Someone had to do something. Maybe those stories are out there, but I have yet to see one.
Which leaves me with the inescapable conclusion: the secret ingredient to achieving big data analytical proficiency is -- leadership.
