Evolution and differentiation -- a powerful theme in life sciences. Also in IT.
Just as our current understanding of clouds evolved from what came before, clouds themselves will inevitably evolve and differentiate. Crossbreed cloud concepts with big data concepts, and the offspring is what we're starting to informally call a "datacloud".
Children not only can inherit the best traits of their parents, but often they exhibit talents that neither parent had. The same is true of dataclouds -- clear lineage with the parents, but showing some unique traits as well.
It was great to see the positive reaction to my recent post "From Databases To Dataclouds". Lots of interesting discussions resulted, so now I'm encouraged to share more of the thinking.
Since many of us are familiar with cloud concepts as well as big data concepts, I'd like to outline my case as to why the latter will inevitably shape the former.
I should tread carefully here, lest I inadvertently trigger a flame war from the clouderati as well as the more aggressive clouderistas.
Most clouds today are thought of as variable compute services: elastic, on-demand, easy to consume with pay-per-use. Whether that is delivered externally as a service or provided by an in-house IT organization is not the defining attribute.
The elastic nature makes clouds attractive for bursty or unpredictable workloads. The on-demand aspect makes them far easier to consume than traditional IT models. At anything larger than moderate size, scale keeps per-unit costs down. Finally, pay-per-use encourages comparisons and helps rationalize consumption.
All good. Now you're a cloud expert.
Big Data Concepts 101
We're in the midst of a perfect storm: oceans of easily accessible information sources, cheap computing resources and amazingly powerful analytical tools.
Extracting new wealth from the physical world was the business model of the 20th century; extracting new wealth from the digital world is the model for this century.
Industrial empires have been built on extracting raw materials from the ground, refining them, manufacturing them, and distributing them for easy consumption. We're starting to see the same patterns emerge in the burgeoning digital world.
Enter data scientists, Hadoop distros, new business models and all that goes with it.
All good. Now you're also a big data expert.
When Big Data Met Cloud
Today's clouds are certainly useful things, but most of them are essentially efficient consolidation and delivery platforms for many smaller and often unrelated applications. The primary resource at hand is compute, augmented with memory, bandwidth and a modest amount of storage. All good for today's world.
Now, let's consider the broad spectrum of emerging big data applications -- and the environments they run in.
It's best to think of them as a refinery or manufacturing line.
Massive amounts of information are ingested and landed on a shared file system visible to all, with real-time feeds being more valuable than large historic data sets transferred via batch.
Enormous amounts of compute and large memory spaces are needed to analyze and extract immediate value from the raw materials.
Further processing is done by skilled knowledge workers using ad-hoc query tools to probe for new connections and relationships. Finally, batch runs of deep analytics are applied to extract even more value.
The value extracted can either be discovery of new insights and relationships, or perhaps an analytically-enhanced "information product" for a customer or partner. Unlike manufacturing or refining in the physical world, there's no waste product in the digital world: the data collected is often saved and continually reprocessed as a historical data set.
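For readers who think in code, here is a minimal sketch of that refinery flow in Python. It is a toy illustration only: the Refinery class, its method names and the sample records are all hypothetical, and an in-memory list stands in for the shared file system.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Refinery:
    """Toy model of the refinery stages; every name here is hypothetical."""
    landed: list = field(default_factory=list)  # stands in for the shared file system

    def ingest(self, records):
        # Stage 1: land raw records where every later stage can see them.
        self.landed.extend(records)

    def immediate_value(self, records):
        # Stage 2: quick analysis of the newest records (a count per source).
        return Counter(r["source"] for r in records)

    def ad_hoc_query(self, predicate):
        # Stage 3: an analyst probes the landed data with an arbitrary filter.
        return [r for r in self.landed if predicate(r)]

    def batch_analytics(self):
        # Stage 4: a deep pass over everything retained so far (total value per source).
        totals = Counter()
        for r in self.landed:
            totals[r["source"]] += r["value"]
        return dict(totals)


refinery = Refinery()
feed = [
    {"source": "sensor", "value": 3},
    {"source": "web", "value": 5},
    {"source": "sensor", "value": 4},
]
refinery.ingest(feed)
fresh = refinery.immediate_value(feed)
big = refinery.ad_hoc_query(lambda r: r["value"] >= 4)
totals = refinery.batch_analytics()
```

Note that nothing is thrown away along the way: every stage reads from the same landed data, which stays available for reprocessing later.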
Can Today's Clouds Handle Big Data Applications?
Technically, yes -- but I would argue the majority are not sufficiently optimized for big data purposes.
The shared storage spaces don't scale enough, and perhaps they're not performant enough. While there's plenty of compute at hand, that's not sufficient -- relatively large amounts of memory and flash are also required to get the needed performance.
Information logistics gets interesting -- the ability to move large amounts of information into, between and out of application streams -- perhaps ones that are geographically dispersed.
The network fabric looks different. The data fabric certainly isn't the familiar relational database. The orchestration model has a new focus -- it's about end-to-end factory productivity vs. the service level of any one individual application or group of applications.
There's more at hand, but the picture is starting to emerge.
Demand for big data applications will cause today's compute-centric clouds to evolve into newer dataclouds -- purpose-built clouds designed to efficiently extract value from mountains of disparate information feeds.
How It's Starting To Go With Certain Customers
I'd like to think I have a sharp eye for emergent patterns among our customers and partners, and I've clearly spotted one.
Early experimentation with big data analytics inevitably produces amazing results with modest investments.
Very quickly, the business appetite for more analytics becomes insatiable: more data, more compute, more questions, more performance. More value results, and even more investment is made.
Not infrequently, the new investment in analytics starts to dwarf other forms of IT investment.
At some point, the business wants their insights to be "productized" via a new breed of analytical applications. More data, more compute -- and the need to run these new beasts as production applications. Since more business value comes from the newer apps than the legacy ones, that's where the new investment inevitably goes.
Once this party gets started, the preference for home-grown, cobbled-together hardware and software quickly gives way to a more serious discussion around mature architectures and operational models. Sometimes this is done with the support of the existing IT organization; very often it is done separately using a dedicated team that reports to one or more business units directly.
These teams are now looking for a datacloud: either one that they own and run, or perhaps one they can consume from an IT service provider. And I've lost count of how many times I've seen this starting to happen.
It's Inevitable, Or Should Be
Cloud is inevitable. Big data is inevitable.
Shouldn't their offspring -- the datacloud -- be inevitable?

Great blog. Inspired me - Database Evolution Revolution
Posted by: CTO Chief | March 13, 2013 at 01:05 AM
A great article which raises some very interesting points which I would like to see further explored.
The successes we've seen in big data applications are in new applications built from scratch on new and highly-performant data models such as Hadoop. However, the vast majority of current applications and current data models are legacy ones, as you note, which must be massively rewritten to take advantage of the datacloud opportunity.
The big opportunity then is to provide a reliable migration path for these legacy apps and data models to these new datacloud versions.
The challenge is cobbling together a partnership between companies who want to utilize dataclouds and companies with the requisite software and hardware technology to provide them. No single company can provide a complete solution either technically, organizationally, or politically. Unfortunately, I don't see these partnerships forming, certainly not in my industry (oil & gas). Do you? How would you see them developing?
Thank you again for your thought-provoking blog.
Posted by: Tom Lasseter | March 14, 2013 at 12:37 PM
Hi Tom -- thoughtful comment, so thanks.
The current structure of the industry and players is organized around legacy value propositions: here's what I used to do in the past.
If you agree with my premises, obviously there will have to be new organizational constructs created (companies, business units, alliances, JVs, etc.) that can cross the historical boundaries. We're all organized around the legacy.
Internally here at EMC, we just formally announced the Pivotal Initiative, which was our internal recognition of the need to "organize for success" across traditional boundaries. Historical assets plus some new ones, put in a new business unit and given a rather radical new mission -- build the technology and the ecosystem for the datacloud.
Simply put, there's a large number of entrepreneurial opportunities at hand, and I'm waiting to see who else starts to throw their hats into the ring.
Thanks!
-- Chuck
Posted by: Chuck Hollis | March 14, 2013 at 12:48 PM
Good post about the cloud concept and big data concept. Learn interesting and new things.
Posted by: Elan - App Development | March 18, 2013 at 08:10 AM