Evolution and differentiation -- a powerful theme in life sciences. Also in IT.
Just as our current understanding of clouds evolved from what came before, clouds themselves will inevitably evolve and differentiate. Crossbreed cloud concepts with big data concepts, and the offspring is what we're starting to informally call a "datacloud".
Children not only can inherit the best traits of their parents, but often they exhibit talents that neither parent had. The same is true of dataclouds -- clear lineage with the parents, but showing some unique traits as well.
It was great to see the positive reaction to my recent post "From Databases To Dataclouds". Lots of interesting discussions resulted, so now I'm encouraged to share more of the thinking.
Since many of us are familiar with cloud concepts as well as big data concepts, I'd like to outline my case as to why the latter will inevitably shape the former.
I should tread carefully here, lest I inadvertently trigger a flame war from the clouderati as well as the more aggressive clouderistas.
Cloud Concepts 101
Most clouds today are thought of as variable compute services: elastic, on-demand, easy to consume, with pay-per-use pricing. Whether the service is delivered externally or provided by an in-house IT organization is not the defining attribute.
The elastic nature makes clouds attractive for bursty or unpredictable workloads. The self-service and on-demand aspects make them far easier to consume than traditional IT models. At anything larger than moderate size, scale keeps per-unit costs down. Finally, pay-per-use encourages comparisons and helps rationalize consumption.
All good. Now you're a cloud expert.
Big Data Concepts 101
Extracting new wealth from the physical world was the business model of the 20th century; extracting new wealth from the digital world is the model for this century.
Industrial empires have been built from extracting raw materials from the ground, refining them, manufacturing them, and distributing them for easy consumption. We're starting to see the exact same patterns emerge in the burgeoning digital world.
Enter data scientists, Hadoop distros, new business models and all that goes with them.
All good. Now you're also a big data expert.
When Big Data Met Cloud
Today's clouds are certainly useful things, but most of them are essentially efficient consolidation and delivery platforms for many smaller and often unrelated applications. The primary resource at hand is compute, augmented with memory, bandwidth and a modest amount of storage. All good for today's world.
Now, let's consider the broad spectrum of emerging big data applications -- and the environments they run in.
Massive amounts of information are ingested and landed on a shared file system visible to all, with real-time feeds being more valuable than large historic data sets transferred via batch.
Enormous amounts of compute and large memory spaces are needed to analyze and extract immediate value from the raw materials.
Further processing is done by skilled knowledge workers using ad-hoc query tools to probe for new connections and relationships. Finally, batch runs of deep analytics are applied to extract even more value.
The value extracted can either be discovery of new insights and relationships, or perhaps an analytically-enhanced "information product" for a customer or partner. Unlike manufacturing or refining in the physical world, there's no waste product in the digital world: the data collected is often saved and continually reprocessed as a historical data set.
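For readers who think in code, here is a toy sketch of the flow just described: land everything on a shared store, pull an immediate signal from the live feed, let analysts probe the same data ad hoc, then run a batch pass over the retained history. Every name in it (`SharedStore`, `ingest`, `ad_hoc_query`, `batch_analytics`, the alert threshold, the sample feed) is made up for illustration; it stands in for no particular product, and a real environment would put a distributed file system and an analytics engine where the Python lists are.

```python
from collections import defaultdict
from statistics import mean


class SharedStore:
    """Hypothetical stand-in for the shared file system every stage can see."""

    def __init__(self):
        self.records = []

    def land(self, record):
        # Nothing is discarded: landed data stays available for reprocessing.
        self.records.append(record)


def ingest(store, feed, threshold=100):
    """Land each event and extract an immediate signal from the live feed.

    The threshold is an arbitrary illustrative value.
    """
    alerts = []
    for event in feed:
        store.land(event)
        if event["value"] > threshold:
            alerts.append(event["source"])
    return alerts


def ad_hoc_query(store, predicate):
    """An analyst probing the landed data with an arbitrary question."""
    return [r for r in store.records if predicate(r)]


def batch_analytics(store):
    """A batch pass over the full history: average value per source."""
    by_source = defaultdict(list)
    for r in store.records:
        by_source[r["source"]].append(r["value"])
    return {source: mean(values) for source, values in by_source.items()}


# Invented sample feed, for illustration only.
feed = [
    {"source": "sensor-a", "value": 42},
    {"source": "sensor-b", "value": 150},
    {"source": "sensor-a", "value": 58},
]

store = SharedStore()
alerts = ingest(store, feed)
hits = ad_hoc_query(store, lambda r: r["source"] == "sensor-a")
averages = batch_analytics(store)
```

Note that all three consumers read the same landed records. That shared, ever-growing data set, rather than any single application, is what the infrastructure has to be built around.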
Can Today's Clouds Handle Big Data Applications?
Technically, yes -- but I would argue the majority are not sufficiently optimized for big data purposes.
The shared storage spaces don't scale enough, and perhaps they're not performant enough. While there's plenty of compute at hand, that's not sufficient -- relatively large amounts of memory and flash are also required to get the needed performance.
Information logistics gets interesting -- the ability to move large amounts of information into, between and out of application streams -- perhaps ones that are geographically dispersed.
The network fabric looks different. The data fabric certainly isn't the familiar relational database. The orchestration model has a new focus -- it's about end-to-end factory productivity vs. the service level of any one individual application or group of applications.
There's more at hand, but the picture is starting to emerge.
Demand for big data applications will cause today's compute-centric clouds to evolve into newer dataclouds -- purpose-built clouds designed to efficiently extract value from mountains of disparate information feeds.
How It's Starting To Go With Certain Customers
I'd like to think I have a sharp eye for emergent patterns among our customers and partners, and I've clearly spotted one.
Very quickly, the business appetite for more analytics becomes insatiable: more data, more compute, more questions, more performance. More value results, and even more investment is made.
Not infrequently, the new investment in analytics starts to dwarf other forms of IT investment.
At some point, the business wants its insights to be "productized" via a new breed of analytical applications. More data, more compute -- and the need to run these new beasts as production applications. Since more business value comes from the newer apps than the legacy ones, that's where the new investment inevitably goes.
Once this party gets started, the preference for home-grown, cobbled-together hardware and software quickly gives way to a more serious discussion around mature architectures and operational models. Sometimes this is done with the support of the existing IT organization; very often it is done separately using a dedicated team that reports to one or more business units directly.
These teams are now looking for a datacloud: either one that they own and run, or perhaps one they can consume from an IT service provider. And I've lost count of how many times I've seen this starting to happen.
It's Inevitable, Or Should Be
Cloud is inevitable. Big data is inevitable.
Shouldn't their offspring -- the datacloud -- be inevitable?