Yet another concept starting to pop up ever more frequently in day-to-day conversations.
Although no formal definition likely exists, my working definition seems to be adequate for the time being: the act of creating new value through harnessing enormous amounts of information.
Yes, though the definitional net is quite large – and many fish are caught as a result – the core thought is the same: big data means that increasingly enormous amounts of data are seen as an inherently good thing vs. an evil to be avoided.
A Refreshing Perspective?
In most IT settings, the people who get value from information (and presumably storage) are largely decoupled from the IT people providing the service. The people using the information see having lots of data available as a good thing; the people provisioning the capacity, protecting it, etc. generally see this attribute as a bad thing.
In big data settings, this isn’t the case: the people deriving value from these massive amounts of information are tightly coupled with the people providing the storage services. Sure, there’s always an interest to do things more efficiently – but not at the expense of limiting the value of the proposition at hand.
Dig a little deeper, and you’ll find a brilliant constellation of very cool applications that span just about every industry. Media creation and distribution. Software and hardware development. Energy exploration, production and distribution. Biotech and healthcare. Retail and distribution. Financial services. Transportation. Manufacturing. Government. And so on.
I challenged myself to find at least two or three cool big data apps in every big vertical I knew about. So far, I haven’t been disappointed.
Some of these go by familiar legacy names: data warehousing, content repositories, active archives, big file systems, etc. I’m finding it very useful to abstract out the details and simply use the term “big data”.
A New Form Of IT Needed For Big Data?
Organizationally, there’s turning out to be a common pattern. Many IT organizations are built around the “shared services” principle: here’s the catalog of things that centralized IT does that are standard, and here’s how we approach special projects that aren’t on today’s menu.
I find the opposite to be true in many big data settings – the IT people working with big data tend to work directly for the business (or organizational) unit that’s using the data. There aren’t usually organizational boundaries at play here – just some people doing important work, and the IT team that directly supports them.
Why? I think it’s safe to say that standard enterprise IT processes and organizational structures weren’t designed for petabyte-class single applications. Many of these folks tend to push the hairy edge of general-purpose products and processes – they need IT that’s designed around the problem vs. designed as a standardized enterprise service.
A Growing Set of EMC Credentials
The first two days this week, I was privileged to join EMC’s annual leadership meeting. The entire agenda was amazing, but – in particular – one of the panel sessions was on this very topic.
Many big data applications are based on enormous, centralized file systems.
We had Sujal Patel from Isilon describe how his target customers quickly outgrew traditional filers designed for general-purpose IT environments. They needed to scale to enormous levels of capacity (and frequently performance) without the need for scaling an army of storage administrators. A scale-out approach -- with automatic rebalancing -- was the only logical approach.
Other big data applications require the analyzing of billions of facts or events, ideally delivered in a self-service environment.
We had Scott Yara from Greenplum (now the core of EMC’s new Data Computing Division) discuss how there were entirely new requirements for scale, performance, efficiency and openness that he felt EMC was ideally positioned for, and how the legacy players were going to have a rough time competing in this new world. A massively scale-out, shared-nothing architecture was the most attractive approach.
Frequently, these big data applications create and use massive amounts of rich metadata, supporting new kinds of collaborative workflows.
Rick Devenuti, the new president of EMC’s Information Intelligence Group (IIG, ex-Documentum, Captiva, et al.), spoke at length about how he was seeing more and more customers building entirely new forms of applications from EMC’s toolset that used enormous amounts of both capacity and rich metadata.
And some of these newer applications want to be inherently geographically distributed and very cloud-like.
Mike Feinberg, SVP of the Atmos group, spoke of how the early adopters of Atmos had built newer web-savvy forms of applications that were now easily scaling to massive numbers of objects – and substantial numbers of geographically distributed nodes. Here again, scale-out principles apply, but in this case with serious distance between the nodes.
Finally, it’s clear that the use cases are rather new, and the supporting infrastructure (and processes) required can be rather new as well. And, unless the IT capability is tightly linked to the specific opportunity at hand, the result will be sub-optimal.
Tom Roloff, SVP of EMC Consulting, shared his stories of specific engagements we’d had around a growing number of big data projects – from the underlying infrastructure design, build and run – to the business-level consulting around the specific goals and outcomes desired by the ultimate users.
I was struck by the blindingly obvious: that there’s no such thing as a “small” big data project :-) There’s always a lot of money being put on the table to achieve a specific outcome. Hence the opportunity for consulting to accelerate the desired outcome.
Another perspective I saw was more back-to-the-future -- EMC was built on providing storage solutions for customers who used then-unprecedented amounts of information to power their businesses. I suppose that big data is nothing more than an extension of what we've always tried to do for customers.
A Thought Exercise
Many of my readers live in the IT world. Try this one on for size – just as an experiment: if you could harness a virtually unlimited amount of data (and associated compute and bandwidth), what new things would be possible in your organization?
How would that create new value? Closer relationships with customers? Could you compete more effectively? Enter new markets? Reduce certain forms of risk?
It’s a safe bet that – due to steadily improving technology and steadily declining technology costs – the previously unthinkable quickly becomes more of a reasonable proposition with every passing day.
I think it’s a very useful thought exercise.
Why?
Because somewhere, someone else in your industry is probably doing the exact same thing right now.
