I, like many, saw a brave new world of petabyte-class data sets, combed through by trained data science professionals using advanced algorithms — all in the hopes of bringing amazing new insights to virtually every human endeavor.
It was pretty heady stuff -- and still is.
While that vision certainly is coming to pass in many ways, there's an interesting and distinct offshoot: big data philosophies and toolsets being applied to much smaller use cases with far less ambitious goals.
Call it Modest Data for lack of a better term.
No rockstars, no glitz, no glam, no amazing keynote speeches — just ordinary people getting closer to their data more efficiently and effectively than before.
That’s the fun part about technology: you put the tools in people’s hands, and they come up with all sorts of interesting ways to use them — sometimes quite differently than originally intended.
It Hasn’t Been That Long …
Go back just a few years, and Big Data was like teenage sex: everyone was telling wonderful stories, but precious few had any direct experience, so to speak.
The analysts tried to define Big Data by using a variable number of “v’s”: volume, variety, velocity, veracity, variability, validity, volatility, etc. I guess more V’s made for a better definition.
The rock stars of this brave new world were going to be the hitherto unheralded data science professionals. Harvard Business Review called data scientist “the sexiest job of the 21st century.” And an open source project known as Hadoop was thrust into the spotlight, with corresponding interest from the VC community.
To be fair, the era of really-big-data and advanced analytics is now upon us, and certainly is no fad.
But that’s not really where the action is through my jaundiced enterprise IT lens.
On To Modest Data
I would argue that the market for big data tools and infrastructure has visibly forked into a new and more pragmatic camp. Yes, we still have very Big Data, but now we also have much more Modest Data to consider.
Here’s how the conversation tends to go with an enterprise IT group that’s pursuing what I call Modest Data.
Are you using Hadoop and HDFS today? Yep, we've started to.
What are you using it for? A whole bunch of things we used to use reporting databases and data warehouses for.
Anyone doing data science or predictive analytics? No, not like you read about.
Do you call it “big data?” No …
What value do you see? Easy — the new way is cheaper, it’s faster, and everyone can get what they want.
What’s The Motivation?
The projects are starting to look familiar.
First, there’s some notion of a data lake, or data pond, or perhaps a data ocean. Look inside, and you’ll see lots and lots of data sets from all over the business, all landed in a single logical place.
Logfiles from the websites. Logfiles produced by IT. Extracts of various databases. All sorts of stuff from all sorts of places, landed in a big, shared, cost-effective NFS/HDFS repository that’s designed to be scalable and reasonably performant.
On top of that, it’s pretty much BYOT — bring your own tools. Need your data cleansed or transformed in some way? Help yourself.
Yes, there’s plenty of Hadoop use, but there’s also all sorts of BI-type data navigators (e.g. Tableau) that don’t require any coding, plus the ability for people to load up subsets into familiar databases if need be.
To hear people talk about what they're doing, the motivations are pretty consistent.
First, people are realizing that they now don’t have to spend big money on ever-bigger traditional data warehouses -- and people spend a lot of money on this stuff.
Sure, the enterprise data warehouse might still have a role (legacy never disappears overnight; heck, the mainframe is alive and well) but a surprising amount can now be done using the data pond/lake/ocean and an open tool set.
The same is generally true for the vast hordes of “departmental databases” that people use for playing with their data. Did I ever tell you about the customer who woke up one day and realized there were over 1,500 SQL Server reporting instances running in his shop?
Sure, there are visible cost savings to be had — that much is clear. And the new tools can be wicked fast compared to their legacy counterparts. But I think there's a more important motivation: access and empowerment.
On a broader note, it’s hard to make any sort of informed business decision these days without flexible and convenient access to data and analytics. Everyone is turning out to be a consumer, and the historical one-size-fits-all data access approaches (using only approved and cleansed corporate data!) are inherently limiting.
Smart IT organizations are realizing the limits of the traditional approaches — economic, operational, lack of agility, etc. — and are highly motivated to provide their power users with something better and more cost-effective.
Data Lake + BYOT = BAaaS: business analytics as a service.
Two Important Innovations?
I need to point to two rather important recent developments that I think are driving the explosion in interest.
One is with Hadoop itself — it now can support familiar SQL access methods — which brings along a gigantic ready-made universe of SQL-compatible tools and people who know how to use them. That’s huge.
These people (and their tools) don’t really care what’s organizing the data behind the scenes, so why not use HDFS?
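The “don’t care what’s behind the scenes” point is easy to sketch. As a toy, hypothetical stand-in for SQL-on-Hadoop (no cluster required — this example uses Python’s built-in sqlite3 in place of HDFS and a SQL engine like HAWQ or Hive), the idea is that analysts write plain SQL and never touch the storage layer:

```python
# Toy illustration of the "SQL over whatever storage" idea.
# Hypothetical logfile lines stand in for data landed in a data lake;
# sqlite3 (Python stdlib) stands in for a SQL-on-Hadoop engine.
import sqlite3

# Raw weblog lines, as they might land in a shared repository.
raw_log = [
    "2013-07-01 200 /index.html",
    "2013-07-01 404 /missing.html",
    "2013-07-02 200 /index.html",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblog (day TEXT, status INTEGER, path TEXT)")
conn.executemany(
    "INSERT INTO weblog VALUES (?, ?, ?)",
    (line.split() for line in raw_log),
)

# The analyst's view: ordinary SQL, with no knowledge of
# whether the bytes live in HDFS, NFS, or a local file.
errors_by_day = conn.execute(
    "SELECT day, COUNT(*) FROM weblog WHERE status >= 400 GROUP BY day"
).fetchall()
print(errors_by_day)  # [('2013-07-01', 1)]
```

Swap the storage engine underneath and the query (and the analyst) stays the same — which is exactly why a ready-made universe of SQL tools and SQL-literate people transfers over.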
I would put Pivotal’s recent HAWQ announcement in this category, among others.
The second is that modest-sized Hadoop environments (or any analytical environment) are getting far easier to instantiate and consume, using the same infrastructure and processes used elsewhere in IT — thanks to virtualization and the proliferation of infrastructure-as-a-service.
I would put VMware’s Big Data Extensions announcement in this category — surprisingly popular.
Power Users Everywhere?
So much is made of the marketing function leading the charge in driving the use of analytics, but the picture is much broader than that. HR teams are getting into the action, as are manufacturing, customer service, finance, sales, etc. — it’s hard to imagine any part of a modern business that doesn’t want flexible and convenient access to analytics.
Even the IT team is getting into the water …
Delivering IT services is a business like any other, so IT professionals are finding that the toolsets and models can find ample use in their own domains.
Security professionals who analyze logs to go beyond what the packaged tools can tell them. IT finance professionals who need to forecast demand. Infrastructure professionals who want to come up with reasonable sizing models. Operations people who want to do a better job in finding and fixing problems.
For everyone, the message is simple: the data is all there — it just needs to be harnessed.
Where Do We Go From Here?
Just like Linux eventually found its way into the enterprise and displaced more proprietary operating systems, the Hadoop ecosystem is moving in the same direction for data reporting and analysis. But things seem to be moving much, much faster this time around.
I guess the motivations are pretty compelling, especially if you’re a heavy user, e.g. addicted to analytics.
But I also think this is part of the broader story around the ongoing reshaping of the relationship between IT and the business: IT is responsible for providing cost-effective services that the business wants to consume, and the business is responsible for consuming them intelligently.
Although the term “Modest Data” isn’t nearly as sexy …