A seriously cool announcement from the EMC Greenplum folks -- a new "Community Edition" of their powerful database, as well as related open-source analytical power tools to put all that big data to work.
All free.
I had a serious fling with analytics a long time ago. Maybe it's time to rekindle that old flame :-)
The Context
From EMC's perspective, there's an important new trend we're investing in: big data.
The idea is simple: using enormous amounts of information to generate entirely new forms of value. Examples can be found in just about every discipline you'd care to consider. Many of these new use cases focus on analyzing billions of facts in entirely new ways -- power analytics, as I've come to call them.
We're starting to see a new warrior class of interdisciplinary data scientists emerge that are very comfortable with a variety of diverse data sources (and an equally diverse variety of analytical tools) as they repeatedly poke and prod extremely large data sets to unlock new insight.
This isn't your father's classical Enterprise Data Warehouse -- the familiar world of standardized reports, over-cleansed data, and what-happened-last-month processing.
Clearly, the supporting infrastructure to do this work at scale will be very cloud-like -- fully virtualized, scale-out, etc. But what about the software stack?
The Story Begins
About 18 months ago -- around when EMC and Greenplum started seriously dating -- Greenplum put out a freebie single-node edition of their database platform. All the significant functionality of their full-blown version, except for multi-node performance scaling.
Click. Download. Done.
It was very, very popular -- tens of thousands of downloads. When we started to track down what people were using it for, we found a fascinating pattern.
Someone very bright had decided that more could be done with data. They didn't have enough sponsorship for a full-blown Officially Sanctioned And Funded Project, but they had managed to cobble together some servers and a decent amount of storage.
And they were using the single-node edition to prove to their organizations that -- yes -- much more was possible with the data at hand.
That's cool, if you think about it.
Feeding A Great Idea
With this announcement, there's even more.
First, the new "community edition" can do more than the previous single-node version. Fundamentally, the only thing really different between this version and the more traditional one is the support model.
Second, you can download a version nicely packaged as a virtual machine, ready to run on your choice of hardware. Me? I'll take a Vblock, thank you ...
Third, the distro includes two of the most popular open source tools for working with big data analytics: the MADlib analytics library and the Alpine visual data modeller.
Finally -- if all goes well -- we'll actually have a community of engaged people, brought together by a shared passion, all using these newer tools to create new value from their data -- sharing insights, offering suggestions on how to make things better, and generally doing very cool things with massive data sets.
Needless to say, this approach stands in sharp contrast to most of the data warehouse vendors out there today -- their business models are inevitably wedded to expensive licensed software, proprietary hardware platforms, or a bit of both :-)
Revolution From The Bottom Up?
Occasionally, I'll meet a large IT organization that "gets it" when it comes to the new style of mining large amounts of data for fun and profit. They're organized for success: the business users, data scientists and supporting IT team are all on the same page.
We can make our case for doing it with EMC, and -- generally speaking -- we do pretty well here.
However, it's far more often the case that there's a nucleus of bright, passionate people flying a bit below the official horizon. There's no real funding model. There's no formal support from the IT organization. Just a cadre of forward-looking analytics passionistas.
To the extent we can support their efforts with modern software tools, an open-source-ish acquisition model and a robust community -- well, that's a good thing.
Many of these people will be successful in convincing their management to formally invest in these capabilities -- as a matter of business strategy vs. IT efficiency.
Not surprisingly, we're hoping that they'll want more of what they've learned to use already :)
Me? I'm scratching my head trying to come up with some plausible reason why I should go learn all these new tools.
The lure of big data is calling :-)
