I love these guys.
It wasn't long after their acquisition by EMC that I realized not only had EMC acquired state-of-the-art database technology, but we'd acquired a world-class team as well.
Shortly after the acquisition, EMC announced the formation of a new division -- the Data Computing Division, with Greenplum at its core -- and its first hardware platform, the Greenplum DCA (Data Computing Appliance).
This was shortly followed by a popular "community edition" of their powerful platform and toolset: their full toolset, basically free for non-production environments. More coolness.
And today, at the SAS Global Forum, they did it again: announcing not only a major integration with SAS, but entirely new versions of the DCA as well.
The Back Story
As we move from simply storing big data to creating value from it, the role of the data analytics platform becomes strategic.
This isn't your father's data warehouse; it's a new requirement for a platform built around modern needs: multiple data sources, fast ingest, and support for multiple applications and use cases in a self-service, cloud-like model.
Greenplum continues to fit this new requirement perfectly.
Since the acquisition, the new EMC team has been running hard to keep up with customer interest. At the same time, they've continued to evolve their software technology, form important new strategic relationships -- and push out some interesting pre-configured appliances -- all at once.
News Item #1 -- SAS and Greenplum Team Up On High Performance Analytics
Talk to the people who are really using this stuff to create value, and you'll quickly discover they're speed junkies. Fast is good, faster is better, right now would be best of all. And if you can do it across all my data (and not just a subset!) that would be truly wonderful indeed.
In the world of analytics, SAS is the undisputed king -- since 1976, they've essentially defined the broader analytics software market. Meet someone who's using SAS extensively, and the refrain for supporting infrastructure is familiar: they want bigger, faster, better.
To meet this need, SAS is introducing a family of High-Performance Analytics products. At a glance, they appear to combine in-memory processing techniques with a shared-nothing scale-out architecture -- making Greenplum and the DCA a perfect platform fit.
Bottom line -- if your organization uses a lot of SAS and is screaming for more -- you owe it to yourself to check out this integrated combination.
News Item #2 -- New Models Of The Greenplum DCA
You can run Greenplum on your own kit, you can run it on something like a Vblock, or you can buy a dedicated appliance for the task at hand.
The first version of the DCA was surprisingly well-received, at least from my personal perspective. I had assumed that just about everyone would want to run Greenplum on general-purpose infrastructure.
I was wrong in that regard.
The appeal of the appliance approach is two-fold: customers can get started quickly with a minimum of fuss and effort, and -- in very large environments -- the benefits of being purpose-built take on substantial importance.
The first DCA offered a nice balance between cost, performance and capacity. The two new models are optimized for performance and capacity, respectively.
The new High Performance EMC DCA is all about speed, baby. Lots of cores, lots of SSDs. And, as you'll see, when speed matters, it's got a compelling edge.
And the new High Capacity EMC DCA is all about efficiency -- storing as much information as possible in as small a cost footprint as possible.
The two new models use the same sexy packaging as the first DCA. Love that green LED light bar …
New DCA Speeds And Feeds
There is nothing particularly unique or exotic about the DCA hardware. It's all fairly familiar stuff, right in the mainstream of current price/performance technology: Intel, Ethernet, SAS drives, etc.
All the secret sauce is in software, not hardware.
For example, storage types are amused to discover that all primary storage is located within the server enclosures.
Although Greenplum software supports all flavors of external storage, in the DCA external storage is used only for data protection and replication.
The original DCA used up to 16 parallel segment servers in a single full rack, delivering up to 192 cores and 768 GB of RAM. Each full rack could house up to 192 internal SAS drives delivering 144TB of compressed capacity.
If that isn't enough, the config expands up to 6 racks. Thanks to the DCA's scale-out, shared-nothing architecture, performance scales by six times as well.
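For the back-of-the-envelope crowd, here's a quick Python sketch of that scale-out math for the original DCA. It simply multiplies the per-rack figures quoted above by the rack count, which assumes perfectly linear scaling. The names (`RackSpec`, `scale_out`) are mine for illustration and are not part of any Greenplum tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackSpec:
    """Per-rack maximums for the original DCA, as quoted in the text."""
    segment_servers: int = 16
    cores: int = 192
    ram_gb: int = 768
    drives: int = 192
    compressed_tb: int = 144


def scale_out(rack: RackSpec, racks: int) -> RackSpec:
    """Multiply every per-rack figure by the rack count.

    Assumption: resources add up linearly across racks, which is the
    idealized shared-nothing case.
    """
    if not 1 <= racks <= 6:
        raise ValueError("the configuration described here spans 1 to 6 racks")
    return RackSpec(
        segment_servers=rack.segment_servers * racks,
        cores=rack.cores * racks,
        ram_gb=rack.ram_gb * racks,
        drives=rack.drives * racks,
        compressed_tb=rack.compressed_tb * racks,
    )


one_rack = RackSpec()
six_racks = scale_out(one_rack, 6)

# Per-server breakdown implied by the single-rack figures.
cores_per_server = one_rack.cores // one_rack.segment_servers
ram_gb_per_server = one_rack.ram_gb // one_rack.segment_servers

print(six_racks)
print(cores_per_server, "cores and", ram_gb_per_server, "GB RAM per segment server")
```

Under those assumptions, a full six-rack configuration works out to 1,152 cores and 864TB of compressed capacity.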
The new High Capacity DCA still uses the same server/rack config, but uses high-density disk drives.
Compressed capacity now balloons to 496 TB -- that's half a petabyte -- in the same single-rack footprint as before.
Those high-density disk drives impact performance, though -- the ingest rate drops to 4.8TB per hour, the scan rate to 16GB per second.
Six racks get you a bit less than 3PB of compressed capacity. And, of course, six times the performance.
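A minimal sketch to sanity-check the High Capacity numbers: it takes the single-rack figures quoted above and scales them to six racks. Two assumptions are baked in and flagged in the comments: throughput scales linearly with rack count, and 1PB is treated as 1,000TB. The function name is hypothetical.

```python
# Single-rack High Capacity DCA figures, as quoted in the text.
HC_COMPRESSED_TB = 496
HC_INGEST_TB_PER_HOUR = 4.8
HC_SCAN_GB_PER_SEC = 16


def high_capacity_totals(racks: int = 6) -> dict:
    """Scale the single-rack High Capacity figures to `racks` racks.

    Assumptions: linear scaling across racks, and 1 PB = 1000 TB.
    """
    compressed_tb = HC_COMPRESSED_TB * racks
    return {
        "compressed_tb": compressed_tb,
        "compressed_pb": compressed_tb / 1000,
        # round() guards against floating-point noise in 4.8 * racks
        "ingest_tb_per_hour": round(HC_INGEST_TB_PER_HOUR * racks, 1),
        "scan_gb_per_sec": HC_SCAN_GB_PER_SEC * racks,
    }


totals = high_capacity_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

Six racks come to 2,976TB, which is where "a bit less than 3PB" comes from.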
The new High Performance DCA goes the other way: a scan rate of 72GB per second, and the ability to ingest data *twice* as fast as the original, at 20TB per hour.
For those vendors still clinging to proprietary platforms and architectures, best of luck competing here ...
What Does All Of This Mean?
The key message behind big data is opportunity: harnessing enormous amounts of information to drive new insights and better decisions. The people who have figured this out see no practical limits to what's achievable in this new world.
I look at Greenplum and DCA and think -- these are the power tools for the new world of big data analytics.
Bigger. Faster. More cost-effective. Easier to use for knowledge workers, data scientists -- and the IT groups who support them.
Add in cool and highly-relevant integrations -- such as the SAS news above -- and it's clear to me that my Greenplum colleagues at EMC are way out ahead on this one ...