Sorry, another acronym alert. It's just that I get tired of typing "Data Warehousing and Business Intelligence", so I have conveniently shortened this to DW/BI.
No, I am not making up another buzzword ...
I thought I'd share with you yet another behind-the-scenes peek into an interesting area EMC has been working on for a while, as well as some distinct customer perspectives I've gathered.
Basic Concepts
[updated on 5/27 with additional links]
As far as IT concepts go, this is not what you'd call a new topic. Production apps feed transactions into a data warehouse where they're normalized and scrubbed. Correlated information is extracted and massaged to support business decision making.
From a high level, it's a pretty simple equation: more data + fresher data + answers sooner = better decisions made sooner
Certain verticals with large numbers of customers (for example, retail and financial services) have been doing this stuff for many years. It's one of their core applications that helps run the business.
What's more interesting to me is the rapid growth of DW/BI and other forms of decision support in industries that you wouldn't initially think would be big users. And, like anything new, these IT groups may struggle a bit with an entirely new beast that's growing like a noxious weed.
From a storage vendor perspective, this is a *very* interesting topic for several reasons.
First, about 20% of all external storage ends up behind some sort of DW/BI environment. And it looks like this share is growing faster than the overall storage market. OK, you've got my attention.
But more importantly, DW/BI puts some interesting performance loads on storage arrays that -- well -- not every vendor's product can adequately handle.
Demanding DW/BI apps have this way of separating the storage men from the storage boys, so to speak.
The Dynamic In IT
If your company is proficient at large-scale DW/BI, this will all sound familiar, won't it?
The environment is now part of the operation. Things like performance, availability and recoverability really, really matter to the business. Having fresh data feeds from production sources is important as well. And, of course, everyone and their dog wants their own extract of the damn thing for their own purposes.
And when you look at all the storage required -- not only the DW/BI itself, but all the downstream variations, not to mention the reports and models -- you're looking at a considerable storage farm that's got a healthy compound growth rate.
But, to me, what's more interesting is speaking to customers who are just entering this world.
The business has selected their first application tool. Maybe even a server vendor they like. And they've asked IT to go put it on the floor.
WARNING: DW/BI can be like kudzu in your data center -- either have a foolproof plan for supporting dynamic growth, or have a plan for curtailing it. Once one part of the business is getting cool, up-to-the-minute analysis, everyone else is gonna want the same thing.
Kind of like what happened with BlackBerrys -- one day they were a novelty, the next everyone wanted one.
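To put a rough shape on that growth warning, here's a back-of-the-envelope sketch of how compound growth plus downstream extracts adds up. Everything in it is an assumption for illustration: the function name, the starting size, the growth rate and the extract ratio are made-up inputs, not EMC or customer figures. Plug in your own.

```python
def project_capacity(start_tb, annual_growth, years, extract_ratio=0.0):
    """Project total storage (TB) for each year from 0 through `years`.

    start_tb      -- size of the core warehouse today (hypothetical input)
    annual_growth -- compound growth rate per year, e.g. 0.5 for 50%
    extract_ratio -- downstream extracts, reports and models expressed as a
                     multiple of the core warehouse (an assumed simplification)
    """
    footprint = start_tb * (1 + extract_ratio)  # core plus downstream copies
    return [footprint * (1 + annual_growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Illustrative only: 10 TB core, 50% yearly growth, extracts equal to the core.
    for year, tb in enumerate(project_capacity(10, 0.5, 5, extract_ratio=1.0)):
        print(f"year {year}: {tb:.1f} TB")
```

The only design choice here is treating extracts as a fixed multiple of the core warehouse. In real life that ratio tends to drift upward as more groups ask for their own copy, so treat the result as a floor rather than a forecast.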
So, What Is EMC Doing About This?
Actually, quite a lot ...
Let's start with storage. DW/BI does best with low-latency, high-bandwidth storage interconnects. Think real-deal FC SANs for starters; not a lot of serious DW/BI gets done on garden-variety NAS or iSCSI, or FC emulators for that matter.
Turns out that if you were imagining the "perfect" storage for a larger DW/BI environment, you'd probably end up with something very like today's CLARiiON for a variety of reasons: architecture, performance, availability, functionality, support and so on.
And, oh yes, it's probably the fastest storage you can lash behind a DW/BI environment.
Most DW/BI environments seem to prefer a scale-out performance model, so the trick is to balance the storage configurations with the server configurations, as opposed to big, honkin' arrays.
Our CLARiiON engineering team has been working this angle for quite a while, and has produced some very interesting performance optimization studies that show that when you hit the "sweet spot" in terms of disk geometry, RAID grouping, controller/spindle ratios, etc. -- it really rocks.
Not to be counted out, our DMX engineering teams have been busy as well ...
Moving up the stack, there's the issue of backup and recovery. If your DW/BI environment is "operational", i.e. it can't be down for a while without serious consequences, you'll be looking at very high-speed backup and recovery. Think disk libraries, for example. Or serious remote replication. Remember, these things can get big.
In addition, we've spent some serious time partnering with almost all of the DW/BI players out there: big names like Oracle, Microsoft, SAS, Teradata, DatAllegro, ParAccel ... including this recent relationship with Netezza.
And we've also invested in creating expertise and services that can help design and implement these environments. In addition to the usual infrastructure-oriented stuff we do, I have to point out that our BusinessEdge acquisition (900+ people!) brings an entirely new level of business-strategy-oriented discussion to the table -- they've done some amazing re-engineer-the-business work, and helped more than a few large organizations figure out new ways to get serious value out of DW/BI.
But There's Far More To Do ...
Sure, you'd expect EMC to have its act together in storage, and backup, and replication, and partnerships, and services ... and support!
But, from our perspective, there's so much more we could be doing.
For example, consider RSA's new approach to Data Loss Prevention. Clearly, many DW/BI environments contain sensitive data, and being able to identify that sensitive data -- wherever it might end up -- will be increasingly important in the future.
Sure, the DW is secured, but what about all the outputs?
We know that performance is important to just about every DW/BI implementation, and that every DW/BI we've seen has an I/O hot spot in one place or another -- what can enterprise flash drives do for this situation, and at what cost?
Lots of decision support and business intelligence gets done separately from the main DW instances -- how many of these workloads could live comfortably in virtual machines, and what about the optimized environment that extracts subsets, invokes beefy computational environments, and manages the farm?
DW/BI is mostly transactional data, but many of our customers are building huge content repositories using Documentum D6 -- how can we make content part of the DW/BI process? And -- given that all of the output of a DW/BI is essentially "rich content" that usually has some workflow applied to it -- how can we connect the two?
Or, as we consider cloud and SaaS models, is operational DW/BI one of those things that IT would prefer to consume as a service, rather than a set of products on the floor?
This Much Is Clear
We're going to see more DW/BI in the future, and not less. It'll go from decision support, to operational, to business-critical for many shops.
These environments are going to need a ton of storage, and not the generic stuff either. And it'll all have to be optimized, backed up, perhaps replicated -- the whole enchilada.
And, stepping back a bit, since these environments are going to be very important focal points for enterprise information, I'm sure we'll see most of the EMC information infrastructure portfolio pulled into this trend -- and it won't just be about storage.
I don't think this is the last post I'll be writing on this topic ...
