IT refashions itself as the internal service provider of choice: creating services that business people want to consume.
But -- once this model gets past the garden variety stuff: better end-user computing, enterprise apps, collaboration, etc. -- what are the killer new services that the business crowd is thirsty for -- and willing to pay for?
It's data -- and lots of it. Gathered from as many sources as possible, analyzed in a variety of ways, and consumed and shared as efficiently as possible. Big data analytics.
Internally here at EMC, we've dubbed our platform BAaaS -- business analytics as a service. From humble beginnings, we're now about a year into the effort.
I've already shared what we're starting to learn from a business perspective: transforming your business to run on predictive analytics is certainly a journey, and in many regards we've only just begun.
But there's a parallel story from the EMC IT side, which is racing like crazy to keep up with the new cadre of data junkies we're earnestly creating.
EMC IT BAaaS -- The Mission
The internal summit was organized and led by our EMC IT "dream team": Narayanan (KK) Krishnakumar (VP and Chief Architect EMC IT), Ramesh Razdan (VP Cloud and Big Data Services) and Sean Brown (Director, Enterprise Analytics).
Ramesh led off by reminding everyone what we were doing, why it was important, and where we were in our journey.
Historically, EMC has invested in all manner of data warehouses, reporting tools and familiar business intelligence platforms. Data, data everywhere ...
The EMC IT team believes it's important to draw a sharp distinction between the previous data harvesting world and the new one.
His slide points out many of the key differences between how things were historically done, and how they're being done now. The old world is still useful and important -- it doesn't go away -- but the investment pattern going forward is around getting better at predictive analytics, and all that it entails.
EMC IT BAaaS -- The Journey
Historically, we believe we started our internal efforts many years ago. There was a concerted effort over a multi-year period to get our enterprise data into decent shape: consolidated platforms, master data management, standardized reporting and the like.
Our enterprise data management environment had grown along with the company, and the baseline wasn't good. Cleaning it up was not an end unto itself, though -- simply a starting point for what was to come.
The next phase -- first experiments in 2011, a formal set of initiatives in 2012 -- was to bring what we had learned from our work with Greenplum and data science into our own organization. Create a platform that makes data easy to discover and explore. Add in a critical mass of required skills (e.g. advanced analytics) to bring more value. Work towards creating a shared platform that could be used across EMC's multiple business functions.
The final phase -- which we're just approaching -- is to get better at operationalizing our findings. Whether that means new processes or new applications, big data doesn't create value until it's put into practice. And that might end up being the hardest part of our journey.
Understanding The (Internal) Customers
If BAaaS is a service, who's using it and how are they using it?
The basic arrangement is hub-and-spoke. While there are common services and data sources used by all, each tenant in the environment has their own workspace with (usually) their own data sets, tools and interesting questions that they are pursuing.
The current environment, while healthy in size, is not especially enormous: 12 paying tenants, approximately 24 TB of usable capacity, 366 named users running approximately 20K queries per day.
A few points of clarification. First, most of the "interesting" shared data is currently outside of this environment, but more is being brought in quickly -- so that's a fast growth vector.
Specifically, Sean (who does most of the heavy lifting behind the platform) shared that he's planning to be well over 100TB in the shared pool before too long.
Indeed, data sourcing is a big issue that everyone is grappling with these days. Second, individual workspaces (especially those of the proficient teams) are growing at a faster-than-linear rate, as you might expect.
There's a big number tied to the business value documented to date -- as measured by the business users.
What's The Mission In 2013?
Ramesh shares four key themes the teams are working on this year.
The first is getting better at putting insight into practice. We've started to uncover all sorts of nuggets and insights from our current environment, and it's not hard to see that the backlog will quickly shift to putting those findings into production.
The second is refining the joint IT/business delivery model. The roles, interactions and responsibilities in the cross-functional teams are evolving quickly, and entirely new important roles are popping up (e.g. data engineer). What worked in the past won't necessarily work in the future, so everyone is paying very close attention to how we organize for success.
Part of this increased focus is strongly motivated by bringing business SMEs (subject matter experts) closer into the action. Instead of requirements being simply thrown over the wall, the successful pattern is a tightly integrated cross-functional team where all the disciplines work closely together.
And, while we've certainly seen the power of harnessing unstructured data in many of our internal examples, we've just started to scratch the surface of what's theoretically available to us.
Selling The Service
Part of any ITaaS transformation is learning how to market yourself to your internal users, and BAaaS is no exception.
Here's the "what is it?" slide that's used internally to position the service to potential internal customers.
It's described as a "hosted business analytics service" -- data hosting, plus everything you'll need to actually extract value from the data at hand.
You as a business user bring the interesting business questions -- and some resources -- and the environment will help you get where you're going faster and better.
And not everyone is entirely clear on the business value quite yet -- so there's some positioning around why you -- as a business user -- might be interested in this particular service from EMC IT.
Most of the uses to date wouldn't be described as "production", although I'm sure the services are very important to the people who are using them.
The majority of uses to date have been around experimentation and proof-of-concept: is there something there to be learned? Of course, one experiment leads to another, so don't think users abandon the service once their first question has been answered.
There's also a healthy component of one-time analytics: an important business question that only gets asked periodically vs. repeatedly. Every time the question gets asked, there's new data and new data sources at hand, so it's not simply doing what you did the last time.
Finally, there's a small but important group of use cases that has progressed to the point where it's time to go invest in something that's purpose-built. And I'm sure we'll see far more of that before long.
Using The Services
There's a growing list of simple services available that don't require much in the way of IT involvement, but that's not where the action is. More frequently, business stakeholders come forward with very specific and demanding requirements that call for a more formal up-front engagement.
Many of the back-end services are relatively easy to provision, but there's still a lot of hand-holding as business groups more fully understand what they're really looking for, what data is actually sourceable, etc.
The other thing you'll note is that realistic costs are shared at every point in the discussion. These environments -- while much more efficient than the ones they replace -- still consume a healthy ration of IT resources. And it's frequently the case that non-IT people don't fully appreciate the costs involved :)
Serious business questions demand a serious IT response -- and that's the process we're using today. I would guess that when we come back and look at this next year, we'll find an increased suite of no-questions-asked BAaaS services, as well as a further streamlined engagement model.
The Changing Business/IT Model
Much was said around how the interaction between business and IT changes in this environment, and I thought this slide told a useful story.
The historical model was based on the presumption that most business people didn't really want to get into the weeds when it came to data: sourcing it, analyzing it, summarizing it, etc. The new model assumes just the opposite: business people really do want to get close to their data and understand it as much as possible.
IT is still responsible for accessing the enterprise information stores, and will never be relieved of enforcing information access policies. But in this model, IT has a new responsibility: making enterprise data easy to discover, source, interpret and use.
And that's turning out to be far harder than it might sound.
The Enterprise Information Shopping Mall
Every tenant of this environment has two classes of data: internal data that's used across multiple EMC entities, and data that's somewhat unique to the task at hand.
Since information sourcing has turned out to be very difficult and time-consuming, the EMC IT team is working hard to increase the size and the richness of the easy-to-get-to shared information store available to all tenants.
They've started with familiar classes of data that any business like ours would have: customers, products, key processes, financials, people, etc.
It's not enough -- but how do you think about getting more?
The approach is rather clever. Each tenant of the environment will usually want to source data outside of what's available in the common pool. Use those project requirements to drive the priorities for adding elements to the shared enterprise data lake. The marketing guys go deep on customers. The sales team goes deep on products and territories. The manufacturing gang is very interested in components and processes -- and so on.
One of the strong advantages of this sort of BAaaS shared service approach is that business people will end up telling you what data sets matter, and which ones are worth investing in to make them more accessible.
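To make that prioritization idea concrete, here's a minimal sketch: tally the data sets that tenants ask for but that aren't yet in the shared pool, then rank them by how many distinct tenants want them. Everything in it is hypothetical and purely illustrative -- the tenant names, the data set names, the request list and the `rank_candidates` function are my own assumptions, not a description of how EMC IT actually tracks demand.

```python
from collections import defaultdict

# Hypothetical: data sets already available in the shared pool.
SHARED_POOL = {"customers", "products", "financials"}

# Hypothetical project requests: (tenant, data set the project needs).
REQUESTS = [
    ("marketing", "customers"),
    ("marketing", "web_clickstream"),
    ("sales", "territories"),
    ("sales", "web_clickstream"),
    ("manufacturing", "component_failures"),
    ("manufacturing", "territories"),
    ("support", "web_clickstream"),
]


def rank_candidates(requests, shared_pool):
    """Rank data sets missing from the shared pool by distinct tenant demand."""
    tenants_by_dataset = defaultdict(set)
    for tenant, dataset in requests:
        # Only requests that the shared pool cannot satisfy count as demand.
        if dataset not in shared_pool:
            tenants_by_dataset[dataset].add(tenant)
    # Most-requested first; ties broken alphabetically for a stable order.
    return sorted(
        ((name, len(tenants)) for name, tenants in tenants_by_dataset.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    for name, demand in rank_candidates(REQUESTS, SHARED_POOL):
        print(f"{name}: requested by {demand} tenant(s)")
```

Counting distinct tenants rather than raw requests is a deliberate choice in this sketch: a data set that three business functions each need once is arguably a better candidate for the shared pool than one that a single team asks for repeatedly.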
Governance Is Not A Dirty Word
To say that there's a note of anxiety associated with all this newfangled data sharing would be an understatement.
Not only are there legitimate concerns about data being used inappropriately or out of context; EMC, like all public companies, is also subject to routine compliance audits, and you have to be very specific as to who has access to certain information elements.
That's where good governance comes in -- a cross-functional forum where the opportunities and risks can be fully evaluated, and prudent decisions made -- even revisited as circumstances warrant. Note that IT is a participant in the BAaaS governance function, but it's essentially seen as a business role, and not a technology one.
Remember, this event was a kind of internal user-group meeting for BAaaS, so no opening presentation would be complete without a "what's coming soon" session.
Rather than focus on the specifics, I look at their roadmap as a proxy for demand we'll likely see elsewhere.
The first tranche of new service offerings is all about training. Here at EMC, we have plenty of SAS-trained and Business Objects-trained folks, but we're starting to gravitate towards new tools to "surf" multiple data sets at the outset. The rising star here is Tableau, so Tableau training is in order.
We're still heavy users of structured data, and almost all of that is ending up in the Greenplum database (now part of Pivotal), so there are new training courses to teach people how to do this effectively.
There's also a need for training people on the data sets themselves: what's out there, how it was gathered, how it might be used, etc.
Hadn't thought of that, but it makes a certain sense -- more users are now working with cross-functional data, and the meta-meta-data needs to be exposed and shared in a consistent fashion.
I mentioned before that we're just getting started with unstructured data and Hadoop; part of the acceleration recipe is practical hands-on training in how to use Hadoop to answer practical business questions -- without having a PhD in math, that is.
More is coming in the service catalog itself -- for starters, the Hadoop-related services are now becoming mainstreamed, and can be ordered up as easily as, say, a structured database environment.
Up to now, the enterprise data hub has been largely built by IT. Going forward, there will be new facilities for tenants of the environment to publish data sets and models for others to use.
And, finally, tools -- and lots of them. All of these were being used by one group or another -- there's now enough overlap in requirements that they're finding their way into the shared tool store for all to use. In particular, in-database analytics is finding strong usage, as well as tools for text analytics.
Challenges Still Remain
Nothing is ever perfect, and the EMC IT team is fully cognizant of the work that lies ahead.
As mentioned before, one big challenge is simply shortening cycle time around data provisioning. Few, if any, of our enterprise applications were built to easily source data -- and that's going to take work on a variety of fronts.
Another challenge is helping business users understand what's available and how to use it. In addition to training, the team will be piloting two enterprise data glossary tools to see if that's a workable path.
While there's some good collaboration between the tenants of the environment, there's still more to do here: more use of the Chorus collaboration environment, as well as documented processes (and incentives!) for sharing your sandbox with others.
The issues with workload management will undoubtedly be with us for some time to come: there will almost always be more user demand than supply. The platform is a shared resource for very good reasons, and that comes with the territory. And the IT environment itself is only semi-automated today -- there's still more work to do in instrumenting and automating common provisioning requests.
All achievable -- all it takes is hard work!
A Few Final Thoughts?
It's one thing to proselytize people around the amazing power of big data analytics -- especially in corporate settings.
And it's another thing entirely to roll up your sleeves, and start the work around building a self-service shared platform that the entire business can use.
While the technology challenges are not inconsequential, the real effort is in the soft stuff: organizing for success, investing in the training required, collaborating and communicating with new people in new ways, all wrapped up in a progressive governance model.
What's really happening here is that we're weaning ourselves from our familiar HIPPO decision making model (highest paid person's opinion), and getting far more comfortable exploiting unexpected insights that might challenge conventional wisdom.
I think it's a journey most proficient organizations will find themselves grappling with before too long.
And I'm proud that I work for a company that recognizes the opportunity and is not shy about investing in the hard work required.
Congrats to the EMC IT team for having the vision to stay ahead of the business :)