There are some lessons to be learned from this experience:
#1 -- Don't always believe what you read online from an industry reporter, and
#2 -- Before being critical, do your homework.
I made both mistakes. But, wonderfully, I recently had a chance to talk to the principal behind the story -- Mike Miller -- and found the reality to be far more fascinating.
To Begin With
Mike Miller is the Senior Director for Pfizer's HPC (high performance computing) group. Needless to say, being proficient at HPC is integral to Pfizer's extensive R+D business. Mike is the primary guy behind Pfizer's use of Amazon's capabilities, specifically their VPC offering.
And he tells a great story.
The Problem
Although Mike is part of the mainstream IT organization, he's looking after the needs of a very specific and important business function: R+D. For a company like Pfizer, that's the core engine that keeps the business profitable.
The challenge?
R+D's use of HPC resources is extremely bursty and diverse: on any given day, any one of 1,000 different applications might be run. Periodically, enormous projects (of very short duration!) come up very quickly, driven by new science or insights. Sometimes the results are needed to make key financial or strategic decisions with vast amounts of money at stake for the business.
As a result, there's no real ability to forecast or plan in any sort of traditional IT sense. The HPC team has to be able to respond in a matter of days to huge requests for on-demand resources -- far outside the normal peaks and valleys you'd find in most traditional IT settings.
Sizing any sort of infrastructure to handle these dramatic peaks in a timely manner would be economically unwise -- the peaks are short and dramatic, and the concept of "average utilization" isn't really helpful for these workloads. It's sort of like building an amphitheater for 10,000 people, and only using it four days a year.
It's pretty obvious: for this use case, renting makes far more sense than buying.
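To put rough numbers on the amphitheater analogy, here is a minimal sketch of the rent-versus-buy arithmetic. The dollar figures and function names are my own illustrative assumptions, not anything Mike or Pfizer shared; the only number taken from the text is the "four days a year" utilization.

```python
HOURS_PER_YEAR = 365 * 24


def owned_cost_per_used_hour(annual_cost_per_node: float, utilization: float) -> float:
    """Effective cost of each hour an owned node is actually doing work.

    annual_cost_per_node: hypothetical all-in yearly cost of owning one node.
    utilization: fraction of the year the node is busy (0 < utilization <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return annual_cost_per_node / (HOURS_PER_YEAR * utilization)


def break_even_utilization(annual_cost_per_node: float, rental_rate_per_hour: float) -> float:
    """Utilization above which owning a node is cheaper than renting one."""
    return annual_cost_per_node / (HOURS_PER_YEAR * rental_rate_per_hour)


if __name__ == "__main__":
    annual_cost = 4380.0   # assumed yearly cost of one owned node
    rental_rate = 2.0      # assumed hourly price of a comparable rented node
    amphitheater = 4 / 365  # "only using it four days a year"

    print("owned cost per used hour:", owned_cost_per_used_hour(annual_cost, amphitheater))
    print("break-even utilization:", break_even_utilization(annual_cost, rental_rate))
```

Under these assumed prices, capacity that sits idle for all but four days a year costs far more per useful hour than the rental rate, which is the whole point of the analogy.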
The Approach
During our conversation, Mike was very clear that -- for the types of workloads he has to deal with -- it's mostly about memory, and considerably less about CPU and storage capacity/bandwidth. The most interesting workloads in his world tend to take the most memory.
Amazon (and presumably similar services) offers modest amounts of memory per core, but not really what's needed for many of these on-demand high-priority workloads. Fortunately, he's got an internal HPC setup that's tailored for just these sorts of problems, but -- as you'd expect -- it's frequently kept busy with more mundane tasks.
Here's the interesting bit: when it's fire drill time, the more mundane tasks are shipped out to the cloud, and the specialized HPC complex is turned loose on the most critical workloads.
Most of these "second tier" workloads are, by the very nature of research, unique. Many aren't using particularly large data sets, and the data itself tends to be rather static -- making the data logistics challenges associated with using external cloud resources far more manageable. Clever.
Other Interesting Bits
Mike and the Pfizer team have built some interesting capabilities on top of Amazon's VPC. First, they're using the exact same run-time operating system and associated stack both internally and externally. Second, they've built their own workflow scheduler that understands both pools of resources (internal and external), and can make smart decisions as to what runs where -- and at what cost.
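I haven't seen Pfizer's scheduler, so what follows is only a toy sketch of the kind of placement decision described above: during a fire drill, critical work gets the memory-rich internal complex, and mundane work that fits the cloud's more modest memory per core gets shipped out. Every class, field, job name, and number here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    mem_gb_per_core: float
    critical: bool


@dataclass
class Pool:
    name: str
    free_cores: int
    mem_gb_per_core: float


def fits(job: Job, pool: Pool) -> bool:
    """A job fits a pool if the pool has enough free cores and memory per core."""
    return job.cores <= pool.free_cores and job.mem_gb_per_core <= pool.mem_gb_per_core


def place(jobs: list[Job], internal: Pool, cloud: Pool) -> dict[str, str]:
    """Fire-drill placement: critical jobs first, mundane jobs pushed to the cloud."""
    placement: dict[str, str] = {}
    # sorted() is stable, so critical jobs move to the front in their original order.
    for job in sorted(jobs, key=lambda j: not j.critical):
        if job.critical and fits(job, internal):
            target = internal
        elif not job.critical and fits(job, cloud):
            target = cloud
        elif fits(job, internal):
            target = internal
        elif fits(job, cloud):
            target = cloud
        else:
            placement[job.name] = "queued"
            continue
        target.free_cores -= job.cores
        placement[job.name] = target.name
    return placement


if __name__ == "__main__":
    internal = Pool("internal", free_cores=64, mem_gb_per_core=32.0)
    cloud = Pool("cloud", free_cores=1000, mem_gb_per_core=4.0)
    jobs = [
        Job("nightly-regression", cores=32, mem_gb_per_core=2.0, critical=False),
        Job("big-mem-batch", cores=32, mem_gb_per_core=16.0, critical=False),
        Job("urgent-analysis", cores=48, mem_gb_per_core=24.0, critical=True),
    ]
    print(place(jobs, internal, cloud))
```

A real scheduler would also weigh cost, data location, and run time; this sketch only captures the memory-per-core constraint that Mike emphasized.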
Going a bit deeper, they settled on AFS as the best way to manage the namespace and the associated data -- a technology I hadn't heard about for a very long time. The net result is the quintessential "pool of resources" that defines one important aspect of cloud.
The best part? The HPC team exposes resource costs very transparently back to the researchers in real time, so users can make intelligent decisions on costs vs. priorities. And they make some interesting decisions as a result.
For example, one of the costs that has recently come into focus is software licensing. Some vendors license by cores used -- more cores, more license costs. Others are taking a slightly more progressive view of charging by data center location.
All kinds of software licensing costs are fed into the cost transparency tool, and can sometimes result in interesting decisions by the researchers to prefer particular applications over others -- simply because of the underlying economics involved.
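Here is a rough sketch of how licensing terms might tip that decision, comparing a per-core license against a flat per-location fee for the same run. The rates, the function, and both application profiles are made up for illustration; I don't know what Pfizer's tool actually computes.

```python
def run_cost(
    cores: int,
    hours: float,
    compute_rate: float,
    license_per_core_hour: float = 0.0,
    license_flat: float = 0.0,
) -> float:
    """Total cost of one run: compute plus whichever licensing terms apply.

    compute_rate: assumed price per core-hour of compute.
    license_per_core_hour: assumed fee for vendors that license by cores used.
    license_flat: assumed fee for vendors that charge by data center location.
    """
    core_hours = cores * hours
    return core_hours * compute_rate + core_hours * license_per_core_hour + license_flat


if __name__ == "__main__":
    cores, hours, compute_rate = 256, 10.0, 0.10

    per_core_app = run_cost(cores, hours, compute_rate, license_per_core_hour=0.05)
    per_site_app = run_cost(cores, hours, compute_rate, license_flat=50.0)

    cheaper = "per-site app" if per_site_app < per_core_app else "per-core app"
    print("per-core licensed app:", per_core_app)
    print("per-site licensed app:", per_site_app)
    print("cheaper for this run:", cheaper)
```

With these assumed rates, the per-core license grows with every core added while the flat fee does not, so wide runs favor the application licensed by location.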
Capitalism at work :-)
What About Security And Compliance?
Mike said he was fortunate enough to be working with his compliance team as a business partner. Once the business rationale was clear to everyone, his compliance team not only actively engaged, but came back with a list of areas where compliance and security could actually be improved by leveraging some of the underlying capabilities more fully.
Upfront Vs. Operating Expenses
Now, to be fair, there was a considerable amount of up-front work required to work with Amazon, integrate various bits of software, certify the environment as compliant, and so on. Mike and the Pfizer team were out on the leading edge in actively pursuing this type of approach -- not a lot of off-the-shelf was available for them.
But, that being said, most of that solution integration was a one-time, up-front investment. Operationally, the approach delivers quantifiable business value, and the benefits continue to accrue with each passing month.
For example, one of the benefits he mentioned is that he gets timely access to modern CPU technology. Most cloud providers keep their stuff pretty fresh vs. the 3-5 year depreciation cycle associated with most traditional IT models. Pretty useful if your workloads are somewhat computationally intense, no?
Working With Traditional IT
Understandably, Mike and his HPC group are targeting a very specific workload and use case. It's very cool and interesting, but -- you'll have to admit -- it's not exactly traditional IT. I asked him about the rest of Pfizer's IT portfolio, and the interaction between what he's been doing and what the rest of the IT group intends to do going forward.
Needless to say, the broader IT organization is intensely interested in what Mike and his team are doing -- all the way up to senior levels. He thought that they were going to work towards some sort of internal cloud capability going forward to support the more traditional IT use cases.
Mike isn't what you'd call a "cloud passionista" -- he came across as a technology pragmatist: his company had a problem, and he came up with a brilliant solution. He's the first to admit that it's early days, there's more work to do, better answers may present themselves in the future, and so on.
Going out a bit, Mike saw that -- over time -- the potential existed for a standardized continuum of services and capabilities between his HPC function and the broader IT infrastructure requirements. He's essentially right, but getting there might take a bit of work :-)
Putting It All Together
In effect, Mike and the Pfizer team have fashioned Amazon's public service to be part of their private cloud -- it's a compatible, logical and controlled extension of the in-house IT environment. He's done a great job of blurring the lines between "internal IT" and "external IT".
You don't hear many stories like his where IT delivers this sort of innovation that really makes a difference to the business. Pfizer should consider themselves fortunate for having a team who can bring a vision like this to fruition, and make it work day in and day out.
I hope he and his team get the recognition they deserve for the pioneering work they've done.
Thanks for sharing your story, Mike!

After reading this story, my first thought was: If they can package the processes and lessons learned, perhaps they could use this IP to get Amazon, or someone else, to sell and implement a version of it to other end customers. Then you have IT not only being a strategic corporate asset but a revenue-stream creator. Kind of like what Amazon did.
Brian
@bschwartz
Posted by: Brian Schwartz | August 25, 2010 at 05:14 PM