Cool-hunting at EMC is much more fun than it should be. Over time, you keep finding the same people doing really cool stuff. And there are a lot of them here.
One of our most visible cool-masters is Nick Weaver (@lynxbat). You might know him as the guy who did all those great "uber" Celerra emulator releases.
Or the guy who built the ultimate VMware partition alignment tool and handed it out to everyone. Or perhaps the mad scientist behind many of the face-melting demos you've seen at EMC World. Or a whole bunch of other cool stuff.
Well, Nick is now part of the CTO Office at EMC, which means he now has the opportunity to do even more amazing things. That's part of the fun of working at EMC. And, with this announcement, I think he's surpassed himself.
And, in the process, he's made EMC an essential part of the burgeoning open cloud and devops discussion.
Welcome To The Agnostic Infrastructure Cloud
At a broad level, the clouderati divide into roughly two camps.
The "vendor camp" (best represented by EMC, VMware, Cisco and VCE) thinks of cloud infrastructure as an enterprise product: engineered software and hardware wrapped in enterprise-class service and support.
To be clear, we've done very, very well with that approach -- no regrets.
The "community camp" tends to look at the world differently: open source stacks, commodity hardware, and community support models. The key actor in this agnostic cloud world is the devops professional: the person who thinks of infrastructure as code to automate processes at scale.
And, up to now, we really didn't have much for these folks. But we do now.
Imagine The Problem
Let's say you're the leader of a #devops team, and you're in charge of a very large (and very diverse) homegrown cloud infrastructure. You've got a few thorny problems you're wrestling with.
First, there's the problem of simply inventorying your compute farm as it quickly morphs. You'll never have the luxury of a homogeneous standard: there will always be a veritable zoo of different server types, capabilities and vintages.
Second, you're unlikely to be able to standardize on a hypervisor. You'll need to be able to provision many different flavors of virtual stacks: VMware, Microsoft, Ubuntu and so on. And, of course, you'll also want the capability to provision bare-metal instances as needed.
Third, you're going to want open tools that work together vs. a pre-packaged, tightly-integrated uber approach. You're looking for componentry to orchestrate with your own code and workflows vs. some vendor's idea of what an all-in-one tool looks like.
Fourth, you need far more control over your environment. You need better ways to categorize things. You need the ability to finely control how resources get provisioned, instantiated and booted. If you're ahead of the curve, you'd like the ability to audit and log everything that happens in your compute environment.
Ideally, you'd be building models of existing state, desired state and rules to bring the former towards the latter. Maybe you've looked around, and haven't been happy with what you've found.
Well, now there's a new one to consider -- Razor, from EMC.
The Back Story
Nick started looking at this a while ago, and went through all the tools that were out there. For one reason or another, he didn't find a single thing he liked. So, like any good software engineer, he went about building a better widget.
First, Nick thinks in terms of tools he'd like to use vs. packaged products. That's important when you're creating capabilities for other devops professionals. Second, he's a strong believer in tools that do one thing very well vs. tools that overreach and end up not doing anything particularly well. Razor has been designed to work with other popular open management frameworks, specifically Puppet in this instance. Third, he's a strong believer in extensibility -- everything is opened up to encourage as much adaptation and innovation as possible by the people who use the tools.
The server side is written mostly in Ruby on top of a MongoDB back end. Razor servers act independently, so a deployment can scale to many hundreds (or potentially thousands) of Razor servers, each supporting hundreds or thousands of server images.
The microkernel is a small (roughly 20MB) Linux image that gets loaded over something like PXE at initial hardware boot time. In about 7 seconds, the Razor microkernel inventories the available hardware (CPU, memory, ports, etc.) and reports back to the Razor server. As we'll soon see, the microkernel then serves as a dynamic arbitration point for coordinating boot activities.
The Razor server inventories information supplied by the microkernel: mostly physical attributes about the specific server hardware. This node information can be tagged into logical and overlapping groups using just about any scheme you can imagine.
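To make the idea concrete, here's a minimal sketch of that inventory-and-tag step. None of these names or fields come from Razor's actual API -- they're purely illustrative of "predicates over hardware facts, with overlapping tags":

```ruby
# Illustrative only: facts a microkernel might report back for one node.
node = {
  uuid:   "9f3c2a",
  cpus:   16,
  mem_gb: 128,
  nics:   4,
  vendor: "Dell"
}

# Tag rules: a label plus a predicate over the node's attributes.
# Tags can overlap -- one node may match several rules at once.
TAG_RULES = {
  "big-memory" => ->(n) { n[:mem_gb] >= 64 },
  "multi-nic"  => ->(n) { n[:nics] >= 2 },
  "dell"       => ->(n) { n[:vendor] == "Dell" }
}

def tags_for(node)
  TAG_RULES.select { |_, rule| rule.call(node) }.keys
end

puts tags_for(node).inspect   # => ["big-memory", "multi-nic", "dell"]
```

The point of the predicate style is that you never have to maintain static group membership lists: as new hardware shows up and reports its facts, it falls into the right logical groups automatically.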
The server also houses definitions of model templates and associated policies that you'd like to use in booting individual server images: preferred stack, vSwitch ports, VMware license keys, etc.
Here's where it starts to get interesting ...
Traditionally, installing a server image (and booting it) involves serving up a bunch of static text files, images and scripts. Together, the Razor server and microkernel create a series of callback communications that enable some pretty cool stuff.
Like being able to compose installation sequences (and elements) on the fly, based on policies you've set. Or, more importantly, being able to create simple state machines that record the progress of the installation and boot sequences -- with dynamic logic inserted as needed.
Step 135 in your configuration process failed for some reason? Catch it, intercept it, and decide what you'd like to do -- try again, try something different, try it somewhere else, etc.
To help with this, Nick has built a rule processing engine that works alongside the state machine, which processes rules sequentially "just like a firewall does" as he says.
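A firewall-style rule engine of this kind is simple to picture: rules are evaluated in order, and the first one whose condition matches the event decides the action. The sketch below shows that first-match pattern under invented names -- it is not Razor's engine, just the general technique:

```ruby
# Illustrative first-match rule engine: rules are tried in sequence,
# and the first matching rule's action wins (like a firewall rule chain).
Rule = Struct.new(:condition, :action)

def process(event, rules, default: :continue)
  rule = rules.find { |r| r.condition.call(event) }
  rule ? rule.action : default
end

# Hypothetical rules reacting to events emitted by an install state machine.
rules = [
  Rule.new(->(e) { e[:step] == 135 && e[:status] == :failed }, :retry),
  Rule.new(->(e) { e[:status] == :failed },                    :reassign),
  Rule.new(->(e) { e[:status] == :done },                      :next_step)
]

puts process({ step: 135, status: :failed }, rules)  # => retry
puts process({ step: 12,  status: :failed }, rules)  # => reassign
puts process({ step: 12,  status: :done   }, rules)  # => next_step
```

Sequential first-match evaluation is what makes the behavior easy to reason about: to special-case step 135, you just put its rule ahead of the general failure rule.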
He's written the front end in node.js, which in its current form includes both a broker plug-in and a target for the Puppet environment. The consistent model now enables all sorts of stateful workflows between different "puppet masters" who might be doing different things against the same pool of shared resources, e.g. a development team vs. a production team.
And, just to remind you, all this happens on just about any x86 server, any hypervisor stack, or bare metal if you prefer. It happens fast, reliably and at gargantuan scale.
Nobody we've talked to is aware of anything like it in the industry today -- including a number of open cloud devops jockeys who weren't really expecting something like this from EMC.
What Happens Now?
The plan now is to open source the code via Puppet's community model. Our hope -- as always -- is that we'll be able to tap the deep talent pool that's out there to improve and refine what Razor does.
It doesn't take too long to realize that there are some interesting areas where this could potentially go over time. Obviously, what's been done for server resources could also be applied to storage and perhaps networking. And, of course, EMC has some nice upper-level IT governance management framework tools (e.g. Archer, Ionix) where policy can be specified and reported on.
Perhaps the most interesting pathway (at least, for me) is what this could potentially bring to the trusted cloud discussion. If you remember back, we've been doing some low-level work with Intel and their hardware-based trusted compute capabilities.
Now, add in the ability to dynamically intercept, mediate, verify and log each and every object touched by an installing, booting server -- and work that interaction against an extensible model with associated rules -- well, it could get very interesting indeed.
What would you do with this capability?
I can't wait to find out how people are going to use it ...