Several years ago, it became clear to me that the next aspirational model for enterprise IT was “IT as a Service”, or ITaaS.
At its core was a simple yet powerful idea: that IT's operational model should be refashioned around the convenient consumption of IT services.
Under the ITaaS model, nearly everything IT does is presented as a variable service, marketed to users, with supply driven by the resulting demand.
IT becomes the internal service provider of choice.
Now, several years later, that once-controversial idea has clearly grown deep roots, with many examples of progressive IT organizations embracing this perspective. Some have made the transition, some are mid-journey, others have yet to begin. The IT world has moved forward.
So, it’s fair to ask — what might come next? I have a strong suspicion as to what the next operational model will be.
Automation: The Ultimate IT Productivity Lever
"Give me a lever long enough, and a fulcrum on which to place it, and I will move the world" -- Archimedes
When it comes to continually improving IT productivity, automation is that lever. It's the gift that keeps on giving when it comes to IT outcomes. Progressively improved automation means progressively improved capex and opex efficiency, fewer errors, more responsive reactions — done right, everything gets better and better.
It’s not just an IT thing: you’ll see the same continuing automation investment patterns in manufacturing, logistics, consumer marketing — any endeavor where core processes are important.
Broadly speaking, there are two approaches to automating IT. Many think in terms of bottom-up: take individual, domain-specific repetitive tasks, and automate them — perhaps in the form of a script, or similar.
The results are incremental, not transformational.
During the early days of telephony, switchboard operator productivity was limited by the reach of the operator’s arms. Someone came up with the idea of putting wheels on the chairs. Clever, but only modest productivity gains resulted — what was needed was a re-thinking of the problem at hand.
We’ve got the same situation in IT automation: we’re not after mere incremental improvements, what we really want is a sequence of order-of-magnitude improvements. And to do that, we need to think top-down vs. bottom-up.
Starting At The Top
Since IT is all about application delivery, applications logically become the top of the stack. Approached that way, automation becomes about meeting the needs of the application, expressed in a manifest that we refer to here as “policy”.
We need to be specific here, as the notion of “policy” is so broad it can conceivably be applied almost anywhere in the IT stack, e.g. what rebuild approach do you want to use for this specific disk drive?
Indeed, listen to most IT vendors and you’ll hear the word “policy” used liberally. To be clear, policies can nest — with higher-level policies invoking lower-level ones.
For this conversation, however, we’re specifically referring to top-level policies associated with groups of applications.
The Big Ideas Behind (Application) Policy
The core idea behind “policy” is simple: policies express desired outcomes, and not detailed specifications for achieving that outcome. Policies are a powerful abstraction that has the potential to dramatically simplify many aspects of IT operations.
Speaking broadly, application policies could address three scenarios: normal day-to-day operations, constrained operations (e.g. insufficient resources), and special events (e.g. an outage, software updates, maintenance windows, etc.)
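To make the three scenarios concrete, here is a minimal sketch of what such a policy manifest might look like as a data structure. Every field name here (`normal`, `constrained`, `special_events`, and the keys inside them) is an illustrative assumption, not a real product schema:

```python
from dataclasses import dataclass, field

# Hypothetical application policy covering the three scenarios described
# above: normal operations, constrained operations, and special events.
@dataclass
class ApplicationPolicy:
    name: str
    tier: str  # e.g. "mission-critical", "business-critical", "business-support"
    # Normal day-to-day operations: the resources and services requested.
    normal: dict = field(default_factory=dict)
    # Constrained operations: what may be degraded, and in what order.
    constrained: dict = field(default_factory=dict)
    # Special events: outages, software updates, maintenance windows.
    special_events: dict = field(default_factory=dict)

policy = ApplicationPolicy(
    name="expense-reporting",
    tier="business-support",
    normal={"vcpus": 4, "memory_gb": 16, "storage_gb": 500, "backup": "daily"},
    constrained={"allow_slower_storage": True, "allow_dedupe": True},
    special_events={"upgrade_window": "weekend", "restore_priority": "low"},
)

print(policy.tier)  # business-support
```

Note that the manifest states outcomes and tolerances, not the mechanics of achieving them; that separation is the whole point of the abstraction.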
In addition to being a convenient shorthand for expressing requirements, policies are also an effective construct for managing change. When application requirements shift — as they often do — a new policy is applied, and the required changes cascade through the IT infrastructure. Alternatively, a proposed policy change can first be modeled to see what it would do.
Finally, compliance checking — at a high level — becomes conceptually simple. From controlling updates to monitoring service delivery: here is what the policy specifies — is it being done? And if not, what is needed to bring things into compliance?
You end up with a nice, unambiguous closed-loop system.
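That closed loop can be sketched in a few lines: compare what the policy specifies against what is observed, and report the gaps that need remediation. The policy keys and observed-state shape below are assumptions for illustration:

```python
# Toy closed-loop compliance check: a policy dict and an observed-state
# dict with matching keys. Anything the observed state does not satisfy
# is a compliance gap to be remediated.
def compliance_gaps(policy: dict, observed: dict) -> dict:
    """Return the policy items the observed state does not satisfy."""
    return {k: v for k, v in policy.items() if observed.get(k) != v}

policy = {"backup": "daily", "encryption": "at-rest", "replicas": 2}
observed = {"backup": "daily", "encryption": "none", "replicas": 2}

gaps = compliance_gaps(policy, observed)
print(gaps)  # {'encryption': 'at-rest'}
```

Run continuously, this is the "here is what the policy specifies — is it being done?" loop described above.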
In practice, the policies that work well seem to be the ones that outline broad objectives and provide guidelines or suggestions. The ones that cause friction are overly specific and detailed, and hence constraining.
Simple Example #1
Let’s take an ordinary example: provisioning an application. At one level, you can think of a policy as a laundry list of the resources and services required: this much compute, memory, storage, bandwidth, these data protection services, this much security, etc.
So far, so good. Our notion of policy is focused more on what’s needed, rather than how it’s actually done. But, since we’re presumably working against a shared pool of resources, we have to go a bit further, and prioritize how important this request might be vs. all other potential requests.
Let’s arbitrarily designate this particular application as “business support”. It’s somewhat important (isn’t everything?) -- but not as important as either mission-critical or business-critical applications.
It needs reasonable performance, but not at the expense of more important applications. It needs a modicum of data protection and resiliency, but can’t justify anything much more than the basics. It has no special security or compliance requirements, other than the baseline for internal applications.
The average large enterprise might have hundreds (or perhaps many thousands) of applications that fall into this category.
Under normal conditions, all requests using this policy are granted (and ideally paid for) as you’d expect. But what if resources become constrained?
Yes, your "business support" application will get the requested vCPUs and memory, but — if things get tight — it may not get everything it asked for, perhaps temporarily. Here’s the storage requested, but if we come up short, your application may be moved to cheaper/slower tiers and/or we’ll turn on dedupe. Here’s the network connectivity you requested for your app, but … you get the idea.
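The prioritization logic behind this can be sketched simply: grant full requests to higher tiers first, and let any shortfall fall on the lower tiers. The tier names and the single-resource allocation rule are assumptions for illustration:

```python
# Lower rank number = higher priority. Tier names are illustrative.
TIER_RANK = {"mission-critical": 0, "business-critical": 1, "business-support": 2}

def allocate(requests, pool_gb):
    """Grant storage requests in tier order; shortfalls hit lower tiers first.

    requests: list of (app_name, tier, storage_gb_wanted) tuples.
    """
    grants = {}
    for name, tier, want in sorted(requests, key=lambda r: TIER_RANK[r[1]]):
        granted = min(want, pool_gb)  # take what's left, up to the request
        grants[name] = granted
        pool_gb -= granted
    return grants

reqs = [("erp", "mission-critical", 400),
        ("reports", "business-support", 300),
        ("billing", "business-critical", 200)]

# Only 700 GB available for 900 GB of requests: the business-support
# app absorbs the entire 200 GB shortfall.
print(allocate(reqs, 700))  # {'erp': 400, 'billing': 200, 'reports': 100}
```

A real scheduler would be far more nuanced, but the principle is the same: the policy tier, not the order of arrival, decides who feels the squeeze.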
Our expanded notion of application policy not only can be used to specify what’s required, but also how to prioritize this request against other competing requests for the same shared resources. Why? Resource constraints are a fact of life in any efficiently-run shared resource (or cloud, if you prefer).
Let’s take our idea a bit further. The other scarce resource we can prioritize using this scheme is “IT admin attention”. Since our business support application isn’t as critical as others, that implies that any errors or alarms associated with it aren’t as critical either.
What about the final situation — a “special event”, such as hardware failure or software upgrade? No surprise — lower priority.
Just to summarize: our notion of application policy not only addresses the resources and services desired at provisioning time, but also gives guidance on how to prioritize requests, how closely the application should be monitored, how tightly the environment needs to be controlled, etc.
All in one convenient, compact and machine-readable description that follows the application wherever it goes.
Now To The Other Side Of The Spectrum
Let’s see how this same policy-centric thinking can be applied to a mission-critical application.
Once again, our application policy specifies the resources and services needed (compute, memory, storage, bandwidth, data protection, security, etc.), but now we need to go in the other direction.
If this particular mission-critical application isn’t meeting its performance objectives, one policy recourse might be to issue a prioritized request for more resources — potentially at the expense of less-critical applications. Yes, life is unfair.
When it comes to critical services (e.g. high availability, disaster recovery, security, etc.) we’d want continual compliance checking to ensure that the requested services are in place, and functioning properly.
And, when we consider a “special event” (e.g. data center failure, update, etc.), we’d want to make sure our process and capabilities were iron-clad, e.g. no introducing new software components until testing has completed, and a back-out capability is in place.
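A policy gate for such a special event might look like the following sketch: refuse to proceed with an update unless testing has completed and a back-out capability is in place. The field names are hypothetical:

```python
# Hypothetical "special event" gate for a mission-critical app: the
# policy forbids introducing new software components until testing has
# completed and a back-out capability exists.
def may_proceed_with_update(event: dict) -> bool:
    return event.get("tests_passed") is True and event.get("backout_ready") is True

print(may_proceed_with_update({"tests_passed": True, "backout_ready": False}))  # False
print(may_proceed_with_update({"tests_passed": True, "backout_ready": True}))   # True
```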
But Isn’t That What We’re Doing Today?
Yes and no.
We tend to naturally think in terms of classes of services, prioritization, standard processes, etc. That’s typical. And, certainly, we're using individual policy-based management tools in isolated domains: security, networking, perhaps storage and so on.
With application-level policy, though, we don't have to limit ourselves to a few, impossibly broad policy buckets, like "business critical". We can precisely specify what each and every application needs, separately and independently.
It seems to be a truism that IT spends 80% of its time on 20% of the applications -- mostly because their requirements are unique.
Application policy can easily capture -- and automate -- the "exceptions" to standard buckets.
Taking The Next Step Forward — Hard, or Easy?
The previous transformation, from typical silo-oriented IT to ITaaS, has often proven difficult and painful. Changing IT's basic operating model demands strong, consistent leadership.
It's not just that new approaches have to be learned; it's that so much has to be unlearned.
And the ITaaS transformation isn’t just limited to the IT function. Not only does IT need to learn how to produce differently, the business also needs to learn how to consume (and pay for services) differently.
But for those who have already made this investment — and you know who you are — the next step to policy-based automation is comparatively easy. Indeed, in many ways it will be a natural progression, resulting from the need for continually improving and impactful automation.
To achieve this desirable outcome on a broader scale, there are more than a few hurdles to consider.
First, all participants and components in a policy-driven IT environment need to be able to react consistently to external policy.
This, in many ways, is software-defined in a nutshell. Indeed, when I'm asked "why software defined?" my knee-jerk response is "to better automate".
Servers need to react. Networks need to react. Storage needs to react. Data protection and security and everything else needs to react. All driven by policy.
Policy responses can’t be intrinsic to specific vendor devices or subsystems, accessed only using proprietary mechanisms. Consistency is essential. Without consistency, automatic workflows and policy pushes quickly become manual (or perhaps semi-automated), with productivity being inherently lost.
In larger enterprise environments, achieving even minimal consistency is no trivial task. Hence the motivation behind software-defined.
Second, serious process work is required to formally document actionable policies in machine-readable form. So much of IT operations is often tribal knowledge and accumulated experience.
As long as that knowledge lives in human brains — and isn’t in machine readable form — automation productivity will be hampered.
Third, the resulting IT organization will likely be structured differently than today, overweighted towards all aspects of process: process definition, process measurement, process improvement — just as you would find in non-IT environments that invest heavily in automation.
And any strategy that results in refashioning the org chart brings its own special challenges.
Creating That End State Goal
Picture a world where virtually all aspects of IT service management are driven by application-centric, machine-readable policies. Change the policy, change the behavior.
The underlying ideas are simple and powerful. They stand in direct contrast to how we’ve historically done IT operations — which is precisely what makes them so attractive.
And their adoption seems to be inevitable.