Climate change has been hotly debated for well over a decade now.
The evidence is largely in; what remains is a discussion around root causes, and what -- if anything -- can be done.
For me personally, the topic became particularly urgent when I saw a recent documentary ("Chasing Ice"), which features stunning time-lapse photography of massive glaciers and ice sheets melting before your very eyes.
Seeing is believing. Our world is changing, faster than we thought.
In many ways, the most recent IDC Digital Universe study provides clear evidence that our world is also changing from a physical one to a digital one, and much faster than we previously thought. In addition to attempting to quantify and forecast how much information we collectively generate and consume, it highlights particular "big picture" topics of concern to many.
Many of us now live our lives primarily in a digital world. We work digitally, socialize digitally and spend our off hours digitally.
Our children are being raised and educated largely in a digital world. Businesses are racing to re-invent themselves in this digital world, followed closely by the public sector. "IRL" (in real life) has started to become the exception rather than the norm.
Our world is changing, faster than we thought.
But unlike climate change, our digital world is mostly good for all involved -- a major leap forward for us as a society: richer lives, better answers to social problems, and -- above all -- easier access to the information riches the world now has to offer. Like all transformations, it presents its own unique challenges.
In many regards, I see the IDC study as useful and important -- it measures the "digital footprint" of this unprecedented transformation. And, as in past years, there's much more insight to be gained beyond just appreciating some really big numbers.
The Big Findings
This is the sixth year EMC has sponsored IDC in this study.
Each year, the previous forecasts prove to be far too conservative, and all the numbers are significantly revised upwards.
This year is no exception.
Last year, IDC forecasted that we'd generate 44x more information during this decade (2010 - 2020). This year, they've upped that forecast significantly, to 50x more information during the same time period.
The digital universe is expanding far more rapidly than we recently thought.
IDC now forecasts that we'll be generating 40 ZB (that's zettabytes, or trillions of gigabytes) by 2020. Last year's forecast for 2020 was 35 ZB. You'll also see charts later on that are denominated in EBs -- exabytes -- a thousand exabytes gets you a zettabyte. It's hard to visualize a number that large, and every year we struggle to do so.
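For those who like to check the arithmetic, here's a minimal sketch of what those forecasts imply on a per-year basis. It assumes (my assumption, not IDC's stated method) that the 50x factor and the 40 ZB figure share the same 2010 baseline, and that growth compounds smoothly over ten years. The variable names are purely illustrative.

```python
# Back-of-the-envelope check of the forecast figures quoted above.
# Assumption: the growth factor and the 2020 size share one 2010 baseline,
# and growth compounds evenly year over year.
GROWTH_FACTOR = 50.0   # this year's forecast growth, 2010 to 2020
PRIOR_FACTOR = 44.0    # last year's forecast for the same period
SIZE_2020_ZB = 40.0    # forecast size of the digital universe in 2020
YEARS = 10


def annual_growth_rate(factor: float, years: int) -> float:
    """Compound annual growth rate implied by a total growth factor."""
    return factor ** (1.0 / years) - 1.0


implied_2010_zb = SIZE_2020_ZB / GROWTH_FACTOR
rate_now = annual_growth_rate(GROWTH_FACTOR, YEARS)
rate_prior = annual_growth_rate(PRIOR_FACTOR, YEARS)

print(f"Implied 2010 baseline: {implied_2010_zb:.1f} ZB")
print(f"Implied annual growth at 50x: {rate_now:.1%}")
print(f"Implied annual growth at 44x: {rate_prior:.1%}")
```

Under those assumptions, the revision from 44x to 50x works out to a couple of percentage points of extra growth every single year, which is how a modest-sounding change compounds into another 5 ZB by 2020.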
If we use 64GB iPads as a unit of measure (something most people should be familiar with), that's something like 619 billion iPads -- a number that's still hard to comprehend.
No, Apple hasn't shipped that many -- yet.
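If you want to play with the units yourself, here's a rough sketch of the conversion. It assumes decimal (SI) units throughout, which is my choice rather than anything IDC specifies; the exact iPad count shifts depending on the unit convention and how much usable capacity you credit each device with, so treat the result as the same ballpark as the figure above rather than a correction of it.

```python
# Unit conversions for the forecast, using decimal (SI) definitions.
# Assumption: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes, 1 ZB = 10**21 bytes.
GB = 10**9
EB = 10**18
ZB = 10**21

forecast_bytes = 40 * ZB   # IDC's 2020 forecast from the text
ipad_bytes = 64 * GB       # nominal capacity of one 64GB iPad

eb_per_zb = ZB // EB
gb_per_zb = ZB // GB
ipads = forecast_bytes // ipad_bytes

print(f"Exabytes per zettabyte: {eb_per_zb:,}")
print(f"Gigabytes per zettabyte: {gb_per_zb:,}")
print(f"64GB iPads needed for 40 ZB: {ipads:,}")
```

With these definitions the count comes out at 625 billion, and a zettabyte is a thousand exabytes or a trillion gigabytes.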
Perhaps more accessible is when we divide that number by the human population, and arrive at an interesting result: by 2020, 1.7 MB of new information will be created for each and every human being on the planet -- every second of every day.
This all results in some rather dramatic charts. And we're just getting into the steep part.
Yes, much of that information today is considered transient and uninteresting -- and thus discarded or ignored. But IDC believes (as do we) that as big data concepts and business models grow, more of that data will be retained and analyzed to deliver value.
Macro Trends Behind The Number
One of the important findings from IDC is the expected rapid growth of machine-to-machine communications. IDC estimates that machine-to-machine data represented only 11% of the total in 2005, but will grow to 41% by 2020. Take that increasing proportion and factor in the size of the expanding digital universe itself, and it's not hard to predict a massive interest in storing and analyzing autonomous sensor data.
Geography is playing an increasingly important role.
IDC believes that growth rates may have stabilized in developed economies, but it's early days in emerging markets. IDC expects that a whopping 62% of the growth will come from these newer locales; for example 22% from China alone -- a fifth of the global total.
We tend to think in terms of information we personally create: pictures, videos, etc. IDC estimates that the amount of information created about us (vs. what we directly create) will be the larger proportion by far. Privacy advocates, please take note.
A very interesting finding: the vast majority of information generated will have some sort of "corporate touch", e.g. our personal photos uploaded to Instagram.
Even though it might be thought of as "our" information, some entity will inevitably be handling it on our behalf.
And as you consider that big, overlapping region between these two domains, you might get a bit uncomfortable that the whole notion of information ownership, rights, privileges, etc. is murky at best.
I know I'm uncomfortable.
Of course, there's the inevitable economics discussion: while unit costs for information storage and management continue to drop predictably, aggregate spending continues to grow and grow.
This is not a new phenomenon; it's been going on for at least twenty years, maybe more -- and shows no sign of abating.
The Rise Of The Machines
Not surprisingly, IDC has taken note of the rapid rise of machine-to-machine communications.
We are fast approaching a world where everything around us is potentially intelligent (compute, storage, etc.), meaning that the "internet of things" is not only real and tangible, but will likely generate far more information than we produce and consume as individuals.
But all of this machine-to-machine interaction will more likely happen at the edge, rather than at the core as is mostly the case today.
I think it's inevitable that we'll need new styles of computing paradigms in this new world, as I attempted to describe in "The Emergence Of Dispersed Clouds".
The Big Data Angle
I suppose that when talking about the impact of zettabytes (trillions of gigabytes), big data might be a good place to start.
IDC makes a very valid point: only a very small fraction of potentially useful information is even tagged -- the logical starting point for any big data discussion.
In 2012, IDC estimates that 23% of all data could have been potentially useful if tagged and analyzed. A scant 3% was tagged, and an even skinnier 0.5% was analyzed.
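To get a feel for the size of that shortfall, here's a small sketch that restates the 2012 figures relative to the potentially useful slice. One loud assumption: I'm reading all three percentages as shares of the total digital universe. If IDC meant the tagged and analyzed figures as shares of the useful data only, the ratios below would not apply.

```python
# 2012 figures from the text.
# Assumption: all three are expressed as shares of ALL data.
USEFUL_SHARE = 0.23     # potentially useful if tagged and analyzed
TAGGED_SHARE = 0.03     # actually tagged
ANALYZED_SHARE = 0.005  # actually analyzed

# Restate tagged and analyzed as fractions of the potentially useful data.
tagged_of_useful = TAGGED_SHARE / USEFUL_SHARE
analyzed_of_useful = ANALYZED_SHARE / USEFUL_SHARE

# Share of all data that is potentially useful but not yet analyzed.
unanalyzed_useful_share = USEFUL_SHARE - ANALYZED_SHARE

print(f"Tagged, as a share of useful data: {tagged_of_useful:.1%}")
print(f"Analyzed, as a share of useful data: {analyzed_of_useful:.1%}")
print(f"Useful but unanalyzed, as a share of all data: {unanalyzed_useful_share:.1%}")
```

On that reading, roughly one-eighth of the potentially useful data was tagged, and only about one-fiftieth was analyzed.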
In addition to the spectacular growth rates of the digital universe itself, IDC estimates that the proportion that could be potentially valuable -- if tagged and analyzed -- will grow from 23% to about a third.
Once again, we have two multipliers: the growing proportion of data that could potentially have value, set against the backdrop of the exploding volume of the digital universe itself. In the mining business, it all starts with exploration, and I guess digital mining is no different.
Along those lines, you'll often hear people say that "data is the new oil". This particular finding reminds me of the early days of the petroleum industry -- we first found oil where it was bubbling out of the ground, easy to find and gather.
Compare that with what we do today.
One last interesting finding -- IDC's view of the data types (and their relative proportions) that are amenable to value extraction if tagged and potentially analyzed. Look at the overwhelming proportion of surveillance data as part of the mix.
Intriguing -- and just a bit disconcerting.
IDC dubs this difference between potential and current practice the "big data gap" -- perhaps consider it a measure of unexploited value?
More of IDC's views here.
The Security Framework
So, the inevitable question is -- how much of all of this digital wealth needs to be protected (i.e. secured) in some fashion?
In addition to the overall growth of the digital universe, IDC believes that the proportion that will need some sort of protection will increase from less than a third in 2010 to more than 40% in 2020.
Going further, IDC asserts that only half of the information that needs protection is being protected (even minimally!) at all.
But there's more -- IDC came up with an interesting five-part framework for breaking down the problem from an analysis perspective.
- Privacy only — an email address on a YouTube upload
- Compliance driven — emails that might be discoverable in litigation or subject to retention rules
- Custodial — account information, a breach of which could lead to or aid in identity theft
- Confidential — information the originator wants to protect, such as trade secrets, customer lists, confidential memos, etc.
- Lockdown — information requiring the highest security, such as financial transactions, personnel files, medical records, military intelligence, etc.
Interesting, but as my colleague Eric Baize pointed out, there's a problem with the denominator in many cases. IDC is measuring capacity; but in the real world even a comparatively tiny amount of information (e.g. your social security number in the US) can represent a breach.
To the extent that we think of large content depots and capacity, I think it's a somewhat useful construct. More on IDC's views here.
The Enterprise Cloud Angle
All of these big numbers get particularly scary when you translate them into what it might mean for an enterprise IT function.
Here's some food for thought from our friends at IDC:
- The average number of servers (presumably virtual) under direct enterprise control is expected to increase 10x before the end of the decade.
- Information directly managed by enterprises is expected to grow a staggering 14x by 2020.
- The number of IT professionals is expected to grow by only a factor of 1.5x by 2020.
If that isn't a rationale to invest in a fundamentally different IT model (e.g. ITaaS), I don't know what is. Certainly, the scale being predicted will almost inevitably force a change in approach.
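To put rough numbers on that mismatch, here's a quick sketch that divides the workload growth factors from the list above by the staffing growth factor. It assumes the three multipliers cover the same time period and the same population of enterprises, which the source doesn't spell out; the function name is just illustrative.

```python
# Growth multipliers from the list above (through 2020).
# Assumption: all three apply to the same period and the same enterprises.
SERVER_GROWTH = 10.0  # servers under direct enterprise control
INFO_GROWTH = 14.0    # information directly managed by enterprises
STAFF_GROWTH = 1.5    # number of IT professionals


def load_per_professional(workload_growth: float, staff_growth: float) -> float:
    """How much more each IT professional must handle, relative to today."""
    return workload_growth / staff_growth


servers_per_pro = load_per_professional(SERVER_GROWTH, STAFF_GROWTH)
info_per_pro = load_per_professional(INFO_GROWTH, STAFF_GROWTH)

print(f"Servers per IT professional: {servers_per_pro:.1f}x today's level")
print(f"Information per IT professional: {info_per_pro:.1f}x today's level")
```

In other words, each IT professional would be looking after nearly seven times as many servers and more than nine times as much information as today.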
More pragmatically, the outlook is good for those of us in the IT infrastructure business:
The investment in spending on IT hardware, software, services, telecommunications and staff that could be considered the “infrastructure” of the digital universe and telecommunications will grow by 40% between 2012 and 2020. Investment in targeted areas like storage management, security, Big Data, and cloud computing will grow considerably faster.
All good ...
Controversial Findings About Cloud Storage?
One of the eye-openers in the IDC material is their estimate about how much data will be stored in the cloud, and what type. You may choose to argue, but it's certainly an interesting perspective.
For starters, they estimate that by 2020 only a paltry 13% of all data will be stored in a cloud of some sort. A larger proportion will be "touched" by the cloud (e.g. processed or transmitted) but not stored there.
That's not exactly the consensus of the current trade press, so it bears some discussion -- although 13% is up significantly from the very small percentage stored there today, and of course 13% of 40ZB is itself a non-trivial number.
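Just how non-trivial? A quick sketch, assuming the 13% applies directly to the 40 ZB forecast for 2020 and using decimal units (a thousand exabytes to the zettabyte):

```python
# Cloud-stored share of the 2020 digital universe, per the figures in the text.
# Assumption: the 13% share applies directly to the 40 ZB forecast.
TOTAL_2020_ZB = 40.0
CLOUD_STORED_SHARE = 0.13
EB_PER_ZB = 1000

cloud_stored_zb = TOTAL_2020_ZB * CLOUD_STORED_SHARE
cloud_stored_eb = cloud_stored_zb * EB_PER_ZB

print(f"Stored in the cloud by 2020: {cloud_stored_zb:.1f} ZB")
print(f"That is about {cloud_stored_eb:,.0f} EB")
```

That comes to a bit over 5 ZB, which is more than six times the 2010 baseline implied by the 50x growth figure.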
Even more provocative are their estimates around what kind of data will be stored there: primarily surveillance video and entertainment media (e.g. movies). Crack open the "embedded and medical" category, and I bet you'll find a preponderance of stuff from the radiology department ...
Look closely -- you'll see almost a complete absence of the normal enterprise-type data we all know and love so dearly. Maybe the slice of that pie is so small we can't see it?
Now, I'm not saying that I 100% agree with this assessment, but it does have a certain logic. Video files can be enormous, especially HD and emerging 3D. Video tends to need to be kept around at low cost, but still accessible on demand -- hence its applicability to cloud-type storage environments.
It'd be interesting to see what a capacity map of AWS looks like along these lines :)
A Really, Really Big Picture?
If you're interested in these macro trends, I'd encourage you to view the entire IDC report. It's good reading. I also recorded a short video with the key findings here.
The hard part with these IDC studies can be getting your head wrapped around some really big numbers, but perhaps that's missing the real point.
We're rapidly becoming a completely digital society -- faster than anyone might have thought. In one sense, IDC is just gathering the forensic evidence and extrapolating what we already strongly suspect.
I believe the concept of a "big data gap" is a valid one; we have barely begun to exploit the value of the data that we're so prodigiously generating. Ditto with the "security gap" -- we're generating far more information, but are we adequately protecting it?
And, of course, it doesn't hurt to be in the storage business, does it?
:)
