Today IDC and EMC released their annual study on the size, shape and structure of the "digital universe": the total amount of information we're collectively generating, storing and using.
Titled "A Digital Universe Decade: Are You Ready?", it goes beyond the usual really-big-number type of forecasts to provoke serious discussion on a number of topics.
And if you're a big-picture type of person, you'll want to seriously contemplate some of these findings ...
To Begin With ...
This is the 4th annual study by IDC and EMC. Each year, the numbers get bigger. Each year, the implications get more interesting. Each year, more people get intrigued by what's going on here.
There's a nice deck that was created as part of this. I'll put the thumbnails here along with my thoughts, and offer you a link to both the PPT and the associated IDC commentary at the end of this post.
Might be useful to use a few of these slides the next time you're going for budget approval :-)
The first slide reprises a previous theme -- information is growing and growing.
The "ticker" is a useful device to put in front of people during a presentation -- after a few seconds of watching the numbers spin frantically, people usually get the message.
What I found interesting is that -- even in a year where the financial economy struggled -- the information economy never really slowed down.
In a year of declining GDPs, we generated an additional 62% of information to add to our existing treasure trove.
So, if that's what happens in a bad economic year, what happens in a good economic year?
And people wonder why I decided to work for a storage company way back when :-)
Welcome to the "zettabyte age". We didn't need that word last year, but we need it this year.
A zettabyte is a trillion gigabytes.
The press materials try to translate this number into something people can easily comprehend, but it's a hard task.
One comparison was "imagine 75 billion 16 GB iPads". I don't know about you, but that's hard to imagine as well.
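If you'd rather check the units than imagine the iPads, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the variable names are illustrative and not taken from the study.

```python
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes and 1 ZB = 10**21 bytes.
GB = 10**9
ZB = 10**21

# How many gigabytes fit in one zettabyte (a trillion, 10**12).
gigabytes_per_zettabyte = ZB // GB

# The press comparison: 75 billion iPads at 16 GB each.
ipad_total_bytes = 75 * 10**9 * 16 * GB
ipad_total_zb = ipad_total_bytes / ZB

print(f"GB per ZB: {gigabytes_per_zettabyte:,}")
print(f"75 billion 16 GB iPads hold about {ipad_total_zb:.1f} ZB")
```

Under those assumptions, the iPad comparison works out to a bit more than one zettabyte, which is consistent with the "zettabyte age" framing.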
Note the "44x" growth factor between now and 2020. For those of you who are thinking "well, that's really far in the future", maybe it is, and maybe it isn't.
I can remember clearly what I was doing in 2000 and 2001. It's now 2010. Not hard to see how quickly 2020 will be upon us.
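To make the "44x" figure a little less abstract, it can be turned into an equivalent yearly growth rate. This is only a rough sketch: it assumes growth compounds at a constant rate, and it assumes a 10-year window (2010 to 2020), which is my reading of "between now and 2020" rather than a number stated in the study. The function name is hypothetical.

```python
def annual_growth_rate(total_factor: float, years: int) -> float:
    """Constant yearly rate r such that (1 + r) ** years == total_factor."""
    return total_factor ** (1 / years) - 1


# Assumption: the 44x growth is spread over 10 years (2010 to 2020).
rate = annual_growth_rate(44, 10)
print(f"44x over 10 years is roughly {rate:.0%} growth per year")
```

Changing the `years` argument shows how sensitive the yearly figure is to the exact window you assume.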
So, you might be wondering, what's the "cloud" angle on all of this, in the sense of external IT services?
IDC predicts that more than a third of all information that's created, copied and used will either be stored in the cloud, or pass through the cloud in some form.
When I share this finding with people, most see it as conservative. I guess it depends on your definition of cloud, doesn't it?
But the conclusion is clear -- there's going to be a lot of information that doesn't live in traditional locations going forward.
We'll come back to the cloud angle in a bit -- because one could make a case that several other forces may cause this to happen sooner rather than later.
So, let's talk about how this lands in IT land. And this is where it gets interesting.
I found this slide perhaps the most interesting one.
In the middle, we've got the blue line -- showing 44x growth. Obviously, it won't be as linear as shown here, but you get the idea.
Now, look at the green line. That's the growth in the number of "information containers" -- files, objects, messages, etc.
That's predicted to grow 67x. All sorts of implications result from this.
First, we're starting to generate shorter "information packets" -- think tweets, smartgrid data, GPS coordinates, RFID messages and the like. We'll still undoubtedly have plenty of the big stuff around, such as video and other digital signals, but the 67x growth in information objects points to the need for newer ways of organizing, managing, protecting, finding and sharing all these information objects.
These numbers will be measured in the quintillions (billions, trillions, quadrillions, etc.). Now, consider how we're doing this today: file systems, databases, etc. See the challenge?
Now, if you really want to have fun, go look at the bottom red line. That's the forecast that we'll only have 1.4x as many IT professionals available over this period of time. The implication of the study is that this will be supply constrained, rather than demand constrained.
If you're a career IT professional, you'll either see this as a terrible crisis, or a wonderful opportunity :-)
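The three lines on that slide can be combined into a back-of-the-envelope ratio. The sketch below uses only the growth factors quoted above (44x, 67x and 1.4x); the variable names are illustrative, and dividing one factor by another is my simplification, not something the study does.

```python
# Growth factors quoted from the slide.
info_growth = 44.0        # blue line: total information
container_growth = 67.0   # green line: files, objects, messages, etc.
staff_growth = 1.4        # red line: IT professionals

# Growth in workload per IT professional, relative to today.
info_per_pro = info_growth / staff_growth
containers_per_pro = container_growth / staff_growth

# Average container size relative to today (below 1.0 means smaller objects).
avg_container_size = info_growth / container_growth

print(f"Information per IT professional: about {info_per_pro:.0f}x today's")
print(f"Containers per IT professional: about {containers_per_pro:.0f}x today's")
print(f"Average container size: about {avg_container_size:.2f}x today's")
```

Taken at face value, each IT professional would be looking after dozens of times more information than today, spread across objects that are smaller on average.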
In my mind, this imbalance will likely create strong demand for specialized external service providers (think "cloud") that will perform information management services on behalf of other organizations.
Which brings us to our next topic -- ownership and responsibility for all this information. And the overlaps are getting larger and more complex.
This chart tries to illustrate the problem. Years ago, we used to think in terms of "user generated" information and "enterprise generated" information as distinct entities with very little overlap.
My, how the picture has changed.
Enterprise-generated information is becoming much less important. User-generated information is starting to dominate.
Very often, that information is handed over to an enterprise (e.g. banks, hospitals, etc.) to store and manage on the individuals' behalf -- and with very clear guidelines around accountability, ownership, etc.
But look at the center overlap -- the aqua blue. That's the growing area where it's not really clear who owns what -- is it the user, or the enterprise? And if you're increasingly uncomfortable with Google or Facebook handling your personal information, I'd offer the "overlap" is where we're going to see growing controversy.
As an example of this "overlap content", consider the increasing use of social media by many organizations (including, of course, EMC -- and this blog you're reading!).
Who owns this content -- me, or my company? Who's responsible for this content -- me, or my company?
If I should be so lucky to come up with some really cool intellectual property on this blog, is it mine, or my company's?
I don't think that *anyone* has clear answers to any of this -- but social media has unleashed an "overlap content" information beast that puts all sorts of interesting questions on the table.
And that's just the tip of the iceberg -- isn't it?
Which brings us to yet another sobering conclusion: the proportion of information that will require some form of "security" is increasing.
Not only will there be 67x more information objects to deal with, but close to half of all that information will come with some sort of responsibility, up from approximately 30% today.
One interpretation is information is increasing in value, hence that value will need to be guarded in some form.
This also can be seen as a forcing function on how we think about this topic -- newer forms of information governance, risk assessment, measuring compliance, etc. -- not to mention the need for new technological approaches.
And if you're involved in some aspect of information security today, your prospects seem bright in terms of demand for your skills :-)
In addition to a growing "security gap", IDC forecasts that the "protection gap" will also grow.
If information is valuable, you don't want to lose it -- just like you don't want to lose money.
If you take the forecasts at face value, roughly half the digital universe will be inadequately protected against loss.
Another reason to start thinking in terms of information governance, classifying information, etc. -- that is, unless you're happy with the prospect of having a growing number of really bad days -- or spending an inordinate amount of money.
And now for something stunningly obvious, once you think about it ...
The blue curve estimates a rough cost-per-gigabyte number over the next ten years.
Good news, right? -- the cost of storing information is coming down.
Until you consider the amount of information being created.
Make something cheaper, and people tend to use more of it. If you build more highways, you'll end up with more traffic.
In fact, one could argue that the rapidly declining costs of storage, network, compute, etc. are fundamentally enabling the information explosion.
A sobering thought for those of you obsessed with driving costs out of IT. When IT is cheaper, people will end up using more of it, often with positive elasticity.
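Here is a rough sketch of why cheaper gigabytes don't automatically mean a smaller bill. It assumes total storage spend is simply volume times cost per gigabyte, which ignores a great deal. The only number taken from the study is the 44x growth in volume; the per-gigabyte cost ratios are hypothetical values chosen purely for illustration.

```python
def relative_spend(volume_growth: float, cost_per_gb_ratio: float) -> float:
    """Total spend relative to today, assuming spend = volume * cost per GB."""
    return volume_growth * cost_per_gb_ratio


VOLUME_GROWTH = 44.0  # from the study: 44x more information

# Cost-per-GB ratio at which total spend stays flat.
break_even_ratio = 1 / VOLUME_GROWTH

# Hypothetical cost ratios (future cost per GB / today's cost per GB).
for ratio in (0.5, 0.1, break_even_ratio, 0.01):
    spend = relative_spend(VOLUME_GROWTH, ratio)
    print(f"cost per GB at {ratio:.3f}x today -> total spend {spend:.2f}x today")
```

In this simplified model, the cost per gigabyte has to fall to about 1/44th of today's level just to keep total spend flat; anything less than that and the bill goes up.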
Final Thoughts
There's a lot to consider and debate here. Whether you agree or disagree with the study -- its purpose, methodology, findings, conclusions, etc. -- there is no arguing that something is happening, and it's happening very quickly.
As promised, here are the PPT slides, and a PDF with commentary from the IDC researchers.
Have fun :-)

Great post, Chuck!
This study really opens our eyes to the information storage and management issues we will face in the future.
Thanks,
Gil
Posted by: Gil | May 05, 2010 at 03:55 AM
Great blog, Chuck. Was there anything in the study about the amount of data that was unintentionally lost in 2009?
Posted by: Bob | May 05, 2010 at 11:30 AM
Hi Bob -- insightful question ...
The answer is "no, there wasn't" -- but I think that would be a great area to investigate for next year's survey.
Thanks again!
-- Chuck
Posted by: Chuck Hollis | May 05, 2010 at 12:34 PM
An interesting factor, and one that should provide opportunities for product differentiation in the consumer space, is that it is becoming more difficult for individuals to manage their information as well. Iomega and others sell lots of storage, but I haven't yet seen a product that helps manage all that data to maximize the accessibility and utility of it. Users need tools to do things like help manage the right level of protection and help them find and use the information they need when they need it.
An example of this in my house is digital photos. I have a camera, my wife has a different one, and together we have four computers. Just making sure that I have offloaded the cameras into some semblance of a useful archive itself is a nuisance.
The opportunity I see in this is ILM for consumers. Instead of just providing storage space, people need layered apps that help organize and protect data. You can find examples of data-specific tools like Adobe Lightroom, which does some of this for photos, but there need to be tools that are both focused in their role while at the same time more general in their applicability.
Then, once the user's aggregation of data is managed as an accessible archive, we open up additional opportunities to add value. Such as my Iomega NAS device providing a web or TV based image and video browser, that provides simplified paths to upload content to sites like Facebook, or to Kodak for printing and mailing to grandparents.
In short I think the growth of data is extending the need for something like ILM to consumers, and that provides opportunities for us.
Posted by: Tim | May 05, 2010 at 01:22 PM