I'm on a theme here, looking for broader societal impacts that result from a frighteningly rapid shift to an information society.
I've talked about the new role of IT (information governance). I've talked about privacy and identity theft issues. I've talked about the new knowledge worker, and how we'll all use information differently.
I've even talked about how we're educating our kids.
And now, I'm going to dig into a frustrating issue around preserving our digital legacy.
It's A Natural Tendency
We, as a society, want to preserve our legacy for future generations.
Humankind historically has produced an enormous amount of content (usually written) for the consideration and consumption of future generations.
Sometimes our efforts don't turn out so well ...
Here is what the preserved object -- a 1957 Belvedere -- looked like going in ...
Nice, pristine car. Lots of fanfare.
Notice the nice labelling on the picture -- you can tell from the metadata what it's all about.
Plenty of context. Picture looks very legible as well -- no digital media back then!
and here's what it looked like coming out ...
Not quite as pretty a picture.
The car has been exposed to water seepage for probably close to 50 years. As a result, it's acquired a golden, deep-fried appearance that's actually rust.
Unless I told you what this was, you'd have a hard time figuring out what you were looking at, or what it meant.
Detroit's best didn't stand the test of time, that much is clear.
But the more interesting question is -- how will the digital picture hold up in the long term?
Our Legacy Is Now Our Information
What we do, who we are, what we stand for -- is now largely generated and captured as digital artifacts. When future generations want to understand us better, they'll be looking for digital information.
And I'm worried that they're going to be awfully disappointed.
Just like the proud owner of the car.
The Media Problem
I'm no expert, but I can't name a digital medium that's capable of reliably surviving 10, 20 or 50+ years.
I'm sure the technologists can argue this one in theory, but we won't really know if a CD, DVD, etc. will be physically readable until, well, a whole bunch of time has passed.
The only digital medium that I think we have any long-term experience with is tape. And as most tape practitioners will tell you, it's best practice to periodically copy the data off the old tape and onto new media.
So even that's an imperfect solution.
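For the curious, here's a rough sketch of what that periodic "copy off, copy on" routine might look like if you also check that nothing got damaged along the way. This is my own illustration, not a description of any particular product: the function names `sha256_of` and `migrate_file` are made up, and I'm assuming a SHA-256 checksum is an acceptable way to compare the old and new copies.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source, destination):
    """Copy one file to new media and verify the copy (hypothetical helper).

    Raises IOError if the checksum of the new copy does not match the
    checksum of the original. Returns the checksum so it can be recorded
    and re-checked at the next migration.
    """
    source = Path(source)
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)

    before = sha256_of(source)
    shutil.copy2(source, destination)  # copies contents and timestamps
    after = sha256_of(destination)

    if before != after:
        raise IOError(f"checksum mismatch while copying {source}")
    return before
```

Keeping the returned checksum alongside the file is the useful part: the next time you migrate, you can tell whether the "old" copy had already decayed before you touched it.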
The only information storage medium that I'd trust for really long-term storage is printouts on acid-free paper stored in an environmentally controlled locker.
And then I'd also want LOCKS as well -- that's an acronym for Lots Of Copies Kept Separate.
Sometimes I think that the only reason that we know so much about ancient cultures is that they tended to use carved stone as media. Something to think about.
The Format Problem
As an interesting exercise, think back to the computer programs you were using 20 years ago. Would your word processing files be readable? Your spreadsheets? Your databases?
Although application vendors do a decent job of trying to be backwards compatible, how does this play out over the longer term? What about after 50 years, or 100?
Now, there are information representation standards that we can use. Flat ASCII. EBCDIC -- oops, maybe not. BCD -- Binary Coded Decimal -- oops again! CSV -- comma-separated values.
And there are some that will argue that XML and its brethren will give us the ability to tag and find information in an application neutral way for much longer periods of time.
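To make the "application neutral" idea a bit more concrete, here's a small sketch that writes the same records out as plain CSV and as simple tagged XML, using nothing but Python's standard library. It's only an illustration under my own assumptions: the helper names `to_csv` and `to_xml` are invented, and the XML version assumes every field name is also a valid XML tag name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(records, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records, root_tag="records", item_tag="record"):
    """Serialize a list of dicts as simple tagged XML text.

    Assumes each dict key is usable as an XML element name.
    """
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    cars = [{"model": "Belvedere", "year": 1957}]
    print(to_csv(cars, fieldnames=["model", "year"]))
    print(to_xml(cars))
```

Either output can be opened in a plain text editor, which is the whole point: a future reader doesn't need the original application, just the ability to read characters and guess at what the labels mean.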
But over really long periods of time, there are such things as semantic and language drift. Words come to mean different things. Maybe you don't notice that in 20 years, but it's certainly noticeable over 200 years.
Regardless, I don't think there's anyone who'll bet that information digitally encoded in any format will be readable in 20, 50, 100 years or more.
And it's a hard bet to prove one way or another, because you have to wait a really long time to find out if you've won that bet or not.
Of course, you could print everything out on suitable physical media (like pictograms on stone tablets) but then again ...
Who's In Charge?
In the physical world, we have social institutions that keep copies of information, preserve them, and make them available to future generations. In the US, the Library of Congress is a good example. There are lots of these around.
But their model (and their mission) was designed for a physical world, and not an information world.
Funny stories keep cropping up all the time -- like the one where NASA lost the original footage of the 1969 moonwalk, and eventually a producer for Pink Floyd came to the rescue with a copy he'd snagged way back when.
Funny, but I bet it's not the last time we hear about something like that.
As we look at the events that are unfolding around us, they're captured digitally. But who's in charge of keeping them around?
The media companies? Maybe, maybe not.
The government? I don't think so.
Universities? Kind of an ad-hoc approach.
Or will the great archive of our generation end up being YouTube?
There's another, more subtle problem -- who gets to decide what gets kept, and what gets thrown out? Who gets to edit history for future generations? Kind of a sobering thought.
Storage resources (and money to pay for them) are finite; information generation seems to be playing by a different set of rules. And if you're going to have to copy things around periodically to keep from losing them, that's expensive as well. At some point, someone's going to decide what gets kept, and what doesn't.
Another thorny question we haven't answered in the new information era.
So, What Is EMC Doing About This?
We're doing plenty, but I don't know if it's going to be enough.
We're encouraging the media industry to come up with bulk, persistent storage that's cost effective, doesn't consume energy, and stands a chance of being machine readable in 50 years. They build it, we'll buy it. So far, no comers.
We're participating in at least a dozen standards groups that are looking at different aspects of the persistent format problem.
God must have loved standards committees, because he created so many of them.
We've even established a charitable foundation to fund information preservation projects in communities and academia. Not a lot of uptake yet.
But it doesn't feel like enough.
A Societal Problem, Not A Technology Problem
Do we want our legacy to be available to future generations? If we do, and we commit resources to it, I'm sure workable answers will be found.
Or will our information legacy end up being as usable as the buried car?

Great post - Technology Review had a great, in-depth article on this topic a few years ago:
http://www.technologyreview.com/Infotech/12975/?a=f
Posted by: Michael Toppa | September 27, 2007 at 07:06 AM
Thanks -- nice link and a good read!!
Posted by: Chuck Hollis | September 27, 2007 at 07:46 AM