Please note: All files marked with a copyright notice are subject to normal copyright restrictions. These files may, however, be downloaded for personal use. Electronically distributed texts may easily be corrupted, whether deliberately or through technical failure. When you base other works on such texts, double-check with a printed source if possible.

These three articles were previously published in the Swedish daily Svenska Dagbladet in 1994:

I. The electronic age - on the verge of total memory loss?
II. The hacker - archaeologist of the future?
III. Dare we trust the authenticity of electronic texts?

II. The hacker - archaeologist of the future?

by Karl-Erik Tallmo

An increasing amount of the world's literature is now being transferred to digital media and stored on computer networks and CD-ROM discs. We are heading for an Infotopia of the kind envisioned by, for instance, Vice President Al Gore: anyone equipped with a computer will have access to the knowledge of the world.

But electronic publishing is also a means of preserving old information that would otherwise be destroyed. Computer technology, however, develops very fast. Will anybody be able to read our files a hundred years from now?

A representative of a large multinational computer company told me that if we can interpret ancient papyrus scrolls today, we will certainly be able to read everything created now. Another representative of the same company claimed that computer files have the best standardization there is - zeroes and ones.

But that is not the point. Of course some future hacker with an inclination towards cyber-archaeology will gladly spend weeks reconstructing old computer files. But what about the much-touted easy access for everybody? What about having the unlimited knowledge of the world only a mouse click away?

Monica Ertel at Apple Computer's library in Cupertino, California, does not trust the zeroes and ones: "We keep one of each computer ever manufactured by Apple up and running because sometimes someone needs to read that old Visicalc file or LisaDraw document. It's a little frightening. Paper is so universal and easy."

Much is being done today about standardization across different computer platforms, countries, etc. It will be possible to transfer files and port applications between Macintosh, Windows and UNIX environments. One of the latest fashionable words in the business is "digital convergence," which refers to a fusion of computers, telecommunications, cable TV and eventually - Hollywood. As for standards, the acronyms are innumerable and keep multiplying: SGML, IGES, ODA, CDA, RTF, TeX, PREMO, PREGO... Unfortunately, a great deal of corporate egotism is involved here: each company wants to dominate the market with its own proprietary standard, which delays the emergence of a true, functioning standard.

SGML (Standard Generalized Markup Language) is an ISO standard that makes it possible to describe the structure of an electronic document - words in italics, hypertext links, the indexing of a whole book, or the markup of pictures along with descriptions of their content.
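A small, purely hypothetical fragment may illustrate the principle. The element names below are invented for this example; a real application would define its own in a so-called document type definition (DTD):

    <chapter>
    <title>The Hacker</title>
    <para>An <emph>increasing</emph> amount of the world's
    literature is now stored in digital form; see
    <xref refid="part3">the third article</xref>.</para>
    </chapter>

The tags record what each piece of text is - a title, an emphasis, a cross-reference - not how any particular program should display it. Any future program that can parse the markup can present the document anew, regardless of what software originally produced it.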

"The chief objective of SGML has been to return control of the information representation to the user - not the product vendor," says the inventor of SGML, Charles Goldfarb at IBM. "Every time a word processor vendor introduces an `improved' update with a `slightly incompatible' file format, more information becomes inaccessible."

Steven Newcomb, chairman of the SGML Special Interest Group in Tallahassee, Florida, claims that some companies use SGML internally but refuse to provide their customers with SGML tools:

"Microsoft reportedly uses SGML internally for a variety of purposes, including the creation of its much-heralded Cinemania CD-ROM product. One can only wonder how Microsoft expects its customers to use Microsoft's non-SGML tools to create similar products," Newcomb says.

Nevertheless, Carl Fleischhauer, coordinator at the Library of Congress project American Memory, thinks that SGML is gaining ground: "I have the impression that SGML is coming into more widespread use in commercial publishing. And it may be that reforms will come from the bottom up rather than from the top down," Fleischhauer says.

SGML is a very promising standard, but is it not, after all, a solution valid mostly for present-day technology? What about standardization over time?

"The adoption of standards will not help, because this problem goes beyond standards," says Don Norman, professor of Cognitive Science at the University of California at San Diego, also adviser for Apple Computer. "Standards work well within a technology whereas this problem occurs because the technology itself changes, bringing in new technologies and making obsolete the old."

The publishing houses and the computer companies will probably update their most profitable electronic books (e.g. encyclopedias and financial records) and continually transfer them to new digital formats. But a very large portion of electronically published works will probably fall into oblivion.

"I think most of the projects will be recovery oriented, rather than use routine conversion as a maintenance operation, so a lot of less important information may not be moved to new formats," says Steve Cisler at Apple's project Library of Tomorrow.

Those titles will thereby be practically lost. In the physical world you can stumble upon an interesting forgotten book on some dusty shelf. But a forgotten electronic book in an unknown technical format leaves no room for such serendipity.

If retroactive recovery is the only economically feasible method, shouldn't it be our duty today to help the future with this process by inventing special "embedded reconstruction code sets" or some other means of telling future machines how our files ought to be read?
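What such a code set would look like is an open question. Purely as a thought experiment - every element name and detail below is invented, not an existing standard - it might be a small self-describing preamble stored with each file, stating in the plainest possible terms how the rest of the bytes are to be decoded:

    <!-- hypothetical reconstruction preamble - not a real standard -->
    <recoveryinfo>
    <format>WordProcessorX 3.0 for Macintosh System 7</format>
    <encoding>8-bit MacRoman text, big-endian integers</encoding>
    <hint>Body text begins at byte 512; style codes follow the text.</hint>
    </recoveryinfo>

A future archaeologist who could read nothing but plain text would then at least know what kind of machinery to emulate.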

Maybe future libraries will be equipped with super-computers that can emulate or simulate various sorts of antique hardware and software. If so, this emulation process should start here and now, with discussion, planning and guidelines for developers. It would be a real paradox if today's high-tech culture were to leave a smaller heritage behind than other historical periods have.

But Steven Newcomb objects: "It is the preservation of information, and the ability to access that information in any conceivable way, which is important. The ability to reproduce exactly the application that was originally used to gain access to the information is much less important."

The paramount issue is, of course, to preserve archival material for the future in any form whatsoever. We are fortunate today that, for instance, some musical notation from the Middle Ages has survived and some Elizabethan plays have been preserved. We are not familiar, in detail, with the conventions and rules that guided performance in those days, but at least we have access to the raw material the musicians and actors used. In olden days, a work that combined multiple art forms (for instance early opera) was impossible to preserve in terms of its execution. Today, we should be able to do that.

It is something of a paradox that short-lived PR material, presentations and the like can "afford" to use the latest technology, whereas anyone producing literary or educational material meant to last for decades has to settle for some sort of lowest common denominator of technological complexity.

It would be very tragic if the cultural field of view were to grow ever narrower. Today we can, perhaps with a little help, read Chaucer. Without any special deciphering devices we can read text that is 600 years old! But in the year 2600, will people be able to read (watch and listen to) anything electronically published that is more than 10 or 20 years old?

International authorities concerned with nuclear safety deal with a related problem: how can we inform future generations about where nuclear waste repositories are situated and what they contain? This is not a question of hundreds of years, but of thousands of years, so far into the future that nobody even knows if any language spoken today will be understood then.

If we refocus this discussion from the languages of man to the languages of machines, the problem arises much earlier. Only 50 years from now, there may not be a single machine that uses any of the languages machines use today.

Hopefully this is an exaggeration, for it would be a catastrophe if future generations were deprived of history. Pedagogy and school systems may change, and a generation or two may get an unsatisfactory education, but if the sources of knowledge about our civilization perish, the loss will be impossible to recover.

"No one really seems to be aware and the people who are, just don't know what to do," says Mark Needleman at the library of University of California at Oakland. "People are worried that as publishers produce more material electronically they won't be interested in preserving it past a point where it has economic value."

"I recommend, even in this age of digital media, that we keep important material archived on paper," says professor Don Norman.

Steven Newcomb suggests that the owners of information delivered today insist on keeping it in their own vaults in a canonical, technology-neutral form: "That way, when new technologies come along, their investment is protected, and can be exploited all over again using the new technologies." But a production combining text, sound and moving pictures in a particular way is hard to keep in a technology-neutral form. Mark Needleman again:

"If you take the attitude that the culture and history of the 20th century is as much if not more in things like movies and TV than in print material we have a very serious problem."

From an economic standpoint, perpetual conversion is a very expensive strategy. Such procedures are already in full progress, for instance to preserve nitrate film or books printed on acidic paper. All sorts of material are being microfilmed, retyped, scanned and photocopied. If it were possible to carry out one single conversion to a digital medium, with some kind of coded information regarding the hardware needed for future reading, millions of man-years of work could be spared.

Much of the trivia that today's researcher can indulge in will probably not be at hand for future scholars. Today you can read loose notes made by Strindberg or pore over George Washington's laundry lists. In a few years everyone may be carrying around electronic notebooks. No scraps of paper there for posterity.

It will also be difficult for researchers to follow an author's work in progress through successive drafts and outlines, since computerized writers leave very few traces.

Some commentators wish for extensive legislation to regulate the legal deposit of copies of network conferences and electronic mail. Apart from being a possible invasion of privacy, such an undertaking would be almost unfeasible. E-mail is now exchanged within and between at least 140 countries, and the number of transmitted messages increases by 15 percent each month.

Maybe we must accept that each era has its own form of amnesia. The calculations Archimedes drew in the sand are irretrievably lost. Certain historic paintings have endured the passage of time only as copperplate reproductions.

But a whole epoch must not be deleted - neither from computer memory nor from human memory. Especially not an epoch that has turned information into its cardinal virtue.


Copyright Karl-Erik Tallmo 1993, 1994.