PRESERVING DIGITAL INFORMATION

DRAFT REPORT OF THE TASK FORCE ON ARCHIVING DIGITAL INFORMATION

COMMISSIONED BY THE COMMISSION ON PRESERVATION AND ACCESS AND THE RESEARCH LIBRARIES GROUP

VERSION 1.0
AUGUST 23, 1995

EXECUTIVE SUMMARY

In December 1994, the Commission on Preservation and Access and the Research Libraries Group created the Task Force on Digital Archiving. The purpose of the Task Force is to investigate the means of ensuring "continued access indefinitely into the future of records stored in digital electronic form." Composed of individuals drawn from industry, museums, archives and libraries, publishers, scholarly societies and government, the Task Force was charged specifically to:

* "Frame the key problems (organizational, technological, legal, economic etc.) that need to be resolved for technology refreshing to be considered an acceptable approach to ensuring continuing access to electronic digital records indefinitely into the future.
* "Define the critical issues that inhibit resolution of each identified problem.
* "For each issue, recommend actions to remove the issue from the list.
* "Consider alternatives to technology refreshing.
* "Make other generic recommendations as appropriate" (see Appendix 1 for the full charge).

The document before you is a work in progress resulting from the initial deliberations of the Task Force. The Task Force invites you to contribute to its final report by commenting on this work in progress (see below).

In taking up its charge, the Task Force on Archiving of Digital Information focused on materials already in digital form and recognized the need to protect against both media deterioration and technological obsolescence. It started from the premise that migration is a broader and richer concept than "refreshing" for identifying the range of options for digital preservation.
Migration is a set of organized tasks designed to achieve the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to retain the ability to display, retrieve, manipulate and use digital information in the face of constantly changing technology. The Task Force regards migration as an essential function of digital archives. The Task Force envisions the development of a national system of digital archives, which it defines as repositories of digital information that are collectively responsible for the long-term accessibility of the nation's social, economic, cultural and intellectual heritage instantiated in digital form. Digital archives are distinct from digital libraries in the sense that digital libraries are repositories that collect and provide access to digital information, but may or may not provide for the long-term storage and access of that information. The Task Force has deliberately taken a functional approach in these critical definitions and in its general treatment of digital preservation so as not to prejudge the question of institutional structure. The Task Force sees repositories of digital information as held together in a national archival system primarily through the operation of two essential mechanisms. First, repositories claiming to serve an archival function must be able to prove that they are who they say they are by meeting or exceeding the standards and criteria of an independently-administered program for archival certification. Second, certified archives will have available to them a critical fail-safe mechanism. Such a mechanism, supported by organizational will, economic means and legal right, would enable a certified archival repository to exercise an aggressive rescue function to save culturally significant digital information. 
Without the operation of a formal certification program and a fail-safe mechanism, preservation of the nation's cultural heritage in digital form will likely be overly dependent on marketplace forces, which may value information for too short a period and without applying broader, public interest criteria.

In order to lay out the framework for digital preservation that it has envisioned, the Task Force provides an analysis of the digital landscape, including the aspects of digital information and the stakeholder interests that affect preservation. The Task Force then introduces the principle that responsibility for archiving rests fundamentally with the creator or owner of the information and that digital archives may invoke the fail-safe mechanism to protect culturally valuable information. The report explores in detail the roles and responsibilities associated with the critical functions of managing the operating environment of digital archives, strategies for migration of digital information, intellectual property, and costs and financial matters.

The report concludes with a set of recommendations for the Commission on Preservation and Access and the Research Libraries Group to take the following actions, either separately or together and in concert with other individuals or organizations as appropriate:

1. Solicit proposals from interested archives around the country and provide coordinating services for selected participants in a cooperative project designed to place information objects from the early digital age into trust for use by future generations.
2. Secure funding and sponsor an open competition for proposals to advance digital archives, particularly with respect to removing legal and economic barriers.
3. Foster practical experiments or demonstration projects in the archival application of technologies and services, such as transaction systems for property rights and authentication mechanisms, which promise to facilitate the preservation of the cultural record in digital form.
4. Coordinate the appropriate organizations and individuals in the development of standards, criteria and mechanisms for identifying and certifying repositories of digital information as archives.
5. Engage actively in national policy efforts to design and develop the national information infrastructure to ensure that longevity of information is an explicit goal.
6. Sponsor the development of a white paper on the foundations needed in intellectual property law to support the aggressive rescue of endangered digital information through an effective fail-safe mechanism.
7. Engage representatives of professional societies from a variety of disciplines in a series of forums designed to elicit creative thinking about the means of creating and financing digital archives of specific bodies of information.
8. Commission follow-on case studies to identify current best practices and to benchmark costs in one or more of the following areas of archiving culturally valuable digital information: (a) storage of massive quantities of information; (b) use of metadata for digital preservation; and (c) migration paths.

Given the analysis in this report, its findings and recommendations, we expect that the best use of the work of the Task Force will ultimately be to heighten awareness of the seriousness of the digital preservation problem, its scope and complexity -- and its manageability. There are numerous challenges before us, but also enormous opportunities to contribute to the development of a national infrastructure that positively supports the long-term preservation of digital information.
We believe that the dialogue that grows from the circulation of this draft will sharpen its content and help identify additional, practical and affordable ways to contribute to the information infrastructure.

To provide a means for you to participate in the dialogue, the Task Force listserv (archtf-l@yalevm.cis.yale.edu) is now open. You may subscribe by sending the following message to listserv@yalevm.cis.yale.edu: subscribe archtf-l. Once subscribed, you can submit your comments to the list. Otherwise, you may address your comments to either one of us. If you have comments, please communicate them to us by October 31, 1995. We expect to reconvene the Task Force shortly thereafter to draft the final report.

John Garrett (co-chair)
CNRI
jgarrett@cnri.reston.va.us

Donald Waters (co-chair)
Yale University
donald.waters@yale.edu

TASK FORCE ON ARCHIVING OF DIGITAL INFORMATION

MEMBERS

Pamela Q.C. Andre
Director
National Agricultural Library

Howard Besser
Visiting Associate Professor
School of Information and Library Studies
University of Michigan

Nancy Elkington
Assistant Director for Preservation Services
Research Libraries Group

John Garrett (co-chair)
Director, Information Resources
Corporation for National Research Initiatives

Henry Gladney
Research Staff Member
IBM Almaden Research Center

Margaret Hedstrom
Associate Professor
School of Information and Library Studies
University of Michigan

Peter B. Hirtle
Policy and IRM Services
National Archives at College Park

Karen Hunter
Vice President and Assistant to the Chairman
Elsevier Science

Robert Kelly
Director, Journal Information Systems
American Physical Society

Diane Kresh
Director for Preservation
Library of Congress

Michael E. Lesk
Manager, Computer Science Research Division
Bell Communications Research

Mary Berghaus Levering
Associate Register for National Copyright Programs
U.S. Copyright Office
Library of Congress

Wendy Lougee
Assistant Director, Digital Library Initiatives
University of Michigan Library

Clifford Lynch
Director, Library Automation
Office of the President
University of California

Carol Mandel
Deputy University Librarian
Columbia University

Stephen P. Mooney
Copyright Clearance Center, Inc.

Ann Okerson
Director, Office of Scientific and Academic Publishing
Association of Research Libraries

James G. Neal
Director of University Libraries
Indiana University

Susan Rosenblatt
Deputy University Librarian
University of California, Berkeley

Donald Waters (co-chair)
Associate University Librarian
Yale University

Stuart Weibel
Senior Research Scientist
OCLC, Inc.

Preserving Digital Information

TABLE OF CONTENTS

Executive Summary
Task Force on Archiving of Digital Information: Members
Introduction
    The Fragility of Cultural Memory in a Digital Age
    The Limits of Digital Technology
The Challenge of Archiving Digital Information
    Technological Obsolescence
    Migration of Digital Information
    The Need for Deep Infrastructure
    Conceptual Framework
    Plan of Work
Information Objects in the Digital Landscape
    Kinds and Attributes of Digital Information
    Stakeholder Interests
Archival Roles and Responsibilities
    General Principles
    The Operating Environment of Digital Archives
    Migration Strategies
    Intellectual Property
    Managing Costs and Finances
Summary and Recommendations
    Pilot Projects
    Support Structures
    Best Practices
References
Appendix 1
Appendix 2

INTRODUCTION

Today we can only imagine the content of and audience reaction to the lost plays of Aeschylus. We do not know how Mozart sounded when performing his own music. We can have no direct experience of David Garrick on stage. Nor can we fully appreciate the power of Patrick Henry's oratory. Will future generations be able to encounter a Mikhail Baryshnikov ballet, a Barbara Jordan speech, a Walter Cronkite newscast, or an Ella Fitzgerald scat on an Ellington tune?
We may think that libraries and archives have stemmed the tide of cultural memory loss. We rely on them to track our genealogies, to understand what science has discovered, to appreciate the stories people told a hundred years ago, and to know how we educated our children during the Depression. Even seemingly trivial, ephemeral, or innocuous information that libraries maintain has unanticipated uses. For example, early in this century railways provided the primary means of transporting oil. The competition for this lucrative business led to rebates, kickbacks and other dubious business practices. The United States countered such practices by enacting severe antitrust laws. Germany, however, prohibited secret rates by requiring all oil carriage firms to publish their tariffs in the railway trade press. During World War II, nobody in Germany thought to repeal the law. Every week, an American agent went to a Swiss library, read the relevant newspaper, and worked out how much oil the Nazis were transporting and where.

Society, of course, has a vital interest in preserving materials that document issues, concerns, ideas, discourse and events. We may never know with certitude how many children Thomas Jefferson fathered or exactly how Hitler died. However, to understand the evils of slavery and counter assertions that the Holocaust never happened, we need to ensure that documents and other raw materials, as well as accumulated works about our history, survive so that future generations can reflect on and learn from them. The Soviet Union stands as an example of a society in which history was routinely rewritten and pages of encyclopedias were cut out and replaced according to current political whim. The ability of a culture to survive into the future depends on the richness and acuity of its members' sense of history.

The Fragility of Cultural Memory in a Digital Age

But our ability and commitment as a society to preserve our cultural memory are far from secure.
Custodians of the cultural record have always had to manage the inherent conflict between letting people use manuscripts, books, recordings or videos, and being sure that they are preserved for future use. For works printed on acidic wood pulp paper, as most books have been since 1850, we measure the remaining lifetime of those materials in decades, not centuries.

And what of the information we are now creating and storing using digital technology? Forty percent of U.S. workers now use a computer. Virtually all printing and a rapidly increasing amount of writing are accomplished with computers. Professional sound recording is digital, and digital video is on the verge of moving from experimental to practical applications.

As a means of recording and providing access to our cultural memory, digital technology has numerous advantages and may help relieve the traditional conflict between preservation and access. For materials stored digitally, users operate on exact images of the original works stored in their local computers. Separating usage from the original in this way, digital technology affords multiple, simultaneous uses from a single original in ways that are simply not possible for materials stored in any other form.

Digital technology also yields additional, effective means of access. In full text documents, a reader can retrieve needed information by searching for words, combinations of words, phrases or ideas. Readers can also manipulate the display of digital materials by choosing whether to view digital materials on a screen or to print them.

The Limits of Digital Technology

Digital technology, however, poses new threats and problems as well as new opportunities. Its functionality comes with complexity. Anyone with a compass (or a clear night to view the position of the stars in relation to true north) could theoretically set up or repair a sundial.
A digital watch is more useful and accurate for telling time than a sundial, but few people can repair it or even understand how it works. Reading and understanding information in digital form requires equipment and software, which are changing constantly and may not be available within a decade of their introduction. Who today has a punched card reader, a DECtape drive, or a working copy of FORTRAN II? Even newer technology such as 9-track tape is rapidly becoming obsolete. We cannot save the machines if there are no spare parts available, and we cannot save the software if no one is left who knows how to use it. Rapid changes in the means of recording information, in the formats for storage, and in the software for use threaten to render the life of information in the digital age as, to borrow a phrase from Hobbes, "nasty, brutish and short."

Numerous examples illustrate the danger of losing the significant cultural memories that information in digital form may represent. The 1960 Census, for example, was written on tapes for the Univac I, a machine that has been obsolete for more than two decades. Its obsolescence caused much of the census data to be lost. In 1964, the first electronic mail message was sent from either the Massachusetts Institute of Technology, the Carnegie Institute of Technology or Cambridge University. The message does not survive, however, and so there is no documentary record to determine which group sent the first message. Satellite observations of Brazil in the 1970s, critical for establishing a time-line of changes in the Amazon basin, are lost on the now obsolete tapes to which they were written.

Today, the World Wide Web serves as a kind of bookstore that every author can effectively enter and choose a shelf for his or her creation. Even in its present nascent form, the ease of use of the Web as a publication and distribution channel has unleashed the production of information in digital form.
If we are effectively to preserve such information in the future, we need to understand the full costs of doing so and we need to dispose ourselves technically, legally, economically and organizationally to ensure that this record of discourse endures to the benefit of future generations. Failure to look for the means and methods of such preservation will certainly exact a stiff, long-term cultural penalty. The Task Force on Archiving of Digital Information here reports on its search for some of those means and methods.

THE CHALLENGE OF ARCHIVING DIGITAL INFORMATION

The question of preserving or archiving digital information is not a new one and has been explored at a variety of levels over the last three decades. Archivists responsible for governmental and corporate records have been acutely aware of the difficulties entailed in trying to ensure that digital information survives for future generations. Far more than their library colleagues, who have continued to collect and organize published materials primarily in paper form, archivists have observed the materials for which they are responsible shift rapidly from paper objects produced on typewriters and other analog devices to include files created in word processor, spreadsheet and many other digital forms (see, e.g., Hedstrom 1991: 343-44; National Academy of Public Administration 1989).

Technological Obsolescence

Early attention to the difficulties in preserving digital information focused on the longevity of the physical media on which the information is stored. Even under the best storage conditions, however, digital media can be fragile and have limited shelf life. Moreover, new devices, processes and software are replacing the products and methods used to record, store, and retrieve digital information on breathtaking cycles of two to five years. Given such rates of technological change, even the most fragile media may well outlive the continued availability of readers for those media.
Efforts to preserve physical media thus provide only a short-term, partial solution to the general problem of preserving digital information. Indeed, technological obsolescence represents a far greater threat to information in digital form than the inherent physical fragility of many digital media.

In the face of rapid technological obsolescence and to overcome the problem of media fragility, archivists have adopted the technique of "refreshing" digital information by copying it onto new media (see Bearman 1989: 21-22; The University of the State of New York et al. 1988; Lesk 1990: 5). Copying from medium to medium, however, also suffers limitations as a means of digital preservation. Refreshing digital information by copying will work as an effective preservation technique only as long as the information is encoded in a format that is independent of the particular hardware and software needed to use it and as long as there exists some kind of software to manipulate the format in current use. Otherwise, copying depends either on the compatibility of present and past versions of software and generations of hardware or on the ability of competing hardware and software product lines to interoperate. In respect of these factors -- backward compatibility and interoperability -- the rate of technological change exacts a serious toll on efforts to ensure the longevity of digital information.

Digital information today is produced in highly varying degrees of dependence on particular hardware and software. Moreover, it is costly and difficult for vendors to assure either that their products are "backwardly compatible" with previous versions or that they can interoperate with competing products. Refreshing thus cannot serve as a general solution for preserving digital information and this conclusion has prompted discussion of other kinds of solutions.
Jeff Rothenberg, for example, has recently suggested that there may be sufficient demand for entrepreneurs to create and archive emulators of software and operating systems that would allow the contents of digital information to be carried forward and used in its original format (Rothenberg 1995; see also Creque 1995).

Migration of Digital Information

Refreshing digital information by copying it from medium to medium and the possibility of maintaining a complex set of emulators describe two distinct points on a continuum of approaches to preserving digital information. However, neither refreshing nor emulation sufficiently describes the full range of options needed and available for digital preservation. Instead, a better and more general concept to describe these options is migration.

Migration is a set of organized tasks designed to achieve the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to retain the ability to display, retrieve, manipulate and use digital information in the face of constantly changing technology. Migration includes refreshing as a means of digital preservation but differs from it in the sense that it is not always possible to make an exact digital copy or replica of a database or other information object as hardware and software change and still maintain the compatibility of the object with the new generation of technology. Even for information that is encoded in a contemporary standard form (e.g., a bibliographic database in USMARC or a corporate financial database in SQL relational tables), forward migration of the information to a new standard or application program is, as anyone knows who has witnessed or participated in such a process, time-consuming, costly and much more complex than simple refreshing (see Michaelson and Rothenberg 1992).
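The contrast drawn above between refreshing and migration can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not a method proposed by the Task Force: the function names (`refresh`, `migrate`, `demo`), the pipe-delimited "old" record format, and the choice of JSON as the "new" format are all hypothetical. The sketch shows that a refreshed copy can be verified bit for bit with a checksum, whereas a migrated copy necessarily has different bits and can only be verified at the level of content.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def refresh(source: Path, target: Path) -> bool:
    """Refreshing: copy the bit stream unchanged and confirm
    that the copy is an exact replica of the original."""
    shutil.copyfile(source, target)
    return sha256_of(source) == sha256_of(target)


def migrate(source: Path, target: Path) -> bool:
    """Migration: re-encode the records in a different format.
    The bytes change, so success is judged on content, not checksums.
    The 'title|year' input layout is a hypothetical legacy format."""
    records = []
    for line in source.read_text(encoding="ascii").splitlines():
        title, year = line.split("|")
        records.append({"title": title, "year": int(year)})
    target.write_text(json.dumps(records), encoding="utf-8")
    # Read the new file back and compare it with what was parsed.
    restored = json.loads(target.read_text(encoding="utf-8"))
    return restored == records


def demo() -> dict:
    """Run both operations on a small sample file in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "catalog.txt"
        old.write_text("Hamlet|1603\nWalden|1854\n", encoding="ascii")
        copy = Path(tmp) / "catalog_copy.txt"
        new = Path(tmp) / "catalog.json"
        return {
            "refreshed_exactly": refresh(old, copy),
            "migrated_content_ok": migrate(old, new),
            "migrated_bits_identical": sha256_of(old) == sha256_of(new),
        }
```

Even in this toy case the migration step has to know the structure of the old format, which suggests why migration is costlier and more error-prone than simple copying.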
The costs and complexities of moving digital information forward into the future raise our greatest fear about the life of information in the digital future: namely, that owners or custodians who can no longer bear the expense and difficulty will deliberately or inadvertently, through a simple failure to act, destroy the objects without regard for future use. Countless anecdotes about the loss of satellite imagery or census data as a result of error or neglect feed our general anxiety about the future of the cultural record we are accumulating in digital form. And uncertainty and lack of confidence about our will and ability to carry digital information forward into the future exert a major inhibiting force in our disposition to fully exploit the digital medium to generate, publish and disseminate information. But how well does the evidence really support our fears, anxieties and inhibitions regarding digital information?

The Need for Deep Infrastructure

Even after some thirty years of growth, the digital world of information technology and communication is still so young and so immature in relation to the larger information universe that our experiences of expense, intricacy and error in digital preservation surely reflect, at least in part, our inexperience with this emerging world as we operate in its early stages. Viewed developmentally, the problem of preserving digital information for the future is not only, or even primarily, a problem of fine tuning a narrow set of technical variables. It is not a clearly defined problem like preserving the embrittled books that are self-destructing from the acid in the paper on which they were printed. Rather, it is a grander problem of organizing ourselves over time and as a society to maneuver effectively in a digital landscape.
It is a problem of building -- almost from scratch -- the various systematic supports, or deep infrastructure, that will enable us to tame anxieties and move our cultural records naturally and confidently into the future.

For digital preservation, the organizational effort -- the process of building deep infrastructure -- necessarily involves multiple, interrelated factors, many of which are either unknown or poorly defined. One of the biggest unknowns is the full impact on traditional information handling functions of distributed computing over electronic networks. The effort to meet the cultural imperative of digital preservation thus requires a complex iteration and reiteration of exploration, development and solution as the relevant factors and their interrelationships emerge and become clearer and more tractable. And the first task in the effort is not to posit answers, but to frame questions and issues in such a way as to engage the many parties already working in various ways with digital information so that they can help us understand the relevant issues and, within the context of their work, to help us identify, define and incorporate solutions that contribute to the larger, common goal of preserving our cultural heritage.

Conceptual Framework

The Commission on Preservation and Access and the Research Libraries Group (RLG) have joined together in charging the Task Force on Archiving of Digital Information to take this first essential step toward a national system of digital preservation. They have asked the Task Force to "consult broadly among librarians, archivists, curators, technologists, relevant government and private sector organizations, and other interested parties" in an effort to:

* "frame the key problems (organizational, technological, legal, economic etc.)" associated with digital preservation,
* "define the critical issues that inhibit resolution of each identified problem," and
* "recommend actions to remove the issue from the list."
The Task Force charge, however, itself frames the anticipated solutions in terms of the concept of "refreshing" (see Appendix 1). In taking up this charge, the Task Force on Archiving of Digital Information starts from the premise, argued above, that migration is a broader and richer concept than "refreshing" for identifying the range of options for digital preservation. For purposes of this report, the Task Force casts migration, in the sense defined in the argument above, as an essential function of digital archives. Moreover, it envisions the development of a national system of digital archives. The Task Force defines digital archives strictly in functional terms as repositories of digital information that are collectively responsible for storing and ensuring, through the exercise of various migration strategies, the long-term accessibility of the nation's social, economic, cultural and intellectual heritage instantiated in digital form. Digital archives are distinct from digital libraries in the sense that digital libraries are repositories that collect and provide access to digital information, but may or may not provide for the long-term storage and access of that information. Digital libraries thus may or may not be, in functional terms, digital archives and, in fact, much of the recent work on digital libraries is notably silent on the archival issues of ensuring long-term storage and access (for some exceptions, see Ackerman and Fielding 1995; Conway 1994; Graham 1995). Conversely, digital archives necessarily embrace digital library functions to the extent that they must collect, store and provide access to digital information. Many of the functional requirements for digital archives defined in this report thus overlap those for digital libraries. The Task Force has deliberately taken a functional approach in these critical definitions and in its general treatment of digital preservation so as not to prejudge the question of institutional structure. 
Many traditional libraries, archives and museums, as institutions, have taken and may well continue to assume digital library and archival functions. However, the Task Force recognizes that, as the digital environment emerges and its requirements become clearer, traditional institutions may need to change in various structurally significant ways and new kinds of institutional structures may emerge to perform all or parts of key archival functions. For this report, the Task Force has thus ruled as out of scope answers to such structural questions as: What existing institutions should assume archival responsibilities for various kinds of digital information? How should an institution select digital information for archival storage? What should the organizational hierarchy of a digital archive look like? The Task Force sees repositories of digital information as held together in a national archival system primarily through the operation of two essential mechanisms. First, to ensure that no valued digital information is lost to future generations, repositories claiming to serve an archival function must be able to prove that they are who they say they are by meeting or exceeding the standards and criteria of an independently-administered program for archival certification. Second, certified archives will have available to them a critical fail-safe mechanism. Such a mechanism, supported by organizational will, economic means and legal right, would enable a certified archival repository to exercise an aggressive rescue function to save digital information that it judges to be culturally significant and which is endangered in its current repository. The current repository may be a digital library, another digital archive, or some other individual, organizational, public or private source of digital information. 
Without the operation of a formal certification program and a fail-safe mechanism, preservation of the nation's cultural heritage in digital form will likely be overly dependent on marketplace forces, which may value information for too short a period and without applying broader, public interest criteria.

Plan of Work

In order to lay out the framework for digital preservation that it has envisioned, the Task Force begins in what follows with an analysis of the digital landscape, including the aspects of digital information and the stakeholder interests that affect preservation. The Task Force then introduces the principles that responsibility for archiving rests fundamentally with the creator or owner of the information and that digital archives may invoke the fail-safe mechanism to protect culturally valuable information. The report explores in detail the roles and responsibilities associated with the critical functions of managing: the operating environment of digital archives; strategies for migration of digital information; intellectual property; and costs and financial matters. The report concludes with a set of recommendations.

Because the digital environment is in great flux, the recommendations for next steps that the Task Force has developed from its analyses and discussions are tentative and modest. However, we believe that the dialogue that grows from the circulation of this draft will sharpen the recommendations. In a larger context, we believe that both the dialogue and subsequent actions will help ensure that the creation of digital repositories -- libraries and archives -- on an information superhighway remains a credible item on the national agenda. Digital libraries are a sensible investment only if we also create the archival means for the knowledge they contain to endure.
INFORMATION OBJECTS IN THE DIGITAL LANDSCAPE The national system of digital archives envisioned by the Task Force on Archiving of Digital Information would store digital information objects in a wide variety of formats, including scanned images, video files, sound files, ASCII text, and SGML encoding. For purposes of this report, the Task Force distinguishes between two types of digital information objects: document-like objects and other objects. Document-like objects share the characteristic that they can, but need not be, adequately represented in a print format. They include, for example, pure text, text with printable illustrations, and photographs which can be recorded in print. Other objects cannot be represented in print, and include sound, video, film, software and multimedia objects. The Task Force here focuses primarily on the archiving of document-like objects although it recognizes that other objects are likely to become increasingly dominant in future archives. The reason for emphasizing document-like objects is simple: the Task Force and the archiving community in general have more experience with them than with other types of objects, and the institutions which currently provide archiving services (e.g. libraries and archives) frequently emphasize them in their current efforts. Furthermore, in the current environment, there is significant demand for systems that support the archiving of document-like objects. Document-like and other information objects in digital form, like those in other forms, move through life cycles. They are created, edited, described and indexed, disseminated, acquired, used, annotated, revised, re-created, modified and retained for future use or destroyed by a complex, interwoven community of creators and other owners, disseminators, value-added services, and institutional and individual users. 
The digital world is still too new for us to describe fully the life cycle of the information objects that do now or will in the future reside there. However, given what we do know, we can make some general observations about how our ability to retain them for future use depends on the kinds of objects themselves, on the attributes they acquire during the course of their lives, and on the roles of various interested parties, or stakeholders, with respect to the different kinds and attributes of objects. Kinds and Attributes of Digital Information Digital technologies increasingly serve to integrate information resources. Text and numeric information, images, voice, and video all have heretofore required different media for storage and transmission. When encoded digitally, these various resources share a layer of technology -- a common means of storage and transmission -- that allows them to be brought together and used in new ways. Integration through digital technologies also gives rise to new kinds of information, such as multimedia objects and software-dependent data objects from computer-aided design (CAD) and geographic information systems (GIS), many of which exist only in digital form. The ability to regard these information objects, some previously disparate and others newly minted, through a common digital lens requires us to develop "a new terminology and taxonomy for the networked information age" (Lynch 1993: 10). Advancing our understanding of and ability to describe digital information, particularly in relation to its life cycle, is essential. For digital objects, no less than for objects of other kinds, knowing how to preserve depends, at least in part, on being able to discriminate the features of what needs to be preserved. Information objects generally have attributes that are structured in multiple dimensions. 
Among the characteristics one can ascribe to digital information objects, those most critical for preservation include encoding, source, mode of distribution, referential qualities, and dynamic nature. -- Encoding As suggested above, the power and versatility of digital information derive from the property that it is, at one level, the same: simple bitstreams of 0s and 1s. However, to encode content into the strings of 0s and 1s and then subsequently to decipher that meaning in use, one has to employ hardware and software. Generally speaking, encoding methods vary by the kind or format of the information object and by whether the means is proprietary or standardized. Text today is generally covered by a formal, international ASCII standard for encoding characters. Standard extensions exist for encoding diacritic characters in European languages other than English, and a new standard is slowly emerging to incorporate languages written in other scripts under a common encoding scheme (Unicode). The Standard Generalized Markup Language (SGML) also exists to encode structure and content of digital works, and is increasingly used by formal publishers. Alternative encoding methods, however, abound. IBM maintains its own EBCDIC character encoding scheme, and Apple and Intel-based personal computers differ in the ways that they support extended ASCII character sets. Moreover, although word processors, spreadsheets and database management systems typically have mechanisms to convert data to standard interchange formats, for day-to-day use they routinely encode structure and content in proprietary forms, and indeed even model these characteristics of documents very differently. As products change, the encoding mechanisms also change and older ones become obsolete. 
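The dependence of meaning on the encoding scheme can be illustrated with a minimal sketch in Python. It is an illustration under stated assumptions, not a method proposed by the Task Force: the code page `cp037` is only one of several EBCDIC variants and is chosen here for convenience, and the helper `migrate` is a hypothetical name introduced for this example.

```python
# The same seven characters produce different bitstreams under ASCII and EBCDIC.
text = "Archive"

ascii_bytes = text.encode("ascii")   # standard ASCII encoding
ebcdic_bytes = text.encode("cp037")  # assumption: cp037 stands in for "EBCDIC"

# The two byte strings differ even though the content is identical.
print(ascii_bytes.hex())
print(ebcdic_bytes.hex())

# Reading EBCDIC bytes under the wrong scheme yields unintelligible text.
misread = ebcdic_bytes.decode("latin-1")


def migrate(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: re-encode text from one character scheme to another.

    The migration succeeds only if the source encoding is known, which is why
    that fact has to be recorded along with the object itself.
    """
    return data.decode(source_encoding).encode(target_encoding)


recovered = migrate(ebcdic_bytes, "cp037", "ascii")
```

The sketch covers only character encoding. Proprietary formats that encode structure as well as content raise the same problem at a higher level, and no single conversion call is assumed to exist for them.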
The circumstances that prevail for information containing textual and numeric characters, content and structure are largely mirrored for other forms of digital information such as images, sound and video, although there are important new issues that emerge for multimedia, such as resolution, quality, size of image, compression used and sampling rates. Standards emerge as a given form matures in use. However, creative, often proprietary, encoding techniques, usually developed in a highly competitive market environment, frequently intervene to provide needed functionality in cases where strict adherence to the prevailing standard may be flawed or inadequate. The variety of encoding techniques both within and across forms -- even the creeping obsolescence as encoding techniques change and improve -- exemplifies the tremendous vitality of the digital world as an emerging information medium. In the face of this vitality, the preservation challenge is to find ways to migrate information structure and content into the future effectively through the maze of competing digital encoding systems. A call to create or use standards simply will not prove effective at a time when the standards themselves are subject to the same dynamic of change and obsolescence as the encoding systems they are trying to organize. Enduring standards may emerge. In the meantime, the successful migration strategies will be those that recognize and respond to, rather than resist, the substantial variation in encoding schemes. -- Source Broadly speaking, information objects, both digital and otherwise, may emerge from three channels. First, there are objects that an individual generates. These may include, for example, electronic mail, notes, manuscripts, preprints, databases, and photographs. Second, there are materials generated in the conduct of institutional business. 
The organizations may range from formal corporations, government agencies, and educational institutions to business partnerships and ad hoc collaborations. Materials from these sources may include employment and financial records, contracts and other legal records, planning documents, FDA filings, models of financial markets, sensor data, and technical reports. A third source of information in digital and other forms is the formal publication stream. Materials from this source may include monographs, serials, video, and sound recordings. The distinction among the sources is important because they represent a range of motivations and abilities for preserving information into the future. Individuals are least likely to have the motivation and resources to maintain an enduring archive for their digital information. They may also resist preservation on privacy grounds. Institutional entities are more likely to build a structure for managing digital information, but they may or may not provide for long term preservation. Institutional motivation for preserving digital information depends on how central it is to the conduct of the business of the organization. Publishers -- a form of institutional entity -- may have a financial incentive to preserve their information objects if they can continue to market them. -- Mode of distribution Digital materials increasingly come into being and subsist in an electronically networked environment. The quality of life of those objects, including the viability of efforts to preserve them, thus depends on specific policy and implementation decisions that address the quality of the technical infrastructure. The willingness of individual and institutional creators and formal publishers to make their valued information available over electronic networks depends largely on the bandwidth, extensibility, reliability, and security of the networks. Bandwidth is a constraint, limiting the usefulness of certain kinds of information, such as images, video and sound. 
As capacity grows to transmit large digital files, one can reasonably expect demand for them and their archives to grow. Similarly, as the networks extend their reach into many and varied communities of users including individual homes, as the reliability of their operation stimulates trust in the medium among providers and users, and as the security of the environment grows, allowing providers and users to conduct transactions in confidence, both the demand and supply of a wide variety of digital information, including archives of that information, will grow rapidly. -- Referential qualities Referential qualities are a fourth set of attributes that bear on digital preservation. The defining feature of some information objects is that they contain information about and thereby refer to other information objects. Examples of such objects, known as metadata, include bibliographic catalogs, indices, data dictionaries, directory systems and finding aids. By describing features of the objects to which they refer, these kinds of information locate those objects in a larger information context. Metadata thereby serve a critical function in information retrieval. Metadata objects in digital form will themselves need to be preserved, but they need to be distinguished from the objects to which they refer for several reasons. First, it is essential to understand how closely the objects are coupled. The extent to which a digital metadata object, such as a catalog entry, is directly linked to the digital object it describes and classifies will determine the extent to which it is necessary to preserve together the two kinds of objects. For example, a catalog entry may now contain the exact digital location (or locations) of the document it names, allowing the reader directly to retrieve and peruse the contents. 
While this method is effective in the short term, the direct link between the document and its metadata means that their future is inextricably linked: if one changes the other must also change. Greater flexibility to deal with the object separately from its metadata would be desirable. Coordinated by such groups as the Internet Engineering Task Force (IETF), active research and development is currently underway to devise standard naming conventions that incorporate systems of indirect referencing. Such systems will help decouple metadata from the information to which they refer and so may help to decouple preservation decisions about the two kinds of objects. A related issue is that in the digital environment information objects need at some level to be self-referential. That is, some kinds of metadata must travel with an information object as a kind of identification and license, reporting what the object contains, assuring an interrogator that it is what it purports to be, helping to assure that only authorized users have access to the contents, and directing authorized users to the kinds of hardware and software needed to use it. Both for present use and long-term preservation, these kinds of metadata must be closely coupled to the specific objects to which they refer. In fact, a useful metaphor for metadata of this sort is a kind of wrapper or envelope that defines how the information object inside can be used indefinitely into the future. Research is just beginning into the range of metadata that are most usefully attached to an information object in a networked environment (see, e.g., Moen 1995). Finally, as the amount of digital information grows, the problem of navigating electronic networks to find digital objects grows correspondingly. Considerable experimentation is currently underway, testing methods of automatically gathering information about digital objects, including image, sound and video, and of presenting that information in useful ways. 
These processes are generating new forms of digital metadata. As "best practices" emerge about how to create and store them, developers will need to focus attention on how best to preserve these new metadata objects. -- Dynamic Qualities A fifth set of attributes that serves to classify information objects in the digital world and that affects their preservation is their dynamic character. First, information objects may be dynamic in the sense that they change cumulatively or interactively, as in contributions to an on-line discussion topic or to a research database. Much has been made of the capacity of the digital environment to serve in these contexts as a "collaboratory" fostering new and different modes of collaboration. However, the high degree of interaction one expects to find when such an on-line collaborative environment is working effectively raises questions about the nature of the information objects that it generates and the mechanisms needed to support it. Are they collective or corporate works, a series of individual works or something different? On-line discussions might best be seen as a series of individual correspondences, but how many single messages are worth saving without the entire thread of discourse? The result of the human genome project is more clearly a collective work but individual contributions may well be worth noting and saving independently. Our ability to save these kinds of dynamic information will ultimately turn in part on how we define the intellectual property the information contains. Information objects in the digital world may also be dynamic in the sense that they are revised and updated resulting in multiple instances, versions or editions of the object. One of the advantages typically cited of the digital environment is the ease of making such changes, but the lability of the environment is also a cause of great concern. 
Changing information objects can eliminate or overlay significant content and thereby corrupt the cultural record, sometimes with significant legal, political and economic consequences. Preserving the record implies the need for reliable methods of authenticating and tracking version changes (Lynch 1994). Although it is essential to document such changes, it may not be necessary to preserve all relevant versions. Some changes might be generated algorithmically. For example, one might generate different resolutions of a photographic image from a high resolution original to accommodate different screen capacities of users. For archival purposes, it may be sufficient to preserve the sampling algorithms with the original, rather than each of the lesser resolution versions. Finally, information objects may change in the linkages made among them. Objects that are dynamic in this way include, for example, WWW pages. Pages may change their location, and links to and from the pages may be added or deleted. If one views the information objects that need to be preserved for future use as the WWW pages themselves, then the dynamic may simply be a variant of versioning, as identified above. However, if the effectiveness of the WWW truly resides in the network of linkages among the pages, rather than in the nodes of the network, then the objects distinctively needing to be preserved are both the pages and the linkages. Given the dynamic nature of those links, the preservation problem becomes exceedingly complex. At present, there appears to be no good archiving solution other than to treat the network in terms of its component parts and to take periodic snapshots of the WWW pages. Stakeholder Interests Digital information objects acquire the dimensions and qualities described above as they move through a life cycle in a series of relationships with parties, or stakeholders, who have specific interests in their creation, management, dissemination, or retention. 
The initial stakeholder in a digital information object is its creator. Following creation, a digital information object may pass through a series of gateways of increasingly public release and access. Some digital information objects (like videos of family picnics or private journals) may never be released beyond the initial creator; others will be limited to an immediate circle, which may or may not be physically co-located with the creator. As digital information objects are more widely released, additional stakeholder roles may be involved. For document-like objects, for instance, a creator may disseminate the object to a stakeholder performing the role of editor or publisher, who may or may not disseminate the object more widely. Once the digital information object has been publicly disseminated, it may enter the collection of a collector/disseminator of information objects, such as a library. The collector/disseminator may or may not elect to retain this or any other information object for a longer period. As defined earlier, archives perform the role of longer-term retention of digital information objects, selected according to criteria which may be specific to the particular archive. Stakeholders independently certified to perform archival functions may and will differ substantially in the extent that they are able, because of intellectual property restrictions, to provide access to the digital information objects they store. Additionally, some archives will choose more so than some others to exercise aggressively a fail-safe mechanism to preserve information at risk of being lost. It is important to emphasize that all of the stakeholder functions described above may be exercised simultaneously in relation to any digital information object. That is, the creator of an object may retain a financial or other stake in its dissemination and use for an extended period, as may a publisher/disseminator of the object. 
At the same time, stakeholders acting as collectors/disseminators may retain an interest in the continuing use of the object, while archives of various kinds may archive it. In general, stakeholders add to or make use of the actual or perceived value of digital information objects. Whether or not an object has sufficient value to sustain use, either present or prospective, may largely determine whether it endures or dies. Among other things, archives ensure that this value assessment occurs over a sufficiently long period to protect objects of value from short-term reassessment and consequent destruction. Because use is the best insurance that information will endure, archives are needed to provide a safety-net against future, unforeseen requirements. As noted earlier, the digital environment facilitates repeated use, and enables information objects to be reused, recreated, and redisseminated as new objects. The ease of use, reuse, and recreation of digital information objects provides the opportunity and stimulus for new stakeholders to emerge and add value, and for the relationships among existing stakeholders to assume new forms. Digital archives contribute fundamentally to this process by ensuring that there is a substantial collection of stable, well defined objects, with clearly established parameters and terms and conditions for use, as a foundation for this continuing process. The next section of this report discusses the critical organizational issues and requirements in creating and sustaining this national system of distributed digital archives. ARCHIVAL ROLES AND RESPONSIBILITIES In a highly distributed environment of digital information, repositories that are linked through electronic networks both to one another and to the various communities they serve will likely also be highly distributed. 
As the networked environment matures, appropriate divisions of labor will take shape through intense interactions among various parties with stakes in creating, distributing, using and preserving the information. In a time of sustained flux and change, the most effective organizational structures for these interactions will likely be those that are agile and bear the least overhead. Such structures will surely include both informal collaborations (associations and alliances) and formal partnerships among contractors and subcontractors, as well as corporations, federations and consortiums, each of which may range over regional, national and even international boundaries. The basis of interaction in these kinds of structures, moreover, will no doubt also vary and arise from shared interests in, for example, intellectual discipline, in type of information or in function, such as storage or cataloging. In the end, existing stakeholders may successfully adapt their current roles and responsibilities to changing needs, existing roles and responsibilities may be redistributed among those with stakes in digital information, and focused attention on the division of labor may lead to the elimination or diminishment of some roles and responsibilities and the emergence and growth of others. General Principles Whatever the particular outcomes may be as stakeholders, new and old, interact and give shape to the distributed network of digital information, distributed responsibility for preserving that information requires commitment at least to the following set of organizing principles:

1. As long as they hold copyright, information creators/providers/owners have initial responsibility for archiving their information objects and thereby ensuring the long-term preservation of those objects.
   - The creator/provider/owner may engage other parties, such as certified archives, to take over some or all of the archival responsibility.
   - Libraries and archival organizations may interact with creators/providers as subcontractors for maintaining an archive during and after the active life of their information objects.
2. Certified archival organizations have the right and duty to exercise an aggressive rescue function as a fail-safe mechanism to preserve information objects that become endangered because the creator/provider/owner no longer accepts responsibility for the preservation function and does not take steps formally to convey responsibility, or because there is no natural institutional home for the objects.

The conditions of creating digital information and giving it a useful life are essentially the same as those required for the information to persist over time. That is, it must be stored and maintained in an accessible form. It is not unreasonable, therefore, to assert, as the first principle does, that initial responsibility for preservation begins with creation of the information and rests with the creator, owner or provider of it. Individuals and public and private agencies already regard such a responsibility as a natural one for their critical internal records. As the properties and usefulness of other kinds of information objects become more widely known, the ability in the digital environment to reuse and repackage these objects may generate revenue or other benefits. Creators, owners or providers of these objects, including some publishers, who may not have otherwise counted long-term preservation among their key responsibilities, may thus be more inclined to do so and may either assume the responsibility themselves or find a qualified partner to do so. Among potential partners, they will likely find libraries, archives or similar agencies, which have specific collection agendas and will seek to take or share responsibility for preserving organized collections of digital materials. The second organizing principle calls, in effect, for a fail-safe mechanism. 
A variety of factors -- budgetary constraints, reorganization of priorities or focus, change of business, the need to go out of existence, or expiration of copyright -- might prompt custodians to neglect, abandon or destroy their collections of digital information. No distributed system of digital archives will afford effective protection of electronic information unless it provides for a powerful rescue function allowing one agency, acting in the long-term public interest of protecting the cultural record, to override another's neglect of or active interest in abandoning or destroying parts of that record. Digital archives operating normally and those operating in fail-safe mode differ only in the rights and obligations they have with respect to rescued material. A special provision in the copyright statute defines such rights and obligations for parties involved in rescuing information on embrittled paper, and so serves to enable the national brittle books preservation program. Similar rights and obligations may need to be established and applied to digital information. Some of the problems and prospects of doing so are explored below in the intellectual property section of this report. Given the principles articulated here, it follows that a commitment to preservation -- the long-term storage and maintenance of digital information objects in accessible form -- is a defining feature of a digital archive, whether it is operating in normal or fail-safe mode. This commitment is fulfilled in practice in any archive by the exercise of four crucial functions: managing the operating environment of the archive, the migration of the archive as the operating environment changes, intellectual property, and the costs and finances of the operating environment and of periodic migrations. 
The operating environment of digital archives For assuring the longevity of information, perhaps the most important role in the operation of a digital archive is managing the identity, integrity and quality of the archive itself. Users of archived information and of archival services need to have assurance that an archive is what it says that it is and that the information stored there is safe for the long term. In the view of the Task Force, digital archives must be able to meet or exceed the standards and criteria of an independent certifying agency. Appropriate organizations and individuals need to begin now developing such standards and criteria, including standard methods for a repository to declare its existence as a digital archive and therefore its intentions to preserve the contents over which it has custody, and for describing what the archive contains and what services it provides. -- Selection In the operating environment of a digital archive, managing its identity is intimately related to managing its abilities to select material for inclusion, to accession that material, and to store and provide access to it over the long term. Selection processes for archives of all kinds -- paper and digital -- are matters of intellectual judgment about what to include and save and what to exclude. Criteria for such judgments are largely tied to the intrinsic qualities of the material, such as its subject and discipline, and the relation of these intrinsic qualities to the collection goals of the archive. In this sense, selection is a highly specific process and so is not amenable, at least in this context, to significant generalization. However, selection is also dependent on a number of extrinsic factors which are likely to affect its operation in digital archives and about which some comment is needed. 
Selection processes are, for example, acutely dependent on retrieval mechanisms to help place and evaluate candidate objects in a larger universe of related materials. Information retrieval in large universes of diverse digital materials is the subject of very active computer science research, but retrieval today against a rapidly expanding universe of materials in all forms -- digital, paper, microform -- remains difficult, and the relatively primitive systems currently in operation may, in the short term at least, inhibit the effective selection of materials for digital archives. Redundancy of information is also important for effective selection. Selectors need to know before accepting an object where a copy of record is stored and if and where additional copies are distributed either as backup or for more efficient retrieval. In addition, selectors ultimately need to have a rich understanding of the software and hardware dependencies of candidate digital information objects so that they can factor the carrying costs for an object into their overall assessment of its value. Such understandings remain in relatively short supply in some measure because the educational processes for wedding selection and technical skills still need to be devised and perfected. In larger measure, however, technical understandings are deficient because the digital environment is so immature that the dependencies of information objects on underlying hardware and software are still fundamentally unknown. Finally, archival selection processes in the digital environment are, if not uniquely then at least more explicitly, continual processes. Given the need to migrate digital information regularly from its hardware and software environment, the stimulus and occasion will recur to reappraise the value of the material being migrated. Materials may lose value over time and may need to be withdrawn and discarded (deaccessioned). 
Hard decisions may also be necessary about how and whether digital objects migrate if changes in the technical environment alter what is preserved so that it differs in fundamental ways from what existed in the original object. Digital archives thus need to formulate and implement policies and practices for ongoing appraisal (Conway 1995). -- Accession Once an information object is identified for inclusion in a digital archive, it needs to be accessioned, that is, prepared for the archive. The accession process involves both describing and cataloging selected objects and securing them for storage and access. Standards for description are well developed for certain kinds of materials that are likely to appear in digital archives, such as monographs and serials. Because of the high degree of standardization, one can reasonably expect such descriptions to accompany the digital object, thus simplifying the accessioning process. For other kinds of materials, such as WWW pages, motion video and multimedia, relevant attributes are less well understood. Standards of description are less well practiced and one cannot reasonably expect satisfactory descriptive material to accompany the digital object. In all cases, special attention is needed in the accession process to creating, describing and tracking the versions of the object in the archive to satisfy the various requirements for display and other forms of access and for long-term storage. If material is to be deaccessioned, then declarations need to be made to that effect, particularly if it is the last known copy, so that rescue efforts by others can proceed if appropriate. In the accession process, selected digital objects also need to be made secure for indefinite future use. An archive may need to establish access controls for its information objects and encrypt or authenticate them. Establishing access controls involves setting terms and conditions for authorized use by specific users or classes of users. 
For certain extreme cases, such as rescued objects still under intellectual property protection, there may be no authorized uses for certain periods of time except for the archive's internal use to ensure that the object will be accessible in the future. Encryption serves to preserve and protect the privacy and confidentiality of the use of an object from the archive. Authentication, which may make use of cryptography, provides verification that a digital object is what it purports to be and contains the contents that the author/creator or publisher originally intended. One of the characteristics of materials in digital archives may need to be that they contain a digital signature that users can independently verify to assure themselves that the object is unchanged from its archival state. -- Storage The storage operation in digital archives attends primarily to the media level formatting of information objects. Primary considerations include levels of hierarchy and redundancy. In any archive, there may be multiple levels of storage graded to levels of expected use and needed performance in retrieval. Little used material may be stored most efficiently off-line, usually in tape format. For objects in high demand, where retrieval time is at a premium, on-line storage in magnetic media may serve best. In a distributed network, they may need to be stored on-line in multiple locations. An intermediate solution is near-line storage, where information objects may be stored on optical or tape media and loaded in a jukebox. Retrieval time in near-line storage systems suffers by comparison to on-line storage, but is considerably more responsive to user demand than off-line storage. Digital archives may use any or all of these methods. The most sophisticated systems combine the resources so that objects in use or recent use are stored on-line and, as they age from the time of most recent use, they move to near-line storage and then eventually off-line. 
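The aging policy just described, in which objects move from on-line to near-line and eventually to off-line storage as the time since their last use grows, can be illustrated with a minimal sketch. The thresholds, the `ArchivedObject` record, and the `storage_tier` function are hypothetical names and values chosen only for illustration; the report does not specify any of them, and an actual archive would set its own policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds (assumptions, not taken from the report).
NEARLINE_AFTER = timedelta(days=30)   # idle this long: move to near-line (jukebox)
OFFLINE_AFTER = timedelta(days=365)   # idle this long: move to off-line (e.g. tape)


@dataclass
class ArchivedObject:
    """Hypothetical record for an archived object and its most recent use."""
    identifier: str
    last_used: datetime


def storage_tier(obj: ArchivedObject, now: datetime) -> str:
    """Return the storage tier suggested by the time since the last use."""
    idle = now - obj.last_used
    if idle >= OFFLINE_AFTER:
        return "off-line"
    if idle >= NEARLINE_AFTER:
        return "near-line"
    return "on-line"
```

A policy of this kind might be run periodically over the archive's catalog; objects retrieved again would have `last_used` reset and so return to on-line storage.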
Another important storage consideration is redundancy. In a system that is completely dependent on the interaction of various kinds and levels of hardware and software, failure in any one of the subsystems could mean the loss or corruption of the information object. Effective storage management thus means providing for redundant copies of the archived objects as an insurance against loss. Depending on the copyright status of the objects, archives may choose to make backup copies on their own or to make arrangements for other sites, which hold the same object, in the network of archives to serve as backup, or they may choose to do both. -- Access Providing access to digital information in a distributed network environment means above all that the archive is connected to the network using appropriate protocols and with bandwidth suitable for delivering the archive's information. The archive has an obligation to maintain the information in a form so that users over the network can find it with appropriate retrieval engines and view, print, listen to or otherwise use it with appropriate output devices. With respect to access, digital archives also have the responsibility to manage intellectual property rights by facilitating transactions between rights-holders in the information and users and by taking every reasonable precaution to prevent unauthorized use of the material. -- Systems engineering Because so many of the operational responsibilities of the digital archive -- selection, accessioning, storage and access -- are functionally identical to those of more traditional permanent repositories, they may successfully extend their scope to include digital materials. Many traditional archives have already embraced digital materials, and libraries and museums are not far behind. Wherever digital archives may reside organizationally, their operation is highly distinctive in one crucial respect. 
That is, they need, at least now and for the foreseeable future, a high level of systems engineering skill to manage the interlocking requirements of media, data formats, and hardware and software on which the operation of the digital archive essentially depends. As the digital environment matures, the role of systems engineer will serve to integrate new technical developments that promise to streamline and strengthen the operation of digital archives. Commercially available systems for user authorization and for document encryption and authentication, systems for using metadata to automate the maintenance and delivery of archived information objects, and network-based services that help manage the conventions for naming digital resources and provide means for conducting intellectual property transactions all will need the attention of systems engineers to ensure that they are effectively incorporated into the normal operation of digital archives. In addition, the systems engineering function will serve an essential role in helping to determine when objects in digital archives should migrate to new hardware and software. Migration Strategies As the operating environments of digital archives change, it becomes necessary to migrate their contents. There are a variety of migration strategies for transferring digital information from obsolete systems to current hardware and software systems so that the information remains accessible and usable. No single strategy applies to all formats of digital information and none of the current preservation methods is entirely satisfactory. Migration strategies and their associated costs vary in different application environments, for different formats of digital materials, and for preserving different degrees of computation, display, and retrieval capabilities. 
Methods for migrating digital information in relatively simple files of data are quite well established, but the preservation community is only beginning to address migration of more complex digital objects. Additional research on migration is needed to test the technical feasibility of various approaches to migration, determine the costs associated with these approaches, and establish benchmarks and best practices. Although migration should become more effective as the digital preservation community gains practical experience and learns how to select appropriate and effective methods, migration remains largely experimental and provides fertile ground for research and development efforts. Stewards of digital material have a range of options when faced with the need to preserve digital information. One might preserve an exact replica of a digital object with complete display, retrieval, and computational functionality, or a representation of it with only partial computation capabilities, or a surrogate such as an abstract, summary, or aggregation. Detail or background noise might be dropped out intentionally through successive generations of migration, and custodians might change the form, format or media of the information. Enhancements are technologically possible through clean-up, mark-up, and linkage, or by adding indexing and other features. These technological possibilities in turn impose serious new responsibilities for presenting digital materials to users in a way that allows them to determine the authenticity of the information and its relationship to the original object. -- Change Media One migration strategy is to transfer digital materials from less stable to more stable media. The most prevalent version of this strategy involves printing digital information on paper or microfilm. Paper and microfilm are more stable than most digital media, and no special hardware or software are needed to retrieve information from them. 
Retaining the information in digital form by copying it onto new digital storage media may be appropriate when the information exists in a "software-independent" format as ASCII text files or as flat files with simple, uniform structures. Several data archives hold large collections of numerical data that were captured on punch cards in the 1950s or 1960s, migrated to two or three different magnetic tape formats, and now reside on optical media. As new media and storage formats were introduced, the data were migrated without any significant change in their logical structure. Copying has the distinct advantage of being universally available and easy to implement. It is a cost-effective strategy for preserving digital information in those cases where retaining the content is paramount, but display, indexing, and computational characteristics are not critical. As long as the preservation community lacks more robust and cost-effective migration strategies, printing to paper or film and preserving flat files will remain the preferred method of storage for many institutions and for certain formats of digital information. Yet the simplicity and universality of copying as a migration strategy may come at the expense of great losses in the functionality of digital information. When the access method for some non-standard data changes, it is often necessary, in order to migrate them, to eliminate, or "flatten," the structure of documents, the data relationships embedded in databases, and the means of authentication which are managed and interpreted through software. Computation capabilities, graphic display, indexing, and other features may also be lost, leaving behind the skeletal remnant of the original object. This strategy is not feasible, however, for preserving complex data objects from complex systems. It is not possible to microfilm the equations embedded in a spreadsheet, print out an interactive full motion video, or preserve a multimedia document as a flat file. 
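For the simple case described above, in which a software-independent flat file is copied to new media without any change in its logical structure, the copy can be checked bit for bit. The following is a minimal sketch under stated assumptions: the function names `sha256_of` and `copy_and_verify` are hypothetical, and the use of a SHA-256 digest is an illustrative choice rather than a method prescribed by the Task Force.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a flat file to new storage and confirm the copy is identical.

    Returns the digest, which could be recorded alongside the object so
    that later copies can be checked against it (an assumed practice).
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"copy of {source} to {destination} does not match the original")
    return after
```

A check of this kind confirms only that the bits survived the copy; it says nothing about whether future software can still interpret them, which is the concern of the strategies that follow.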
-- Change Format Another migration strategy for digital archives with large, complex, and diverse collections of digital materials is to migrate digital objects from the great multiplicity of formats used to create digital materials to a smaller, more manageable number of standard formats that can still encode the complexity and functionality of the original. An archive might accept textual documents in several commonly available commercial word processing formats or require that documents conform to standards like SGML (ISO 8879). Databases might be stored in one of several common relational database management systems, while images would conform to the tagged image file format (TIFF) and standard compression algorithms. Changing format as a migration strategy has the advantage of preserving more of the display, dissemination, and computational characteristics of the original object, while reducing the large variety of customized transformations that would otherwise be necessary to migrate material to future generations of technology. This strategy rests on the assumption that software products which are either compliant with widely adopted standards or are widely dispersed in the marketplace are less volatile than the software market as a whole. Also, most common commercial products provide utilities for upward migration and for swapping documents, databases, and more complex objects between software systems. Nevertheless, software and standards continue to evolve so this strategy simplifies but does not eliminate the need for periodic migration or the need for analysis of the potential effects of such migration on the integrity of the digital object. -- Incorporate Standards Digital archives will benefit from the wide-scale adoption of data and communication standards that enable the interchange of documents and data among systems. Business needs in many institutions are driving the development and adoption of data standards. 
Organizations that create, use and maintain Geographic Information Systems (GIS), for example, are trying to reduce data conversion and maintenance costs by creating data that conform to widely accepted standards so that they can be exchanged, reused, or sold. Rapid implementation of electronic commerce depends on widespread development and adoption of standards for EDI (electronic data interchange) transaction sets under auspices of the ANSI X.12 committee. Standards initiatives that address business needs for the secure and reliable exchange of digital information among the current generation of systems will impose standardization and normalization of data that ultimately will facilitate migrations to new generations of technology. Archives must keep abreast of standards developments and make sure that their own technological infrastructure conforms to widely adopted standards. -- Build Migration Paths Planning for long-term preservation is a critical element of digital preservation. To the extent that creators/providers/owners of digital information accept initial responsibility for archiving their objects, they may begin to see the wisdom of incorporating migration paths or other provisions for preservation as an integral part of the process or system that generates digital information. To assist in this educational process, archives might work one-on-one with potential donors to develop agreements early in their careers and establish arrangements for regular, on-line deposit of digital materials in a format acceptable to the repository. In government, institutional, and corporate settings, archivists and librarians can issue guidelines and advice for digital preservation and encourage their parent institutions to adopt common usage rules, comply with data standards, and select applications software that supports migration. 
The preservation community as a whole needs to work with industry to develop backward compatibility paths as a standard feature of all software. Backward compatibility or migration paths would enable a new generation of software to "read" data from older systems without substantial reformatting. Although backward compatibility is increasingly common within software product lines, migration paths are not commonly provided between competing software products or for products that fail in the marketplace. -- Use Processing Centers Although standards and migration paths may become commonplace at some future date, a large body of digital material exists today in non-standard formats, and organizations and individuals continue to produce digital materials in formats that will require migration. Developing "processing centers" that specialize in migration and reformatting of obsolete materials may provide a cost-effective method of digital preservation. Processing centers might provide reformatting services for particular types of materials, such as text, certain database structures, geographic information systems (GIS), or multimedia products. Such centers might maintain older versions of hardware and software to support migration. They might provide a platform for reading and viewing digital information with the same "look and feel" as the original version by developing "software emulators" as suggested by Rothenberg (1995). Processing centers would take advantage of economies of scale and maximize the use of uncommon technical expertise. Migration/preservation services centers might resemble commercial firms that reformat old home movies and obsolete video formats or consortia of libraries and archives with distributed preservation programs. A national laboratory for digital preservation, modeled after the National Media Laboratory, is another alternative. 
Feasibility studies and cost/benefit analyses would be necessary to determine the technological, economic, and commercial viability of such processing centers. Intellectual Property "The biggest problem for preserving digital information," many assert, "isn't technology, it's intellectual property rights." Addressing and resolving issues of intellectual property necessarily involves a complex set of stakeholders, including the creators and owners of intellectual property, managers of digital archives, representatives of the public interest, and actual and potential users of intellectual property. In addition, the stakeholders who represent, for example, owners and users of different kinds of intellectual property (e.g., text and other document-like objects, photographs, film, software, multimedia objects) operate under very different organizational regimes, with different experiences and expectations. Adding to these complexities is the deep uncertainty all these stakeholders face in confronting an increasingly digital world. Owners of intellectual property are unsure about how to price access to their information and about the levels of risk which necessarily accompany digital access. Users of intellectual property are unsure about what rights are conveyed with the use of a particular digital information object and about how much such use is worth. Government representatives seek in the public interest to manage the powerful changes that accompany digital technologies, but are unsure about what levers of influence are available to them and how to generate appropriate public policy. The state of the copyright law, which was generated and developed in an analog world but must be applied to an increasingly digital universe, is itself confusing and uncertain. And all of these concerns are exacerbated by the fact that bits know no borders. 
Creating and managing a distributed system of digital archives, including particularly a critical fail-safe mechanism that would enable the aggressive rescue of endangered elements of the cultural record, requires confronting these uncertainties. This section focuses on the legal regime established in the 1976 Copyright Act, which provides much of the legal context for the creation and management of digital archives. It is worth noting, however, that the communications infrastructure, which undergirds the digital world, is governed by a different set of legal requirements and these will likely significantly affect the dissemination of intellectual property in a distributed, networked environment. The impact of communication law and rights on the digital world does not mean that intellectual property law and rights will or should be abandoned wholesale for digital information. However, the law in these two arenas will interact in significant ways for all parties. Indeed, such interaction has been ongoing for decades in television and radio. Access to intellectual property traditionally found, for example, on library shelves will likely require the use of an underlying communications infrastructure as in the case of network television today. Thus, communications law and concerns will at least be an equal consideration to intellectual property law and concerns when access is gained to document-like objects. While recognizing that this interaction will be a cornerstone for whatever system exists in, say, ten years, the focus for the time being is on preserving intellectual property within the bounds of current copyright law. Section 108 of the Act provides that libraries may make copies beyond fair use for "purposes of preservation." Congress did not contemplate the preservation of digital information, as is evident by references in Section 108 to replacement of lost, stolen, and damaged (paper) copies. 
When the Copyright Act was last revised, microfilm, photocopies, and similar paper substitutes were the most likely vehicles of potential copyright infringement. An explicit exception for libraries to make paper or microfilm copies for preservation made sense because it advanced the important public policy goal, mandated by the U.S. Constitution, of ensuring the progress of the arts and sciences. The advent of digital technologies does not, and in our view should not, alter this important public policy goal. There is, however, sincere disagreement about the details of implementing digital preservation within the bounds of law. The works in progress presented below reflect the rich diversity of views among participants in the intellectual property and digital archiving communities. Some of the key questions that are discussed but not answered in these materials include: What if any authorizations are required from owners in order to archive digital works? What if any authorizations are required in order to convert works to digital form solely for the purpose of archiving? How do the principles of fair use translate into a digital archiving environment (if at all)? Are the legal principles different for a fail-safe mechanism for digital preservation than for preservation of paper materials? Why or why not? Do, or should, the rules and principles of digital archiving differ for different media? Where does the public interest lie? How should it be exercised? What methods should be employed to seek and achieve consensus among the various stakeholders? Is consensus possible, or even desirable? The Task Force recognizes that reasonable people from the different stakeholder communities may and will differ in their approach to these issues, and in the conclusions they reach. This report is intended to stimulate discussion among the interested parties, as a first step toward clarifying their critical concerns. 
The materials below, organized in three sections, are not meant to be definitive or conclusive about how to manage intellectual property preservation in a digital environment. Rather, they contain divergent, even conflicting perspectives and serve to indicate that substantial work is needed to ensure that all vital interests are best served in the emerging digital environment. It is hoped that the discussion elicited by this report will clarify those interests and the final report of the Task Force will attempt to integrate these views into a set of proposals for further action. -- Propositions about the Digital Archiving of Intellectual Property As discussed elsewhere in this report, the Task Force envisions a national system of certified digital archives with fail-safe capabilities serving as a safety net to ensure long-term access to at least one instantiation of any valued digital information object. In designing such a fail-safe system of archives, it is critical to determine what, if any, archival activities are subject to mandatory authorizations from intellectual property owners. Activities which may require authorization include, but are not limited to: digitizing analog works; placing digitized works in an archive under fail-safe conditions; migrating digital works by converting them from one digital form into another; moving archived works from one archive to another; providing access to archived works; and deleting archived works. Requiring operators of archives in a fail-safe system to seek and receive permission from copyright owners to carry out these activities would place a substantial burden on the system. Furthermore, the lengthy duration of copyright protection and the not infrequent changes of ownership for particular works would add further strain to an already burdensome information support structure. 
Finally, and most seriously, requiring authorization for the activities listed above would mean that, during the course of copyright protection, separate authorizations might be required to migrate a digital object through as many as ten or fifteen different software and hardware environments. Requiring copyright owner authorization for each "fail-safe" archival activity for the life of copyright protection could effectively eliminate all prospect for a system of "fail-safe" archives. The following propositions are designed to address both the vital interests of copyright owners and the public interest in ensuring the preservation of endangered digital information objects and of our shared cultural and intellectual heritage. 1. Intellectual property owners must authorize the making of a copy of their digital works, except under the conditions described in (2), below. 2. Digital archives which operate under the following set of terms and conditions would not require authorization by intellectual property owners to create a copy of a digital work from a legally procured copy and to store, migrate and manage that work in these archives: - As long as the work is protected intellectual property, the owner or the owner's agent must explicitly authorize any access, use or dissemination of the work. - Any digital work which is not protected intellectual property (either because it has been placed in the public domain or the period of ownership has expired, or for any other reason) can be accessed, used and disseminated according to the terms and conditions of the digital archive. -- Discussion of the Propositions and Other Matters: A Conversation These propositions may be controversial and are intended to spark a continuing dialogue about the framework needed to support intellectual property in emerging digital archives. The following conversation is alleged to have taken place between two members of the Task Force. 
They are identified here as Gaius and Tiberius Gracchus; it is hoped that the classicists in our midst will appreciate the allusion. There are several reasons for including this dialogue in the current version of the Task Force report. First, it is important to recognize that many of the observations and recommendations of the Task Force -- and, perhaps, particularly those concerning intellectual property -- may and should spark discussion such as that reflected below. Second, the dialogue touches on several key themes, including the relation between archiving in general and the specific functions which may be performed by archives exercising a fail-safe mechanism; the problem of archiving highly dynamic documents, like WWW pages, in a way that captures rather than stifles their basic nature; the relationships between concepts and terms in intellectual property (like the idea of "copy") which have evolved in a document-like universe, but may have less relevance in a digital one; and the impact these questions may have on the success or failure of the proposals in this report. Finally, we felt that a few pages of less formal content are appropriate and would emphasize that this is a draft report. Gaius Gracchus: Your earlier summary of the discussion suggested that we were proposing the archiving system, when in fact the propositions were only about an ultimate fail-safe mechanism. In our view, such a fail-safe mechanism is needed to undergird a highly distributed, (probably) library-based structure of (at least somewhat) linked digital archives. Thus, from my perspective, questions about the rights of libraries/archives to preserve are orthogonal to this discussion, which is only about how "Fail-safe Archives" might operate. Tiberius Gracchus: Understood, but even the Fail-safe Archives need to come into being somehow. The issues that such archives raise and the "how" of creating them should be detailed. 
At this moment, I don't see how to do that as it seems to be outside the law; the law is an aspect that needs to be dealt with, specifically. Gaius: There's also a major issue around the archiving of living documents, like Web pages: we don't have a way to deal with them, but I think that's the case for all our discussions, not just this one. I've been analyzing what it would mean to preserve home pages, for instance, without anything useful coming out of my thinking yet. Tiberius: Yes -- but if we can't do that, then electronic archiving fails. Home pages, for example, are increasingly characteristic of what we value about electronic information. Preservation of electronic links needs to be addressed even if not resolved. Gaius: My problem with proposing changes to the copyright law is, first, that I don't believe that thinking/talking about digital "works" as copies is helpful. This is reinforced by looking at the convoluted logic in the current discussions about revising Section 108 of the Copyright Law. The discussions refer to "preserving" copy 1, 2, 3, as if those distinctions had much meaning in a digital world. I agree completely that the stored digital object in an archive might not "look" like the work as originally experienced, any more than the peaks and valleys in a CD look like a musical score, or like an audio trail. So what? We know it's the same work, don't we? Do we have to worry about how exactly we know that? I don't think there's a cat in this particular tree, but there are a lot of folks barking underneath it. Tiberius: Well, yes, but the law we have and the case law we have, is what there is. The courts (and copyright owners) deem the notion of digital works as copies to be very, very important. (How could they not?) 
This line of reasoning takes one right into the John Perry Barlow argument that the current copyright law simply does not address the digital environment and cannot address it and that what we need to do is reconceptualize it (Barlow 1994). There is, however, nothing in the current NII Green Paper headed for a White paper that even begins to do that. Au contraire, the recommendations attempt to carry the current regulations into the networked arena. Now, we can believe that Barlow is right and that over time the laws we have will have to change. But we are not there in 1995. Still, our recommendations could propose that it is time to look at new copyright concepts. It would add to the voices already so doing. Apart from that, I am far from sure that the digital copy is a true copy of what the original work was. Far, far from sure. Only notionally and most simply is it an identical copy. In reality, a succession of migrated copies, with all the choices that can make the information more and more flexible and functional, often add value that was never intended (or better to say, planned) originally. So maybe we disagree on a fundamental point! Gaius: Regarding your questions about how rightsholders learn who has what copies of their works and where, and what say, if any, does the rightsholder have regarding migration, etc.: These are the right questions in my view and I don't know what the right answers are. The rationale originally for proposing that the rightsholders' authorization not be required for these limited cases (i.e. the exercise of the fail-safe mechanism) turned exactly on our anticipation that such a principle would only serve as the fail-safe, supporting the broader interests of all parties. If the parties conclude it doesn't support their interests, then we should drop the idea and await litigation, revisions of the law, etc. But those are very bad options if the goal is to keep building a powerful, sustainable infrastructure for information. 
Your whole set of questions is terrific -- what answers would you propose? Or what process to get tentative answers? Tiberius: Well, a good place to start is an essay with a full discussion of these many important points. We would be well served by discussing it in the two streams: the fallback, or fail-safe mechanism and how it might be created (whose buy-in is needed); and then the distributed archives which will evolve as laws and practices change. -- Digital Archives and the First Sale Doctrine: An Exploration The following exploration of how the "first sale" doctrine of copyright law might be applied to digital preservation is controversial even among members of the Task Force. It is directed toward a possible avenue for ensuring digital preservation and looks to the day when works enter the public domain, as all works ultimately do. However, this exploration is not a recommendation of the Task Force, at least not yet, nor should it be construed in any way as constituting legal advice. It is meant for consideration and comment. Reasonable people will disagree about this approach; however, even those who advocate it on the Task Force adhere to the basic assumption that it in no way compromises the rights of copyright owners protected in Section 106 of the law. As defined, it is intended very narrowly. We ask only that it be considered with an open mind; we solicit comments. Our goal is to find ways to preserve access to intellectual property and other information in digital form within the bounds of the current Copyright Act. The preservation of access is threatened by rapid changes in technology which render storage media obsolete within a relatively short time. To preserve access to digital materials, many have called for revisions to the Copyright Act, but I believe that such revisions are unnecessary. The "preservation of access" is not the same as actual access; it is more akin to an option. 
We all strongly support the law which mandates that any use falling within the exclusive rights of Section 106 of the Copyright Act must be authorized by the copyright owner. How then to preserve access to materials without running afoul of the copyright law? The answer may well be in the Copyright Act itself, in Section 109(a), the First Sale Doctrine. Before considering the application of the First Sale Doctrine to the digital world, let's examine how it works in the paper world: Where a copyright owner parts with a particular copy of her copyrighted work, she divests herself of her exclusive right to distribute that particular copy. The purchaser of that copy is not restricted by law from further transfers of that copy. Thus, for example, Levine and Baroundi publish a book called, "The Internet for Dummies." Falstaff buys a copy of "Dummies" at the book store. The First Sale Doctrine allows Falstaff to sell his copy, lease it, loan it, or give it away (or anything else but "reproduce" it), and most significantly, Falstaff does not need the copyright owner's permission to do any of these things. The copyright owner's intellectual property rights are in no way compromised because Falstaff has not sold, leased, loaned or given away any intellectual property. The law considers Falstaff to be trading in a chattel, not in Levine and Baroundi's intellectual property. Levine and Baroundi's exclusive 106 rights remain intact and untouched. This intellectual property/chattel distinction seems to disappear in a digital world. If such a distinction can no longer be maintained, one might conclude that the First Sale Doctrine will be obsolete once we leave the comforts of paper behind. In "A Preliminary Draft of the Report of the Working Group on Intellectual Property Rights" (hereinafter "The Lehman Report"), the working group concluded that "the First Sale Doctrine should not apply " to the sale or other disposal of the possession of that (first) copy by transmission." 
With respect to First Sale, The Lehman Report focused on the transmission of copyrighted works from one computer to another. The Task Force on Archiving of Digital Information takes no position on this recommendation because we have not considered it and it is irrelevant to our goal. To reiterate, our goal is to seek a way to preserve intellectual property for later access within the bounds of law. The application of the First Sale Doctrine to transmissions of copyrighted material does not fall within the scope of our inquiry which is far narrower. We leave the question of such application to others. Our focus is on preserving for later access materials that have been lawfully obtained, whether by transmission or otherwise. How might the First Sale Doctrine be applied to preserving digital materials for later access? Let's update Falstaff. Instead of going to a bookstore, Falstaff owns a PC and purchases "Dummies" in digital form, either on CD, on-line, or DAT tape. What has he purchased? He clearly has not purchased intellectual property which remains under the exclusive dominion and control of Levine and Baroundi. However, he has not really purchased a chattel either, particularly in the case of receipt of information on-line. What Falstaff clearly did purchase is access. The purchase of access is the common element to the paper and the digital worlds. When one buys a book, one cares far less for the color of the cover and the quality of the paper (book collectors aside) than one does for what the book contains. We buy books (or magazines or newsletters, etc.) to have access to the information they contain. That access does not mean that we own intellectual property, merely that we have access to it, and that the access is maintained so long as one, or another, maintains the book. The First Sale Doctrine allows us to sell our access or to give it away. The copyright owner has been fully compensated in any case. 
When one buys access to information in digital form, access to the information is still the central focus. A new concern, however, is to ensure that one is technically able to access the information over time. Copyright owners will make information in digital form available through the technical means of the day. The copyright owner will be compensated, as in the case of the book, when I buy "Dummies" on CD, for example. As to the CD for which the owner is compensated, am I not free to give it to my friend? If my friend stands in my shoes, the copyright owner is in the exact position as when I held the CD. He has sold access to one individual. One individual has access to it (my friend, no longer I). If I receive "Dummies" on-line, read it, give it to my friend, deleting what I received, the copyright owner is in the same position he would have been if I had done all these things with a book. Is there any compelling reason that a copyright owner should be rewarded in the digital world when he clearly would not be in the paper world? The First Sale Doctrine should allow me to do these things. Note that in these examples, at the end of the day, there is just one "Dummies" that may be accessed. No harm has come to the copyright owner. Suppose that advances in technology render my version of "Dummies" inaccessible. To ensure that I continue to have access, I must "migrate" the work to a form accessible by a new technology. The technical details aside, if the process is successful, I will have what I started with: access to "Dummies". I have gained nothing by migrating the work, and the copyright owner has lost nothing. The technical differences between the old version and the new version are not central to my concern or that of the copyright owner; they merely enable me to retain what I originally purchased. Some will argue that this is a "copy", but it is not. When the process is completed, I will have the same access, with the same restrictions, as I had originally. 
Should a mere change in technology enrich a copyright owner who has been compensated and who adds nothing? There are a couple of limits to this analysis. First, it is not clear that it applies to software. Second, the application of the First Sale Doctrine to the digital world would be undermined completely if copyright owners adopt a wide scale approach of licensing works for a term of years. Applying the First Sale Doctrine to digital works goes a long way to solving the legal problem associated with preservation and access. It does not harm copyright owners who retain all of their exclusive 106 rights and who must authorize all access to their works. It enables libraries to maintain current access to their holdings and turns their focus to problems that are difficult enough, i.e. the technical ones. It enables the preservation of materials for later authorized access, and perhaps more importantly, for the day when they fall into the public domain. -- Conclusion Significant questions remain about the rights and duties of digital archives with respect to intellectual property. What forms of access are permitted from such archives? Does the migration of digital information objects through successive hardware and software environments preserve the original objects or create derivative works? Although it may take time for definitive answers to these questions to emerge, the mandate to find such answers is clear. The constitutional goal of ensuring the progress of the arts and sciences will certainly elude us without the preservation of digital information, and a legal framework which fails to account for a reasonable system of digital preservation will itself not endure. 
Managing Costs and Finances In addition to managing the operating environment of archives, the migration of information through hardware and software platforms, and intellectual property, a fourth function by which archives fulfill their commitment to preserve digital information is in managing the costs of these activities. The principal cost factors of the operating environment are those associated with selection, accession, storage, use, property rights transactions, and the systems engineering needed to maintain the distributed infrastructure. Some costs, like those for hardware and software as well as those for intellectual property if rights are purchased rather than leased, will appear as capital costs and will need to be amortized. Operating costs will vary by the form of the information, by usage and over time. Digital information in full text form will be relatively cheap to store and use compared to other forms, such as image, sound, video and multimedia information. Full-text takes up less space in storage and less bandwidth in transmission than other forms. The modes of usage for full-text are relatively well, though by no means completely, understood, and so the delivery and access software is relatively more stable and less subject to costly turnover than that for information in other forms. Usage will also substantially affect operating costs. Healthy demand for particular information objects will push the archives that provide them to shift delivery to more costly on-line storage or to sophisticated and expensive systems of hierarchical storage. The costs of software and intellectual property may also be pegged to demand. Another usage factor affecting costs is the high variability in the kinds of user access devices. For many types of information objects, modes of access are still closely tied to the type of workstation available to the user. 
The tradeoff for the archive is either to limit access only to "approved" devices or to support multiple platforms and incur the added associated costs in hardware, software, intellectual property and systems engineering. Operating costs will also vary over time. Storage costs will likely continue to decline both absolutely and relative to other cost factors. The costs of access and of managing property rights transactions are relatively high today because the supporting systems are virtually non-existent; these systems are developing very rapidly and their relative costs will likely also fall. The costs of the property rights transactions themselves, however, cannot be reasonably predicted at this stage and the danger is that costs will rise to the point that they become a barrier to access. In the long run, the primary cost factor in the management of digital archives will likely be the systems engineering needed to support highly distributed network-based functions. Migration costs will vary depending on the complexity of the original data objects, the frequency of migration, the extent to which the functionality for computation, display, indexing, and authentication must be maintained, and the need to compensate for acquisition of intellectual property rights. Migration costs are much greater for complex objects, such as geographic information systems, where it is necessary to retain multiple formats of data, color display, and complex relations between the "layers" of a geographic information system than for flat files of data or ASCII characters. Computer models that drive artificial intelligence systems are of little long-term value if their computation capabilities are not retained; but migrating the models from one generation of software to the next involves complex and expensive transformations. 
Some types of digital information may require frequent migration -- as often as every three to five years -- if they are stored in formats that are subject to frequent change. Archives may have to pay for intellectual property rights and may be required to purchase software or site licenses in order to migrate digital information stored in proprietary formats. Planning for migration is difficult because there is limited experience with the types of migrations needed to maintain access to complex digital objects. When a custodian assumes responsibility for preserving a digital object, it may be difficult to predict when migration will be necessary, how much reformatting will be needed, and how much migration will cost. There are no reliable or comprehensive data on costs associated with migrations, either for specific technologies and formats or for particular collections.

-- Cost Modeling

Answering questions about the costs and affordability of digital information and of preserving it in a system of digital archives is essential but exceedingly complex. There is a large array of cost factors to understand for a panoply of differing kinds of digital information objects in which numerous parties have a variety of different kinds of interests. A multidimensional matrix -- with present and future stakeholders mapped along one axis, kinds of digital information along a second, and cost factors along a third -- might serve well as a framework for systematically assessing the value of digital information and the affordability of preserving it. There is much we need to know before we can fully elaborate a framework of this kind, and much to be learned from detailed studies of agencies and organizations like the National Aeronautics and Space Administration (NASA) and the Inter-university Consortium for Political and Social Research (ICPSR), which have maintained large archives of certain kinds of digital information, in some cases in distributed form over electronic networks.
For some other domains of the larger framework of understanding -- e.g., for images of textual documents produced in a university in a networked context and viewed from the perspective of a research library -- a relatively detailed sense of archival costs is also beginning to emerge. Careful comparison and assessment of the models of costs being constructed in each of these cases will considerably advance our overall understanding of the long-term implications of digital preservation. One such model -- based on Yale's Project Open Book -- is presented below. The analysis not only illustrates the need to refine our understandings of the value and usefulness of specific kinds of digital information in specific contexts, but also suggests where we need to focus our energies as we undertake the fundamental work needed to support digital preservation. The Yale Cost Model Assumptions about the value of digital information frequently turn on assertions that cheap storage and easy access are uniquely available in the digital environment. At the same time, many publishers, librarians and others contemplating the digital future express considerable fear that the digital world of information will not prove less expensive than the traditional paper environment, but in fact more expensive. Given a body of digital works, what resources are needed to store and provide access to them indefinitely into the future? In Project Open Book, the Yale University Library has accumulated an archive of over 2,000 digital texts. The texts are collections of black and white TIFF images at 600 dots per inch resolution under CCITT Group IV compression. Based on experience it has gained to date in the Project, the Yale Library has begun constructing a model that projects the costs of storage and access for a much larger digital archive, and has provided some of the details for presentation here (see Waters 1994). 
The fundamental assumption of the projection in the Yale model is that the digital archive is built primarily for the Yale community and is composed not of converted volumes, but of newly published acquisitions. These are purchased or licensed and accumulate at a rate of 200,000 volumes per year in bit-mapped image form. Only a fraction of the newly acquired material is used each year. The usage rate is based on actual circulation rates at the Yale Library of about 15% of volumes. Such a rate is not unusual for large research libraries and seems appropriate as a basis for modeling a digital archive intended to collect material for future use as well as present use. The digital archive usage rate, however, is assumed to be 20%, slightly more than the 15% in the traditional library, on the theory that access is easier and demand is therefore greater for materials in digital form.

Digital Archive Costs

Table 1 below contains a summary of the estimated digital archive storage and access costs per volume for Year 1 (see Appendix 2 for details). Components of storage costs in the digital archive include storage device costs -- a jukebox, in this model -- storage media costs, and the costs of operating and maintaining the storage equipment and of periodically refreshing the media and migrating the data. The costs of providing access to the digital archive include those of providing a document server and software for the server that will support client machines on users' desks, of maintaining the server and access software, of operating the server, of printing on demand a selected portion of volumes used, and of delivering those printed copies to the user.

---------------------------------------------------------------------------
Cost Factors                      Costs per Volume in Year 1

Storage
  Device Costs                                $1.12
  Device Maintenance                           0.46
  Operations Costs                             0.46
  Media Costs                                  0.25
  Media Refreshment and Data Migration         0.49
  Total Storage Costs per Volume              $2.77

Access
  Document Server                             $2.13
  Server Maintenance                           0.88
  Server Operations                            0.88
  Access Software                              1.22
  Software Maintenance                         0.50
  Printing                                     0.96
  Delivery                                     0.09
  Total Access Costs per Volume               $6.65
---------------------------------------------------------------------------
Table 1. Digital Archive Costs

In this model, all equipment and software are capitalized over a life of five years, whereupon they are assumed to be obsolete. The media are assumed to be refreshed and the data migrated as the technology changes on the same five-year cycle. Hardware and software maintenance costs as well as equipment operations costs are estimated as a proportion of the original purchase price. The model assumes the cost of hardware and software, as well as migration and other operational services, all to be declining at a relatively rapid rate of 50% every 5 years.

One can reasonably assume that local printers are available to network users of the digital archive, and that many users would not avail themselves of the printing and delivery services posited here. This model is built on the assumption that only 10% of the use of the archive would result in use of the archive's own printing and delivery services. The unit costs represented here are thus 10% of the actual unit costs for on-demand printing and delivery. Moreover, the printing costs include an estimated charge for copyright clearance and are assumed to be stable into the future. Delivery charges are assumed to be labor intensive and to inflate by 4% per year over time.

Are the storage and access costs for this first year high or low? How would the costs compare over time? What is an appropriate standard of comparison?
Depository Library Costs

To generate a comparison benchmark, Yale imagined that the same 200,000 volumes that it assumed it would acquire each year for the digital archive could just as readily be acquired in paper form. Note that this kind of benchmark is only possible for document-like objects and is simply not an option for many of the materials that originate in digital form and cannot exist in any other form.

Yale further imagined, because its libraries are full or nearly so, that all new volumes would be stored, not in a new and expensive full-service library in the center of campus but in a low-cost depository facility from which a highly responsive service would deliver needed items directly to faculty offices and student rooms. Leaving all competitive rivalries aside, Yale granted for purposes of the model that it could not build and run a depository facility at unit costs lower than the published unit prices at the Harvard Depository Library. Yale thus projected its depository costs based on the depository prices charged by Harvard.

---------------------------------------------------------------------------
Cost Factors                      Costs per Volume in Year 1

Storage
  Total Storage Costs per Volume              $0.24

Access
  Retrieval                                   $2.20
  Courier                                      0.27
  Circulation                                  0.60
  Delivery                                     0.90
  Total Access Costs per Volume               $3.97
---------------------------------------------------------------------------
Table 2. Depository Library Costs

Table 2 contains a summary of the storage and access costs per volume in a depository library for Year 1. Given an average size of document and an average number of documents per linear foot, the projected library storage costs for paper documents are easily calculated from the published price list of the Harvard Depository Library, and presumably include in their base a means of recovering the costs of building construction and maintenance.
The costs of providing access to the paper-based depository library consist of four components: retrieval from the depository shelf, transfer to the campus service point, circulation of the volume and delivery to the user. Estimates of retrieval and courier service costs are also based on the published price list of the Harvard Depository Library. The estimate of circulation cost is derived from actual circulation costs at Yale. And a cost is assigned for delivery service to the faculty office or student room as a substitute for reader use of a browsable stack in a full service library. All these costs are assumed to rise with inflation at an annual rate of 4%. Today, in Year 1, the differences between the unit costs of the depository library compared to those of the digital archive are striking. The storage costs in this model are more than 10 times higher for a digital archive composed of texts in image form, and the access costs are 67% higher. Skepticism about the purported cost advantages of digital libraries over traditional libraries thus seems well-founded in this model, at least in Year 1. What would happen over the longer term? A Ten-Year Scenario Proponents of digital libraries rest their case, at least in part, on arguments about the rapidly declining rates of technology costs. This highly simplified model highlights that argument and sets up a stark contrast between the digital archive, the costs of which (except for demand printing and delivery) decline at a steady rate, while the costs of the depository library rise by an inflationary rate each year. If these assumptions are set in play over a ten year period, the changes in unit costs are remarkable (see Table 3). 
---------------------------------------------------------------------------
                                             Year 1  Year 4  Year 7 Year 10

Depository Library
  Depository Storage Costs Per Volume         $0.24   $0.27   $0.30   $0.34
  Depository Access Costs Per Volume Used     $3.97   $4.46   $5.02   $5.64

Digital Archive
  Digital Storage Costs Per Volume            $2.77   $1.83   $1.21   $0.80
  Digital Access Costs                        $6.65   $4.76   $3.51   $2.70
---------------------------------------------------------------------------
Table 3. Projected Costs Per Volume Over 10 Years

The calculation for this table ignores annual carrying costs. Instead, the table presents the costs as if each year were the initial year for such investment. By centering on the operational costs in this way, the table clearly reveals that unit costs of storage for the digital archive fall by about 70% over the period. However, in Year 10 they still remain more than double the unit costs of storage in the depository library. By contrast, unit costs of access in the digital archive fall to half of the unit access costs in the depository library over the period, overtaking them in about Year 5. So formulated, the model seems to confirm a widespread sense of the value of the digital world in providing easy access to digital information.

However, if over the next decade storage in the digital archive is managed the same as storage in conventional paper-based libraries -- that is, if the number of volumes stored is the same as the number of volumes acquired -- then the overall cost advantage would still favor the depository library. In Year 10, the cost to store 200,000 new volumes in the digital archive is $160,000; the costs of access to 20%, or 40,000, of the volumes is $108,000; together these yield a total cost of $268,000. By contrast, the cost to store 200,000 new volumes in the depository library in Year 10 is $68,000; the costs of access to 15%, or 30,000, volumes would be $169,200; together these total $237,200.
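The projections in Table 3 and the Year 10 totals can be approximated with a short sketch. It starts from the Year 1 unit costs in Tables 1 and 2 and applies the stated assumptions: technology-based costs fall 50% every five years, printing costs stay flat, and delivery and depository costs inflate at 4% per year. Treating the 50% decline as a smooth geometric rate is an assumption made here for illustration, and the function and constant names are hypothetical rather than part of the Yale model. Under that assumption the results agree with Table 3 to within about a cent.

```python
# Year 1 unit costs per volume, taken from Tables 1 and 2.
DIGITAL_STORAGE = 2.77
DIGITAL_ACCESS_TECH = 2.13 + 0.88 + 0.88 + 1.22 + 0.50  # server and software items
DIGITAL_PRINTING = 0.96    # assumed stable over time
DIGITAL_DELIVERY = 0.09    # labor intensive, inflates 4% per year
DEPOSITORY_STORAGE = 0.24
DEPOSITORY_ACCESS = 3.97

INFLATION = 1.04
# Assumption: "50% every 5 years" applied as a smooth annual geometric decline.
TECH_DECLINE = 0.5 ** (1 / 5)


def unit_costs(year):
    """Projected unit costs per volume for a given year (Year 1 = base year)."""
    n = year - 1
    tech = TECH_DECLINE ** n
    infl = INFLATION ** n
    return {
        "depository_storage": DEPOSITORY_STORAGE * infl,
        "depository_access": DEPOSITORY_ACCESS * infl,
        "digital_storage": DIGITAL_STORAGE * tech,
        "digital_access": (DIGITAL_ACCESS_TECH * tech
                           + DIGITAL_PRINTING
                           + DIGITAL_DELIVERY * infl),
    }


def total_cost(volumes_stored, volumes_used, storage_unit, access_unit):
    """Total annual cost: storage for volumes stored plus access for volumes used."""
    return volumes_stored * storage_unit + volumes_used * access_unit


if __name__ == "__main__":
    for year in (1, 4, 7, 10):
        costs = unit_costs(year)
        print(year, {name: round(value, 2) for name, value in costs.items()})
    # Year 10 totals, using the rounded unit costs from Table 3.
    print(round(total_cost(200_000, 40_000, 0.80, 2.70)))  # digital archive
    print(round(total_cost(200_000, 30_000, 0.34, 5.64)))  # depository library
```

The last two lines repeat the Year 10 arithmetic from the text, in which both repositories store all 200,000 new volumes but the digital archive serves 40,000 uses against the depository's 30,000.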
Obstacles and Prospects for Digital Archives With its stark assumptions that seem to favor the digital archive and its surprising results that favor the paper-based library, the cost analysis presented here raises a critical question. If the costs of providing storage and access to texts in digital image form are truly greater than the costs of providing storage and access to the same texts in paper form, are the highly touted advantages of the digital environment merely a chimera? There are at least two answers to this question. One is to challenge the high costs of the digital archive over time by asserting, for example, that the assumed rate of decline in technology-based costs should be steeper. This kind of challenge to the model and its results is risky for two reasons. First, although there may be evidence today that some technology costs are declining at a steeper rate than the overall rate posited in the model, it is difficult to argue from the evidence that the overall rate should be lowered or that such a lowered rate could be sustained over the period. It is difficult because, second, any expectation of declining costs has to be balanced against the equally persistent expectation of rising functionality, which tends to drive technology-based costs up -- or, at least, to slow their decline. Another, and perhaps more fruitful, answer is to think of the organization of digital information storage and access in fundamentally different terms from those which govern the conventional paper library. As we have seen, one of the significant qualities of digital information is that it lives in a networked environment. Given sufficient capacity or bandwidth, adequate security and reliability and wide extension of the network, one can alter a fundamental assumption of the model, namely, that the digital archive, like the paper-based library, best serves its client community by taking physical possession of all the materials it acquires. 
Instead, one can imagine a distributed storage environment, supported by various contractual arrangements with suppliers. Under these arrangements, a digital archive or other user agent could purchase or license the full 200,000 volumes, but then secure the right, either for a period of time or in perpetuity, to move a digital work from another archive on the network into local storage only when it is needed. And exercising a fail-safe prerogative might comprise one highly specialized definition of need.

---------------------------------------------------------------------------
                                                  Year 1     Year 4     Year 7    Year 10

New Volumes                                      200,000    200,000    200,000    200,000

Depository Library (volumes stored=new volumes)
  Estimated annual use (15% of volumes)           30,000     30,000     30,000     30,000
  Depository Storage Costs for New Volumes       $48,000    $53,993    $60,735    $68,319
  Depository Access Costs for Volumes Used      $118,975   $133,830   $150,541   $169,338
  Total Depository Storage and Access Costs     $166,975   $187,824   $211,276   $237,657

Digital Archive (volumes stored=volumes used)
  Estimated annual use (20% of volumes)           40,000     40,000     40,000     40,000
  Digital Storage Costs for Volumes Used        $110,827    $72,980    $48,058    $31,646
  Digital Access Costs for Volumes Used         $266,120   $190,026   $140,128   $107,506
  Total Digital Storage and Access Costs        $377,101   $263,527   $188,805   $139,742

Difference: Depository-Digital                ($210,126)  ($75,703)    $22,471    $97,915
---------------------------------------------------------------------------
Table 4. Costs for All Volumes Stored and Used

With the prospect of a viable system of distributed network-based archives, one can thus cast the model of a digital archive in the following way: assume that it stores the volumes acquired only as they are needed. Observe in Table 4 the effects of this changed assumption. Note that in Year 10, because the digital archive is now storing only the volumes used, its storage costs have dropped from $160,000 (as calculated above) to $31,875.
Still, the depository library continues to hold an overall cost advantage until Year 7. Storage costs shift to the benefit of the digital archive in Year 6 and access costs follow suit in the next year. Beginning in Year 7 and continuing on into the future, the digital archive, conceived in a digital networked environment, begins to demonstrate its affordability compared to conventional paper-based modes of information storage and access. A different construction of the organization of digital storage and access compared to paper-based storage and access thus leads to a compellingly different construction of the relative economies. Even under highly restrictive assumptions -- the very specialized case of bit mapped images of text and the arguably conservative expectations about the rate of decline in technology-based costs -- the digital archive embedded in a highly distributed network of information resources begins to look economically attractive in a relatively short time. Now, if one begins to relax these restrictive assumptions, then the model of distributed digital archives starts bearing even more economic fruit. Incorporate in the model a faster rate of decline in technology-based costs. Or, rather than bit-mapped, inject into the model a different encoding method, such as compressed TeX, which is much less storage intensive than bit-mapped images. In all these cases, one can expect the costs to fall relative to both the digital and paper-based scenarios initially presented here. Richer and more detailed cost models than the simple analytic model advanced here -- ones that include costs of acquisition, administrative overhead, reference services and so on -- are needed to accurately assess the value and affordability of the digital environment. The Yale model, however, has the distinct advantage of helping to reveal that the key that unlocks the path to the economies of the digital environment is not technological, but organizational. 
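The break-even year under the distributed assumption (volumes stored equal volumes used) can be explored with a minimal sketch. It uses the Year 1 unit costs from Tables 1 and 2, the 200,000 new volumes per year, and the 15% and 20% usage rates given above. As an assumption made here for illustration, the 50% decline every five years is applied as a smooth geometric rate; the function names are hypothetical. The totals it produces track those in Table 4 to within a few hundred dollars, not exactly.

```python
INFLATION = 1.04
TECH_DECLINE = 0.5 ** (1 / 5)  # assumption: smooth annual geometric decline

NEW_VOLUMES = 200_000
DEPOSITORY_USE_RATE = 0.15
DIGITAL_USE_RATE = 0.20


def depository_total(year):
    """Depository stores every new volume; all of its costs inflate at 4% per year."""
    n = year - 1
    used = NEW_VOLUMES * DEPOSITORY_USE_RATE
    return (NEW_VOLUMES * 0.24 + used * 3.97) * INFLATION ** n


def digital_total(year, store_only_used=True):
    """Digital archive total; optionally stores only the volumes actually used."""
    n = year - 1
    used = NEW_VOLUMES * DIGITAL_USE_RATE
    stored = used if store_only_used else NEW_VOLUMES
    tech = TECH_DECLINE ** n
    storage_unit = 2.77 * tech
    # Server and software items decline; printing is flat; delivery inflates.
    access_unit = 5.61 * tech + 0.96 + 0.09 * INFLATION ** n
    return stored * storage_unit + used * access_unit


def first_year_digital_cheaper(store_only_used, horizon=10):
    """First year within the horizon in which the digital total is lower, else None."""
    for year in range(1, horizon + 1):
        if digital_total(year, store_only_used) < depository_total(year):
            return year
    return None


if __name__ == "__main__":
    print(first_year_digital_cheaper(store_only_used=True))
    print(first_year_digital_cheaper(store_only_used=False))
```

Changing only the `store_only_used` flag leaves every cost assumption untouched, which isolates the organizational choice about what to hold locally from the technology cost trends.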
Developing suitable and effective modes of distribution in a networked environment that lead to cost effective archives for preserving digital information is an organizational task requiring much ingenuity and numerous creative partnerships and alliances of various kinds among stakeholders. It is the various dimensions of this organizational effort to which we can look forward as the digital environment matures, but a key question still remains: who will pay? -- Financing A key question in the management of archival costs for operations and migration, of course, is how to balance them with income, either from a sponsoring organization or philanthropy that absorbs the costs or from direct or indirect charges for use. Uncertainty about the answer to this question, as much as any other factor, creates a significant barrier to the coherent, systematic preservation of digital information. Some general actions might help relieve the uncertainty about archival costs. For example, tax incentives and accounting rules that favor the preservation of digital information in archives as investment in long-term capital stock might spur the growth of digital archives. Otherwise, solutions to cost questions are likely to be found in relation to specific bodies of digital materials and the communities that are interested in them. For some kinds of digital information direct charging for use will be entirely acceptable to the relevant user communities. One can imagine making an actuarial calculation of the lifetime cost of preserving a digital information object, finding creators/providers/owners with an economic interest in paying to preserve their information, and constructing an archival service that functions much like a safety deposit system for digital information objects. 
As facilities are developed and refined to exact charges, conduct transactions for intellectual property and maintain confidentiality, and as experience with such mechanisms grows, some communities of interest that presently resist the notion of charging for information services, such as archiving, may grow less resistant. In any case, more imaginative solutions need to be found by asking hard questions about who benefits from the archived information, when they benefit, and whether the answers suggest how the costs of preservation might be afforded.

Some instructive examples are beginning to emerge from communities in which the members have asked and tried to answer these hard questions. Consider physics, for example. The direct beneficiaries of archived physics information are physicists and related professionals, who have, as their professional organization, the American Physical Society (APS). The APS already publishes the central corpus of physics information. It keeps a copy of most of what it has published and owns the copyrights to its publications. It has stability and longevity (it was founded earlier than the New York Public Library and considerably earlier than most extant commercial organizations), and a membership keenly aware of the value of its publications and thereby able to help select the most valuable among them. The Society has the technical skill and organizational wherewithal to manage its archive in digital form. It has mechanisms that could finance the archive, including member dues and access charges, and it has recently adopted a digital form of publication for its key journals. Moreover, after careful study, the Society has recently decided to embark on a systematic program that would lead it to build complete digital archives of its publications, past and present, and to maintain them into the future.
For what other bodies of digital information are there societies or other interest groups (including commercial publishers) for which the same or similar conditions obtain? What kinds of imaginative solutions to the problems of managing the costs of digital preservation have they devised, or could they be provoked to devise? And how do (or would) these solutions compare to the ambitious program of the APS?

SUMMARY AND RECOMMENDATIONS

Buckminster Fuller used to tell a story about the Master of one of the Cambridge (England) Colleges, who noticed a deep crack in the massive beam supporting the college's dining hall. Not knowing to whom he should report the problem, the Master eventually notified the Royal Forester. The Forester replied that he had been expecting the call. The Forester's predecessor's predecessor, he said, had planted the tree for the new beam, and it was ready. This, Fuller noted, was how a society ought to work.

The Commission on Preservation and Access and the Research Libraries Group (RLG) together have asked this Task Force on the Archiving of Digital Information to report in effect on ways that society should work with respect to the cultural record it is now creating in digital forms. The digital environment is so immature at this stage, and the future so uncertain, that our findings and recommendations are truly modest -- seedlings only. The analysis in this report of the emerging digital environment, and the place of digital archives there, is necessarily preliminary and intended to frame relevant questions for action and further systematic study. Asked to focus on the notion of "technology refreshing," we have found "data migration" to be a richer and more fruitful concept to describe what is necessary to ensure the preservation of digital information.
In our deliberations, we have also found it necessary to advance the notion of a system of certified archives designed to preserve the cultural record in digital form and empowered with a fail-safe mechanism to launch aggressive rescue efforts to protect valuable cultural information otherwise in danger of being deliberately or inadvertently destroyed.

Our summary recommendations for next steps are focused on three general topics of concern for the Task Force: pilot projects, needed support structures, and best practices. In particular, we recommend that the Commission on Preservation and Access and the Research Libraries Group take the following actions, either separately or together and in concert with other individuals or organizations as appropriate.

Pilot Projects

1. Solicit proposals from interested archives around the country and provide coordinating services for selected participants in a cooperative project designed to place information objects from the early digital age into trust for use by future generations.

Action is urgently needed to ensure that documents, discourse, software products and other digital information objects that document the early digital age are preserved before they slip irrevocably away. A project designed with this particular focus as a cooperative venture would have the added advantage of providing a testbed for developing a system of linked but distributed archives. Because the objects in this focal area are at such risk of loss, the project would also provide a useful means of exploring the operations of archival fail-safe mechanisms.

2. Secure funding and sponsor an open competition for proposals to advance digital archives, particularly with respect to removing legal and economic barriers.

The recent competition sponsored by the National Science Foundation generated an enormous amount of creative thinking about and commitment to the development of digital libraries. A similar approach is needed for digital archives.
Because the Task Force has found the primary issues for archives to be legal and economic, the competition might best be focused on fostering creative alliances, especially with publishers, and practical, joint efforts designed to lower the legal and economic barriers to the effective operation of digital archives.

3. Foster practical experiments or demonstration projects in the archival application of technologies and services, such as transaction systems for property rights and authentication mechanisms, which promise to facilitate the preservation of the cultural record in digital form.

Only through early and active use will digital archives be able to influence the development of key new technologies and services and help to ensure that they support information longevity. The matrix of information forms, dimensions and stakeholders suggested in this report can provide a useful framework for crafting specific targeted efforts. There are many cells in that matrix which are presently void, but which could be filled and verified (or nullified) through a series of imaginative demonstration projects. Moreover, there is a growing need for evidence that digital archives can practically and effectively incorporate in their daily operations automated systems for transacting intellectual property and mechanisms for encryption, document authentication and user authorization.

Support Structures

4. Coordinate the appropriate organizations and individuals in the development of standards, criteria and mechanisms for identifying and certifying repositories of digital information as archives.

If valued digital information is to be preserved for future generations, repositories claiming to serve an archival function must be able to prove that they are who they say they are and that they can deliver on their preservation promise. One of the ways to provide such proof is to submit to an independently-administered program for archival certification.
The appropriate individuals and organizations need now to design and begin to implement the standards, criteria and mechanisms for such certification.

5. Engage actively in national policy efforts to design and develop the national information infrastructure to ensure that longevity of information is an explicit goal.

The Task Force envisions a highly distributed network of linked digital information archives as the environment in which digital information will flourish over the long term. Communication and information network policy decisions regarding pricing, security and network extension will greatly affect the viability of these archives and their efforts to preserve digital information. These policy decisions need to be informed with an understanding of the importance and complexity of digital preservation. Development of an infrastructure conducive to preservation might also be aided by consideration of tax incentives and accounting practices that treat the creation of digital information archives favorably as an investment in the nation's long-term capital stock.

6. Sponsor the development of a white paper on the foundations needed in intellectual property law to support the aggressive rescue of endangered digital information through an effective fail-safe mechanism.

Limits in time and expertise have prevented the Task Force from giving full and thorough attention to the relation between intellectual property and the notion we have advanced of a "fail-safe" mechanism for digital information. Such attention is urgently needed as an essential contribution to the ongoing development of notions of copyright and fair use in a digital age.

7. Engage representatives of professional societies from a variety of disciplines in a series of forums designed to elicit creative thinking about the means of creating and financing digital archives of specific bodies of information.
The American Physical Society has embarked on a creative mission to build a complete digital archive of the published material that it owns. What can other publishers learn from this venture? What other possibilities exist for treating digital preservation not as a problem to be solved globally, but as a problem to be solved by those with clear interests in the long-term availability and use of specific sets of related information objects?

Best Practices

8. Commission follow-on case studies to identify current best practices and to benchmark costs in one or more of the following areas:

a. Storage of massive quantities of culturally valuable digital information.

There is little good experience yet in storing in digital form massive quantities of materials traditionally regarded as culturally valuable, such as books and serials. Organizations have, however, developed large digital archives for other kinds of culturally important information. Examples include the archives of census data, remote sensing satellite imagery, weather data, and commercial data such as insurance or medical records. What can be learned from experience in these areas about the means and costs of ensuring the longevity of digital information?

b. Use of metadata for digital preservation of culturally valuable digital information.

The National Institute of Standards and Technology (1995) recently announced its intent to develop a data standard for "record description records" or metadata. It issued a call for comments in which it described the use of such records as essential for effectively managing the long-term preservation of the digital information generated by the Federal Government. Developments that result from interaction among federal and commercial agencies as a result of this call are likely to have significant wider implications, especially for the constituencies represented by the Commission and RLG, and will need to be closely monitored.
How else, and by whom, are metadata being used to manage the long-term preservation of digital information?

c. Migration paths for digital preservation of culturally valuable digital information.

Data migration is a common, if difficult, practice as businesses and other organizations preserve their essential business records through successive changes in hardware and business management software. Cultural archives that have been collecting digital objects have also had to begin migrating them as the hardware and software on which they were created have become obsolete. What is the range of experience of different organizations with archiving different types of content? What can be learned and generalized from these experiences? How do strategies for archiving similar materials compare across different organizations? Are there economies of scale that could be achieved by combining efforts across archives? What are the costs of the different strategies employed? What strategies have failed? In what ways have practices improved over time?

Given the analysis in this report, its findings and modest recommendations, we expect that the best use of the work of the Task Force will ultimately be to heighten awareness of the seriousness of the digital preservation problem, its scope and complexity -- and its manageability. There are numerous challenges before us, but also enormous opportunities to contribute to the development of a national infrastructure that positively supports the long-term preservation of digital information. Such an infrastructure is a desirable outcome that will benefit us only if we conceive and structure it to benefit those served by our successors' successors.

REFERENCES

[Note: This is a partial list of the sources that the Task Force has found useful in composing this draft report. Not all of these are cited in the text. A more complete list of references and full citations will appear in the final draft.]

Ackerman, M. S., and R. T. Fielding
   1995 "Collection Maintenance in the Digital Library."
Barlow, John Perry
   1994 "The economy of ideas." Wired, March, pp. 84-90, 126-129.
Bearman, David
   1989 Archival Methods. Technical Report, vol. 3, no. 1 (Pittsburgh: Archives and Museum Informatics).
Bearman, David, and Margaret Hedstrom
   1993 "Reinventing Archives for Electronic Records: Alternative Service Delivery Options." In Margaret Hedstrom, ed. Electronic Records Management Program Strategies, Archives and Museum Informatics Technical Report, No. 18, pp. 82-98.
Conway, Paul
   1994 "Digitizing Preservation." Library Journal (February 1, 1994): 42-45.
   1995 "Selecting Microfilm for Digital Preservation: A Case Study from Project Open Book." Paper presented at the American Library Association Annual Meeting, Chicago, June 26, 1995.
Creque, Stuart A.
   1995 "Why Johnny Can't Read His Data." Wall Street Journal, June 5: p. A14.
Garrett, John R., et al.
   1993 "Toward an Electronic Copyright Management System." Journal of the American Society for Information Science 44(8): 468-473.
Getz, Malcolm
   1992 "Information Storage." Unpublished manuscript, Vanderbilt University, February 27.
Graham, Peter S.
   1994 Intellectual Preservation: Electronic Preservation of the Third Kind. Washington, D.C.: Commission on Preservation and Access.
   1995 "Requirements for the Digital Research Library." College and Research Libraries, July, 56(4): 331-339.
Hedstrom, Margaret
   1991 "Understanding Electronic Incunabula: A Framework for Research on Electronic Records." American Archivist 54(3): 334-354.
Lesk, Michael
   1990 Image Formats for Preservation and Access: A Report of the Technology Assessment Advisory Committee to the Commission on Preservation and Access. Washington, D.C.: Commission on Preservation and Access.
   1992 Preservation of New Technology: A Report of the Technology Assessment Advisory Committee to the Commission on Preservation and Access. Washington, D.C.: Commission on Preservation and Access.
Levy, David M., and Catherine C. Marshall
   1995 "Going Digital: A Look at Assumptions Underlying Digital Libraries." Communications of the ACM 38(4): 77-83.
Lynch, Clifford
   1993 Accessibility and Integrity of Networked Information Collections. Office of Technology Assessment, Congress of the United States, July 5, 1993.
   1994 "The Integrity of Digital Information: Mechanics and Definitional Issues." Journal of the American Society for Information Science 45(10): 737-744.
Moen, Bill
   1995 "Metadata for Network Information Discovery and Retrieval." Information Standards Quarterly 7(2): 1-4.
Mohlhenrich, Janice, ed.
   1993 Preservation of Electronic Formats: Electronic Formats for Preservation. Fort Atkinson, Wis.: Highsmith.
National Academy of Public Administration
   1989 The Effects of Electronic Recordkeeping on the Historical Record of the U.S. Government: A report for the National Archives and Records Administration. Washington, D.C.: National Academy of Public Administration.
National Institute of Standards and Technology
   1995 "Intent to Develop a Federal Information Processing Standard (FIPS) for a Data Standard for Record Description Records--Request for Comments." Federal Register, 60(39), February 28, 1995: 10832-10835.
Neavill, Gordon B.
   1984 "Electronic Publishing, Libraries, and the Survival of Information." Library Resources & Technical Services, 28 (January): 76-89.
O'Toole, J. M.
   1989 "On the Idea of Permanence." American Archivist 52(1): 10-25.
Rothenberg, Jeff
   1995 "Ensuring the Longevity of Digital Documents." Scientific American, (January): 42-47.
The University of the State of New York, et al.
   1988 A Strategic Plan for Managing and Preserving Electronic Records in New York State Government: Final Report of the Special Media Records Project. Albany: New York State Education Department.
Waters, Donald J.
   1994 "Transforming Libraries Through Digital Preservation." In Nancy E. Elkington, ed. Digital Imaging Technology for Preservation: Proceedings from an RLG Symposium held March 17 & 18, 1994. Mountain View, CA: Research Libraries Group, pp.
Wiederhold, Gio
   1995 "Digital Libraries, Value and Productivity." Communications of the ACM 38(4): 85-96.

APPENDIX 1

Commission on Preservation and Access and Research Libraries Group
Task Force on Archiving of Digital Information
Proposed Charge

Preamble

Continued access indefinitely into the future of records stored in digital electronic form cannot under present circumstances be guaranteed within acceptable limits. Although loss of data associated with deterioration of storage media is an important consideration, the main issue is that software and hardware technology becomes rapidly obsolescent. Storage media become obsolete, as do devices capable of reading such media; and old formats and standards give way to newer formats and standards. This situation holds both for electronic records derived through conversion from some analog form (paper, film, video, sound etc.), and for records that originated in electronic form.

It has been proposed that one solution to this problem is to "refresh" the stored records at regular intervals, that is, to copy the records onto newer media and into newer formats. While this approach is simple in concept, implementation raises a number of issues, most of which are not technological. How, for example, can we guarantee that owners of electronic records will faithfully pursue such a refreshing mandate indefinitely into the future? Does the very nature of this question imply the need to contract such tasks to one or more organizations who can be relied upon to carry the refreshing torch forward? There are also important legal, economic, cultural, and technical questions.

Charge

The Commission on Preservation and Access and the Research Libraries Group join together in charging a Task Force to:

* Frame the key problems (organizational, technological, legal, economic etc.)
that need to be resolved for technology refreshing to be considered an acceptable approach to ensuring continuing access to electronic digital records indefinitely into the future.

* Define the critical issues that inhibit resolution of each identified problem.

* For each issue, recommend actions to remove the issue from the list.

* Consider alternatives to technology refreshing.

* Make other generic recommendations as appropriate.

The Task Force may also wish to envision possible end-states that portray an environment in which technology refreshing is accepted as a routine approach, and scenarios for achieving such end-states. An important goal is to understand what might constitute "best practices" in the area of technology refreshing.

The Task Force shall consult broadly among librarians, archivists, curators, technologists, relevant government and private sector organizations, and other interested parties.

The Task Force is requested to complete an interim report by May, 1995 or thereabouts that can be circulated widely among interested communities to obtain feedback as input to a final report to be completed summer, 1995. This final report will constitute the key product of the Task Force.

APPENDIX 2

Note: Appendix 2 is available only in the Word for Windows 6.0 or Adobe Acrobat versions of this document. See http://www-rlg.stanford.edu/ArchTF for links to these versions.