Synopses & Reviews
Key to our culture is that we can disseminate information, and then maintain and access it over time. While we are rapidly advancing from vulnerable physical solutions to superior digital media, preserving and using data over the long term involves complicated research challenges and organizational efforts. Uwe Borghoff and his coauthors address the problem of storing, reading, and using digital data for periods longer than 50 years. They briefly describe several markup and document description languages like TIFF, PDF, HTML, and XML, explain the most important techniques such as migration and emulation, and present the OAIS (Open Archival Information System) Reference Model. To complement this background information on the technology issues, the authors present the most relevant international preservation projects, such as the Dublin Core Metadata Initiative, and experiences from sample projects run by the Cornell University Library and the National Library of the Netherlands. A rated survey list of available systems and tools completes the book. With this broad overview, the authors address librarians who preserve our digital heritage, computer scientists who develop technologies that access data, and information managers engaged with the social and methodological requirements of long-term information access.
Human culture depends on our ability to disseminate information, and then maintain and access it over time. This book addresses the problems of storing, reading, and using digital data for periods longer than 50 years. It offers concise descriptions of markup and document description languages like TIFF, PDF, HTML, and XML, explains important techniques such as migration and emulation, and presents the OAIS (Open Archival Information System) Reference Model.
About the Author
Uwe Borghoff is a full professor of computer science at the University of the Armed Forces (UniBwM), Munich, Germany. Prior to this, he worked at Xerox Research Centre Europe in Grenoble, France, where he led the coordination technologies group. Peter Rödig (UniBwM) is developing methods for long-term preservation of digital data. Related research interests include document engineering and database technologies. Jan Scheffczyk (UniBwM) is researching issues such as consistency maintenance in document engineering, long-term preservation of digital data, and software engineering. Lothar Schmitz is a lecturer at the UniBwM, and his research interests include software engineering and long-term preservation of digital data.
Table of Contents
(Sections and chapters new relative to the German version of this book are printed in bold face.)

Part I: Approaches to Long-Term Preservation (approx. 155 pages)

1 Long-term Preservation of Digital Documents (approx. 23 pages)
  1.1 Blessing and Curse of Digital Documents
  1.2 Challenges, Terms, Concepts
  1.3 Preserving Byte Streams
  1.4 Technical Approaches to Long-term Preservation
  1.5 Legal and Social Issues
2 The OAIS Reference Model and the DSEP Process Model (approx. 13 pages)
  2.1 The OAIS Reference Model
    2.1.1 Background Information
    2.1.2 Information Model
    2.1.3 Process Model
  2.2 The DSEP Process Model for Libraries
3 Migration (approx. 23 pages)
  3.1 Migration: Notions and Goals
  3.2 Migration as a Means for Long-term Preservation
    3.2.1 Data Formats as Migration Targets
    3.2.2 Migration via Changing Media
    3.2.3 Migration via Changing Logical Structure
  3.3 Preservation Processes in Migration Approaches
  3.4 Migration: Pros and Cons
4 Emulation (approx. 27 pages)
  4.1 Emulation: Notions and Goals
  4.2 Emulation as a Means for Long-term Preservation
    4.2.1 What exactly means Emulation?
    4.2.2 Variants of Emulation
    4.2.3 Exploiting Virtual Machines
  4.3 Preservation Processes in Emulation Approaches
  4.4 Emulation: Pros and Cons
5 Document Markup (approx. 25 pages)
  5.1 An Example
  5.2 Different Forms of Markup
    5.2.1 Procedural, Structural, Semantic Markup
    5.2.2 Embedded Markup Considered Harmful
    5.2.3 Levels of Markup
  5.3 Exploiting Markup for Long-term Preservation
    5.3.1 Requirements for Long-term Preservation
    5.3.2 Bibliographic Requirements
  5.4 Persistence is a Virtue
    5.4.1 Uniform Resource Identifier, -Name, -Locator
    5.4.2 Referencing Documents
    5.4.3 Handles and Digital Object Identifiers
    5.4.4 Summary
6 Standard Markup Languages (approx. 26 pages)
  6.1 Standards for Syntactic Document Markup
    6.1.1 Tagged Image File Format (TIFF)
    6.1.2 Portable Document Format (PDF)
    6.1.3 HyperText Markup Language (HTML)
    6.1.4 eXtensible Markup Language (XML)
  6.2 Standards for Semantic Document Markup
    6.2.1 Resource Description Framework (RDF)
    6.2.2 Topic Maps
    6.2.3 Ontologies
  6.3 Vision: The Semantic Web
7 Discussion (approx. 11 pages)
  7.1 Why do You Need to Act NOW?
  7.2 What do We Know already, What Remains to be Done?
  7.3 Facing Reality
  7.4 A Combined Approach

Part II: Recent Preservation Initiatives (approx. 157 pages) (Projects are subject to change)

8 Markup: Current Research and Development (approx. 50 pages)
  8.1 The Dublin Core Metadata Initiative (DCMI)
  8.2 The Metadata Encoding and Transmission Standard (METS)
  8.3 The Victorian Electronic Records Strategy (VERS)
  8.4 The Text Encoding Initiative (TEI)
  8.5 The Research Libraries Group (RLG)
  8.6 The Pandora Project
9 Migration: Current Research and Development (approx. 50 pages)
  9.1 Migration in the VERS Project
  9.2 Preserving the Whole
  9.3 Risk Management of Digital Information
  9.4 Database Migration
    9.4.1 Motivation
    9.4.2 Overview of the Architecture
    9.4.3 Detaching Digital Objects from Physical Media
    9.4.4 Services of Database Management Systems
    9.4.5 Experiments
    9.4.6 Discussion
10 Emulation: Current Research and Development (approx. 16 pages)
  10.1 Emulation Experiments by Rothenberg
  10.2 Universal Virtual Computer (UVC)
11 Digital Archiving Systems for Long-Term Preservation (approx. 40 pages)
  11.1 Assessment Methodology
  11.2 Market Survey
    11.2.1 EPrints (University of Southampton)
    11.2.2 DSpace (MIT)
    11.2.3 DIAS (IBM/National Library of the Netherlands)
    11.2.4 Fedora (Cornell University, The University of Virginia)

List of Figures
List of Tables
References
Index