
Digital Preservation: DP Overview

This guide is an introduction to fundamental concepts of digital preservation.

Introducing the concept of digital preservation

What is digital preservation?

Digital preservation (DP) refers to the cumulative and continuous actions that are undertaken to ensure that digital objects are preserved in their original state and remain accessible and intelligible to users well into the future. Beyond maintaining the authenticity and integrity of the objects themselves, DP also involves preserving the contextual information that is essential for objects to retain their meaning and significance. More technical aspects, such as managing storage infrastructure and migrating objects from one format to another to avoid digital obsolescence, also come into play.

The discipline of digital preservation is constantly evolving as new technologies, materials, and media come into existence. It is also important to emphasize that the preservation of digital objects is always an ongoing process and never a finished task. As Trevor Owens stresses in the introduction to his 2018 book The Theory and Craft of Digital Preservation, "Nothing has been preserved, there are only things being preserved". Digital objects must be constantly monitored to ensure that they remain accessible and unaltered.

Major themes in digital preservation

Fixity

Fixity refers to a file retaining its original bit-for-bit structure without undergoing any alteration. If the words in a text file are altered, or the structure of the text is changed, the file is changed at the bit level. It is no longer "fixed", since its bit-level integrity has been lost. Ensuring fixity is essential to the trustworthiness of a digital object, especially when that object has evidentiary value. Besides providing storage of and access to digital assets, DP workflows need to protect the fixity of these assets.

Checksums are often used to confirm the fixity of files. A checksum is a short string of characters that acts as an identifier for a file's contents, sometimes described as a "file fingerprint". It is generated by running the file through a checksum algorithm (such as MD5 or SHA-256), a fixed set of mathematical operations whose output is that file's checksum. If the file is changed in any way, even by a single bit, its checksum will almost certainly be different. This is how checksums help detect alterations to digital objects.
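The following is a minimal sketch of generating and comparing checksums in Python with the standard library's `hashlib` module. It assumes SHA-256 as the algorithm (MD5 and SHA-1 are also common in preservation tools); the function name, file name, and file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string.

    The file is read in chunks so that large files need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "letter.txt"  # hypothetical file
        doc.write_text("Dear colleague,\n")
        original = sha256_checksum(doc)

        # Adding a single character changes the bytes, and therefore the checksum.
        doc.write_text("Dear colleagues,\n")
        altered = sha256_checksum(doc)

        print(original == altered)  # False
```

Reading in chunks keeps memory use low for large audiovisual files; the resulting checksum is the same as hashing the whole file at once.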

Fixity is related to the concept of "Mutability", which you will find below.

To learn more about fixity and checksums, see the Digital Preservation Handbook, a product of the Digital Preservation Coalition.

Authenticity

The concept of authenticity means that a thing is what it is said to be. For example, if a photograph in an online repository is labeled "UPRM Art Exposition 2021", it is authentic if it in fact shows an image of that specific event held that specific year. Authenticity is especially important for records that have significant evidentiary value.

Authenticity is closely tied to the trustworthiness of digital collections, so it must be taken into account when designing digital preservation strategies. One way to uphold the authenticity of a digital object is to record provenance information, such as who uploaded a file to a repository and when, along with any later actions involving the object, such as moving it from one storage location to another.
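One way to record provenance is as an append-only list of timestamped events attached to each object. The sketch below uses a hypothetical `ProvenanceEvent` structure loosely inspired by the idea of events in the PREMIS preservation metadata standard; its field names, the object identifier, and the agent names are illustrative only and do not follow PREMIS's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    """One recorded action on a digital object (hypothetical structure)."""
    event_type: str      # e.g. "ingest", "move", "fixity check"
    agent: str           # the person or software that performed the action
    timestamp: datetime  # when it happened, in UTC
    detail: str = ""     # free-text note, e.g. old and new storage locations


@dataclass
class DigitalObject:
    """A repository object together with its provenance history (hypothetical)."""
    identifier: str
    events: list[ProvenanceEvent] = field(default_factory=list)

    def record(self, event_type: str, agent: str, detail: str = "") -> ProvenanceEvent:
        """Append a timestamped event. Events are frozen, so once recorded
        their fields cannot be reassigned."""
        event = ProvenanceEvent(event_type, agent, datetime.now(timezone.utc), detail)
        self.events.append(event)
        return event


if __name__ == "__main__":
    photo = DigitalObject("art-expo-2021-photo-001")  # hypothetical identifier
    photo.record("ingest", "jdoe", "uploaded to repository")
    photo.record("move", "storage-admin", "moved from server A to server B")
    for e in photo.events:
        print(e.timestamp.isoformat(), e.event_type, e.agent, e.detail)
```

A real system would also need to protect the history itself from tampering; this sketch only prevents individual events from being edited in place.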

Mutability

An aspect that differentiates electronic information sources from analog ones is their mutability. This refers to the ease with which digital objects can be altered without leaving obvious traces. For example, a person could alter portions of a Word document without leaving behind the physical marks that would result from alterations to a printed page. Also, if adequate access restrictions are not in place, a malicious person could copy, alter, or delete files from a remote location. Similar situations could result from human error or equipment failure.

Because of this mutability, electronic files must be stored securely and monitored on an ongoing basis, both to prevent unwanted alterations and to detect them if they occur. Keeping preservation copies of digital objects is also a good way to protect yourself against this type of problem.
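Monitoring is often done by saving a manifest of checksums when files are first stored and periodically recomputing and comparing them. The sketch below assumes SHA-256 and uses hypothetical function names (`build_manifest`, `audit`); the manifest-of-checksums idea itself is also used by packaging formats such as BagIt (RFC 8493).

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """SHA-256 of a file's contents (reads the whole file; fine for a sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path, relative to root, to its checksum."""
    return {
        p.relative_to(root).as_posix(): checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root with a previously saved manifest."""
    current = build_manifest(root)
    return {
        "altered": [f for f in manifest if f in current and current[f] != manifest[f]],
        "missing": [f for f in manifest if f not in current],
        "unexpected": [f for f in current if f not in manifest],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("original")
        (root / "b.txt").write_text("keep me")
        saved = build_manifest(root)  # saved at ingest; in practice, stored separately

        (root / "a.txt").write_text("tampered")  # a silent alteration
        (root / "b.txt").unlink()                # an accidental deletion
        print(audit(root, saved))
        # {'altered': ['a.txt'], 'missing': ['b.txt'], 'unexpected': []}
```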

Redundancy

Redundancy refers to having several copies of your digital objects in order to avoid irreparable loss. One approach to redundancy is what is known as the 3-2-1 Rule, which can be briefly explained as follows:

  • Have three copies of your digital content
  • Use at least two different types of storage media for your copies
  • Keep one copy of your content in a remote location
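The 3-2-1 Rule can be expressed as a simple check over an inventory of copies. This is a sketch with a hypothetical `Copy` record; it reads "three copies" as "at least three" and treats a copy as remote when it is flagged `offsite`. What counts as a distinct media type or a sufficiently remote location remains a local policy decision that the code does not make for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a collection (hypothetical inventory record)."""
    label: str
    media_type: str  # e.g. "hard disk", "LTO tape", "cloud object storage"
    offsite: bool    # True if kept in a different geographic location


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return a list of problems; an empty list means the 3-2-1 Rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    media = {c.media_type for c in copies}
    if len(media) < 2:
        problems.append(f"only {len(media)} media type(s); need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no copy is stored in a remote location")
    return problems


if __name__ == "__main__":
    inventory = [
        Copy("working copy", "hard disk", offsite=False),
        Copy("local backup", "hard disk", offsite=False),
        Copy("cloud backup", "cloud object storage", offsite=True),
    ]
    print(check_3_2_1(inventory))  # []
```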

Another strategy used in digital preservation is keeping both preservation copies and access copies of digital objects. Preservation copies are usually high-quality, uncompressed versions of the digital objects that are kept in archival storage. Access copies may be compressed and of lower quality. They are meant for users, so they are often in formats that are easier to view and download.

Access

Collections are created and preserved so they can be accessed and used! Always think about who your user community is and what is most important to them. Also think about how your users will engage with the digital objects you are caring for. This will help guide decisions regarding what to collect, how to organize, describe, and present content, and what is most important to preserve for the long term.

In addition, be aware that forms of access and use will continue to evolve over time. Stay up to date on new digital platforms and ways of engaging with digital content so that you can provide the best possible experience to your user community.

Digital Obsolescence

As time goes by, certain technological equipment and software applications are used less and less until they become obsolete. When this happens, digital objects stored in formats or on media that depend on these technologies can become difficult or impossible to retrieve. This is known as digital obsolescence. To avoid this problem, people in charge of DP must stay aware of trends in the computing world and migrate materials to formats and storage media that help ensure the resources remain accessible in the future.
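Watching for obsolescence starts with knowing which formats a collection actually contains. The sketch below guesses each file's format from its signature (its first few bytes) rather than its extension, which can be missing or wrong. The signature table is a small hypothetical sample; dedicated identification tools such as DROID and Siegfried match against the much larger PRONOM format registry.

```python
from collections import Counter
from pathlib import Path

# A small, hypothetical sample of well-known file signatures ("magic numbers").
# Real identification tools match against the much larger PRONOM registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"PK\x03\x04", "ZIP or ZIP-based (e.g. DOCX, EPUB)"),
]


def identify(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring its extension."""
    with open(path, "rb") as f:
        header = f.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def format_inventory(root: Path) -> Counter:
    """Count the files of each identified format under root."""
    return Counter(identify(p) for p in root.rglob("*") if p.is_file())
```

For example, `format_inventory(Path("collection")).most_common()` (with a hypothetical `collection` folder) lists formats from most to least frequent, which can help decide where migration effort is most needed. Files reported as `"unknown"` may deserve a closer look with a fuller identification tool.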

The importance of metadata

What is metadata?

In very simple terms, metadata is commonly defined as information about information. Any item that conveys meaning, such as a text document, a video, or an image, can be considered an information resource. Metadata describes information resources and provides context so that users have a more accurate idea of what they are looking at.

These are some examples of the type of questions that are answered by metadata:

  • Who is the author of this text document?
  • When and where was the video recorded?
  • What event is represented in this image?

Context and Findability

Metadata enriches an information resource by adding context. For example, situating a photograph in a certain historical place and time can give it much deeper meaning to a person viewing it. Metadata also enhances the findability of resources in digital interfaces by providing terminology which may be used in a keyword search. Researchers are therefore more likely to retrieve the resources most relevant to their interests.
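To illustrate how metadata supports findability, the sketch below runs a naive keyword search over descriptive metadata. The records, field names, and matching rule (every query word must appear) are hypothetical; real discovery systems use full-text indexes and relevance ranking.

```python
import re

# Hypothetical descriptive metadata for two items in a collection.
RECORDS = [
    {
        "id": "photo-001",
        "title": "Student sculptures at the campus art exposition",
        "subjects": ["art", "sculpture", "students"],
        "date": "2021",
    },
    {
        "id": "video-002",
        "title": "Interview with a retired fisherman",
        "subjects": ["oral history", "fishing"],
        "date": "1998",
    },
]


def tokens(text: str) -> set[str]:
    """Lower-case words in a string, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_search(records: list[dict], query: str) -> list[str]:
    """Return ids of records whose metadata contains every query word."""
    wanted = tokens(query)
    hits = []
    for rec in records:
        searchable = " ".join([rec["title"], *rec["subjects"], rec["date"]])
        if wanted <= tokens(searchable):
            hits.append(rec["id"])
    return hits


if __name__ == "__main__":
    print(keyword_search(RECORDS, "art 2021"))      # ['photo-001']
    print(keyword_search(RECORDS, "oral history"))  # ['video-002']
```

In this sketch only the metadata is searched, so an item with no title, subjects, or date could never be retrieved by a keyword query.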

Importance for Preservation

By providing a detailed description of objects in a collection, metadata helps guide the preservation process by shedding light on issues such as:

  • What is particularly meaningful about certain digital objects (for example, their subject matter, their creator, or their importance to a particular community)
  • What digital formats the objects are stored in
  • Who has the rights over certain objects
  • Hardware or software necessary to render the digital objects

This information is helpful in setting priorities and making sounder decisions about the preservation of digital collections.

Types of metadata

Different types of metadata give us varying information about the digital object. Here are some common types of metadata: 

descriptive metadata - Provides information that helps identify the object and gives users an idea of its contents, for example the title of the resource, its creators, the topics it covers, and when it was created.

technical metadata - Provides information about the technical characteristics of a digital object, such as its format and any software and hardware that is necessary to render the object.

preservation metadata - Information that is necessary for the preservation of a digital object. This can include technical information (such as software needed to render a file), information regarding fixity checks, and information regarding previous preservation actions (for example, if an object was migrated from one format to another).

rights metadata - Can include information regarding who owns the intellectual property rights over an object, whether the object may be copied and distributed, and whether the object can be altered for the creation of derivative works. 

structural metadata - Describes the relationship between different components of a digital object and how they should come together to render the object. For example, structural metadata can describe how the different pages and image files that make up a website are organized to properly render the site's content on your screen.
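The five types of metadata can be seen together in a single record. The sketch below shows a hypothetical record for a digitized two-page letter; all keys and values are illustrative, and real repositories typically rely on established standards such as Dublin Core (descriptive), PREMIS (preservation), or METS (structural) rather than ad hoc keys. The two small functions show how a record might be checked for gaps and how structural metadata determines page order.

```python
# Hypothetical metadata record for a digitized two-page letter.
# Keys are illustrative only, grouped by the five metadata types.
RECORD = {
    "descriptive": {
        "title": "Letter to the university president",
        "creator": "Unknown",
        "subject": ["campus history", "correspondence"],
        "date": "1952",
    },
    "technical": {
        "format": "image/tiff",
        "rendering_software": "any TIFF-capable image viewer",
    },
    "preservation": {
        "checksum_algorithm": "SHA-256",
        "last_fixity_check": {"date": "2024-01-15", "result": "passed"},
        "events": ["digitized from paper original", "ingested into repository"],
    },
    "rights": {
        "rights_holder": "University Archives",
        "may_copy_and_distribute": True,
        "may_create_derivatives": False,
    },
    "structural": {
        # Scanner file names do not reveal reading order; this metadata does.
        "pages": [
            {"file": "scan_0107.tif", "order": 2},
            {"file": "scan_0108.tif", "order": 1},
        ],
    },
}

METADATA_TYPES = ("descriptive", "technical", "preservation", "rights", "structural")


def missing_types(record: dict) -> list[str]:
    """List metadata types that are absent or empty in a record."""
    return [t for t in METADATA_TYPES if not record.get(t)]


def reading_order(record: dict) -> list[str]:
    """Use structural metadata to put page files in the order they should be rendered."""
    pages = sorted(record["structural"]["pages"], key=lambda page: page["order"])
    return [page["file"] for page in pages]


if __name__ == "__main__":
    print(missing_types(RECORD))  # []
    print(reading_order(RECORD))  # ['scan_0108.tif', 'scan_0107.tif']
```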