
Digital Preservation: Ingest and Storage

This guide is an introduction to fundamental concepts of digital preservation.

Ingest: Bringing material into your collections

Ingest refers to the process of bringing materials into a repository for storage. Ideally, there should be established procedures for ingest. This will help keep the process organized and keep track of the material that is added to the repository. Below are some points to consider when designing procedures for ingest.

Talk to providers of content 

It is important to establish good communication between the administrators of a repository and the people submitting materials to the collection. This will help make the ingest process smoother and more transparent. Repository administrators should reach out to content providers and become familiar with the kind of content that will be submitted, as well as with providers' expectations regarding things like preservation and dissemination of the digital objects. Good communication allows repository administrators to give providers proper guidance on how to prepare materials for submission. All in all, providers gain a better understanding of how the repository's stewardship model works, and repository administrators obtain insights that allow them to better manage the material and serve their user community.

Establish parameters for the ingest process 

These are some points you may want to cover in your parameters:

Descriptive metadata - It is very helpful if providers of content assign descriptive metadata to the material they are submitting. This will lead to better findability and access once the digital objects are in the repository. A good practice is to have providers fill out a metadata form when they are submitting content. Repository administrators should provide guidance to providers on how to properly fill out the different metadata fields. This can be accomplished by creating a metadata manual. Administrators should try to obtain enough metadata at the point of submission to create thorough descriptive records of the materials.

Digital formats - It is also useful to establish parameters regarding digital formats. Formats that are broadly used and non-proprietary are recommended for purposes of digital preservation. Work with content providers so that you receive material in digital formats that facilitate digital preservation work. For more information on this topic, see the guidance provided by the Smithsonian Institution Archives.

File names - Using a file naming convention (an established format for file names in your collection) can help by providing consistency and clarity. Ideally, file names should give some general information about the contents of the file while also being relatively short. For example, a file naming format for theses in an online repository could include the student's department initials, the student's last name and initials, and the year the thesis was approved. Two theses following this file naming format could look like this:

  • BIOL_GarciaColonP_2021
  • HIST_MendezJC_2019   

As you can see, these documents follow a consistent file naming pattern that provides general information about the file. The consistent format makes it easier to identify specific files included in the collection.
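A naming convention like this one can also be checked automatically at the point of submission. The following is a minimal sketch in Python, not a description of how any particular repository works. The pattern, the function names, and the rule that a department code has two to six capital letters are all assumptions made for illustration.

```python
import re

# Assumed pattern: DEPT_LastnameInitials_YEAR, e.g. BIOL_GarciaColonP_2021.
# The length limits below are illustrative, not an established standard.
FILE_NAME_PATTERN = re.compile(r"[A-Z]{2,6}_[A-Z][A-Za-z]*[A-Z]{1,3}_(19|20)\d{2}")


def build_file_name(department: str, last_name: str, initials: str, year: int) -> str:
    """Assemble a file name, dropping spaces and punctuation from the last name."""
    cleaned_last_name = re.sub(r"[^A-Za-z]", "", last_name)
    return f"{department.upper()}_{cleaned_last_name}{initials.upper()}_{year}"


def follows_convention(file_name: str) -> bool:
    """Return True when the whole file name matches the assumed pattern."""
    return FILE_NAME_PATTERN.fullmatch(file_name) is not None


if __name__ == "__main__":
    print(build_file_name("biol", "Garcia Colon", "P", 2021))
    print(follows_convention("HIST_MendezJC_2019"))
    print(follows_convention("thesis final version"))
```

A check of this kind could run when a provider uploads a file, so that names that do not follow the convention are flagged before the item enters the collection.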

For more information about file naming conventions, see this video from University of Wisconsin Data Services.

Generate checksums for deposited materials

It is good practice to generate a checksum for each digital file at the moment of ingest into the repository. This is useful for monitoring fixity: the checksum can be recomputed later and compared with the stored value to check whether the file has been altered in any way. Many repositories generate this checksum at the moment of deposit.
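As an illustration of what generating a checksum at ingest can look like, here is a minimal sketch using Python's standard hashlib module. The choice of SHA-256, the function names, and the idea of storing results in a simple name-to-checksum mapping are assumptions made for this example; a given repository platform may use a different algorithm or storage method.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(folder: Path) -> dict[str, str]:
    """Build a mapping of file name to checksum for every file in a folder
    (hypothetical ingest folder; subfolders are not visited)."""
    return {
        path.name: sha256_of_file(path)
        for path in sorted(folder.iterdir())
        if path.is_file()
    }
```

The resulting mapping would typically be saved alongside the deposited files so that it can be consulted during later fixity checks.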

Document and assess the ingest process

It is important to maintain records of submissions and deposits to your collections and develop inventories detailing collection contents. Scholar@UPRM, like many other digital repositories, documents the moment of deposit of each digital object.

As should be done with any other process, take time to assess your ingest procedures and consider how they can be improved. For example, are you collecting sufficient metadata during the ingest process? Is the process being recorded properly? Do submitters understand the submission process well? Do they find it too complicated or tedious? Are they submitting the correct file formats?

Communicate with submitters and obtain their feedback as you work to improve your ingest process.

Storage: Keeping your material safe

Here are some key points to keep in mind regarding the storage of digital material meant for long-term preservation.

Stability and redundancy

A stable storage infrastructure is essential for the preservation of digital materials. Another key factor is redundancy, as multiple copies of the material provide insurance against irretrievable loss. A commonly cited approach to redundancy is what is known as the 3-2-1 Rule, which can be briefly explained as follows:

  • Have at least three full copies of your digital content. The idea is to have robust protection against information loss without having an unmanageable number of copies. 
  • Use at least two different types of storage media for your copies. Each storage medium has its strengths and weaknesses. By using different media types, you can offset the weaknesses.
  • Keep one copy of your content in a remote location. This offers protection against events like a fire or a natural disaster.
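The three conditions of the 3-2-1 Rule can be expressed as a simple check over an inventory of copies. The sketch below is illustrative only: the StoredCopy record, its fields, and the example media types and locations are assumptions, and "remote" is simplified to mean any location other than the primary site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    """One full copy of the content (hypothetical inventory record)."""
    media_type: str  # for example "disk", "tape", or "cloud"
    location: str    # name of the site where the copy is kept


def satisfies_3_2_1(copies: list[StoredCopy], primary_location: str) -> bool:
    """Check the three conditions of the 3-2-1 Rule for a set of copies."""
    enough_copies = len(copies) >= 3
    enough_media_types = len({copy.media_type for copy in copies}) >= 2
    has_remote_copy = any(copy.location != primary_location for copy in copies)
    return enough_copies and enough_media_types and has_remote_copy


if __name__ == "__main__":
    inventory = [
        StoredCopy("disk", "main campus"),
        StoredCopy("tape", "main campus"),
        StoredCopy("cloud", "remote data center"),
    ]
    print(satisfies_3_2_1(inventory, primary_location="main campus"))
```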

Creation of back-ups

The regular creation of back-up copies of a repository's content is one of the most fundamental digital preservation actions. Back-ups are ideally generated frequently using an automated procedure.

Metadata records

All items in a repository should be accompanied by their metadata record. Besides adding context to the item by providing descriptive information, the metadata record supplies additional information that can be crucial for preservation of the digital object. This can include provenance metadata, information about the item's relation to other items, fixity information, and technical specifications covering an object's digital format and what is necessary to render the object. 

Unique identifiers

Assigning a unique identifier to each digital item in a repository is a way to differentiate that item from all others. Scholar@UPRM automatically assigns a handle, which is a type of unique identifier, to each archived item. File names (ideally assigned following an established naming convention) or accession numbers can also function as unique identifiers within a repository environment. 

Fixity checks

In an ideal scenario, the fixity of stored digital objects is checked periodically using an automated procedure. This is done to detect if the bit-level integrity of any object has been altered. If alterations are found, they can be corrected as quickly as possible. Conducting regular fixity checks increases the trustworthiness of a repository, as there is a stronger assurance of the integrity of its content. Fixity checks are especially important when digital material is transferred from one storage location to another or from one format to another.
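A periodic fixity check amounts to recomputing each file's checksum and comparing it with the value recorded earlier. The following Python sketch shows one way this could be done, assuming SHA-256 checksums and a stored mapping of file names to expected checksums. Both the function names and the three status labels are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(folder: Path, expected: dict[str, str]) -> dict[str, str]:
    """Compare current checksums with recorded ones.

    Returns a status for each recorded file name:
    "ok", "altered", or "missing".
    """
    report = {}
    for name, recorded_checksum in expected.items():
        path = folder / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of_file(path) == recorded_checksum:
            report[name] = "ok"
        else:
            report[name] = "altered"
    return report
```

Running a check like this before and after a transfer between storage locations gives a direct way to confirm that the files arrived unchanged. Note that a format migration produces a new file with a new checksum, so in that case the check applies to the source files before conversion and a fresh checksum is recorded for the result.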

Preservation copies

Some digital objects may have huge file sizes, making it hard for end users to view or download them. To deal with this issue, repositories can create preservation copies and access copies of digital objects. The preservation copies are the original high-resolution, uncompressed versions of the digital objects. These are kept in the back end of the digital repository for preservation purposes, as their name implies. Access copies, which may be compressed and have a lower resolution, are the ones made available for users. Thus, the creation of separate preservation and access copies is a way to facilitate user access to collections while also protecting the bit integrity of the digital objects that the repository is preserving.

Control who has access to stored material

Part of protecting your digital objects is controlling who has access to them. Determine who should have access and what roles and privileges each person should have. Implement passwords and appropriate security measures. Also, keep a record of actions related to your storage environment (ingest of a digital object, moving an item from one collection to another, deletion of items, etc.) and who carries them out.
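To make the idea of roles, privileges, and action records more concrete, here is a small Python sketch. It is only an illustration: the role names, the privileges attached to each role, and the fields of the log entry are all assumptions, and real repository platforms provide their own access control and logging features.

```python
import json
from datetime import datetime, timezone

# Hypothetical roles and the actions each one may perform.
ROLE_PRIVILEGES = {
    "administrator": {"ingest", "move", "delete"},
    "curator": {"ingest", "move"},
    "viewer": set(),
}


def is_allowed(role: str, action: str) -> bool:
    """Return True when the role has the privilege to perform the action.
    Unknown roles are given no privileges."""
    return action in ROLE_PRIVILEGES.get(role, set())


def audit_entry(user: str, role: str, action: str, item_id: str) -> str:
    """Build one log record, as a JSON string, describing who attempted
    which action on which item, and whether the role permitted it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "item": item_id,
        "allowed": is_allowed(role, action),
    }
    return json.dumps(entry)


if __name__ == "__main__":
    print(audit_entry("jdoe", "curator", "delete", "item-001"))
```

Each record could be appended to a log file, producing a chronological account of actions in the storage environment, including attempts that were not permitted.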