Locations of digital assets and metadata - smith-special-collections/sc-documentation GitHub Wiki

Metadata

  • While physical computer media is currently described in the container lists, the contents of the media are not. At present, description of this backlog digital content lives in processing logs and file lists stored with the files on the server. These files are being copied to a Google Drive folder so they can be linked to from finding aids.

  • Some descriptive information can also be found in the project master spreadsheet.

Digital objects

  • All born-digital assets (files) are stored on the Libraries' networked storage servers (LibPres, or LibNas for older files). These assets are considered the preservation copies. The Digital Preservation Archivist can provide staff with information on how to access these storage servers. Access copies are usually made on demand for online access or delivery to researchers.

  • Each accession, or group of files, is stored within an Archival Information Package (AIP) which also includes the metadata and documentation associated with that accession.

Interpreting the Archival Information Package (AIP)

The born-digital content being described in this project was all copied from backlog computer media found in the collections following the same basic workflow that’s used for new accessions:

  • Depending on the contents and media format, a forensic disk image may be created to ensure we have a true copy of the original; or a “logical copy” is made using digital preservation software that copies the files off the media while maintaining the original directory/file structure and integrity of the files, and captures technical metadata (such as date created and file types).

  • When multiple media items are copied in a collection, a processing log is created which includes a list of the item labels (if any), a unique object identifier, preservation activities, and the results (whether or not the item was copied successfully).

  • If a collection or accession consists of only 1-2 items, there is no processing log, but there may be a text file or saved Trello card (PDF) that serves as the log.

  • The original media is photographed to capture label information and any inserts.

  • A directory/file listing is created automatically during disk imaging or file transfer.

  • If a disk image is made, then the files are extracted from the disk image. The disk image is saved as a backup.

  • The disk images and files are “packaged” together with file lists, logs, and other documentation to form the Archival Information Package (AIP). Thus, one AIP may hold the files from one or more computer media items.

  • A unique identifier is assigned to the AIP, according to Special Collections file-naming protocols.

  • The AIP is backed up to Special Collections’ secure networked storage server (“LibPres”).
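The workflow above depends on preserving the integrity of the copied files, and checksum lists (.md5 or .txt) are kept with the AIP's documentation. As a minimal sketch of how such a list might be re-checked against the files, the following assumes an md5sum-style layout (`<hex digest>  <relative path>` per line); the actual layout produced by the imaging or transfer software may differ, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, base_dir):
    """Compare each entry in a checksum list against the file on disk.

    ASSUMPTION: each non-empty line looks like "<hex digest>  <relative path>"
    (md5sum style). Returns {relative path: "ok" | "mismatch" | "missing"}.
    """
    results = {}
    base = Path(base_dir)
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, _, rel_path = line.strip().partition(" ")
        # md5sum separates digest and name with two spaces or " *".
        rel_path = rel_path.lstrip(" *")
        target = base / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif md5_of_file(target) == expected.lower():
            results[rel_path] = "ok"
        else:
            results[rel_path] = "mismatch"
    return results
```

Calling `verify_manifest` with the path to a checksum list and the folder holding the copied files returns one status per listed file, which could be noted alongside the processing log.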

The AIP contains three sub-folders called:

  • Objects (or Data): the digital files copied from the media, which may consist of:
    • a hierarchy of folders and files
    • disk images
    • extracted files from disk images

  • Metadata: may contain:
    • a directory/file list for each media item in the form of a spreadsheet (csv or xls)
    • one combined directory/file list for all media items in the AIP (csv)
    • xml version of the metadata

  • Documentation: may contain:
    • processing log file
    • photos of media and inserts (jpg)
    • checksum lists (.md5 or .txt)
    • disk imaging logs (txt)
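When opening an unfamiliar AIP, it can help to confirm which of the three sub-folders are actually present. The following is a minimal sketch, not an established tool: it assumes the sub-folders sit directly inside the AIP folder and matches their names case-insensitively, and the function and variable names are hypothetical.

```python
from pathlib import Path

# Accepted folder names for each role, following the description above.
EXPECTED_FOLDERS = {
    "objects": ("objects", "data"),
    "metadata": ("metadata",),
    "documentation": ("documentation",),
}


def check_aip_layout(aip_path):
    """Return {role: actual folder name, or None if it is absent}.

    ASSUMPTION: sub-folders are direct children of the AIP folder and
    their names match the expected names apart from letter case.
    """
    present = {
        child.name.lower(): child.name
        for child in Path(aip_path).iterdir()
        if child.is_dir()
    }
    layout = {}
    for role, accepted in EXPECTED_FOLDERS.items():
        layout[role] = next(
            (present[name] for name in accepted if name in present), None
        )
    return layout
```

A role mapped to `None` signals that the corresponding sub-folder was not found, for example an AIP for a 1-2 item accession that has no separate documentation folder.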

SEE examples of Archival Information Packages

  • 52.48 Clark Science Center Office: "Harriman Expedition": includes extracted files from one compact disc preserved in their original directory structure.
  • Dyke TV Records still images: contains an unprocessed disc image (ISO file) for each compact disc; plus the extracted files in a separate folder for each disc; file list for each disc; and a log file.
  • Presidents Jill Ker Conway Papers: “A Woman’s Education”: includes disk images of 16 floppy disks (E01 files); the extracted files, file lists, and a log file.

Where to find metadata for born digital content:

1) Project Spreadsheet

  • For the backlog media project, information was entered in this spreadsheet.

  • Each row in the spreadsheet corresponds to a digital object (AIP) on the server, identified by the unique identifier, and represents a single media type (CDs, floppy disks, flash drives, hard drives, etc.) within a single accession.

  • However, when the media was copied, similar types of media were grouped together and the recovered digital files were combined in one AIP: for example, floppy disks with zip disks, and CDs with DVDs. Therefore, there may be two or more rows in the spreadsheet that correspond to the same AIP, with the same identifier. These should be combined in the description.

For example, here are 4 entries from the spreadsheet for the Joann Aalfs papers, showing items from two different accessions. Note that the floppy disks and zip disks have the same identifier and are therefore found in the same AIP:

| Collection | Accession | Identifier | Original format | Box | No. of items |
|---|---|---|---|---|---|
| Aalfs, Joann | 16S-79 | smith_ssc_479_16S-079_box012_cds | Compact disc | Box 12 | 8 |
| Aalfs, Joann | 16S-79 | smith_ssc_479_16s-79_disks | Floppy disk, 3.5" | Boxes 12 & 14 | 83 |
| Aalfs, Joann | 16S-79 | smith_ssc_479_16s-79_disks | Zip disk | Box 12 | 8 |
| Aalfs, Joann | 17S-12 | smith_ssc_479_2017s-012_hd01 | Hard drive | Box 1 | 2 |

Note: Some box numbers may have changed since this project was completed, if a collection was processed.
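Combining rows that share an identifier can be sketched in a few lines. The example below uses the four Joann Aalfs entries shown above; the dictionary keys (`identifier`, `original_format`, `box`, `items`) and function names are hypothetical stand-ins for the actual spreadsheet columns, and identifiers are compared exactly as written.

```python
from collections import defaultdict

# The four spreadsheet entries shown above, keyed with hypothetical field names.
rows = [
    {"accession": "16S-79", "identifier": "smith_ssc_479_16S-079_box012_cds",
     "original_format": "Compact disc", "box": "Box 12", "items": 8},
    {"accession": "16S-79", "identifier": "smith_ssc_479_16s-79_disks",
     "original_format": 'Floppy disk, 3.5"', "box": "Boxes 12 & 14", "items": 83},
    {"accession": "16S-79", "identifier": "smith_ssc_479_16s-79_disks",
     "original_format": "Zip disk", "box": "Box 12", "items": 8},
    {"accession": "17S-12", "identifier": "smith_ssc_479_2017s-012_hd01",
     "original_format": "Hard drive", "box": "Box 1", "items": 2},
]


def group_by_identifier(rows):
    """Collect spreadsheet rows under the AIP identifier they share."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["identifier"].strip()].append(row)
    return dict(grouped)


def summarize_aips(rows):
    """Return one combined summary per AIP: formats, boxes, total item count."""
    summary = {}
    for identifier, members in group_by_identifier(rows).items():
        summary[identifier] = {
            "formats": [m["original_format"] for m in members],
            "boxes": [m["box"] for m in members],
            "items": sum(m["items"] for m in members),
        }
    return summary


for identifier, info in summarize_aips(rows).items():
    print(identifier, info["formats"], info["items"])
```

Here the floppy disk and zip disk rows collapse into a single entry for `smith_ssc_479_16s-79_disks`, which is the unit to describe.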

2) Digital file lists

  • Electronically produced file lists are created when the contents are copied from the media, and are saved as a spreadsheet or CSV file.

  • They typically include directory/folder/filenames, creation and last modified dates, and (sometimes) file formats. File lists are stored in the “Metadata” subfolder.

  • File lists for floppy disks may display system files, unallocated (empty) space, and sometimes deleted content. That type of content can be ignored for description purposes; the actual content is usually found in the “root” folder in the file list. For example, in this file list for a Jill Ker Conway floppy disk, there is only one valid file: AWEINT.doc

  • In some cases a new file list was created which combines all of the media items in the AIP. This should be the version attached to the finding aid. For this project, these combined lists have been copied to Google Drive.
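Where an AIP has only per-item file lists, a combined list could be assembled along the following lines. This is a sketch under stated assumptions rather than the method used in the project: it assumes the per-item lists are CSV files with a header row stored in one folder, it makes no assumption about the column names, and the `source_list` column and function name are hypothetical additions.

```python
import csv
from pathlib import Path


def combine_file_lists(metadata_dir, output_path, source_column="source_list"):
    """Merge every CSV file list in metadata_dir into one CSV.

    ASSUMPTION: each list has a header row. Columns are unioned in the
    order first seen; a hypothetical source column records which list
    each row came from. Returns the number of rows written.
    """
    output = Path(output_path).resolve()
    csv_paths = sorted(
        p for p in Path(metadata_dir).glob("*.csv") if p.resolve() != output
    )
    fieldnames = [source_column]
    rows = []
    for csv_path in csv_paths:
        # utf-8-sig tolerates a byte-order mark left by spreadsheet exports.
        with open(csv_path, newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row[source_column] = csv_path.name
                rows.append(row)
    with open(output, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the source list name on every row preserves the link between each file and the media item it came from, which the log and media photographs can then help identify.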

3) Documentation: The following can be found in the “Documentation” subfolder of the AIP:

  • The log file: a spreadsheet created by staff to record and track the media items being copied. It includes label information (if any), the unique object identifier, and preservation results (whether or not the item was copied successfully).

  • Although sometimes a detailed title/description is given in the project spreadsheet, most content is described very generally at the aggregate level, and you will need to refer to the Log and other documentation for item titles.

  • Logs may also list media that could not be read or copied, and disks with “bad sectors.” The latter means the disk could not be copied completely but there may still be readable files. These issues should be recorded in the processing information note (see Data-entry instructions).

  • Photographs of computer media: these can be useful if label information in the log file is truncated, or when there is no log file.

4) Finding aid and container lists

  • Review the front matter of the collection finding aid, which provides context to help interpret the information found in the file lists and logs.
