Bags and Bagging - MetaArchive/public-documentation GitHub Wiki

Terms

Bags are based on the concept of "bag it and tag it": a digital collection is packed into a directory (the bag) along with a machine-readable manifest file (the tag) that lists the contents. A bag's structure is deliberately sparse, so it can envelop content from any institutional data architecture and in any file format. A bag can hold documents, pictures, music, movies, and even other folders; anything digital can fit into a bag.

A bag is like a folder or directory on a computer. It is essentially composed of three elements:

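  1. A bag declaration text file (bagit.txt), which identifies the directory as a bag, a bit like a seal of authenticity, and records the BagIt version and character encoding;
  2. a text-file manifest (for example, manifest-sha256.txt) listing each file in the collection with its checksum; and
  3. a payload subdirectory, which must be named “data”, containing the digital content.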
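To make the three elements concrete, here is a minimal sketch in Python (standard library only) that turns an existing folder into a bag. It assumes SHA-256 as the checksum algorithm and BagIt version 1.0 (RFC 8493); the function names `sha256_of` and `make_minimal_bag` are hypothetical, and in practice you would use one of the established bagging tools rather than this code.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> None:
    """Copy source_dir into bag_dir/data, then write bagit.txt and manifest-sha256.txt.

    bag_dir must not exist yet. File names containing %, CR, or LF are not
    handled (the BagIt spec requires percent-encoding them in the manifest).
    """
    data_dir = bag_dir / "data"
    # Element 3: the payload directory. copytree also creates bag_dir itself.
    shutil.copytree(source_dir, data_dir)

    # Element 1: the bag declaration.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Element 2: the payload manifest, one "<checksum>  <path relative to bag_dir>" line per file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {rel}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
```

For example, `make_minimal_bag(Path("my_collection"), Path("my_collection_bag"))` would produce `bagit.txt`, `manifest-sha256.txt`, and a `data/` copy of the folder inside `my_collection_bag`.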

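The manifest is machine-readable so that ingest can be automated. The receiving system recomputes the checksum of each file in the "data" directory and compares it with the value recorded in the manifest; if every checksum matches and no files are missing or unlisted, the transfer is verified as complete and unaltered.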
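The receiving side's check can be sketched the same way. This hypothetical `validate_bag` function assumes a bag with a single `manifest-sha256.txt`; real bags may use other or multiple checksum algorithms, and full validation also checks `bagit.txt` and any tag manifests. It reports files that are missing, altered, or present in `data/` but absent from the manifest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bag(bag_dir: Path) -> list[str]:
    """Check data/ against manifest-sha256.txt; return a list of problems (empty if valid)."""
    listed = {}
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if line.strip():
            # "<checksum> <relative path>", separated by whitespace.
            checksum, rel = line.split(maxsplit=1)
            listed[rel] = checksum

    problems = []
    # Every file named in the manifest must exist and match its recorded checksum.
    for rel, expected in listed.items():
        path = bag_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(path) != expected.lower():
            problems.append(f"checksum mismatch: {rel}")

    # Every payload file on disk must be named in the manifest.
    on_disk = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    for rel in sorted(on_disk - set(listed)):
        problems.append(f"not in manifest: {rel}")
    return problems
```

An empty list means the payload arrived intact; any entry points to the specific file that failed.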

Bagging is the act of creating a bag, and several tools are available to automate the process. MetaArchive does not mandate a particular tool for bagging content, but member institutions have used:

This answers the AirTable Question: "What bagging tools can be used? What are the pros and cons of each tool?"

bag-info.txt

A bag can also contain an optional text file, titled "bag-info.txt," that contains a small amount of administrative metadata, such as contact information for the collection owner and a brief description of the collection. Users can include much more metadata about the collection, but the Library of Congress recommends storing it in the "data" directory with the rest of the collection in order to keep the bag root directory uncluttered. Users can note in the "bag-info.txt" file that additional metadata exists and resides in the "data" directory. (Source: Library of Congress Digital Content Transfer Tools site.)

Most BagIt tools can set these metadata values for you, so you rarely need to edit this file by hand.

The table below lists the bag metadata fields (required, recommended, or optional) that we ask you to use for your MetaArchive ingest, with guidance on how to fill in each value. This information can also be found beginning on page 2 of the MetaArchive BagIt Usage Instructions (pdf).

| Field Name | Priority | Description | Help |
|---|---|---|---|
| Source-Organization | Required | The organization where the bag was made (no abbreviations). | Institution name |
| Organization-Address | Recommended | The address of the Source-Organization. | Insert proper mailing address here |
| Contact-Name | Required | The name of the person responsible for the bag. | Insert primary collection contact person |
| Contact-Phone | Required | Phone number of the person in Contact-Name. | Insert primary collection person’s contact phone |
| Contact-Email | Required | Email address of the person in Contact-Name. | Insert primary collection person’s contact email |
| External-Description | Required | A thorough description of the bag’s contents for those outside of your organization. | Can be a truncated summary of Dublin Core collection descriptive metadata, or a list of the included collections if the “bag” holds more than one collection |
| Bagging-Date | Required | The date the bag was created. | Insert bag creation date (YYYY-MM-DD) |
| External-Identifier | Recommended | A sender-supplied identifier for the bag. | Control number, unique identifier, or collection folder filename repurposed for naming this bag |
| Bag-Size | Required | The size (or approximate size) of the bag, e.g. 42.6 GB. This is usually set for you by the bagging tool. | Set for you by the BagIt utility |
| Payload-Oxum | Required | The total size in bytes and the number of files in the payload, written as OctetCount.StreamCount. This is usually set for you by the bagging tool. | Set for you by the BagIt utility |
| Bag-Group-Identifier | Optional | A unique name given to the “bag group” if this bag is part of a bag group (more than one bag). | This identifier must be unique across the sender's content; if it is recognizable as belonging to a globally unique scheme, the receiver should make an effort to honor references to it. |
| Bag-Count | Optional | This bag’s sequence number, if it is part of a “bag group”. | Example: 1 of 2 |
| Internal-Sender-Identifier | Optional | The ID assigned to this content internally at your institution, if any. | Insert local identifier for the bagged collection, if applicable |
| Internal-Sender-Description | Optional | A sender-local prose description of the contents of the bag. | e.g., Dublin Core collection descriptive metadata |

This answers the AirTable Question: "What are some examples of the external and internal identifiers and descriptions that other MA members use when creating bag metadata?"
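If you script any part of bag creation, the computed fields in the table can be derived directly from the payload. The sketch below is a hypothetical helper, not part of MetaArchive's or Bagger's tooling: it refuses to write bag-info.txt if a Required field you supply is empty, fills in Bagging-Date, computes Payload-Oxum from the total bytes and file count under `data/`, and (as a stated assumption) approximates Bag-Size from the payload size alone. Recommended and optional fields are passed through unchanged, and every value is kept on a single line.

```python
import datetime
from pathlib import Path

# Required fields from the table that you supply yourself.
# Bagging-Date, Bag-Size and Payload-Oxum are filled in by write_bag_info.
REQUIRED_FIELDS = [
    "Source-Organization",
    "Contact-Name",
    "Contact-Phone",
    "Contact-Email",
    "External-Description",
]


def payload_oxum(bag_dir: Path) -> str:
    """Return 'OctetCount.StreamCount' (total bytes and number of files) for data/."""
    files = [p for p in (bag_dir / "data").rglob("*") if p.is_file()]
    return f"{sum(p.stat().st_size for p in files)}.{len(files)}"


def human_size(num_bytes: int) -> str:
    """Format a byte count as an approximate size such as '1.5 KB' or '42.6 GB'."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.0f} {unit}" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def write_bag_info(bag_dir: Path, fields: dict) -> None:
    """Write bag-info.txt in the bag root, raising ValueError if a required field is empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    info = dict(fields)
    info.setdefault("Bagging-Date", datetime.date.today().isoformat())  # YYYY-MM-DD
    oxum = payload_oxum(bag_dir)
    info["Payload-Oxum"] = oxum
    # Assumption: Bag-Size is approximated from the payload size only.
    info["Bag-Size"] = human_size(int(oxum.split(".")[0]))

    lines = []
    for label, value in info.items():
        # One "Label: value" line per field; internal newlines become spaces.
        flat = " ".join(str(value).split())
        lines.append(f"{label}: {flat}\n")
    (bag_dir / "bag-info.txt").write_text("".join(lines), encoding="utf-8")
```

BagIt also allows a long value to continue onto following lines that begin with whitespace; this sketch flattens values instead of attempting that.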

Completing bag-info.txt with Bagger

A complete version of the recommended MetaArchive profile above is available from Educopia at https://docs.google.com/file/d/0B1gETO3iL-OsejhOZlRtV3c1OGM/edit?usp=sharing [NEED TO UPDATE THIS LINK - SHOULD IT BE TO THE BU-JSON FILE?]. If you are using Bagger, save this JSON file in the appropriate directory for your platform.

The steps for loading this profile with Bagger can be found on pages 4 & 5 of the MetaArchive BagIt Usage Instructions (pdf).
