Reference - pulibrary/BlueMountain GitHub Wiki
This is the reference document for the Blue Mountain Project.
Blue Mountain is a database of magazines: that is, it is a system that links information together. In order to link information about specific objects – magazine titles, magazine issues, etc. – Blue Mountain must assign a unique identifier to each one. (This is a well-known feature of information science.) Doing so allows us – programs and people – to refer to these things unambiguously.
Blue Mountain adopts the Uniform Resource Name (URN) conventions to compose unique identifiers for titles and issues, as well as for the metadata records (METS and MODS) used to encode information about them.
Blue Mountain assigns a URI to each magazine, magazine issue, METS record, and MODS record it maintains. We have developed a convention for composing these URIs. Formally, the convention can be expressed like this:
<BMTNPREFIX> ::= "urn:PUL:bluemountain"
<BMTNID> ::= "bmtn" <a-z><a-z><a-z>
<DATESTRING> ::= CCYY-MM-DD | CCYY-MM | CCYY
<ISSUEINDEX> ::= <0-9><0-9>
<ISSUANCE> ::= <DATESTRING> "_" <ISSUEINDEX>
<ISSUEID> ::= <BMTNID> "_" <ISSUANCE>
<TITLEURI> ::= <BMTNPREFIX> ":" <BMTNID>
<ISSUEURI> ::= <BMTNPREFIX> ":" <ISSUEID>
<TITLEMETSURI> ::= <BMTNPREFIX> ":td:" <BMTNID>
<ISSUEMETSURI> ::= <BMTNPREFIX> ":td:" <ISSUEID>
<TITLEMODSURI> ::= <BMTNPREFIX> ":dmd:" <BMTNID>
<ISSUEMODSURI> ::= <BMTNPREFIX> ":dmd:" <ISSUEID>
Let’s take a more discursive look at this syntax.
The Blue Mountain Prefix is a fixed string that follows the functional requirements specified in RFC 1737. It comprises the fixed string “urn” and a namespace identifier (NID), “PUL”. Within the PUL namespace, the string bluemountain is intended to signify resources produced by and for the Blue Mountain project.
Blue Mountain assigns a sequential Blue Mountain identifier, or bmtnid, to each title. The bmtnid takes the form bmtnNNN, where NNN is a three-letter hexavigesimal (base-26) number (aaa, aab, aac, and so on). Blue Mountain will maintain bmtnids in a project registry, a file maintained with other administrative files.
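As a rough illustration of the hexavigesimal numbering, the following Python sketch converts between a sequence number and a bmtnid. It assumes, hypothetically, that the sequence is zero-based (so aaa is 0); the project registry, not this code, remains the authority on which ids have been assigned. The function names are invented for this example.

```python
import string


def bmtnid_from_index(index: int) -> str:
    """Return the bmtnid for a sequence number (assumed zero-based: 0 -> 'bmtnaaa')."""
    if not 0 <= index < 26 ** 3:
        raise ValueError("index must fit in three base-26 digits")
    letters = []
    for _ in range(3):
        # Peel off the least significant base-26 digit first.
        index, remainder = divmod(index, 26)
        letters.append(string.ascii_lowercase[remainder])
    return "bmtn" + "".join(reversed(letters))


def index_from_bmtnid(bmtnid: str) -> int:
    """Inverse of bmtnid_from_index, under the same zero-based assumption."""
    suffix = bmtnid[4:]
    if not bmtnid.startswith("bmtn") or len(suffix) != 3:
        raise ValueError(f"not a bmtnid: {bmtnid!r}")
    value = 0
    for letter in suffix:
        if letter not in string.ascii_lowercase:
            raise ValueError(f"not a bmtnid: {bmtnid!r}")
        value = value * 26 + (ord(letter) - ord("a"))
    return value
```

Under that assumption, bmtnid_from_index(3) yields bmtnaad, and index_from_bmtnid reverses the mapping.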
Blue Mountain represents two conceptual objects:
- Titles
- A journal or magazine taken as a whole (e.g., The Signature was a magazine published by D. H. Lawrence and John Middleton Murry).
- Issues
- The periodic output under a Title (e.g., the first issue of The Signature appeared on October 4th, 1915).
Blue Mountain does not have an explicit conception of Volume, an aggregation of issues. Sometimes volumes were explicitly compiled by the publisher; sometimes they were created by collectors – libraries or individual subscribers. We will address the question of volumes at a later stage.
Blue Mountain represents titles and issues with two kinds of metadata: descriptive metadata about the entity (its title, place and date of publication, and so forth) and technical metadata about the digital files that comprise its representation in Blue Mountain.
Blue Mountain has adopted the MODS framework to encode descriptive metadata and the METS framework to encode technical metadata.
There are, then, six distinct kinds of object in Blue Mountain:
- Titles
- Issues of Titles
- The descriptive metadata for a title
- The descriptive metadata for an issue
- The technical metadata for a title
- The technical metadata for an issue
The URI of an object can be used to indicate what type of object it is. A title is represented by its bmtnid and an issue by its issueid. The descriptive metadata for a title or issue uses the same bmtnid or issueid but inserts the token :dmd: between the Blue Mountain Prefix and the id; similarly, the URI for technical metadata contains the token :td: between the prefix and the id.
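The convention can be sketched in Python as follows. This is an illustrative reading of the grammar above, not project code: compose_urn and describe_urn are hypothetical names, and the regular expression is one possible transcription of the BNF rules.

```python
import re
from typing import Optional, Tuple

BMTN_PREFIX = "urn:PUL:bluemountain"

# Title ids are "bmtn" plus three letters; issue ids append _DATESTRING_II.
_ID = r"bmtn[a-z]{3}(?:_\d{4}(?:-\d{2}(?:-\d{2})?)?_\d{2})?"
_URN = re.compile(rf"^{re.escape(BMTN_PREFIX)}:(?:(td|dmd):)?({_ID})$")


def compose_urn(identifier: str, record: Optional[str] = None) -> str:
    """Build a URN for a bmtnid or issueid.

    record is None for the object itself, "td" for its METS record,
    or "dmd" for its MODS record.
    """
    if record not in (None, "td", "dmd"):
        raise ValueError(f"unknown record token: {record!r}")
    middle = f":{record}" if record else ""
    return f"{BMTN_PREFIX}{middle}:{identifier}"


def describe_urn(urn: str) -> Tuple[str, str]:
    """Return (level, kind) for a Blue Mountain URN, e.g. ("issue", "object")."""
    match = _URN.match(urn)
    if match is None:
        raise ValueError(f"not a Blue Mountain URN: {urn!r}")
    record, identifier = match.groups()
    # Only issue ids contain an underscore (before the issuance string).
    level = "issue" if "_" in identifier else "title"
    kind = {
        None: "object",
        "td": "technical metadata",
        "dmd": "descriptive metadata",
    }[record]
    return level, kind
```

For instance, compose_urn("bmtnaad", "dmd") produces urn:PUL:bluemountain:dmd:bmtnaad, which describe_urn classifies as the descriptive metadata for a title.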
The concept of issuance is critical to the ontology of periodicals. Blue Mountain’s formalization of issuance continues to evolve, but at this stage we have adopted the formalism developed by NDNP for describing newspaper issues.
In Blue Mountain, each magazine issue is assigned an issue identifier or issueid of the form
bmtnid_issuanceString
where issuanceString corresponds to the date of issuance and takes the form CCYY-MM-DD_II, defined as follows:
- CCYY
- A four-digit number representing the year of publication (e.g., 1912)
- MM
- A two-digit number representing the month of publication, where January = 01, February = 02, etc.
- DD
- A two-digit number representing the day of publication (e.g., 01, 02, ... 30, 31).
- II
- A two-digit index of daily issuance (e.g., the first issue of the day is 01, the second is 02, and so on). This convention is adopted from the issuance of newspapers, which not infrequently issued a morning edition and an evening edition on the same day. Blue Mountain’s adoption allows us to distinguish among magazine texts that were published on the same day: a regular issue and a supplement, for example, and also among editions that bear the same date of issuance but were printed and published in different locations, sometimes with different content.
Issueids, like ISO 8601 dates, are organized from most significant to least significant chronological unit (i.e., from year to day). This format has two advantages: it allows ids to be sorted naturally, and it enables variable precision: the representation of daily, monthly, or yearly issuance.
- If you know the year, month, and day of publication (e.g., you know that the issue was published on January 5th, 1912): The issuance string is 1912-01-05_01.
- If you have two issues published on the same day (e.g., an issue and a special supplement were both issued on January 5th, 1912): The issuance strings are 1912-01-05_01 and 1912-01-05_02.
- If you know only the year and month of publication (e.g., you know it was published in January, 1912, but you do not know on what day): The issuance string is 1912-01_01.
- If you know that the magazine published two issues monthly, but you do not know the dates of publication (e.g., you have two issues published in January, 1912): The issuance strings are 1912-01_01 and 1912-01_02.
- If you know only the year of publication (e.g., you know that the issue was published in 1912, but you do not know the month or, therefore, the day of publication): The issuance string is 1912_01.
- If you have several issues published in the same year, but you know neither the month nor the day of publication (e.g., you know the journal published two issues in 1912, but you do not know the months or days of publication): The issuance strings are 1912_01 and 1912_02.
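The cases above can be sketched as a small Python helper. This is only an illustration of the convention; the function names and keyword arguments are invented here, and no calendar validation is attempted.

```python
from typing import Optional


def issuance_string(year: int, month: Optional[int] = None,
                    day: Optional[int] = None, index: int = 1) -> str:
    """Compose CCYY[-MM[-DD]]_II from whatever date parts are known."""
    if day is not None and month is None:
        raise ValueError("a day cannot be given without a month")
    if not 0 <= index <= 99:
        raise ValueError("the issue index must fit in two digits")
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
    if day is not None:
        parts.append(f"{day:02d}")
    return "-".join(parts) + f"_{index:02d}"


def issue_id(bmtnid: str, year: int, month: Optional[int] = None,
             day: Optional[int] = None, index: int = 1) -> str:
    """Compose an issueid of the form bmtnid_issuanceString."""
    return f"{bmtnid}_{issuance_string(year, month, day, index)}"
```

Called as issuance_string(1912, 1, 5), the helper returns 1912-01-05_01; with only a year, issuance_string(1912, index=2) returns 1912_02.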
The journal le coeur à barbe has the Blue Mountain identifier bmtnaad. Only one issue was published, in April of 1922.

| Object | URN |
| --- | --- |
| titleid | urn:PUL:bluemountain:bmtnaad |
| title METS id | urn:PUL:bluemountain:td:bmtnaad |
| title MODS id | urn:PUL:bluemountain:dmd:bmtnaad |
| issueid | urn:PUL:bluemountain:bmtnaad_1922-04_01 |
| issue METS id | urn:PUL:bluemountain:td:bmtnaad_1922-04_01 |
| issue MODS id | urn:PUL:bluemountain:dmd:bmtnaad_1922-04_01 |
Names of Blue Mountain files will be constructed using Blue Mountain IDs with the following extensions.
<EXTENSION> ::= "tif" | "jp2"
<IMGINDEX> ::= <0-9><0-9><0-9>
<FILENAME> ::= <ISSUEID> "_" <IMGINDEX> "." <EXTENSION>
Image files shall be named issueid_nnn.jp2 or issueid_nnn.tif, where
- issueid is the identifier of the issue;
- nnn is a three-digit number indicating the location of the image file in the sequence of image files (not necessarily the number printed on the page that has been photographed);
- jp2 is the conventional file extension for JPEG2000 files;
- tif is the conventional file extension for TIFF files.
For example,
bmtnaad_1925-06-03_01_001.jp2
bmtnaad_1925-06-03_01_002.jp2
...
<EXTENSION> ::= "alto.xml"
<IMGINDEX> ::= <0-9><0-9><0-9>
<FILENAME> ::= <ISSUEID> "_" <IMGINDEX> "." <EXTENSION>
ALTO files shall be named issueid_nnn.alto.xml, where
- issueid is the identifier of the issue
- nnn is a three-digit number corresponding to the sequence number of the image file to which this ALTO file corresponds
- alto indicates the schema used to encode the document
- xml indicates the format of the file.
For example,
bmtnaad_1925-06-03_01_001.alto.xml
bmtnaad_1925-06-03_01_002.alto.xml
...
<EXTENSION> ::= "mets.xml"
<FILENAME> ::= <ISSUEID> "." <EXTENSION>
METS files shall be named issueid.mets.xml, where
- issueid is the identifier of the issue
- mets indicates the schema used to encode the document
- xml indicates the format of the file.
For example,
bmtnaad_1925-06-03_01.mets.xml
<EXTENSION> ::= "pdf"
<FILENAME> ::= <ISSUEID> "." <EXTENSION>
PDF files shall be named issueid.pdf, where
- issueid is the identifier of the issue
- pdf indicates the format of the file.
For example,
bmtnaad_1925-06-03_01.pdf
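The four naming rules above could be captured in a few Python helpers. The sketch below follows the grammars as written; the function names are hypothetical, and the 0 to 999 range check simply reflects the three-digit image index.

```python
def _sequence(sequence: int) -> str:
    """Format an image sequence number as the three-digit nnn component."""
    if not 0 <= sequence <= 999:
        raise ValueError("sequence number must fit in three digits")
    return f"{sequence:03d}"


def image_filename(issueid: str, sequence: int, extension: str = "jp2") -> str:
    """issueid_nnn.jp2 or issueid_nnn.tif"""
    if extension not in ("tif", "jp2"):
        raise ValueError("image files are either tif or jp2")
    return f"{issueid}_{_sequence(sequence)}.{extension}"


def alto_filename(issueid: str, sequence: int) -> str:
    """issueid_nnn.alto.xml, numbered like the image it describes."""
    return f"{issueid}_{_sequence(sequence)}.alto.xml"


def mets_filename(issueid: str) -> str:
    """issueid.mets.xml"""
    return f"{issueid}.mets.xml"


def pdf_filename(issueid: str) -> str:
    """issueid.pdf"""
    return f"{issueid}.pdf"
```

Because the ALTO file reuses the image's sequence number, image_filename(issueid, 2) and alto_filename(issueid, 2) always refer to the same page.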
Blue Mountain is a database of digital objects: groupings of machine-readable files that together constitute a representation.
A journal object will comprise the following elements:
- title-level descriptive metadata
- A detailed, machine-readable description of the periodical as a whole. Encoded in MODS for compatibility with library systems, but translatable into other formats (e.g., TEI).
- title-level bibliography
- An article-level prose description. (bmtnid.tei.xml)
- title-level metadata wrapper
- Pulls together the title-level metadata, the bibliography, and the issue-level metadata.
- issues
- one or more Issue Objects.
Representations of periodical issues. Issue objects comprise the following:
- preservation-quality images
- high-quality TIFF files (‘master TIFFs’), produced according to local best practices and in conformance with the FADGI standards (http://www.digitizationguidelines.gov/guidelines/digitize-technical.html).
- generative image derivatives
- more manageable forms of the master TIFFs, meant to serve as the source for online deliverables, etc. Encoded in the JPEG2000 format, according to specifications described below.
- delivery derivatives
- images optimized for delivery over the World Wide Web.
- issue-level descriptive metadata
- a MODS document (see below).
- text encodings
- Initially these will be in the form of corrected OCR for each page, encoded in the ALTO schema (output by ABBYY via docWORKS). Future encodings will likely include TEI representations, derived from the ALTO documents, for detailed textual analysis.
- deliverable text-under-image PDF
- another ABBYY output format.
- issue-level metadata wrapper
- a METS document: the METS half of METS/ALTO. The structMap of this document links constituent-level items to the regions identified in the ALTO documents, and to the page image. (See below for detailed specification.)
The components of the journal object have different storage and access requirements. Master TIFF files are very large binary files that will seldom be accessed but must be carefully preserved (they are expensive or impossible to replace). Image derivatives are also large binary files, but because they can be regenerated from the master TIFFs they require less care; they will, however, be accessed from a variety of sources (primarily the web). PDF files are hybrids: large binary files, composites of image derivatives and OCR output, that cannot easily be recreated and so must be preserved more carefully than image derivatives while still being accessible. Metadata files are relatively small but very expensive to replace, and so must be curated carefully. They are also subject to updating, so version tracking is important.
The Blue Mountain Project will manage these assets separately. The non-binary data and metadata will be stored and managed in a distributed version control system (DVCS), which will enable change management, collaborative development among PUL and its METS/ALTO vendor, and resource sharing, as stipulated in the grant.
- Master TIFF files and text-under-image PDFs will be maintained in a preservation store;
- Image derivatives, and delivery-optimized copies of the PDFs, will be kept in an access store.
Metadata will be organized as a hierarchy of files and directories, like this:
- metadata/
  - periodicals/
    - bmtnID/
      - bmtnID.mets.xml
      - bmtnID.mods.xml
      - bmtnID.tei.xml
      - issues/
The issues/ directory will be organized by publication date, following the same convention as that used for constructing identifiers. So, for example,
- bmtnabi/
  - issues/
    - 1859/
      - 01/
        - 05_01/
          - bmtnid_issueid.mets.xml
          - bmtnid_issueid.mods.xml
          - bmtnid_issueid.tei.xml
          - alto/
            - bmtnid_issueid_001.alto.xml
            - bmtnid_issueid_002.alto.xml
The Preservation Store will be arranged as a filesystem mirroring the structure of the metadata tree and rooted at /usr/share/BlueMountain/pstore/periodicals.
- pstore/
  - periodicals/
    - bmtnid/
      - issues/
        - CCYY/
          - MM/
            - DD_II/
              - bmtnid_issueid.pdf
              - bmtnid_issueid_001.tif
              - bmtnid_issueid_002.tif
Like the Preservation Store, the Access store will be arranged as a filesystem mirroring the structure of the metadata tree; it will be rooted at /usr/share/BlueMountain/astore/periodicals.
- astore/
  - periodicals/
    - bmtnid/
      - issues/
        - CCYY/
          - MM/
            - DD_II/
              - bmtnid_issueid.pdf
              - generative/
                - bmtnid_issueid_001.jp2
                - bmtnid_issueid_002.jp2
                - bmtnid_issueid_003.jp2
              - delivery/
                - bmtnid_issueid_001.jp2
                - bmtnid_issueid_002.jp2
                - bmtnid_issueid_003.jp2
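To show how an issue identifier maps onto these trees, here is a minimal Python sketch. It handles only issue ids with a full CCYY-MM-DD date, because the layouts above do not say how month-only or year-only issues are filed; the function name is an assumption of this example.

```python
from pathlib import PurePosixPath

PSTORE_ROOT = PurePosixPath("/usr/share/BlueMountain/pstore/periodicals")
ASTORE_ROOT = PurePosixPath("/usr/share/BlueMountain/astore/periodicals")


def issue_directory(root: PurePosixPath, issueid: str) -> PurePosixPath:
    """Map bmtnid_CCYY-MM-DD_II to root/bmtnid/issues/CCYY/MM/DD_II."""
    try:
        bmtnid, datestring, index = issueid.split("_")
        year, month, day = datestring.split("-")
    except ValueError:
        # Raised when the id lacks a full date or has the wrong shape.
        raise ValueError(
            f"only full-date issue ids are handled here: {issueid!r}"
        ) from None
    return root / bmtnid / "issues" / year / month / f"{day}_{index}"


# Example: where the first delivery image of an issue would live.
issue = "bmtnaad_1925-06-03_01"
delivery_image = issue_directory(ASTORE_ROOT, issue) / "delivery" / f"{issue}_001.jp2"
```

PurePosixPath is used so that the sketch builds the same paths on any operating system without touching the filesystem.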
There are two kinds of METS records in Blue Mountain:
- Title-Level METS – A METS document encapsulating information about the magazine title as a whole.
- Issue-Level METS – A METS document encapsulating information about an individual issue of a magazine.
These are described in greater detail below.
( Greater detail to come. )
The metadata for the title will be encapsulated in a title-level METS record: the title-level descriptive metadata (either as an embedded MODS record or pointed to), a pointer to the bibliographic history, and (possibly) pointers to issue-level metadata.
The metadata for each issue shall be encapsulated in a METS record. A skeleton sample of such a record is the following:
<?xml version="1.0" encoding="utf-8"?>
<mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.loc.gov/METS/"
xmlns:mix="http://www.loc.gov/mix/"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:local="http://diglib.princeton.edu"
xmlns:mods="http://www.loc.gov/mods/v3"
xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mix/ http://schema.ccs-gmbh.com/docworks/version20/mix_jp2.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd"
TYPE="Magazine"
OBJID="urn:PUL:bluemountain:bmtnaap_1921-11_01">
<metsHdr>
<agent ROLE="CREATOR" TYPE="ORGANIZATION">
<name>Princeton University Library, Digital Initiatives</name>
<note>docWORKS-ID: 326971</note>
</agent>
<metsDocumentID TYPE="URN">urn:PUL:bluemountain:td:bmtnaap_1921-11_01</metsDocumentID>
</metsHdr>
<dmdSec ID="dmd1">
<mdWrap MDTYPE="MODS">
<xmlData>
<!-- MODS record goes here -->
</xmlData>
</mdWrap>
</dmdSec>
<!--Use a single administrative section (<amdSec>) as a
wrapper for the technical metadata for all the images in a group-->
<amdSec ID="amdSec1">
<techMD ID="techmd1">
<!-- technical metadata (MIX) for first image -->
<mdWrap MDTYPE="NISOIMG">
<!-- The technical metadata docWorks provides goes here -->
</mdWrap>
</techMD>
<techMD ID="techmd2">
<!-- technical metadata for the second image -->
<mdWrap MDTYPE="NISOIMG"/>
</techMD>
<!-- <techMD> elements for remaining image files in this group -->
</amdSec>
<amdSec ID="amdSec2">
<!-- <techMD> elements for generative image files -->
</amdSec>
<amdSec ID="amdSec3">
<!-- <techMD> elements for preservation image files -->
</amdSec>
<amdSec ID="amdSec4">
<!-- <techMD> elements for delivery PDF files -->
</amdSec>
<amdSec ID="amdSec5">
<!-- <techMD> elements for high-resolution PDF files -->
</amdSec>
<fileSec>
<fileGrp ID="IMGGRP1" USE="Delivery Images">
<!-- Note that the AMDID attribute contains the ID of the
<techMD> element corresponding to the file. Note, too,
the use of the GROUPID attribute, which groups together
the image file, other resolutions, and its corresponding ALTO file. -->
<file ID="IMG001" GROUPID="page1" AMDID="techmd1" MIMETYPE="image/jp2" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/astore/periodicals/bmtnaad/issues/1925/06/03_01/delivery/bmtnaad_1925-06-03_01_001.jp2"/>
</file>
<file ID="IMG002" GROUPID="page2" AMDID="techmd2" MIMETYPE="image/jp2" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/astore/periodicals/bmtnaad/issues/1925/06/03_01/delivery/bmtnaad_1925-06-03_01_002.jp2"/>
</file>
</fileGrp>
<fileGrp ID="IMGGRP2" USE="Generative Images">
<file ID="IMG003" GROUPID="page1" AMDID="techmd1" MIMETYPE="image/jp2" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/astore/periodicals/bmtnaad/issues/1925/06/03_01/generative/bmtnaad_1925-06-03_01_001.jp2"/>
</file>
<file ID="IMG004" GROUPID="page2" AMDID="techmd2" MIMETYPE="image/jp2" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/astore/periodicals/bmtnaad/issues/1925/06/03_01/generative/bmtnaad_1925-06-03_01_002.jp2"/>
</file>
</fileGrp>
<fileGrp ID="IMGGRP3" USE="Preservation Images">
<file ID="IMG005" GROUPID="page1" AMDID="techmd1" MIMETYPE="image/tiff" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/pstore/periodicals/bmtnaad/issues/1925/06/03_01/bmtnaad_1925-06-03_01_001.tif"/>
</file>
<file ID="IMG006" GROUPID="page2" AMDID="techmd2" MIMETYPE="image/tiff" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/pstore/periodicals/bmtnaad/issues/1925/06/03_01/bmtnaad_1925-06-03_01_002.tif"/>
</file>
</fileGrp>
<fileGrp ID="PDFGRP1" USE="low-resolution PDF">
<file ID="PDF01" MIMETYPE="application/pdf" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/astore/periodicals/bmtnaad/issues/1925/06/03_01/bmtnaad_1925-06-03_01.pdf"/>
</file>
</fileGrp>
<fileGrp ID="PDFGRP2" USE="high-resolution PDF">
<file ID="PDF02" MIMETYPE="application/pdf" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file:///usr/share/BlueMountain/pstore/periodicals/bmtnaad/issues/1925/06/03_01/bmtnaad_1925-06-03_01.pdf"/>
</file>
</fileGrp>
<fileGrp ID="ALTOGRP" USE="OCR">
<file ID="ALTO001" GROUPID="page1" MIMETYPE="text/xml" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file://./bmtnaad_1925-06-03_01_001.alto.xml"/>
</file>
<file ID="ALTO002" GROUPID="page2" MIMETYPE="text/xml" CHECKSUM="xxxx" CHECKSUMTYPE="SHA-1">
<FLocat LOCTYPE="URL" xlink:href="file://./bmtnaad_1925-06-03_01_002.alto.xml"/>
</file>
</fileGrp>
</fileSec>
<structMap TYPE="PHYSICAL">
<div/>
</structMap>
<structMap TYPE="LOGICAL">
<div/>
</structMap>
</mets>
The root element <mets> contains these attributes:
- TYPE
- the fixed value Magazine
- OBJID
- the URN for the issue
- LABEL
- the issueid
The <metsHdr> element shall contain two elements, <agent> and <metsDocumentID>.
The <agent> element has a constant value for all records:
<agent ROLE="CREATOR" TYPE="ORGANIZATION">
<name>Princeton University Library, Digital Initiatives</name>
</agent>
The <metsDocumentID> element contains a string composed as follows:
PREFIX:ISSUEID
where PREFIX is the following fixed value:
urn:PUL:bluemountain:td
and ISSUEID is the issue identifier, computed using the rules above.
The record contains a single <dmdSec> element with an ID attribute of “dmd1”; it contains an embedded MODS record for the issue (described below).
The <amdSec> contains a <techMD> element for each image file (a <mix> record).
There are five <amdSec>s in an issue-level METS file, one for the files in each of the following groups:
- Delivery-level jp2s
- Generative jp2s
- Preservation-level tifs
- low-resolution issue-level PDF
- high-resolution issue-level PDF
The fileSec comprises six <fileGrp> elements: one for each of the image groups above and one for the ALTO records.
The IMGGRP1 file group (Delivery Images) contains <file> elements that indicate the location of each image file, with attributes linking the file to the corresponding technical metadata and to the corresponding ALTO file.
- ID
- a unique XML id
- AMDID
- the ID of the <techMD> element corresponding to the image file
- GROUPID
- an ID that links an image file to an ALTO file. The image file for a page and the ALTO file containing the OCR output for that page share an id (conventionally named pageN, where N is a sequence number).
- MIMETYPE
- the constant “image/jp2” for JPEG2000 images
- CHECKSUM
- the checksum of the file, according to the algorithm specified in CHECKSUMTYPE
- CHECKSUMTYPE
- the algorithm used to compute the checksum; usually SHA-1.
- LOCTYPE
- the constant URL
- xlink:href
- the path to the file. For this project, it will be a local path. For example:
file://./bmtnaad_1925-06-03_01_001.jp2
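A value for the CHECKSUM attribute could be computed along these lines. This is a minimal sketch using Python's standard hashlib module; the function name and the chunk size are arbitrary choices for the example, and the file is read in chunks because master images can be very large.

```python
import hashlib


def sha1_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA-1 digest of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the 40-character hexadecimal digest, which would be recorded alongside CHECKSUMTYPE="SHA-1".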
IMGGRP2: like IMGGRP1 (Delivery Images), but corresponding to the Generative JP2 images.
IMGGRP3: like IMGGRP1 (Delivery Images), but corresponding to the Preservation TIFF images.
PDFGRP1: contains a single <file> element corresponding to the low-resolution PDF.
PDFGRP2: contains a single <file> element corresponding to the high-resolution PDF.
ALTOGRP: like the <fileGrp>s for images, but corresponding to the ALTO files. (The ALTO files do not have technical metadata, so there is no AMDID attribute.)
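To illustrate how GROUPID ties a page's image files to its ALTO file, the following Python sketch reads a fileSec with the standard xml.etree.ElementTree module and groups file locations by page. The function name and the abbreviated sample document are invented for the example; only the element and attribute names come from the description above.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp ID="IMGGRP1" USE="Delivery Images">
      <file ID="IMG001" GROUPID="page1" AMDID="techmd1" MIMETYPE="image/jp2">
        <FLocat LOCTYPE="URL" xlink:href="file://./bmtnaad_1925-06-03_01_001.jp2"/>
      </file>
    </fileGrp>
    <fileGrp ID="ALTOGRP" USE="OCR">
      <file ID="ALTO001" GROUPID="page1" MIMETYPE="text/xml">
        <FLocat LOCTYPE="URL" xlink:href="file://./bmtnaad_1925-06-03_01_001.alto.xml"/>
      </file>
    </fileGrp>
  </fileSec>
</mets>"""


def files_by_page(mets_root: ET.Element) -> dict:
    """Return {GROUPID: {fileGrp USE: xlink:href}} for files that have a GROUPID."""
    href_attribute = f"{{{NS['xlink']}}}href"
    pages = {}
    for group in mets_root.findall("mets:fileSec/mets:fileGrp", NS):
        use = group.get("USE")
        for file_element in group.findall("mets:file", NS):
            page = file_element.get("GROUPID")
            locator = file_element.find("mets:FLocat", NS)
            if page is None or locator is None:
                continue  # e.g. the PDF groups, which carry no GROUPID
            pages.setdefault(page, {})[use] = locator.get(href_attribute)
    return pages


pages = files_by_page(ET.fromstring(SAMPLE))
```

In this sample, pages["page1"] holds both the delivery image and the OCR file for the first page.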
The <structMap> element describes a hierarchical arrangement of the parts (<div>s) making up the digital object described by the METS. For this project, there are two kinds: a physical structMap, which delineates the pages of the magazine issue in reading order, and a logical structMap, which functions as an outline of the magazine’s contents. Both of these are assembled by docWorks, using configuration rules.
A further structMap, of TYPE “Resource”, is a map of the entire object. Each of its <div>s corresponds to one of the file groups.
<structMap TYPE="Resource">
<div LABEL="delivery formats">
<div LABEL="low-resolution PDF"></div>
<div LABEL="high-resolution PDF"></div>
<div LABEL="preservation TIFF"></div>
<div LABEL="delivery JP2"></div>
<div LABEL="generative JP2"></div>
<div LABEL="ALTO"></div>
<div LABEL="TEI-encoded"></div>
</div>
</structMap>
The outlines below show the hierarchical relationship among the <div> elements in the logical structMap. Each div is described more fully below.
- Magazine
- Volume+
- Issue+
- Contents
- { Article* | Illustration* | Section* }
- Advertisements
- { SponsoredAd+ | Section* }
- Article
- Header*
- Contents
- Head+
- Byline*
- Body
- { Paragraph* | Section* }
- Illustration
- Graphic+
- Caption?
- Illustration
- Paragraph+
- SponsoredAd
- { Graphic* | Paragraph* }
- Section
- Header?
- Body
- SponsoredAd
- { Article* | Illustration* | SponsoredAd* | Section* }
- Paragraph
- TextBlock+
- Paragraph
Attributes:
- TYPE
- must be “Magazine”
- LABEL
- The name of the magazine, equivalent to the top-level <titleInfo> element.
6.1.2.5.3.1.2 <div TYPE="Volume">?
A <div> representing a (possibly) bound volume of issues. In most cases, we are representing each issue of a magazine as a separate digital object, so the <div TYPE="Volume"> element will in practice contain only one <div TYPE="Issue">.
Attributes:
- TYPE
- must be “Volume”
- LABEL
- The volume caption, if present
Attributes:
- TYPE
- must be “Issue”
- LABEL
- The issue number and the date of publication
- DMDID
- the ID of the <dmdSec> for the object (in practice, always “dmd1”)
The Issue <div> contains, in most cases, three sub-<div>s: <div TYPE="PublicationInfo">, <div TYPE="EditorialContent"> and <div TYPE="SponsoredAdvertisements">, described below.
6.1.2.5.3.1.4 <div TYPE="PublicationInfo">
Contains <div>s corresponding to the metadata about the magazine printed in the issue itself: mastheads, nameplates, folio lines, page numbers, etc.
6.1.2.5.3.1.5 <div TYPE="EditorialContent" LABEL="Contents">
Contains <div>s corresponding to the TextContent and Illustration elements, in publication order. These elements have DMDID attributes whose values link them to the corresponding <relatedItem> elements in the <mods> record.
6.1.2.5.3.1.6 <div TYPE="SponsoredAdvertisements" LABEL="Advertisements">
Contains <div>s corresponding to the SponsoredAdvertisement elements, in publication order. These elements have DMDID attributes whose values link them to the corresponding <relatedItem> elements in the <mods> record.
6.1.2.5.3.1.7 <div TYPE="TextContent">
A <div> representing a piece of editorial content: an article, a review, a letter, a poem, etc.
Editorial content takes a number of forms: it may or may not have a headline; it may or may not have a byline; it may have subsections, each with its own headline (subhead).
A TextContent <div> MAY contain a <div TYPE="Header">; it will always have a <div TYPE="Body">.
Attributes:
- TYPE
- must be “TextContent”
- DMDID
- the ID of the <mods:relatedItem type="constituent"> element corresponding to this piece in the magazine.
- LABEL
- SHOULD be equivalent to the contents of the mods:relatedItem/mods:titleInfo/mods:title element
6.1.2.5.3.1.8 <div TYPE="Header">
A <div> containing the component’s (the TextContent, SponsoredAd, or Section) heading information: a combination of headline and byline. The Header may contain one or more Head elements (encompassing, for example, a headline and a subhead); it may also contain one or more Byline elements (which may not necessarily be physically contiguous in the physical layout of the page).
Attributes:
- TYPE
- must be “Header”
Attributes:
- TYPE
- must be “Head”
Attributes:
- TYPE
- must be “Byline”
Attributes:
- TYPE
- must be “Paragraph”
- ORDER
- the index of the paragraph in its containing div (1, 2, etc.).
6.1.2.5.3.1.13 <div TYPE="Section">
A section is a container <div> of other <div>s. It may or may not have a Header; it will contain some combination of articles, illustrations, SponsoredAds, and other sections.
6.1.2.5.3.1.14 <div TYPE="Illustration">
6.1.2.5.3.1.15 <div TYPE="Graphic">
A div designating the location of a graphic on the page.
6.1.2.5.3.1.16 <div TYPE="Caption">
6.1.2.5.3.1.17 <div TYPE="SponsoredAd">
6.1.2.5.3.1.18 <div TYPE="TextBlock">
A div designating the region of a block of text on a page.
There are two kinds of MODS records in Blue Mountain:
- Title-level MODS
- Issue-level MODS
The descriptive metadata for most, if not all, of the Blue Mountain titles has been taken from MARC records retrieved from Princeton’s OPAC using http://diglib.princeton.edu/tools/v2m and then edited and enhanced by hand. Here is a sample:
<mods xmlns="http://www.loc.gov/mods/v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd">
<identifier type="bmtn">urn:PUL:bluemountain:bmtnaad</identifier>
<titleInfo usage="primary">
<nonSort>Le</nonSort>
<title>coeur à barbe</title>
<subTitle>journal transparent</subTitle>
</titleInfo>
<titleInfo>
<nonSort>Le </nonSort>
<title>cœur à barbe</title>
<subTitle>journal transparent</subTitle>
</titleInfo>
<name type="personal" authority="viaf" valueURI="http://viaf.org/viaf/73848255">
<namePart>Eluard, Paul</namePart>
<namePart type="date">1895-1952</namePart>
</name>
<name type="personal" authority="viaf" valueURI="http://viaf.org/viaf/96123513">
<namePart>Ribemont-Dessaignes, Georges</namePart>
<namePart type="date">1884-1974</namePart>
</name>
<name type="personal" authority="viaf" valueURI="http://viaf.org/viaf/27072443">
<namePart>Tzara, Tristan</namePart>
<namePart type="date">1896-1963</namePart>
</name>
<typeOfResource>text</typeOfResource>
<genre authority="bmtn">Periodicals-Title</genre> <!-- ref:genre -->
<genre authority="marcgt">periodical</genre>
<genre authority="fast">Periodicals.</genre>
<originInfo>
<place>
<placeTerm type="code" authority="marccountry">fr</placeTerm>
</place>
<dateIssued encoding="marc" point="start">1922</dateIssued>
<dateIssued encoding="marc" point="end">1922</dateIssued>
<issuance>serial</issuance>
<frequency>Unknown</frequency>
</originInfo>
<abstract>
A single-issue publication, le coeur à barbe was issued by
Tristan Tzara as a rallying-call for Dadaism, in response
to attacks by Francis Picabia and André Breton. Selected
contributors: Paul Éluard, Tristan Tzara, Georges
Ribemont-Dessaignes.
</abstract>
<language>
<languageTerm authority="iso639-2b" type="code">fre</languageTerm>
</language>
<physicalDescription>
<reformattingQuality>access</reformattingQuality>
<form authority="marccategory">electronic resource</form>
<form authority="marcsmd">remote</form>
<extent>1 online resource.</extent>
</physicalDescription>
<note type="date/sequential designation">1er no (avril 1922).</note>
<note>Editor: Tristan Tzara.</note>
<note>Introductory statement signed "Eluard, Ribemont-Dessaignes, Tzara."</note>
<subject>
<geographicCode authority="marcgac">e-fr---</geographicCode>
</subject>
<subject authority="fast">
<temporal>1900 - 1999</temporal>
</subject>
<subject authority="lcsh">
<topic>Dadaism</topic>
<geographic>France</geographic>
<genre>Periodicals</genre>
</subject>
<subject authority="lcsh">
<topic>French literature</topic>
<temporal>20th century</temporal>
<genre>Periodicals</genre>
</subject>
<subject authority="fast">
<topic>Dadaism</topic>
</subject>
<subject authority="fast">
<topic>French literature</topic>
</subject>
<subject authority="fast">
<geographic>France</geographic>
</subject>
<classification authority="lcc">NX456.D3</classification>
<location>
<url displayLabel="electronic resource" usage="primary display"
note="Available online via the Blue Mountain Project. Click here to view holdings"
>http://arks.princeton.edu/ark:/88435/12579t647</url>
</location>
<identifier type="oclc">858948452</identifier>
<recordInfo>
<descriptionStandard>rda</descriptionStandard>
<recordContentSource authority="marcorg">ZCU</recordContentSource>
<recordCreationDate encoding="marc">130926</recordCreationDate>
<recordChangeDate encoding="iso8601">20141212022132.0</recordChangeDate>
<recordIdentifier source="OCoLC">ocn858948452</recordIdentifier>
<recordIdentifier source="bmtn">urn:PUL:bluemountain:dmd:bmtnaad</recordIdentifier>
<recordOrigin>Converted from MARCXML to MODS version 3.4 using MARC21slim2MODS3-4.xsl
(Revision 1.70)</recordOrigin>
<languageOfCataloging>
<languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</languageOfCataloging>
</recordInfo>
</mods>
The machine conversion does not include the following necessary elements:
- <identifier type="bmtn">urn:PUL:bluemountain:BMTNID</identifier>
- <recordIdentifier source="bmtn">urn:PUL:bluemountain:dmd:BMTNID</recordIdentifier>
- <genre authority="bmtn">Periodicals-Title</genre>
These must be added by hand; if they are missing, the eXist-db catalog will not work.
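A quick check for these three elements might look like the following Python sketch, which uses the standard xml.etree.ElementTree module. The function name is hypothetical, and the sketch only tests for the presence of the elements, not for the correctness of their values.

```python
import xml.etree.ElementTree as ET

MODS = {"mods": "http://www.loc.gov/mods/v3"}

# Human-readable label -> ElementTree path, relative to the <mods> root.
REQUIRED = {
    "identifier[@type='bmtn']": "mods:identifier[@type='bmtn']",
    "recordIdentifier[@source='bmtn']":
        "mods:recordInfo/mods:recordIdentifier[@source='bmtn']",
    "genre[@authority='bmtn']": "mods:genre[@authority='bmtn']",
}


def missing_bmtn_elements(mods_root: ET.Element) -> list:
    """List the labels of required Blue Mountain elements absent from a MODS record."""
    return [
        label
        for label, path in REQUIRED.items()
        if mods_root.find(path, MODS) is None
    ]
```

An empty list means all three hand-added elements are present; anything else names what still needs to be added before the record is loaded.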
The <name> elements are associated with authorities to enhance search and broaden the interconnectedness of the data. http://viaf.org is the preferred authority; http://id.loc.gov should be consulted when a name is not found in viaf.org; if a name is found in neither, a local authority will be created (to be determined later).
Dates are encoded in ISO standard 8601 format (see http://www.iso.org/iso/catalogue_detail?csnumber=40874; for an overview see http://en.wikipedia.org/wiki/ISO_8601). The extended form of the representation is preferred.
Blue Mountain prefers the Chicago Manual of Style for spelling and capitalization; these guidelines often conflict with standard cataloging practice, especially when capitalizing foreign languages. When they do, create two <titleInfo> elements, one following library rules and the other following the Chicago Manual of Style. The latter must be distinguished by giving the usage attribute the value “primary”.
Blue Mountain encodes descriptive metadata for the contents of each magazine issue, so the issues may be searched and analyzed.
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink">
<recordInfo>
<recordIdentifier>urn:PUL:bluemountain:dmd:bmtnaap_1921-11_01</recordIdentifier>
</recordInfo>
<identifier type="bmtn">urn:PUL:bluemountain:bmtnaap_1921-11_01</identifier>
<typeOfResource>text</typeOfResource>
<genre>Periodicals-Issue</genre>
<titleInfo>
<title>Broom</title>
</titleInfo>
<name xmlns:mets="http://www.loc.gov/METS/"
type="personal"
authority="viaf"
valueURI="http://viaf.org/viaf/278935970">
<displayForm>Harold A. Loeb</displayForm>
<role>
<roleTerm authorityURI="http://www.loc.gov/marc/relators/">edt</roleTerm>
</role>
</name>
<name type="personal" authority="viaf"
valueURI="http://viaf.org/viaf/5727003">
<displayForm>Alfred Kreymborg</displayForm>
<role>
<roleTerm authorityURI="http://www.loc.gov/marc/relators/">edt</roleTerm>
</role>
</name>
<name type="personal"
authority="viaf"
valueURI="http://viaf.org/viaf/109649570">
<displayForm>Giuseppe Prezzolini</displayForm>
<role>
<roleTerm authorityURI="http://www.loc.gov/marc/relators/">edt</roleTerm>
</role>
</name>
<part type="issue">
<detail type="volume">
<number>1</number>
<caption>Vol. I</caption>
</detail>
<detail type="number">
<number>1</number>
<caption>No. 1</caption>
</detail>
</part>
<originInfo>
<dateIssued>November 1921</dateIssued>
<dateIssued keyDate="yes" encoding="w3cdtf">1921-11</dateIssued>
</originInfo>
<location>
<physicalLocation type="text">Princeton University. Department of Rare Books and Special Collections</physicalLocation>
<physicalLocation authority="marcorg" type="code">NjP</physicalLocation>
<holdingSimple>
<copyInformation>
<subLocation>LM (Little Magazine)</subLocation>
</copyInformation>
</holdingSimple>
</location>
<relatedItem type="host"
xlink:type="simple"
xlink:href="urn:PUL:bluemountain:bmtnaap">
<recordInfo>
<recordIdentifier>urn:PUL:bluemountain:dmd:bmtnaap</recordIdentifier>
</recordInfo>
</relatedItem>
<!-- The remaining constituents go here -->
</mods>
The <mods> element is the root element of the document. When output as a stand-alone document, it has fixed attributes, as illustrated below:
<mods xmlns="http://www.loc.gov/mods/v3"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/mods/v3
http://www.loc.gov/mods/v3/mods-3-5.xsd">
The <recordInfo> element contains information about the MODS record itself. It shall contain a <recordIdentifier> element, as below.
The <recordIdentifier> element contains the ISSUEMODSURI, the unique identifier for the MODS record itself, as described above.
<recordIdentifier>urn:PUL:bluemountain:dmd:bmtnaap_1921-11_01</recordIdentifier>
The <identifier> element is used to identify the resource the MODS record describes (the magazine issue). Its value is the resource’s ISSUEURI, as described above.
<identifier type="bmtn">urn:PUL:bluemountain:bmtnaap_1921-11_01</identifier>
Each issue-level MODS record is related to the title-level record via a <relatedItem type="host"> element.
<relatedItem type="host"
xlink:type="simple"
xlink:href="TITLEURI">
<recordInfo>
<recordIdentifier>urn:PUL:bluemountain:dmd:bmtnaap</recordIdentifier>
</recordInfo>
<location>
<physicalLocation type="text">Princeton University. Department of Rare Books and Special Collections</physicalLocation>
<physicalLocation authority="marcorg" type="code">NjP</physicalLocation>
<holdingSimple>
<copyInformation>
<subLocation>LM (Little Magazine)</subLocation>
</copyInformation>
</holdingSimple>
</location>
</relatedItem>
TITLEURI is the URI of the title record for the magazine title of which this issue is a part; e.g.,
urn:PUL:bluemountain:bmtnaap
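Since every identifier follows the URN grammar given at the start of this document, the title-level URIs needed for the host <relatedItem> can be derived from an issue's ISSUEURI rather than typed by hand. The sketch below assumes that grammar; the function name `related_uris` and the dictionary keys are illustrative only.

```python
import re

PREFIX = "urn:PUL:bluemountain"

# ISSUEURI ::= PREFIX ":" BMTNID "_" DATESTRING "_" ISSUEINDEX
ISSUE_URI = re.compile(
    r"^urn:PUL:bluemountain:(?P<bmtnid>bmtn[a-z]{3})"
    r"_(?P<date>\d{4}(?:-\d{2}){0,2})"
    r"_(?P<index>\d{2})$"
)


def related_uris(issue_uri):
    """Derive the title URI and the MODS record URIs from an ISSUEURI."""
    match = ISSUE_URI.match(issue_uri)
    if match is None:
        raise ValueError(f"not a Blue Mountain issue URI: {issue_uri!r}")
    bmtnid = match.group("bmtnid")
    issue_id = issue_uri[len(PREFIX) + 1:]  # everything after the prefix and colon
    return {
        "title_uri": f"{PREFIX}:{bmtnid}",             # xlink:href of the host item
        "title_mods_uri": f"{PREFIX}:dmd:{bmtnid}",    # host <recordIdentifier>
        "issue_mods_uri": f"{PREFIX}:dmd:{issue_id}",  # issue <recordIdentifier>
    }


uris = related_uris("urn:PUL:bluemountain:bmtnaap_1921-11_01")
print(uris["title_uri"])
```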
The <location> data may often be derived from the catalog record for the title; see above.
The <language> element indicates the language of the resource (the magazine issue). If the issue contains material written in several languages, the record should include a <language> element for each one. The value of the <languageTerm> element must be a code drawn from ISO 639-2/B (authority iso639-2b). For example:
<language>
<languageTerm type="code" authority="iso639-2b">rus</languageTerm>
</language>
The <titleInfo> element shall be determined by standard cataloging rules.
<titleInfo>
<nonSort>Le</nonSort>
<title>coeur à barbe</title>
<subTitle>journal transparent</subTitle>
</titleInfo>
When standard cataloging rules differ from the guidelines set forth in the Chicago Manual of Style, encoders should record the latter as the primary form of the title by using the usage attribute, as in
<titleInfo usage="primary">
<title>Broom</title>
<subTitle>An International Magazine of the Arts</subTitle>
</titleInfo>
<titleInfo>
<title>Broom</title>
</titleInfo>
The <part> element shall take the following form:
<part>
<detail type="volume">...</detail>
<detail type="issue">...</detail>
</part>
<detail type="volume">
<number>ARABICVOL</number>
<caption>Vol. MASTHEADVOL</caption>
</detail>
Where
- ARABICVOL is the volume number expressed as a non-formatted arabic numeral (e.g., 1, 2, 3, … 10, 11, …)
- MASTHEADVOL is the volume number as it appears in the masthead.
The <detail type="issue"> element shall take one of two possible forms:
- For “normal” issues (i.e., those following the recorded sequence of publication), record both the sequential number of the issue as an arabic numeral and the issue number as it appears in the masthead:
<detail type="issue">
<number>ARABICISSUE</number>
<caption>No. MASTHEADISSUE</caption>
</detail>
Where
- ARABICISSUE is the issue number expressed as a non-formatted arabic numeral (e.g., 1, 2, 3, …, 10, 11, …)
- MASTHEADISSUE is the issue number as it appears in the masthead.
- For “special” issues (e.g., supplements), for which there is no sequential number for the issue, the <detail type="issue"> element should take the following form:
<detail type="issue">
<caption>CAPTIONTEXT</caption>
</detail>
Where CAPTIONTEXT is determined using standard cataloging rules.
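The two forms of the issue <detail> can be generated from the same routine. This is a sketch under stated assumptions: the function `build_part` and its parameter names are hypothetical, and the elements are built without the MODS namespace for brevity.

```python
import xml.etree.ElementTree as ET


def build_part(arabic_vol, masthead_vol, arabic_issue=None, issue_caption=""):
    """Build a <part> element; omit arabic_issue for special issues."""
    part = ET.Element("part")

    volume = ET.SubElement(part, "detail", type="volume")
    ET.SubElement(volume, "number").text = str(arabic_vol)
    ET.SubElement(volume, "caption").text = f"Vol. {masthead_vol}"

    issue = ET.SubElement(part, "detail", type="issue")
    if arabic_issue is not None:
        # Normal issue: sequential number plus the masthead form.
        ET.SubElement(issue, "number").text = str(arabic_issue)
        ET.SubElement(issue, "caption").text = f"No. {issue_caption}"
    else:
        # Special issue: caption only, supplied by the cataloger.
        ET.SubElement(issue, "caption").text = issue_caption
    return part


normal = ET.tostring(build_part(1, "I", 1, "1"), encoding="unicode")
special = ET.tostring(build_part(2, "II", issue_caption="Supplement"), encoding="unicode")
print(normal)
print(special)
```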
The <originInfo> element shall be used to record the date of issuance, as follows:
<originInfo>
<dateIssued>PRINTEDDATE</dateIssued>
<dateIssued encoding="iso8601" keyDate="yes">ISODATE</dateIssued>
</originInfo>
Where
- PRINTEDDATE is the date as it appears in the cover page FolioLine, or in the Masthead.
- ISODATE is the value of the date in the masthead, expressed in iso8601 format (YYYY-MM-DD) – see http://www.w3.org/TR/NOTE-datetime for details.
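A quick way to catch a printed date that has slipped into the keyDate element is to validate ISODATE before saving the record. The sketch below assumes that reduced-precision values (CCYY-MM and CCYY) are acceptable, as in the DATESTRING rule of the URN grammar and the 1921-11 example above; the function name `is_valid_isodate` is hypothetical.

```python
import re
from datetime import date

# CCYY, CCYY-MM, or CCYY-MM-DD
ISODATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_valid_isodate(value):
    """Return True if value is a well-formed, calendar-valid date string."""
    match = ISODATE.match(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ["1921-11", "1921-11-31", "November 1921"]:
    print(candidate, is_valid_isodate(candidate))
```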
The <location> element is used to specify the physical location of the original from which the digital object was created. For the most part, this information can be derived from the data in Voyager; when the digital object has been created from page images scanned from extramural sources, ask the Project Manager.
Traditional library cataloging does not extend to the contents of periodicals, yet this level of description is precisely what is required by scholars of periodicals, and the Blue Mountain Project is committed to providing it, as well as to formulating guidelines, in cooperation with scholars and librarians, for this level of description. The specifications for this description, therefore, must be considered work in progress, work that will necessarily evolve over the course of the Project.
That being said, the Project will, at the outset, capture information about the following sorts of constituents:
- traditional editorial content (articles, features, letters to the editor, etc.)
- significant illustrations (figures, tip-ins, etc.)
- advertisements
The last sort – advertisements – is the most controversial, and the most difficult for librarians to understand, although advertisements are among the most heavily studied parts of historical periodicals. There are at present no established rules for describing advertisements, and their variety and abundance pose serious practical challenges to projects with limited resources. This version of the specification, therefore, provides little guidance on the description of advertisements, other than providing a framework for this level of detail to be created at a future date, by scholars, researchers, and other students of the material who wish to advance scholarship by enhancing the data provided here.
These three sorts of constituents are described in greater detail below.
Here is a hypothetical example of a constituent:
<relatedItem type="constituent" ID="c003">
<titleInfo lang="eng">
<title>Lake</title>
</titleInfo>
<name type="personal" authority="viaf"
valueURI="http://viaf.org/viaf/46485632">
<displayForm>Bayard Boyesen</displayForm>
<role>
<roleTerm type="code" authority="marcrelator">cre</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<language>
<languageTerm authority="iso639-2b">eng</languageTerm>
</language>
<part>
<extent unit="page">
<start>3</start>
</extent>
</part>
<genre type="CCS">TextContent</genre>
</relatedItem>
- The type attribute has the value constituent, because this related item is a constituent (a part) of the magazine issue.
- The ID attribute may be any valid XML ID (it must begin with a letter or an underscore, not a digit). By convention, the ID will begin with the letter c followed by a sequential number. (The docWorks processing flow generates these ID attributes.) This attribute links the description to a <div> element in the METS logical structMap.
The <title> is transcribed as it appears on the page, using standard cataloging rules.
- The <name> elements (there may be none, or there may be more than one) are used to record the names of the people or organizations who are responsible for the constituent. Very often a constituent contains a single <name> element designating the creator of the piece; not infrequently, however, a constituent will contain the names of translators, illustrators, co-authors, or other contributors. These names must be encoded in separate <name> elements.
The <name> is transcribed as it appears on the page and is encoded in the <displayForm> element, which here functions as a byline. All <name> elements shall include a <role> element, which shall designate the generic role, cre, in the <roleTerm> subelement.
When possible, encoders should supply a link to a name authority, preferably http://viaf.org.
<name type="personal" authority="viaf"
valueURI="http://viaf.org/viaf/22203431">
<displayForm>L. Moholy-Nagy</displayForm>
<role>
<roleTerm type="code" authority="marcrelator">trle</roleTerm>
</role>
</name>
- The <language> element records the language used in the text. If more than one language is used, there should be a <language> element for each.
- The <language> element shall contain the subelement <languageTerm>, a three-letter code derived from the ISO639-2 standard, found at http://www.loc.gov/standards/iso639-2/. The code form should be used.
The <part> element contains a single <extent> element.
- The <extent> records the page or pages on which the constituent appears:
- when the item appears on a single page
encode the page number as a solitary <start> element.
<extent unit="page">
<start>3</start>
</extent>
- when the item appears on multiple sequential pages
encode the first page in a <start> element and the last page in an <end> element.
<extent unit="page">
<start>3</start>
<end>4</end>
</extent>
- when the item appears on non-sequential pages
encode the pages as a series in a <list> element, as in
<extent unit="page">
<list>3; 5</list>
</extent>
- when the item appears on a mix of sequential and non-sequential pages
encode the page ranges and the single pages together in a <list> element. For an article that starts on page 1, continues on page 2, and then skips to page 5:
<extent unit="page">
<list>1-2; 5</list>
</extent>
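The four page-extent cases reduce to one rule: group the pages into runs of consecutive numbers, then emit <start> (and <end>) for a single run or a <list> for several. The following sketch illustrates that rule; the function names are hypothetical, pages are assumed to be plain integers (roman-numeral pagination is not handled), and the element is built without the MODS namespace.

```python
import xml.etree.ElementTree as ET


def page_runs(pages):
    """Group page numbers into [first, last] runs of consecutive pages."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page          # extend the current run
        else:
            runs.append([page, page])   # start a new run
    return runs


def build_extent(pages):
    """Build an <extent unit="page"> element for the given page numbers."""
    runs = page_runs(pages)
    if not runs:
        raise ValueError("a constituent must appear on at least one page")
    extent = ET.Element("extent", unit="page")
    if len(runs) == 1:
        first, last = runs[0]
        ET.SubElement(extent, "start").text = str(first)
        if last != first:
            ET.SubElement(extent, "end").text = str(last)
    else:
        items = [str(a) if a == b else f"{a}-{b}" for a, b in runs]
        ET.SubElement(extent, "list").text = "; ".join(items)
    return extent


for pages in ([3], [3, 4], [3, 5], [1, 2, 5]):
    print(ET.tostring(build_extent(pages), encoding="unicode"))
```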
The <genre type="CCS"> is determined from the docWorks configuration: for articles and other editorial content, it will be TextContent; for photographs, cartoons, and other illustrations, it will be Illustration; for advertisements, it will be SponsoredAdvertisement.
TextContent constituents are the most common sort: articles, poems, fiction, and all other textual editorial content.
A TextContent constituent may contain other constituents: in particular, an article may contain illustrations.
A basic article
<relatedItem type="constituent" ID="c003">
<titleInfo lang="eng">
<title>Lake</title>
</titleInfo>
<name type="personal" authority="viaf"
valueURI="http://viaf.org/viaf/46485632">
<displayForm>Bayard Boyesen</displayForm>
<role>
<roleTerm type="code" authority="marcrelator">cre</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<language>
<languageTerm authority="iso639-2b">eng</languageTerm>
</language>
<part>
<extent unit="page">
<start>3</start>
</extent>
</part>
<genre type="CCS">TextContent</genre>
</relatedItem>
<relatedItem type="constituent" ID="c025">
<titleInfo lang="eng">
<title>Apollinaire</title>
</titleInfo>
<name type="personal" authority="viaf"
valueURI="http://viaf.org/viaf/41833206">
<displayForm>M. J.</displayForm>
<role>
<roleTerm type="code" authority="marcrelator">cre</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<language>
<languageTerm authority="iso639-2b">eng</languageTerm>
</language>
<part>
<extent unit="page">
<start>74</start>
<end>75</end>
</extent>
</part>
<relatedItem type="constituent" ID="c025.1">
<titleInfo lang="eng">
<title>Untitled Cartoon</title>
</titleInfo>
<typeOfResource>still image</typeOfResource>
<part>
<extent unit="page">
<start>74</start>
</extent>
</part>
<genre type="CCS">Illustration</genre>
</relatedItem>
</relatedItem>
We use Illustration to refer to all kinds of graphic “art”: photographs, cartoons, charts, etc. Most illustrations (but not all) are accompanied by some sort of caption: a line or two of text, usually beneath the graphic, that describes the illustration, or names the creator of the illustration, or both.
In docWorks processing, captions, which often contain title and creator information, are recorded, in preliminary fashion, in the <title> element, as shown here:
<relatedItem type="constituent" ID="c009">
<titleInfo lang="pol">
<title>ILUSTRACJA DO „ZAGADNIEŃ WSPÓŁCZESNEJ ARCHITEKTURY" LEONA CHWISTKA (CZĘŚĆ 111. ZAGADNIENIE KONSTRUKCJI)</title>
</titleInfo>
<typeOfResource>still image</typeOfResource>
<part>
<extent unit="page">
<start>17</start>
</extent>
</part>
<genre type="CCS">Illustration</genre>
</relatedItem>
In cleaning up after docWorks processing, encoders should parse these titles into true titles and creators, as in:
<relatedItem type="constituent" ID="c009">
<titleInfo lang="pol">
<title>ILUSTRACJA DO „ZAGADNIEŃ WSPÓŁCZESNEJ ARCHITEKTURY"</title>
</titleInfo>
<name type="personal">
<displayForm>Leona Chwistka</displayForm>
<role>
<roleTerm type="code" authority="marcrelator">cre</roleTerm>
</role>
</name>
<typeOfResource>still image</typeOfResource>
<part>
<extent unit="page">
<start>17</start>
</extent>
</part>
<genre type="CCS">Illustration</genre>
<note type="caption">CZĘŚĆ 111. ZAGADNIENIE KONSTRUKCJI</note>
</relatedItem>
Text that is neither title nor creator should be captured in a <note type="caption">.
Advertisements are an important and plentiful constituent of many magazines. We do not attempt to assign them titles or creators.
<relatedItem type="constituent" ID="c3">
<titleInfo>
<title>[Advertisement]</title>
</titleInfo>
<language>
<languageTerm type="code" authority="iso639-2b">rus</languageTerm>
</language>
<part>
<extent unit="page">
<start>3</start>
</extent>
</part>
<genre type="CCS">SponsoredAdvertisement</genre>
</relatedItem>
For each page, an encoded representation of the layout and the machine-readable text on the page shall be provided, using the ALTO schema, version 3.0 or higher, with the following specifications, adopted from the NDNP:
- The text shall be encoded in the natural reading order of the language in which the text is written;
- Point size and font data to at least the word level shall be included;
- The ALTO file shall include bounding-box coordinates to at least the word level;
- Non-rectangular blocks shall not be used. Some illustrations may format as “tight” in the document.
In general, Princeton University Library adheres to the standards elaborated by the Federal Agencies Digitization Guidelines Initiative (FADGI), whose Still Image Working Group published Technical Guidelines for Digitizing Cultural Heritage Materials in 2010. Archival images will be captured in 24-bit RGB and digitally rendered at varying resolutions to produce a uniform long dimension of 7200 pixels, then stored as uncompressed TIFF files with a large, non-proprietary color profile (ProPhoto RGB). The homogenization of the archival files to a long dimension of 7200 pixels allows us to produce uniform derivative images rapidly and estimate our storage needs more accurately.
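The arithmetic behind the fixed long dimension can be sketched as follows. The function name `capture_plan` and the page measurements are illustrative assumptions, and the byte count covers pixel data only (24-bit RGB is 3 bytes per pixel), ignoring TIFF headers and embedded profiles.

```python
LONG_EDGE_PX = 7200   # uniform long dimension of every archival image
BYTES_PER_PIXEL = 3   # 24-bit RGB, uncompressed


def capture_plan(long_edge_in, short_edge_in):
    """Estimate rendering resolution and raw pixel-data size for one page.

    Both arguments are the physical page dimensions in inches.
    """
    ppi = LONG_EDGE_PX / long_edge_in
    short_edge_px = round(short_edge_in * ppi)
    return {
        "ppi": ppi,
        "pixels": (LONG_EDGE_PX, short_edge_px),
        "bytes": LONG_EDGE_PX * short_edge_px * BYTES_PER_PIXEL,
    }


plan = capture_plan(12, 9)   # a hypothetical 9 x 12 inch page
print(plan)
```

Because the long edge is constant, the storage estimate depends only on each page's aspect ratio, which is what makes the per-issue figure easy to predict.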
JPEG 2000 derivatives are generated from the master TIFF files with the following command:
kdu_compress -i YOURINPUT.tif -o YOUROUTPUT.jp2 Creversible=yes -rate -,1,0.5,0.25 \
  -jp2_space sRGB \
  -double_buffering 10 \
  -num_threads 4 \
  -no_weights \
  -quiet
To generate a JPEG 2000 file using Kakadu, use the following recipe (taken from The National Digital Newspaper Program (NDNP) Technical Guidelines for Applicants):
kdu_compress -i YOURINPUT.tif -o YOUROUTPUT.jp2 -rate 1,0.84,0.7,0.6,0.5,0.4,0.35,0.3,0.25,0.21,0.18,0.15,0.125,0.1,0.088,0.075,0.0625,0.05,0.04419,0.03716,0.03125,0.025,0.0221,0.01858,0.015625 Clevels=6 Stiles={1024,1024} Corder=RLCP
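When the recipe is applied to many files, it can help to assemble the arguments as a list rather than a shell string. In shells that perform brace expansion, such as bash, an unquoted Stiles={1024,1024} may be split into two words; a list handed to `subprocess.run` bypasses the shell entirely. The sketch below only builds the argument list from the flags quoted above; the function name `kdu_args` and the file name are hypothetical, and running the command assumes kdu_compress is installed.

```python
from pathlib import Path

# Rate ladder copied from the NDNP recipe above.
RATES = ("1,0.84,0.7,0.6,0.5,0.4,0.35,0.3,0.25,0.21,0.18,0.15,0.125,0.1,"
         "0.088,0.075,0.0625,0.05,0.04419,0.03716,0.03125,0.025,0.0221,"
         "0.01858,0.015625")


def kdu_args(tiff_path):
    """Return the kdu_compress argument list for one master TIFF."""
    source = Path(tiff_path)
    target = source.with_suffix(".jp2")   # same name, .jp2 extension
    return [
        "kdu_compress",
        "-i", str(source),
        "-o", str(target),
        "-rate", RATES,
        "Clevels=6",
        "Stiles={1024,1024}",   # passed literally; no shell is involved
        "Corder=RLCP",
    ]


args = kdu_args("bmtnaap_1921-11_01_0001.tif")
print(args)
# To execute: subprocess.run(args, check=True)
```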
1 This convention has been adopted to support the naming conventions in Veridian, which prohibit the use of integers in identifiers.
2 RFC 1737 urges all NIDs to be registered with IANA. PUL is not, to my knowledge, registered with IANA.