IIIF - jneubert/doc GitHub Wiki

IIIF with Apache and generated static files.

PM20

PM20 comprises c. 2 million digitized pages as JPEG images in three different resolutions, organized in about 25,000 folders. It was previously located at http://webopac.hwwa.de/pressemappe20 (internally pm-opac) and moved to https://pm20.zbw.eu in 2022-01.

Structure for persistent URIs

Common prefix

http://purl.org/pressemappe20/folder/

Collection

{collection} # (co|pe|sh|wa) for company/persons/subject/wares archive

Folder

{collection}/{folder-nk} # folder-nk = numerical key, e.g. "000012" or "141113,161612"

Document

{collection}/{folder-nk}/{document-id} # document-id = numerus currens for doc within folder

Page

{collection}/{folder-nk}/{document-id}/{page-no} # page-no = numerus currens for page within document, starting with 1
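
The four URI levels above can be told apart with a single pattern. The following is a minimal sketch, not part of the PM20 code base: the names `Pm20Uri` and `parse_purl` are made up for illustration, and the field widths (5 digits for document-id, 4 digits for page-no) are taken from the rewrite rules further down this page.

```python
import re
from typing import NamedTuple, Optional

PURL_PREFIX = "http://purl.org/pressemappe20/folder/"

# folder-nk: digits and commas; document-id: 5 digits; page-no: 4 digits
# (widths assumed from the rewrite rules on this page).
_PATH_RE = re.compile(
    r"^(?P<collection>co|pe|sh|wa)"
    r"(?:/(?P<folder_nk>[0-9,]+)"
    r"(?:/(?P<document_id>[0-9]{5})"
    r"(?:/(?P<page_no>[0-9]{4}))?)?)?$"
)


class Pm20Uri(NamedTuple):
    """Hypothetical container for the parts of a persistent URI."""
    collection: str
    folder_nk: Optional[str] = None
    document_id: Optional[str] = None
    page_no: Optional[str] = None

    @property
    def level(self) -> str:
        """Return which of the four URI levels this URI addresses."""
        if self.page_no is not None:
            return "page"
        if self.document_id is not None:
            return "document"
        if self.folder_nk is not None:
            return "folder"
        return "collection"


def parse_purl(uri: str) -> Pm20Uri:
    """Split a persistent URI into its parts, or raise ValueError."""
    if not uri.startswith(PURL_PREFIX):
        raise ValueError(f"not a PM20 folder URI: {uri}")
    match = _PATH_RE.match(uri[len(PURL_PREFIX):])
    if match is None:
        raise ValueError(f"unexpected path structure: {uri}")
    return Pm20Uri(**match.groupdict())


# The identifiers below are illustrative and need not exist in PM20.
print(parse_purl(PURL_PREFIX + "pe/000012/00003/0002").level)
print(parse_purl(PURL_PREFIX + "sh/141113,161612").level)
```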

Mapping/implementation

The URIs are mapped 1:1 to https://pm20.zbw.eu/folder by PURL, and from there to implementation-specific URLs via Apache rewrite rules (RewriteMap is currently not used). The PURL URIs are published for citation. Folder URIs are resolved via content negotiation to language-specific HTML files. Document and page URIs are resolved to IIIF viewer URLs that address a manifest and a specific canvas (details below). Because of intellectual property rights restrictions, manifest files exist in a "public" and an "intern" version. All internal linking within the application uses URLs pointing to files under https://pm20.zbw.eu.

Directory structure

Physical structure (as copied from pm-opac, not visible on the web):

folder
  {collection}
    {folder-hash}
      {folder-nk}
        {document-hash}       # for sh and wa, hash1/nk1/hash2/nk2
          {document-id}
            PIC
              {image-file}    # _[ABC].jpg ([format](https://www.wikidata.org/wiki/Wikidata:WikiProject_20th_Century_Press_Archives/Data_structure/Dateinamensformat), includes image id, starting with 0)

Virtual structure for IIIF:

iiif
  folder
    {collection}
      {folder-nk}
        manifest.json
        {document-id}
          ~~manifest.json~~         # implemented differently (see below)
          {page-no}             # starting with 1
            full
              {size}            # defaults to "max"
                0
                  default.jpg

Real structure for IIIF. No directory hashing is used here: ext4 limits a directory to about 65,000 sub-directories, and folder/{collection} contains at most 12,000 sub-directories (db 'institution' row count: 62,255).

iiif
  folder
    {collection}
      {folder-nk}
        .htaccess
        intern.manifest.json    # via rewrite rule
        public.manifest.json    # via rewrite rule
        {document-id}
          {page-no}             # starting with 1
            .htaccess
            info.json
            thumbnail.jpg
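
The relation between the virtual and the real layout can be sketched with a few path helpers. This is an illustration only: the function names are made up, and whether page directories are zero-padded (as in the canvas URIs further down) is an assumption.

```python
from pathlib import PurePosixPath

IIIF_ROOT = PurePosixPath("iiif/folder")


def folder_dir(collection: str, folder_nk: str) -> PurePosixPath:
    """Directory of one folder in the IIIF tree (no directory hashing)."""
    return IIIF_ROOT / collection / folder_nk


def manifest_files(collection: str, folder_nk: str) -> dict:
    """Real manifest files; manifest.json itself is only virtual."""
    base = folder_dir(collection, folder_nk)
    return {v: base / f"{v}.manifest.json" for v in ("public", "intern")}


def page_files(collection: str, folder_nk: str,
               document_id: str, page_no: str) -> list:
    """Static files that actually exist per page."""
    page_dir = folder_dir(collection, folder_nk) / document_id / page_no
    return [page_dir / name
            for name in (".htaccess", "info.json", "thumbnail.jpg")]


def virtual_image_path(collection: str, folder_nk: str, document_id: str,
                       page_no: str, size: str = "max") -> PurePosixPath:
    """Virtual IIIF image path; resolved to a physical file via .htaccess."""
    return (folder_dir(collection, folder_nk) / document_id / page_no
            / "full" / size / "0" / "default.jpg")


# Illustrative identifiers, not necessarily an existing folder.
print(manifest_files("co", "000012")["public"])
print(virtual_image_path("co", "000012", "00001", "0001"))
```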

Implementation in Image / Presentation API 3.0

Static image files

The implementation is not based on tiles but on the existing image files in different resolutions, which are served by a level 0 compliant ImageService3 via Apache (rewrite rules to physical files).

The file iiif/folder/{collection}/{folder-nk}/{document-id}/{page-no}/.htaccess (example linked below) is used for mapping virtual IIIF image URLs (under iiif/) to paths of physical files (under folder/ in different resolutions). The actual file names are not visible externally.
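
For a level 0 service without tiles, the info.json mainly has to announce the pre-generated sizes. The sketch below shows what such a file might contain under the Image API 3.0. It is not the actual generator: `build_info_json` is a hypothetical name, the service id pattern is assumed from the directory layout, and the pixel dimensions are invented.

```python
import json
from typing import Sequence, Tuple


def build_info_json(service_id: str,
                    sizes: Sequence[Tuple[int, int]]) -> dict:
    """Build a minimal Image API 3.0 info.json for static, untiled images.

    sizes: (width, height) of every available derivative; the largest
    one is reported as the full image size.
    """
    if not sizes:
        raise ValueError("at least one size is required")
    ordered = sorted(sizes)  # ascending by width, then height
    full_width, full_height = ordered[-1]
    return {
        "@context": "http://iiif.io/api/image/3/context.json",
        "id": service_id,
        "type": "ImageService3",
        "protocol": "http://iiif.io/api/image",
        "profile": "level0",
        "width": full_width,
        "height": full_height,
        # A client may only request the sizes listed here.
        "sizes": [{"width": w, "height": h} for w, h in ordered],
    }


# Assumed id pattern and invented dimensions for three resolutions.
info = build_info_json(
    "https://pm20.zbw.eu/iiif/folder/co/000012/00001/0001",
    [(2000, 3000), (500, 750), (1000, 1500)],
)
print(json.dumps(info, indent=2))
```

A viewer that honours `sizes` would then request, for example, `{id}/full/500,750/0/default.jpg`, which the per-page .htaccess maps to the matching physical file.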

Unlike Mirador, Universal Viewer currently (as of 04/2022) does not use the different resolutions offered.

Addressing documents and pages

The document structure, which is needed for the persistent document and page URIs and for the table of contents shown when viewing folders, is implemented as the structures element in the folder manifest. It contains Range items, each representing the (possibly multiple) pages of a document. The {folder-nk}/{document-id}/{page-no} hierarchy is reflected in the Canvas URIs.
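
A `structures` element of this kind could be assembled roughly as follows. This is a sketch under assumptions: the canvas URI pattern is taken from the rewrite rules on this page, while the Range id pattern, the placeholder label, and the function names are hypothetical.

```python
import json
from typing import Dict, List

BASE = "https://pm20.zbw.eu/iiif/folder"


def canvas_id(collection: str, folder_nk: str,
              document_id: str, page_no: int) -> str:
    """Canvas URI reflecting the folder/document/page hierarchy."""
    return f"{BASE}/{collection}/{folder_nk}/{document_id}/{page_no:04d}/canvas"


def build_structures(collection: str, folder_nk: str,
                     documents: Dict[str, int]) -> List[dict]:
    """Build one Range per document.

    documents maps document-id to its page count.
    """
    ranges = []
    for document_id, page_count in documents.items():
        ranges.append({
            # Range id pattern is an assumption, not the real one.
            "id": f"{BASE}/{collection}/{folder_nk}/{document_id}/range",
            "type": "Range",
            # Placeholder label; real labels would carry document metadata.
            "label": {"none": [document_id]},
            "items": [
                {"id": canvas_id(collection, folder_nk, document_id, n),
                 "type": "Canvas"}
                for n in range(1, page_count + 1)
            ],
        })
    return ranges


# Illustrative folder with a two-page and a one-page document.
structures = build_structures("co", "000012", {"00001": 2, "00002": 1})
print(json.dumps(structures, indent=2))
```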

As of 04/2022, metadata for ranges is apparently not shown in Mirador and Universal Viewer.

Mapping of document and page URIs to "start" canvases (pages) for the viewer

Implemented in two steps, using /iiifview/ as an intermediate level.

in folder/.htaccess:

# URIs for documents - rewrite to iiif view
RewriteRule "^(co|pe|sh|wa)/([0-9,]+)/([0-9]{5})$" "https://pm20.zbw.eu/iiifview/folder/$1/$2/$3"

# URIs for pages of documents - rewrite to iiif view
RewriteRule "^(co|pe|sh|wa)/([0-9,]+)/([0-9]{5})/([0-9]{4})$" "https://pm20.zbw.eu/iiifview/folder/$1/$2/$3/$4"

in .htaccess:

# IIIF viewer links for documents
RewriteRule "^iiifview/folder/(co|pe|sh|wa)/([0-9,]+)/([0-9]{5})$" "https://pm20.zbw.eu/mirador/?manifestId=https://pm20.zbw.eu/iiif/folder/$1/$2/manifest.json&canvasId=https://pm20.zbw.eu/iiif/folder/$1/$2/$3/0001/canvas"

# IIIF viewer links for pages of documents
RewriteRule "^iiifview/folder/(co|pe|sh|wa)/([0-9,]+)/([0-9]{5})/([0-9]{4})$" "https://pm20.zbw.eu/mirador/?manifestId=https://pm20.zbw.eu/iiif/folder/$1/$2/manifest.json&canvasId=https://pm20.zbw.eu/iiif/folder/$1/$2/$3/$4/canvas"
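
To check which viewer URL a given document or page path should end up at, the two rewrite steps can be mirrored in a few lines. This is only a simulation of the rules above for testing purposes, not something that runs on the server; the function names are made up.

```python
import re
from typing import Optional

SITE = "https://pm20.zbw.eu"
DOC_RE = re.compile(r"^(co|pe|sh|wa)/([0-9,]+)/([0-9]{5})$")
PAGE_RE = re.compile(r"^(co|pe|sh|wa)/([0-9,]+)/([0-9]{5})/([0-9]{4})$")


def step1_to_iiifview(path: str) -> Optional[str]:
    """Mirror folder/.htaccess; path is relative to /folder/."""
    if DOC_RE.match(path) or PAGE_RE.match(path):
        return f"{SITE}/iiifview/folder/{path}"
    return None  # no rule applies


def step2_to_viewer(path: str) -> Optional[str]:
    """Mirror the top-level .htaccess; path is relative to the site root."""
    prefix = "iiifview/folder/"
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    page = PAGE_RE.match(rest)
    if page:
        collection, folder_nk, document_id, page_no = page.groups()
    else:
        doc = DOC_RE.match(rest)
        if not doc:
            return None
        collection, folder_nk, document_id = doc.groups()
        page_no = "0001"  # document URIs open at the first page
    folder = f"{SITE}/iiif/folder/{collection}/{folder_nk}"
    return (f"{SITE}/mirador/?manifestId={folder}/manifest.json"
            f"&canvasId={folder}/{document_id}/{page_no}/canvas")


# Illustrative identifiers, not necessarily an existing document.
first = step1_to_iiifview("co/000012/00003/0002")
print(first)
print(step2_to_viewer(first[len(SITE) + 1:]))
```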

Examples

Previous experimental implementation in 2.x

Examples

Schluchseewerk AG manifest.json, example info.json, example .htaccess.

View with Mirador, Universal Viewer, Tify.

Validators

Viewer demo services

Infrastructure

Browsers and viewers normally require CORS headers to be set. Apache apparently needs an .htaccess file with these headers at least one level above the manifest.json.

Header add Access-Control-Allow-Origin "*"
Header add Access-Control-Allow-Headers "origin, x-requested-with, content-type"
Header add Access-Control-Allow-Methods "PUT, GET, POST, DELETE, OPTIONS"

Weblinks