The Matterhorn Protocol, sorted

What is it?

"Intended for software developers and document testers, the Matterhorn Protocol is designed to foster PDF/UA adoption by specifying a common set of tests to facilitate the exchange of detailed information on PDF/UA conformance."

The protocol is available as a PDF from the PDF Association website.

In short, it consists of 31 Checkpoints comprising 136 Failure Conditions. Of those, 87 can be determined by software alone, 47 usually require human judgment, and 2 have no specific tests (as of the document's publication in 2014).
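The rows listed on this page share a fixed column order (Index, Failure Condition, Section, Type, How, See), so the counts quoted above can be re-derived by parsing them. The following is a minimal sketch, assuming each row sits on one line exactly as printed here; `FailureCondition`, `parse_row`, and `tally_by_how` are hypothetical names, not part of the Matterhorn Protocol or any library.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed row layout: Index, Failure Condition, Section, Type, How, See.
ROW = re.compile(
    r"^(?P<index>\d{2}-\d{3})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<section>UA1:\S+)\s+"
    r"(?P<kind>Doc|Page|Object|JS|All|--)\s+"
    r"(?P<how>Machine|Human|--)\s+"
    r"(?P<see>-|\d{2}-\d{3}(?:,\s*\d{2}-\d{3})*)$"
)


@dataclass(frozen=True)
class FailureCondition:
    index: str        # e.g. "01-003"
    description: str  # the failure condition text, including any NOTE
    section: str      # e.g. "UA1:7.1-1"
    kind: str         # the "Type" column: Doc, Page, Object, JS, All or "--"
    how: str          # Machine, Human or "--"
    see: str          # related indexes, or "-" when there are none


def parse_row(line: str) -> Optional[FailureCondition]:
    """Return the parsed row, or None for headings and header rows."""
    match = ROW.match(line.strip())
    return FailureCondition(**match.groupdict()) if match else None


def tally_by_how(lines: Iterable[str]) -> Counter:
    """Count failure conditions per value of the "How" column."""
    rows = (parse_row(line) for line in lines)
    return Counter(row.how for row in rows if row is not None)
```

Feeding every line of this page to `tally_by_how` is one way to cross-check the Machine, Human, and untestable totals against the figures stated above.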

Checkpoints

Machine

Checkpoint 01: Real content tagged
Index Failure Condition Section Type How See
01-003 Content marked as Artifact is present inside tagged content. UA1:7.1-1 Object Machine -
01-004 Tagged content is present inside content marked as Artifact. UA1:7.1-1 Object Machine -
01-005 Content is neither marked as Artifact nor tagged as real content. UA1:7.1-2 Object Machine -
01-007 Suspect entry has a value of true. UA1:7.1-11 Doc Machine -
Checkpoint 02: Role Mapping
Index Failure Condition Section Type How See
02-001 One or more non-standard tag’s mapping does not terminate with a standard type. NOTE: Although PDF/UA defines the nomenclature for heading levels above H6 (Hn), these are not standard structure types (as defined in ISO 32000-1) and therefore Hn tags must (PDF/UA-1 7.1, paragraph 1) be rolemapped to a standard structure type. According to PDF/UA-1, PDF/UA-conforming processors are expected to ignore such mappings and respect the heading level. UA1:7.1-3 Doc Machine -
02-003 A circular mapping exists. UA1:7.1-3 Doc Machine -
02-004 One or more standard types are remapped. UA1:7.1-4 Doc Machine -
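The three machine-checkable role-mapping conditions above amount to following each chain in the role map. The following is a minimal sketch, assuming the RoleMap has already been read into a plain Python dict of tag name to mapped name; `check_role_map` is a hypothetical helper, and the set of standard types is the list given under 01-006 on this page.

```python
# Standard structure types, as enumerated under failure condition 01-006.
STANDARD_TYPES = frozenset(
    "Document Part Art Sect Div BlockQuote Caption TOC TOCI Index NonStruct "
    "Private P H H1 H2 H3 H4 H5 H6 L LI Lbl LBody Table TR TH TD THead TBody "
    "TFoot Span Quote Note Reference BibEntry Code Link Annot Ruby Warichu "
    "RB RT RP WT WP Figure Formula Form".split()
)


def check_role_map(role_map: dict) -> list:
    """Return (failure index, tag) pairs for a RoleMap given as a dict."""
    findings = []
    for tag in role_map:
        if tag in STANDARD_TYPES:
            findings.append(("02-004", tag))  # a standard type is remapped
            continue
        seen = {tag}
        current = role_map[tag]
        while current not in STANDARD_TYPES:
            if current in seen:
                findings.append(("02-003", tag))  # circular mapping
                break
            if current not in role_map:
                findings.append(("02-001", tag))  # chain ends on a non-standard type
                break
            seen.add(current)
            current = role_map[current]
    return findings
```

Because H7 and higher are not in the standard set, a document using them passes this sketch only if they are role-mapped, which matches the note under 02-001.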
Checkpoint 06: Metadata
Index Failure Condition Section Type How See
06-001 Document does not contain an XMP metadata stream. UA1:7.1-8 Doc Machine -
06-002 The metadata stream in the Catalog dictionary does not include the PDF/UA identifier. UA1:5 Doc Machine -
06-003 Metadata stream does not contain dc:title. UA1:7.1-8 Doc Machine -
Checkpoint 07: Dictionary
Index Failure Condition Section Type How See
07-001 ViewerPreferences dictionary of the Catalog dictionary does not contain DisplayDocTitle key. UA1:7.1-9 Doc Machine -
07-002 ViewerPreferences dictionary of the Catalog dictionary contains DisplayDocTitle key with a value of false. UA1:7.1-9 Doc Machine -
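Both dictionary conditions reduce to a lookup in the Catalog. The following is a minimal sketch, assuming the Catalog has been converted to nested Python dicts with PDF booleans as `True`/`False`; `check_display_doc_title` is a hypothetical name.

```python
from typing import Optional


def check_display_doc_title(catalog: dict) -> Optional[str]:
    """Return the failing index (07-001 or 07-002), or None if the check passes."""
    prefs = catalog.get("ViewerPreferences") or {}
    if "DisplayDocTitle" not in prefs:
        return "07-001"  # key is absent (or there is no ViewerPreferences at all)
    if prefs["DisplayDocTitle"] is not True:
        return "07-002"  # key is present but not true
    return None
```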
Checkpoint 09: Appropriate Tags
Index Failure Condition Section Type How See
09-004 A table-related structure element is used in a way that does not conform to the syntax defined in ISO 32000-1, Table 337. UA1:7.2-1 Object Machine -
09-005 A list-related structure element is used in a way that does not conform to Table 336 in ISO 32000-1. UA1:7.2-1 Object Machine -
09-006 A TOC-related structure element is used in a way that does not conform to Table 333 in ISO 32000-1. UA1:7.2-1 Object Machine -
09-007 A Ruby-related structure element is used in a way that does not conform to Table 338 in ISO 32000-1. UA1:7.2-1 Object Machine -
09-008 A Warichu-related structure element is used in a way that does not conform to Table 338 in ISO 32000-1. UA1:7.2-1 Object Machine -
Checkpoint 10: Character Mappings
Index Failure Condition Section Type How See
10-001 Character code cannot be mapped to Unicode. UA1:7.2-2 Object Machine -
Checkpoint 11: Declared Natural Language
Index Failure Condition Section Type How See
11-001 Natural language for text in page content cannot be determined. UA1:7.2-3 Object Machine -
11-002 Natural language for text in “Alt”, “ActualText” and “E” attributes cannot be determined. UA1:7.2-3 Object Machine -
11-003 Natural language in the Outline entries cannot be determined. UA1:7.2-3 Object Machine -
11-004 Natural language in the “Contents” entry for annotations cannot be determined. UA1:7.2-3 Object Machine -
11-005 Natural language in the TU key for form fields cannot be determined. UA1:7.2-3 Object Machine -
11-006 Natural language for document metadata cannot be determined. UA1:7.2-3 Doc Machine -
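A language "cannot be determined" when no Lang value applies to the content in question. One way to test this is to walk from the element up through its ancestors and fall back to the document default. The following is a minimal sketch under stated assumptions: structure elements are plain dicts with an optional `"Lang"` entry and a `"parent"` reference (a hypothetical in-memory shape), and an empty Lang string is treated as "language unknown", which is my reading of ISO 32000-1.

```python
from typing import Optional


def resolve_lang(element: dict, catalog_lang: Optional[str] = None) -> Optional[str]:
    """Return the effective language for an element, or None if undetermined."""
    node = element
    while node is not None:
        if "Lang" in node:
            # The nearest Lang entry wins; an empty string counts as unknown.
            return node["Lang"] or None
        node = node.get("parent")
    # No element in the chain declares a language: use the document default.
    return catalog_lang or None
```

A checker would report the relevant 11-00x condition whenever `resolve_lang` returns `None` for text-bearing content.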
Checkpoint 13: Graphics
Index Failure Condition Section Type How See
13-004 Figure tag alternative or replacement text missing. UA1:7.3-3 Object Machine -
Checkpoint 14: Headings
Index Failure Condition Section Type How See
14-002 Does use numbered headings, but the first heading tag is not H1. UA1:7.4.2-1 Doc Machine -
14-003 Numbered heading levels in descending sequence are skipped (Example: H3 follows directly after H1). UA1:7.4-1 Doc Machine -
14-006 A node contains more than one H tag. UA1:7.4.4-1 Object Machine -
14-007 Document uses both H and H# tags. NOTE: In weakly-structured documents, headings always take the form “Hn” (e.g., H1, H2, Hn) without intervening whitespace or numerical separators. UA1:7.4.4-3 Doc Machine -
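Three of the heading conditions above (14-002, 14-003, 14-007) can be evaluated from the sequence of heading tags alone. The following is a minimal sketch, assuming the tags have already been collected in document order after role mapping; `check_heading_sequence` is a hypothetical helper, and 14-006 is left out because it needs the structure tree rather than a flat sequence.

```python
import re

# Numbered heading tags: H1, H2, ... with no separators.
NUMBERED = re.compile(r"^H([1-9][0-9]*)$")


def check_heading_sequence(tags: list) -> list:
    """Return failing indexes for a flat, in-order list of structure tags."""
    levels = [int(m.group(1)) for t in tags if (m := NUMBERED.match(t))]
    findings = []
    if levels and levels[0] != 1:
        findings.append("14-002")  # first numbered heading is not H1
    if any(nxt > cur + 1 for cur, nxt in zip(levels, levels[1:])):
        findings.append("14-003")  # a level is skipped going down, e.g. H1 then H3
    if levels and "H" in tags:
        findings.append("14-007")  # H and Hn are mixed in one document
    return findings
```

Moving back up several levels at once (H3 followed by H1) is not flagged, since 14-003 only concerns skipped levels in descending sequence.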
Checkpoint 15: Tables
Index Failure Condition Section Type How See
15-003 In a table not organized with Headers attributes and IDs, a TH cell does not contain a Scope attribute. UA1:7.5-2 Object Machine -
Checkpoint 17: Mathematical Expressions
Index Failure Condition Section Type How See
17-002 Formula tag is missing an Alt attribute. UA1:7.7-1 Object Machine -
17-003 Unicode mapping requirements are not met. UA1:7.7-2 Object Machine 10-001
Checkpoint 19: Notes and References
Index Failure Condition Section Type How See
19-003 ID key of the Note tag is not present. UA1:7.9-2 Object Machine -
19-004 ID key of the Note tag is non-unique. UA1:7.9-2 Object Machine -
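Both Note conditions can be checked in one pass over the Note elements. The following is a minimal sketch, assuming each Note element is a dict with an optional `"ID"` entry; `check_note_ids` is a hypothetical helper, and it simplifies by comparing IDs among Note elements only rather than across every structure element.

```python
from collections import Counter


def check_note_ids(notes: list) -> list:
    """Return (failure index, position) pairs for a list of Note elements."""
    ids = [note.get("ID") for note in notes]
    counts = Counter(i for i in ids if i is not None)
    findings = []
    for position, note_id in enumerate(ids):
        if note_id is None:
            findings.append(("19-003", position))  # ID key is missing
        elif counts[note_id] > 1:
            findings.append(("19-004", position))  # ID is shared with another Note
    return findings
```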
Checkpoint 20: Optional Content
Index Failure Condition Section Type How See
20-001 Name entry is missing or has an empty string as its value in an Optional Content Configuration Dictionary in the Configs entry in the OCProperties entry in the Catalog dictionary. UA1:7.10-1 Object Machine -
20-002 Name entry is missing or has an empty string as its value in an Optional Content Configuration Dictionary that is the value of the D entry in the OCProperties entry in the Catalog dictionary. UA1:7.10-1 Object Machine -
20-003 The AS key appears in an Optional Content Configuration Dictionary. UA1:7.10-2 Object Machine -
Checkpoint 21: Embedded Files
Index Failure Condition Section Type How See
21-001 The file specification dictionary for an embedded file does not contain F and UF keys. UA1:7.11-1 Object Machine -
Checkpoint 25: XFA
Index Failure Condition Section Type How See
25-001 File contains the dynamicRender element with value “required”. UA1:7.15-1 Object Machine -
Checkpoint 26: Security
Index Failure Condition Section Type How See
26-001 The file is encrypted but does not contain a P key in its encryption dictionary. UA1:7.16-1 Object Machine -
26-002 The file is encrypted and does contain a P key but the 10th bit position of the P key is false. UA1:7.16-1 Object Machine -
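The "10th bit position" is counted from 1 at the low-order end, so the test is a single bit mask on the integer value of P. The following is a minimal sketch, assuming the encryption dictionary is available as a Python dict (or `None` for an unencrypted file) and that P is stored as a signed integer; `check_encryption` is a hypothetical name.

```python
from typing import Optional

ACCESSIBILITY_BIT = 10  # 1-based bit position in the P value


def check_encryption(encrypt_dict: Optional[dict]) -> list:
    """Return failing indexes for the Security checkpoint."""
    if encrypt_dict is None:
        return []  # file is not encrypted, nothing to check
    if "P" not in encrypt_dict:
        return ["26-001"]
    # Python's >> and & treat negative ints as two's complement,
    # so a signed P value needs no conversion first.
    if not (encrypt_dict["P"] >> (ACCESSIBILITY_BIT - 1)) & 1:
        return ["26-002"]
    return []
```

For example, a P value of -3904 has bit 10 clear and fails 26-002, while -44 has it set and passes.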
Checkpoint 28: Annotations
Index Failure Condition Section Type How See
28-002 An annotation, excluding annotations of subtype Widget, Popup or Link, is not nested within an Annot tag. UA1:7.18.1-2 Object Machine -
28-004 An annotation other than annotations of type Link, Widget or Popup whose hidden flag is not set and whose rectangle is not outside the cropbox and does not have a Contents key does not have an alternative description (in the form of an Alt entry in the enclosing structure element). UA1:7.18.1-4 Object Machine -
28-005 A form field whose hidden flag is not set and whose rectangle is not outside the crop-box and does not have a TU key does not have an alternative description (in the form of an Alt entry in the enclosing structure element). UA1:7.18.1-4 Object Machine -
28-006 An annotation with subtype undefined in ISO 32000 does not meet 7.18.1. UA1:7.18.2-1 Object Machine 28-001, 28-002, 28-003, 28-004
28-007 An annotation of subtype TrapNet exists. UA1:7.18.2-2 Object Machine -
28-008 A page containing an annotation does not contain a Tabs key. UA1:7.18.3-1 Object Machine -
28-009 A page containing an annotation has a Tabs key with a value other than S. UA1:7.18.3-1 Object Machine -
28-010 A widget annotation is not nested within a Form tag. UA1:7.18.4-1 Object Machine -
28-011 A link annotation is not nested within a Link tag. UA1:7.18.5-1 Object Machine -
28-012 A link annotation does not include an alternate description in the Contents Key. UA1:7.18.5-2 Object Machine -
28-014 CT key is missing from the media clip data dictionary. UA1:7.18.6.2-1 Object Machine -
28-015 Alt key is missing from the media clip data dictionary. UA1:7.18.6.2-1 Object Machine -
28-016 File attachment annotations do not conform to 7.11. UA1:7.18.7-1 Object Machine 21-001
28-017 A PrinterMark annotation is included in logical structure. UA1:7.18.8-1 Object Machine -
28-018 The appearance stream of a PrinterMark annotation is not marked as Artifact. UA1:7.18.8-2 Object Machine 01-002, 01-005
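Of the annotation conditions above, 28-008 and 28-009 are the simplest to automate, since they only inspect the page dictionary. The following is a minimal sketch, assuming each page is a dict whose `"Annots"` entry is a list and whose `"Tabs"` entry is a plain string; `check_tab_order` is a hypothetical helper.

```python
def check_tab_order(page: dict) -> list:
    """Return failing indexes for the Tabs requirement on one page."""
    if not page.get("Annots"):
        return []  # pages without annotations are not subject to the check
    if "Tabs" not in page:
        return ["28-008"]
    if page["Tabs"] != "S":
        return ["28-009"]  # tab order must follow the structure
    return []
```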
Checkpoint 30: XObjects
Index Failure Condition Section Type How See
30-001 A reference XObject is present. UA1:7.20-1 Object Machine -
30-002 Form XObject contains MCIDs and is referenced more than once. UA1:7.20-2 Object Machine -
Checkpoint 31: Fonts
Index Failure Condition Section Type How See
31-001 A Type 0 font dictionary with encoding other than Identity-H and Identity-V has values for Registry in both CIDSystemInfo dictionaries that are not identical. UA1:7.21.3-1 Object Machine -
31-002 A Type 0 font dictionary with encoding other than Identity-H and Identity-V has values for Ordering in both CIDSystemInfo dictionaries that are not identical. UA1:7.21.3.1-1 Object Machine -
31-003 A Type 0 font dictionary with encoding other than Identity-H and Identity-V has a value for Supplement in the CIDSystemInfo dictionary of the CID font that is less than the value for Supplement in the CIDSystemInfo dictionary of the CMap. UA1:7.21.3.1-1 Object Machine -
31-004 A Type 2 CID font contains neither a stream nor the name Identity as the value of the CIDToGIDMap entry. UA1:7.21.3.2-1 Object Machine -
31-005 A Type 2 CID font does not contain a CIDToGIDMap entry. UA1:7.21.3.2-1 Object Machine -
31-006 A CMap is neither listed as described in ISO 32000-1:2008, 9.7.5.2, Table 118 nor is it embedded. UA1:7.21.3.3-1 Object Machine -
31-007 The WMode entry in a CMap dictionary is not identical to the WMode value in the CMap stream. UA1:7.21.3.3-1 Object Machine -
31-008 A CMap references another CMap which is not listed in ISO 32000-1:2008, 9.7.5.2, Table 118. NOTE 1: For more information see ISO 32000-1 9.7.5.3, UseCMap entry. UA1:7.21.3.3-2 Object Machine -
31-009 For a font used by text intended to be rendered the font program is not embedded. NOTE 2: A glyph is used for rendering if the text render mode is not equal to 3 (text render mode 3 is used for invisible text). UA1:7.21.4.1-1 Object Machine -
31-011 For a font used by text the font program is embedded but it does not contain glyphs for all of the glyphs referenced by the text used for rendering. NOTE 3: A glyph is used for rendering if the text render mode is not equal to 3 (text render mode 3 is used for invisible text). UA1:7.21.4.1-3 Object Machine -
31-012 The FontDescriptor dictionary of an embedded Type 1 font contains a CharSet string, but at least one of the glyphs present in the font program is not listed in the CharSet string. UA1:7.21.4.2-1 Object Machine -
31-013 The FontDescriptor dictionary of an embedded Type 1 font contains a CharSet string, but at least one of the glyphs listed in the CharSet string is not present in the font program. UA1:7.21.4.2-2 Object Machine -
31-014 The FontDescriptor dictionary of an embedded CID font contains a CIDSet string, but at least one of the glyphs present in the font program is not listed in the CIDSet string. UA1:7.21.4.2-3 Object Machine -
31-015 The FontDescriptor dictionary of an embedded CID font contains a CIDSet string, but at least one of the glyphs listed in the CIDSet string is not present in the font program. UA1:7.21.4.2-4 Object Machine -
31-016 For one or more glyphs, the glyph width information in the font dictionary and in the embedded font program differ by more than 1/1000 unit. UA1:7.21.5-1 Object Machine -
31-017 A non-symbolic TrueType font is used for rendering, but none of the cmap entries in the embedded font program is a non-symbolic cmap. UA1:7.21.6-1 Object Machine -
31-018 A non-symbolic TrueType font is used for rendering, but for at least one glyph to be rendered the glyph cannot be looked up by any of the non-symbolic cmap entries in the embedded font program. UA1:7.21.6-2 Object Machine -
31-019 The font dictionary for a non-symbolic TrueType font does not contain an Encoding entry. UA1:7.21.6-3 Object Machine -
31-020 The font dictionary for a non-symbolic TrueType font contains an Encoding dictionary which does not contain a BaseEncoding entry. UA1:7.21.6-4 Object Machine -
31-021 The value for either the Encoding entry or the BaseEncoding entry in the Encoding dictionary in a non-symbolic TrueType font dictionary is neither MacRomanEncoding nor WinAnsiEncoding. UA1:7.21.6-5 Object Machine -
31-022 The Differences array in the Encoding entry in a non-symbolic TrueType font dictionary contains one or more glyph names which are not listed in the Adobe Glyph List. UA1:7.21.6-6 Object Machine -
31-023 The Differences array is present in the Encoding entry in a non-symbolic TrueType font dictionary but the embedded font program does not contain a (3,1) Microsoft Unicode cmap. UA1:7.21.6-7 Object Machine -
31-024 The Encoding entry is present in the font dictionary for a symbolic TrueType font. UA1:7.21.6-8 Object Machine -
31-025 The embedded font program for a symbolic TrueType font contains no cmap. UA1:7.21.6-9 Object Machine -
31-026 The embedded font program for a symbolic TrueType font contains more than one cmap, but none of the cmap entries is a (3,0) Microsoft Symbol cmap. UA1:7.21.6-10 Object Machine -
31-027 A font dictionary does not contain the ToUnicode entry and none of the following is true: the font uses MacRomanEncoding, MacExpertEncoding or WinAnsiEncoding; the font is a Type 1 or Type 3 font and the glyph names of the glyphs referenced are all contained in the Adobe Glyph List or the set of named characters in the Symbol font, as defined in ISO 32000-1:2008, Annex D; the font is a Type 0 font, and its descendant CIDFont uses Adobe-GB1, Adobe-CNS1, Adobe-Japan1 or Adobe-Korea1 character collections; the font is a non-symbolic TrueType font. UA1:7.21.7-1 Object Machine -
31-028 One or more Unicode values specified in the ToUnicode CMap are zero (0). UA1:7.21.7-2 Object Machine -
31-029 One or more Unicode values specified in the ToUnicode CMap are equal to either U+FEFF or U+FFFE. UA1:7.21.7-3 Object Machine -
31-030 One or more characters used in text showing operators reference the .notdef glyph. UA1:7.21.8-1 Object Machine -
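Conditions 31-028 and 31-029 are plain value checks once a ToUnicode CMap has been decoded. The following is a minimal sketch, assuming the CMap has already been parsed into a dict from character code to the Unicode string it maps to; `check_to_unicode` is a hypothetical helper and does no CMap parsing itself.

```python
def check_to_unicode(mapping: dict) -> list:
    """Return sorted failing indexes for a decoded ToUnicode mapping."""
    findings = set()
    for text in mapping.values():
        for char in text:
            code_point = ord(char)
            if code_point == 0:
                findings.add("31-028")  # maps to U+0000
            elif code_point in (0xFEFF, 0xFFFE):
                findings.add("31-029")  # maps to U+FEFF or U+FFFE
    return sorted(findings)
```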

Human

Checkpoint 01: Real content tagged
Index Failure Condition Section Type How See
01-001 Artifact is tagged as real content. UA1:7.1-1 Object Human -
01-002 Real content is marked as artifact. UA1:7.1-1 Object Human -
01-006 The structure type and attributes of a structure element are not semantically appropriate for the structure element. All of the following structure types must be taken into account: Document, Part, Art, Sect, Div, BlockQuote, Caption, TOC, TOCI, Index, NonStruct, Private, P, H, H1, H2, H3, H4, H5, H6, L, LI, Lbl, LBody, Table, TR, TH, TD, THead, TBody, TFoot, Span, Quote, Note, Reference, BibEntry, Code, Link, Annot, Ruby, Warichu, RB, RT, RP, WT, WP, Figure, Formula, Form. NOTE 1: Structure type is not semantically appropriate if the nature of the content inside the structure element does not match the structure type of the structure element. NOTE 2: For any non-standard structure types, the standard structure type to which the type is rolemapped shall be used for validation. NOTE 3: Tables are regular when the number of logical cells is equal in each row after accounting for rowspan and colspan attributes. While PDF/UA-1 does not prohibit irregular tables, irregular tables are almost always a strong indicator of improper table structure. It may be a good idea to raise a warning when such tables are encountered, but it is not required by the Matterhorn Protocol. NOTE 4: The value of table cell attributes is a function of the cell’s semantic role in the table’s structure. Therefore, a TH cell may not include a Scope attribute with an inappropriate value. UA1:7.1-2 Object Human -
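NOTE 3 above defines a regular table by its logical cell count per row, which software can compute even though 01-006 as a whole needs human judgment. The following is a minimal sketch, assuming each row is a list of `(rowspan, colspan)` tuples in reading order; `logical_row_widths` and `is_regular` are hypothetical names, and spans that run past the last row are simply clipped.

```python
def logical_row_widths(rows: list) -> list:
    """Count logical cells per row after applying rowspan and colspan."""
    widths = [0] * len(rows)
    for row_index, cells in enumerate(rows):
        for rowspan, colspan in cells:
            # A cell occupies `colspan` columns in every row it spans.
            last = min(row_index + rowspan, len(rows))
            for covered in range(row_index, last):
                widths[covered] += colspan
    return widths


def is_regular(rows: list) -> bool:
    """True when every row has the same number of logical cells."""
    return len(set(logical_row_widths(rows))) <= 1
```

As the note says, an irregular result is grounds for a warning rather than a failure, so a tool might surface `is_regular(...) == False` as a hint for the human reviewer.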
Checkpoint 02: Role Mapping
Index Failure Condition Section Type How See
02-002 The mapping of one or more non-standard types is semantically inappropriate. UA1:7.1-3 Doc Human -
Checkpoint 03: Flickering

(Note: These are all Human)

Index Failure Condition Section Type How See
03-001 One or more Actions lead to flickering. UA1:7.1-5 Page Human -
03-002 One or more multimedia objects contain flickering content. UA1:7.1-5 Object Human -
03-003 One or more JavaScript actions lead to flickering. UA1:7.1-5 JS Human -
Checkpoint 04: Color and Contrast

(Note: This is Human only, although color contrast itself is something that can be checked automatically on a web page.)

Index Failure Condition Section Type How See
04-001 Information is conveyed by contrast, color, format or layout, or some combination thereof but the content is not tagged to reflect all meaning conveyed by the use of contrast, color, format or layout, or some combination thereof. UA1:7.1-6 Object Human -
Checkpoint 05: Sound

(Note: These are all Human)

Index Failure Condition Section Type How See
05-001 Media annotation present, but audio content not available in another form. NOTE 1: An example of another form is a transcript. UA1:7.1-7 Object Human -
05-002 Audio annotation present, but content not available in another form. NOTE 2: An example of another form is a transcript. UA1:7.1-7 Object Human -
05-003 JavaScript uses beep function but does not provide another means of notification. UA1:7.1-7 JS Human -
Checkpoint 06: Metadata
Index Failure Condition Section Type How See
06-004 dc:title does not clearly identify the document. UA1:7.1-8 Doc Human -
Checkpoint 08: OCR Validation

(Note: These are all Human)

Index Failure Condition Section Type How See
08-001 OCR-generated text contains significant errors. UA1:7.1-10 Page Human -
08-002 OCR-generated text is not tagged. UA1:7.1-10 Page Human 01-006
Checkpoint 09: Appropriate Tags
Index Failure Condition Section Type How See
09-001 Tags are not in logical reading order. UA1:7.2-1 Doc Human -
09-002 Structure elements are nested in a semantically inappropriate manner. (e.g., a table inside a heading). UA1:7.2-1 Object Human -
09-003 The structure type (after applying any role-mapping as necessary) of a structure element is not semantically appropriate. UA1:7.2-1 Object Human 01-006
Checkpoint 11: Declared Natural Language
Index Failure Condition Section Type How See
11-007 Natural language is not appropriate. UA1:7.2-3 All Human -
Checkpoint 12: Stretchable Characters

(Note: Human Only)

Index Failure Condition Section Type How See
12-001 Stretched characters are not represented appropriately. UA1:7.2-4 Object Human -
Checkpoint 13: Graphics
Index Failure Condition Section Type How See
13-001 Graphics objects other than text objects and artifacts are not tagged with a Figure tag. UA1:7.3-1 Object Human -
13-002 A link with a meaningful background does not include alternative text describing both the link and the graphic’s purpose. UA1:7.3-1 Object Human -
13-003 A caption is not tagged with a Caption tag. UA1:7.3-2 Object Human -
13-005 Actual text used for a Figure for which Alternative text is more appropriate. UA1:7.3-4 Object Human -
13-006 A graphics object that possesses semantic value only within a group of graphics objects is tagged on its own. UA1:7.3-5 Object Human -
13-007 A more accessible representation is not used. UA1:7.3-6 Object Human -
Checkpoint 14: Headings
Index Failure Condition Section Type How See
14-001 Headings are not tagged. UA1:7.4-1 Doc Human 01-006
14-004 Numbered heading tags do not use Arabic numerals. UA1:7.4.3-1 Object Human 01-006
14-005 Content representing a 7th level (or higher) heading does not use an “H7” (or higher) tag. UA1:7.4.3-1 Object Human 01-006
Checkpoint 15: Tables
Index Failure Condition Section Type How See
15-001 A row has a header cell, but that header cell is not tagged as a header. UA1:7.5-1 Object Human -
15-002 A column has a header cell, but that header cell is not tagged as a header. UA1:7.5-1 Object Human -
15-004 Content is tagged as a table for information that is not organized in rows and columns. UA1:7.5-3 Object Human -
15-005 A given cell’s header cannot be unambiguously determined. UA1:7.5-2 Object Human 01-006
Checkpoint 16: Lists

(Note: These are all Human)

Index Failure Condition Section Type How See
16-001 List is an ordered list, but no value for the ListNumbering attribute is present. UA1:7.6-1 Object Human -
16-002 List is an ordered list, but the ListNumbering value is not one of the following: Decimal, UpperRoman, LowerRoman, UpperAlpha, LowerAlpha. UA1:7.6-1 Object Human -
16-003 Content is a list but is not tagged as a list. UA1:7.6-2 Object Human 01-006
Checkpoint 17: Mathematical Expressions
Index Failure Condition Section Type How See
17-001 Content is a mathematical expression but is not tagged with a Formula tag. UA1:7.7-1 Object Human 01-006
Checkpoint 18: Page Headers and Footers

(Note: These are both Human)

Index Failure Condition Section Type How See
18-001 Headers and footers are not marked as pagination artifacts. UA1:7.8-1 Object Human -
18-002 Header or footer artifacts are not classified as Header or Footer subtypes. UA1:7.8-1 Object Human -
Checkpoint 19: Notes and References
Index Failure Condition Section Type How See
19-001 Footnotes, endnotes, note labels are not tagged as Note. UA1:7.9-1 Object Human -
19-002 References are not tagged as Reference. UA1:7.9-1 Object Human -
Checkpoint 22: Article Threads

(Note: Human Only)

Index Failure Condition Section Type How See
22-001 Article threads do not reflect logical reading order. UA1:7.12-1 Object Human -
Checkpoint 24: Non-Interactive Forms

(Note: Human Only)

Index Failure Condition Section Type How See
24-001 Non-interactive forms are not tagged with the PrintFields attribute. UA1:7.14-1 Object Human -
Checkpoint 28: Annotations
Index Failure Condition Section Type How See
28-001 An annotation whose hidden flag is not set and whose rectangle is not outside the crop-box is not in correct reading order. UA1:7.18.1-2 Object Human -
28-003 An annotation whose hidden flag is not set and whose rectangle is not outside the crop-box is used for visual formatting but is not tagged according to its semantic function. UA1:7.18.1-3 Object Human -
28-013 The IsMap key is present with a value of true but the functionality is not provided in some other way. UA1:7.18.5-3 Object Human -
Checkpoint 29: Actions

(Note: Human Only)

Index Failure Condition Section Type How See
29-001 A script requires specific timing for individual keystrokes. UA1:7.19-1 Object Human -
Checkpoint 31: Fonts
Index Failure Condition Section Type How See
31-010 A font program is embedded that is not legally embeddable for unlimited, universal rendering. UA1:7.21.4.1-2 Object Human -

Other

Checkpoint 23: Digital Signatures

(Note: not testable)

Index Failure Condition Section Type How See
23-001 No test specific to digital signatures is required, however other provisions apply (form fields). UA1:7.13-1 -- -- 01-006
Checkpoint 27: Navigation

(Note: not testable)

Index Failure Condition Section Type How See
27-001 No tests specific to navigation are required; use appropriate semantics. UA1:7.17-1 -- -- 01-006