TEI Encoding of the Great Exhibition Catalogue - Gia-Alexander/HIST630 GitHub Wiki

Table of Contents

Coding Overview

Populating the TEI Header

<fileDesc>

<encodingDesc>

Coding the Five Introductions

<objectName>

<persName>

<placeName>

Coding the Offset Markup File

Coding the ODD File

Coding and Testing the Schematron File

Generating and Associating the Schema

Discussion

Future Work

Coding Overview

Each file in the present project is coded in XML with a TEI root element. The encoded files in this project include the following:

  • GE_17.xml: Introduction to Class 17 of the Great Exhibition Catalogue, Paper

  • GE_23.xml: Introduction to Class 23 of the Great Exhibition Catalogue, Metal

  • GE_24.xml: Introduction to Class 24 of the Great Exhibition Catalogue, Glass

  • GE_26.xml: Introduction to Class 26 of the Great Exhibition Catalogue, Furniture

  • GE_29.xml: Introduction to Class 29 of the Great Exhibition Catalogue, Small Wares

  • GE_Offset.xml: Offset markup for this project for persons <persName> and places <placeName>

  • RebuiltODD.odd: TEI customization file for this project

  • GESchematron.sch: Schematron rules for class and subclass taxonomies

  • RebuiltODD.rng: Project schema

Most of the TEI header is the same across all project files; however, each of the five primary content files contains an additional layer of taxonomy for the subclasses specified in the Great Exhibition Catalogue. For example, the file for Class 17, Paper, declares subclasses for bulk paper ("BulkPaper"), stationery ("Stationery"), cardstock ("CardStock"), and objects related to printing ("Printing"). Schematron rules restrict the type values of future <objectName> elements to those specified in each subclass taxonomy.

Populating the TEI Header

The TEI header for the file GE_17.xml serves as an exemplar for the headers used across this project.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
	schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="https://raw.githubusercontent.com/Gia-Alexander/HIST630/master/GESchematron.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="https://raw.githubusercontent.com/Gia-Alexander/HIST630/master/RebuiltODD.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>TEI Encoding of the Introduction to Class 17 from <bibl><title> Volume III of the
                     Official Catalogue of the Great Exhibition of the Works of Industry of All
                     Nations <date when="1851">1851</date>(4 vols.),</title><biblScope unit="page"
                     from="536" to="537"/></bibl></title>
         </titleStmt>
         <publicationStmt>
            <authority>Gia O. Alexander</authority>
            <pubPlace>College Station, TX</pubPlace>
            <date>Fall 2019</date>
            <availability>
               <licence>This work is licensed under a Creative Commons Attribution 4.0 International
                  Licence and may be freely reused with attribution.</licence>
               <p>The primary source material for this project is available in the public domain at
                  archive.org.</p>
            </availability>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the publication of the source work. Introductions to sections of
               the catalogue containing information about writing substrates, implements and
               accessories excerpted from <bibl><title>Volume III of the Official Catalogue of the
                     Great Exhibition of the Works of Industry of All Nations <date when="1851"
                        >1851</date></title> (4 vols.) from <publisher>Archive.org</publisher>,
                  scanned from <placeName ref="#HarvardCollege">Harvard College Library</placeName>
                  by <publisher>Google Books</publisher>. <idno type="URI">
                     https://archive.org/details/officialcatalog06unkngoog/page/n14</idno></bibl>
            </p>
         </sourceDesc>
      </fileDesc>
      <encodingDesc>
         <classDecl>
            <taxonomy>
               <category xml:id="Paper">
                  <catDesc> PAPER, PRINTING, AND BOOKBINDING </catDesc>
                  <category xml:id="BulkPaper">
                     <catDesc>A. Paper in the raw state as it leaves the mill, such as Brown Paper,
                        Millboards, Printing, Writing, and Drawing Papers, &amp;c.</catDesc>
                  </category>
                  <category xml:id="Stationery">
                     <catDesc>B. Articles of Stationery, as Envelopes, Lace Papers, Fancy Papers,
                        Ornamented and Glazed Papers, Sealing-wax, Wafers, inks of all kinds,
                        &amp;c.
                           <idno type="URI">https://hobancards.com/stationery-of-the-victorian-era</idno>
                     </catDesc>
                  </category>
                  <category xml:id="CardStock">
                     <catDesc>C. Pasteboards, Cards, &amp;c. ; and Scaleboard Boxes, Cartonnerie,
                        &amp;c.</catDesc>
                  </category>
                  <category xml:id="Printing">
                     <catDesc>D. Printing, not including printing as a fine art, and Printing Inks
                        and Varnishes ; Bookbinding in cloth, velvet, vellum, &amp;c. ; Fancy Books,
                        Portfolios, Desks, &amp;c,</catDesc>
                  </category>
               </category>
            </taxonomy>
         </classDecl>
      </encodingDesc>
   </teiHeader>

<fileDesc>

This element contains the title statement <titleStmt>, publication statement <publicationStmt>, and source description <sourceDesc> for the encoded files in this project; the licensing information sits in <availability> within the publication statement. Here, I have taken care to differentiate the title and publication information of the present project from that of the primary source. Using the <bibl> element, I specify which copy of the text of the Great Exhibition Catalogue I am working from. To aid potential users of my archive in their own digital humanities research, I also state in a <p> element that the primary source is in the public domain, and I note in <licence> that they may freely build upon my work, with attribution, under a Creative Commons license.

<encodingDesc>

The encoding description <encodingDesc> provides the two-layer <taxonomy> of classes and subclasses of items in the Great Exhibition Catalogue. In the example above from GE_17.xml, the top-level <category> for Class 17, Paper, contains the four subclass categories BulkPaper, Stationery, CardStock, and Printing. I achieve the two layers by nesting each subclass <category> within the class <category>.

Coding the Five Introductions

Using the type and n attributes on the outermost <div>, I differentiated each of the five primary content files according to volume and class from the Great Exhibition Catalogue, as follows:

 <body>
         <div type="class" n="3.17">
            <head type="class"> PAPER, PRINTING, AND BOOKBINDING. </head>
            <div type="intro">
               <head type="subTitle"> INTRODUCTION. </head>

In the above example, the code n="3.17" indicates that the file encodes Class 17 from the third volume of the Great Exhibition Catalogue.

<objectName>

Although some digital humanists may regard it as "tag abuse," in this project, I use the TEI element <objectName> to denote types of objects and link them to their subclasses in the second layer of the taxonomy in each primary content file. Keeping with GE_17.xml as an example, I thus mark the following:

<objectName type="Stationery">small articles for fancy purposes</objectName>

where "Stationery" refers to a subclass in the file taxonomy.

<persName>

Interestingly, across all five primary content files in this encoding project, only two persons appear: the enigmatic "R. E.," who signs off as the author of each introduction, and Queen Victoria. I use xml:id attributes throughout this project; the following example from GE_23.xml shows how doing so helps me consolidate different references to the same person, in this case the Queen:

the Jewel- case and the <objectName type="Jewellery">Great Diamond</objectName> 
exhibited by <persName ref="#QueenVictoria">Her Majesty</persName> are 
instances of this description. </p>

Throughout the primary content files coded for this project, Victoria is referred to as "Her Majesty," "Her Majesty the Queen," "the Queen," and "Queen Victoria." By using ref="#QueenVictoria" to point to the corresponding entry in my offset markup file, I can resolve all of these references to the same person.
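For reference, here is a minimal sketch of the kind of entry that ref="#QueenVictoria" is meant to resolve to in the offset markup file. The <listPerson> wrapper and the <persName> content are assumptions modeled on the <listPlace> entry used in this project; they are not copied from the actual file.

```xml
<!-- Hypothetical entry in the offset markup file; the wrapper and the
     name content are assumed for illustration, not taken from the project. -->
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
   <!-- The xml:id is the target that ref="#QueenVictoria" names. -->
   <person xml:id="QueenVictoria">
      <persName>Queen Victoria</persName>
   </person>
</listPerson>
```

Note that a bare fragment such as #QueenVictoria resolves against the document in which it appears. If the entry lives only in a separate file, the pointer can name that file explicitly, for example ref="GE_Offset.xml#QueenVictoria", assuming the offset file sits in the same directory as the content file.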

<placeName>

A primary rhetorical purpose behind the Great Exhibition and its catalogue was to promote the breadth and wealth of the British Empire. Accordingly, the primary content files contain ample references to locations within it. Most of these are straightforward, direct references, such as the following from GE_17.xml:

Considerably more is made in <placeName ref="#England">England</placeName> than in <placeName ref="#Scotland">Scotland</placeName> or <placeName ref="#Ireland" >Ireland</placeName>. 

However, the location of the Great Exhibition itself, London, is often also referred to as the "Metropolis." As with the references to the Queen, pointing each mention to a single xml:id indicates that these are one and the same. Consider the following example, again from GE_17.xml:

The localities from whence the articles exhibited have been sent are
much less restricted than in preceding Classes. Many of the exhibitors appear in
the capacity of producers of <objectName type="Stationery">small articles for 
fancy purposes</objectName> ; and as these are
obviously capable of being made at home, requiring taste and minute skill rather
than mechanical power for their manufacture, the places from which they have been
forwarded for exhibition have not the special interest attaching to great
producing towns or cities, where thousands of machines and operatives are all
occupied in one department of manufacture. From the <placeName ref="#London">
metropolis,</placeName> however, where a
large demand for such articles exists, the great proportion of them are derived.
<placeName ref="#London">London</placeName> also represents most largely the
 enormous printing resources of this country. But of these, as specimens only of
single works can appear, but a faint idea can be gained from the examples
exhibited. In one of the greatest establishments of the <placeName ref="#London"
 >Metropolis</placeName> twenty <objectName type="Printing">machines</objectName> 
are constantly occupied, each of which
is capable of throwing off from 3,000 to 4,000 impressions per hour, and in
addition a large number of printing <objectName type="Printing">machines</objectName> 
for fine work are employed. These great printing establishments resemble very closely 
the large manufactories of
other districts, only their origination differs with the peculiar nature of the
manufacture, if the mechanical production of printed books may be so termed. 

Coding the Offset Markup for Persons and Places

GE_Offset.xml includes lists of persons and places. Some of the places most relevant to the material culture of writing in the Victorian era have pointers to web pages that provide additional information about the specified location's role in the Great Exhibition. For example:

<listPlace>
   <place xml:id="London">
      <placeName>London</placeName>
      <idno type="URI">https://en.wikipedia.org/wiki/Great_Exhibition</idno>
   </place>
</listPlace>

Coding the ODD File

The main TEI customization I undertook in my .odd file for this project involved defining the two-layer taxonomy of classes and subclasses for future coding of entries for each exhibitor who contributed an object related to the material culture of writing. I created a closed value list for each layer. The top, or class, layer is specified as follows:

      <elementSpec ident="category" module="header" mode="change">
        <content>
          <textNode/>
          <elementRef key="category"/>
        </content>
        <attList>
          <attDef ident="ref">
            <valList type="closed">
              <valItem ident="#Paper">
                <gloss>PAPER, PRINTING, AND BOOKBINDING</gloss>
              </valItem>
              <valItem ident="#Metal">
                <gloss>WORKS IN PRECIOUS METALS, JEWELLRY, ETC.</gloss>
              </valItem>
              <valItem ident="#Glass">
                <gloss>GLASS</gloss>
              </valItem>
              <valItem ident="#Furniture">
                <gloss> FURNITURE, UPHOLSTERY, PAPER HANGINGS, PAPIER MACHE AND JAPANNED GOODS</gloss>
              </valItem>
              <valItem ident="#SmallWares">
                <gloss>MISCELLANEOUS MANUFACTURES AND SMALL WARES</gloss>
              </valItem>
            </valList> 
            <remarks><p>This project uses taxonomies to categorize classes and subclasses of materials and objects
              as presented in the Great Exhibition Catalogue.</p></remarks>
          </attDef>
        </attList>
      </elementSpec>

The specification above covers the top layer; the second, or subclass, layer is specified as follows:

      <elementSpec ident="objectName" module="namesdates" mode="change">
        <content>
          <textNode/>
          <elementRef key="objectName"/>
        </content>
        <attList>
          <attDef ident="type">
            <valList type="closed">
              <valItem ident="BulkPaper"></valItem>
              <valItem ident="Stationery"></valItem>
              <valItem ident="CardStock"></valItem>
              <valItem ident="Printing"></valItem>
              <valItem ident="Communion"></valItem>
              <valItem ident="PreciousMetals"></valItem>
              <valItem ident="GeneralDomestic"></valItem>
              <valItem ident="ElectroPlate"></valItem>
              <valItem ident="SheffieldPlate"></valItem>
              <valItem ident="Ormolu"></valItem>
              <valItem ident="Jewellery"></valItem>
              <valItem ident="OrnamentsToys"></valItem>
              <valItem ident="Enamel"></valItem>
              <valItem ident="Curios"></valItem>
              <valItem ident="Windows"></valItem>
              <valItem ident="StainedGlass"></valItem>
              <valItem ident="CastGlass"></valItem>
              <valItem ident="Bottles"></valItem>
              <valItem ident="ChemGlass"></valItem>
              <valItem ident="FlintGlass"></valItem>
              <valItem ident="OpticalGlass"></valItem>
              <valItem ident="Decor"></valItem>
              <valItem ident="Upholstery"></valItem>
              <valItem ident="Wallpaper"></valItem>
              <valItem ident="PapierMache"></valItem>
              <valItem ident="Perfume"></valItem>
              <valItem ident="BoxesCases"></valItem>
              <valItem ident="FakeFlowers"></valItem>
              <valItem ident="Candles"></valItem>
              <valItem ident="Confectionery"></valItem>
              <valItem ident="Beads"></valItem>
              <valItem ident="Umbrellas"></valItem>
              <valItem ident="Fishing"></valItem>
              <valItem ident="WalkingCanes"></valItem>
              <valItem ident="MiscSmallWares"></valItem>
            </valList> 
            <exemplum xml:lang="en">
              <p>The <gi>objectName</gi> element is used in this project to pair objects and types of objects
              mentioned in the text to their corresponding subclasses as set forth in Great Exhibition Catalogue.</p>
              <eg xml:space="preserve"><![CDATA[ <q>The application of improved <objectName type="Printing">machinery</objectName> to printing is also of recent
                  date, and has been attended with results of great moment. </q>]]></eg>
              <p>As shown above, the <att>type</att> attribute may be used to distinguish the one from the
                other.</p>
            </exemplum>
            <remarks><p>This project uses objectName:type to specify subcategories of object classes in the Great Exhibition Catalogue.</p></remarks>
          </attDef>
        </attList>
      </elementSpec>

I also included the following exemplum for <objectName> to clarify my somewhat unconventional use of this tag:

<exemplum xml:lang="en">
              <p>The <gi>objectName</gi> element is used in this project to pair objects and types of objects
              mentioned in the text to their corresponding subclasses as set forth in Great Exhibition Catalogue.</p>
              <eg xml:space="preserve"><![CDATA[ <q>The application of improved <objectName type="Printing">machinery</objectName> to printing is also of recent
                  date, and has been attended with results of great moment. </q>]]></eg>
              <p>As shown above, the <att>type</att> attribute may be used to distinguish the one from the
                other.</p>
            </exemplum>

Coding and Testing the Schematron File

So that future encoders of this project and I stay within the bounds of the closed attribute list for <objectName> as described above and in the Great Exhibition Catalogue itself, I created Schematron rules to restrict future entries, as follows from GESchematron.sch:

 <sch:rule context="tei:teiHeader//tei:catDesc">
            <sch:report test="tei:gloss">
                A &lt;catDesc&gt; element in the &lt;taxonomy&gt; may not contain a &lt;gloss&gt; element.
            </sch:report>
        </sch:rule>
                
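        <sch:rule context="tei:teiHeader//tei:taxonomy/tei:category">
            <!-- Space-delimited list of the permitted top-level class identifiers -->
            <sch:let name="classes" value="' Paper Metal Glass Furniture SmallWares '"/>
            <sch:assert test="contains($classes, concat(' ', @xml:id, ' '))">
                Permitted values: Paper, Metal, Glass, Furniture, SmallWares
            </sch:assert>
        </sch:rule>
        
        <sch:rule context="tei:body//tei:objectName">
            <!-- Space-delimited list of the permitted subclass identifiers -->
            <sch:let name="subclasses" value="' BulkPaper Stationery CardStock Printing Communion PreciousMetals GeneralDomestic ElectroPlate SheffieldPlate Ormolu Jewellery OrnamentsToys Enamel Curios Windows StainedGlass CastGlass Bottles ChemGlass FlintGlass OpticalGlass Decor Upholstery Wallpaper PapierMache Perfume BoxesCases FakeFlowers Candles Confectionery Beads Umbrellas Fishing WalkingCanes MiscSmallWares '"/>
            <sch:assert test="contains($subclasses, concat(' ', @type, ' '))">
                Permitted values: 
                BulkPaper, Stationery, CardStock, Printing, 
                Communion, PreciousMetals, GeneralDomestic, ElectroPlate, SheffieldPlate, Ormolu, Jewellery, OrnamentsToys, Enamel, Curios, 
                Windows, StainedGlass, CastGlass, Bottles, ChemGlass, FlintGlass, OpticalGlass,
                Decor, Upholstery, Wallpaper, PapierMache,
                Perfume, BoxesCases, FakeFlowers, Candles, Confectionery, Beads, Umbrellas, Fishing, WalkingCanes, MiscSmallWares
            </sch:assert>
        </sch:rule>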

I then tested GESchematron.sch against GE_17.xml by associating the Schematron file with the content file in Oxygen. GE_17.xml remained valid after this association. When I changed the type value of an <objectName> to something not in the closed value list, Oxygen flagged the change and reported the file as invalid.
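A complementary approach is to derive the permitted values from each file's own taxonomy instead of listing them in the rule. The following is a minimal sketch of a complete Schematron file organized that way. The pattern id and the message wording are illustrative assumptions, not copied from GESchematron.sch; the sketch also assumes the two-layer taxonomy described earlier, with each subclass <category> nested inside a class <category>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the pattern id and message wording are illustrative,
     not copied from GESchematron.sch. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <!-- Bind the tei: prefix so that XPath steps match TEI-namespaced elements. -->
   <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

   <sch:pattern id="objectName-types">
      <!-- Fire on every objectName anywhere inside the body. -->
      <sch:rule context="tei:body//tei:objectName">
         <!-- @type must equal the xml:id of some subclass category
              (second layer) declared in this file's own taxonomy. -->
         <sch:assert test="@type = //tei:taxonomy/tei:category/tei:category/@xml:id">
            The type "<sch:value-of select="@type"/>" does not match any
            subclass declared in this file's taxonomy.
         </sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

Because every step in the context and test paths carries the tei: prefix, the rule matches elements in the TEI namespace; an unprefixed step such as p or objectName would look for elements in no namespace and would not fire on a TEI file. An <objectName> without a type attribute also fails the assertion, since the comparison against an empty attribute set is false.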

Generating and Associating the Schema

I then generated the master schema for this project, RebuiltODD.rng, by transforming the valid RebuiltODD.odd to RELAX NG in Oxygen. Next, I pushed the resulting schema from the out folder that Oxygen generated to the present repository on GitHub. I then opened the schema on GitHub, copied the URL of the raw file, and associated that URL with each of my five primary content files using Oxygen.

Discussion

I may be disqualifying myself from the most favorable evaluation of my project here, but DH ethics compel me to disclose some uncertainties and problems that remain with the present project at the time of this writing.

First, I was not sure whether I needed to associate the master project schema, RebuiltODD.rng, with my offset markup file, GE_Offset.xml.

Second, when I originally associated the raw form of RebuiltODD.rng with my five content files, they all remained valid. I was at a good stopping place for the evening's work, so I closed Oxygen and shut down my computer. When resuming my work the next day for final upload and evaluation, I noticed that Oxygen now reported all five of my primary content files as invalid, with the single error "Missing Children" showing in Problems. I tried to resolve the problem by editing my schema file, but quickly began making it worse. The offending code is at the very bottom of RebuiltODD.rng, as follows:

  <start>
      <choice/>
   </start>
</grammar>

Oxygen locks on to the <choice/> element as missing its children.

Because all of my files had validated the previous evening, and because the code in question sits at the very end of the schema, I am unsure whether this problem is a minor glitch or whether I may be missing a large chunk of needed data without realizing it, given my relatively new and limited understanding of XPath and Schematron. Evaluators may criticize and deduct points accordingly, but I would appreciate an answer to this very frustrating last-minute problem.
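For comparison, RELAX NG requires at least one pattern inside <choice>, and in a schema generated from an ODD the <start> element normally names the permitted root elements. The following is a minimal sketch of the ODD wrapper that controls this, offered as something to check rather than a confirmed diagnosis. The ident value, the start attribute, and the set of <moduleRef> elements are assumptions about what RebuiltODD.odd might need; they are not a copy of it.

```xml
<!-- Sketch only: ident, start, and the module list are assumptions. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="RebuiltODD" start="TEI">
   <!-- The four modules that a minimal TEI schema conventionally includes;
        textstructure is the one that defines the TEI root element. -->
   <moduleRef key="tei"/>
   <moduleRef key="header"/>
   <moduleRef key="core"/>
   <moduleRef key="textstructure"/>
   <!-- namesdates supplies persName, placeName, and objectName. -->
   <moduleRef key="namesdates"/>
   <!-- The elementSpec customizations for category and objectName
        would follow here. -->
</schemaSpec>
```

With start="TEI" and the textstructure module present, the generated schema would be expected to contain a reference to the TEI pattern, such as <ref name="TEI"/>, inside <start> rather than an empty <choice/>.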

Future Work

Future work entails encoding an entry for each writing-related object exhibited.
