16‐map‐view - Pittsburgh-NEH-Institute/hoaXed GitHub Wiki

Data visualization and mapping

This section discusses the implementation of a web-based JavaScript map that is created dynamically from data in the corpus. In Real Life this type of visualization might serve as a discovery tool for articles and to communicate the geographic distribution of the corpus within and around London. We used it in our Institute to demonstrate how integration with external APIs and libraries can be implemented in an eXist-db application.

Reasons to map

Mapping, and data visualization more generally, is a great tool for communicating research outcomes as part of a digital edition. Thoughtful, task-appropriate visualizations excel at compressing large amounts of information into a small space, and although in many cases the complexity and consideration needed to make an effective map may require a great deal of technical intervention, the XML data underlying an eXist-db edition provides a usefully structured starting point for reuse in a visualization.

What our map does and does not accomplish

The map in this tutorial is very much a minimum viable product; it shows the geographic distribution of the articles in the corpus, but it stops there. In Real Life we might have continued development to incorporate the following extensions and improvements:

We could have made the article identifiers (the ghost faces in the image below) clickable links that load reading views of the articles.
We could have used a historical street map layer in the visualization instead of a modern one.
We could have implemented a strategy for preparing map data that was less dependent on manual effort. The underlying data in the current implementation comes from our hand-curated gazetteer, created by cross-referencing historical maps, geo-referenced maps, contemporary maps, and primary texts like gazetteers and street name lists. Mapping in a more ambitious real-life application might start with named entity recognition (relying on a computer to recognize place names) and retrieval of geographic coordinates from a large-scale gazetteer using linked open data.
Our map leaves unexplained the ways in which some of our research is creative and inferential so that we can foreground the relationships among points in the grid. In Real Life it would be important to document those details.

All of this is to say: Be cautious about how you visualize (and document) your research data and outcomes, including data that includes geographic or spatial information!

More reasons to be careful about mapping

Should your project incorporate a mapping visualization? A primary consideration for us would be whether the visualization contributes meaningfully to the research goals of the project. In the case of ghost hoaxes, part of the story we want to tell might involve how misinformation was spread geographically, as reflected in the local press, and the geographic distribution of published reports might help us explore that question. On the other hand, projects sometimes incorporate mapping visualizations that do not enhance the research utility, and that are ultimately only decorative. Decoration can attract people to the site who will then engage with the content and research results, but if our time or other resources were limited, a decision about whether to prioritize mapping would take into consideration the extent to which it enhanced the research value of the site.

One other consideration is that the small sample of articles in this project was chosen somewhat arbitrarily, which means that the distribution on the map is the distribution of our data set, that is, of the articles we happened to include. It is not the distribution of the phenomenon, that is, of ghost-hoax reports in nineteenth-century England. The non-representative selectivity of the data pertains to the project as a whole, to be sure, but the risk of miscommunication may be greater with a map than with text because maps, like visualizations in general, achieve their communicative density by excluding much of the data in order to highlight the rest (in this case, in order to highlight location). Reading views of a small number of articles might be understood as illustrative of the information that circulated about ghosts at the time without necessarily being statistically representative of the geographic distribution of those stories. A map, though, can easily be misunderstood as not only saying “stories were present in these locations”, but also implying that they were less present elsewhere, when what it really says is that “our sample is not geographically representative, and our stories happen to be from these locations”.

Goals

Evaluate third-party visualization tools
Use and adapt third-party documentation
Use and adapt third-party code, even in coding languages you don’t otherwise write yourself
Create a new page in a web app to display a map using the Mapbox GL JS JavaScript custom marker tutorial

The amount of JavaScript knowledge we need in order to integrate a JavaScript library into an HTML page is small enough that we can look things up online as we go along. More important than JavaScript knowledge is knowing how to troubleshoot effectively, read error messages, and edit code incrementally. If you plan to code along with this stage in the tutorial, we recommend first reading through the Mapbox tutorials in full. You might also want to spend some time poking around in their documentation to become more familiar with the tool. We selected Mapbox because of the extensive public documentation and relatively low barriers to use. To create your own Mapbox maps, you will need to create an account with Mapbox and generate your own public API key (Mapbox calls it an access token). For the number of API calls our project generates, the service has historically been free. Tool selection when it comes to external service providers is an important aspect of project lifecycle planning, so consider carefully which resources may be best for you. If a static map will be an effective visualization tool for your data and research goals, a static map is a good way to proceed! JavaScript library mapping is just one of many options for spatial visualization.

Drafting the pipeline

At this stage in the project we have a well-established pipeline for processing data. To create a new feature that conforms to our general pipeline architecture we begin by writing an XQuery module in the modules directory that will create the model by extracting from the source documents only the information we want to use or display to the user. Next, we create a script in the views directory that takes the model as its input and creates an HTML section as its output. In some cases, this requires us to add information to our resources directory, such as CSS or images that are specific to our new page.

Creating the map in the example above follows these pipeline steps. The model in this case is based on place names and geographic coordinates, which is relatively simple, and most of our work will happen during the view phase. In our views so far we had to transform the model (in the model namespace) to HTML. In this case we need to transform the model to a combination of HTML and a new format, GeoJSON, that can interact with the Mapbox API. GeoJSON is a standardized way of encoding geographic information as JSON, and to transform our geographic details from our model namespace (which is XML) to GeoJSON we will use an XPath function called xml-to-json().
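
As a minimal sketch of what a single GeoJSON feature looks like, the JavaScript below builds one Point feature from a whitespace-separated "latitude longitude" string of the kind our gazetteer stores in its <geo> elements. The helper name toPointFeature is an illustrative assumption and is not part of the project code. The detail worth noticing is that GeoJSON lists longitude first and latitude second, which is the reverse of the order in the gazetteer.

```javascript
// Hypothetical helper (not project code): turn a "latitude longitude"
// string into a GeoJSON Point feature.
function toPointFeature(name, latLong) {
  const [lat, long] = latLong.trim().split(/\s+/).map(Number);
  if (!Number.isFinite(lat) || !Number.isFinite(long)) {
    throw new Error(`Cannot parse coordinates: "${latLong}"`);
  }
  return {
    type: 'Feature',
    geometry: {
      type: 'Point',
      // GeoJSON position order is [longitude, latitude]
      coordinates: [long, lat]
    },
    properties: { name }
  };
}

const feature = toPointFeature('Cardross', '55.9628546 -4.6486472');
console.log(JSON.stringify(feature, null, 2));
```

Swapping the two numbers is an easy mistake to make, and it typically shows up as markers that land in the wrong hemisphere or in the ocean.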

Create the model

The information we need to include in the model is a list of places, their latitude and longitude coordinates, and the titles and identifiers (@xml:id attributes on the root <TEI> element) of any articles in which that place is referenced. This will provide all the data required to create individual points on the map with some embedded information that aids discoverability for users.

We start, as we always do, by copying in namespace and data path variables from a previous file:

(: =====
Declare namespaces
===== :)
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare namespace m="http://www.obdurodon.org/hoax/model";

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare option output:indent "no";

(: =====
Retrieve controller parameters
===== :)
declare variable $exist:root as xs:string := 
    request:get-parameter("exist:root", "xmldb:exist:///db/apps");
declare variable $exist:controller as xs:string := 
    request:get-parameter("exist:controller", "/hoaXed");
declare variable $path-to-data as xs:string := 
    $exist:root || $exist:controller || '/data';
    
declare variable $articles as element(tei:TEI)+ := collection($path-to-data || '/hoax_xml')/tei:TEI;

Next, we set up our output element and return the place names in places.xml so we have them available. Not every place name in the gazetteer will appear in the final output, so the code block below is not the complete expression that we use, but it’s a way to start with all places, and we’ll add a step to filter them shortly:

<m:geo-places>{
  for $entry in doc($path-to-data || '/aux_xml/places.xml')/descendant::tei:place 
    let $place-name as xs:string := $entry/tei:placeName => string-join('; ')
    return
       <m:name>{$place-name}</m:name>
}</m:geo-places>

What this XQuery FLWOR expression says is: for each entry in the <place> elements in our auxiliary gazetteer file we set the variable $place-name to represent a string that combines all of the <placeName> elements associated with that <place>, and we return that string inside an element in our model namespace called <m:name>.

We want our eventual map to include only the places that have been geocoded (that is, for which we have supplied latitude and longitude values), so we can enhance the FLWOR expression above with a where clause that filters the list of places accordingly:

<m:geo-places>{
  for $entry in doc($path-to-data || '/aux_xml/places.xml')/descendant::tei:place 
    let $place-name as xs:string := $entry/tei:placeName => string-join('; ')
    let $geo := $entry/tei:location/tei:geo
    where exists($geo)
    return
         <m:name>{$place-name}</m:name>
}</m:geo-places>

Now that we have filtered for only the places that have a <tei:geo> element, the next step is to find the articles where those places are referenced, and for each of those articles we want to retrieve the @xml:id and a string version of the title. The XQuery below is incomplete because it finds all articles for all <placeName> elements, regardless of whether the particular <placeName> happens to appear in the article. We’ll fix that shortly, but for now:

<m:geo-places>{
  for $entry in doc($path-to-data || '/aux_xml/places.xml')/descendant::tei:place 
    let $place-name as xs:string := $entry/tei:placeName => string-join('; ')
    let $geo := $entry/tei:location/tei:geo
    let $articles as element(tei:TEI)* :=  $articles[descendant::tei:placeName]
    where exists($geo)
    return
      <m:place>
        <m:name>
        {$place-name}
        </m:name>
        <m:articles>
            {for $article in $articles
                return <m:article>{$article/@xml:id, $article/descendant::tei:titleStmt/tei:title ! string(.)}</m:article>}
        </m:articles> 
      </m:place>
}</m:geo-places>

The code so far returns the same list for every place: all of the articles that mention any place at all. What we want instead is for each place to produce a list only of the articles that mention that single, specific place. To do this, we match the @xml:id of the gazetteer entry against the @ref attribute on the <placeName> elements in the articles. You can see an example of how the @xml:id values are encoded below.

<place xml:id="cardross" type="neighborhood">
  <placeName>Cardross</placeName>
  <location>
    <settlement>Dumbartonshire</settlement>
    <geo>55.9628546 -4.6486472</geo>
  </location>
  <place xml:id="drumhead" type="haunted_house">
    <placeName>Drumhead</placeName>
    <location>
      <geo>55.97713888888889 -4.664861111111112</geo>
    </location>
  </place>
</place>

The unique @xml:id values on <place> elements in the gazetteer will allow us to find matches between places in the gazetteer and references to places in the articles. Let's change the let statement to add a nested predicate that compares the two. Because the @ref values begin with a # character, we strip it with substring-after() before comparing:

let $articles as element(tei:TEI)* :=  
    $articles[descendant::tei:placeName[substring-after(@ref, '#') eq $entry/@xml:id]]

Now instead of selecting every article that contains a place name, the new code should select only articles that contain the specified place name.
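
For readers who find the nested predicate hard to read, the same filtering logic can be sketched in JavaScript. This is an analogy only: the article identifiers and the placeRefs arrays below are invented stand-ins for the @ref values in the TEI files, and the function names are hypothetical.

```javascript
// Invented stand-ins for articles and the @ref values of their place names.
const articles = [
  { id: 'article-a', placeRefs: ['#cardross', '#drumhead'] },
  { id: 'article-b', placeRefs: ['#drumhead'] },
  { id: 'article-c', placeRefs: [] }
];

// Mirrors XPath substring-after(): the text after the first '#',
// or the empty string if there is no '#'.
function substringAfterHash(ref) {
  const i = ref.indexOf('#');
  return i === -1 ? '' : ref.slice(i + 1);
}

// Keep only the articles with at least one reference that points at placeId.
function articlesMentioning(placeId, allArticles) {
  return allArticles.filter(article =>
    article.placeRefs.some(ref => substringAfterHash(ref) === placeId)
  );
}

const cardrossIds = articlesMentioning('cardross', articles).map(a => a.id);
const drumheadIds = articlesMentioning('drumhead', articles).map(a => a.id);
console.log(cardrossIds, drumheadIds);
```

The outer filter corresponds to the predicate on $articles, and the inner test corresponds to the predicate on tei:placeName.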

Finally, we want to include the latitude and longitude in the model we create so we can use it to make a map later on. This information is stored in the eXist-db index, so we can use a predicate and ft:query() to retrieve it. We want the predicate to be on the variable that contains the full path, so we return to the for clause declaration to add it. First, here is just the new predicate:

[ft:query(., (), map{'fields':('format-lat','format-long')})]

Predicates typically filter results; recall our earlier predicate that selected <place> elements from the gazetteer only if they had associated <geo> values. In this new predicate we aren’t actually doing any filtering: the first argument (the dot) says to operate on the current item (the predicate is applied to each item, one by one) and the empty parentheses say that we aren’t filtering for specific textual content and every match should succeed. The point of adding this predicate, then, is just to make the formatted latitude and longitude information available for later use. Because tracking and retrieving field information is computationally expensive, eXist-db makes fields available only if they have been requested with ft:query(), even if the ft:query() function is not doing anything that we would normally think of as querying.

Below is the final XQuery that creates the model, which can generate input to the code that creates the view:

<m:geo-places>{
  for $entry in doc($path-to-data || '/aux_xml/places.xml')/descendant::tei:place
    [ft:query(., (), map {'fields': ('format-lat', 'format-long')})]
  let $place-name as xs:string := $entry/tei:placeName => string-join('; ')
  let $geo := $entry/tei:location/tei:geo
  let $articles as element(tei:TEI)* := 
    $articles[descendant::tei:placeName[substring-after(@ref, '#') eq $entry/@xml:id]]
  where exists($geo)
  return
    <m:place>
      <m:name>{$place-name}</m:name>
      <m:geo>
        <m:lat>{ft:field($entry, 'format-lat')}</m:lat>
        <m:long>{ft:field($entry, 'format-long')}</m:long>
        <m:articles>{
          for $article in $articles
          return
            <m:article>{$article/@xml:id, $article/descendant::tei:titleStmt/tei:title ! string(.)}</m:article>
        }</m:articles>
      </m:geo>
    </m:place>
}</m:geo-places>

Create the view

In the views subdirectory, we create a new file called maps-to-html.xql and begin by copying our usual housekeeping declarations:

(: Declare namespaces :)
declare namespace html="http://www.w3.org/1999/xhtml";
declare namespace hoax ="http://www.obdurodon.org/hoaxed";
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare namespace m="http://www.obdurodon.org/hoax/model";
declare namespace console="http://existdb.org/xquery/console";
declare namespace xi="http://www.w3.org/2001/XInclude";

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare option output:indent "no";

declare variable $exist:root as xs:string := 
  request:get-parameter("exist:root", "xmldb:exist:///db/apps");
declare variable $exist:controller as xs:string := 
  request:get-parameter("exist:controller", "/hoaXed");

declare variable $data as document-node() := request:get-data();

With that established, we can think about how we want to approach this problem. According to the Mapbox tutorial, in order to configure our page to interact with Mapbox resources we need to add a few elements to our <head> element, we need to create a <div> with an @id value of "map" (which will hold the map created by the Mapbox JavaScript), and we need a <script> element in our HTML that will contain all of the JavaScript and our geodata, which must be in JSON format. Most of our effort will focus on creating the geodata because almost everything else is just boilerplate that can be copied from the tutorial and pasted into the view module. Remember that you need to edit the <head> element in the html-template.xql file, rather than in the specific view module itself, because view modules do not have their own <head> elements. You can see the example from this stage in our html-template.xql file.

We recommend getting the map to load from the view using the sample data first, and trying to render it with project data only once you’ve confirmed that it works with sample data. The reason is that if you use your real data immediately and the map fails to render, you won’t know whether the issue is with your data or with the code that processes it. If you start with sample data that you know is correct, you can be confident that any issues are in the processing code. We’re going to “fast forward” the troubleshooting to get the sample data map to load, but we recommend going step-by-step through the Mapbox tutorial before you resume this part of the stage.

To make this implementation more manageable, we declare separate variables that contain 1) the first part of the JavaScript, 2) the data we want to map, and 3) the ending part of the JavaScript. Then, inside the HTML section that will be rendered for the user, we can concatenate those variables to create a single script. Below is the full script we use to render the sample data:

declare variable $js-front as xs:string := "mapboxgl.accessToken = 'pk.eyJ1IjoiZ2FiaWtlYW5lIiwiYSI6ImNqdWlzYWwxcTFlMjg0ZnBnM21kem9xZm4ifQ.CQ5LDwZO32ryoGVb-QQwCg';
const map = new mapboxgl.Map({
  container: 'map',
  style: 'mapbox://styles/mapbox/light-v10',
  center: [-96, 37.8],
  zoom: 3
});

";
declare variable $js-marker-data as xs:string := "const geojson = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      geometry: {
        type: 'Point',
        coordinates: [-77.032, 38.913]
      },
      properties: {
        title: 'Mapbox',
        description: 'Washington, D.C.'
      }
    },
    {
      type: 'Feature',
      geometry: {
        type: 'Point',
        coordinates: [-122.414, 37.776]
      },
      properties: {
        title: 'Mapbox',
        description: 'San Francisco, California'
      }
    }
  ]
};";

declare variable $js-back as xs:string := "// add markers to map
for (const feature of geojson.features) {
  // create a HTML element for each feature
  const el = document.createElement('div');
  el.className = 'marker';

  // make a marker for each feature and add to the map
  new mapboxgl.Marker(el)
  .setLngLat(feature.geometry.coordinates)
  .setPopup(
    new mapboxgl.Popup({ offset: 25 }) // add popups
      .setHTML(
       `<h3>${feature.properties.name}</h3>
        <p>${feature.properties.appears}</p>`
      )
  )
  .addTo(map);
};";

<html:section id="map-viz"> 

<html:div id="map"></html:div>

<html:div id="drawingPara">
    <html:script>
        {concat($js-front, $js-marker-data, $js-back)}
    </html:script>
</html:div>

</html:section>

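Note that the three values are all strings, and therefore enclosed in quotation marks. Note also that XQuery declare statements must end in semicolons and that JavaScript statements conventionally do as well; the semicolons inside the quotation marks are part of the JavaScript code and the ones outside the quotation marks signal the ends of the XQuery declare statements. One mismatch to be aware of: the popup code in $js-back reads the name and appears properties, which our project data will supply later, while the sample features carry title and description properties, so popups built from the sample data will display "undefined" until the real data is in place.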

In order to get the edition data into JSON, we will store the data in XQuery-related data structures called maps and arrays and transform it to JSON using a function. An XQuery map is a data structure of key: value pairs, where a value is the data you are storing and the key is its name. Arrays are similar to sequences, but allow for nested information. This would be a good time to read the XQuery for Humanists section on maps and arrays (start on p. 139) to learn when, why, and how you might want to use them. The following XML representation of bibliographic information about XQuery for Humanists will serve as a template for our next steps:

let $doc :=
  <map xmlns="http://www.w3.org/2005/xpath-functions">
    <array key="workshops">
          <map>
            <array key="XQuery">
             <map>
                <string key="leader">Cliff</string>
                <string key="institution">Vanderbilt University</string>             
             </map> 
             <map>
                <string key="leader">Joe</string>
                <string key="institution">Department of State</string>            
             </map>
            </array>
          </map>
      </array>
 </map>
return fn:xml-to-json($doc)

Note that XML map and array information that is designed to be transformed to JSON with the xml-to-json() function must be in a specific namespace. When we evaluate this simple example in eXide, our Text Output (perhaps surprisingly, not JSON Output) result looks like the following:

{"workshops":[{"XQuery":[{"leader":"Cliff","institution":"Vanderbilt University"},{"leader":"Joe","institution":"Department of State"}]}]}
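
To confirm that the serialized string is well-formed JSON with the nesting we expect, one option is to parse it in a browser console or in Node.js. The sketch below pastes in the output shown above; the variable names are our own.

```javascript
// The string produced by xml-to-json() in the example above.
const jsonText = '{"workshops":[{"XQuery":[{"leader":"Cliff","institution":"Vanderbilt University"},{"leader":"Joe","institution":"Department of State"}]}]}';

// JSON.parse() throws a SyntaxError if the text is not valid JSON.
const parsed = JSON.parse(jsonText);

// Each <array> became a JSON array and each <map> a JSON object,
// so we can walk the structure with ordinary property access.
const leaders = parsed.workshops[0].XQuery.map(person => person.leader);
console.log(leaders);
```

The key attributes in the XML became the property names in the JSON, and the element names (map, array, string) determined the JSON types.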

Once we have built maps and arrays as XML elements, the xml-to-json() function is the simplest way to serialize our structured data as a JSON string. Below is the variable we will eventually use to hold all of the data; for now it creates one feature for every place in the model, but each feature carries the same placeholder data (a point in San Francisco). In the XQuery below we specify the namespace for the XML that is to be transformed to JSON in a different way; because the prefix fn: is predefined as bound to the appropriate namespace, instead of specifying the namespace literally we can prepend the prefix to the elements. In Real Life we would use whichever of these methods we found easier to develop, test, and maintain:

declare variable $map as element(fn:map) := <fn:map>
    <fn:string key='type'>FeatureCollection</fn:string>
    <fn:array key='features'>{
        for $place in $data/descendant::m:geo-places/m:place
        return
            <fn:map>
                <fn:string key='type'>Feature</fn:string>
                <fn:map key='geometry'>
                    <fn:string key='type'>Point</fn:string>
                    <fn:array key='coordinates'>
                        <fn:number>-122.414</fn:number>
                        <fn:number>37.776</fn:number>             
                    </fn:array>
                </fn:map>
                <fn:map key='properties'>
                  <fn:string key='name'>San Francisco</fn:string>
                  <fn:string key="appears">Appears in title</fn:string>
                </fn:map>
            </fn:map>
    }</fn:array>
</fn:map>;

Play around with rendering the output of this variable in your browser to get a sense of how it looks. Next, we want to convert this variable to JSON. We begin with:

declare variable $geojson as xs:string := concat('const geojson = ', xml-to-json($map), ';') ;

This variable concatenates the JavaScript variable declaration with the JSON, which has been converted using the function we referenced earlier. You can print this output in the browser to take a look at it, but if we want to render the map, we have to replace our previous test data with the newly constructed data:

<html:script>{concat($js-front, $geojson, $js-back)}</html:script>

When you save and check the output in the browser, you should see a map display with just one point on it. When you inspect the results in the browser (you can right click and inspect in the browser debugging view), you should see the same feature repeated over and over. Our next step will be to vary the data so we have many different features, instead of overlapping copies of one.
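
One way to check for stacked features without reading the whole script by eye is to count how many features share each coordinate pair. The function below is a debugging sketch with a hypothetical name (countByCoordinates); the sample object imitates the placeholder output, three copies of the same San Francisco point.

```javascript
// Count how many features share each coordinate pair.
function countByCoordinates(featureCollection) {
  const counts = new Map();
  for (const feature of featureCollection.features) {
    const key = feature.geometry.coordinates.join(',');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Illustrative data: three copies of the placeholder point.
const sample = {
  type: 'FeatureCollection',
  features: Array.from({ length: 3 }, () => ({
    type: 'Feature',
    geometry: { type: 'Point', coordinates: [-122.414, 37.776] },
    properties: { name: 'San Francisco', appears: 'Appears in title' }
  }))
};

const counts = countByCoordinates(sample);
console.log(counts);
```

A single key with a count equal to the number of places signals that every marker is drawn at the same spot. In the browser console you could try passing the page's own geojson object to the function instead of the sample, assuming that variable is reachable from the console.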

To do this, we go back to the for expression we created to make each feature and add some let clauses. This way, we can get information for each place and store it in its own feature map. We give the full variable declaration below, but when we were developing this code we added one step at a time so that we could examine the output and correct any mistakes. Adding all of the variables at once would have resulted in a much more complicated debugging process:

declare variable $map as element(fn:map) := <fn:map>
    <fn:string key='type'>FeatureCollection</fn:string>
    <fn:array key='features'> {
        for $place in $data/descendant::m:geo-places/m:place
        let $name as xs:string := $place/m:name ! string(.)
        let $lat as xs:double := $place/descendant::m:lat ! number(.)
        let $long as xs:double := $place/descendant::m:long ! number(.)
        let $titles as xs:string* := $place/descendant::m:article ! string(.)
        let $print-titles as xs:string := string-join($titles, ', ')
        return
            <fn:map>
                <fn:string key='type'>Feature</fn:string>
                <fn:map key='geometry'>
                    <fn:string key='type'>Point</fn:string>
                    <fn:array key='coordinates'>
                        <fn:number>{$long}</fn:number>
                        <fn:number>{$lat}</fn:number>             
                    </fn:array>
                </fn:map>
                <fn:map key='properties'>
                    <fn:string key='name'>{$name}</fn:string>
                    <fn:string key="appears">Appears in {$print-titles}</fn:string>
                </fn:map>
            </fn:map>
    }</fn:array>
</fn:map>;
xml-to-json($map)

We can change the map view focus closer to where our points are clustering by updating the $js-front variable as below:

declare variable $js-front as xs:string := "mapboxgl.accessToken = 'pk.eyJ1IjoiZ2FiaWtlYW5lIiwiYSI6ImNqdWlzYWwxcTFlMjg0ZnBnM21kem9xZm4ifQ.CQ5LDwZO32ryoGVb-QQwCg';
const map = new mapboxgl.Map({
  container: 'map',
  style: 'mapbox://styles/mapbox/light-v10',
  center: [-0.131719, 51.501029],
  zoom: 12
});

";

With this shift, you should be able to see a more centralized view of the places mentioned in the corpus, clustering around London and the surrounding areas when you reload the page. In this case, we just selected a center point that worked, but one could calculate the point if the map needed to be more dynamic.
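
If the center did need to be calculated, one simple approach is to take the bounding box of all the points and use its midpoint. The sketch below assumes a FeatureCollection that contains only Point features and does not cross the 180° meridian; the function name and the coordinates are illustrative, not project data.

```javascript
// Compute the bounding box of a FeatureCollection of Points and its midpoint.
function boundsAndCenter(featureCollection) {
  if (featureCollection.features.length === 0) {
    throw new Error('Cannot compute bounds of an empty FeatureCollection');
  }
  let west = Infinity, south = Infinity, east = -Infinity, north = -Infinity;
  for (const feature of featureCollection.features) {
    // GeoJSON order: [longitude, latitude]
    const [lng, lat] = feature.geometry.coordinates;
    west = Math.min(west, lng);
    east = Math.max(east, lng);
    south = Math.min(south, lat);
    north = Math.max(north, lat);
  }
  return {
    bounds: [[west, south], [east, north]],
    center: [(west + east) / 2, (south + north) / 2]
  };
}

// Illustrative coordinates only.
const points = {
  type: 'FeatureCollection',
  features: [[-1, 51], [1, 52], [0, 51.5]].map(coordinates => ({
    type: 'Feature',
    geometry: { type: 'Point', coordinates },
    properties: {}
  }))
};

const { bounds, center } = boundsAndCenter(points);
console.log(bounds, center);
```

The center value could replace the hard-coded center in $js-front. Alternatively, Mapbox GL JS accepts bounds in this [[west, south], [east, north]] shape in map.fitBounds(), which also chooses a zoom level so that all of the points are visible.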

If you have not followed along to build this, take a closer look at the final view file for a full picture.

Review

Any data visualization conveys some information clearly by obscuring other information. In this map we obscure the article date and publication information, while privileging article titles and the relationship to physical places as a tool for discovery and association. In the search view, we do the opposite: we obscure information related to place and reveal information related to the title, publication, and dates.

This section covered a few new technical topics in a drive-by way. We learned enough information about them to be useful to this project, but we did not become experts on them. Put another way: we learned enough to evaluate the tools against our project goals, and we then learned enough to implement those tools. If this project required long-term maintenance, part of our evaluation would include an assessment of the sustainability costs as well.

Here is a short list of those drive-by topics:

JavaScript and JS libraries
external APIs
GeoJSON
XQuery maps and arrays and the xml-to-json() function

We considered a number of different options for creating a map that was interactive and could aid users in discovering spatialized connections among articles in the corpus. In the end, Mapbox won out because of its relatively small learning curve and low cost, while still maintaining the dynamic and interactive features we needed. The lesson of this stage is not “how to use Mapbox in an eXist-db application,” but, rather, “how to approach tool selection and development” even when you may not be entirely familiar or comfortable with the technical skills required.