Import Update IDs - nomisma/framework GitHub Wiki

The Nomisma administrative interface allows admins to create or update concepts from data stored in Google Spreadsheets. First, the Google spreadsheet must be published to the web. Once it is published, the import XForms application interacts with Google's APIs to retrieve the spreadsheet as an Atom XML feed. The first row must contain a clearly defined heading for every column; mark it as the header row by dragging the separating bar below the first row.
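The Atom feed delivers each spreadsheet row as an entry. The sketch below shows how column headings and row values could be read from such a feed. It assumes the legacy Google Sheets "list" feed format, where each column appears as a `gsx:`-namespaced element named after its heading. Google has since retired those v3 feeds, and the importer's actual parsing code is not shown on this page. The sample feed is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces of the legacy Google Sheets "list" feed (assumption: the importer
# consumed this format; the v3 feeds have since been retired by Google).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "gsx": "http://schemas.google.com/spreadsheets/2006/extended",
}
GSX = "{" + NS["gsx"] + "}"


def read_rows(feed_xml: str) -> list[dict[str, str]]:
    """Return one dict per <entry>, keyed by the gsx column name."""
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall("atom:entry", NS):
        row = {}
        for child in entry:
            if child.tag.startswith(GSX):  # skip atom:title, atom:id, etc.
                row[child.tag[len(GSX):]] = (child.text or "").strip()
        rows.append(row)
    return rows


def column_headings(rows: list[dict[str, str]]) -> list[str]:
    """Collect headings in first-seen order across all rows."""
    seen: list[str] = []
    for row in rows:
        for name in row:
            if name not in seen:
                seen.append(name)
    return seen


# Hypothetical one-row feed used only for illustration.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gsx="http://schemas.google.com/spreadsheets/2006/extended">
  <entry>
    <title>constantinople</title>
    <gsx:nomismaid>constantinople</gsx:nomismaid>
    <gsx:preflabel>Constantinople</gsx:preflabel>
  </entry>
</feed>"""

rows = read_rows(SAMPLE)
print(column_headings(rows))  # ['nomismaid', 'preflabel']
```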

The import process has four phases: (1) selection of the Google spreadsheet and the type of concept being imported; (2) mapping of column headings from the spreadsheet to RDF properties relevant to that concept type; (3) row-by-row validation of the spreadsheet content; and (4) transformation of the spreadsheet into RDF that either A) creates new IDs or B) adds supplemental properties to existing IDs (existing content is not overwritten). The import process saves the RDF/XML to the filesystem and then updates the SPARQL endpoint and the Solr index.

1. Select Spreadsheet and Concept Type

The type of concept must be selected from the drop-down menu. All IDs in a spreadsheet must be of the same concept type (e.g., mint, denomination, etc.). The spreadsheet key must also be entered; it can be copied and pasted from the browser's address bar. The key is the long combination of letters, numbers, hyphens, and underscores between 'https://docs.google.com/spreadsheets/d/' and the following forward slash, e.g., '1C0Q8pq4kuc7K_ZcJmFen5B4L0n-ywX4g9IOjXhoicb0'. Only the first sheet in the spreadsheet is read for import.
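The key extraction described above can be sketched as follows. The helper names are hypothetical. The feed URL template is an assumption based on the legacy public "list" feed, where the worksheet ID `od6` was commonly used to address the first worksheet; it is not taken from the importer's code.

```python
import re

# The key is the path segment after /spreadsheets/d/, up to the next '/'.
KEY_PATTERN = re.compile(r"https://docs\.google\.com/spreadsheets/d/([^/?#]+)")


def extract_key(url_or_key: str) -> str:
    """Accept either a full spreadsheet URL or a bare key (hypothetical helper)."""
    match = KEY_PATTERN.search(url_or_key)
    if match:
        return match.group(1)
    if re.fullmatch(r"[A-Za-z0-9_-]+", url_or_key):
        return url_or_key
    raise ValueError(f"not a spreadsheet URL or key: {url_or_key!r}")


def list_feed_url(key: str) -> str:
    # Assumption: legacy v3 public list feed; 'od6' addressed the first worksheet.
    return f"https://spreadsheets.google.com/feeds/list/{key}/od6/public/values"


url = "https://docs.google.com/spreadsheets/d/1C0Q8pq4kuc7K_ZcJmFen5B4L0n-ywX4g9IOjXhoicb0/edit#gid=0"
print(extract_key(url))  # 1C0Q8pq4kuc7K_ZcJmFen5B4L0n-ywX4g9IOjXhoicb0
```

Accepting a bare key as well as a full URL makes a copy-paste of either form work.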

2. Mapping

If the key is valid, the application loads the spreadsheet as an Atom feed from the Google Spreadsheet API. It gathers the list of column headings and presents a table pairing each heading with the available mappings to RDF properties. The available properties vary by concept type; for example, latitude and longitude are available only for mints. The mapping itself is also validated, and the Spreadsheet Validation button becomes enabled only once the mapping is valid. Warning messages detail any invalid mappings. For example, there cannot be more than one Role, Organization, Start Date, Latitude, or Longitude, and the same language cannot be assigned to more than one Preferred Label or Definition.
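The mapping-level checks can be illustrated with a small sketch. It models a mapping as one `(property, language)` pair per mapped column. The property labels follow the names used on this page; the importer's internal identifiers are not documented here, so these sets are illustrative. The cardinality rules are collected from this page (End Date and Scope Note come from their entries under Other Properties).

```python
from collections import Counter
from typing import Optional

# Rules collected from this page; labels are illustrative, not the importer's IDs.
SINGLE_VALUED = ["Role", "Organization", "Start Date", "End Date", "Latitude", "Longitude"]
ONE_PER_LANGUAGE = {"Preferred Label", "Definition", "Scope Note"}


def mapping_warnings(mapping: list[tuple[str, Optional[str]]]) -> list[str]:
    """mapping holds one (property, language) pair per mapped column;
    language is None for properties that take no language."""
    warnings = []
    counts = Counter(prop for prop, _ in mapping)
    for prop in SINGLE_VALUED:
        if counts[prop] > 1:
            warnings.append(f"{prop} may only be mapped once")
    per_language = Counter(
        (prop, lang) for prop, lang in mapping if prop in ONE_PER_LANGUAGE
    )
    for (prop, lang), n in per_language.items():
        if n > 1:
            warnings.append(f"{prop} is mapped more than once for language '{lang}'")
    return warnings


example = [
    ("Nomisma ID", None),
    ("Preferred Label", "en"),
    ("Preferred Label", "en"),
    ("Latitude", None),
    ("Latitude", None),
]
for warning in mapping_warnings(example):
    print(warning)
```

An empty warning list corresponds to the state in which the validation button becomes enabled.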

It is not necessary to map all columns to RDF properties, but there are three requirements:

Requirements

Nomisma ID: One column must contain the Nomisma ID. The ID may contain only the allowed characters (so it should not be the full URI): lowercase letters, numbers, and the following special characters: -_',.()[]

Preferred Label (skos:prefLabel) (English): There must be an English preferred label. When Preferred Label, Alternative Label, Definition, or Scope Note is selected, a language drop-down menu automatically appears in the mapping table, and a language must be selected.

Definition (skos:definition) (English): There must be an English definition.
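The three requirements above can be expressed as two small checks. The sketch below is illustrative: it assumes English is recorded with the language code `en`, and the helper names are hypothetical. The character class mirrors the allowed-character list given for the Nomisma ID.

```python
import re
from typing import Optional

# Allowed ID characters per the rule above: lowercase letters, digits,
# and the special characters - _ ' , . ( ) [ ]
ID_PATTERN = re.compile(r"[a-z0-9\-_',.()\[\]]+")


def is_valid_id(value: str) -> bool:
    """True if value is a bare Nomisma ID (not a full URI)."""
    return ID_PATTERN.fullmatch(value) is not None


def missing_requirements(mapping: list[tuple[str, Optional[str]]]) -> list[str]:
    """Return unmet requirements; an empty list means all three are met.
    Assumption: English is recorded with the language code 'en'."""
    props = {prop for prop, _ in mapping}
    pairs = set(mapping)
    missing = []
    if "Nomisma ID" not in props:
        missing.append("Nomisma ID column")
    if ("Preferred Label", "en") not in pairs:
        missing.append("English Preferred Label")
    if ("Definition", "en") not in pairs:
        missing.append("English Definition")
    return missing


print(is_valid_id("constantinople"))              # True
print(is_valid_id("http://nomisma.org/id/rome"))  # False: ':' and '/' are not allowed
```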

Other Properties

Alternate Concept (prov:alternateOf): The PROV Ontology defines prov:alternateOf as follows: "Two alternate entities present aspects of the same thing. These aspects may be the same or different, and the alternate entities may or may not overlap in time." This property currently applies only to mints and links together mints that occupy the same place in different fields of numismatics, time periods, or authorities, e.g., nm:constantinople, nm:istanbul, and nm:byzantium. It must be a Nomisma URI.

Alternative Label (skos:altLabel): There may be many alternative label columns. A language is required, but there are no constraints on the number of languages.

Broader Concept (skos:broader): Available for all concept types except Person and Organization, which use the org ontology for these linkages. It must be a Nomisma URI.

Close Match (skos:closeMatch): A similar concept. Used only for nmo:Mint, to link to similar geographic places such as Geonames or Pleiades. It must be a URI beginning with http:// or https://.

Dynasty (org:memberOf): Only available for foaf:Person. There can be multiple entries. It must be a Nomisma URI.

End Date (nmo:hasEndDate): There may not be more than one, but it can be used without an accompanying Start Date. If the concept type is a person or organization, an End Date belongs to an org:Membership, and therefore, a Role is required.

Exact Match (skos:exactMatch): An exactly matching concept in another thesaurus or scheme. Available for all concept types except mint. It must be a URI beginning with http:// or https://.

Field of Numismatics (dcterms:isPartOf): Must be a Nomisma URI.

Latitude (geo:lat): Only available for Mints. Must be a decimal number between -90 and 90. There must be a Longitude.

Longitude (geo:long): Only available for Mints. Must be a decimal number between -180 and 180. There must be a Latitude.

Organization (org:organization): Used to link a Person or Organization to a parent Organization. It must be a Nomisma URI, and it can only be used in conjunction with a Role.

Role (org:role): Available for a Person or Organization. It must be a Nomisma URI.

Scope Note (skos:scopeNote): A note concerning the scope and context of the usage of a concept. There may only be one per language.

Source (dcterms:source): A URI of a bibliographic resource for the concept. It may be a Nomisma URI or an external resource, such as WorldCat.

Start Date (nmo:hasStartDate): There may not be more than one, but it can be used without an accompanying End Date. If the concept type is a person or organization, a Start Date belongs to an org:Membership, and therefore a Role is required.
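The availability rules scattered through the property list above can be gathered into a lookup table. The sketch below checks a set of mapped properties against a concept type. The concept-type names (`"mint"`, `"person"`, `"organization"`) and the function name are illustrative assumptions, not the importer's identifiers.

```python
# Availability rules collected from the property list above.
# Concept-type names are illustrative, not the importer's identifiers.
ONLY_FOR = {
    "Alternate Concept": {"mint"},
    "Close Match": {"mint"},
    "Latitude": {"mint"},
    "Longitude": {"mint"},
    "Dynasty": {"person"},
    "Role": {"person", "organization"},
    "Organization": {"person", "organization"},
}
NOT_FOR = {
    "Broader Concept": {"person", "organization"},
    "Exact Match": {"mint"},
}
# Properties that can only be used together with another property.
NEEDS = {"Latitude": "Longitude", "Longitude": "Latitude", "Organization": "Role"}
MEMBERSHIP_TYPES = {"person", "organization"}


def property_errors(concept_type: str, props: set[str]) -> list[str]:
    errors = []
    for prop in sorted(props):
        allowed = ONLY_FOR.get(prop)
        excluded = NOT_FOR.get(prop, set())
        if (allowed is not None and concept_type not in allowed) or concept_type in excluded:
            errors.append(f"{prop} is not available for {concept_type}")
        partner = NEEDS.get(prop)
        if partner is not None and partner not in props:
            errors.append(f"{prop} requires {partner}")
    # On a person or organization, dates describe an org:Membership,
    # so a Role must also be mapped.
    has_date = bool(props & {"Start Date", "End Date"})
    if concept_type in MEMBERSHIP_TYPES and has_date and "Role" not in props:
        errors.append("Start Date/End Date require a Role for this concept type")
    return errors


print(property_errors("mint", {"Latitude"}))  # ['Latitude requires Longitude']
```

Keeping the rules in data rather than in branching code makes it easier to compare them against this page when the list of properties changes.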

3. Validation

Once a valid mapping has been created, the Validate Spreadsheet button becomes enabled. Clicking it performs a row-by-row validation of the spreadsheet content. The process ensures that URIs in Field of Numismatics, Broader Concept, Organization, Role, and Dynasty are valid and conform to the relevant RDF class; for example, a URI mapped to Dynasty must be an rdac:Family. Latitude and Longitude values are checked as well. Errors encountered during validation are recorded, and the administrator sees a list of errors for each affected row. If there are no errors, the administrator may proceed with the RDF import.
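A row-by-row validation pass that collects errors per row can be sketched as below. Several parts are assumptions. Nomisma concept URIs are assumed to take the form `http://nomisma.org/id/{id}`. The class lookup is an injected callable `has_type(uri, cls)`, which in practice might be a SPARQL ASK query against the endpoint. Only the Dynasty/rdac:Family pairing comes from this page, so other columns are checked only for URI form.

```python
import re
from typing import Callable

# Assumption: Nomisma concept URIs take the form http://nomisma.org/id/{id}.
NOMISMA_URI = re.compile(r"https?://nomisma\.org/id/[a-z0-9\-_',.()\[\]]+")
URI_COLUMNS = ("Field of Numismatics", "Broader Concept", "Organization", "Role", "Dynasty")
# Only Dynasty's expected class is given on this page; extend as appropriate.
EXPECTED_CLASS = {"Dynasty": "rdac:Family"}


def validate_row(row: dict[str, str], has_type: Callable[[str, str], bool]) -> list[str]:
    errors = []
    for col in URI_COLUMNS:
        value = row.get(col, "").strip()
        if not value:
            continue
        if not NOMISMA_URI.fullmatch(value):
            errors.append(f"{col}: {value} is not a Nomisma URI")
        elif col in EXPECTED_CLASS and not has_type(value, EXPECTED_CLASS[col]):
            errors.append(f"{col}: {value} is not of class {EXPECTED_CLASS[col]}")
    # Latitude within +/-90, longitude within +/-180 (WGS84 decimal degrees).
    for col, limit in (("Latitude", 90.0), ("Longitude", 180.0)):
        value = row.get(col, "").strip()
        if not value:
            continue
        try:
            number = float(value)
        except ValueError:
            errors.append(f"{col}: {value} is not a decimal number")
            continue
        if not -limit <= number <= limit:
            errors.append(f"{col}: {value} is out of range")
    return errors


def validate_sheet(rows: list[dict[str, str]], has_type: Callable[[str, str], bool]) -> dict[int, list[str]]:
    """Map 1-based row numbers to their errors; an empty dict means the import may proceed."""
    report = {}
    for i, row in enumerate(rows, start=1):
        errs = validate_row(row, has_type)
        if errs:
            report[i] = errs
    return report
```

Usage with a stand-in type lookup (invented data):

```python
known = {("http://nomisma.org/id/ptolemaic_dynasty", "rdac:Family")}
rows = [
    {"Dynasty": "http://nomisma.org/id/ptolemaic_dynasty", "Latitude": "41.01", "Longitude": "28.98"},
    {"Dynasty": "http://nomisma.org/id/rome", "Latitude": "120"},
]
print(validate_sheet(rows, lambda uri, cls: (uri, cls) in known))
```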

4. Import

If the spreadsheet content is valid, the XForms engine iterates through each row. For each row, it attempts to load the corresponding RDF/XML file from the filesystem; if the file cannot be loaded, the process assumes that it does not exist and creates a new ID from a template. New information is then added to the RDF model. Existing preferred labels, matching URIs, coordinates, and similar values are not overwritten.
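The "add but never overwrite" behavior can be illustrated with a simplified model. The sketch represents a concept as a dict from property to a list of `(value, language)` pairs. It is not the XForms implementation, and the set of single-valued properties is an assumption drawn from the cardinality rules on this page. Multi-valued properties such as matching URIs simply accumulate. Single-valued ones (per language, where a language applies) keep whatever is already there.

```python
from typing import Optional

# A value is (text, language); language is None for URIs and numbers.
Value = tuple[str, Optional[str]]

# Assumption: properties holding at most one value (per language, where one applies).
SINGLE = {
    "skos:prefLabel", "skos:definition", "skos:scopeNote",
    "geo:lat", "geo:long", "nmo:hasStartDate", "nmo:hasEndDate",
}


def merge(existing: dict[str, list[Value]], incoming: dict[str, list[Value]]) -> dict[str, list[Value]]:
    """Add incoming values without overwriting anything already present."""
    merged = {prop: list(values) for prop, values in existing.items()}
    for prop, values in incoming.items():
        current = merged.setdefault(prop, [])
        for text, lang in values:
            if (text, lang) in current:
                continue  # already recorded
            if prop in SINGLE and any(other_lang == lang for _, other_lang in current):
                continue  # keep the existing value for this property (and language)
            current.append((text, lang))
    return merged


existing = {"skos:prefLabel": [("Constantinople", "en")], "geo:lat": [("41.01", None)]}
incoming = {
    "skos:prefLabel": [("Byzance", "en"), ("Konstantinopel", "de")],
    "geo:lat": [("40.0", None)],
    "skos:altLabel": [("Istanbul", "en")],
}
print(merge(existing, incoming))
```

Here the English label and latitude stay as they were, while the German label and the alternative label are added.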

After new properties are added to the RDF model, the RDF is serialized into an XML file and written to the filesystem. The new or updated RDF is then posted to the triplestore, and the document is re-indexed in Solr.
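One way to post a concept's RDF/XML to a triplestore is the SPARQL 1.1 Graph Store HTTP Protocol. The sketch builds, but does not send, a PUT request that replaces a named graph keyed by the concept URI. The endpoint URL and the one-graph-per-concept layout are hypothetical; this page does not say how Nomisma's triplestore is organized or which protocol the importer uses.

```python
import urllib.parse
import urllib.request

# Hypothetical Graph Store endpoint (e.g. a local Fuseki dataset).
GRAPH_STORE = "http://localhost:3030/nomisma/data"


def graph_store_put(concept_uri: str, rdf_xml: bytes) -> urllib.request.Request:
    """Build (but do not send) a Graph Store Protocol PUT that replaces the
    named graph for one concept with its freshly serialized RDF/XML."""
    query = urllib.parse.urlencode({"graph": concept_uri})
    return urllib.request.Request(
        f"{GRAPH_STORE}?{query}",
        data=rdf_xml,
        method="PUT",
        headers={"Content-Type": "application/rdf+xml"},
    )


body = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
req = graph_store_put("http://nomisma.org/id/constantinople", body)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running triplestore.
```

PUT replaces the graph, which suits a workflow where the merged file on disk is the full, current description of the concept. A POST to the same endpoint would instead merge triples into the graph.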

Wikipedia/DBpedia parsing

Wikipedia URIs linked via Exact Match or Close Match that contain a '#' character are interpreted as related web documents rather than concepts, so the rdfs:seeAlso property is applied instead. For other URIs that include 'dbpedia.org' or 'wikipedia.org', the Wikidata API is used to extract labels in other languages and identifiers in other LOD systems. Identifiers from the following systems are parsed and imported into Nomisma RDF: Geonames, Getty thesauri, GND, BnF, ISNI, Freebase, Eagle Project (epigraphy), VIAF, and SUDOC.
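The sketch below illustrates the routing just described: it sends fragment-bearing Wikipedia links to rdfs:seeAlso and, for other Wikipedia/DBpedia URIs, builds a Wikidata `wbgetentities` lookup by site and title. It only constructs the request URL. It assumes that `dbpedia.org` resources correspond to English Wikipedia titles. How the importer then extracts labels and external identifiers from the JSON response is not shown here.

```python
import urllib.parse
from typing import Optional


def match_property(uri: str, prop: str) -> str:
    """Wikipedia links with a '#' fragment are documents, not concepts."""
    if "wikipedia.org" in uri and "#" in uri:
        return "rdfs:seeAlso"
    return prop


def wiki_site_and_title(uri: str) -> Optional[tuple[str, str]]:
    """Return (Wikidata site ID, page title), or None for other URIs.
    Assumption: dbpedia.org resources map to English Wikipedia titles."""
    parsed = urllib.parse.urlparse(uri)
    host = parsed.netloc
    if host.endswith(".wikipedia.org") and parsed.path.startswith("/wiki/"):
        lang = host.split(".")[0]
        title = urllib.parse.unquote(parsed.path[len("/wiki/"):])
        return f"{lang}wiki", title.replace("_", " ")
    if host == "dbpedia.org" and parsed.path.startswith("/resource/"):
        title = urllib.parse.unquote(parsed.path[len("/resource/"):])
        return "enwiki", title.replace("_", " ")
    return None


def wikidata_lookup_url(site: str, title: str) -> str:
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "labels|claims",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode(params)


print(match_property("https://en.wikipedia.org/wiki/Istanbul#History", "skos:exactMatch"))  # rdfs:seeAlso
site_title = wiki_site_and_title("http://dbpedia.org/resource/Byzantine_Empire")
print(wikidata_lookup_url(*site_title))
```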