Getting Started with Triple Parsing - strohne/Facepager GitHub Wiki

Reading time: 3 minutes

In this Getting Started guide you will learn how to utilise the new RDF triple parsing feature in Facepager. As a showcase, we will collect Turtle triples from the Getty Thesaurus of Geographic Names Online (TGN).

Facepager's integrated triple parser extracts all triples from TTL (Turtle), XML (eXtensible Markup Language), or JSON-LD (JavaScript Object Notation for Linked Data) files and converts them into a table of simple triples for further processing. This is helpful because RDF (Resource Description Framework) represents data as triples, each consisting of a subject, a predicate, and an object. If you want to know more about RDF and how information stored in triples can be extracted from a SPARQL endpoint, check out our Getting Started with SPARQL.
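To make the idea of "a table of simple triples" concrete, here is a minimal Python sketch that works under a few stated assumptions. It is not Facepager's parser. It only handles N-Triples, the line-based subset of Turtle in which every statement spells out full terms and ends with " .". The function name `parse_ntriples` is hypothetical. Full Turtle (with @prefix declarations and ";" or "," abbreviations), RDF/XML, and JSON-LD would need a proper RDF library.

```python
import re
from typing import List, Tuple

# Hypothetical sketch, not Facepager's parser: handles N-Triples only,
# i.e. one complete "subject predicate object ." statement per line.

# One RDF term: an IRI (<...>), a blank node (_:label), or a literal
# ("..." with an optional @language tag or ^^<datatype IRI>).
_TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
_STATEMENT = re.compile(r"^" + _TERM + r"\s+" + _TERM + r"\s+" + _TERM + r"\s*\.$")

Row = Tuple[str, str, str]


def parse_ntriples(text: str) -> List[Row]:
    """Turn N-Triples text into (subject, predicate, object) table rows."""
    rows: List[Row] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = _STATEMENT.match(line)
        if match is None:
            # Anything else (e.g. Turtle @prefix lines) is out of scope here.
            raise ValueError(f"not a simple N-Triples statement: {line!r}")
        subject, predicate, obj = match.groups()
        rows.append((subject, predicate, obj))
    return rows


# Made-up sample data using example.org IRIs, not actual TGN output.
sample = """
<http://example.org/place/1> <http://example.org/name> "Example"@en .
<http://example.org/place/1> <http://example.org/partOf> _:region .
"""
for row in parse_ntriples(sample):
    print("\t".join(row))
```

Each statement becomes one row with three columns. This is the flat shape that makes RDF data easy to sort, filter, and export, whatever notation the source file used.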

The TGN is a structured vocabulary as well as a knowledge base created and maintained by the Getty Research Institute. It provides information on geographic places, including current and historical names, locations, and place types, with a focus on art, architecture, and material culture. It targets researchers as well as educational and historically oriented projects, and it aims to standardise geographic data, support cultural heritage projects, and improve search capabilities. See the Getty's guide on how to use the TGN to understand the database's capabilities.

How to fetch triples of geographical names

In this brief example, we will fetch all information the TGN holds about the city of Milan. While you can look up this information using the TGN's web interface, Facepager's advantage is that it lets you conveniently fetch data for several places at once and export it as you see fit. To begin:

  1. Create a database: Click New Database in the Menu Bar of Facepager to create a blank database. Save it in a directory of your choice.

  2. Set up the Generic module: From the Presets tab in the Menu Bar, select and apply the Knowledge Graph preset "Getty: Fetch triples of geographical names". The Generic module in the Query Setup will refresh automatically. Notice that the base path is now set to call the TGN's vocabulary and contains the seed node placeholder <Object ID>. At this point, no further parameters are required. Importantly, however, make sure to select the format of the triples you want to fetch in the Response drop-down menu. As we are fetching Turtle triples (note that the base path ends in ".ttl"), the ttl format is preselected.

  3. Add nodes: Before fetching data, you need to provide one or more seed nodes, which fill in the placeholder when fetching. To do so, select Add Nodes in the Menu Bar. In the dialogue box that opens, enter the ID of any TGN entry of interest to you (e.g., "7005903" for Milan). Include as many nodes as you like.

  4. Fetch data: Select one or more seed nodes, then hit Fetch Data at the bottom of the Query Setup. Facepager will now fetch data based on your setup. If you followed the steps above, the final URL should be https://vocab.getty.edu/tgn/7005903.ttl. Once finished, you can inspect all subjects, predicates, and objects by expanding your seed node or clicking Expand nodes in the Menu Bar. Web views of Turtle or other RDF data can be complex, whereas Facepager extracts all simple triples and displays them as a table. Please be aware that a large number of new nodes may be generated due to the general nature of the preset.

  5. Export data: Expand all nodes and select the ones you want to export. Hit Export Data to get a CSV file. Note the options provided by the export dialogue. You can open CSV files with Excel or any statistics software you like.
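Facepager carries out steps 2 to 5 through its interface. To show what happens behind the scenes, here is a minimal Python sketch that uses only the standard library. It assumes the preset's base path is https://vocab.getty.edu/tgn/<Object ID>.ttl, inferred from the final URL in step 4. The helpers `build_urls`, `fetch_text`, and `to_csv` are hypothetical, as is the four-column CSV layout. None of them reflects Facepager's internals or its actual export format.

```python
import csv
import io
import urllib.request
from typing import Iterable, List, Tuple

# Hypothetical helpers, not Facepager's API. The base path is an assumption
# reconstructed from the final URL the preset produces for Milan.
BASE_PATH = "https://vocab.getty.edu/tgn/<Object ID>.ttl"


def build_urls(seed_ids: Iterable[str], base_path: str = BASE_PATH) -> List[str]:
    """Fill the <Object ID> placeholder with each seed node ID (steps 3 and 4)."""
    return [base_path.replace("<Object ID>", seed_id.strip()) for seed_id in seed_ids]


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download one response body as text; requires network access."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def to_csv(rows: Iterable[Tuple[str, str, str, str]]) -> str:
    """Serialise (seed, subject, predicate, object) rows as CSV text (step 5)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["seed", "subject", "predicate", "object"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    for url in build_urls(["7005903"]):
        print(url)  # https://vocab.getty.edu/tgn/7005903.ttl
```

Calling `fetch_text(build_urls(["7005903"])[0])` would download the Turtle document for Milan, which requires network access. Turning that Turtle into subject, predicate, and object rows is the part Facepager's triple parser takes care of for you.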

What's next?
