Places
Roman provinces
Names of these provinces are values in the Provincia column in the site CSV data:
- Carthaginensis (1211)
- Baetica (1148)
- Tarraconensis (867)
- Lusitania (801)
- Gallaecia (426)
- Baleares (52)
Wikidata query for Roman provinces in Hispania (does not include Baleares)
Present-day administrative units in Iberia
Names of these units are values in the Municipality column in the site CSV data:
- Córdoba (84)
- Mérida (61)
- Tarragona (55)
- Málaga (49)
- Seville (48)
- Palma del Río (43)
- missing municipality (41)
- Alcalá de Henares (41)
- Écija (37)
- Carmona (36)
- 2,164 other municipalities (4010)
Wikidata query for the municipalities of Spain
Wikidata query for the municipalities of Portugal
Building digital gazetteers through entity reconciliation in OpenRefine
One of the most perplexing challenges encountered in this aspect of the project was dealing with entities in the source data (provided by Dr. Gruber) within the municipalities column that did not correspond to true municipalities. Of the 1,836 unique municipality entities in the dataset, over 530 were non-municipality place entities. These varied widely and included human settlements, civil parishes, cities, electoral districts, historical neighborhoods, towns, historical regions, former provinces, localities, concejos (the Spanish word for councils), heritage sites, villages, and more. To preserve the integrity of the source data, these entities were reconciled against Wikidata, the Virtual International Authority File (VIAF), and GeoNames using OpenRefine. In the future, it might be beneficial to rename the "municipality" column to reflect the presence of non-municipality entities.
While entities from Wikidata generally aligned well with the source data, there were a considerable number of false matches arising from identical Iberian place names in other countries, particularly in former Spanish and Portuguese colonies such as Mexico, Brazil, and the Philippines. This issue of duplicate toponyms necessitated manual inspection and matching. Additionally, the source data featured exonyms and variant spellings of place names, which also required manual matching, often using the archaeological site geocoordinates for disambiguation.
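A minimal sketch of the kind of coordinate check that can support this disambiguation, assuming candidate coordinates have already been retrieved from Wikidata (the place names and coordinates below are illustrative, not project data):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Illustrative candidates sharing the toponym "Córdoba".
candidates = [
    {"label": "Córdoba (Spain)", "lat": 37.88, "lon": -4.78},
    {"label": "Córdoba (Argentina)", "lat": -31.42, "lon": -64.18},
]

# Illustrative archaeological site coordinates from the site CSV.
site_lat, site_lon = 37.9, -4.8

# Pick the candidate nearest to the site.
best = min(candidates, key=lambda c: haversine_km(site_lat, site_lon, c["lat"], c["lon"]))
print(best["label"])
```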
GeoNames was anticipated to be an excellent reconciliation source. However, because the data contain varying levels of administrative divisions, a fair amount of manual matching was still required; in GeoNames, all Spanish and Portuguese municipalities are classified as ADM2 (second-order administrative divisions). The VIAF matching process (using the Geographic Names Authority File) was aided by the Wikidata IDs obtained in the first reconciliation step, which streamlined the reconciliation. Despite this, the method was more prone to errors than the other two data sources (Wikidata and GeoNames) and ultimately was the least successful of the three.
Extracting Descriptions from Wikidata
Most Wikidata entities have a description field: a short descriptive phrase that clarifies what the item is. We wanted to display this on each municipality page, so we explored two approaches for extracting the description field.
- Direct API Requests with JSON Parsing
The more effective approach involved using the Wikidata API (wbgetentities) to retrieve structured data in JSON format. This method returned the full entity metadata, including the description field, along with labels and aliases for a given QID. I used scripts to parse these JSON responses and extract the relevant descriptions for use in the gazetteer, which was particularly useful for enhancing human readability and aiding manual review during reconciliation. A Python equivalent is sketched after this list.
In OpenRefine, the request URL was built with the following GREL (General Refine Expression Language) expression:
```
'https://www.wikidata.org/w/api.php?action=wbgetentities&props=descriptions&languages=en&ids=' + value + '&format=json'
```
and the description was then extracted with:
```
row.cells["description_json"].value.parseJson().entities[row.cells["municipality_wiki_id"].value].descriptions.en.value
```
- Wikidata Query Service (SPARQL) is a better option (recommended by Prof. Shaw)
The Query Service can return descriptions directly, filtering by language if necessary; see the SPARQL sketch after this list.
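Outside OpenRefine, the wbgetentities request and JSON parsing from the first item can be scripted. A minimal Python sketch, not the project's actual script; the QID passed at the bottom is illustrative:

```python
import requests

def get_description(qid, lang="en"):
    """Fetch a Wikidata entity's description via the wbgetentities API."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbgetentities",
            "props": "descriptions",
            "languages": lang,
            "ids": qid,
            "format": "json",
        },
    )
    resp.raise_for_status()
    data = resp.json()
    # Mirrors the GREL path: entities[<qid>].descriptions.<lang>.value
    return data["entities"][qid]["descriptions"].get(lang, {}).get("value")

print(get_description("Q5818"))  # illustrative QID
```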
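For the SPARQL route, descriptions can be pulled in bulk from the Wikidata Query Service. A hedged sketch of what such a query might look like, wrapped in Python; the QIDs in the VALUES clause are illustrative:

```python
import requests

# Descriptions are exposed as schema:description and can be filtered by language.
query = """
SELECT ?item ?desc WHERE {
  VALUES ?item { wd:Q5818 wd:Q8717 }   # illustrative QIDs
  ?item schema:description ?desc .
  FILTER(LANG(?desc) = "en")
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "fall-of-rome-gazetteer/0.1"},  # illustrative user agent
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row["desc"]["value"])
```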
Present-day geometries with geoBoundaries
Currently, we are using GeoJSON files courtesy of geoBoundaries to define the polygons on a map for each present-day administrative region. However, there have been some challenges in using these geometries. There are no independent IDs in the data, meaning that the polygons need to be matched by name. There is also inconsistent use of diacritics in the names themselves: the files for Spanish regions use them while the files for Portuguese regions do not. GeoJSON also does not currently work well with SPARQL, so the geometries have to be translated to the wktLiteral format.
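One way to perform the GeoJSON-to-wktLiteral translation is with the shapely library. A minimal sketch, assuming a geoBoundaries ADM2 file and its shapeName property; the filename is illustrative:

```python
import json
from shapely.geometry import shape

# Illustrative filename; geoBoundaries ADM2 releases ship one GeoJSON per country.
with open("geoBoundaries-ESP-ADM2.geojson", encoding="utf-8") as f:
    collection = json.load(f)

# Convert each polygon to WKT, keyed by name, since the data carry no
# independent IDs and matching has to happen by name.
wkt_by_name = {}
for feature in collection["features"]:
    name = feature["properties"].get("shapeName")  # geoBoundaries naming convention
    geom = shape(feature["geometry"])              # shapely geometry from GeoJSON
    wkt_by_name[name] = geom.wkt                   # usable as a geo:wktLiteral in SPARQL

print(list(wkt_by_name)[:5])
```

In practice, a diacritic-insensitive comparison (for example, Unicode normalization before matching) may be needed to reconcile the Spanish and Portuguese name spellings mentioned above.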
Analytic regions
Names of these regions are values in the New Regions column in the site CSV data:
- North Meseta (453)
- Ebro Valley (428)
- South Meseta (410)
- Atlantic Façade (349)
- Valencian Coast (348)
- Strait of Gibraltar (341)
- Coastal Tarraconensis (333)
- Greater Emerita (328)
- Hispalis (319)
- Carthaginensis Coast (279)
- Upper Guadalquivir (251)
- Middle Guadalquivir (220)
- Northern Façade (204)
- South Lusitania (137)
- Balearics (67)
- Carthaginensis coast (29)
- Tarraconensis Coast (5)
- Northern Coast (2)
- Northern façade (1)
- Tarraconensis coast (1)
Future work
- Programmatically creating snapshots of the geometry of each place in order to display a map image of each region on its page.
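A rough illustration of how such snapshots could be generated, assuming a region's geometry is available as a single GeoJSON feature; the filenames are illustrative:

```python
import json
from shapely.geometry import shape
import matplotlib.pyplot as plt

# Illustrative input: one region's geometry saved as a GeoJSON feature.
with open("example-region.geojson", encoding="utf-8") as f:
    feature = json.load(f)

geom = shape(feature["geometry"])
fig, ax = plt.subplots(figsize=(4, 4))

# Handle both Polygon and MultiPolygon; interior rings (holes) are ignored here.
polygons = geom.geoms if geom.geom_type == "MultiPolygon" else [geom]
for poly in polygons:
    xs, ys = poly.exterior.xy
    ax.fill(xs, ys, facecolor="#d9c9a3", edgecolor="#6b4e2e", linewidth=0.8)

ax.set_aspect("equal")
ax.axis("off")
fig.savefig("example-region.png", dpi=150, bbox_inches="tight")
```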