csv files - VlaamseKunstcollectie/Imagehub-Fixes GitHub Wiki

CSV files with metadata for enrichment and translation

Here is a rundown of every CSV file and how it should be used to enrich the LIDO files coming from the Datahub. Use this info together with the example CSVs to create an ETL pipeline from Datahub to Datahub 2.

200LIDOFiles.csv

This is the first CSV that needs to be read to extract LIDO work IDs from the Datahub. Once this has been done, some of the columns in this CSV, in addition to the other CSVs, need to be used to enrich the extracted LIDO files, after which they need to be deposited in Datahub 2.

This CSV has the following columns:

  • WorkPID: This is the Work PID of the work that the LIDO file extracted from Datahub should refer to
  • Data URN: This URN can be fed to the Datahub to extract its LIDO file, in the format http://datahub.vlaamsekunstcollectie.be/api/v1/data/{DataURN}.xml
  • IsPartof: this is used to refer to another LIDO XML file related to this one. This info should be inserted in the RelatedWorksWrap. An example can be found in the MulLido1Image example folder. The contents of this cell should go into RelatedWork, and the IsPartOf should go in RelatedWorkRelType. example:

<lido:relatedWorkRelType> <lido:conceptID lido:type="URI">http://purl.org/dc/terms/isPartOf</lido:conceptID> <lido:term xml:lang="en">is part of</lido:term> </lido:relatedWorkRelType>

xpaths:

descriptiveMetadata/objectRelationWrap/relatedWorksWrap/relatedWorkSet/relatedWork/object/objectID

descriptiveMetadata/objectRelationWrap/relatedWorksWrap/relatedWorkSet/relatedWorkRelType/conceptID

descriptiveMetadata/objectRelationWrap/relatedWorksWrap/relatedWorkSet/relatedWorkRelType/term

  • HasPart: this is used to refer to another LIDO XML file related to this one. This info should be inserted in the RelatedWorksWrap. An example can be found in the MulLido1Image example folder. The contents of this cell should go into RelatedWork, and the HasPart should go in RelatedWorkRelType. example:

<lido:relatedWorkRelType> <lido:conceptID lido:type="URI">http://purl.org/dc/terms/hasPart</lido:conceptID> <lido:term xml:lang="en">has part</lido:term> </lido:relatedWorkRelType>

  • nextinsequence: this is used to refer to another LIDO XML file related to this one. This info should be inserted in the RelatedWorksWrap. An example can be found in the MulLido1Image example folder. The contents of this cell should go into RelatedWork, and the nextInSequence should go in RelatedWorkRelType. There is no Dublin Core metadata term for this, so it should be treated as a local term. Example:

<lido:relatedWorkRelType> <lido:conceptID lido:type="local">nextInSequence</lido:conceptID> <lido:term xml:lang="en">followed by</lido:term> </lido:relatedWorkRelType>
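The three relation columns follow the same pattern, so they can be handled by one routine. The following is a minimal sketch, not the project's implementation. It assumes the column headers are spelled as in the example row (IspartOf, HasPart, nextinsequence), that a cell may hold several Work PIDs separated by semicolons (as the example row suggests), and that the LIDO namespace URI is http://www.lido-schema.org. The names `RELATION_TYPES`, `split_cell` and `build_related_work_sets` are made up for illustration.

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("lido", LIDO_NS)

# CSV column -> (conceptID type, conceptID value, English term),
# taken from the relatedWorkRelType examples in the text.
RELATION_TYPES = {
    "IspartOf": ("URI", "http://purl.org/dc/terms/isPartOf", "is part of"),
    "HasPart": ("URI", "http://purl.org/dc/terms/hasPart", "has part"),
    "nextinsequence": ("local", "nextInSequence", "followed by"),
}


def lido(tag):
    """Return the namespace-qualified name of a LIDO element or attribute."""
    return f"{{{LIDO_NS}}}{tag}"


def split_cell(cell):
    """Split a cell on semicolons; empty or missing cells give an empty list."""
    return [value.strip() for value in (cell or "").split(";") if value.strip()]


def build_related_work_sets(row):
    """Build one relatedWorkSet element per related Work PID in a CSV row."""
    sets = []
    for column, (id_type, concept, label) in RELATION_TYPES.items():
        for work_pid in split_cell(row.get(column)):
            work_set = ET.Element(lido("relatedWorkSet"))
            related = ET.SubElement(work_set, lido("relatedWork"))
            obj = ET.SubElement(related, lido("object"))
            # The text does not say which attributes objectID should carry,
            # so none are set here.
            ET.SubElement(obj, lido("objectID")).text = work_pid
            rel_type = ET.SubElement(work_set, lido("relatedWorkRelType"))
            concept_id = ET.SubElement(
                rel_type, lido("conceptID"), {lido("type"): id_type}
            )
            concept_id.text = concept
            term = ET.SubElement(rel_type, lido("term"), {XML_LANG: "en"})
            term.text = label
            sets.append(work_set)
    return sets
```

The returned elements would still need to be appended to the relatedWorksWrap of the LIDO file being enriched.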

  • copyright status: The copyright status is a URL referring to either a Creative Commons license or a Rightsstatements.org license. It should be recorded in the rightsResource tags of the relevant resource. The copyright status applies to the resource (image) of the work, not to the work itself or to the LIDO file. An example of how this should be recorded:

<lido:rightsResource> <lido:rightsType> <lido:conceptID lido:type="URI" lido:source="Creative Commons">https://creativecommons.org/publicdomain/zero/1.0/</lido:conceptID> <lido:term>CC0</lido:term> </lido:rightsType> </lido:rightsResource>

Xpaths:

administrativeMetadata/resourceWrap/resourceSet/rightsResource/rightsType/conceptID

administrativeMetadata/resourceWrap/resourceSet/rightsResource/rightsType/term

  • LUKAS foto ID: This is the number used by LUKASWeb to identify its resources. There can be one or multiple LUKAS photos associated with a LIDO file. If there are multiple photos associated with the same LIDO file, the row will be duplicated in the CSV, but will have a different LUKAS photo ID. Every photo ID should be added as a separate resource in the LIDO file. An example of how this should be recorded is:

<lido:resourceSet> <lido:resourceID lido:type="local">0030251001</lido:resourceID> <lido:resourceSource lido:type="holder of image"> <lido:legalBodyName> <lido:appellationValue> Lukas, Arts in Flanders </lido:appellationValue> </lido:legalBodyName> </lido:resourceSource> <lido:rightsResource> <lido:rightsType> <lido:conceptID lido:type="URI" lido:source="Creative Commons">https://creativecommons.org/publicdomain/zero/1.0/</lido:conceptID> <lido:term>CC0</lido:term> </lido:rightsType> </lido:rightsResource> </lido:resourceSet>

Xpaths:

administrativeMetadata/resourceWrap/resourceSet/resourceID

administrativeMetadata/resourceWrap/resourceSet/resourceSource/legalBodyName/appellationValue
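The copyright status and the LUKAS photo ID end up in the same resourceSet, so one helper can build both. This is a sketch under stated assumptions: the LIDO namespace URI is http://www.lido-schema.org, the human-readable term (such as "CC0") is supplied by the caller because the CSV only holds the URL, and the `lido:source` label for Rightsstatements.org is a guess, since only "Creative Commons" appears in the examples. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

LIDO_NS = "http://www.lido-schema.org"
ET.register_namespace("lido", LIDO_NS)

# Host -> lido:source label. Only "Creative Commons" comes from the text;
# the Rightsstatements.org label is an assumption.
LICENSE_SOURCES = {
    "creativecommons.org": "Creative Commons",
    "rightsstatements.org": "Rightsstatements.org",
}


def lido(tag):
    """Return the namespace-qualified name of a LIDO element or attribute."""
    return f"{{{LIDO_NS}}}{tag}"


def license_source(url):
    """Derive the source label from the license URL host, or None if unknown."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return LICENSE_SOURCES.get(host)


def build_resource_set(photo_id, license_url, license_term):
    """Build a resourceSet for one LUKAS photo with its copyright status."""
    resource_set = ET.Element(lido("resourceSet"))
    resource_id = ET.SubElement(
        resource_set, lido("resourceID"), {lido("type"): "local"}
    )
    resource_id.text = photo_id

    source = ET.SubElement(
        resource_set, lido("resourceSource"), {lido("type"): "holder of image"}
    )
    body = ET.SubElement(source, lido("legalBodyName"))
    ET.SubElement(body, lido("appellationValue")).text = "Lukas, Arts in Flanders"

    rights = ET.SubElement(resource_set, lido("rightsResource"))
    rights_type = ET.SubElement(rights, lido("rightsType"))
    attributes = {lido("type"): "URI"}
    label = license_source(license_url)
    if label:
        attributes[lido("source")] = label
    concept = ET.SubElement(rights_type, lido("conceptID"), attributes)
    concept.text = license_url
    ET.SubElement(rights_type, lido("term")).text = license_term
    return resource_set


def build_resource_sets(photo_ids, license_url, license_term):
    """Build one resourceSet per photo ID, as the text requires."""
    return [build_resource_set(pid, license_url, license_term) for pid in photo_ids]
```

Each returned element would be appended to administrativeMetadata/resourceWrap of the LIDO file.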

An example row of this CSV:

Work PID Data URN IspartOf HasPart nextinsequence copyright status LUKAS foto id
http://kmska.be/collection/work/id/523-525bis oai:datahub.vlaamsekunstcollectie.be:kmska.be:523-525bis http://kmska.be/collection/work/id/524 ; http://kmska.be/collection/work/id/523 ; http://kmska.be/collection/work/id/525 ; http://kmska.be/collection/work/id/525bis https://creativecommons.org/publicdomain/zero/1.0/ 0030717000
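Reading this CSV involves two small steps: turning a Data URN into a Datahub URL, and collapsing duplicated rows so that each LIDO file is fetched once with all of its photo IDs. The sketch below assumes a comma-delimited file with a header row spelled as in the example above; the actual delimiter and header spelling should be checked against the example CSVs. `datahub_url` and `group_by_urn` are hypothetical names.

```python
import csv

# URL format as given in the "Data URN" column description.
DATAHUB_URL = "http://datahub.vlaamsekunstcollectie.be/api/v1/data/{urn}.xml"


def datahub_url(data_urn):
    """Return the URL from which the LIDO file for a Data URN can be fetched."""
    return DATAHUB_URL.format(urn=data_urn)


def group_by_urn(lines):
    """Group CSV rows by Data URN and collect every LUKAS photo ID per work.

    `lines` is any iterable of CSV lines, for example an open file.
    The first row seen for a URN is kept for the remaining columns,
    since duplicated rows differ only in their photo ID.
    """
    works = {}
    for row in csv.DictReader(lines):
        urn = row["Data URN"].strip()
        work = works.setdefault(urn, {"row": row, "photo_ids": []})
        photo_id = (row.get("LUKAS foto id") or "").strip()
        if photo_id and photo_id not in work["photo_ids"]:
            work["photo_ids"].append(photo_id)
    return works
```

With an open file this could be used as `works = group_by_urn(open("200LIDOFiles.csv", newline="", encoding="utf-8"))`, where the encoding is also an assumption.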

200titles.csv

This contains a row for every row in 200LIDOFiles.csv. Where LIDO WorkPIDs were duplicated in 200LIDOFiles.csv, the same has been done here. Every row has a title in both Dutch and English. Origin LIDO files come in two forms. In the first, titles in different languages are formatted like this:

<lido:objectIdentificationWrap> <lido:titleWrap> <lido:titleSet> <lido:appellationValue lido:pref="preferred">Laatste Oordeel</lido:appellationValue> <lido:appellationValue lido:pref="alternate">Le Jugement Dernier</lido:appellationValue> <lido:appellationValue lido:pref="alternate">Last Judgement</lido:appellationValue> <lido:appellationValue lido:pref="alternate">Das Jüngste Gericht</lido:appellationValue> </lido:titleSet>

In this case, for the "nl" descriptiveMetadata wrapper only the Dutch version needs to be kept, with xml:lang set to "nl" and lido:pref="preferred" kept. The "en" descriptiveMetadata wrapper gets the same treatment, but with the English title, xml:lang="en" and lido:pref="preferred". In the other case, the origin LIDO XML only has a title in a single language, mostly Dutch:

<lido:titleWrap> <lido:titleSet> <lido:appellationValue lido:pref="preferred" xml:lang="nl">Annunciatie</lido:appellationValue> <lido:sourceAppellation xml:lang="nl">342</lido:sourceAppellation> </lido:titleSet> </lido:titleWrap>

In this case, keep the title and its attributes as is for the "nl" descriptiveMetadata wrapper. In the "en" descriptiveMetadata wrapper, put the same element with xml:lang="en" and the English title replacing the Dutch one, and leave out lido:sourceAppellation. It is possible that no translation of a title is present. In that case, copy the Dutch title into the "en" descriptiveMetadata wrapper and keep xml:lang="nl".

example row:

WorkPID nl en
http://mskgent.be/collection/work/id/1998-B-112 De wraak van Hop-Frog Hop-Frog's Revenge

Xpath: descriptiveMetadata/objectIdentificationWrap/titleWrap/titleSet/appellationValue
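The rules for the "en" wrapper, including the fallback to the Dutch title, can be sketched as follows. This is an illustration only: it assumes a row is available as a dict with the keys nl and en (as in the example row), that the LIDO namespace URI is http://www.lido-schema.org, and the function names are made up. It builds a fresh titleSet for the "en" wrapper and does not touch the "nl" wrapper.

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("lido", LIDO_NS)


def lido(tag):
    """Return the namespace-qualified name of a LIDO element or attribute."""
    return f"{{{LIDO_NS}}}{tag}"


def pick_en_title(row):
    """Return (title, xml_lang) for the "en" wrapper.

    If there is no English translation, the Dutch title is used and
    keeps xml:lang="nl", as described in the text.
    """
    english = (row.get("en") or "").strip()
    if english:
        return english, "en"
    return row["nl"].strip(), "nl"


def build_en_title_set(row):
    """Build the titleSet for the "en" wrapper, without sourceAppellation."""
    title, lang = pick_en_title(row)
    title_set = ET.Element(lido("titleSet"))
    value = ET.SubElement(
        title_set,
        lido("appellationValue"),
        {lido("pref"): "preferred", XML_LANG: lang},
    )
    value.text = title
    return title_set
```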

200descriptions.csv

Like 200titles.csv, this CSV has a row for every row in 200LIDOFiles.csv. For every row there is description text in Dutch, and one in English. LIDO files should originally show descriptions like this:

<lido:objectDescriptionWrap> <lido:objectDescriptionSet> <lido:descriptiveNoteValue xml:lang="nl">Oorspronkelijk vormden de vier tafereeltjes (inv. nr. 257-260) twee panelen, die later doorgezaagd zijn. Op de achterkant van de "Calvarie" was "Maria" afgebeeld, en "Kruisafneming" vormde één paneel met "Gabriël". De achterzijde vormt aldus een voorstelling van de Annunciatie: Maria schrikt op van de engel die haar de boodschap komt brengen.&#13; De werken zijn klein, waarschijnlijk gaat het om een reisaltaar. (Sandra Janssens, Museumboek)</lido:descriptiveNoteValue> </lido:objectDescriptionSet> </lido:objectDescriptionWrap>

The structure should be kept as is, copied into the "en" part of descriptivemetadataWrap, and changed so that the content becomes the English description and xml:lang="en".

example row:

WorkPID nl en
http://kmska.be/collection/work/id/524 "Achter de donator St. Sebastiaan met het wapen van zijn gilde op zijn kledij. Aan een boom het wapen van de donator: sabel met gouden adelaar. Op de lijst onderaan bevond zich, volgens vorige publicaties de tekst: Adi 15 iunii etatis 32 a° 1515. Thans enkel nog: Adi 15 iunni op geschilderde boord onderaan. (Catalogus KMSKA Schilderkunst Oude Meesters)" "Behind the donator St. Sebastian with the arms of his guild on his clothes. A boom arm of the donator: saber with golden eagle. The list below was located, according to previous publications Text: Adi 15 iunii etatis a 32 ° 1515. Now just yet: Adi 15 iunni on painted hem. (Catalog KMSKA Old Masters Paintings)"

Xpath: descriptiveMetadata/objectIdentificationWrap/objectDescriptionWrap/objectDescriptionSet/descriptiveNoteValue
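Copying the description into the "en" wrapper amounts to a deep copy followed by a text and attribute swap. A minimal sketch, assuming the LIDO namespace URI is http://www.lido-schema.org and that the English text has already been looked up in 200descriptions.csv; `translate_description` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("lido", LIDO_NS)


def translate_description(description_wrap, english_text):
    """Return an English copy of an objectDescriptionWrap element.

    The original element is left untouched so it can stay in the
    "nl" wrapper; the copy is meant for the "en" wrapper.
    """
    en_wrap = copy.deepcopy(description_wrap)
    for note in en_wrap.iter(f"{{{LIDO_NS}}}descriptiveNoteValue"):
        note.text = english_text
        note.set(XML_LANG, "en")
    return en_wrap
```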

periodtranslations.csv

This CSV has the translations for all the periods that are mentioned in the 200 LIDO files we're using. In original files periods are described as follows:

<lido:periodName> <lido:conceptID lido:type="local" lido:source="Adlib">20000150</lido:conceptID> <lido:term xml:lang="nl">15de eeuw</lido:term> </lido:periodName>

One or more periods can be defined. For every period, copy this metadata over to the "en" descriptiveMetadata, keep the same structure, and change the content to the English form with xml:lang="en".

example row:

nl en
16de eeuw 16th century

Xpath: descriptiveMetadata/eventWrap/eventSet/event[eventType='production']/periodName/term
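Unlike titles and descriptions, periods are translated by looking up the Dutch term rather than the Work PID. The sketch below assumes a comma-delimited CSV with the headers nl and en, and the LIDO namespace URI http://www.lido-schema.org. The text does not say what to do when a period has no translation; leaving the Dutch term unchanged is an assumption made here. Function names are hypothetical.

```python
import copy
import csv
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"
NS = {"lido": LIDO_NS}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("lido", LIDO_NS)


def load_translations(lines):
    """Read nl/en pairs from an iterable of CSV lines into a dict."""
    return {
        row["nl"].strip(): row["en"].strip()
        for row in csv.DictReader(lines)
    }


def translate_periods(event, translations):
    """Return English copies of every periodName under an event element.

    Terms without a known translation are left as they are (assumption).
    """
    en_periods = []
    for period in event.findall("lido:periodName", NS):
        en_period = copy.deepcopy(period)
        for term in en_period.findall("lido:term", NS):
            english = translations.get((term.text or "").strip())
            if english:
                term.text = english
                term.set(XML_LANG, "en")
        en_periods.append(en_period)
    return en_periods
```

The conceptID inside each periodName is copied unchanged, since only the term is language-dependent.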

legalbodynametranslations.csv

This CSV holds the translations for the names of the institutions. The institution name is usually present in the LIDO file, but not always. If it is missing, create it for both the Dutch and English versions. If it is present, it will have this structure:

<lido:repositorySet> <lido:repositoryName> <lido:legalBodyName> <lido:appellationValue>Musea Brugge - Groeningemuseum</lido:appellationValue> </lido:legalBodyName> </lido:repositoryName> </lido:repositorySet>

Copy this over to the "en" descriptiveMetadata wrap, keep the structure, and change the contents of appellationValue to the English version.

example row:

nl en
Museum voor Schone Kunsten Gent Museum of Fine Arts Ghent

Xpath: descriptiveMetadata/objectIdentificationWrap/repositoryWrap/repositorySet/repositoryName/legalBodyName/appellationValue
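Because the institution name may be missing, the write step has to create any absent elements along the path before setting the text. A minimal sketch, assuming the LIDO namespace URI is http://www.lido-schema.org and that the caller supplies the name to write (Dutch for the "nl" wrapper, the translation for the "en" wrapper); `set_repository_name` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"
NS = {"lido": LIDO_NS}
ET.register_namespace("lido", LIDO_NS)

# Element path below objectIdentificationWrap, from the xpath in the text.
REPOSITORY_PATH = (
    "repositoryWrap",
    "repositorySet",
    "repositoryName",
    "legalBodyName",
    "appellationValue",
)


def set_repository_name(identification_wrap, name):
    """Set the institution name, creating missing elements on the way down.

    Missing elements are appended as the last child of their parent.
    If the target schema enforces a child order, the new repositoryWrap
    may have to be moved to the right position afterwards.
    """
    node = identification_wrap
    for tag in REPOSITORY_PATH:
        child = node.find(f"lido:{tag}", NS)
        if child is None:
            child = ET.SubElement(node, f"{{{LIDO_NS}}}{tag}")
        node = child
    node.text = name
    return node
```

Calling it on a wrapper that already has a name simply overwrites the existing appellationValue, which covers the translation case as well.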
