Decisions Index: Mapping & Transformation - crystalyragui/MARC2RDA GitHub Wiki
I.A.1. rdam:P30103 "has exemplar of manifestation"
I.A.1.a. This mapping and transform mints a distinct rda:Item for each field indicating item-specific data, such as $5, even when they occur with the same values within the same MARC record. This avoids conflating distinct items within the same collection, but runs the risk of minting redundant rda:Item entities and IRIs when only a single item exists. Manual reconciliation after conversion at the institution level is recommended.
I.A.2. rdam:P30134 "title of manifestation"
I.A.2.a. Inconsistent application of punctuation and MARC subfielding rules creates messy data. The transformation has been written to accommodate the majority of cases. Manual review is suggested where manifestation titles include ISBD punctuation such as " = " or " ; ".
I.A.3.a. In Phase I, URIs are minted by appending a string that mimics the access point to a stub URI, and Manifestation, Work, and Expression entities are deduplicated on that basis. However, the approach currently merges some things that are not the same (e.g., some video recordings) and fails to reconcile some that are the same (e.g., access points (APs) that vary because of cataloging practices). The project group is aware of and acknowledges this. At the conclusion of Phase I, some example scenarios will be outlined for illustrative purposes, with recommendations. See meeting notes.
When a decision is made to exclude a MARC field or subfield and mark it "not mapped", a reason should be provided. This section provides a list of unmapped fields/subfields and justification where not self-explanatory. Undefined or redefined subfields and character positions are excluded.
The project will need to check whether obsolete fields, subfields, or character positions are being used in source data. Because they are obsolete, this work will not be prioritized at this time; mapping obsolete fields/subfields/character positions is postponed until at least the end of the PCC RDA BSR/CSR milestones.
I.B.3.a. $8 will not be mapped until a use case is provided.
I.B.4.a. 008/05 "Date entered on file"
I.B.4.b. 008/38 "Modified record"
I.B.4.c. 008/39 "Cataloging source"
I.B.4.d. 008/32 BOOKS "Main entry in body of entry" [OBSOLETE]
I.B.4.e. 008/18 COMPUTER FILES "Frequency" [OBSOLETE]
I.B.4.f. 008/19 COMPUTER FILES "Regularity" [OBSOLETE]
I.B.4.g. 008/27 COMPUTER FILES "Type of machine" [OBSOLETE]
I.B.4.h. 008/20 CONTINUING RESOURCES "ISSN center" [OBSOLETE]
I.B.4.i. 008/30 CONTINUING RESOURCES "Title page availability" [OBSOLETE]
I.B.4.j. 008/31 CONTINUING RESOURCES "Index availability" [OBSOLETE]
I.B.4.k. 008/32 CONTINUING RESOURCES "Cumulative index availability" [OBSOLETE]
I.B.4.l. 008/34 CONTINUING RESOURCES "Entry convention"
I.B.4.m. 008/26-27 MAPS "Publisher code" [OBSOLETE]
I.B.4.n. 008/32 MAPS "Citation indicator" [OBSOLETE]
I.B.4.o. 008/30 MIXED MATERIALS "Case file indicator" [OBSOLETE]
I.B.4.p. 008/32 MIXED MATERIALS "Processing status code" [OBSOLETE]
I.B.4.q. 008/33 MIXED MATERIALS "Collection status code" [OBSOLETE]
I.B.4.r. 008/34 MIXED MATERIALS "Level of collection control code" [OBSOLETE]
I.B.4.s. 008/32 MUSIC|VISUAL MATERIALS "Main entry in body of entry" [OBSOLETE]
I.B.4.t. 008/21 VISUAL MATERIALS "In LC Collection" [OBSOLETE]
I.B.4.u. 008/23-27 VISUAL MATERIALS "Accompanying matter" [OBSOLETE]
I.B.5.a. $d "Designation of section (SE)" [OBSOLETE]
I.B.5.b. $e "Name of part/section (SE)" [OBSOLETE]
Could not find sufficient MARC documentation
Could not find sufficient MARC documentation
No way to determine whether this applies to the work or expression entity. Field not widely used.
Diffuse semantics
All relationships for all 6XX fields map to ‘subject _____’
This field duplicates the classification number recorded in fields 082 and 083, so it does not need to be transformed.
This field authenticates the MARC record as part of a workflow among specific agencies. It is out of scope for RDA because the project does not treat the MARC record as a metadata work that relates to an RDA data set.
These numbers only apply for Library of Congress internal acquisitions. Will not map unless there's clear utility beyond LoC internally.
I.B.14.a. 00-04 Logical record length
I.B.14.b. 00-05 Record status
I.B.14.c. 00-08 Type of control
I.B.14.d. 00-09 Character coding scheme
I.B.14.e. 00-10 Indicator count
I.B.14.f. 00-11 Subfield code count
I.B.14.g. 12-16 Base address of data
I.B.14.h. 17 Encoding level
I.B.14.i. 18 Descriptive cataloging form
I.B.14.j. 20 Length of the length-of-field portion
I.B.14.k. 20-23 Entry map
I.B.14.l. 21 Length of the starting-character-position portion
I.B.14.m. 22 Length of the implementation-defined portion
I.B.14.n. 23 Undefined Entry map character position
I.B.15.a. 00-02 Tag
I.B.15.b. 03-06 Field length
I.B.15.c. 07-11 Starting character position
I.B.16.a. Values not considered useful:
"Unknown"
"Other"
"Not applicable"
"Not specified"
"No attempt to code"
"Not [*]"
"None of the following"
I.B.16.b. Obsolete values: not mapped unless unique (for instance, the code has not been redefined) and believed to be useful.
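The value filter in I.B.16.a can be sketched as a small predicate. This is only an illustration: the function name is hypothetical, and reading "Not [*]" as "any label that starts with 'Not '" is an assumption, not a documented project rule.

```python
# Labels treated as "not useful" per I.B.16.a (compared case-insensitively).
SKIPPED_LABELS = {
    "unknown",
    "other",
    "not applicable",
    "not specified",
    "no attempt to code",
    "none of the following",
}


def is_mappable_value(label: str) -> bool:
    """Return False for coded-value labels that I.B.16.a excludes."""
    normalized = label.strip().lower()
    if normalized in SKIPPED_LABELS:
        return False
    # Assumed reading of the "Not [*]" pattern: any label beginning "Not ".
    if normalized.startswith("not "):
        return False
    return True
```

A label such as "Biography" passes the filter, while "Unknown" or "Not a government publication" would be skipped.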
II.A.1.a. Write as few conditions as possible.
II.A.1.b. Map the redundant data, push any duplicate triple issues downstream.
500 notes will be mapped as "has note on manifestation" for now, with status "?". Revisit later.
535: Note on manifestation: note with boilerplate describing what the field and subfields mean in context of indicators. See issue discussion and meeting notes
When subfields are collated into one unstructured description, the subfields will be separated with a ';'
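A minimal sketch of the collation rule above, assuming subfields arrive as (code, value) pairs in field order. The function name is hypothetical, and the space after the semicolon is an assumption; the decision only specifies ';' as the separator.

```python
from typing import List, Tuple


def collate_subfields(subfields: List[Tuple[str, str]]) -> str:
    """Join subfield values into one unstructured description."""
    # Drop empty values so that no stray separators are produced.
    values = [value.strip() for _code, value in subfields if value.strip()]
    # Assumption: "; " (semicolon plus space) is used as the separator.
    return "; ".join(values)
```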
When a field or subfield is private, the field or subfield is reified as a metadata work with category of work "private"
e.g.
ex:Item1 rdaio:P40164 ex:MetaWork1 [is item described with metadata by]
ex:MetaWork1 rdf:type rdf:Statement
ex:MetaWork1 rdf:subject ex:Item1
ex:MetaWork1 rdf:predicate http://rdaregistry.info/Elements/i/P40026 [has custodial history of item]
ex:MetaWork1 rdf:object "[contents of private field or subfield]"
ex:MetaWork1 rdawd:P10004 "private" [category of work]
See issue for field 561 for discussion on decision
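The reification pattern in the example above can be sketched as a function that returns the six triples as plain string tuples. This is an illustrative sketch, not the project's transform: the function name and the tuple representation are assumptions, and the property names are copied from the example.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify_private_statement(
    item: str, meta_work: str, predicate: str, value: str
) -> List[Triple]:
    """Build the triples that wrap a private field value in a metadata work."""
    return [
        (item, "rdaio:P40164", meta_work),          # is item described with metadata by
        (meta_work, "rdf:type", "rdf:Statement"),
        (meta_work, "rdf:subject", item),
        (meta_work, "rdf:predicate", predicate),
        (meta_work, "rdf:object", f'"{value}"'),    # literal from the private field
        (meta_work, "rdawd:P10004", '"private"'),   # category of work
    ]
```

Calling it with `ex:Item1`, `ex:MetaWork1`, and the custodial-history property reproduces the example triples, with the field contents as the `rdf:object` literal.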
II.C.1.a. When an IRI is needed and cannot be found in the RDA Registry, IRIs from other sources may be used.
II.C.1.b. Prefer the following sources, in this order, for supplying outside IRIs:
II.C.1.b.i. University of Washington 006-008 and 007
II.C.1.b.ii. Library of Congress
II.C.1.b.iii. MARC21 Vocabularies from Metadata Management Associates, available via Open Metadata Registry
II.C.2.a. Assigning properties from outside the RDA Registry is out of scope at this time. Assign the next-most-specific appropriate RDA property and record "loss" in the Status column. These will be compiled later and sent to the RSC for advice.
II.C.2.b. An exception to Decision II.C.2.a has been made with regard to concepts, such as those represented by classification numbers and subject headings. The project will follow RDA's lead and use SKOS properties to refer to skos:Concepts in this mapping.
II.D.1.a. Internal punctuation provided in the MARC field will be retained in the transformed access point.
II.D.1.b. For now, the code will retain ending punctuation, apart from commas and ISBD punctuation, for access points. The transform output data will determine whether any other punctuation requires a different approach.
II.D.1.c. Periods will be removed from subject headings unless the word/abbreviation is part of a compiled list. This will utilize a function and lookup table in the transform.
II.D.1.d. Periods will be retained in subject headings when they end in a single character. Example: <rdaad:P50375>IEEE Symposium on Security and Privacy. Technical Committee on Security and Privacy. Sub-Committee A.</rdaad:P50375>. See meeting notes.
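The period rules for subject headings in II.D.1.c and II.D.1.d could look roughly like the sketch below. The function name is hypothetical and `KEEP_PERIOD_AFTER` is a made-up stand-in for the project's compiled list, whose actual contents are not given here.

```python
# Hypothetical stand-in for the compiled list of words/abbreviations
# that keep their final period.
KEEP_PERIOD_AFTER = {"etc", "inc", "co", "jr", "sr", "dept"}


def strip_final_period(heading: str) -> str:
    """Remove a trailing period unless one of the retention rules applies."""
    text = heading.rstrip()
    if not text.endswith("."):
        return text
    words = text[:-1].split()
    last_word = words[-1] if words else ""
    # II.D.1.d: keep the period when the heading ends in a single character.
    if len(last_word) == 1:
        return text
    # II.D.1.c: keep the period when the last word is in the lookup table.
    if last_word.lower() in KEEP_PERIOD_AFTER:
        return text
    return text[:-1]
```

With this sketch, "Calligraphy, Chinese." loses its period, while a heading ending in "Sub-Committee A." keeps it.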
Map all valid name subfields from a heading for an Agent entity access point, even if some of those subfields are not typically used
Data will not be corrected when creating access points; it will be mapped as found.
The assumption is that values are direct values for the RDA properties given. If otherwise, a transformation note is required.
When an IRI is given as a value, the mapping will not also include a corresponding label. Implementers should retrieve labels via the IRI; labels are out of scope for this mapping unless no IRI is present. See discussion for more detail.
II.E.3.a. Where IRI values are expected, object properties should be used. In other cases, datatype properties should be used.
II.E.3.b. Within spreadsheets, recording method column may be used to determine property type.
II.E.4.a. Aggregates will be modelled according to the Official RDA structure, using aggregating works and aggregated expressions, where they can be detected.
II.F.1.a. Nomens will be consistently minted for nomen strings for identifiers found in MARC data. See meeting notes.
II.F.1.b. Aligned with the approach to minting Nomens, if a source is not present the identifier cannot be treated as a minted nomen. See discussion and II.C.1.a.
II.F.2. At this time, outside IRIs will not be used to identify these Nomens (although they could be mapped later), nor will the Nomens that the project creates link to them.
II.F.3. Provide a scheme for the Nomen ('in scheme' + LC vocab IRI)
II.F.4. Add a status for invalid identifiers (use n/P80168 'status of identification'...)
II.F.5. III.B.4.b will not apply for Nomens minted for identifiers. Paired 0XX fields are rarely used.
II.F.6. The control number for a bibliographic description maintained by an agency is treated as an identifier for the manifestation described by the record. I.e., the default entity for an identifier in a MARC21 record is Manifestation, in the absence of any fields indicating it is assigned to a different entity. See discussion and issue.
II.F.7. Identifier subfields can be mapped to Nomen provenance properties, aligned with the approach for 6xx. See discussion.
II.G.1. WEMI-to-Agent: will default to related agent of manifestation
When the relationship between an Agent and a WEMI Entity cannot be determined from the MARC field and relator subfields, the default will be a relationship with the manifestation.
II.G.2. WEMI-to-WEMI: will default to related work of manifestation
When the relationship between a described WEMI and an associated WEMI cannot be determined, the default will be a related work of manifestation. See meeting notes.
II.G.3. Original RDA labels will be mapped
MARC records use Original RDA. Original RDA labels that have changed or become deprecated will be mapped alongside the project's mapping of Official RDA labels.
II.H.1. Minting of separate manifestations for original and reproduction
For records describing reproductions of originals that include details about the original, the project attempts to distinguish the descriptive elements of the original manifestation. These elements will be associated with a minted URI for the original manifestation. This process relies on specific cues, such as the presence of tag 533 and other conditions. This approach aligns with PCC decisions and practices regarding reproductions, including microfilm, electronic (including "provider neutral"), and print-on-demand reproductions. However, it may not handle all cases where original dates or details appear in the main descriptive fields of reproductions, particularly when cataloging practices are inconsistent. See Reproductions Guidance for more detail.
II.I.1. Punctuation
For square brackets, the project decided to strip only surrounding brackets and [sic], and to strip ending punctuation for 245. See meeting notes.
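A rough sketch of the 245 clean-up described in II.I.1. The function name is hypothetical, and the set of characters treated as ending punctuation (space, slash, colon, semicolon, equals sign, comma, period) is an assumption; the decision does not enumerate them.

```python
import re


def clean_245_value(value: str) -> str:
    """Strip [sic], ending punctuation, and brackets that enclose the whole value."""
    # Remove "[sic]" together with any whitespace before it.
    text = re.sub(r"\s*\[sic\]", "", value)
    # Assumed set of ending punctuation characters for 245 subfields.
    text = text.strip().rstrip(" /:;=,.")
    # Only strip brackets that surround the entire value, not internal ones.
    if text.startswith("[") and text.find("]") == len(text) - 1:
        text = text[1:-1].strip()
    return text
```

Under these assumptions, "[Collected poems] /" becomes "Collected poems", while "[A] and [B]." keeps its internal brackets and only loses the final period.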
II.I.2. $n and $p
$n and $p following $b (other title information) should be part of the title proper, not other title information. See meeting notes.
Use the date in 008 as the publication date, if it has one. If no more reliable date can be found in another field, pick the first date in 264 $c. See meeting notes.
For the 3XX fields that map to a manifestation property (34X, 336-338), $3 can be mapped to a note on manifestation that states something like "[value] applies to: [$3]". See 2024-07-10 and 2025-02-19 meeting notes.
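One possible reading of the date rule above, sketched under stated assumptions: Date 1 is taken from 008 positions 07-10, only a four-digit value counts as a usable date, and the fallback is the first four-digit run found in any 264 $c. The function name is hypothetical.

```python
import re
from typing import List, Optional


def pick_publication_date(field_008: str, values_264c: List[str]) -> Optional[str]:
    """Prefer 008 Date 1; otherwise fall back to the first date in 264 $c."""
    # 008/07-10 holds Date 1 (Python slice 7:11).
    date1 = field_008[7:11] if len(field_008) >= 11 else ""
    # Assumption: partial dates such as "19uu" or blanks are not usable.
    if re.fullmatch(r"\d{4}", date1):
        return date1
    for value in values_264c:
        match = re.search(r"\d{4}", value)
        if match:
            return match.group(0)
    return None
```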
II.L.1. These fields will be mapped to note on work. See meeting notes for more notes on this decision.
II.M.1. 800 series added entry--personal name
Treat the same as 1xx and 7xx; they are like 7xx except with presence of numbering at the end. See meeting notes. See related Issue.
II.M.2. 841-88X
MARC fields after 83X will not be mapped. Exceptions are: 856, 857, 880, 881 (stay in Phase I) and 843, 883, 886 (stay in Phase II). See 2024-12-4 meeting notes, 2025-02-26 meeting notes.
II.N.1. $7 postponed to Phase II.
It is not yet implemented or in use. See meeting notes.
II.O.1. Labels
In Phase I, the project will label its RDA-RDF note on manifestation mappings of the MARC21 Bibliographic linking fields (760-787) using amended RDA Registry element labels. Deborah Fritz created these labels by amending RDA Registry labels to make them more user-friendly, based on the elements the linking fields might have been mapped to if entities rather than notes had been created. See table under "Field labels" tab, Column D.
II.P.1. Agents
In Phase I, agents will be broken out from topical 6XX fields unless they are WE headings. See meeting notes.
III.A.1.a. A developed list of approved sources will be used for $0, $1, and $2 values for each RDA entity type to determine whether $1 and $2 values are retained in the mapping and transform. See slides for additional details. "Approved" refers to the Approved URI list, which is used alongside the URIs in MARC list.
When $1 is approved: $1 value is used as the entity IRI
When $1 is unapproved or not present: an IRI is minted for the entity
When $1 is unapproved and present: it is stringified as an identifier for the entity
When $2 is present: a nomen is minted for the access point with the scheme of nomen from $2
When $2 is approved: the access point is authorized (AAP)
When $2 is unapproved: the access point is not authorized (AP)
When $2 is not present: The access point is not authorized (AP) and is a string value with a datatype property
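The $1/$2 rules listed above can be read as a small decision table. The sketch below is one interpretation, not the project's code: the names are hypothetical, approval is passed in as a boolean rather than looked up against the Approved URI list, and an unapproved $1 is assumed to be stringified only when it is actually present.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    entity_iri: str      # "use $1" or "mint"
    stringify_1: bool    # keep $1 as a string identifier for the entity
    access_point: str    # "AAP nomen", "AP nomen", or "AP literal"


def decide(
    sub1: Optional[str], sub1_approved: bool,
    sub2: Optional[str], sub2_approved: bool,
) -> Decision:
    """Apply the $1 and $2 rules to one heading field."""
    if sub1 and sub1_approved:
        entity_iri, stringify_1 = "use $1", False
    else:
        # Unapproved or missing $1: mint an IRI; keep an unapproved $1 as a string.
        entity_iri, stringify_1 = "mint", bool(sub1)
    if not sub2:
        access_point = "AP literal"   # string value with a datatype property
    elif sub2_approved:
        access_point = "AAP nomen"    # authorized access point
    else:
        access_point = "AP nomen"     # nomen minted, but not authorized
    return Decision(entity_iri, stringify_1, access_point)
```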
III.A.1.a.i. No statements will be made about the IRI in subfield $1 except for adding an access point triple and any identifiers stringified from $0 or $1 values. If the source of the heading is given, the heading object will be a minted nomen; if not, the object will be a literal string. If a nomen is minted, statements will be made about the nomen that include the nomen string and the scheme/source of the nomen.
III.A.1.a.ii. When the field contains multiple $1 values, the first approved value provided will be accepted.
III.A.1.a.iii. In Phase I, the project will recognize and process $0 values as $1 values for RDA Entities only when they are FAST identifiers that can be transformed into RWO IRIs.
III.A.1.a.iv. Unused, unapproved $1 values will be retained as stringified identifiers for the RDA Entity.
III.A.1.a.v. How to mint the IRI
Nomens always use opaque IRIs, i.e. are randomly generated.
If an approved source is present, the IRI uses the source and access point to mint the IRI for the entity.
If there is not an approved source, the IRI also uses the source and access point to mint the IRI for the entity.
Related works and agents should always use the source and access point to mint the IRI for the entity.
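A hedged sketch of the two minting styles described above: opaque (random) IRIs for Nomens, and IRIs derived from source plus access point for entities. The base IRI, path segments, function names, and the whitespace normalization are all assumptions for illustration; the project's actual stub URI and normalization are not given here.

```python
import uuid
from urllib.parse import quote

BASE = "http://example.org/marc2rda/"  # hypothetical stub IRI


def mint_nomen_iri() -> str:
    """Nomens get opaque IRIs: a random UUID carries no meaning."""
    return f"{BASE}nomen/{uuid.uuid4()}"


def mint_entity_iri(entity_type: str, source: str, access_point: str) -> str:
    """Derive an entity IRI from its source and access point."""
    # Assumption: collapse runs of whitespace so trivial spacing
    # differences do not produce different IRIs.
    key = " ".join(f"{source}/{access_point}".split())
    # Percent-encode everything, including "/", to keep one path segment.
    return f"{BASE}{entity_type}/{quote(key, safe='')}"
```

Because the entity IRI is a pure function of source and access point, identical headings yield identical IRIs, which is what enables the deduplication (and the occasional false merge) noted in I.A.3.a.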
III.A.2. $0's and $1's for 3XX fields
III.A.2.a. An external IRI in a subfield $0 or $1 from a 3XX field will be recorded as the object of a triple of an attribute property.
III.A.2.a.i. The project will recognize and use subfield $0 values as IRIs when they contain 'http'.
See discussion for more details
III.A.3. $0's and $1's for skos:Concepts
Any $1 value is used, and $0 is used if it is a FAST identifier and can be converted or if it is identified as an IRI (containing http). There is no approved list for concepts like there is for RDA Entities.
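The $0 test for concepts can be sketched as a simple classifier. The function name and return labels are hypothetical; the "(OCoLC)fst" prefix check is an assumption based on the FAST $0 values shown in the examples elsewhere on this page, and the actual conversion of a FAST identifier to an IRI is deliberately left out.

```python
def classify_subfield_0(value: str) -> str:
    """Classify a $0 value as 'iri', 'fast', or 'ignored'."""
    text = value.strip()
    # III.A.2.a.i: a $0 is treated as an IRI when it contains 'http'.
    if "http" in text:
        return "iri"
    # Assumed shape of a FAST identifier in $0, e.g. "(OCoLC)fst00815432".
    if text.startswith("(OCoLC)fst"):
        return "fast"
    return "ignored"
```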
III.B.1. $6 data will be preserved, even for entities where authorities exist.
III.B.2. The 880 should be mapped according to the associated field identified in $6.
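To map an 880 according to its associated field, the transform first has to read the linking tag out of $6. A minimal sketch, assuming the standard MARC 21 $6 layout of linking tag, hyphen, occurrence number, and optional slash-separated script code; the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Linkage(NamedTuple):
    tag: str                # linking tag, e.g. "520"
    occurrence: str         # occurrence number, e.g. "06"
    script: Optional[str]   # script identification code, if present


def parse_subfield_6(value: str) -> Linkage:
    """Split a $6 value such as '520-06/$1' into its parts."""
    head, *rest = value.strip().split("/")
    tag, _, occurrence = head.partition("-")
    script = rest[0] if rest else None
    return Linkage(tag, occurrence, script)
```

For an 880 field, `parse_subfield_6(...).tag` gives the tag whose mapping rules should be applied.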
III.B.3. Incorrect MARC in 880s
In practice, the regular field associated with the 880 through $6 may not contain the corresponding romanized form of the 880 or vice versa, especially for 520 or 650/655 fields. This is incorrect MARC, and will not be accounted for in the mapping. Libraries with holdings attached to such records should clean up incorrect fields before transformation.
Examples:
https://lccn.loc.gov/2021421243
520 ## |6 880-06 |a Detailed summary in vernacular field only.
880 ## |6 520-06/$1 |a "学者的人间情怀"是陈平原的代表作,论及"学术史""走出'五四'""左图右史""述学文体","演说现场","报刊研究"等重要话题,也都点到为止,好在大都日后在专业著作中有所展开.最重要的是,反映了他当时"压在纸背的心情".
邱振中主编., & Qiu Zhenzhong zhu bian. (2014). 书法与中国社会 (邱振中 & Z. Qiu, Eds.; Di 2 ban). 中国人民大学出版社. (OCLC #910728126)
650 #7 ǂa藝術社會學. ǂ2 lcstt ǂ0 http://catld.ncl.edu.tw/subject/sh0018327
650 #7 ǂaArt and society. ǂ2 fast ǂ0 (OCoLC)fst00815432
650 #7 ǂa中国书法. ǂ2 local/OSU
650 #7 ǂaCalligraphy, Chinese. ǂ2 fast ǂ0 (OCoLC)fst00844390
651 #7 ǂa中國. ǂ2 lcstt ǂ0 http://catld.ncl.edu.tw/subject/sh0001067
651 #7 ǂaChina. ǂ2 fast ǂ0 (OCoLC)fst01206073
655 #7 ǂa歷史. ǂ2 lcstt ǂ0 http://catld.ncl.edu.tw/subject/sh0016956
655 #7 ǂaHistory. ǂ2 fast ǂ0 (OCoLC)fst01411628
III.B.4. $6 and Minting of new Nomen Entities for literal field values
III.B.4.a. Nomens will be minted where the property range for a regular/880 field maps to either an RDA Entity with a secondary property with a range of Nomen, or when the mapped property's range is simply a Nomen. Where a property lacks a range, literal values only will be created. Triples with literal values will not be reified in order to retain equivalence relationships between string values not associated with a Nomen.
III.B.4.b. Where a Nomen is minted, the MARC 880 and regular field linked by $6 are mapped this way:
[WEMIEntity1] [propertyWRangeNomen1] [Nomen1]
[Nomen1] hasNomenString ["literal value of regular field"]
[Nomen1] isEquivalentTo ["literal value of 880"]
III.B.4.c. Where a Nomen is not minted, the MARC 880 and regular field linked by $6 are mapped this way:
[WemiEntity1] [propertyWORangeNomen] ["literal value of either field"]
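The two patterns in III.B.4.b and III.B.4.c can be sketched as one function that emits string-tuple triples. The function name is hypothetical, the property names are the placeholders used above, and emitting one literal triple per field when no Nomen is minted is an interpretation of "literal value of either field".

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def map_linked_pair(
    entity: str,
    prop: str,
    regular_value: str,
    value_880: str,
    nomen_iri: Optional[str] = None,
) -> List[Triple]:
    """Map a regular field and its linked 880, with or without a Nomen."""
    if nomen_iri is None:
        # III.B.4.c: no Nomen, so each field value becomes a plain literal.
        return [
            (entity, prop, f'"{regular_value}"'),
            (entity, prop, f'"{value_880}"'),
        ]
    # III.B.4.b: the Nomen carries both strings.
    return [
        (entity, prop, nomen_iri),
        (nomen_iri, "hasNomenString", f'"{regular_value}"'),
        (nomen_iri, "isEquivalentTo", f'"{value_880}"'),
    ]
```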
III.B.4.d. Script and language of strings cannot be reliably determined from the MARC format in $6, and so are not mapped. See 2022-10-05 meeting notes for more notes on this decision.
III.C.1. Preliminary processing for cultural heritage organizations and their collections
III.C.1.a. Take information from id.loc's Code List for Cultural Heritage Organizations
III.C.1.b. Mint corporate body IRI for each nomen
III.C.1.c. Mint one collection work IRI for each organization using boilerplate for appellations based on institution label in code list and identifiers based on codes
III.C.1.d. Mint one collection manifestation for each collection work using similar boilerplate, including identifiers based on codes
III.C.1.e. Publish somewhere for re-use. No decision on location has been made.
III.C.2. When $5 indicates that a statement applies to an item entity
III.C.2.a. Mint one item entity/IRI for each occurrence of $5
III.C.2.b. Relate the item to the published collection manifestation that corresponds to the code value in $5
III.C.2.c. Illustration of model:

III.C.2.d. Example Mapping: MARC Record with Multiple $5's
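A sketch of III.C.2.a and III.C.2.b under stated assumptions: items are numbered sequentially under a made-up base IRI, the collection manifestations are supplied as a lookup from organization code to IRI, and the link reuses rdam:P30103 ("has exemplar of manifestation") from I.A.1. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def mint_items(
    subfield_5_codes: List[str],
    collection_manifestations: Dict[str, str],
    base: str = "http://example.org/item/",  # hypothetical base IRI
) -> List[Triple]:
    """Mint one item per $5 occurrence and link it to its collection."""
    triples: List[Triple] = []
    for number, code in enumerate(subfield_5_codes, start=1):
        # A new item for every occurrence, even when the code repeats.
        item = f"{base}{number}"
        triples.append((item, "rdf:type", "rda:Item"))
        manifestation = collection_manifestations.get(code)
        if manifestation is not None:
            triples.append((manifestation, "rdam:P30103", item))
    return triples
```

Two fields carrying the same $5 code yield two distinct items attached to the same collection manifestation, which is the redundancy risk noted in I.A.1.a.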
III.D.1. When there is no accompanying IRI, retain the source in $2 or 65X indicator 2
III.D.1.a. For a concept, use as skos:inScheme
III.D.1.b. For a nomen, use as rdan:P80069 has scheme of nomen
III.D.2. If there is an expectation for the source vocabulary to have an IRI somewhere (for example, a "Source Vocabulary" at id.loc.gov, or an RDA vocabulary at the RDA Registry), enter that information in the spreadsheet in the "Transformation Notes" column.
III.D.2.a. The transform will have to perform look-ups for the source IRIs. This should be easier than searching all the specific source vocabularies for specific string values, which will be done later in the transformation pipeline.
Bulk download the files for these vocabularies at the start of the transform, and perform local lookups against those files during the transform. For 3XX: the terms can often be looked up and an IRI retrieved to be used as the object of the attribute property. If this cannot be done, mint a skos:Concept; III.D.1.a then applies.
III.D.2.b. If the value of $2 cannot be associated with an IRI of a source vocabulary, then the transformation retains the $2 value as a datatype (text) value.
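The source handling in III.D.1 and III.D.2 amounts to a local lookup with a literal fallback. The sketch below is illustrative only: the function name is hypothetical, the lookup table stands in for the bulk-downloaded vocabulary files, its example.org IRIs are placeholders rather than real source IRIs, and the choice between object and datatype variants of the RDA property is not modeled.

```python
from typing import Dict, Tuple

# Placeholder table; real entries would come from the downloaded files.
SOURCE_IRIS: Dict[str, str] = {
    "fast": "http://example.org/sources/fast",
    "lcsh": "http://example.org/sources/lcsh",
}


def source_statement(
    kind: str, subfield_2: str, lookup: Dict[str, str] = SOURCE_IRIS
) -> Tuple[str, str]:
    """Return (predicate, object) recording the source of a concept or nomen."""
    # III.D.1.a / III.D.1.b: the predicate depends on what is being described.
    predicate = "skos:inScheme" if kind == "concept" else "rdan:P80069"
    iri = lookup.get(subfield_2.strip().lower())
    if iri is not None:
        return predicate, f"<{iri}>"
    # III.D.2.b: no source IRI found, so keep the $2 value as text.
    return predicate, f'"{subfield_2.strip()}"'
```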