Decisions Index - uwlib-cams/MARC2RDA GitHub Wiki

I. Documentation

I.A. Spreadsheet

I.A.1. "Delete" status: Reasons for not mapping a row/justifications for recording "Delete" should go in "Notes--Uncategorized" column, not in "Justification for Mapping" column. 2022-03-23
I.A.2. Add transformation notes whenever possible. 2022-03-23
I.A.3. Do not comment on spreadsheets in Google Sheets. Comments, discussions, and issues should be located and tracked in GitHub. 2022-05-18

I.A.4. Syntax

I.A.4.a. Condition Layering

I.A.4.a.i. Multiple values for the same MARC subfield condition with OR relationships should be recorded in the same cell, with | as delimiter. 2022-02-23
I.A.4.a.ii. Independent conditions get separate rows in the spreadsheet. Layered conditions are in multiple columns of the same row. 2022-01-06
I.A.4.a.iii. New sets of MARCTagCondition/ConditionValue columns may be added as needed to create more layered conditions with AND relationships to one another. 2022-04-06
I.A.4.b. Punctuation differences in label values for conditions should be ignored/treated as the same. For instance, “Based on (work)” and “Based on work” should be treated as the same string. 2022-02-23
I.A.4.c. Updated Instructions to include formatting for MARCTagCondition cells and corresponding value cells. 2022-04-06
I.A.4.d. Added table with prescribed syntax operators. 2022-04-06
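
The condition syntax in I.A.4.a and the punctuation rule in I.A.4.b can be sketched as follows. This is only an illustration, not the project's transform code: the function names, the dictionary shape used for a field's subfields, and the exact set of characters treated as punctuation are all assumptions.

```python
import string

# Assumption: "punctuation" means ASCII punctuation plus curly quotes.
_PUNCTUATION = string.punctuation + "“”‘’"
_STRIP_TABLE = str.maketrans("", "", _PUNCTUATION)


def normalize_label(label):
    """Remove punctuation and collapse whitespace so label variants compare equal."""
    return " ".join(label.translate(_STRIP_TABLE).split())


def cell_matches(cell, actual):
    """One cell: alternatives separated by '|' have an OR relationship."""
    alternatives = {normalize_label(part) for part in cell.split("|")}
    return normalize_label(actual) in alternatives


def row_matches(conditions, subfields):
    """One row: every (subfield code, cell) pair must match (AND).

    conditions: list of (subfield code, cell text) pairs from one spreadsheet row.
    subfields: hypothetical dict of subfield code -> value for one MARC field.
    """
    return all(
        code in subfields and cell_matches(cell, subfields[code])
        for code, cell in conditions
    )
```

Independent conditions (I.A.4.a.ii) would simply be evaluated as separate calls to `row_matches`, one per spreadsheet row.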

I.B. Versioning

I.B.1. We will rely on GitHub’s versioning control, and refrain from adding new columns/notes to record different iterations of the mapping spreadsheets over time. 2022-02-16
I.B.2. Draft versioning will rely on Google Sheets, with frequent semi-automated pushes to GitHub by Theo. 2022-06-22

I.C. Transformation Disclaimers for Users

I.C.1. rdam:P30103 "has exemplar of manifestation"

I.C.1.a. This mapping and transform mints a distinct rda:Item for each field indicating item-specific data, such as $5, even when they occur with the same values within the same MARC record. This avoids conflating distinct items within the same collections, but runs the risk of minting redundant rda:Item entities and IRIs when only a single item exists. Manual reconciliation after conversion at the institution level is recommended. 2023-05-31

I.C.2. rdam:P30134 "title of manifestation"

I.C.2.a. Inconsistent application of punctuation and MARC subfielding rules creates messy data here. The transformation has been written to accommodate a majority of cases. Manual review is suggested where manifestation titles include ISBD punctuation such as " = " or " ; ". 2023-05-31

I.D. MARC21 Fields & Subfields Not Mapped

I.D.1. When we decide to exclude a MARC field or subfield and mark it "not mapped", we ought to provide a reason. This section provides a list of unmapped fields/subfields and justification where not self-explanatory. Undefined or redefined subfields and character positions are excluded. 2023-06-13

I.D.3. $8 "Field link and sequence number"

I.D.3.a. We will not map $8 until a use case is provided. 2022-07-14

I.D.4. 008 "Fixed Length Data Elements" Character Positions

I.D.4.a. 008/05 "Date entered on file" 2023-06-13
I.D.4.b. 008/38 "Modified record" 2023-06-13
I.D.4.c. 008/39 "Cataloging source" 2023-06-13
I.D.4.d. 008/32 BOOKS "Main entry in body of entry" [OBSOLETE] 2023-06-13
I.D.4.e. 008/18 COMPUTER FILES "Frequency" [OBSOLETE] 2023-06-13
I.D.4.f. 008/19 COMPUTER FILES "Regularity" [OBSOLETE] 2023-06-13
I.D.4.g. 008/27 COMPUTER FILES "Type of machine" [OBSOLETE] 2023-06-13
I.D.4.h. 008/20 CONTINUING RESOURCES "ISSN center" [OBSOLETE] 2023-06-13
I.D.4.i. 008/30 CONTINUING RESOURCES "Title page availability" [OBSOLETE] 2023-06-13
I.D.4.j. 008/31 CONTINUING RESOURCES "Index availability" [OBSOLETE] 2023-06-13
I.D.4.k. 008/32 CONTINUING RESOURCES "Cumulative index availability" [OBSOLETE] 2023-06-13
I.D.4.l. 008/34 CONTINUING RESOURCES "Entry convention" 2023-06-13
I.D.4.m. 008/26-27 MAPS "Publisher code" [OBSOLETE] 2023-06-13
I.D.4.n. 008/32 MAPS "Citation indicator" [OBSOLETE] 2023-06-13
I.D.4.o. 008/30 MIXED MATERIALS "Case file indicator" [OBSOLETE] 2023-06-13
I.D.4.p. 008/32 MIXED MATERIALS "Processing status code" [OBSOLETE] 2023-06-13
I.D.4.q. 008/33 MIXED MATERIALS "Collection status code" [OBSOLETE] 2023-06-13
I.D.4.r. 008/34 MIXED MATERIALS "Level of collection control code" [OBSOLETE] 2023-06-13
I.D.4.s. 008/32 MUSIC|VISUAL MATERIALS "Main entry in body of entry" [OBSOLETE] 2023-06-13
I.D.4.t. 008/21 VISUAL MATERIALS "In LC Collection" [OBSOLETE] 2023-06-13
I.D.4.u. 008/23-27 VISUAL MATERIALS "Accompanying matter" [OBSOLETE] 2023-06-13

I.D.5. 245 subfields

I.D.5.a. $d "Designation of section (SE)" [OBSOLETE] 2023-06-13
I.D.5.b. $e "Name of part/section (SE)" [OBSOLETE] 2023-06-13

I.D.6. 871 "Variant corporate name" [OBSOLETE]

I.D.6.a. Could not find sufficient MARC documentation 2023-06-13

I.D.7. 870 "Variant personal name" [OBSOLETE]

I.D.7.a. Could not find sufficient MARC documentation 2023-06-13

I.D.8. 381 "Other distinguishing characteristics of work or expression"

I.D.8.a. No way to determine whether this applies to the work or expression entity. Field not widely used. 2023-06-06

I.D.9. 562 "Copy and version identification note"

I.D.9.a. Diffuse semantics 2024-02-01

I.E. Values Not Mapped

I.E.1. Values not considered useful:

  • "Unknown"
  • "Other"
  • "Not applicable"
  • "Not specified"
  • "No attempt to code"
  • "Not [*]"
  • "None of the following" 2023-06-13
I.E.2. Obsolete values, unless unique (code not redefined, for instance) and believed to be useful 2023-06-13
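
A minimal sketch of the I.E.1 filter is given here for illustration. It assumes that "Not [*]" means any label beginning with the word "Not", and that matching is case-insensitive; neither assumption is stated in the decision. The function name is hypothetical, and the obsolete-value rule in I.E.2 is not covered because it requires human judgment.

```python
import re

# Labels listed in I.E.1, lower-cased for comparison (case-insensitivity is an assumption).
EXCLUDED_LABELS = {
    "unknown",
    "other",
    "not applicable",
    "not specified",
    "no attempt to code",
    "none of the following",
}

# Assumption: "Not [*]" covers any label whose first word is "Not".
NOT_PREFIX = re.compile(r"^not\b", re.IGNORECASE)


def is_mappable_value(label):
    """Return False for value labels that I.E.1 excludes from the mapping."""
    text = label.strip()
    if text.lower() in EXCLUDED_LABELS:
        return False
    if NOT_PREFIX.match(text):
        return False
    return True
```

The word-boundary check keeps labels such as "Notated music" mappable, since "Not" there is not a separate word.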

II. Mappings

II.A. Redundancy

II.A.1. Write as few conditions as possible.
II.A.2. Map the redundant data, push any duplicate triple issues downstream. 2022-01-26

II.B. 500 Notes

II.B.1. We will map as "has note on manifestation" for now, with status "?". Revisit later. 2022-03-23

II.C. $0/$1

II.C.1. Transform structure for $0/$1

For information on rationale and other ideas we considered, see this discussion and meeting notes
II.C.1.a. When $1 exists:

  • Value can be used in RDA as the direct value of the appropriate RDA property
  • We will avoid minting extra entities or relating IRIs as authorities
  • If $0 exists alongside a $1 in the same field, ignore $0

II.C.1.b. When $0 exists and $1 does not exist:

  • We will not mint an entity and then assign the $0 as an identifier or IRI for a metadata work about that entity.
  • For transform:
    • Write conditions for when an IRI is known to be an IRI for an RDA Entity and flip those to $1. These conditions should be recorded in the Decisions Index, outside of the mapping spreadsheets. Conditions may be RDA-entity-specific and/or MARC-field-specific
    • When we cannot determine what type of thing the IRI is an instance of, transformation code should output a report. Possibilities for reports:
      • Alert that a $0 value is not recognized and may benefit from human analysis
      • Alert that $0 value is recognized and is not appropriate for RDA for specified reasons
      • Create sorted list of unused $0 values into an HTML document with live anchors that allow each IRI to be dereferenced by clicking 2023-03-01
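
The $0/$1 decision logic above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's transform: `resolve_subfield_iris`, `unused_zero_report`, and the `is_known_rda_entity_iri` predicate are hypothetical names, and the predicate stands in for the conditions that II.C.1.b says should be recorded in the Decisions Index.

```python
from html import escape


def resolve_subfield_iris(subfields, is_known_rda_entity_iri):
    """Return (iris_to_use, unused_zero_values) for one MARC field.

    subfields: list of (code, value) pairs.
    is_known_rda_entity_iri: hypothetical predicate, True when a $0 value is
    known to be an IRI for an RDA entity and can be treated like a $1.
    """
    ones = [value for code, value in subfields if code == "1"]
    if ones:
        # II.C.1.a: $1 is used directly and any $0 in the same field is ignored.
        return ones, []
    iris, unused = [], []
    for code, value in subfields:
        if code != "0":
            continue
        if is_known_rda_entity_iri(value):
            iris.append(value)
        else:
            unused.append(value)
    return iris, unused


def unused_zero_report(values):
    """Build a sorted HTML list of unused $0 values, linking the ones that look like IRIs."""
    rows = []
    for value in sorted(set(values)):
        text = escape(value)
        if value.startswith(("http://", "https://")):
            rows.append(f'<li><a href="{text}">{text}</a></li>')
        else:
            rows.append(f"<li>{text}</li>")
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Collecting the unused values across a whole batch and passing them to `unused_zero_report` once would give the single sorted document described in the last bullet.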

II.D. Properties and IRIs from Outside the RDA Registry

II.D.1. IRIs

II.D.1.a. When an IRI is needed and cannot be found in the RDA Registry, IRIs from other sources may be used. 2022-04-06

II.D.1.b. Prefer the following sources, in this order, for supplying outside IRIs:

II.D.1.b.i. Library of Congress
II.D.1.b.ii. MARC21 Vocabularies from Metadata Management Associates, available via Open Metadata Registry

II.D.2. Properties

II.D.2.a. Assigning properties from outside the RDA Registry is out of scope at this time. Assign the next-most-specific appropriate RDA property and record "loss" in the Status column. We will compile these later and send to RSC for advice. 2022-04-06
II.D.2.b. An exception to Decision II.D.2.a has been made with regard to concepts, such as those represented by classification numbers and subject headings. We will follow RDA's lead and use SKOS properties to refer to skos:Concepts in this mapping. 2024-03-06

II.E. Control Fields

II.E.1. Unknown/Other as values: Will not be mapped/recorded. We only want to include "valuable values". 2022-02-02

II.F. Obsolete Fields/Subfields/Character Positions

II.F.1. We will need to check on whether obsolete fields/subfields/character positions are being used in source data ourselves. Since they are obsolete, we will not prioritize this work right now, and will put off mapping obsolete fields/subfields/character positions until at least the end of the PCC RDA BSR/CSR milestones. 2022-06-01

II.G. $6

II.G.1. We will preserve $6 data, even for entities where authorities exist. 2022-07-20
II.G.2. The 880 should be mapped according to the associated field identified in $6. 2022-10-19
II.G.3. In practice, the regular field associated with the 880 through $6 may not contain the corresponding romanized form of the 880 or vice versa, especially for 520 or 650/655 fields. This is incorrect MARC, and will not be accounted for in the mapping. Libraries with holdings attached to such records should clean up incorrect fields.
Examples:
https://lccn.loc.gov/2021421243
520 ## |6 880-06 |a Detailed summary in vernacular field only.
880 ## |6 520-06/$1 |a "学者的人间情怀"是陈平原的代表作,论及"学术史""走出'五四'""左图右史""述学文体","演说现场","报刊研究"等重要话题,也都点到为止,好在大都日后在专业著作中有所展开.最重要的是,反映了他当时"压在纸背的心情".

OCLC #910728126
650 7藝術社會學. ǂ2 lcstt ǂ0 http://catld.ncl.edu.tw/subject/sh0018327
650 7Art and society. ǂ2 fast ǂ0 (OCoLC)fst00815432
650 7中国书法. ǂ2 local/OSU
650 7Calligraphy, Chinese. ǂ2 fast ǂ0 (OCoLC)fst00844390
651 7中國. ǂ2 lcstt ǂ0 http://catld.ncl.edu.tw/subject/sh0001067
651 7China. ǂ2 fast ǂ0 (OCoLC)fst01206073
655 7歷史. ǂ2 lcstt ǂ0 http://catld.ncl.edu.tw/subject/sh0016956
655 7History. ǂ2 fast ǂ0 (OCoLC)fst01411628
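
To illustrate II.G.2, the sketch below reads the linking tag and occurrence number from a $6 value such as `520-06/$1` so that an 880 can be routed to the mapping of its associated field. The function names are hypothetical, and the pattern assumes the standard MARC 21 $6 layout (three-digit tag, hyphen, two-digit occurrence number, optional slash-separated parts).

```python
import re

# Assumed layout: linking tag, hyphen, occurrence number, then optional "/..." parts.
LINKAGE = re.compile(r"^(?P<tag>\d{3})-(?P<occurrence>\d{2})(?:/.*)?$")


def parse_linkage(value):
    """Return (linking tag, occurrence number) from a $6 value, or None if unparseable."""
    match = LINKAGE.match(value.strip())
    if match is None:
        return None
    return match.group("tag"), match.group("occurrence")


def mapping_tag_for(field_tag, subfield_6):
    """Pick the tag whose mapping applies: the linked tag for an 880, else the field's own tag."""
    if field_tag != "880":
        return field_tag
    parsed = parse_linkage(subfield_6)
    return parsed[0] if parsed else None
```

For the first example above, `mapping_tag_for("880", "520-06/$1")` selects the 520 mapping. The sketch does not check that the linked fields actually correspond in content, consistent with II.G.3.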

II.G.4. $6 and Minting of new Nomen Entities for literal field values

II.G.4.a. We will mint Nomens where the property range for a regular/880 field maps to either an RDA Entity with a secondary property with a range of Nomen, or when the mapped property's range is simply a Nomen. Where a property lacks a range, we will create literal values only. We will not reify triples with literal values in order to retain equivalence relationships between string values not associated with a Nomen. 2022-10-18
II.G.4.b. Where a Nomen is minted, the MARC 880 and regular field linked by $6 are mapped this way:
[WEMIEntity1] [propertyWRangeNomen1] [Nomen1]
[Nomen1] [hasNomenString](rdand:P80068) ["literal value of regular field"]
[Nomen1] [isEquivalentTo](rdand:P80113) ["literal value of 880"] 2022-10-18
II.G.4.c. Where a Nomen is not minted, the MARC 880 and regular field linked by $6 are mapped this way:
[WEMIEntity1] [propertyWORangeNomen] ["literal value of either field"] 2022-10-18
II.G.5. Script and language of strings cannot be reliably determined from the MARC format in $6, and so are not mapped. 2022-10-18
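
The two patterns in II.G.4.b and II.G.4.c can be sketched as a function that returns plain triples. This is an illustration only: the function name and the way the Nomen IRI is supplied are hypothetical, and for the no-Nomen case it assumes that "literal value of either field" means one triple per literal.

```python
def map_linked_pair(entity, prop, regular_value, value_880, nomen_iri=None):
    """Return (subject, predicate, object) triples for a regular field and its linked 880.

    nomen_iri: a newly minted Nomen IRI when the property's range calls for a Nomen
    (II.G.4.a); None when only literal values are created.
    """
    if nomen_iri is not None:
        return [
            (entity, prop, nomen_iri),
            (nomen_iri, "rdand:P80068", regular_value),  # has nomen string
            (nomen_iri, "rdand:P80113", value_880),      # is equivalent to
        ]
    # No Nomen: each literal becomes a direct value; no reification (II.G.4.a).
    return [(entity, prop, regular_value), (entity, prop, value_880)]
```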

II.I. Priorities

II.I.1. Milestones

II.I.1.a. Current milestone: MVP for Transform. Priority order for subsequent milestones: BSR, CSR, Mapping Review, Publication [milestone for transform will be created once we have MVP done]. 2022-07-27

II.I.2. Entity Types and Structures

II.I.2.a. We will focus initial mapping of MVP and BSR Milestones on singleton Work-Expression entities (non-aggregates), with aggregates and serials integrated in layers afterward. 2022-07-27
II.I.2.b. We will create a set of conditions to identify and exclude aggregates, serials, collections etc., adding them back and treating them appropriately in stages. 2022-07-27

II.J. $5

II.J.1. Preliminary processing for cultural heritage organizations and their collections 2022-08-24

II.J.1.a. Take information from id.loc's Code List for Cultural Heritage Organizations
II.J.1.b. Mint corporate body IRI for each nomen
II.J.1.c. Mint one collection work IRI for each organization using boilerplate for appellations based on institution label in code list and identifiers based on codes
II.J.1.d. Mint one collection manifestation for each collection work using similar boilerplate, including identifiers based on codes
II.J.1.e. Publish somewhere for re-use (Wikidata?)

II.J.2. When $5 indicates that a statement applies to an item entity

II.J.2.a. Mint one item entity/IRI for each occurrence of $5 2022-08-24
II.J.2.b. Relate the item to the published collection manifestation that corresponds to the code value in $5 2022-08-24
II.J.2.c. Illustration of model:

[Image: illustration of the model]

II.J.2.d. Example Mapping: MARC Record with Multiple $5's
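
As a rough sketch of II.J.2.a and II.J.2.b (not the linked example mapping itself), the function below mints one item IRI per $5 occurrence and pairs it with the collection manifestation for that organization code. The IRI pattern, the function name, and the `collection_manifestations` lookup are assumptions for illustration; the lookup stands in for the data published under II.J.1.

```python
def mint_items(record_iri, fields, collection_manifestations):
    """Return one (item_iri, collection_manifestation_iri) pair per $5 occurrence.

    fields: list of (tag, [(code, value), ...]) for fields already judged item-specific.
    collection_manifestations: hypothetical dict of organization code -> IRI.
    """
    items = []
    for _tag, subfields in fields:
        for code, value in subfields:
            if code == "5":
                # A new item for every occurrence, even when the code repeats.
                item_iri = f"{record_iri}#item{len(items) + 1}"
                items.append((item_iri, collection_manifestations.get(value.strip())))
    return items
```

Two fields carrying the same $5 code yield two distinct items related to the same collection manifestation, which is the redundancy that I.C.1.a recommends reconciling manually after conversion.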

II.K. $2: in 3XX and in X30/65X with indicator 2

II.K.1. Condition: source of term is entered in $2 or in indicator 2 but there is no accompanying IRI. This means the $2 value is a literal in the RDA output data.
II.K.2. Solution: Follow Dodds/Davis Chapter 3 "Custom Datatype":
    II.K.2.1. "A data model contains structured values that don't correspond to one of the pre-existing XML Schema datatypes."
    II.K.2.2. "Create a URI to identify the custom datatype and use that URI when creating Typed Literals."
    II.K.2.3. That is, if there is no $1 or $0, retain the source in either $2 (explicitly associated with an IRI) or X30/65X indicator 2 by appending a datatype IRI.
    II.K.2.4. NOTE on this solution for 3XX fields: generally we expect 3XX literal values to be easily matched ("reconciled") in the source vocabulary and associated IRIs extracted and inserted in the RDA data. We are not reconciling with external vocabularies at this time, however. That will be done in a second transformation phase and added to the transformation pipeline. At that stage, these datatype IRIs can be used to identify the source vocabulary, thus preventing the necessity of returning to the MARC data for $2 values.
II.K.3. For X30/65X indicator 2 not equal to 7, we can use the following IRIs as datatype IRIs:
    0 - LCSH.        https://id.loc.gov/vocabulary/subjectSchemes/lcsh
    1 - LCSH for children's.  https://id.loc.gov/vocabulary/subjectSchemes/cyac
    2 - MeSH.         http://id.loc.gov/vocabulary/subjectSchemes/mesh
    3 - NAL.         http://id.loc.gov/vocabulary/subjectSchemes/nal
    4 - Source not specified. Ignore
    5 - Canadian SH.       http://id.loc.gov/vocabulary/subjectSchemes/cash
    6 - RVM.         http://id.loc.gov/vocabulary/subjectSchemes/rvm
    [7 -Source in $2.   convert code to URI from http://id.loc.gov/vocabulary/subjectSchemes; if necessary convert to lower-case letters]
II.K.4. When indicator 2 is 7, the source code will be in $2; for 3XX fields, the source code will be in $2 regardless of indicator values.
    II.K.4.1. If we expect the source vocabulary to have an IRI somewhere (for example, a "Source Vocabulary" at id.loc.gov, or an RDA vocabulary at the RDA Registry), enter that information in the spreadsheet in the "Transformation Notes" column.
        II.K.4.1.1. The transform will have to perform look-ups for the source IRIs. This should be easier than searching all the specific source vocabularies for specific string values, which will be done later in the transformation pipeline.
    II.K.4.2. If the value of $2 cannot be associated with an IRI of a source vocabulary, then the transformation should output a message that a $2 value has been lost. Preferably, a report of all lost $2 values will be produced.
II.K.5. Notes on Identifiers
    II.K.5.1. The solution above will not apply to identifiers.
    II.K.5.2. RDA is equipped to represent identifiers using an appropriate "has identifier" property with the value entified as a Nomen instance.
    II.K.5.3. In addition, identifier values are usually associated with MARC fields/subfields, i.e. they are values for MARC "properties." Sources of identifiers in this case are determined by the MARC field/subfield rather than the code entered specifically in $2.
II.K.6. Example:
Original data MARC field 650:
    650 _7 $a subject headings. $2 aat
...transforms to the following RDA/XML:
    <rdf:Description
      rdf:about="https://open-na.hosted.exlibrisgroup.com/alma/01ALLIANCE_UW/bf/entity/99129883740001452#Work">
        <!-- rdawd:P10256 (rdawd:hasSubject) -->
        <rdawd:P10256 rdf:datatype="https://id.loc.gov/vocabulary/subjectSchemes/aat">subject headings</rdawd:P10256>
    </rdf:Description>
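
The lookup described in II.K.3 and II.K.4 for X30/65X fields can be sketched as below. It is a hedged illustration: the function name and the `known_codes` set are hypothetical (the set stands in for whatever look-up the transform performs against the source vocabulary list), and the IRIs are copied as listed in II.K.3. 3XX fields are not covered, since their source codes may belong to other code lists.

```python
# Datatype IRIs for fixed indicator values, copied from II.K.3.
INDICATOR_2_SCHEMES = {
    "0": "https://id.loc.gov/vocabulary/subjectSchemes/lcsh",
    "1": "https://id.loc.gov/vocabulary/subjectSchemes/cyac",
    "2": "http://id.loc.gov/vocabulary/subjectSchemes/mesh",
    "3": "http://id.loc.gov/vocabulary/subjectSchemes/nal",
    "5": "http://id.loc.gov/vocabulary/subjectSchemes/cash",
    "6": "http://id.loc.gov/vocabulary/subjectSchemes/rvm",
}
SCHEME_BASE = "http://id.loc.gov/vocabulary/subjectSchemes/"


def datatype_iri(indicator_2, source_code, known_codes):
    """Return (datatype IRI or None, lost-$2 message or None) for an X30/65X field.

    known_codes: hypothetical set of lower-case source codes known to have an IRI.
    """
    if indicator_2 == "7":
        code = (source_code or "").strip().lower()
        if code in known_codes:
            return SCHEME_BASE + code, None
        # II.K.4.2: report the $2 value as lost.
        return None, f"lost $2 value: {source_code!r}"
    # Indicator 4 (source not specified) is ignored and has no entry in the table.
    return INDICATOR_2_SCHEMES.get(indicator_2), None
```

Gathering the non-empty messages over a batch would give the report of lost $2 values that II.K.4.2 prefers.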

II.L. Reproductions and 533

II.L.1. 533 fields will not prompt the minting of two separate manifestations (one for the original and one for the reproduction). Instead, this mapping and transform will mint a single manifestation. The project team realizes that in RDA, separate manifestations ought to be described in cases where one manifestation is reproduced as another. However, differentiating which MARC fields are "about" which of these manifestations is difficult due to inconsistent practices across cataloging communities over time. The group may revisit this at a later date, and welcomes contributions from the community in the form of transformation code that can reliably extract manifestation data for original and reproduction manifestations from a single MARC record. 2023-03-29

II.M. LRM/RDA/RDF Data Structure

II.M.1. Intermediate blank nodes and resources are never implied in this mapping. The assumption is that values are direct values for the RDA properties given. If otherwise, a transformation note is required.

II.M.2. When an IRI is given as a value, the mapping will not also include a corresponding label. Labels should be retrieved via IRI by implementers, and unless an IRI is not present, are out of scope for this mapping. See discussion for more detail. 2022-04-20

II.M.3. Datatype/Object Properties

II.M.3.a. Where IRI values are expected, object properties should be used. In other cases, datatype properties should be used. 2022-06-22
II.M.3.b. Within spreadsheets, recording method column may be used to determine property type. 2022-06-22
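
A small sketch of the II.M.3.a choice follows. It assumes RDA Registry prefixes of the form `rda` + entity letter + `d` (datatype) or `o` (object); the `rdawd` form appears in II.K.6, while the matching object-property prefix is an assumption based on that naming pattern. The function name is hypothetical.

```python
def property_curie(entity_letter, property_id, value_is_iri):
    """Build a prefixed property name, using the object property when the value is an IRI."""
    kind = "o" if value_is_iri else "d"
    return f"rda{entity_letter}{kind}:{property_id}"
```

For example, `property_curie("w", "P10256", False)` gives `rdawd:P10256`, the datatype property used for the typed literal in II.K.6.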

II.M.4. Aggregates

II.M.4.a. We will model aggregates according to Official RDA structure, using aggregating works and aggregated expressions, where they can be detected. 2022-07-27

II.N. Identifiers

II.N.1. We will consistently mint Nomens for nomen strings for identifiers found in MARC data. 2023-06-07
II.N.2. We will not use outside IRIs to identify these Nomens (although these could be mapped later), or link to these from the Nomens that we create at this time. 2023-06-07
II.N.3. Provide a scheme for the Nomen ('in scheme' + LC vocab IRI) 2023-06-07
II.N.4. Add a status for invalid identifiers (use n/P80168 'status of identification'...) 2023-06-07

III. Workflow

III.A. GitHub

III.A.1. Theo will use a semi-automated script to push changes directly to Master branch. We're not utilizing pull requests. 2022-07-27
III.A.2. When we close an issue, if it isn't self-explanatory, we will include a note and link to related decisions recorded in the decisions index at the time of issue-closing. 2022-06-08

III.A.3. Issue Phases

III.A.3.a. Project issues begin in the "to-do" category of the GitHub project, and are prioritized using Milestones.
III.A.3.b. Issues are self-assigned, and moved to "in progress" category at that time.
III.A.3.c. Once a first pass is complete, mappers move issue to "awaiting review".
III.A.3.d. If one or a few open questions are holding up the first pass, a mapper may move the issue to "almost done" column, adding a note in the issue explaining what needs to be done in order to complete a first pass. The self-assigned mapper will finish the mapping when the open questions are answered, or will find someone else to adopt the issue. Crystal will monitor "almost done" column to make sure nothing lingers unnecessarily. Once a first pass is complete, these issues move to "awaiting review". If a mapper believes the mapping is too complex for a single person to reasonably complete, and that the mapping would benefit from group review, the mapper will also label the issue "meeting discussion needed". 2022-09-21
III.A.3.e. Any project participant except for the person who performed the initial mapping may elect to take on the role of "reviewer" on a piece of the mapping. Reviewers self-assign issues from "awaiting review" to review mappings. Once review is complete, issues are moved to "ready for transform". 2022-09-21
III.A.3.f. Transformers select issues from "ready for transform" and write code for them. Once code is complete, issue is closed and automatically moved to "done".
III.A.3.g. If changes are needed after an issue has been moved to "done", the issue is re-opened and re-assigned, and moved to the appropriate column. Notes should be made in the issue to indicate to mappers that transformation code needs to be adjusted/checked against any changes.

III.B. Google Sheets

III.B.1. Mapping contributors will edit the draft mappings in Google Sheets. 2022-05-18

III.C. Transformation

III.C.1. Select fields to code, field by field. Select field using the main project board and assign yourself to the issue on GitHub. Do not code fields in "To Do" or "In Progress." It is OK to code fields in "Awaiting Review" or "Review in Progress" if the procedure below (III.C.3) is followed. It is best to select fields from "Ready for Transform"; follow the procedure below in (III.C.4). 2024-03-11.
III.C.2. A field can be moved into "Done" on the main project board if two conditions are met: (1) the coding is done; (2) mappers have moved the field into "Ready for Transform." 2022-07-28
III.C.3. When selecting a field to code in "Awaiting Review" or "Review in Progress":
III.C.3.a. Manually edit the issue by adding the appropriate label indicating the field's status on the main project board at the time of the coding. Either use "coding-ar" (for a field being coded while "Awaiting Review") or "coding-rip" (for a field being coded while "Review in Progress"). 2022-07-28
III.C.3.b. Add/Commit the code for the single field; include a commit message that includes the following: -m "Code for field XXX [awaitingReview OR reviewInProgress] #[issueNumber]." Example: "Code for field 100 awaitingReview #102." 2022-07-28
III.C.3.c. When coding is complete, manually edit the issue by removing the "coding-ar" or "coding-rip" label and replacing it with the "coded-ar" or "coded-rip" label as appropriate. 2024-03-11
III.C.4. When selecting a field to code in "Ready for Transform":
III.C.4.a. Manually edit the issue by adding the label "coding-rft" (i.e. "We are coding this field in the main project board's category 'Ready for Transform'"). 2022-07-28
III.C.4.b. Add/Commit the code for the single field; include a commit message that includes the following: -m "Code for field XXX #[issueNumber]." Example: "Code for field 100 #102." 2024-03-11
III.C.4.c. When coding is complete, remove the "coding-rft" label and replace it with the "coded-rft" label. 2024-03-11
III.C.5. To complete a field:
III.C.5.a. Change the transformation-related tags from the appropriate "coding" tag to the matching "coded" tag. If the field was in "Ready for Transform", close the issue for the field. 2024-03-11.
III.C.6. Pending decisions regarding the transformation workflow:
III.C.6.a. UNDECIDED: How to indicate when a "coded-ar" or "coded-rip" mapping has changed during review and the code needs to be reviewed/updated. 2024-03-11
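
The commit-message formats in III.C.3.b and III.C.4.b and the label swap in III.C.3.c and III.C.4.c are regular enough to sketch as helpers. These functions are hypothetical conveniences, not part of the project's tooling.

```python
# Status suffixes used in commit messages, keyed by project-board column.
STATUS_SUFFIX = {
    "Awaiting Review": " awaitingReview",
    "Review in Progress": " reviewInProgress",
    "Ready for Transform": "",
}


def commit_message(field, issue_number, board_status):
    """Build the commit message for a single field's code."""
    return f"Code for field {field}{STATUS_SUFFIX[board_status]} #{issue_number}."


def coded_label(coding_label):
    """Turn a 'coding-*' issue label into the matching 'coded-*' label."""
    prefix = "coding-"
    if not coding_label.startswith(prefix):
        raise ValueError(f"not a coding label: {coding_label!r}")
    return "coded-" + coding_label[len(prefix):]
```

The result of `commit_message` is the string passed to `git commit -m`.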

IV. Meeting Norms

IV.A. We will talk and listen in turns, using hand-raising functionality in Zoom interface to order discussions. 2022-08-03
IV.B. Agendas, set by Crystal and edited by anyone, will include time limits for items. Unused time can be rolled over to the next item. 2022-08-03
IV.C. Timekeeper and note-taker will be assigned at the start of each meeting (as separate duties). Note-taking should rotate between UW members. Notes may be amended by anyone after each meeting. 2022-08-03
IV.D. Off-topic but important items will be noted in a "Backburner" section of the notes/agenda in an effort to keep conversations focused. 2022-08-03
