2024 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki

May 1, 2024

**See time zone conversion** **Meeting norms** Present: Crystal Yragui, Deborah Fritz, Laura Akerman, Cypress Payne, Junghae Lee, Ebe Kartus, Gordon Dunsire, Sita Bhagwandin Absent: Adam Schiff, Penny Sun, Jian Lee, Sofia Zapounidou Notes: Cypress Payne Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting

Updates

  • Project plan draft
    • Quick overview of content and organization of document
    • Will be really useful as an introduction to M2R project
    • This week: review the document and add comments and suggested edits
  • Crystal, Cate, Deborah, and Adam meeting next week about MLA table
  • Crystal still needs to set up 533 meeting (will do so today)
  • Cypress is working on getting output from relator transformation code for feedback

Minting IRIs

Minting IRIs Google Doc

The group needs to decide how we want to mint IRIs.

Minting IRIs for identified entities

  • IRI is concatenation of AAP for identified entity
  • Normalization process – removing spaces, punctuation (except – ), rendering result in upper or lowercase
  • Don’t need to worry about length
  • Local identifier transparency can aid in cleanup; identifiers should ultimately be completely opaque, but transparent identifiers work inside the transform
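The normalization steps above can be sketched in Python. The base IRI and the exact character rules are assumptions, since the group has not settled on either:

```python
import re

BASE = "https://example.org/entity/"  # placeholder base IRI; the project has not chosen one

def mint_iri(aap: str) -> str:
    """Mint a deterministic IRI from an authorized access point (AAP).

    Sketch of the normalization described in the minutes: remove spaces
    and punctuation (keeping hyphens) and fold case, so that repeated runs
    of the transform produce the same IRI for the same AAP.
    """
    normalized = re.sub(r"[^\w-]", "", aap)  # drop everything except word characters and hyphens
    return BASE + normalized.lower()

mint_iri("Austen, Jane, 1775-1817")  # -> "https://example.org/entity/austenjane1775-1817"
```

Because the result depends only on the AAP string, two records carrying the same AAP mint the same IRI, which is what enables automatic de-duplication of manifestation IRIs.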

Extension to manifestation

  • Main aim is to produce something that allows automatic de-duplication of manifestation IRIs

Questions

  • Crystal: is it easier to reconcile things that are the same or pull apart things that have been falsely reconciled?

    • Seems to be easier to merge than unmerge
    • Original MARC record is available which can help with de-merging
  • To avoid contaminating a triple store, this interception needs to be done beforehand, in the output from the transform, but it might require tremendous human intervention
  • Where is our data going? Might determine how we need to do this

    • Wikidata? Wikibase?
    • Closed vs open
      • Open means we can’t delete
  • Can we check when processing whether a name is undifferentiated or not?

  • Lots of complications to discuss

    • Undifferentiated IRIs
      • Gordon: some kind of scoping analysis should be done – extract all 7XX fields from a single MARC database to see duplicates, and to learn about undifferentiated name headings
    • Duplicates vs false merges
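Gordon's proposed scoping analysis could start with something as simple as counting repeated 7XX heading strings. This sketch assumes the headings have already been extracted from the MARC database (extraction itself not shown):

```python
from collections import Counter

def duplicate_headings(headings):
    """Scoping-analysis sketch: given 7XX heading strings extracted from a
    MARC database, count how often each whitespace- and case-normalized
    heading recurs, to gauge the scale of the de-duplication problem."""
    normalized = [" ".join(h.split()).lower() for h in headings]
    return {h: n for h, n in Counter(normalized).items() if n > 1}

duplicate_headings(["Smith, John.", "smith,  john.", "Doe, Jane."])
# -> {'smith, john.': 2}
```

A real analysis would also need to flag headings from known undifferentiated name authority records, which this simple counter cannot detect.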

Wrap-Up

Backburner

Action Items

  • Team will asynchronously review the project plan draft before next meeting and add comments or suggest edits

April 24, 2024

**See time zone conversion** **Meeting norms** Present: Crystal Yragui, Adam Schiff, Deborah Fritz, Laura Akerman, Cypress Payne, Junghae Lee, Ebe Kartus, Penny Sun, Jian Lee, Sofia Zapounidou Absent: Gordon Dunsire, Sita Bhagwandin Notes: Cypress Payne Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting (10)

Reflection on last week (10)

  • Chicken/egg issue with LRM model – no systems are using it because there's no data, but there's no data because there are no systems
  • Is there a system out there that can actually use this?
    • Sinopia? Wikibase?
    • We have time to think about/explore options on where we want to store our data

Project Plan Draft (Deborah) (20)

Disclaimers – this is an unfinished document and is based on Gordon’s outline about identifying, relating, and describing the entities described in a MARC record. Deborah is expanding it and pulling in the pieces the group has been working on.

  • WEMI entities described by MARC record
  • Related entities described by data in headings fields in the record
    • Agents, works, expressions, manifestations, items, nomens, places, timespans (also concepts)

Deborah’s question: is my approach worth pursuing or is there another/better way of doing this?

  • Crystal: From a project management perspective, this is a really thorough description of what we’ve been doing & where we’re headed. We should put this in shared drive so we can potentially collaborate on it and maybe eventually publish on GitHub
  • Laura: I agree, this is what we’ve been doing – identifying entities field by field. Biggest challenge has been when the entity is ambiguous – those decisions will need to be documented and clarified
  • Sofia: This gives order to what we’ve been doing – I like the approach. Can we identify the fields we are going to use to identify entities described by record? This document can be an outline for the transformation algorithm.

Aggregate Markers (Deborah) (30)

  • Looked at Deborah's Excel sheet, which is organized by tag
  • Deborah is compiling lists of terms that identify aggregates, we looked at music terms.
    • Adam: single music works will still have a plural term
  • What’s the next step? We need dedicated specialists/help for some of these special formats such as music in order to proceed with lists

533 and 008: Need a separate meeting to discuss? LA, CY, anyone else? SB? GD? CP? (10)

Laura is going through field by field with conditions for how to handle them when 533 is present and has a spreadsheet. Crystal will set up a meeting and invite Laura, Sita, Cypress, Adam, Jian, and Gordon.

Wrap-Up (10)

Action Items

  • Deborah will try to get the project plan doc up next week so we can begin collaborating on that
  • Crystal will reach out to Cate about music markers for aggregates
  • Crystal will set up a meeting to discuss 533 and 008

Backburner

April 17, 2024

**See time zone conversion** **Meeting norms** Present: Crystal Yragui, Adam Schiff, Gordon Dunsire, Deborah Fritz, Laura Akerman, Cypress Payne, Sita Bhagwandin, Junghae Lee, Ebe Kartus, Penny Sun, Jian Lee, Sofia Zapounidou LKD Project members (guests): Matias Frosterus, Jarmo Saarikko, Minna Kantanen, Marja-Liisa Seppala, Antti Impivaara, Alex Kourijoki Absent: Benjamin Riesenberg Notes: Sofia Zapounidou

Housekeeping/Roles for Meeting (5)

  • Recording
  • Notes
  • Agenda

Introductions (10)

LKD Project team

  • Matias Frosterus, IS manager at the National Library of Finland (NLF), project leader for LKD Project
  • Jarmo Saarikko, responsible for the modelling part (Bibframe-based); previous project: Agent model
  • Minna Kantanen, Cataloguer, systems librarian, MARC21 & RDA expertise
  • Marja-Liisa Seppala, RDA coordinator
  • Antti Impivaara, technical resources for the LKD project
  • Alex Kourijoki, Information specialist, National metadata repository of Finland (Melinda)

About LKD Model Project

Matias Frosterus presented

Timeline: 2022-2024

NLF Strategy: use of LOD, open source, open interfaces, collaboration

Description: A linked data project for which the Bibframe model was selected. Bibframe was adopted because it can accommodate bibliographic data under the RDA rules, there is a community behind it, conversions and related systems/tools exist, and it seems to have wide adoption. Currently, the NLF and its partners use a common metadata repository called Melinda. Melinda is based on the commercial software Aleph plus custom services. The goal is to replace Melinda with a linked-data-capable system. In this context, LibrisXL and Folio have been considered, but this task remains on hold until 2028 (initial planning was for 2025). Nevertheless, the data model part is needed, as some libraries are already migrating to linked data systems (namely Quria, from the Axiell company).

Infrastructure: Besides the Melinda infrastructure, the NLF has created many controlled vocabularies to be used in linked data projects. These include

  • "Finnish Metadata Thesaurus", includes the RDA vocabularies + new terms + URIs
  • FINTO, ontology and thesaurus service

Model: The model is based on Bibframe (the bffi namespace is used), but it has been expanded to accommodate the semantics of the LRM/RDA Expression entity. As a result, the properties of a given bf:Work will be mapped to properties of the bffi:Work and bffi:Expression classes.

About MARC2RDA Project

CY presented the MARC2RDA project at the UW.

Discussion

The discussion touched many issues relating to the versions of Bibframe, the relationships in Bibframe, systems, the modelling of aggregates and diachronic works, and datasets and publications.

  • Versions of Bibframe: There is a Bibframe Interoperability Group. NLF will participate. NLF colleagues perceive the mapping between bf:Work to bffi:Work and bffi:Expression as an easy one. They do not expect problems on this.
  • Relationships: the NLF will enrich their model with more relationships than the official BF if needed
  • Systems: the NLF considers LibrisXL and Folio. They are also investigating Sinopia and Wikibase. Regarding Sinopia, the NLF colleagues expressed the difficulty in creating templates.
  • Modelling issues: Aggregates is one of the issues studied by the NLF team and there may be a collaboration between the two projects (NLF LKD and UoW MARC2RDA) on this. Diachronic works will be the next cataloguing case (after aggregates) they will work on.
  • Datasets: there are thoughts about ingesting BF data from Sweden National Library (Libris), and the Library of Congress
  • Publications: There are no publications regarding the LKD project so far.
  • Decision: Teams will follow each other's work, and there will be another meeting between the teams in Fall 2024.

April 10, 2024

**See time zone conversion** **Meeting norms** Present: Cypress Payne, Sita Bhagwandin, Gordon Dunsire, Junghae Lee, Adam Schiff, Ebe Kartus, Laura Akerman, Deborah Fritz, Crystal Yragui, Penny Sun, Jian Lee Absent: Benjamin Riesenberg, Sofia Zapounidou Time: Ebe Kartus Notes: Jian Lee

Water Cooler/Agenda Review/Roles for Meeting (5)

Announcements (10)

  • Next week, we will be joined by guests from the LKD Model team from the National Library of Finland to hear about their project and exchange ideas about MARC21, RDA, and BIBFRAME

UW Staffing Updates

  • Benjamin has taken a position at the University of Oregon Libraries as a Metadata Librarian, and is leaving the UW Libraries in May.
  • Crystal is moving into a temporary Metadata Librarian and co-Interim Head of Metadata and Cataloging Initiatives Unit position in May.
  • Junghae is serving as co-Interim Head of Metadata and Cataloging Initiatives.
  • Translation: Crystal is temporarily serving in Benjamin's prior position, and Crystal and Junghae are sharing Theo's former position on an interim basis. UW is still down two people on our linked data team.
  • Crystal and Cypress are looking to hire another student in May/June.
  • Deborah: Anyone attending ALA? Jamie Handling (sp?) is interested in meeting with members of this group there.

Next Steps: Relationships (30)

“Agent Relator Transformation Table” and the “Using the Agent Relator Transformation Table”

  • Deborah is getting these ready for Cypress to use to continue working on transformation logic
  • Agent relationships are moving along. May need some changes to the “Using the Agent Relator Transformation Table” document regarding aggregates
  • The latest “MARC relator values mapped to RDA_2024049” document is up in the Google Drive.

"HeadingsFieldsPersonalNames"

  • Still some outstanding questions, but potential for students or others to start working on this
  • Outstanding questions remain in the HeadingsFieldsPersonalNames table. Students could work on it to free up Deborah’s time, but the questions on the spreadsheet need to be answered first. Penny agreed to pick up the table and continue Deborah’s work.

WEMI to WEMI

  • Deborah has started on a WEMI to WEMI table similar to the agents table. Still needs a list of all the relationships so they can be mapped, plus a default for anything that isn’t listed. Is it worth looking at the PCC relationships?
    • We can’t do much with the PCC relationships: the labels can only map to unconstrained properties, and unconstrained properties are not proper RDA. It would also make more work and reduce the quality of the first-phase project.

Agent as Subject

  • Where to keep agent-as-subject notes? Combine with Gordon’s subject document?
  • Deborah’s document should be the master document, Gordon’s document can be folded in.

Next steps: Subjects (10)

  • Transforming subject data document is going well--is it time to finalize and integrate with spreadsheets? Is this what we will do with this documentation, or will it live somewhere else?
  • Sofia asks whether 630 information should be transferred to Google Sheets or if it overrides what is in Google Sheets.
  • There are a whole bunch of questions in the document still not answered. After that, we can fold that into the relationships document.
  • Sofia’s question about how to identify an expression can also be folded in the WEMI to WEMI document
  • We should also transfer the information regarding 630 to the 630 Google sheet because that is the master mapping document.
  • Should think about what the folding in ought to look like for the relationships and subjects document.
  • Penny will go through the subject document to pull out all the open questions so we can address them.

Project Plans: (We've got just under 9 months projected) (30)

  • Portion of the project = identifying entities described by the MARC record and related entities.
    • Primary WEMI relationships
    • WEMI-Agent relationships
    • WEMI-WEMI relationships
      • found in 130, 100/110/111 + 240, 100/110/111 + 245, 440, 6xx, 70x-75x, 76x-78x and 80x-83x fields
    • Concept relationships
      • Classification
      • Subject headings
  • Portion of the project = Describing entities
    • Minting IRIs
      • How to formulate those IRIs
      • Mapping attributes from:
        • The entire record for entities with primary relationships
        • AAPs for entities with secondary (?) relationships
      • Using NAF IRIs if provided
  • At some point, we will need to decide that we have intellectually arrived at our Phase I draft for big decisions
  • Then, move into aggressive review of first-pass mappings along with transformation code writing and output review
  • Once first pass mappings are reviewed, focus on revamping documentation ahead of publication
  • End of year, publish as a GitHub package and spend time in the new year writing papers, giving conference publications, etc.
  • Plan phase II

Notes

  • Crystal went over project plan
  • Identifying aggregates should be added to the plan somewhere, and types of aggregates
  • Output of phase I needs to be coherent RDA linked metadata. Standard technique is to identify instances of entities (identify primary-level WEMI stack of entities in a MARC record) and establish WEMI-to-WEMI relationships.
  • Identifying entities sounds like the important next step. Deborah will work on aggregates first, then the WEMI relationships

Wrap-Up (10)

Backburner

  • Revisiting 008. Laura will provide a summary of what should be changed, or a proposed plan for changes, and then we could talk more about it.

Action Items

  • Penny will go through the subject document to gather open questions
  • Deborah will address aggregates questions. It will delay the work on the WEMI relationships
  • Crystal will send out an email regarding the LDK model discussion next week

April 3, 2024

**See time zone conversion** **Meeting norms** Present: Adam Schiff, Crystal Yragui, Cypress Payne, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Laura Akerman, Sita Bhagwandin, Sofia Zapounidou, Penny Sun Absent: Benjamin Riesenberg, Erin Grant, Jian Ping Lee, Junghae Lee Notes: Sofia Zapounidou Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting (5)

Announcements (5)

  • All BSR fields have been assigned and are in progress!
  • Transform update: We will mint IRIs where they are needed for well-formed RDA. Right now, we are minting fake IRIs which are unique within each transform run but are not persistent and do not resolve. Much needs to be decided, and Cypress has not looked at this in depth yet. We should discuss URI minting as we approach publication of the Phase I transformation code.
  • Cypress is going to begin marking things "URGENT" if they will hold up her work if they go unaddressed. This is to alert Crystal that they need attention
  • Joint meeting with National Library of Finland to discuss M2R and their LKD data model: similarities, differences, and approaches. The meeting will be organized during one of the team's weekly meetings.

Relator Table (30)

  • Relator table is functional, Cypress and Deborah are in communication about updates

  • Can resume mapping other aspects of second layer of relevant fields. Let's figure out how this ought to be done so that assigned folks can get to work.

  • Check in with Deborah on what she has been working on with regard to this?

  • Deborah has started working on a similar approach for WEMI to Agent using the X00 fields. Some questions:

    • No official mapping regarding $i. Some of them will probably match with RDA relationships, some will not.
    • Aggregates remain tricky; probably to be handled in Phase II. Proposal: split the dataset into aggregates/non-aggregates before transformation
  • Cypress will turn Deborah's table into code. Agent instances will be minted, but information about these entities found in their AAPs will not be extracted separately. As an example, a Person agent will be minted using the 100 field; the relationship between this Person and the Work will be based on $e/$4, but information about the Person, like birth and death dates from 100$d, will not be extracted at this point. Probably, this should be done later.

  • Gordon thinks that the mapping of WEMI to WEMI relationships is a really important task, especially the inherent relationships W-E-M-I (same tree)

    • Regarding aggregates, we are not sure to which work we are attaching the graph created based on the record info
  • Crystal proposes to map aggregates, but since their mapping will be messy, add a disclaimer for aggregates (aka that they will be handled in a next phase)

  • Deborah asks Gordon if he has any ideas on the mapping of aggregates, since there is no field that explicitly states this work is an aggregating one. Probably, we will have to create a model for handling the cases of multiple expressions embodied into the same manifestation. It can be done, but the algorithm must include many IFs.

Transforming Subject Data (30)

  • Gordon's Transforming subject data document
    • Gordon presented the examples for 630
    • Regarding the rule for when LDR/18=c, Gordon proposed using the British Library table for punctuation
    • SZ tried to find this table online, but could not
    • Parts of the AAP for 630 works can be split
    • Proposal: ignore the more analytic subject relationships and use just the generic one

Brainstorm: Options for Storing Transformed Data and Publishing URIs (15)

  • Where do we imagine our URIs living at the end of Phase I? Phase II? Permanently?
  • What about our transformed data?
  • These aren't questions we need to answer soon, but it is important that we start thinking seriously about them.
  • Crystal proposes to change the PCC view on the use of $0
  • Regarding the transformed data
    • Gordon: idea for central storage of the transformed data and a deduplication algorithm
    • Gordon: creation of persistent URIs
    • Cypress: we can use multiple bases for the URIs

Wrap-Up (5)

Backburner

Action Items

March 27, 2024

**See time zone conversion** **Meeting norms** Present: Adam Schiff, Crystal Yragui, Cypress Payne, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Laura Akerman, Sita Bhagwandin, Sofia Zapounidou, Penny Sun Absent: Benjamin Riesenberg, Erin Grant, Jian Ping Lee, Junghae Lee Notes: Cypress Payne Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting (5)

Announcements (5)

Vendor Relationships and Licensing (5)

  • Notes from Crystal and Adam's meeting with Ex Libris are up on Google Drive here
  • Our work is currently published under a CC0 1.0 Universal (Public Domain) license.
  • License description: "By marking the work with a CC0 public domain dedication, the creator is giving up their copyright and allowing reusers to distribute, remix, adapt, and build upon the material in any medium or format, even for commercial purposes."
  • Vendors have been expressing interest in using our product, so if this license is not what we want to do, now is the time to change it
  • Everyone seems in agreement that this is the correct license to be using

:flashlight: $0's and $1's: Out of the Fog (30)

Definitions

$0 Authority record control number or standard number

  • Subfield $0 contains the system control number of the related authority or classification record, or a standard identifier. These identifiers may be in the form of text or a Uniform Resource Identifier (URI). If the identifier is text, the control number or identifier is preceded by the appropriate MARC Organization code (for a related authority record) or the Standard Identifier source code (for a standard identifier scheme), enclosed in parentheses. When the identifier is given in the form of a Web retrieval protocol, e.g., HTTP URI, no preceding parenthetical is used.
  • Subfield $0 may contain a URI that identifies a name or label for an entity. When dereferenced, the URI points to information describing that name. A URI that directly identifies the entity itself is contained in subfield $1.
  • See MARC Code List for Organizations for a listing of organization codes and Standard Identifier Source Codes for code systems for standard identifiers. Subfield $0 is repeatable for different control numbers or identifiers.

$1 Real World Object URI

  • Subfield $1 contains a URI that identifies an entity, sometimes referred to as a Thing, a Real World Object or RWO, whether actual or conceptual. When dereferenced, the URI points to a description of that entity. A URI that identifies a name or label for an entity is contained in $0.

We know how to create well-formed RDA data from $1. Can we agree that the problem is $0?

Determining a way forward with $0

  • Problem: What is the RDA entity of the related authority or classification record, if any? And can we represent it as well-formed RDA in a consistent way?
  • If we can't come up with a model during the meeting time, is anyone willing to give it a try asynchronously?

Discussion

What is the RDA entity of the related authority or classification record, if any? And can we represent it as well-formed RDA in a consistent way?

  • Gordon: This can't be determined; If we are minting IRIs for instances of entities (persons, corporate bodies, things with authority records), we can relate that $0 as some kind of identifier.
    • We would have to make an interpretation – treating the identifier of an authority record as if it was an identifier of descriptive work about the entity we are attaching it to (and noting this in the transform)
    • Best that can be done is to use ‘is person described by’ with the identifier recording method, with the entire contents of $0
  • $0 points at name authority document
  • Deborah: This is saying the name is authorized; $0 applies to the person’s name, $1 applies to the person. We aren’t identifying the person, we’re identifying the name (nomen)
  • Crystal: the closest thing we’re going to get is “person described by” and treating authority records as work.
  • This isn’t going to uniformly apply to every $0, we’re currently talking about agents
  • With LC, we’ve already decided we’re converting LC $0s to $1s
  • We need a transform meeting to discuss minting IRIs

Update on Relator Table (15)

New version of relator table is up! Major changes:

  • Split field and indicator into separate columns
  • Added new columns
    • Unconstrained curies
      • Adam: these are in $4s as https
    • Column identifying which relators map to multiple domains
      • Cypress can put conditions into the transform: if $4 has an RDA URI, map using that URI. Otherwise, if it has an RDA relationship label, rely on that. But if it only has MARC info and multiple domains, we need to default to manifestation
    • X11 $j columns
  • 700 and 711s split up
  • 720s are split into ind1 = 1 and ind1 = #|2
  • There are 93 RDA to WEMI relationships that do not have MARC relators
  • Question for Cypress from Laura: can the code account for changes in the table?
    • Cypress: new rows or values won’t require changes to code. New logic or new columns that are added will require adjustments.
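The default-domain precedence Cypress described can be sketched as follows. The lookup tables, the function name, and the URI entry are hypothetical stand-ins for the real Agent Relator Transformation Table, not the project's actual mapping:

```python
# Hypothetical lookup tables standing in for the real Agent Relator
# Transformation Table; entries are illustrative, not the actual mapping.
RDA_URI_PREFIX = "http://rdaregistry.info/Elements/"
LABEL_TO_URI = {"author": "http://rdaregistry.info/Elements/w/P10061"}  # placeholder URI
MARC_CODE_DOMAINS = {"acp": ["work", "expression", "manifestation"]}    # art copyist example from the minutes

def choose_property(sf4, sf_e):
    """Return (domain, property-or-code) following the precedence described
    by Cypress: an RDA URI in $4 wins; next, an RDA relationship label in $e;
    a bare MARC code with multiple possible domains defaults to manifestation.
    """
    if sf4 and sf4.startswith(RDA_URI_PREFIX):
        return ("from-uri", sf4)          # domain is already encoded in the URI path (w/e/m/i)
    if sf_e and sf_e.rstrip(".,") in LABEL_TO_URI:
        return ("from-label", LABEL_TO_URI[sf_e.rstrip(".,")])
    if sf4 in MARC_CODE_DOMAINS and len(MARC_CODE_DOMAINS[sf4]) > 1:
        return ("manifestation", sf4)     # default domain chosen in the minutes
    return ("unmapped", sf4 or sf_e or "")
```

As noted above, adding new rows or values to such tables would not require code changes, while new columns or new precedence logic would.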

Transforming Subject Data (30)

  • Looked at "Heading or term" examples in Gordon's Transforming subject data document
    • Started at Example 50
    • We may know there’s an IRI for a person, but the computer doesn’t
    • Relating subject work to subject person hasn’t been added yet, but it will be
    • Gordon switched $v to 'has category of work'
    • Punctuation in skos:prefLabel needs to be worked out
      • Crystal (in chat): Once this document is finished and we have talked it through, maybe a subgroup of us can run through it with a finer tooth comb before the students use it as a basis for mapping and transformation to address punctuation etc.?
    • Anything with ind2=4 we need to use datatype. We don’t know anything about it, so we’re not minting an IRI.
    • Example 59 contains $0 values
      • We looked at processing $0 values as FAST URIs

Action Items

March 20, 2024

No meeting

March 13, 2024

**See time zone conversion** **Meeting norms** Present: Adam Schiff, Crystal Yragui, Cypress Payne, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Laura Akerman, Sita Bhagwandin, Sofia Zapounidou Absent: Benjamin Riesenberg, Erin Grant, Jian Ping Lee, Junghae Lee, Penny Sun Notes: Crystal Yragui Time: Ebe Kartus

Announcements (5)

  • Many members are attending the CEAL 2024 Annual Meeting today
  • Event of interest: MARC and Its Transition in the Linked Data Environment: Pt.2: MARC to Linked Data - More Possibilities -- Friday, 3/15/2024 (2-3pm EST/1-2pm CST/11am-12pm PST). Registration Link

Relator Table Transform Round 1 (20)

  • First test has been a success! Report-back from Cypress
  • See issue
  • Cypress gave a run-through of the code for the relator table in Oxygen. There were lots of questions!
  • Adam: Will we use constrained or unconstrained properties?
    • Constrained.
  • What happens when $4 contains unconstrained properties?
    • Cypress: We will need to figure this out.
    • Deborah: There is a default to add here in the table.
    • The PCC needs to revisit the decision to prefer unconstrained properties, which came from a preference for simplified labels.
  • Discussion on the Authority Toolkit and URIs it supplies
  • Case does not matter for matching
  • Issue to work on: When a code or relator in $4 or $e has a domain that could be two or more entities from WEMI stack, table needs to determine which entity to set as the "default" domain for the code to choose. The code will need to be adjusted to follow these choices once they are integrated into the table.
    • Example: acp/art copyist could be creative person of manifestation, expression, or work. We don't know which it is. We need to set defaults. How?
  • RDA constrained labels are not user-friendly and definitely not intended for display. Unconstrained are friendlier and more adaptable for display.
    • Ebe: NLNZ uses constrained properties, maps them to simplified display labels for users.
  • Deborah: We don't need to bring skos:closeMatch or inverse properties over to results from tables in future code iterations (note to Cypress)

$0's and $1's: Revisiting our Choices, Cont'd (30)

  • Reviewed discussion from last week
  • Are the differences between $0 and $1 confusing?
  • Adam: No. The PCC has produced clear documentation on the differences between these subfields, and where to put URIs from various sources.
  • Ebe: heard during a BIBFRAME presentation that LC is putting everything in $0, not using $1 at all.
    • Crystal: That is incorrect practice!
    • Adam: Is this about converting BIBFRAME to MARC? Possibly.
    • Conversation about LC practice. Not sure what this looks like, but won't base our mapping on incorrect practice.

What is $0 Referring to, Really?

  • Gordon: We can't determine the entity for which $0 is a referent.
  • Laura: It's referring to a document on the web to support an AAP. A manifestation of a Work.
  • Let's take an example. 380 $a Motion Pictures $2 lcgft $0 (DLC)gf2011026406 $0 http://id.loc.gov/authorities/genreForms/gf2011026406
    • The relationship for 380 is Work --> category of work --> [$0]
    • Sofia: RDA is agnostic about the range for category of work. Using the URI as a value here is fine. Use it.
    • Crystal: Decide for $0 on a case by case basis based on ranges?
    • Gordon: This makes sense here because this is an attribute field. Relationship fields are completely different.
  • Gordon: In RDA, for genre/form, elements that take value vocabularies, or, attribute fields, we should interpret $0 values as $1 values for our mapping. For relationship fields, where the element is pointing to something that is potentially an RDA entity (but also potentially not), our approach needs to be different. Our mission is to create well-formed RDA data.

Subject and Classification Mappings (30)

  • Minting IRIs: We don't really have a strategy for minting these uniquely in a way that stays consistent each time we run the code. Deduplication/entity management is a phase II effort.
    • Cypress: Generate ID function is what the transformation is using.
    • Gordon: This is trivial. The subject transformation mappings paper (below) describes a separate transform for each; ARK transform scheme.
      • Sounds great to Cypress
  • Gordon's document
    • Check in: Have we all read this? Has it been substantially updated since last week?
    • Adam: Looking up IRIs for LCSH/id.loc vocabularies: will we do this before minting our own IRIs, or after?
      • After. id.loc.gov metadata attached to IRIs is incoherent. We will re-mint and define subjects as skos:Concepts. Recommended course of action: Don't deduplicate in post-processing. Instead, create and publish a mapping. Assert that minted entities are equivalent, not sameAs. Keep id.loc.gov metadata at arm's length from ours.

Phase II Questions

  • We've talked about Sinopia as a home for our output. What do we think about Wikidata or Wikibase instead?
    • See Sofia's recent work: "Entity Management Using RDA and Wikibase: A Case Study at the National Library of Greece"
    • We've got a problem with minting IRIs...haven't even talked about a neutral base domain for the transform. Could Wikibase/Wikidata offer a solution? We know both Sinopia and Wikibase can mint unique persistent IRIs for entities created natively in those interfaces.
    • Automatic deduplication and dereferencing
    • What about RIMMF?
    • We need to discuss these things further before releasing the transformation.
    • Who will host?

March 6, 2024

**See time zone conversion** **Meeting norms** Present: Deborah, Crystal, Gordon, Jian, Laura, Junghae, Sita, Ebe, Penny Absent: Sofia, Erin Notes: Junghae Time: Ebe

Announcements (5)

$0's and $1's: Revisiting our Choices (30)

  • Decisions Index $0/$1 Section
  • Discussion on $0's and $1's
    • Special discussions on: II.C.1. Transform structure for $0/$1
      • II.C.1.a. When $1 exists: We will avoid minting extra entities or relating IRIs as authorities; if $0 exists alongside a $1 in the same field, ignore $0
      • II.C.1.b. When $0 exists and $1 does not exist: We will not mint an entity and then assign the $0 as an identifier or IRI for a metadata work about that entity.
  • Cypress's observations from 380:
    • The current code only maps $0 values if they begin with 'http://rdaregistry.info/termList' or 'http://id.loc.gov'. Any $0 values that don't begin with either of those are not mapped; the transform outputs a comment instead.
    • However, this doesn't seem to match the comment in the 380 spreadsheet, which says: "record as IRI if string begins with "http", else as identifier. Do not map if it duplicates $1 --LNA 4/5/2023"
  • This is proving challenging to transform, and impractical from a code output standpoint. Will users of our transform be happy with this result? Can we put together a sub-committee to rethink this issue and come back to the group with a fresh proposal?
    • $1 is the real world object which is the value of the property in the subfield. What does $0 identify or represent as an IRI?
    • Adam: $0 represents the authorized access point, that is the value given for the object of the property - that would be a nomen.
    • Laura: Is it the identifier for the authorized access point or identifier for the collection of data supporting authorized entity referencing in the field? I think it is the collection of data.
    • Deborah: What about: $0 is the reference source for the nomen? That would take care of what both Laura and Adam say. It's an identifier for an AAP and an identifier for a nomen, but really it's the reference source for the AAP and the reference source for the nomen. If you're doing the IRI for a nomen, then there's a "reference source" element in that description of the nomen that links out to where you got the AAP from.
  • What kind of RDA entity is the reference source? It wouldn't be the RDA entity. The RDA entity would be the nomen and then in the description of the nomen, there is a relationship, reference source. A name authority file is not an RDA entity. Therefore, we don't want that (Gordon).
    • Think about that reference source element for nomen. Is it also restricted to linking to an RDA entity? We're in the description for the nomen, and there is an element that says 'reference source.' Then where is the source for the AAP? Does it have to be an RDA entity as well?
    • Adam: In practice, the answer depends on the property. For 380, such as category of work, the value would be terms describing genre/form, which falls outside of RDA values. In the case of affiliated institution, the value would be a corporate body, which is an RDA entity.
  • We will continue to discuss this topic next week.
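The divergence Cypress flagged between the 380 code and the spreadsheet rule can be sketched as two small functions. These are illustrative only; the names and return shapes are not the project's actual transform code.

```python
# Two competing $0 rules, side by side (illustrative sketch, not the
# project's actual transform functions).

ALLOWED_PREFIXES = ("http://rdaregistry.info/termList", "http://id.loc.gov")

def map_0_current(value: str) -> tuple:
    """Current 380 code: map only the two recognized IRI prefixes;
    anything else produces a comment in the output."""
    if value.startswith(ALLOWED_PREFIXES):
        return ("iri", value)
    return ("comment", "unmapped $0: " + value)

def map_0_spreadsheet(value: str) -> tuple:
    """380 spreadsheet rule: any string beginning 'http' is recorded as
    an IRI, otherwise as an identifier."""
    if value.startswith("http"):
        return ("iri", value)
    return ("identifier", value)

# The rules diverge on, e.g., a VIAF IRI: the current code emits a
# comment, while the spreadsheet rule records it as an IRI.
```

Seeing the two rules side by side may help a sub-committee decide which behavior users of the transform would actually want.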

Subject and Classification Mappings (30)

  • Gordon's document
    • The paper is a working document.
    • It has been decided to select option 3B for classification numbers and extend it to subject headings. While classification numbers get a similar treatment as concepts, we can also retain additional information embedded in subject heading fields because of the semantics embedded in those fields. If we wish to retain as much RDA-compliant and -compatible information from legacy MARC 21 records as possible, this is the most suitable option.
    • The presence of expression-level subfields in an AAP raises the question of whether that is a sufficient indication that we are dealing with an expression rather than a work. It is challenging to discern this from the AAP because of issues with aggregates and the inclusion of a language in a work.

100/600/700/800 Mappings (25)

  • Deborah's spreadsheet
  • This shows patterns among 100/600/700/800, e.g., indicators remain consistent most of the time.
  • $c is problematic since it includes a variety of information, such as rank, Roman numerals, etc. Problematic name subfields are highlighted in red in the spreadsheet.
  • We assume these are authorized access points as long as they follow schemes.
  • We will continue to discuss this topic next week.

Action Items

  • Erin will send more data to Deborah.
  • Crystal will upload the meeting notes with Ex Libris in the shared Google Drive.

February 28, 2024

**See time zone conversion** **Meeting norms** Present: Crystal, Theo, Jian, Junghae, Adam, Deborah, Ebe, Sofia, Sita, Penny Notes: Jian Time: Ebe

Announcements (5)

  • Theo's last meeting (for a while, at least!) Thank you for all your work, Theo!
  • Gordon and Erin absent today

770 and Similar Linking Entry Fields (15)

  • See [Question] in 770 issue: Essentially, asking whether mappers ought to be using linking entry notes to attempt to mint new entities for the things being described or whether we should create notes. Guidance will help with many similar 77x fields.
  • Crystal sees these as similar to 505 notes and thinks they should be notes (for Phase I at least), but wanted to consult with the group before deciding.
  • Laura: they are not note fields like 505, they are linking fields. There should be enough information to mint an entity.
  • 770 indicates relation to a work. If we are minting a work, then we will also need to mint an expression or item to meet the minimum description of work.
  • Adam commented that when looking at all the subfields in 770, such as subfields for physical descriptions and publication information, the 770 field looks more like it describes a manifestation.
  • Similar situation to series. A series statement most of the time only has a series title
  • Need further discussion. It may depend on each linking field. Need a new discussion page and lots of examples
  • Crystal will set up a discussion for this.

Relator Table and Related Code

  • Theo: Code is functioning; passed code to Cypress. Cypress is going to eliminate the repetition of conditions, such as checking the same conditions over and over again.
  • Crystal will be the person to communicate for the conversion sheet in Google drive.
  • The relationship table will need to be updated regularly. There are new relator codes that have been added to the LC relator terms.
  • PCC label? Currently PCC policy is not to use $4 at all.
  • Table needs to be maintained. Adding a new role when there is a new relator code should not be a problem.
  • Deborah has been adding more conditions to the table and the MARC relator terms explanation document. How do we coordinate that with Cypress's work when she works on the table and the transformation code? How do we get feedback to Cypress?
  • The explanation document is a first pass; prefer that Cypress not use it until it is more polished.
  • Deborah: x00 and x10 have the same pattern. Deborah will work on a table for x10 and x11.

$0

  • Cypress does not know what to do with subfield $0.
  • The current decision is to ignore $0 when $1 exists. This does not look right; we will need to revisit the decision index for the treatment of $0 and $1.
  • Most $0 values will have an LC URI. What do we do in that case? Map to an identifier that's been referenced?

February 21, 2024

**See time zone conversion** **Meeting norms** Present: Erin Grant, Crystal Yragui, Laura Akerman, Ebe Kartus, Penny Sun, Sita Bhagwandin, Adam Schiff, Theo, Deborah Fritz, Gordon Dunsire, Junghae Notes: Laura Akerman Time:

Announcements (5)

  • Erin Grant will be the supervisor for this project going forward.
  • 11:00am PST today we will meet about the relator table and coding in the same zoom room as this meeting.

Classification

  • Issue: minting multiple IRIs for the same classification as a SKOS concept subject. Use some kind of convention that embeds the classification number in the URI? Or use a hash URI? If we create identical URIs, they don't have to be "deduped". Otherwise, create a map between local URIs and URIs for the scheme, if available.
  • Some of the same issues come up for names, but with AAPs and authorities for names, etc., there are other aspects that haven't been sorted out.
  • Gordon's Transforming Subject data from MARC21 to RDA: https://docs.google.com/document/d/1T5VyAH6bPKBTJp_j4l2ecOLUlP00u-GK7Ubti58mtqU/edit (shared with individuals only)
    • 3B looks at subject headings or terms. The diagram shows "has subject" Concept, which has notation "Classification Number", has alternate label "Classification number", and is in scheme "Classification scheme".
    • Example 7 is a SKOS graph generated from 050, using alt labels, which let us record the full scheme notation because only one prefLabel is allowed in SKOS.
  • Possible post-processing could link up our little SKOS concept with, e.g., LC's linked data for the classification scheme, which includes human-readable labels.
  • 3B has a very similar structure for subjects (65x and parts of the rest of 6xx) but includes a subject scheme rather than a classification scheme.
  • Subjects that are also RDA entities (person/corporate body, AAP for work, etc.) require a nomen and a subject scheme.
  • Subjects which are name/title have a work as the subject, not the person, Deborah points out.
    • Discussion around MARC $c not being included in the nomen properties except as part of an authorized access point, etc.
    • For subjects that have a $t, can we assume the subject is a Work? Discussion: they might be expressions, and certain subfields indicate expression properties. Could we use language of representative expression? (Adam says: dangerous.) Can we only treat them as works? Or can we have an expression as a subject? (OCLC's use of the word "work" in documentation is probably not the RDA definition.)
    • Side discussion on whether some author/title subjects in 6xx that follow a pattern are aggregating works and/or expressions. Language for an aggregating work is an indication of a new aggregating work, rather than an expression of an aggregating work. We look for the aggregating work.
  • Gordon cautions to avoid erroneous data in conversion. Adam points out that it's erroneous to say a subject is a work with representative expression English when the 6xx has $l English for what is probably a translation of a work originally published in German.
    • Deborah points out that for aggregating works, use of $l shouldn't be how the language is handled.
    • Adam: $h, $s, $l, and $f indicate expression properties; maybe subfields $m and $r also. Gordon had mapped some of these as "representative expression" properties.
    • Alternative: mint an expression when we have expression properties? Cleaner, or more complex? vs. just removing $l, $s, etc. from the Work...
    • A work about an expression is about a Work...
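The option-3B SKOS pattern discussed above can be sketched as plain triples. The base IRI and the normalization rule are hypothetical; the real transform's conventions are still being decided.

```python
# Sketch of a skos:Concept generated from an 050 classification number,
# following the option-3B pattern (notation, altLabel, inScheme).
# The https://example.org base IRI is a placeholder.
SKOS = "http://www.w3.org/2004/02/skos/core#"

def classification_concept(class_number: str, scheme_iri: str) -> list:
    # Embedding the normalized number in the IRI means identical numbers
    # yield identical IRIs, so no deduplication step is needed later.
    concept = "https://example.org/class/" + class_number.replace(" ", "")
    return [
        (concept, SKOS + "notation", class_number),
        # altLabel rather than prefLabel: SKOS allows only one prefLabel
        # per language, but any number of altLabels.
        (concept, SKOS + "altLabel", class_number),
        (concept, SKOS + "inScheme", scheme_iri),
    ]

triples = classification_concept(
    "AC200", "http://id.loc.gov/authorities/classification"
)
```

Because the function is deterministic, transforming the same number twice produces identical IRIs, which is exactly the "no deduping needed" property discussed above.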

To be continued. Gordon will add questions and more to his draft.

Action items

February 14, 2024

**See time zone conversion** **Meeting norms** Present: Deborah, Theo, Laura, Penny, Jian, Gordon, Junghae, Adam, Crystal, Sofia, Ebe Notes: Theo Time: Ebe

Announcements (5)

  • Meeting will end at 9am PST due to scheduling conflict
  • Crystal and Adam met with Ex Libris: Crystal will put notes up in Google Drive soon
  • Meeting notes:
    • They're interested in our code but they were vague. Our project expressed interest in open code; they do share code as open on CloudOps. Even the idea of covering some expenses was floated. Perhaps their developers can help code. But there needs to be more discussion.

Transformation Workflow (10)

  • note from coders: coordinate with mappers: if "ar" or "rip" are coded, ask them to make a note in the issue when they move to "rft." It would be particularly helpful to note whether changes were made to the mapping during review.
  • note from mappers: how will mappers know whether coders have coded before mappers have moved to "rft"? this is not part of the established workflow
  • Meeting notes:
    • The goal is to get a note in the issue from mappers when something is moved from ar or rip to rft AND it was previously marked coded. The coders need to look at the code again when something gets recategorized on the job board AND it has been coded. A note in the issue is what the coders would like. It will not be a perfect system (since it is manual) but it will be an additional safeguard that code that needs to be re-written will be re-written.
    • However, whatever the case, it should be documented in the mapping and transformation workflows documents.

Full MARC record in the RDA/RDF output (15)

  • Did not look for the issue/discussion on this, as some stuff has already been decided
  • Can we make a decision on how we want to do that? Specifically, what it should look like in the output RDA/RDF.
    • It should probably travel with the manifestation.
    • There is an element like rdam:P30254 "is manifestation described by" that can be used. Or the unconstrained one: rdau:P60215 "is described by".
  • Meeting notes:
    • Deborah added to Issue 367
    • relate the manifestation to the MARCXML (could just as well be the binary MARC if that's preferred)
      • marc record does not need to be a nomen
        • description of property in the toolkit is misleading
          • maybe this would be better: access point as an option for the property's value, and another option is the structured metadata thing itself.
        • the marc record is the metadata, not an access point for the metadata
      • no point in putting a link to the metadata from multiple entities, just a direct relationship to the metadata itself in the description of the manifestation
        • the marc is included only for people consuming the transform; they can find their way to the manifestation.
    • use the property manifestationDescribedWithMetadataBy
    • prefer getting inverse relationship explicit in the data
    • an iri does not need to be assigned to the metadata; that would be preferred if we needed to describe the metadata, but the metadata describes itself
    • another alternative: relate the metadata to the description of the metadata description sets; Laura will look into this.
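The decision above (a direct relationship from the manifestation to the MARC metadata, with the inverse made explicit and no IRI minted for the metadata itself) might be sketched as follows. The IRIs are hypothetical, and the rdam:* curies are label placeholders; the exact RDA Registry IRIs should be verified against the registry before use.

```python
# Sketch: relate the manifestation directly to its source MARCXML.
# Both property curies below are placeholders standing in for the
# registry's "manifestation described with metadata by" element and
# its inverse.
DESCRIBED_BY = "rdam:manifestationDescribedWithMetadataBy"  # placeholder
DESCRIBES = "rdam:describesManifestationWithMetadata"       # assumed inverse

def link_source_record(manifestation_iri: str, marcxml_url: str) -> list:
    """No IRI is minted for the metadata; the MARCXML URL itself is the
    object, since the metadata describes itself."""
    return [
        (manifestation_iri, DESCRIBED_BY, marcxml_url),
        (marcxml_url, DESCRIBES, manifestation_iri),  # explicit inverse, per notes
    ]

links = link_source_record(
    "https://example.org/manifestation/1",  # hypothetical
    "https://example.org/marc/1.xml",       # hypothetical
)
```

Only the manifestation carries the link; other entities reach the MARC record through the manifestation, matching the "no point in putting a link from multiple entities" point above.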

Classification (25)

  • Option 3 in Gordon's paper seems to be the way we are leaning. Decision? Details we need to work out?
  • Meeting notes:
  • (available soon)

Meeting needed between coders and Deborah w/r/t relator elements table? (5)

  • We are short on time today. Determine whether a meeting is needed to iron out details on this.
  • Meeting notes:
    • meeting should include Deborah, Theo, Cypress, Gordon (if he's available) and Ebe (also if available)
    • Week of Feb 19 preferred by Theo; Deborah not available Feb 23
    • Crystal will schedule the meeting
    • Coders will prepare for the meeting somehow

Action Items

  • Crystal set-up relator elements table meeting
  • Someone (Theo?) write into mapping procedure the note to coders that a coded field was moved to ready-for-transform
  • Coders prepare for the relator elements table meeting; maybe by writing some sample code using the table
  • Cypress (with Crystal) start working on the classification transform, including the lookups

February 7, 2024

**See time zone conversion** **Meeting norms** Present: Gordon Dunsire, Crystal Yragui, Laura Akerman, Ebe Kartus, Deborah Fritz, Junghae Lee, Theo Gerontakos, Adam Schiff, Sofia Zapounidou Notes: Sofia Time: Ebe

Agenda Review/Times/Roles (5)

Announcements (10)

  • Ebe. National Library of New Zealand has published its documents in the new release of the RDA Toolkit: SES, VES, Application profiles. They can be found under the Documents menu. To be sure you see everything, select subscribe institution.
  • Laura. The IGELU Linked Open Data Working Group will recommend that Ex Libris consider the UoW MARC21 2 RDA Transformation project. Ex Libris is interested in creating MARC21-to-RDA conversion functionality in its products.
  • Sofia. New paper by the NLG: Entity Management Using RDA and Wikibase: A Case Study at the National Library of Greece. In case you do not have access, use this link
  • Junghae has another meeting and will arrive a bit late; Crystal will record the meeting this week (Crystal, record the meeting!)

Classification mappings (Gordon) (30)

  • See discussion
  • GD has created a document with analyses on the issue, Transforming subject classification number data from MARC 21 to RDA.docx
  • Gordon's analysis provides 3 options. Gordon, Crystal and Sofia prefer the 3rd one.
  • Open questions/issues
    • Do we drop the fields 051, 061, 071 from the mapping?
      • CY proposes to drop them and leave them to US National Libraries to map and transform. Ebe agrees with CY proposal.
    • How do we handle accession numbers in 060 and 070?
      • AS will contact NLM to find out how they use these fields
    • Are $0 and $1 subfields useful in this context? What do they describe?
    • Do we include $q and $7 in the mapping?
      • GD suggests we do not.
    • The id.loc.gov provides separate URIs for each DDC edition, but not for other schemes' editions. This complicates things, since the same classification number may have a different meaning depending on the scheme edition used
  • Clarifications.
    • We do not map the shelving part of the classification number
    • CY proposes to ask the RSC to create new Item properties for full call numbers
    • The only scheme that has URIs for classification numbers is the DDC - dewey.info is not working though.
    • There are URIs for LCC, e.g., https://id.loc.gov/authorities/classification/AC200.html
  • Discussion will continue asynchronously

Relationship Elements Table (30)

Action Items

January 31, 2024

**See time zone conversion** **Meeting norms** Present: Crystal, Deborah, Adam, Laura, Theo, Pengyan, Jian, Sita, Sofia, Ebe, Junghae Notes: Junghae Time:

Announcements

  • Per Gordon's recommendation, Crystal and Cypress decided not to map field 562. Reasoning: Diffuse semantics.
    • It's hard to distinguish between notes on expression and notes on work.
  • Theo is leaving the project March 1. Coding will be left to Cypress; Theo will contribute in February, at least getting Cypress coding.
    • Cypress will stay until the end of December, by which time Phase I should be completed. During February, Theo and Cypress will meet weekly, and Theo believes it will be a smooth transition.
    • Deborah will have a meeting with Theo and Cypress (transformation team) as necessary.
    • Jian, Crystal, and Laura will hold off on mapping the 700 and 710 fields.
    • Laura will arrange a meeting with the transformation team regarding the 533 field.
    • The grant proposal is on hold as we currently don't have a project leader.
  • Gordon will be absent today, we'll discuss classification mappings next week
  • Crystal is meeting with Ex Libris in February, they are interested in the mapping and transform

Updates to Projected Phases/Timelines (Theo)

  • If you put raw data in RIMMF, you can see the structures.
  • Theo will separate collection aggregates from everything else. We need to figure out all the markers to reliably identify singles.
  • 700 indicator 2 should be addressed in Phase II. It could be a part of a work or a compilation of different works (collection aggregates). The relationship to the manifestation will be embedded in the manifestation. If we are describing it as an expression, where are we going to find the expression attributes? There are expression attributes in that MARC record, but the record describes the whole compilation (French, English, etc.). Many of these expressions lack authority records, so we have to mint URIs, pull them from WorldCat Entities, or find them somewhere else.

Review

  • This is a bottleneck in our workflow
  • Let's asynchronously review some mappings and get them ready for transformation!

541 Immediate Source of Acquisition Note

January 24, 2024

**See time zone conversion** **Meeting norms** Present:Gordon Dunsire, Crystal Yragui, Laura Akerman, Ebe Kartus, Deborah Fritz, Junghae Lee, Sita Bhagwandin, Theo Gerontakos, Pengyan Sun Notes: Crystal, Laura Time:

Announcements

  • Crystal responded to an email from Ex Libris expressing interest in the project
  • With the January release of Toolkit NLNZ will have 98% of our policy statements available. Ultimately NLNZ will have a policy statement against every option in Toolkit.
    • The other documents that will be made publicly available via Toolkit are:
      • NLNZ guidance (I think this will be public with the January release but it will be Draft)
      • NLNZ String Encoding Schemes
      • NLNZ language guidance for RDA
      • NLNZ alternative arrangement of Manifestation: extent of manifestation
      • NLNZ Vocabulary Encoding Schemes

Relationship label discussion

  • See spreadsheet
  • Deborah set up table for relators to RDA relationships.
  • Want to use constrained properties.
  • column for id.loc.gov relator URIs, MARC code from list for relators, MARC label combining, Registry label (TK label).
  • always 5 mappings (person, family, corporatebody, collectiveagent, agent). One unconstrained URI maps to 5 constrained URIs. Use heading field and indicators to determine which of 5 relations to map to.
  • Discussion about conferences being mapped to corporate bodies. Should collective agent be used (of which corporate bodies are a subcategory)? RDA might develop more complex subcategories; if you want, you could use collective agent in anticipation. RDA doesn't currently distinguish between corporate bodies and conferences/meetings.
  • Gordon thinks on principle we should go from where we are and map to corporate body. Suggests create statement "Has category of corporate body --- conference".
  • Can a conference be a creator? Note, a conf. proceeding will be an aggregating wk. (What about a ship?)
  • Mention of family being creators - yes, e.g. family bible...
  • we should not transform to toolkit labels, just to the RDA relation URI.
  • Deborah not sure about a few mappings, and some are missing from one side (MARC relators or Registry relations) or the other. Gordon - says the relations are up to date, discuss any strange ones with him.
  • there are patterns for aggregates (single or multi part)
  • This table is only for agent fields that are differentiated, but there may be older records with no relator code for which we may need a more generic relation. Other parts of MARC may have names in a subfield without a type of agent and without a relator; those might be mapped to an agent category.
  • For collections (aggregates), the only relations would be aggregator or contributor to aggregation. We cannot call them an "author" by default.
  • Deborah: if the record describes a single part, etc., and you have x00, x10, or x11, look at whether there is an expression relationship, mint an expression, and put in the name portion of the agent... If we don't have a relator, all we can say is that this name is simply related to the manifestation; we have no other information. But if it is a collection aggregate, then based on what that relationship is, e.g., translator, corresponding to "text", you can say they are "contributor of text" (or cartography, etc.).
  • Crystal: we can't do the level of detail needed to tease out aggregation variants in Phase 1; we don't have the resources. Gordon: we have to take into account needing to know aggregating status in order to know which entity (work, expression, manifestation) is being related. Gordon thinks we need to tackle the other issue first, top-down. Deborah: if you can't process anything other than single parts, you have to pre-process and have the rest fall out.
  • Long history in MARC, at one point, having an author of a collection if they authored all the parts was correct. It's not wrong under those rules.
  • Theo is thinking maybe we do need to process aggregates and could do it in Phase 1... (tbc)
  • This looks like the data we wanted a table for, for mapping 1xx, 7xx, etc. relations. It's in this project's spreadsheets.
  • Thank you Deborah!
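The "one unconstrained relator, five constrained properties" pattern in Deborah's table can be sketched as a lookup. This is illustrative only: the rdae:* names are placeholders, not the real RDA Registry IRIs from the spreadsheet, and the tag-to-agent rules are simplified.

```python
# Illustrative lookup: one MARC relator code fans out to five constrained
# RDA properties (person, family, corporatebody, collectiveagent, agent),
# selected by the heading field and indicators.

AGENT_BY_TAG = {
    "100": "person", "700": "person", "800": "person",
    "110": "corporatebody", "710": "corporatebody",
    "111": "corporatebody", "711": "corporatebody",  # conferences -> corporate body
}

RELATOR_TABLE = {
    "ill": {  # MARC relator code for illustrator
        "person": "rdae:illustratorPerson",
        "family": "rdae:illustratorFamily",
        "corporatebody": "rdae:illustratorCorporateBody",
        "collectiveagent": "rdae:illustratorCollectiveAgent",
        "agent": "rdae:illustratorAgent",  # generic fallback
    },
}

def constrained_property(tag: str, ind1: str, relator: str) -> str:
    # In MARC 21, X00 fields with first indicator 3 carry family names.
    if tag.endswith("00") and ind1 == "3":
        agent = "family"
    else:
        agent = AGENT_BY_TAG.get(tag, "agent")
    return RELATOR_TABLE[relator][agent]
```

The generic "agent" fallback corresponds to the point above about older records where the type of agent cannot be determined.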

January 17, 2024

**See time zone conversion** **Meeting norms** Present: Deborah Theo Crystal Penny Benjamin Sita Gordon Ebe Jian Adam Junghae (left early) Laura (arrived late) Notes: Theo Time: Ebe

Announcements

  • Crystal considering doing a talk on the project at the ALA Core IG Week Session for the MARC Formats Transition Interest Group. Anyone want to join? See CFP
    • talk about new goalposts? Difficulties of aggregates? Something else?
    • Contact Crystal if you want to collaborate; fill out your own proposal if you want to go solo
      • only 15 minutes for each presentation, it will have to be short and concise
    • Ebe thinks this is a good platform to highlight the work this group has done. Let people know about some of the complexities of transforming MARC, things people may not realize.
  • Cypress will be working on the project through Fall quarter at UW; can work on transform

Classification mappings

  • See discussion
  • A good number of the remaining unassigned mappings are classification fields. Students are ready to take them on (as first passes; students not ready to review, they're still training with Crystal) if we can decide how we want to model them. Let's decide today! We can always change our minds down the line.
  • What are classification numbers? Manifestations? Identifiers?
    • Discussion 434:
      • Subject part of LC number or Dewey number is subject
      • RDA/LRM has not addressed class numbers directly
      • And lots more! (Take a look.)
    • Part of a classification number identifies a subject, whereas another part captures elements of an expression or manifestation that help an institution place a physical resource on the library shelf.
  • We could mint new properties (non-RDA)
  • We could not map these; RDA doesn't have properties so maybe just leave them behind
  • Could whole class number be used as one complete, distinct value (i.e. for a custom property minted by us)?
    • Probably want to tease-out the subject part
    • Sometimes full value is split across subfields; sometimes, like with NLM, it's all in $a
  • Do we need to map the classification part at all? Just let the location part remain lost legacy data.
  • Analysis:
    • Local elements (we would sub-type them to RDA elements), i.e., relationship elements -- specifically, attribute elements -- containing information about the scheme from which the number is taken.
      • Cannot be done for every classification scheme used in MARC21
      • However there are about a dozen commonly-used schemes; we can mint properties for those as sub-properties of RDA hasSubject
        • RDA hasSubject has no range; probably best if sub-properties maintained no range
        • for example, hasUdcNumber, rdfs:subPropertyOf rda:hasSubject
      • Define the local elements/properties.
        • Definition will describe the data provenance (the source of the class number) by necessity
      • Value of the property will not reveal the topic; that is revealed in the source of the class number; i.e. you have to go to the source to determine the topic; retrieving those as strings is out of scope for this project
      • creating properties will shift the burden from having to state what kind of subject the number represents to the classification schemes themselves
    • Within $a there may be non-subject insertions; we cannot say with any certainty that the contents of $a are subjects at all; e.g., government publications have us enter manifestation identifiers.
    • summary of analysis:
      • the semantics come from the scheme itself; if we don't create sub-properties with embedded semantics, we end up with a data provenance issue that would need to be solved through reification (too complicated for this project)
    • two viable alternatives:
      • sub-properties
      • map all class numbers to hasSubject without data provenance
  • If we create sub-properties, the sub-properties inherit the domain of the super-property; consequently, sub-properties of hasSubject are Work properties
  • The semantics of the sub-properties requires them to be subjects; will have to analyze field-by-field to determine what to do with non-subject subfield values, like Cutter numbers, which can be ignored (i.e., not mapped)
    • Nevertheless, we have to treat class numbers as subjects and allow inaccuracies
    • if numbers like Cutter numbers are embedded, they are bound to remain part of the value
      • this has been widely discussed for a long time; there is no remedy
  • Maybe an item property for shelf mark?
    • That's out of scope for RDA; best we can do is extend collection location; a shelf mark is ambiguous, as it is not necessarily an item identifier but more like a manifestation identifier or location; so we can create sub-collection, sub-sub-collection, etc., all the way down to the shelf mark level
  • A more sound task: separate the conflation between classification/categorization/identification
    • this has been an ongoing effort over the past 20-30 years; prior to that, they were all considered to be the same thing
  • Recap of extreme options; there's a solution in-between:
    • sub-properties of hasSubject; values would be pieces of the class numbers
    • do not map as out-of-scope for RDA
  • RIMFF ignored classification numbers; wasn't sure where they would fit
  • Item information: can't use identifier for item element for classification number?
    • that could work; there's no rule that identifiers need to identify separate things uniquely; identifiers may identify more than one item. It may not be ideal, but it's not prohibited.
  • It would be useful to look at the MARC fields for class numbers (050-088) and determine which are classification numbers; maybe make a spreadsheet, find the patterns, treat the fields accordingly
    • Subfields, however, generally echo constructors used in synthetic classification schemes
    • Also, why did they create so many fields for classification numbers? Because they were trying, circa 1970s, to reflect the internal classification structures of the separate schemes, which all have different approaches to synthesis
  • We're now trying to produce semantically coherent data, so we'll take a dumbing-up approach where we treat the whole classification number as a subject number -- unless we know better; so, we assume it's a subject number and feed subsequent problems downstream
  • However, we could have a set of transformation rules specific to each MARC tag and/or scheme; anyone knowledgeable in those schemes could assist in mapping
  • select 12 or so schemes -- the big ones -- and parse them -- it can be done on the Wiki -- and have colleagues carry out more practical work on that; the community will find this valuable work
    • RDA subjects are vaguely defined, and the work done here as some sort of RDA subjects will apply elsewhere, like in BIBFRAME
    • Practical outcome: data elements that can be re-used with semantically coherent definitions plus practical transformations from MARC21
    • Gordon will work on this; students can take that work, insert into the mapping, and coders can code into the transform:
      • Gordon will do the initial analysis; one document; select MARC tags; some rationale on what transform should be; this will be posted on the Wiki; then we can determine how to develop the document into something more robust, including the registration of the sub-properties
  • [The classification discussion now ends at 37:16.]
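The sub-property approach analyzed above (hasUdcNumber and similar, as sub-properties of hasSubject) can be sketched as follows. The example.org namespace is hypothetical, and rdaw:hasSubject is a label placeholder standing in for the actual RDA Registry IRI of the "has subject" element.

```python
# Mint a per-scheme sub-property of RDA "has subject", e.g. hasUdcNumber,
# so the classification scheme (the data provenance) is embedded in the
# property's semantics. Namespaces here are placeholders.
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RDA_HAS_SUBJECT = "rdaw:hasSubject"  # placeholder for the registry IRI

def mint_scheme_property(scheme_code: str) -> list:
    """Declare a scheme-specific subject property. No range is declared,
    matching the (range-less) super-property."""
    prop = f"https://example.org/prop/has{scheme_code.capitalize()}Number"
    return [(prop, RDFS_SUBPROPERTY_OF, RDA_HAS_SUBJECT)]

triples = mint_scheme_property("udc")  # -> .../hasUdcNumber
```

Because the sub-properties inherit the domain of hasSubject, values recorded this way are Work properties, as noted in the analysis above.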

700 field

  • Spreadsheet
  • 7XX work party notes from November 2023
  • Issue for relator terms and codes table/spreadsheet
  • Meeting discussion included:
    • What are we going to do with 7XX? Just the AAP? Mint IRIs?
      • Mint IRIs
        • But not for related added entries, when we have nothing in the MARC but the name? Deborah advises against that.
    • And what about the patterns? 100 name portion = 700 name portion = 600 name portion = 800, etc. Process in groups, not line-by-line, indicator-by-indicator? The only thing that changes in the RDA is the relationship. Same applies to 1XX + title, 7XX + title, etc. Or, should we discuss the principles first then the details?
      • this would be similar to what we want to do with relator terms, as the relator terms apply to all the headings
        • For example, what do you do with a personal name in MARC when converting to RDA? There can be one position for the name in MARC but there may be more than one RDA element to map to
      • Maybe when we have one of those fields mapped completely, we can consider combining solutions -- but we might benefit from getting something done first.
      • All the rows may hinder the mental process; work may be expedited by determining the patterns
      • Having these explicit to coders would be helpful; presumably, if these patterns exist, we will want to write more compact code to process the MARC re-using, say, functions to process 1XX, 7XX, etc. similarly. If this is not made explicit and the coders want to code it that way, then the burden will be on the coders to figure out the patterns, and that may be asking too much.
        • BIBFRAME conversion specs do decipher some of these patterns; worth a look; for example, see ConvSpec-1XX,7XX,8XX-Names-v.1.
    • Deborah can create that spreadsheet that would show the patterns for personal name fields (100 600 700 800 = X00), providing the mapping-thought (thereby greatly reducing the number of rows in the mapping) and a model of how all those fields should be mapped
    • Action items:
      • Ebe is also working on a spreadsheet for MARC relators mapped to RDA elements ("table of relationships")
      • Crystal and students will fill in the spreadsheets that utilize the personal name spreadsheet and the table of relationships
      • Coders (Theo/Cypress) will code the transform based on all the spreadsheets and the tables.
    • Registry viewer produced by TMQ: a way of looking at all registry elements in tables; also mappings
      • Thus there's a way to line up the RDA constrained elements with the unconstrained
      • Also a way to line them up with the codes for mapping to marc
      • Also line up with labels
      • Could be helpful to Ebe; Deborah could work with Ebe using the Registry viewer
    • Spreadsheet organization is easy to use as-is, very transparent; let's not lose that simplicity in the package we offer for phase 1.
      • Crystal and students, when filling in the spreadsheets, plan to account for every field. Thus the simple structure of the spreadsheets will be retained. At the same time, we should be able to find a way to reference the tables/patterns.
    • (A data review begins at 58:35 in the meeting, starting with a look at the project board)
    • No objection to performing review work at meetings, so we'll plan on doing that
    • MARC 043 (issue 76) selected for review today.
      • The 043 is repeatable, as are all of its subfields; we don't know whether repeated subfields refer to the same place or different places.
      • We can safely map to hasSubjectPlace with an identifier value, however.
      • Aggregates are also a problem: we can assign the value to the aggregating work, and we might even be able to assign it to the manifestation, but if we try to attach it to an aggregated work, we won't know which place applies to which aggregated work.
      • Note: 043 is not a BSR field; maybe the next field for review should be a BSR field
    • 043$a
      • hasSubjectPlace rdaw:P10321 has range=Place
      • Geographic area code is an identifier for place, and, thus, a nomen string; it can be the value of hasSubjectPlace using the identifier recording method
      • The geographic area code nomenString, or appellation, is an adequate value for hasSubjectPlace; it retains the usual issues of data provenance, however, including the meaning of the code
        • How about doing a lookup in id.loc.gov to get an IRI? That solves the data provenance problem.
          • Actually, a lookup is not required; the base IRI is consistent and the code is appended
      • $2 not used with $a
    • 043$b
      • Local codes, source in $2; record as identifier/nomenString; mint IRI for Nomen
      • If no $2 ... what? Do not map.
      • Catalogers make a lot of errors in this subfield; perhaps they think it means "code for local" or "code for local sub-entity" rather than "local code."
    • (meeting ends here; resume at 043 review next week)
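
The "no lookup required" point above can be sketched in a few lines of Python. The base IRI (the id.loc.gov geographic areas vocabulary) and the trimming of MARC's fixed-length padding hyphens are assumptions added for illustration, not decisions recorded in the minutes:

```python
# Sketch: derive an id.loc.gov IRI from a MARC 043 $a geographic area code.
# Assumptions (not from the minutes): the base IRI below, and that the
# hyphens padding MARC's fixed-length code are trimmed before appending.

GAC_BASE = "http://id.loc.gov/vocabularies/geographicAreas/"

def gac_iri(code: str) -> str:
    """Append a normalized 043 $a code to the consistent base IRI."""
    normalized = code.strip().rstrip("-")  # e.g. "n-us---" -> "n-us"
    return GAC_BASE + normalized

print(gac_iri("n-us---"))  # -> http://id.loc.gov/vocabularies/geographicAreas/n-us
```

This avoids a per-record network lookup entirely, which matters when transforming a large database.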

Action items

  • :loudspeaker: Gordon will work on this; students can take that work, insert into the mapping, and coders can code into the transform. Gordon will do the initial analysis in one document: select MARC tags, with some rationale for what the transform should be. This will be posted on the Wiki; then we can determine how to develop the document into something more robust, including the registration of the sub-properties
  • :loudspeaker: Deborah can create that spreadsheet that would show the patterns for personal name fields (100 600 700 800 = X00), providing the mapping-thought (thereby greatly reducing the number of rows in the mapping) and a model of how all those fields should be mapped
  • :loudspeaker: Ebe is also working on a spreadsheet for MARC relators mapped to RDA elements ("table of relationships")
  • :loudspeaker: Crystal and students will fill in the spreadsheets that utilize the personal name spreadsheet and the table of relationships
  • :loudspeaker: Coders (Theo/Cypress) will code the transform based on all the spreadsheets and the tables.

January 10, 2024

**See time zone conversion** **Meeting norms** Present: Adam Schiff, Crystal Yragui, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Jian P. Lee, Junghae Lee, Laura Akerman, Penny Sun, Sita Bhagwandin Notes: Crystal Yragui Time: Ebe Kartus

Project Milestones and Timeline

Aggregates

  • DF: A problem with aggregates is that you can't describe non-aggregates until you eliminate all the aggregates. Her experience shows that it's a slow crawl through the database. The "let's deal with aggregates later" approach is probably a good idea due to this fact. Will need a completely different transformation pipeline for aggregates. One example of layers of complexity is aggregating works vs. multi-part works. We lock in aggregating works, multiple expressions in multi-part works. Different creator relationships. Different work modeling.
  • AS: Could we ignore small relationships like "writer of preface" or "writer of introduction" in 700 fields, treating those as singleton manifestations rather than aggregates, since the aggregate isn't really described and users are unlikely to care about small augmenting pieces of aggregates, such as introductions, to the extent that they need entities minted for them?
  • DF: Rephrase: describe augmented aggregates as singleton expressions.
  • Explaining aggregates is difficult and will take more time. Where does it fit into our timeline?
  • LA: What are the limits of what we can automate with regard to aggregates?
    • There may be limits to what we can pull out of MARC. What is an acceptable level of detail? What is the cost/benefit?
    • New aggregates concepts were not considered during legacy MARC creation. People are going to have to review transformation output anyway. Add a disclaimer to the transformation stating that most aggregates will follow the non-aggregate mapping?
  • GD: We're going down a rabbit hole here. We can't extract more information from MARC records than was put into them in the first place. Conflating what went before with what should happen in the future. We should be trying to extract what is useful from existing data, avoiding making false statements, optimizing the level of detail in the output. Results won't be pretty and cleanup is a necessary part of the process.
    • Boils down to entity/identity management. Shouldn't get too bothered about whether something is an aggregate or not.
    • Complicated aggregate MARC21 records present another deduplication challenge that needs to be met during a cleanup phase. The more we wish to retain, the more we will duplicate. Someone else (with more resources) will have to do this work.
    • We need to accept limitations on transformation and acknowledge that tidying-up will be part of a future project.
  • :loudspeaker: We will do our best to map glaringly obvious RDA aggregate fields, such as 700_2, as well-formed RDA aggregates during this phase of the mapping. We will add markers to recognize potential aggregates in legacy MARC data for the benefit of future projects which may refine the transformation. Other aggregates will either be excluded from the transformation or passed through as singleton expressions. Let's check with Theo, review next week, and add to decisions index.

Review Workflow

  • Review has become a bottleneck. Let's get more serious about review and add it to meetings once we're through the 700 field.

BSR/CSR

  • Only 19 unassigned tags left in BSR
  • CSR after that should be less time-consuming due to overlap
  • We have enough serials expertise in the current group to tackle serials
  • There is the aggregates aspect to consider!

Transformation

  • Laura should get in touch with Theo about potentially working on the transform if she has time
  • :pushpin: We need help on the transformation after May. If you or someone you know has XSLT expertise and some time to spend on this, please volunteer or put them in touch with Crystal and Theo.

Timelines

  • Tentative first pass on BSR by end of May
  • Tentative review of BSR by end of September
  • Tentative transformation of BSR by end of 2024
  • Let's put our mapping hats on! :world_map:

Bound-withs

  • These are collections in RDA. The 773 tag was re-assigned to Laura, as she's working on it for a BIBFRAME project

700 field

Action items

January 3, 2024

**See time zone conversion** **Meeting norms** Present: Benjamin Riesenberg, Crystal Yragui, Adam Schiff, Ebe Kartus, Jian Ping Lee, Junghae Lee, Pengyan Sun, Sita Bhagwandin, Theo Gerontakos, Laura Akerman Notes: Benjamin Riesenberg Time: Ebe Kartus

Announcements (5)

Aggregates check-in (30)

Since we are missing several members of our group, let's treat this as a brainstorm and put off any big decision-making until well after the holidays, when everyone (or mostly everyone) can attend

  • Are we ready to begin applying what we've learned about aggregates to the mapping? If not, what needs to be learned/decided?*
  • Ideas for how will we approach transformation of aggregates, particularly with regard to minting descriptions of aggregated works/expressions?*
  • Are we ready to start mapping aggregates? If not, what is holding us back?
    • If we wait to be ready, we'll wait forever; we need repetition of all the concepts; we ought to just start mapping and make mistakes and correct them (practice)
    • We'll do a first pass and a review in any case
  • For aggregates, do we need to re-evaluate tags which have already been mapped?
  • Might be helpful to map a tag together, like we did with 008 tags
    • Great idea -- how about the 700? That tag will need to be redone since we decided to use a table for relationships
  • Worth considering doing two mappings on some tags (aggregates and non-aggregates)?
    • We've talked about separating records out into aggregates and non-aggregates, so that might help with this 'divide and conquer', running separate transformations
  • Right, how will we approach the transformation of aggregates?
    • Example: Aggregates for which aggregated W/Es haven't been described in the record! Why mint an IRI for something we can't describe?
    • Example: Something has a 700 analytic field pointing to aggregated W/Es, run this through an 'aggregates' pipeline (?)
  • Discussion of crowdsourcing
    • People are happy to help if there's a platform to do this
    • Thinking of crowdsourcing for data cleanup, reconciliation/clustering
  • I think a good approach would be looking for 'sure bets' for aggregates and getting those mappings down
    • Going further, it seems like we are looking for sophisticated tools to identify aggregates, but I'm unsure about this. Would these need to be applied to a body of records prior to mapping, to sort aggregates and non-aggregates?
    • How will we handle more bifurcation of the mapping in terms of conditions? First pass and then look more deeply, provide more detail based on conditions in a second pass?
    • Trying to deal with all the kinds of aggregates would slow us down
  • We have a problem with the project as a whole, lots of work has come to a standstill without immediate prospects of starting back up
    • We may need to clearly mark a boundary between phase one and phase two; we thought this was going to happen quickly, but it may not. I know the mapping can continue, but I'm not sure the transformation can continue
    • What is the 'total plan' for this project? Is it to separate records into 1) non-aggregates and aggregates for which aggregated W/Es are not described and 2) aggregates, then run the transform for 1, then revisit 2 and begin to look at further indicators for aggregates?
    • I think we had intended to not map serials until a later phase, we are focusing on BSR for now; may be useful to list out or articulate what we are or are not focusing on in phase one
  • We have a gap: Lots of people invested in mapping, not a lot of people invested in doing the transform work
    • There is a need for people to sign on for XSLT work on the transform
    • TG and CP will be starting XSLT work on phase one transform in the coming weeks
    • Note that this will include coding for the markers to identify aggregates - the purpose of this will be so that, as records go through the pipeline, once a flag for an aggregate goes up, the record is ejected from processing
    • OK, but what about 'easy' aggregates like 700 analytics? Could we go ahead and transform some 'easy' aggregates in phase one?
    • Well, we could, but phase one should result in a useful product. What's useful? For example, is it useful to have a transform that identifies/flags aggregates and kicks them out of the transform project? I think this might be useful
    • It's a huge amount of records that will be thrown out, though
    • Thinking about things like identification of a writer of a preface or introduction, this results in rejecting even more records
  • Serials are aggregates, so if phase two is aggregates, phase two would include serials
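
The "flag an aggregate and eject the record from processing" idea described above can be sketched as a simple partition step. The record structure here is invented for illustration (a dict with a list of fields), and using a 700 with second indicator 2 (an analytic entry) as the aggregate marker follows the group's "glaringly obvious" 700_2 example; a real implementation would check more markers:

```python
# Sketch of the "flag and eject" pipeline idea: records showing an obvious
# aggregate marker are routed out of the phase-one transform for later
# aggregate-aware processing. The minimal record structure is hypothetical.

def is_probable_aggregate(record: dict) -> bool:
    """Flag a record if any 700 field has second indicator '2' (analytic entry)."""
    return any(f["tag"] == "700" and f["ind2"] == "2"
               for f in record.get("fields", []))

def split_pipeline(records):
    """Partition records into (non_aggregates, ejected_probable_aggregates)."""
    keep, eject = [], []
    for rec in records:
        (eject if is_probable_aggregate(rec) else keep).append(rec)
    return keep, eject
```

The ejected set is not discarded: it becomes the input for a future aggregates-focused phase, which matches the concern raised above about how many records would otherwise be thrown out.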

Pick a tag to review as a group (45 minutes)

Let's look at the 533 for the rest of the meeting
See the 533 reproduction note issue and the 533 spreadsheet*

* Restricted access

Action items