2025 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki

September 3, 2025

See time zone conversion Meeting norms Present: Absent: Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Crystal scheduled 4 optional deliverables work parties over the next two weeks at varying times/days. If you're interested but don't see one you can attend, let Crystal know
Laura went through issues, discussions, etc. and made a list of things that might be worth including in the write-up. Look it over and comment on things if you have time
Crystal is just about done outlining the write-up
Sofia is working on $0/$1 this weekend
Crystal has tidied 3XX-8XX spreadsheets and still needs to do 0XX, 00X, 1XX, 2XX
Ebe and Cypress handled abbreviations list

Wrap-up (5)

Action items

Backburner

August 27, 2025

See time zone conversion Meeting norms Present: Deborah, Crystal, Tynan, Laura, Cypress, Ebe, Dee, Adam, Sita, Junghae, Jian Absent: Sarah Time: Notes: Tynan

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

Laura: report for CA state libraries on linked data output mentioning RDA
Crystal: someone from the PCC standing committee on training requested some output from us for their linked data training
- They are including RDA/RDF for their linked data training
OCLC is changing their minds about whether we can use their data – legal changes
- Jeff Mixter is investigating getting copies of LC data as it comes into OCLC – whether this would be a feasible workaround
- With the migration, it is more difficult to download records from the LC website – we might need to exclude that data or pick a few records by hand to include if we think it’s important
- Laura: the public interface for the LC catalog has changed: we can no longer select a lot of records and bulk download them from that interface
- Ebe: worth reaching out to British Library for records? Their database was hacked, so they are not up and running at the moment
- Adam: we could reach out to the PCC in general and see if there’s anyone who wants to supply records; then we wouldn’t have any OCLC limitations on this sharing
- Deborah: we still have a lot of files, so we shouldn’t get too bogged down in the last month with new records
Deborah: she and Richard have been able to implement the transformation in RIMMF
- You can take a marc/xml record, import into RIMMF and run the transform to create the output
- Richard is also working on doing this with raw marc
Aggregates
- Splitting the augmenting manifestations for times when we want to describe the aggregating work and times when we don’t
- Do we want to let some of them go through as only describing the primary work? Or perhaps wait until Phase II to decide that
- Bringing in NLNZ records and UW records: the legacy data have their own idiosyncrasies
- The more different types of records we bring in, the more challenges we have to deal with

CMC Qualifiers Explanation

Cypress and Deborah have been working on how to map Content type and Carrier type values when certain qualifiers are present:
- Tactile—affects content type
- Microform—affects carrier type depending on whether or not the manifestation is a reproduction
- Electronic—affects carrier type depending on whether or not the manifestation is a reproduction
I have questions and concerns and would appreciate some guidance on wrapping up whatever we can ASAP, leaving for Phase II what we cannot do quickly.
Deborah has put together a table
We are looking in the 000, 007, 006/008
Also good things to be found in 245 $h
- Tells us that the thing we are describing is an electronic resource
We are also looking in the 300 $a and the 300$e for specific terms to pick up the carrier type
- Sometimes only the 300 $a is there to tell us
- Some UW records don’t have 336-338
Laura: 336 could be qualifier for an expression
- 337 and 338 are media-type and carrier type, manifestation-level properties
- What is the value of having an access point for a manifestation? It is not required.
- Has direct impact on the way URIs are created
- We discussed this at length as a group before and decided we would make access points for manifestations – we decided we would do our best with Phase I and then refine in Phase II
Deborah: it has proven more of a challenge than we had previously anticipated
336, 337, 338 were machine-produced in many cases
Sometimes OCLC didn’t pick up everything when we were looking at narrowing results
Sometimes we get code $f (tactile) and don’t know what it applies to, e.g. for an atlas, is it tactile text, cartographic tactile images, or both?
Microforms:
- Always reproductions? Adam thinks there are some original microforms
  - There are things created from digital files that have never been published and created originally as microforms, but this is RARE
  - In those records, hopefully everything is describing the microform
- But we are dealing with records that are referring to both the original and also the microform
- There are 008 values ($a, $b, $c) that are for the microform
  - We have to take the mapping bite by bite as to what describes the original and what describes the reproduction
- Not everyone is PCC and not everyone is in PCC libraries describing the instructions
- There are examples where the individual things in the collection don’t have their own separate descriptions anywhere – these would be original microforms as well
Electronic qualifiers
- Some of this won’t be able to be done for Phase I
- How do we decide what should be phase I and what for phase II?
- e.g. save reproductions for phase II?
- sometimes they don’t have 533, but use 534 or 500 instead
Questions:
- (1) If we have multiple content types applying to an expression and we cannot determine which MARC tactile value applies to which RDA content-type values, which one is tactile?
  - Should we in this situation add the qualifier tactile?
  - Would this be a work characteristic? It is if it is an aggregating work
  - We could code to add tactile when we know the tactile characteristics are in the description, but we don’t know which content it applies to
  - We do the non-tactile for the content type and say that there is something tactile in there
- (2) Microform
  - If no information about carrier type, then we can’t provide the carrier type value at all
  - Should we put the media-type microform to replace the carrier type that we cannot provide in the access point and IRI?
- (3) Electronic
  - The same thing is happening in the electronic
  - For example, do we add media type “computer” in AP and IRI
    - Adam: use “electronic” instead?
    - In a description of a manifestation, does that communicate?
    - Media type is type of media needed to access
    - In many places it is called electronic – but that doesn’t tell you what is needed to access this information
    - Electronic in cataloging just means computer files that are read and acted on
    - Internet stuff and content on a floppy disk would both be electronic; electronic would be broad
    - Mobile devices have computers in them
Deborah’s suggestions: add tactile, add microform, add electronic to distinguish
Ebe: supports this if we want to do something in phase I
- If we can’t do it simply, should we not transform it at all?
Laura: we have already known that the de-duping would not be perfect
- Expression-level issues are much more important than manifestation-level issues
- If manifestations are associating with the same work and expression, then that’s a good thing
- Whether we use the same terminology in our additions to the manifestation access point or manifestation URI or not, that is ok
- If the de-duping doesn’t happen across manifestations, then so be it
Cypress:
- Content type mostly done
- Need to get carrier type in (or vice versa!)
- If she can get some of the de-duping done then so be it, but probably not before Sept 1?
Deborah
- The qualifiers are proving challenging to figure out
- Seconding Ebe: apply a filter to catch all the records and not transform them yet, and then deal with reproductions in phase II
- There are records that mix the microform with the original – hard to deal with!
- Many of these involve errors in the coding if the cataloger forgot to change the fixed field
- Very hard to determine if the record is describing the microform, the original, or both (as in adding a 533)
  - Laura: Can we write an AI program to determine which ones were mis-coded?
Adam: if there is some indication that the record is a microform, let’s assume it is; this will be right most of the time and we can refine this later
- Do the same for electronic – if some indication, we can assume that the data is bad
Decisions:
- (1) add “microform” or “electronic” if the record seems to indicate so rather than disentangle records that are a mix
- (2) Supply media type as access point qualifier if no carrier type found?
  - If we don’t have a media or a carrier type, then we let the records merge with the original print
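A minimal sketch of how the two decisions above could fit together, not the transform's actual code. The function name and the two boolean inputs are hypothetical; how the transform detects that a record "seems to indicate" microform or electronic is not settled in these notes, so the sketch simply takes those as inputs.

```python
from typing import Optional


def carrier_qualifier(carrier_type: Optional[str],
                      looks_microform: bool,
                      looks_electronic: bool) -> Optional[str]:
    """Choose the qualifier added to a manifestation access point.

    carrier_type: RDA carrier type value if one was found, else None.
    looks_microform / looks_electronic: hypothetical flags meaning
    "something in the record indicates this" (detection not shown).
    """
    if carrier_type:
        # A carrier type was found, so use it directly.
        return carrier_type
    if looks_microform:
        # Decision (1): assume microform rather than untangle mixed records.
        return "microform"
    if looks_electronic:
        # Decision (1): same treatment for electronic.
        return "electronic"
    # No carrier or media indication: no qualifier, so the description
    # is allowed to merge with the original print.
    return None
```

Returning None in the last case reflects the note that such records are allowed to merge with the original print.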

Deliverables Work Plan Check-in

Crystal is adding some co-working joint sessions to work on deliverables -- join if you can!

Wrap-up (5)

Action items

Backburner

August 20, 2025

See time zone conversion Meeting norms Present: Sita, Deborah, Crystal, Cypress, Adam, Junghae, Laura, Tynan, Sarah, Jian Absent: Ebe Time: Jian Notes: Tynan

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

Abhignya's last week is this week. Congratulations on your graduation, and thank you for your hard work!
Laura reviewed the non-mapping materials in our Google Drive and produced a report on what should/shouldn't be included in Phase I release, along with questions. We should all check it for relevant information on our deliverables assignments.

Deliverables Work Plan Check-in (20)

Check due dates/assignments
- Refer to document for updated deadlines and assignments.
Cypress feels as though we can complete the mapping even without Abhignya
We are working to get everything done by the 15th so that we can deal with stylistic issues
Exportable formats:
- .csv files for tables
The tasks and assignees are laid out in the Phase I Deliverables Planning document
Our goal is to run the transform starting Sept 1
Once we have our sample records run through the transform, we need to FTP them to UW and from there we put them in a Dryad database
Who will run the transform for that final run?
- Perhaps Deborah can run the transform and send the output to UW
Once we have the 2,345,000 records in hand, we will select a subset of 1,000
Crystal will email people about the draft writeup for some help (item IV)
To double check what you are signed up to work on, see the planning document
What supports do people need to get the work done?
- Work parties for these?
- Concerns?

Almost Done - waiting for decision Issues (10)

Project board
What needs to happen to get these moved to Ready For Transform and/or Done?
045:
- Cypress moved it back because we had talked about EDTF; we don’t have time to get timespans into EDTF format at this point
- We should just code it following what’s present and fix EDTF format in phase II
Leader 6: type of record
- We are using this for content type
Moving LDR/DIR to phase II

List of abbreviations (5)

Issue
Needs to get into xml format for us to use it
Doesn’t need to be a coder
Oxygen can take the spreadsheet that Sara H. created and read into xml
Decision: add any additional abbreviations to the abbreviations spreadsheet and tag Cypress when done
Does anyone else want to look at this?

APs for Original and Reproduction manifestations (30)

Current coding creates separate descriptions for original and reproduction manifestations, with certain fields assigned to descriptions of originals only, other fields assigned to reproductions only, and other fields assigned to both.
We are adding (Date (of publication, etc.) + Name (of publication, etc.) + Version + Carrier type + Carrier qualifiers) to the APs of manifestations.
Carrier type is being assigned to the description of the reproduction. Is there anywhere in the MARC record that provides the carrier type of the original?
Given that we are now adding (Date (of publication, etc.) + Name (of publication, etc.) + Version + Carrier type + Carrier qualifiers) to the APs of manifestations, we can use a normalized AP string in a Manifestation IRI. Was there a particular reason that we decided to use a MARC Record identifier as part of the Manifestation IRIs (016, 035 if OCLC, or 010 in that order of preference)? Should we remove that portion and only use the normalized AP string?
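To make the "normalized AP string in a Manifestation IRI" idea concrete, here is a rough sketch. The normalization rules (strip diacritics, lowercase, replace punctuation, join with underscores), the function names, and the example.org base are all assumptions for illustration; the notes do not specify how the AP string is normalized or what base IRI is used.

```python
import re
import unicodedata
from urllib.parse import quote


def normalize_ap(ap: str) -> str:
    """Reduce an access point string to a lowercase, underscore-joined key."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", ap)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, then turn punctuation into spaces.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    # Collapse runs of whitespace into single underscores.
    return "_".join(text.split())


def manifestation_iri(ap: str,
                      base: str = "http://example.org/manifestation/") -> str:
    """Build an IRI from the normalized AP (base is a placeholder)."""
    return base + quote(normalize_ap(ap))
```

With rules like these, two records that yield the same access point would mint the same IRI, which is the de-duplication effect under discussion. Whether the record identifier can safely be dropped is still the open question.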

APs meeting notes

For manifestation access points, we are getting the carrier type from different places
Carrier type might vary between original and reproduction
- 533 means 338 (carrier type) is for the reproduction but 300 is for the original
- How do we check for this when creating access points?
- The 300 doesn’t always have data about the print in practice
- We can’t reliably know whether what’s in the 300 is pagination for an original book or the extent of the carrier type for the reproduction
- Is it safe to get carrier type from outside of the 338?
- Currently in the transform, one gets MAN and the other gets ORIGMAN
- 530 $e is seldom there
Online reproduction of a print book, carrier type is completely different
- Photo-copy of book: carrier type is the same
We need a document indicating what to do for reproduction versus original
Reproduction guidance:
- Goes along with the decision on this
Here is guidance from the original RDA toolkit
- See Facsimiles and Reproductions section
- For a micro-form, we code as describing the original
Also see Reproduction Conditional Mappings document
There are places in 008 mapping where we apply carrier type to original instead of to reproduction
- We have to decide whether we can rely upon 300 in certain conditions
- Depends on whether it is a provider-neutral reproduction
- We can tell this with: 040 $e pn
- 588 version record and subfield (see Reproduction Conditional Mappings notes)
- Provider-neutral guidelines:
Decision: We should steer clear of using 300 to identify carrier types for original
Tackle this for deduplication purposes for phase II
For now we leave the IRIs including ORIGMAN
Are we putting carrier type for original description and reproduction?
- Carrier type of reproduction will be different from that of the original
- In other words, we are taking a record that describes the reproduction and speculating about the description for the original
- Our database is describing what we had
- Necessary to separate the data to enable clarity about what entity is being described
- The URI needs to de-duplicate, but this won’t be perfect
For access points, if we are putting carrier type in all of them
Where to get access point from MARC?
Inconsistency from where carrier type is mapped from
We figured out how to describe an original
Cypress’s suggestion:
- For phase I: use the AP for the main manifestation with addition of origman
- Two reproductions from the same original manifestation will de-duplicate
- We will de-duplicate more in phase II
At some point we put a manifestation identifier as part of the IRI
- It helped with de-duplication of things that are separate
- Is it safe to remove these separate identifiers?
- GET ANSWER FROM RECORDING AND UPDATE

Wrap-up (5)

Everyone look at the report from Laura
Look at the deliverables workplan
We have a deliverables folder – feel free to put final versions of stuff in there
Laura will make the document for issues that we are deferring to phase II
- Laura will add the document to the meeting notes

Action items

Backburner

Issues Deferred to Phase 2 is a document in Non-Mapping Materials created by Laura just now as a handy place to collect brief descriptions and links to decisions in meeting notes and documents - as a start.

August 13, 2025

See time zone conversion Meeting norms Present: Sita, Deborah, Crystal, Cypress, Adam, Junghae, Laura, Abhignya, Ebe, Jian Absent: Tynan Time: Sita Notes: Sarah

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

Crystal polished up the 8XX spreadsheets
Junghae exported UW original records and sent to Deborah and Richard
Crystal discussed mapping table spreadsheet music questions with Cate
Project referenced in Shoichi Taniguchi (2025) Addressing and Reducing Complexity in Metadata Mappings: RDA to DC and MARC 21, Cataloging & Classification Quarterly, 63:4, 240-266, DOI:10.1080/01639374.2025.2511804

Tynan questions on 046 (25)

Rows 41, 42, and 44 of mapping spreadsheet: why are these rows structured descriptions rather than IRIs?
- Should both work and manifestation be linked to the timespanIRI?
  - Decision: yes, rows 41, 42, and 44 should be IRIs.
  - Other changes made to spreadsheet: delete row 46.
Should the transformation note for rows 47 and 48 of the spreadsheet be switched? Row 47 has a "reprint" date with the value of $b or $c, which are the earlier dates. Row 48 has the "original", with the value of $d or $e, which are the later dates.
- Decision: no, the transformation notes should remain as is for these rows.
Should I try to clean up messy date formats, such as in test-36-58-62-77, where a date format I found in MARC looks like "20011027"?
- Decision: for now don't clean it up, we will revisit this in Phase II.
For row 49 of the spreadsheet, because the transformation note says "single date of distribution" should I make sure that $d and $e are not present?
- Decision: For now, don't worry about validating whether $d or $e are not present, just use the value of $b and $c regardless.
Same question for 50, 51, and 52. Should I ensure that if $b or $c is present that neither $d nor $e is and vice versa?
- Decision: Handle this in the same way as row 49: don't worry about testing for $d or $e.
Deborah brought up a mapping issue in row 49. The registry label is "has date of manifestation" but the transformation notes tell us to create a note. We cannot have a triple with the property "has date of manifestation" where the object is a note. This is also disparate from the mapping and code for 008. We will need to discuss this further later on.

String Encoding Schemes (SES) Guidance (30)

Cypress and Deborah have been working on getting this coded
Have found some unanswered questions in the issue (SES for Access Points)
1. We are omitting initial articles from titles (based on coding in 245 I2); should we also omit leading ellipses from titles? They usually mean the cataloger has replaced words that change in manifestations of diachronic works. But what if the title actually begins with them? Should we just go with whatever the cataloger entered as 245 I2?
- We will go with what the cataloger has put in the record and handle pre-processing instructions that deal with this in Phase II.
- "Title begins with a definite or indefinite article that is disregarded in sorting and filing processes. Any diacritical mark, space or mark of punctuation associated with the article and any space or mark of punctuation preceding the first filing character after the article is included in the count of nonfiling characters. Any diacritic, however, associated with the first filing character is not included in the count of nonfiling characters." We need to include this in the pre-processing notes so that catalogers can clean up their records before transformation.
2. Unless otherwise instructed, we will not be mapping 'broken' timespans from either the 008 or the 260$c or 264$c (see also Dates #382).
- If there is no date in 008 or 260, then we should still use 264$c even though this is bad cataloging and there should usually be 008 or 260.
3. Should we add other available qualifiers to an existing heading with qualifiers, as part of our transform?
- Should we add content type?
- Decision: We create two access points, one with content type and one without. This accounts for reconciliation and we can change it in Phase II, if needed.
- We will have to look at musical works again regarding work access points. Librettos to musical works present a unique challenge because they should have a separate author from the musical work itself, but the author is not always listed. There may be a duplication issue if the author for a libretto or musical work is not listed in the access point. We should explore further options to distinguish the libretto from the musical work.
4. Does a different performer mean a different expression? If so, how can we map a performer/narrator as part of an AP?
- We will revisit in Phase II.
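As an illustration of the "go with what the cataloger entered" decision in item 1, the sketch below drops exactly the number of nonfiling characters given in the 245 second indicator and nothing else. The function name is hypothetical, and treating a blank or non-numeric indicator as zero is an assumption, not a recorded decision.

```python
def strip_nonfiling(title: str, ind2: str) -> str:
    """Drop the leading nonfiling characters counted by 245 indicator 2.

    ind2 is the raw indicator character; anything that is not a digit
    (e.g. a blank) is treated as 0 in this sketch.
    """
    count = int(ind2) if ind2.isdigit() else 0
    return title[count:]
```

A title that really begins with an ellipsis is left alone unless the cataloger counted it in the indicator, so `strip_nonfiling("... and then", "0")` returns the title unchanged while `strip_nonfiling("The reptile database", "4")` returns `"reptile database"`.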

Deliverables Work Plan Check-in (20)

Check due dates/assignments
- Refer to document for updated deadlines and assignments.
Work parties for these?
Concerns?

Wrap-up (5)

Action items

Backburner

Next week we will discuss the deliverables work plan more and particularly the dissemination regarding the release.

August 6, 2025

See time zone conversion Meeting norms Present: Absent: Time:Jian Notes:Tynan

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

007 is ready for transform
Crystal, Laura and Ebe met yesterday about Phase I wrap-up work

Attribute fields spreadsheets (20)

Attribute fields folder
- Coded fields and Content type
  - Spreadsheet gives a reasonable mapping from a code, e.g. in 006, 007, 008, to an RDA value
  - There isn’t a 1:1 mapping, need to refine it based on other fields
  - Problematic content types in red
  - Diagram: content type is “cartographic image”
    - But we also have “cartographic tactile image”
    - So we check 00-01 = fd, which indicates we have a tactile image
  - Representational:
    - In the 008
    - An image is “still” as long as not moving, but it can be manipulated
  - Remote sensing (row 72)
    - Can be video – we cannot call this still image
  - Braille (row 81)
    - Code f
    - Must accompany tactile text
    - 008 – is the primary content
    - Agreed on mapping
  - Braille (row 82)
    - If we are in the 008 for music and we want to say it’s tactile, (special category in 007 for tactile)
    - Music for 008 music must be for the tactile texts
    - We need to ask a music cataloguer for this, people are inconsistent with their uses of 007
    - The 008, when it’s just music and braille could be meant to express braille music form
- Coded fields and Media type
- Coded fields and Carrier type
  - Carrier type for “play aways” – audio book in a box that you can plug headphones directly into
    - We are calling this “object”
    - See instructions from OLAC
      - Extent/Carrier type: 1 audio media player
      - But they are saying “carrier type audio”, which the group agrees doesn’t make sense
    - See Best Practices for Cataloging Digital Media Storage Devices
  - Kit (row 69)
    - Also calling this an object
    - Crystal has never seen this with “kit” unless it contains objects like game pieces
    - Decision: no carrier type for the kit itself
    - Does it make RDA description incomplete not to have a carrier type?
  - Slide (row 100)
    - Deborah’s guess is that carrier type is “sheet”
    - Decision: use “slide” as carrier type – problem solved!

Transformation Output Review (35)

Aquaculture dataset in RIMMF?
- We need to make a list of questions that we need to go back to for phase II – for now, unless something is egregious, we aren’t going to fix things for phase I
- We have been allowing redundancies to come through
- (1) Change 008 mapping to “Nature of content: ” and then the value, e.g. bibliographies, discographies, filmographies (whichever is present)
- When we have aggregating work, we don’t know what the subject heading applies to
  - For phase II, do we want to fine-tune this: if we are only going to describe the primary work and apply all properties to primary work
  - We should do this if the only thing that made something an aggregate is the presence of bibliographies
  - (2) Fine-tune AAM in phase II
- The aggregating work has subject headings; the aggregated work does not in this example
- Co-authors end up in related person of work property
- Why is related-agent editor ending up attached to the work and not contributor to manifestation? Deborah is looking into this
- 856 is not coming over:
  - We have mapped it, haven’t finished coding it
- Ok that OCoLC numbers are coming up
- Do we need “Has equivalent: ” when there is a $i present?
  - (3) Refine 76x - 78x to drop prefix from field # if $i is present
- (4) If there is a 530, currently we prefix the value from $a with “Additional physical form available note: ”
  - But some of those notes already have a prefix
  - Another redundancy to clean-up in phase II

Work Distribution (30)

Deliverables Work Plan
Need to have reasonable due dates for everything and divvy up responsibilities
More eyes on deliverables outline = better

Wrap-up (5)

Action items

Backburner

July 30, 2025

See time zone conversion Meeting norms Present: Absent: Time:Ebe Notes:Tynan

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

Transform:
- 11 RFT and 8 Waiting for Decision
- Of those, only 3 not currently being worked on by coders
- Also in progress is WEM access points
- 6 of the 8 Waiting for Decision are coded
- Biggest tasks for transform team after this will be code cleanup and adjusting as we review
Post Phase I wrap up
- We need a structured work plan with benchmarks to meet our goal of September release
- Crystal working on this in the coming week. Does anyone want to meet about it? Probably at least an hour.
- Crystal will email Ebe and Laura to work on this

007 (10)

Spreadsheets
Need someone to go through and add transformation notes
Some might need IRI value
Problem is with “has note on manifestation” and “has category of manifestation”
007 is important for 3 main access point qualifiers
- But lots in there has never been used
- Deborah: only position 1 and 2 need to go into the transformation
Crystal will work on this today

856 (10)

Spreadsheet
Transformation wants to mint second manifestation
Is this conditional on second indicator?
- If value is 0, then identifier for resource
Issue: we have marked “delete” on all rows where the indicator 2 value is 0
- So we will change row 667 to “reviewed” so that we can have that case
“RDA for Uniform Resource Locator - Definition and Scope: An address of an online resource. A Uniform Resource Locator includes all manifestation identifiers intended to provide online access to a manifestation using a standard Internet browser.”
Decisions:
- Row 688 transformation note: "[$3 value] at: [$u value]"
- Row 678 transformation note: "Related resource at: [value of $u]"
- Row 689 transformation note: "Version of resource at: [value of $u]"
- Row 690 transformation note: "Component part(s) of resource at: [value of $u]"
- Row 691 transformation note: "Version of component part(s) of resource at: [value of $u]"
For phase I, the best we can do is map the second indicator 0 when there is no $3 as the URI provided by the marc record; if there is a $3, then we say location for $3 value and have it be a note
If there are multiple indicator 0’s, map them both
So in our mapping, there are URI’s associated with the manifestation
The problem is that $3 can hold so many different types of things
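The 856 decisions above could be sketched roughly as follows. The notes do not say which second-indicator value each spreadsheet row corresponds to; the pairing below follows the MARC 21 display constants for 856 indicator 2 (1, 2, 3, 4) and is an assumption, as are the function name and the "uri"/"note" labels. The sketch also assumes the "[$3 value] at: [$u value]" note is the indicator 0 with $3 case.

```python
from typing import Optional, Tuple

# Assumed pairing of 856 second-indicator values with the note prefixes
# decided above (based on MARC 21 display constants, not on the notes).
NOTE_PREFIX = {
    "1": "Version of resource at: ",
    "2": "Related resource at: ",
    "3": "Component part(s) of resource at: ",
    "4": "Version of component part(s) of resource at: ",
}


def map_856(ind2: str, u: str,
            sub3: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return ("uri", value) or ("note", value) for one 856 field.

    Returns None for indicator values this sketch does not cover.
    """
    if ind2 == "0":
        if sub3:
            # $3 present: record a note locating the $3 material.
            return ("note", f"{sub3} at: {u}")
        # No $3: the URL is treated as a URI of the manifestation.
        return ("uri", u)
    prefix = NOTE_PREFIX.get(ind2)
    if prefix:
        return ("note", prefix + u)
    return None
```

Calling the function once per 856 field covers the "multiple indicator 0's, map them both" case without extra logic.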

Finding carrier types (15)

See Finding and mapping Carrier Type values
Related to 336, 337, 338 (content media carrier)
Very few of these fields in many systems, such as LC
Where do we look to get content and carrier?
Cypress: “I asked about the ISO639-3 in the LC session yesterday, and they will make a bulk download available at some point that we could use. At this point I think it is best to continue with creating ISO URIs by combining the expected base IRI with the code available in MARC, without checking that it is valid in LC at this point.”
3 different versions: carrier type, content type, media type
- Carrier type might have enough to use
- Content type is likely finishable
- We can’t address the access points or minting of the IRIs without this information

Timespans (15)

See Issue Dates #382
Primarily for ongoing works – we have issue with open-ended dates
- If date is open-ended, it is diachronic
- Worried that open-ended date cannot be a timespan, cannot be a timespan IRI
- Gordon: the unstructured description in any element is just like a note
- We are not sending it to a timespan, we are making a note on timespan
- It is valid to put unstructured description in as “date of manifestation” when we have an open-ended date, i.e. manifestation of diachronic work
We can record broken timespans as unstructured descriptions, “date of manifestation”
Should we use unstructured descriptions as date qualifiers in an AP for a manifestation?
“-” versus “/”: if we go with “/”, then users need to learn a new way of understanding e.g. 1975/ means open-ended beginning with 1975
Need to be careful: is it a publication date or a chronological designation?
- e.g. “the 1975 volume was published in 1976”
People should weigh in on the issue
Additionally, we will do a poll asynchronously
- Crystal will send out a poll via email, discussion will be asynchronous today and the poll will go and be decided by next week (8/6/25)
Laura: catalogers are encouraged to put a date, but if they put “n.d.”, then we’d have nothing to add to the access point
- All of these are cataloger guesses
- The dates are not reliable enough to use as access points

July 23, 2025

See time zone conversion Meeting norms Present: Crystal, Deborah, Ebe, Sarah, Jian, Laura, Adam Absent: Cypress, Tynan Time: Ebe Notes: Sarah

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

Laura and Crystal will be presenting at the LD4 conference on Monday, July 28th.

String Encoding Scheme for Access Points (30)

See issue
Access points as nomens?
Manifestation AP:
- 008 vs. 26X ($c) for date of publication.
  - Will it be possible to take a look at some records to see what would be cleaner? (Laura may be able to do this after LD4)
  - 26X will have less complicated coding, but we may already have code for 008. Deborah will look at this code to see if it works here.
    - If we've already completed code for 008, we'll use that for date of publication.
- Using edition statement in every access point, when present, to ensure access points are unique.
  - This decision will mean that errors get included in access points, but it is a way to ensure de-duplication.
  - We will discuss de-duplication in APs more during Phase II.
Work AP:
- Discuss parentheses as separators for aggregating works.
Expression AP:
- Language codes 008 or 041?
  - 008 is simpler; it would require additional coding to pull language codes from 041.
  - For now, we will stick with the simpler solution and see how it works.
  - If this causes issues, we will revisit in Phase II.
- Designation of version:
  - 250$a vs. 245$s
    - We will use every 250$a and not 245$s for now, since it is uncommon and often misused.
Aggregating Work AP:
- Base: Title of Work - ISBD says that when there are additional titles, the titles are strung together. Would this also apply when there are parallel titles?
  - We will have to discuss this further later.
- Associated agent follows base.
- Creator/Aggregator is in 1xx, Other in 7xx. Deborah will flesh this distinction out further.
  - Both Other and Creator/Aggregator will follow base.
Next step for SES for APs is to discuss coding.

Timespans (20)

Did not get to timespans today, will discuss next week.
See Gordon's answer to our question from last week
Are we set to:
- Treat "broken" timespans as years of identifiable non-broken timespans within (example: circa 1950 and ~1950 and 1950- all become "1950")
- Add notes on manifestation as broken timespans occur to say "date of [whatever element]: 'input broken timespan as it appears in MARC.'"?
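A small sketch of the proposal above, assuming it is adopted as written: pull the identifiable year out of a broken timespan and keep the original string as a note. The regular expression, the function name, and the choice to take the first standalone four-digit year are illustrative assumptions; the note wording follows the pattern quoted in the second bullet.

```python
import re
from typing import Optional, Tuple

# A run of exactly four digits that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def split_broken_timespan(raw: str,
                          element: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (year, note) for a date string as it appears in MARC.

    year: first standalone four-digit year found, or None.
    note: the raw string kept as a note, or None when the input is
          already a plain year and nothing would be lost.
    """
    match = YEAR.search(raw)
    year = match.group(1) if match else None
    if raw.strip() == year:
        return year, None
    return year, f"date of {element}: '{raw}'"
```

Under these assumptions "circa 1950", "~1950" and "1950-" all yield the year "1950" plus a note, while an eight-digit string such as "20011027" yields no year, in line with the decision not to clean up such formats in Phase I.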
Who can implement this in code? Which fields need to be adjusted?

Output review (30)

We did not get to Output Review this week, will review next week.
RIMMF view
continue with Tuataras

Wrap-up (5)

Action Items

Work on Phase I deliverables

Backburner

Agenda Items for Transform Meeting

Briefly discuss timespans
Discuss SES for APs; how decisions made today will translate into code.

July 16, 2025

See time zone conversion Meeting norms Present: Crystal, Sarah, Ebe, Tynan, Laura, Abhignya, Junghae, Adam Absent: Cypress, Deborah Time: Notes: Sarah

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Cypress is now formally co-leading the project with Crystal and will head the transformation. She has a few more hours each week to dedicate to the project.
Laura and Crystal will be presenting at LD4 next week.

Access Point String Encoding Scheme

Punctuation?
- Cypress and Deborah are not here this week: we will discuss this issue again next week.
- Laura left a comment on issue #208 about output ISBD punctuation when there is no existing punctuation. This is a note, not an access point.
- NLNZ has a document in the RDA Toolkit regarding punctuation in access points.
  - If you want to view this document, you will have to subscribe to it under documents in RDA Toolkit.
  - We could look at creating our own documentation on RDA Toolkit for this project so that users can easily subscribe to and access it.
See original toolkit appendix E

Output review

Tuatara output file
- Punctuation for all access points will be weird until we make a decision about it and implement it.
  - Ideally, expression access point in output would look like: The reptile database (Uetz, Peter). Text. English
- Remove initial articles from titles in access points?
  - Filtering for articles could be complicated (e.g., a word that is an article in English may have the same spelling but a different meaning in another language).
- Question: should we make access points nomens so that we can say things about them like language, source, etc.? discuss in transformation meeting.
- We could include language of expression in expression access point to avoid duplicates.
  - We are working on this. We need to determine a way to look up language codes in 041. See mapping in Drive here
    - ISO language code look up is in Phase II.
- rdamd:30011 and timespan:
  - Mint a timespan for "has date of publication".
  - Question: how do we handle publications with a start-date, but no end-date?
    - RDA toolkit defines timespan as "finite"
    - Can we use date + hyphen as an appellation of timespan? Is this considered finite?
    - Ask Gordon about appellation of timespan and on-going publications.
    - Ebe will look at NLNZ documentation for information.
  - Follow-up question: how do we handle timespans for works that are issued in multiple parts, but are not necessarily aggregating works?
    - We're not dealing with any successive works in Phase I; these will wait until Phase II.
    - Are we not dealing with any works that are multi-part or which have two dates for any reason in Phase I? Ask Deborah and discuss next week. (Are all multi-part works aggregates, or not?)
    - These questions will determine what we include in and exclude from Phase I.
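
To make the "ideal" expression access point from the Tuatara discussion concrete, here is a small sketch that assembles the string from its parts. The punctuation is copied from the example in the notes (parentheses around the agent, full stop and space between parts) and is an assumption, since the group had not yet decided on punctuation; the function and parameter names are hypothetical.

```python
def expression_access_point(title, agent=None, content_type=None, language=None):
    """Join the parts as: Title (Agent). Content type. Language"""
    base = f"{title} ({agent})" if agent else title
    # Skip any qualifier that is missing rather than emitting empty segments.
    parts = [base] + [p for p in (content_type, language) if p]
    return ". ".join(parts)


ap = expression_access_point("The reptile database", "Uetz, Peter", "Text", "English")
```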

Wrap-up

Action Items

Discuss access points as nomens in transformation meeting.
Discuss access point SES at transformation meeting.
Asked Gordon about timespan and Ebe will look at NLNZ documentation.

Backburner

July 9, 2025

See time zone conversion Meeting norms Present: Crystal, Cypress, Adam, Ebe, Sarah, Laura, Deborah, Sita, Jian Absent: Abhignya, Tynan, Sofia Time: Ebe Notes: Sarah

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Check-in: AP SES? Are we all set?
- Still need to make decisions about punctuation.
Transformation progress: 3 tags left aside from those needing tables to start coding. Everything else in progress. This is great!

041 transformation (15)

Which coder wants to take this on?
- We will discuss this further at the transformation subunit meeting tomorrow.
Can a mapper help by constructing the lookup table?
- We may not want to do lookups in this case due to the size of the lookup file and instead construct IRIs from $2, trusting that it is correct.
- If we're going to do lookups: multiple lookups might be a problem for users, so we may want to create tables. It’s better to cache the file and query it locally, rather than doing a live lookup.
Lookup table for MARC & ISO codes
- LOC has URIs for ISO codes which match $2
- We do have a lookup for MARC codes, but at the moment it queries LOC live; this needs to be changed.
- ISO lists are faceted and not all terms are in one XML file, so we might have to ask for that. This is something we can table for phase II.
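
A sketch of the "construct IRIs from $2" option with an optional locally cached code list, so that no live query is made. The base IRIs and the `$2` keys below are assumptions for illustration and should be checked against id.loc.gov before use; the function name is hypothetical.

```python
# Assumed IRI patterns, keyed by an assumed $2 value ("marc" stands for "no $2 present").
LOC_BASE = {
    "marc": "http://id.loc.gov/vocabulary/languages/",
    "iso639-2b": "http://id.loc.gov/vocabulary/iso639-2/",
}


def language_iri(code, source="marc", known_codes=None):
    """Build an IRI from a language code; validate only against a local set, never a live lookup."""
    base = LOC_BASE.get(source)
    if base is None:
        return None  # unknown source: nothing to construct
    code = code.strip().lower()
    if known_codes is not None and code not in known_codes:
        return None  # code is not in the locally cached list
    return base + code


cached = {"eng", "fre", "mao"}  # would be loaded once from a cached file
iri = language_iri("eng", known_codes=cached)
```

Passing `known_codes=None` corresponds to trusting $2 and the code as given; passing a cached set corresponds to the local-lookup option.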

$2 Discussion (20)

Combined lookup table for 336, 337, 338 $a and $b?
- We don't want to combine the lookup tables. If we do this then we will have to update the table each time RDA updates one.
- We should keep them separate so we can automatically update each file as RDA updates.
- Decision: look up either term or code in all three files.
- Why are we mapping the codes by the notes?
  - When we find the URI, we are mapping it in the $3 note.
  - In the case that we don't find a URI, do we want to have the $3 note with the code? Or, nothing at all?
  - Decision: we don't want a note with code in it, so do not have a note if URI is not found.
- Will discuss coding for this in transformation subunit meeting tomorrow.
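
The decision above (keep the three tables separate, look up either a term or a code in all of them, and emit nothing when no URI is found) could be sketched as follows. The table entries are illustrative samples that should be verified against the RDA Registry, and the names are hypothetical; in practice each table would be loaded from its own cached registry file.

```python
# One table per vocabulary so each can be refreshed independently.
CONTENT = {
    "txt": "http://rdaregistry.info/termList/RDAContentType/1020",
    "text": "http://rdaregistry.info/termList/RDAContentType/1020",
}
MEDIA = {
    "n": "http://rdaregistry.info/termList/RDAMediaType/1007",
    "unmediated": "http://rdaregistry.info/termList/RDAMediaType/1007",
}
CARRIER = {
    "nc": "http://rdaregistry.info/termList/RDACarrierType/1049",
    "volume": "http://rdaregistry.info/termList/RDACarrierType/1049",
}
TABLES = (CONTENT, MEDIA, CARRIER)


def lookup_33x(value):
    """Look up a term or code in all three tables; None means 'emit no URI and no note'."""
    key = value.strip().lower()
    for table in TABLES:
        if key in table:
            return table[key]
    return None


found = lookup_33x("Volume")
missing = lookup_33x("not a real term")
```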

Output review (30)

Tuatara output file
HTML character coding: incorrect character encoding scheme?
- Similar to HTML, quotations, apostrophes, ampersands, less than, and greater than symbols are protected characters in XML. This is well-formed XML and should validate.
  - However, the escaped forms may not be getting converted back to the symbols (e.g., ">") in RIMMF, which is causing issues.
  - Converting the XML file to N-Triples before putting into RIMMF should correct this.
  - Will discuss further in transformation meeting tomorrow.
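
The point about protected characters can be checked with the Python standard library: escaped text is well-formed XML, and a conforming parser hands back the original symbols. This only illustrates the XML behaviour; it says nothing about how RIMMF reads the file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Reptiles & amphibians: size > 10 cm"

# escape() protects &, < and > so the string is safe inside an element.
escaped = escape(raw)

# A parser reverses the escaping when it reads the element text.
element = ET.fromstring(f"<title>{escaped}</title>")
round_tripped = element.text
```

If a tool displays `&gt;` literally, the entities are being shown without being parsed, which is consistent with the suggestion to convert to N-Triples first.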
- Expression for work/access point:
  - no space between author name and title
  - if a record does not have 100 field, it goes to first 700 field: assuming that this is most significant and not a corporate body.
  - Potential issue: if we are only picking the first 700, we will likely end up with non-unique access points. This is part of what we will have to look at in Phase II.
  - We will do further de-duplication in Phase II.
- LCC versus LCSH in Tuatara.xml:
  - LCC has notation and alternative label, in Tuatara-RDA.xml these have the same value: "QL645"
  - This may be because QL645 doesn't have a pref label and is a code.
  - This output aligns with the mapping in Gordon's document, but we may want to review the mapping for classification concepts.
- We don't want to strip punctuation from the end of "extent of manifestation" because we need the punctuation to be there in the case of abbreviations.
- rdae: review lines 327 and 328 in XML output file

Wrap-Up (5)

Action Items

To do at transformation meeting tomorrow:
- discuss 041 coding and lookup tables
- follow up on $2 discussion. Can we do collective lookups for 337 and 338?
- Cypress will look at rdae properties in manifestation (lines 327 and 328 in Tuatara-RDA.xml)
To do before meeting next week:
- Work on Phase I deliverables.
- Review output files. (we will stick with the same files for the time being.)

Backburner

Punctuation for access points: discuss next week

July 2, 2025

See time zone conversion Meeting norms Present: Absent: Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Sofia has a scheduling conflict with her new job. We'll do another round of scheduling polls after Phase I. Please ping Sofia for asynchronous participation for the time being

Fields that need mapper eyes (25)

773
- why end with “: “
- We use the colon if there is a $3 or $4 present – so we need to amend this to make it conditional on there being a subfield 3 present
- “Follow subfield value with “;” if $3 is present
- “Follow final subfield value with “.” unless the value ends with “.”
- We also don’t need to have a “.” at the end of the URI
- Changed in the spreadsheet, commented in the issue that an update was made to the transformation notes
383
- Note 1 (from the issue #172)
  - Change the mapping to Cypress’s suggestion in the notes on this issue (#172)
  - Retain $2, rather than finding an IRI for it
- Note 2
  - We will use Cypress’s suggestion in the issue
- Note 3
  - The group endorses Cypress’s suggestion; Cypress will update what the coder should change
385 - spreadsheet
- We should not map $m and $n, they are characteristics of the term in the vocabulary
- $b is the code for $a
- We should use $a as the text string if there is no $2
- When $2 is present, we need to mint a skos:concept
- The spreadsheet needs an overhaul to get it ready to code: we need “$a without $2 present” and “$a with $2 present” options
- Crystal will re-map and Cypress will review
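
A sketch of the two 385 $a cases named above: a plain text string when $2 is absent, and a minted skos:Concept when $2 is present. The base IRI, the hash-based minting, and the returned dictionary shape are all assumptions for illustration; the project's actual minting rules are recorded in its Decisions Index.

```python
import hashlib

SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
BASE = "http://example.org/marc2rda/concept/"  # hypothetical namespace


def map_385_a(term, source=None):
    """Map 385 $a as a literal without $2, or as a minted concept with $2."""
    if not source:
        return {"kind": "literal", "value": term}
    # Hypothetical minting: a stable hash of source and term.
    digest = hashlib.sha1(f"{source}|{term}".encode("utf-8")).hexdigest()[:12]
    return {
        "kind": "concept",
        "iri": BASE + digest,
        "type": SKOS_CONCEPT,
        "prefLabel": term,
        "source": source,
    }


without_source = map_385_a("Children")
with_source = map_385_a("Children", "lcdgt")
```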

Access Point SES for Manifestation (25)

Title proper
See discussion
From last week:
- $s is not part of title proper. What about $f?
- Should $f/$s qualify access points?
- What is the $s in a 245?
From this week
- Do Titles Proper need to be distinguished? No, but the access points do
- Can we do an ALMA list looking for 245 $s in the catalogue?
- Also hard to know what $f was intended for – they are being used in different ways
- Ignore $f for 245, because it is unclear how cataloguers were using the subfield
- Drop $f for access points [except for Collection works-- Phase II]
- $s we should keep for the access point
- Ignore 245 $f and $s for titles proper for non-collection works

Source codes with appended language codes (15)

Note to UW: Do a cleanup project on these and reiterate policy in Staffweb.
(Need to get a bit of the setup for this topic from the recording, TC)
- We could make a marker for pre-processing to find and correct them
- Particularly for 33x, they should have a controlled vocabulary
- There will, however, be other controlled vocabularies
- There could be another content-type list outside of RDA that we want to use – makes it tricky
- Instruction: if $2 present and it is not RDA, then mint a concept
- Decision: we should implement Deborah’s proposed solution (link)
- There is going to be a change in bibco standard record that says use RDACO and no RDA content

Test records for next week (5)

Crystal will send about ten records to students to run through the transform this week
What would we like to see? More from the Jane Austen file, or fresh records with certain fields present?
- Record set on a topic: tuataras – or other lizards!

Wrap-up (5)

Action Items

Backburner

June 25, 2025

See time zone conversion Meeting norms Present: Crystal, Cypress, Deborah, Tynan, Abhignya, Sarah, Ebe Absent: Sita, Adam, Jian, Laura Time: Ebe Notes: Sarah

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

LD4 proposal was accepted; Laura and Crystal will present next month.
Cypress has added "ready-for-self-assignment" tags to the fields under "Ready for transform" to improve clarity.

130/240 $0s and $1s (Cypress question) (20)

Can we use 'approved' $0s and $1s from a 130 or 240 as the main Work or Expression IRI? If not, what do we do with it? Include it as an identifier like we do with 630/730/830 and 6XX/7XX/8XX $t?
That is - If a 130 or 240 has only work subfields and the IRI source is approved for works (from our list) - do we use it as the main Work IRI? (only HMML authority file, MusicBrainz, Web NDL)
and If a 130 or 240 has work and expression subfields and the IRI source is approved for expressions (from our list) - do we use it as the main expression IRI? (only HMML authority file)
- It is unlikely that we will see this occur in practice, but we should still account for it in the code in case it comes up.
- Existing code uses $0 and $1 as identifiers and not IRIs (i.e. the 630 fields)
- If we use these fields as IRIs we don't know what information the creator has about it and we can't add as much information as we can to our own IRIs.
- IRI squatting (using another person's IRI and adding your information to it) is generally bad practice.
- It would be better to use our own IRIs and use OWL "same-as" to connect IRIs. We may want to have further discussion on how this would look/work in an open-access Triplestore environment.
- For the time being, we will mint our own IRIs, using $0s and $1s as identifiers.
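
To illustrate the difference between the two options discussed, this sketch emits triples for a minted Work IRI, recording the external $0/$1 values as identifiers and, optionally, adding owl:sameAs links. The identifier predicate and the example IRIs are placeholders, not the RDA properties the transform actually uses.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
HAS_IDENTIFIER = "http://example.org/vocab/hasIdentifier"  # placeholder predicate


def work_triples(minted_iri, external_iris, link_same_as=False):
    """Return (subject, predicate, object) triples for a locally minted Work."""
    # Current decision: external IRIs are recorded as identifiers only.
    triples = [(minted_iri, HAS_IDENTIFIER, ext) for ext in external_iris]
    if link_same_as:
        # Possible later step: assert equivalence instead of reusing the IRI.
        triples += [(minted_iri, OWL_SAME_AS, ext) for ext in external_iris]
    return triples


triples = work_triples(
    "http://example.org/marc2rda/rec001wor",
    ["http://example.org/external/work/123"],
)
```

Either way the minted IRI stays the subject, so statements are never attached to another party's IRI.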

Access point SES for Manifestation (25)

Title proper
See discussion
In the 245 field, should $s and $f be included as part of the title proper, or should they be included in the access point as qualifiers?
Why are $s and $f in 245 and not separated in a 250? (There may be a few reasons for this: to facilitate display, avoid using a 250, or the difference between edition and version)
- We decided that $s is not part of the title proper, need to discuss more regarding $f.
- Further discussion needed on whether they should qualify access points. ($s may qualify access point for expression, $f may qualify access point for work, $c is for manifestation.)
- Currently, $f is included in the access point for expression.
- We can discuss this further next week, when more people are present.

Source codes with appended language codes (15)

$2 Source below is possibly RDA translations
We have found some examples in the UW file where $2 (source) is, e.g.,
- 336 $6 880-09 $a Wen zi $b txt $2 rdacontent/chi
- 337 $6 880-10 $a Wu mei jie $b n $2 rdamedia/chi
- 338 $6 880-11 $a Cheng ce $b nc $2 rdacarrier/chi

Are they from the official RDA Registry for values? E.g. https://www.rdaregistry.info/termList/RDAContentType/
Is there a lookup table for these translations?
How should we map these? Is it safe to map them to e.g., RDAContentType
- RDA has URIs in non-Latin alphabets (i.e., it has simplified Chinese, but not Pinyin), so there may be issues with looking up language codes from a 336 where the language is transliterated. The related 880, however, may have the non-Latin script and can be looked up.
- Crystal suggested: if 880 or 33X $b has a label found in the RDA Registry, use the URI and don't map $a. If no $b or 880 matches a concept from the RDA Registry, mint a concept for $a and assign the language tag and source from $2.
- These examples are incorrect in their usage of LC RDA content types. LC does not have language codes and should always be in English. However, RDA does have language codes.
- Since this is a recurring mistake, the code should account for it.
- We can account for this by using $2 when "rdacontent/chi" or "rdaco/chi" to look up in RDA registry XML file, otherwise if not found there we can look up in lc registry file.
- Currently, in the code, if there is no match for $2 in 336, it tosses the data out and doesn't output anything. We may want to add a step to this in Phase II where, if there is no match for 336, it uses the related 880 instead.
- Put decision into decision index under $2
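
A sketch of how the code might account for $2 values with an appended language code: split the value at the slash, try the RDA Registry table when a language suffix is present, and fall back to the LC table otherwise. The dictionary-based tables, their keys, the sample label, and the function names are assumptions; the real lookups would read the cached registry files.

```python
def split_source(subfield_2):
    """Split e.g. 'rdacontent/chi' into ('rdacontent', 'chi'); language is None if absent."""
    source, _, lang = subfield_2.strip().partition("/")
    return source, (lang or None)


def resolve_33x(label, subfield_2, rda_registry, lc_registry):
    """Try the RDA table first when $2 carries a language suffix, then the LC table."""
    _source, lang = split_source(subfield_2)
    if lang is not None:
        iri = rda_registry.get((label, lang))
        if iri:
            return iri
    return lc_registry.get(label)


# Illustrative tables only; labels and IRIs are not verified registry content.
rda = {("文字", "chi"): "http://rdaregistry.info/termList/RDAContentType/1020"}
lc = {"text": "http://id.loc.gov/vocabulary/contentTypes/txt"}

from_rda = resolve_33x("文字", "rdacontent/chi", rda, lc)
from_lc = resolve_33x("text", "rdacontent", rda, lc)
```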

Wrap-up (5)

Action items

Cypress will consult Gordon about $0 and $1 in 130/240.
Deborah will write 336 $2 decision in a discussion and then feed into decision index. Cypress will proofread.

Backburner

Discuss SES for access points with more people present next week.

June 18, 2025

See time zone conversion Meeting norms Present: Crystal, Cypress, Deborah, Sita, Abhignya, Sarah, Adam, Ebe, Jian Absent: Tynan, Laura Time: Sita Notes: Cypress, Crystal

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Sara H. and Doreen are graduating, and have rolled off the project for the time being (they're both welcome to return if they have time in their new post-graduation jobs!)
Crystal and Ebe submitted a proposal to the SWIB conference last week
No transformation meeting this week due to the Juneteenth holiday
Progress Report
- Crystal still needs to check and move 110, 130, 240, 711 & 730 to ready for transform
- Jian still working on 007 review
- 35 total ready for transform (21 BSR)
- Still need to compile a list of abbreviations for the code/punctuation omission (something AI could help get started?)
  - lots of notes in the issue on abbreviation
  - have someone compile these into one and look at it next week - Ebe volunteered

Transformation Discussion (25)

How are coders doing with transform? Any questions that would have been asked tomorrow?
Tynan is going to continue working on field-by-field coding
Cypress working on access points, reviewing 338 as Matthew works on it
Abhignya working on 385, almost done
Sarah working on field-by-field code - 388, was working on 336 but will leave it for Matthew, will work on 083
Deborah suggests not working on 041 for now
Not mapping the 003s/001s
Cypress will go in tonight and label things as 'ready for self-assignment'
Crystal will create a phase 2 column

Initial Articles (20)

AAP for titles:
- Continue to remove initial articles for AAP and add VAP with them—for matching to current NAF practice
- Change coding to map a title proper as a VAP for work/expression, ignoring the filing indicator
- Add coding to map a title proper as an AAP for work/expression, using the Filing indicator to skip [x] characters
Decision: Can do this in the code, for Access Points, will drop initial articles for the AAP and retain it for the VAP.
Switch to retain initial articles for AAP and add VAP without them
- Retain current coding to map a title proper as an AAP for work/expression, ignoring the filing indicator
- Add coding to map a title proper as a VAP for work/expression, using the Filing indicator to skip [x] characters
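
The decision recorded above (drop initial articles for the AAP, retain them for the VAP) might look like this in outline, using the filing indicator as the number of characters to skip. Capitalizing the first remaining letter is an assumption, and the function name is hypothetical.

```python
def title_access_points(title, nonfiling):
    """Return (AAP, VAP) for a title proper given the filing indicator value."""
    # Ignore indicator values that would skip past the end of the title.
    skip = nonfiling if 0 <= nonfiling < len(title) else 0
    stripped = title[skip:]
    aap = stripped[:1].upper() + stripped[1:]
    # A variant is only needed when something was actually skipped.
    vap = title if skip else None
    return aap, vap


aap, vap = title_access_points("The reptile database", 4)
```

Relying on the indicator rather than a word list sidesteps the cross-language article problem, at the cost of trusting that the indicator was coded correctly.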

Output Review (25)

[Latest Jane Austen file](Working Documents/transformationCode/outputDataForReview/20250520/20250520-janeausten-NA-RDA-lexicalaliases.rdf)?
Need to talk about an SES for access points, has not been added in the code because we don't have the SES yet
LCCN has notes indicating scheme of nomen, should be revised to include scheme of nomen.
8XX/4XX needs to go into numbering of part at the work level of the main WEMI stack (not the series work).
- [Work identified in WEMI stack] is issue of [Series work (identified by 490/8XX)]
- [Work identified in WEMI stack] has numbering of part [$v value (for $v value, prefer 8XX. If not in 8XX, use 490)]
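
The $v preference rule above is simple enough to sketch directly. Representing each field as a dictionary of subfield code to value is an assumption made for brevity; repeated subfields are not handled here.

```python
def numbering_of_part(subfields_8xx, subfields_490):
    """Prefer $v from the 8XX; fall back to $v from the 490; None if neither has one."""
    for subfields in (subfields_8xx, subfields_490):
        value = (subfields or {}).get("v")
        if value and value.strip():
            return value.strip()
    return None


preferred = numbering_of_part({"a": "Series title", "v": "no. 12"}, {"a": "Series title", "v": "12"})
fallback = numbering_of_part({"a": "Series title"}, {"a": "Series title", "v": "12"})
```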

Wrap-up (5)

Action Items

Cypress will organize the ready for transform project column
Crystal will move issues to a phase 2 project column
Crystal will put LCCN changes into spreadsheet and indicate code re-check needed
Crystal will assemble a dataset for next week's review

Backburner

SES for access points (discuss at next meeting)

June 11, 2025

See time zone conversion Meeting norms Present: Cypress, Crystal, Sarah C., Sara H., Tynan, Jian, Deborah, Doreen, Adam, Abhignya, Junghae, Ebe, Laura Absent: Gordon, Matthew Time: Ebe Notes: Sara H.

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

SWIB proposal due today. Crystal will submit before 5:00 PDT. Ebe will take a look.

Deliverables Work (20)

Accompanying documentation for Code (Sarah, Abhignya, with question support from Cypress)
Spreadsheet cleanup (Crystal)
Sample data accumulation (who can store this much right now?)
- 2.345M total MARC records, split by library
- Jeff Mixter at OCLC confirmed will be getting LC records
- Concern with splitting was that do not want to repeat IRIs
- Uniquely generated IRIs currently use a date stamp. Should probably use a date-time stamp in order to run multiple files through in one day.
- UW might be able to look at ex-Library resources for storing & conversion
Supplemental vocabularies and properties (Crystal & Sarah)
GitHub README files for release (Ebe can help)
- Guide to implementation, use, etc. of Github
Selections from the GitHub wiki, issues, and discussions, such as
- Edited version of the Decisions Index (Crystal would love help with this one)
- Instructions (Crystal)
- Approved URI list (Crystal)
Wikibase Cloud server with subset of data (Tynan)
- About 1,000 records
- Can wait until after field-by-field coding is done
RSC feedback (Ebe can help)
PCC feedback (who can help?)
- Sofia: I would propose to reuse some texts, eg rsc and pcc docs may be almost the same. After use those documents for the overall write up
- Adam: One PCC decision I do not like at all is that for aggregates they want to use Expression manifested but are allowing work access points as the value.
- Ebe: How do they expect to be able to identify the various pieces of data? NLNZ is recording expression data so it can later extrapolate work if required. They are not muddling things together
Overall write-up (who can help?)
- Maybe submit as an article in an open publication?
- Explain what everything is, how the work was done
- Use Project Plans as a base
- Cypress can help with Transform aspects of it after coding is complete
Crystal will create issues for Deliverables and add to Phase I Closeout milestone

1XX-8XX BIG MARC record from Richard (25)

XML version of MARC bibliographic fields/sub-fields
Doesn't do indicators, does have all subfields, $9 included as a label; Value is given as tag subfield
Cypress transformed & Deborah loaded into RIMMF
Use for review as a group of what output looks like
Review transformed record against record review spreadsheet - then doublecheck whether is expected per the mapping spreadsheet (e.g., is it expected $8 is missing)
Won't be able to check punctuation handling in this version since just using labels, rather than real titles with punctuation
Use is really to see where subfields got mapped and how things will be displayed to the user
If subfields are repeated, then can test code for breaking
Could be used as a base for coder testing

TMQ Triple store and RIMMF demo (25)

Where can we store the data and what to do with it?
Richard tested creating a repository
Options are: 1. Linux server in-house 2. Pair.com
Limits:
- Read-only
- Needs to be made secure
- Not ready for public use; just to show progress so far
Chose GraphDB to test
Used every 50th LC from Feb 2025; this transform was from 6mo ago
Uses SPARQL
- Currently only a title proper index
- Crystal has some SPARQL skills and could help
Data can be displayed as a linked data graph, though not set up with labels yet
Discussed challenges with initial articles in titles and a programmatic way to handle; two title propers? use title proper and variant? since language isn't indicated for the 245 creates additional challenges (e.g., the in English (article) vs. the in French (tea)) - further discussion needed
Repository is accessible using RIMMF
Deborah can send info on how to access the repository and answer any RIMMF questions
Planning on updating GraphDB? Can do at any time, and doesn't take a lot of time
Is GraphDB free? What does it cost to put project data there? Deborah thinks it's possible to host and then run GraphDB on it, and will check with Richard

Wrap-up (5)

Action Items

Answer Jian's questions on the 007 review here
Crystal will check BSR status of 007
Update unique IRI generation from using a date stamp to using a date-time stamp
Crystal will create issues for Deliverables and add to Phase I Closeout milestone
Deborah will check GraphDB costs with Richard--Richard says he is using a free copy of GraphDB which allows him to run 5 repositories of any size, but only run 2 SPARQL queries
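
For the date-time stamp action item, a minimal sketch of the change is shown here. The base IRI, the stamp format, and the way the record identifier and entity suffix are combined are all assumptions, not the project's actual IRI pattern.

```python
from datetime import datetime, timezone

BASE = "http://example.org/marc2rda/"  # hypothetical namespace


def mint_iri(record_id, suffix, now=None):
    """Mint an IRI that includes a date-time stamp rather than a date alone."""
    now = now or datetime.now(timezone.utc)
    # Seconds-level resolution lets several files be run on the same day.
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return f"{BASE}{stamp}-{record_id}{suffix}"


fixed = datetime(2025, 6, 11, 17, 0, 5, tzinfo=timezone.utc)
iri = mint_iri("rec001", "man", now=fixed)
```

Two runs started within the same second would still produce the same stamp, so a run counter or finer resolution may be worth considering.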

From last week:

Crystal will change Transform timeline to July 15
Crystal will update spreadsheets of 9 issues waiting on attributes table and check links from the ones already done, and move them to "ready for transformation"

Backburner

Further discuss initial article handling

June 4, 2025

See time zone conversion Meeting norms Present: Deborah, Sara H., Cypress, Crystal, Sarah C., Jian, Junghae, Laura, Adam, Tynan, Ebe, Sofia Absent: Doreen, Gordon, Matthew, Sita, Abhignya Time: Laura Notes: Sara H.

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

LD4 Conference proposal submitted
- Submitted abstract as is; Title was along the lines of: MARC 21 to LRM/RDA/RDF Mapping Project: Crossing the River with More Mountains to Climb
Crystal removed field-by-field transformation review issues and project steps after discussions last week. Strategy for review will be based instead on record-by-record review, with the One Big Record included
SWIB proposal: Who else wants to present? Deadline is June 11. Crystal drafted a proposal based on the LD4 proposal, but updated for where the project should be by SWIB-time

Private YouTube Channel for Meeting Recordings? (5)

Google Drive storage is pretty low
Phase II will mean more stuff
Thoughts on having meeting recordings upload to YouTube and only viewable by those with the link, and only keeping the links in private Google Drive?
Alternatively, we could make an additional Shared Drive for Phase II or just for meeting recordings
Could also permanently delete recordings from 2023, even though we don't have transcripts
Previously decided to remove recordings in favor of transcripts, but old ones don't have them
The group's sense is that with detailed notes and documenting decisions, there's been less of a need to review older recordings; now more useful for catching up
Decision: remove videos that are more than a year old, but Crystal will back them up in case ever needed on a UW drive

Project Timeline Review (10)

64 fields left to code, 50 in "ready for transform"
Nine waiting on attributes table (are these actually in progress?)
- One row that says "see headings mapping table", covers all subfields except 3, 6, 7, 8, maybe 0 and 1
- Cypress did this for agents headings fields, link should be updated
- Crystal can update spreadsheets and check links from the ones already done, and move them to "ready for transformation"
- 0 and 1 stuff should be in the spreadsheets
- Still need to address what to do with $0 and $1 in 130/240
- Find summary of $0/$1 and integrate or refer out to
Five still in review
Adjust projected phases and timelines, with Phase II still beginning September 1. Feasible if mappers begin working on other deliverables now
- Losing Doreen, Sara H.; still have Tynan; Cypress back; new coders continue to ramp
- Cypress is supporting error review; if others can take field-by-field Cypress could focus on other parts, but her time is more limited
- Tynan: would find it helpful to have quick reviews of the output to iterate quickly with any updates needed in the code as still solidifying conceptual side of the work
- Crystal, Jian, Laura and maybe Ebe depending on availability, all willing to provide feedback; Cypress on more complex cases
- Talk more in Transform meeting tomorrow about the process. Will share the decision with the group and record in the Decisions Index. Likely a label for coders and note with the ask and link to the code, then reviewers self-assign.
- Decision to push deadline to July 15.
Who can work on what deliverables?
- Crystal will set out work for the deliverables. Next week will discuss and start assigning.

One Big MARC Record (10)

First iteration: comprehensiveTextualMaterial.xml
- Was for textual materials, included indicators, fields, sub-fields with sub-fields repeated even when non-repeatable
- Comprehensive record rather than valid.
Running transform against it is helping to catch areas where code can be shored up against invalid MARC
Second iteration in progress
Specific test examples are available in test_input and test_output (e.g., looking at 008 or 245)
Talk more at Transform meeting tomorrow, about how much time putting into it and practicality - set deadline for once it's good enough

Headings Attributes Table Update (15)

Cypress is working on the access points part and will have questions on string encoding schemes
Cypress will start a discussion; thinks it's going to be similar to 245
Deborah noted won't have to look at ISBD and instead take in order given; can put examples in the discussion
Crystal, Deborah, and Sofia will review

Status Check: (10)

007: almost done; Jian needs to add questions to issue and then once answered can complete
070: there's a condition of whether something is "serial-like" that determines how an item is handled; need more detailed conditions for the coders; need a work-around for scheme of nomen to indicate this number comes from National Agricultural Library.
752: Junghae will look at it
336: moved to a table instead of mapping spreadsheet; Matthew is coding; Crystal will fix the mapping spreadsheet since it's not up to date anymore

Transformation Q&A (20)

How can mappers support coders in this work?
- Helping review output, and any feedback on test input
- When reviewing and it's already marked as coded and nothing changed then move it to Done instead of Ready to Transform; Crystal will look at these and move if needed
- Keep responding quickly when asks on clarifying mapping spreadsheet
How is onboarding going for Matthew, Sarah, and Abhignya?
- Will discuss at Transform meeting
Do we need to leave some coding for Phase II? Can some fields be momentarily "left behind"?
- Did this for 400/411 - obsolete/not in BSR, postpone to meet deadline
- Crystal will check whether in BSR and note whether need to be prioritized. Otherwise leave not in BSR and can take up if extra time before the July 15 deadline
- Can work on Access Points without a field being fully mapped
- Some in Transform may be moved: 005 not going to map; LDR 8 type of control is data provenance so should be looked at in Phase II

Wrap-up (5)

Reviewing Test input and output
- Crystal recommends downloading both input and output and looking at them side by side; uses VSCode, thinks better than Notepad++
- The rdf:about is the IRI. If it ends with "man" it is a manifestation and everything with the "about" is about the manifestation
- 245 went through extensive testing and examples
- Mappers can suggest additions to the test to expand the use cases
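
For reviewers who want to pull the manifestation descriptions out of an output file, the tip about `rdf:about` can be turned into a short script. This is a sketch that assumes RDF/XML output and IRIs ending in "man" as described above; the sample IRIs are invented.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def manifestation_iris(rdf_xml):
    """Return every rdf:about value that ends with 'man'."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{RDF_NS}}}about"  # ElementTree's {namespace}name form
    return [el.get(about) for el in root.iter() if el.get(about, "").endswith("man")]


sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.org/rec001wor"/>
  <rdf:Description rdf:about="http://example.org/rec001man"/>
</rdf:RDF>"""

iris = manifestation_iris(sample)
```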

Action Items

Crystal will change Transform timeline to July 15
Crystal will start setting out work for deliverables
Crystal will remove old videos and back them up
Crystal will update spreadsheets of 9 issues waiting on attributes table and check links from the ones already done, and move them to "ready for transformation"
Transform topics tomorrow:
- How is onboarding going for Matthew, Sarah, and Abhignya?
- One Big MARC Record, requirements, effort, and timeline
- Processing for reviewing coder output, using a label, noting the ask and code, reviewers assignment

Backburner

May 28, 2025

See time zone conversion Meeting norms Present: Crystal, Sara H., Cypress, Deborah, Doreen, Adam, Laura, Abhignya, Junghae, Sarah C., Jian, Tynan, Sita, Sofia Absent: Ebe Time: Sara Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

LD4 Conference? Deadline May 30 to submit. Conference end of July.
- Document created.
- Crystal: something ready by Thursday EOD. Laura is on vacation but available Thursday and Friday if needed.
Crystal added Poll decision re: Agent IRIs to the Decisions Index in II.C.1.a.v. How to mint the IRI. It would be awesome if folks would check to make sure everything came over that was expected.
SWIB:
- Extended call for proposals; new deadline: June 11, 2025.
- Consider presenting there too — note that LD4 and SWIB are largely distinct communities.
Crystal and Deborah worked on the mapping for 336
It's the end of May: can we finish review and coding by the end of June?
- Avoid delays into September; Phase II should start on time.
- Coding review can extend into July, but initial coding must be complete.
- Coders currently have 53 tasks left; feasible with available coders.
- For transformation meeting tomorrow, talk about this with coders.

336 Mapping: Pattern Matching Table and Review (30)

Deborah developed a pattern-matching table and mapping spreadsheet.
336 Issue
Mapping:
- All values for 336 are to be mapped using RDA ContentType IRIs — not as literal terms or MARC codes.
- If the IRI is an http and not from LC, include it in the "not LC" bucket (this covers RDA Registry IRIs).
- Do not map:
- "other" — it has no semantic value since "not other" is undefined.
- "unspecified" — adds no meaningful information.
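The mapping rules above could be sketched roughly as follows. This is a minimal Python illustration, not the project's XSLT: the function name `map_content_type` and the two-entry lookup table are hypothetical, and the sample IRIs should be verified against the RDA Registry and the 336 mapping spreadsheet.

```python
# Hypothetical sketch of the 336 rule: emit an RDA ContentType IRI,
# never a literal term or MARC code, and skip values with no semantic content.
RDA_CONTENT_TYPE = "http://rdaregistry.info/termList/RDAContentType/"

# Illustrative sample entries only (assumed); the real table lives in the
# project's mapping spreadsheet.
TERM_TO_IRI = {
    "text": RDA_CONTENT_TYPE + "1020",
    "cartographic dataset": RDA_CONTENT_TYPE + "1001",
}

# Values the group decided not to map.
DO_NOT_MAP = {"other", "unspecified"}


def map_content_type(term):
    """Return the ContentType IRI for a 336 term, or None if it is not mapped."""
    key = term.strip().lower()
    if key in DO_NOT_MAP:
        return None
    return TERM_TO_IRI.get(key)
```

Returning `None` for both excluded and unknown values keeps the caller simple: no triple is emitted unless an IRI comes back.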
Codes vs. Terms
- In minted IRIs: Should use RDA CURIEs (e.g., rdaco:1001) to support multilingual alignment and de-duplication.
- Access Points: Will retain human-readable terms for now to support review and debugging.
Language
- Same mapping logic applies as with 336:
- Use language codes in IRIs (e.g., eng for English).
- Use terms in APs (e.g., "English").
- Subfield $l may not always match the 008/041 codes (especially in parallel aggregates).
- 041 can contain multiple language codes; 008 supports only one.
Technical Implications
- Current access point and IRI functions share logic and must be separated.
- Coders/Cypress will try to:
- Rebuild the IRI generation function to accommodate code-based values.
- Implement or integrate a local lookup table to map codes ↔ terms for languages.
- Avoid reliance on external APIs; all lookups must be local.
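One way to picture the separation described above is a single local table with two small functions, one for IRIs and one for access points. This is only a sketch in Python under stated assumptions: the function names are hypothetical, the three table entries are samples, and the actual transform is written in XSLT.

```python
# Hypothetical local lookup table; no external API is called.
# Sample entries only; a real table would cover the full language code list.
CODE_TO_TERM = {"eng": "English", "fre": "French", "ger": "German"}
TERM_TO_CODE = {term: code for code, term in CODE_TO_TERM.items()}


def language_for_iri(value):
    """Return the language code to embed in a minted IRI (e.g. 'eng')."""
    if value in CODE_TO_TERM:
        return value
    return TERM_TO_CODE.get(value)


def language_for_access_point(value):
    """Return the human-readable term to use in an access point (e.g. 'English')."""
    if value in TERM_TO_CODE:
        return value
    return CODE_TO_TERM.get(value)
```

Both functions accept either a code or a term, so the caller does not need to know which form the MARC data supplied; unknown values return `None`.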
Implementation Notes
- Old transform logic from Theo (commented-out code) should be deleted.
- For Matthew tomorrow during transformation meeting: additional mapping tables for 337 and 338 should be based on the same structure.

Transformation Code Review (40)

3XX for review
3XX for review Discussion - one discussion with all tags shared
Reviewed through 340 last week--do remaining tags
- 344 Review: Looks good.
- 345: Not yet coded.
- 346: Reviewed using RIMMF
Assign asynchronous review individually or in small groups?
3XX Review Process
- Current Method (field-by-field) is time-consuming and inefficient.
- Pending Decision: Shift to holistic record-level reviews:
- Select a diverse group of MARC records.
- Include a "one record to rule them all" that touches most transformations.
- Suggestion: Add coders’ test data to facilitate more comprehensive review. Limited usefulness?
- Suggestion: Combine real records and test outputs?
- Suggestion: Use more carefully selected MARC records since not everything has been coded?
- Needs to be discussed with coders tomorrow during transformation meeting.

Wrap-up (5)

Action Items

General:
- Prepare LD4 proposal draft (Crystal, by Thursday EOD)
- Review Agent IRI entry in Decisions Index (All)
For Transformation meeting tomorrow:
- Discuss whether we can complete Phase I coding by end of June
- Ask Matthew to review if the 336 mapping logic can be replicated for 337 and 338
- Discuss switching from field-by-field review to record-level review; coder/reviewer workflow.

Backburner

May 21, 2025

See time zone conversion Meeting norms Present: Crystal, Sara H., Cypress, Ebe, Deborah, Doreen, Adam, Laura, Abhignya, Junghae, Sarah C., Jian Absent: Matthew, Sita, Gordon, Sofia, Tynan Time: Ebe Notes: Sara H.

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Welcome Abhignya!
Sarah and Matthew are coding away
Crystal reviewed most of the classification tags, but there's an unresolved question about 070. Ebe has one tag left to review, Jian has two.
Cypress will join the transform meeting tomorrow
LD4 Conference? Deadline May 30 to submit. Conference end of July.
- Document created.
- Email Crystal to join.
Crystal will add Poll decision re: Agent IRIs to the Decisions Index today. She has been working on getting issues added and workflow lined up for the transformation review. Documentation on that also coming today.

Transformation Code Review

3XX for review
3XX for review Discussion - one discussion with all tags shared
336
- Cypress: 336 has not been coded. It was originally coded by Theo or Zhuo but we switched approaches and 336 was put on hold for more discussion on how it would be mapped.
- Worked on manifestations and would come back to expressions, but didn't come back?
- Marked as code re-check needed and removed from today's review
- We want to map as many values as possible to RDA value vocabularies.
337
- If $3 is present, output both 'has media type' and a statement that the media type value applies to whatever is in $3
- Capture both media type and carrier type, since we didn't want to assume they were the same
- Looked at the Adventure Time example: the 2nd carrier type goes to a DOI, probably because it comes from an 008 and is minted against a UW vocabulary.
- For the 008, many values were mapped to RDA when the equivalence could be confirmed; those that couldn't went to a DOI and the UW vocabulary. This will be captured in the 008 mapping sheet.
  - As with 33X, we want to map as many values as possible to RDA value vocabularies with the 008 as well.
- Is there a media type from a different vocabulary? Not in these examples. Most will be from id.loc.gov since that's preferred by PCC for the field
- Liked the way it was transformed, and want to map as many values as possible to RDA vocabularies.
338
- Carrier type comes from both $a and $b, so we mapped both. In the transformation they will be deduped to one triple. We did this so we wouldn't miss anything
- Online doesn't have an equivalent in doi? Could have been Online Resource in RDA
- 008 still in Ready for transform status. Sita completed a 2nd review.
  - Was anything changed? If not, then can be marked as done. Otherwise, will need code re-check.
- Ebe recalled long 008 conversations that they weren't entirely equivalent, and this was the best route for the time being.
- Deborah noted carrier type and content type are very important to be available in record access point qualifiers and discovery systems will need them for filtering. May have unintended merging otherwise.
- Cypress noted qualifiers come from 336, 337, 338 right now. We will need an 008 mapping at some point, and that will take time to code
- We want to map as many values as possible to RDA value vocabularies
  - Besides fixed values, will need to hunt for other places this could be derived. This will be more challenging and difficult for a student unless already deeply familiar with MARC.
- Sofia and Gordon did some work on mapping. Group made a similar decision on 03-05-2025 to look at this again at some point. See issue for 336
- Created a new issue from a previous discussion - Deborah and Ebe will evaluate
- Cypress noted this will be a large coding sub-project
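The "map both $a and $b, then dedupe to one triple" approach noted at the start of the 338 discussion can be illustrated with a small Python sketch. Everything here is assumed for illustration: the `example.org` IRI, the `hasCarrierType` predicate label, and the function name are placeholders, not the project's actual vocabulary or code.

```python
# Hypothetical lookup: the $a term and the $b code resolve to the same IRI.
CARRIER_IRI = {
    "online resource": "http://example.org/carrier/online-resource",
    "cr": "http://example.org/carrier/online-resource",
}


def carrier_triples(subject, subfields):
    """Map every $a and $b value, then dedupe identical triples."""
    triples = set()  # a set drops duplicates produced by $a and $b
    for code, value in subfields:
        if code in ("a", "b"):
            iri = CARRIER_IRI.get(value.strip().lower())
            if iri is not None:
                triples.add((subject, "hasCarrierType", iri))
    return sorted(triples)


field_338 = [("a", "online resource"), ("b", "cr"), ("2", "rdacarrier")]
print(carrier_triples("ex:manifestation1", field_338))
```

Because both subfields resolve to the same IRI, only one triple survives, while a record carrying only $a or only $b still produces output.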
One MARC record to rule them all
- It's difficult to find records that represent every possible iteration of MARC, and they are time-consuming to create.
- Sara proposed considering using GenAI to generate test MARC records that meet all the parameters we're trying to hit
- Crystal suggested this could be a presentation on appropriate uses of AI for metadata: an aid for technical work rather than intellectual work (e.g., subject analysis and understanding a painting)
- Deborah noted we will need multiple versions for different formats. Deborah and Laura can help guide Sara on requirements
- New issue created: One MARC Record to Rule Them All, per format
Transformation Review Goal
- Essentially, we're trying to answer, "does it look correct?" If the output is what we expect, even if we're not hitting every possibility.
- If/when a very "complete" record becomes available, we can add that to the review.
340
- $a and $c use the same vocab; in RDA they map to the same property: all just material
- Group discussed the loss of specificity, challenges if someone ever wanted to convert back to MARC, and whether a note would be useful.
- Ebe recalled that when the vocabularies were pulled together, that the specificity of MARC was taken out, and they all went into one VES. This was intentionally done.
- She also noted that the idea was this is legacy data, let's put it in RDA/RDF; moving forward, being compliant with new content standard and saying the same things semantically; as a result, old data will unfortunately be left behind
- Adam agreed this was a persuasive argument and it's not so important to hold on to what's lost
- Laura noted that there are communities that do care about this kind of loss, with Getty and AAT as an example - losing granularity in properties, not values
- Unfortunately, this project can't bridge those losses for everyone, and they'll need to be taken up with RSC if needed.
- We'll leave the loss as is for now, and can revisit later if needed.

Wrap-up (5)

It's best if the group can work on asynchronous review before next week so we can move on to the next review

Action Items

Everyone: asynchronous review of 3XX for review

Backburner

May 14, 2025

See time zone conversion Meeting norms Present: Crystal, Sara H., Sita, Deborah, Doreen, Junghae, Cypress, Jian, Sofia, Laura, Sarah C., Ebe, Tynan, Adam Absent: Gordon, Matthew Time: Ebe Notes: Sara H.

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Welcome back Cypress!!
Crystal onboarded Sarah yesterday
Sarah and Matthew started taking tags to code
OCLC DLC Records coming soon
- Hoping for more non-book items. Expecting by end of May
LD4 Conference? Deadline May 30 to submit. Conference end of July.
- Laura could help with some preparation.
- Ebe can help with materials preparation, won't likely be able to attend.
- Look at what Crystal/Sofia presented at IFLA Symposium.
- Plus a little looking ahead to Phase II. Recent issues overcome.
- Crystal will email Laura and Ebe
Transformation meeting is tomorrow
- Cypress will begin attending next week
Doreen and Sara graduate in about a month

Headings for Works and Expressions as Subjects (30)

Attributes Table Discussion
Revisit discussion from last week, wrap into a decision we can implement
Loose ends?
Laura found something with Agent she'd like to note. If it's simple, Deborah suggests adding as a comment in the table.
Team reviewed slides from Deborah:

Left side - get subject
Middle - get entity for work and for person; also subject heading relationship with work; example work is aggregating
Right side - get subject and subject for the whole string
Question of whether to link Shakespeare to all his works whether about him; only get to him in an index

Deborah: Have to find your way to an entity and then to related entities
Laura: Create concepts with SKOS, and SKOS doesn't have the complexity to break out agent with concept. Phase II investigate going beyond SKOS?
Adam: What's the relationship between pink and blue?
Laura: Subject is RDA entity, but concept is just an IRI. Trying not to impute subject relationship a cataloger hasn't assigned or relationship of person to the concept we've created for the subject heading; doesn't seem like a relationship we can apply RDA relationship to
Sofia: Related person of work is ok but there are some small problems in the graph
Crystal: Want to relate a person to concept; is there some component part in SKOS? Agent isn't a subject, but concept could be a subject; Subject linked to Agent, then subjects that start with the Agent and have subsequent parts could be linked to each other
Deborah: 'Has related person of work' rather than 'has subject', but only can link from the work; change coding from curie for subject to related person
Laura: seems like we want something to happen for end users but can't get there with the tools that we have in RDA right now, without changing RDA (which we can't do), coming up with something more complex than SKOS concepts, or finding a 3rd undetermined path. Hesitant, as it seems like trying to shoehorn something in a way that wasn't intended, which then has impacts down the line.
Sofia: agrees with Deborah in order not to have loss, likes related person, RDA/LRM if had entity we could solve this with Nomen, but don't have this available. Related person of work, understands what suggest, this way would find relationship to agent without using subject
Problems with subjects in graph. "S. Will - Characters - Ophelia" isn't the subject of The afterlife of Ophelia
Decision: Do as related person of work, mark for community conversations, and mark as loss, since loss of specificity. Talk with SAC community about subjects and related person of subjects

Transformation Code Review (35)

600 for review : wrap up
600 for review Discussion
- We decided that using APs or AAPs in IRI formations for Agents was part of our initial decision. Perhaps wasn't implemented
- Choice made was to mint meaningful IRIs when have source provided, but normally don't since it's from NAF, so have unique meaningless IRIs
- Deborah Fritz from ORCID + Fritz, Deborah from NAF: ideally one person with two nomens but right now includes the source, so it's two entities
  - Did this to prevent accidental merges
- Is the main rationale for minting IRIs with APs/AAPs, instead of using IRIs from the NACO authority file, that we're not accepting the NACO authority file and its treatment of non-human personas? Yes
- Unless uncontrolled name there should be a source (unless ind2=4, there will be a source)
- 600 with ind2=0: the assumption is that the name is in the authority file, but many names aren't established, so there's no authority anywhere, just the bib record. They are coded as in accordance with the rules, but there's no source for them. We can't know until we go to NAF and see if the string is there. Plus, there are undifferentiated names.
- Approved Sources list in the code
- Do we treat works like decided to treat agent? So, if work has subject subdivisions, going to make concept for whole string, will also breakout the work as subject of the work? Decided would not do agent is the subject of the work, instead use related agent of work. Deborah suggests doing same thing for the work - related work of work.
- Poll
- Decision:
  - We are still going to make SKOS concept for the whole string. We will break out the agent and do as related agent to work. Apply same thing to the work that has 6XXs. What about agent portion? That gets attached to the work in the 600, not the work being described.
  - Rather than making new subject statements for Work and Expressions portions and X, Y, or Z, we're going to do the same thing as Agents and make them related
  - Crystal will add to the Decisions Index
  - Plan to revisit in Phase II regardless
- This needs coding changes
  - Added Code re-check label
  - Keep as Transform review in progress - will want to look at as a group to confirm output is as expected
3XX for review
3XX for review Discussion - one discussion with all tags shared

Wrap-up (5)

Action Items

Crystal will add Poll decision re: Agent IRIs to the Decisions Index
Crystal will email Laura and Ebe re: LD4
Everyone: asynchronous review of 3XX for review

Backburner

May 7, 2025

See time zone conversion Meeting norms Present: Doreen, Sara H., Junghae, Sarah C., Sita, Adam, Matthew, Jian, Sofia, Crystal, Laura, Ebe, Tynan Absent: Deborah, Gordon Time: Junghae Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

We need to get moving on our review
Welcome Sarah Collins! Sarah is a new student coder that started this week with a few hours, and will begin a normal schedule next week.
Crystal needs brief bios for Sarah and Matthew for the project roster
LD4 Conference announced CFP, due May 30. Would be a good fit for the project and is online so avoids current institution travel restrictions.

Headings Attributes Table Update (30)

Deborah has replaced the entire Headings attributes for WE tab with one called “Headings for WE". This is sufficiently complete and is ready for review by the group and then coding.
Discuss changes for review
Links:
- Headings attributes for WE
- Headings for WE
The primary update involves removing repeated subfield codes from each individual cell in a row and instead placing them in a dedicated "subfield" column
This change reduces complexity, particularly when dealing with titles and access points (APs), where the codes can become lengthy and error-prone
The new structure aligns with existing XSLT coding practices and has been implemented as a function
A function table has been added at the end of the main table to support this logic
Rows in the table now share the same conditions for clarity and consistency
The table has been significantly expanded to accommodate slight variations and edge cases
It’s crucial to verify that the subfield codes listed are correct
- If a code is valid, it will be properly handled
- If a code is used incorrectly, it will result in an error due to cataloging practice, not the function logic
Some MARC records (including those from LC) have been found using subfield $t in the 100 field
Decision: Add 100, 110, and 111 fields with title subfields (e.g., $t) to the table in a separate set of instructions related to titles and authorized access points--DF: Same logic as for W: has AP|AAP for work for 700, 710, 711 fields; so, added in that row
Deborah will investigate examples of 130 fields for review--See: attached file at: Attributes table
Deborah confirmed with Cypress that multiple modes can be strung together (rather than having to repeatedly copy/paste)
Legend for color coding has been added

6XX

Lengthy discussion on the 6xx and whether agents have a subject or related relationship and whether the relationship is to the work being described or a work identified in 6xx
Prior poll
- Questions
- Responses
- Would rely on FAST to break apart, and transform FAST headings as subjects
- Make separate entity for the person if doesn't already have, related to the work if there is one in 600 AP, but not relate the work to the person being described
- understood xyz doesn't create separate subject for each but keep all together. But for person can create subject person related to the subject work
Last week discussed if a person is broken up, the source should be the name authority file. The idea of breaking it up at all wasn't addressed
For 600s, don't think there's any place to indicate that NACO authority file is the authorizing source. It has a second indicator 0 that indicates that it's in the authority file. Implied it's in NACO/SACO
Sofia: If we have a 600 with a $t, I would not create a subject person relationship with the name portion.
- Related to the work that is related to the work being described, so related to the subject in the main heading (person not related to work in the main record, but to the one in the subject heading field)
agree to create a person related to the work (600$t) but not to relate as a subject the person with the work described in the record.
Voted on
1. Break out agents from 6xx fields and make "subject" relationship to main work - 2 votes
2. Break out agents from 6XX fields and make "related" relationship to main work - 0 votes
3. Do not break out agents from 6XX fields unless they are WE headings - 6 votes
Options 1 and 2 allow for agents to be related to WE entities produced by 6XX WE headings they are part of
When have agent with xyz at the highest level you still have a subject relationship to the person, in addition the more complex subject
Decision: Take an iterative approach. Option 3 is the best for now. In later phases can map the person as a subject, taking into consideration community feedback
Adam shared examples of sub-divisions:
- Shakespeare, William, 1564-1616--Authorship--Marlowe theory
- Shakespeare, William, 1564-1616--Spurious and doubtful works

Transformation Code Review & Workflow Finalization (45) - ran out of time

600 for review
Does the GitHub workflow suit our needs? Did anyone test it out and have thoughts?
610 & 611 for review (Crystal will do these today)

Wrap-up (5)

Put WE breaking out from subject/concepts on the agenda for next week

Action Items

Crystal: Send email with new meeting link to make sure everyone got it
Deborah will investigate examples of 130 fields for review--See: attached file at: Attributes table
Everyone: 600 for review
Everyone: send any feedback on the GitHub workflow for review to Crystal today

Backburner

April 30, 2025

See time zone conversion Meeting norms Present: Absent: Sofia, Ebe Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (15)

Updates (5)

Cypress is returning to the project to help us with coding for a few hours per week! Thank you Cypress, and welcome back!
Matthew will provide a few hours per week of coding support for the project as well. Thank you Matthew, and welcome aboard!
Crystal has hired two iSchool students, Sarah Collins and Abhignya Rajapu, who will start in May once their paperwork is cleared. They will be able to train under Doreen and Sara before they leave.
Sofia needs to hand off her classification review tags due to unforeseen circumstances. Unless someone else can volunteer, Crystal can review them.
Crystal is working on putting a standing transformation team meeting on the calendar for May/June

Continue Reviewing 600 (35)

Fill out spreadsheet for it--will the spreadsheet workflow work for the remainder of the tags?
Do we need to build out something else for recording and following through with issues surfaced?
Crystal has a link to the issues spreadsheet
The spreadsheet also contains links to the github issues
Should we put feedback into the issue itself?
Feedback:
- 1. Questions about the decisions made by the group
- 2. Coding questions: issues that only pertain to the coding
- Maybe the tabs could be used by the coders to monitor the feedback coming in
- How do we tell the reviewers what needs to be done – github is better at notifying people of needed changes
- We might need to add a workflow stage to the project
- Do we need fresh issues to deal with the review of coding?
- Decision: we create a new github issue for this review process because the old issues are over-saturated with the prior mapping discussions/assignments
- We need to create issues so that we can do the project management aspect of this, i.e. assigning people, marking statuses
- Crystal added new columns to the project workflow called “Transformation review to-do” and “Transformation review in progress”
  - Here, we can discuss issues with the transform that need to be fixed by the coding team
  - Then we created a new issue “600 Transformation Review”
  - In the issue Crystal linked to the old issue discussing the mapping/transformation as well as the mapping spreadsheet and the attributes spreadsheet
  - Here’s how the issue looks: https://github.com/uwlib-cams/MARC2RDA/issues/487
- Recoding gets signaled in the NEW issue
- Test this out on the 600 review:
  - We can try this with asynchronous review
  - We will do some test records for another field next week using the same process

Attributes Table Workflow & Approval (30)

Has to be a workflow in which the coders can take the attributes table and code directly from that, rather than going back through the corpus of mapping spreadsheets and finding changes
How do we assess who is assigned to review?
How do we get to, “this has been approved”
The table is not finished, but there is consensus – it is being coded as it is being finished
We won’t review mapping tables against the attributes table
The mapping tables won’t get notes that say where to refer to it
Mapping table true where attributes table doesn’t apply
Attributes table will be one of our deliverables
It’s ok for us to review as Deborah still works on the table
There is time in the timeline to review the spreadsheets
We should figure out how to get google drive updates when the attributes table changes so that we can know when there is new material to review

Wrap-up (5)

Action Items

We can review the attributes table asynchronously
600 transformation review

Backburner

April 23, 2025

See time zone conversion Meeting norms Present: Jian, Sara, Deborah, Laura, Doreen, Crystal, Matthew, Ebe, Sita Absent: Junghae, Gordon, Tynan, Sofia Time: Ebe Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Crystal is interviewing students for the rest of this week. Hoping they can start next week
Welcome guest Matthew Hill from British Library, considering providing coding support for project
Crystal neglected to assign a reviewer for 008 (a field we reviewed together but was initially mapped by Sita, which we need to review again to make sure we incorporate MARC vocabularies we published and to make sure we were consistent with the enormous mapping incorporating a lot of formats). Who will volunteer to do a cohesiveness-check?
- Sita will review 008

Transformation Testing (20)

How to best accomplish systematic review?
Crystal made an attempt at the spreadsheet
Once we figure out what we want, Crystal can finalize the workflow, create a page on it, and walk Adam and Deborah through the steps of getting records and pushing/pulling in GitHub.
Serializations for the weekly review?
- Deborah requested N-Triples
- Laura shared converter to serialize: https://www.easyrdf.org/converter
Enter the MARC field. Duplicate the row and specify the patterns/conditions we want to see in each row
Coded-Phase I should reflect whatever the current coding label is
Date Coded will need to be updated if/whenever recoded
Input for review will be MARC XML of ~10 examples. File is placed in marcDatasets folder and link entered in this column
Output for review will have the link in outputDataForReview folder and link entered in this column by coders
Notes column to enter if find something unexpected
Upcoming meeting with Jeff Mixter, OCLC. Ask if can still provide a record with every MARC tag.
- If not available, consider creating one. Would likely take a day.
Crystal will put in MARC tags, and then as a group can put in all patterns
- Will only put in ones that have been coded

Transformation Review (45)

600 review
2nd Record - John of Austria
- Scheme of nomen is LCNAF, not LCSH as linked
- Line 197
- Scheme of nomen for 600: can it be LCSH when person is not subdivided by subject?
- Review policy for person scheme of nomen, not a concept (breaking up person from the concept)
- Subfields in Line 236
- Line 217
- Thought we were minting identifiers for agents differently? Thought we were deduping for all entities? Need to confirm what was decided.
3rd Record - Thomas Edison
- Line 362-371
- Different scheme, different nomen even if same string
  - Nomen is an entity, not a string - but should be about same entity
  - Thomas Edison is an RDA person
  - Thomas Edison is in LCNAF
  - Thomas Edison is in CYAC
  - Even though identical, identified in two separate schemes
- Line 377-392 re: deduplication
- Line 415
  - Decided not to pull out Geographic or Form sub-divisions; did do Genre
  - Z & X would cause confusion, and were not safe to pull apart
  - Gordon wanted to pull apart, and some wanted to keep together, and that's what vote went with
If review surfaces that a decision needs to be made, we'll do that

Wrap-up (5)

Action items

Sita will review 008
Revisit decision of minting IRIs for agent
Crystal - add example in the spreadsheet
All - asynchronous review of 600s

Backburner

April 16, 2025

See time zone conversion Meeting norms Present: Crystal, Adam, Jian, Tynan, Doreen, Sita, Deborah, Sara, Ebe, Laura Absent: Junghae, Sofia, Gordon Time: Tynan Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Transformation Deadline update: End of May -- work has slowed without Cypress or a replacement for her
Review needs to happen simultaneously and then in June
Crystal reviewing resumes and starting interview process for two new students today
Matthew Hill at The British Library may be joining the project. Could have time to offer support and setting up patterns for coding that the coders take forward. Deborah hopes to introduce him next week.
NARDAC meeting recording for the Spring 2025 Update with presentation on NLNZ's implementation of Official RDA. Deborah encourages everyone to view; it was very encouraging!
Laura shared that the Ex Libris Users of North America (ELUNA) 2025 Conference is scheduled for June 16-20 in Atlanta. The Linked Open Data Community of Practice Working Group (which Laura and Ebe are members of) will be co-moderating a session organized by Ex Libris. Questions Laura submitted:
- Conversion from MARC to RDA/RDF leaves out serials/aggregating works; how do they envision providing a conversion process, even if partial
- UW created templates for entry for RDA in Sinopia, but they haven't been updated recently. What kind of assistance would be needed to create those tables?

Field 357 Question (10)

Was originally mapped by Theo and reviewed by Sita
Question on how to format transform output for repeating subfields (e.g., $c)
Mapping is using labels based on the subfield code
Decision: use semi-colon space ("; ") to separate repeating subfields (e.g., <rdamd:P30137>The originating agency, ITAC, denotes their control of dissemination using the term ORCON to the authorized recipients, CIA; DIA; UKIA</rdamd:P30137>)
- This avoids ambiguity, since subfield values may themselves include commas (,)
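The separator decision can be shown with a short Python sketch. The helper name `join_repeated` is hypothetical and the real transform is XSLT; the sample values come from the example above.

```python
def join_repeated(values, separator="; "):
    """Join repeated subfield values, dropping empty ones."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    return separator.join(cleaned)


# Repeated values from the 357 example in the decision.
print(join_repeated(["CIA", "DIA", "UKIA"]))  # CIA; DIA; UKIA

# Values that contain commas stay unambiguous with a "; " separator.
print(join_repeated(["Smith, J.", "Doe, A."]))  # Smith, J.; Doe, A.
```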

Transformation Testing (20)

Dataset Selection & Review Planning
- Need to decide on methodology for selecting datasets.
- Once decided, begin processing and reviewing datasets.
- Assign someone to finalize the Transform Output Review spreadsheet and another to volunteer as the first to test.
Sources of Records
- Depends on who is pulling the records.
- UW: Alma, authorized records, OCLC (permission to download 50k DLC-authorized random records from 2023–2025)
- Deborah has her own set
Storage & Format
- Clarify where records will be stored and in what format.
- Past practice: Cypress stored them in “MARC Datasets” folder for field-by-field testing.
- Records should be transformed to XML:
  - MarcEdit can be used.
  - Alma exports XML directly.
    - If not running a job, the last record published has an OAI wrapper around it, which can be dropped.
  - OCLC requires transformation via MarcEdit.
Record Batch Size
- Decision: approx. 10 records per batch/file.
- Prefer one file with 10 records rather than multiple files for efficiency.
- Use MARC collection wrapper for 10-record XML batches.
- Deborah exported records from LC, which go one-by-one.
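As a rough illustration of the batch decision above, the sketch below groups MARCXML `record` elements into `collection` wrappers of at most 10. It is written in Python with the standard library only; the function name `batch_records` is hypothetical, and in practice MarcEdit or an Alma export may already produce the wrapper.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC 21 XML ("slim") schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def batch_records(records, size=10):
    """Group <record> elements into <collection> elements of at most `size`."""
    collections = []
    for start in range(0, len(records), size):
        collection = ET.Element(f"{{{MARC_NS}}}collection")
        collection.extend(records[start:start + size])
        collections.append(collection)
    return collections


# Usage with 23 empty placeholder records (e.g. records exported one-by-one).
records = [ET.Element(f"{{{MARC_NS}}}record") for _ in range(23)]
batches = batch_records(records)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Each resulting element can be written out with `ET.ElementTree(batch).write(path, encoding="utf-8", xml_declaration=True)` to produce one file per batch.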
Review Process & Criteria
- What exactly are we testing for?
- How to organize the review (by field, concept, etc.)?
- Coding is organized by fields.
- Possibly group fields by pattern (e.g., descriptive, subjects, concepts).
- Start with fields like 600–630 or simpler ones like 245?
- Use a column to indicate patterns represented by a field.
- Avoid color-coding due to field overlap; could a column be used to sort instead?
- Include only mapped and reviewed items.
- Laura: Difficult to review until coding is finalized.
  - Go through every element and comment or review in groups (e.g., subjects, 008)?
  - Avoid going through individual tags; too time-consuming.
- Goal: to view how complete records are transformed. General and field-specific review can happen simultaneously.
- OCLC used to provide a record with every MARC tag.
  - If not available, consider creating one.
  - May require a side meeting to discuss.
Ownership, First Run, & Next Steps
- Crystal will take ownership of the process and next steps.
- Crystal will be the first to go through the process.
- She will walk the group through the GitHub-based workflow in a follow-up meeting.
- She will send coders a set of materials for review.
- Will include records Deborah shared last week.

Review Check-In (15)

LDR 7 - Ebe mapped, Laura reviewing. Deborah sent a table she had made. Laura had questions, will review this week.
007 - needs review - Jian will take
245 - was reviewed, and moved to Transform
751 - Crystal needs to look at
100 - unsure how table relates to Attributes table, on hold for that
Mapping Spreadsheets & Attributes Table
- Do mapping spreadsheets need to be updated to conform to the attributes/heading tables?
- Yes, but doesn't need to happen as coding & review happen; it does need to happen as part of deliverables to make sure spreadsheets are accurate and/or point to documentation
- Reviewers: Identify which issues are on hold due to the attributes table; move to "Almost done - waiting for decision"; make a note in the issue
856 - Sita will take to review
710 - Laura reviewing
700, 111, 110 - Jian reviewing
LDR 8 - Ebe reviewing, Laura has a question that needs answering. Believe it's marked as not mapped. Laura will double check the spreadsheet.
Crystal will go through remaining "Review in Progress" issues separately and make sure it's clear who's responsible
Moving issues to "Almost done - waiting for decision" will help the team visualize what needs discussion

Attributes table (25)

Deborah oriented the group to the spreadsheet to help with understanding it and reviewing the table asynchronously
Use term “Singleton expression” for a single-expression work.
Manifestation: Leave as-is (do not use non-aggregating approach).
Relationships: if the relationship can be found in the table, use it; otherwise the default is Manifestation
Title of work: 130/240 have things added which aren't Preferred Title. But only using anp
Work AAP is 100+240, not 240
- Double-check subfield coding for combined fields to prevent copy/paste errors.
- Follow if/else logic through subsequent table rows.
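As a rough illustration of the "100+240, not 240" point, the sketch below assembles a work access point string from a creator name plus a 240 title. Everything in it is an assumption made for illustration: the function name, the dict representation of the 240, the choice of subfields $a/$n/$p, the ". " separator, and the title-only fallback. The project's actual rules live in the attributes table and the transform code.

```python
def work_aap(creator_name, title_240):
    """Build an illustrative work access point from a 100 name and a 240 title.

    creator_name: pre-assembled name string from the 100 (hypothetical input).
    title_240: hypothetical dict of 240 subfield code -> value.
    """
    # Assumption: only subfields a, n, p contribute to the title portion.
    title_parts = [
        title_240[code].strip()
        for code in ("a", "n", "p")
        if title_240.get(code)
    ]
    title = " ".join(title_parts)
    name = (creator_name or "").strip()

    if name and title:
        # Name and title together form the access point (100+240).
        return f"{name}. {title}"
    # Assumed fallback: whichever part is present stands alone.
    return title or name


print(work_aap("Austen, Jane, 1775-1817", {"a": "Emma"}))
```

Keeping the name and title as separate inputs makes it easier to spot copy/paste errors in subfield coding, since each part can be checked on its own.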
Title of expression: same as Title of work; useful to put in transform, since used to build AP
Subfield b Discussion
- Helps differentiate between generic titles in subfield $a.
- Decision: do not add a third title of expression with b to transform
Expression AP/AAP
- Similar to Work; derived from the same plus Expression-specific elements
- 700/710/711 - Deborah will double check
Date of work: sometimes year as parenthetical in title (historic, scientific, technical works)
- Deborah will pull some examples
- Laura: don't think need to pull dates from uniform title fields, even if in date fields
Next steps:
- Everyone review table asynchronously
- Add feedback as comments directly in the spreadsheet where issues are seen.

Wrap-up (5)

Action items

Crystal will be the first to go through Transform Output Review process.
Crystal will walk the group through the GitHub-based workflow in a follow-up meeting.
Crystal will send coders a set of materials for review, including records Deborah shared last week.
Crystal, Ebe, Jian, Laura, Sita will continue "Review in Progress" issues
Crystal will go through remaining "Review in Progress" issues separately and make sure it's clear who's responsible
Deborah will double check use of 700/710/711 for Expression AP/AAP
Deborah will pull examples of titles with dates for “Date of Work” review.
Everyone review Attributes Table asynchronously and add feedback

Backburner

Side meeting to discuss creating a record that includes all MARC tags

April 2, 2025

See time zone conversion Meeting norms Present: Absent: Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

OCLC says we can export 50K records from WorldCat which should fix our DLC record shortage :)
- Can SZ and EK (potentially SB?) send 50-100 records to DF with $0/$1 and/or $2 in headings fields? So our testing pool can be rounded out?
Crystal is posting two new student positions--they should be up and accepting applications any day now
A new draft version of the Heading Fields Attribute Mapping has been added to the Google sheet, and an explanation is provided in the Attributes table #471 page here.
- The next concurrent steps are to:
  - find out whether the coders can code using this format
    - Some examples of how it might be done are already available in the coding
      - mode="augWor"
      - mode="age"
  - review the content of the table looking for errors, e.g.:
  - $f instead of $n in a copied instruction
  - missing fields or subfields
  - comments on decisions made

Student Presentation: Transformation Walk-Through

Wrap-up (5)

Action items

Backburner

March 26, 2025

See time zone conversion Meeting norms Present: Ebe, Crystal, Jian, Sita, Adam, Deborah, Laura Absent: Sofia, Junghae, Sara, Doreen, Tynan Time: Ebe Notes: Jian

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

OCLC said no to record reuse at the scale we requested, which is 500k records. Need to figure out how to get it from LC. Probably will need to download from LC directly.
LD4 conference dates were announced for summer: we should make a proposal
IFLA presentation went great. Slides will be available on the IFLA site. Not clear if the presentation was recorded or not. Can do a similar presentation for LD4 with updates for transformation progress.
- Got a lot of interest and questions. There are people using RDA/RDF in European libraries. People approached afterward expressing how impressed they were. Crystal reached out to ask if anyone would like to join the project. Will wait and see.
Crystal is posting two new student positions next week. Students: tell your friends. Prioritizing XSLT skills.

Transform Test Datasets for Fridays (15)

Deborah created a feedback template/form
Request template
- Have not yet decided on a request template
- Deborah suggested to create a spreadsheet for all of the MARC fields with the coding status, such as the coded date, reviewed date, etc.
- Crystal started a Google Sheet named Transform Output Review in project shared drive and linked from Test Datasets Discussion
Who will assemble input?
- Crystal, Adam, and Deborah agree to assemble DLC/WAU records alternately
- Will complete the Google Sheet for Transform Output Review first and then decide about the input records
Output location
- Students will figure out
Dataset sizes
- 10 records each week (not hard and fast rule--will adjust depending on what we need to demonstrate)
- Start with fields that are already coded
More discussion needed--will revisit this topic next week

Downloading records from LC (15)

Which records? What does the dataset need to look like?
How do we go about downloading the set?
- How to download from the website is tricky. For example, how to search if you want records with 100s with different indicators?
- Maybe start with searching in OCLC for a list and then download them from a different source?
Where will we store it (and other datasets) prior to uploading to Dryad?
- Deborah has (limited) storage for this. More than UW
Questions about the estimated output size were unanswered. We really don't know how big our output dataset will be in the end.

Mapping Review Check-In (25)

Linking fields: we decided to use amended Toolkit labels from Deborah's chart. Reviewer should also update according to that decision, if this hasn't already been done. Update?
Assigning review for "awaiting review": remaining tags assigned to group members
Review assignments for "review in progress"
Revisit deadline (end of month)

Uniform titles/Attributes table check-in (15)

130/240
Deborah updated the attribute table, including attributes for 130/240
Has name of person vs. has preferred name of person for 100/600/700
- Subfield c is part of the preferred name, not just subfield a; therefore, using subfield a only would not be accurate as a value of preferred name of person. We need to use name of person
For corporate bodies, has name of corporate body is better than has preferred name of corporate body because subfield a could contain a parenthetical qualifier that is not part of a preferred name
- Same with uniform titles 130/240. Uniform titles may also contain supplied qualifiers
Deborah noticed $0 and $1 have not been mapped consistently. The decisions index is not very clear. More instructions are needed for different types of work.
The attribute table is still missing a lot of things

Wrap-up (5)

Action items

A separate meeting is needed for more discussion on the attributes table topic (Crystal scheduled for 1pm Pacific Daylight time Thursday. If you want an invitation but didn't get one please email Crystal ASAP)
Next week students will do a walk-through of the transformation for the team

Backburner

What is a ballpark estimate of the size of our output data for the initial transformation?

March 12, 2025

See time zone conversion Meeting norms Present: Crystal, Adam, Ebe, Sita, Tynan, Laura, Sara, Deborah, Junghae, Doreen Absent: Time: Sara Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Crystal still talking to OCLC about record reuse
IFLA presentation next week!
LD4 conference dates were announced for summer: we should make a proposal
Crystal and Sofia have IFLA next week: meeting canceled
Aggregates transformation code: Are any snippets ready for prime-time? We'd like to include some in our slides for IFLA. Slides are due today, so if not we will scrap the slide.

Uniform titles (20)

130, 240, series (830): Deborah - Assume single expression unless it's aggregating where date is treated at the work level.
6XX, 7XX, 8XX is handled.
130 & 240 are the problems. Preferred title creates AP but how to match AP in authority files. Sita did the mapping, and Laura is doing the mapping review. During the early stages, AP is not in consideration. Will work on it asynchronously.
Attributes table needs more work. AP Mapping Table tells the field and combination.

Mapping Review Check-In (25)

Linking fields: RDA Registry labels vs. MARC21 labels: Deborah's chart and review of those fields
- Decision will be made via poll or async discussion.
- Option: Using MARC21 Label or Print Constant Label or Column C in Deborah’s Table
- Option: Using column D RDA Registry Label in Deborah’s Table
- Option: Using column E in Deborah’s Table
- Note: PCC labels not an option because they are incomplete.
Assigning review for "awaiting review"
- 7xx will be reviewed once we decide what to do with the labels and replace them.
Review assignments for "review in progress"
Any fields yet to be mapped in BSR?
Revisit deadline (end of month)

Review "asynchronous discussion needed"/"meeting discussion needed" label use and go through tags with those labels (25)

Once an async discussion is resolved, the async label should be removed. The only issues left with the "async" label are attributes and 240 uniform title.
Go through discussion to see if async discussion is needed. If you put the label on, you should put questions you have in and tag people directly.

Wrap-up (5)

Action items:

Crystal will create discussion/poll and we will vote on which label to use in Deborah's table for linking entry fields.

Backburner

March 5, 2025

See time zone conversion Meeting norms Present: Absent: Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Crystal still talking to OCLC about record reuse
IFLA presentation still in progress
LD4 conference dates were announced for summer: we should make a proposal

Mapping Check-In (20)

045: the questions on the issue page are relatively new (from Jan)
- Requires a translation table that we don’t have
- Laura will do her best to make her own judgements
758: Gordon has given good advice on this, Ebe is nearly there, overcoming some hurdles putting everything in the spreadsheet
Mapping spreadsheet X00: with these fields we have access points and attributes
- We need mapping spreadsheets that put the attributes together with the entities that show how we are mapping the X00’s in the transformation
- This is separate from the access points
- How is this different from the spreadsheets for the X00’s
- We are going to replace the old X00 spreadsheets; although some of them have been actively maintained
- E.g. Cypress and Penny maintained the 100
- Laura: suggests that we do individual tag spreadsheets for 600, 700 etc. because they are meaningfully different enough from each other
- Decision: it is ok to map these individually. Jian is reviewing the individual spreadsheets and making sure they have been mapped once. Then we will close this.
857: Ebe is continuing working on this – moving forward after feedback
Updating mapping sheets for augmentation aggregates:
- Crystal and Deborah meeting about this, it’s a big task
Compile list of abbreviations
- Sara is on this
$7: Ebe is suggesting we postpone this to phase II
- Adam: we can use it, but unaware of anyone who has used it. PCC doesn’t have any policy/guidelines or training about it. Doubts LC has implemented it
  - Suggests putting this off to phase 2
  - Crystal uses it to determine open-access
  - We haven’t dipped our toes into data provenance at all, so we need to leave this for phase II
Attributes table
- Almost done, waiting for decision
- Sofia is working on this
Almost done work:
- 7 bibliographic level: code ‘m’ mixes static and diachronic works
  - Leader 7 is a mess, it does not translate cleanly into RDA
  - We use it in order to do things with it
  - Only thing we can do is put it in its category
  - An integrating resource is a work
  - As we do aggregating work, category of work is “collection work”
    - Doesn’t need a subunit because we would show that it has a parent
  - Adam: do we have to use RDA vocabulary? Maybe we map to the MARC vocabularies
  - Laura: for 006 and 7, these are values that we are using to determine certain conditions; in MARC they are valuable
    - If we ever have to map back to MARC we need these
    - Adam: impossible to map back to MARC
  - We could create Wikidata items, have a table that indicates the URIs
    - Ebe is interested, but would like some guidance
To-do category
- 400, 411: should be doable because we’ve already done the 490

RIMMF Output Data Review (the rest)

For full overview, see Deborah and Sara’s documentation:
Deborah working on a document to post if we want to use RIMMF for reviewing
Install/update RIMMF
- Help → check for updates
- To install RIMMF:
  - Go to the site (https://rimmf.com/w/doku.php?id=rimmf6:start), click download
Run the .exe
Import the file for review
- You can find review files here: https://github.com/uwlib-cams/MARC2RDA/tree/main/Working%20Documents/transformationCode/outputDataForReview
- Set up files to test the aspect that we are looking for – makes reviewing simpler
- Work with .nt file for RIMMF
- Download the file – make sure to click on the folder (left-hand side) rather than the commit information (center)
- Go to Tools → import entity records → make sure the External data button is checked
- Then drag and drop the file onto the interface
- Go to tools, load entity index
- Can sort indices by entity
  - Start with the manifestation and work up
  - Suggestion: filter and only look at manifestations in the index
  - Sort alphabetically
Click on a manifestation to take a look at it
- Comes in the same order as the triples
- Options → sort by element label
- Open MARC record so that you can compare
  - “Manifestation described with metadata by” takes you to the metadata work, the MARC record is a Note on manifestation
- With the records side-by-side, you can compare the fields, which will depend on what has been mapped
- When reviewing, only looking for things that came over unexpectedly, not looking for cataloger error
- For example: do you need publication statement?
- 2 works and 1 expression is augmentation aggregate
- Normally go from manifestation to expression
  - Not much there
  - All we have in augmented work is appellation data
  - We have link back to manifestation and link up to work
  - This is the augmented work, not the aggregating work
  - The identifier is the local part of the IRI
- Title of expression did not come over because it has not been mapped yet
- Related person of work is in here, but this is an error; we were not supposed to map any agents or related works to the primary work because we do not know whether they are actually related to the primary work
- We need to set up a review process where we can give the IRI or the access point (within RIMMF we can give the RIMMF identifier)
  - That’s the best thing to put down as a header
- Go back to the manifestation and click on work
Review: how to open RIMMF record
- Tools → entity index → double click
- File → close all records
  - This closes the open-records, but leaves the program open for you to look at new records
RIMMF will show any appellation, need to know the RIMMF id
Cypress put mapping from access point table into the code – we have to edit this to make any changes
Crystal: we should explore data we’d like to explore in RIMMF next time
- Let’s put the import instructions in GitHub
- Put the RIMMF instructions on the project WIKI
- Identify which kinds of datasets will be most helpful to review
- The large chunks are too overwhelming for review – we can’t take in 25K entries!
- Need an asynchronous discussion about output review – output the data from ALMA or OCLC and experiment with it

Wrap-up (10)

We might do a demo of how to run the transform during a working meeting
We have mapping work assigned and asynchronous discussions that need to be had
Mapping review deadlines
On Fridays we can update a new transformed dataset, decide during Wednesday meeting which files we want uploaded

Action Items

Put thoughts into discussion on output datasets for review

Backburner

We should probably have the transform team walk the rest of the team through the transformation code. How to run it, where the functions are, etc., so that everyone knows how to look things up and is capable of running it independently/can help Tynan onboard new students in future

February 26, 2025

See time zone conversion Meeting norms Present: Crystal, Sara, Junghae, Doreen, Laura, Sita, Ebe, Adam, Jian Absent: Gordon, Deborah, Tynan Time: Ebe Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Deborah is out of town until March 4
Crystal is in touch with OCLC about permissions for using metadata exported from there rather than downloading from LC. They're checking
Next week: data review in RIMMF
- Install own instance for ease of following along and testing on own - RIMMF 6
- Crystal will try to install and import. Will capture and share instructions if they're not already available
- Ebe recommended the help content's usefulness
Sofia and Crystal are drafting the IFLA presentation
- They will share with Laura and Ebe for review
- Ebe and Laura are listed on the program as presenters, but they are co-authors and will not be present. Crystal will clarify with IFLA.
Laura: Question on gathering content for RSC Review
- Use RSC Question label on the issue.
- Make sure context and question are clear in the issue (or be prepared to get an email asking about it :) )

Google Drive Space (15)

Within 18% of space limit left
300MB per recorded meeting
Need to start a meeting recording archiving process: Crystal can start moving things to UW OneDrive
Start with 2023-August 2024? Then every 6 months do another 6 month chunk?
Ebe: thought the plan was to keep records for ~3-6 months and then delete?
Yes, retain for 2-3 months unless something is especially interesting (meeting notes)
Option to save only the transcript instead? Yes, some have, some older do not.
Could then save meetings for a year
Everyone - share any reactions in the next 1-2 weeks before moving ahead with implementing.

Mapping check-in (45)

Meeting discussion/Asynchronous discussion needed on mappings
- 535: Laura will confirm status is accurate
- 240: Laura's been working on this. Jian is doing a review on 130. Attributes table needs more work first. Deborah will work on it when she returns. Put any related issues on hold and make a note in the issue for tracking.
- 070: Crystal will ask Amanda Xu at the National Agricultural Library (NAL)
- 018: Laura notes it's an identifier for articles, relevant for making photocopies, but difficult to find information on its use. Adam and Ebe agree that it's not RDA, is administrative metadata - doubts about usefulness of recording the data in RDA, but no big concerns. Decision to map using a text string.
- 843: Holdings format tag that can be used in a bibliographic record. Complicates reproduction picture (isn't in any PCC documentation), but indicates a specific copy is a reproduction. Potentially useful in a scenario where a library's original is destroyed and a copy of a copy is required; though without holdings information can't say. Agreement to move this to Phase II with other 8xx tags.
- 773: Crystal wanted Cate's input on whether there's consistent usage that would allow mapping to anything other than note on manifestation. Deborah noted that previously the group decided that (for Phase I) we should map all values from the 76x-78x fields as Manifestation: note on manifestation. Ebe did 760 with Deborah as a test run. Laura can use what is in the Amended worksheet as a template. For display prefix, decided to use MARC's display constant "Main series: " - main series means something to a user, but note on manifestation doesn't to anyone unfamiliar with RDA. Updated 760 transformation notes to reflect the same. If look at in Phase II could choose to be more granular.
- 720: Why are the $0, $1, $7 subfields here? It's a standard number, but not an authority record. Why no 2? Adam thinks it was originally in the proposal but taken out for more consideration. Could be because the source is indicated within (e.g., imdb, discogs)
  - $0 - source that has URI that represents the name but isn't modeled as a RWO (720 ##$aKevin Gray(discogs)a312098; 720 2#$aThe Other Baby$4prn(imdb)co0776444)
  - $1 - uncontrolled name in it and wikidata uri for person/corporate body (##$aLiliana Essi$1http://www.wikidata.org/entity/Q19760388; ##$aTshul khrims rin chen$1http://viaf.org/viaf/22550486)
  - $7 - just the provenance information
Anyone need help? Anyone available to give help?
- Ebe thinks she can get the rest of hers done; most are linking fields; 410/411 can probably be copied over from the 700s; may be down to the wire
- Laura working on 045. Time periods expressed in different ways, with a variety of subfield combinations. Crystal asked whether an Orbis Cascade standing group has something for this already. Adam noted that the field is pretty much obsolete at this point, and that EDTF is used in 046. Sita noted it is like 008; use a MARC table and link it with code? Adam noted coverage of content has no range. Group will continue the discussion in the issue.
- 765 assignment updated to reflect that Ebe is working on it.
- Appendix J: if OCLC haven't defined it yet and it's not being used, is it possible to postpone to Phase II? Adam says put $7 off to Phase II - no one's using it, Ebe hasn't implemented it. Yes, move to Phase II.

Wrap-up (10)

Share thoughts on Google Drive storage
Mapping deadline is in 2 days - February 28
Mapping review deadline is in 1 month - March 31
Download RIMMF 6

Action Items

All - finish mappings
All - download RIMMF
All - start working on mapping reviews
Crystal - contact Amanda Xu

Backburner

February 19, 2025

See time zone conversion Meeting norms Present: Deborah, Adam, Crystal, Jian, Laura, Junghae, Doreen, Trina, Sita, Ebe, Sara Absent: Gordon, Sofia, Tynan Time: Ebe Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Respond to "asynchronous discussion needed" tags!
See project roster updates - is everyone's job description current? Would Trina like to be added?
- Trina would like to be added and will send something over.
- Email Crystal if anything needs adjustment.
We can use UW's Dryad for output data parking
- Crystal has an ORCID iD we can use
- Appears simple, open, stable spot for initial parking
- Not editable directly, but can download, manipulate, re-version, and update it
- Size limit is per file. Will need to chunk, which is common (LC, Harvard, likely convenient for users)
- Chunking strategy needs discussion. Aggregate types, then WEM?
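Because the size limit applies per file, one simple option is to split the output at line boundaries. The sketch below assumes the output is N-Triples (one triple per line, so any split at a line boundary is still a valid file) and groups lines into chunks that stay at or under a byte limit. The function name and the limit value are hypothetical, and it deliberately ignores the open question of grouping by aggregate type or WEM.

```python
def chunk_ntriples(lines, max_bytes):
    """Group N-Triples lines into chunks of at most max_bytes (UTF-8) each.

    A single line larger than max_bytes still gets its own chunk,
    so callers should check for that case separately.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)
    return chunks


sample = ["<s1> <p> <o1> .\n", "<s2> <p> <o2> .\n", "<s3> <p> <o3> .\n"]
for number, chunk in enumerate(chunk_ntriples(sample, max_bytes=32), start=1):
    print(number, len(chunk))
```

A size-only split can separate triples about the same entity across files; sorting the triples by subject before chunking would reduce that, at the cost of an extra pass.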
Crystal spoke with Christine E from Harvard about Dataverse data and emailed Jeff M from OCLC again about using OCLC data rather than downloading from LC
- Harvard does have an agreement with OCLC - Crystal seeing if can make the same deal
- Policy looks like something UW could do too
Sara and Doreen are graduating in June. If another institution can hire XML coders, now is the time.
Deborah not available next week after Wednesday: back on 4th of March: finish aggpulls prior to 28th?
- Crystal, Deborah, Tynan to meet for status update

Reconciliation and Deduplication Timing (30)

Phase I or Phase II
Works, expressions?
Manifestations?
Aggregates?
Reproductions?
Subset?
URIs are using approach that mimics access point and appends to the end of a stub URI and attempts to dedupe Manifestation, Work, Expression that way
Some are creating merges that aren't the same things
Deborah showed an example of what is happening in RIMMF that is an issue with a video recording
- Two-dimensional moving image has additions to its soundtracks (e.g., music, speech, subtitles, closed captioning, special features, etc.) - but actual film is the same for all
- Simplest way to handle could be exclude for handling in Phase I, as have done for sound recordings
- Historically, tell if it is silent, otherwise assume there's speech. RDA hasn't raised with movie community - and whether should be one for spoken word or two-dimensional moving image. If add performed music now have moved into aggregating
Laura agrees this needs to be sorted out if trying to be perfect, but we're not trying to be. Stumbled on AV issues, which is just one of many dealing with
- Changed position to ok with Phase I duplicate IRIs, but tell people why we're doing it, that ultimately don't think this method is final, and is a work in progress. The substantive conversation about reconciliation after conversion should come in Phase II. It's great work, and also don't want to oversell it.
Adam agrees probably good to show bad data, explain it, call attention to it. Well aware in Phase I creating dupes that can't be deduped or incorrectly merging, and suggest what can be done to improve results
- The write-up can be a series of case studies of different transformations: what went wrong and why
Laura asks if it's possible to have code with version that has both options so if want to do their own reconciliation/deduping can try
- Depends on whether Cypress had coded and commented out coding that used opaque IRIs or not
Decision: Go with Laura's suggestion with disclaimer
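To make the two options concrete, here is a minimal sketch of a minting function with a switch between access-point-based IRIs and opaque ones. It is not the project's code: the function name, the `opaque` flag, the use of UUIDs, and the normalization rule (lowercase, ASCII letters and digits only) are assumptions, and the stub is the fake base used in the project's example IRIs.

```python
import re
import uuid

# Fake stub taken from the project's example IRIs; not a real domain.
STUB = "http://marc2rda.edu/fake/transform/"


def mint_iri(entity, access_point, opaque=False):
    """Return an IRI for an entity, either access-point-based or opaque."""
    if opaque:
        # Opaque option: never merges, so every description stays distinct.
        local = uuid.uuid4().hex
    else:
        # Access-point option: identical access points yield identical IRIs,
        # which dedupes but can also merge things that are not the same.
        local = re.sub(r"[^a-z0-9]", "", access_point.lower())
    return f"{STUB}{entity}#{local}"


first = mint_iri("man", "Speaking of Jane Austen. 1980. University Microfilms")
second = mint_iri("man", "Speaking of Jane Austen. 1980. University Microfilms")
print(first == second)
```

The comparison shows the trade-off discussed above: the access-point option merges any two records whose access points normalize to the same string, whether or not they describe the same thing.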

Mapping Issue Check-In (15)

Redistribution Needed? Reports needed?
Ebe is doing an intensive review on hers this weekend. Will update early next week if need any redistributed. Has been doing work offline
Update mapping sheets for augmentation aggregate changes #483 - Crystal taking over from Cypress, will likely need assistance from Deborah
751 - Sita working on, close to done
Mapping Syntax Spec - was intended to be machine-readable, but that is now out of scope for this phase, so instructions/decisions is fine
765/767 - those are notes - Ebe should be able to knock those all out in a batch
Mapping Spreadsheet for X00's - Jian reviewing mappings for 100; will investigate this issue
130/240 - Laura and Sita connect on what's needed/who take lead

Issues: meeting discussion needed (25)

770 - can be supplement to monograph or monograph except - Ebe looking at as part of her batch
525, 041 - discussion not needed, removed label
336 - question on handling $3. Will use 3xx with $3 present decision. Sara will update Decisions Index to explicitly say it is a note on manifestation
245 - Junghae will review
Things to report to RSC - Laura looking for records of naturally occurring objects. Herbarium specimens in OCLC or Smithsonian? Or Harvard?

Wrap-up (5)

Action items

All - Review own issues with "asynchronous discussion needed" tags to confirm tag is needed/accurate
All - Respond to "asynchronous discussion needed" tags
All - Discuss output data parking chunking strategy
Crystal, Deborah, Tynan to meet for status update on aggpulls
Sara will update Decisions Index with
- Reconciliation and Deduplication approach decision and
- Updated 3XX with $3 decision to explicitly say note on manifestation
Doreen/Sara - run a small sample set of records for group data review? If don't have many issue discussions/decisions to make

Backburner

February 12, 2025

See time zone conversion Meeting norms Present: Deborah, Cypress, Ebe, Doreen, Gordon, Laura, Sara, Crystal, Junghae, Sita, Jian, Tynan, Trina, Adam, Sofia Absent: Time: Sara Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Cypress: Code for augmented aggregates mostly finished and metadata for MARC record finished.
- Deborah can show what it looks like in RIMMF.

IRIs (10)

ITSDS could redirect web requests from https://domain-to-be-defined.lib.uw.edu/ to a web site of your choosing. The way this would work is, any request for that domain name would be redirected to a web site of your choosing. As a specific example, if you used GitHub pages, a URL like https://rdf-metadata.lib.uw.edu/xyz could be redirected to http://uwlib-mig.github.io/rdf-metadata/xyz
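The mapping in that example can be sketched as a small function that keeps the requested path and swaps in the target site. This only models the URL rewrite; how ITSDS would actually configure the redirect is not known, and the function name and default target are assumptions based on the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url, target_base="http://uwlib-mig.github.io/rdf-metadata"):
    """Map a request URL on the library domain to the target site, keeping the path."""
    requested = urlsplit(url)
    base = urlsplit(target_base)
    # Append the requested path to the target's base path.
    path = base.path.rstrip("/") + requested.path
    return urlunsplit((base.scheme, base.netloc, path, requested.query, requested.fragment))


print(redirect_target("https://rdf-metadata.lib.uw.edu/xyz"))
```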
Five-Star Decision: Crystal and Laura: whether to pursue five-star now or later.
Data Storage Concerns:
- Laura: Where will data reside if pulling from GitHub? Concerned about large records.
- Crystal: Agreed—bulk storage needed, not per-entity.
- Deborah: Need a web domain, data storage (triples/RDA registry), and domain maintenance (~maybe $300/year, possibly via donations).
- Laura: Concerned about minted IRI. Triple store?
- Crystal: Thinking about one web page, UW won't pay for triple store.
GitHub Limitations:
- Sara: GitHub limits files to 100 MiB; repositories should ideally be <1 GB, with <5 GB strongly recommended.
- Crystal: GitHub isn’t viable; will explore institutional repository options (ask ITS, Denise, or Preservation).
Next Steps:
- Decide where to store and manage data. Crystal will inquire about UW’s institutional repository.
Anyone interested in meeting with ITSDS at UW about the IRIs with Crystal? Need to figure out how they will work
Reminder from Crystal: Complete mapping before the deadline.

Review output data

IRIs coming out as expected?
AAM tests
On De-duplication Challenges
- Crystal: Current deduplication approach is rushed and requires a more thoughtful method in Phase II. Mushing things together is worse than duplicated data.
- Deborah: De-duping is extremely important to show the importance of RDA (Entity-relationship). This is just a test dataset.
- Ebe: Is it feasible that we split the files and run de-duping them differently? I.e. De-dup ebooks only and not videos because of mentioned issues with a particular media? — Compromise?
- Adam: The priority is not deduping incorrectly. Ebe's idea is good: we can dedupe more reliably if we can figure out which categories those are.
- Ebe: Even if it's bad merge, worth doing deduping. Agree with Deborah. Make it less Error-prone in Phase I, especially because it is a test database.
- Avoiding premature deduplication may be preferable to ensure accuracy. Continue discussion next week or maybe a poll.
On MARC Metadata Storage
- MARC records are being stored as literals in RDF (note on manifestation), but it's difficult to read.
- Options discussed include storing raw MARC record, converting it to more readable format as turtle has linebreaks, or linking to an external host.
- Note on manifestation is what we have discussed before and did.
- Cypress: For review, looking field by field is more helpful, and the MARC fields are still in the comments of each field's template.

Wrap-up (5)

Action items

Crystal will figure out if we can store datasets in institutional repository at UW. (Adam: Ask maybe Denise or Preservation?)
Discuss the timing of reconciliation and de-duplication next week.

Backburner

February 5, 2025

See time zone conversion Meeting norms Present: Deborah, Cypress, Ebe, Doreen, Gordon, Laura, Sara, Crystal, Junghae, Sita, Jian, Tynan, Trina Absent: Sofia, Adam Time: Ebe Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Crystal is sending dataset numbers to OKG/NLG today
- Emory and DNB will not share records currently; unclear whether rights restricted
- Have not heard back from Harvard yet
UW Libraries decided not to fill Cypress's position: if any other institutions can hire an XSLT coder for Phase II that would be helpful
- Written feedback can be emailed to Crystal
- In the meantime, Doreen, Tynan, Sara will pick up where Cypress is leaving off

Mapping Check-in (15)

To-do vs. In Progress vs. Done
- Issue Board
- Ebe has some to look at and will start today; will let the group know if help is needed
- Linking data will all be mapped as notes in Phase I. Will be revisited in Phase II
- Laura has 3-4 issues that need some discussion
- ALL: Use labels when discussion is needed: "asynchronous discussion needed" or "meeting discussion is needed"
- Adding "asynchronous discussion needed" to 720 regarding $7
- Watch for issues with status:"Almost done - waiting for decision/answers to questions" - Cypress moves an issue here if questions come up while coding
- Try to get what we can to "Ready for Transform" this week for Cypress
Timelines
- Mapping: February 28, 2025
- Mapping review: March 31, 2025
- Transform code: April 30, 2025
- Output review: May 30, 2025

IRIs for entities

Identifying manifestations reliably
Documents reviewed:
Initial thinking was that transform would be one run; has evolved throughout project to run iteratively
Want to reduce duplicates on re-runs, while also acknowledging that full deduplication is out of scope for Phase I and will be tackled more comprehensively in Phase II
Deborah created Access point mapping table that works really well. Manifestations are complicated.
Discussed and reviewed proposal to use 016, 035, 010, then AAP approach as a last resort towards unique IRIs
Examples: from m2r iris and identifiers documentation; will add source string in the IRI to reduce instance of using the same identifier from different sources
- http://marc2rda.edu/fake/transform/man#00037837
- http://marc2rda.edu/fake/transform/man#ocolc1544994
- http://marc2rda.edu/fake/transform/man#speakingofjaneausten1980universitymicrofilms
Deborah proposed adding normalized AAP string to lessen number of hits deduping
Gordon agrees this is the best solution; the suggestion to add control numbers makes the IRI more likely to be unique
Switch thinking a bit and decide which components the AAP part should contain, then translate from ISBDM to MARC codes. This will produce alarmingly large IRIs, but they are more likely to be unique
New Manifestation IRI Proposal:
- AAP + Control Number approach:
- normalised({has title proper})|[supplied title] + " (" + {has date of creation of manifestation}|{has date of copyright of manifestation} + "; " + {has creator agent of manifestation} + "; " + {has category of carrier} + ")" + (“+ {BNB#####}|{OCLC#####}|{LCCN####})
- Find carrier type in the mappings*
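The AAP + control number pattern above can be sketched roughly as follows. This is an illustration only: the real transform is written in XSLT, and the normalisation rule (lowercase, keep only letters and digits), the function names, and the sample values are all assumptions rather than project decisions.

```python
import unicodedata
from urllib.parse import quote

# Base taken from the example IRIs in these notes.
BASE = "http://marc2rda.edu/fake/transform/man#"


def normalise(value: str) -> str:
    """Assumed normalisation: decompose, lowercase, keep letters and digits only."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def manifestation_iri(title, date, creator, carrier, source, control_number):
    """Concatenate the normalised AAP components, then source + control number.

    Empty components are skipped; the source prefix (e.g. 'ocolc') keeps the
    same number from two different sources apart.
    """
    parts = [title, date, creator, carrier, source + control_number]
    local = "".join(normalise(part) for part in parts if part)
    return BASE + quote(local)


# Illustrative values only, not taken from a real record.
example = manifestation_iri(
    "Example Title", "1980", "Doe, Jane", "volume", "ocolc", "12345"
)
```

Dropping all punctuation keeps the local part stable across records that differ only in ISBD punctuation, at the cost of long IRIs, as noted in the discussion.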
For items, want a unique IRI every time transform runs
With XSLT, the generated ID is unique within a run, but on re-runs the same number gets used again, so there is a danger of incorrect merges
Currently use manifestation ap when minting IRI to help prevent duplicates on re-runs
- {BASE}{RECORD}ite#{manifestation ap}{generated_id}
- e.g. http://marc2rda.edu/fake/transform/ite#00514962d22e1607
- http://marc2rda.edu/fake/transform/ite#cassandra%27ssister2006walkerd22e163598
Cypress proposes using the date instead of the manifestation ap to better ensure uniqueness
Cypress will implement, and team can review output
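A minimal sketch of the date-based idea, assuming the date is a timestamp of the transform run inserted between the record number and the generated ID. The exact position and format of the date were not decided in the meeting, so `item_iri` and its layout are hypothetical.

```python
from datetime import datetime, timezone

# Base taken from the example IRIs in these notes.
BASE = "http://marc2rda.edu/fake/transform/"


def item_iri(record_id: str, generated_id: str, run_time: datetime) -> str:
    """Mint an item IRI that differs between runs even if generated_id repeats."""
    stamp = run_time.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    return f"{BASE}ite#{record_id}{stamp}{generated_id}"


# The same record and generated ID on two different runs yield two distinct IRIs.
first_run = item_iri("00514962", "d22e1607", datetime(2025, 2, 5, 9, 0, 0, tzinfo=timezone.utc))
second_run = item_iri("00514962", "d22e1607", datetime(2025, 2, 6, 9, 0, 0, tzinfo=timezone.utc))
```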

If time, look at Jane Austen data more fully

Reviewed jane-austen_NA.ttl
We can tell that records were deduped to a single work when we see multiple 008s and 245s, which indicates there were multiple records
Looked at line 72961, marc2rda.edu/fake/transform/exp#aikenjoan1924-2004eliza%27sdaughterenglish, and how to trace what's there through the files
Cypress will create a discussion for this review
Cypress will update the lexicalalias files today
There is no limit on the length of the title in 245

Wrap-up (5)

Action items

A survey asking about the important decisions from today?
Crystal will send dataset numbers to OKG/NLG today
Cypress will implement using date instead of manifestation ap in item IRI
Cypress will create a discussion for this review of the Jane Austen data
Cypress will update the Jane Austen lexicalalias files today

Backburner

January 29, 2025

See time zone conversion Meeting norms Present: Absent: Crystal, Gordon, Adam Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Cypress' last day is February 13th.
- Priority is getting documentation out so that others can pick up
- See google drive folder below:
Transform documentation is here (and in progress). This includes how certain aspects within the transform work, as well as broad overviews, a transformation intro for onboarding, and instructions on running the transform.
- Also available as a Read.me in the folder for the transform
BIBFRAME Update Forum - might be interested in the "Modern MARC" section which LC's back-converted BIBFRAME will follow. https://www.loc.gov/bibframe/news/bibframe-update-jan2025.html

IRIs (25)

IRI transformation documentation
Discussion - Minting IRIs
Discussion - Designing our IRIs Deborah and Cypress met to discuss what the transform is currently doing, we need to decide what we want it to do
- Minting IRIs versus using external IRIs
MAIN WEMI
- At present the IRI is constructed as the base IRI + control number for 001 + type of entity
- When we are sharing records from a variety of sources, we want to prevent having the same IRI applied to a different entity from a different source
- This wouldn’t happen internally because our control numbers are unique to our system
- We should be applying the same instructions from related entities to the main entities
Related RDA entities
- e.g. Creator of work, or a work that another work is based on
- Unreliable to map these related works and expressions (agents are fine)
- We only map every work-added entry as a work
- We don’t have many approved IRI sources
- In NACO authority file, for example, we have only approved for corporate bodies, families, and places
- The sources need to be improved so that they can be approved
- When using $1 with an approved source: the only thing we would add to an external IRI is a relationship to an access point
  - We add a triple with “has access point” or “authorized access point” along with the string or the nomen
  - This is important for related entities because we can’t trust what’s in the MARC data
  - For the main entry: the attribute information we have may not be available in the related IRI description set
- Brief detour to Jane Austen and RIMMF:
  - We have duplicates – these should all be a single entity for that work
  - The duplicates are coming from related work added entries
  - We are giving bib control number, no meaning in a display
  - We have many records with Jane Austen as the title when you bring the records into RIMMF and try to show them in an index form
- $2 is similar to $1, but we have a source for the literal in the MARC record
  - The source is approved – same list
  - We don’t have an external IRI
  - Pattern for minting our own is in the document
  - Authorized access points have to be mapped so that they can be used as a concatenated, normalized string
  - Purpose is to do some automatic deduplication: if the entire RDA triple is the same, it automatically de-dupes
- Worst case/most common: neither $1 nor $2 is present:
  - We mint a non-meaningful IRI that appends a running count at the end
  - We are ending up with many duplicates
- Sofia: If two records describe the same work, but with different information, in the future it might be hard to map them
  - Deborah: We’ll either be mapping the two descriptions as separate work entities
  - Or we’ll have found a way to make them match using the local part of the IRI (instead of using the 001 and entity label); we may instead use the authorized access point for example
  - The mapped work from two different records will then have the same IRI as subject – if a triple is absolutely identical in both sources, only one is kept automatically
- Laura: if the source library has been doing authority maintenance, then the access points will be the same
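The automatic de-duplication mentioned above follows from an RDF graph being a set of triples: once two records mint the same IRI for a work, any identical statements collapse when the graphs are merged. A small sketch using plain Python sets; the `ex:` names are hypothetical placeholders, not RDA Registry properties or project IRIs.

```python
Triple = tuple[str, str, str]


def merge_description_sets(*description_sets: set[Triple]) -> set[Triple]:
    """Union the triples from several records; identical triples are kept once."""
    merged: set[Triple] = set()
    for triples in description_sets:
        merged |= triples
    return merged


WORK = "ex:work/austenjaneemma"  # hypothetical shared IRI built from an access point

record_a = {
    (WORK, "ex:hasPreferredTitle", "Emma"),
    (WORK, "ex:hasCreator", "ex:agent/austenjane"),
}
record_b = {
    (WORK, "ex:hasPreferredTitle", "Emma"),  # identical to record_a's triple
    (WORK, "ex:hasLanguage", "English"),     # new information from record_b
}

merged = merge_description_sets(record_a, record_b)
```

Here the title triple survives once, and the differing statements from each record both attach to the same work. If the two records had minted different IRIs (for example from their 001s), nothing would collapse.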

WEM Access Points (25)

Jane Austen records in RIMMF
Access points table
- Are we creating access points for the main work and expression? Gordon said that the identifiers are sufficient
- Last week, however, we thought it might be important to have access points displayed for human readable purposes
- Agents have been done (100, 600, 700)
- When mapping over person-entities from authority file, what did you do at NLG about presence of fictitious characters?
  - If fictitious characters used as pseudonym, treat as nomen (e.g. related nomen or work)
  - If it is a subject: treat as skos:concept
  - We need some list of texts
  - If we only provide as an access point for the person, corporate body, or family, then we aren’t doing something against RDA, can be processed with human manual intervention
  - For 100’s and 700’s, then we understand it is a nomen used by a person
  - We can understand it as a related nomen – at NLG they used related nomen as the element instead of creating an access point to the person you’ve created the entity for
  - This may need to be a phase II problem
- Single works: names + titles if they are in a 600 or 700
  - If a person is a subject in a 600, it is still the same person
  - Source is subject heading, but person is same entity – a person
- Cypress: should we map from the 130
  - The table is written in order of priority – if the 130 is there, use it, if not, move on and use 100 + 240 etc. up until using the 245
We get the access point being described by the record from the fields and subfields in the left-hand column
We are putting together an online poll to hear from those who couldn’t make it today
General consensus from the group present today is that this is ok
- Must retain the order of the subfields given; we strip all of the punctuation for this purpose and decide what to use between subfields later
- LC is stripping out ISBD punctuation
- For expressions (single expression in this manifestation) the access point for the expression will be the work plus the RDA element for the expression
  - If we only have 245s to rely on, they will never contain expression elements from the heading
  - We have to find them from the body of the record
  - We can find this in the spreadsheet mapping
- Manifestation:
  - No access point in ACR thinking
  - We do what ISBDM is using in the same order and not worry about punctuation at this point
- We need to make a decision on this before Cypress leaves!
- If we are in agreement, Cypress can work on it and then add it into the code when we get final approval

Nomens for Entities with Sources (15)

"A nomen must be an appellation of one and only one RDA entity", when we are saying that one Entity exists (i.e. http://marc2rda.edu/fake/lcsh/place#england) should we not also be able to say that there is only one nomen for a place from lcsh with the nomen string "england"?
From RDA: a nomen is an appellation of 1 and only 1 RDA entity
- But “England” has 100s of unique nomen entities
- Nomens have IRIs, but no IRI as identifier
- We have a place and many authorized access points for place, “England” from lcsh
- Instead of this list, we would have one de-duplicated one that has an authorized source
e.g. a place nomen for an approved place entity
i.e. use the nomen string as the local part of the created IRI along with the source
Only sources where we know the authorized access point is unique and has an identifier that is the same, then we can use it as a unique identifier
This applies to any unique nomen string
For our approved sources, can we say the access point will be unique?
Looking at the LC NACO, we have approved LC’s authority file for place names, but not for persons
Place names go through the subject path, not the name path
- i.e. goes through SACO, not NACO
Even if in bibliographic record we see a jurisdictional place, it has a different indicator, so we can create a corporate body
Comes down to the principle of having only created one entity
In principle, each of the authorized sources has a uniqueness in the strings that are used
Conclusion: we are okay implementing this, but if we run into issues, it can be undone because we can edit the one function in which the process is implemented
- But we should also bring it up with Gordon

Wrap-up (5)

Action items

A survey asking about the important decisions from today?

Backburner

January 22, 2025

See time zone conversion Meeting norms Present: Jian, Sofia, Adam, Crystal, Cypress, Deborah, Sita, Tynan, Sara, Ebe, Junghae, Doreen, Laura Absent: Gordon Time: Tynan Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Check-in on deadlines: Reminders to do mappings and reviews ASAP so transformation team can get work done
- Mapping: February 28, 2025
- Mapping review: March 31, 2025
IFLA coming up soon
- Crystal and Sofia are presenting on the project in March! Continued progress helps with what they can put together to present.
Ying-Hsiang handing off Wikidata code to Cypress and Tynan on Friday
Laura shared that Harvard is publishing Alma data CC0 via Dataverse
- Fairly recent - February 2022
- https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/I8L0ZZ
- Crystal will reach out to Christine Eslao and ask about reuse, re-publication, mixing & matching, whether can follow suit (e.g., if they're doing this from OCLC, can we?)

Dataset Numbers: Crystal needs to send to OKG/NLG this week (10)

LC: 500K random records (Crystal emailed Theo for tips on how to download; once hear back from Theo will reach out to OCLC)
UW: 545k records (original UW-authored records; same provided to LD4)
NLG: 700k records
NLNZ: 600k
Emory: TBD (Laura has reached out and is waiting to hear back; Crystal meeting on Homosaurus topic and can ask then)
DNB: TBD (Sita and Crystal have been emailing; Sita will update when knows more)
How will be used:
- Making a proposal to the Ministry of Education in Greece to try to secure a grant to expand Wikibase to meet our space needs
- Proposing ideal amount of space needed - a conservative estimate of all space will eventually need
- Estimate will be based on a rough estimate of entities per MARC record (rather than triples) - this is the pressure point and balloons fast
- To explore more, read the meeting notes available in the Side-Meeting and Conference-Report Back Notes folder

Transforming Augmentation Aggregate Records (25)

Discussion
Deborah's Document
- Deborah added most updates in the Recommendations section, added examples at the end of the Appendices, and added logic for identifying Augmentation aggregate manifestations under AggPulls.
- Outstanding questions section is for future consideration and discussion by wider community
- Also added additional material in the UW M2R Transforming Augmentation Aggregate Records file linked under Diagrams.
- Initial thinking on SES (string encoding scheme):
  - CToRE = Content type of representative expression
  - LoEoRE = Language of expression of representative expression
The SES for an augmented single work should be the same as for a stand-alone single work, taken from (in order of preference):
- 130
- 1XX + 240
- 1XX + 245
- 245 + 1st 7XX (name portion only)
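The order of preference above amounts to a first-match lookup. A rough sketch, assuming each record has already been reduced to a dictionary of punctuation-stripped strings; the dictionary keys and the function name are hypothetical, and the real transform is XSLT.

```python
# Each entry lists the fields that must all be present, in order of preference.
SES_PREFERENCE = [
    ("130",),
    ("1XX", "240"),
    ("1XX", "245"),
    ("245", "7XX_name"),  # name portion of the first 7XX only
]


def single_work_ses(fields: dict[str, str]):
    """Return the string from the first combination whose fields are all present."""
    for combination in SES_PREFERENCE:
        if all(fields.get(tag) for tag in combination):
            return " ".join(fields[tag] for tag in combination)
    return None  # no listed combination applies; handling is outside this sketch


with_uniform_title = single_work_ses({"130": "Beowulf", "245": "Beowulf a new translation"})
with_creator = single_work_ses({"1XX": "Austen Jane 1775-1817", "245": "Emma"})
```

Joining with a single space is a placeholder, since what goes between subfields is still to be decided.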
What should the SES be for an aggregating work plan?
- 130 + Aggregating work + 1st 7XX + CToRE + LoEoRE
- 1XX + Aggregating work + 240 (if 1XX is aggregator) + 1st 7XX + CToRE + LoEoRE
- 1XX + Aggregating work + 245 (if 1XX is aggregator) + 1st 7XX + CToRE + LoEoRE
- 245 + Aggregating work + 1XX + 1st 7XX + CToRE + LoEoRE
- 245 + Aggregating work + 1st 7XX (name only) + 1st 7XX + CToRE + LoEoRE
- Need to make a decision again on whether or not are making access points to make it clear - add this to next week's schedule and then also make time to implement it
  - Cypress noted from 2024 meeting notes that the discussion was we already have identifiers, so don't need access points
  - Crystal noted identifiers count as an appellation
  - Crystal's opinion, in advance of being out next week, is that we should have access points, though doesn't have an opinion on the SES
- Deborah's preference is that creator is always linked to aggregate, and then also creator with aggregated work if described
- Laura notes: "For the “augmented work” - bear in mind, the Work data may describe many expressions and adding this content type (Primary augmented work, or augmented work) to that Work entity is therefore questionable. It might be just an aggregated work in another manifestation, and standalone in another one."
  - Added to discussion to continue discussing
- Ebe notes: "Personally I would like a string encoding scheme where we put the title first and use the creator as the qualifier." e.g. Animalia (Graeme Base)
  - Noted that RDA doesn't require us to do this, historical practice rather has
  - Deborah sees what saying, but still prefers to keep the entire string
  - Laura thinks access points are useful, but qualifiers make them more useful
- Cypress will add in property numbers alongside "Label (Toolkit)" to make it easier for the transform
- Category: only add Aggregating work (not Augmented, since won't always be true, therefore not safe)
  - Laura worries might be confusing to user as part of access point
  - Adam notes since we're discussing access points, and not authorized access points, they can be undifferentiated
  - Crystal notes we need a SES if we want to include access points
- IFLA did manifestation and work access points
  - Manifestation SES
  - Work SES
  - Crystal thinks should use this for SES
  - Deborah notes they don't have one for Aggregating; Crystal wonders whether unique access points are needed; RDA does have instructions for qualifying access points
  - Definitely can qualify - but question of whether to do it in the same order. Need a survey
Is it possible to transform the way Deborah suggests in document?
- Transform perspective is just concern on time needed to implement. Cypress would like to start by February to make sure it's working properly
Any substantive objections?

Attribute mapping questions

Row 2: person is 0 or 1; 2 included in code just in case it occurs, which should be unlikely; if not 3, then know it's a person
Row 5: series of different mappings for dates; if have a mapping for some of them, should they also map as related timespan of person? e.g., use date of birth and timespan as same?
- Not minting timespans; this isn't a note, just a value
- Jin shi (進士) and ju ren (舉人) dates need to be added somewhere; closest it maps to is period of activity (see: CJK NACO Best Practices)
  - Adam shared a jin shi example: 100 1 Bao, Rong, ǂd jin shi 809
  - Jian shared a ju ren example: Chen, Denglong, $d ju ren 1774
- What if there's a date with no hyphen, how to handle? Deborah suggests related time period; may need to keep for dates with errors and no qualifier
- Sofia asked whether date of birth accepts values like 'circa 1500'. This still needs to be taken into account: what should such values map to? They should still include hyphens
  - Adam shared examples, noting a hyphen in front means it's a death date
    - Aaron, ǂc of Zhitomir, ǂd -approximately 1817
    - Aaron, W. F., ǂd active approximately 1860
    - Abate, Nicolò dell', ǂd approximately 1509-1571
    - ʻAbbās ibn ʻAbd al-Muṭṭalib, ǂd approximately 566-approximately 653
    - ʻAbd al-ʻAzīz Muḥammad, ǂd 1866 or 1867-approximately 1948
- ALL: post examples in the issue
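The $d patterns discussed above can be sorted with a few string checks. This sketch only classifies the value and does not choose RDA elements: the result keys are hypothetical labels, and sending hyphen-less, unqualified dates to a separate bucket reflects Deborah's suggestion rather than a decision.

```python
ACTIVITY_PREFIXES = ("active", "jin shi", "ju ren")


def classify_subfield_d(value: str) -> dict[str, str]:
    """Split a 100 $d string into birth, death, activity, or unqualified parts."""
    value = value.strip()
    if value.startswith(ACTIVITY_PREFIXES):
        return {"activity": value}
    if "-" not in value:
        return {"unqualified": value}  # no hyphen and no qualifier
    # Text before the first hyphen is the birth date, text after it the death date.
    birth, _, death = value.partition("-")
    result = {}
    if birth.strip():
        result["birth"] = birth.strip()
    if death.strip():
        result["death"] = death.strip()
    return result


examples = {
    d: classify_subfield_d(d)
    for d in (
        "-approximately 1817",
        "active approximately 1860",
        "approximately 1509-1571",
        "1866 or 1867-approximately 1948",
        "jin shi 809",
    )
}
```

Qualifiers such as "approximately" are kept in the output string untouched, so the question of whether date of birth accepts them is left open.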

WEM Access points - Are we doing them? (30) - Decided to move this discussion to next week

Currently meeting RDA minimum description requirements, with W and E having identifiers generated from 001, and M having a title.
130 and 240?
Access Point Mapping Table

Wrap-up (5)

Action items

Crystal will reach out to Christine Eslao at Harvard regarding their published Bibliographic Metadata
Cypress will add in property numbers alongside "Label (Toolkit)" in Deborah's Document
Crystal will create a survey on handling/qualifying access points and add Cypress as an editor to see results
All to post examples in Attributes table issue

Backburner

WEM Access points: Next week
RIMMF Demo: Next week?

January 15, 2025

See time zone conversion Meeting norms Present: Crystal, Adam, Deborah, Laura, Ebe, Gordon, Jian, Junghae, Sara, Sita, Doreen Absent: Cypress Time: Sara Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Release 5.2.0 of the RDA Registry downloads was published yesterday (14 Jan 2025). The release notes say 'This release supports the February 2025 issue of RDA Toolkit. This release contains several new object elements with a range of skos:Concept.' The object elements were added following a suggestion from this project to the RSC Technical Working Group, and should already be in use within the transform. We should check the object elements used by the transform against this release.
Crystal: Will be gone on 1/29. Cypress will facilitate the meeting.
Crystal will get back on uploading meeting recordings.

Transformation Dataset (15)

Our initial transformation is happening soon
NLG and OKG need information about how many records, from which institutions, in order to make their proposal to the Ministry of Education for the Wikibase expansion we asked them to do
See: RDA Wikibase Collaboration
Which records from LC will we include?
- Obtaining the "entire" catalog (or close enough): 2016 selected datasets, plus downloading post-2016 records 10k at a time from catalog.loc.gov and deduplicating those that are just updated post-2016.
- Obtaining a certain number of records from catalog.loc.gov, 10k at a time, and downsizing our initial goal
  - Yes to this option. 500k random records.
- OCLC export? Crystal could ask them how they would feel about participation. Don't know about rights.
- NLG: 700k
- UW: count number of UW-authored records in Alma (Junghae will check and share indication rule with Laura) Crystal would like to know how many exist and want all of them. Answer: 544,316 bib records in Alma for which the University of Washington (UW) is the original cataloging agency
- NLNZ: 599923
- DNB: Sita will ask about willingness to participate
- Emory: Laura will check
- Would like to receive answers to these questions by next week so Crystal can get back to our partners.
Additional Discourse: Adam: Can we make a list of records we want in the sample pool? If LC doesn't have them, we can add to it.
- Crystal: Random sample is the safest way. We can establish criteria for what to include. This sample set is for Phase 1 and 2 (including aggregates and non-aggregates).
- Can do this again at the end of Phase II but this is good now.

Linking fields (30)

Linking fields discussion
Deborah's table for these fields: Linking Entry Fields.20250115
Examples: Linking Entry Fields Examples
Gordon proposed: trying to do related entity --> Deborah whether we can do that?
The WEM Entity column shows which WEM entity each linking entry field is supposed to carry information about.
- Even for ones that should be clear, examples are mixture of description of work and manifestation (Because there are no restrictions)
- I.e. 770 Could be expression? Could be work?
- 765 – Should be expression but examples given are expression or work --> can’t tell whether linking entry is for expression, a single-part, or multi-part or aggregating part.
Deborah’s research shows that similar to added entry fields where we came up with a default (related work of manifestation), the best Deborah can come up with is Manifestation related manifestation of manifestation.
- We could trust folks and say it must be a series. Everything that’s not a series is an error
- Adam: Series can include multi-part monograph --> Deborah: would you put it in linking entry fields? --> Adam: If it can be done, someone has done it. Deborah: Similar to 830s we cannot tell, this we cannot tell.
What is the purpose of linking entry fields? And what do they mean in RDA?
- Adam: Meant to link you from one bibliographic record to another bibliographic record --> Literally meant to provide links but never really used that way. No $w because there isn’t actually a bib record for the related work.
Adam: Multi-pronged approach; if there’s more complete data, do one thing, but for fields with no information, do a note. --> Takes a fair bit of coding --> Crystal: It makes more sense to do these as notes for Phase I and do something more granular in Phase II
The majority voted to map these as notes
Gordon: anything that is a note on manifestation MUST apply to all exemplars of the manifestation.

Transforming Augmentation Aggregate Records (20)

Document is in Aggregates Main Folder > CW_DW_AM_Markers.20241113 folder: Transforming Augmentations.20250108.docx
Did not have time to address. Bring questions for Deborah after reviewing Deborah's document next week and Cypress will be here for the full discussion.

Wrap-up (5)

Action items

Crystal will upload meeting recordings to Drive; apologies for lagging behind on this! (This is done!)
Crystal will create a discussion on transformation augmentations (This is done!)
Review Deborah's Transforming Augmentations document and bring questions to discuss.

Backburner

WEM Access points, RIMMF Demo

January 8, 2025

See time zone conversion Meeting norms Present: Absent: Notes: Tynan

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Ying-Hsiang cycling off project, arranging handoffs soon. Thank you for your incredible contributions, Ying-Hsiang!
Doreen is primarily working Fridays and in the mornings during the rest of the week now, and 15 hours per week rather than 19.5 this quarter
We now have Cypress full time (not all on M2R, but more than before)
Crystal will miss the last meeting in January
We had a long follow up discussion regarding 773 (I'm sorry, I lost the notes in a conflicting edit, will go back to the recording to augment), but we decided to discuss next week, so we can dive in further then
Crystal heard back from Theo at LOC: the catalog is not free unless you use an outdated version, and there is not a lot of RDA in it. You can get 10,000 records at a time through the catalog; going 10,000 records at a time we can get as much as we want, although they block bots from doing this and the system will slow you down if you try to automate it, so this has to be done manually or by a slow program. They also have a way to purchase the catalog, but it's very expensive (e.g. $25,000!). Theo recommends downloading 10,000 at a time; a compiled dataset of 100K should be enough
- Deborah: download bulk from 2019 and use the 10k at a time approach for the rest; we would need to de-duplicate the records

Project Plan Review and Update

Project Overview

Problem statement: adding a need to mention differing and non-interoperable ontologies
Goals:
- Deborah: one of the things in the impact should be a description of the entities and their relationships -- this is the main new thing in RDA
- Sofia: move from record-based cataloging to entity-based cataloging
Impact discussion
- How much is a large pool? The available bulk download is from 2019; we can download records 10k at a time
- Laura: we can talk to Jeff at OCLC; where would we host the records -- National Library of Greece Wiki?
  - Would give us a better picture to give people than just using LC's record; could also discover things about the transformation
  - Decision: add this to a discussion for next week
  - Sofia: wikibase database has size limits, asking how to make the storage bigger
- Are we reducing dependency on vendor systems?
  - Laura: in order to demonstrate this reduced dependency, we have to use it in a system that is not a vendor system and provide library services off of it
  - Rephrase to reinforce commitment to open-scholarship
  - Laura: main impact is to demonstrate that RDA can be implemented using RDF directly; there is a path for adopting it for libraries that have a large legacy store of MARC data
  - Ebe: if someone doesn't want to use RDF, but wants to use something else -- should we be specific about the type of encoding?
  - Decision: we don't want to promise that we can help people encode another way
Phase I
- Java extension is not in phase I anymore
  - Instead for phase I we have moved on to having pre-approved iri sources
- Ying-Hsiang, send documentation to Cypress and Tynan for scripts to feed Bibliographic into Wikibase Cloud
Post-Phase I close-out
- We may not need to justify phase II, UW libraries approved
- We can think about grant applications to support phase II,
- We may also consider submitting to additional conferences
- A composition that describes in a granular way what we did for Phase I, why we did it, what the results were; goal to get this published somewhere
  - Deborah's project plan is a good outline for this
  - We may want to have an open-source version of this to make information more accessible
Phase II
- Collection records
  - What will we do with collections? We are pulling them out of phase I; what does RDA need for collections?
- Item-level mappings -- not part of phase I, will be part of phase II
- CSR
  - You can have diachronic works that fall into a BSR (multipart monos/series)
  - Removing machine-readable mapping -- we don't have the capacity for that right now
- BSR
- Guidelines for pre and post processing -- part of our documentation in phase II, we have python scripts to serialize
Timeline
- Close-out is June-August of 2025
- Start Phase II in August
- How much time do we need for review and re-coding? We need to extend the deadline for ending phase I to April 30th
  - Mapping done by Feb 28, 2025
  - Mapping review by Mar 31, 2025
  - Transform code by April 30, 2025
  - Output review by May 30, 2025
- This means starting phase II in September
Deliverables