2025 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki
August 6, 2025
See time zone conversion Meeting norms Present: Absent: Time: Notes:
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- 007 is ready for transform
- Crystal, Laura and Ebe met yesterday about Phase I wrap-up work
Attribute fields spreadsheets (20)
Transformation Output Review (35)
- Aquaculture dataset in RIMMF?
Work Distribution (30)
- Deliverables Work Plan
- Need to have reasonable due dates for everything and divvy up responsibilities
- More eyes on deliverables outline = better
Wrap-up (5)
Action items
Backburner
July 30, 2025
See time zone conversion Meeting norms Present: Absent: Time: Ebe Notes: Tynan
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (5)
- Transform:
- 11 RFT and 8 Waiting for Decision
- Of those, only 3 not currently being worked on by coders
- Also in progress is WEM access points
- 6 of the 8 Waiting for Decision are coded
- Biggest tasks for transform team after this will be code cleanup and adjusting as we review
- Post Phase I wrap up
- We need a structured work plan with benchmarks to meet our goal of September release
- Crystal working on this in the coming week. Does anyone want to meet about it? Probably at least an hour.
- Crystal will email Ebe and Laura to work on this
007 (10)
- Spreadsheets
- Need someone to go through and add transformation notes
- Some might need IRI value
- Problem is with “has note on manifestation” and “has category of manifestation”
- 007 is important for 3 main access point qualifiers
- But lots in there has never been used
- Deborah: only position 1 and 2 need to go into the transformation
- Crystal will work on this today
856 (10)
- Spreadsheet
- Transformation wants to mint second manifestation
- Is this conditional on second indicator?
- If value is 0, then identifier for resource
- Issue: we have marked delete all rows where the indicator 2 value is 0
- So we will change row 667 to “reviewed” so that we can have that case
- “RDA for Uniform Resource Locator - Definition and Scope: An address of an online resource. A Uniform Resource Locator includes all manifestation identifiers intended to provide online access to a manifestation using a standard Internet browser.”
- Decisions:
- Row 688 transformation note: "[$3 value] at: [$u value]"
- Row 678 transformation note: "Related resource at: [value of $u]"
- Row 689 transformation note: "Version of resource at: [value of $u]"
- Row 690 transformation note: "Component part(s) of resource at: [value of $u]"
- Row 691 transformation note: "Version of component part(s) of resource at: [value of $u]"
- For Phase I, the best we can do is map the second indicator 0, when there is no $3, as the URI provided by the MARC record; if there is a $3, then we say location for $3 value and have it be a note
- If there are multiple indicator 0’s, map them both
- So in our mapping, there are URI’s associated with the manifestation
- The problem is that $3 can hold so many different types of things
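A minimal sketch of the 856 decisions above, for discussion only. It assumes the standard MARC 21 meanings of the second indicator (1 = version, 2 = related, 3 = component part, 4 = version of component part) and assumes that the "[$3 value] at: [$u value]" note is the one used for indicator 0 with a $3. The function name `map_856` and the dict-of-subfields input are hypothetical, not the project's code. Fields with indicator 1 through 4 that also carry a $3 are not covered by the notes, so the sketch ignores $3 there.

```python
# Note prefixes keyed by 856 second indicator (assumed standard MARC 21 meanings).
IND2_LABELS = {
    "1": "Version of resource",
    "2": "Related resource",
    "3": "Component part(s) of resource",
    "4": "Version of component part(s) of resource",
}


def map_856(ind2, subfields):
    """Map one 856 field to ("uri", value) or ("note", text), or None if unmapped.

    subfields is a hypothetical dict such as {"u": "https://...", "3": "Table of contents"}.
    """
    url = subfields.get("u")
    if url is None:
        return None
    materials = subfields.get("3")
    if ind2 == "0":
        # With $3 the URL becomes a note; without $3 it is a URI for the manifestation.
        if materials:
            return ("note", f"{materials} at: {url}")
        return ("uri", url)
    label = IND2_LABELS.get(ind2)
    if label:
        return ("note", f"{label} at: {url}")
    return None  # blank, 8, or any other indicator value is not covered here


# Multiple fields with indicator 0 are each mapped in turn.
fields = [("0", {"u": "https://example.org/a"}), ("0", {"u": "https://example.org/b"})]
results = [map_856(ind2, sf) for ind2, sf in fields]
```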
Finding carrier types (15)
- See Finding and mapping Carrier Type values
- Related to 336, 337, 338 (content, media, carrier)
- Very few of these fields in many systems, such as LC
- Where do we look to get content and carrier?
- Cypress: “I asked about the ISO639-3 in the LC session yesterday, and they will make a bulk download available at some point that we could use. At this point I think it is best to continue with creating ISO URIs by combining the expected base IRI with the code available in MARC, without checking that it is valid in LC at this point.”
- 3 different versions: carrier type, content type, media type
- Carrier type might have enough to use
- Content type is likely finishable
- We can’t address the access points or minting of the IRIs without this information
Timespans (15)
- See Issue Dates #382
- Primarily for ongoing works – we have issue with open-ended dates
- If date is open-ended, it is diachronic
- Worried that open-ended date cannot be a timespan, cannot be a timespan IRI
- Gordon: the unstructured description in any element is just like a note
- We are not sending it to a timespan, we are making a note on timespan
- It is valid to put unstructured description in as “date of manifestation” when we have an open-ended date, i.e. manifestation of diachronic work
- We can record broken timespans as unstructured descriptions, “date of manifestation”
- Should we use unstructured descriptions as date qualifiers in an AP for a manifestation?
- “-” versus “/”: if we go with “/”, then users need to learn a new way of understanding e.g. 1975/ means open-ended beginning with 1975
- Need to be careful: is it a publication date or a chronological designation
- e.g. “the 1975 volume was published in 1976”
- People should weigh in on the issue
- Additionally, we will do a poll asynchronously
- Crystal will send out a poll via email; discussion will be asynchronous, and the poll will be decided by next week (8/6/25)
- Laura: catalogers are encouraged to put a date, but if they put “n.d.”, then we’d have nothing to add to the access point
- All of these are cataloger guesses
- The dates are not reliable enough to use as access points
July 23, 2025
See time zone conversion Meeting norms Present: Crystal, Deborah, Ebe, Sarah, Jian, Laura, Adam Absent: Cypress, Tynan Time: Ebe Notes: Sarah
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (5)
- Laura and Crystal will be presenting at the LD4 conference on Monday, July 28th.
String Encoding Scheme for Access Points (30)
- See issue
- Access points as nomens?
Manifestation AP:
- 008 vs. 26X ($c) for date of publication.
- Will it be possible to take a look at some records to see what would be cleaner? (Laura may be able to do this after LD4)
- 26X will have less complicated coding, but we may already have code for 008. Deborah will look at this code to see if it works here.
- If we've already completed code for 008, we'll use that for date of publication.
- Using edition statement in every access point, when present, to ensure access points are unique.
- This decision will mean that errors get included in access points, but it is a way to ensure de-duplication.
- We will discuss de-duplication in APs more during Phase II.
Work AP:
- Discuss parentheses as separators for aggregating works.
Expression AP:
- Language codes 008 or 041?
- 008 is simpler; it would require additional coding to pull language codes from 041.
- For now, we will stick with the simpler solution and see how it works.
- If this causes issues, we will revisit in Phase II.
- Designation of version:
- 250$a vs. 245$s
- We will use every 250$a and not 245$s for now, since 245$s is uncommon and often misused.
Aggregating Work AP:
- 250$a vs. 245$s
- Base: Title of Work - ISBD says that when there are additional titles, the titles are strung together. Would this also apply when there are parallel titles?
- We will have to discuss this further later.
- Associated agent follows base.
- Creator/Aggregator is in 1xx, Other in 7xx. Deborah will flesh this distinction out further.
- Both Other and Creator/Aggregator will follow base.
- 008 vs. 26X ($c) for date of publication.
- Next step for SES for APs is to discuss coding.
Timespans (20)
- Did not get to timespans today, will discuss next week.
- See Gordon's answer to our question from last week
- Are we set to:
- Treat "broken" timespans as years of identifiable non-broken timespans within (example: circa 1950 and ~1950 and 1950- all become "1950")
- Add notes on manifestation as broken timespans occur to say "date of [whatever element]: 'input broken timespan as it appears in MARC.'"?
- Who can implement this in code? Which fields need to be adjusted?
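A rough sketch of the proposal in the questions above, not a decided approach: pull the first identifiable four-digit year out of a "broken" timespan and keep the original MARC text in a note. The function name, the element label parameter, and the exact note wording are assumptions for illustration.

```python
import re

# A four-digit year that is not part of a longer run of digits.
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def split_broken_timespan(raw, element_label):
    """Return (year, note) for a date string as it appears in MARC.

    year is the first four-digit year found, or None if there is none
    (for example "n.d."); note preserves the original text.
    """
    match = YEAR.search(raw)
    year = match.group(1) if match else None
    note = f"date of {element_label}: '{raw}'"
    return year, note


for raw in ("circa 1950", "~1950", "1950-"):
    year, note = split_broken_timespan(raw, "publication")
    print(year, "|", note)
```

Each of the three example inputs yields the year 1950, with the note carrying the input unchanged.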
Output review (30)
- We did not get to Output Review this week, will review next week.
- RIMMF view
- continue with Tuataras
Wrap-up (5)
Action Items
- Work on Phase I deliverables
Backburner
Agenda Items for Transform Meeting
- Briefly discuss timespans
- Discuss SES for APs; how decisions made today will translate into code.
July 16, 2025
See time zone conversion Meeting norms Present: Crystal, Sarah, Ebe, Tynan, Laura, Abhignya, Junghae, Adam Absent: Cypress, Deborah Time: Notes: Sarah
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Cypress is now formally co-leading the project with Crystal and will head the transformation. She has a few more hours each week to dedicate to the project.
- Laura and Crystal will be presenting at LD4 next week.
Access Point String Encoding Scheme
- Punctuation?
- Cypress and Deborah are not here this week: we will discuss this issue again next week.
- Laura left a comment on issue #208 about output ISBD punctuation when there is no existing punctuation. This is a note, not an access point.
- NLNZ has a document in the RDA Toolkit regarding punctuation in access points.
- If you want to view this document, you will have to subscribe to it under documents in RDA Toolkit.
- We could look at creating our own documentation on RDA Toolkit for this project so that users can easily subscribe to and access it.
- See original toolkit appendix E
Output review
- Tuatara output file
- Punctuation for all access points will be weird until we make a decision about it and implement it.
- Ideally, expression access point in output would look like: The reptile database (Uetz, Peter). Text. English
- Remove initial articles from titles in access points?
- Filtering for articles could be complicated. (i.e. a word that is an article in English may have the same spelling but different meaning in another language.)
- Question: should we make access points nomens so that we can say things about them like language, source, etc.? discuss in transformation meeting.
- We could include language of expression in expression access point to avoid duplicates.
- We are working on this. We need to determine a way to look up language codes in 041. See mapping in Drive here
- ISO language code look up is in Phase II.
- rdamd:30011 and timespan:
- Mint a timespan for "has date of publication".
- Question: how do we handle publications with a start-date, but no end-date?
- RDA toolkit defines timespan as "finite"
- Can we use date + hyphen as an appellation of timespan? Is this considered finite?
- Ask Gordon about appellation of timespan and on-going publications.
- Ebe will look at NLNZ documentation for information.
- Follow-up question: how do we handle timespans for works that are issued in multiple parts, but are not necessarily aggregating works?
- We're not dealing with any successive works in Phase I; these will wait until Phase II.
- Are we not dealing with any works that are multi-part or which have two dates for any reason in Phase I? Ask Deborah and discuss next week. (Are all multi-part works aggregates, or not?)
- These questions will determine what we include in and exclude from Phase I.
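As a talking point for the punctuation discussion, here is a small sketch that assembles the "ideal" expression access point shown in the output review notes (title, agent in parentheses, then content type and language separated by full stops). The function name and parameters are hypothetical, and the separators are only the ones visible in that one example; no punctuation decision has been made.

```python
def expression_access_point(title, agent=None, content_type=None, language=None):
    """Build an expression access point string from its parts.

    Missing parts are simply skipped; the separators follow the single
    example in the notes and are not a settled string encoding scheme.
    """
    base = f"{title} ({agent})" if agent else title
    qualifiers = [q for q in (content_type, language) if q]
    return ". ".join([base] + qualifiers)


print(expression_access_point("The reptile database", "Uetz, Peter", "Text", "English"))
```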
Wrap-up
Action Items
- Discuss access points as nomens in transformation meeting.
- Discuss access point SES at transformation meeting.
- Asked Gordon about timespan and Ebe will look at NLNZ documentation.
Backburner
July 9, 2025
See time zone conversion Meeting norms Present: Crystal, Cypress, Adam, Ebe, Sarah, Laura, Deborah, Sita, Jian Absent: Abhignya, Tynan, Sofia Time: Ebe Notes: Sarah
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Check-in: AP SES? Are we all set?
- Still need to make decisions about punctuation.
- Transformation progress: 3 tags left aside from those needing tables to start coding. Everything else in progress. This is great!
041 transformation (15)
- Which coder wants to take this on?
- We will discuss this further at the transformation subunit meeting tomorrow.
- Can a mapper help by constructing the lookup table?
- We may not want to do lookups in this case due to the size of the lookup file and instead construct IRIs from $2, trusting that it is correct.
- If we're going to do lookups: multiple lookups might be a problem for users, so we may want to create tables. It’s better to cache the file and query it locally, rather than doing a live lookup.
- Lookup table for MARC & ISO codes
- LOC has URIs for ISO codes which match $2
- We do have a lookup for MARC, but at the moment it is querying LOC; this needs to be changed.
- ISO lists are faceted and not all terms are in one XML file, so we might have to ask for that. This is something we can table for phase II.
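To make the two options above concrete, here is a sketch that builds a language IRI by appending the code to a base IRI, with an optional check against a locally cached set of codes instead of a live query. The base IRIs shown are illustrative assumptions that would need to be confirmed against the approved list; the function name and the shape of the cache are hypothetical.

```python
# Assumed base IRIs; confirm against the project's approved sources before use.
MARC_LANGUAGES_BASE = "http://id.loc.gov/vocabulary/languages/"
BASE_IRIS_BY_SOURCE = {
    "iso639-1": "http://id.loc.gov/vocabulary/iso639-1/",
}


def language_iri(code, source=None, cache=None):
    """Construct an IRI for a 041 language code.

    source is the $2 value (None means a MARC language code).
    cache, if given, maps a base IRI to a set of known codes loaded from a
    local file; codes missing from the cache return None.
    Without a cache the code is trusted as-is.
    """
    code = code.strip().lower()
    base = BASE_IRIS_BY_SOURCE.get(source) if source else MARC_LANGUAGES_BASE
    if base is None:
        return None  # unrecognized $2 source
    if cache is not None and code not in cache.get(base, set()):
        return None
    return base + code


print(language_iri("eng"))
print(language_iri("en", source="iso639-1"))
```

Trusting $2 keeps the transform free of large lookup files; passing a cache gives validation without a network call.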
$2 Discussion (20)
- Combined lookup table for 336, 337, 338 $a and $b?
- We don't want to combine the lookup tables. If we do this then we will have to update the table each time RDA updates one.
- We should keep them separate so we can automatically update each file as RDA updates.
- Decision: look up either term or code in all three files.
- Why are we mapping the codes by the notes?
- When we find the URI, we are mapping it in the $3 note.
- In the case that we don't find a URI, do we want to have the $3 note with the code? Or, nothing at all?
- Decision: we don't want a note with code in it, so do not have a note if URI is not found.
- Will discuss coding for this in transformation subunit meeting tomorrow.
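One way to picture the decisions above: keep the three vocabularies as separate tables, look up either the term or the code in each of them, and output nothing when no URI is found. The tables below use placeholder example.org URIs and a hypothetical function name; real tables would be loaded from the separately maintained RDA files.

```python
# Placeholder tables keyed by lower-cased term and code (URIs are not real).
CONTENT = {"text": "https://example.org/content/text", "txt": "https://example.org/content/text"}
MEDIA = {"unmediated": "https://example.org/media/unmediated", "n": "https://example.org/media/unmediated"}
CARRIER = {"volume": "https://example.org/carrier/volume", "nc": "https://example.org/carrier/volume"}
TABLES = [CONTENT, MEDIA, CARRIER]


def find_33x_uri(value, tables=TABLES):
    """Look up a 33X $a term or $b code in each table in turn.

    Returns the first URI found, or None, in which case the caller
    writes no note at all (per the decision above).
    """
    key = value.strip().lower()
    for table in tables:
        uri = table.get(key)
        if uri:
            return uri
    return None
```

Because the tables stay separate, each one can be regenerated on its own when RDA updates the corresponding vocabulary.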
Output review (30)
- Tuatara output file
- HTML character coding: incorrect character encoding scheme?
- Similar to HTML, quotations, apostrophes, ampersands, less than, and greater than symbols are protected characters in XML. This is well-formed XML and should validate.
- However, they may not be getting converted back to the symbols (i.e. ">") in RIMMF, which is causing issues.
- Converting the XML file to N-Triples before putting into RIMMF should correct this.
- Will discuss further in transformation meeting tomorrow.
- Expression for work/access point:
- no space between author name and title
- if a record does not have 100 field, it goes to first 700 field: assuming that this is most significant and not a corporate body.
- Potential issue: if we are only picking the first 700, we will likely end up with non-unique access points. This is part of what we will have to look at in Phase II.
- We will do further de-duplication in Phase II.
- LCC versus LCSH in Tuatara.xml:
- LCC has notation and alternative label, in Tuatara-RDA.xml these have the same value: "QL645"
- This may be because QL645 doesn't have a pref label and is a code.
- This output aligns with the mapping in Gordon's document, but we may want to review the mapping for classification concepts.
- We don't want to strip punctuation from the end of "extent of manifestation" because we need the punctuation to be there in the case of abbreviations.
- rdae: review lines 327 and 328 in XML output file
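The character-encoding point above can be checked with a few lines of standard-library Python. The sample string is invented; the sketch only shows that escaped characters in well-formed XML are expected, and that a conforming parser returns the original symbols, so any entities left visible downstream point to a step that reads the file without parsing it as XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; quotes need an explicit mapping.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {entity: char for char, entity in QUOTES.items()}

raw = 'Tuataras & "other" lizards <2nd ed.>'
escaped = escape(raw, QUOTES)
restored = unescape(escaped, UNQUOTES)

# An XML parser decodes the entities back to the original characters.
element = ET.fromstring(f"<title>{escaped}</title>")

print(escaped)
print(element.text == raw)
```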
Wrap-Up (5)
Action Items
- To do at transformation meeting tomorrow:
- discuss 041 coding and lookup tables
- follow up on $2 discussion. Can we do collective lookups for 337 and 338?
- Cypress will look at rdae properties in manifestation (lines 327 and 328 in Tuatara-RDA.xml)
- To do before meeting next week:
- Work on Phase I deliverables.
- Review output files. (we will stick with the same files for the time being.)
Backburner
- Punctuation for access points: discuss next week
July 2, 2025
See time zone conversion Meeting norms Present: Absent: Time: Notes:
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Sofia has a scheduling conflict with her new job. We'll do another round of scheduling polls after Phase I. Please ping Sofia for asynchronous participation for the time being
Fields that need mapper eyes (25)
-
- why end with “: “
- We use the colon if there is a $3 or $4 present – so we need to amend this to make it conditional on there being a subfield 3 present
- Follow subfield value with “;” if $3 is present
- Follow final subfield value with “.” unless the value ends with “.”
- We also don’t need to have a “.” at the end of the URI
- Changed in the spreadsheet, commented in the issue that an update was made to the transformation notes
-
- Note 1 (from the issue #172)
- Change the mapping to Cypress’s suggestion in the notes on this issue (#172)
- Retain $2, rather than finding an IRI for it
- Note 2
- We will use Cypress’s suggestion in the issue
- Note 3
- The group endorses Cypress’s suggestion; Cypress will update what the coder should change
-
- We should not map $m and $n, they are characteristics of the term in the vocabulary
- $b is the code for $a
- We should use $a as the text string if there is no $2
- When $2 is present, we need to mint a skos:concept
- The spreadsheet needs an overhaul to get it ready to code: we need “$a without $2 present” and “$a with $2 present” options
- Crystal will re-map and Cypress will review
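A sketch of the two cases the re-mapped spreadsheet needs, "$a without $2 present" and "$a with $2 present". The output dict shape is hypothetical, and recording $b as a notation and $2 as the scheme are assumptions for illustration; $m and $n are deliberately left unmapped, as noted above.

```python
def map_term(subfields):
    """Map $a either to a plain text string or to a minted concept.

    subfields is a hypothetical dict of subfield code to value.
    """
    term = subfields.get("a")
    if term is None:
        return None
    source = subfields.get("2")
    if not source:
        # No $2: use $a as the text string.
        return {"type": "literal", "value": term}
    # $2 present: mint a skos:Concept (key names are placeholders).
    concept = {"type": "skos:Concept", "prefLabel": term, "inScheme": source}
    if subfields.get("b"):
        concept["notation"] = subfields["b"]  # assumption: $b is the code for $a
    return concept
```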
Access Point SES for Manifestation (25)
- Title proper
- See discussion
- From last week:
- $s is not part of title proper. What about $f?
- Should $f/$s qualify access points?
- What is the $s in a 245?
- From this week
- Do Titles Proper need to be distinguished? No, but the access points do
- Can we do an ALMA list looking for 245 $s in the catalogue?
- Also hard to know what $f was intended for – they are being used in different ways
- Ignore $f for 245, because it is unclear how cataloguers were using the subfield
- Drop $f for access points [except for Collection works-- Phase II]
- $s we should keep for the access point
- Ignore 245 $f and $s for titles proper for non-collection works
Source codes with appended language codes (15)
- Note to UW: Do a cleanup project on these and reiterate policy in Staffweb.
- (Need to get a bit of the setup for this topic from the recording, TC)
- We could make a marker for pre-processing to find and correct them
- Particularly for 33x, they should have a controlled vocabulary
- There will, however, be other controlled vocabularies
- There could be another content-type list outside of RDA that we want to use – makes it tricky
- Instruction: if $2 present and it is not RDA, then mint a concept
- Decision: we should implement Deborah’s proposed solution (link)
- There is going to be a change in the BIBCO Standard Record that says use RDACO and no RDA content
Test records for next week (5)
- Crystal will send about ten records to students to run through the transform this week
- What would we like to see? More from the Jane Austen file, or fresh records with certain fields present?
- Record set on a topic: tuataras – or other lizards!
Wrap-up (5)
Action Items
Backburner
June 25, 2025
See time zone conversion Meeting norms Present: Crystal, Cypress, Deborah, Tynan, Abhignya, Sarah, Ebe Absent: Sita, Adam, Jian, Laura Time: Ebe Notes: Sarah
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- LD4 proposal was accepted; Laura and Crystal will present next month.
- Cypress has added "ready-for-self-assignment" tags to the fields under "Ready for transform" to improve clarity.
130/240 $0s and $1s (Cypress question) (20)
- Can we use 'approved' $0s and $1s from a 130 or 240 as the main Work or Expression IRI? If not, what do we do with it? Include it as an identifier like we do with 630/730/830 and 6XX/7XX/8XX $t?
- That is: if a 130 or 240 has only work subfields and the IRI source is approved for works (from our list), do we use it as the main Work IRI? (only HMML authority file, MusicBrainz, Web NDL)
- And if a 130 or 240 has work and expression subfields and the IRI source is approved for expressions (from our list), do we use it as the main expression IRI? (only HMML authority file)
- It is unlikely that we will see this occur in practice, but we should still account for it in the code in case it comes up.
- Existing code uses $0 and $1 as identifiers and not IRIs (i.e. the 630 fields)
- If we use these fields as IRIs we don't know what information the creator has about it and we can't add as much information as we can to our own IRIs.
- IRI squatting (using another person's IRI and adding your information to it) is generally bad practice.
- It would be better to use our own IRIs and use owl:sameAs to connect IRIs. We may want to have further discussion on how this would look/work in an open-access triplestore environment.
- For the time being, we will mint our own IRIs, using $0s and $1s as identifiers.
Access point SES for Manifestation (25)
- Title proper
- See discussion
- In the 245 field, should $s and $f be included as part of the title proper, or should they be included in the access point as qualifiers?
- Why are $s and $f in 245 and not separated in a 250? (There may be a few reasons for this: to facilitate display, avoid using a 250, or the difference between edition and version)
- We decided that $s is not part of the title proper, need to discuss more regarding $f.
- Further discussion needed on whether they should qualify access points. ($s may qualify access point for expression, $f may qualify access point for work, $c is for manifestation.)
- Currently, $f is included in the access point for expression.
- We can discuss this further next week, when more people are present.
Source codes with appended language codes (15)
- $2 Source below is possibly RDA translations
- We have found some examples in the UW file where $2 (source) is, e.g.,
- 336 $6 880-09 $a Wen zi $b txt $2 rdacontent/chi
- 337 $6 880-10 $a Wu mei jie $b n $2 rdamedia/chi
- 338 $6 880-11 $a Cheng ce $b nc $2 rdacarrier/chi
- Are they from the official RDA Registry for values? E.g. https://www.rdaregistry.info/termList/RDAContentType/
- Is there a lookup table for these translations?
- How should we map these? Is it safe to map them to e.g., RDAContentType
- RDA has URIs in non-Latin alphabets (i.e., it has simplified Chinese, but not Pinyin), so there may be issues with looking up language codes from a 336 where the language is transliterated. The related 880, however, may have the non-Latin script and can be looked up.
- Crystal suggested: if 880 or 33X $b has a label found in the RDA Registry, use the URI and don't map $a. If no $b or 880 matches a concept from the RDA Registry, mint a concept for $a and assign the language tag and source from $2.
- These examples are incorrect in their usage of LC RDA content types. LC does not have language codes and should always be in English. However, RDA does have language codes.
- Since this is a recurring mistake, the code should account for it.
- We can account for this by using $2, when it is "rdacontent/chi" or "rdaco/chi", to look up in the RDA Registry XML file; if not found there, we can look up in the LC registry file.
- Currently, in the code, if there is no match for $2 in 336, it tosses the data out and doesn't output anything. We may want to add a step in Phase II where, if there is no match for 336, it uses the related 880 instead.
- Put decision into decision index under $2
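A sketch of the fallback described above for a 336 whose $2 carries an appended language code. It splits the $2 value on the slash, tries an RDA Registry table first, and falls back to an LC table. The tables, their placeholder URIs, and the function names are hypothetical; the same pattern would presumably apply to 337 and 338 sources, but that is not decided in these notes.

```python
def split_source(source):
    """Split a $2 value such as "rdacontent/chi" into (name, language or None)."""
    name, _, lang = source.partition("/")
    return name, (lang or None)


def lookup_content_type(code_or_term, source, rda_table, lc_table):
    """Return a URI for a 336 value, or None if nothing matches.

    The RDA table is tried first, then the LC table, when $2 is
    rdacontent or rdaco (with or without an appended language code).
    """
    name, _lang = split_source(source)
    if name not in ("rdacontent", "rdaco"):
        return None
    key = code_or_term.strip().lower()
    return rda_table.get(key) or lc_table.get(key)


# Placeholder tables; real ones would be built from the registry files.
RDA_TABLE = {"txt": "https://example.org/rda/content/text"}
LC_TABLE = {"txt": "https://example.org/lc/content/txt", "xyz": "https://example.org/lc/content/xyz"}
```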
Wrap-up (5)
Action items
- Cypress will consult Gordon about $0 and $1 in 130/240.
- Deborah will write the 336 $2 decision in a discussion and then feed it into the decision index. Cypress will proofread.
Backburner
- Discuss SES for access points with more people present next week.
June 18, 2025
See time zone conversion Meeting norms Present: Crystal, Cypress, Deborah, Sita, Abhignya, Sarah, Adam, Ebe, Jian Absent: Tynan, Laura Time: Sita Notes: Cypress, Crystal
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Sara H. and Doreen are graduating, and have rolled off the project for the time being (they're both welcome to return if they have time in their new post-graduation jobs!)
- Crystal and Ebe submitted a proposal to the SWIB conference last week
- No transformation meeting this week due to the Juneteenth holiday
- Progress Report
- Crystal still needs to check and move 110, 130, 240, 711 & 730 to ready for transform
- Jian still working on 007 review
- 35 total ready for transform (21 BSR)
- Still need to compile a list of abbreviations for the code/punctuation omission (something AI could help get started?)
- lots of notes in the issue on abbreviation
- have someone compile these into one and look at it next week - Ebe volunteered
Transformation Discussion (25)
- How are coders doing with transform? Any questions that would have been asked tomorrow?
- Tynan is going to continue working on field-by-field coding
- Cypress working on access points, reviewing 338 as Matthew works on it
- Abhignya working on 385, almost done
- Sarah working on field-by-field code - 388, was working on 336 but will leave it for Matthew, will work on 083
- Deborah suggests not working on 041 for now
- Not mapping the 003s/001s
- Cypress will go in tonight and label things as 'ready for self-assignment'
- Crystal will create a phase 2 column
Initial Articles (20)
- AAP for titles:
- Continue to remove initial articles for AAP and add VAP with them—for matching to current NAF practice
- Change coding to map a title proper as a VAP for work/expression, ignoring the filing indicator
- Add coding to map a title proper as an AAP for work/expression, using the Filing indicator to skip [x] characters
- Decision: Can do this in the code, for Access Points, will drop initial articles for the AAP and retain it for the VAP.
- Switch to retain initial articles for AAP and add VAP without them
- Retain current coding to map a title proper as an AAP for work/expression, ignoring the filing indicator
- Add coding to map a title proper as a VAP for work/expression, using the Filing indicator to skip [x] characters
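The decision recorded above (drop the initial article for the AAP, retain it for the VAP) could look roughly like this. The sketch uses the filing indicator as the number of characters to skip; capitalizing the first remaining letter is an assumption, not something the notes specify, and the function name is hypothetical.

```python
def title_access_points(title, nonfiling):
    """Return (aap_title, vap_title) for a title proper.

    nonfiling is the filing indicator value ("0" to "9"); anything that
    is not a digit is treated as 0. The VAP is None when nothing is
    skipped, because it would duplicate the AAP.
    """
    skip = int(nonfiling) if str(nonfiling).isdigit() else 0
    aap = title[skip:]
    if skip and aap:
        aap = aap[0].upper() + aap[1:]  # assumption: capitalize after dropping the article
    vap = title if skip else None
    return aap, vap


print(title_access_points("The reptile database", "4"))
```

Relying on the indicator avoids language-specific article lists, at the cost of trusting that catalogers set it correctly.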
Output Review (25)
- [Latest Jane Austen file](Working Documents/transformationCode/outputDataForReview/20250520/20250520-janeausten-NA-RDA-lexicalaliases.rdf)?
- Need to talk about an SES for access points, has not been added in the code because we don't have the SES yet
- LCCN has notes indicating scheme of nomen, should be revised to include scheme of nomen.
- 8XX/4XX needs to go into numbering of part at the work level of the main WEMI stack (not the series work).
- [Work identified in WEMI stack] is issue of [Series work (identified by 490/8XX)]
- [Work identified in WEMI stack] has numbering of part [$v value (for $v value, prefer 8XX. If not in 8XX, use 490)]
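The preference order for the $v value can be sketched as follows, assuming each field is available as a dict of subfields (a hypothetical representation, as is the function name).

```python
def numbering_of_part(fields_8xx, fields_490):
    """Return the $v value for "has numbering of part".

    Prefers the first 8XX field that has a $v; falls back to the first
    490 with a $v; returns None if neither has one.
    """
    for field in list(fields_8xx) + list(fields_490):
        value = field.get("v")
        if value:
            return value
    return None
```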
Wrap-up (5)
Action Items
- Cypress will organize the ready for transform project column
- Crystal will move issues to a phase 2 project column
- Crystal will put LCCN changes into spreadsheet and indicate code re-check needed
- Crystal will assemble a dataset for next week's review
Backburner
- SES for access points (discuss at next meeting)
June 11, 2025
See time zone conversion Meeting norms Present: Cypress, Crystal, Sarah C., Sara H., Tynan, Jian, Deborah, Doreen, Adam, Abhignya, Junghae, Ebe, Laura Absent: Gordon, Matthew Time: Ebe Notes: Sara H.
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- SWIB proposal due today. Crystal will submit before 5:00 PDT. Ebe will take a look.
Deliverables Work (20)
- Accompanying documentation for Code (Sarah, Abhignya, with question support from Cypress)
- Spreadsheet cleanup (Crystal)
- Sample data accumulation (who can store this much right now?)
- 2.345M total MARC records, split by library
- Jeff Mixter at OCLC confirmed we will be getting LC records
- The concern with splitting was that we do not want to repeat IRIs
- Uniquely generated IRIs currently use a date stamp. Should probably use a date-time stamp in order to run multiple files through in one day.
- UW might be able to look at ex-Library resources for storing & conversion
- Supplemental vocabularies and properties (Crystal & Sarah)
- GitHub README files for release (Ebe can help)
- Guide to implementation, use, etc. of Github
- Selections from the GitHub wiki, issues, and discussions, such as
- Edited version of the Decisions Index (Crystal would love help with this one)
- Instructions (Crystal)
- Approved URI list (Crystal)
- Wikibase Cloud server with subset of data (Tynan)
- About 1,000 records
- Can wait until after field-by-field coding is done
- RSC feedback (Ebe can help)
- PCC feedback (who can help?)
- Sofia: I would propose to reuse some texts, eg rsc and pcc docs may be almost the same. After use those documents for the overall write up
- Adam: One PCC decision I do not like at all is that for aggregates they want to use Expression manifested but are allowing work access points as the value.
- Ebe: How do they expect to be able to identify the various pieces of data. NLNZ is recording expression data so it can later extrapolate work if required. They are not muddling things together
- Overall write-up (who can help?)
- Maybe submit as an article in an open publication?
- Explain what everything is, how the work was done
- Use Project Plans as a base
- Cypress can help with Transform aspects of it after coding is complete
- Crystal will create issues for Deliverables and add to Phase I Closeout milestone
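On the date-time stamp point raised under sample data accumulation: a minimal sketch of why a date stamp collides across same-day runs while a date-time stamp does not. The base IRI, the IRI layout, and the function name are placeholders, not the project's actual minting pattern.

```python
from datetime import datetime, timezone

BASE = "http://example.org/marc2rda/"  # placeholder base IRI


def mint_iri(record_id, entity, now=None):
    """Mint an IRI that embeds a date-time stamp rather than a date.

    now can be passed in so that one stamp is shared by every IRI
    minted during a single transform run.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return f"{BASE}{stamp}/{record_id}/{entity}"


morning = mint_iri("rec1", "wor", datetime(2025, 6, 11, 9, 0, 0))
afternoon = mint_iri("rec1", "wor", datetime(2025, 6, 11, 15, 30, 0))
print(morning != afternoon)
```

Two runs started within the same second would still collide, so a run identifier may be worth considering if files are processed in parallel.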
1XX-8XX BIG MARC record from Richard (25)
- XML version of MARC bibliographic fields/sub-fields
- Doesn't do indicators, does have all subfields, $9 included as a label; Value is given as tag subfield
- Cypress transformed & Deborah loaded into RIMMF
- Use for review as a group of what output looks like
- Review transformed record against record review spreadsheet - then doublecheck whether is expected per the mapping spreadsheet (e.g., is it expected $8 is missing)
- Won't be able to check punctuation handling in this version since just using labels, rather than real titles with punctuation
- Use is really to see where subfields got mapped and how things will be displayed to the user
- If subfields are repeated, then can test code for breaking
- Could be used as a base for coder testing
TMQ Triple store and RIMMF demo (25)
- Where can we store the data and what to do with it?
- Richard tested creating a repository
- Options are: 1. Linux server in-house 2. Pair.com
- Limits:
- Read-only
- Needs to be made secure
- Not ready for public use; just to show progress so far
- Chose GraphDB to test
- Used every 50th LC from Feb 2025; this transform was from 6mo ago
- Uses SPARQL
- Currently only a title proper index
- Crystal has some SPARQL skills and could help
- Data can be displayed as a linked data graph, though not set up with labels yet
- Discussed challenges with initial articles in titles and a programmatic way to handle them. Two title propers? Use title proper and variant? Language isn't indicated for the 245, which creates additional challenges (e.g., "the" in English (article) vs. "the" in French (tea)). Further discussion needed
- Repository is accessible using RIMMF
- Deborah can send info on how to access the repository and answer any RIMMF questions
- Planning to update GraphDB? Can do at any time, and doesn't take a lot of time
- Is GraphDB free? What does it cost to put project data there? Deborah thinks it's possible to host and then run GraphDB on it, and will check with Richard
Wrap-up (5)
Action Items
- Answer Jian's questions on the 007 review here
- Crystal will check BSR status of 007
- Update unique IRI generation from using a date stamp to using a date-time stamp
- Crystal will create issues for Deliverables and add to Phase I Closeout milestone
- Deborah will check GraphDB costs with Richard--Richard says he is using a free copy of GraphDB which allows him to run 5 repositories of any size, but only run 2 SPARQL queries
From last week:
- Crystal will change Transform timeline to July 15
- Crystal will update spreadsheets of 9 issues waiting on attributes table and check links from the ones already done, and move them to "ready for transformation"
Backburner
- Further discuss initial article handling
June 4, 2025
See time zone conversion Meeting norms Present: Deborah, Sara H., Cypress, Crystal, Sarah C., Jian, Junghae, Laura, Adam, Tynan, Ebe, Sofia Absent: Doreen, Gordon, Matthew, Sita, Abhignya Time: Laura Notes: Sara H.
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- LD4 Conference proposal submitted
- Submitted abstract as is; Title was along the lines of: MARC 21 to LRM/RDA/RDF Mapping Project: Crossing the River with More Mountains to Climb
- Crystal removed field-by-field transformation review issues and project steps after discussions last week. Strategy for review will be based instead on record-by-record review, with the One Big Record included
- SWIB proposal: Who else wants to present? Deadline is June 11. Crystal drafted a proposal based on the LD4 proposal, but updated for where the project should be by SWIB-time
Private YouTube Channel for Meeting Recordings? (5)
- Google Drive storage is pretty low
- Phase II will mean more stuff
- Thoughts on having meeting recordings upload to YouTube and only viewable by those with the link, and only keeping the links in private Google Drive?
- Alternatively, we could make an additional Shared Drive for Phase II or just for meeting recordings
- Could also permanently delete recordings from 2023, even though we don't have transcripts
- Previously decided to remove recordings in favor of transcripts, but old ones don't have them
- The group's sense is that with detailed notes and documenting decisions, there's been less of a need to review older recordings; now more useful for catching up
- Decision: remove videos that are more than a year old, but Crystal will back them up in case ever needed on a UW drive
Project Timeline Review (10)
- 64 fields left to code, 50 in "ready for transform"
- Nine waiting on attributes table (are these actually in progress?)
- One row that says "see headings mapping table", covers all subfields except 3, 6, 7, 8, maybe 0 and 1
- Cypress did this for agents headings fields, link should be updated
- Crystal can update spreadsheets and check links from the ones already done, and move them to "ready for transformation"
- 0 and 1 stuff should be in the spreadsheets
- Still need to address what to do with $0 and $1 in 130/240
- Find summary of $0/$1 and integrate it or refer out to it
- Five still in review
- Adjust projected phases and timelines, with Phase II still beginning September 1. Feasible if mappers begin working on other deliverables now
- Losing Doreen, Sara H.; still have Tynan; Cypress back; new coders continue to ramp
- Cypress is supporting error review; if others can take field-by-field Cypress could focus on other parts, but her time is more limited
- Tynan: would find it helpful to have quick reviews of the output to iterate quickly with any updates needed in the code as still solidifying conceptual side of the work
- Crystal, Jian, Laura and maybe Ebe depending on availability, all willing to provide feedback; Cypress on more complex cases
- Talk more in Transform meeting tomorrow about the process. Will share the decision with the group and record in the Decisions Index. Likely a label for coders and note with the ask and link to the code, then reviewers self-assign.
- Decision to push deadline to July 15.
- Who can work on what deliverables?
- Crystal will set out work for the deliverables. Next week will discuss and start assigning.
One Big MARC Record (10)
- First iteration: comprehensiveTextualMaterial.xml
- Was for textual materials, included indicators, fields, sub-fields with sub-fields repeated even when non-repeatable
- Comprehensive record rather than valid.
- Running transform against it is helping to catch areas where code can be shored up against invalid MARC
- Second iteration in progress
- Specific test examples are available in test_input and test_output (e.g., looking at 008 or 245)
- Talk more at Transform meeting tomorrow, about how much time putting into it and practicality - set deadline for once it's good enough
Headings Attributes Table Update (15)
- Cypress is working on the access points part and will have questions on string encoding schemes
- Cypress will start a discussion; thinks it's going to be similar to 245
- Deborah noted won't have to look at ISBD and instead take in order given; can put examples in the discussion
- Crystal, Deborah, and Sofia will review
Status Check: (10)
- 007: almost done; Jian needs to add questions to issue and then once answered can complete
- 070: there's a condition of whether something is "serial-like" that determines how an item is handled; need more detailed conditions for the coders; need a work-around for scheme of nomen to indicate this number comes from National Agricultural Library.
- 752: Junghae will look at it
- 336: moved to a table instead of mapping spreadsheet; Matthew is coding; Crystal will fix the mapping spreadsheet since it's not up to date anymore
Transformation Q&A (20)
- How can mappers support coders in this work?
- Helping review output, and any feedback on test input
- When reviewing and it's already marked as coded and nothing changed then move it to Done instead of Ready to Transform; Crystal will look at these and move if needed
- Keep responding quickly to asks for clarification on the mapping spreadsheet
- How is onboarding going for Matthew, Sarah, and Abhignya?
- Will discuss at Transform meeting
- Do we need to leave some coding for Phase II? Can some fields be momentarily "left behind"?
- Did this for 400/411 - obsolete/not in BSR, postpone to meet deadline
- Crystal will check whether in BSR and note whether need to be prioritized. Otherwise leave not in BSR and can take up if extra time before the July 15 deadline
- Can work on Access Points without a field being fully mapped
- Some in Transform may be moved: 005 not going to map; LDR 8 type of control is data provenance so should be looked at in Phase II
Wrap-up (5)
- Reviewing Test input and output
- Crystal recommends downloading both input and output and looking at them side by side; uses VSCode, thinks better than Notepad++
- The rdf:about is the IRI. If it ends with "man" it is a manifestation and everything with the "about" is about the manifestation
- 245 went through extensive testing and examples
- Mappers can suggest additions to the test to expand the use cases
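For reviewers who prefer a script to a side-by-side view, the rdf:about point above can be sketched in Python with the standard library. The sample RDF/XML, the example.org IRIs, the property element names, and the "wor" suffix are invented for illustration; only the "man" suffix for manifestations comes from the notes. The sketch assumes the output uses rdf:Description elements.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical output fragment: IRIs and property names are placeholders.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/prop/">
  <rdf:Description rdf:about="http://example.org/rec1man">
    <ex:titleProper>Example title</ex:titleProper>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/rec1wor">
    <ex:preferredTitle>Example title</ex:preferredTitle>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/rec1man">
    <ex:extent>1 volume</ex:extent>
  </rdf:Description>
</rdf:RDF>"""


def statements_by_subject(rdf_xml: str) -> dict:
    """Group (property, value) pairs under the rdf:about IRI they describe."""
    root = ET.fromstring(rdf_xml)
    grouped = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            grouped.setdefault(about, []).append((child.tag, (child.text or "").strip()))
    return grouped


def is_manifestation(iri: str) -> bool:
    """Per the notes, an IRI ending in 'man' identifies a manifestation."""
    return iri.endswith("man")


grouped = statements_by_subject(SAMPLE)
manifestation_iris = [iri for iri in grouped if is_manifestation(iri)]
```

Grouping by rdf:about gathers everything said about one entity in one place, even when the statements are spread over several description blocks in the file.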
Action Items
- Crystal will change Transform timeline to July 15
- Crystal will start setting out work for deliverables
- Crystal will remove old videos and back them up
- Crystal will update spreadsheets of 9 issues waiting on attributes table and check links from the ones already done, and move them to "ready for transformation"
- Transform topics tomorrow:
- How is onboarding going for Matthew, Sarah, and Abhignya?
- One Big MARC Record, requirements, effort, and timeline
- Processing for reviewing coder output, using a label, noting the ask and code, reviewers assignment
Backburner
May 28, 2025
See time zone conversion Meeting norms Present: Crystal, Sara H., Cypress, Deborah, Doreen, Adam, Laura, Abhignya, Junghae, Sarah C., Jian, Tynan, Sita, Sofia Absent: Ebe Time: Sara Notes: Doreen
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- LD4 Conference? Deadline May 30 to submit. Conference end of July.
- Document created.
- Crystal: something ready by Thursday EOD. Laura is on vacation but available Thursday and Friday if needed.
- Crystal added Poll decision re: Agent IRIs to the Decisions Index in II.C.1.a.v. How to mint the IRI. It would be awesome if folks would check to make sure everything came over that was expected.
- SWIB:
- Extended call for proposals; new deadline: June 11, 2025.
- Consider presenting there too — note that LD4 and SWIB are largely distinct communities.
- Crystal and Deborah worked on the mapping for 336
- It's the end of May: can we finish review and coding by the end of June?
- Avoid delays into September; Phase II should start on time.
- Coding review can extend into July, but initial coding must be complete.
- Coders currently have 53 tasks left; feasible with available coders.
- For transformation meeting tomorrow, talk about this with coders.
336 Mapping: Pattern Matching Table and Review (30)
- Deborah developed a pattern-matching table and mapping spreadsheet.
- 336 Issue
- Mapping:
- All values for 336 are to be mapped using RDA ContentType IRIs — not as literal terms or MARC codes.
- If the IRI is an http and not from LC, include it in the "not LC" bucket (this covers RDA Registry IRIs).
- Do not map:
- "other": it has no semantic value since "not other" is undefined.
- "unspecified": adds no meaningful information.
— adds no meaningful information.
- Codes vs. Terms
- In minted IRIs: Should use RDA curies (e.g., rdaco:1001) to support multilingual alignment and de-duplication.
- Access Points: Will retain human-readable terms for now to support review and debugging.
- Language
- Same mapping logic applies as with 336:
- Use language codes in IRIs (e.g., eng for English).
- Use terms in APs (e.g., "English").
- Subfield l may not always match the 008/041 codes (especially in parallel aggregates).
- 041 can contain multiple language codes; 008 supports only one.
- Technical Implications
- Current access point and IRI functions share logic and must be separated.
- Coders/Cypress will try to:
- Rebuild the IRI generation function to accommodate code-based values.
- Implement or integrate a local lookup table to map codes ↔ terms for languages.
- Avoid reliance on external APIs; all lookups must be local.
- Implementation Notes
- Old transform logic from Theo (commented-out code) should be deleted.
- For Matthew tomorrow during transformation meeting: additional mapping tables for 337 and 338 should be based on the same structure.
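The codes-versus-terms split described above (codes in minted IRIs, terms in access points, all lookups local) could be sketched as follows. This is a minimal illustration, not the project's function: the table holds only three MARC language codes, and the function names are hypothetical. The check for "other" and "unspecified" reflects the do-not-map rule for 336 values.

```python
# Hypothetical local lookup table: MARC language code -> English term.
# Only three entries are shown; a real table would cover the full code list.
LANGUAGE_TERMS = {
    "eng": "English",
    "fre": "French",
    "ger": "German",
}
# Reverse direction (term -> code), built from the same local table.
TERMS_TO_CODES = {term: code for code, term in LANGUAGE_TERMS.items()}

# 336 values the notes say should not be mapped.
UNMAPPED_VALUES = {"other", "unspecified"}


def code_for_iri(value: str) -> str:
    """Normalize a language code or term to the code used when minting an IRI."""
    if value in LANGUAGE_TERMS:
        return value
    if value in TERMS_TO_CODES:
        return TERMS_TO_CODES[value]
    raise KeyError(f"no local mapping for {value!r}")


def term_for_access_point(value: str) -> str:
    """Return the human-readable term used in an access point."""
    return LANGUAGE_TERMS[code_for_iri(value)]


def is_mappable_content_type(term: str) -> bool:
    """Skip 'other' and 'unspecified', which carry no usable meaning."""
    return term.strip().lower() not in UNMAPPED_VALUES
```

Keeping one table and deriving the reverse direction from it avoids the two directions drifting apart, and needs no external API call.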
Transformation Code Review (40)
- 3XX for review
- 3XX for review Discussion - one discussion with all tags shared
- Reviewed through 340 last week--do remaining tags
- 344 Review: Looks good.
- 345: Not yet coded.
- 346: Reviewed using RIMMF
- Assign asynchronous review individually or in small groups?
- 3XX Review Process
- Current Method (field-by-field) is time-consuming and inefficient.
- Pending Decision: Shift to holistic record-level reviews:
- Select a diverse group of MARC records.
- Include a "one record to rule them all" that touches most transformations.
- Suggestion: Add coders’ test data to facilitate more comprehensive review. Limited usefulness?
- Suggestion: Combine real records and test outputs?
- Suggestion: Use more carefully selected MARC records since not everything has been coded?
- Needs to be discussed with coders tomorrow during transformation meeting.
Wrap-up (5)
Action Items
- General:
- Prepare LD4 proposal draft (Crystal, by Thursday EOD)
- Review Agent IRI entry in Decisions Index (All)
- For Transformation meeting tomorrow:
- Discuss whether we can complete Phase I coding by end of June
- Ask Matthew to review if the 336 mapping logic can be replicated for 337 and 338
- Discuss switching from field-by-field review to record-level review; coder/reviewer workflow.
Backburner
May 21, 2025
See time zone conversion Meeting norms Present: Crystal, Sara H., Cypress, Ebe, Deborah, Doreen, Adam, Laura, Abhignya, Junghae, Sarah C., Jian Absent: Matthew, Sita, Gordon, Sofia, Tynan Time: Ebe Notes: Sara H.
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Welcome Abhignya!
- Sarah and Matthew are coding away
- Crystal reviewed most of the classification tags, but there's an unresolved question about 070. Ebe has one tag left to review, Jian has two.
- Cypress will join the transform meeting tomorrow
- LD4 Conference? Deadline May 30 to submit. Conference end of July.
- Document created.
- Email Crystal to join.
- Crystal will add Poll decision re: Agent IRIs to the Decisions Index today. She has been working on getting issues added and workflow lined up for the transformation review. Documentation on that also coming today.
Transformation Code Review
- 3XX for review
- 3XX for review Discussion - one discussion with all tags shared
- 336
- Cypress: 336 has not been coded. It was originally coded by Theo or Zhuo but we switched approaches and 336 was put on hold for more discussion on how it would be mapped.
- Worked on manifestations and would come back to expressions, but didn't come back?
- Marked as code re-check needed and removed from today's review
- We want to map as many values as possible to RDA value vocabularies.
- 337
- If the subfield is there, both 'has media type' and 'has media type value applies to' whatever is in subfield 3
- Capture both media type and carrier type as didn't want to assume they were the same
- Looked at Adventure time example and 2nd carrier type going to doi, probably because of an 008 and being minted to a vocab from UW.
- For the 008, a lot mapped to RDA when could confirm, but some that couldn't went to doi and UW vocabulary. This will be captured in 008 mapping sheet.
- As with 33X, we want to map as many values as possible to RDA value vocabularies with the 008 as well.
- Is there mediatype from a different vocabulary? Not in these examples. Most will be from id.loc.gov since that's preferred by PPC for the field
- Liked the way it was transformed, and want to map as many values as possible to RDA vocabularies.
- 338
- Carrier type comes from both $a and $b, so mapped both. In the transformation they will be deduped to one triple. We did this so wouldn't miss anything
- Online doesn't have an equivalent in doi? Could have been Online Resource in RDA
- 008 still in Ready for transform status. Sita completed a 2nd review.
- Was anything changed? If not, then can be marked as done. Otherwise, will need code re-check.
- Ebe recalled long 008 conversations that they weren't entirely equivalent, and this was the best route for the time being.
- Deborah noted carrier type and content type are very important to be available in record access point qualifiers and discovery systems will need them for filtering. May have unintended merging otherwise.
- Cypress noted qualifiers come from 336, 337, 338 right now. We will need an 008 mapping at some point, and that will take time to code
- We want to map as many values as possible to RDA value vocabularies
- Besides fixed values, will need to hunt for other places this could be derived. This will be more challenging and difficult for a student unless already deeply familiar with MARC.
- Sofia and Gordon did some work on mapping. Group made a similar decision on 03-05-2025 to look at this again at some point. See issue for 336
- Created a new issue from a previous discussion - Deborah and Ebe will evaluate
- Cypress noted this will be a large coding sub-project
- One MARC record to rule them all
- It's difficult to find records that represent every possible iteration of MARC. Time consuming to create.
- Sara proposed considering using GenAI to generate test MARC records that meet all the parameters we're trying to hit
- Crystal suggested this could be a presentation on appropriate uses of AI for metadata: an aid for technical work rather than intellectual work (e.g., subject analysis and understanding a painting)
- Deborah noted will need multiple versions for different format. Deborah and Laura can help guide Sara on requirements
- New issue created: One MARC Record to Rule Them All, per format
- Transformation Review Goal
- Essentially, we're trying to answer, "does it look correct?" If the output is what we expect, even if we're not hitting every possibility.
- If/when a very "complete" record becomes available, we can add that to the review.
- 340
- $a and $c use the same vocab; in RDA they map to the same property: all just material
- Group discussed the loss of specificity, challenges if someone ever wanted to convert back to MARC, and whether a note would be useful.
- Ebe recalled that when the vocabularies were pulled together, that the specificity of MARC was taken out, and they all went into one VES. This was intentionally done.
- She also noted that the idea was this is legacy data, let's put it in RDA/RDF; moving forward, being compliant with new content standard and saying the same things semantically; as a result, old data will unfortunately be left behind
- Adam agreed this was a persuasive argument and it's not so important to hold on to what's lost
- Laura noted that there are communities that do care about this kind of loss, with Getty and AAT as an example - losing granularity in properties, not values
- Unfortunately, this project can't bridge those losses for everyone, and they'll need to be taken up with RSC if needed.
- We'll leave the loss as is for now, and can revisit later if needed.
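The 338 note above, that carrier type is mapped from both $a and $b and then deduplicated to one triple, can be sketched like this. The lookup tables, the property IRI, and the carrier IRI are placeholders invented for illustration; only the pairing of the term "online resource" with the code "cr" reflects MARC 338 practice. The sketch relies on a set to collapse identical triples.

```python
# Hypothetical lookup tables: the 338 $a term and the $b code resolve to the
# same carrier type IRI. The IRIs below are placeholders, not real ones.
CARRIER_BY_TERM = {"online resource": "http://example.org/carrier/online-resource"}
CARRIER_BY_CODE = {"cr": "http://example.org/carrier/online-resource"}
HAS_CARRIER_TYPE = "http://example.org/prop/hasCarrierType"


def carrier_triples(manifestation_iri: str, subfields: list) -> set:
    """Map $a and $b of one 338 field; identical triples collapse in the set.

    subfields is a list of (code, value) pairs.
    """
    triples = set()
    for code, value in subfields:
        if code == "a":
            carrier = CARRIER_BY_TERM.get(value.strip().lower())
        elif code == "b":
            carrier = CARRIER_BY_CODE.get(value.strip())
        else:
            continue  # other subfields (e.g. $2) are ignored in this sketch
        if carrier is not None:
            triples.add((manifestation_iri, HAS_CARRIER_TYPE, carrier))
    return triples


field_338 = [("a", "online resource"), ("b", "cr"), ("2", "rdacarrier")]
result = carrier_triples("http://example.org/rec1man", field_338)
```

Mapping both subfields means a field with only $a or only $b still produces a triple, while a field with both produces just one.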
Wrap-up (5)
- It's best if the group can work on asynchronous review before next week so we can move on to the next review
Action Items
- Everyone: asynchronous review of 3XX for review
Backburner
May 14, 2025
See time zone conversion Meeting norms Present: Crystal, Sara H., Sita, Deborah, Doreen, Junghae, Cypress, Jian, Sofia, Laura, Sarah C., Ebe, Tynan, Adam Absent: Gordon, Matthew Time: Ebe Notes: Sara H.
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Welcome back Cypress!!
- Crystal onboarded Sarah yesterday
- Sarah and Matthew started taking tags to code
- OCLC DLC Records coming soon
- Hoping for more non-book items. Expecting by end of May
- LD4 Conference? Deadline May 30 to submit. Conference end of July.
- Laura could help with some preparation.
- Ebe can help with materials preparation, won't likely be able to attend.
- Look at what Crystal/Sofia presented at IFLA Symposium.
- Plus a little looking ahead to Phase II. Recent issues overcome.
- Crystal will email Laura and Ebe
- Transformation meeting is tomorrow
- Cypress will begin attending next week
- Doreen and Sara graduate in about a month
Headings for Works and Expressions as Subjects (30)
- Attributes Table Discussion
- Revisit discussion from last week, wrap into a decision we can implement
- Loose ends?
- Laura found something with Agent she'd like to note. If it's simple, Deborah suggests adding as a comment in the table.
- Team reviewed slides from Deborah:
- Left side - get subject
- Middle - get entity for work and for person; also subject heading relationship with work; example work is aggregating
- Right side - get subject and subject for the whole string
- Question of whether to link Shakespeare to all his works whether about him; only get to him in an index
- Deborah: Have to find your way to an entity and then to related entities
- Laura: Create concepts with SKOS, and SKOS doesn't have the complexity to break out agent with concept. Phase II investigate going beyond SKOS?
- Adam: What's the relationship between pink and blue?
- Laura: Subject is RDA entity, but concept is just an IRI. Trying not to impute subject relationship a cataloger hasn't assigned or relationship of person to the concept we've created for the subject heading; doesn't seem like a relationship we can apply RDA relationship to
- Sofia: Related person of work is ok but there are some small problems in the graph
- Crystal: Want to relate a person to concept; is there some component part in SKOS? Agent isn't a subject, but concept could be a subject; Subject linked to Agent, then subjects that start with the Agent and have subsequent parts could be linked to each other
- Deborah: 'Has related person of work' rather than 'has subject', but only can link from the work; change coding from curie for subject to related person
- Laura: seems like we want something to happen for end users but can't get there with the tools that we have in RDA right now, without changing RDA (which we can't do) or coming up with something more complex than SKOS concepts or a 3rd undetermined path. Hesitant, as it seems like trying to shoehorn something in a way that it wasn't intended, which then has impacts down the line.
- Sofia: agrees with Deborah in order not to have loss, likes related person, RDA/LRM if had entity we could solve this with Nomen, but don't have this available. Related person of work, understands what suggest, this way would find relationship to agent without using subject
- Problems with subjects in graph. "S. Will - Characters - Ophelia" isn't the subject of The afterlife of Ophelia
- Decision: Do as related person of work, mark for community conversations, and mark as loss, since loss of specificity. Talk with SAC community about subjects and related person of subjects
Transformation Code Review (35)
- 600 for review: wrap up
- We decided that using APs or AAPs in IRI formations for Agents was part of our initial decision. Perhaps wasn't implemented
- Choice made was to mint meaningful IRIs when have source provided, but normally don't since it's from NAF, so have unique meaningless IRIs
- Deborah Fritz from ORCID + Fritz, Deborah from NAF: ideally one person with two nomens but right now includes the source, so it's two entities
- Did this to prevent accidental merges
- Does the fact that we're not accepting NACO authority file and treatment of non-human personas, is that the main rationale for why minting IRIs with APs/AAPs instead of using IRIs from NACO authority? Yes
- Unless uncontrolled name there should be a source (unless ind2=4, there will be a source)
- Ind0 + 600, assumption go to authority file, but many names aren't established, so no authority anywhere, just in bib record. Coded as in accordance with rules, but there's no source for them. Can't know until go to NAF and see if the string is there. Plus, there are undifferentiated names.
- Approved Sources list in the code
- Do we treat works like decided to treat agent? So, if work has subject subdivisions, going to make concept for whole string, will also breakout the work as subject of the work? Decided would not do agent is the subject of the work, instead use related agent of work. Deborah suggests doing same thing for the work - related work of work.
- Poll
- Decision:
- We are still going to make SKOS concept for the whole string. We will break out the agent and do as related agent to work. Apply same thing to the work that has 6XXs. What about agent portion? That gets attached to the work in the 600, not the work being described.
- Rather than making new subject statements for Work and Expressions portions and X, Y, or Z, we're going to do the same thing as Agents and make them related
- Crystal will add to the Decisions Index
- Plan to revisit in Phase II regardless
- This needs coding changes
- Added Code re-check label
- Keep as Transform review in progress - will want to look at as a group to confirm output is as expected
- 3XX for review Discussion - one discussion with all tags shared
Wrap-up (5)
Action Items
- Crystal will add Poll decision re: Agent IRIs to the Decisions Index
- Crystal will email Laura and Ebe re: LD4
- Everyone: asynchronous review of 3XX for review
Backburner
May 7, 2025
See time zone conversion Meeting norms Present: Doreen, Sara H., Junghae, Sarah C., Sita, Adam, Matthew, Jian, Sofia, Crystal, Laura, Ebe, Tynan Absent: Deborah, Gordon Time: Junghae Notes: Sara
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- We need to get moving on our review
- Welcome Sarah Collins! Sarah is a new student coder that started this week with a few hours, and will begin a normal schedule next week.
- Crystal needs brief bios for Sarah and Matthew for the project roster
- LD4 Conference announced CFP, due May 30. Would be a good fit for the project and is online so avoids current institution travel restrictions.
Headings Attributes Table Update (30)
- Deborah has replaced the entire Headings attributes for WE tab with one called “Headings for WE". This is sufficiently complete and is ready for review by the group and then coding.
- Discuss changes for review
- Links:
- The primary update involves removing repeated subfield codes from each individual cell in a row and instead placing them in a dedicated "subfield" column
- This change reduces complexity, particularly when dealing with titles and access points (APs), where the codes can become lengthy and error-prone
- The new structure aligns with existing XSLT coding practices and has been implemented as a function
- A function table has been added at the end of the main table to support this logic
- Rows in the table now share the same conditions for clarity and consistency
- The table has been significantly expanded to accommodate slight variations and edge cases
- It’s crucial to verify that the subfield codes listed are correct
- If a code is valid, it will be properly handled
- If a code is used incorrectly, it will result in an error due to cataloging practice, not the function logic
- Some MARC records (including those from LC) have been found using subfield $t in the 100 field
- Decision: Add 100, 110, and 111 fields with title subfields (e.g., $t) to the table in a separate set of instructions related to titles and authorized access points--DF: Same logic as for W: has AP|AAP for work for 700, 710, 711 fields; so, added in that row
- Deborah will investigate examples of 130 fields for review--See: attached file at: Attributes table
- Deborah confirmed with Cypress that multiple modes can be strung together (rather than having to repeatedly copy/paste)
- Legend for color coding has been added
6XX
- Lengthy discussion on the 6xx and whether agents have a subject or related relationship and whether the relationship is to the work being described or a work identified in 6xx
- Prior poll
- Questions
- Responses
- Would rely on FAST to break apart, and transform FAST headings as subjects
- Make separate entity for the person if doesn't already have, related to the work if there is one in 600 AP, but not relate the work to the person being described
- understood xyz doesn't create separate subject for each but keep all together. But for person can create subject person related to the subject work
- Last week discussed if a person is broken up, the source should be the name authority file. The idea of breaking it up at all wasn't addressed
- For 600s, don't think there's any place to indicate that NACO authority file is the authorizing source. It has a second indicator 0 that indicates that it's in the authority file. Implied it's in NACO/SACO
- Sofia: If we have a 600 with a $t, I would not create a subject person relationship with the name portion.
- Related to the work that is related to the work being described, so related to the subject in the main heading (person not related to work in the main record, but to the one in the subject heading field)
- agree to create a person related to the work (600$t) but not to relate as a subject the person with the work described in the record.
- Voted on
- Break out agents from 6xx fields and make "subject" relationship to main work - 2 votes
- Break out agents from 6XX fields and make "related" relationship to main work - 0 votes
- Do not break out agents from 6XX fields unless they are WE headings - 6 votes
- Options 1 and 2 allow for agents to be related to WE entities produced by 6XX WE headings they are part of
- When have agent with xyz at the highest level you still have a subject relationship to the person, in addition the more complex subject
- Decision: Take an iterative approach. Option 3 is the best for now. In later phases can map the person as a subject, taking into consideration community feedback
- Adam shared examples of sub-divisions:
Transformation Code Review & Workflow Finalization (45) - ran out of time
- 600 for review
- Does the GitHub workflow suit our needs? Did anyone test it out and have thoughts?
- 610 & 611 for review (Crystal will do these today)
Wrap-up (5)
- Put WE breaking out from subject/concepts on the agenda for next week
Action Items
- Crystal: Send email with new meeting link to make sure everyone got it
- Deborah will investigate examples of 130 fields for review--See: attached file at: Attributes table
- Everyone: 600 for review
- Everyone: send any feedback on the GitHub workflow for review to Crystal today
Backburner
April 30, 2025
See time zone conversion Meeting norms Present: Absent: Sofia, Ebe Time: Notes:
Water Cooler/Agenda Review/Roles for Meeting (15)
Updates (5)
- Cypress is returning to the project to help us with coding for a few hours per week! Thank you Cypress, and welcome back!
- Matthew will provide a few hours per week of coding support for the project as well. Thank you Matthew, and welcome aboard!
- Crystal has hired two iSchool students, Sarah Collins and Abhignya Rajapu, who will start in May once their paperwork is cleared. They will be able to train under Doreen and Sara before they leave.
- Sofia needs to hand off her classification review tags due to unforeseen circumstances. Unless someone else can volunteer, Crystal can review them.
- Crystal is working on putting a standing transformation team meeting on the calendar for May/June
Continue Reviewing 600 (35)
- Fill out spreadsheet for it--will the spreadsheet workflow work for the remainder of the tags?
- Do we need to build out something else for recording and following through with issues surfaced?
- Crystal has a link to the issues spreadsheet
- The spreadsheet also contains links to the github issues
- Should we put feedback into the issue itself?
- Feedback:
- Questions about the decisions made by the group
- Coding questions: issues that only pertain to the coding
- Maybe the tabs could be used by the coders to monitor the feedback coming in
- How do we tell the reviewers what needs to be done – github is better at notifying people of needed changes
- We might need to add a workflow stage to the project
- Do we need fresh issues to deal with the review of coding?
- Decision: we create a new github issue for this review process because the old issues are over-saturated with the prior mapping discussions/assignments
- We need to create issues so that we can do the project management aspect of this, i.e. assigning people, marking statuses
- Crystal added new columns to the project workflow called “Transformation review to-do” and “Transformation review in progress”
- Here, we can discuss issues with the transform that need to be fixed by the coding team
- Then we created a new issue “600 Transformation Review”
- In the issue Crystal linked to the old issue discussing the mapping/transformation as well as the mapping spreadsheet and the attributes spreadsheet
- Here’s how the issue looks: https://github.com/uwlib-cams/MARC2RDA/issues/487
- Recoding gets signaled in the NEW issue
- Test this out on the 600 review:
- We can try this with asynchronous review
- We will do some test records for another field next week using the same process
Attributes Table Workflow & Approval (30)
- Has to be a workflow in which the coders can take the attributes table and code directly from that, rather than going back through the corpus of mapping spreadsheets and finding changes
- How do we assess who is assigned to review?
- How do we get to, “this has been approved”
- The table is not finished, but there is consensus – it is being coded as it is being finished
- We won’t review mapping tables against the attributes table
- The mapping tables won’t get notes that say where to refer to it
- Mapping table true where attributes table doesn’t apply
- Attributes table will be one of our deliverables
- It’s ok for us to review as Deborah still works on the table
- There is time in the timeline to review the spreadsheets
- We should figure out how to get google drive updates when the attributes table changes so that we can know when there is new material to review
Wrap-up (5)
Action Items
- We can review the attributes table asynchronously
- 600 transformation review
Backburner
April 23, 2025
See time zone conversion Meeting norms Present: Jian, Sara, Deborah, Laura, Doreen, Crystal, Matthew, Ebe, Sita Absent: Junghae, Gordon, Tynan, Sofia Time: Ebe Notes: Sara
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Crystal is interviewing students for the rest of this week. Hoping they can start next week
- Welcome guest Matthew Hill from British Library, considering providing coding support for project
- Crystal neglected to assign a reviewer for 008 (a field we reviewed together but was initially mapped by Sita, which we need to review again to make sure we incorporate MARC vocabularies we published and to make sure we were consistent with the enormous mapping incorporating a lot of formats). Who will volunteer to do a cohesiveness-check?
- Sita will review 008
Transformation Testing (20)
- How to best accomplish systematic review?
- Crystal made an attempt at the spreadsheet
- Once we figure out what we want, Crystal can finalize the workflow, create a page on it, and walk Adam and Deborah through the steps of getting records and pushing/pulling in GitHub.
- Serializations for the weekly review?
- Deborah requested N-Triples
- Laura shared converter to serialize: https://www.easyrdf.org/converter
- Enter MARC field. Duplicate the row and specify the patterns/conditions we want to see in each row
- Coded-Phase I should reflect whatever the current coding label is
- Date Coded will need to be updated if/whenever recoded
- Input for review will be MARC XML of ~10 examples. File is placed in marcDatasets folder and link entered in this column
- Output for review will have the link in outputDataForReview folder and link entered in this column by coders
- Notes column to enter if find something unexpected
- Upcoming meeting with Jeff Mixter, OCLC. Ask if can still provide a record with every MARC tag.
- If not available, consider creating one. Would likely take a day.
- Crystal will put in MARC tags, and then as a group can put in all patterns
- Will only put in ones that have been coded
Transformation Review (45)
- 600 review
- 2nd Record - John of Austria
- Scheme of nomen is LCNAF, not LCSH as linked
- Line 197
- Scheme of nomen for 600: can it be LCSH when person is not subdivided by subject?
- Review policy for person scheme of nomen, not a concept (breaking up person from the concept)
- Subfields in Line 236
- Line 217
- Thought were minting identifiers for agent differently? Thought were deduping for all entities? Need to confirm what decided.
- 3rd Record - Thomas Edison
- Line 362-371
- Different scheme, different nomen even if same string
- Nomen is an entity, not a string - but should be about same entity
- Thomas Edison is an RDA person
- Thomas Edison is in LCNAF
- Thomas Edison is in CYAC
- Even though identical, identified in two separate schemes
- Line 377-392 re: deduplication
- Line 415
- Decided not to pull out Geographic or Form sub-divisions; did do Genre
- Z & X would cause confusion, and were not safe to pull apart
- Gordon wanted to pull apart, and some wanted to keep together, and that's what vote went with
- If review surfaces that a decision needs to be made, we'll do that
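One way to picture the Thomas Edison point above, as a minimal sketch and not the project's actual data model: a nomen is keyed by its string together with its scheme, so the same heading in LCNAF and in CYAC yields two nomens that both name one person. The class names `Nomen` and `Person`, their field names, and the heading string are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nomen:
    """A nomen is identified by its string together with its scheme, not by the string alone."""
    nomen_string: str
    scheme: str


@dataclass(frozen=True)
class Person:
    """Hypothetical stand-in for an RDA person entity."""
    label: str


edison = Person("Thomas Edison")
heading = "Edison, Thomas A. (Thomas Alva), 1847-1931"  # illustrative string

# Same string, two schemes: two distinct nomens, both appellations of one person.
appellations = {
    Nomen(heading, "LCNAF"): edison,
    Nomen(heading, "CYAC"): edison,
}

print(len(appellations))                # number of distinct nomens
print(len(set(appellations.values())))  # number of distinct persons
```

Under this sketch, deduplicating agents and deduplicating nomens are separate questions: the person collapses to one entity while the nomens stay distinct per scheme.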
Wrap-up (5)
Action items
- Sita will review 008
- Revisit decision of minting IRIs for agent
- Crystal - add example in the spreadsheet
- All - asynchronous review of 600s
Backburner
April 16, 2025
See time zone conversion Meeting norms Present: Crystal, Adam, Jian, Tynan, Doreen, Sita, Deborah, Sara, Ebe, Laura Absent: Junghae, Sofia, Gordon Time: Tynan Notes: Sara
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Transformation Deadline update: End of May -- work has slowed without Cypress or a replacement for her
- Review needs to happen simultaneously and then in June
- Crystal reviewing resumes and starting interview process for two new students today
- Matthew Hill at The British Library may be joining the project. Could have time to offer support and setting up patterns for coding that the coders take forward. Deborah hopes to introduce him next week.
- NARDAC meeting recording for the Spring 2025 Update with presentation on NLNZ's implementation of Official RDA. Deborah encourages everyone to view; it was very encouraging!
- Laura shared that the Ex Libris Users of North America (ELUNA) 2025 Conference is scheduled for June 16-20 in Atlanta. The Linked Open Data Community of Practice Working Group (of which Laura and Ebe are members) will be co-moderating a session organized by Ex Libris. Questions Laura submitted:
- Conversion from MARC to RDA/RDF leaves out serials/aggregating works; how do they envision providing a conversion process, even if partial
- UW created templates for entry for RDA in Sinopia, but they haven't been updated recently. What kind of assistance would be needed to create those tables?
Field 357 Question (10)
- Was originally mapped by Theo and reviewed by Sita
- Question on how to format transform output for repeating fields (e.g., subfield c)
- Mapping is using labels based on the subfield code
- Decision: use semi-colon space ("; ") to separate repeating fields, e.g., <rdamd:P30137>The originating agency, ITAC, denotes their control of dissemination using the term ORCON to the authorized recipients, CIA; DIA; UKIA</rdamd:P30137>
- This avoids ambiguity, since subfield values may themselves include commas (,)
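A minimal sketch of the "; " decision, written in Python for illustration only (the project's transform itself is XSLT). The sample 357 field is made up from the example in the notes, and the function name `join_repeating_subfield` is an assumption, not something in the project code.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical 357 field with a repeating $c, modeled on the example in the notes.
SAMPLE_FIELD = """
<marc:datafield xmlns:marc="http://www.loc.gov/MARC21/slim" tag="357" ind1=" " ind2=" ">
  <marc:subfield code="a">ORCON</marc:subfield>
  <marc:subfield code="b">ITAC</marc:subfield>
  <marc:subfield code="c">CIA</marc:subfield>
  <marc:subfield code="c">DIA</marc:subfield>
  <marc:subfield code="c">UKIA</marc:subfield>
</marc:datafield>
"""


def join_repeating_subfield(field, code, separator="; "):
    """Return all values of one subfield code, in record order, joined by the separator."""
    values = [
        (sf.text or "").strip()
        for sf in field.findall("marc:subfield", MARC_NS)
        if sf.get("code") == code
    ]
    # Skip empty subfields so the output never contains a dangling separator.
    return separator.join(v for v in values if v)


field = ET.fromstring(SAMPLE_FIELD)
print(join_repeating_subfield(field, "c"))
```

Because the separator is a parameter, a field whose values could themselves contain semicolons could be given a different one without touching the rest of the logic.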
Transformation Testing (20)
- Dataset Selection & Review Planning
- Need to decide on methodology for selecting datasets.
- Once decided, begin processing and reviewing datasets.
- Assign someone to finalize the Transform Output Review spreadsheet and another to volunteer as the first to test.
- Sources of Records
- Depends on who is pulling the records.
- UW: Alma, authorized records, OCLC (permission to download 50k DLC-authorized random records from 2023–2025)
- Deborah has her own set
- Storage & Format
- Clarify where records will be stored and in what format.
- Past practice: Cypress stored them in “MARC Datasets” folder for field-by-field testing.
- Records should be transformed to XML:
- MARCEdit can be used.
- Alma exports XML directly.
- If not running a job, the last record published has an OAI wrapper around it, which can be dropped.
- OCLC requires transformation via MARCEdit.
- Record Batch Size
- Decision: approx. 10 records per batch/file.
- Prefer one file with 10 records rather than multiple files for efficiency.
- Use MARC collection wrapper for 10-record XML batches.
- Deborah exported records from LC, which go one-by-one.
- Review Process & Criteria
- What exactly are we testing for?
- How to organize the review (by field, concept, etc.)?
- Coding is organized by fields.
- Possibly group fields by pattern (e.g., descriptive, subjects, concepts).
- Start with fields like 600–630 or simpler ones like 245?
- Use a column to indicate patterns represented by a field.
- Avoid color-coding due to field overlap, that could be used to sort?
- Include only mapped and reviewed items.
- Laura: Difficult to review until coding is finalized.
- Go through every element and comment or review in groups (e.g., subjects, 008)?
- Avoid going through individual tags; too time-consuming.
- Goal: to view how complete records are transformed. General and field-specific review can happen simultaneously.
- OCLC used to provide a record with every MARC tag.
- If not available, consider creating one.
- May require a side meeting to discuss.
- Ownership, First Run, & Next Steps
- Crystal will take ownership of the process and next steps.
- Crystal will be the first to go through the process.
- She will walk the group through the GitHub-based workflow in a follow-up meeting.
- She will send coders a set of materials for review.
- Will include records Deborah shared last week.
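The batch-size decision above (about 10 records per file, inside a MARC collection wrapper) could be sketched as follows. This is an illustration under assumptions, not the team's workflow: the helper names `batch_records` and `make_stub_record` are hypothetical, and the stub records hold only a made-up 001.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_URI)


def batch_records(records, batch_size=10):
    """Group record elements into <marc:collection> wrappers of at most batch_size records."""
    collections = []
    for start in range(0, len(records), batch_size):
        collection = ET.Element(f"{{{MARC_URI}}}collection")
        collection.extend(records[start:start + batch_size])
        collections.append(collection)
    return collections


def make_stub_record(control_number):
    """Build a placeholder record with only an 001 (illustration data, not a real record)."""
    record = ET.Element(f"{{{MARC_URI}}}record")
    field = ET.SubElement(record, f"{{{MARC_URI}}}controlfield", {"tag": "001"})
    field.text = control_number
    return record


records = [make_stub_record(f"test{n:04d}") for n in range(23)]
batches = batch_records(records)
print([len(batch) for batch in batches])
```

Each returned element can be written out with `ET.ElementTree(batch).write(path, encoding="utf-8", xml_declaration=True)`, giving one file per batch of records.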
Review Check-In (15)
- LDR 7 - Ebe mapped, Laura reviewing. Deborah sent a table she had made. Laura had questions, will review this week.
- 007 - needs review - Jian will take
- 245 - was reviewed, and moved to Transform
- 751 - Crystal needs to look at
- 100 - unsure how table relates to Attributes table, on hold for that
- Mapping Spreadsheets & Attributes Table
- Do mapping spreadsheets need to be updated to conform to the attributes/heading tables?
- Yes, but doesn't need to happen as coding & review happen; it does need to happen as part of deliverables to make sure spreadsheets are accurate and/or point to documentation
- Reviewers: Identify which issues are on hold due to the attributes table; move to "Almost done - waiting for decision"; make a note in the issue
- 856 - Sita will take to review
- 710 - Laura reviewing
- 700, 111, 110 - Jian reviewing
- LDR 8 - Ebe reviewing, Laura has a question that needs answering. Believe it's marked as not mapped. Laura will double check the spreadsheet.
- Crystal will go through remaining "Review in Progress" issues separately and make sure it's clear who's responsible
- Moving issues to "Almost done - waiting for decision" will help the team visualize what needs discussion
Attributes table (25)
- Deborah oriented the group to the spreadsheet to help with understanding it and reviewing the table asynchronously
- Use term “Singleton expression” for a single-expression work.
- Manifestation: Leave as-is (do not use non-aggregating approach).
- Relationships: if can find in table use it, otherwise default is Manifestation
- Title of work: 130/240 have things added which aren't Preferred Title. But only using anp
- Work AAP is 100+240, not 240
- Double-check subfield coding for combined fields to prevent copy/paste errors.
- Follow if/else logic through subsequent table rows.
- Title of expression: same as Title of work; useful to put in transform, since used to build AP
- Subfield b Discussion
- Helps differentiate between generic titles in subfield $a.
- Decision: do not add a third title of expression with b to transform
- Expression AP/AAP
- Similar to Work; derived from the same plus Expression-specific elements
- 700/710/711 - Deborah will double check
- Date of work: sometimes year as parenthetical in title (historic, scientific, technical works)
- Deborah will pull some examples
- Laura: don't think need to pull dates from uniform title fields, even if in date fields
- Next steps:
- Everyone review table asynchronously
- Add feedback as comments directly in the spreadsheet where issues are seen.
Wrap-up (5)
Action items
- Crystal will be the first to go through Transform Output Review process.
- Crystal will walk the group through the GitHub-based workflow in a follow-up meeting.
- Crystal will send coders a set of materials for review, including records Deborah shared last week.
- Crystal, Ebe, Jian, Laura, Sita will continue "Review in Progress" issues
- Crystal will go through remaining "Review in Progress" issues separately and make sure it's clear who's responsible
- Deborah will double check use of 700/710/711 for Expression AP/AAP
- Deborah will pull examples of titles with dates for “Date of Work” review.
- Everyone review Attributes Table asynchronously and add feedback
Backburner
- Side meeting to discuss creating a record that includes all MARC tags
April 2, 2025
See time zone conversion Meeting norms Present: Absent: Time: Notes:
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- OCLC says we can export 50K records from WorldCat which should fix our DLC record shortage :)
- Can SZ and EK (potentially SB?) send 50-100 records to DF with $0/$1 and/or $2 in headings fields? So our testing pool can be rounded out?
- Crystal is posting two new student positions--they should be up and accepting applications any day now
- A new draft version of the Heading Fields Attribute Mapping has been added to the Google sheet, and an explanation is provided in the Attributes table #471 page here.
- The next concurrent steps are to:
- find out whether the coders can code using this format
- Some examples of how it might be done are already available in the coding
- review the content of the table looking for errors, e.g.:
- $f instead of $n in a copied instruction
- missing fields or subfields
- comments on decisions made
Student Presentation: Transformation Walk-Through
Wrap-up (5)
Action items
Backburner
March 26, 2025
See time zone conversion Meeting norms Present: Ebe, Crystal, Jian, Sita, Adam, Deborah, Laura Absent: Sofia, Junghae, Sara, Doreen, Tynan Time: Ebe Notes: Jian
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- OCLC said no to record reuse at the scale we requested, which is 500k records. Need to figure out how to get it from LC. Probably will need to download from LC directly.
- LD4 conference dates were announced for summer: we should make a proposal
- IFLA presentation went great. Slides will be available on the IFLA site. Not clear if the presentation was recorded or not. Can do a similar presentation for LD4 with updates for transformation progress.
- Got a lot of interest and questions. There are people using RDA/RDF in European libraries. People approached afterward expressing how impressed they were. Crystal reached out to ask if anyone would like to join the project. Will wait and see.
- Crystal is posting two new student positions next week. Students: tell your friends. Prioritizing XSLT skills.
Transform Test Datasets for Fridays (15)
- Deborah created a feedback template/form
- Request template
- Have not yet decided on a request template
- Deborah suggested creating a spreadsheet for all of the MARC fields with the coding status, such as the coded date, reviewed date, etc.
- Crystal started a Google Sheet named Transform Output Review in project shared drive and linked from Test Datasets Discussion
- Who will assemble input?
- Crystal, Adam, and Deborah agree to assemble DLC/WAU records alternately
- Will complete the Google Sheet for Transform Output Review first and then decide about the input records
- Output location
- Students will figure out
- Dataset sizes
- 10 records each week (not hard and fast rule--will adjust depending on what we need to demonstrate)
- Start with fields that are already coded
- More discussion needed--will revisit this topic next week
Downloading records from LC (15)
- Which records? What does the dataset need to look like?
- How do we go about downloading the set?
- How to download from the website is tricky. For example, how to search if you want records with 100s with different indicators?
- Maybe start with searching in OCLC for a list and then download them from a different source?
- Where will we store it (and other datasets) prior to uploading to Dryad?
- Deborah has (limited) storage for this. More than UW
- Questions about the estimated output size were unanswered. We really don't know how big our output dataset will be in the end.
Mapping Review Check-In (25)
- Linking fields: we decided to use amended Toolkit labels from Deborah's chart. Reviewer should also update according to that decision, if this hasn't already been done. Update?
- Assigning review for "awaiting review": remaining tags assigned to group members
- Review assignments for "review in progress"
- Revisit deadline (end of month)
Uniform titles/Attributes table check-in (15)
- 130/240
- Deborah updated the attribute table, including attributes for 130/240
- Has name of person vs. has preferred name of person for 100/600/700
- Subfield c is part of the preferred name not just subfield a, therefore, using subfield a only would not be accurate as a value of preferred name of person. We need to use name of person
- For corporate bodies, has name of corporate body is better than has preferred name of corporate body because subfield a could contain a parenthetical qualifier that is not part of a preferred name
- Same with uniform titles 130/240. Uniform titles may also contain supplied qualifiers
- Deborah noticed $0 and $1 have not been mapped consistently. The decisions index is not very clear. More instructions are needed for different types of work.
- The attribute table is still missing a lot of things
Wrap-up (5)
Action items
- A separate meeting is needed for more discussion on the attributes table topic (Crystal scheduled for 1pm Pacific Daylight time Thursday. If you want an invitation but didn't get one please email Crystal ASAP)
- Next week students will do a walk-through of the transformation for the team
Backburner
- What is a ballpark estimate of the size of our output data for the initial transformation?
March 12, 2025
See time zone conversion Meeting norms Present: Crystal, Adam, Ebe, Sita, Tynan, Laura, Sara, Deborah, Junghae, Doreen Absent: Time: Sara Notes: Doreen
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Crystal still talking to OCLC about record reuse
- IFLA presentation next week!
- LD4 conference dates were announced for summer: we should make a proposal
- Crystal and Sofia have IFLA next week: meeting canceled
- Aggregates transformation code: Are any snippets ready for prime-time? We'd like to include some in our slides for IFLA. Slides are due today, so if not we will scrap the slide.
Uniform titles (20)
- 130, 240, series (830): Deborah - Assume single expression unless it's aggregating where date is treated at the work level.
- 6XX, 7XX, 8XX is handled.
- 130 & 240 are the problems. Preferred title creates AP but how to match AP in authority files. Sita did the mapping, and Laura is doing the mapping review. During the early stages, AP is not in consideration. Will work on it asynchronously.
- Attributes table needs more work. AP Mapping Table tells the field and combination.
Mapping Review Check-In (25)
- Linking fields: RDA Registry labels vs. MARC21 labels: Deborah's chart and review of those fields
- Decision will be made via poll or async discussion.
- Option: Using MARC21 Label or Print Constant Label or Column C in Deborah’s Table
- Option: Using column D RDA Registry Label in Deborah’s Table
- Option: Using column E in Deborah’s Table
- Note: PCC labels not an option because they are incomplete.
- Assigning review for "awaiting review"
- 7xx will be reviewed once we decide what to do with the labels and replace them.
- Review assignments for "review in progress"
- Any fields yet to be mapped in BSR?
- Revisit deadline (end of month)
Review "asynchronous discussion needed"/"meeting discussion needed" label use and go through tags with those labels (25)
- Once an async discussion is resolved, the async label should be removed. The only issues left with the "async" label are attributes and 240 uniform title.
- Go through discussion to see if async discussion is needed. If you put the label on, you should put questions you have in and tag people directly.
Wrap-up (5)
Action items:
- Crystal will create discussion/poll and we will vote on which label to use in Deborah's table for linking entry fields.
Backburner
March 5, 2025
See time zone conversion Meeting norms Present: Absent: Time: Notes:
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Crystal still talking to OCLC about record reuse
- IFLA presentation still in progress
- LD4 conference dates were announced for summer: we should make a proposal
Mapping Check-In (20)
- 045: the questions on the issue page are relatively new (from Jan)
- Requires a translation table that we don’t have
- Laura will do her best to make her own judgements
- 758: Gordon has given good advice on this, Ebe is nearly there, overcoming some hurdles putting everything in the spreadsheet
- Mapping spreadsheet X00: with these fields we have access points and attributes
- We need mapping spreadsheets that put the attributes together with the entities that show how we are mapping the X00’s in the transformation
- This is separate from the access points
- How is this different from the spreadsheets for the X00’s
- We are going to replace the old X00 spreadsheets; although some of them have been actively maintained
- E.g. Cypress and Penny maintained the 100
- Laura: suggests that we do individual tag spreadsheets for 600, 700 etc. because they are meaningfully different enough from each other
- Decision: it is ok to map these individually. Jian is reviewing the individual spreadsheets and making sure they have been mapped once. Then we will close this.
- 857: Ebe is continuing working on this – moving forward after feedback
- Updating mapping sheets for augmentation aggregates:
- Crystal and Deborah meeting about this, it’s a big task
- Compile list of abbreviations
- Sara is on this
- $7: Ebe is suggesting we postpone this to phase II
- Adam: we can use it, but unaware of anyone who has used it. PCC doesn’t have any policy/guidelines or training about it. Doubts LC has implemented it
- Suggests putting this off to phase 2
- Crystal uses it to determine open-access
- We haven’t dipped our toes into data provenance at all, so we need to leave this for phase II
- Attributes table
- Almost done, waiting for decision
- Sofia is working on this
- Almost done work:
- 7 bibliographic level: code ‘m’ mixes static and diachronic works
- Leader 7 is a mess, it does not translate cleanly into RDA
- We use it in order to do things with it
- Only thing we can do is put it in its category
- An integrating resource is a work
- As we do aggregating work, category of work is “collection work”
- Doesn’t need a subunit because we would show that it has a parent
- Adam: do we have to use RDA vocabulary? Maybe we map to the MARC vocabularies
- Laura: for 006 and 7, these are values that we are using to determine certain conditions; in MARC they are valuable
- If we ever have to map back to MARC we need these
- Adam: impossible to map back to MARC
- We could create Wikidata items, have a table that indicates the URIs
- Ebe is interested, but would like some guidance
- To-do category
- 400, 411: should be doable because we’ve already done the 490
RIMMF Output Data Review (the rest)
- For full overview, see Deborah and Sara’s documentation:
- Deborah working on a document to post if we want to use RIMMF for reviewing
- Install/update RIMMF
- Help → check for updates
- To install RIMMF:
- Go to the site (https://rimmf.com/w/doku.php?id=rimmf6:start), click download
- Run the .exe
- Import the file for review
- You can find review files here: https://github.com/uwlib-cams/MARC2RDA/tree/main/Working%20Documents/transformationCode/outputDataForReview
- Set up files to test the aspect that we are looking for – makes reviewing simpler
- Work with .nt file for RIMMF
- Download the file – make sure to click on the folder (left-hand side) rather than the commit information (center)
- Go to Tools → import entity records → make sure the External data button is checked
- Then drag and drop the file onto the interface
- Go to tools, load entity index
- Can sort indices by entity
- Start with the manifestation and work up
- Suggestion: filter and only look at manifestations in the index
- Sort alphabetically
- Click on a manifestation to take a look at it
- Comes in the same order as the triples
- Options → sort by element label
- Open MARC record so that you can compare
- “Manifestation described with metadata by” takes you to the metadata work, the MARC record is a Note on manifestation
- With the records side-by-side, you can compare the fields, which will depend on what has been mapped
- When reviewing, only looking for things that came over unexpectedly, not looking for cataloger error
- For example: do you need publication statement?
- 2 works and 1 expression is augmentation aggregate
- Normally go from manifestation to expression
- Not much there
- All we have in augmented work is appellation data
- We have link back to manifestation and link up to work
- This is the augmented work, not the aggregating work
- The identifier is the local part of the IRI
- Title of expression did not come over because it has not been mapped yet
- Related person of work is in here, but this is an error; we were not supposed to map any agents or related works to the primary work because we do not know whether they are actually related to the primary work
- We need to set up a review process where we can give the IRI or the access point (within RIMMF we can give the RIMMF identifier)
- That’s the best thing to put down as a header
- Go back to the manifestation and click on work
- Review: how to open RIMMF record
- Tools → entity index → double click
- File → close all records
- This closes the open-records, but leaves the program open for you to look at new records
- RIMMF will show any appellation, need to know the RIMMF id
- Cypress put mapping from access point table into the code – we have to edit this to make any changes
- Crystal: we should explore data we’d like to explore in RIMMF next time
- Let’s put the import instructions in GitHub
- Put the RIMMF instructions on the project WIKI
- Identify which kinds of datasets will be most helpful to review
- The large chunks are too overwhelming for review – we can’t take in 25K entries!
- Need an asynchronous discussion about output review – output the data from ALMA or OCLC and experiment with it
Wrap-up (10)
- We might do a demo of how to run the transform during a working meeting
- We have mapping work assigned and asynchronous discussions that need to be had
- Mapping review deadlines
- On Fridays we can update a new transformed dataset, decide during Wednesday meeting which files we want uploaded
Action Items
- Put thoughts into discussion on output datasets for review
Backburner
- We should probably have the transform team walk the rest of the team through the transformation code. How to run it, where the functions are, etc., so that everyone knows how to look things up and is capable of running it independently/can help Tynan onboard new students in future
February 26, 2025
See time zone conversion Meeting norms Present: Crystal, Sara, Junghae, Doreen, Laura, Sita, Ebe, Adam, Jian Absent: Gordon, Deborah, Tynan Time: Ebe Notes: Sara
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Deborah is out of town until March 4
- Crystal is in touch with OCLC about permissions for using metadata exported from there rather than downloading from LC. They're checking
- Next week: data review in RIMMF
- Install own instance for ease of following along and testing on own - RIMMF 6
- Crystal will try to install and import. Will capture and share instructions if they're not already available
- Ebe recommended the help content's usefulness
- Sofia and Crystal are drafting the IFLA presentation
- They will share with Laura and Ebe for review
- Ebe and Laura are on the program as presenters; however, they are co-authors but will not be present. Crystal will clarify with IFLA.
- Laura: Question on gathering content for RSC Review
- Use RSC Question label on the issue.
- Make sure context and question are clear in the issue (or be prepared to get an email asking about it :) )
Google Drive Space (15)
- Within 18% of space limit left
- 300MB per recorded meeting
- Need to start a meeting recording archiving process: Crystal can start moving things to UW OneDrive
- Start with 2023-August 2024? Then every 6 months do another 6 month chunk?
- Ebe: thought the plan was to keep records for ~3-6 months and then delete?
- Yes, retain for 2-3 months unless something is especially interesting (meeting notes)
- Option to save only the transcript instead? Yes, some have, some older do not.
- Could then save meetings for a year
- Everyone - share any reactions in the next 1-2 weeks before moving ahead with implementing.
Mapping check-in (45)
- Meeting discussion/Asynchronous discussion needed on mappings
- 535: Laura will confirm status is accurate
- 240: Laura's been working on this. Jian is doing a review on 130. Attributes table needs more work first. Deborah will work on it when she returns. Put any related issues on hold and make a note in the issue for tracking.
- 070: Crystal will ask Amanda Xu at the National Agricultural Library (NAL)
- 018: Laura notes it's an identifier for articles, relevant for making photocopies, but difficult to find information on its use. Adam and Ebe agree that it's not RDA, is administrative metadata - doubts about usefulness of recording the data in RDA, but no big concerns. Decision to map using a text string.
- 843: Holdings format tag that can be used in a bibliographic record. Complicates reproduction picture (isn't in any PCC documentation), but indicates a specific copy is a reproduction. Potentially useful in a scenario where a library's original is destroyed and a copy of a copy is required; though without holdings information can't say. Agreement to move this to Phase II with other 8xx tags.
- 773: Crystal wanted Cate's input on whether there's consistent usage that would allow mapping to anything other than note on manifestation. Deborah noted that previously the group decided that (for Phase I) we should map all values from the 76x-78x fields as Manifestation: note on manifestation. Ebe did 760 with Deborah as a test run. Laura can use what is in the Amended worksheet as a template. For display prefix, decided to use MARC's display constant "Main series: " - main series means something to a user, but note on manifestation doesn't to anyone unfamiliar with RDA. Updated 760 transformation notes to reflect the same. If look at in Phase II could choose to be more granular.
- 720: Why are the $0, $1, $7 subfields here? It's a standard number, but not an authority record. Why no 2? Adam thinks it was originally in the proposal but taken out for more consideration. Could be because the source is indicated within (e.g., imdb, discogs)
- $0 - source that has URI that represents the name but isn't modeled as a RWO (720 ##$aKevin Gray(discogs)a312098; 720 2#$aThe Other Baby$4prn(imdb)co0776444)
- $1 - uncontrolled name in it and wikidata uri for person/corporate body (##$aLiliana Essi$1http://www.wikidata.org/entity/Q19760388; ##$aTshul khrims rin chen$1http://viaf.org/viaf/22550486)
- $7 - just the provenance information
- Anyone need help? Anyone available to give help?
- Ebe thinks she can get the rest of hers done; most are linking fields; 410/411 can probably be copied over from the 700s; may be down to the wire
- Laura working on 045. Time periods expressed in different ways, with a variety of subfield combinations. Crystal asked whether Orbis Cascades standing group have something for this already? Adam noted that the field is pretty much obsolete at this point, and that EDTF is used in 046. Sita noted is like 008, and use MARC table and link it with code? Adam noted coverage of content has no range. Group will continue the discussion in the issue.
- 765 assignment updated to reflect that Ebe is working on it.
- Appendix J: if OCLC haven't defined it yet and it's not being used, is it possible to postpone to Phase II? Adam says put $7 off to Phase II - no one's using it, Ebe hasn't implemented it. Yes, move to Phase II.
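The identifiers in the 720 examples above carry their source in a leading parenthetical, e.g. (discogs)a312098, while the $1 examples are bare URIs. A minimal sketch of splitting the two parts, assuming every source-prefixed value follows that same "(source)identifier" pattern; the function name `split_source_and_identifier` is hypothetical and this is not taken from the project's transform.

```python
import re

# A leading parenthetical source code followed by the identifier, e.g. "(discogs)a312098".
SOURCE_PREFIXED_ID = re.compile(r"^\((?P<source>[^()]+)\)(?P<identifier>.+)$")


def split_source_and_identifier(value):
    """Return (source, identifier); source is None for bare values such as URIs."""
    cleaned = value.strip()
    match = SOURCE_PREFIXED_ID.match(cleaned)
    if match is None:
        return None, cleaned
    return match.group("source"), match.group("identifier")


for value in [
    "(discogs)a312098",
    "(imdb)co0776444",
    "http://www.wikidata.org/entity/Q19760388",
]:
    print(split_source_and_identifier(value))
```

Keeping the source separate from the identifier leaves open whether the source is later recorded as a scheme or only kept inside a text string.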
Wrap-up (10)
- Share thoughts on Google Drive storage
- Mapping deadline is in 2 days - February 28
- Mapping review deadline is in 1 month - March 31
- Download RIMMF 6
Action Items
- All - finish mappings
- All - download RIMMF
- All - start working on mapping reviews
- Crystal - contact Amanda Xu
Backburner
February 19, 2025
See time zone conversion Meeting norms Present: Deborah, Adam, Crystal, Jian, Laura, Junghae, Doreen, Trina, Sita, Ebe, Sara Absent: Gordon, Sofia, Tynan Time: Ebe Notes: Sara
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Respond to "asynchronous discussion needed" tags!
- See project roster updates - is everyone's job description current? Would Trina like to be added?
- Trina would like to be added and will send something over.
- Email Crystal if anything needs adjustment.
- We can use UW's Dryad for output data parking
- Crystal has an ORCID iD we can use
- Appears simple, open, stable spot for initial parking
- Not editable directly, but can download, manipulate, re-version, and update it
- Size limit is per file. Will need to chunk, which is common (LC, Harvard, likely convenient for users)
- Chunking strategy needs discussion. Aggregate types, then WEM?
- Crystal spoke with Christine E from Harvard about Dataverse data and emailed Jeff M from OCLC again about using OCLC data rather than downloading from LC
- Harvard does have an agreement with OCLC - Crystal seeing if can make the same deal
- Policy looks like something UW could do too
- Sara and Doreen are graduating in June. If another institution can hire XML coders, now is the time.
- Deborah not available next week after Wednesday: back on 4th of March: finish aggpulls prior to 28th?
- Crystal, Deborah, Tynan to meet for status update
Reconciliation and Deduplication Timing (30)
- Phase I or Phase II
- Works, expressions?
- Manifestations?
- Aggregates?
- Reproductions?
- Subset?
- URIs use an approach that mimics the access point, appending it to the end of a stub URI, and attempt to dedupe Manifestation, Work, Expression that way
- Some are creating merges that aren't the same things
- Deborah showed an example of what is happening in RIMMF that is an issue with a video recording
- Two-dimensional moving image has additions to its soundtracks (e.g., music, speech, subtitles, closed captioning, special features, etc.) - but actual film is the same for all
- Simplest way to handle could be exclude for handling in Phase I, as have done for sound recordings
- Historically, tell if it is silent, otherwise assume there's speech. RDA hasn't raised with movie community - and whether should be one for spoken word or two-dimensional moving image. If add performed music now have moved into aggregating
- Laura agrees this needs to be sorted out if trying to be perfect, but we're not trying to be. Stumbled on AV issues, which is just one of many we are dealing with
- Changed position to ok with Phase I duplicate IRIs, but tell people why we're doing it, that ultimately don't think this method is final, and is a work in progress. The substantive conversation about reconciliation after conversion should come in Phase II. It's great work, and also don't want to oversell it.
- Adam agrees it is probably good to show bad data, explain it, and call attention to it. We are well aware that in Phase I we are creating dupes that can't be deduped or are incorrectly merging, and can suggest what can be done to improve results
- The write-up can be a series of case studies of different transformations, showing what went wrong and why
- Laura asks if it's possible to have a version of the code with both options, so that those who want to do their own reconciliation/deduping can try
- Depends on whether Cypress had coded and commented out coding that used opaque IRIs or not
- Decision: Go with Laura's suggestion with disclaimer
Mapping Issue Check-In (15)
- Redistribution Needed? Reports needed?
- Ebe is doing an intensive review on hers this weekend. Will update early next week if need any redistributed. Has been doing work offline
- Update mapping sheets for augmentation aggregate changes #483 - Crystal taking over from Cypress, will likely need assistance from Deborah
- 751 - Sita working on, close to done
- Mapping Syntax Spec - was intended to be machine-readable, but that is now out of scope for this phase, so instructions/decisions is fine
- 765/767 - those are notes - Ebe should be able to knock those all out in a batch
- Mapping Spreadsheet for X00's - Jian reviewing mappings for 100; will investigate this issue
- 130/240 - Laura and Sita connect on what's needed/who take lead
Issues: meeting discussion needed (25)
- 770 - can be supplement to monograph or monograph except - Ebe looking at as part of her batch
- 525, 041 - discussion not needed, removed label
- 336 - question on handling $3. Will use 3xx with $3 present decision. Sara will update Decisions Index to explicitly say it is a note on manifestation
- 245 - Junghae will review
- Things to report to RSC - Laura looking for records of naturally occurring objects. Herbarium specimens in OCLC or Smithsonian? Or Harvard?
Wrap-up (5)
Action items
- All - Review own issues with "asynchronous discussion needed" tags to confirm tag is needed/accurate
- All - Respond to "asynchronous discussion needed" tags
- All - Discuss output data parking chunking strategy
- Crystal, Deborah, Tynan to meet for status update on aggpulls
- Sara will update Decisions Index with
- Reconciliation and Deduplication approach decision and
- Updated 3XX with $3 decision to explicitly say note on manifestation
- Doreen/Sara - run a small sample set of records for group data review? If don't have many issue discussions/decisions to make
Backburner
February 12, 2025
See time zone conversion Meeting norms Present: Deborah, Cypress, Ebe, Doreen, Gordon, Laura, Sara, Crystal, Junghae, Sita, Jian, Tynan, Trina, Adam, Sofia Absent: Time: Sara Notes: Doreen
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Cypress: Code for augmented aggregates is mostly finished and metadata for MARC record is finished.
- Deborah can show what it looks like in RIMMF.
IRIs (10)
- ITSDS could redirect web requests from https://domain-to-be-defined.lib.uw.edu/ to a web site of your choosing. The way this would work is, any request for that domain name would be redirected to a web site of your choosing. As a specific example, if you used GitHub pages, a URL like https://rdf-metadata.lib.uw.edu/xyz could be redirected to http://uwlib-mig.github.io/rdf-metadata/xyz
- Five-Star Decision: Crystal and Laura: whether to pursue five-star now or later.
- Data Storage Concerns:
- Laura: Where will data reside if pulling from GitHub? Concerned about large records.
- Crystal: Agreed—bulk storage needed, not per-entity.
- Deborah: Need a web domain, data storage (triples/RDA registry), and domain maintenance (~maybe $300/year, possibly via donations).
- Laura: Concerned about minted IRI. Triple store?
- Crystal: Thinking about one web page, UW won't pay for triple store.
- GitHub Limitations:
- Sara: GitHub limits files to 100 MiB; repositories should ideally be <1 GB, and <5 GB is strongly recommended.
- Crystal: GitHub isn’t viable; will explore institutional repository options (ask ITS, Denise, or Preservation).
- Next Steps:
- Decide where to store and manage data. Crystal will inquire about UW’s institutional repository.
- Anyone interested in meeting with ITSDS at UW about the IRIs with Crystal? Need to figure out how they will work
- Reminder from Crystal: Complete mapping before the deadline.
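The redirect that ITSDS described earlier in this section can be sketched as a simple path-preserving rewrite. This is only an illustration in Python of the mapping in their example; the function name is hypothetical, and the real redirect would be configured on the web server rather than written as code like this.

```python
from urllib.parse import urlsplit

# Host and target taken from the ITSDS example; both are illustrative.
SOURCE_HOST = "rdf-metadata.lib.uw.edu"
TARGET_BASE = "http://uwlib-mig.github.io/rdf-metadata"


def redirect_target(url: str):
    """Return the URL a request would be redirected to, or None if the host differs."""
    parts = urlsplit(url)
    if parts.hostname != SOURCE_HOST:
        return None
    target = TARGET_BASE + parts.path
    if parts.query:
        target += "?" + parts.query
    return target


print(redirect_target("https://rdf-metadata.lib.uw.edu/xyz"))
```

Fragment identifiers (the part after `#`) are not sent to the server in an HTTP request, so for hash IRIs only the path portion takes part in a redirect like this.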
Review output data
- IRIs coming out as expected?
- On De-duplication Challenges
- Crystal: Current deduplication approach is rushed and requires a more thoughtful method in Phase II. Mushing things together is worse than duplicated data.
- Deborah: De-duping is extremely important to show the importance of RDA (Entity-relationship). This is just a test dataset.
- Ebe: Is it feasible that we split the files and run de-duping them differently? I.e. De-dup ebooks only and not videos because of mentioned issues with a particular media? — Compromise?
- Adam: The priority is not deduping incorrectly. Ebe's idea is a good one: we can dedupe more reliably if we can figure out which ones those are.
- Ebe: Even if it's a bad merge, it is worth doing deduping. Agree with Deborah. Make it less error-prone in Phase I, especially because it is a test database.
- Avoiding premature deduplication may be preferable to ensure accuracy. Continue discussion next week, or maybe hold a poll.
- On MARC Metadata Storage
- MARC records are being stored as literals in RDF (note on manifestation), but it's difficult to read.
- Options discussed include storing raw MARC record, converting it to more readable format as turtle has linebreaks, or linking to an external host.
- Note on manifestation is what we have discussed before and did.
- Cypress: For review, looking field-by-field is more helpful, and the MARC fields are still in the comments of each field-by-field template.
Wrap-up (5)
Action items
- Crystal will figure out if we can store datasets in institutional repository at UW. (Adam: Ask maybe Denise or Preservation?)
- Discuss the timing of reconciliation and de-duplication next week.
Backburner
February 5, 2025
See time zone conversion Meeting norms Present: Deborah, Cypress, Ebe, Doreen, Gordon, Laura, Sara, Crystal, Junghae, Sita, Jian, Tynan, Trina Absent: Sofia, Adam Time: Ebe Notes: Sara
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Crystal is sending dataset numbers to OKG/NLG today
- Emory and DNB will not share records currently; unclear whether rights restricted
- Have not heard back from Harvard yet
- UW Libraries decided not to fill Cypress's position: if any other institutions can hire an XSLT coder for Phase II that would be helpful
- Written feedback can be emailed to Crystal
- In the meantime, Doreen, Tynan, Sara will pick up where Cypress is leaving off
Mapping Check-in (15)
- To-do vs. In Progress vs. Done
- Issue Board
- Ebe has some to look at and will start today; will let the group know if help is needed
- Linking data will all be mapped as notes in Phase I. Will be revisited in Phase II
- Laura has 3-4 issues that need some discussion
- ALL: Use labels when discussion is needed: "asynchronous discussion needed" or "meeting discussion is needed"
- Adding "asynchronous discussion needed" to 720 regarding $7
- Watch for issues with status:"Almost done - waiting for decision/answers to questions" - Cypress moves issue here if questions while coding
- Try to get what can to "Ready for Transform" this week for Cypress
- Timelines
- Mapping: February 28, 2025
- Mapping review: March 31, 2025
- Transform code: April 30, 2025
- Output review: May 30, 2025
IRIs for entities
- Identifying manifestations reliably
- Documents reviewed:
- Initial thinking was that transform would be one run; has evolved throughout project to run iteratively
- Want to reduce duplicates on re-runs, while also acknowledging that full deduplication is out of scope for Phase I and will be tackled more comprehensively in Phase II
- Deborah created Access point mapping table that works really well. Manifestations are complicated.
- Discussed and reviewed proposal to use 016, 035, 010, then AAP approach as a last resort towards unique IRIs
- Examples: from m2r iris and identifiers documentation; will add source string in the IRI to reduce instance of using the same identifier from different sources
http://marc2rda.edu/fake/transform/man#00037837
http://marc2rda.edu/fake/transform/man#ocolc1544994
http://marc2rda.edu/fake/transform/man#speakingofjaneausten1980universitymicrofilms
- Deborah proposed adding normalized AAP string to lessen number of hits deduping
- Gordon agrees this is the best solution - suggestion to add control numbers more likely makes it unique
- Switch thinking a bit and decide which components AAP-bit should be, then translate from ISBDM to MARC codes. Will get alarmingly large IRIs, but they will be more likely unique
- New Manifestation IRI Proposal:
- AAP + Control Number approach:
- normalised({has title proper})|[supplied title] + " (" + {has date of creation of manifestation}|{has date of copyright of manifestation} + "; " + {has creator agent of manifestation} + "; " + {has category of carrier} + ")" + (“+ {BNB#####}|{OCLC#####}|{LCCN####})
- Find carrier type in the mappings*
- For items, want a unique IRI every time transform runs
- With XSLT the generated ID is unique during the run, but on reruns will get the number being used again - so danger of getting incorrect merges
- Currently use manifestation ap when minting IRI to help prevent duplicates on re-runs
- {BASE}{RECORD}ite#{manifestation ap}{generated_id}
- e.g. http://marc2rda.edu/fake/transform/ite#00514962d22e1607
- http://marc2rda.edu/fake/transform/ite#cassandra%27ssister2006walkerd22e163598
- Cypress proposes using date instead of manifestation ap to better ensure unique
- Cypress will implement, and team can review output
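The fallback order discussed above for manifestation IRIs (016, 035, 010, then an access point string as a last resort) can be sketched as follows. This is a minimal Python illustration, not the project's XSLT: the flat dictionary record, the `title`/`date`/`agent` keys, and the normalization rule (lowercase, letters and digits only) are assumptions chosen so that the output resembles the example IRIs in these notes.

```python
import unicodedata

# Placeholder base IRI as it appears in the examples in these notes.
BASE = "http://marc2rda.edu/fake/transform/man#"

# Order of preference from the discussion: 016, 035, 010, then access point.
IDENTIFIER_FIELDS = ("016", "035", "010")


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits (assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def manifestation_iri(record: dict) -> str:
    """Mint a manifestation IRI for a simplified record (one string per key).

    Real MARC fields are repeatable and have subfields; this sketch ignores that.
    """
    for tag in IDENTIFIER_FIELDS:
        value = record.get(tag)
        if value:
            # A source prefix such as "(OCoLC)" survives normalization,
            # which keeps identifiers from different sources apart.
            return BASE + normalize(value)
    # Last resort: concatenate access point components (hypothetical keys).
    parts = [record.get("title", ""), record.get("date", ""), record.get("agent", "")]
    return BASE + normalize("".join(parts))


print(manifestation_iri({"035": "(OCoLC)1544994"}))
print(manifestation_iri({"title": "Speaking of Jane Austen",
                         "date": "1980",
                         "agent": "University Microfilms"}))
```

Because the local part is derived from record data rather than a generated ID, re-running the transform on the same record yields the same IRI, which is the property the group wanted for reducing duplicates on re-runs.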
If time, look at Jane Austen data more fully
- Reviewed jane-austen_NA.ttl
- Can tell deduped to works when see multiple 008s and 245s - indicates there were multiple records
- Looked at line 72961, marc2rda.edu/fake/transform/exp#aikenjoan1924-2004eliza%27sdaughterenglish, and how to trace what's there through the files
- Cypress will create a discussion for this review
- Cypress will update the lexicalalias files today
- There is no limit on the length of the title in 245
Wrap-up (5)
Action items
- A survey asking about the important decisions from today?
- Crystal will send dataset numbers to OKG/NLG today
- Cypress will implement using date instead of manifestation ap in item IRI
- Cypress will create a discussion for this review of the Jane Austen data
- Cypress will update the Jane Austen lexicalalias files today
Backburner
January 29, 2025
See time zone conversion Meeting norms Present: Absent: Crystal, Gordon, Adam Time: Notes:
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Cypress' last day is February 13th.
- Priority is getting documentation out so that others can pick up
- See google drive folder below:
- Transform documentation is here (and in progress). This includes how certain aspects within the transform work, as well as broad overviews, a transformation intro for onboarding, and instructions on running the transform.
- Also available as a README in the folder for the transform
- BIBFRAME Update Forum - might be interested in the "Modern MARC" section which LC's back-converted BIBFRAME will follow. https://www.loc.gov/bibframe/news/bibframe-update-jan2025.html
IRIs (25)
- IRI transformation documentation
- Discussion - Minting IRIs
- Discussion - Designing our IRIs
Deborah and Cypress met to discuss what the transform is currently doing; we need to decide what we want it to do
- Minting IRIs versus using external IRIs
- MAIN WEMI
- At present the IRI is constructed as the base IRI + control number for 001 + type of entity
- When we are sharing records from a variety of sources, we want to prevent having the same IRI applied to a different entity from a different source
- This wouldn’t happen internally because our control numbers are unique to our system
- We should be applying the same instructions from related entities to the main entities
- Related RDA entities
- e.g. Creator of work, or a work that another work is based on
- Unreliable to map these related works and expressions (agents are fine)
- We only map every work-added entry as a work
- We don’t have many approved IRI sources
- In NACO authority file, for example, we have only approved for corporate bodies, families, and places
- The sources need to be improved so that they can be approved
- When using $1 with an approved source: the only thing we would add to an external IRI is a relationship to an access point
- We add a triple with “has access point” or “authorized access point” along with the string or the nomen
- This is important for related entities because we can’t trust what’s in the MARC data
- For the main entry: the attribute information we have may not be available in the related IRI description set
- Brief detour to Jane Austen and RIMMF:
- We have duplicates – these should all be a single entity for that work
- The duplicates are coming from related work added entries
- We are giving bib control number, no meaning in a display
- We have many records with Jane Austen as the title when you bring the records into RIMMF and try to show them in an index form
- $2 is similar to $1, but we have a source for the literal in the MARC record
- The source is approved – same list
- We don’t have an external IRI
- Pattern for minting our own is in the document
- Authorized access points have to be mapped so that they can be used as a concatenated, normalized string
- Purpose is to do some automatic deduplication: if the entire RDA triple is the same, it automatically de-dupes
- Worst case/most common: neither $1 nor $2 is present:
- We mint a non-meaningful IRI that appends a running count at the end
- We are ending up with many duplicates
- Sofia: If two records describe the same work, but with different information, in the future it might be hard to map them
- Deborah: We’ll either be mapping the two descriptions as separate work entities
- Or we’ll have found a way to make them match using the local part of the IRI (instead of using the 001 and entity label); we may instead use the authorized access point for example
- Taking the mapped work from two different records will have the same IRI from subject – if a triple-source is absolutely identical, only one is kept automatically
- Laura: if the source library has been doing authority maintenance, then the access points will be the same
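The automatic de-duplication described above follows from an RDF graph being a set of triples: if two records mint the same subject IRI and make the same statement, the graph holds that triple once. A minimal Python sketch using plain tuples in a set (no RDF library); the `wor#` IRIs and the predicate labels are placeholders for illustration, not RDA Registry IRIs.

```python
# A graph is modelled as a set of (subject, predicate, object) triples.
WORK = "http://marc2rda.edu/fake/transform/wor#austenjane1775-1817emma"

record_a = {
    (WORK, "has authorized access point", "Austen, Jane, 1775-1817. Emma"),
    (WORK, "has creator", "Austen, Jane, 1775-1817"),
}
record_b = {
    (WORK, "has authorized access point", "Austen, Jane, 1775-1817. Emma"),
    (WORK, "has subject", "England"),
}

# Set union keeps one copy of the triple the two records share.
merged = record_a | record_b
print(len(merged))  # 3 triples, not 4


def counter_iri(count: int) -> str:
    """Mint a non-meaningful IRI from a running count (hypothetical pattern)."""
    return f"http://marc2rda.edu/fake/transform/wor#{count}"


# With running-count IRIs the same statement gets two subjects, so nothing merges.
unmerged = {
    (counter_iri(1), "has authorized access point", "Austen, Jane, 1775-1817. Emma"),
    (counter_iri(2), "has authorized access point", "Austen, Jane, 1775-1817. Emma"),
}
print(len(unmerged))  # 2
```

The same mechanism explains the risk raised in the discussion: any two descriptions that happen to produce the same IRI are merged, whether or not they describe the same entity.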
WEM Access Points (25)
- Jane Austen records in RIMMF
- Access points table
- Are we creating access points for the main work and expression? Gordon said that the identifiers are sufficient
- Last week, however, we thought it might be important to have access points displayed for human readable purposes
- Agents have been done (100, 600, 700)
- When mapping over person-entities from authority file, what did you do at NLG about presence of fictitious characters?
- If fictitious characters used as pseudonym, treat as nomen (e.g. related nomen or work)
- If it is a subject: treat as skos:concept
- We need some list of texts
- If we only provide as an access point for the person, corporate body, or family, then we aren’t doing something against RDA, can be processed with human manual intervention
- For 100’s and 700’s, then we understand it is a nomen used by a person
- We can understand it as a related nomen – at NLG they used related nomen as the element instead of creating an access point to the person you’ve created the entity for
- This may need to be a phase II problem
- Single works: names + titles if they are in a 600 or 700
- If a person is a subject in a 600, it is still the same person
- Source is subject heading, but person is same entity – a person
- Cypress: should we map from the 130?
- The table is written in order of priority – if the 130 is there, use it, if not, move on and use 100 + 240 etc. up until using the 245
- We get the access point being described by the record from the fields and subfields in the left-hand column
- We are putting together an online poll to hear from those who couldn’t make it today
- General consensus from the group present today is that this is ok
- Must retain the order of the subfields given; we strip all of the punctuation for this purpose and decide what to use between subfields later
- LC is stripping out ISBD punctuation
- For expressions (single expression in this manifestation) the access point for the expression will be the work plus the RDA element for the expression
- If we only have 245s to rely on, they will never contain expression elements from the heading
- We have to find them from the body of the record
- We can find this in the spreadsheet mapping
- Manifestation:
- No access point in ACR thinking
- We do what ISBDM is using in the same order and not worry about punctuation at this point
- We need to make a decision on this before Cypress leaves!
- If we are in agreement, Cypress can work on it and then add it into the code when we get final approval
Nomens for Entities with Sources (15)
- "A nomen must be an appellation of one and only one RDA entity", when we are saying that one Entity exists (i.e. http://marc2rda.edu/fake/lcsh/place#england) should we not also be able to say that there is only one nomen for a place from lcsh with the nomen string "england"?
- From RDA: a nomen is an appellation of 1 and only 1 RDA entity
- But “England” has 100s of unique nomen entities
- Nomens have IRIs, but no IRI as identifier
- We have a place and many authorized access points for place, “England” from lcsh
- Instead of this list, we would have one de-duplicated one that has an authorized source
- e.g. a place nomen for an approved place entity
- i.e. use the nomen string as the local part of the created IRI along with the source
- Only sources where we know the authorized access point is unique and has an identifier that is the same, then we can use it as a unique identifier
- This applies to any unique nomen string
- For our approved sources, can we say the access point will be unique?
- Looking at the LC NACO, we have approved LC’s authority file for place names, but not for persons
- Place names go through the subject path, not the name path
- i.e. goes through SACO, not NACO
- Even if in bibliographic record we see a jurisdictional place, it has a different indicator, so we can create a corporate body
- Comes down to the principle of having only created one entity
- In principle, each of the authorized sources has a uniqueness in the strings that are used
- Conclusion: we are okay implementing this, but if we run into issues, it can be undone because we can edit the one function in which the process is implemented
- But we should also bring it up with Gordon
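The idea of using the nomen string together with its source as the local part of the IRI can be sketched as below. This is an assumption-laden Python illustration: the `nomen#` path segment, the whitespace-stripping normalization, and the function name are hypothetical, modelled loosely on the place IRI example in this section.

```python
from urllib.parse import quote

# Placeholder base from the examples in these notes.
BASE = "http://marc2rda.edu/fake/"


def nomen_iri(source: str, nomen_string: str) -> str:
    """Mint one IRI per (source, nomen string) pair.

    Lowercases the string and removes whitespace, then percent-encodes
    anything that is not safe in an IRI fragment (assumed normalization).
    """
    local = "".join(nomen_string.lower().split())
    return f"{BASE}{quote(source.lower())}/nomen#{quote(local, safe='')}"


# Every occurrence of "England" from lcsh yields the same nomen IRI,
# so repeated occurrences collapse into a single nomen entity.
occurrences = ["England", "England ", "ENGLAND"]
print({nomen_iri("lcsh", s) for s in occurrences})
```

Keeping the whole rule in one function matches the conclusion reached in the meeting: if the approach causes problems, only that one function needs to change.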
Wrap-up (5)
Action items
- A survey asking about the important decisions from today?
Backburner
January 22, 2025
See time zone conversion Meeting norms Present: Jian, Sofia, Adam, Crystal, Cypress, Deborah, Sita, Tynan, Sara, Ebe, Junghae, Doreen, Laura Absent: Gordon Time: Tynan Notes: Sara
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Check-in on deadlines: Reminders to do mappings and reviews ASAP so transformation team can get work done
- Mapping: February 28, 2025
- Mapping review: March 31, 2025
- IFLA coming up soon
- Crystal and Sofia presenting on project in March! Continued progress helps with what can put together to present.
- Ying-Hsiang handing off Wikidata code to Cypress and Tynan on Friday
- Laura shared Harvard is publishing Alma data CC0 via Dataverse
- Fairly recent - February 2022
- https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/I8L0ZZ
- Crystal will reach out to Christine Eslao and ask about reuse, re-publication, mixing & matching, whether can follow suit (e.g., if they're doing this from OCLC, can we?)
Dataset Numbers: Crystal needs to send to OKG/NLG this week (10)
- LC: 500K random records (Crystal emailed Theo for tips on how to download; once hear back from Theo will reach out to OCLC)
- UW: 545k records (original UW-authored records; same provided to LD4)
- NLG: 700k records
- NLNZ: 600k
- Emory: TBD (Laura has reached out and is waiting to hear back; Crystal meeting on Homosaurus topic and can ask then)
- DNB: TBD (Sita and Crystal have been emailing; Sita will update when knows more)
- How will be used:
- Making a proposal to the Ministry of Education in Greece to try to secure a grant to expand Wikibase to meet our space needs
- Proposing ideal amount of space needed - a conservative estimate of all space will eventually need
- Estimate will be based on a rough estimate of entities per MARC record (rather than triples) - this is the pressure point and balloons fast
- To explore more, read the meeting notes available in the Side-Meeting and Conference-Report Back Notes folder
Transforming Augmentation Aggregate Records (25)
-
- Deborah added most updates in the Recommendations section, added examples at the end of the Appendices, and added logic for identifying Augmentation aggregate manifestations under AggPulls.
- Outstanding questions section is for future consideration and discussion by wider community
- Also added additional material in the UW M2R Transforming Augmentation Aggregate Records file linked under Diagrams.
- Initial thinking on SES (string encoding scheme):
- CToRE = Content type of representative expression
- LoEoRE = Language of expression of representative expression
The SES for an augmented single work should be the same as for a stand-alone single work, taken from (in order of preference):
- 130
- 1XX + 240
- 1XX + 245
- 245 + 1st 7XX (name portion only)
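The order of preference above amounts to a first-match rule. A minimal Python sketch, assuming a record is an ordered dictionary of tag to string with one value per tag, and that "1XX" means 100/110/111 and "7XX" means 700/710/711; these simplifications and the function names are illustrative only.

```python
def first_present(record: dict, candidates: tuple):
    """Return the first tag in record order that is one of the candidates."""
    return next((tag for tag in record if tag in candidates), None)


def ses_components(record: dict) -> list:
    """Pick the fields that supply the string, in the order of preference listed."""
    name_1xx = first_present(record, ("100", "110", "111"))
    name_7xx = first_present(record, ("700", "710", "711"))
    if "130" in record:
        return ["130"]
    if name_1xx and "240" in record:
        return [name_1xx, "240"]
    if name_1xx and "245" in record:
        return [name_1xx, "245"]
    if "245" in record and name_7xx:
        # Only the name portion of the 7XX would be used.
        return ["245", name_7xx]
    # A bare 245 with no name field is not covered by the list above.
    return []


print(ses_components({"100": "Austen, Jane, 1775-1817", "245": "Emma"}))
print(ses_components({"245": "Animalia", "710": "Example Press", "700": "Base, Graeme"}))
```

Evaluating the rules top to bottom means a record with both a 240 and a 245 uses the 240, which is how the preference order was described.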
What should the SES be for an aggregating work plan?
- 130 + Aggregating work + 1st 7XX + CToRE + LoEoRE
- 1XX + Aggregating work + 240 (if 1XX is aggregator) + 1st 7XX + CToRE + LoEoRE
- 1XX + Aggregating work + 245 (if 1XX is aggregator) + 1st 7XX + CToRE + LoEoRE
- 245 + Aggregating work + 1XX + 1st 7XX + CToRE + LoEoRE
- 245 + Aggregating work + 1st 7XX (name only) + 1st 7XX + CToRE + LoEoRE
- Need to make a decision again on whether or not are making access points to make it clear - add this to next week's schedule and then also make time to implement it
- Cypress noted from 2024 meeting notes that the discussion was we already have identifiers, so don't need access points
- Crystal noted identifiers count as an appellation
- Crystal's opinion, in advance of being out next week, is that we should have access points, though doesn't have an opinion on the SES
- Deborah's preference is that creator is always linked to aggregate, and then also creator with aggregated work if described
- Laura notes: "For the “augmented work” - bear in mind, the Work data may describe many expressions and adding this content type (Primary augmented work, or augmented work) to that Work entity is therefore questionable. It might be just an aggregated work in another manifestation, and standalone in another one."
- Added to discussion to continue discussing
- Ebe notes: "Personally I would like a string encoding scheme where we put the title first and use the creator as the qualifier." e.g. Animalia (Graeme Base)
- Noted that RDA doesn't require us to do this; rather, historical practice has
- Deborah sees what Ebe is saying, but still prefers to keep the entire string
- Laura thinks access points are useful, but qualifiers make them more useful
- Cypress will add in property numbers alongside "Label (Toolkit)" to make it easier for the transform
- Category: only add Aggregating work (not Augmented, since won't always be true, therefore not safe)
- Laura worries might be confusing to user as part of access point
- Adam notes since we're discussing access points, and not authorized access points, they can be undifferentiated
- Crystal notes we need a SES if we want to include access points
- IFLA did manifestation and work access points
- Manifestation SES
- Work SES
- Crystal thinks should use this for SES
- Deborah notes they don't have one for Aggregating; Crystal wonders whether unique access points are needed; RDA does have instructions for qualifying access points
- Definitely can qualify - but question of whether to do it in the same order. Need a survey
- Is it possible to transform the way Deborah suggests in document?
- Transform perspective is just concern on time needed to implement. Cypress would like to start by February to make sure it's working properly
- Any substantive objections?
Attribute mapping questions
- Row 2: person is 0 or 1; 2 included in code just in case it occurs, which should be unlikely; if not 3, then know it's a person
- Row 5: series of different mappings for dates; if have a mapping for some of them, should they also map as related timespan of person? e.g., use date of birth and timespan as same?
- Not minting timespans; this isn't a note, just a value
- Jin shi (進士) and ju ren (舉人) dates need to be added somewhere; closest it maps to is period of activity (see: CJK NACO Best Practices)
- Adam shared a jin shi example: 100 1 Bao, Rong, ǂd jin shi 809
- Jian shared a ju ren example: Chen, Denglong, $d ju ren 1774
- What if there's a date with no hyphen, how to handle? Deborah suggests related time period; may need to keep for dates with errors and no qualifier
- Sofia asked does date of birth accept values like 'circa 1500'? Still needs to be taken into account. What to map to? They should still include hyphens
- Adam shared examples, noting a hyphen in front means it's a death date
- Aaron, ǂc of Zhitomir, ǂd -approximately 1817
- Aaron, W. F., ǂd active approximately 1860
- Abate, Nicolò dell', ǂd approximately 1509-1571
- ʻAbbās ibn ʻAbd al-Muṭṭalib, ǂd approximately 566-approximately 653
- ʻAbd al-ʻAzīz Muḥammad, ǂd 1866 or 1867-approximately 1948
- ALL: post examples in the issue
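The date patterns in the examples above can be separated with a few string checks. This is a hedged Python sketch, not the transform's logic: the result keys (`birth`, `death`, `activity`, `unqualified`) are labels for illustration rather than RDA element names, and the group had not settled where each case maps.

```python
# Qualifiers that signal a period of activity rather than birth/death dates.
ACTIVITY_PREFIXES = ("active ", "jin shi ", "ju ren ")


def split_person_dates(subfield_d: str) -> dict:
    """Classify the contents of a personal name $d into labelled parts."""
    text = subfield_d.strip()
    if text.lower().startswith(ACTIVITY_PREFIXES):
        return {"activity": text}
    if "-" not in text:
        # No hyphen and no qualifier: cannot tell birth from death.
        return {"unqualified": text}
    birth, _, death = text.partition("-")
    result = {}
    if birth.strip():
        result["birth"] = birth.strip()
    if death.strip():
        # A leading hyphen leaves birth empty, so only a death date is recorded.
        result["death"] = death.strip()
    return result


for example in ("-approximately 1817",
                "active approximately 1860",
                "approximately 1509-1571",
                "1866 or 1867-approximately 1948",
                "jin shi 809"):
    print(example, "=>", split_person_dates(example))
```

Checking the activity qualifiers before looking for a hyphen keeps a value such as an "active" date range from being misread as birth and death dates.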
WEM Access points - Are we doing them? (30) - Decided to move this discussion to next week
- Currently meeting RDA minimum description requirements, with W and E having identifiers generated from 001, and M having a title.
- 130 and 240?
- Access Point Mapping Table
Wrap-up (5)
Action items
- Crystal will reach out to Christine Eslao at Harvard regarding their published Bibliographic Metadata
- Cypress will add in property numbers alongside "Label (Toolkit)" in Deborah's Document
- Crystal will create a survey on handling/qualifying access points and add Cypress as an editor to see results
- All to post examples in Attributes table issue
Backburner
- WEM Access points: Next week
- RIMMF Demo: Next week?
January 15, 2025
See time zone conversion Meeting norms Present: Crystal, Adam, Deborah, Laura, Ebe, Gordon, Jian, Junghae, Sara, Sita, Doreen Absent: Cypress Time: Sara Notes: Doreen
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Release 5.2.0 of the RDA Registry downloads was published yesterday (14 Jan 2025). The release notes say 'This release supports the February 2025 issue of RDA Toolkit. This release contains several new object elements with a range of skos:Concept.' The object elements were added following a suggestion from this project to the RSC Technical Working Group, and should already be in use within the transform. We should check the object elements used by the transform against this release.
- Crystal: Will be gone on 1/29. Cypress will facilitate the meeting.
- Crystal will get back on uploading meeting recordings.
Transformation Dataset (15)
- Our initial transformation is happening soon
- NLG and OKG need information about how many records, from which institutions, in order to make their proposal to the Ministry of Education for the Wikibase expansion we asked them to do
- See: RDA Wikibase Collaboration
- Which records from LC will we include?
- Obtaining the "entire" catalog (or close enough): 2016 selected datasets, plus downloading post-2016 records 10k at a time from catalog.loc.gov and deduplicating those that are just updated post-2016.
- Obtaining a certain number of records from catalog.loc.gov, 10k at a time, and downsizing our initial goal
- Yes to this option. 500k random records.
- OCLC export? Crystal could ask them how they would feel about participation. Don't know about rights.
- NLG: 700k
- UW: count number of UW-authored records in Alma (Junghae will check and share indication rule with Laura) Crystal would like to know how many exist and want all of them. Answer: 544,316 bib records in Alma for which the University of Washington (UW) is the original cataloging agency
- NLNZ: 599923
- DNB: Sita will ask about willingness to participate
- Emory: Laura will check
- Would like to receive answers to these questions by next week so Crystal can get back to our partners.
- Additional Discourse: Adam: Can we make a list of records we want in the sample pool? If LC doesn't have them, we can add to it.
- Crystal: Random sample is the safest way. We can establish criteria for what to include. This sample set is for Phase 1 and 2 (including aggregates and non-aggregates).
- Can do this again at the end of Phase II but this is good now.
Linking fields (30)
- Linking fields discussion
- Deborah's table for these fields: Linking Entry Fields.20250115
- Examples: Linking Entry Fields Examples
- Gordon proposed: trying to do related entity --> Deborah whether we can do that?
- The WEM Entity column shows which WEM entity each linking entry field is supposed to carry information about
- Even for ones that should be clear, examples are mixture of description of work and manifestation (Because there are no restrictions)
- I.e. 770 Could be expression? Could be work?
- 765 – Should be expression but examples given are expression or work --> can’t tell whether linking entry is for expression, a single-part, or multi-part or aggregating part.
- Deborah’s research shows that similar to added entry fields where we came up with a default (related work of manifestation), the best Deborah can come up with is Manifestation related manifestation of manifestation.
- We could trust folks and say it must be a series. Everything that’s not a series is an error
- Adam: Series can include multi-part monograph --> Deborah: would you put it in linking entry fields? --> Adam: If it can be done, someone has done it. Deborah: Similar to 830s we cannot tell, this we cannot tell.
- What is the purpose of linking entry fields? --> Then what do they mean in RDA?
- Adam: Meant to link you from one bibliographic record to another bibliographic record --> Literally meant to provide links but never really used that way. No $w because there isn’t actually a bib record for the related work.
- Adam: Multi-pronged approach; if there's more complete data, do one thing, but for the ones with no information do a note. --> Takes a fair bit of coding --> Crystal: Makes more sense to do these as notes for Phase I and say that in Phase II we will do something more granular
- Majority votes to map as notes
- Gordon: anything that is a note on manifestation MUST apply to all exemplars of the manifestation.
Transforming Augmentation Aggregate Records (20)
- Document is in Aggregates Main Folder > CW_DW_AM_Markers.20241113 folder: Transforming Augmentations.20250108.docx
- Did not have time to address. Review Deborah's document and bring questions for her next week, when Cypress will be here for the full discussion.
Wrap-up (5)
Action items
- Crystal will upload meeting recordings to Drive; apologies for lagging behind on this! (This is done!)
- Crystal will create a discussion on transformation augmentations (This is done!)
- Review Deborah's Transforming Augmentations document and bring questions to discuss.
Backburner
- WEM Access points, RIMMF Demo
January 8, 2025
See time zone conversion Meeting norms Present: Absent: Notes: Tynan
Water Cooler/Agenda Review/Roles for Meeting (5)
Updates (10)
- Ying-Hsiang cycling off project, arranging handoffs soon. Thank you for your incredible contributions, Ying-Hsiang!
- Doreen is primarily working Fridays and in the mornings during the rest of the week now, and 15 hours per week rather than 19.5 this quarter
- We now have Cypress full time (not all on M2R, but more than before)
- Crystal will miss the last meeting in January
- We had a long follow up discussion regarding 773 (I'm sorry, I lost the notes in a conflicting edit, will go back to the recording to augment), but we decided to discuss next week, so we can dive in further then
- Crystal heard back from Theo at LOC: the catalog is not free unless you use an outdated version, and there is not a lot of RDA in it. You can get 10,000 records at a time through the catalog, and by repeating this we can get as many as we want, although they block bots from doing this and the system will slow you down if you try to automate it, so it has to be done manually or by a slow program. They also have a way to purchase the catalog, but it's very expensive (e.g. $25,000!). Theo recommends downloading 10,000 at a time and compiling a dataset; 100K should be enough
- Deborah: download bulk from 2019 and use the 10k at a time approach for the rest; we would need to de-duplicate the records
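A minimal sketch of the de-duplication step Deborah mentions, assuming each record has already been parsed into a dict and that the LCCN is a usable match key. The `lccn` field name and both function names are hypothetical; the actual key and record format were not decided in the meeting.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set

Record = Dict[str, str]


def lccn_key(rec: Record) -> str:
    """Hypothetical match key: the record's LCCN with whitespace trimmed."""
    return rec["lccn"].strip()


def dedupe_records(
    batches: Iterable[Iterable[Record]],
    key: Callable[[Record], Hashable],
) -> List[Record]:
    """Merge batches, keeping the first record seen for each key.

    Passing the 2019 bulk download first means its copy of a record wins
    over a later copy from a 10k batch; reverse the order to prefer the
    newer copy instead.
    """
    seen: Set[Hashable] = set()
    unique: List[Record] = []
    for batch in batches:
        for rec in batch:
            k = key(rec)
            if k in seen:
                continue  # already have a record with this key
            seen.add(k)
            unique.append(rec)
    return unique
```

Usage would look like `dedupe_records([bulk_2019, batch_1, batch_2], key=lccn_key)`, where the three inputs are lists of parsed records.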
Project Plan Review and Update
Project Overview
- Problem statement: need to add a mention of differing and non-interoperable ontologies
- Goals:
- Deborah: one of the things in the impact should be a description of the entities and their relationships -- this is the main new thing in RDA
- Sofia: move from record-based cataloging to entity-based cataloging
- Impact discussion
- How much is a large pool? The available bulk download is from 2019; we can download records 10k at a time
- Laura: we can talk to Jeff at OCLC; where would we host the records -- National Library of Greece Wiki?
- Would give us a better picture to give people than just using LC's records; could also discover things about the transformation
- Decision: add this to a discussion for next week
- Sofia: Wikibase database has size limits; asking how to make the storage bigger
- Are we reducing dependency on vendor systems?
- Laura: in order to demonstrate this reduced dependency, we have to use it in a system that is not a vendor system and provide library services off of it
- Rephrase to reinforce commitment to open scholarship
- Laura: main impact is to demonstrate that RDA can be implemented using RDF directly; there is a path for adopting it for libraries that have a large legacy store of MARC data
- Ebe: if someone doesn't want to use RDF, but wants to use something else -- should we be specific about the type of encoding?
- Decision: we don't want to promise that we can help people encode another way
- Phase I
- Java extension is not in phase I anymore
- Instead, for phase I we have moved on to having pre-approved IRI sources
- Ying-Hsiang, send documentation to Cypress and Tynan for scripts to feed Bibliographic into Wikibase Cloud
- Post-Phase I close-out
- We may not need to justify phase II, UW libraries approved
- We can think about grant applications to support phase II
- We may also consider submitting to additional conferences
- A composition that describes in a granular way what we did for Phase I, why we did it, what the results were; goal to get this published somewhere
- Deborah's project plan is a good outline for this
- We may want to have an open-source version of this to make information more accessible
- Phase II
- Collection records
- What will we do with collections? We are pulling them out of phase I; what does RDA need for collections?
- Item-level mappings -- not part of phase I, will be part of phase II
- CSR
- You can have diachronic works that fall into a BSR (multipart monos/series)
- Removing machine-readable mapping -- we don't have the capacity for that right now
- BSR
- Guidelines for pre- and post-processing -- part of our documentation in phase II; we have Python scripts to serialize
- Collection records
- Timeline
- Close-out is June-August of 2025
- Start Phase II in August
- How much time do we need for review and re-coding? We need to extend the deadline for ending phase I to April 30th
- Mapping done by Feb 28, 2025
- Mapping review by Mar 31, 2025
- Transform code by April 30, 2025
- Output review by May 30, 2025
- This means starting phase II in September
- Deliverables