Transform Meeting Running Notes - uwlib-cams/MARC2RDA GitHub Wiki

2025-07-10

Present: Deborah, Cypress, Abhignya, Tynan, Matt
Absent: Crystal
Notes: Cypress

Announcements

Check-ins

  • Abhignya - will work on reviewing the 385 mapping and begin coding
  • Cypress - has not done much the last week - was on vacation. Did meet with Deborah to discuss coding.
  • Tynan - First pass of 046 with some questions. Will try to meet with Cypress. Free to take on additional work.
  • Matt - Also on holiday. Updated 773 punctuation.
  • Deborah - Working on 33x mapping tables and 041 mapping tables.

041 Transformation

  • Formulate URIs rather than doing lookup table for ISO codes
  • MARC/LC lookup file already set up - available in Lookups/LC once the load lookup files step has been run.
  • Which coder can take this on?
  • Notes from yesterday
  • Tynan willing to take this on. Matthew, Deborah, and Cypress willing to advise.
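
The "formulate URIs rather than doing lookup table" idea for 041 can be sketched as below. This is a minimal illustration in Python, not the transform's XSLT. The base namespace (the id.loc.gov MARC languages vocabulary) and the function name `language_uri` are assumptions made for the example; the notes do not say which namespace the team settled on.

```python
import re

# Assumption: codes resolve against the id.loc.gov MARC languages vocabulary.
MARC_LANGUAGES_BASE = "http://id.loc.gov/vocabulary/languages/"


def language_uri(code, base=MARC_LANGUAGES_BASE):
    """Build a URI from a three-letter language code.

    Returns None when the value does not look like a three-letter code,
    so the caller can fall back to a literal instead of minting a bad URI.
    """
    normalized = code.strip().lower()
    if not re.fullmatch(r"[a-z]{3}", normalized):
        return None
    return base + normalized
```

For example, `language_uri(" ENG ")` gives `http://id.loc.gov/vocabulary/languages/eng`, while a two-letter value such as `en` returns `None`. Passing a different `base` would cover codes from another source, if that source has a predictable URI pattern.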

33X $2

Escaped/protected characters in XML

  • Issue with loading into RIMMF: Deborah checking in with Richard prior to meeting
  • Discussion from yesterday
  • Unsure what the problem is, we will have to keep an eye out.

Issue/Task assignment

  • Tynan to work on 041 - Deborah will try and get that ready for Tynan by Tuesday
  • Cypress will work on 33X
  • Matt will self assign work as he has the time
  • Abhignya will continue working on 385

Action items

  • We can't forget about aggregates!
  • And properties in the wrong entities.

2025-07-03

Present: Crystal, Deborah, Abhignya, Tynan
Notes: Crystal

Announcements

  • Richard has had success running the transform on PC, Mac, and Linux (Linux is fast!)
  • Has been reporting issues he finds as he goes along
  • Richard making a script for Deborah to run the transform with one button-push (probably not using Python/Java)
    • Could make this available to mappers

Check-ins

  • Tynan and Abhignya will meet to run the transform after the meeting today (Crystal will get them records ASAP)
  • Crystal: 385 mapping and record selection today.
  • Deborah:
    • Break from 336/338 tables. Matthew working on mapping them, Cypress has parts to do with them. With Cypress gone, leaving them behind for a bit.
    • Moved on to working on 041. Needs different modes: single expression, augmented, aggregating. Complications with mapping subfield to ISO codes.
    • Would coders like to help design and then map from it? Abhignya can help next week (7th-11th). Abhignya will email Deborah to set up a time. If they can't find a time they'll email Crystal and she will find someone whose schedule works with Deborah's :)
    • Progress on getting feedback from Gordon on $0's and $1's? She was going to write something up and hasn't gotten it over to Deborah yet. Crystal will check in with her about it next week.
  • Abhignya: less time to work on xform this week. Emailed about 385 (ball in Crystal's court for re-mapping)
  • Tynan: doing 046 (pretty long), about 2/3 done, could get some preliminary output today. Less confident about constructing IRIs especially timespans. Would like feedback on those.

Sarah question about 083

  • Also applies to other classification fields. Anything using DDC at least (example, 082).
  • Can Cypress answer these questions?
  • Crystal needs to get the NLNZ file, it would help with this and we need it anyway
  • For now, we should just use the $2 value to identify the scheme as a code without trying to guess the id.loc IRI. Revisit in phase II. Check this solution with Sofia and Cypress.

2025-06-26

Present:
Notes:

Announcements

  • Richard is running transform on Saxon: missing namespace declaration. Cypress needs to push those and it'll be fixed
  • Crystal and Adam both out of town at ALA
  • Cypress unavailable July 3-7
  • Matthew out July 2-8

Check-ins

  • Deborah still working on 336 and 338, needs to get back to 041. Content type, carrier type, language searching.
  • Cypress started looking at access points for WEM, doing bug fixes, been able to identify bugs from Richard running transform :)
  • Matthew still working on 336 and 338, hopefully complete soon
  • Abhignya completed 385, working on 082 now
  • Cypress can disregard tag in 338 field

Source codes with appended language codes

  • discussed yesterday: topic came up because of 336/8/7 mapping tables
  • sometimes people add a $2 + /[MARC language code]
  • not in the tables Deborah has been making
  • maybe we just need to add to coding for $2
  • what do we do with foreign language terms in $a?
  • lookup table is in lookups rda folder in the transform--it gets queried and downloaded
  • Run load lookupfiles.xsl prior to running the transform to generate the lookup for RDA Registry vocabularies and all labels from the RDA Registry XML file. If that fails, then use the table Deborah has been working on. If those both fail, then give as a datatype. Run load lookup files the first time you run a transform, and any time you want to refresh your local lists to ensure they are up to date.
  • in functions.xsl, there are code lookups done previously for 337/8/etc. those will be helpful for Matthew to adapt for current use.
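
The two ideas in this discussion, a $2 value with an appended "/[MARC language code]" and the three-step fallback (Registry lookup, then Deborah's table, then a datatype value), can be sketched as follows. This is a Python illustration only. The function names and the shape of the two lookup dictionaries are hypothetical and do not reflect how functions.xsl is written.

```python
def split_source_code(subfield_2):
    """Split a $2 value such as 'rdacontent/spa' into (source, language).

    The language part is None when no '/code' suffix is present.
    """
    source, _, language = subfield_2.strip().partition("/")
    return source, (language or None)


def resolve_term(term, registry_labels, mapping_table):
    """Resolve a term using the fallback order described in the notes.

    registry_labels and mapping_table are hypothetical dicts keyed by
    lowercased label, standing in for the generated lookup file and the
    hand-built mapping table.
    """
    key = term.strip().lower()
    if key in registry_labels:
        return ("iri", registry_labels[key])
    if key in mapping_table:
        return ("iri", mapping_table[key])
    # Both lookups failed: keep the original string as a datatype value.
    return ("literal", term)
```

Stripping the language suffix before matching the source code means that `rdacontent/spa` and `rdacontent` can share one branch of $2 handling, with the language code kept for whatever is decided about foreign-language terms in $a.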

0's and 1's from yesterday

  • "For the time being, we will follow pattern set by 630 and 730 fields and mint our own IRIs, using $0s and $1s as identifiers." should be changed to remove following pattern set by 630 and 730 fields.
  • Making a bunch of statements about someone else's IRI isn't good RDF practice. If we want to add things about entities, we will need to mint our own IRI and link to external IRI.
  • Is it best to use owl:sameAs? For approved sources this makes sense. For unapproved, continue using identifier as the relationship.
  • Should we just mint our own IRI from go, in case we ever want to say something more about these entities?
  • Ask Gordon for clarification on when to mint our own entities when all we're adding is an access point.

Richard running transform

  • We need to be running tests and making sure we haven't broken the transform locally before pushing changes
  • Pushing tons of data through helps us to catch problems we haven't seen before, which is great
  • Issue with RDA Registry vocabulary
  • Most things we can account for in the transform
  • XML doesn't validate IRIs so it won't catch things like IRIs with spaces in them, etc. When we serialize, that's when we catch IRI errors.
  • Sometimes code refers to an RDA CURIE whose prefix hasn't been added to the namespace declarations. Would it be good to have all the namespace declarations at the top of every file, even the ones you'd never use? That hasn't been consistently done, part of learning code, but it would be good at some point to go through and add them to every file. Maybe Sarah or Abhignya could do this? Could copy-paste from namespace file. Needs time and place added.
    • At the top of each xsl sheet, there are namespace declarations. They all should be there for each stylesheet, but they're not. Cypress will go through and add them all to 3XX, Abhignya will copy that and add it to all the files.
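
Since XML parsing will not catch IRIs with spaces in them, a rough pre-serialization screen might look like the sketch below. It is an assumption-laden illustration, not the project's serialize.py. It checks only for a scheme and for characters that cannot appear unescaped in an IRI, and it is not a full RFC 3987 validator.

```python
import re

# Whitespace plus characters that are not allowed unescaped in an IRI.
_FORBIDDEN = re.compile(r'[\s<>"{}|\\^`]')
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def iri_problems(iri):
    """Return a list of human-readable problems found in an IRI string."""
    problems = []
    if _FORBIDDEN.search(iri):
        problems.append("contains whitespace or a forbidden character")
    if not _SCHEME.match(iri):
        problems.append("missing scheme")
    return problems
```

An empty list means the value passed this screen. Anything flagged here would also be expected to fail when the output is serialized.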

Wrap-up

  • Abhignya will email Cypress with questions and to look over things she's done once she's finished with them

2025-06-12

Present:
Notes:

Announcements

  • Doreen is just about done, and Sara has moved on from the project. Happy graduation to both!
  • Next week is Juneteenth, so we won't have a transform meeting

Check-In

  • Crystal added BSR/Non-BSR labels to issues ready for xform
  • Doreen finished up more field by field coding. Unassigning things she hasn't started yet
  • Matthew: 338 lookup table, made code, being reviewed
  • Cypress reviewing other peoples' code, giving feedback and answering questions. Will start on access points soon. Tag Cypress multiple times if she doesn't respond in a week or so--things getting buried in email
  • Abhignya - made changes to 385 Sara asked for and sent it back to her
  • Deborah - working on 33X mapping, setting up examples for reviewing coding. Hopefully can be a pattern for 3XX/4XX's.
    • Sample file of 338 test fields in issue page (first one was 336 by accident).
    • Cypress working on access points and IRIs once lookup table is present for these kinds of fields.
    • Where is test output for headings mapping table? For works and expressions there's one file in test input/output. Testing is done field-by-field, in their own files. For agents, testing is being done locally using test input for relevant fields.
    • Finding content, media, and carrier type values in other fields when the record doesn't have a 336 etc.
    • Worked on sample repo from yesterday. Is DBPedia useful for checking results? Cypress left early and will check recording/check it out here. Should be updated with a new file. Input is here, needs to be run again through the transform to update DBPedia.
    • Still stuck on aggregates filtering, putting it off until field by field is done. Pass remaining aggregates pieces on to Cypress, she can take a look.
    • Database is using free copy of GraphDB. Number of statements is unlimited, but you can only have five repositories running at the same time. Can't run more than two SPARQL queries at the same time. Paid version is expensive, but there's an educational discount available.
    • 113K MARC records = 5.5 million statements. Number of items/entities was pretty accurate estimate.
    • Item triples showing up in other entities (Deborah will send examples to Cypress).
    • Datatype and object properties aren't being consistently implemented throughout the code. Any found should be fixed to use datatype/object properties.
    • In IRIs, 240 is being tacked onto 130 for IRI? (Deborah will create a new issue with output as an example, then coders can fix it).
    • Single lookup for 336/337/338 (maybe for phase II) to catch errors.
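
The GraphDB figures reported above (113K MARC records producing 5.5 million statements) work out to roughly 49 statements per record. The short Python sketch below does that arithmetic and offers a simple linear projection. The projection assumes other data sets resemble this one, which the notes do not claim.

```python
# Figures reported in the notes above.
MARC_RECORDS = 113_000
STATEMENTS = 5_500_000

# Average number of statements produced per MARC record (about 48.7).
STATEMENTS_PER_RECORD = STATEMENTS / MARC_RECORDS


def estimate_statements(record_count):
    """Linear projection of statement count for a given number of records."""
    return record_count * STATEMENTS_PER_RECORD
```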

New student coding feedback post-Doreen and Sara

  • Matthew and Cypress can look when they have time. Coders should tag Matthew and Cypress when code is ready to review or they have questions

Action items

2025-06-05

Present: Matthew, Abhignya, Cypress, Deborah, Doreen, Sara
Notes: Sarah, Crystal

Check-in

  • Sara: Worked on generating and debugging the large AI-generated MARC record. Worked on 033, especially around date EDTF formatting using the LOC function.
  • Cypress: Worked on heading mapping table. Created test XML files for testing fields and attributes. Identified and fixed transform-breaking errors.
  • Doreen: Worked on field-by-field coding.

Transformation Discussion (Copied from Wednesday Meeting Minutes)

  • How is onboarding going for Matthew, Sarah, and Abhignya?
    • Everything is good! Cypress offered to meet with Matthew. Abhignya and Sara will go over and review Abhignya's work together.
  • One Big MARC Record, requirements, effort, and timeline
  • Processing for reviewing coder output, using a label, noting the ask and code, reviewers assignment
    • Will have review requested label to signal to mappers for review
  • Free Google AI Studio can handle the request; free ChatGPT cannot

  • Due to output limits, multiple requests in the same prompt/chat are required to get the full record

    • For example, will stop in the middle of 044, and then request "can you provide the rest of the record starting from 044?"
    • 1st iteration took fewer follow-up requests than 2nd iteration
  • Estimate: the 1st iteration took 90-120 minutes at most, covering prompt creation, prompting, and combining responses. The 2nd iteration was probably closer to 60 minutes or less (not including the human missing a closing tag)

  • 1st iteration: comprehensiveTextualMaterial.xml

    • System instructions: You are an expert in MARC21 standards and formats. You help create complex MARC records so colleagues can use them to test a variety of scenarios.
    • Temperature: 1.5
    • Model: Gemini 2.0 Flash-Lite
    • Attachments: MARC 21 Format for Bibliographic Data_ Field List.pdf; marc bib test record (no fixed fields).mrc
    • Prompt: The attached files include the MARC 21 Format for Bibliographic Data Field List, which lists all valid and obsolete data elements that may appear in MARC 21 bibliographic records, and a single MARC record that contains every MARC field and within each field every subfield, but it does not contain any indicator data and does not include the 000-008 fields and does not contain repeated subfields.

      Given the attached files for context:

      Create 3 single-record MARC examples, each in a different format (e.g., text, video, map). The use is for system testing and validation. The goal is comprehensive exhaustiveness, rather than bibliographic correctness. They do not have to be realistic; it is more important to have the structure even if using placeholder text

      Within each record:
      Include proper control fields (e.g., Leader, 000 + 008, 006, 007) for the format
      Include all fields: e.g., 0XX, 100, 600, 700, 800, 400, 110, 610, 710, 810, 410
      111, 611, 711, 811, 411, 130, 630, 730, 830, 440, 2XX, 3XX, 5XX
      Include all indicators: e.g., 0-9 or a space for blank
      Populate every possible subfield, even if semantically wrong
      Repeat every subfield, even those that aren’t technically repeatable
      Include any data elements that are now obsolete

      The output should be downloadable file of MARCXML
  • 2nd iteration: comprehensiveMarcRecord.xml

    • Only Prompt changed: The attached files include the MARC 21 Format for Bibliographic Data Field List, which lists all valid and obsolete data elements that may appear in MARC 21 bibliographic records, and a single MARC record that contains every MARC field and within each field every subfield, but it does not contain any indicator data and does not include the 000-008 fields and does not contain repeated subfields.

      Given the attached files for context:

      Create 1 single-record MARC example, that includes every format (e.g., text, video, map). The use is for system testing and validation. The goal is comprehensive exhaustiveness, rather than bibliographic correctness.

      Use the labels of the subfields as the placeholder text. For example, in this MARC the label for field 010 subfield a is "LC control number (NR) " and subfield b is "NUCMC control number (R) ".
      <marc:datafield tag="010" ind1=" " ind2=" ">
      <marc:subfield code="9">LIBRARY OF CONGRESS CONTROL NUMBER (NR) </marc:subfield>
      <marc:subfield code="a">LC control number (NR) </marc:subfield>
      <marc:subfield code="b">NUCMC control number (R) </marc:subfield>
      <marc:subfield code="z">Canceled/invalid LC control number (R) </marc:subfield>
      <marc:subfield code="8">Field link and sequence number (R)</marc:subfield>
      </marc:datafield>
      Using the labels as values makes it easier to trace errors in testing. It is also important to have the structure using this placeholder text, rather than to be realistic.

      Within the record:
      Include all control fields (e.g., Leader, 000 + 008, 006, 007) for every format
      Include all fields: e.g., 0XX, 100, 600, 700, 800, 400, 110, 610, 710, 810, 410
      111, 611, 711, 811, 411, 130, 630, 730, 830, 440, 2XX, 3XX, 5XX
      Include all indicators: e.g., 0-9 or a space for blank
      Populate every possible subfield, even if semantically wrong
      Repeat every subfield, even those that aren’t technically repeatable
      Include any data elements that are now obsolete

      The output should be downloadable file of MARCXML
  • Some issues with AI-generated files: bad IRI values (e.g. spaces in URIs), repeatable vs non-repeatable fields, incomplete indicators in outputs.

  • Deborah will ask Richard to see if he is willing to create the MARC record test files, as it may be more time-efficient.

  • Deborah: For testing purposes, we should focus on breaking down MARC records into smaller chunks, like 0xx, descriptive fields, leaders, etc. We should also test specific record types like 06 and 08 separately.

  • AI-generated files can be used as a resource for mappers and coders to grab test input for specific fields but data should still be manually verified.

  • While AI experimentation is interesting, we should prioritize completing the mapping/coding work.

2025-05-29

Present: Crystal, Sara H., Cypress, Matthew, Deborah, Doreen, Abhignya
Notes: Sara H.

Check-in

  • Sara: x10 & x11 and big test MARC record
  • Doreen: field by field coding; mode documentation and added thoughts/comments; met w/Sarah C for shadowing and added questions on 246 to agenda
  • Matthew: working on bugs this week
  • Deborah: mostly working on 336/338 mapping tables, and checking completed edits to mapping table
  • Cypress: was on vacation, but planning on more coding over the weekend
  • Abhignya: started looking at issues that were assigned

336 & 338 Discussion (Deborah)

  • Mapping spreadsheet
  • Table columns: official RDA term, official RDA IRI (alignment mapping), LOC URI (equivalence of RDA), their term, their code - all can appear in various subfields in the records. They all should be mapping to RDA carrier type with the correct code/curie
  • Instructions are in rows below the table
  • Mode is always manifestation
  • Minted IRI we are trying to make as unique as possible to prevent bad deduplicating, so need qualifier. Code can be added to IRI, but should we? Code can't be added to AP
  • Examples for attribute, AAP, and IRI with different instructions - all in Combined Examples tab
    • Includes: Marc code followed by comment; result is below
    • Can be used, but coders will need to put into MARCXML
  • Matthew's open to taking a look at this as special project
  • 336 field mapping is going to be separate from AP and IRI; Cypress and Matthew will need to coordinate since Cypress is also working on AP with Headings
  • Need to create lookup document and reference it in order to retrieve RDA IRI from LC code. That will need to be done before attributes, AP, and IRI can be worked on it
  • 338 already has some coding (336 is old) - outputs are what we want except ID.LOC, so that needs to be replaced
  • 338 can serve as reference - has patterns that can be reused
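
The lookup step described above (retrieving the RDA IRI from an LC code, and replacing ID.LOC output with the RDA value) could be sketched as below in Python. The two table entries are illustrative and should be verified against the RDA Registry and Deborah's mapping table before being relied on. The function name `rda_carrier_iri` is hypothetical.

```python
LC_CARRIERS_BASE = "http://id.loc.gov/vocabulary/carriers/"

# Illustrative entries only; the real table would come from the lookup document.
CARRIER_CODE_TO_RDA = {
    "nc": "http://rdaregistry.info/termList/RDACarrierType/1049",  # volume
    "cr": "http://rdaregistry.info/termList/RDACarrierType/1018",  # online resource
}


def rda_carrier_iri(value):
    """Map an LC carrier code or id.loc.gov carrier URI to an RDA IRI.

    Returns None when the value is not in the table, so the caller can
    decide on a fallback.
    """
    candidate = value.strip()
    if candidate.startswith(LC_CARRIERS_BASE):
        candidate = candidate[len(LC_CARRIERS_BASE):]
    return CARRIER_CODE_TO_RDA.get(candidate.lower())
```

Accepting both the bare code and the id.loc.gov URI in one function reflects the point that these values can appear in various subfields but should all map to the same RDA carrier type.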

Transformation Discussion (Copied from Wednesday Meeting Minutes)

  • Discuss whether we can complete Phase I coding by end of June
    • ~50 tags left to code
    • Doreen & Sara about 2 weeks of time left
    • New folks still ramping up
    • Keep new complexities to a minimum and push to phase 2 if bigger coding challenges
  • Ask Matthew to review if the 336 mapping logic can be replicated for 337 and 338
    • Review Deborah's mapping table to make sure usable
  • Discuss switching from field-by-field review to record-level review; coder/reviewer workflow.
    • At first we ran full records through the code to see what came up
    • Then we decided to go field by field: find example records that have the field in them, then look at them one by one. This can't catch every case, since most fields are used consistently in only a couple of ways
    • Thinking about going back to record level review
    • Coders already doing test of input/output - could be redundant for whole team to review if it's the same
    • Decision: Switch over to record-level review with whole team rather than field by field
    • Coders are testing both field by field and tables
    • After a commit, how long does it take to show up? Commit, then push, and the push puts it up - usually not more than a couple minutes; in an issue it will show the commit after it is pushed
  • comprehensiveTextualMaterial.xml
  • Any feedback before creating additional formats?
  • Can't fully transform because running into errors with multiple arguments being passed when not allowed (thanks to our invalid repeated subfields in the example!)
    • Sara is working through the errors, but hasn't pushed changes yet
  • Instead of one for each format (e.g., text, video), have one that includes all types
    • Record will be large, but reduces redundancy overall
  • Use labels of subfields rather than made-up text, so easier to track in review
  • Cypress recommends using the Debugger in Oxygen to make it easier to trace where things are getting called

x10 & x11 (Sara H.)

  • ending punctuation with repeated $b or $e with single letter ending (e.g., "Division A." or "Sub-Committee A.")
  • working:
    • input: F110 2# $a United Nations. $b Department of Economic and Social Affairs. $b Division Alpha.
    • output: <rdaad:P50032>United Nations. Department of Economic and Social Affairs. Division Alpha</rdaad:P50032>
  • not working:
    • input: F111 2# $a IEEE Symposium on Security and Privacy. $e Technical Committee on Security and Privacy. $e Sub-Committee A.
    • output: <rdaad:P50032>IEEE Symposium on Security and Privacy. Technical Committee on Security and Privacy. Sub-Committee A.</rdaad:P50032>
  • same is happening with <rdaad:P50375>IEEE Symposium on Security and Privacy. Technical Committee on Security and Privacy. Sub-Committee A.</rdaad:P50375> - expected?
  • CP: Yes this is expected, when discussing ending punctuation we determined that we would retain punctuation when it matched a known abbreviation (abbreviations.xml) or when it ended in a single character.
  • Sara will add the single character decision to the Decisions Index.
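
The ending punctuation decision described above (retain the final period when the last word is a known abbreviation or a single character) can be sketched as follows. This is a Python illustration, not the transform's XSLT function. The small abbreviation set is a stand-in for abbreviations.xml, whose contents are not shown in these notes.

```python
# Stand-in for abbreviations.xml; the real list lives in the transform.
KNOWN_ABBREVIATIONS = {"dept", "inc", "co"}


def strip_ending_period(text, abbreviations=KNOWN_ABBREVIATIONS):
    """Remove a trailing period unless the last word is an abbreviation
    or a single character."""
    value = text.rstrip()
    if not value.endswith("."):
        return value
    words = value[:-1].split()
    last_word = words[-1] if words else ""
    if len(last_word) == 1 or last_word.lower() in abbreviations:
        return value  # keep the period
    return value[:-1]
```

With the two examples above, the value ending in "Division Alpha." loses its final period, while the value ending in "Sub-Committee A." keeps it because the last word is a single character.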

246 - Collection Work and Collection Manifestation (Doreen on behalf of Sarah C.)

  • Spreadsheet
  • Not sure how to interpret the transformation notes for last line. Are new modes needed Collection Work and Collection Manifestation?
  • CP: No, these are already minted in lookup/$5-preprocessedRDA.xml. You will need to mint an item and call uwf:s5Lookup in m2r-functions.xsl (although the mapping sheet says 'collector corporate body' - should this be 'holding of' like all the other $5s?). See documentation.
  • Look at 500 field for example of setting up for an item, tag Cypress if have any questions
  • Ok to ask in the issue and tag person who did mapping
  • $5 variant title with one institution's item (e.g., rare book that somebody added a new title to, maybe with a custom slipcase with different title on it).
    • Collection is LC's holdings en masse - not about a collection work; it's about a collection of items in a literal library's collection

2025-05-22

Present: Crystal, Sara H., Cypress, Deborah, Matthew, Abhignya, Doreen
Notes: Crystal

Check-in

  • Crystal is working on creating issues and documentation for transformation review
  • Matthew has started coding. Done 773 and 545 and a couple of others.
  • Sara working on a bug in the Headings Attributes Table.
  • Cypress changed relationships with 6XX agents and works and got agent IRIs set up so the ones without sources are getting meaningful IRIs. Corresponding to discussion/decision during main meeting.
  • Abhignya doing training, just started yesterday

Modes (Cypress)

  • PowerPoint slides
  • rdf:Description as a box. @rdf:about is the name of the box (IRI) telling us what the box is about. Everything inside is about that IRI.
  • We always describe the main entities we know will be present. Main work(s), main expression, main manifestation. This gives us empty boxes. Then, we call templates that add things to the box. For instance for expressions, language of expression. The mode of the template tells us which box to put the information in.
  • Other boxes, like agents, related works, and concepts have their boxes created inside the templates. We apply templates outside of the box. Mode tells the template what type of thing the box is for. We can do this because each thing described outside the main WEMI stack is described within a single MARC field.
    • This is true for all remaining modes aside from main works, expressions, manifestations
  • Depending on the mode, the template goes into an already set up box or creates its own box.
  • m2r.xsl --> expressions (line 286). The rdf:Description is set up, and we add things we know are true about the expression.
  • When we call templates (starting at line 403 for manifestations), all templates for the original manifestation are called within the existing rdf:Description.
  • Templates where mode=agent have to set up a box for the agent and then put things inside
  • Question: how does this work in practice when things only apply to a certain mode?
    • Example: If something only applies to the augmented work, line 272 is only applying to the mode for that.
    • Line 28: For 240, if it is an aggregating work, only do line 30-32. For single expression and augmented works, do 34-41.
  • Demonstration of how to set up a mode
    • Name, tell xsl the category exists and to skip if not found
    • add additional apply-templates to tell xsl to apply all the templates found where mode=your new mode
    • name the mode in relevant templates
  • Additional modes in the table Deborah suggested were all types of related work. The related works mode will work for all of these. Because this is an outside template, related works from different fields are inside the RDF description inside the template. It's in the 630 that we say this is a subject work. We don't need to say it in the mode. Because these are within single MARC fields we can do this.
  • Sara started some basic documentation. Did Cypress start anything like this?
    • We can link to this meeting recording and Cypress will put slides into Transform documentation.
    • Sara will finish up the documentation, review with Doreen, then review with Cypress
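
The "box" explanation of modes can be pictured with the Python analogy below. It is not the XSLT in m2r.xsl. The template functions, property labels, and `ex:` IRIs are all hypothetical and exist only to show the two behaviours: a template whose mode names a main entity adds a statement to a box that already exists, and a template with any other mode builds and returns its own box.

```python
def language_template(value):
    # Mode "expression": returns one statement for the existing expression box.
    return ("language of expression", value)


def agent_template(value):
    # Mode "agent": creates its own box, since the whole agent is described
    # within a single MARC field.
    return {
        "about": "ex:agent/" + value.replace(" ", "_"),
        "statements": [("name of agent", value)],
    }


# Hypothetical registry: MARC tag -> list of (mode, template) pairs.
TEMPLATES = {
    "041": [("expression", language_template)],
    "700": [("agent", agent_template)],
}


def describe_record(fields):
    """Set up the main boxes, then let each template's mode decide
    whether its result goes into a main box or becomes a new box."""
    boxes = {
        "work": {"about": "ex:work1", "statements": []},
        "expression": {"about": "ex:expression1", "statements": []},
        "manifestation": {"about": "ex:manifestation1", "statements": []},
    }
    extra_boxes = []
    for tag, value in fields:
        for mode, template in TEMPLATES.get(tag, []):
            if mode in boxes:
                boxes[mode]["statements"].append(template(value))
            else:
                extra_boxes.append(template(value))
    return boxes, extra_boxes
```

Adding a new mode in this picture means registering templates under the new mode name and deciding whether it targets an existing box or creates one, which mirrors the "name it, apply templates, name the mode in relevant templates" steps from the demonstration.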

Quick error overview (Cypress)

*link to example

  • How to notify people of errors and what went wrong?
    • Cypress put a note in the commit and tagged the initial coder as gentle corrections. Sara and Doreen say the tagging is good.
    • Bug reports from Deborah--read them with a critical eye and give feedback when appropriate. --Deborah

Things to look for in testing:

  • does the property type (rdae, rdaa, etc.) match the rdf:Description IRI
  • does the property exist (run lexical aliases or append labels)
  • are all IRIs valid (run serialize.py)
  • For example, check that things exist before running code that expects them to exist
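
The first check in this list (does the property type match the entity being described) can be partly automated. The Python sketch below flags any rdf:Description whose RDA properties come from more than one entity namespace, for example work and item properties mixed in one box. It assumes RDA Registry element namespaces of the form `http://rdaregistry.info/Elements/<entity>/...`. It does not check the entity type against the IRI itself, because the transform's IRI pattern is not described in these notes.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Captures the entity letter(s) from an RDA element namespace in an ElementTree tag.
RDA_ELEMENT = re.compile(r"\{http://rdaregistry\.info/Elements/([a-z]+)/")


def mixed_entity_descriptions(rdf_xml):
    """Return {rdf:about IRI: sorted entity letters} for every
    rdf:Description that mixes properties from several RDA entities."""
    root = ET.fromstring(rdf_xml)
    flagged = {}
    for description in root.iter("{%s}Description" % RDF_NS):
        kinds = set()
        for prop in description:
            match = RDA_ELEMENT.match(prop.tag)
            if match:
                kinds.add(match.group(1))
        if len(kinds) > 1:
            flagged[description.get("{%s}about" % RDF_NS)] = sorted(kinds)
    return flagged
```

Run against a transform output file's contents, an empty result means no description mixes entity namespaces. A flagged entry points at the box to inspect.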

Deborah Questions

  • Running tests: how to run them on the headings table. Fixes and additional coding outlined in headings table.
  • Is there a way to set up testing based on the table rather than field-by-field as we have been doing?
  • We will have to test all the fields regardless of their inclusion in the table
  • Testing should cover every scenario you can imagine, and we can make up examples to include in the test. We don't have to find actual MARC records to review when we're just testing the actual function of the code
  • Put together attributes test file, put one MARC record in there minus any fields we don't want to look at, then add whatever is needed in order to test it. Manipulate for each test.
  • At what point in the coding for headings spreadsheet will tests be run? Relator attribute test is one of the files Cypress created to do this.
    • Every time you code something, before you push it, make sure it's functional by running a test.
    • Is this similar to how we do field by field coding? Or are we using the same input and editing the same output every time we code attribute table?
      • With field-by-field, we have a test record for each field. For the attribute table, we don't necessarily need to create a new file for each row. We can do whatever is easiest. Can have one file per coder or choose one to share.

One MARC Record to Rule Them All

  • One MARC Record to Rule Them All, per format · Issue #553 · uwlib-cams/MARC2RDA
  • Deborah and Richard created one, but it lacks 006 and 008 and fixed fields and indicators. We will need these for testing, but they probably can't all go into one record.
  • Would it make sense to make one per format?
  • group by indicator and values?
  • We want to have one where all subfields (regardless of repeatability) are repeated.
    • We don't want bad MARC to break the transformation. So we should account for things in the code. For example, code knows to pick the first $2 in the field.
  • If AI can do this, then great. If not, Deborah will ask Richard to take some time and do it.
  • FYI: Cypress unavailable Fri-Mon this week. You can tag her and she will respond next week.
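
The defensive rule mentioned above (when a subfield is wrongly repeated, the code picks the first $2 in the field) can be illustrated with the Python sketch below, which uses the standard library's ElementTree on a MARCXML datafield. The helper name `first_subfield` is hypothetical, and the transform does this in XSLT.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def first_subfield(datafield, code):
    """Return the text of the first subfield with the given code,
    or None when the datafield has no such subfield."""
    for subfield in datafield.findall("{%s}subfield" % MARC_NS):
        if subfield.get("code") == code:
            return (subfield.text or "").strip()
    return None
```

Taking the first occurrence gives a single value to functions that accept only one argument, which is the kind of error the all-subfields-repeated test record is meant to expose.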

2025-05-15

Present: Crystal, Deborah, Sara H., Doreen, Matthew
Notes: Sara H.

Check-in

  • Crystal
    • Working on project management for transform and doing the issues for transformation review (laborious process!)
    • Working on gathering the initial data sets for the transform review
      • Thanks, Doreen, for doing transform at last minute this week!
  • Doreen
    • Working on field-by-field coding.
    • Connected with Cypress on an issue she was having; needs to create a new mode. Cypress suggested RelMan/Related Manifestation for the new mode.
      • Cypress is working on some docs/demos on how to create a new mode. Agreed work modes could be separated out more.
      • Sometimes it's not clear a new mode needs to be created in the new notes - e.g. 530 was Manifestation -> Manifestation, and wasn't obvious. We'll also ask Cypress how to tell when a new mode needs to be created.
      • To create a new mode, all you need to do is name it and apply call templates. There's not a separate function/place where this code lives.
    • Could be better mode documentation
      • List of modes in the m2r code
      • Sara & Doreen can start on a draft and ask Cypress to correct and fill in any gaps
  • Sara
    • 040 - ultimately decided this should be addressed in Phase II
      • Deborah noted we're used to thinking in records; RDA/RDF is statements; community needs to address how provenance works. This is more philosophical and should be addressed in Phase II.
      • Discussed how this should be getting assigned to a metadata work, not the work being described
      • Believe MARC records would be held in triple store so could be referenced as needed
      • Sara will move the 040 code to a separate file and attach it to the issue, so it can be used when the issue is taken up again in Phase II
    • 033 - dates in EDTF; trying to locate a table of dates/times in MARC; asked Cypress on the issue
  • Deborah
  • Matthew
    • Security approvals were required before could get started. Now in place.
    • Coded 773. Tomorrow will work on 545. Then will test both, commit, and upload.

Distribution of Work

  • Cypress would like to know who's doing what and how she can most usefully contribute

Discussion: Workflows

  • Bug issues and how they work with table/other stuff too!
    • While doing transform review, things might come up that aren't part of the review
    • Want centralized place to bring them to the transformation team
    • Use Bug label to review issues and self-assign to fix
    • Deborah can't assign a project to an issue
    • Matthew can have a go at handling bugs
  • Review of issues all around for transformation review
    • Label for 'input data needed' added to an issue. Once input is added to the issue, the label is changed to 'output data needed'. Coders then transform and add the output to the issue. Then label is changed to 'ready for transformation review'.
    • Can bulk update labels
    • Team will review output. If changes are needed will assign 'Code re-check'
    • Once done, moved from 'Transformation Review in Progress' to 'Transformation Review Complete' status
    • Crystal will create documentation in Decisions Index to capture steps
    • Code update requests put into the issue; Discussion kept separate since often lengthy

2025-05-08

Present:
Notes:

Welcome Matthew and Sarah C.

  • Orientation document check-in
  • Best work to get started on?
  • Matthew: 545, 773, checking into IRI minting for consistency and accuracy

Check-in

  • Sara: willing to review a couple tags for Matthew or Sarah. Working on attribute rows from headings document, switched back to field-by-field. Going through 0XX's
  • Doreen: willing to review a couple tags for Matthew or Sarah. 0XX and 530. Plans to ping Cypress on a question about how to code manifestation (manifestation described by the MARC record) and manifestation2 (manifestation described by 530 note). What portions of 530 go into Nomen and in what order? They will have the same work and expression. Attribute table.

Discussion: Workflows

  • Tracking work (what needs to be done, and when)
  • Self-assignment of tags and tasks
  • Current top priorities
  • Review workflow: where is feedback best for tags which are coded but haven't formally been included in the Transformation Review process yet?

Testing questions:

2024-10-10

Present: Cypress, Ying-Hsiang, Sara, Tynan
Notes: Cypress

Announcement:

We will be changing meetings from weekly to as needed.

Check-in

  • Doreen: still working on digesting the code for 6xx and going through the aggregates document in order to start coding for 800, also working on the code for 342
  • Cypress: Finished 008 aside from outstanding questions, also finished 006. Onboarded Sara, will onboard Tynan tomorrow
  • Ying-Hsiang: Working on setting up Wikibase cloud. Running Wikibase cloud instance and tool from NLG Greece. Troubleshooting this. Reaching out to Wikibase cloud team, hoping Crystal can contact project manager to get some more help.

2024-09-26

Present: Cypress, Ying-Hsiang, Doreen, Penny, Tynan, Sara, Crystal
Notes: Cypress

Check-in

  • Cypress - still working on 008 :) 006 is reviewed and she will start on that next if it doesn't need reproduction conditions. Tested punctuation function that accounts for abbreviations.
  • Ying-Hsiang - Waiting on Deborah for aggregates work, which is paused for now. Working on setting up Wikibase instances.
  • Doreen - Reproduction conditions mostly done! 245 will be added today. Working on updating code for some fields.
  • Penny - Learning coding and transformation! Has time for other tasks.

2024-09-26

Notes: Cypress

Check-in

  • Cypress: almost finished coding 008 save the questions in the issue. This should make reviewing and coding 006 easier.
  • Penny: Penny has finished 1XX, 7XX, and 8XX Google sheets and can take on more work.
  • Ying-Hsiang: Aggregate code, Java extension for determining URI types.
  • Doreen: 008 reproduction conditions are done! Trying to follow up on 300 to see if it is ready. Reviewing and transforming fields.

Discussion

  • Issues with determining URI types:
    • We need to discuss what to do with URIs that cannot be determined to map to an RDA entity
    • Maybe if there's an 040 e = RDA can we use VIAF etc. - we should bring this to the group
    • What about original RDA vs official RDA? Is this a problem?

2024-09-18

Present: Cypress, Doreen, Ying-Hsiang, Penny, Crystal
Notes: Cypress

Check-in

  • Cypress: Still working on 245, classification fields with Gordon and Penny. Finished coding fields in RFT and reviewed some more fields.
  • Doreen: Laura and Doreen almost done with reproductions for 008! Also working on reviewing fields and coding.
  • Penny: Created new URI table for approved URI and have asked Adam and Gordon to review. Created a heading field attribute mapping table. We can review and revise as we code.
  • Ying-Hsiang: Majority of working time has been aggregate code, which is on track. Working on parallel aggregates with Deborah.
  • Crystal: Working on datatype URIs in Wikidata

Extension demonstration

  • In some cases, XSLT is not efficient, so we have extensions
  • Determining IRI type is one useful case

How it works:

  • code dereferences IRI and downloads XML format
  • looks for rdf:type element that matches a list in an XML file (this is where Penny comes in!)
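
The two steps above can be sketched in Python for illustration (the actual extension is written in Java). This sketch assumes the RDF/XML has already been downloaded and is passed in as a string. The names `declared_types` and `match_entity_type`, the shape of the approved list, and the `"unknown"` default are all hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def declared_types(rdf_xml: str) -> set[str]:
    """Collect the rdf:resource value of every rdf:type element in the document."""
    root = ET.fromstring(rdf_xml)
    types = set()
    # Only explicit rdf:type elements are inspected; typed node elements
    # (another way RDF/XML can declare a type) are not handled in this sketch.
    for elem in root.iter(f"{{{RDF_NS}}}type"):
        resource = elem.get(f"{{{RDF_NS}}}resource")
        if resource:
            types.add(resource)
    return types


def match_entity_type(rdf_xml: str, approved: dict[str, str], default: str = "unknown") -> str:
    """Return the entity type for the first declared type found in `approved`.

    `approved` maps type IRIs to entity type names (a stand-in for the XML list).
    """
    for type_iri in sorted(declared_types(rdf_xml)):
        if type_iri in approved:
            return approved[type_iri]
    return default


# Hypothetical sample data; example.org IRIs are placeholders.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.org/agent/1">
    <rdf:type rdf:resource="http://example.org/types/Person"/>
  </rdf:Description>
</rdf:RDF>"""
APPROVED = {"http://example.org/types/Person": "Person"}
```

Calling `match_entity_type(SAMPLE, APPROVED)` looks up the single declared type in the approved mapping; an IRI whose types are not in the list falls through to the default.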

How to use:

  • README for extension
  • RdfPredicateExtractor extension requires JDK and Maven
  • Follow instructions for setting up in command line
  • Run through Java, there is a Java file to run
  • Next step would be updating/creating XML lists of approved IRIs
  • Oxygen HE is not an option for extensions if we want this to be open source - not able to run a Java extension

Questions:

  • Can we run the Java extension in Oxygen for testing? No. Command line
  • Penny created a Google Sheet with just URIs
  • What happens if an IRI does not return an XML file? We will have a default case
  • Is this a reasonable ask for users to run XSLT through Java? By the end of phase 1, we want some code that executes 'all the things'. Let's do whatever it takes to make it work. Users will need to install Java runtime etc. This is a reasonable ask. We will provide a list of dependencies. Checking to see what is behind an IRI is amazing :)
  • Is this something we need to currently add into the workflow? We need to make sure the XML lists are ready before implementing. Reach out to Ying-Hsiang if we have additional questions.

2024-09-11

Present: Cypress, Doreen, Penny, Ying-Hsiang
Notes: Doreen

Check-in

Cypress:

  • Coded reproduction conditions for 264 and test data are now available to view.
  • Worked on Misc. fields. Working on 245
  • Set up input parameter in m2r for base IRI.

Doreen:

  • Reproduction conditions added for 264, 250, 260. Currently working on adding conditions to 008.
  • Finished transform for 520 and started on 505.

Ying-Hsiang:

  • Working on aggregates with Deborah. Deborah has been making some changes so he will commit these changes in code along the way.
  • Finished transform for 518.

Penny:

  • Finished the table for entity type. Waiting on Gordon to review.
  • Cypress: once Gordon is done reviewing, we can put it in xml format.
  • Ying-Hsiang: Working on documentation for java extension to determine IRI type.

Laura:

  • dropped in! to explain special reproduction conditions for 008. Doreen will add that to the spreadsheet.

Validation scenarios in Oxygen

  • Purpose: Let other institutions use their own IRI.
  • Yellow warnings might still occur, but generally fine unless there is a loop.

2024-09-05

Present: Cypress, Doreen, Penny, Ying-Hsiang
Notes: Cypress

Check-in

  • Cypress - Cypress has been working on field 245. She has also finished up some code re-check/code on hold issues and is transforming classification fields.

  • Doreen - 008 vocabs done, replacing old links. Laura added comments to fields that will need reproduction conditions. They will work on 264 together this afternoon.

  • Penny - Still working on entity types, hoping to finish this week and continue with 1XX and 7XX indicator and subfield rows. We looked at the 100 Google Sheet that Penny is updating.

  • Ying-Hsiang - Mapped 334. Working and communicating with Deborah and has converted patterns that Deborah has verified. For performance testing - start with the largest file in the Google Drive Test Data folder. Will return to code for 518.

  • Laura - Laura dropped in! Laura is working on reproduction conditions.

  • We updated the 518 mapping together!

Action items

  • Cypress is going to shorten the meeting time since we do not usually take 1.5 hours. The meeting will be scheduled for 1 hour.
  • Penny will let Cypress know when attributes for agents have been mapped and added to the Google Sheets.
  • Cypress will let Laura know when 245 is ready for reproduction conditions.

2024-08-28

Present: Cypress, Doreen, Ying-Hsiang, Penny, Sita, Laura, Crystal
Notes: Crystal

Check-in

  • Cypress has been working on 245, resolving code on hold issues, related discussions
  • Doreen has been finishing up 008 vocabularies project and working on reproductions
  • Ying-Hsiang has been working on aggregates code performance, 3xx code, waiting on big dataset from Crystal

Things that cause the transform to fail

  • Multiple $2's (generally multiple non-repeatable fields that are repeated)
    • decision: take the first occurrence and don't assume that non-repeatable fields won't be repeated in error
  • when external lookup documents are renamed, moved, no longer exist, xslt doesn't have a graceful way to fail. a java extension could be useful here to prevent the code from failing in the middle of a big transformation
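
Both failure cases can be illustrated with a short Python sketch (the transform itself is XSLT, and the notes suggest Java for the second case). It assumes a field's subfields are available as a list of `(code, value)` pairs; `first_subfield` and `load_lookup` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def first_subfield(subfields, code):
    """Return the value of the first subfield with this code, or None.

    Taking the first occurrence means a non-repeatable subfield that was
    repeated in error does not stop the transformation.
    """
    for sf_code, value in subfields:
        if sf_code == code:
            return value
    return None


def load_lookup(path):
    """Parse a lookup XML file, returning None if it is missing or malformed.

    The caller can then skip the lookup or log a warning rather than
    failing in the middle of a large transformation.
    """
    try:
        return ET.parse(path).getroot()
    except (OSError, ET.ParseError):
        return None
```

Returning `None` keeps the decision about what to do next (skip, warn, use a default) with the caller.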

Reproductions

  • see reproductions guidelines area of Wiki (reviewed during meeting)
  • feedback from coders: this is easy to follow and will be easy to implement for coders (yay!)
  • if any tags are changed that have already been closed, re-open them and add the "code re-check" label once they have been moved back to the "ready for transform" workflow phase.
  • if tags have already been coded or partially coded, add the "code re-check" label when moving them to the "ready for transform" workflow phase
  • Demo of what the spreadsheets will look like with reproductions conditions implemented
  • serials are out of scope

classification numbers

  • are they ready to transform?
  • yes, they haven't been updated since february so if they need to be updated again let's do it

Wikibase

  • NLG emailed Crystal and Cypress their code. we need official permission to use it. crystal will ask them to also email it to Ying-Hsiang
  • we will set up a wikibase cloud test instance

2024-08-21

Present: Cypress, Doreen, Sita, Ying-Hsiang, Penny
Notes: Cypress

Updates

Check-in

  • Cypress has been working with Gordon and Penny on subject heading fields. Also has 245 to work on amongst other things.
  • Penny is comparing entity types with RDA entity types and is determining whether they are supertypes, subtypes, or equivalent. She showed us the Google Sheet she is working from, everything looks great! Hopefully Gordon and Adam can review these comparisons.
  • Ying-Hsiang is working on aggregates code!
  • Doreen had a reproduction meeting with Laura. This is at the beginning stage but is in the works! She finished coding 521 and has assigned herself to 520. Is mostly focused on vocabularies for 006-008.

2024-08-15

Present: Cypress, Deborah, Ying-Hsiang, Sita, Gordon, Crystal, Doreen, Penny
Notes: Crystal

Check-in

  • Cypress almost done with subject headings, been working on relationships, Crystal just gave her sample records with 7XX fields and she should be able to run those through today
  • Ying-Hsiang just submitted latest code on XSLT extension. Not running smoothly in Oxygen XML editor yet. We can postpone this part for now and look at another part of the code, the XML version for now
  • Doreen's work is going well
  • Penny hasn't started working on the transform, has outstanding questions on her mapping work that need to be resolved

Documentation

  • Multiple people are working on the same code.
  • Let's create a Google Drive folder that we can all access
  • We can move current slides into this folder easily
  • Cypress will do this
  • Comments in code are established practice. Keep doing this!

Access point question

  • Multiple languages in access point: what language tag?
  • ISBD has an example in manifestations: you can just insert a plus sign.
  • Nothing in RDA about construction of access points--communities decide

008 language tags: have they been coded?

  • They've been mapped, not coded.
  • Will need to look at 041 $a as well, and remember that $a is repeatable and can also include multiple language codes strung together
  • MARC Code List for Languages
  • Also might be taken from ISO list

Aggregate markers

  • Aggregate Marker Project-DRAFT
    • Split file of MARC records into collection aggregates, parallel aggregates, augmentation aggregates, and single expression manifestations
  • Writing code now for single expressions
  • Once we do start running aggregates, we will need a way to tell the code "hey this is an aggregate"
    • Split into separate files?
    • Code has different modes: look for all the work properties, expression properties, etc. With aggregates, we're going to have multiple of these classes. The transform needs to know when to run the modes for each class. They're going to be separate functions.
  • Category of work/category of manifestation could be applied to aggregates to tip off the transform
  • Could use extensions or another program aside from XSLT so we could handle this situation from a different file
    • Currently, we are regulated by XSLT. We could use Java extension to do whatever we want with aggregates
    • We will explore this further
  • AggregateMarkers-DRAFT
    • Very complex pattern matching that XSLT probably can't handle
    • First step: split out collection works: done
    • Second step: split out diachronic works: done
    • Deborah proposes that we use MARC Report to split input files prior to transform
    • Run, save review, until what you have left is "single expression manifestations" then look at your results
    • Conventional collective titles and 6xx $a genre/form terms need to be examined: what determines that something is definitely an aggregate? It would be useful for someone besides Deborah to take a look at these.
  • It makes sense to do review in order, check details in a particular order by type. You can sort review by type and names they are given.
  • Questions about processing speed and ease of implementation: can we do this in XSLT and add this to the code? Will it add to the code? Can we add it to a Javascript extension? Ying-Hsiang can write an extension and will re-prioritize his workload to make sure it happens by the end of phase I.

URIs in MARC

  • Turning table Adam had into something usable in XML.
  • Involves research: can these convert into IRIs?
  • If we can extract multiple types from the target, that shouldn't be an issue: does one of those types map to an RDA entity type? We should be able to dereference and determine what type is declared
  • We would like to be able to do that with an extension. We haven't been able to do that yet. So right now we need a table
  • Penny can take this on, Cypress can show her how to do this

2024-08-08

Present: Cypress, Doreen, Deborah, Ying-Hsiang, Gordon, Sita, Penny
Notes: Penny

Check-in

Cypress is working on

  • removing inverse properties (going back up from item, nomen, metadata and agent)
  • Code on hold issues such as 538, 043, 257

Ying-Hsiang is working on xml extension to check the semantics of URI of 518

URIs in MARC

Why and how does the extension work?

  • Static uriInMARC.xml table is not enough to track the RDF types
  • Use XSLT extension to access and retrieve RDF type of any uri during runtime
  • Create a new MARC2RDA extensions project that can be invoked in Oxygen XML editor to fetch and check RDF types

Can we use it with open source tools?-Yes

Transforming Main and Added Entry Relationships

Deborah went over main and added entry relationships document

  • Even if there is no $5, $e like the former owner still needs to be mapped to item relationship
  • Discussed $e and $4 in 7XX again (previously discussed at the July 10 Group Meeting)
    • Sita: ignore them, name and title are used as an AP as a whole
    • Gordon: ignore them
  • Family names should be treated differently
    • Gordon: no, just appellation strings
  • Deborah: $4, $e, $i are all unreliable
  • Agents in 8XX should be discussed in main meeting

Gordon: boiling all of this down, in general an added entry is related to the primary WEMI stack through a high-level relationship that depends on the field. The entity is determined by indicator 1, not by the heading.

  • If 6XX: subject entity related to
  • If 7XX: related entity
  • If 4XX and 8XX: issue of
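
The field-to-relationship rule above amounts to a lookup on the first digit of the tag. A minimal sketch, using the relationship wording from the notes as plain strings rather than actual RDA property IRIs; `high_level_relationship` is a hypothetical name.

```python
def high_level_relationship(tag: str):
    """Return the high-level relationship label for an added entry field tag.

    Labels are copied from the meeting notes, not from the RDA Registry.
    Returns None for tags outside 4XX, 6XX, 7XX, and 8XX.
    """
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"not a MARC tag: {tag!r}")
    by_first_digit = {
        "6": "subject entity related to",
        "7": "related entity",
        "4": "issue of",
        "8": "issue of",
    }
    return by_first_digit.get(tag[0])
```

Choosing the entity type from indicator 1 would be a separate step and is not covered here.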

Relationship between name portion and title portion in added entry field: related entity (can only be the high-level relationship)

Gordon: A whole record with basic core fields should be transformed as soon as possible so that we can start feedback loops.

Sita: we should focus on the structure first, not details

Cypress showed the coding progress

2024-03-27 03:00pm PDT

Crystal, Cypress, Adam

Minting URIs

Relator Table: How's it going?

Does Cypress need another student? How about a Penny?

Transformation Meetings Moving Forward

  • As-needed? Regular meetings?
  • Who attends? We always pester Adam on Slack. Gordon has offered to help with coding. Penny? Crystal's practical usefulness is limited so just Crystal and Cypress seems cruel for Cypress.

2024-02-29

  • Cypress updates & concerns
    • Any chance to look at $6?
    • Relator table implementation test is here
      • How did Theo generate the xml from the table? were the column names changed manually?
      • How are we handling $0s and $1s that don't have a match in the table?
    • Implemented a function for $2 that can be replaced once decisions are made - meaning $2 won't hold back field by field transform
    • Re: minting uris and avoiding duplication of concept entities:
      • embedding all or part of value is the best way to go
      • I'm not even sure hash tables can be done in xslt, which is not a functional programming language

2024-02-15

  • Cypress concerns
  • $6 in 561
  • complete coding on metadata work (regarding 561 also?) not pushed?
  • X00 solution
  • what about what DF sent about aggregates (ideas for preprocessing)?
  • There was a suggestion to help us avoid duplication of concept entities:
    • mint uri using an algorithm that embeds part or all of value in uri
      • val=383.6984, iri could be something like 10.6069/uwrda.class.383.6984
      • ark identifiers suggested; probably could use DOIs but that's a lot of registration!
        • Only if we can get DOI-registration API working in mass production
        • Still, probably need another IRI solution
      • that's for classification numbers; same could apply for subject headings (and other headings)
        • turn strings into hash codes?
        • how feasible is it for agents? They're not as uniform as class numbers or thesaurus headings
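
Both ideas above (embedding the value, or a hash of it, in the IRI) give the same IRI for the same input, which is what avoids duplicate concept entities. A minimal Python sketch for illustration only: the `http://example.org/` base, the path layout, the normalization rules, and the function names are all assumptions, not project decisions.

```python
import hashlib
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder base IRI, not the project's


def mint_iri_from_value(kind: str, value: str, base: str = BASE) -> str:
    """Embed the (percent-encoded) value directly in the IRI.

    Suits short, uniform values such as classification numbers.
    """
    normalized = " ".join(value.split())  # collapse runs of whitespace
    return f"{base}{kind}/{quote(normalized, safe='')}"


def mint_iri_from_hash(kind: str, value: str, base: str = BASE) -> str:
    """Embed a hash of the normalized string in the IRI.

    Suits longer strings such as subject headings. The same normalized
    string always produces the same IRI.
    """
    normalized = " ".join(value.split()).casefold()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{base}{kind}/{digest}"
```

How aggressively to normalize before hashing is the real design question: too little and near-identical headings get different IRIs, too much and distinct agents could collide.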

2024-02-15

  1. CP updates:
    • The $6 issue answered my transform questions I think.
    • Transform code for $6 in item-related fields
    • Figured out reciprocal properties for metadata work
  2. Scope, or, who does what now:
    • Cypress:
      • pull fields from project board -- BSR -- working on 380
      • continue wiki transform how-to
    • Theo
      • Look at $6 solution
      • see reciprocal props for md work (fields 583, 526)
      • $0 and $1: how are we flipping loc.gov in media types etc? Finish coding. Output examples.
      • Prep meeting w Deb &co about relators--prioritize--bring examples--1XX/6XX/7XX
      • Aggregates--email!
      • grab stuff from board

2024-02-08

  1. Anything Cypress wants to discuss?
    • Looked at metadata work; reciprocal properties? Actually it's easier to have only in md W.
      • This is because item is created, code goes to template that generates item
    • TG: let's make sure when we randomly generate id, we don't produce different IRIs every time
      • Probably a phase 2 problem?
  2. Let's start narrowing the scope
    • project board rft and rip, maybe ar

      • BSR only?
    • example of roles-->RDA properties

      • can it be incorporated into main transform? (Theo)
        • Theo can do just 100, then have that meeting and get started on 1XX/6XX etc, see below
      • Should have a meeting with Deborah/Cypress after we accomplish square one
    • Some kind of start to preprocessing

      • aggregates
        • what we're trying to do: weed out collection aggregates
        • what resources do we have?
          • ask DF! We just need the basic set of markers for collection aggregates; I need a succinct list
            • send email to DF
            • If we need more, talk to Crystal, she'll work with Laura on it
        • where do we start?
        • Where is Cypress in the aggregates discourse?
        • Dialogue included:
          • cec: I think we can deal with 700 12.
          • df: if analytic entry present, it requires more agg thinking, so put aside.
    • start a model for 1XX/6XX/7XX/8XX transformations

    • Some kind of code output around Feb 22?

  3. Did we resolve this: when is the 880 "in play"?
    • Diana says OCLC doesn't display 880; ask Adam; also ask Cynthia Whittaker at OCLC
    • NOT RESOLVED! Agenda item next week
    • Let's write down our specific questions (Cypress)
  4. Make sure Cypress hears this: We need alt serializations, especially ntriples, as that's all that will display in RIMMF.
  5. We've never handled the problem of correct IRIs for the output RDA/RDF.
  6. Still not added to documentation:
    • This is too restrictive in the transformation decisions, it should be changed to allow more frequent committing: III.C.4.b. Do not commit until the coding is complete. 2022-07-28
    • Change this in the transformation decisions: "III.C.5.a. Remove the transformation-related tags and close the issue for the field. This can be done in using a commit message (see UNDECIDED items below). 2022-09-23." Specifically, do not remove all the transformation-related tags; it is wise to leave the tag change this: retain "coded rft".
    • Add to transformation decisions: when selecting a field to code, assign yourself the issue in GitHub.

2024-02-01

  • Cypress issues:
    • metadata works:
      • triggered by "private" indicator, so we reference the md work from an item
        • we seem to use both metadataDescriptionOfItem and ItemDescribedWithMetadataBy
      • no md W or E
      • Currently each item in a record has its own IRI; we never assume any item is equivalent to another item described in a MARC record
  1. We need to establish a scope for February. We would do well to establish a "map" for the remainder of phase one.

  2. regular fields

    • Is it clear how to code those?
  3. relator terms/codes and RDA elements

    • Can we use the current table to code relationships in MARC records, especially $e and $4, so they map to the appropriate RDA property?
    • Yes/no answer needed
    • If yes, how shall we get started?
    • What is the official location for this table? Is it the latest version?
    • Do we need a separate meeting with Deborah? Can we get started without that?
  4. Aggregates

    • How are we going to process aggregates?
    • How should we get started?
    • Where is Cypress on the aggregates discussion?
    • What tools do we have?
  5. 880 field: when is the 880 "in play"? Always? We need to know all fields where there may be an accompanying 880.

  6. Add to documentation:

    • create a Wednesday agenda item: coordinate with mappers: if "ar" or "rip" are coded, ask them to make a note in the issue when they move to "rft." -- DONE (tg)
    • This is too restrictive in the transformation decisions, it should be changed to allow more frequent committing: III.C.4.b. Do not commit until the coding is complete. 2022-07-28
    • Change this in the transformation decisions: "III.C.5.a. Remove the transformation-related tags and close the issue for the field. This can be done in using a commit message (see UNDECIDED items below). 2022-09-23." Specifically, do not remove all the transformation-related tags; it is wise to leave the tag change this: retain "coded rft".
    • Add to transformation decisions: when selecting a field to code, assign yourself the issue in GitHub.
  7. We are expected to have some sort of logic or model for 1XX/6XX/7XX/8XX transformations. Any thoughts on coding that?

  8. Coding of MARC 533 has been highly anticipated by the project. No need to discuss today, but let's get that on the radar. Laura wants us to know the info in the spreadsheet is quite incomplete, and will require changes to other fields/spreadsheets (like 008) to be complete.

  9. We have been asked to devise a solution for including the full MARC record in the output RDA/RDF. It should probably travel with the manifestation. There is an element like rdam:P30254 "is manifestation described by" that can be used. Or the unconstrained one: rdau:P60215 "is described by".

  10. We've never handled the problem of correct IRIs for the output RDA/RDF.

  11. We need alt serializations, especially ntriples, as that's all that will display in RIMMF.

  12. stray messy notes not-ready-for-prime-time:

    • if analytic entry present, it requires more agg thinking, so put aside. cec: I think we can deal with 700 12. It is not a part. CEC: but we know what to do with those. What is URI? How create E or W without info? What about authorities? Many are controled. Aggg W: [uh oh] part work: lord of the rings is hasPart Coll of short stories is not. Aggd E? Agg w? Thing is embodies in M. IRI of triple: what's subject? What are E attributes? MARC records describes it all; which line up with this 700? If there's auth record, then maybe attributes Lots of things no auth, so mint uri DF See: it's complicates: augmented and parallels : it doesn't apply, although there's language sin parallels. It's going to take more thought: phase 2. LA if too many records fall out, the transform will be useless. prefers something less perfect label/identify as hybrids better to be inaccurate cec transform critical mass don't output something that doesn't make sense no aggs yet; run non agg on parallel and aug, not collection

2023-06-02

present:

  1. Theo has two things:
    • make better use of Github issues going forward
      • need label for this?
      • best way to record needs that arose after data review(s)
      • Should go into decision index
    • Review our section of the decisions index and update as needed.
  2. Anything Zhuo wants to address?
  3. Lexical aliases
    • Can output RDF/XML with labels and with lexical aliases
  4. Anything new with identifiers-withLabels.rdf? Anything further to say about identifiers?
    • oXygen has an embedded rdfxml schema (relaxNG compact syntax)
    • BF identifiers create a bnode.
    • nomens for name (authority) control; record qualifiers, name info; but the intention is to use nomen-things aligned with things with names; identifiers are for identification.
    • what do we do now: access points in GLAM.
  5. Zhuo last day one week from today
    • Anything he wants to do this week?
      • will wrap up things not finished but started, esp mappings; some Sinopia, esp guidance; transform: everything rft is possible, but nothing outside that.
    • Anything over the break, June 10-July 31?
      • plan to do casual professional development
      • would participate in Wed meetings
      • could help with some transform
    • sinopia: test creating data; if any questions, happy to participate or even create data; would also like to attend meetings
    • May not return? Maybe return.

2023-05-19

present: TG ZP

  1. Zhuo's sample ISBN data
    • as literal
    • as literal with prefix
    • as nomen
    • [how about as typed literal?]
    • [where's the code? How did you set up the nomen?]
    • [Theo is thinking: this is good enough for Wednesday]
  2. Anything Zhuo wants to talk about?
    • Not much code discussion
    • Time ends June 9
    • Zhuo will code the rft MARC fields
    • Maybe some awaiting reviews will be coded
    • Maybe some mapping
  3. Obviously Theo has some highly detailed stuff for MARC 245
  4. What's all that activity in the repo? What's the board look like?
  5. Action items
    • Theo will set up kickoff meeting for admin metadata
      • for RSC
      • for some kind of publication
      • for "design patterns"

2023-05-05

Present: TG ZP
First: Thank you, Zhuo, for the last minute work on the transform. Adding the MARC data was very helpful. Adding dataset 2 for review was super helpful: showed some stuff dataset 1 did not. So far, here are some things to attend-to after RDA data review; note data review is still ongoing:

  1. Remove MARC100-->fake:rdawP10065 (Theo)
  2. Alter MARC 020-->rdamd:P30004 for ISBNs (Zhuo)
    • POSTPONE THIS TRANSFORM EDIT UNTIL A DECISION IS MADE
    • sounds like Nomens are favored; see 18 below
    • no hyphens needed in ISBN
    • option is just the alphanumeric ISBN string
  3. Alter ((MARC 245-->rdamd:P30134) + (MARC 245-->rdamd:P30156)) so that both do not output (Theo)
  4. MARC subfields should never appear in RDA values; includes:
    • MARC 264-->rdamd:P30111 (Theo)
    • If value is non-isbd, semi colon is best between subfield values.
  5. Alter MARC 337-->rdam:P3002 so that both $a and $b (the code) do not output. Consider not outputting the meaningless-in-RDA code at all. (Theo)
    However, consider this full process; although we said we would not reconcile yet with vocabularies:
    • If $a and $b exist, suppress $b, output $a as IRI (from RDA Vocab).
    • If $a or $b only:
      • Match code or string with string in original vocabulary, somehow extract-and-insert IRI from RDA vocab.
      • Create mapping between RDA and ID.LOC.GOV vocabulary.
      • Send mapping to RSC TWG and ask them to publish.
  6. DO NOT Alter from MARC 504-->rdamd:P30455 to MARC 504-->rdamd:30137 ; the 30455 property was deemed fine. (Zhuo).
  7. MARC 245 ok output for man but not for wor: work does not output the full complexity a/n/p/s etc. Investigate and repair. (Theo).
  8. Additional mapping MARC 502-->rdawd:P10209.
    • already have MARC 502-->rdawd:P10077 and -->rdawd:P10006.
  9. Do not output ISBD square brackets (is this for specified fields only or always?):
    • MARC F264 (Theo)
  10. Repair MARC 245-->rdamd:P30105 sor relating to title proper; there's other inaccuracy there too, see http://fakeIRI2.edu/1302865607man and #390.
  11. (Theo and all going forward): Unknown placeOfPublication, dateOfPublication, NameOfPublisherAndDistributionManufactureAndProduction: although PCC-PS favor square brackets, eliminate square brackets, and use as value of noteOnManifestation. Values look like this: rdamd:P30088[Place of publication not identified].
    • Option 2, presented in GD's comments to dataset 2, comment 9: only output the "statement" with sq brackets intact; do not map to P30088, P30176, etc.
    • TG: just do what's easiest.
    • However when sq brackets surround a value believed to be correct, output to appropriate field and strip sq brackets.
  12. Repair MARC 245 _4 $c c2014 --> rdam:P30280 to -->rdam:P30007 and strip all symbols
    • The copyright symbol, the phonogram symbol, the string "(c)", the string "(p)", the string "copyright", the string "phonogram copyright", the letter "c", or the letter "p" should be stripped from the value
  13. DO NOT change MARC 382$v-->rdaed:P20215 so that it does not use square brackets but, rather, parentheses. Gordon made the suggestion as he felt sq brackets carried too much meaning; Zhuo got this as an MLA recommendation. It's what's expected by the community that consumes this data. (Zhuo)
  14. MARC 264 has distinct square bracket requirements for RDA output:
    • retain square brackets in "statement" elements like rdamd:P30108 (Theo; should be ok as-is)
    • eliminate sq brackets in Place and Timespan elements like rdamd:P30085 (Theo; needs attention).
  15. Add MARC 490-->rdamd:P30106 hasSeriesStatement to output standard "statement" (Theo)
    • remove the contents of subfields l (LC call number), y (invalid ISSN), and z (cancelled ISSN)
    • [treat $3, $7 as per general decision ...].
    • Retain the punctuation; remove the subfield encoding.
  16. Repair MARC 245 $a with / and . in title (not before sor): those are getting stripped; see http://fakeIRI2.edu/904019193wor. Make sure it's ok for WEMI for output titles.
  17. Where else will we find data review information:
    • Meeting Notes (not checked)
    • anywhere else?
  18. What about setting up Nomens? Will Zhuo be working on that?
  19. MARC 336 337 338
    • Use RDA vocabulary values; IRIs is practical
    • do not output code, string and IRI; just one is enough.
    • Create mapping RDA-->LC vocab in id.loc.gov for selected vocabularies
      • NLG has some of these done already; SZ will send to GD and GD will format
    • send to RSC TWG for RDA publication.
    • Is that all that needs to be done for this?
  20. MARC data in input: send entire marc record as one long text string to man. We can also output MARC to RDA-RDFXML, but this is probably best just for data review, eliminated for final output.
    • Alternative: plain output, plain output with labels, plain output with labels and MARC individual fields (as it is now).
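
Two of the value cleanup rules in the list above, stripping copyright markers from dates (item 12) and removing square brackets from Place and Timespan values (item 14), can be sketched as follows. This is Python for illustration only, since the transform is XSLT. The function names are hypothetical, and treating a bare "c" or "p" as a marker only when a digit follows is an added assumption to avoid clipping ordinary words.

```python
import re

# Longer markers come first so "copyright" is not half-matched as the letter "c".
_COPYRIGHT_PREFIX = re.compile(
    r"^\s*(?:phonogram copyright|copyright|\(c\)|\(p\)|©|℗|[cp](?=\s*\d))\s*",
    re.IGNORECASE,
)


def strip_copyright_marker(value: str) -> str:
    """Remove a leading copyright or phonogram marker from a date value."""
    return _COPYRIGHT_PREFIX.sub("", value, count=1).strip()


def strip_square_brackets(value: str) -> str:
    """Remove ISBD square brackets, keeping the text they enclose.

    Meant for Place and Timespan values only; "statement" elements keep
    their brackets per item 14.
    """
    return value.replace("[", "").replace("]", "").strip()
```

Under these rules `strip_copyright_marker("c2014")` and `strip_square_brackets("[2014]")` both reduce to the bare year.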

2023-04-14

Present: TG ZP
Theo finally got started "blitzing"
- Finished 264 (RDA "statements" were not processed in field order; $3 only accounted-for in "statements"; $6 accounted-for; repeating subfields and parallel statements should not be resolved).
- Almost finished 245; still need to account for "=" and double-check to make sure all possible punctuation is accounted-for
- started on outputting labels near opaque identifiers for properties; put it aside and never returned to it.
How about Zhuo? - identifiers revisited. Specifically ISBNs. Put qualifiers before number previously. Wants to attempt to mint nomens

2023-04-04

Present: TG ZP

Agenda:

  • No specific agenda. Just an open discussion. Discussion included:
    • Let's move back data review in group one week. Theo will get it on the agenda.
    • Comments will remain informal. Enter as needed.
    • On the other hand, template names should be structured using common formats. For example: F264-x1-a_b_c means field 264 with any indicator 1 and indicator 2 = 1 will have subfields a b and c processed individually in the template.
  • ZP doing 502 field. LC vs OCLC documentation regarding ending period.
  • 880 with 502: what happens with identifier? Are there any identifiers in 880? Or does that go in primary field only? What if there's a non-Latin identifier? What do we do with that?
  • ZP planning on doing another 5XX: 585.
  • Theo: review 264, 260, 245, 490, 336, 340. Get them corrected if needed. Do not seek perfection.
  • Theo wants a function for checking ISBD punctuation.
  • ZP wants fcn to look up $5
  • Question for next time: are we going to perform lookups (mostly for "schemes") by matching a locally stored file or go over http.
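
The template naming format discussed above (F264-x1-a_b_c) could be checked mechanically. The parser below is a minimal Python sketch, not project code. It assumes that "x" means "any indicator value", that indicators are otherwise single digits, and that underscores separate subfields processed individually. `TemplateName` and `parse_template_name` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TemplateName:
    tag: str              # MARC field tag, e.g. "264"
    ind1: Optional[str]   # None means "any indicator 1" (written as x)
    ind2: Optional[str]   # None means "any indicator 2" (written as x)
    subfields: List[str]  # subfield codes processed individually

# Assumed shape: F + 3-digit tag, two indicator characters, underscore-separated codes.
NAME_PATTERN = re.compile(r"^F(\d{3})-([0-9x])([0-9x])-([a-z0-9]+(?:_[a-z0-9]+)*)$")

def parse_template_name(name):
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a structured template name: {name!r}")
    tag, ind1, ind2, codes = match.groups()
    return TemplateName(
        tag=tag,
        ind1=None if ind1 == "x" else ind1,
        ind2=None if ind2 == "x" else ind2,
        subfields=codes.split("_"),
    )
```

For the example in the notes, `parse_template_name("F264-x1-a_b_c")` yields tag 264, any indicator 1, indicator 2 = 1, and subfields a, b, c.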

2023-03-23

Present: Zhuo, Theo

Urgent: change meeting time for our meeting; ZP has 2:30-5:30 class and meets w BR 1-2 Friday.

No agenda.
Meeting Notes:

  • Zhuo working on:
    • some changes to where things are:
      • folders for test (test input and test output)
      • new "lookup" folder (for $5, $2)
    • RDA vocabularies
      • We should map to IRIs, not literals
      • IRIs are usually values of canonical properties, not object properties
      • A lot of 33X fields don't even have object properties
      • We need some way to indicate that people doing transform edited the spreadsheets (we'll clearly be making corrections)
      • Maybe something in Decision Index about how we transform RDA Vocabularies
    • 380 field; new function for handling concept in $0/$1
    • Noted: 340 field and its current function for $0/$1 uses object properties.
    • We have to account for $0 and $1 IRIs that represent RDA entities, as they will be treated differently.
      • Mostly this will be agents in MARC data; however, as we anticipate more and more RDA entity IRIs in MARC, we should broaden our effort here.
    • Custodial history/private metadata broke due to 880. Still working on it.

2023-03-02

Present: TG, ZP

  • Proposal: we should plan now to prioritize the transform.

    • Let's set dates for work and make sure we comply.
    • How much time?
  • Theo timeline.

    • When are the best days/weeks to focus on this?
      • March 20 - April 14: write code and prep dataset for review
      • April 19-May 3: Lay low...
      • May 3-June 9: edit code to correct errors, oversights, etc.
  • Note: Theo just asked today (Thurs., Mar. 2) if we can think about post-completion OPT (may change Zhuo's timeline)

    • ZP can work for multiple employers during OPT
  • Zhuo timeline:

    • UW Spring quarter ends June 9
    • When are there academic requirements, big projects, etc.?
    • When is last day of work? Around June 9
    • When are the best days/weeks to focus on this.
      • Start around March 13

To do:

  • Produce some code for group to review before Zhuo leaves
    • Good day to present data at a meeting: April 14
    • Good meeting day to complete review at meeting: April 19 and 26 and May 3
      • gives them two weeks to review
  • Looking at everything above, let's set dates and goals:
    • coding blitz: not sure; maybe Zhuo start on the 13th; Maybe between quarters do extra time; Theo start on the 20th and will do at least 3 weeks.
    • date to have code ready for review: April 14
    • Related to-do:
      • Inform Crystal; make it an agenda item for main meeting
    • Put on agenda -- Theo should do it.
  • Code fields as they are ready to code
    • we can start coding anything "Awaiting Review" and later
    • lots of stuff can be coded now!
    • MARC 585 is "Ready for Transform"
  • Refine the function for $0 and $1 -- Theo should do this --
  • Finish OMR vocabs
    • Is coding done, OMR-->RDF-XML? YES, IT IS.
    • Hire student to review current RDF-XML assertions inherited from OMR.
    • Resolve IRI problem -- no. it's resolved!
      • Use DOIs for this project
        • ZP will figure data-cite metadata, do a sample, etc.
      • consider separately whether we use W3C identifiers or something else
    • Establish UWL Guidelines for UWLSWD vocabularies, esp concept schemes
    • Make sure RDF-XML accords with UWL guidelines
    • Resolve on how to publish
      • Get the names of the vocabularies correct
      • versioning: use releases -- how?
      • does it accord with W3C BPs for publishing data on the web? --Theo was working on that
    • Publish
    • Insert associated values in spreadsheets
    • Code the 008; also the 006, 007, if ready
  • Theo: finalize 245 (see 2023-01-05 meeting notes)
  • Code to output human-readable IRIs instead of opaque -- Theo will work on this, hopefully before the 20th

Possibly done

  • MARC 500 template processing with 880 (resolves $6, right?)
    • this work is ongoing; every field may have a different solution
    • ZP ran into 561 issues; supposed to mint IRI for item; private data issues, etc.
    • $6 is on board as "ready for transform"
    • MARC 880 is on board as "ready for transform"
  • Have we decided on $3 and $5
    • $5 is on board as "ready for transform"
    • these are in RFT because ZP put them there; they can stay
  • Theo check 336, 490, and the field SZ worked on

Continue discussion regarding:

  • Metadata WEMIs. Particularly a topic for MARC 561
    • ZP will code the 561; he can transform a corpus with that field; present to group; we'll need to create fake private 561s;
      • much of it is coded except for 880 and privacy complications
      • this has included metadata works for private assertions
  • Devise methods for weeding-out aggregates
    • there's a discussion 354 where it's embedded in a larger discussion
    • we should make a new issue specifically to help us code
  • Change meeting time for next quarter; not Thursday afternoon
  • Theo will produce a to-do list before the 13th

2023-02-04

Present: Theo, Benjamin, Zhuo

  • Zhuo created a MARC 500 template that includes corresponding 880 processing
  • He created a template matching the union (|) of 500 with 880 with $6 that starts-with the string 500.
  • This will process all records with a 500, as well as all records with a 500/880$6-that-starts-with-500 combination.
  • The group noted how this processing with | differs from using AND (in the latter, both conditions would have to exist in any given record for the record to be processed).
  • This will make the 880s easy to process!
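
The difference between the union (|) and AND can be illustrated outside XSLT. The Python sketch below is an analogy, not the template itself. Records are modelled as lists of field dictionaries, and that structure and all function names are hypothetical.

```python
def is_500(field):
    return field["tag"] == "500"

def is_880_linked_to_500(field):
    """True for an 880 whose $6 starts with the string 500."""
    return field["tag"] == "880" and any(
        code == "6" and value.startswith("500")
        for code, value in field["subfields"]
    )

def union_match(record):
    """Union: every field that satisfies either condition is selected."""
    return [f for f in record if is_500(f) or is_880_linked_to_500(f)]

def and_match(record):
    """AND: the record qualifies only if both kinds of field are present."""
    return any(is_500(f) for f in record) and any(
        is_880_linked_to_500(f) for f in record
    )
```

A record with a 500 but no 880 still yields one field under `union_match`, while `and_match` rejects it. That is the behaviour the group noted.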

--> Next meeting: let's take a brief but detailed look at the function for processing $0 and $1.

2023-01-19

Present: Theo, Zhuo

  1. Anything Zhuo want to discuss? No
  2. Theo hasn't progressed beyond last meeting
  3. There was some discussion about the OMR-->UW vocabularies project

2023-01-05

Present: Theo, Zhuo

  1. Anything from Zhuo?
    • nothing in particular
  2. Review current state of m2r.xsl -- reviewed and made some minor changes to apply-templates with mode=ite
  3. Theo will "finish" work on 245 next
    • currently seem to be errors in the xsl:when conditions
    • some punctuation still not accounted-for
    • Theo will comb through and search for other errors; will create a 245 "dummy.xml"
    • currently transform claims these are the fields not yet accounted-for:
      • $3 : no $3 in 245
      • $6 : should we process at 880 or at 245 (i.e. XXX)?
        • if at the XXX field with $6, we can situate in the applicable template
        • if at the 880, we'll likely reference every template that applies, not all of which will be named
        • Theo thinks we need to code at XXX, not 880
          • Zhuo agrees; we'll go forward with this approach to 880
      • $7
        • issue 358 is empty ; a little content in issue 380, specifically that OCLC has not yet accounted-for $7 so we can punt; however, now (2022-01-05) OCLC has in fact listed $7
        • TG's proposal: let's continue to punt; when the group makes a decision on $7, we'll do a sweep through all fields with a $7 (ugh!)
        • also note: in most spreadsheets, $7 is not even there; somebody will have to enter in spreadsheets
        • data provenance will require reification in RDA and will be a difficult solution for us!
        • Zhuo agrees we should postpone coding the $7
      • $8 (We will not map $8 until a use case is provided. 2022-07-14)
  4. Anything else?
    • minimum description of a metadata work: need generated ID and link to exp; exp with exp ID and link to man (the rdf file in the original description set) for which we should mint an IRI. How? This is needed in 561.
    • The metadata work in Zhuo's example is reification of a statement describing a particular item, enabling him to say something about that statement. The problem, as Theo understood, is that the metadata expression needs to be linked to a metadata manifestation that actually exists. This was all addressed in the github repo in issue 225 for field 561. Theo will start reviewing and see if he can imagine some XSLT ways to resolve the problem.
    • Zhuo will not be working on 561 this week so the metadata work/exp/man problem will not be resolved this week
    • the sinopia templates project also is struggling with an implementation of RDA reification; it may be good to see what's going on there
    • Theo says this is a new problem we are tackling and that we should write an article of some type describing the problem and our solution.

2022-11-22

Present: Theo? Zhuo? Benjamin?

  1. Anything from Zhuo?
  2. Small project: record directions for $3 handling. Proposal:
    • write transformation code for a few $3's
    • record what we did in Discussion 353
    • pull it all together and create a $3 decision in the decisions index
    • timeline: get this done before the end of January
    • what's good about this: we can encounter a few $3's and record how we processed them based on what's in each spreadsheet and Discussion 353; we can record our field-specific processing of $3 in Discussion 353 so that anything unwise can be discussed by the overall group
    • what's not so good: the delay in deciding will result in varied approaches in the spreadsheets.

2022-11-14

Present: Theo, Zhuo

  1. Anything to add to agenda?
  2. What has Zhuo been working on? Anything of interest while doing that work?
    • 500, focus on $5; temp solution for $5 | $3
    • produced code to process every $5 in every MARC records the same way for items
    • $5 and $3 together will be resolved at next m2R meeting
  3. What has Theo been working on?
    • 336
      • expression information; but when it has a $3, we add note on expression that applies to manifestation!
        • This should be described as a problem in the ISSUE (not just in the spreadsheet)
      • $2 temporary solution involved
    • 245
      • terminal punctuation elimination using replace()
      • process a sibling field (in this case the Leader/18) using substring()= and the appropriate axis, in this case preceding-sibling::
      • straightforward field to code
    • 490
      • used grouping/group-starting-with to handle repeating $a $x $v
      • $3 easy to code
      • output MARC field value including marc subfields as a string
    • starting 340
  4. General observations (Theo)
    • Theo still skipping over $6
    • Summary of what's been coded on m2r-xxx.xsl file using comments
    • Entering notes in spreadsheet for rows coded
      • THEO SHOULD STOP DOING THIS; instead, every commit should reference the issue#
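
The three techniques Theo listed for 245 and 490 can be mirrored in Python for illustration. The actual code is XSLT (replace(), substring() and group-starting-with). The punctuation set in the regular expression is an assumption, and all function names are hypothetical.

```python
import re

def strip_terminal_punctuation(value):
    """Remove one trailing ISBD-style mark. The set / : ; = , . is an assumption,
    and a trailing period that belongs to an abbreviation would be lost as well."""
    return re.sub(r"\s*[/:;=,.]\s*$", "", value)

def descriptive_cataloging_form(leader):
    """Leader/18. MARC positions count from 0, as Python indexes do;
    XPath substring() counts from 1, so the same position there is 19."""
    return leader[18]

def group_starting_with(subfields, start_code="a"):
    """Start a new group at each $a, like xsl:for-each-group group-starting-with.
    Subfields that precede the first $a form a group of their own."""
    groups = []
    for code, value in subfields:
        if code == start_code or not groups:
            groups.append([])
        groups[-1].append((code, value))
    return groups
```

For a 490 with repeating $a $x $v, each group returned by `group_starting_with` would correspond to one series statement.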

2022-11-04

Present: Theo, Zhuo

  1. Anything to add to the agenda?
    • $3 AND $5 ISSUES. Mint IRI for each $5. When $3 and $5 both appear: is $3, data in $a is mapped to man (note on man) with $3 appended to the end (i.e. applies to); then the item has no description.
    • ACTION ITEM: ADD TO AGENDA IN WEDNESDAY MEETING
  2. Approaching November 28 (SWIB)
    • Is what we need to do clear?
      • Main task: code fields on the board (Theo and Zhuo)
      • Run code and review data; use Crystal's MARC data set; JUST DO THIS AS WE CODE FIELD BY FIELD; WE CAN RUN TESTS LATER
      • Write some code to output labels rather than opaque identifiers in RDA output (Theo)
        • what do we want it to look like?
        • proposed: just do it separately and add both transforms to an XProc 1.0 pipeline
    • If test data set has aggregates or diachronic works, we'll have to filter them out (or eliminate them from the set)
      • If there are no aggregates/diachronics, maybe add some to demonstrate how we'll weed them out
        • we do not have the criteria for weeding out these resources
    • Let's not worry about those BSR placeholders
      • let's not do them all; only the "obvious" ones; do it at-the-last-minute
    • meetings
      • Option 1: Just meet on Thursdays; if more discussion is required, either use Teams or email.
        • Is Zhuo OK to use Teams/marc2rda?
      • Option 2: schedule more meetings; do some work at meetings

2022-10-27

Present: Theo

  1. Theo asked Theo if there was anything he wanted to discuss. He replied, "it's all in the agenda."
  2. Notes were not added for previous meeting. What we did: we looked over the $5 work.
  3. Theo pointed out to Theo a possible division of labor; who will do what?
    • Placeholders for BSR elements (Zhuo?)
    • Fields ready to code on board (Theo?)
    • run code and produce sample data (Zhuo?)
      • Crystal loaded MARC records today in Github
  4. Re-useable code to output labels in identifiers rather than opaque identifiers (Theo or Zhuo)
  5. Anything else? Theo said no, nothing else. Meeting terminated at 3:10 PM.

2022-09-08

Present: Theo, Zhuo

  1. Theo working on 264.
  2. Parallel 264$a, $b, $c statements are not limited to two. Current code only accounts for entry to the left of the '=' and the entry to the right; however, there may be more than one equal sign. We should tokenize() using the '='.
  3. RDA properties for the parallel statements are soft deprecated (see https://www.rdaregistry.info/Aligns/alignSoft2Rec.html which displays 115 soft-deprecated properties). Current code uses soft-deprecated properties based on a mapping (i.e. the 264 spreadsheet) completed before we were aware (in MARC-to-RDA meetings) that these properties were soft-deprecated. Code will be rewritten using the "RecommendedLabel" rather than the "RedundantLabel." An item will be added to the next Wednesday meeting agenda to open a discussion.
  4. Theo will continue the 264; perhaps start the 245 now ready for transform; Zhuo will continue the "preprocessing" for $5.
  5. Some XSLT 3.0 instruments were introduced into the code; specifically text value templates were combined with use of the XPath 3.1 operator => ; when using, don't forget to use @expand-text! Both Theo and Zhuo agree that this operator improves readability compared to the usual approach of "layered" XSLT functions.
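
The fix proposed in item 2 (tokenize on '=' rather than assume a single left/right pair) can be sketched in Python. This is an illustration of the idea only, and `split_parallel_statements` is a hypothetical name. It assumes that every '=' in the subfield value marks a parallel statement.

```python
def split_parallel_statements(value):
    """Split a 264 subfield value on every '=' and trim the pieces,
    so any number of parallel statements is handled, not only two."""
    return [part.strip() for part in value.split("=") if part.strip()]
```

The first element would be the primary statement and the remaining elements the parallel ones, however many there are.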

2022-09-08

Present: Theo, Zhuo

Agenda and Notes

  1. Anything Zhuo would like to talk about? (still working on $5 preprocessing)
  2. Naming conventions for m2r-xxx-named.xsl (a) record decision in a README.md file in //Working Documents/Transformation Code (in the git repo). (b) move decisions into the README. (c) Theo will create the README. (d) the convention for named templates: @name="F264-x2-abc".
  3. $5 update (Done in #1 above). (a) No progress made on $5 coding in the m2r transforms.
  4. Timeline considerations. (a) We want to have an MVP by mid-November.
  5. workflow considerations. (a) when done, commit with a message that references the issue; if issue 32, reference it with "#32" (see decision index III.c). Team seems to be on the same page on workflow and selecting fields to work on; how to complete them is not entirely resolved.
  6. Upcoming week: Theo will work on 264 and the README. Zhuo will work on $5 preprocessing; if he finishes that, will select something to code from the project board.


2022-08-31

Present: Theo, Zhuo

Agenda and Notes

  1. Zhuo will continue working on the "preprocessing" i.e. the "external dataset for organizations."
  2. Zhuo and Theo will code fields from the board as needed.
  3. We will not pursue accessing the Code List for Cultural Heritage Organizations over http this year; we will use the bulk download. This means we will miss updates to the LC data, so we should pursue this next year for certain. For now, we want to demonstrate how the mapping can guide a transform by the end of November.
  4. Theo will be away from work until the week of September 5.
  5. Next meeting Thursday, September 8


2022-08-18

Present: Theo, Zhuo

Agenda and Notes

  1. Review Zhuo's 030--Code--code vs. spreadsheet--issue tracking--can we run it?
  • The code looks perfect.
  • code v spreadsheet looks fine
  • issue tracking has correct label
  • no attempt to run; will attempt outside meeting.
  2. $5--collection module-->assign to Zhuo!
  • The output of this module will look something like the following:
  • <www.marc2rda.edu/ColWor/aealjj> rdf:type rdac:Work ;
    - hasMan <www.marc2rda.edu/ColMan/aealjj> ;
    - hasNameOrWhatever “Collection of [lookup label in "Organizations scheme in MADSRDF format serialized as XML."]”
    - hasAgentEtc <www.marc2rda.edu/agent/aealjj>;
    - moreProperties moreValues . #if applicable
  • <www.marc2rda.edu/agent/aealjj> properties values ;
    - hasAppellationEtc http://id.loc.gov/vocabulary/organizations/aealjj .
  • <www.marc2rda.edu/ColMan/aealjj> rdamd:hasAppellofMan “Collection of {aealjj-Label}” ;
    - manOfWork <www.marc2rda.edu/ColWor/aealjj> ;
    - moreProperties moreValues . #if applicable

  • Zhuo will first attempt to download the data.
  • Then Zhuo will try to access the data over http.
  • At the meeting it was established that http GET could retrieve the RDF/XML but only with the header accept: application/rdf+xml.
  • Theo isn't sure how to incorporate headers into document requests using a URL. Zhuo will experiment.
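
For reference, attaching the Accept header to a GET request looks like this in Python's standard library. This is a sketch made outside the XSLT pipeline and says nothing about how the transform itself would do it. The function names are hypothetical; the header value is the one established at the meeting.

```python
import urllib.request

RDF_XML = "application/rdf+xml"

def build_rdf_request(url):
    """Prepare a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(url, headers={"Accept": RDF_XML})

def fetch_rdf(url, timeout=30.0):
    """Send the request and return the response body as bytes."""
    with urllib.request.urlopen(build_rdf_request(url), timeout=timeout) as response:
        return response.read()
```

Calling `fetch_rdf("http://id.loc.gov/vocabulary/organizations/aealjj")` would attempt the download for the organization code used in the example output above.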

2022-08-04

Present: Theo, Zhuo
Regrets: Benjamin

Agenda and Notes

  1. Sita volunteered to help -- do we need help right now? Not now; perhaps in time.
  2. Approval of Workflow decisions Note: there's a new category on the main project workboard: "Almost Done"
  3. Anything on Zhuo's mind? It's good.
  4. How can we record issues specific to the transform? Create a new issue. Name the issue TXXX. Apply two labels: "XXX" and "Transform"
  5. Theo introduced pipeline design
  6. Review of this week's coding
    • 6a. Include discussion of $5. Concerning $5: we probably shouldn't mint IRIs for ALL 500 fields with a $5 for all institutions. We probably shouldn't mint IRIs for other institutions' items. We DEFINITELY should not assign the same IRI to different items.