2025 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki

April 2, 2025

See time zone conversion Meeting norms Present: Absent: Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • OCLC says we can export 50K records from WorldCat which should fix our DLC record shortage :)
    • Can SZ and EK (potentially SB?) send 50-100 records to DF with $0/$1 and/or $2 in heading fields, so that our testing pool can be rounded out?
  • Crystal is posting two new student positions--they should be up and accepting applications any day now
  • A new draft version of the Heading Fields Attribute Mapping has been added to the Google sheet, and an explanation is provided on the Attributes table #471 page.
    • The next concurrent steps are to:
      • find out whether the coders can code using this format
      • review the content of the table looking for errors, e.g.:
      • $f instead of $n in a copied instruction
      • missing fields or subfields
      • comments on decisions made

Student Presentation: Transformation Walk-Through

Wrap-up (5)

Action items

Backburner

March 26, 2025

See time zone conversion Meeting norms Present: Ebe, Crystal, Jian, Sita, Adam, Deborah, Laura Absent: Sofia, Junghae, Sara, Doreen, Tynan Time: Ebe Notes: Jian

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • OCLC said no to record reuse at the scale we requested, which is 500K records. We need to figure out how to get the records from LC, probably by downloading from LC directly.
  • LD4 conference dates were announced for summer: we should make a proposal
  • IFLA presentation went great. Slides will be available on the IFLA site. Not clear if the presentation was recorded or not. Can do a similar presentation for LD4 with updates for transformation progress.
    • Got a lot of interest and questions. There are people using RDA/RDF in European libraries. People approached afterward expressing how impressed they were. Crystal reached out to ask if anyone would like to join the project. Will wait and see.
  • Crystal is posting two new student positions next week. Students: tell your friends. Prioritizing XSLT skills.

Transform Test Datasets for Fridays (15)

  • Deborah created a feedback template/form
  • Request template
    • Have not yet decided on a request template
    • Deborah suggested creating a spreadsheet for all of the MARC fields with the coding status, such as the coded date, reviewed date, etc.
    • Crystal started a Google Sheet named Transform Output Review in project shared drive and linked from Test Datasets Discussion
  • Who will assemble input?
    • Crystal, Adam, and Deborah agree to assemble DLC/WAU records alternately
    • Will complete the Google Sheet for Transform Output Review first and then decide about the input records
  • Output location
    • Students will figure out
  • Dataset sizes
    • 10 records each week (not hard and fast rule--will adjust depending on what we need to demonstrate)
    • Start with fields that are already coded
  • More discussion needed--will revisit this topic next week

Downloading records from LC (15)

  • Which records? What does the dataset need to look like?
  • How do we go about downloading the set?
    • How to download from the website is tricky. For example, how to search if you want records with 100s with different indicators?
    • Maybe start with searching in OCLC for a list and then download them from a different source?
  • Where will we store it (and other datasets) prior to uploading to Dryad?
    • Deborah has (limited) storage for this. More than UW
  • Questions about the estimated output size were unanswered. We really don't know how big our output dataset will be in the end.

Mapping Review Check-In (25)

  • Linking fields: we decided to use amended Toolkit labels from Deborah's chart. Reviewer should also update according to that decision, if this hasn't already been done. Update?
  • Assigning review for "awaiting review": remaining tags assigned to group members
  • Review assignments for "review in progress"
  • Revisit deadline (end of month)

Uniform titles/Attributes table check-in (15)

  • 130/240
  • Deborah updated the attribute table, including attributes for 130/240
  • Has name of person vs. has preferred name of person for 100/600/700
    • Subfield c is part of the preferred name not just subfield a, therefore, using subfield a only would not be accurate as a value of preferred name of person. We need to use name of person
  • For corporate bodies, has name of corporate body is better than has preferred name of corporate body because subfield a could contain a parenthetical qualifier that is not part of a preferred name
    • Same with uniform titles 130/240. Uniform titles may also contain supplied qualifiers
  • Deborah noticed $0 and $1 have not been mapped consistently. The decisions index is not very clear. More instructions are needed for different types of work.
  • The attribute table is still missing a lot of things

Wrap-up (5)

Action items

  • A separate meeting is needed for more discussion on the attributes table topic (Crystal scheduled for 1pm Pacific Daylight time Thursday. If you want an invitation but didn't get one please email Crystal ASAP)
  • Next week students will do a walk-through of the transformation for the team

Backburner

  • What is a ballpark estimate of the size of our output data for the initial transformation?

March 12, 2025

See time zone conversion Meeting norms Present: Crystal, Adam, Ebe, Sita, Tynan, Laura, Sara, Deborah, Junghae, Doreen Absent: Time: Sara Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Crystal still talking to OCLC about record reuse
  • IFLA presentation next week!
  • LD4 conference dates were announced for summer: we should make a proposal
  • Crystal and Sofia have IFLA next week: meeting canceled
  • Aggregates transformation code: Are any snippets ready for prime-time? We'd like to include some in our slides for IFLA. Slides are due today, so if not we will scrap the slide.

Uniform titles (20)

  • 130, 240, series (830): Deborah - Assume single expression unless it's aggregating where date is treated at the work level.
  • 6XX, 7XX, 8XX is handled.
  • 130 & 240 are the problems. Preferred title creates AP but how to match AP in authority files. Sita did the mapping, and Laura is doing the mapping review. During the early stages, AP is not in consideration. Will work on it asynchronously.
  • Attributes table needs more work. AP Mapping Table tells the field and combination.

Mapping Review Check-In (25)

  • Linking fields: RDA Registry labels vs. MARC21 labels: Deborah's chart and review of those fields
    • Decision will be made via poll or async discussion.
    • Option: Using MARC21 Label or Print Constant Label or Column C in Deborah’s Table
    • Option: Using column D RDA Registry Label in Deborah’s Table
    • Option: Using column E in Deborah’s Table
    • Note: PCC labels not an option because they are incomplete.
  • Assigning review for "awaiting review"
    • 7xx will be reviewed once we decide what to do with the labels and replace them.
  • Review assignments for "review in progress"
  • Any fields yet to be mapped in BSR?
  • Revisit deadline (end of month)

Review "asynchronous discussion needed"/"meeting discussion needed" label use and go through tags with those labels (25)

  • Once an async discussion is resolved, the async label should be removed. The only issues left with the "async" label are attributes and 240 uniform title.
  • Go through discussion to see if async discussion is needed. If you put the label on, you should put questions you have in and tag people directly.

Wrap-up (5)

Action items:

  • Crystal will create discussion/poll and we will vote on which label to use in Deborah's table for linking entry fields.

Backburner

March 5, 2025

See time zone conversion Meeting norms Present: Absent: Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Crystal still talking to OCLC about record reuse
  • IFLA presentation still in progress
  • LD4 conference dates were announced for summer: we should make a proposal

Mapping Check-In (20)

  • 045: the questions on the issue page are relatively new (from Jan)
    • Requires a translation table that we don’t have
    • Laura will do her best to make her own judgements
  • 758: Gordon has given good advice on this, Ebe is nearly there, overcoming some hurdles putting everything in the spreadsheet
  • Mapping spreadsheet X00: with these fields we have access points and attributes
    • We need mapping spreadsheets that put the attributes together with the entities that show how we are mapping the X00’s in the transformation
    • This is separate from the access points
    • How is this different from the spreadsheets for the X00’s?
    • We are going to replace the old X00 spreadsheets; although some of them have been actively maintained
    • E.g. Cypress and Penny maintained the 100
    • Laura: suggests that we do individual tag spreadsheets for 600, 700 etc. because they are meaningfully different enough from each other
    • Decision: it is ok to map these individually. Jian is reviewing the individual spreadsheets and making sure they have been mapped once. Then we will close this.
  • 857: Ebe is continuing working on this – moving forward after feedback
  • Updating mapping sheets for augmentation aggregates:
    • Crystal and Deborah meeting about this, it’s a big task
  • Compile list of abbreviations
    • Sara is on this
  • $7: Ebe is suggesting we postpone this to phase II
    • Adam: we can use it, but unaware of anyone who has used it. PCC doesn’t have any policy/guidelines or training about it. Doubts LC has implemented it
      • Suggests putting this off to phase 2
      • Crystal uses it to determine open-access
      • We haven’t dipped our toes into data provenance at all, so we need to leave this for phase II
  • Attributes table
    • Almost done, waiting for decision
    • Sofia is working on this
  • Almost done work:
    • 7 bibliographic level: code ‘m’ mixes static and diachronic works
      • Leader 7 is a mess, it does not translate cleanly into RDA
      • We use it in order to do things with it
      • Only thing we can do is put it in its category
      • An integrating resource is a work
      • As we do aggregating work, category of work is “collection work”
        • Doesn’t need a subunit because we would show that it has a parent
      • Adam: do we have to use RDA vocabulary? Maybe we map to the MARC vocabularies
      • Laura: for 006 and 7, these are values that we are using to determine certain conditions; in MARC they are valuable
        • If we ever have to map back to MARC we need these
        • Adam: impossible to map back to MARC
      • We could create Wikidata items and have a table that indicates the URIs
        • Ebe is interested, but would like some guidance
  • To-do category
    • 400, 411: should be doable because we’ve already done the 490

RIMMF Output Data Review (the rest)

  • For full overview, see Deborah and Sara’s documentation:
  • Deborah working on a document to post if we want to use RIMMF for reviewing
  • Install/update RIMMF
  • Run the .exe
  • Import the file for review
    • You can find review files here: https://github.com/uwlib-cams/MARC2RDA/tree/main/Working%20Documents/transformationCode/outputDataForReview
    • Set up files to test the aspect that we are looking for – makes reviewing simpler
    • Work with .nt file for RIMMF
    • Download the file – make sure to click on the folder (left-hand side) rather than the commit information (center)
    • Go to Tools → import entity records → make sure the External data button is checked
    • Then drag and drop the file onto the interface
    • Go to tools, load entity index
    • Can sort indices by entity
      • Start with the manifestation and work up
      • Suggestion: filter and only look at manifestations in the index
      • Sort alphabetically
  • Click on a manifestation to take a look at it
    • Comes in the same order as the triples
    • Options → sort by element label
    • Open MARC record so that you can compare
      • “Manifestation described with metadata by” takes you to the metadata work, the MARC record is a Note on manifestation
    • With the records side-by-side, you can compare the fields, which will depend on what has been mapped
    • When reviewing, only looking for things that came over unexpectedly, not looking for cataloger error
    • For example: do you need publication statement?
    • 2 works and 1 expression is augmentation aggregate
    • Normally go from manifestation to expression
      • Not much there
      • All we have in augmented work is appellation data
      • We have link back to manifestation and link up to work
      • This is the augmented work, not the aggregating work
      • The identifier is the local part of the IRI
    • Title of expression did not come over because it has not been mapped yet
    • Related person of work is in here, but this is an error; we were not supposed to map any agents or related works to the primary work because we do not know whether they are actually related to the primary work
    • We need to set up a review process where we can give the IRI or the access point (within RIMMF we can give the RIMMF identifier)
      • That’s the best thing to put down as a header
    • Go back to the manifestation and click on work
  • Review: how to open RIMMF record
    • Tools → entity index → double click
    • File → close all records
      • This closes the open-records, but leaves the program open for you to look at new records
  • RIMMF will show any appellation, need to know the RIMMF id
  • Cypress put mapping from access point table into the code – we have to edit this to make any changes
  • Crystal: we should explore data we’d like to explore in RIMMF next time
    • Let’s put the import instructions in GitHub
    • Put the RIMMF instructions on the project wiki
    • Identify which kinds of datasets will be most helpful to review
    • The large chunks are too overwhelming for review – we can’t take in 25K entries!
    • Need an asynchronous discussion about output review – output the data from ALMA or OCLC and experiment with it
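
A minimal sketch, in Python, of one way to cut a large .nt file down to a reviewable sample. It is not part of the project's transformation code: the function name `sample_subjects`, the sample size, and the fixed random seed are assumptions for illustration. It relies only on the N-Triples convention that each triple sits on one line with the subject as the first term.

```python
import random


def sample_subjects(nt_lines, n_subjects, seed=0):
    """Keep every triple whose subject is among n randomly chosen subjects.

    nt_lines: iterable of N-Triples lines (one triple per line).
    n_subjects: hypothetical sample size; if it exceeds the number of
    distinct subjects, all triples are kept.
    """
    # Drop blank lines and comment lines.
    triples = [
        ln for ln in nt_lines
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    # The subject is the first whitespace-delimited term on each line.
    subjects = sorted({ln.split(None, 1)[0] for ln in triples})
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    chosen = set(rng.sample(subjects, min(n_subjects, len(subjects))))
    # Preserve the original triple order in the output.
    return [ln for ln in triples if ln.split(None, 1)[0] in chosen]
```

Because the sample keeps only triples whose subject was chosen, linked works and expressions are not pulled in automatically; a review set built this way would probably need a second pass that follows those links.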

Wrap-up (10)

  • We might do a demo of how to run the transform during a working meeting
  • We have mapping work assigned and asynchronous discussions that need to be had
  • Mapping review deadlines
  • On Fridays we can update a new transformed dataset, decide during Wednesday meeting which files we want uploaded

Action Items

Backburner

  • We should probably have the transform team walk the rest of the team through the transformation code. How to run it, where the functions are, etc., so that everyone knows how to look things up and is capable of running it independently/can help Tynan onboard new students in future

February 26, 2025

See time zone conversion Meeting norms Present: Crystal, Sara, Junghae, Doreen, Laura, Sita, Ebe, Adam, Jian Absent: Gordon, Deborah, Tynan Time: Ebe Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Deborah is out of town until March 4
  • Crystal is in touch with OCLC about permissions for using metadata exported from there rather than downloading from LC. They're checking
  • Next week: data review in RIMMF
    • Install own instance for ease of following along and testing on own - RIMMF 6
    • Crystal will try to install and import. Will capture and share instructions if they're not already available
    • Ebe recommended the help content as useful
  • Sofia and Crystal are drafting the IFLA presentation
    • They will share with Laura and Ebe for review
    • Ebe and Laura are listed on the program as presenters; however, they are co-authors and will not be present. Crystal will clarify with IFLA.
  • Laura: Question on gathering content for RSC Review
    • Use RSC Question label on the issue.
    • Make sure context and question are clear in the issue (or be prepared to get an email asking about it :) )

Google Drive Space (15)

  • Within 18% of space limit left
  • 300MB per recorded meeting
  • Need to start a meeting recording archiving process: Crystal can start moving things to UW OneDrive
  • Start with 2023-August 2024? Then every 6 months do another 6 month chunk?
  • Ebe: thought the plan was to keep records for ~3-6 months and then delete?
  • Yes, retain for 2-3 months unless something is especially interesting (meeting notes)
  • Option to save only the transcript instead? Yes, some have, some older do not.
  • Could then save meetings for a year
  • Everyone - share any reactions in the next 1-2 weeks before moving ahead with implementing.

Mapping check-in (45)

  • Meeting discussion/Asynchronous discussion needed on mappings
    • 535: Laura will confirm status is accurate
    • 240: Laura's been working on this. Jian is doing a review on 130. Attributes table needs more work first. Deborah will work on it when she returns. Put any related issues on hold and make a note in the issue for tracking.
    • 070: Crystal will ask Amanda Xu at the National Agricultural Library (NAL)
    • 018: Laura notes it's an identifier for articles, relevant for making photocopies, but difficult to find information on its use. Adam and Ebe agree that it's not RDA, is administrative metadata - doubts about usefulness of recording the data in RDA, but no big concerns. Decision to map using a text string.
    • 843: Holdings format tag that can be used in a bibliographic record. Complicates reproduction picture (isn't in any PCC documentation), but indicates a specific copy is a reproduction. Potentially useful in a scenario where a library's original is destroyed and a copy of a copy is required; though without holdings information can't say. Agreement to move this to Phase II with other 8xx tags.
    • 773: Crystal wanted Cate's input on whether there's consistent usage that would allow mapping to anything other than note on manifestation. Deborah noted that previously the group decided that (for Phase I) we should map all values from the 76x-78x fields as Manifestation: note on manifestation. Ebe did 760 with Deborah as a test run. Laura can use what is in the Amended worksheet as a template. For display prefix, decided to use MARC's display constant "Main series: " - main series means something to a user, but note on manifestation doesn't to anyone unfamiliar with RDA. Updated 760 transformation notes to reflect the same. If look at in Phase II could choose to be more granular.
    • 720: Why are the $0, $1, $7 subfields here? It's a standard number, but not an authority record. Why no $2? Adam thinks it was originally in the proposal but taken out for more consideration. Could be because the source is indicated within (e.g., imdb, discogs)
      • $0 - source that has URI that represents the name but isn't modeled as a RWO (720 ##$aKevin Gray(discogs)a312098; 720 2#$aThe Other Baby$4prn(imdb)co0776444)
      • $1 - uncontrolled name in it and wikidata uri for person/corporate body (##$aLiliana Essi$1http://www.wikidata.org/entity/Q19760388; ##$aTshul khrims rin chen$1http://viaf.org/viaf/22550486)
      • $7 - just the provenance information
  • Anyone need help? Anyone available to give help?
    • Ebe thinks she can get the rest of hers done; most are linking fields; 410/411 can probably be copied over from the 700s; may be down to the wire
    • Laura working on 045. Time periods are expressed in different ways, with a variety of subfield combinations. Crystal asked whether the Orbis Cascade standing group has something for this already. Adam noted that the field is pretty much obsolete at this point, and that EDTF is used in 046. Sita noted it is like 008, and asked whether to use a MARC table and link it with code. Adam noted coverage of content has no range. Group will continue the discussion in the issue.
    • 765 assignment updated to reflect that Ebe is working on it.
    • Appendix J: if OCLC haven't defined it yet and it's not being used, is it possible to postpone to Phase II? Adam says put $7 off to Phase II - no one's using it, Ebe hasn't implemented it. Yes, move to Phase II.

Wrap-up (10)

  • Share thoughts on Google Drive storage
  • Mapping deadline is in 2 days - February 28
  • Mapping review deadline is in 1 month - March 31
  • Download RIMMF 6

Action Items

  • All - finish mappings
  • All - download RIMMF
  • All - start working on mapping reviews
  • Crystal - contact Amanda Xu

Backburner

February 19, 2025

See time zone conversion Meeting norms Present: Deborah, Adam, Crystal, Jian, Laura, Junghae, Doreen, Trina, Sita, Ebe, Sara Absent: Gordon, Sofia, Tynan Time: Ebe Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Respond to "asynchronous discussion needed" tags!
  • See project roster updates - is everyone's job description current? Would Trina like to be added?
    • Trina would like to be added and will send something over.
    • Email Crystal if anything needs adjustment.
  • We can use UW's Dryad for output data parking
    • Crystal has an ORCID iD that can be used
    • Appears simple, open, stable spot for initial parking
    • Not editable directly, but can download, manipulate, re-version, and update it
    • Size limit is per file. Will need to chunk, which is common (LC, Harvard, likely convenient for users)
    • Chunking strategy needs discussion. Aggregate types, then WEM?
  • Crystal spoke with Christine E from Harvard about Dataverse data and emailed Jeff M from OCLC again about using OCLC data rather than downloading from LC
    • Harvard does have an agreement with OCLC - Crystal seeing if can make the same deal
    • Policy looks like something UW could do too
  • Sara and Doreen are graduating in June. If another institution can hire XML coders, now is the time.
  • Deborah not available next week after Wednesday: back on 4th of March: finish aggpulls prior to 28th?
    • Crystal, Deborah, Tynan to meet for status update
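
Since the Dryad size limit is per file, the output will need to be split. The following is only a sketch of the size side of that problem, assuming line-oriented output such as N-Triples; `chunk_lines` and `max_bytes` are hypothetical names, the actual limit is not recorded in these minutes, and the grouping question (aggregate types, then WEM) is left open.

```python
def chunk_lines(lines, max_bytes):
    """Group lines into chunks whose UTF-8 size stays at or below max_bytes.

    A single line larger than max_bytes still gets a chunk of its own,
    so that one chunk can exceed the limit.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would pass the limit.
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

Splitting purely by size can separate the triples of one entity across two files. A strategy that chunks by aggregate type or by entity type first, and only then by size, would avoid that.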

Reconciliation and Deduplication Timing (30)

  • Phase I or Phase II

  • Works, expressions?

  • Manifestations?

  • Aggregates?

  • Reproductions?

  • Subset?

  • URIs are using approach that mimics access point and appends to the end of a stub URI and attempts to dedupe Manifestation, Work, Expression that way

  • Some are creating merges that aren't the same things

  • Deborah showed an example of what is happening in RIMMF that is an issue with a video recording

    • Two-dimensional moving image has additions to its soundtracks (e.g., music, speech, subtitles, closed captioning, special features, etc.) - but actual film is the same for all
    • Simplest way to handle could be exclude for handling in Phase I, as have done for sound recordings
    • Historically, tell if it is silent, otherwise assume there's speech. RDA hasn't raised with movie community - and whether should be one for spoken word or two-dimensional moving image. If add performed music now have moved into aggregating
  • Laura agrees this needs to be sorted out if trying to be perfect, but we're not trying to be. We stumbled on AV issues, which are just one of many we are dealing with

    • Changed position to ok with Phase I duplicate IRIs, but tell people why we're doing it, that ultimately don't think this method is final, and is a work in progress. The substantive conversation about reconciliation after conversion should come in Phase II. It's great work, and also don't want to oversell it.
  • Adam agrees probably good to show bad data, explain it, call attention to it. Well aware in Phase I creating dupes that can't be deduped or incorrectly merging, and suggest what can be done to improve results

    • The write-up can be a series of case studies of different transformations, covering what went wrong and why
  • Laura asks if it's possible to have code with version that has both options so if want to do their own reconciliation/deduping can try

    • Depends on whether Cypress had coded and commented out coding that used opaque IRIs or not
  • Decision: Go with Laura's suggestion with disclaimer
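
Laura's suggestion of keeping both options could look roughly like the Python sketch below. It is illustrative only and does not describe the project's XSLT: `mint_iri` and its `opaque` flag are hypothetical, the stub is the placeholder used in the project's example IRIs, and the access-point local part is assumed to be normalised already.

```python
import uuid

# Placeholder stub taken from the project's example IRIs.
STUB = "http://marc2rda.edu/fake/transform/"


def mint_iri(entity, ap_local, opaque=False):
    """Mint an IRI for an entity ("man", "exp", ...).

    opaque=False: append the normalised access point, so the same access
    point always yields the same IRI (entities merge, rightly or wrongly).
    opaque=True: append a random identifier, so nothing is merged and
    reconciliation is left to whoever uses the data.
    """
    if opaque:
        return f"{STUB}{entity}#{uuid.uuid4().hex}"
    return f"{STUB}{entity}#{ap_local}"
```

With the access-point mode, two records that produce the same string collapse into one entity; with the opaque mode, every call produces a new IRI, which is the behaviour someone doing their own deduplication would start from.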

Mapping Issue Check-In (15)

  • Redistribution Needed? Reports needed?
  • Ebe is doing an intensive review on hers this weekend. Will update early next week if need any redistributed. Has been doing work offline
  • Update mapping sheets for augmentation aggregate changes #483 - Crystal taking over from Cypress, will likely need assistance from Deborah
  • 751 - Sita working on, close to done
  • Mapping Syntax Spec - was intended to be machine-readable, but that is now out of scope for this phase, so instructions/decisions is fine
  • 765/767 - those are notes - Ebe should be able to knock those all out in a batch
  • Mapping Spreadsheet for X00's - Jian reviewing mappings for 100; will investigate this issue
  • 130/240 - Laura and Sita connect on what's needed/who take lead

Issues: meeting discussion needed (25)

  • 770 - can be supplement to monograph or monograph except - Ebe looking at as part of her batch
  • 525, 041 - discussion not needed, removed label
  • 336 - question on handling $3. Will use 3xx with $3 present decision. Sara will update Decisions Index to explicitly say it is a note on manifestation
  • 245 - Junghae will review
  • Things to report to RSC - Laura looking for records of naturally occurring objects. Herbarium specimens in OCLC or Smithsonian? Or Harvard?

Wrap-up (5)

Action items

  • All - Review own issues with "asynchronous discussion needed" tags to confirm tag is needed/accurate
  • All - Respond to "asynchronous discussion needed" tags
  • All - Discuss output data parking chunking strategy
  • Crystal, Deborah, Tynan to meet for status update on aggpulls
  • Sara will update Decisions Index with
    • Reconciliation and Deduplication approach decision and
    • Updated 3XX with $3 decision to explicitly say note on manifestation
  • Doreen/Sara - run a small sample set of records for group data review? If don't have many issue discussions/decisions to make

Backburner

February 12, 2025

See time zone conversion Meeting norms Present: Deborah, Cypress, Ebe, Doreen, Gordon, Laura, Sara, Crystal, Junghae, Sita, Jian, Tynan, Trina, Adam, Sofia Absent: Time: Sara Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Cypress: Code for augmented aggregates mostly finished and metadata for MARC record finished.
    • Deborah can show what it looks like in RIMMF.

IRIs (10)

  • ITSDS could redirect web requests from https://domain-to-be-defined.lib.uw.edu/ to a web site of your choosing. The way this would work is, any request for that domain name would be redirected to a web site of your choosing. As a specific example, if you used GitHub pages, a URL like https://rdf-metadata.lib.uw.edu/xyz could be redirected to http://uwlib-mig.github.io/rdf-metadata/xyz

  • Five-Star Decision: Crystal and Laura: whether to pursue five-star now or later.

  • Data Storage Concerns:

    • Laura: Where will data reside if pulling from GitHub? Concerned about large records.
    • Crystal: Agreed—bulk storage needed, not per-entity.
    • Deborah: Need a web domain, data storage (triples/RDA registry), and domain maintenance (~maybe $300/year, possibly via donations).
    • Laura: Concerned about minted IRI. Triple store?
    • Crystal: Thinking about one web page, UW won't pay for triple store.
  • GitHub Limitations:

    • Sara: GitHub limits files to 100 MiB; repositories should ideally be <1 GB, and <5 GB is strongly recommended.
    • Crystal: GitHub isn’t viable; will explore institutional repository options (ask ITS, Denise, or Preservation).
  • Next Steps:

    • Decide where to store and manage data. Crystal will inquire about UW’s institutional repository.
  • Anyone interested in meeting with ITSDS at UW about the IRIs with Crystal? Need to figure out how they will work

  • Reminder from Crystal: Complete mapping before the deadline.
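
The redirect that ITSDS described at the top of this section amounts to a host-and-prefix rewrite. A rough Python sketch follows, using the two example URLs from that description as defaults; `redirect_target` is a hypothetical name, and how ITSDS would actually configure the redirect is not known.

```python
from urllib.parse import urlsplit


def redirect_target(
    url,
    source_host="rdf-metadata.lib.uw.edu",
    target_base="http://uwlib-mig.github.io/rdf-metadata",
):
    """Return the URL a request would be redirected to, or None if the
    request is for some other host. Query strings are ignored here."""
    parts = urlsplit(url)
    if parts.hostname != source_host:
        return None
    # Keep the path and put it under the target site's base URL.
    return target_base + parts.path
```

One point to keep in mind when deciding how the IRIs will work: the fragment of a hash IRI (the part after `#`) is not sent to the server in an HTTP request, so a redirect of this kind can only act on the host and path.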

Review output data

  • IRIs coming out as expected?

  • AAM tests

  • On De-duplication Challenges

    • Crystal: Current deduplication approach is rushed and requires a more thoughtful method in Phase II. Mushing things together is worse than duplicated data.
    • Deborah: De-duping is extremely important to show the importance of RDA (Entity-relationship). This is just a test dataset.
    • Ebe: Is it feasible that we split the files and run de-duping them differently? I.e. De-dup ebooks only and not videos because of mentioned issues with a particular media? — Compromise?
    • Adam: Not deduping incorrectly. Ebe's idea is good where we can dedupe more reliably, if we can figure out what those cases are.
    • Ebe: Even if it's a bad merge, deduping is worth doing. Agree with Deborah. Make it less error-prone in Phase I, especially because it is a test database.
    • Avoiding premature deduplication may be preferable to ensure accuracy. Continue discussion next week or maybe a poll.
  • On MARC Metadata Storage

    • MARC records are being stored as literals in RDF (note on manifestation), but it's difficult to read.
    • Options discussed include storing raw MARC record, converting it to more readable format as turtle has linebreaks, or linking to an external host.
    • Note on manifestation is what we have discussed before and did.
    • Cypress: For review, looking field-by-field is more helpful, and the fields are still in the comments for each of the field-by-field templates.

Wrap-up (5)

Action items

  • Crystal will figure out if we can store datasets in institutional repository at UW. (Adam: Ask maybe Denise or Preservation?)
  • Discuss the timing of reconciliation and de-duplication next week.

Backburner

February 5, 2025

See time zone conversion Meeting norms Present: Deborah, Cypress, Ebe, Doreen, Gordon, Laura, Sara, Crystal, Junghae, Sita, Jian, Tynan, Trina Absent: Sofia, Adam Time: Ebe Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Crystal is sending dataset numbers to OKG/NLG today
    • Emory and DNB will not share records currently; unclear whether rights restricted
    • Have not heard back from Harvard yet
  • UW Libraries decided not to fill Cypress's position: if any other institutions can hire an XSLT coder for Phase II that would be helpful
    • Written feedback can be emailed to Crystal
    • In the meantime, Doreen, Tynan, Sara will pick up where Cypress is leaving off

Mapping Check-in (15)

  • To-do vs. In Progress vs. Done
    • Issue Board
    • Ebe has some to look at and will start today; will let the group know if help is needed
    • Linking data will all be mapped as notes in Phase I. Will be revisited in Phase II
    • Laura has 3-4 issues that need some discussion
    • ALL: Use labels when discussion is needed: "asynchronous discussion needed" or "meeting discussion is needed"
    • Adding "asynchronous discussion needed" to 720 regarding $7
    • Watch for issues with status:"Almost done - waiting for decision/answers to questions" - Cypress moves issue here if questions while coding
    • Try to get what can to "Ready for Transform" this week for Cypress
  • Timelines
    • Mapping: February 28, 2025
    • Mapping review: March 31, 2025
    • Transform code: April 30, 2025
    • Output review: May 30, 2025

IRIs for entities

  • Identifying manifestations reliably
  • Documents reviewed:
  • Initial thinking was that transform would be one run; has evolved throughout project to run iteratively
  • Want to reduce duplicates on re-runs, while also acknowledging that full deduplication is out of scope for Phase I and will be tackled more comprehensively in Phase II
  • Deborah created Access point mapping table that works really well. Manifestations are complicated.
  • Discussed and reviewed proposal to use 016, 035, 010, then AAP approach as a last resort towards unique IRIs
  • Examples: from m2r iris and identifiers documentation; will add source string in the IRI to reduce instance of using the same identifier from different sources
    • http://marc2rda.edu/fake/transform/man#00037837
    • http://marc2rda.edu/fake/transform/man#ocolc1544994
    • http://marc2rda.edu/fake/transform/man#speakingofjaneausten1980universitymicrofilms
  • Deborah proposed adding normalized AAP string to lessen number of hits deduping
  • Gordon agrees this is the best solution - suggestion to add control numbers more likely makes it unique
  • Switch thinking a bit and decide which components AAP-bit should be, then translate from ISBDM to MARC codes. Will get alarmingly large IRIs, but they will be more likely unique
  • New Manifestation IRI Proposal:
    • AAP + Control Number approach:
    • normalised({has title proper})|[supplied title] + " (" + {has date of creation of manifestation}|{has date of copyright of manifestation} + "; " + {has creator agent of manifestation} + "; " + {has category of carrier} + ")" + (“+ {BNB#####}|{OCLC#####}|{LCCN####})
    • Find carrier type in the mappings*
  • For items, want a unique IRI every time transform runs
  • With XSLT the generated ID is unique during the run, but on reruns will get the number being used again - so danger of getting incorrect merges
  • Currently use manifestation ap when minting IRI to help prevent duplicates on re-runs
  • Cypress proposes using date instead of manifestation ap to better ensure unique
  • Cypress will implement, and team can review output
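
The fallback order proposed above for manifestation IRIs (016, then 035, then 010, then a normalized access-point string as a last resort) can be sketched roughly as follows. This is illustrative Python only; the project transform itself is written in XSLT. The function names, the dictionary shape of `record`, and the normalization rule (lowercase, keep only ASCII letters and digits) are assumptions inferred from the three example IRIs in these notes, not the transform's actual logic.

```python
import re

# Base taken from the example IRIs recorded in these notes.
BASE = "http://marc2rda.edu/fake/transform/man#"

def normalize(text):
    """Assumed rule: lowercase, then drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def manifestation_iri(record):
    """Pick the local part of a manifestation IRI.

    `record` is a hypothetical dict: MARC tag -> list of identifier strings,
    plus an "aap" key holding the access-point string used as the last resort.
    """
    for tag in ("016", "035", "010"):  # priority order discussed in the meeting
        values = record.get(tag) or []
        if values:
            return BASE + normalize(values[0])
    return BASE + normalize(record["aap"])

print(manifestation_iri({"035": ["(OCoLC)1544994"]}))
```

Keeping the source prefix of an 035 value (here `ocolc`) inside the local part is what stops the same number from two different sources colliding. A rule this blunt would also erase non-Latin titles entirely, so the real normalization would need more care.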

If time, look at Jane Austen data more fully

  • Reviewed jane-austen_NA.ttl
  • Can tell deduped to works when see multiple 008s and 245s - indicates there were multiple records
  • Looked at line 72961, marc2rda.edu/fake/transform/exp#aikenjoan1924-2004eliza%27sdaughterenglish, and how to trace what's there through the files
  • Cypress will create a discussion for this review
  • Cypress will update the lexicalalias files today
  • There is no limit on the length of the title in 245

Wrap-up (5)

Action items

  • A survey asking about the important decisions from today?
  • Crystal will send dataset numbers to OKG/NLG today
  • Cypress will implement using date instead of manifestation ap in item IRI
  • Cypress will create a discussion for this review of the Jane Austen data
  • Cypress will update the Jane Austen lexicalalias files today

Backburner

January 29, 2025

See time zone conversion Meeting norms Present: Absent: Crystal, Gordon, Adam Time: Notes:

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Cypress' last day is February 13th.
    • Priority is getting documentation out so that others can pick up
    • See google drive folder below:
  • Transform documentation is here (and in progress). This includes how certain aspects within the transform work, as well as broad overviews, a transformation intro for onboarding, and instructions on running the transform.
    • Also available as a Read.me in the folder for the transform
  • BIBFRAME Update Forum - might be interested in the "Modern MARC" section which LC's back-converted BIBFRAME will follow. https://www.loc.gov/bibframe/news/bibframe-update-jan2025.html

IRIs (25)

  • IRI transformation documentation
  • Discussion - Minting IRIs
  • Discussion - Designing our IRIs: Deborah and Cypress met to discuss what the transform is currently doing; we need to decide what we want it to do
    • Minting IRIs versus using external IRIs
  • MAIN WEMI
    • At present the IRI is constructed as the base IRI + control number for 001 + type of entity
    • When we are sharing records from a variety of sources, we want to prevent having the same IRI applied to a different entity from a different source
    • This wouldn’t happen internally because our control numbers are unique to our system
    • We should be applying the same instructions from related entities to the main entities
  • Related RDA entities
    • e.g. Creator of work, or a work that another work is based on
    • Unreliable to map these related works and expressions (agents are fine)
    • We only map every work-added entry as a work
    • We don’t have many approved IRI sources
    • In NACO authority file, for example, we have only approved for corporate bodies, families, and places
    • The sources need to be improved so that they can be approved
    • When using $1 with an approved source: the only thing we would add to an external IRI is a relationship to an access point
      • We add a triple with “has access point” or “authorized access point” along with the string or the nomen
      • This is important for related entities because we can’t trust what’s in the MARC data
      • For the main entry: the attribute information we have may not be available in the related IRI description set
    • Brief detour to Jane Austen and RIMMF:
      • We have duplicates – these should all be a single entity for that work
      • The duplicates are coming from related work added entries
      • We are giving bib control number, no meaning in a display
      • We have many records with Jane Austen as the title when you bring the records into RIMMF and try to show them in an index form
    • $2 is similar to $1, but we have a source for the literal in the MARC record
      • The source is approved – same list
      • We don’t have an external IRI
      • Pattern for minting our own is in the document
      • Authorized access points have to be mapped so that they can be used as a concatenated, normalized string
      • Purpose is to do some automatic deduplication: if the entire RDA triple is the same, it automatically de-dupes
    • Worst case/most common: neither $1 nor $2 is present:
      • We mint a non-meaningful IRI that appends a running count at the end
      • We are ending up with many duplicates
    • Sofia: If two records describe the same work, but with different information, in the future it might be hard to map them
      • Deborah: We’ll either be mapping the two descriptions as separate work entities
      • Or we’ll have found a way to make them match using the local part of the IRI (instead of using the 001 and entity label); we may instead use the authorized access point for example
      • Taking the mapped work from two different records will have the same IRI from subject – if a triple-source is absolutely identical, only one is kept automatically
    • Laura: if the source library has been doing authority maintenance, then the access points will be the same

WEM Access Points (25)

  • Jane Austen records in RIMMF
  • Access points table
    • Are we creating access points for the main work and expression? Gordon said that the identifiers are sufficient
    • Last week, however, we thought it might be important to have access points displayed for human readable purposes
    • Agents have been done (100, 600, 700)
    • When mapping over person-entities from authority file, what did you do at NLG about presence of fictitious characters?
      • If fictitious characters used as pseudonym, treat as nomen (e.g. related nomen or work)
      • If it is a subject: treat as skos:concept
      • We need some list of texts
      • If we only provide as an access point for the person, corporate body, or family, then we aren’t doing something against RDA, can be processed with human manual intervention
      • For 100’s and 700’s, then we understand it is a nomen used by a person
      • We can understand it as a related nomen – at NLG they used related nomen as the element instead of creating an access point to the person you’ve created the entity for
      • This may need to be a phase II problem
    • Single works: names + titles if they are in a 600 or 700
      • If a person is a subject in a 600, it is still the same person
      • Source is subject heading, but person is same entity – a person
    • Cypress: should we map from the 130
      • The table is written in order of priority – if the 130 is there, use it, if not, move on and use 100 + 240 etc. up until using the 245
  • We get the access point being described by the record from the fields and subfields in the left-hand column
  • We are putting together an online poll to hear from those who couldn’t make it today
  • General consensus from the group present today is that this is ok
    • Must retain the order of the subfields given; we strip all of the punctuation for this purpose and decide what to use between subfields later
    • LC is stripping out ISBD punctuation
    • For expressions (single expression in this manifestation) the access point for the expression will be the work plus the RDA element for the expression
      • If we only have 245s to rely on, they will never contain expression elements from the heading
      • We have to find them from the body of the record
      • We can find this in the spreadsheet mapping
    • Manifestation:
      • No access point in ACR thinking
      • We do what ISBDM is using in the same order and not worry about punctuation at this point
    • We need to make a decision on this before Cypress leaves!
    • If we are in agreement, Cypress can work on it and then add it into the code when we get final approval

Nomens for Entities with Sources (15)

  • "A nomen must be an appellation of one and only one RDA entity", when we are saying that one Entity exists (i.e. http://marc2rda.edu/fake/lcsh/place#england) should we not also be able to say that there is only one nomen for a place from lcsh with the nomen string "england"?
  • From RDA: a nomen is an appellation of 1 and only 1 RDA entity
    • But “England” has 100s of unique nomen entities
    • Nomens have IRIs, but no IRI as identifier
    • We have a place and many authorized access points for place, “England” from lcsh
    • Instead of this list, we would have one de-duplicated one that has an authorized source
  • e.g. a place nomen for an approved place entity
  • i.e. use the nomen string as the local part of the created IRI along with the source
  • Only for sources where we know the authorized access point is unique and always has the same identifier can we use it as a unique identifier
  • This applies to any unique nomen string
  • For our approved sources, can we say the access point will be unique?
  • Looking at the LC NACO, we have approved LC’s authority file for place names, but not for persons
  • Place names go through the subject path, not the name path
    • i.e. goes through SACO, not NACO
  • Even if in bibliographic record we see a jurisdictional place, it has a different indicator, so we can create a corporate body
  • Comes down to the principle of having only created one entity
  • In principle, each of the authorized sources has a uniqueness in the strings that are used
  • Conclusion: we are okay implementing this, but if we run into issues, it can be undone because we can edit the one function in which the process is implemented
    • But we should also bring it up with Gordon
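
A minimal sketch of the conclusion above: if the nomen IRI is built from the approved source plus the normalized nomen string, every occurrence of "England" from lcsh resolves to the same nomen, while the same string from another source stays separate. The `nomen#` path segment and the function name are hypothetical; only the `http://marc2rda.edu/fake/lcsh/` prefix is taken from the example IRI in these notes. Keeping this in a single function matches the point that the behavior can be undone by editing one place.

```python
import re

BASE = "http://marc2rda.edu/fake/"  # prefix seen in the example IRI in these notes

def nomen_iri(source, nomen_string):
    """Mint a nomen IRI from an approved source code and the nomen string.

    The "/nomen#" pattern is an assumption made for this sketch.
    """
    local = re.sub(r"[^a-z0-9]", "", nomen_string.lower())
    return f"{BASE}{source}/nomen#{local}"

# Two bibliographic records using the same lcsh place heading share one nomen.
print(nomen_iri("lcsh", "England") == nomen_iri("lcsh", "England"))
```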

Wrap-up (5)

Action items

  • A survey asking about the important decisions from today?

Backburner

January 22, 2025

See time zone conversion Meeting norms Present: Jian, Sofia, Adam, Crystal, Cypress, Deborah, Sita, Tynan, Sara, Ebe, Junghae, Doreen, Laura Absent: Gordon Time: Tynan Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Check-in on deadlines: Reminders to do mappings and reviews ASAP so transformation team can get work done
    • Mapping: February 28, 2025
    • Mapping review: March 31, 2025
  • IFLA coming up soon
    • Crystal and Sofia presenting on project in March! Continued progress helps with what can put together to present.
  • Ying-Hsiang handing off Wikidata code to Cypress and Tynan on Friday
  • Laura shared Harvard is publishing Alma data CC0 via Dataverse

Dataset Numbers: Crystal needs to send to OKG/NLG this week (10)

  • LC: 500K random records (Crystal emailed Theo for tips on how to download; once hear back from Theo will reach out to OCLC)
  • UW: 545k records (original UW-authored records; same provided to LD4)
  • NLG: 700k records
  • NLNZ: 600k
  • Emory: TBD (Laura has reached out and is waiting to hear back; Crystal meeting on Homosaurus topic and can ask then)
  • DNB: TBD (Sita and Crystal have been emailing; Sita will update when knows more)
  • How will be used:

Transforming Augmentation Aggregate Records (25)

  • Discussion

  • Deborah's Document

    • Deborah added most updates in the Recommendations section, added examples at the end of the Appendices, and added logic for identifying Augmentation aggregate manifestations under AggPulls.
    • Outstanding questions section is for future consideration and discussion by wider community
    • Also added additional material in the UW M2R Transforming Augmentation Aggregate Records file linked under Diagrams.
    • Initial thinking on SES (string encoding scheme):
      • CToRE = Content type of representative expression
      • LoEoRE = Language of expression of representative expression

    The SES for an augmented single work should be the same as for a stand-alone single work, taken from (in order of preference):

    • 130
    • 1XX + 240
    • 1XX + 245
    • 245 + 1st 7XX (name portion only)
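
    The order of preference above amounts to a first-match rule, which can be sketched as below. This is illustrative Python rather than the XSLT transform. The `fields` dictionary (with the first 1XX and the name portion of the first 7XX already extracted), the single-space separator, and the final fallback to the 245 alone are all assumptions; the notes leave punctuation between components to be decided later.

```python
def work_ses(fields, sep=" "):
    """Return the access-point string for a single work, first match wins.

    `fields` is a hypothetical dict with optional keys "130", "1XX", "240",
    "245" and "7XX" (name portion of the first 7XX only).
    """
    if fields.get("130"):
        return fields["130"]
    if fields.get("1XX") and fields.get("240"):
        return sep.join([fields["1XX"], fields["240"]])
    if fields.get("1XX") and fields.get("245"):
        return sep.join([fields["1XX"], fields["245"]])
    if fields.get("245") and fields.get("7XX"):
        return sep.join([fields["245"], fields["7XX"]])
    return fields.get("245")  # assumed fallback, not listed in the notes

print(work_ses({"1XX": "Austen, Jane, 1775-1817", "245": "Emma"}))
```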

    What should the SES be for an aggregating work plan?

    • 130 + Aggregating work + 1st 7XX + CToRE + LoEoRE
    • 1XX + Aggregating work + 240 (if 1XX is aggregator) + 1st 7XX + CToRE + LoEoRE
    • 1XX + Aggregating work + 245 (if 1XX is aggregator) + 1st 7XX + CToRE + LoEoRE
    • 245 + Aggregating work + 1XX + 1st 7XX + CToRE + LoEoRE
    • 245 + Aggregating work + 1st 7XX (name only) + 1st 7XX + CToRE + LoEoRE
    • Need to make a decision again on whether or not are making access points to make it clear - add this to next week's schedule and then also make time to implement it
      • Cypress noted from 2024 meeting notes that the discussion was we already have identifiers, so don't need access points
      • Crystal noted identifiers count as an appellation
      • Crystal's opinion, in advance of being out next week, is that we should have access points, though doesn't have an opinion on the SES
    • Deborah's preference is that creator is always linked to aggregate, and then also creator with aggregated work if described
    • Laura notes: "For the “augmented work” - bear in mind, the Work data may describe many expressions and adding this content type (Primary augmented work, or augmented work) to that Work entity is therefore questionable. It might be just an aggregated work in another manifestation, and standalone in another one."
    • Ebe notes: "Personally I would like a string encoding scheme where we put the title first and use the creator as the qualifier." e.g. Animalia (Graeme Base)
      • Noted that RDA doesn't require us to do this, historical practice rather has
      • Deborah sees what saying, but still prefers to keep the entire string
      • Laura thinks access points are useful, but qualifiers make them more useful
    • Cypress will add in property numbers alongside "Label (Toolkit)" to make it easier for the transform
    • Category: only add Aggregating work (not Augmented, since won't always be true, therefore not safe)
      • Laura worries might be confusing to user as part of access point
      • Adam notes since we're discussing access points, and not authorized access points, they can be undifferentiated
      • Crystal notes we need a SES if we want to include access points
    • IFLA did manifestation and work access points
      • Manifestation SES
      • Work SES
      • Crystal thinks should use this for SES
      • Deborah notes they don't have one for Aggregating; Crystal wonders whether unique access points are needed; RDA does have instructions for qualifying access points
      • Definitely can qualify - but question of whether to do it in the same order. Need a survey
  • Is it possible to transform the way Deborah suggests in document?

    • Transform perspective is just concern on time needed to implement. Cypress would like to start by February to make sure it's working properly
  • Any substantive objections?

Attribute mapping questions

  • Row 2: person is 0 or 1; 2 included in code just in case it occurs, which should be unlikely; if not 3, then know it's a person
  • Row 5: series of different mappings for dates; if have a mapping for some of them, should they also map as related timespan of person? e.g., use date of birth and timespan as same?
    • Not minting timespans; this isn't a note, just a value
    • Jin shi (進士) and ju ren (舉人) dates need to be added somewhere; closest it maps to is period of activity (see: CJK NACO Best Practices)
      • Adam shared a jin shi example: 100 1 Bao, Rong, ǂd jin shi 809
      • Jian shared a ju ren example: Chen, Denglong, $d ju ren 1774
    • What if there's a date with no hyphen, how to handle? Deborah suggests related time period; may need to keep for dates with errors and no qualifier
    • Sofia asked does date of birth accept values like 'circa 1500'? Still needs to be taken into account. What to map to? They should still include hyphens
      • Adam shared examples, noting a hyphen in front means it's a death date
        • Aaron, ǂc of Zhitomir, ǂd -approximately 1817
        • Aaron, W. F., ǂd active approximately 1860
        • Abate, Nicolò dell', ǂd approximately 1509-1571
        • ʻAbbās ibn ʻAbd al-Muṭṭalib, ǂd approximately 566-approximately 653
        • ʻAbd al-ʻAzīz Muḥammad, ǂd 1866 or 1867-approximately 1948
    • ALL: post examples in the issue
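
One possible reading of the $d patterns discussed above, sketched in Python for illustration: a value starting with "active", "jin shi" or "ju ren" goes to period of activity, a value with no hyphen goes to a related timespan, and otherwise the text before the first hyphen is the birth date and the text after it the death date, so a leading hyphen yields a death date only. The result keys are informal labels, not RDA element names, and the mapping itself is still under discussion in the issue.

```python
def classify_dates(subfield_d):
    """Split a $d string into informal date roles (a sketch, not the agreed mapping)."""
    value = subfield_d.strip()
    if value.lower().startswith(("active", "jin shi", "ju ren")):
        return {"period_of_activity": value}
    if "-" not in value:
        # No hyphen and no qualifier: Deborah's suggestion was a related time period.
        return {"related_timespan": value}
    birth, _, death = value.partition("-")  # splits at the first hyphen only
    result = {}
    if birth.strip():
        result["date_of_birth"] = birth.strip()
    if death.strip():
        result["date_of_death"] = death.strip()
    return result

for example in ["-approximately 1817", "active approximately 1860",
                "approximately 1509-1571", "1866 or 1867-approximately 1948",
                "jin shi 809"]:
    print(example, "->", classify_dates(example))
```

Qualifiers such as "approximately" and "or" are kept inside the value as found, which leaves open the question Sofia raised about whether date of birth should accept them.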

WEM Access points - Are we doing them? (30) - Decided to move this discussion to next week

  • Currently meeting RDA minimum description requirements, with W and E having identifiers generated from 001, and M having a title.
  • 130 and 240?
  • Access Point Mapping Table

Wrap-up (5)

Action items

  • Crystal will reach out to Christine Eslao at Harvard regarding their published Bibliographic Metadata
  • Cypress will add in property numbers alongside "Label (Toolkit)" in Deborah's Document
  • Crystal will create a survey on handling/qualifying access points and add Cypress as an editor to see results
  • All to post examples in Attributes table issue

Backburner

  • WEM Access points: Next week
  • RIMMF Demo: Next week?

January 15, 2025

See time zone conversion Meeting norms Present: Crystal, Adam, Deborah, Laura, Ebe, Gordon, Jian, Junghae, Sara, Sita, Doreen Absent: Cypress Time: Sara Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Release 5.2.0 of the RDA Registry downloads was published yesterday (14 Jan 2025). The release notes say 'This release supports the February 2025 issue of RDA Toolkit. This release contains several new object elements with a range of skos:Concept.' The object elements were added following a suggestion from this project to the RSC Technical Working Group, and should already be in use within the transform. We should check the object elements used by the transform against this release.
  • Crystal: Will be gone on 1/29. Cypress will facilitate the meeting.
  • Crystal will get back on uploading meeting recordings.

Transformation Dataset (15)

  • Our initial transformation is happening soon
  • NLG and OKG need information about how many records, from which institutions, in order to make their proposal to the Ministry of Education for the Wikibase expansion we asked them to do
  • See: RDA Wikibase Collaboration
  • Which records from LC will we include?
    • Obtaining the "entire" catalog (or close enough): 2016 selected datasets, plus downloading post-2016 records 10k at a time from catalog.loc.gov and deduplicating those that are just updated post-2016.
    • Obtaining a certain number of records from catalog.loc.gov, 10k at a time, and downsizing our initial goal
      • Yes to this option. 500k random records.
    • OCLC export? Crystal could ask them how they would feel about participation. Don't know about rights.
    • NLG: 700k
    • UW: count number of UW-authored records in Alma (Junghae will check and share indication rule with Laura). Crystal would like to know how many exist and wants all of them. Answer: 544,316 bib records in Alma for which the University of Washington (UW) is the original cataloging agency
    • NLNZ: 599923
    • DNB: Sita will ask about willingness to participate
    • Emory: Laura will check
    • Would like to receive answers to these questions by next week so Crystal can get back to our partners.
  • Additional Discourse: Adam: Can we make a list of records we want in the sample pool? If LC doesn't have them, we can add to it.
    • Crystal: Random sample is the safest way. We can establish criteria for what to include. This sample set is for Phase 1 and 2 (including aggregates and non-aggregates).
    • Can do this again at the end of Phase II but this is good now.
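
Whichever download option is used, records pulled 10k at a time will need de-duplicating when the batches are combined. A minimal sketch, assuming each record has been reduced to a dict holding its 001 control number and its 005 date and time of latest transaction: keep one record per 001, preferring the most recent 005. The dict shape and the function name are hypothetical; 005 values in the standard `yyyymmddhhmmss.f` form can be compared as plain strings.

```python
def merge_batches(batches):
    """Combine download batches, keeping the latest version of each record.

    `batches` is a list of lists of hypothetical dicts with "001" and "005" keys.
    """
    latest = {}
    for batch in batches:
        for record in batch:
            key = record["001"]
            # 005 is yyyymmddhhmmss.f, so string comparison orders by time.
            if key not in latest or record["005"] > latest[key]["005"]:
                latest[key] = record
    return list(latest.values())

batch_a = [{"001": "123", "005": "20150101120000.0"}]
batch_b = [{"001": "123", "005": "20200101120000.0"},
           {"001": "456", "005": "20180101120000.0"}]
print(len(merge_batches([batch_a, batch_b])))
```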

Linking fields (30)

  • Linking fields discussion
  • Deborah's table for these fields: Linking Entry Fields.20250115
  • Examples: Linking Entry Fields Examples
  • Gordon proposed: trying to do related entity --> Deborah whether we can do that?
  • WEM Entity column shows which WEM entity this linking entry field is supposed to carry information about
    • Even for ones that should be clear, examples are mixture of description of work and manifestation (Because there are no restrictions)
    • I.e. 770 Could be expression? Could be work?
    • 765 – Should be expression but examples given are expression or work --> can’t tell whether linking entry is for expression, a single-part, or multi-part or aggregating part.
  • Deborah’s research shows that similar to added entry fields where we came up with a default (related work of manifestation), the best Deborah can come up with is Manifestation related manifestation of manifestation.
    • We could trust folks and say it must be a series. Everything that’s not a series is an error
    • Adam: Series can include multi-part monograph --> Deborah: would you put it in linking entry fields? --> Adam: If it can be done, someone has done it. Deborah: Similar to 830s we cannot tell, this we cannot tell.
  • What is the purpose of linking entry fields??? --> Then what do they mean in RDA???
    • Adam: Meant to link you from one bibliographic record to another bibliographic record --> Literally meant to provide links but never really used that way. No $w because there isn’t actually a bib record for the related work.
  • Adam: Multi-pronged approach; if there’s more complete data, do one thing, but for ones with no information do a note. --> Takes a fair bit of coding --> Crystal: Makes more sense to do these as notes for Phase I and say in Phase II do something more granular
  • Majority voted to map as notes
  • Gordon: anything that is a note on manifestation MUST apply to all exemplars of the manifestation.

Transforming Augmentation Aggregate Records (20)

  • Document is in Aggregates Main Folder > CW_DW_AM_Markers.20241113 folder: Transforming Augmentations.20250108.docx
  • Did not have time to address. Bring questions for Deborah after reviewing Deborah's document next week and Cypress will be here for the full discussion.

Wrap-up (5)

Action items

Backburner

  • WEM Access points, RIMMF Demo

January 8, 2025

See time zone conversion Meeting norms Present: Absent: Notes: Tynan

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Ying-Hsiang cycling off project, arranging handoffs soon. Thank you for your incredible contributions, Ying-Hsiang!
  • Doreen is primarily working Fridays and in the mornings during the rest of the week now, and 15 hours per week rather than 19.5 this quarter
  • We now have Cypress full time (not all on M2R, but more than before)
  • Crystal will miss the last meeting in January
  • We had a long follow up discussion regarding 773 (I'm sorry, I lost the notes in a conflicting edit, will go back to the recording to augment), but we decided to discuss next week, so we can dive in further then
  • Crystal heard back from Theo at LOC: the catalog is not free unless you use an outdated version (not a lot of RDA in it). You can get 10,000 records at a time through the catalog, and by doing that repeatedly we can get as many as we want, although they block bots from doing this and the system will slow you down if you try to automate it, so this has to be done manually or by a slow program. They also have a way to purchase the catalog, but it's very expensive (e.g. $25,000!). Theo recommends downloading 10,000 at a time and compiling a dataset; 100K should be enough
    • Deborah: download bulk from 2019 and use the 10k at a time approach for the rest; we would need to de-duplicate the records

Project Plan Review and Update

Project Overview

  • Problem statement: adding a need to mention differing and non-interoperable ontologies
  • Goals:
    • Deborah: one of the things in the impact should be a description of the entities and their relationships -- this is the main new thing in RDA
    • Sofia: move from record-based cataloging to entity-based cataloging
  • Impact discussion
    • How much is a large pool? The available bulk download is from 2019; we can download records 10k at a time
    • Laura: we can talk to Jeff at OCLC; where would we host the records -- National Library of Greece Wiki?
      • Would give us a better picture to give people than just using LC's record; could also discover things about the transformation
      • Decision: add this to a discussion for next week
      • Sofia: wikibase database has size limits, asking how to make the storage bigger
    • Are we reducing dependency on vendor systems?
      • Laura: in order to demonstrate this reduced dependency, we have to use it in a system that is not a vendor system and provide library services off of it
      • Rephrase to reinforce commitment to open-scholarship
      • Laura: main impact is to demonstrate that RDA can be implemented using RDF directly; there is a path for adopting it for libraries that have a large legacy store of MARC data
      • Ebe: if someone doesn't want to use RDF, but wants to use something else -- should we be specific about the type of encoding?
      • Decision: we don't want to promise that we can help people encode another way
  • Phase I
    • Java extension is not in phase I anymore
      • Instead for phase I we have moved on to having pre-approved IRI sources
    • Ying-Hsiang, send documentation to Cypress and Tynan for scripts to feed Bibliographic into Wikibase Cloud
  • Post-Phase I close-out
    • We may not need to justify phase II, UW libraries approved
    • We can think about grant applications to support phase II
    • We may also consider submitting to additional conferences
    • A composition that describes in a granular way what we did for Phase I, why we did it, what the results were; goal to get this published somewhere
      • Deborah's project plan is a good outline for this
      • We may want to have an open-source version of this to make information more accessible
  • Phase II
    • Collection records
      • What will we do with collections? We are pulling them out of phase I; what does RDA need for collections?
    • Item-level mappings -- not part of phase I, will be part of phase II
    • CSR
      • You can have diachronic works that fall into a BSR (multipart monos/series)
      • Removing machine-readable mapping -- we don't have the capacity for that right now
    • BSR
    • Guidelines for pre and post processing -- part of our documentation in phase II, we have python scripts to serialize
  • Timeline
    • Close-out is June-August of 2025
    • Start Phase II in August
    • How much time do we need for review and re-coding? We need to extend the deadline for ending phase I to April 30th
      • Mapping done by Feb 28, 2025
      • Mapping review by Mar 31, 2025
      • Transform code by April 30, 2025
      • Output review by May 30, 2025
    • This means starting phase II in September
  • Deliverables

Phase I

Deborah Project Plan (Simplified Incomplete)

Transformation Review (if time)

Wrap-up (5)

Action items

Backburner